
Body – Language – Communication

HSK 38.1
Handbücher zur
Sprach- und Kommunikations-
wissenschaft
Handbooks of Linguistics
and Communication Science

Manuels de linguistique et
des sciences de communication

Co-founded by Gerold Ungeheuer (†)


Co-edited 1985−2001 by Hugo Steger

Herausgegeben von / Edited by / Edités par


Herbert Ernst Wiegand

Volume 38.1

De Gruyter Mouton
Body – Language –
Communication
An International Handbook on
Multimodality in Human Interaction

Edited by
Cornelia Müller
Alan Cienki
Ellen Fricke
Silva H. Ladewig
David McNeill
Sedinha Teßendorf

Volume 1

De Gruyter Mouton
ISBN 978-3-11-020962-4
e-ISBN 978-3-11-026131-8
ISSN 1861-5090

Library of Congress Cataloging-in-Publication Data


A CIP catalog record for this book has been applied for at the Library of Congress

Bibliographic information published by the Deutsche Nationalbibliothek


The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie;
detailed bibliographic data are available on the Internet at http://dnb.dnb.de.
© 2013 Walter de Gruyter GmbH, Berlin/Boston
Cover design: Martin Zech, Bremen
Typesetting: Apex CoVantage
Printing: Hubert & Co. GmbH & Co. KG, Göttingen
♾ Printed on acid-free paper
Printed in Germany
www.degruyter.com
Contents

Volume 1

Introduction  Cornelia Müller . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

I. How the body relates to language and communication:
   Outlining the subject matter
1. Exploring the utterance roles of visible bodily action:
A personal account  Adam Kendon. . . . . . . . . . . . . . . . . . . . . . . . . . ... 7
2. Gesture as a window onto mind and brain, and the relationship to
linguistic relativity and ontogenesis  David McNeill . . . . . . . . . . . . . . . . 28
3. Gestures and speech from a linguistic perspective: A new field
and its history  Cornelia Müller, Silva H. Ladewig and Jana Bressem. . . 55
4. Emblems, quotable gestures, or conventionalized body
movements  Sedinha Teßendorf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
5. Framing, grounding, and coordinating conversational interaction:
Posture, gaze, facial expression, and movement in
space  Mardi Kidwell . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
6. Homesign: When gesture is called upon to be
language  Susan Goldin-Meadow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
7. Speech, sign, and gesture  Sherman Wilcox . . . . . . . . . . . . . . . . . . . . . . 125

II. Perspectives from different disciplines


8. The growth point hypothesis of language and gesture as a dynamic
and integrated system  David McNeill . . . . . . . . . . . . . . . . . . . . . . . . . . 135
9. Psycholinguistics of speech and gesture: Production, comprehension,
architecture  Pierre Feyereisen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
10. Neuropsychology of gesture production  Hedda Lausberg . . . . . . . . . . . 168
11. Cognitive Linguistics: Spoken language and gesture as expressions
of conceptualization  Alan Cienki . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
12. Gestures as a medium of expression: The linguistic potential
of gestures  Cornelia Müller . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
13. Conversation analysis: Talk and bodily resources for the organization
of social interaction  Lorenza Mondada . . . . . . . . . . . . . . . . . . . . . . . . . 218
14. Ethnography: Body, communication, and cultural
practices  Christian Meyer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
15. Cognitive Anthropology: Distributed cognition
and gesture  Robert F. Williams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
16. Social psychology: Body and language in social
interaction  Marino Bonaiuto and Fridanna Maricchiolo . . . . . . . . . . . . 258
17. Multimodal (inter)action analysis: An integrative
methodology  Sigrid Norris . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275
18. Body gestures, manners, and postures in
literature  Fernando Poyatos . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287

III. Historical dimensions


19. Prehistoric gestures: Evidence from artifacts
and rock art  Paul Bouissac . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301
20. Indian traditions: A grammar of gestures in classical dance
and dance theatre  Rajyashree Ramesh . . . . . . . . . . . . . . . . . . . . . . . . . 306
21. Jewish traditions: Active gestural practices in religious
life  Roman Katsman . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 320
22. The body in rhetorical delivery and in theater: An overview of
classical works  Dorota Dutsch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
23. Medieval perspectives in Europe: Oral culture and bodily
practices  Dmitri Zakharine. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343
24. Renaissance philosophy: Gesture as universal
language  Jeffrey Wollock . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 364
25. Enlightenment philosophy: Gestures, language, and the origin
of human understanding  Mary M. Copple . . . . . . . . . . . . . . . . . . . . . . 378
26. 20th century: Empirical research of body, language,
and communication  Jana Bressem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 393
27. Language – gesture – code: Patterns of movement in artistic dance
from the Baroque until today  Susanne Foellmer . . . . . . . . . . . . . . . . . . 416
28. Communicating with dance: A historiography of aesthetic
and anthropological reflections on the relation between dance,
language, and representation  Yvonne Hardt . . . . . . . . . . . . . . . . . . . . . 427
29. Mimesis: The history of a notion  Gunter Gebauer
and Christoph Wulf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 438

IV. Contemporary approaches


30. Mirror systems and the neurocognitive substrates of bodily
communication and language  Michael A. Arbib . . . . . . . . . . . . . . . . . . 451
31. Gesture as precursor to speech in evolution  Michael C. Corballis . . . . . 466
32. The co-evolution of gesture and speech, and downstream
consequences  David McNeill . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 480
33. Sensorimotor simulation in speaking, gesturing,
and understanding  Marcus Perlman and Raymond W. Gibbs . . . . . . . . 512
34. Levels of embodiment and communication  Jordan Zlatev . . . . . . . . . . . 533
35. Body and speech as expression of inner states  Eva Krumhuber,
Susanne Kaiser, Arvid Kappas and Klaus R. Scherer . . . . . . . . . . . . . . . 551
36. Fused Bodies: On the interrelatedness of cognition
and interaction  Anders R. Hougaard and Gitte Rasmussen . . . . . . . . . 564
37. Multimodal interaction  Lorenza Mondada . . . . . . . . . . . . . . . . . . . . . . 577
38. Verbal, vocal, and visual practices in conversational
interaction  Margret Selting. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 589
39. The codes and functions of nonverbal communication  Judee K.
Burgoon, Laura K. Guerrero and Cindy H. White . . . . . . . . . . . . . . . . . 609
40. Mind, hands, face, and body: A sketch of a goal and belief
view of multimodal communication  Isabella Poggi . . . . . . . . . . . . . . . . 627
41. Nonverbal communication in a functional pragmatic
perspective  Konrad Ehlich . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 648
42. Elements of meaning in gesture: The analogical links  Geneviève Calbris. . . 658
43. Praxeology of gesture  Jürgen Streeck . . . . . . . . . . . . . . . . . . . . . . . . . . 674
44. A “Composite Utterances” approach to meaning  N. J. Enfield . . . . . . . 689
45. Towards a grammar of gestures: A form-based
view  Cornelia Müller, Jana Bressem and Silva H. Ladewig . . . . . . . . . . 707
46. Towards a unified grammar of gesture and speech: A multimodal
approach  Ellen Fricke . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 733
47. The exbodied mind: Cognitive-semiotic principles as motivating
forces in gesture  Irene Mittelberg. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 755
48. Articulation as gesture: Gesture and the nature of
language  Sherman Wilcox . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 785
49. How our gestures help us learn  Susan Goldin-Meadow. . . . . . . . . . . . . 792
50. Coverbal gestures: Between communication and speech
production  Uri Hadar . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 804
51. The social interactive nature of gestures: Theory, assumptions,
methods, and findings  Jennifer Gerwing and Janet Bavelas . . . . . . . . . . 821

V. Methods
52. Experimental methods in co-speech gesture research  Judith Holler. . . . 837
53. Documentation of gestures with motion capture  Thies Pfeiffer . . . . . . . 857
54. Documentation of gestures with data gloves  Thies Pfeiffer . . . . . . . . . . 868
55. Reliability and validity of coding systems for bodily
forms of communication  Augusto Gnisci, Fridanna Maricchiolo
and Marino Bonaiuto . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 879
56. Sequential notation and analysis for bodily forms of
communication  Augusto Gnisci, Roger Bakeman
and Fridanna Maricchiolo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 892
57. Decoding bodily forms of communication  Fridanna Maricchiolo,
Angiola Di Conza, Augusto Gnisci and Marino Bonaiuto . . . . . . . . . . . 904
58. Analysing facial expression using the facial action coding system
(FACS)  Bridget M. Waller and Marcia Smith Pasqualini. . . . . . . . . . . . 917
59. Coding psychopathology in movement behavior: The movement
psychodiagnostic inventory  Martha Davis . . . . . . . . . . . . . . . . . . . . . . . 932
60. Laban based analysis and notation of body
movement  Antja Kennedy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 941
61. Kestenberg movement analysis  Sabine C. Koch and K. Mark Sossin . . . 958
62. Doing fieldwork on the body, language,
and communication  N. J. Enfield . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 974
63. Video as a tool in the social sciences  Lorenza Mondada . . . . . . . . . . . . 982
64. Approaching notation, coding, and analysis from a conversational
analysis point of view  Ulrike Bohle . . . . . . . . . . . . . . . . . . . . . . . . . . . 992
65. Transcribing gesture with speech  Susan Duncan . . . . . . . . . . . . . . . . . 1007
66. Multimodal annotation tools  Susan Duncan, Katharina Rohlfing
and Dan Loehr . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1015
67. NEUROGES – A coding system for the empirical analysis of hand
movement behaviour as a reflection of cognitive, emotional,
and interactive processes  Hedda Lausberg . . . . . . . . . . . . . . . . . . . . . 1022
68. Transcription systems for gestures, speech, prosody, postures,
and gaze  Jana Bressem. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1037
69. A linguistic perspective on the notation of gesture
phases  Silva H. Ladewig and Jana Bressem . . . . . . . . . . . . . . . . . . . . 1060
70. A linguistic perspective on the notation of form features in
gestures  Jana Bressem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1079
71. Linguistic Annotation System for Gestures (LASG)  Jana Bressem,
Silva H. Ladewig and Cornelia Müller . . . . . . . . . . . . . . . . . . . . . . . . . 1098
72. Transcription systems for sign languages: A sketch of the different
graphical representations of sign language and their
characteristics  Brigitte Garcia and Marie-Anne Sallandre . . . . . . . . . . 1125
Introduction

1. Why a handbook on body, language, and communication?


The handbook offers an encompassing account of the current state of the art in an
emerging and highly interdisciplinary field. Given its scope and size, the book has the
character of an encyclopedia. It introduces fundamental concepts, theories, and empirical
methods, and documents what is known about the forms and functions of the
body as a modality that goes hand in hand with speech in face-to-face communication.
Why do we need a handbook on the body in relation to language and communica-
tion? Do we need one at all? We think so: the time is ripe to direct scholarly
attention to the very nucleus of human communication: the face-to-face situation of
communication. Whenever we speak with each other it is not only through words;
bodily movements are always involved and they are so closely intertwined with lan-
guage that they sometimes become part and parcel of language or even become language
themselves – as is the case in sign languages all around the world. Face-to-face commu-
nication is by nature multi-modal; it is the nucleus of communication, and it is here
that language emerges in onto- and phylogenesis. It is here that the “modern
mind” evolves and where intersubjectivity appears on the evolutionary stage (Donald
1993; Tomasello 2000). Other forms of communication such as writing or talking on
the phone are ultimately derived from this communicative practice. This is one reason
for devoting a handbook to these primary forms of interpersonal communication.
Yet this is not the only reason why the relation of the body to language and commu-
nication has become a focus of interest in a variety of disciplines such as: linguistics, psy-
chology, cognitive science, anthropology, sociology, semiotics, literature, computing and
engineering. The main – albeit mostly not explicitly recognized – triggering force is the
contemporary availability of the microscopes of face-to-face communication: film and
video. Video technology, being affordable and even more common nowadays than
audio recording, has turned into the default medium of documenting face-to-face com-
munication, and it is the specifics of this instrument that have literally created new inter-
ests. More and more scholars from neurology to linguistics have realized that speakers
use their bodies, their hands, arms, and faces when they speak, and they are becoming
aware of this because they videorecord communication rather than just capturing the
audio portion. It is with this discovery that new questions arise which constitute the
focus of this handbook: what do these movements mean, how do we analyze them,
and how do we classify and annotate them? Body – Language – Communication aims
at bringing together the available knowledge to answer these pertinent questions.
Thus it is ultimately the “microscope” of video and film that has triggered the sudden
increase of scholarly attention, from a broad range of disciplines, to particular facts –
facts that before the availability of an “objective” documentation of verbal and bodily
forms of communication in real time conditions and natural contexts of use were simply
not recognized as pertinent features of language and communication at all – not even in
conversation and discourse analysis. This “microscope” is the prerequisite for empirically
grounded, scientific research on the body in relation to language and communication.

One consequence of this development is the sudden increase in interest in the bodily
aspects of language and communication, a phenomenon that is apparent from the escalation
in the number of research projects in fields ranging from artificial intelligence to
media studies and conversation analysis on topics which are being subsumed under
the term “multimodality”.
The handbook offers a perspective on the body as “part” and “partner” of language
and communication. In this way it contributes to some of the current key issues of the
humanities and the natural sciences: the multimodality of language and communication,
and the notion of embodiment as a resource for meaning-making and conceptualization
in language and communication. It overcomes the longstanding dichotomy represented
in the concepts of verbal and nonverbal communication, and promotes an incorporation
of the body as integral part of language and communication. With this perspective, the
handbook documents the bodily and embodied nature of language as such. We should
underline that nonverbal communication studies are products of a fundamentally differ-
ent concept of how bodily and linguistic forms of communication cooperate in commu-
nication. Nonverbal communication studies focus on the social-psychological
dimensions of bodily communication and have basically separated the body from lan-
guage. Informed by Watzlawick’s dichotomy of analogic and digital forms of communi-
cation and the functional attributions of “social-relation” versus “linguistic content”,
nonverbal communication research has devoted most of its interest to research on social
and affective facets of bodily forms of communication (Watzlawick, Bavelas and Jack-
son 1967). The claim in this approach is that the verbal part of the message is what
carries content, while the nonverbal part does not; it only conveys affective and social
meaning. This theory has inspired highly important strands of research, among them:
the very rich field of studying facial expressions of affect and emotion (Ekman and
Davidson 1994; Ekman and Rosenberg 1997), and in this context, the study of forms of
deceit and nonverbal leakage; movement analysis as a measure for psychic integration
and disintegration (Davis 1970; Davis and Hadiks 1990; Lausberg, von Wietersheim and
Feiereis 1996; Lausberg 1998); and fields of study concerning issues of gender, culture,
and social status. It is not by accident that the analysis of hand movements or gestures
plays a minor role in nonverbal communication studies. Gestures were recognized as
being not non-verbal enough to be considered of interest for nonverbal communication
research (see the debate on the “verbal” or “non-verbal” status of gesture in Psychological
Review: Butterworth and Hadar 1989; Feyereisen 1987; McNeill 1985, 1987,
1989). Indeed it is the close integration of gestures with speech (Beattie 2003; Cienki
1998, 2005; Cienki and Müller 2008a, 2008b; Duncan, Cassell and Levy 2007; Fricke
2007, 2012; Kendon 1972, 1980, 2004; McNeill, 1992, 2005; Müller 1998; Müller 2009)
that has made those forms of bodily movements less interesting for the research con-
ducted in the spirit of nonverbal communication. And it is precisely this that makes ges-
tures such an interesting topic for students of language proper.
An obvious consequence of the particular orientation of nonverbal communication
studies was that relatively little was known about human gesticulation and its integra-
tion with language and communication until very recently. Only when the humanities
shifted more significantly towards cognitive science in the 1970s and 80s did gestures
begin very slowly to attract the interest of linguists, psychologists and anthropologists.
The grounds were laid early on with the pioneering writings of Adam Kendon and
David McNeill; they served as a basis on which a steadily increasing community of
scholars from various disciplines could build their research in the 90s on the hand move-
ments that people make when they talk. Since then a field of gesture studies has
emerged with its own journal, book series, society, and biennial international confer-
ences. The research carried out on human and non-human forms and uses of gestures
will be widely documented in this handbook. Hand-gestures are the “articulators” that
are closest to vocal language: they contribute to all levels of meaning, and they are syn-
tactically, pragmatically, and semantically integrated with speech, forming in Adam
Kendon’s terms gesture-speech ensembles, and constituting in David McNeill’s terms
the imagistic part of language, playing a crucial role in the cognitive processes of think-
ing for speaking. And as we have mentioned already, it is the hand movements that
under certain circumstances may turn into a full-fledged language. Note that this is
not true for the face, the torso, or the legs. The hands are, along with our vocal
tracts, the primary articulators that can become articulators of language. Despite the impor-
tance of gestures, however, we will underline in the handbook the fact that it is not only
the hands plus vocal tract which are used to communicate: we will highlight the integra-
tion of other concomitant forms of visible action as well, such as the face, gaze, posture,
and body movement and orientation. With this orientation, the handbook seeks to over-
come Watzlawick’s dichotomy, which has obscured the close cooperation of visible and
audible forms of communication.

2. General statement of goals


The handbook gives an overview of the scope of the wide interdisciplinary field of
research that addresses the relation of the body to language and communication. It
gives an overview of historical as well as contemporary approaches, presents a variety
of currently proposed – sometimes competing – theoretical frameworks, introduces fun-
damental concepts, and offers an overview of core controversial issues under scholarly
scrutiny. It thus offers a unique tool for experienced scholars as well as for novices,
while on the one hand introducing the pertinent theoretical issues under debate from
the perspective of various disciplines and on the other hand documenting varying meth-
odological approaches which naturally come with the different disciplines involved in
this kind of research. The handbook thus offers for the first time a truly interdisciplinary
perspective on one of the most vital topics in the humanities and the natural sciences:
the multimodality of human (and non-human) communication. It includes an overview of
established methodological procedures for the study of body, language, and communica-
tion, including both qualitative and quantitative procedures, and it presents a systematic
account of what is known regarding the structures, categories, and functions of gesture,
posture, touch, gaze, facial expression, and movement in space. In addition, the handbook
covers a wide variety of specific topics and phenomena without giving preference to one
specific approach; it aims at providing an unbiased interdisciplinary perspective, allowing
for the coexistence of competing theoretical and methodological approaches.
As a consequence of this interdisciplinary scope, the handbook also addresses
scholars from a variety of different fields, including linguistics and communication as
well as the cognitive sciences, psychology, neurology, and semiotics in particular but
also anthropology, sociology, literature, history, computing and engineering, and all
the disciplines that share the interest in bodily forms of language and communication.

To ensure cross-disciplinary transparency, the articles in this handbook are written and
conceptualized for an interdisciplinary audience.
The handbook may serve both as a resource for specific questions and as a way of gaining
an overview of specific topics, problems, and questions discussed in the field. It may
serve as a guideline and orientation for anyone interested in this new field of
scientific interest.

3. Structure of the book


The central idea of the book is the integration of visible bodily movements with lan-
guage as used in face-to-face communication (including distance communication of
the new audio-visual media). The term “body” is used to refer to visible bodily move-
ments and the handbook documents the various dimensions of how the body relates to
language and communication. Body movements are inherently intertwined with lan-
guage and communication: they carry a potential for development into linguistic
signs, such as in sign languages, but they are different since they constitute an integrated
ensemble with vocal language. In sign languages all the functions have to be fulfilled by
the visual articulators and this has systematic consequences for the bodily signs and
their interrelation. Sign languages and sign linguistics have developed into a major
field on their own and this handbook will close the gap between research on vocal lan-
guages and that on signed languages by focussing on visible movements of the hearing
that are used in conjunction with speech.
This fundamental idea inspires the structure of Body – Language – Communication.
As a consequence, articles addressing bodily signs which are “close” to language such as
gestures that are integrated with language and communication (be they spontaneous
creations or conventionalized “gesture” words), play a central role. On the other
hand, the handbook also includes bodily movements that are less clearly tied with lin-
guistic forms of communication such as bodily movements in dance, or bodily move-
ments as symptoms in clinical diagnoses. In its core the handbook addresses the
“multimodality of language and communication”.
The handbook Body – Language – Communication is divided into two volumes. The
first volume offers the theoretical, notional, and methodological grounds of the field.
The second one documents what we know about forms, functions, types of bodily move-
ments, and their cross-cultural distribution, and it offers space for the presentation of a
range of specific perspectives on the body in relation to communication. The handbook
contains chapters on cultures, contexts and interactions, embodiment, cognition and emo-
tion, and it closes with a chapter on visible body movement as sign language. In an
Appendix, a list of relevant organizations, links, reference publications, and periodicals
is provided.
Volume I contains 5 chapters with a total of 72 articles. The first chapter of volume I
outlines the subject matter of the book. It begins with the two pioneers of contemporary
gesture studies: Adam Kendon and David McNeill outlining their respective approaches
to the study of gestures with speech. The chapter then proceeds with an overview of
research on gestures and speech from a linguistic point of view, a documentation of
conventionalized gestures, so-called emblems, then extends the scope to how all the
other body parts contribute to conversation and ends with two chapters that document
hand-movements in two types of manual signed languages: home signs, i.e. sign systems
evolving within one family. Here a particular emphasis is put on how gestures relate to
signs in signed languages.
Chapter two outlines perspectives on the relation of the body to language and
communication from the vantage point of various disciplines. Multimodal communication
has raised the interest of a wide range of disciplines, and this chapter gives accounts
from: Psychology of Language, Psycholinguistics, Neuropsychology, Cognitive Linguis-
tics, Linguistics, Conversation Analysis, Ethnography, Cognitive Anthropology, Social
Psychology, Multimodal Interaction, and Literature.
Chapter three presents a documentation of historical and cross-cultural dimensions
of research regarding the relation of body movements to language and speech. Starting
from prehistoric gestures, Indian traditions of a grammar of gestures in dance, Jewish
traditions and their active gestural practices in religious life, it moves on to European
scholarly treatments. It further includes articles on medieval practices of the body, on
Renaissance ideas on gestures as universal language, on enlightenment philosophy
and the debate around gestures, language, and the origin of human understanding
and it ends with a sketch of 19th and 20th century research on body, language, and
communication. The historical considerations of body movements as communication are con-
cluded with contributions from arts and philosophy – dance and the history of the
notion of mimesis.
Chapter four offers an encompassing collection of contemporary approaches to how
the relation between body motion and language in communication should be conceived.
Notably, each author outlines his or her particular view on this subject matter, and we
present here views of eminent and senior scholars as well as perspectives ad-
vanced by junior colleagues. These articles present theories or approaches to the
body in communication in a nutshell. Topics range from mirror systems and gestures
as precursor of speech in evolution to the social interactive nature of gestures. Chapter
five finally provides a valuable collection of methods for the analysis of multimodal
communication. Again, the methods included here cover a wide range of disciplines, includ-
ing quantitative as well as qualitative takes on the analysis of body movement used with
and without speech.

4. References
Beattie, Geoffrey 2003. Visible Thought: The New Psychology of Body Language. London:
Routledge.
Butterworth, Brian and Uri Hadar 1989. Gesture, speech, and computational stages: A reply to
McNeill. Psychological Review 96(1): 168–174.
Cienki, Alan 1998. Metaphoric gestures and some of their relations to verbal metaphorical expres-
sions. In: Jean-Pierre Koenig (ed.), Discourse and Cognition: Bridging the Gap, 189–204. Stan-
ford, CA: Center for the Study of Language and Information.
Cienki, Alan 2005. Metaphor in the “Strict Father” and “Nurturant Parent” cognitive models:
Theoretical issues raised in an empirical study. Cognitive Linguistics 16(2): 279–312.
Cienki, Alan and Cornelia Müller (eds.) 2008a. Metaphor and Gesture. Amsterdam: John
Benjamins.
Cienki, Alan and Cornelia Müller (2008b). Metaphor, gesture, and thought. In: Raymond W.
Gibbs, Jr. (ed.), The Cambridge Handbook of Metaphor and Thought, 483–501. Cambridge:
Cambridge University Press.

Davis, Martha 1970. Movement characteristics of hospitalized psychiatric patients. In: Claire
Schmais (ed.), Proceedings of the Fifth Annual Conference of the American Dance Therapy
Association, 25–45. Columbia: The Association.
Davis, Martha and D. Hadiks 1990. Nonverbal behavior and client state changes during psy-
chotherapy. Journal of Clinical Psychology 46(3): 340–351.
Donald, Merlin 1993. Origins of the Modern Mind. Cambridge, MA: Harvard University Press.
Duncan, Susan, Justine Cassell and Elena Levy (eds.) 2007. Gesture and the Dynamic Dimension
of Language. Amsterdam: John Benjamins.
Ekman, Paul and Richard J. Davidson (eds.) 1994. The Nature of Emotion. Oxford: Oxford
University Press.
Ekman, Paul and Erika Rosenberg (eds.) 1997. What the Face Reveals. Basic and Applied Studies
of Spontaneous Expression using the Facial Action Coding System (FACS). Oxford: Oxford
University Press.
Feyereisen, Pierre 1987. Gestures and speech, interactions and separations: A reply to McNeill.
Psychological Review 94(4): 493–498.
Fricke, Ellen 2007. Origo, Geste und Raum: Lokaldeixis im Deutschen. Berlin: de Gruyter.
Fricke, Ellen 2012. Grammatik multimodal: Wie Wörter und Gesten zusammenwirken. Berlin:
De Gruyter Mouton.
Kendon, Adam 1972. Some relationships between body motion and speech: An analysis of an
example. In: Aaron W. Siegman and Benjamin Pope (eds.), Studies in Dyadic Communication,
177–210. New York: Pergamon Press.
Kendon, Adam 1980. Gesticulation and speech: Two aspects of the process of utterance. In: Mary
Ritchie Key (ed.), Nonverbal Communication and Language, 207–227. The Hague: Mouton.
Kendon, Adam 2004. Gesture: Visible Action as Utterance. Cambridge: Cambridge University
Press.
Lausberg, Hedda 1998. Does movement behavior have differential diagnostic potential? Ameri-
can Journal of Dance Therapy 20(2): 85–99.
Lausberg, Hedda, Jörn von Wietersheim and Hubert Feiereis 1996. Movement behaviour of
patients with eating disorders and inflammatory bowel disease. A controlled study. Psychother-
apy and Psychosomatics 65(6): 272–276.
McNeill, David 1985. So you think gestures are nonverbal? Psychological Review 92(3): 350–371.
McNeill, David 1987. So you do think gestures are nonverbal! A reply to Feyereisen. Psycholog-
ical Review 94(4): 499–504.
McNeill, David 1989. A straight path – to where? Reply to Butterworth and Hadar. Psychological
Review 96(1): 175–179.
McNeill, David 1992. Hand and Mind: What Gestures Reveal about Thought. Chicago: University
of Chicago Press.
McNeill, David 2005. Gesture and Thought. Chicago: University of Chicago Press.
Müller, Cornelia 1998. Redebegleitende Gesten: Kulturgeschichte – Theorie – Sprachvergleich.
Berlin: Arno Spitz.
Müller, Cornelia 2008. Metaphors. Dead and Alive, Sleeping and Waking. A Dynamic View.
Chicago: University of Chicago Press.
Müller, Cornelia 2009. Gesture and language. In: Kirsten Malmkjaer (ed.), The Routledge
Linguistics Encyclopedia, 214–217. Abingdon/New York: Routledge.
Tomasello, Michael 2000. The Cultural Origins of Human Cognition. Cambridge, MA: Harvard
University Press.
Watzlawick, Paul, Janet H. Beavin Bavelas and Don D. Jackson 1967. Pragmatics of Human
Communication. A Study of Interactional Patterns, Pathologies and Paradoxes. New York:
W.W. Norton.

Cornelia Müller, Frankfurt (Oder) (Germany)


I. How the body relates to language and
communication: Outlining the subject matter

1. Exploring the utterance roles of visible bodily action: A personal account

1. Utterance visible action as a domain of inquiry
2. Temporal co-ordination between speech and hand, arm and head actions
3. The semantics of utterance visible actions in relation to the semantics of verbal expression
4. When utterance visible action is the main utterance vehicle
5. Broader implications
6. References

Abstract
In this essay, I offer a survey of the main questions with which I have been engaged in
regard to “gesture,” or, as I prefer to call it, and as will be explained below, “utterance
visible action.” In doing so, I hope to make clear the approach I have employed over
a long period of time in which, to put it in the most general terms, visible bodily actions
used in the service of utterance are seen as a resource which can be used in many dif-
ferent ways and from which many different forms of expression can be fashioned, de-
pending upon the circumstances of use, the communicative purposes for which they are
intended, and how they may be used in relation to other media of expression that may
be available. Accordingly, I have sought to describe the diverse ways in which utterance
visible actions may be employed, their semiotic properties, and how they work as
components of utterance in relation to the other components that may also be being
employed.

1. Utterance visible action as a domain of inquiry


As Goffman (1963) pointed out, humans, when in co-presence and by means of visible
bodily action, continually provide each other with information about their intentions,
interests, feelings and ideas, whether they wish to do so or not. Within a gathering,
the pattern of positions, spacing, and directions of gaze of the participants provide
much information about who is engaged with whom, the nature of those engagements,
and the level and nature of their involvement in the situation. Activities directed
toward objects or features of the environment provide information about a person’s
aims, goals, and interests. There are also actions that are deemed to be expressive, how-
ever. Thus, by how people approach each other or withdraw, by patterns of action in
the face, and with actions of their forelimbs, they show each other affection, disdain,
indifference, concern, gratitude; they challenge or threaten one another; they submit,
comply, or defy one another, or they show fear, joy, and so on.
Visible bodily action may also serve as a means of discourse, however. Either by
itself, or in collaboration with speaking, visible bodily actions can be used as a means
of saying something. For example, one draws attention to something by pointing at
it, one may employ one’s hands to describe the appearance of something or to suggest
the form of a process or the structure of an idea. By means of visible bodily action one
can show that one is asking a question, pleading for an answer, is in disagreement, and a
host of other things, specific to the current linguistically managed interchange. There
are forms of visible bodily actions that can serve instead of words, and in some circum-
stances entire language systems are developed using only visible action. In short, there
are many different ways in which visible bodily action may be employed to accomplish
expressions that have semantic and pragmatic import similar to, or overlapping with,
the semantic and pragmatic import of spoken utterances.
This constitutes the utterance uses of visible bodily action. It is this that I shall call
utterance visible action, and it corresponds to what is often referred to by the word
“gesture.” However, because “gesture” is also sometimes used more widely to refer to
any kind of purposive action, for example the component actions of practical action
sequences, or actions that may have symptomatic significance, such as self-touchings,
patting the hair, fiddling with a wedding ring, rubbing the back of the head, and the
like, because it is also used as a way of referring to the expressive significance of any
sort of action (for example, saying that sending flowers to someone is a “gesture of
affection”), and because, too, in some contexts the word “gesture” carries evaluative
implications not always positive, it seems better to find a new and more specific
term. I also think that doing so invites the undertaking, without prejudice, of a compar-
ative semiotic analysis of all of the different ways in which visible bodily action can
enter into the creation of utterances (Kendon 2008, 2010).
By “utterance” I mean any action or ensemble of actions that may be employed to
provide expression to something that is deemed by participants to be something that the
actor meant to express, that was expressed wilfully, that is. Goffman’s distinction
between information that is “given” and information that is “given off ” is helpful in
clarifying this (see Goffman 1963: 13–14). As he says, everything we do all of the
time “gives off ” information about our intentions, interests, attitudes, and the like.
However, some kinds of actions are taken to have been done with the intent to express
something, whether by words alone, by words combined with actions, or by visible
actions alone (as in sign languages). These actions are taken to “give” information,
they express what the person “meant” and the actor can be called to account for
them. Actions treated by co-participants in this manner are the actions of utterance,
and we may establish a domain of concern that attends to the different ways in
which visible bodily action can serve as utterance action, and how it may do so (see
Kendon 1978, 1981, 1990: Chapter 8; and, especially, Kendon 2004: Chapters 1–2).
It is important to stress that this domain cannot be established with sharp boundaries
nor can rigid criteria be established according to which an action is or is not admitted as
an utterance visible action. There is a core of actions about which there seems to be
widespread agreement that they comprise utterance visible actions. This includes waving,
pointing, the use of symbolic gestures of any kind, manual actions made while
speaking (“gesticulation”), as well as actions performed in the course of creating
utterances in sign language. There are always forms of action whose status is ambiguous,
however. If we compare actions that tend to be accepted as being done with what we
might call “semantic intent” with those that are not so regarded, we may discover a set
of features which actions may have less of or more of. The less they have of these fea-
tures, the more likely they are to be disregarded, not attended, or not counted as
“meant.” Sometimes this ambiguity is exploited. On occasion, someone wishing to con-
vey something to another by means of a visible action which they want to be understood
only by one specific other, and not by anyone who may be co-present, may alter the per-
formance of their action so that it seems casual or to have the character of a “comfort
movement” or some other sort of disattendable action (for examples, see de Jorio 2000:
179–180, 185, 188, 260–261; Morris et al. 1979: 67–68, 88–89).
Comfort movements (“self-adaptors” in the terminology of Ekman and Friesen
1969) and other kinds of “mere actions” may well be studied for what they reveal as
symptoms of a person’s motivational or affective state, thus attracting attention from
a psychological point of view (for early studies see Krout 1935; Mahl 1968). Actions
considered as “meant,” on the other hand, attract attention from a point of view that
is closer to that of students of language and discourse. Issues of interest here include
questions about the semiotic character of utterance visible actions and how they are
employed as components in utterance construction. These modes of expression also
raise issues for cognitive theories of language. For example, utterance visible actions
are treated by some authors as if they are image-like representations of meaning
(McNeill 1992, 2005). When deployed in relation to spoken language, their study
may suggest how the mental representation of utterance meaning is multi-levelled
and organised as a simultaneous configuration, aspects of which can be represented
through utterance visible action at the same time as other aspects can be represented
by means of the linear structures of spoken language. Old questions about the relation-
ship between language and the structure of thought, debated extensively in the eigh-
teenth century, may be re-opened in a new way through studies of utterance visible
action both in speakers and in signers (Woll 2007; see also Ricken 1994).
I now turn to discuss some of the main themes which have occupied me in my work
in this domain. I begin with aspects of how utterance visible action, speech and verbal
expression are related within the utterance. Then I discuss work on utterance visible
action when it is the sole vehicle of utterance. This includes a study of a primary
(deaf) sign language in Papua New Guinea, and a much larger study of alternate sign
languages in use among Australian Aborigines. I conclude with a short survey of what
I see as some of the broader implications of these studies.

2. Temporal co-ordination between speech and hand, arm and head actions

My earliest work on utterance visible action was influenced by an early exposure to
ethology. I took an interest in the organization of human communication conduct as
it may be observed in human co-present interaction. I appreciated very much the
fine-grained observation Erving Goffman pioneered in his work on human interaction,
and I wanted to investigate what he called the “small behaviors” of spacing and posture,
of glances and spoken utterances, of hand actions and head movements – the observ-
able stuff out of which occasions of interaction are fashioned (Goffman 1967: 1).
Among other things, this led me to the work of Ray Birdwhistell (see Birdwhistell
1970; Kendon 1972a), who offered very interesting observations on how movements
of the body, especially of the head and face, patterned in relation to aspects of speech.
In consequence of this, and adopting methods I had learned from an association with
William Condon (Condon and Ogston 1966, 1967; Condon 1976), I undertook a de-
tailed analysis of the bodily action that could be observed in a two-minute film showing
a continuous discourse by a man who was engaged in an informal discussion of
“national character” in a London pub. In a paper published in 1972 (Kendon 1972b),
which reported this analysis, I described how, in association with each “tone unit” (Crys-
tal 1969) in the spoken discourse one could observe a contrasting pattern of bodily
action. Patterns of action of shorter duration might be accompanied by other contrasting
patterns of longer duration – so one could say that the movement flow was organized at
multiple levels simultaneously. To a considerable degree these multiple levels in the
movement flow corresponded to the several different levels of organization in terms
of which the flow of speech could be analyzed. I was led to suggest – to quote from
the paper – “the speech production process is manifested in two forms of activity simul-
taneously: in the vocal organs but also in bodily movement, particularly in movements of
the hands and arms” (Kendon 1972b: 205).
In the aforementioned 1972 study, I attempted to deal with all observable movements –
the fingers and hands and arms, the shifts in positionings of the trunk, changes in ori-
entation of the head. From this it appeared that the larger segments of discourse were
bracketed by sustained changes in posture or new orientations of the head or repeated
patterns of head action, while shorter segments of discourse, down to the level of the
tone unit and even syllables within the tone unit, were associated with phrases of move-
ment of shorter duration. This was in accord with observations that Birdwhistell and his
colleague Albert Scheflen had summarised in earlier publications (see Scheflen 1965).
From this single study I concluded that the “utterance” manifested itself in two aspects
simultaneously – in speech and visible bodily action (Kendon 1980a).
In subsequent work, my attention focused more upon speakers’ hand actions. Such
actions, as is well known, had been in the past, as they have been subsequently, the prin-
cipal focus of interest in studies of “gesture.” There is good reason for this. After all, as
Quintilian noted some 2000 years ago, of all the body parts that speakers move, the
hands are closest to being instruments of speaking. In his discussion of the role of visible
bodily action in Delivery (Actio), he writes that while “other parts of the body merely help
the speaker … the hands may almost be said to speak” (see Quintilian Institutio
Oratoria Book XI, iii. 86–89 in Butler 1922).
Subsequent to my 1972 publication, I developed a terminology and a scheme for ana-
lyzing the organization of speaker’s hand movements and offered some general obser-
vations on how these relate to speech (Kendon 1980a). These suggestions were re-
stated and slightly revised later (Kendon 2004: Chapter 7). The slight modifications
in terminology given in this revision are reflected in what follows here. As a starting
point I noted how the forelimb movements of utterance visible actions are organised
as excursions – the hand or hands are lifted away from a position of rest (on the
body, on the arm of a chair, etc.), they move out into space and perform some action,
thereafter returning to a position of rest, often very similar to the one from which it
started. This entire excursion, from position of rest to position of rest, I called a Gesture
Unit. Within such an excursion the hand or hands might perform one or more actions –
pointing, outlining or sculpting a shape, performing a part of an action pattern, and
so on. This action was called the stroke. Whatever the hand or hands did to organize
themselves for this action was called the preparation. The preparation and stroke,
taken together, I referred to as a Gesture Phrase, with the stroke and any subsequent
sustained position of the hand considered as the nucleus of the Gesture Phrase. Once
the hand or hands began to relax, the Gesture Phrase was finished. The hand (or
hands) might then start upon a new preparation, in which case a new Gesture Phrase
begins, or it might go back to a position of complete rest, in which case the Gesture
Unit would be finished. The distinction between Gesture Unit, the more inclusive
unit, and Gesture Phrase, was necessary, for in this way a succession of Gesture Phrases
within the frame of a single Gesture Unit could be accommodated. As we had shown
(in Kendon 1972b), the nested hierarchical relationship between Gesture Unit and Ges-
ture Phrases corresponded to the nested hierarchical relationship between tone unit
groupings at various levels within the spoken discourse. Just as spoken discourse is
organized at multiple levels simultaneously, so this appears to be true of associated
utterance visible actions of the forelimbs.
Examination of how these Gesture Phrases were organized in relation to their con-
current tone units suggested that the stroke of the Gesture Phrase tended to anticipate
slightly, or to coincide with the tonic center of the tone unit. Looking at the form of
action in the stroke and what it seemed to express, it seemed that there was a close coor-
dination between the meanings attributed to the action of the stroke and the meaning
being expressed in the associated tone unit. This did not mean that the meanings attrib-
uted to the forms of action in the Gesture Phrases were always the same as the mean-
ings expressed in the associated speech. Rather, it meant that there was generally a
semantic coherence between them (McNeill 1992 has called this “co-expression”).
Sometimes these meanings seemed to parallel verbal meaning, but they often seemed
to complement it or add to it in various ways. Uttering, that is, could be done both ver-
bally and kinesically in coordination. This gave rise to the general observation that,
somehow, expression in words and expression in visible bodily action are intimately
related. These conclusions were, in part, confirmed in independent observations by
McNeill and were incorporated and re-stated by him in his book Hand and Mind
(McNeill 1992).
Subsequent to this demonstration that a speaker, in using the hands in this way, does
so as an integral part of the utterance, I began to investigate the different ways in which
these hand actions could be deployed in relation to the speech component of the utter-
ance. From this it appeared that the utterer can be flexible in how this is done. The co-
ordinate use of the two modes of expression is orchestrated in relation to whatever
might be the speaker’s current rhetorical aim. Thus we described examples where the
speaker delayed speech so that a kinesic expression could be foregrounded or com-
pleted, examples in which the speaker delays a kinesic expression so that it could be
placed appropriately in relation to what was said, and yet other examples showing
how the speaker, though repeating the same verbal expression, employed a different
kinesic expression with each repetition. These observations were presented in Chapter 8
of Kendon (2004). I took them as supporting the view that these manual actions “should
be looked upon as fully fashioned components of the finished utterance, produced as
an integral part of the ‘object’ that is created when an utterance is fashioned” (Kendon
2004: 157).
3. The semantics of utterance visible actions in relation to the semantics of verbal expression

Utterance visible actions, especially those of the forearms, are generally seen as being
done, as we have put it, with “semantic intent.” That is, they are seen as actions done by
the actor as part of an effort to express meanings. These actions differ widely in terms of
the extent and nature of the meanings attributed to them. In most speaking commu-
nities, probably in all, there exist shared vocabularies of kinesic expressions which
are used with shared meanings. These have, in some cases, been separately described,
as if they constitute a distinct class or category. In such cases they have been termed
“emblems” (Ekman and Friesen 1969) or “symbolic gestures” (Morris et al. 1979) or
“quotable gestures” (Kendon 1992). Dictionaries of them have also been attempted
(see Meo-Zilio and Mejia 1980–1983 for one of the largest of these). These highly con-
ventional forms are used by speakers in various contexts (unfortunately this has re-
ceived little systematic attention, but see Sherzer 1991 and Brookes 2001, 2004,
2005). However, even when speakers are not making use of forms with “quotable”
meanings, the forms of action they employ still convey meaning in various ways and
are governed, to varying degrees, by social conventions. The most well documented
demonstration of this point remains, remarkably, that of David Efron in 1941 (there
is no later comparable comparative study; see Efron 1972). An important question for
investigation is how these meanings of utterance visible actions (whether or not they
are “quotable”) may interact with meanings expressed verbally, and what consequences
this interaction may have for how the utterance is understood by others.
From the point of view of how the meaning of a speaker’s utterance may be inter-
preted, concurrent or associated utterance visible actions, in virtue of their own mean-
ings, in interaction with what is expressed in words, can extend, enrich, supplement,
complement spoken meaning in various ways and in respect to various aspects and le-
vels of meaning. In a preliminary discussion of this, I suggested five main ways in which
these actions may do so (Kendon 2004: 158 et seq.). These may be termed referential, in
which the kinesic expression contributes to the referential or propositional meaning of
what is being uttered; operational, in which the kinesic expression operates in relation
to what is being expressed verbally, as when it confirms it, denies it, negates it; perfor-
mative, in which the kinesic action expresses or makes manifest the illocutionary force
of the utterance, as in showing whether a question is being asked, a request or an offer is
being made, and the like; modal, in which the action provides an interpretative frame
for what is being expressed verbally, as in indicating that what the speaker is saying
is a quotation, is hypothetical, is to be taken literally, to be taken as a joke, and so
forth; and parsing or punctuational, where the utterance visible action appears to
make distinct different segments or components of the discourse, providing emphasis,
contrast, parenthesis, and the like, or where it marks up the discourse in relation to
aspects of its structure such as theme-rheme or topical focus (see also Kendon 1995). We
now give more detail for each of these different functions in turn.

3.1. Referential
There are two ways in which visible actions can contribute to referential or proposi-
tional meaning. One way is by pointing. Here the actor, by pointing at something,
can establish what is pointed at as the referent to some deictic expression in the dis-
course. In a study of pointing (Kendon and Versante 2003; Kendon 2004: Chapter 11)
the different hand shapes used when pointing were described (six different forms were
identified), and the discourse contexts in which they were used were examined. It
emerged that different hand shapes are used in pointing, according to the way in
which speakers used the referent of the pointing in their discourse. For example, if it
was important that the speaker’s recipients distinguish one specific object pointed at
from another (“Over there you see St. Peters, then to the left you see the Old Vicarage”),
the extended index finger was the commonest hand shape. On the other hand, if the
speaker referred to something because it is an example of a category (“you see there
a fine example of a war memorial”), because the speaker makes a comment about it,
or because it is something which has features the speaker’s recipients are to take
note of (“you can see again the quality of the building in this particular case”), the
speaker is more likely to use a hand in which all fingers are extended and held together,
palm oriented vertically or upwards. That is, the shape and orientation of the hand
employed in pointing is chosen according to how the speaker is treating, in spoken
discourse, the referent of the pointing action.
This may reflect a more general feature of utterance visible actions, which is that,
very often, they are derived forms of actions made when operating upon or manipulating
the objects to which they refer, whether these be literal or metaphorical. When we
talk about things, we conjure them up as objects in a virtual presence and with our
hands we may manipulate them in various ways, pushing them into position, touching
them as we speak of them, arranging them in relation to one another spatially, and
so on (for a view not unrelated to this, see Streeck 2009).
The other way for actors to use their hands in relation to the referential content of
their discourse is to use them to do something which itself has referential meaning.
These actions may be highly conventionalized, recognized as having quite specific or
restricted meanings (often directly expressible in a word or phrase that is regarded as
having an equivalent meaning), or they may be forms of action by which a sketch or
diagram of some object is provided, by which some pattern of action is depicted, or
which provides a movement image analogous to the dynamic character of a process
or mode of action.
In a survey of numerous recordings of unscripted conversations in various settings,
I distinguished six different ways in which visible actions could, in this manner, partic-
ipate in the referential meaning of the speaker’s discourse (Kendon 2004: 158–198).
These may be summarized as follows:

(i) A manual expression with a “narrow gloss” (“quotable gesture”) is used simulta-
neously with a word that has an identical or very similar meaning. In Naples, in
Italy, where I collected recordings of conversations, it was not uncommon to
observe how, from time to time, such expressions were used in the course of
talk, so that it was as if the speaker uttered the same word simultaneously in
speech and kinesically. A speaker explaining that nowadays in Naples there
were too many thieves uttered the word “ladri” (thieves) and used a manual
expression which is always glossed with this word. Again, as a speaker says
“money” he rubs the tip of his index finger against the tip of his thumb in an action
always glossed as “money.” Yet again, as a (British) speaker describes her job and
says “I do everything from the accounts, to the typing, to the telephone, to the
cleaning,” as she says “typing” and “telephone” and “cleaning” she does an action,
in each case a conventional form, often glossed with the same words that she utters
(see Kendon 2004: 178 for these examples). In such cases the semantic relationship
between the two modalities appears to be one of complete redundancy. However, a
study of the contexts in which this occurs, taking into consideration how the action
is performed, suggests that there are various effects speakers achieve by using such
narrow gloss expressions in this way. More attention to this kind of use of kinesic
expressions is needed.
(ii) Kinesic expressions with a narrow gloss may also be used in parallel with verbal
expressions in such a way that they are not semantically redundant but make a sig-
nificant addition to the content of what the speaker is saying. For example, a city
bus driver (in Salerno, Italy), describing the disgraceful behavior of boys on the
buses adds that they behave this way in front of girls, who are not in the least
upset, saying that also they are happy about it. As he says this, he holds both
hands out, index fingers extended, positioned so that the two index fingers are
held parallel to one another. In this way he adds the comment that boys and girls
are equal participants in this activity, using here a kinesic expression glossed as
“same” or “equal,” among other meanings given it (de Jorio 2000: 90). Kendon
(2004: 181–185) describes this and several other examples.
(iii) Kinesic expressions may be used to make more specific the meaning of something
that is being said in words. For example, it is common to observe how an enact-
ment, used in conjunction with a verb phrase, appears to make the meaning of
the verb phrase much more specific. Thus, a speaker tells of how someone
used to “throw ground rice” over ripening cheeses to dry off the cheeses’
“sweat.” As he says “throw” he shapes his hand as if it is holding powder and
does a double wrist extension as if doing what you would do if you were to scatter
a powder over a surface. In this way the actions referred to by the verb “throw” are
given a much more specific meaning (Kendon 2004: 185–190).
(iv) Hand actions may be used to create the representation of an object of some kind.
This may be deployed in relation to what is being said as if it is an exemplar or an
illustration of it. For example, a speaker is explaining how, in a new building being
discussed, a security arrangement will include “a bar across the double doors on
the inside.” As he says “bar” he lifts up his two hands and moves them apart
with a hand shape that suggests he is molding a wide horizontal elongate object.
As the speaker talks about an object, he uses his hands to create it, as if to bring
it forth as an exhibit or illustration (Kendon 2004: 190–191).
(v) Hand actions are often used either as a way of laying out the shape, size and spatial
characteristics or relationships of an object being referred to, or as a way of exhi-
biting patterns of action which provide either visual or motoric images of processes
(Kendon 2004: 191–194).
(vi) Hand actions can also be employed to create objects of reference for deictic ex-
pressions. For example, a speaker described a Christmas cake and said it was
“this sort of size,” using his extended index fingers to sketch out a rectangular
area over the table in front of him, thus enabling recipients to envisage a large
rectangular object lying on the table (Kendon 2004: 194–197).
3.2. Operational
In contrast to these kinds of uses, hand or head actions are common that function as an
operator in relation to the speaker’s spoken meaning. An obvious way in which this
may be observed is in the use of head or hand actions that add negation to what is
being said. This is not always a straightforward matter, however. For example, the
head shake, commonly interpreted as a way of saying “no” is of course used for this,
but it can also be used when a speaker is not saying “no” to anything directly, but saying
something which implies some kind of negation (Kendon 2002). Likewise, there is a
very widely used hand action in which the hand, held with all fingers extended and
adducted (a so-called open hand), held on a pronated forearm (so the palm faces
downwards), is moved horizontally and laterally. Such a hand action is commonly seen in
relation to negative statements or statements that imply a negative circumstance (as
in a shopkeeper using this action as she explains to a customer that her supply of a cheese is finished:
“That’s the finish of that particular brie”), but it may also be seen in relation to positive
absolute statements, as if the hand action serves to forestall any attempts to deny what
is being said, as in: “Neapolitan cooking is the best of all cooking,” the horizontal hand
action acting here as if to say that any contrary claim will be denied (see Kendon 2004:
264–265; see also Harrison 2010).

3.3. Modal
Utterance visible actions may also be used to provide an interpretative frame for a
stretch of speech. The use of the “quotation marks” gesture to indicate that the speaker
is putting what he is saying in quotes is a common example. In an example drawn from
one of my recordings (not published), a speaker is in a conversation with someone
about how he negotiated a good deal with a representative of a mobile phone company.
In describing his successful negotiation he repeats what he said to the representative in
accepting some offer. He says: "yes, I'll have that" and as he does so he holds his hand up
to his ear in a Y hand shape, commonly used as a kinesic expression for “telephone.” In
this way he frames his words as quoted – as what he said to the representative – and
shows that he said this while talking on the telephone. In another example, also from
my recordings (made in Salerno in 1991), someone discussing a robbery puts forward
a speculation about what the robber might have done. As he describes what the robber
did he places a “finger bunch” hand against the side of his forehead and moves it away
and upward, expanding his fingers as he does so. This is an action that is widely accepted
(in Southern Italy) as a reference to imagination. Here it serves to frame his statement
as a hypothesis.
3.4. Performative
Hand actions are often used as a way of making manifest the speech act or illocutionary
force of what a speaker is saying. Many examples of this sort of usage were described
by Quintilian, and some of the forms he described are also used today (Dutsch
2002; Quintilian Book XI, iii: lines 14–15, 61–149 in Butler 1922). In my own work I
have described the use of ‘praying hands’ or mani giunte and also of the ‘finger bunch’ or grappolo as devices for marking questions in Neapolitan speakers (Kendon 1995), and some uses of the so-called palm up open hand also serve in this manner, as when a palm up open hand is proffered when a speaker gives an example of something, or when a speaker asks a question of another, holding out the palm up open hand as if they want something to be put in it (Kendon 2004: 264–281; see also Müller 2004).
3.5. Parsing
Lastly, there is a punctuational, parsing, or discourse structure marking function of speakers' hand or head actions. For example, speakers not uncommonly, in giving a
list of items, place their head in a slightly different angular position in relation to each
item as they describe it. “Batonic” movements of the hand can be observed to occur
in apparent association with features of spoken discourse that are given prominence
(see Efron 1972; Ekman and Friesen 1969). However, there are also hand action sequences, such as the "finger-bunch-open-hand" sequence observed in Neapolitan speakers, that are coordinated with the topic-comment structure of the speaker's discourse. A
version of this has also been described for Persian speakers in southern Iran (Seyfeddi-
nipur 2004). Also observed among Neapolitan speakers, but observed elsewhere as well,
is the thumb-tip-to-index-finger-tip “precision grip.” This is often used to mark a stretch
of speech which the speaker deems to be of central importance to what is being said, as
when the speaker is emphasizing something that is quite specific and important (see
Kendon 1995; Kendon 2004: 238–247). For an account of German uses of this hand
action see Neumann (2004). For uses by an American speaker see Lempert (2011).
3.6. Discussion
It should be stressed that the different ways described here in which visible actions can contribute to the meaning of an utterance represent only a beginning. More complete and more
systematic accounts have yet to be provided. Previous partial attempts similar to this
include Efron (1972) and McNeill (1992) (see also Streeck 2009 and Calbris 2011). Furthermore, and it is important to stress this, it should be understood that these semantic
and pragmatic functions of utterance visible actions are not mutually exclusive. A given
action can serve in more than one way simultaneously, and a given form may function in
one way in one context and in a different way in another.
A second point must be made. We have spoken about different ways in which these
utterance visible actions can contribute to the meaning of the utterance, pointing out
how they may contribute to the propositional content of an utterance, or function in
various ways in relation to various aspects of its pragmatic meaning. The different ways we have outlined have been arrived at by observers or
analysts, after they have reflected upon how the form of visible action, regarded as in
some way intended or meant as part of the speaker’s expression, can be related to
the semantic or pragmatic content that has been apprehended from the speech. Our
ability to do this is based upon our ability to grasp how these actions are intelligible.
The basis for this understanding remains obscure, however. Very little attention has
been paid to the problem of how the semantic “affiliation” claimed between words
and kinesic expressions is justified. Involved here is the question of the intelligibility
of utterance visible actions and how this interacts with the intelligibility of associated
spoken expression. The nature of this intelligibility and of this semantic interaction
deserves much more systematic attention (one recent relevant discussion is Lascarides
and Stone 2009).
Finally, how can we be sure whether, or to what extent, these utterance visible actions make a difference to how recipients grasp or understand the meanings of the utterances they are a part of? We do know, both from everyday experience and from
numerous experimental studies (Hostetter 2011; Kendon 1994), that these visible ac-
tions do make a difference for recipients, but whether they always do so, and whether
they do so in the same way, this we cannot say, nor do we have a good understanding of
the circumstances in which they may or may not do so. (See Rimé and Schiaratura 1991:
272–275 for an interesting start in investigating this issue).
To conclude, the brief survey offered above should make clear the diverse ways in
which speakers employ utterance visible action. No simple statement can be made
about what these actions do or what they are for. For me, it seems, a consideration
of these different modes of use supports the view that these actions are to be regarded
as components of a speaker’s final product. That is, they are not (or are not only) symp-
toms of processes leading to verbal expression (as some approaches to them might sug-
gest). Rather, they are integral components of a person’s expression which, in the cases
we have been considering, are composed as an ensemble of different modalities of
expression.
4. When utterance visible action is the main utterance vehicle
Utterance visible action, as indicated earlier, of course includes its use as a means of utterance on its own, without speech. This comes about in a variety of
circumstances. For example, when people are too far away to hear one another, but oth-
erwise need to exchange utterances, visible action is made use of. This may be observed
on an occasional basis in all sorts of circumstances, but there are circumstances where it
happens as a matter of routine. This has been reported in factories (e.g. Meissner and Philpott 1975), in cities such as Naples (e.g. de Jorio 1819: 108, where he describes the language of the basso popolo as "double" – they also have a language of gesture, that is; see also de Jorio 2000, Mayer 1948, and discussion in Kendon 2004: 349–354), or
among hunter-gatherers, such as Congo Pygmies (Lewis 2009), and Australian Abori-
gines (Kendon 1988). In these circumstances fairly complex kinesic codes may become
established.
There are also circumstances in which speech is prevented for ritual reasons and
here systems of kinesic communication may become highly elaborated, to the extent
that they may earn the title “sign language.” The most notable examples are the systems
found in the central desert areas of Australia where the practice of tabooing speech as
part of mourning ritual (among women) or as part of initiation ceremonies (among
men) was and is followed (Kendon 1988), and the systems at one time in widespread
use among the Plains Indians of North America (Davis 2010; Farnell 1995; Mallery
1972). Sign languages developed for ritual reasons also were (and perhaps still are)
used in some Christian monastic orders (Bruce 2007; Umiker-Sebeok and Sebeok
1987). Besides this, and best known of all, are the circumstances of deafness. As has
long been known, among the deaf, elaborate systems of utterance visible actions are
employed and developed with semiotic features that are comparable to those of spoken languages. Depending on the community and the place of deaf persons within it, these
sign languages may also be used between deaf and hearing, as well as just among the
deaf. The literature on these sign languages is now very extensive. For a representative
survey see Brentari (2010).
In my own work on utterance visible action as the sole vehicle for utterance, I have
undertaken two pieces of research. One was a small scale study of material collected
in Papua New Guinea (Kendon 1980b, 1980c, 1980d), mainly from one deaf young
woman. The other was a large scale study of sign languages in Aboriginal Australia
(Kendon 1988). The work with the material collected in Papua New Guinea was (for
me) a pioneering and preliminary effort in many ways, and restricted in scope, since
it was based on limited material collected as a result of a chance encounter. While I was attempting to make films of certain kinds of social occasions among the Enga in the Papua New Guinea highlands, a young deaf woman appeared one day
near my residence. She talked in signs with great fluency. She was using a system that
was used by various families in the valley who had deaf members. The deafness in the
valley was said to be a consequence of an epidemic of meningitis some years earlier. Fortunately, my New Guinean field assistant was able to converse with her, since he also had
deaf relatives. He was later able to interpret for me much of what I was able to record, as
well as assisting in the recordings. I later undertook a detailed study of some of this mate-
rial. Despite its limitations, undertaking such a detailed study led me to confront some
fundamental issues regarding the way in which meanings may be encoded in the media
of visible bodily action (see the discussion in Kendon 1980c).
The fundamental process involved seems to be one in which the actor, by means of a range of different techniques of representation, "brings forth" or "conjures" actions,
objects, movements, spatial relations, in this way representing concepts, ideas, and the
like, so they are understood as making reference to these things. This may take the
form of a kind of re-enactment of actions and their circumstances and of the actions
themselves in a fairly elaborated pantomimic manner. Very quickly, however, these
forms of action become reduced, schematized, and standardized in various ways as
they become a shared means by which meanings may be represented. This is a funda-
mental and general process that has been described many times by students of auton-
omous utterance visible actions. Although the terminology is various, the processes of
“sign formation” that have been described by Kuschel (1973), Tervoort (1961), Klima
and Bellugi (1979), Yau (1992), Eastman (1989), Cuxac (Cuxac and Sallandre 2007;
see also Fusellier-Souza 2006) – to name just a few of the authors who have written
about this – are all fundamentally similar. To represent a meaning for someone else
(and also, I think, to represent it for oneself), one resorts to a sort of re-creation, as if, by showing the other the thing that is meant, the other will come to grasp it in a way that overlaps with the way it is grasped by oneself. As these representations
become socially shared, they rapidly undergo various processes of schematization. In
consequence they are no longer understood only because they are depictions of some-
thing but also because they are forms which contrast with other forms in the system,
acquiring the status of lexical items in a system, that is. In this process we seem able
to observe the processes of language system formation. This provides one of the
main reasons why primary sign languages (sign languages of the deaf, that is) have
become objects of such intense interest.
In a much larger investigation, I examined Australian Aboriginal sign languages.
What interested me here was the fact that these are well developed, fully flexible sys-
tems, developed by speaker-hearers who have always had full access to spoken lan-
guage. These sign languages, developed for ritual reasons, for the most part, are also
widely used as a convenient alternative to speech in all sorts of circumstances (Kendon
1988: Chapter 14). In an area of central Australia that extends northwards from above
Alice Springs in the Northern Territory as far as the border with Arnhem Land, a prac-
tice is followed in which a woman, once bereaved, forgoes the use of speech for long
periods. This has given rise to complex sign languages which are used among women
and which may be used in all circumstances of everyday life. There are many interesting aspects of these sign languages, from cultural and semiotic points of view, and it is
very interesting to compare them to other alternate sign languages (such as those re-
ported from North America or in Christian monastic communities) and also to primary
sign languages. It is also useful to compare them with other language codes, such as writ-
ing or drum and whistle languages (see Kendon 1988: Chapter 13). Here I will comment
on just one issue, which was central in my work, and that is how these central Australian
sign languages are related, structurally, to the spoken languages of their users.
A comparison of signing among women of different ages that I undertook at the
Warlpiri settlement, Yuendumu (Kendon 1984), suggested that, as users become more proficient at using these sign languages, they come to use, more and more, signs that represent the semantic units expressed by the semantic morphemes of the spoken language.
A notable feature is that it appears common for signs to develop which represent the
meanings of the morphemes of the spoken language, qua morphemes. In consequence,
concepts expressed in spoken language by compound morphemes get expressed by com-
pound signs that are the equivalent of these morphemes, and not by a separate sign de-
rived from some property of the thing in question. I give just one example to illustrate
this point (for this and other examples see Kendon 1988: 369–372). In Warlpiri “scor-
pion” is kana-parnta, a compound of kana “digging stick” and -parnta, a possessive suf-
fix, which we can render in English as “having.” Thus “scorpion” in Warlpiri is, literally,
“digging stick-having.” In a language of a neighbouring community, Warlmanpa, the
same creature is known as jalangartata, which is a compound of the word jala “mouth”
and ngartata “crab.” In the signs for “scorpion” we find however, that in Warlpiri it is
a compound sign, the equivalent of a sign for “digging stick” followed by a sign which
is used, among other things, as a sign for a possessive. In Warlmanpa, in contrast, the
sign is also a compound sign, but this time a compound of the sign for “mouth” followed
by the sign for “crab.” It is interesting that, in creating signs for these creatures, we do not
find a sign for “scorpion” derived from some feature of the animal (its action of raising its
tail comes to mind) but signs based on representations of the meanings of the verbal
components which make up the verbal expression. In sign languages of neighboring
language communities in this part of Australia, we thus can find differences in signs
for similar things which derive from the fact that these sign languages in part develop
as kinesic representations of the semantic units of their respective spoken languages.
It is interesting to consider this in relation to some recent findings regarding differences in the manual expressions that accompany object-placement verbs in speakers of different languages. Gullberg (2011) has reported that, in Dutch, the equivalent of the verb "to put" is different according to the nature of the physical object being put somewhere or
the orientation of the object being placed. For example, to describe the putting of a vase
on a shelf or some other object which has a base on which it stands, one chooses the
verb zetten. However, if the object does not have a base or is something, such as a
book, that can be put down on its side, one chooses the verb leggen. In French, on
the other hand, one uses the same verb mettre, whatever the object or its placement ori-
entation might be. Gullberg found that Dutch speakers, if using hand actions as they
talked about putting objects somewhere, accompanied their verb phrases with different
hand actions, according to which placement verb they used. French speakers, on the
other hand, did not use hand actions that were differentiated in this way, regardless of
the nature of the object they were talking about. This suggests that where a language
makes semantic distinctions of this sort and manual expressions are also being employed,
these may reflect these semantic distinctions. The language spoken, thus, may link directly
to the kinds of manual expressions that may be used, if these are used when speaking.
This is a further piece of evidence in favour of the view that, as Kendon (1980a) put it,
“gesticulation and speech are two aspects of the process of utterance.” Exactly how this
is to be understood is yet to be made clear. However, the detailed way in which Warlpiri
speakers or Warlmanpa speakers have created kinesic expressions for the semantic
units their spoken languages supply reinforces the view, also suggested by Gullberg’s
work (and suggested, too, by the phenomenon we described earlier, in which “narrow
gloss” kinesic expressions may be used conjointly with spoken expressions of the same
meaning), that word meanings are somehow linked to or grounded in schematic percep-
tuo-motor patterns so that, if the hands are also employed when speaking, we see these
patterns being drawn upon as a source for the hand actions. For the Warlpiri women, who, of necessity, had to create kinesic representations of concepts provided by their language, a strategy they followed was to draw upon repertoires of already existing
perceptuo-motor representations. If this is so, this might mean that the “imagery” that
McNeill (2005) suggests is opposed to the categorical expressions of words is not always
to be so sharply separated. Kinesic expressions can also be like words. Indeed, they are
often highly schematic in form and serve as devices to refer to conceptual categories in
ways very similar to words. Cogill-Koez (2000a, 2000b) shows this for “classifier predi-
cates” in sign languages, which have features in common with some kinds of manual ex-
pressions seen in speakers (see Kendon 2004: 316–324; Schembri, Jones, and Burnham
2005). Whatever it is that is made available through verbal expression can also be made
available by other means. The distinction between imagistic expression and verbal
expression may be much less sharp than has often been supposed.
5. Broader implications
In the foregoing I have touched upon some of the questions I have been concerned with
in my studies of utterance visible action. My purpose has been to illustrate the partic-
ular perspective in terms of which I have approached the study of this domain of human
action. What are some of the broader implications?
5.1. Utterance visible action and speech and the construction of utterances
The various ways, described in section 3 above, in which utterance visible actions can enter into the construction of utterances that also involve speech, and the different levels at which they may do so, suggest that in the process of utterance production the speaker forges
“utterance objects” out of materials of diverse semiotic properties. This makes it possi-
ble for a speaker to “escape” the constraints of the linearity of verbal expression, at
least to some degree. As has recently been pointed out, in sign languages use is
made of multiple articulators simultaneously. This means that, in these languages,
simultaneous as well as linear constructions must be envisaged as part of their grammar
(Vermeerbergen, Leeson, and Crasborn 2007). Once it is seen that speakers also can
make use of utterance visible actions as they construct utterances, it will be clear that a
similar kind of simultaneity of construction becomes possible. As the examples we have
mentioned make clear (and as is clear from the many others that have been described),
speakers do in fact exploit this possibility. For the most part, at least as far as is known,
the use of simultaneous constructions in spoken language through the combination
of speech and utterance visible action has not, in any community of speakers, become
stabilized and formalized as a shared practice to the point that it must be considered
as a part of the formal grammar of any spoken language. Such a manner of construct-
ing utterances is widespread nevertheless, and, from the point of view of describing
languaging, rather than language, it must be taken into consideration (Kendon 2011).
5.2. The emergence of linguistic symbols
A second issue of great interest that the study of utterance visible action can throw light
upon has to do with the emergence of linguistic symbols. We already have referred to
this briefly, in reference to the study of sign languages, where the study of the phenom-
ena of “sign formation” has allowed us to see how forms of expression that first come
into being as pantomimic or picture-like representations (“Highly Iconic Structures” in
Cuxac’s terminology – Cuxac and Sallandre 2007) become transformed into econo-
mized schematic forms which, in virtue of the fact that they are shared between people
as expressions in common, come to exist as autonomous symbolic forms whose characteristics are perceived as arbitrary. Yet the "iconicity" of linguistic expression is always latently present, and it can re-emerge at any time. This is the implication
of the presence of analogic forms of expression in sign language such as may be seen in
the use of so-called “classifiers” (Emmorey 2003), the depiction of conceptual relations
by means of spatial diagrams (Liddell 2003) and the modification of sign performance
to achieve “iconic effects” (e.g. Duncan 2005). We see comparable processes in speak-
ers, as in the various uses of vocal effects that speakers exploit, but this is even more
evident if we take into consideration their uses of visible action. Just as we see how,
in signed discourse, there is a continuous interplay between aspects that admit of formal
structural description at the same time as aspects are used which are dynamic, analo-
gous, “iconic,” so we may see the same things in speakers. Constructing an utterance
as a meaningful object, whatever modalities may be used, is always the result of a
co-operative adjustment between forms governed by shared formal structures and
modes of expression that follow analogic or “iconic” principles. The dialectic between
“imagistic” and “linguistic categorical” expression that McNeill (2005) describes in his
theory of the “growth point” may be regarded as an attempt to capture this point. How-
ever, in my view, the actor is continually adjusting his expressive resources in relation
to one another as he seeks to create an “utterance object” that meets his rhetorical
aims within the frame of whatever interactional moment he is faced with. I do not
see a dialectical struggle, but an orchestration of resources under the guidance of a communicative aim.
The study of the ways in which utterance visible actions can become shared forms –
seen especially in the study of sign languages, but not only there – helps to throw light
on the social and semiotic processes that are involved in the creation of “language sys-
tems.” What might be called the “effort after representation” – making a connection
here with Bartlett's (1932) notion of "effort after meaning" – seems to be a fundamental process in language. On this view, the place of so-called "iconic processes", including "sound symbolism", in speech should be re-evaluated. When spoken languages, and also, to a degree, sign languages, are considered as abstracted "social objects" and described as formal systems, the "iconic impulse" may seem not so very important; but when we consider the genesis of language, its continual emergence in everyday interaction (as well as historically), it is clearly of fundamental importance.
5.3. Utterance visible action and language origins
There is a long tradition, which goes back at least to Condillac (see his Essai of 1746 –
Condillac 2001), that suggests that “gesture” – some form of utterance visible action
which Condillac referred to as le langage d'action – must have been the first form of
language (Rosenfeld 2001 provides an excellent discussion). In modern times, especially
since the seminal paper of Hewes of 1973, this view has gained increasing support.
Scholars such as Donald (1991), Armstrong, Stokoe and Wilcox (1995), Stokoe (2001),
Corballis (2002), Arbib (2005, 2012), and Tomasello (2008), among others, have all en-
dorsed this idea, although the details of the evolutionary scenarios offered differ some-
what from one author to another. Common to all of these scholars is the idea that the
kind of symbolic action that would support a form of communication that would count
as being "linguistic" (just what this means also differs between authors) would have first
emerged through visible bodily action. There are many different points brought up in
support of this idea (including, for example, the allegedly inflexible character of ape vocalizations, in contrast to the flexibility of ape gesture use; the fact that the first manifestations of language-like symbolic action in human babies are gestures like pointing; and the readiness with which humans are able to develop full-fledged languages in the medium of visible bodily action), but I think that the fundamental reason for its attraction is that, in the medium of visible action, it is easier to envisage how a transition
might be made from literal action to symbolic action. It is harder to imagine how voca-
lizations could come to have symbolic significance because they do not seem good ve-
hicles for iconic representation, and iconic representation, as already noted, is widely
agreed to be a fundamental process in the formation of linguistic symbols.
This “gesture first” language origins scenario has its critics, of course. Perhaps the
most important objection raised is the fact that, with the relatively rare exception of
deafness, which forces people to express themselves linguistically only with visible
bodily action, all human languages are spoken. Furthermore, anatomical and neurophy-
siological studies suggest that humans are biologically specialized as speaking creatures,
and must have evolved as such, over a very long period of time. Gesture first scenarios
all refer to a transition or a switch from “gesture” to “speech,” but none of the advo-
cates of this scenario have been able to provide a convincing account of how or why
this might have occurred. On the other hand, although those who argue that language
evolved as a system of vocal expression do not face this “switch” problem, none of them
pay very much attention to the intimate interrelations between speaking and visible
bodily action we have discussed here. The involvement of manual (and other) bodily
action in speaking needs to be accounted for in any proposal put forward to account
for the origin of language in evolutionary terms.
Most writers who advocate a “gesture first” theory of language origins draw atten-
tion to the commonly noted intimate association between gesture and speech as sup-
porting evidence (as, indeed, I did myself in Kendon 1975). However, given that
utterance visible action, when used in conjunction with speech, has a rather different
role in utterance and, accordingly, exhibits a different range of semiotic properties
than it does when it is employed as the sole vehicle of utterance (as in signing), it is
clear that it is not some kind of left-over from a non-speech kind of language and is
not appropriately so regarded. It is, rather, an integrated component of contemporary
languaging practice. Further, given modern developments in our understanding of the
neurological interrelations between speaking and hand actions (for one review see Will-
ems and Hagoort 2007), it seems much better to suppose that speaking and utterance
manual action evolved together.
According to a proposal that I am currently working on (expressed in a preliminary way in Kendon 2009), we might better approach the problem if we started out not by thinking of the actions of speaking and gesturing as being descended with modification only from communicative or expressive actions, but by thinking of them as including modifications of the practical actions involved in manipulating
and altering the environment, especially as this is required in the acquisition of food,
and including the manipulation and alteration of the behavior of conspecifics, as in
mothering, grooming, mating and fighting. MacNeilage (2008) has suggested that the
complex oral actions that form the basis of speech have their origins in the oral manip-
ulatory actions that are involved in the management of food intake. Perhaps this could
be extended to actions of other parts of the body involved in feeding. If an animal is to
masticate its food, food has to be brought into the mouth in some way. Leroi-Gourhan
(1993) pointed out that an animal may do this by moving its whole body close enough
to foodstuffs so that it can grasp them with its mouth directly. Animals that do this tend
to be herbivores and all four of their limbs are specialized for body support and loco-
motion. They acquire food by grazing or cropping. On the other hand, many animals,
for example squirrels and raccoons, grasp and manipulate foodstuffs with their hands,
which they also use to carry food to the mouth. Such animals tend to be carnivores or
omnivores and their forelimbs are equipped as instruments of manipulation, each with
five mobile digits. In mammals of this sort, a system of forelimb-mouth co-ordination be-
comes established. This development is particularly marked in primates, of course, who,
perhaps, in adopting an arboreal style of life, have developed forelimbs that can serve in
environmental manipulation as well as in body support and locomotion. This sets the
stage for the development of oral-forelimb manipulatory action systems, and this may
explain the origin of co-involvement of hand and mouth in utterance production (see
Gentilucci and Corballis 2006).
This implies that the actions involved in speaking and in utterance visible action, two
forms of action that, as we have seen, are so intimately connected that they must some-
how be regarded as two aspects of the same process, are adaptations of oral and manual
environmental manipulatory systems employed in practical actions. The adaptations
that allow them to serve communication at a distance are adaptations that arise as prac-
tical actions came to function in situations of co-present interaction between conspeci-
fics, at first, perhaps, as “try-out” or “as if ” versions of true practical actions (Kendon
1991). On this view the actions of speaking and gesturing do not derive only from ear-
lier forms of expressive actions. We may expect, accordingly, that there will be compo-
nents of the executive systems involved in speech that will be closely related to those
involved in forelimb action and that these will be different from those components of
oral and laryngeal action that are part of the vocal-expression system. This view re-
ceives some support in the neuroscience literature, where it is reported that the actions
of the tongue and lips by which the oral articulatory gestures of speech are achieved,
controlled as they are in the pre-motor and motor cortex, can be separated from actions
involved in exhalation and in the activation of the larynx, which produce vocalization.
The control circuits for these actions involve sub-cortical structures instead. However,
in normal speech, the oral gestures of speech articulation are combined with vocal
expression, which provides the affective and motivational components of speaking
(see, for example, Ploog 2002).
Engaging in utterance, doing language, as we might say, is thus to be thought of as
being derived from forms of action by which a creature intervenes in the world. Langua-
ging (doing language), in consequence, because it involves practical action, involves the
mobilization of oral and manual practical action systems. It also involves the mobiliza-
tion of vocal and kinesic expressive systems, as they come to be a part of social action.
Utterance visible actions, thus, are neither supplements nor add-ons. They are an inte-
gral part of what is involved in taking action in the virtual or fictional world that is
always conjured up whenever language is made use of. A theory of language that
takes this perspective, we suggest, will be better able to allow us to understand why
it is that visible bodily action is also mobilized when speakers speak and why, more gen-
erally, speaking, using language in co-present interaction, that is, is always a form of
action that involves several different executive systems in co-ordination.

6. References
Arbib, Michael 2005. From monkey-like action to human language: An evolutionary framework
for neurolinguistics. Behavioral and Brain Sciences 28: 105–167.
Arbib, Michael 2012. How the Brain Got Language: The Mirror Neuron Hypothesis. Oxford:
Oxford University Press.
Armstrong, David F., William C. Stokoe and Sherman E. Wilcox 1995. Gesture and the Nature of
Language. Cambridge: Cambridge University Press.
Bartlett, Frederick C. 1932. Remembering: A Study in Experimental and Social Psychology. Cam-
bridge: Cambridge University Press.
Birdwhistell, Ray L. 1970. Kinesics and Context: Essays in Body Motion Communication. Philadel-
phia: University of Pennsylvania Press.
Brentari, Diane (ed.) 2010. Sign Language. Cambridge: Cambridge University Press.
Brookes, Heather J. 2001. O clever “He’s streetwise.” When gestures become quotable: The case
of the clever gesture. Gesture 1: 167–184.
Brookes, Heather J. 2004. A repertoire of South African quotable gestures. Journal of Linguistic
Anthropology 14: 186–224.
Brookes, Heather J. 2005. What gestures do: Some communicative functions of quotable gestures
in conversations among black urban South Africans. Journal of Pragmatics 37: 2044–2085.
Bruce, Scott G. 2007. Silence and Sign Language in Medieval Monasticism: The Cluniac Tradition
C.900–1200. Cambridge: Cambridge University Press.
Butler, Harold E. 1922. The Institutio Oratoria of Quintilian. With an English Translation by H. E.
Butler. London: William Heinemann.
Calbris, Geneviève 2011. Elements of Meaning in Gesture. Amsterdam: John Benjamins.
Cogill-Koez, Dorothea 2000a. Signed language classifier predicates: Linguistic structures or sche-
matic visual representation? Sign Language and Linguistics 3: 153–207.
Cogill-Koez, Dorothea 2000b. A model of signed language “classifier predicates” as templated
visual representation. Sign Language and Linguistics 3: 209–236.
Condillac, Étienne Bonnot de 2001. Essay on the Origin of Human Knowledge. Translated and
edited by Hans Aarsleff. Cambridge: Cambridge University Press.
Condon, William S. 1976. An analysis of behavioral organization. Sign Language Studies 13: 285–
318.
Condon, William S. and Richard D. Ogston 1966. Sound film analysis of normal and pathological
behavior patterns. Journal of Nervous and Mental Disease 143: 338–347.
Condon, William S. and Richard D. Ogston 1967. A segmentation of behavior. Journal of Psychi-
atric Research 5: 221–235.
Corballis, Michael C. 2002. From Hand to Mouth: The Origins of Language. Princeton, NJ: Prin-
ceton University Press.
Crystal, David 1969. Prosodic Systems and Intonation in English. Cambridge: Cambridge Univer-
sity Press.
Cuxac, Christian and Marie-Anne Sallandre 2007. Iconicity and arbitrariness in French sign language: Highly iconic structures, degenerated iconicity and diagrammatic iconicity. In: Elena Pizzuto, Paola Pietrandrea and Raffaele Simone (eds.), Verbal and Signed Languages: Comparing Structures, Concepts and Methodologies, 13–33. Berlin: De Gruyter Mouton.
Davis, Jeffrey E. 2010. Hand Talk: Sign Language among American Indian Nations. Cambridge:
Cambridge University Press.
de Jorio, Andrea 1819. Indicazione Del Più Rimarcabile in Napoli E Contorni. Naples: Simoniana
[dalla tipografia simoniana].
de Jorio, Andrea 2000. Gesture in Naples and Gesture in Classical Antiquity. A Translation of
“La Mimica Degli Antichi Investigata Nel Gestire Napoletano” by Andrea De Jorio (1832)
and with an Introduction and Notes by Adam Kendon. Bloomington: Indiana University Press.
Donald, Merlin 1991. Origins of the Modern Mind: Three Stages in the Evolution of Culture and
Cognition. Cambridge, MA: Harvard University Press.
Duncan, Susan 2005. Gesture in signing: A case study from Taiwan sign language. Language and
Linguistics 6: 279–318.
Dutsch, Dorota 2002. Towards a grammar of gesture: A comparison between the type of hand
movements of the orator and the actor in Quintilian’s Institutio Oratoria. Gesture 2: 259–281.
Eastman, Gilbert C. 1989. From Mime to Sign. Silver Spring, MD: T. J. Publishers.
Efron, David 1972. Gesture, Race and Culture, Second Edition. The Hague: Mouton. First pub-
lished [1941].
Ekman, Paul and Wallace Friesen 1969. The repertoire of nonverbal behavior: Categories, origins,
usage and coding. Semiotica 1: 49–98.
Emmorey, Karen (ed.) 2003. Perspectives on Classifier Constructions in Sign Languages. Mahwah,
NJ: Lawrence Erlbaum.
Farnell, Brenda 1995. Do You See What I Mean? Plains Indian Sign Talk and the Embodiment of
Action. Austin: University of Texas Press.
Fusellier-Souza, Ivani 2006. Emergence and development of signed languages: From a semiogenic point of view. Sign Language Studies 7: 30–56.
Gentilucci, Maurizio and Michael C. Corballis 2006. From manual gesture to speech: A gradual
transition. Neuroscience and Biobehavioral Reviews 30: 949–960.
Goffman, Erving 1963. Behavior in Public Places. New York: Free Press of Glencoe.
Goffman, Erving 1967. Interaction Ritual. Chicago: Aldine.
Gullberg, Marianne 2011. Language-specific encoding of placement events in gestures. In: Eric
Pederson and Jürgen Bohnemeyer (eds.), Event Representations in Language and Cognition,
166–188. Cambridge: Cambridge University Press.
Harrison, Simon 2010. Evidence for node and scope of negation in coverbal gesture. Gesture 10(1):
29–51.
Hewes, Gordon W. 1973. Primate communication and the gestural origins of language. Current
Anthropology 14: 5–24.
Hostetter, Autumn B. 2011. When do gestures communicate? A meta-analysis. Psychological Bul-
letin 137: 297–315.
Kendon, Adam 1972a. A review of “Kinesics and Context” by Ray L. Birdwhistell. American
Journal of Psychology 85: 441–455.
Kendon, Adam 1972b. Some relationships between body motion and speech. An analysis of an
example. In: Aaron Siegman and Benjamin Pope (eds.), Studies in Dyadic Communication,
177–216. Elmsford, NY: Pergamon Press.
Kendon, Adam 1975. Gesticulation, speech and the gesture theory of language origins. Sign Lan-
guage Studies 9: 349–373.
Kendon, Adam 1978. Differential perception and attentional frame: Two problems for investiga-
tion. Semiotica 24: 305–315.
Kendon, Adam 1980a. Gesticulation and speech: Two aspects of the process of utterance. In: Mary
Ritchie Key (ed.), The Relationship of Verbal and Nonverbal Communication, 207–227. The
Hague: Mouton.
Kendon, Adam 1980b. A description of a deaf-mute sign language from the Enga Province of
Papua New Guinea with some comparative discussion. Part I: The formational properties of
Enga signs. Semiotica 32: 1–32.
Kendon, Adam 1980c. A description of a deaf-mute sign language from the Enga Province of
Papua New Guinea with some comparative discussion. Part II: The semiotic functioning of
Enga signs. Semiotica 32: 81–117.
Kendon, Adam 1980d. A description of a deaf-mute sign language from the Enga Province of
Papua New Guinea with some comparative discussion. Part III: Aspects of utterance construc-
tion. Semiotica 32: 245–313.
Kendon, Adam 1981. Introduction: Current issues in the study of “nonverbal communication.” In:
Adam Kendon (ed.), Nonverbal Communication, Interaction and Gesture, 1–53. The Hague:
Mouton.
Kendon, Adam 1984. Knowledge of sign language in an Australian aboriginal community. Journal
of Anthropological Research 40: 556–576.
Kendon, Adam 1988. Sign Languages of Aboriginal Australia: Cultural, Semiotic and Communi-
cative Perspectives. Cambridge: Cambridge University Press.
Kendon, Adam 1990. Conducting Interaction: Patterns of Behavior in Focused Encounters. Cam-
bridge: Cambridge University Press.
Kendon, Adam 1991. Some considerations for a theory of language origins. Man (N.S.) 26: 602–
619.
Kendon, Adam 1992. Some recent work from Italy on quotable gestures (“emblems”). Journal of
Linguistic Anthropology 2(1): 77–93.
Kendon, Adam 1994. Do gestures communicate? A review. Research on Language and Social
Interaction 27: 175–200.
Kendon, Adam 1995. Gestures as illocutionary and discourse structure markers in southern Italian
conversation. Journal of Pragmatics 23: 247–279.
Kendon, Adam 2002. Some uses of the head shake. Gesture 2(2): 147–182.
Kendon, Adam 2004. Gesture: Visible Action as Utterance. Cambridge: Cambridge University
Press.
Kendon, Adam 2008. Some reflections on “gesture” and “sign.” Gesture 8: 348–366.
Kendon, Adam 2009. Manual actions, speech and the nature of language. In: Daniele Gambarara
and Alfredo Givigliano (eds.), Origine e Sviluppo Del Linguaggio, Fra Teoria e Storia, 19–33.
Rome: Aracne Editrice.
Kendon, Adam 2010. Pointing and the problem of “gesture”: Some reflections. Revista Psicolin-
guistica Applicata 10: 19–30.
Kendon, Adam 2011. “Gesture first” or “speech first” in language origins? In: Donna Jo Napoli
and Gaurav Mathur (eds.), Deaf Around the World, 251–267. New York: Oxford University
Press.
Kendon, Adam and Laura Versante 2003. Pointing by hand in “Neapolitan.” In: Sotaro Kita (ed.),
Pointing: Where Language, Culture and Cognition Meet, 109–137. Mahwah, NJ: Lawrence
Erlbaum.
Klima, Edward A. and Ursula Bellugi 1979. The Signs of Language. Cambridge, MA: Harvard
University Press.
Krout, Maurice H. 1935. Autistic gestures: An experimental study in symbolic movement. Psycho-
logical Monographs 208(46): 1–126.
Kuschel, Rolf 1973. The silent inventor: The creation of a sign language by the only deaf mute on a
Polynesian Island. Sign Language Studies 3: 1–27.
Lascarides, Alex and Matthew Stone 2009. Discourse coherence and gesture interpretation. Ges-
ture 9: 147–180.
Lempert, Michael 2011. Barack Obama, being sharp: Indexical order in the pragmatics of
precision-grip gesture. Gesture 11(3): 241–270.
Leroi-Gourhan, André 1993. Gesture and Speech. Cambridge: Massachusetts Institute of Technol-
ogy Press.
Lewis, Jerome 2009. As well as words: Congo pygmy hunting, mimicry and play. In: Rudolf Botha
and Chris Knight (eds.), The Cradle of Language, 236–256. Oxford: Oxford University Press.
Liddell, Scott K. 2003. Grammar, Gesture and Meaning in American Sign Language. Cambridge:
Cambridge University Press.
MacNeilage, Peter F. 2008. Origin of Speech. Oxford: Oxford University Press.
Mahl, George F. 1968. Gestures and body movements in interviews. Research in Psychotherapy
(American Psychological Association) 3: 295–346.
Mallery, Garrick 1972. Sign Language Among North American Indians Compared With That
Among Other Peoples and Deaf Mutes. The Hague: Mouton.
Mayer, Carl Augusto 1948. Vita Popolare a Napoli Nell’ Età Romantica. Bari: Gius. Laterza & Figli.
McNeill, David 1992. Hand and Mind. Chicago: Chicago University Press.
McNeill, David 2005. Gesture and Thought. Chicago: Chicago University Press.
Meissner, Martin and Stuart B. Philpott 1975. The sign language of sawmill workers in British
Columbia. Sign Language Studies 9: 291–308.
Meo-Zilio, Giovanni and Silvia Mejia 1980–1983. Diccionario De Gestos: España E Hispanoamér-
ica. Tomo I (1980), Tomo II (1983). Bogotá: Instituto Caro y Cuervo.
Morris, Desmond, Peter Collett, Peter Marsh and Maria O’Shaughnessy 1979. Gestures: Their Ori-
gins and Distribution. London: Jonathan Cape.
Müller, Cornelia 2004. Forms and uses of the palm up open hand: A case of a gesture family? In:
Cornelia Müller and Roland Posner (eds.), The Semantics and Pragmatics of Everyday Ges-
tures, 233–256. Berlin: Weidler Buchverlag.
Neumann, Ragnhild 2004. The conventionalization of the ring gesture in German discourse. In:
Cornelia Müller and Roland Posner (eds.), The Semantics and Pragmatics of Everyday Ges-
tures, 216–224. Berlin: Weidler Buchverlag.
Ploog, Detlev 2002. Is the neural basis of vocalization different in non-human primates and Homo
Sapiens? In: Tim J. Crow (ed.), The Speciation of Modern Homo Sapiens, 121–135. Oxford:
Oxford University Press.
Ricken, Ulrich 1994. Linguistics, Anthropology and Philosophy in the French Enlightenment. Lon-
don: Routledge.
Rimé, Bernard and Laura Schiaratura 1991. Gesture and speech. In: Robert S. Feldman and Bernard Rimé (eds.), Fundamentals of Nonverbal Behavior, 239–281. Cambridge: Cambridge University Press.
Rosenfeld, Sophia 2001. Language and Revolution in France: The Problem of Signs in Late Eigh-
teenth Century France. Stanford, CA: Stanford University Press.
Scheflen, Albert E. 1965. The significance of posture in communication systems. Psychiatry: Jour-
nal of Interpersonal Relations 27: 316–331.
Schembri, Adam, Caroline Jones and Denis Burnham 2005. Comparing action gestures and clas-
sifier verbs of motion: Evidence from Australian sign language, Taiwan sign language and non-
signers gestures without speech. Journal of Deaf Studies and Deaf Education 10: 272–290.
Seyfeddinipur, Mandana 2004. Meta-discursive gestures from Iran: Some uses of the “Pistol
Hand.” In: Cornelia Müller and Roland Posner (eds.), The Semantics and Pragmatics of Everyday Gestures, 205–216. Berlin: Weidler Buchverlag.
Sherzer, Joel 1991. The Brazilian thumbs-up gesture. Journal of Linguistic Anthropology 1: 189–197.
Stokoe, William C. 2001. Language in Hand: Why Sign Came Before Speech. Washington, DC:
Gallaudet University Press.
Streeck, Jürgen 2009. Gesturecraft: The Manu-Facturing of Meaning. Amsterdam: John Benjamins.
Tervoort, Bernard 1961. Esoteric symbolism in the communication behavior of young deaf chil-
dren. American Annals of the Deaf 106: 436–480.
Tomasello, Michael 2008. The Origins of Human Communication. Cambridge: Massachusetts
Institute of Technology Press.
Umiker-Sebeok, Jean and Thomas A. Sebeok (eds.) 1987. Monastic Sign Languages. Berlin: De
Gruyter Mouton.
Vermeerbergen, Myriam, Lorraine Leeson and Onno Crasborn (eds.) 2007. Simultaneity in Signed
Languages: Form and Function. Amsterdam: John Benjamins.
Willems, Roel M. and Peter Hagoort 2007. Neural evidence for the interplay between language,
gesture and action: A review. Brain and Language 101: 278–289.
Woll, Bencie 2007. Perspectives on linearity and simultaneity. In: Myriam Vermeerbergen, Lor-
raine Leeson and Onno Crasborn (eds.), Simultaneity in Signed Languages, 337–344. Amster-
dam: John Benjamins.
Yau, Shun-Chiu 1992. Création Gestuelle et Débuts du Langage: Création de Langues Gestuelles chez des Sourds Isolés. Paris: Éditions Langages Croisés.

Adam Kendon, Philadelphia, PA (USA)

2. Gesture as a window onto mind and brain, and the relationship to linguistic relativity and ontogenesis
1. Introduction
2. “Gesture” in a psychological perspective
3. Example of this perspective
4. The growth point
5. Gesture and linguistic relativity
6. Gesture and ontogenesis
7. Neurogesture
8. Summary and brain model
9. References
Abstract
This paper provides an overview of what is currently known about gestures and speech from a psychological perspective. Spontaneous co-verbal gestures offer insights into imagistic and dynamic forms of thinking while speaking and gesturing. The chapter includes motion event studies, also from cross-cultural and developmental perspectives, as well as work concerning speakers with language impairments.
“it’s like seeing someone’s thought” – Mitsuko Iriye, historian, on observing how to
code gestures.

1. Introduction
To see in gesture “someone’s thought,” as our motto remarks, we look at each case indi-
vidually and in close detail. Since they are unique in their context of occurrence, ges-
tures, for this purpose, are transcribed one by one, never accumulated, and, since
often it is the tiniest features through which thought peeks, we record in detail. Taking
gesture at this fine-grained scale, we cover a wide range – gestures in different types of
language (the “S-type” and “V-type”), gestures of children, and gestures in neurological
disturbances – and find in each region that our “window” provides views of thinking as
it takes place, different across languages, ages, and neurological condition.

2. “Gesture” in a psychological perspective


Defining “gesture” is a necessary but vexing exercise, bound to fall short of a fully
satisfying definition. It is a word with many uses, often pejorative and misleading,
and to find a replacement would be a real contribution, but one does not appear.
One problem is that the meaning of the word is not independent of the perspective
one takes. It thus has built into it acceptances and exclusions. It is wise to make
these known from the start.
A “psychological” perspective implies its own definition of “gesture.” Adam Kendon
placed gestures in the category of “actions that have the features of manifest deliberate
expressiveness” (2004: 13–14). I adopt this definition; it is the best that I have seen. But I do so with one qualification and one proviso.
The qualification is that gesture cannot be deliberate; as we define them, “gestures”
are unwitting and anything but deliberate. (Kendon may have meant by “deliberate”
non-accidental, and with this I agree; but the word also conveys “done for a purpose,”
and with that I do not agree.)
The proviso concerns “action.” If by action we understand movements orchestrated by some significance created by the speaker, the term is accurate; but gestures, so understood, are (again) not actions performed to attain some goal.
So our definition, based on Kendon’s but excising “deliberate” and specifying the
kind of action (and far from tripping off the tongue), is this: A gesture is an unwitting,
non-goal-directed action orchestrated by speaker-created significances, having features
of manifest expressiveness.
Very often I use “gesture” still more restrictively to mean all of the above, plus:
An expressive action that enacts imagery (not necessarily by the hands or hands
alone) that is part of the process of speaking.
A slightly different term does denote speech-linked gesture: “gesticulation,” which in
fact was used by Kendon in an earlier publication (1980). I remain with “gesture” partly
for brevity, but more crucially because “gesticulation” carries an image of windmilling
arms that is false to the reality we are aiming to explain. This reaction is not idiosyncratic:
according to the Oxford English Dictionary to gesticulate is “To make lively or energetic
motions with the limbs or body; esp. as an accompaniment or in lieu of speech”.
These are regarded as rival claims about infant predispositions for the forms of lan-
guage, although in fact they seem closely related, emphasizing one side or the other of
the linguistic sign, signifier or signified. Bootstrapping from either is expected to lead to
other areas of language. One hypothesis holds that certain semantic (signified) patterns
evolved (such as actor-action-recipient, from Pinker 1989); the other that syntactic (sig-
nifier) patterns did (such as subject-verb-object, from Gleitman 1990); both provide
entrée to the rest of language. Of course, both or neither (as Slobin 2009 suggests in
his review) may have evolved. The picture is murky to say the least.
So through the gesture window we see a kind of action of the mind that is linked to
imagery and is part of language dynamically conceived, action regarded as part of the
process of thinking for (and while) speaking.

3. Example of this perspective


Here is one participant in our first experiment, recounting an animated color cartoon
that she has just watched. It is typical of many gestures that we see (Fig. 2.1). It is not
codified or “quotable” (to use Adam Kendon’s term), a gesture of the kind that appears
in gesture atlases and dictionaries of the “gesture language” of some nationality or
other. It is instead a unique, unlikely-to-recur, spontaneous, individually formed expres-
sion of the speaker’s idea at the moment of speaking. (For notation and the method of
data collection, see McNeill this volume b, but briefly, square brackets enclose the ges-
ture phrase; boldface shows the gesture stroke; underlining is a gesture hold, in this
case a poststroke hold; “/” is a silent speech pause, and font size reflects prosodic peaks.)

[ / and it goes dOWn]

BH/mirroring each other in tense spread C-shapes, palms toward center (PTC), move down from upper central periphery to lower central periph.; sudden stop.

Fig. 2.1: Iconic gesture depicting an event from the animated stimulus, Canary Row. In the car-
toon, Sylvester, the ever-frustrated cat, attempts to reach Tweety, his perpetual prey, by climbing
a drainspout conveniently attached to the side of the building where Tweety in his birdcage
is perched. In this instance, Sylvester is climbing the pipe on the inside, a stealth approach. Tweety
nonetheless spots him, rushes out of his cage and into the room behind, then reappears with
an impossibly large bowling ball, which he drops into the pipe. In the example, the speaker is
describing the bowling ball’s descent. Used with permission of Cambridge University Press.
It is important to consider the precise temporal details of a gesture. They suggest that
in the microgenesis of an utterance the gesture image and linguistic categorization con-
stitute one idea unit, and their timing is an inherent part of how this idea unit is cre-
ated. The start of the gesture preparation is the dawn of the idea unit, which is kept
intact and is unpacked, as a unit, into the full utterance. The phases fall out at the pre-
cise moment of their intellectual realizations. Timing the gesture phases is inherent to
this developing meaning.

3.1. Interpreting the example


It is not implied that gesture and speech always convey the same elements of meaning;
they are co-expressive if they capture the same idea, but each may express a different
aspect of it. In a referential sense, speech and gesture in the example did convey the
same content, but semiotically they are not the same. Speech and gesture are “co-
expressive,” meaning that they each express, in their own semiotic ways, the same
underlying idea – here, that the bowling ball was moving down inside the pipe – but
semiotically they are not redundant.
The gesture in this example showed, in one symbolic form, a pipe-shape moving
downward. The object and the path through which it passes moved together. They
were fused into one package. Such a path-figure-ground unit (to use Talmy’s 2000
terms) does not occur in any language so far as we are aware. Talmy (2000) proposed
a typology reflecting how the basic motion verbs of a language incorporate motion
event components. In English and other “satellite-framed” languages, such as German,
the Scandinavian languages, Chinese and still others, verbs incorporate the manner of
motion but not the path (so run, walk, stagger, all denote different manners of motion
with direction unspecified). In Spanish, Japanese, Korean, American Sign Language and
other “verb-framed” languages, verbs incorporate path but not manner (two English
verbs borrowed from Romance give satellite speakers the flavor of verb framing,
“exit” and “enter” – “he exited out” or “entered in” is redundant). Yet other languages
have verbs that incorporate the figure, or the entity doing the moving; this mind-
bending situation appears in Atsugewi, a Hokan language spoken in Northern Califor-
nia that Talmy studied in depth and, again, appears in eclectic English (rain as a verb).
But apparently no language has single verbs that incorporate the figure and “ground”
(landmark) with the path, such as *to inside, a verb of motion meaning “some figure
moving upward or downward inside a container”. Nonetheless, the gesture in the exam-
ple embodied this semantic package, figure plus path plus ground. This kind of fusion is
typical of gesture. Meanings that in speech are analyzed into separate linguistic seg-
ments can be synthesized into single symbolic forms in gesture. Further, the speaker’s
hands, in their tension (“tense spread C-shapes”), may have embodied the narrative
idea of the bowling ball as the point of maximum energy of the episode, for which,
again, there is no speech equivalent (although in this instance the speaker could in
theory have mentioned it).
The gesture, then, synthesized several elements of meaning that would be separated
in speech, including some (like “downward moving hollowness”) that are impossible in
speech.
In speech, nearly everything is the reverse – the words “and it goes down,” the
intransitive construction, the metapragmatic function of the “and” – all conventional
forms in the codified system of English, and the meaning of the whole is composed out of these separately meaningful parts according to the plan of the intransitive phrase.
Due to synchrony, the gesture semiotic presents its content at the same time as the
linguistic semiotic, and this duality is an important key to what evolved. This mecha-
nism of combined semiotic opposites is one important spectacle we see through the
gesture window.

4. The growth point


A growth point (GP) is a mental package that combines both linguistic categorial and
imagistic components. It is called a growth point because it is meant to be the initial
pulse of thinking for and while speaking, out of which a dynamic process of organiza-
tion emerges. Growth points are brief dynamic processes, during which idea units take
form. It is a minimal unit, in Vygotsky’s 1987 sense of being the smallest unit that re-
tains the quality of being a whole – here, a minimal unit of combined imagery and lin-
guistic form. It is accordingly the smallest packet of an imagery-language dialectic, a
minimal unit on the dynamic dimension of language, and the smallest unit of change
on the microgenetic scale. For extensive discussion of the growth point, its relationship
to context, and how it models the dynamic dimension of language, see the previously
cited article (McNeill this volume b).

5. Gesture and linguistic relativity


“Whorf,” a gifted amateur linguist, in this discussion is less an individual than an
emblem for a range of ideas having to do with the influence of language on thought.
The “Whorfian hypothesis” addresses language as a static object. It describes “habitual
thought” as a static mode of cognition and how it is shaped through linguistic analogies
(see Lucy 1992a, 1992b; Whorf 1956). A corresponding dynamic hypothesis is “thinking
for speaking,” introduced by Dan Slobin as follows: “ ‘Thinking for speaking’ involves
picking those characteristics that (a) fit some conceptualization of the event, and (b) are
readily encodable in the language” (Slobin 1987: 435).
In terms of growth points, the thinking for speaking hypothesis is that, during speech,
growth points may differ across languages in predictable ways. It might be better called
the hypothesis of “thinking while speaking.” A major insight, however the dynamic
hypothesis is named, comes from the distinction between “satellite-framed” versus
“verb-framed” languages identified by Talmy (1975, 1985, 2000), a distinction referring
to how languages package motion event information, including path and manner.

5.1. S-type and V-type languages


As pointed out earlier, English follows the satellite-framed or “S-type” semantics (Slo-
bin 1996). In S-type languages, a verb packages the fact of motion with information
about manner. Rolls is an example, and the verb describes, in one package, that some-
thing is in motion and how it is moving – motion by rolling. The path component, in
contrast, is outside the verb. From rolls alone, we have no inkling of the direction of
motion; for that we add one or more satellites: rolls out/in/down/up/through.
A complex curvilinear path in an S-type description tends to be resolved into a series
of straight segments or paths. E.g., “and it goes down but it rolls him out down the rain-
spout out into the sidewalk into a bowling alley” (a recorded example) – one verb and
six satellites (italicized) or segments of path. It is also typical of these languages to
emphasize ground – each path segment tends to have its own locus with respect to a
ground element, as in this example: the sidewalk, rainspout, and bowling alley.
The other broad category of language is the verb-framed or “V-type,” of which Span-
ish, French, Turkish, American Sign Language, and Japanese are examples. In such lan-
guages, a verb packages the fact of motion with path (the direction of motion) – ascends or descends, exits or enters, etc. – and it is manner that is conveyed outside the verb (or omitted altogether).
Unlike in an S-type language, a complex curvilinear path can be described holistically
with a single verb – descends, for example, for the same curvilinear path that was broken
into six segments in the English example. V-type languages tend to highlight a whole
mise en scène rather than an isolable landmark or ground (a collection of descriptions
like “there are tall buildings and a slanted street with some signs around, and he ascends
climbing,” in contrast to “he climbs up the drainpipe,” where upward path is localized
to the ground, the drainpipe). This is termed the “setting” by Slobin.

5.2. Implications for growth points


In keeping with the Whorfian hypothesis, gestures also differ between S-type and V-type,
implying the possibility of different growth points and imagery-language dialectics in
languages of the two kinds (see McNeill and Duncan 2000).

5.3. Effects on path


In the following comparisons of English and Spanish speakers describing the same
bowling ball episode, the speakers create different visuospatial imagery.

5.3.1. English
The above example of a speaker describing the aftermath of the bowling ball event di-
vided the event into six path segments, each with its own path gesture:

(i) and it goes down


(ii) but it rolls him out
(iii) down the rain spout
(iv) out
(v) into the sidewalk
(vi) into a bowling alley and he knocks over all the pins

The match between speech and gesture is nearly complete. The speaker’s visuospatial
cognition – in gesture – consists of a half dozen straight line segments, not the single
curved path that Sylvester actually followed (Fig. 2.2).
34 I. How the body relates to language and communication

Gesture     Synchronous speech
PATH 1      [/ and it goes down]
PATH 2      but [[it roll][s him out*]]
PATH 3      [[down the / / ]
PATH 4      [ / rainspo]]
PATH 5      [ut/ out i][nto
PATH 6      the sidew]alk/ into a] [bowling alley

Fig. 2.2: English speaker’s segmentation of a curvilinear path. Computer art in this and subsequent
illustrations by Fey Parrill. Used with permission of the University of Chicago Press.

5.3.2. Spanish
In video recordings by Karl-Erik McCullough and Lisa Miotto, Spanish speakers,
in contrast, represent this scene without significant segmentation. Their gestures are
single, unbroken curvilinear trajectories. In speech, the entire path may be covered
by a single verb. The following description is by a monolingual speaker, recorded in
Guadalajara, Mexico:

(1) [entonces SSS]


then SSSS he falls

The accompanying gesture traces a single, unbroken arcing trajectory down and to
the side. What had been segmented in English becomes in Spanish one curvaceous ges-
ture that re-creates Sylvester’s path. In speech, the speaker made use of onomatopoeia,
which is a frequent verb substitute in our Spanish-language narrative materials
(Fig. 2.3).
To quantify this possible cross-linguistic difference, the table shows the number of
path segments that occur in Spanish and English gestures for the path running from
the encounter with the bowling ball inside the pipe to the denouement in the bowling
alley (Tab. 2.1). English speakers break this trajectory into 43 percent more segments
than Spanish speakers: 3.3 in English and 2.3 in Spanish. Extremes, moreover, favor

Fig. 2.3: Spanish speaker’s single continuous arc for Sylvester’s circuitous trip down the pipe,
along the street and into the bowling alley (scan images from left to right). Elapsed time is
about 1 sec. This illustrates Spanish-style visuospatial cognition of a curved trajectory as
a single, unsegmented path. Used with permission of the University of Chicago Press.

Tab. 2.1: Segmentation of paths by English- and Spanish-speaking adults

            Number of gestures
Segments    English    Spanish
0           0          1
1           3          5
2           7          6
3           3          4
4           2          1
5           1          0
≥6          5          1
Total       21         18

English. Five English speakers divided the trajectory into six or more segments, com-
pared to only one Spanish speaker. Thus Spanish speakers, even when they divide
paths into segments, have fewer and broader segments.

5.4. Effects on manner


5.4.1. Manner fogs
Slobin (1996) has repeatedly observed that in Spanish speech and writing manner is
cumbersome to include, and consequently speakers and writers tend to avoid it if they
can. However, manner does not necessarily thereby disappear from the speaker’s
consciousness. The result is often a “manner fog” – a scene description that has multiple
manner occurrences in gesture but lacks manner in speech. An example is the follow-
ing, a description of Sylvester climbing the pipe on the inside:

(2) e entonces busca la ma[nera (silent pause)]


and so he looks for the way
Gesture depicts the shape of the pipe: ground.

(3) [de entra][r / / se met][e por el]


to enter REFL goes-into through the
Both hands rock and rise simultaneously: manner + path combined (left hand only
through "mete").

(4) [desague / / ] [/ / si?]


drainpipe…yes?
Right hand circles in arc: manner + ground (shape of pipe).

(5) [desague entra /]


drainpipe, enters
Both hands briefly in palm-down position (clambering paws) and then rise with
chop-like motion: manner + path combined.

Gestural manner was in the second, third, and fourth lines, despite the total absence of
spoken manner references. Thus, while manner may seem absent when speech alone is
considered, it can be present, even abundant, in the visuospatial thinking.

Fig. 2.4: Spanish speaker’s “manner fog,” while describing Sylvester’s inside ascent. She is at line
2, “[de entra][r / / se met][e por el]” (to enter refl goes-into through the). Her hands continually
rock back and forth (= climbing manner) while rising (= upward path) but without verbal mention
of manner.

5.5. Manner modulation


In English, the opposite takes place, in a sense. Whereas a manner fog adds manner
when it is lacking from speech, modulation adjusts manner that is obligatory in speech.
Modulation solves a problem created by English verbs: they often package manner
with motion and are accordingly manner verbs as well as verbs of motion, even when a
speaker intends to convey only the fact of motion. A gesture, however, can include
manner or not, and can accordingly modulate the manner component of verbs in
exact accord with intentions. The following examples, from different English speakers,
show manner modulation – respectively, reinforcement of manner and removal of
manner:

(6) Speaker A (reinforces manner)


but [it roll]s him out down the
Both hands sweep to right and both rotate at the wrist as they go, conveying both
path and manner

The gesture contains manner and synchronizes with the manner verb, “rolls.” The
context highlighted manner as the point of differentiation; the gesture’s content and its
co-occurrence with “rolls” suggest that manner was part of the psychological predicate.

(7) Speaker B (removes manner)


and he rolls [/ down the drai]nspout
Left hand (loose fist shape, palm-side toward self) plunges straight down, convey-
ing path only

This gesture, despite the presence of the same verb, “rolls,” skips the verb and presents
no manner content of its own. It shows path alone, and co-occurs with the satellite,
“down.” Both the timing and the shape of the gesture suggest that manner was not a
major element of the speaker’s intent and that “rolls,” while referentially accurate,
was de-emphasized and functioned as a verb of motion only, with the manner content
modulated (the speaker could as well have said “goes down,” but this would have
meant editing out the true reference to rolling).

5.6. Chinese and English: Thematic groups and predicate domination


Chinese motion event gestures often resemble those in English, as would be expected
from their shared typology as S-type languages. However, there are also differences.
Chinese speakers perform a kind of gesture that appears, so far as I am aware, only
in that language. It is as if, in English, we said a stick and, simultaneously, performed
a gesture showing how to give a blow. While such a combination is obviously possible
for an English speaker, it does not occur often, and when it does it is treated as an error
by the speaker. However, such combinations do occur with Mandarin speakers, and
seem to do so with some frequency.
The hallmark of this Chinese pattern is that a gesture occurs earlier in the tempo-
ral sequence of speech than we would find in English or Spanish. In an example and
transcription from Susan Duncan we find the following:

(8) lao tai-tai [na -ge da bang hao]-xiang gei ta da-xia
    old lady hold CLASSIFIER big stick seem CAUSE him hit-down (verb-satellite)
    ‘The old lady apparently knocked him down with a big stick’

The gesture (a downward blow with her left hand, fist clenched around “the stick,” palm
facing center) that accompanied the spoken reference to the stick (da bang ‘big stick’)
was expressively the same as the verb and satellite, da-xia ‘hit-down’. However, the
speaker’s hand promptly relaxed, long before this verb phrase was reached in speech.
Chinese is what Li and Thompson (1976, 1981) termed a “topic prominent” lan-
guage. Wallace Chafe stated the sense of topicalization intended: “What the topics
appear to do is limit the applicability of the main predication to a certain restricted
domain […] the topic sets a spatial, temporal, or individual framework within which
the main predication holds” (Chafe 1976: 50).
In this instance, the domain is what was done with the big stick. English and Spanish,
in contrast, are “subject prominent.” Utterances in the latter languages are founded on
subject-predicate relations. In line with this typological distinction, we find cases like
the above, in which gesture provides one element and speech another element, and
they jointly create something like a topic frame. This may again, therefore, be the
impact of language on thinking for speaking.
In English, too, gestures occasionally occur that depict an event yet to appear in
speech (referring here to time lapses far longer than the previously discussed frac-
tion-of-a-second gesture anticipations). Such premature imagery is handled by the
speaker as an error, requiring repair. In the following, a gesture shows the result of
an action and it occurred with speech describing its cause. This is a semantically

appropriate pairing not unlike the Chinese example, but it involved separating the ges-
ture from the predicate describing the same event. It was repaired first by holding it
until the predicate arrived, and then repeating it in enlarged form:

[so it hits him on the hea]


[d and he winds up rolling down the stre]et

The two gestures in the first clause depicted Sylvester moving down the street, an event
described only in the following clause. The difference between Chinese and English in
this situation is apparent in the second line, the point at which the target predication
emerged in speech. Unlike the Chinese speaker, whose hands were at rest by now,
this English speaker held the gesture (underlined text) and then repeated it in a larger,
more definite way when the possible growth point occurred.
The subsequent enhanced repeat indicates the relevance of the gesture to the pred-
icate. In other words, the speaker retained the imagery from the first clause for the
growth point of the second. She did not, as the Chinese speaker did, use it as a self-
contained framing unit when it first appeared.

5.7. Summary: Visuospatial cognition across languages


From the gesture evidence, we infer the following differences in visuospatial cognition
across the languages surveyed:

(i) Gestural paths tend to be broken into straight line segments in English and into
unbroken curvilinear wholes in Spanish. Chinese also tends to break paths into
straight line segments.
(ii) Gestural manner tends to expand the encoding resources of Spanish and to
modulate them in English (the relationship in Chinese is not known).
(iii) Gestures can combine with linguistic segments to create discourse topics: this
occurs in Chinese, but not in English or Spanish.

6. Gesture and ontogenesis


6.1. The decomposition effect
“Decomposition” refers to a reduction of motion event complexity in gestures after an
earlier stage where, seemingly, they have full complexity. Decomposition suggests that
the meanings within gestures are becoming analytically integrated with speech: the path
and manner components of motion events come to be handled separately, as they also
are in linguistic representations (see “he climbs [manner and the fact of motion] up
[path]”).
Episodes in the animated cartoon often consist of motion events in which
path and manner components are simultaneously present. Sylvester rolling down a
street with the bowling ball inside him is a motion event incorporating both path
(along the street) and manner (rolling). Adults, when they describe such motion events,
typically produce gestures showing only path (for example, the hand moving down) or
gestures showing in a single gesture both manner and path (for example, the hand

rotating as it goes down for rolling). Manner without path, however, rarely occurs. Chil-
dren, like adults, have path-only gestures but, unlike adults, they also have large num-
bers of pure manner gestures and few path+manner gestures.

Fig. 2.5: No decomposition with an English-speaking 2;6-year-old, who has path and manner in
one gesture. The hand simultaneously swept to the right, moved up and down, and opened and
closed. Computer art by Fey Parrill. Used with permission of the University of Chicago Press.

In other words, they “decompose” the motion event to pure manner or pure path, and
tend not to have gestures that combine the semantic components.
Decomposition, while seemingly regressive, is actually a step forward. The youngest
child from whom we have obtained any kind of recognizable narration of the animated
stimulus was a two-and-a-half year-old English-speaking child. The accompanying illus-
tration (Fig. 2.5) shows her version of Sylvester rolling down the street with the bowling
ball inside him (she reasons that it is under him).
The important observation is that she does not show the decomposition effect. In a
single gesture, the child combines path (her sweeping arc to the right) and manner (in
two forms – an undulating trajectory and an opening and closing of her hand as it
sweeps right, suggested by the up-and-down arrow).
Is this an adult-like combined manner-path gesture? I believe not. An alternative
possible interpretation is suggested by Werner and Kaplan (1963), who described a non-
representational mode of cognition in young children, a mode that could also be the
basis of this gesture. Werner and Kaplan said that the symbolic actions of young chil-
dren (in this case, the gesture) have “the character of ‘sharing’ experiences with the
Other rather than of ‘communicating’ messages to the Other” (1963: 42). Sharing
with, as opposed to communicating and representing to, could be what we see in the
two-and-a-half-year-old’s gesture. The double indication of manner is suggestive of
sharing, since this redundancy would not be a relevant force, as it might have been in
a communicative representation of this event in which the child was merely trying to
re-create an interesting spectacle for her mother.
One of the first attempts by children to shape their gestures for combination with
language could be the phenomenon of path and manner decomposition. The mecha-
nism causing this could be that the decomposition effect creates in gesture what Karmil-
off-Smith (1979) has suggested for speech: When children begin to see elements of
meaning in a form, they tend to pull these elements out in their representations to

get a “better grip” on them. Bowerman (1982) added that the elements children select
tend to be those with “recurrent organizational significance” in the language. Manner
and path would be such elements, and their reduction in gesture to a single component
could be this kind of hyperanalytic response.
Three illustrations show the decomposition effect in English (age 4, Fig. 2.6), Man-
darin (age 3;11, Fig. 2.7), and Spanish (age 3;1, Fig. 2.8).

Fig. 2.6: English-speaking four-year-old with decomposition to manner alone. The child is describ-
ing Tweety escaping from his cage. The stimulus event combined a highly circular path with flying.
The child reduces it to pure manner – flapping wings, suggested by the two arrows – without path,
which was conspicuous in the stimulus and had been emphasized by the adult interlocutor (not
shown, but who demonstrated Tweety’s flight in a simultaneous path-manner gesture). The
embodiment of the bird, in other words, was reduced to pure manner, path excised. Computer
art by Fey Parrill. Used with permission of the University of Chicago Press.

Fig. 2.7: Mandarin-speaking 3;11-year-old with decomposition to manner – clambering without


upward motion. The child is describing Sylvester’s clambering up the pipe on the inside. The
hands depict manner without upward path (while he says, “ta* [# ta zhei- # yang-zi* /]” ‘he* #
he this- # way* /’). Direction is shown through his upward-oriented body and arms. Direction
is one aspect of path, although there is no upward motion in this case. Computer art by Fey Parrill.
Used with permission of the University of Chicago Press.

Fig. 2.8: Spanish-speaking 3;1-year-old with decomposition to manner – clambering without


upward motion. The child is likewise describing Sylvester as he climbs up the pipe on the inside.
His mother had asked, y lo agarró? (‘and did he grab him?’) and the child answered, no # /se
subió en el tubo [y le corrió] (‘no # he went up the tube and he ran’), with the gesture illustrated –
both hands clambering without path. Computer art by Fey Parrill. Used with permission of the
University of Chicago Press.

In languages as different as English, Mandarin, and Spanish, children beyond three
years decompose motion events that are fused in the stimulus, and are fused again by
adult speakers of the same languages (see the “Whorf” section above).

6.2. Perspective
We gain insight into the decomposition effect and how it forms a step in the child’s
emerging imagery-language dialectic when we consider gestural viewpoint: the first-
person or character viewpoint (C-VPT) and the third-person or observer viewpoint
(O-VPT). In observer viewpoint, the speaker’s hands are a character or other entity
as a whole, the space is a stage or screen on which the action occurs, and the speaker’s
body is distanced from the event and is “observing” it. In character viewpoint, the
speaker’s hands are the character’s hands, her body is its body, her space its space,
etc. – the speaker enters the event and becomes the character in part. Unlike panto-
mime, character viewpoint is synchronized and co-expressive with speech and forms
psychological predicates (see Parrill 2011 for extensive discussion of viewpoint combi-
nations). Tab. 2.2 shows the viewpoints of path decomposed, manner decomposed and
fused path + manner gestures for three age groups; all are English speakers.
For adults, we see that most gestures are observer viewpoint, both those that fuse
manner and path and those with path alone. Few gestures in either viewpoint occur
with manner alone.
For children, both older and younger, we see something quite different. Not only do
we see the decomposition effect, but manner and path are sequestered into different
viewpoints. Path tends to be observer viewpoint and manner character viewpoint.
This sequestering enforces the path-manner decomposition: if one gesture cannot
have both viewpoints, it is impossible to combine the motion event components.

Tab. 2.2: Gestural viewpoints of English speakers at three ages*

                              M+P combined    M decomposed    P decomposed
Adults (N=25)      C-VPT      0%              6%              0%
                   O-VPT      38%             0%              58%
7-11 years (N=23)  C-VPT      3%              27%             19%
                   O-VPT      4%              13%             34%
3-6 years (N=45)   C-VPT      5%              25%             2%
                   O-VPT      9%              10%             49%

*All figures are percentages. M = manner; P = path. C-VPT = character viewpoint; O-VPT =
observer viewpoint.

The decomposition effect and this viewpoint sequestering are very long-lasting; children
up to twelve years old still show them. Longevity implies that the final break from
decomposition depends on some kind of late-to-emerge development that enables
the child (at last) to conceptualize manner in the observer viewpoint. Until this devel-
opment, whatever it may be, the difference in perspective locks in path-manner
decomposition.

6.3. Imitation
In an example due to Karl-Erik McCullough, the decomposition of manner and its
sequestering in character viewpoint are revealed in another way, by imitation. Children
do not imitate model gestures with manner in observer viewpoint, even when the
model is directly in front of them and the imitation is concurrent. They change the
model to fit their own decompositional semantics, putting manner into the character
viewpoint and omitting path. In Fig. 2.9, a four-year-old is imitating an adult model.
The adult depicts manner (running) plus path in observer viewpoint, his hand moving
forward, fingers wiggling. The child watches intently; nonetheless, she transforms

Fig. 2.9: Decomposition to manner alone in imitation of model with combined path and manner.
Computer art by Fey Parrill. Used with permission of the University of Chicago Press.

the gesture into manner with no path, in character viewpoint (in the gesture, she is
Sylvester, her arms moving as if running).

7. Neurogesture
I describe here a case of severe Broca’s agrammatic aphasia from Pedelty (1987), a case
of Wernicke’s aphasia, also from Pedelty, a case of split-brain gesture, collected in
collaboration with Dalia Zaidel, a psychologist at the University of California, Los Angeles,
and the effects of right hemisphere injury on gesture, collected in collaboration with
Laura Pedelty. The first case demonstrates the presence of growth points in Broca’s
aphasia, the second the truncation of growth points in Wernicke’s aphasia, and the
split-brain case a role in the production of iconic gesture for the right hemisphere, a
role confirmed by the study of right-hemisphere injury itself.

7.1. Agrammatic (Broca’s) aphasia


To judge from Pedelty’s data, Broca’s aphasia spares

(i) growth points,


(ii) the capacity to construct the context from which a growth point is differentiated, and
(iii) the formation of psychological predicates,

but it impairs the ability to access constructions and to orchestrate sequences of speech
and gesture movements.
The speaker in Fig. 2.10 had viewed the animated stimulus (the bowling ball scene).
She clearly was able to remember many details of the scene but suffered extreme
impairment of linguistic sequential organization: “cat – bird? – ‘nd cat – and uh – the
uh – she (unintell.) – ‘partment an’ t* – that (?) – [eh ///] – old uh – [mied //] – uh –
woman – and uh – [she] – like – er ap – [they ap – #] – cat [/] – [an’ uh bird /] – [is //] –

Fig. 2.10: Gestures by an agrammatic (Broca’s) aphasic speaker timed with “an’ down t’ t’ down.”
The speaker was attempting to describe Tweety’s bowling ball going down the drainpipe. Computer
art by Fey Parrill. Used with permission of the University of Chicago Press.

I uh – [ch- cheows] – [an’ down t’ t’ down]”. Gestures occurred at several points, indicated
with square brackets, and appeared to convey newsworthy content. The figure shows a
gesture synchronous with “an’ down t’ t’ down,” depicting the bowling ball’s downward
path. Plausibly, this combination of imagery and linguistic categorization was a growth
point. The gesture occurred at the same discourse juncture where gestures from normal
speakers also occur, implying that for the patient, as for normals, a psychological pred-
icate was being differentiated from a field of oppositions.

7.1.1. Broca catchments


We also see Broca’s aphasics briefly overcoming severe agrammatic limits in the course
of gesture catchments (catchments are when space, trajectory, hand shape, etc. recur in
two or more – not necessarily consecutive – gestures). Such recurring features mark
out discourse cohesion and provide an empirical route, based on gestures themselves,
to the discovery of the discourse beyond the individual utterance (for more, see
McNeill this volume b). In one case, over time, with ongoing spatial recurrences
made by repeated gesture points into the upper space, speech advanced from single
elements (“el”), to phrases (“on the tracks”), to a single clause (“he saw an el
train”) to, finally, in the last slide and without a gesture, a sentence with an embedded
clause (“he saw the el train comin’ ”). Of course, the duration of the time it took to
reach the final two-clause construction was far too great for normal social discourse
(two minutes, seventeen seconds), but it shows that complex linguistic forms are pos-
sible with gesture support.

7.2. Wernicke’s aphasia


Wernicke’s aphasia, in a sense, is the inverse of Broca’s. While fluent, speech is
semantically and pragmatically empty or unconstrained, with distortions of word forms
(paraphasias) and “clangs,” unbridled phonetic primings such as “a little tooki tooki
goin’ to-it to him.” It is difficult
to say what exactly the speaker is trying to say in the following, but the recurring
speech and gesture seem to reflect his impression of Sylvester’s many attempts to
reach Tweety (“go to it”).

a little tooki tooki goin to-it to him


looki’ on a little little tooki goin’ to him
it’s a not digga not næ he weduh
like he’ll get me mema run to-it they had to is
then he put it sutthing to it takun a jo to-it
that’s nobody to-it
I mean pawdi di get to-it she got
got got glasses she could look to-it

After injury, gestures, like speech, are garbled and lack intelligible pragmatic or semantic
content. Strikingly, one gesture-speech combination (“to-it” with the gesture in
Fig. 2.11) seems to have become fixed in his memory and repeatedly occurred: a

growth point that – very abnormally – would not switch off (normal growth points
disintegrate after a second or two; see McNeill 1992: 240–244).

Fig. 2.11: Wernicke aphasic recurring imagery with the phrase “to-it.” Each panel shows a speech-
gesture combination created without meaningful context. The panels represent temporally widely
separated points, and show “getting to-it.” Computer art by Fey Parrill. Used with permission of
the University of Chicago Press.

Within traditional models of the brain, Wernicke’s area supplies linguistic categorial
content. It is known to be essential for speech comprehension, which is severely dis-
rupted after injury to the posterior speech area (Gardner 1974). However, it also
might play a role in speech production.
As inferred from the effects of injury, Wernicke’s area could help generate the
categorial content of growth points; this content, in turn, gives the imagery of the
growth point a shape that accords with the linguistic meanings. Damage accordingly
interferes with the growth point, as we see in the transcript and Fig. 2.11. The repetitiveness
in the “to-it” example, whereby an initially meaningful speech-gesture combination (as
it appears) became detached from context and sense, ensures that all ensuing growth
points would be denied content (since they cannot vary their linguistic categorial
parts).

7.3. Right hemisphere injury


The right hemisphere is often called “nonlinguistic,” and this label is appropriate in one
sense – limited access to the static dimension. But the dynamic dimension – imagery,
context, relevance – depends on it. As suggested in the Wernicke discussion, the
right hemisphere may be a brain region involved in the formation of growth points.
In contrast to Wernicke’s aphasia, where the growth point itself breaks down, right
hemisphere damage affects the contextual background of the growth point, catchments
and fields of oppositions, and hence psychological predicates. All of this is demonstrated
in the cases below, recorded in collaboration with Laura Pedelty (see McNeill and Ped-
elty 1995 for a summary).

7.3.1. Imagery decline


One effect of right hemisphere damage is to reduce the sheer amount of gesture. In
turn, reduced output suggests depletion of imagery. Not all right-hemisphere-injured
patients display reduced gesture output. Our sample of five clumps at the low end of the
distribution, while two other right-hemisphere patients had gesture outputs in the normal
range. The difference between the groups is presumably due to details of the injured
areas (hand dominance was not a factor). Since the two preserved patients and the five
depleted patients form non-overlapping groups, it would be misleading to combine them
statistically; we therefore focus on the depletion phenomenon and limit our sample to the
depleted group. Tab. 2.3 compares Canary Row narrations by five right-hemisphere patients
to those by three normal speakers.

Tab. 2.3: Effect of right hemisphere injury on gesture

                         5 RH     3 Normal
Total gestures (avg.)    20       103
Gestures/clause          0.2      1.1
Gestures/minute          4.2      15

Injury has no impact on speech, as measured in the number of clauses and number of
words, or the length of time taken to recount the stimulus; if anything, right hemisphere
damaged speakers are more talkative on these measures (Tab. 2.4).

Tab. 2.4: Non-effect of right hemisphere injury on speech

           5 RH    3 Normal
Clauses    114     96
Words      773     656

And right hemisphere damaged patients talk faster – more words and clauses, while
taking less time (Tab. 2.5).

Tab. 2.5: Effect of right hemisphere injury on speech time

                   5 RH     3 Normal
Minutes            5.3      6.8
Clauses/minute     21.5     14.1
Words/minute       145.8    96.5

Gesture imagery thus seems to be the specific target of right hemisphere damage.

7.3.2. Cohesion deficit


It is well known that right hemisphere injury interrupts the cohesion and logical coherence
of discourse (Gardner et al. 1983). This breakdown is seen in the verbal descriptions
such patients produce. One patient begins with an event (ascending the drainpipe) and then,
without indicating a transition, jumps to the middle of a different event (involving an
organ grinder and a monkey disguise). The narrative then shifts back to the end of the
drainpipe event, moves back to the start of the organ grinder-monkey event, and finally
returns to the organ grinder-monkey event, now in the middle.
As Susan Duncan has pointed out, the patient seems unaware, in other words, of the
logical and temporal flow in the story. The speaker recalls the events from the cartoon
in a more or less random order. His narrative strategy was to follow stepwise associa-
tions, with each successive association triggering a further step. We shall encounter a
similar incremental style in a split-brain patient (patient LB). Reliance on association
might be the left hemisphere’s modus operandi in both cases.

7.3.3. Unstable growth points


Given the central role of imagery in the formation of growth points, right hemisphere
injury could (a) disrupt growth point functioning by disturbing the visuospatial compo-
nent of the growth point. It could also (b) lead to instability of the imagery-language
dialectic, making catchments difficult to achieve.
A phenomenon supporting both hypotheses is that, in some right hemisphere pa-
tients, chance gaze at one’s own gesture causes a change of linguistic categorization.
This illustrates an instability and fragility of the language-gesture combination and
is a further manifestation of a lack of discourse cohesion. In the following, there is a
lexical change after the subject observes her own hopping gesture:

I just saw the* # the cat running ar* run* white and
black cat [# running here or there t* hop to here*] here, there, everywhere.
a b c
Hand hops forward four times:
a = onset of first hopping gesture
b = between second and third hopping gestures, and the approximate point
when her hand entered her field of vision
c = fourth hopping gesture

The speaker was describing Sylvester’s running and began with this verb, but, for rea-
sons unknown, her hand was undulating as it moved forward. As she caught sight of her
own motion, she started to say "hopping." Kinetic experience may also have been a factor, but it was not sufficient, since the change occurred only when her hopping hand moved into her field of view.
The example illustrates an imagery-language looseness and release from ongoing
cohesive constraints that seems to be a result of right hemisphere damage. The imagery
with “running” was not constrained by the linguistic category “to run,” in contrast
to normal gesture imagery that is strongly adapted to the immediate linguistic environ-
ment. It also illustrates, however, that speech and gesture are still tightly bound after
right hemisphere damage, in that speech shifted when the undulating gesture came
into view.
50 I. How the body relates to language and communication

7.4. The split-brain


The surgical procedure of commissurotomy (the complete separation of the two hemispheres at the corpus callosum) has been performed in selected cases of intractable
epilepsy, where further seizures would have led to dangerous brain injury. Such
cases have fascinated neuropsychologists for generations. The patients seem to have
two sensibilities inside one skull, each half brain with its own powers, personality
and limitations. We had an opportunity to test two patients, LB and NG, through
the good offices of Colwyn Trevarthen, at the University of Edinburgh, who introduced us to Dalia Zaidel, a psychologist at the University of California, Los Angeles. She
was studying and looking after the patients and generously agreed to videotape
them retelling our standard animated stimulus (for a general description of the split-
brain patient, see Gazzaniga 1970, and for a history of how they have been studied,
Gazzaniga 1995).
The split-brain procedure should create obstacles for organizing linguistic output, given the coordination it demands of the two hemispheres; straightforward organization of linguistic actions should not be possible. In fact, LB and NG
appear to follow distinct strategies designed to solve the two-hemisphere problem
(see McNeill 1992). LB seems to rely heavily on his left hemisphere, even for the pro-
duction of gestures, and makes little use of his right hemisphere. NG, in contrast, seems
“bicameral,” her left hemisphere controlling speech and her right hemisphere her ges-
tures (she was strongly right handed, but a bicameral division of function is possible
since each hemisphere has motor links to both sides of the body). Accomplishing this
feat implies that NG was communicating to herself externally – her left hemisphere
watching her right hemisphere’s gestures and her right hemisphere listening to her
left hemisphere’s speech. As a result, although her gestures were often synchronized
with speech, they also could get out of temporal alignment. The most telling asynchrony
is when speech precedes co-expressive gesture, a direction almost never seen in normal
speech-gesture timing, but not uncommon in NG’s performance.
LB had few gestures. Most were beats or simple conduit-like metaphoric gestures
with the hand, palm up, “holding a discursive object,” performed in the lower center
gesture space, near his lap. This absence of iconicity is consistent with a left-hemisphere
origin of his gestures. He could make bimanual gestures, almost always two similar
hands of the Palm Up or Palm Down Open Hand types, with corresponding metaphoric
significances. Again, this could be managed from the left hemisphere via bimanual
motor control. His narrative style was list-like, a recitation of steps from the cartoon,
sometimes accompanied by counting on his fingers, which also is consistent with a pre-
ponderantly left-hemisphere organization. This decontextualized style and minimal ges-
turing may be what the left hemisphere is capable of on its own. His approach was not
unlike that of the right-hemisphere patient described earlier, who also displayed a list-
like form of recitation. LB’s recall, however, was better, and far more sequential. Such
similarity is explained if neither speaker was using his right hemisphere to any degree,
albeit for different reasons.
In contrast, NG remembered less but had gestures of greater iconicity. Her gestures looked repetitive and stylized, although this impression is difficult to verify. Still, her narration, while poorer than LB's in the amount recalled, was more influenced by a sense of the story line.
LB and NG therefore jointly illustrate one of our main conclusions above – the right
hemisphere (available to NG, apparently minimally used by LB) is necessary for situ-
ating speech in context and imbuing imagery with linguistically categorized signifi-
cance; the left hemisphere (relied on by LB, available to NG) orchestrates well-
formed speech output but otherwise has minimal ability to apprehend and establish
discourse cohesion.

7.4.1. A right-hemisphere coup d’état?


LB had of course an intact right hemisphere. In some instances it appears to have as-
serted itself in a kind of coup d’état. LB sometimes performed elaborate iconic ges-
tures; the trade-off was that speech then completely stopped. The right brain appears
to have taken control and speech – Broca’s specialty – ceased for the duration. An
example is LB saying, “he had a plan,” then speech stopping while an elaborate iconic
gesture took place (the elapsed time was more than a second). After the gesture, speech
then resumed with “to get up,” completing the clause – as if the left hemisphere had
switched to standby while the right hemisphere intruded. Each hemisphere was thus
performing its specialty but could not coordinate with the other hemisphere. For normal speakers this event is a discourse climax and would be registered in an iconic gesture with synchronous speech; together they would highlight the climactic role. Climax is what the right hemisphere would apprehend, and the discourse juncture seems to have activated LB's gesture; but in so doing, his growth point leapt the chasm to the right hemisphere, leaving the left, and with it the power of speech, behind. Lausberg et al. (2003) suggest that the isolated left hemisphere simply ignores
experiences arising from the right hemisphere (also Zaidel 1978). In this instance,
however, LB’s left hemisphere was attentive to a right hemisphere gesture – like
NG in this regard, possibly traveling an external attention route by one hemisphere
to the other.

8. Summary and brain model


This article is meant to give an overview of a “psychological perspective” on gesture –
what through this window we see of mind and brain. The vista can be summarized with
steps toward a brain model of language and gesture.
The language centers of the brain have classically been regarded as just two, Wer-
nicke’s and Broca’s areas, but if we are on the right track, contextual background infor-
mation must be present to activate the broader spectrum of brain regions that the
model describes. Typical item-recognition and production tests would not tap these
other brain regions, but discourse, conversation, play, work, and the exigencies of
language in daily life would.

(i) The brain must be able to combine motor systems – manual and vocal/oral – in a
systematic, meaning-controlled way.
(ii) There must be a convergence of two cognitive modes – visuospatial and linguistic –
and a locus where they converge in a final motor sequence. Broca’s area is a log-
ical candidate for this place. It has the further advantage of orchestrating actions
that can be realized both manually and within the oral-vocal tract. MacNeilage
(2008) relates speech to cyclical open-close patterns of the mandible, and proposes
that speech could have evolved out of ingestive motor control. (See language ori-
gin theories in McNeill this volume a).
(iii) More than Broca’s and Wernicke’s areas underlie language – there is also the right
hemisphere and interactions between the right and left hemispheres, as well as
possibly the frontal cortex. A germane result is Federmeier and Kutas (1999),
who found through evoked potential recordings different information strategies
in the right and left sides of the brain – the right they characterized as “integrative,” the left as “predictive.” These terms relate very well to the hypothesized roles
of the right and left hemispheres in the generation of growth points and unpack-
ing. The growth point is integrative, par excellence, and is assembled in the right
hemisphere, per the hypothesis of this chapter; unpacking is sequential orches-
tration, and orchestration would be involved in prediction, when that is the
experimental focus. And Kelly, Kravitz and Hopkins (2004) observe evoked
response effects (N400) in the right brain when subjects observe speech-gesture
mismatches.
(iv) Wernicke’s area serves more than comprehension – it also provides categorization,
might initiate imagery and might also shape it.
(v) Imagery arises in the right hemisphere and needs Wernicke-originated categoriza-
tions to form growth points. Categorial content triggers and/or shapes the imagery
in the right hemisphere. At the same time, it is related to the context to which the
right hemisphere has access.
(vi) The growth point is unpacked in Broca’s area. Growth points may take form in the
right hemisphere, but they are dependent on multiple areas across the brain (fron-
tal, posterior left, as well as right and anterior left). In addition, the cerebellum
would be involved in the initiation and timing of gesture phases relative to speech
effort (see Spencer et al. 2003). However, this area is not necessarily a site
specifically influenced by the evolution of language ability.
(vii) Catchments and growth points specifically are shaped under multiple influences –
from Wernicke’s area, the right hemisphere, and the frontal area – and take form
in the right hemisphere. (For catchments, see McNeill this volume b).

Throughout the model, the concept is that information from the posterior left hemisphere, the right hemisphere, and the prefrontal cortex converges and is synthesized in the frontal left-hemisphere motor areas of the brain – Broca's area and the adjacent
premotor areas. This circuit could be composed of many smaller circuits – “localized
operations [that] in themselves do not constitute an observable behavior […] [but]
form part of the neural ‘computations’ that, linked together in complex neural circuits,
are manifested in behaviors” (Lieberman 2002: 39). See Feyereisen (volume 2) for evidence from the loss of proprioception for a thought-language-hand link in the brain.
Broca’s area in all this is the unique point of (a) convergence and (b) orchestration of
manual and vocal actions guided by growth points and semantically framed language
forms. The evolutionary model presented in McNeill (this volume a) specifically aims
at explaining orchestration of actions under other significances in this brain area and
how it could have been co-opted by language and thought.
9. References
Bowerman, Melissa 1982. Starting to talk worse: Clues to language acquisition from children’s late
speech errors. In: Sidney Strauss (ed.), U-Shaped Behavioral Growth, 101–145. New York: Aca-
demic Press.
Chafe, Wallace 1976. Givenness, contrastiveness, definiteness, subjects, topics, and point of view.
In: Charles N. Li (ed.), Subject and Topic, 25–55. New York: Academic Press.
Federmeier, Kara D. and Marta Kutas 1999. Right words and left words: Electrophysiological evidence for hemispheric differences in meaning processing. Cognitive Brain Research 8: 373–392.
Feyereisen, Pierre volume 2. Gesture and the neuropsychology of language. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Jana Bressem (eds.), Body – Language – Communication: An International Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics and Communication Science 38.2.) Berlin: De Gruyter Mouton.
Gardner, Howard 1974. The Shattered Mind. New York: Vintage Books.
Gardner, Howard, Hiram H. Brownell, Wendy Wapner and Diane Michelow 1983. Missing the
point: The role of the right hemisphere in the processing of complex linguistic material. In:
Ellen Perecman (ed.), Cognitive Processing in the Right Hemisphere, 169–191. New York: Aca-
demic Press.
Gazzaniga, Michael S. 1970. The Bisected Brain. New York: Appleton-Century-Crofts.
Gazzaniga, Michael S. 1995. Consciousness and the cerebral hemispheres. In: Michael S. Gazza-
niga (ed.), The Cognitive Neurosciences, 1391–1400. Cambridge: Massachusetts Institute of
Technology Press.
Gleitman, Lila 1990. The structural sources of verb meanings. Language Acquisition 1(1): 3–55.
Karmiloff-Smith, Annette 1979. Micro- and macrodevelopmental changes in language acquisition
and other representational systems. Cognitive Science 3: 91–118.
Kelly, Spencer D., Corinne Kravitz and Michael Hopkins 2004. Neural correlates of bimodal
speech and gesture comprehension. Brain and Language 89: 253–260.
Kendon, Adam 1980. Gesticulation and speech: Two aspects of the process of utterance. In: May
Ritchie Key (ed.), The Relationship of Verbal and Nonverbal Communication, 207–227. The
Hague: Mouton.
Kendon, Adam 2004. Gesture: Visible Action as Utterance. Cambridge: Cambridge University Press.
Lausberg, Hedda, Sotaro Kita, Eran Zaidel and Alain Ptito 2003. Split-brain patients neglect left
personal space during right-handed gestures. Neuropsychologia 41: 1317–1329.
Li, Charles N. and Sandra A. Thompson 1976. Subject and topic: A new typology of language. In:
Charles N. Li (ed.), Subject and Topic, 457–490. New York: Academic Press.
Li, Charles N. and Sandra A. Thompson 1981. Mandarin Chinese: A Functional Reference Gram-
mar. Berkeley: University of California Press.
Lieberman, Philip 2002. On the nature and evolution of the neural bases of human language. Year-
book of Physical Anthropology 45: 36–63.
Lucy, John A. 1992a. Grammatical Categories and Cognition: A Case Study of the Linguistic Rel-
ativity Hypothesis. Cambridge: Cambridge University Press.
Lucy, John A. 1992b. Language Diversity and Thought: A Reformulation of the Linguistic Relativ-
ity Hypothesis. Cambridge: Cambridge University Press.
MacNeilage, Peter F. 2008. The Origin of Speech. Oxford: Oxford University Press.
McNeill, David 1992. Hand and Mind: What Gestures Reveal about Thought. Chicago: University
of Chicago Press.
McNeill, David this volume a. The co-evolution of gesture and speech, and its downstream conse-
quences. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and
Sedinha Teßendorf (eds.), Body-Language-Communication: An International Handbook on
Multimodality in Human Interaction. (Handbooks of Linguistics and Communication Science
38.1.) Berlin: De Gruyter Mouton.
McNeill, David this volume b. The growth point hypothesis of language and gesture as a dynamic
and integrated system. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David
McNeill and Sedinha Teßendorf (eds.), Body-Language-Communication: An International
Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics and Communi-
cation Science 38.1.) Berlin: De Gruyter Mouton.
McNeill, David and Susan D. Duncan 2000. Growth points in thinking for speaking. In: David
McNeill (ed.), Language and Gesture, 141–161. Cambridge: Cambridge University Press.
McNeill, David and Laura Pedelty 1995. Right brain and gesture. In: Karen Emmorey and Judy
Snitzer Reilly (eds.), Sign, Gesture, and Space, 63–85. Hillsdale, NJ: Erlbaum.
Parrill, Fey 2011. The relation between the encoding of motion event information and viewpoint
in English-accompanying gestures. Gesture 11: 61–80.
Pedelty, Laura L. 1987. Gesture in Aphasia. Ph.D. dissertation, University of Chicago.
Pinker, Steven 1989. Learnability and Cognition: The Acquisition of Argument Structure. Cam-
bridge, MA: Massachusetts Institute of Technology Press.
Slobin, Dan I. 1987. Thinking for speaking. In: Jon Aske, Natasha Beery, Laura Michaelis and
Hana Filip (eds.), Proceedings of the Thirteenth Annual Meeting of the Berkeley Linguistic
Society, 435–445. Berkeley, CA: Berkeley Linguistic Society.
Slobin, Dan I. 1996. From “thought and language” to “thinking for speaking.” In: John Joseph
Gumperz and Stephen C. Levinson (eds.), Rethinking Linguistic Relativity, 70–96. Cambridge:
Cambridge University Press.
Slobin, Dan I. 2004. The many ways to search for a frog: Linguistic typology and the expression of
motion events. In: Sven Strömqvist and Ludo Verhoeven (eds.), Relating Events in Narrative,
Volume 2: Typological and Contextual Perspectives, 219–257. Mahwah, NJ: Lawrence Erlbaum.
Slobin, Dan I. 2009. Review of M. Bowerman & O. Brown (eds). Crosslinguistic perspectives on
argument structure: Implications for learnability. Journal of Child Language 36: 697–704.
Spencer, Rebecca M.C., Howard N. Zelaznik, Jörn Diedrichsen and Richard B. Ivry 2003. Dis-
rupted timing of discontinuous but not continuous movements by cerebellar lesions. Science
300: 1437–1439.
Talmy, Leonard 1975. Syntax and semantics of motion. In: John P. Kimball (ed.), Syntax and
Semantics, Volume 4, 181–238. New York: Academic Press.
Talmy, Leonard 1985. Lexicalization patterns: Semantic structure in lexical forms. In: Timothy
Shopen (ed.), Language Typology and Syntactic Description, Volume III: Grammatical Cate-
gories and the Lexicon, 57–149. Cambridge: Cambridge University Press.
Talmy, Leonard 2000. Toward a Cognitive Semantics. Cambridge: Massachusetts Institute of Tech-
nology Press.
Vygotsky, Lev Semenovich 1987. Thought and Language. Edited and translated by Eugenia Hanf-
mann and Gertrude Vakar (revised and edited by Alex Kozulin). Cambridge: Massachusetts
Institute of Technology Press.
Werner, Heinz and Bernard Kaplan 1963. Symbol Formation. New York: John Wiley. [Reprinted
in 1984 by Erlbaum].
Whorf, Benjamin Lee 1956. Language, Thought, and Reality. Selected Writings of Benjamin Lee
Whorf. Edited by John B. Carroll. Cambridge: Massachusetts Institute of Technology Press.
Zaidel, Eran 1978. Concepts of cerebral dominance in the split brain. In: Pierre A. Buser and Arl-
ette Rougeul-Buser (eds.), Cerebral Correlates of Conscious Experience, 263–284. Amsterdam:
Elsevier.

David McNeill, Chicago, IL (USA)


3. Gestures and speech from a linguistic perspective: A new field and its history 55

3. Gestures and speech from a linguistic perspective: A new field and its history
1. Gestures as part of spoken language – a sketch of historical perspectives
2. Gestures as part of language – a sketch of the present state of art
3. Conclusion
4. References

Abstract
This chapter gives a brief overview of gesture research from a linguistic point of view.
It begins with a short overview of the history of research on gestures as part of spoken
language and an attempt to understand the longstanding lack of linguistic interest in
considering gestures a relevant topic – or a relevant feature of language.
It then shows that a new field of gesture research has emerged over the past decades,
which regards gesture and speech as inherently intertwined. We have attempted to system-
atize the findings regarding the nature of gestures and their relation to language in use
according to the four aspects currently most widely researched: 1) form and meaning of gestures, 2) gestures and their relation to utterance formation, 3) gestures, language, and cognition, and 4) gestures as a communicative resource in interaction and discourse. In doing so, an overview of the present state of the art of research on gesture as part of spoken language is presented. The chapter is complemented by a comprehensive bibliography of current research on gestures and speech from a linguistic perspective.

1. Gestures as part of spoken language – a sketch of historical perspectives
Regarding gestures as part of spoken language, or even as a language in itself, reaches far back into the Western history of thought. In the tradition of Rhetoric, gestures were considered a major part of the actio, the delivery. At the height of classical Rhetoric, Quintilian developed a detailed understanding of the communicative functions of co-verbal gestures. Quintilian stands out among other rhetoricians by considering the actio, the bodily performance accompanying a speech, a major aspect of an orator's performance on stage. In his treatise on oratory, the Institutionis oratoriae (Quintilian, Institutionis oratoriae XI 3, 92–106), he distinguished gestures relating to parts
of speech (beginning, narration, debate, accusation, conviction), gestures expressing
speech-acts (accusing, denouncing, promising, advising, praising, affirming, question-
ing), gestures expressing affective stance and emotions (certainty, sharpness of accusa-
tion, emphasis, affirmation, modesty, anxiety, admiration, indignation, fear, remorse,
rage, refusal) and gestures which relate to the structure of speech itself (presenting,
structuring, and emphasizing the speech, enumerating evidences, and discriminating dif-
ferent aspects mentioned verbally) (for further detail see Müller 1998: 33–43; Dutsch
this volume; and Graf 1994). Barnett characterizes Quintilian’s account of gestures as
an art in itself and as “a part of a double language system consisting of a highly detailed
and sophisticated verbal language together with an equally expressive nonverbal
language consisting of gesture, postures and actions.” (Barnett 1990: 65)
Quintilian was convinced that gestures are a natural language of mankind, and he proposed that they have almost all the expressive qualities of words themselves: they may point, demand, promise, count, display the size and amount of concrete and abstract entities, and indicate time, and they may even be used as adverbs and pronouns (Quintilian: Institutionis oratoriae XI 3, 87) (see Müller 1998: 35 and Dutsch this volume).

As for the hands, without which all action would be crippled and enfeebled, it is scarcely
possible to describe the variety of their motions, since they are almost as expressive as
words. For other portions of the body merely help the speaker, whereas the hands may
almost be said to speak. Do we not use them to demand, promise, summon, dismiss,
threaten, supplicate, express aversion or fear, question or deny? Do we not employ
them to indicate joy, sorrow, hesitation, confession, penitence, measure, quantity, number
and time? Have they not power to excite and prohibit, to express approval, wonder or
shame? Do they not take the place of adverbs and pronouns when we point at places and things? In fact, though the peoples and nations of the earth speak a multitude of tongues, they share in common the universal language of the hands. The gestures of which I
have thus far spoken are such as naturally proceed from us simultaneously with our words
(Quintilian: Institutionis oratoriae XI 3, 85–88).

The idea of gestures as a universal language was present in the Renaissance (Bacon, Bulwer), played a prominent role in the philosophy of the Enlightenment (Condillac, Diderot), and was also discussed in Romanticism (Vico, Herder) (for more detail see Copple this volume; Müller 1998; Wollock 1997, 2002, this volume). Notably, this longstanding recognition of gestures' linguistic properties and their potential for language declined in the 19th and 20th centuries. Treatments like de Jorio's Mimica degli
Antichi investigata nel gestire napoletano in the early 19th century (see de Jorio
[1832] 2000 with an introduction by Adam Kendon) and Wundt’s work on gestures
and signs of Neapolitans, Plains Indians and Deaf people (Wundt 1921) did not inspire
scholarly reflection upon gesture as part of language (see also Kendon 2004: chapters 3 and 4).
Wollock (this volume) summarizes this development and its implications for contem-
porary reflections on gestures as follows:

Renaissance ideas on gesture foreshadow the 18th century, and to some extent even
Romanticism (see Vico, Herder). Important for us today is not so much the literal question
whether gesture is a universal language, as the fact that in this period gesture called atten-
tion to linguistic processes that are certainly universal – psychophysiological processes
common to verbal and nonverbal thought – but that were often overlooked, downplayed,
or even denied in 20th–century linguistics. (Wollock this volume)

Within 20th century linguistics gestures were not considered a relevant topic – or a
relevant feature of language. Under the auspices of Saussurean linguistics, hand-
movements that go along with speech were thrown into the wastebasket of parole or
of language use (Saussure, Bally, and Sechehaye 2001; Albrecht 2007). The idea of language as a social system (langue) underlying all forms of language use was critical in defining and establishing a scholarly discipline of linguistics in Europe. It had an immense
impact on the humanities in Europe. Structuralism became one of the most influential
schools of thought in the twentieth century.
Such a focus on language as a social system distanced the attention of linguists from
those phenomena that are characteristic of language use. This also holds for American
structuralism, which was strongly marked by the great challenge of documenting the
wealth of unwritten languages of Native America (see Bloomfield 1983; Z. Harris
1951). Notably, those were languages without a writing system: spoken, unwritten languages. Interestingly enough, American linguistic anthropology focused on the de-contextualized systematic features of these languages, not on their particular nature as spoken languages. However, given the goal of identifying the grammatical structures, or the linguistic system “behind” the spoken words, this made perfect sense – concepts of emergent grammar were not discussed at the time (Hopper 1998).
An exceptional study that did empirically investigate forms and functions of gestures
that are used in conjunction with speech comes out of American anthropology: the doctoral dissertation of David Efron ([1941] 1972), a student of Franz Boas. Carried out during the Second World War, it was conceived not as a contribution to linguistic questions but as an empirical study within the “nature-nurture” debate, seeking an empirical answer to the question: Is human behavior shaped by culture or by nature? To counter racist and eugenicist positions, David Efron went out to study the gestural behavior of traditional Eastern-Jewish and Southern Italian immigrants in New York City and compared their style of gesturing with that of second-generation immigrants. What he found in the second generation were hybrid gestural forms – gestures that blended “American” with “Italian” or “Jewish” forms of gesturing. This was taken as support for the nurture position, because it showed the influence of culture in shaping human communicative behavior. Efron's meticulous semiotic analysis of gestural forms was a prerequisite for defining and identifying these differences. But the study did not inspire scholars of language to look at gesture. His work and his classification system of gestures were later made public by the psychologist Paul Ekman and became a standard reference system in 20th-century research on bodily behavior generally and non-verbal communication research in particular (Ekman and Friesen 1969).
With Chomskyan linguistics taking the lead in the middle of the twentieth century, linguistics made a turn towards cognitive science: the universal cognitive competence of humans for acquiring language came to be the topic of linguistics proper. “Language performance”, and accordingly gesture, was not regarded as a relevant topic of inquiry within the field of linguistics so defined (Chomsky 1965; R. Harris 1995).
At roughly the same time, however, there were some singular attempts to analyze body movements from a structuralist point of view: Ray Birdwhistell (1970) – a linguistic anthropologist – put forward an account of facial expressions, postures, and hand movements for which he coined the term “kinesics”. He developed a structuralist framework for the description of body movements and proposed that units of gesture are structured very much like linguistic units:

The isolation of gestures and the attempt to understand them led to the most important
findings of kinesic research. This original study of gestures gave the first indication that ki-
nesic structure is parallel to language structure. By the study of gestures in context, it
became clear that the kinesic system has forms which are astonishingly like words in a
language. (Birdwhistell 1970: 80)

In the sixties the anthropologist and linguist Kenneth Lee Pike put forward a theoretical framework for language as part of human behavior (Pike 1967). Extending the phonology/phonetics, or phoneme/phone, distinction of structural linguistics to human behavior in general, he introduced the differentiation between emic and etic aspects of human behavior: emic aspects concern meaning, while etic aspects address its material characteristics. Pike even argued that
these behavioral units could form what nowadays would be termed a “multimodal syn-
tactic structure”, namely a sentence in which verbal and gestural forms are systemati-
cally integrated. (For a detailed account of Pike’s contribution to a multimodal
grammar see Fricke 2012, this volume).
A pioneer in researching bodily behavior with speech is Adam Kendon (with an education in biology and, in particular, ethology). In the sixties and seventies he researched
the patterns of interactive bodily behavior (Kendon, Harris, and Key 1975; Kendon
1990). His analysis of the behavioral units and sequencing of greetings (Kendon and
Ferber 1973) revealed that communicative bodily actions are highly structured, mean-
ingful and closely integrated with speech. In the early seventies Kendon provided the
first systematic micro-analysis of gestural and vocal units of expression. At that time
film recordings became available for scientific research, and the possibility of inspecting these sequences again and again made it possible to discover the fine-grained micro-structures of human bodily and verbal behavior. An important outcome of this development in technology was the first micro-analysis of speech and body motion. Kendon
showed that units of speech and units of body motion possess a similar hierarchical
structure: larger units of movements go along with larger units of speech and smaller
units of movement parallel smaller portions of speech (Kendon 1972). It was only about ten years later that he explicitly formulated the idea of gesture and language as two sides of one process of utterance (Kendon 1980; for his current view see Kendon 2004, this volume).
In the seventies and also for most part of the eighties linguistics continued to be
dominated by generative theory (then reformulated as Government and Binding
Theory; see Chomsky 1981). Psychology adopted the concept of non-verbal communi-
cation (Argyle 1975; Feldmann and Rimé 1991; Hinde 1972; Ruesch and Kees 1970;
Scherer and Ekman 1982; Watzlawick, Bavelas, and Jackson 1967) and gestures as
part of speech were regarded as only marginally relevant to such a field of research.
Instead, body movements not related to speech and with functions different from
language attracted most interest. One consequence of this was a marked increase in
research on facial expression (see Ekman and Rosenberg 1997 for an overview). Such a
scholarly climate made it difficult to pursue a linguistic perspective on gestures and lan-
guage throughout the eighties. However, there were other positions: David McNeill
(1979) – coming from psychology and linguistics – proposed a theory of language and
gesture in which both modalities form one integrated system. Already at that time
McNeill and Kendon concentrated on gestures as movements of the hands and on their
particular relationship to speech. In contrast to nonverbal communication scholars,
they were interested in the movements of the hands precisely because these exhibit a
particularly tight interrelatedness with language.
McNeill’s idea of gesture as being part of the verbal utterance challenged the distinc-
tion of verbal versus non-verbal behavior, which characterized the mainstream research
on nonverbal communication at the time. It even triggered a public debate carried out
in several articles in the journal Psychological Review. McNeill challenged the
psycholinguistic belief in gestures as part of the NON-verbal dimensions of communication
by raising the question: "So you think gestures are nonverbal?" (McNeill 1985). The
participants in this debate were Brian Butterworth, Pierre Feyereisen, and Uri Hadar on
the one hand and David McNeill on the other (Butterworth and Hadar 1989; Feyereisen
1987; McNeill 1985, 1987, 1989). While McNeill criticized the idea of gestures as
being non-verbal, Butterworth and Hadar presented psycholinguistic evidence for their
assumption of gestures as something different from speech (see Hadar this volume).
In 1992 McNeill published his integrated theory of gestures and speech in what
became a landmark book for a psychological and linguistic approach to gesture and
speech, "Hand and Mind: What Gestures Reveal about Thought" (McNeill 1992). In
this book David McNeill develops his theory of language and gesture. He proposes
that gesture and speech are different but integrated facets of language: gesture as
imagistic, holistic, and synthetic; language as arbitrary, analytic, and linear. These two
sides of language rest on two different modes of thought, one imagistic, the other
propositional, and McNeill considers the dialectic tension between the two modes of thought
to propel thought and communication (for more detail, see McNeill this volume).
With its core idea of gestures as “‘window’ onto thinking” (McNeill and Duncan
2000), as revelatory of imagistic forms of thought, the book matched a turn towards
cognitive science in the humanities and raised a great deal of interest in psychological
research on language and cognition (for an overview see McNeill 2000). For linguistics
proper, however, gesture remained at the margins of interest, if it attracted any interest at all.
This holds even for cognitive linguistics [including cognitive grammar (Langacker
1987); metaphor theory (Ortony 1993); or blending theory (Fauconnier and Turner
2002)], which developed a counter position to the modularism of generativism (includ-
ing its further developments as Government and Binding Theory and the minimalist
program, see Chomsky 1981, 1992). Cognitive linguistics argues that language rests
on general cognitive principles and capacities, challenging the generativist position
that linguistic competence is a particular and cognitively distinct module. A cognitive lin-
guistic position quite naturally opens up the gate for a concept of language which is
not restricted to the oral mode alone and which allows for an integration of different
modalities within one process of utterance (see Cienki 2010, 2012). Despite this
theoretical pathway, cognitive linguists have for the most part relied on the analysis of
invented sentences rather than on data from language use, which might otherwise have
drawn their attention to the work gestures do in conjunction with speech.
But over the past two decades the situation has changed, and we find a growing
number of publications within cognitive linguistics that do consider gestures as part
of linguistic analysis (examples are Cienki 1998a, 1998b; Cienki and Müller 2008b;
McNeill and Duncan 2000; Mittelberg 2006, 2010a, 2010b; Müller 1998; Müller and
Tag 2010; Sweetser 1998; Núñez and Sweetser 2006). Moreover, outside of cognitive
linguistics, too, a growing number of publications with a linguistic perspective on
gestures – or at least a perspective that is compatible with a linguistic analysis of
gestures – have appeared. In 2004 Kendon's monograph "Gesture: Visible Action
as Utterance" was published, presenting an encompassing account of the manifold ways in
which gestures can become part of verbal utterances, including a detailed historical
section on gesture classifications as well as a discussion of what makes a movement of
the hands a gesture. Other books relevant to a linguistic perspective on gestures
include: Calbris’ “Elements of meaning in gesture” (2011), Enfield’s “The anatomy
of meaning: speech, gesture, and composite utterances” (2009), Fricke’s “Origo,
Geste und Raum: Lokaldeixis im Deutschen” (2007) and “Grammatik multimodal”
(2012), McNeill’s edited volume “Language and Gesture” (2000) and his book on “Ges-
ture and Thought” (2005), Müller’s “Redebegleitende Gesten: Kulturgeschichte –
Theorie – Sprachvergleich” (1998), Müller and Posner’s edited volume “The semantics
and pragmatics of everyday gestures” (2004), and Streeck’s “Gesturecraft: The manu-
facture of meaning” (2009).
Since the turn of the century, more and more scholars have begun to look at gestures
from a linguistic perspective, focusing on a range of different aspects.
In the following sections we will present and discuss the present state of the art of a
linguistic view on gestures in more detail. We will concentrate on four main areas of
research: form and meaning of gestures; gestures and their relation to utterance
formation; gestures, language, and cognition; and gestures as a dynamic communicative
resource in discourse.

2. Gestures as part of language – a sketch of the present state of the art
2.1. Form and meaning of gestures
“If we explain the meaning of a gesture we explain the form” (McNeill 1992: 23) is how
McNeill sums up his account of the distinct nature of meaning in gestures. No “duality
of patterning” (Hockett 1958) or “double articulation” (Martinet 1960/1963), “no stan-
dards of form”, no two different systems on the level of form and meaning as in lan-
guage, where phonemes distinguish meaning and morphemes carry meaning (McNeill
1992: 22; Saussure, Bally, and Sechehaye 2001). In language, two distinct systems are
matched onto each other by convention and the relation between form (sound) and
meaning is characterized by an arbitrary mapping (Saussure, Bally, and Sechehaye
2001). In gestures, on the contrary, the meaning resides in the form: “Kinesic form is
not an independent level as sound is an independent level of language. Kinesic form
in a gesture is determined by its meaning.” (McNeill 1992: 23) Gestures are considered
fundamentally different: they are conceived of as motivated signs, created on the spot,
that convey meaning in a global-synthetic way. While in language “parts (the words) are
combined to create a whole (a sentence); the direction [being] from part to whole”, in
gestures the meaning of the parts is determined by the whole, the gestalt of the form(s)
(in this sense they are considered “global”) (McNeill 1992: 19). Gestures convey mean-
ing in a synthetic way: “one gesture can combine many meanings (it is synthetic)”
(McNeill 1992: 19). McNeill illustrates his view on a global-synthetic nature of meaning
in gestures with an example of a gesture in which wiggling fingers depict a character
running along a wire:

This gesture-symbol is global in that the whole is not composed out of separately meaning-
ful parts. Rather, the parts gain meaning because of the meaning of the whole. The
wiggling fingers mean running only because we know that the gesture, as a whole, depicts
someone running. It’s not that a gesture depicting someone running was composed out of
separately meaningful parts: wiggling + motion, for instance. The gesture also is synthetic.
It combines different meaning elements. The segments of the utterance, “he + running +
along the wire,” were combined in the gesture into a single depiction of Sylvester-running-
along-the-wire. (McNeill 1992: 20–21)

The way in which gestures convey meaning across sequences of gestures is furthermore
characterized as “non-combinatoric” and “non-hierarchical”: “two gestures pro-
duced together don’t combine to form a larger, more complex gesture. There is no hier-
archical structure of gestures made out of other gestures” (McNeill 1992: 21) and even
if several gestures are combined this does not result in a more complex gesture: “Even
[…] several gestures don’t combine into a more complex gesture. Each gesture depicts
the content from a different angle, bringing out a different aspect or temporal phase,
and each is a complete expression of meaning by itself.” (McNeill 1992: 21)
For McNeill’s theory of language and gesture, this sharp distinction between the ways
in which meaning is “carried” in language and how it is “conveyed” in gesture is of core
importance. McNeill uses a structuralist account of language as a system of arbitrary
signs as a contrastive frame that brings out the particular articulatory properties of ges-
tures and that maximizes the differences between the two modes of expression. This
sharp distinction is a prerequisite for his theory of language, gesture and thought, in
which gestures are considered to reveal a fundamentally different type of thought,
one that is imagistic, global-synthetic, and holistic, whereas language forces thought
into the linearity of speech-sounds and the arbitrariness of linguistic signs. It is also con-
stitutive for his understanding of thinking and speaking as a dynamic process propelled
by an imagery-language dialectic, whose basic unit is the so-called “Growth-Point” (see
McNeill 1992: 219–239 and 2005: 92–97; McNeill and Duncan 2000): “It is this unstable
combination of opposites that fuels thought and speech.” (McNeill 2005: 92) Notably, in
his 2005 book McNeill brings in a phenomenological turn, now taking a non-dualistic
point of view with regard to the relation of gesture and mind. Rather than assuming
that gestures reveal inner images, he now argues with reference to the work of Merleau-
Ponty (1962) that gestures do not represent meaning but “inhabit” it (McNeill 2005:
91–92). Drawing on Heidegger, he proposes the H-Model of gestures, suggesting that:
“To the speaker, gesture and speech are not only ‘messages’ or communications, but are
a way of cognitively existing, of cognitively being, at the moment of speaking.” (McNeill
2005: 99) This new concept of gestural meaning and its relation to form is brought
together in the concept of gestures as “material carriers” (inspired by Vygotsky 1986),
and advances a phenomenological understanding of the meaning of gestures:

A material carrier […] is the embodiment of meaning in a concrete enactment or material
experience. […] The concept implies that the gesture, the actual motion of the gesture itself,
is a dimension of meaning. Such is possible if the gesture is the very image; not an “expres-
sion” or “representation” of it, but is it. (McNeill 2005: 98, highlighting in the original)

McNeill’s (1992) book set the stage for a view of the meaning of gestures as holistic
and “global-synthetic”, and it inspired a wealth of research in the domain of language and
thought (see McNeill 2000; Parrill 2008; Parrill and Sweetser 2002, 2004; for a discussion
see Kendon 2008).
In his 2009 book “Gesturecraft: The manu-facture of meaning”, Jürgen Streeck
proposes a praxeological account of the meaning of gestures, which is likewise strongly
informed by phenomenology. However, Streeck focuses on the situatedness of meaning making
in mundane practices of the world:

The point of departure for the research reported in this book, thus, are human beings in
their daily activities. The perspective on gesture is informed by the work of phenomenolo-
gical philosophers (Heidegger 1962; Merleau-Ponty 1962; Polanyi 1958) who have argued
that we must understand human understanding by finding it, in the first place, in concrete
practical, physical activity in the world, as well as by more recent work in anthropology
[…], philosophy and linguistics […], educational psychology […], and sociology, which is
defined by the view that the human mind – and the symbols that it relies upon – are
embodied. (Streeck 2009: 6, highlighting in the original)

Consequently, Streeck conceives of the meaning of gestures in terms of particular
situational settings, e.g. “gesture ecologies” such as: making sense of the world at hand, dis-
closing the world within sight, depiction, thinking by hand, displaying communicative
action, ordering and mediating transactions (see Streeck 2009: 7–11). However, he
takes the form of gestures – understood in terms of their physiological properties and as
an instrument, a technique for dealing with the world at hand – as a point of departure
for an analysis of gestural meaning within those different ecologies (Streeck 2009: chapter 3).
In this regard the form and meaning of gestures are derived from practical actions of
the hands and their mode of being in a particular ecological context (see Streeck this
volume).
While Kendon (2004) also regards gestures as forms of action, notably visible actions,
he suggests that gestures must be considered in the Goffmanian sense as moves in an inter-
action (Kendon 2004: 7). The meaning of gestures in general he takes to be an
achievement of the speaker, resulting from a rhetorical goal that motivates the meaning
of both speech and gestures and drives the construction of “gesture-speech ensembles”:

We suggest, […], that the conjunction of the stroke with the informational centre of the
spoken phrase is something the speaker achieves. In creating an utterance that uses
both modes of expression, the speaker creates an ensemble in which gesture and speech
are employed together as partners in a single rhetorical enterprise. (Kendon 2004: 127)

Kendon underlines the flexible ongoing adjustment of the two modes of expression in
this intertwined process of verbo-gestural meaning construction in an ongoing conver-
sation (Kendon 2004: 127).
Enfield develops a concept of the meaning of gestures which takes the perspective
of the interpreter as its point of departure (adopting a Peircean approach in this regard).
In his book “The anatomy of meaning: speech, gesture, and composite utterances”
(Enfield 2009: IX) he brings together semiotic (Peirce), pragmatic (Grice, Levinson), and
interactive (Goffman, Sacks) approaches to the meaning of gestures and language as used
in interaction. However, Enfield is also in line with Kendon’s and Streeck’s take on the
meaning of gestures as situated interactional moves by conceiving of gestures as elements
of composite utterances. His proposal opens up further important facets of Kendon’s gesture-
speech ensembles: “composite utterances [are defined] as a communicative move that
incorporates multiple signs of multiple types.” (Enfield 2009: 15) The meaning of such
a composite utterance then is a matter of interpreting the co-occurring signs based on a
pragmatic heuristic: “Composite utterances are interpreted through the recognition
and bringing together of these multiple signs under a pragmatic unity heuristic or
co-relevance principle, i.e. an interpreter’s steadfast presumption of pragmatic unity
despite semiotic complexity.” (Enfield 2009: 15, see Enfield this volume)
In parallel to those holistic and global approaches to the meaning of gestures a strand
of gesture research has developed which suggests that gestural meaning is to a certain
degree decomposable into form features (Calbris 1990, 2003, 2008, 2011, this volume;
Kendon 2004; Mittelberg 2006, 2010a; Müller 2004, 2010b; Webb 1996, 1998, inter alia).
However, what appear to be opposed views at first sight turn out to be assumptions
addressing different types of gestures: while McNeill’s characterization of gestures as
being global-synthetic and holistic applies to spontaneously created singular gestures,
the proposal that gestures are decomposable into form features as originally proposed
by Calbris (1990, this volume) applies to gestures that are either fully conventionalized
(e.g. emblematic gestures) or to gestures which are in a process of conventionalization
(e.g. recurrent gestures, for the distinction of singular versus recurrent gestures see
Müller 2010b, submitted; Müller, Bressem, and Ladewig this volume).
For instance, Kendon’s concept of a gesture family, which consists of a group of ges-
tures sharing a common formational core and semantic theme, is based on the idea of a
core set of form features that make up the core meaning of a gesture. In the
gesture families of the open hand, for example, the critical formational feature is the shared hand shape.
“In both of these families the hand shape is ‘open.’ That is, the hand is held with all di-
gits extended and more or less adducted (they are not ‘spread’).” (Kendon 2004: 248)
Gestures belonging to the Open Hand Prone family share the formational core of the
forearm being directed downwards. Within the Open Hand Supine family, in contrast,
the forearm is always directed upwards. Differences within one family are marked with
regard to the other formational features: movement and orientation of the palm. The
Open Hand Prone family, for instance, shows two form variants: In Vertical Palm
(VP) gestures the palm of the hand is directed away from the speaker’s body and
they indicate “an intention to halt […] a current line of action” (Kendon 2004: 251).
In Horizontal Palm (ZP) gestures, the palm faces downward and “obliquely away” (Kendon 2004:
251). Furthermore, they are always moved in a horizontal lateral movement. Those
gestures were observed in contexts that involved “a reference to some line of action
that is being suspended, interrupted or cut off.” (Kendon 2004: 255)
Such “simultaneous structures of gestures” (Müller 2010b, submitted; Müller, Bres-
sem, and Ladewig this volume; Müller et al. 2005) can be described systematically by
adapting the four sign language parameters of “hand shape”, “orientation” of the
palm, “movement”, and “position” in gesture space (Battison 1974; Stokoe 1960; see
also Bressem this volume; Bressem, Ladewig, and Müller this volume). The idea of de-
composing gestures into their meaningful segments was advanced especially by
studies on gestures that pair a recurrent gestural form with a
particular semantic theme (Calbris 2003, 2008; Fricke 2010; Harrison 2009, 2010; Lade-
wig 2010, 2011; Müller 2004, 2010b; Kendon 2004; Bressem, Müller, and Fricke in prep-
aration; Teßendorf 2008). Distribution analyses across different contexts of use revealed
that such “recurrent gestures”, as they were termed (Ladewig 2007, 2010, 2011; Müller
2010b), often show a variation of meaning that becomes manifest in a correlation
between form and context of use. This means that speakers draw upon a repertoire
of gestural forms, which they use recurrently. As McNeill suggests for the Palm Up
Open Hand (2005: 48–53), recurrent gestures are in a process of becoming conventionalized:
their form-meaning relation is motivated and the motivation is still transparent,
but given their recurrent usage in a limited set of contexts, they appear best placed
somewhere in the middle of a continuum between spontaneously created gestures on
the one hand and fully conventionalized emblems on the other (for different
gesture continua see McNeill 2000: 2–7).
Based on a discussion of recurrent gestural forms and recurrent gestural meanings,
studies have furthermore documented that gestures can become semanticized as well
as grammaticalized. Hence culture-specific gestures can be deployed as lexical or gram-
matical elements in co-occurrence with speech or they may even enter sign linguistic
systems as lexemes or grammatical morphemes. Accordingly, gestures were, for
instance, identified as markers of negation (Harrison 2009, 2010; Müller and Speck-
mann 2002; Bressem, Müller, and Fricke in preparation), of Aktionsart (Becker et al.
2011; Bressem 2012; Ladewig and Bressem forthcoming; Müller 2000), of topic-comment
structure (Kendon 1995; Seyfeddinipur 2004), or as plural markers (Bressem
2012). Furthermore pathways of grammaticalization from gesture to sign have been
traced, for instance, for classifier constructions (Pfau and Steinbach 2006; Müller
2009), tense and modality (Janzen and Shaffer 2002; Wilcox 2004; Wilcox and Rossini
2010; Wilcox and Wilcox 1995), topic marking (Janzen and Shaffer 2002) or for the
development of pronouns and auxiliaries (Pfau and Steinbach 2006, 2011).

2.2. Gestures and utterance formation


The interplay of gestures and speech in forming utterances has been a subject of
investigation in gesture studies from early on. Researched facets of this interplay include:
the correlation of bodily movements with patterns of the speech stream, the syntactic integration
of gestures into spoken utterances, and the distribution of semantic information over
both modalities.
Based on observations that the speaker’s body dances “synchronously with the ar-
ticulatory segmentation of his speech” (Condon and Ogston 1967: 234), gesture scholars
studied in detail the temporal alignment of gestural and spoken units. Amongst them
are studies on parallel hierarchical structures in spoken discourse and in accompanying
body movements (e.g., Kendon 1972, 1980) or the temporal alignment between units of
body movements and units of speech (Condon and Ogston 1966, 1967; Kendon 1972,
1980; McNeill et al. 2002). In particular the correlation of kinesic and prosodic stress
(e.g., Birdwhistell 1970; Loehr 2004, 2007; McClave 1991, 1994; Scheflen 1973; Tuite
1993) as well as of intonation and movements of particular body parts (Birdwhistell
1970; Scheflen 1973; Bolinger 1983; McClave 2000) was examined. Evidence for the
tight link of both modalities has also been gained by showing that the whole
“gesture-speech ensemble” may be modified in order to meet the necessities of articu-
lating both modalities at the same time. Both speech and gesture may be repeated,
revised, or adapted to meet the structure of the other modality (Kendon 1983, 2004; see
also Seyfeddinipur 2006). The tight interrelation of both modalities, gesture and speech,
finds its expression in the often quoted remark that “speech and movement appear
together as manifestations of the same process of utterance” (Kendon 1980: 208) and
“arise from a single process of utterance formation” (McNeill 1992: 30).
Several studies have also shown that gestures and speech are intertwined on the level
of syntax and semantics, each providing necessary information to the formation of an
utterance. Gestures are obligatory elements for the use of particular verbal deictic ex-
pressions such as so, here or there (e.g., de Ruiter 2000; Fricke 2007; Kita 2003; Streeck
2002; Stukenbrock 2008; inter alia), and the gestural form may even differ depending
on the intended reference object of the deictic expression (e.g., Fricke 2007; Kendon
2004). Gestures also stand in close relation to aspects of verbal negation (e.g., Bressem,
Müller, and Fricke in preparation; Calbris 1990, 2003, 2008; Harrison 2009; Kendon
2003, 2004; Streeck 2009; inter alia)
and may go along with different types of negation, such as negative particles, morpho-
logical negation, implicit negation as well as the grammatical aspects of scope and node
of negation (e.g., Harrison 2009, 2010; Lapaire 2006). Furthermore, gestures seem to be
closely related to grammatical categories of the verbal utterance, so that iconic ges-
tures, for instance, often correlate with nouns, verbs, and adjectives (e.g., Hadar and
Krauss 1999; Sowa 2005; Bergmann, Aksu, and Kopp 2011).
Various scholars have furthermore argued that gestures can be integrated into the
syntactic structure of an utterance (e.g., Andrén 2010; Bohle 2007; Clark 1996; Clark
and Gerrig 1990; Goodwin 1986, 2007; Enfield 2009; Langacker 2008; McNeill 2005,
2007; Müller and Tag 2010; Slama-Cazacu 1976; Streeck 1988, 1993, 2002, 2009; Wilcox
2002). Recent empirical studies have expanded those characterizations and suggest that
gestures can take over syntactic functions either by accompanying or by substituting
speech. Fricke (2012), for instance, distinguishes two forms of integrability: gestures
may be integrated by positioning, that is, either through occupying a syntactic gap or
through temporal overlap; or they may be integrated cataphorically, that is, by using
the deictic expressions son or solch (‘such a’). These deictic expressions demand “a quali-
tative description that can be instantiated gesturally” (Fricke 2012, our translation). In
doing so gestures expand a verbal noun phrase and serve as an attribute. This phenom-
enon, also referred to as “multimodal attribution”, furnishes evidence for the structural
and functional integration of gestures into spoken language and laid the ground for
developing the framework of a “multimodal grammar” (Fricke 2012).
Bressem (2012) and Ladewig (2012) expanded the notion of a multimodal grammar
by showing that gestures either accompany or substitute nouns and verbs of spoken
utterances. Bressem could show that gestural repetitions can serve an attributive function
when they co-occur with noun phrases and an adverbial function in cases of
temporal overlap with verb phrases. The potential of gestures to take over syntactic
functions by specifying the shape and size of objects or depicting the manner of the
action is in those cases not bound to a cataphoric integration of the gestures into the
verbal utterance by explicit linguistic devices (see Fricke 2012), but rather seems to be based
on the temporal, semantic, and syntactic overlap of the gestures with speech. In her
study on gestures in syntactic gaps exposed by interrupted utterances, Ladewig could
show that gestures do not adopt all kinds of syntactic functions when substituting
speech, as had been assumed by some authors (e.g., Slama-Cazacu 1976). Rather,
when replacing speech, gestures preferably fulfill the function of objects and predicates.
Based on her study she argued for a “continuum of integrability” (Ladewig 2012: 183)
in which the link between gesture and speech can be conceived of as varyingly strong
depending on three aspects: the type of integration, the distribution of information
over the different modalities, and the order in which speech and gesture are deployed.
In her study, she furthermore found that referential gestures (e.g., gestures referring to
concrete or abstract actions, entities, events, properties) are the most frequently used
type of gesture with a substitutive function. This finding questions the widely accepted
assumption that the kinds of gestures typically used to replace speech are emblematic
or pantomimic gestures.
Gestures not only take over syntactic functions when forming multimodal utterances,
but they also contribute to the semantics of an utterance. Gestures may replace infor-
mation, illustrate and emphasize what is being uttered verbally, soften or slightly modify
the meaning expressed in speech or even create a discrepancy between the gestural and
verbal meaning (see Bavelas, Kenwood, and Phillips 2002; Bergmann, Aksu, and Kopp
2011; Bressem 2012; Calbris 1990; Engle 2000; Freedman 1977; Fricke 2012; Gut et al.
2002; Kendon 1987, 1998, 2004; Ladewig 2011, 2012; McNeill 1992, 2005; Scherer 1979;
Slama-Cazacu 1976). In exploring the semantic relation of gesture and speech, linguistic
studies have offered different approaches. Semantic information conveyed in both mod-
alities can be described in terms of image schematic structures (e.g., Cienki 1998b, 2005;
Ladewig 2010, 2011, 2012; Mittelberg 2006, 2010a; Williams 2008) or in terms of seman-
tic features (e.g., Beattie and Shovelton 1999, 2007; Bergmann, Aksu, and Kopp 2011;
Bressem 2012; Kopp, Bergmann, and Wachsmuth 2008; Ladewig 2012). Together with
the temporal position of gesture and speech the semantic relation of both modalities
as well as the semantic function of gestures can be captured: If gestures double the
information expressed in speech, their relation can be described as co-expressive
(McNeill 1992, 2005) or redundant (e.g., Gut et al. 2002). If they add information to
that expressed in speech, the relation between both can be described as “complemen-
tary” or “supplementary”. In these cases gestures modify information expressed in
speech (e.g., Andrén 2010; Birdwhistell 1970; Bergmann, Aksu, and Kopp 2011; Bres-
sem 2012; Kendon 1986, 2004; Freedman 1977; Fricke 2012; Scherer 1979). Thereby
both modalities are considered as being “enriched” by their co-occurrence and by
the context in which they are embedded (Enfield 2009, this volume; see also Bressem
2012; Ladewig 2012). At the same time, the range of a gesture’s possible meanings is
reduced as the spoken modality provides necessary information to single out a refer-
ence object (Ladewig 2012). In doing so gestures “are not limited to primarily depicting
specific situations or individuals” but “can be used to depict types or kinds of things,
like prototypes” (Engle 2000: 39). Gestures may single out exemplar interpretations
in speech by picking out a specific individual from a collection mentioned in speech
(Engle 2000) and thus refer to a meaning or concept associated with a word, that is a
prototype, or to an intended object of reference. By being interpretant- or object-related
(Fricke 2007, 2012, based on Peirce 1931), gestures are not always and only
tied to the representation of referents in the real world, but are also capable of
seemingly contradicting the intended object of reference (see Fricke 2012).
Gestures that replace speech fulfill a substitutive function and can form an utterance
on their own or provide the semantic center of a multimodal utterance (e.g., Bohle
2007; Clark 1996; Clark and Gerrig 1990; McNeill 2005, 2007; Müller and Tag 2010;
Slama-Cazacu 1976; Wilcox 2002). The unit created by both modalities has been
referred to as “gesture-speech ensemble” (Kendon 2004), “multimodal utterance”
(Goodwin 2006; Wachsmuth 1999), “composite utterance” (Enfield 2009, this volume),
“composite signal” (Clark 1996; Clark and Gerrig 1990; Engle 1998), “multimodal
package” (Streeck this volume), and as “hybrid utterance” (Goodwin 2007).
2.3. Gestures, language and cognition


The cognitive foundation of gestures has attracted growing interest in the field of gesture
studies since McNeill proposed that gestures offer a “window onto thinking” (McNeill and
Duncan 2000: 143; see also McNeill 1992) (for an overview see Cienki 2010). McNeill’s
1992 book “Hand and Mind: What Gestures Reveal about Thought” had a major impact
on gesture research from a psychological perspective. It was also of paramount importance
for raising the interest of cognitive linguists and metaphor scholars (for an overview
see Cienki this volume). Notably, McNeill distinguishes in this book iconic from
metaphoric gestures – opening up a path to gesture for scholars interested in metaphor
research in the early 1990s. Cienki (1998a) and Sweetser (1998) conducted the first
studies bringing the analysis of metaphoric gestures into cognitive linguistics. Many
more studies followed which addressed co-verbal gestures’ relation to human cognition
and have since taken gesture as evidence for the cognitive linguistic claim that metaphor
and metonymy should be regarded as general cognitive processes or, to put it in
Raymond Gibbs’s words, that the mind makes use of poetic processes (see Gibbs 1994).

[…] the traditional view of mind is mistaken, because human cognition is fundamentally
shaped by various poetic or figurative processes. Metaphor, metonymy, irony, and other
tropes are not linguistic distortions of literal mental thought but constitute basic schemes
by which people conceptualize their experience and the external world. (Gibbs 1994: 3)

Research on gestures in relation to verbal metaphoric expressions has made even more
specific claims by proposing that gestures used in conjunction with speech may form
multimodal metaphors. In such cases, the gestural part of the metaphor very frequently
embodies the experiential source domain of the verbalized metaphoric expression (e.g.,
Calbris 1990, 2003, 2011; Cienki 1998a, 2008; Cienki and Müller 2008b; Kappelhoff and
Müller 2011; McNeill 1992; McNeill and Levy 1982; Müller 1998, 2004, 2008a, 2008b;
Müller and Cienki 2009; Núñez and Sweetser 2006; Sweetser 1998; Webb 1998; a
state of the art is given in Cienki and Müller 2008a). What is striking about the studies
on metaphor, gesture, and speech is the variable relation of the two modalities in
expressing metaphoricity: metaphoricity can be expressed either monomodally, that is,
in speech or gesture, or multimodally, that is, in both speech and gesture (Cienki 2008;
Cienki and Müller 2008b; Müller and Cienki 2009). The observed distribution of
metaphoric meaning across the different modalities led to an enhanced understanding of the
“modality-independent nature of metaphoricity” (Müller and Cienki 2009: 321; see also
Müller 2008a, 2008b). While it has been suggested that metaphors are clearly delimited,
countable units – apt for statistical analysis in corpus linguistics – studying metaphors in
the context of multimodal discourse, that is, as a phenomenon of use, revealed that
metaphors are very often not bound to single lexical items but rather evolve over
time. This points to an understanding of metaphoricity as a process, not a product – a
process which can evolve dynamically over time in an interaction, through speech, gesture,
and other modalities (Kappelhoff and Müller 2011; Kolter et al. 2012). In such
multimodally orchestrated interactions, metaphoric gestures in conjunction with speech
are used as a foregrounding strategy which activates the metaphoricity of sleeping
metaphors (i.e., so-called “dead” metaphors; see Müller 2008a, 2008b and Müller and Tag
2010 for an extended version of this argument).
68 I. How the body relates to language and communication

The notion of conceptual metonymy (e.g., Gibbs 1994; Lakoff and Johnson
1980) has also recently received increasing attention in the field of gesture studies (Ishino
2001; Mittelberg 2006, 2008, 2010a, 2010b; Müller 1998, 2004, 2009). It is assumed to
play a major role in gestural sign formation. Mittelberg (2006, 2008, 2010b), for
instance, proposes that observers of gestures follow a metonymic path from a gesture
to infer a conceived object. She suggests “that accounting for metonymy in gesture
may illuminate links between habitual bodily acts, the abstractive power of the mind,
and interpretative/inferential processes” (Mittelberg 2006: 292–293). By introducing
the concepts of “internal and external metonymy” (Jakobson and Pomorska 1983;
Mittelberg 2006, 2010b), different processes of abstraction involved in the creation and
interpretation of gestures are disentangled. Accordingly, in the case of “internal
metonymy” an observer of a gesture can infer a whole action or an object of which salient
aspects are depicted gesturally. In the case of “external metonymy”, objects that are
manipulated by the hands can be inferred via a contiguity relation between the object
and the hand.
Processes of abstraction in gestures are pertinent to the motivation of the form of
gestures, and they contribute significantly to the meaning of gestures. They concern
the level of pre-conceptual structures such as image schemas (see above), action
schemas (e.g., Bressem, Müller and Fricke in preparation; Calbris 2011, this volume;
Mittelberg 2008; Teßendorf 2008; Streeck 2008, 2009), mimetic schemas (Zlatev 2002), and
motor patterns (Mittelberg 2006; Ladewig and Teßendorf 2008, in preparation).
Conceptual blending must be regarded as a higher cognitive process, since it concerns
the construction of complex forms of meaning in gestures (Parrill and Sweetser 2004;
Sweetser and Parrill volume 2) as well as in signs (Liddell 1998), and in the interactive
construction of multi-layered blends, as, for instance, in a school teacher’s explanation
of how a clock symbolizes time (Williams 2008).
Gestures in language use have also been subject to analyses addressing issues of
cognitive grammar more specifically: Bressem (2012) on repetitions in gestures, Harrison
(2009) on gestures and negation, Ladewig (2012) on the semantic and syntactic
integration of gestures, and Wilcox (2004) on the grammaticalization of gestures into signs
of signed languages. Núñez (2008) and Streeck (2009) have pointed out that gestures
embody what Leonard Talmy terms “fictive motion” (Talmy 1983), thus showing
that abstract concepts lexicalized, for instance, as motion verbs (e.g., the road runs
along the river) are conceived of as actual body motion. Mittelberg (2008) and Bressem
(2012) both found that gestures appear to play a vital role in the establishment of
so-called “reference points” (e.g., Langacker 1993). A reference point is a cognitively
salient item that provides mental contact with a less salient target. A gestural form
serves as a reference point by providing cognitive access to a concrete or abstract
object. In so doing, gestures may guide the hearer’s attention to particular aspects of
a conversation. In reference point relations, gestures may “serve as an index” providing
cognitive access to a construed object (Mittelberg 2008: 129).
Broadening the scope from the meaning and functions of single units to cognition
and gesture in language use, Harrison (2009), Andrén (2010), and Bressem (2012)
have proposed to conceive of this interplay in terms of multimodal or embodied constructions.
Müller (2008a, 2008b), Müller and Tag (2010), and Ladewig (2012) have suggested
that gestures display the flow of attention, especially with regard to the foregrounding
and activation of metaphoricity (for an extension to audio-visual multimodal metaphor
see Kappelhoff and Müller 2011). Furthermore, Bressem (2012) has suggested that
repetitions in gesture follow the attentional flow.
A further aspect, which has attracted considerable interest in gesture research
over the past years, concerns the gestural representation of motion events and its relation
to grammatical aspects of the verbal utterance and to the information distributed across
the modalities (e.g., Duncan 2005; Gullberg 2011; Kita 2000; Kita and Özyürek 2002;
McNeill 2000; McNeill and Duncan 2000; McNeill and Levy 1982; Müller 1998; Parrill
2008, inter alia). Numerous studies have shown that gestural representations of the same
motion event may differ across languages, depending on whether the languages are
verb- or satellite-framed. Whereas speakers of English, for instance, might express
the notion of a ball rolling down a hill in one clause and one gesture, which represents
the motion and the direction at the same time, Japanese or Turkish speakers express the
same notion in two verbal clauses accompanied by two distinct gestures, one expressing
the motion and the other the direction or manner of motion (Kita and Özyürek 2002;
Kita et al. 2007). Thus, if meaning is distributed over two spoken clauses, the same
meaning is likely to be expressed in two gestures, each expressing a meaning similar to
that of the spoken clause (Kita et al. 2007).

Therefore, gestures reflect information considered relevant for expression (what to say)
as well as its linguistic encoding (how to say it), with cross-linguistic consequences.
Gestures thus reflect linguistic conceptualization and cross-linguistic differences in
such conceptualizations. (Gullberg 2011: 148)
To sum up: bringing together gesture studies and cognitive perspectives on language
and language use contributes to the discussion of “embodied cognition”, underlining
that cognitive processes and conceptual knowledge are deeply rooted in the body’s
interactions with the world.

2.4. Gesture as a dynamic communicative resource in the process of meaning construction in discourse

Not only does a multimodal perspective on interaction (see Enfield; Gerwing and
Bavelas; Hougaard and Rasmussen; Kidwell; Mondada; Streeck, all this volume) have
implications for how concepts such as utterances (see above), metaphor (see above),
conversational pauses (e.g., Bohle 2007; Esposito and Marinaro 2007; Ladewig 2012; Müller and
Paul 1999), or turns (e.g., Bohle 2007; Mondada 2007; Schmitt 2005; Streeck and Hartge
1992) need to be conceived of, but it also has fundamental consequences for a theory of
language in general. Since gestures should be included in the study of language use,
researchers have proposed a multimodal nature of language, not only on the level of
language use but also on the level of the language system (Bressem 2012; Fricke 2012, this
volume; Ladewig 2012; Müller 2007, 2008a, 2008b). Furthermore, studies on discourse
have revealed a dynamic intertwining of speech and gesture, pointing to a dynamic
dimension of language.
What we see when people speak and gesture is, in McNeill’s terms, the product of an
online “dialectic between speech and gesture” (McNeill 2005). Both speech and gesture
are outcomes of “the moment-by-moment thinking that takes place as one speaks”
(McNeill 2005: 15), whereby different modes of thinking are reflected in the two
modalities – imagistic thinking in gestures and analytic, categorical thinking in language.
Both modes of thinking are seeded in the growth point (McNeill 1992, 2005, this volume).
The two modalities, combined in a growth point, are considered equal partners
when creating discourse, participating “in a real-time dialectic during discourse, and
thus propel and shape speech and thought as they occur moment to moment” (McNeill
2005: 3).
Another dynamic dimension introduced by McNeill, one “that reveals itself quite
naturally when extending one’s focus from single gesture-speech units to the unfolding of
discourse” (Müller 2007: 109f.), is that of “communicative dynamism” (Firbas 1971).
Following Firbas, communicative dynamism is regarded “as the extent to which the
message at a given point is pushing the communication forward” (McNeill 1992:
207). McNeill observed that the quantity of gestures as well as the complexity of gestural
and spoken expressions would “increase at points of topic shift, such as new narrative
episodes or new conversational themes” (McNeill and Levy 1993: 365). Furthermore,
when speech and gesture synchronize, i.e., when they are used in temporal overlap,
co-expressing “a single underlying meaning”, the “point of highest communicative
dynamism” is reached (McNeill 2007: 20). On the basis of the information revealed in
speech and gesture, it can be traced what a speaker focuses on over the course of a
narration: “As the speaker moves between levels and event lines, at any given moment some
element is in focus and other elements recede in the background […] The focal element
will have the effect of pushing the communication forward” (McNeill 1992: 207).
McNeill’s observations on communicative dynamism paved the way for Müller’s
observations on dynamic meaning activation accompanying the speaker’s shifting foci of
attention (Müller 2007, 2008a, 2008b; Müller and Tag 2010). Adopting a discourse
perspective on the analysis of multimodal communication, she found that meaning
(and in particular metaphoric meaning) is not created on the spot but emerges over
the flow of the discourse. Through the interplay of the different communicative
resources that participants in a conversation have at hand, meaning can be activated to
different degrees and become foregrounded for both speaker and recipient. In their analysis
of metaphoricity in multimodal communication, Müller and Tag (2010) identified three
different foregrounding techniques in which gestures play a significant role. Accordingly,
when metaphoricity is expressed in only one modality, that is, in speech or gesture,
it is regarded as only minimally activated. When metaphoricity is elaborated or
expressed in both speech and gesture, it is considered waking and highly activated.
This dynamic foregrounding of different aspects of (metaphoric) meaning goes
along with a moving focus of attention (Chafe 1994). In this way, “participants in a
conversation co-construct an interactively attainable salience structure, that they
engage in a process of profiling metaphoric meaning by foregrounding it” (Müller
and Tag 2010).
Recent work within the framework of “dynamic multimodal communication” (Müller
2008a, 2008b) focuses on the experiential grounding of metaphoric meaning. More
precisely, fine-grained studies of face-to-face communication in therapeutic settings and in
the context of dance lessons revealed that bodily movements as well as their “felt
qualities” (Johnson 2005; see also Sheets-Johnstone 1999) provide the affective, embodied
grounds of metaphoricity (Kappelhoff and Müller 2011; Kolter et al. 2012).
Metaphoricity can be observed to emerge from bodily movement that is verbalized at a later
point in the conversation, demonstrating the dynamic dimension of metaphoric meaning.
These observations give empirical evidence of what has been referred to as the
“languageing of movement” (Sheets-Johnstone 1999) – the translation of body movements
into words and, as such, the emergence of meaning from the body.
3. Conclusion
Regarding gestures and speech from a linguistic perspective means addressing the
properties of gestures as a medium of expression, both in conjunction with speech and as
a modality with its own particular characteristics. It departs from the assumption that
the hands possess the articulatory and functional properties needed to potentially
develop a linguistic system (Müller 1998, 2009, this volume; Müller, Bressem, and Ladewig
this volume). That the hands can indeed become language is visible in signed languages
all over the world.
In the early days of sign linguistics, the challenge was to prove that signed languages
are actually languages. In order to substantiate this claim, a sharp boundary had to be
drawn between gestures and signs. However, with the increasing recognition of signed
languages as full-fledged linguistic systems, the stage has opened up for gestures to
be studied as precursors of signs (Kendon 2004: chapter 15; Armstrong and Wilcox
2007).
This brings us back to claims concerning gestures as the universal language of
mankind, especially as Quintilian formulated them. What we see in co-verbal gestures are
prerequisites of embodied linguistic structures and patterns that can evolve into
language when the oral mode of expression is not a viable form of communication. We
would therefore like to suggest that studying gestures and their “grammar” allows us
to gain some insights into processes of language evolution within the manual modality.
Despite the lack of reflection upon gestures as part of language for most of the
twentieth century, a linguistic view on the multimodality of language has by now proven
to be a valuable “companion to other present foci, such as psychological or interactional
approaches, by expanding the fields of investigations and approaches in gesture studies
and thereby contributing to a more thorough understanding of the medium ‘gesture’
itself as well as the relation of speech and gesture” (Bressem and Ladewig 2011: 87).
By allowing for a different point of view on phenomena observable in gestures and
their relation to speech, a linguistic view not only further unravels how
speech and gesture “arise from a single process of utterance formation” (McNeill
1992: 30) and are able to “appear together as manifestations of the same process of
utterance” (Kendon 1980: 208), but moreover underpins the multimodal nature of
language use and of language in general.

Acknowledgements
We are grateful to the Volkswagen Foundation for supporting this work with a grant for
the interdisciplinary project “Towards a grammar of gesture: Evolution, brain and
linguistic structures” (www.togog.org).

4. References
Albrecht, Jörn 2007. Europäischer Strukturalismus: Ein forschungsgeschichtlicher Überblick.
Tübingen: Gunter Narr.
Andrén, Mats 2010. Children’s gestures from 18 to 30 months. Ph.D. dissertation, Centre for Lan-
guages and Literature, Lund University.
Argyle, Michael 1975. Bodily Communication. New York: International Universities Press.
Armstrong, David F. and Sherman E. Wilcox 2007. The Gestural Origin of Language. New York:
Oxford University Press.
Barnett, Dene 1990. The art of gesture. In: Volker Kapp (ed.), Die Sprache der Zeichen und
Bilder, Rhetorik und nonverbale Kommunikation in der frühen Neuzeit, 65–76. Marburg:
Hitzeroth.
Battison, Robin 1974. Phonological deletion in American Sign Language. Sign Language Studies 5:
1–19.
Bavelas, Janet Beavin, Trudy Johnson Kenwood and Bruce Phillips 2002. An experimental study
of when and how speakers use gesture to communicate. Gesture 2(1): 1–17.
Beattie, Geoffrey and Heather Shovelton 1999. Do iconic hand gestures really contribute anything
to the semantic information conveyed by speech? An experimental investigation. Semiotica
123(1/2): 1–30.
Beattie, Geoffrey and Heather Shovelton 2007. The role of iconic gesture in semantic communi-
cation and its theoretical and practical implications. In: Susan D. Duncan, Justine Cassell and
Elena Tevy Levy (eds.), Gesture and the Dynamic Dimension of Language, Volume 1, 221–241.
Philadelphia: John Benjamins.
Becker, Raymond, Alan Cienki, Austin Bennett, Christina Cudina, Camille Debras, Zuzanna
Fleischer, Michael Haaheim, Torsten Müller, Kashmiri Stec and Alessandra Zarcone 2011. Ak-
tionsarten, speech and gesture. Proceedings of the 2nd Workshop on Gesture and Speech in
Interaction – GESPIN, Bielefeld, Germany, 5–7 September.
Bergmann, Kirsten, Volkan Aksu and Stefan Kopp 2011. The relation of speech and gestures:
Temporal synchrony follows semantic synchrony. Paper presented at the 2nd Workshop on
Gesture and Speech in Interaction – GESPIN, Bielefeld, Germany, 5–7 September.
Birdwhistell, Ray L. 1970. Kinesics and Context. Philadelphia: University of Pennsylvania Press.
Bloomfield, Leonard 1983. An Introduction to the Study of Language. Volume 3. Amsterdam: John
Benjamins.
Bohle, Ulrike 2007. Das Wort ergreifen – das Wort übergeben: Explorative Studie zur Rolle rede-
begleitender Gesten in der Organisation des Sprecherwechsels. Berlin: Weidler.
Bolinger, Dwight 1983. Intonation and gesture. American Speech 58(2): 156–174.
Bressem, Jana 2012. Repetitions in gesture: Structures, functions, and cognitive aspects. Ph.D. disser-
tation, European University Viadrina, Frankfurt (Oder).
Bressem, Jana this volume. A linguistic perspective on the notation of form features in gestures.
In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Sedinha
Teßendorf (eds.), Body – Language – Communication: An International Handbook on Multi-
modality in Human Interaction. (Handbooks of Linguistics and Communication Science
38.1.) Berlin: De Gruyter Mouton.
Bressem, Jana and Silva H. Ladewig 2011. Rethinking gesture phases – articulatory features of
gestural movement? Semiotica 184(1/4): 53–91.
Bressem, Jana, Silva H. Ladewig and Cornelia Müller this volume. Linguistic annotation system
for gestures. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill
and Sedinha Teßendorf (eds.), Body – Language – Communication: An International Hand-
book on Multimodality in Human Interaction. (Handbooks of Linguistics and Communication
Science 38.1.) Berlin: De Gruyter Mouton.
Bressem, Jana, Cornelia Müller and Ellen Fricke in preparation. “No, not, none of that” – cases of
exclusion and negation in gesture.
Butterworth, Brian and Uri Hadar 1989. Gesture, speech, and computational stages: A reply to
McNeill. Psychological Review 96(1): 168–174.
Calbris, Geneviève 1990. The Semiotics of French Gestures. Bloomington: Indiana University Press.
Calbris, Geneviève 2003. From cutting an object to a clear cut analysis. Gesture as the representa-
tion of a preconceptual schema linking concrete actions to abstract notions. Gesture 3(1): 19–46.
Calbris, Geneviève 2008. From left to right…: Coverbal gestures and their symbolic use of space.
In: Alan Cienki and Cornelia Müller (eds.), Metaphor and Gesture, 27–53. Amsterdam: John
Benjamins.
Calbris, Geneviève 2011. Elements of Meaning in Gesture. Amsterdam: John Benjamins.
Calbris, Geneviève this volume. Elements of meaning in gesture. In: Cornelia Müller, Alan Cienki,
Ellen Fricke, Silva H. Ladewig, David McNeill and Sedinha Teßendorf (eds.), Body – Lan-
guage – Communication: An International Handbook on Multimodality in Human Interaction.
(Handbooks of Linguistics and Communication Science 38.1.) Berlin: De Gruyter Mouton.
Chafe, Wallace L. 1994. Discourse, Consciousness, and Time: The Flow and Displacement of Con-
scious Experience in Speaking and Writing. Chicago: University of Chicago Press.
Chomsky, Noam 1965. Aspects of the Theory of Syntax. Cambridge: Massachusetts Institute of
Technology Press.
Chomsky, Noam 1981. Lectures on Government and Binding. Dordrecht, the Netherlands: Foris.
Chomsky, Noam 1992. A Minimalist Program for Linguistic Theory. Cambridge: Massachusetts
Institute of Technology Press.
Cienki, Alan 1998a. Metaphoric gestures and some of their relations to verbal metaphorical ex-
pressions. In: Jean-Pierre König (ed.), Discourse and Cognition: Bridging the Gap, 189–204.
Stanford, CA: Center for the Study of Language and Information.
Cienki, Alan 1998b. Straight: An image schema and its metaphorical extensions. Cognitive Lin-
guistics 9(2): 107–149.
Cienki, Alan 2005. Image schemas and gesture. In: Beate Hampe (ed.), From Perception to Mean-
ing: Image Schemas in Cognitive Linguistics, 421–442. Berlin: De Gruyter Mouton.
Cienki, Alan 2008. Why study metaphor and gesture. In: Alan Cienki and Cornelia Müller (eds.),
Metaphor and Gesture, 5–25. Amsterdam: John Benjamins.
Cienki, Alan 2010. Gesture and (cognitive) linguistic theory. In: Rosario Caballero (ed.), Proceed-
ings of the XXVII AESLA International Conference ‘Ways and Modes of Human Communica-
tion’, 45–56. Ciudad Real, Spain: Universidad de Castilla-La Mancha.
Cienki, Alan 2012. Usage events of spoken language and the symbolic units (may) abstract from
them. In: Krzysztof Kosecki and Janusz Badio (eds.), Cognitive Processes in Language, 149–
158. Frankfurt: Peter Lang.
Cienki, Alan this volume. Cognitive Linguistics: Spoken language and gesture as expressions of
conceptualization. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David
McNeill and Sedinha Teßendorf (eds.), Body – Language – Communication: An International
Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics and Communi-
cation Science 38.1.) Berlin: De Gruyter Mouton.
Cienki, Alan and Cornelia Müller (eds.) 2008a. Metaphor and Gesture. Amsterdam: John Benjamins.
Cienki, Alan and Cornelia Müller 2008b. Metaphor, gesture and thought. In: Raymond W. Gibbs
(ed.), Cambridge Handbook of Metaphor and Thought, 483–501. Cambridge: Cambridge Uni-
versity Press.
Clark, Herbert H. 1996. Using Language. Volume 4. Cambridge: Cambridge University Press.
Clark, Herbert H. and Richard J. Gerrig 1990. Quotations as demonstrations. Language 66(4):
764–805.
Condon, William C. and Richard Ogston 1966. Sound film analysis of normal and pathological
behavior patterns. Journal of Nervous and Mental Disease 143(4): 338–347.
Condon, William C. and Richard Ogston 1967. A segmentation of behavior. Journal of Psychiatric
Research 5: 221–235.
Copple, Mary this volume. Enlightenment philosophy: Gestures, language, and the origin of
human understanding. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig,
David McNeill and Sedinha Teßendorf (eds.), Body – Language – Communication: An Inter-
national Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics and
Communication Science 38.1.) Berlin: De Gruyter Mouton.
De Jorio, Andrea 2000. Gesture in Naples and Gesture in Classical Antiquity. A translation of
La mimica degli antichi investigata nel gestire napoletano. With an introduction and notes by
Adam Kendon. Bloomington: Indiana University Press. First published Fibreno, Naples [1832].
De Ruiter, Jan Peter 2000. The production of gesture and speech. In: David McNeill (ed.), Lan-
guage and Gesture, 284–311. Cambridge: Cambridge University Press.
Duncan, Susan 2005. Gesture in signing: A case study in Taiwan Sign Language. Language and
Linguistics 6(2): 279–318.
Dutsch, Dorota this volume. The body in rhetorical delivery and in theatre – An overview of clas-
sical works. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill
and Sedinha Teßendorf (eds.), Body – Language – Communication: An International Hand-
book on Multimodality in Human Interaction. (Handbooks of Linguistics and Communication
Science 38.1.) Berlin: De Gruyter Mouton.
Efron, David 1972. Gesture, Race and Culture. Paris: Mouton. First published [1941].
Ekman, Paul and Wallace V. Friesen 1969. The repertoire of nonverbal behavior: Categories, ori-
gins, usage and coding. Semiotica 1(1): 49–98.
Ekman, Paul and Erika Rosenberg (eds.) 1997. What the Face Reveals: Basic and Applied Studies
of Spontaneous Expression Using the Facial Action Coding System (FACS). New York: Oxford
University Press.
Enfield, N. J. 2009. The Anatomy of Meaning: Speech, Gesture, and Composite Utterances. Cam-
bridge: Cambridge University Press.
Enfield, N. J. this volume. A ‘Composite Utterances’ approach to meaning. In: Cornelia Müller, Alan
Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Sedinha Teßendorf (eds.), Body –
Language – Communication: An International Handbook on Multimodality in Human Interaction.
(Handbooks of Linguistics and Communication Science 38.1.) Berlin: De Gruyter Mouton.
Engle, Randi A. 1998. Not channels but composite signals: Speech, gesture, diagrams and object
demonstrations are integrated in multimodal explanations. In: Morton Ann Gernsbacher and
Sharon J. Derry (eds.), Proceedings of the Twentieth Annual Conference of the Cognitive
Science Society, 321–326. Mahwah, NJ: Erlbaum.
Engle, Randi A. 2000. Toward a theory of multimodal communication combining speech, gestures,
diagrams, and demonstrations in instructional explanations. Ph.D. dissertation, Stanford University.
Esposito, Anna and Maria Marinaro 2007. What pauses can tell us about speech and gesture part-
nership. In: Anna Esposito, Maja Bratanic, Eric Keller and Maria Marinaro (eds.), Fundamentals
of Verbal and Nonverbal Communication and the Biometric Issue, 45–57. Amsterdam: IOS Press.
Fauconnier, Gilles and Mark Turner 2002. The Way We Think: Conceptual Blending and the
Mind’s Hidden Complexities. New York: Basic Books.
Feldman, Robert S. and Bernard Rimé (eds.) 1991. Fundamentals of Nonverbal Behavior. Cambridge:
Cambridge University Press.
Feyereisen, Pierre 1987. Gestures and speech, interactions and separations: A reply to McNeill.
Psychological Review 94(4): 493–498.
Firbas, Jan 1971. On the concept of communicative dynamism in the theory of functional sentence
perspective. Brno Studies in English 7: 12–47.
Freedman, Norbert 1977. Hands, words and mind: On the structuralization of body movements
during discourse and the capacity for verbal representation. In: Norbert Freedman and Stanley
Grand (eds.), Communicative Structures and Psychic Structures, 109–132. New York: Plenum.
Fricke, Ellen 2007. Origo, Geste und Raum: Lokaldeixis im Deutschen. Berlin: Walter de Gruyter.
Fricke, Ellen 2010. Phonaestheme, Kinaestheme und multimodale Grammatik: Wie Artikulatio-
nen zu Typen werden, die bedeuten können. In: Sprache und Literatur 41(1): 70–88.
Fricke, Ellen 2012. Grammatik multimodal: Wie Wörter und Gesten zusammenwirken. Berlin:
De Gruyter Mouton.
Fricke, Ellen this volume. Towards a unified grammar of gesture and speech. In: Cornelia Müller, Alan
Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Sedinha Teßendorf (eds.), Body –
Language – Communication: An International Handbook on Multimodality in Human Interac-
tion. (Handbooks of Linguistics and Communication Science 38.1.) Berlin: De Gruyter Mouton.
Gerwing, Jennifer and Janet Beavin Bavelas this volume. The social interactive nature of gestures:
theory, assumptions, methods, and findings. In: Cornelia Müller, Alan Cienki, Ellen Fricke,
Silva H. Ladewig, David McNeill and Sedinha Teßendorf (eds.), Body – Language – Commu-
nication: An International Handbook on Multimodality in Human Interaction. (Handbooks of
Linguistics and Communication Science 38.1.) Berlin: De Gruyter Mouton.
Gibbs, Raymond W. 1994. The Poetics of Mind: Figurative Thought, Language, and Understanding.
Cambridge: Cambridge University Press.
Goodwin, Charles 1986. Gesture as a resource for the organization of mutual orientation. Semi-
otica 62(1/2): 29–49.
Goodwin, Charles 2006. Human sociality as mutual orientation in a rich interactive environment:
Multimodal utterances and pointing in Aphasia. In: N. J. Enfield and Stephen C. Levinson
(eds.), Roots of Human Sociality: Culture, Cognition and Interaction, 97–125. London: Berg.
Goodwin, Charles 2007. Environmentally coupled gestures. In: Susan Duncan, Justine Cassell, and
Elena Levy (eds.), Gesture and Dynamic Dimensions of Language, 195–212. Amsterdam: John
Benjamins.
Graf, Fritz 1994. Gestures and conventions: The gestures of Roman actors and orators. In: Jan Bremmer
and Herman Roodenburg (eds.), A Cultural History of Gesture, 36–58. Cambridge: Polity Press.
Gullberg, Marianne 2011. Thinking, speaking and gesturing about motion in more than one lan-
guage. In: Aneta Pavlenko (ed.), Thinking and Speaking in Two Languages, 143–169. Bristol:
Multilingual Matters.
Gut, Ulrike, Karin Looks, Alexandra Thies and Dafydd Gibbon 2002. Cogest: Conversational ges-
ture transcription system version 1.0. Fakultät für Linguistik und Literaturwissenschaft, Univer-
sität Bielefeld, ModeLex Tech. Report, 1.
Hadar, Uri this volume. Coverbal gestures: Between communication and speech production. In:
Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Sedinha
Teßendorf (eds.), Body – Language – Communication: An International Handbook on Multi-
modality in Human Interaction. (Handbooks of Linguistics and Communication Science 38.1.)
Berlin: De Gruyter Mouton.
Hadar, Uri and Robert Krauss 1999. Iconic gestures: The grammatical categories of lexical affili-
ates. Journal of Neurolinguistics 12(1): 1–12.
Harris, Randy A. 1995. The Linguistics Wars. New York: Oxford University Press.
Harris, Zellig 1951. Methods in Structural Linguistics. Chicago: University of Chicago Press.
Harrison, Simon 2009. Grammar, gesture, and cognition: The case of negation in English. Ph.D.
dissertation, Université Michel de Montaigne, Bordeaux 3.
Harrison, Simon 2010. Evidence for node and scope of negation in coverbal gesture. Gesture 10(1):
29–51.
Heidegger, Martin 1962. Being and Time. Translated by John Macquarrie and Edward Robinson.
New York: Harper and Row.
Hinde, Robert A. (ed.) 1972. Nonverbal Communication. Cambridge: Cambridge University Press.
Hockett, Charles F. 1958. A Course in Modern Linguistics. New York: MacMillan.
Hopper, Paul 1998. Emergent grammar. In: Michael Tomasello (ed.), The New Psychology of Lan-
guage: Cognitive and Functional Approaches to Language Structure, volume 1, 155–175. Mah-
wah, NJ: Lawrence Erlbaum.
Hougaard, Anders and Gitte Rasmussen this volume. Fused bodies: on the interrelatedness of cog-
nition and interaction. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David
McNeill and Sedinha Teßendorf (eds.), Body – Language – Communication: An International
Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics and Communica-
tion Science 38.1.) Berlin: De Gruyter Mouton.
Ishino, Mika 2001. Conceptual metaphors and metonymies of metaphoric gestures of anger in dis-
course of native speakers of Japanese. In: Mary Andronis, Christopher Ball, Heidi Elston and Syl-
vain Neuvel (eds.), CLS 37: The Main Session, 259–273. Chicago: Chicago Linguistic Society.
Jakobson, Roman and Krystyna Pomorska 1983. Dialogues. Cambridge: Massachusetts Institute of
Technology Press.
Janzen, Terry and Barbara Shaffer 2002. Gesture as the substrate in the process of ASL grammati-
calization. In: Richard P. Meier, Kearsy Cormier and David Quinto-Pozos (eds.), Modality and
Structure in Signed and Spoken Languages, 199–223. Cambridge: Cambridge University Press.
Johnson, Mark 2005. The philosophical significance of image schemas. In: Beate Hampe (ed.), From
Perception to Meaning: Image Schemas in Cognitive Linguistics, 15–33. Berlin: De Gruyter Mouton.
76 I. How the body relates to language and communication

Kappelhoff, Hermann and Cornelia Müller 2011. Embodied meaning construction. Multimodal
metaphor and expressive movement in speech, gesture, and feature film. Metaphor in the Social
World 1(2): 121–153.
Kendon, Adam 1972. Some relationships between body motion and speech: An analysis of an
example. In: Aron Wolfe Siegman and Benjamin Pope (eds.), Studies in Dyadic Communica-
tion, 177–210. New York: Elsevier.
Kendon, Adam 1980. Gesticulation and speech: Two aspects of the process of utterance. In: Mary
R. Key (ed.), Nonverbal Communication and Language, 207–227. The Hague: Mouton.
Kendon, Adam 1983. Gesture and speech: How they interact. In: John M. Wiemann (ed.), Non-
verbal Interaction, 13–46. Beverly Hills, CA: Sage Publications.
Kendon, Adam 1986. Some reasons for studying gesture. Semiotica 62(1/2): 3–28.
Kendon, Adam 1987. On gesture: Its complementary relationship with speech. In: Aaron W. Sieg-
man and Stanley Feldstein (eds.), Nonverbal Behavior and Communication, 65–97. London:
Lawrence Erlbaum.
Kendon, Adam 1990. Conducting Interaction: Patterns of Behaviour in Focused Encounters. Cam-
bridge: Cambridge University Press.
Kendon, Adam 1995. Gestures as illocutionary and discourse structure markers in Southern Italian
conversation. Journal of Pragmatics 23: 247–279.
Kendon, Adam 1998. Die wechselseitige Einbettung von Geste und Rede. In: Caroline Schmauser
and Thomas Knoll (eds.), Körperbewegungen und ihre Bedeutungen, 9–19. Berlin: Arno Spitz.
Kendon, Adam 2003. Some uses of the head shake. Gesture 2(2): 147–182.
Kendon, Adam 2004. Gesture: Visible Action as Utterance. Cambridge: Cambridge University Press.
Kendon, Adam 2008. Language’s matrix. Gesture 9: 355–372.
Kendon, Adam this volume. Exploring the utterance roles of visible bodily action: A personal
account. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill
and Sedinha Teßendorf (eds.), Body – Language – Communication: An International Hand-
book on Multimodality in Human Interaction. (Handbooks of Linguistics and Communication
Science 38.1.) Berlin: De Gruyter Mouton.
Kendon, Adam and Andrew Ferber 1973. A description of some human greetings. In: Richard
Phillip Michael and John Hurrell Crook (eds.), Comparative Ecology and Behaviour of Pri-
mates, 591–668. London: Academic Press.
Kendon, Adam, Richard M. Harris and Mary Ritchie Key 1975. The Organization of Behavior in
Face-to-Face Interaction. The Hague: Mouton.
Kidwell, Mardi this volume. Framing, grounding and coordinating conversational interaction: Pos-
ture, gaze, facial expression, and movement in space. In: Cornelia Müller, Alan Cienki, Ellen
Fricke, Silva H. Ladewig, David McNeill and Sedinha Teßendorf (eds.), Body – Language –
Communication: An International Handbook on Multimodality in Human Interaction. (Hand-
books of Linguistics and Communication Science 38.1.) Berlin: De Gruyter Mouton.
Kita, Sotaro 2000. How representational gestures help speaking. In: David McNeill (ed.), Lan-
guage and Gesture. Cambridge: Cambridge University Press.
Kita, Sotaro (ed.) 2003. Pointing: Where Language, Culture, and Cognition Meet. Mahwah, NJ:
Lawrence Erlbaum.
Kita, Sotaro and Asli Özyürek 2002. What does cross-linguistic variation in semantic coordination
of speech and gesture reveal?: Evidence for an interface representation of spatial thinking and
speaking. Journal of Memory and Language 48: 16–32.
Kita, Sotaro, Asli Özyürek, Shanley Allen, Amanda Brown, Reyhan Furman and Tomoko Ishizuka
2007. Relations between syntactic encoding and co-speech gestures: Implications for a model of
speech and gesture production. Language and Cognitive Processes 22(8): 1212–1236.
Kolter, Astrid, Silva H. Ladewig, Michela Summa, Sabine Koch, Thomas Fuchs and Cornelia
Müller 2012. Body memory and emergence of metaphor in movement and speech. An interdis-
ciplinary case study. In: Sabine Koch, Thomas Fuchs, Michela Summa and Cornelia Müller
(eds.), Body Memory, Metaphor, and Movement, 201–226. Amsterdam: John Benjamins.
Kopp, Stefan, Kirsten Bergmann and Ipke Wachsmuth 2008. Multimodal communication from
multimodal thinking – towards an integrated model of speech and gesture production. Interna-
tional Journal of Semantic Computing 2(1): 115–136.
Ladewig, Silva H. 2007. The family of the cyclic gesture and its variants – systematic variation of
form and contexts. http://www.silvaladewig.de/publications/papers/Ladewig-cyclic_gesture_pdf;
accessed January 2008.
Ladewig, Silva H. 2010. Beschreiben, suchen und auffordern – Varianten einer rekurrenten Geste.
Sprache und Literatur 41(1): 89–111.
Ladewig, Silva H. 2011. Putting the cyclic gesture on a cognitive basis. CogniTextes 6. http://
cognitextes.revues.org/406.
Ladewig, Silva H. 2012. Syntactic and semantic integration of gestures into speech: Structural, cogni-
tive, and conceptual aspects. Ph.D. dissertation, European University Viadrina, Frankfurt (Oder).
Ladewig, Silva H. and Jana Bressem forthcoming. New insights into the medium hand – Discovering
structures in gestures based on the four parameters of sign language. Semiotica.
Ladewig, Silva H. and Sedinha Teßendorf in preparation. The brushing-aside and the cyclic
gesture – reconstructing their underlying patterns.
Lakoff, George and Mark Johnson 1980. Metaphors We Live By. Chicago: University of Chicago Press.
Langacker, Ronald W. 1987. Foundations of Cognitive Grammar: Theoretical Prerequisites. Stan-
ford, CA: Stanford University Press.
Langacker, Ronald W. 1993. Reference-point constructions. Cognitive Linguistics 4(1): 1–38.
Langacker, Ronald W. 2008. Cognitive Grammar: A Basic Introduction. Oxford: Oxford Univer-
sity Press.
Lapaire, Jean-Rémi 2006. Negation, reification and manipulation in a cognitive grammar of sub-
stance. In: Stephanie Bonnefille and Sebastian Salbayre (eds.), La Négation, 333–349. Tours:
Presses Universitaires François Rabelais.
Liddell, Scott 1998. Grounded blends, gestures, and conceptual shifts. Cognitive Linguistics 9(3):
283–314.
Loehr, Dan 2004. Gesture and intonation. Ph.D. dissertation, Georgetown University, Washington, DC.
Loehr, Dan 2007. Aspects of rhythm in gesture and speech. Gesture 7(2): 179–214.
Martinet, André 1960/1963. Grundzüge der Allgemeinen Sprachwissenschaft. Stuttgart: Kohlhammer.
McClave, Evelyn Z. 1991. Intonation and gesture. Ph.D. dissertation, Georgetown University,
Washington, DC.
McClave, Evelyn Z. 1994. Gestural beats: The rhythm hypothesis. Journal of Psycholinguistic
Research 23(1): 45–66.
McClave, Evelyn Z. 2000. Linguistic functions of head movements in the context of speech. Jour-
nal of Pragmatics 32(7): 855–878.
McNeill, David 1979. The Conceptual Basis of Language. Hillsdale, NJ: Erlbaum.
McNeill, David 1985. So you think gestures are nonverbal? Psychological Review 92(3): 350–371.
McNeill, David 1987. So you do think gestures are nonverbal. Reply to Feyereisen (1987). Psycho-
logical Review 94(4): 499–504.
McNeill, David 1989. A straight path – to where? Reply to Butterworth and Hadar. Psychological
Review 96(1): 175–179.
McNeill, David 1992. Hand and Mind. What Gestures Reveal about Thought. Chicago: University
of Chicago Press.
McNeill, David (ed.) 2000. Language and Gesture. Cambridge: Cambridge University Press.
McNeill, David 2005. Gesture and Thought. Chicago: University of Chicago Press.
McNeill, David 2007. Gesture and thought. In: Anna Esposito, Maja Bratanić, Eric Keller and
Maria Marinaro (eds.), Fundamentals of Verbal and Nonverbal Communication and the
Biometric Issue, 20–33. Amsterdam: IOS Press.
McNeill, David this volume. The growth point hypothesis of language and gesture as a dynamic
and integrated system. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David
McNeill and Sedinha Teßendorf (eds.), Body – Language – Communication: An International
Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics and Communica-
tion Science 38.1.) Berlin: De Gruyter Mouton.
McNeill, David and Susan D. Duncan 2000. Growth points in thinking-for-speaking. In: David
McNeill (ed.), Language and Gesture, 141–161. Cambridge: Cambridge University Press.
McNeill, David and Elena T. Levy 1982. Conceptual representations in language activity and
gesture. In: Robert J. Jarvella and Wolfgang Klein (eds.), Speech, Place, and Action, 271–295.
New York: Wiley and Sons.
McNeill, David and Elena T. Levy 1993. Cohesion and gesture. Discourse Processes 16(4): 363–386.
McNeill, David, Francis Quek, Karl Eric McCullough, Susan Duncan, Robert Bryll, Xin-Feng Ma
and R. Ansari 2002. Dynamic imagery in speech and gesture. In: Björn Granström, David
House and Inger Karlsson (eds.), Multimodality in Language and Speech Systems, Volume
19, 27–44. Dordrecht, the Netherlands: Kluwer Academic.
Merleau-Ponty, Maurice 1962. Phenomenology of Perception. Translated by Colin Smith. London:
Routledge.
Mittelberg, Irene 2006. Metaphor and metonymy in language and gesture: Discourse evidence for
multimodal models of grammar. Ph.D. dissertation, Cornell University.
Mittelberg, Irene 2008. Peircean semiotics meets conceptual metaphor: Iconic modes in gestural
representations of grammar. In: Alan Cienki and Cornelia Müller (eds.), Metaphor and Ges-
ture, 145–184. Amsterdam: John Benjamins.
Mittelberg, Irene 2010a. Geometric and image-schematic patterns in gesture space. In: Vyvyan
Evans and Paul Chilton (eds.), Language, Cognition, and Space: The State of the Art and
New Directions, 351–385. London: Equinox.
Mittelberg, Irene 2010b. Interne und externe Metonymie: Jakobsonsche Kontiguitätsbeziehungen
in redebegleitenden Gesten. Sprache und Literatur 41(1): 112–143.
Mondada, Lorenza 2007. Multimodal resources for turn-taking: pointing and the emergence of
possible next speakers. Discourse Studies 9(2): 194–225.
Mondada, Lorenza this volume. Multimodal interaction. In: Cornelia Müller, Alan Cienki, Ellen
Fricke, Silva H. Ladewig, David McNeill and Sedinha Teßendorf (eds.), Body – Language –
Communication: An International Handbook on Multimodality in Human Interaction. (Handbooks
of Linguistics and Communication Science 38.1.) Berlin: De Gruyter Mouton.
Müller, Cornelia 1998. Redebegleitende Gesten: Kulturgeschichte, Theorie, Sprachvergleich. Berlin:
Arno Spitz.
Müller, Cornelia 2000. Zeit als Raum. Eine kognitiv-semantische Mikroanalyse des sprachlichen
und gestischen Ausdrucks von Aktionsarten. In: Ernest W. B. Hess-Lüttich and H. Walter
Schmitz (eds.), Botschaften verstehen. Kommunikationstheorie und Zeichenpraxis. Festschrift
für Helmut Richter, 211–218. Frankfurt a.M.: Peter Lang.
Müller, Cornelia 2004. Forms and uses of the palm up open hand. A case of a gesture family? In:
Cornelia Müller and Roland Posner (eds.), Semantics and Pragmatics of Everyday Gestures,
234–256. Berlin: Weidler.
Müller, Cornelia 2007. A dynamic view on gesture, language and thought. In: Susan D. Duncan,
Justine Cassell and Elena T. Levy (eds.), Gesture and the Dynamic Dimension of Language,
109–116. Amsterdam: John Benjamins.
Müller, Cornelia 2008a. Metaphors Dead and Alive, Sleeping and Waking: A Dynamic View.
Chicago: University of Chicago Press.
Müller, Cornelia 2008b. What gestures reveal about the nature of metaphor. In: Alan Cienki and
Cornelia Müller (eds.), Metaphor and Gesture, 249–275. Amsterdam: John Benjamins.
Müller, Cornelia 2009. Gesture and language. In: Kirsten Malmkjaer (ed.), Routledge’s Linguistics
Encyclopedia, 214–217. Abingdon: Routledge.
Müller, Cornelia 2010a. Mimesis und Gestik. In: Gertrud Koch, Martin Vöhler und Christiane
Voss (eds.), Die Mimesis und ihre Künste, 149–187. Paderborn: Fink.
Müller, Cornelia 2010b. Wie Gesten bedeuten. Eine kognitiv-linguistische und sequenzanalytische
Perspektive. Sprache und Literatur 41(1): 37–68.
Müller, Cornelia this volume. Gestures as a medium of expression: The linguistic potential of ges-
tures. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and
Sedinha Teßendorf (eds.), Body – Language – Communication: An International Handbook
on Multimodality in Human Interaction. (Handbooks of Linguistics and Communication
Science 38.1.) Berlin: De Gruyter Mouton.
Müller, Cornelia submitted. How gestures mean – The construal of meaning in gestures with speech.
Müller, Cornelia, Jana Bressem and Silva H. Ladewig this volume. Towards a grammar of gestures:
a form-based view. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David
McNeill and Sedinha Teßendorf (eds.), Body – Language – Communication: An International
Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics and Communica-
tion Science 38.1.) Berlin: De Gruyter Mouton.
Müller, Cornelia and Alan Cienki 2009. Words, gestures, and beyond: Forms of multimodal met-
aphor in the use of spoken language. In: Charles Forceville and Eduardo Urios-Aparisi (eds.),
Multimodal Metaphor, 297–328. Berlin: De Gruyter Mouton.
Müller, Cornelia, Hedda Lausberg, Ellen Fricke and Katja Liebal 2005. Towards a Grammar of
Gesture: Evolution, Brain, and Linguistic Structures. Berlin: Antrag im Rahmen der Förderini-
tiative “Schlüsselthemen der Geisteswissenschaften. Programm zur Förderung fachübergrei-
fender und internationaler Zusammenarbeit”.
Müller, Cornelia and Ingwer Paul 1999. Gestikulieren in Sprechpausen. Eine konversations-
syntaktische Fallstudie. In: Hartmut Eggert and Janusz Golec (eds.), … wortlos der Sprache
mächtig. Schweigen und Sprechen in Literatur und sprachlicher Kommunikation, 265–281.
Stuttgart: Metzler.
Müller, Cornelia and Roland Posner (eds.) 2004. The Semantics and Pragmatics of Everyday Ges-
tures. Berlin: Weidler.
Müller, Cornelia and Gerald Speckmann 2002. Gestos con una valoración negativa en la conver-
sación cubana. DeSignis 3: 91–103.
Müller, Cornelia and Susanne Tag 2010. The embodied dynamics of metaphoricity: Activating
metaphoricity in conversational interaction. Cognitive Semiotics 6: 85–120.
Núñez, Rafael 2008. A fresh look at the foundations of mathematics: Gesture and the psycho-
logical reality of conceptual metaphor. In: Alan Cienki and Cornelia Müller (eds.), Metaphor
and Gesture, 225–247. Amsterdam: John Benjamins.
Núñez, Rafael E. and Eve Sweetser 2006. With the future behind them: Convergent evidence from
Aymara language and gesture in the crosslinguistic comparison of spatial construals of time.
Cognitive Science 30(3): 401–450.
Ortony, Andrew (ed.) 1993. Metaphor and Thought. Cambridge: Cambridge University Press.
Parrill, Fey 2008. Form, meaning and convention: An experimental examination of metaphoric
gestures. In: Alan Cienki and Cornelia Müller (eds.), Metaphor and Gesture, 225–247. Amster-
dam: John Benjamins.
Parrill, Fey and Eve Sweetser 2002. Representing meaning: Morphemic level analysis with a
holistic approach to gesture transcription. Paper presented at the First Congress of the Interna-
tional Society of Gesture Studies, The University of Texas, Austin.
Parrill, Fey and Eve Sweetser 2004. What we mean by meaning: Conceptual integration in gesture
analysis and transcription. Gesture 4(2): 197–219.
Peirce, Charles S. 1931. Collected Papers of Charles Sanders Peirce. Cambridge, MA: Harvard Uni-
versity Press.
Pfau, Roland and Markus Steinbach 2006. Pluralization in sign and in speech: A cross-modal typo-
logical study. Linguistic Typology 10(2): 135–182.
Pfau, Roland and Markus Steinbach 2011. Grammaticalization in sign languages. In: Bernd Heine
and Heiko Narrog (eds.), Handbook of Grammaticalization, 681–693. Oxford: Oxford Univer-
sity Press.
Pike, Kenneth Lee 1967. Language in Relation to a Unified Theory of the Structure of Human
Behavior (second and revised edition). The Hague: Mouton.
Polanyi, Michael 1958. Personal Knowledge. Chicago: University of Chicago Press.
Quintilian, Marcus Fabius 1969. The Institutio Oratoria of Quintilian. With an English translation
by H. E. Butler. New York: G. P. Putnam.
Ruesch, Jurgen and Weldon Kees 1970. Nonverbal Communication: Notes on the Visual Perception
of Human Relations. Berkeley: University of California Press.
Saussure, Ferdinand de, Charles Bally and Albert Sechehaye 2001. Grundfragen der allgemeinen
Sprachwissenschaft. Berlin: Walter de Gruyter.
Scheflen, Albert E. 1973. How Behavior Means. New York: Gordon and Breach.
Scherer, Klaus R. 1979. Die Funktionen des nonverbalen Verhaltens im Gespräch. In: Klaus R.
Scherer and Harald G. Wallbott (eds.), Nonverbale Kommunikation: Forschungsberichte zum
Interaktionsverhalten, 25–32. Weinheim, Germany: Beltz.
Scherer, Klaus R. and Paul Ekman 1982. Handbook of Methods in Nonverbal Behavior Research.
Cambridge: Cambridge University Press.
Schmitt, Reinhold 2005. Zur multimodalen Struktur von turn-taking. Gesprächsforschung –
Online-Zeitschrift zur verbalen Interaktion 6: 17–61.
Seyfeddinipur, Mandana 2004. Meta-discursive gestures from Iran: Some uses of the ‘Pistol Hand’.
In: Cornelia Müller and Roland Posner (eds.), The Semantics and Pragmatics of Everyday Ges-
tures, 205–216. Berlin: Weidler.
Seyfeddinipur, Mandana 2006. Disfluency: Interrupting speech and gesture (MPI Series in Psycho-
linguistics 39). Nijmegen: University of Nijmegen.
Sheets-Johnstone, Maxine 1999. The Primacy of Movement. Amsterdam: John Benjamins.
Slama-Cazacu, Tatiana 1976. Nonverbal components in message sequence: “Mixed syntax”. In:
William Charles McCormack and Stephen A. Wurm (eds.), Language and Man: Anthropolog-
ical Issues, 217–227. The Hague: Mouton.
Sowa, Timo 2005. Understanding Coverbal Iconic Gestures in Object Shape Descriptions. Berlin:
Akademische Verlagsgesellschaft Aka.
Stokoe, William C. 1960. Sign Language Structure. Buffalo, NY: Buffalo University Press.
Streeck, Jürgen 1988. The significance of gesture: How it is established. International Pragmatics
Association Papers in Pragmatics 2(1/2): 60–83.
Streeck, Jürgen 1993. Gesture as communication I: Its coordination with gaze and speech. Com-
munication Monographs 60(4): 275–299.
Streeck, Jürgen 2002. Grammars, words, and embodied meanings: On the uses and evolution of so
and like. Journal of Communication 52(3): 581–596.
Streeck, Jürgen 2008. Depicting by gestures. Gesture 8(3): 285–301.
Streeck, Jürgen 2009. Gesturecraft: Manufacturing Understanding. Amsterdam: John Benjamins.
Streeck, Jürgen this volume. Praxeology of gesture. In: Cornelia Müller, Alan Cienki, Ellen Fricke,
Silva H. Ladewig, David McNeill and Sedinha Teßendorf (eds.), Body – Language – Communi-
cation: An International Handbook on Multimodality in Human Interaction. (Handbooks of Lin-
guistics and Communication Science 38.1.) Berlin: De Gruyter Mouton.
Streeck, Jürgen and Ulrike Hartge 1992. Previews: Gestures at the transition place. In: Peter Auer
and Aldo di Luzio (eds.), The Contextualization of Language, 135–157. Amsterdam: John
Benjamins.
Stukenbrock, Anja 2008. “Wo ist der Hauptschmerz?” – Zeigen am menschlichen Körper in der
medizinischen Kommunikation. Gesprächsforschung. Online-Zeitschrift zur verbalen Interak-
tion 9: 1–33.
Sweetser, Eve 1998. Regular metaphoricity in gesture: bodily-based models of speech interaction.
Actes du 16e Congrès International des Linguistes (CD-ROM).
Sweetser, Eve and Fey Parrill volume 2. Gestures as conceptual blends. In: Cornelia Müller,
Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Jana Bressem (eds.), Body
– Language – Communication: An International Handbook on Multimodality in Human Inter-
action. (Handbooks of Linguistics and Communication Science 38.1.) Volume 2. Berlin: De
Gruyter Mouton.
Talmy, Leonard 1983. How language structures space. In: Herbert L. Pick and Linda P. Acredolo
(eds.), Spatial Orientation: Theory, Research, and Application, 225–282. New York: Plenum Press.
Teßendorf, Sedinha 2008. Pragmatic and metaphoric gestures – combining functional with cogni-
tive approaches. Unpublished manuscript, European University Viadrina, Frankfurt (Oder).
Teßendorf, Sedinha and Silva H. Ladewig 2008. The brushing-aside and the cyclic gesture – recon-
structing their underlying patterns, GCLA-08/DGKL-08. Leipzig, Germany.
Tuite, Kevin 1993. The production of gesture. Semiotica 93(1/2): 83–105.
Vygotsky, Lev 1986. Thought and Language. Edited and translated by Eugenia Hanfmann and
Gertrude Vakar, revised and edited by Alex Kozulin. Cambridge: Massachusetts Institute of
Technology Press.
Wachsmuth, Ipke 1999. Communicative rhythm in gesture and speech. In: Annelies Braffort, Ra-
chid Gherbi, Sylvie Gibet, James Richardson and Daniel Teil (eds.), Gesture-Based Communi-
cation in Human-Computer Interaction – Proceedings International Gesture Workshop GW’99,
277–289. Berlin: Springer.
Watzlawick, Paul, Janet Beavin Bavelas, and Don D. Jackson 1967. Pragmatics of Human Commu-
nication: A Study of Interactional Patterns, Pathologies and Paradoxes. New York: Norton.
Webb, Rebecca 1996. Linguistic features of metaphoric gestures. Unpublished Ph.D. dissertation,
University of Rochester, New York.
Webb, Rebecca 1998. The lexicon and componentiality of American metaphoric gestures. In:
Serge Santi, Isabelle Guaitella, Christian Cavé and Gabrielle Konopczynski (eds.), Oralité et
Gestualité: Communication Multimodale, Interaction, 387–391. Paris: L’Harmattan.
Wilcox, Sherman 2002. The iconic mapping of space and time in signed languages. In: Liliana Al-
bertazzi (ed.), Unfolding Perceptual Continua, 255–281. Amsterdam: John Benjamins.
Wilcox, Sherman 2004. Gesture and language. Gesture 4(1): 3–73.
Wilcox, Sherman and Paolo Rossini 2010. Grammaticalization in sign languages. In: Diane Bren-
tari (ed.), Sign Languages, 332–354. Cambridge: Cambridge University Press.
Wilcox, Sherman and Phyllis Wilcox 1995. The gestural expression of modality in ASL. In: Joan
Bybee and Suzanne Fleischman (eds.), Modality in Grammar and Discourse, 135–162. Amster-
dam: John Benjamins.
Williams, Robert F. 2008. Gesture as a conceptual mapping tool. In: Alan Cienki and Cornelia
Müller (eds.), Metaphor and Gesture, 55–92. Amsterdam: John Benjamins.
Wollock, Jeffrey 1997. The Noblest Animate Motion: Speech, Physiology, and Medicine in Pre-Car-
tesian Linguistic Thought. Amsterdam: John Benjamins.
Wollock, Jeffrey 2002. John Bulwer (1606–1656) and the significance of gesture in 17th century
theories of language and cognition. Gesture 2(2): 227–258.
Wollock, Jeffrey this volume. Renaissance philosophy: Gesture as universal language. In: Cornelia
Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Sedinha Teßendorf
(eds.), Body – Language – Communication: An International Handbook on Multimodality in
Human Interaction. (Handbooks of Linguistics and Communication Science 38.1.) Berlin:
De Gruyter Mouton.
Wundt, Wilhelm 1921. Völkerpsychologie: Eine Untersuchung der Entwicklungsgesetze von
Sprache, Mythus und Sitte. Erster Band. Die Sprache. Leipzig: Engelmann.
Zlatev, Jordan 2002. Mimesis: The “missing link” between signals and symbols in phylogeny and
ontogeny? In: Anneli Pajunen (ed.), Mimesis, Sign and Language Evolution, 93–122. Publica-
tions in General Linguistics 3. Turku: University of Turku Press.

Cornelia Müller, Frankfurt (Oder) (Germany)
Silva H. Ladewig, Frankfurt (Oder) (Germany)
Jana Bressem, Chemnitz (Germany)
4. Emblems, quotable gestures, or conventionalized body movements
1. Introduction
2. Definition(s) and history of the term “emblem”
3. Theoretical approaches
4. Emblem repertoires
5. Cross-cultural findings
6. Characteristics of emblems
7. Concluding remarks
8. References

Abstract
This article offers an account of the nature of emblems, the history of the concept and its
definitions. It sketches the most important theoretical approaches towards emblems
and their findings. Here, we will present insights from cognitive and psychological, semi-
otic, and ethnographic and pragmatic perspectives on the subject. We will give a very
brief overview of mono-cultural and cross-cultural repertoires of emblems and
address some of the cross-cultural findings concerning this conventional gesture type.
At the end of the article, some of the most important characteristics of emblems are
described.

1. Introduction
Emblems or quotable gestures are conventional body movements that have a precise
meaning which can be understood easily without speech by a certain cultural or social
group.
In this article, we will concentrate on emblematic hand gestures only. Examples of
prototypical emblems are the so-called “thumbs up gesture” that, at least in Western
cultures, is used to express something good or positive and can be glossed with “OK”
(Morris et al. 1979; Sherzer 1991) or the “V as victory sign” (Brookes 2011; Morris
et al. 1979; Schuler 1944) where the index and middle finger are stretched, the other
fingers curled in and the palm faces the interlocutor. These are the gestures that are
likely to appear in newspaper photos, in advertisements, or in ancient paintings; they
have a clear message and often express the attitude of the gesturer.
At least since the beginning of modern gesture studies with David Efron's seminal
study ([1941] 1972), emblems have been regarded as a class of gestures different
from spontaneous co-speech gestures. Most gesture researchers agree that emblems
differ from spontaneous co-speech gestures, which are assumed to be created
on the spot (see McNeill 1992, 2000, 2005; Müller 1998, 2010, this volume;
Poggi 2002, inter alia), in that emblems are historically developed and therefore belong
to the gestural repertoire of a certain culture or group. Emblems are conventional ges-
tures that have a standard of well-formedness. It is widely accepted that they have a
more or less defined meaning, are easily translatable into words or a phrase and can
therefore be used as a substitute for speech (Ekman and Friesen 1969, 1972; Johnson,
Ekman, and Friesen 1975; McNeill 1992, 2000, 2005; Morris 2002; Morris et al. 1979;
Müller 1998, 2010; Payrató 1993; Poggi 2002, 2007, inter alia). It follows that
emblems have, at least in part, an illocutionary force (Kendon 1995; Payrató
1993, 2003; Poggi 2002).
This article will not treat other conventionalized body movements such as coded ges-
tures (e.g. the semaphore language of arm signals, see Morris 2002: 40) or technical ges-
tures (Morris 2002: 38) which are invented and used by a minority for technical
communication, e.g. the gestures of crane drivers or firemen. These gestures usually
do not enter the gestural repertoire of a wider group and are therefore not addressed
here (see Kendon 2004b: 291ff. for an overview).

2. Definition(s) and history of the term “emblem”


Throughout the years, terms and definitions for these conventional gestures have var-
ied: They have been called emblematic gestures or emblems (Efron 1972; Ekman and
Friesen 1969, 1972; Johnson, Ekman, and Friesen 1975; McNeill 1992, 2000, 2005,
inter alia), symbolic gestures (Calbris 1990; Efron 1972; Poggi 2002; Sparhawk 1978;
Wundt 1900), semiotic gestures (Barakat 1973), quotable gestures (Brookes 2001,
2005, 2011; Kendon 1984, 1992), autonomous gestures (Kendon 1983; Payrató 1993)
or narrow gloss gestures (Kendon 2004b), each term shedding a different light on the
phenomenon. In this article, we will adhere to the term emblem because it is the
most widespread within the research community, yet acknowledging that it suggests a
class of gestures that are “semiotically, of the same type, when in fact, […] this is
not the case” (Kendon 1992: 92). We will also adopt this term because it was coined
by David Efron, who – with the first empirical study on gestures – drew the scholarly
attention to these cultural and conventional gestures, being aware of the concept of
symbolic gestures from Wilhelm Wundt (1900). They constitute the most complex
class in Wundt’s gesture classification and are found especially in traditional sign lan-
guages. According to Wundt, they are as close to a word as a gesture can be because
their relation to the signified is characterized by associations and not by iconicity.
These associations are strengthened by the constant use of the gesture, which in the
end may lead to a completely conventional and arbitrary sign.
Efron’s study Gesture and Environment from 1941 (re-published as Gesture, Race,
and Culture by Paul Ekman in 1972) introduced the term emblematic gesture to
describe symbolic, conventional and arbitrary gestures. While reviewing theories
about gestures as being always pictorial, natural and congenital, Efron picks up the
term emblem from the Renaissance and cites the work of Francis Bacon (1640):

Notes therefore of things, which without the helpe and mediation of Words signifie Things,
are of two sorts; whereof the first sort is significant of Congruitie, the other ad placitum. Of
the former are Hieroglyphiques and Gestures; […] As for Gestures they are, as it were,
Transitory Hieroglyphiques. […] This in the meane is plain, that Hieroglyphiques and Ges-
tures ever have some similitude with the thing signified, and are kind of Emblemes. (Bacon
1640: 258–259; Efron 1972: 94–95)

Although Bacon includes all gestures that work as signs without the “mediation” of lan-
guage, and compares them to hieroglyphs, because both signs are connected to their
84 I. How the body relates to language and communication

signified through similarity, Efron reserves the term symbolic or emblematic gestures
for those that are “representing either a visual or a logical object by means of a pictorial
or a non-pictorial form which has no morphological relationship to the thing repre-
sented” (Efron 1972: 96), actually reserving the term emblems for arbitrary gestures,
and excluding those that have “some similitude with the thing signified”. Nevertheless,
in a footnote he notes that there are some symbolic gestures that are partially similar to
their referent and calls them “hybrid movements” (Efron 1972: 96). But since they
fall into two different categories and are therefore hard to classify, he refrains from
considering them any further.
While Bacon in this “very brief discussion uses the term ‘hieroglyphic’ in a very generic way, for all iconic ideograms” (Jeffrey Wollock, personal communication; see
also this volume), Efron reserves this term for those gestures that have no relation of
similarity to their signified, but an arbitrary one. In her survey of the history of gesture
studies Müller (1998: 61–62, footnote 72) points out that with the discovery of the
Egyptian hieroglyphs in the 16th and 17th century a sudden interest in iconology
arose within the intellectual circles of Europe, leading to the development of pictorial
symbols, such as emblems, displaying proverbs, idioms and abstract notions. It was in
this context that gestures were considered ideograms or emblems. Müller alludes to
the fact that some emblematic gestures (in the Efronian sense) are in effect grounded
in the gestural representation of proverbs and idioms (Müller 1998: 62; see also Payrató
2008).
For David Efron, emblematic gestures are meaningful by the conventional symbolic
connotation that they possess independently from the speech for which they “may, or
may not, be an adjunct” (Efron 1972: 96), a characteristic which also holds for deictic or pictorial gestures. The matter of iconicity, arbitrariness, and conventionality has been
discussed thoroughly by Barbara E. Hanna (1996, see below). Most emblem researchers
have followed Ekman and Friesen’s definition (1972, a slightly adjusted version of the
one presented in 1969) who shifted the focus from conventionality towards the
emblem’s relation to speech:

Emblems are those nonverbal acts (a) which have a direct verbal translation usually con-
sisting of a word or two, or a phrase, (b) for which this precise meaning is known by most or
all members of a group, class, subculture or culture, (c) which are most often deliberately
used with the conscious intent to send a particular message to other person(s), (d) for
which the person(s) who sees the emblem usually not only knows the emblem’s message
but also knows that it was deliberately sent to him, and (e) for which the sender usually
takes responsibility for having made that communication. A further touchstone of an
emblem is whether it can be replaced by a word or two, its message verbalized without sub-
stantially modifying the conversation. (Ekman and Friesen 1972: 357)

This definition focuses on the word-likeness of emblems and, following Hanna (1996), has hindered the development of thorough studies on emblems as communicative signs in their own right, leading instead to a series of emblem repertoires (see also Payrató 1993 for a systematic discussion). Adam Kendon qualifies his own definition of emblems as “autonomous or quotable gestures” as a practical user’s definition, thereby explicitly circumventing the difficulties of establishing coherent semiotic criteria, which even
within theoretical reasoning are difficult to meet. The term therefore refers to gestures
4. Emblems, quotable gestures, or conventionalized body movements 85

that “are standardized in form and which can be quoted and glossed apart from a context of spoken utterance” (Kendon 1986: 7–8). With this definition, Kendon captures
those gestures which have already made their way “into an explicit list or vocabulary”
(Kendon 2004b: 335), such as for instance the “thumbs up gesture”, the “victory ges-
ture” or the “fingers cross gesture” (see Morris et al. 1979, for examples).

3. Theoretical approaches
Emblems have been treated by almost all gesture researchers because they hold a
prominent position between conventional and codified gestural systems, such as sign
languages, and supposedly idiosyncratic and singular co-speech gestures. This idea is ex-
pressed in the so-called Kendon’s continuum that was introduced by David McNeill
(1992: 37–38) and elaborated in McNeill (2000) and Kendon (2004b), which arranges
gesture types on a scale from holistic, spontaneous, idiosyncratic and co-speech depen-
dent gesticulations to the language-like, conventional signs of sign languages. In
between are language-like gestures, pantomimes, and emblems, the last described as
having a “segmentation, standards of well-formedness, a historical tradition, and a com-
munity of users” (McNeill 1992: 56). What has been looked at when considering em-
blems depends heavily on the respective researcher’s theoretical assumptions. In the
following, we will sketch the most influential approaches.

3.1. Psychological and cognitive perspectives


As noted above, the work of the anthropologists and psychologists Paul Ekman and
Wallace V. Friesen (1969, 1972; with Harold G. Johnson 1975) has been most influential.
Their goal was to code and classify all nonverbal behavior according to its origin, cod-
ing, and usage, being well aware that their endeavor was actually impossible. Emblems
were seen as a social phenomenon, almost word-like. Although they adopted the term emblem from Efron, they changed its scope and included iconic gestures in this category. According to Ekman and Friesen, emblems differ from spontaneous gestures
mainly because they are used consciously, intentionally and without speech. At the
same time, though, they have to be replaceable “by a word or two, its message verba-
lized without substantially modifying the conversation” (Ekman and Friesen 1972: 357).
With this definition, the characteristics of the emblem as a conventional and cultural
communicative sign in its own right were moved out of focus.
Isabella Poggi (Poggi 1983, 1987, 2002, 2004, 2007, inter alia; Poggi and Zomparelli
1987) has been pursuing a quite similar aim: the establishment of a lexicon for each
modality (touch, gaze, gesture), working with a semiotic and cognitive model of com-
munication in terms of the notions of goal and beliefs (see Castelfranchi and Parisi
1980; Poggi 2007). Following Ekman and Friesen, she posits a strict division between
emblems and other gestures, emblems being comparable to words in a foreign language
and stored the same way (see Poggi and Magno Caldognetto 1997). They are culturally
codified, autonomous and translatable. Poggi’s semantic analysis of different aspects of
gestures and the establishment of a gesture typology lead to the “proto-grammatical”
differentiation between holophrastic emblems and lexical or, more recently, articulated
emblems. According to her findings, holophrastic emblems can be compared to

interjections, an equivalent of a complete speech act, with a clear and unchangeable il-
locutionary force, whereas articulated emblems behave like components of a communi-
cative act. Comparable to words they participate in communicative acts, but their
performative character changes according to the context.
In short, both approaches can be characterized by their semantic focus and their
verbocentric point of view.

3.2. Semiotic perspectives


The following lines of research share the semiotic perspective on the class of emblems.
The ontogeny of emblems as a result of ritualization has been exemplified with the
method of rational reconstruction by Roland Posner (2002), explicating the cognitive
as well as semiotic processes at work. The lexicalization process from a spontaneous
gesture to an emblem or a highly conventional gesture is illustrated by Kendon (1988).
Considering gestures as signs in their own right widens his perspective to include re-
flections on general properties of the gestural medium, an issue that, very surprisingly,
is rarely addressed. As such, Kendon (1981, 1996, 2004a, 2004b and elsewhere) underlines the characteristics of gestures. The fact that gestures are silent, faster to produce than speech, visible, energetically cost-effective, and hideable, that they have a greater immediate impact, and that they do not rely on organized structures of attention contributes greatly to the emergence and development of emblem repertoires, which seem to evolve
around a restricted set of communicative functions (see below). A semiotic and
also linguistic perspective is characteristic of the work of Sparhawk (1976, 1978) and, in a more general way, Calbris (1990, 2003). In her investigation of the formal features of Iranian emblems, Sparhawk, applying the methodology of Stokoe (1960) for sign languages and of Pike (1947), concludes that there is a set of iconic contrasting features in Persian emblems. That this set does not develop into a whole system of oppositions can be explained by the relatively small number of emblems, which makes such elaboration unnecessary. A part of Geneviève Calbris’ work can be seen in a similar vein. Alongside a systematic analysis of semantic fields in gestures, she investigated the formal properties of French gestures, such as movement pattern, hand shape, and direction, which tend toward a systematic set of form features that is motivated and conventional at the same time, and as such culturally
coded (see Calbris 1990). Barbara E. Hanna (1996) has redefined the emblem in a
thoroughly semiotic way, drawing on the theories of Eco (1976), Peirce (1931–1958)
and Jakobson (1960). She emphasizes their conventional character as a sign within
the field of wider semiotics. On the basis of her detailed and encompassing analysis, she concludes that the main characteristics of emblems, as a class of gestures with
fuzzy edges, lie in their strong coding and their generality between contexts, where
analogous links are unnecessary and questions of motivation and/or arbitrariness
can be neglected.

3.3. Cultural, ethnographic and/or pragmatic perspectives


David Efron’s research on gestures was driven by the question of whether gestures were
part of nature or part of human culture. He was not the first to investigate emblems
from an ethnographic perspective (see Kendon 2004b for an overview) when he

compared the gesture use of US immigrants from Southern Italy with the gesture use of
Jewish immigrants from Eastern Europe, but he was the first to apply a variety of empir-
ical methods, for example direct observation combined with sketches, and – as a revo-
lutionary novelty – the compilation and interpretation of film material recorded on the
scene within natural communicative settings. Efron found that the use and especially
the repertoire of conventional gestures differed greatly between the two groups inves-
tigated. While the Italians had an extensive and diversified repertoire of conventional
gestures (151 gesture-words, not only emblematic, but also physiographic gestures),
the Jews hardly made any use of emblems at all; only six rather symbolic movements could be identified. The assimilated groups of both origins, though, had clearly taken
over the US American standard displayed by their new status and/or social group
and hardly used any emblematic gestures at all.
Adam Kendon’s work starts out just where Efron’s ended. With great expertise in
alternate sign languages, gesture and culture, his efforts have been directed towards the
investigation of gestures in use, in their natural surrounding. He has argued quite early
(e.g. 1988) against a definitional division between the so-called spontaneous or idiosyncratic and conventional or quotable gestures. In a study from 1995, Kendon compared emblems and formally similar conventional gestures, such as the emblematic gesture of the
mano a borsa with the finger bunch, a recurrent gesture according to Ladewig (2011a).
Both gestures are used pragmatically, the mano a borsa to indicate a certain speech act
(request, negative comment), the finger bunch in order to mark the topic of the utter-
ance. This suggests that a pragmatic use of gestures might be related to a process of con-
ventionalization. In her study of the “pistol hand” in Iran, Seyfeddinipur (2004) obtains
similar results.
In a comparative study of the gesturing of a Neapolitan and an Englishman, Kendon
(2004a) confirmed Efron’s findings about the elaborate repertoire and usage of conven-
tional gestures by the Italian. One possible explanation of the abundant gesture vocab-
ulary seems to lie in what he calls the ecology of interaction in Naples (Kendon 2004a,
2004b). In a review of Morris et al.’s book about the origin and distribution of emblems
in Europe (Morris et al. 1979, see below), Kendon (1981) summarized the functions of these gestures on the basis of existing emblem repertoires. As we have noted above,
emblems are used to express communicative acts, rather than being used as a mere sub-
stitute for a word (see below for the functions). They are especially used for communi-
cative acts of interpersonal control, for announcing one’s own current state, and as
evaluative descriptions of the action or appearances of someone else.
Two contextual studies stand out in this line of research: Joel Sherzer (1991) has
undertaken a careful context-of-use analysis of the omnipresent “thumbs up gesture” in urban Brazilian settings, as has Heather Brookes for the “clever gesture” (2001) in
the South African townships, followed by contextual studies of the “drinking”, “clever”
and “money gesture” (2005), and the “HIV gesture” (Brookes 2011) in the same com-
munity. Basing his analysis on the theories of Jakobson and Goffman, Sherzer shows
that the “thumbs up” gesture combines the paradigmatic notion of “OK”, “positive”
with the syntagmatic or interactive function of “social obligation met”. This combina-
tion accounts for a multifunctional use of this emblem covering almost all functions
that Kendon (1981) had extracted from the different repertoires. According to Sherzer
the main reason for its abundant use is that the gesture expresses a key concept of Brazilian culture, representing a friendly and positive linkage between people, “a public

self-image very important to Brazilians” (Sherzer 1991: 196), who actually live in a
socially and economically divided society. A quite similar approach is Brookes’ study
(2001) on the “clever gesture” in South African townships, which expresses the concept
of being clever in the sense of “streetwise” or “city slick”, an important cultural concept
in township life. The different functions of this gesture are connected through the
semantic core: a formal reference to seeing. The core, the situational context, and the
facial expression constitute the gesture’s functions, as a warning, as a comment, or
even as a greeting. In the case of the “HIV gesture” (Brookes 2011), we can actually
observe the emergence, frequent use and decay of a gesture (see below). Here, the gesture’s use and prominence are shown to result from a taboo connected to the connotations of sex and the severity of this widespread illness, together with social communicative norms such as politeness.
The pragmatic linguist Lluís Payrató (1993, 2003, 2004, 2008, volume 2 of this handbook; Payrató, Alturo, and Payà 2004) has not only compiled a basic repertoire of Catalan emblems, used by a certain social class in Barcelona, but has also introduced solid
methods of pragmatics, sociolinguistics, cognitive linguistics, and ethnography of communication to emblem research. For Payrató, the determining feature of emblems is
their illocutionary force. Using the speech act classification of Searle (1979) to investi-
gate emblematic functions more closely, he confirmed Kendon’s results (1981) and,
moreover, was able to show a tendency toward emblematization or conventionalization.
Considering the data of the Catalan basic repertoire, it can be said that directive gestures, gestures for interpersonal control, and gestures that are based on interactive
actions seem to be the ones most likely to undergo emblematization (Payrató 1993:
206). For questions concerning the structure of an emblem repertoire, Payrató (2003)
used prototype theory, family resemblance, and relevance theory (Sperber and Wilson
1995) to account for the different relationships and meanings of single gestures or their
variants. On different occasions (Payrató 1993, especially 2001, 2004) he has argued for the adoption of diverse precise linguistic methods in gesture studies and for an opening of traditional linguistics towards the fundamental insights that gesture studies can contribute to the understanding of human communication; examples of such a fruitful integration can be seen throughout his work.

4. Emblem repertoires
The collection of emblems goes back to ancient times. Although Quintilian also ad-
dresses conventional gestures, among the first repertoires known is Bonifacio’s trea-
tise on the art of signs (L’arte de’ Cenni 1616, see Kendon 2004b: 23) together with
the works of John Bulwer (Chirologia and Chironomia [1644] 1972), both in the con-
text of gestures as the natural language of mankind. In the 19th century de Jorio’s
([1832] 2000) and Mallery’s ([1881] 2001) works stand out for their detailed descrip-
tion and ethnographic interest. Throughout the centuries, there has been a great inter-
est in collecting emblems as cultural gestures, and a detailed historic account would
exceed the scope of this article, but Kendon (1981, 2004b) offers a good summary, and Bremmer and Roodenburg (1992) present a diachronic view of gesture
use. A good overview of emblem repertoires can be found in Kendon (1981, 1996,
2004b), and, with a detailed bibliography, in Payrató (1993), for the Hispanic
tradition see Payrató (2008).

Considering the number of repertoires, one should expect theoretical insights regarding this field. Unfortunately, this is not the case, one of the major reasons
being the lack of a common set of techniques and criteria for the elicitation and hand-
ling of data (see Kendon 1981; especially Payrató 2001, 2004; Poyatos 1981) and its em-
bedding in cultural or linguistic theories. Often, the methods of data gathering are left
unclear, exceptions being Brookes (2004), Johnson, Ekman, and Friesen (1975), Morris
et al. (1979), Payrató (1993), Sparhawk (1978). Those reasons are partly responsible for
the fact that concise cultural comparisons are rather scarce. Poyatos’ (1981) review of
the findings and the methods of Green (1968), and Efron (1972), and Kendon’s
(1981) review of Morris et al. (1979) and Saitz and Cervenka (1972), are exceptions that operate on a more theoretical level.

4.1. Mono-cultural repertoires


The following lists are by no means exhaustive, but they try to include the most important
repertoires. Some European gesture repertoires are: Posner et al. (in preparation) for
Berlin, Germany; Cestero (1999), Gelabert and Martinell (1990), Green (1968), Poyatos
(1970) for Spanish in Spain; Payrató (1993) for Catalan in Barcelona; Calbris (1990, in-
cluding some contrastive findings with Hungarian and Japanese speakers), Calbris and
Montredon (1986), Wylie (1977) for French; Kreidlin (2004), Monahan (1983) for Rus-
sian; Diadori (1990), Munari (1963), Ricci Bitti (1992), Poggi (2002, 2004) for Italy; de
Jorio (2000), Paura and Sorge (2002) for Naples, Italy. For the USA there is the repertoire
of Johnson, Ekman and Friesen (1975); for Santo Domingo see Pérez (2000); for South
Africa see Brookes (2004); Sparhawk (1976, 1978 using an emblem list of Paul Ekman;
see also Johnson, Ekman, and Friesen 1975) and Seyfeddinipur (2004) for Iran; Barakat
(1973) for the gestures of the Levantine Arabs; Tumarkin (2002) for Japanese gestures.
Interestingly, and in accord with the cliché, the Mediterranean area, especially the countries with Latin heritage, seems to be very attractive for gesture research (for historical
continuities of Latin emblems, see de Jorio 2000, and Fornés and Puig 2008).

4.2. Cross-cultural and contrastive emblem collections


The following are repertoires that compare emblems either of different countries and
different languages, or, as in the case of Meo-Zilio and Mejía (1980) and Rector and Trigo (2004), one language within different geographical areas. Sociolinguistic comparisons within one language and one area are still missing. Influential collections are: Saitz
and Cervenka (1972, gestures from Colombia and the USA), Meo-Zilio and Mejía (1980,
presenting more than 2000 gestures of Spain and Latin America, and, 1986, presenting
the extralinguistic sounds to the gestures); Rector and Trigo (2004, focusing on Portu-
guese on three different continents), Nilma Nascimento Dominique (2008, comparing
Brazilian and Iberian Spanish emblems); Morris et al. (1979, a comparison of the origin,
distribution and use of a sample of 20 different gestures in 25 different European coun-
tries); Kacem (2012, comparing German and Tunisian emblems, particularly in the
school context); Creider (1977, who compared four different groups with different lan-
guages within Kenya); Efron (1972, a thorough analysis of the emblem use by Southern
Italians, and Lithuanian and Polish Jews, see above), and Safadi and Valentine (1990,
comparing gestures and nonverbal behavior of the USA and Arabic countries).

5. Cross-cultural findings
Cross-cultural findings regarding emblems can be subdivided into issues of varying complexity: differences in the meaning(s) of individual gestures, their spread and distribution; differences in cultural key concepts expressed by emblems; and finally differences
in the use, size and diversity of a gestural repertoire. The fact that – on an individual
level – emblems differ from one culture to another can be proven by the mere existence
of culture-specific dictionaries or repertoires as listed above. It is, of course, difficult to know why a certain gesture exists in this form in one area and in a different form or with another meaning in another area. Gesturers rely on the iconic interpretation of signs, which leads to widespread and very popular speculations about the origins of emblems
(see again Morris et al. 1979 for diverse etymological derivations).
Although emblems have been defined as having a clear-cut translation, they are not
restricted to one meaning, not only across cultures, but also within one culture. As
Adam Kendon (1981) observed, there seems to be a link between the range of mean-
ings and their spread. The most widespread gestures in Morris et al. (1979), like the
“nose thumb”, for instance, have only one or a few related meanings attributed to
them, while the ones with a whole range of (unrelated) meanings are geographically en-
trenched. Reasons for the spread of emblems can be seen in culture contact, common
history, common religion, beliefs, and traditions, a common language, common climate,
traveling, and the influence of modern media. None of these factors act exclusively, or
predictably. When a certain emblem is tied to a specific idiom, an interjection and the
like, it might not cross linguistic borders. When an emblem is tied to religious beliefs, its
spread will probably mirror the spread of this religion. In trying to answer the question of what keeps gestures from spreading, Morris et al. (1979: 263–265) propose, among other things, cultural prejudice barriers, linguistic barriers, ideological and religious barriers, geographical barriers, and gesture taboos – and, on a somewhat different level, the semantic characteristics of the existing repertoire, which prevent or shape the adoption of an emblem.
Close contextual and ethnographic studies such as those by Brookes (2001), Kendon
(1995) and Sherzer (1991) have shown that the frequent use of certain emblems in a
community may shed some light on important key concepts or concerns of this community. Being positive and meeting social obligations in everyday interaction is an important characteristic in urban Brazil, just as “doing” being clever and streetwise, and belonging to the right group, is in the townships of South Africa; both are concepts that need to be negotiated within everyday communication and interaction. Brookes’
(2004, 2005) collection of the gesture repertoire of South African urban young
men has similar features. By investigating the gesturers and their gesture use in their everyday surroundings, distinguishing different forms and functions in various interactional contexts, she was able to get a very detailed hold on the characteristics of this special repertoire, which belongs to an overall communicative behavior in which gesturing is a skill to be mastered as an important part of male township identity.
In order to gain cross-cultural insights, though, other, comparable investigations are
required.
Cultural differences in gesture repertoires have been presented most notably by
David Efron (1972). He observed that the Italians in his study used more pictorial ges-
tures, and had a far bigger repertoire than the Eastern European Jews. To Efron it

seemed that the Italian repertoire could serve as an exclusive means of communication
while the Jews hardly used any emblems at all, and if they did, they were not inter-
preted consistently. De Jorio (2000), Kendon (1995, 2004a, 2004b), and others have con-
firmed and described the size and diversity of the (Southern) Italian repertoire (see also
Burke 1992). While de Jorio concentrated on the historical aspect of gestures, tracing
the gestures back to ancient times, Kendon developed a theory about the overall ecol-
ogy of Naples as a reason for the abundance of conventional gestures. The dominance
of a somewhat theatrical public life, the crowded streets, the overall noise, the interest
in display and a tradition of secrecy, according to Kendon, have their share in the
emergence of this refined communication system.

6. Characteristics of emblems
The following sections will touch upon some of the most important characteristics that
are at stake when discussing emblems: their semantic domains, the emergence and
origin of emblems, their compositionality, conventionality, and their relation to speech.

6.1. Semantic domains: Meanings and functions


Emblems seem to cluster in certain semantic domains: they are used for certain functions and within certain contexts. David Efron (1972: 124) summarized the semantic domains of the Italian “gesture words”, which include gestures about bodily functions,
moral qualities, values, and attitudes, logical and affective states and superstitious
motives. When Johnson, Ekman and Friesen (1975: 343) compared their findings on
American emblems with others, they observed that most emblems were found within greetings and departures, or were insults, interpersonal directions, replies, comments on one’s own physical state, expressions of affect, and appearance. Summarizing and systematizing the findings of different repertoires (Creider 1977; Efron 1972; Morris et al.
1979; Payrató 1993; Saitz and Cervenka 1972; Sparhawk 1978; Washabaugh 1986;
Wylie 1977) on a more abstract level, Kendon (1981, 2004b) also noticed that the
great majority of emblems can be divided into three major groups according to the
messages they convey: Emblems are used for interpersonal control, such as “stop”,
“I am watching you”, “be quiet”, etc., secondly for announcing one’s own current
state, such as “I am hungry”, “I am late”, and thirdly as an evaluative description
of the action or appearances of someone else, as in “he is crazy” (see Kendon
2004b: 339). What is rare throughout most repertoires are pure nominal glosses,
such as the money gesture, where thumb and index are rubbed repeatedly, or the scissors gesture, where index and middle finger reenact the opening and closing of a pair of scissors; exceptions are presented by Sparhawk (1978) and Brookes (2004).
Brookes classified the South African emblem repertoire by adapting, among other analytical tools, the functional typology proposed by Poggi (1983, 1987) that divides emblems into holophrastic and lexical gestures, which she extends with concept gestures (see Kendon 2004a). Rather surprisingly, in the South African repertoire lexical gestures constitute the majority of emblems, dealing with the actions and objects of everyday life, such as gestures referring to a phone, a pen, or cooking, and are used for rather practical reasons. But they also include gestures that reflect the young
men’s township identity relating to typical clothing, crime, and violence and are

used for the identification of people, for commenting on them, and for threats and warnings. Brookes concludes that the functions of lexical gestures vary and that those emblems that are based on practical objects and actions fulfill a smaller range of functions than the others. Those lexical gestures seem to be close to what Kendon (2004b: chapter 10) has called narrow gloss gestures when they display substantial rather than pragmatic information. The other lexical gestures seem to be used as interactional moves, as described above; a detailed comparison of the functions of lexical emblems with other repertoires has not yet been undertaken.
As mentioned above, Payrató (1993) used Searle’s (1979) speech act classification in
order to describe the functions of the gestures in the overall Catalan repertoire, consist-
ing of emblems, pseudoemblems and other items, where the last two categories decrease
in conventionality and precision of meaning. Due to the fact that one gesture can
have multiple illocutionary values, the categories (assertives, directives, etc.) were not
seen as exclusive. The results reveal that most emblems have an assertive function, fol-
lowed by directive and expressive functions. What is even more interesting is that the
comparison of the three sets shows that the assertive function increases within the less conventional sets, while the directive function decreases. This suggests that there is a
clear correlation between gestural functions analyzed in strict linguistic terms and con-
ventionality: Gestures with a directive function tend to undergo emblematization more
easily, a trend which underlines once again the findings of Adam Kendon and others,
namely that emblems cluster around functions that are concerned “with the immediate
interaction situation” (Kendon 1981: 142).

6.2. Emergence and origin


As we have noted earlier, the exact origin of emblems remains unclear in most cases. For only two gestures, so far, can we observe the process of the origin and emergence of a conventional gesture: the "V as victory sign", as described by Schuler (1944), and "The three letters", a gesture signifying HIV in South Africa, as described by Brookes (2011). The "V as victory sign" was invented as a secret sign to unite efforts in the fight against Nazi fascism. The "HIV gesture" emerged as a sign to communicate a relevant social (health) issue that was taboo (see Brookes volume 2). Both gestures are connected to the linguistic system. In the case of the "V as victory sign", the fingers represent the letter "V" as in "victory"; in the second case, the counting gesture "three" was re-semanticised and linked to the verbal expression "the three letters", meaning HIV, which eventually faded. Here we have a process of obfuscation that starts with the verbal use of an acronym, followed by a verbal reference to the mere number of letters in this acronym, and finally by the gestural representation of this number. In both cases, though, the communicative need in the community to address something privately or secretly, while respecting social norms, seems to be relevant not only for the emergence of the gestures, but also for their change and durability. Similar processes can be assumed for the emergence of gestures for insults, directions, and the expression of attitudes, where a medium that is quiet, quickly performed, visible, and at the same time disguisable is most apt for these communicative needs.
Starting from the base of an emblem, Roland Posner (2002) describes in his rational reconstruction the ontogenesis of the emblem of "flapping one's hand". This gesture is used to express that something is hot, with all of its metaphorical and metonymical mappings, and originates in the actual burning of one's own hand as a bodily experience. Posner's semiotic and ethological analysis of the emergence of an emblem as a process of ritualization is of a more general scope because it may hold for a wider range of emblems, namely those that are based on body movements of different sorts, regardless of their communicative function. The emergence of a historic emblem, the gesture of "bound hands", from an action in a ritual context is described as a modulation in Goffman's terms (see Goffman 1974) by Müller and Haferland (1997). Similar emblems, like the "fingertip kiss", can be found in Morris et al. (1979). Having started this section with the emergence of emblems out of relevant communicative and social needs, we have come to the emergence of emblems from different bases, such as body movements or ritual actions. Further bases of emblems are other (co-speech) gestures, affect displays and expressions of feelings, adaptors, interpersonal actions, intention movements, (symbolic) objects, idioms and other linguistic expressions, and abstract entities (see Brookes 2011 for an overview, and Kendon 1981). Regarding the Catalan repertoire, Payrató (1993) concludes that gestures based on interactive actions are more likely to become emblematized than others, which, again, seems to match the overall assessment that emblems are concerned with the immediate interactional situation.

6.3. Conventionality
Emblems are conventional gestures and therefore differ from spontaneous, singular, or creative co-speech gestures. The only study, to our knowledge, that treats the conventionality of emblems in depth is the one by Barbara E. Hanna (1996). As we have sketched above, according to Hanna, emblems are conventional signs and as such they are strongly coded. Accordingly, they have a standard of form and a notion of generality. For Hanna, an emblem is a replica of a type that is already known and that specifies both form and meaning. Because of the strong coding, neither an analogous link to the object represented nor a specific context is necessary. While for Hanna convention is essential to the functioning of every sign, what makes emblems specific is "that the interpretation of emblems is governed by strong habits, that emblems are ruled by strong conventions, thus being conventionalized to the point of generality" (1996: 346). Emblems are, however, a category of gestures with fuzzy edges, and conventionality is not an exclusive characteristic of them.
Kendon's continuum, or continua (McNeill 1992, 2000; see also Kendon 2004b), was a way to characterize different gesture types along a continuum comprising their relationship to speech, their linguistic properties, their conventionality, and the character of their semiosis. In this tradition, emblems lie between the signs of a sign language and gesticulation, or spontaneous idiosyncratic gestures. The relationship between gestures and signs has recently been reconsidered by Wilcox (2005, this volume) and Kendon (2008), inter alia, insofar as the interconnections, rather than the divide, are foregrounded. This line of research might open up new perspectives on the question of the conventionality of emblems. From the perspective of co-speech gestures, Kendon's work has been influential yet again. As mentioned above, Kendon's (1995) comparative study of emblems and apparently conventional co-speech gestures, which were used primarily with pragmatic functions, initiated the investigation of what has been called recurrent gestures (Ladewig 2011a, volume 2; Müller 2010, this volume).
Although further research is needed, it appears that there are fundamental overlaps between emblems and recurrent gestures, such as the "palm-up-open-hand-gesture" that presents something on the open hand (see Müller 2004). An experimental study by Fey Parrill (2008) comparing the "palm-up-open-hand-gesture" with the "OK" emblem reveals similarities and differences between the two types. While the emblem had a more restricted range of usages, both gestures were acknowledged to have formal variants. Interestingly, standards of well-formedness could not be confirmed for either gesture. More insights on the subject of conventionality of emblems can be found in studies
about the process of emblematization such as the ones by Brookes (2011) and Payrató
(1993). Comparing the three sets of gestures in his repertoire, Payrató was able to con-
clude that “directive gestures, interpersonal control gestures, and gestures based on
interactive actions are the least restrained by the filters in the basic repertoire of Cat-
alan emblems; therefore, they seem to be more likely than any others to reach the
highest level of emblematization or conventionalization of body action” (Payrató
1993: 206).

6.4. Compositionality
Another characteristic of emblems is their basic compositionality, meaning that an emblem can consist of more than one formal gestural component, as, for example, when the "thumbs up" gesture is moved repeatedly towards the interlocutor, combining a significant hand configuration with a movement pattern (Calbris 1990, 2003; Kendon 1995, 2004b; McNeill 1992, 2000; Poggi 2002; Sparhawk 1978). The results of Sparhawk's analysis show that although she could confirm a set of contrasting elements, and even some minimal pairs, in the Persian data, these differ notably from the contrastive systems of sign languages. Rebecca Webb (1996) has undertaken a similar approach toward so-called metaphoric gestures. Her findings suggest a small set of "morpheme-like" components that can be recombined with other components.
Compositionality can also mean that an emblem consists of a hand gesture and a facial expression (Calbris 1990; Payrató 2003; Poggi 2002; Poyatos 1981; Ricci Bitti 1992; Sparhawk 1978, inter alia). The importance of the facial component in emblems has been shown by Poggi (2002: 80). In order to decide whether an emblem represents a fixed communicative act or not, she combined it with different performative faces to see whether they match or mismatch the gestural function. If variations are possible, it is an articulated emblem; if only one facial expression is valid, it is a holophrastic emblem.
In a third interpretation, compositionality might mean that two emblems combine into a new one (Calbris 1990; Johnson, Ekman, and Friesen 1975; Morris et al. 1979). Such cases are rare, but Morris et al. report a combination of the "flat hand-chop threat emblem" with the "ring", and the combination of the "fig" or "horn gesture" with the "forearm jerk", so as to double the impact of the insult (Morris et al. 1979: 267). Somewhat differently, it may mean that an emblem is used with a sound, which can be a paralinguistic sound made by the mouth, the hand, or another articulator, or an interjection, for instance (Calbris 1990; Meo-Zilio 1986; Posner 2002; Poyatos 1981). In the case of the "flapping hand gesture" presented by Posner, the original sound of blowing onto the burnt hand and of taking a deep breath develops towards linguistic articulation, ending in two interjections, each of them leading to a different interpretation of the overall gesture: while one refers to the danger of something, the other refers to its fascination.

6.5. Relation to speech


Similar to the issue of compositionality, the emblem's relation to speech can be subdivided. First, and perhaps foremost, it concerns the emblem's relationship to the ongoing verbal discourse. As referred to earlier, the absence of speech has become a widely adopted criterion for emblems. But, as Poyatos (1981: 39–40) observed, emblems seem to be generally used together with speech, at least within Hispanic culture, and Kendon (2008: 360) points to similar observations for Naples. From the opposite perspective, the studies of Ladewig (2011b) and Andrén (2010) have shown that gestures that are not conventional can fit perfectly into syntactic slots where there is no speech, or can form utterances by gesture alone. Interestingly, empirical investigations of the use of emblems with or without concurrent speech are still lacking.
Another way of regarding the relation between emblems and speech concerns emblems that develop on the basis of an idiom, acronym, or other linguistic expression and are therefore language dependent rather than culture dependent, a criterion proposed by Payrató (2008). This distinction is especially useful when emblems are investigated cross-culturally in areas like the Mediterranean, where linguistic, cultural, and national borders have spread in different ways and where culture and language contact take place on a daily basis.
A last way of looking at emblems and their relation to speech is to compare the characteristics of the two communicative systems, speech and gesture. Brookes (2011) has addressed attitudes towards gesture, in contrast to speech. The emblem for HIV, which is established by reference to the spoken acronym, benefits from the fact that "gesture is seen as a secondary, and indirect source, an act of 'non-saying' and thereby respecting social values" (Brookes 2011: 211). Since gesture is not the only communicative system, users feel freer to play around with it (see Calbris 1990 for similar considerations). Not many authors have asked why people use emblems if they are like words, although this appears to be a central question. Adam Kendon (1981, 2004b) proposes the following properties of gesture as possible reasons: gestures are quick to perform, can express complex concepts and interactional moves in silence, and do not consume vast amounts of communicative energy. These features make gestures suitable for "encounters that are fleeting" (Kendon 2004b: 343), but also for side exchanges within a conversation, or for secret exchanges. Gestures are visible, which makes them apt for communicative exchanges at a distance.

7. Concluding remarks
Throughout this article, we have tried to sketch the characteristics of emblems as a presumed class of conventional gestures. What is fascinating about them is that they "act as conveyors of meaning in their own right", as Kendon puts it (1981: 146). Regarding them as mere word-substitutes has not only obscured their functions within communication, but has also distracted attention from their versatility and dynamics. More recently, studies from different theoretical backgrounds seem to have overcome this constraint. In some areas, though, such as gesture acquisition, gesture processing, and most of the psycholinguistic tradition, emblems need to receive more attention. Besides more ethnographic and contextual studies, what is essential for future research on emblems is the development of scientific standards that allow for a true comparison of emblem repertoires.

Acknowledgements
I would like to thank Jeffrey Wollock for his insights on emblems in the Renaissance
and Cornelia Müller for helpful comments on earlier versions of this chapter.

8. References
Andrén, Mats 2010. Children’s gestures from 18 to 30 months. Ph.D. thesis, Centre for Languages
and Literature, Lund University.
Bacon, Francis 1640. Of the Advancement and Proficience of Learning. Book VI. Oxford: Young
and Forrest.
Barakat, Robert A. 1973. Arabic gestures. Journal of Popular Culture 4: 749–793.
Bremmer, Jan and Herman Roodenburg (eds.) 1992. A Cultural History of Gesture. Ithaca, NY:
Cornell University Press.
Brookes, Heather 2001. The case of the clever gesture. Gesture 1(2): 167–184.
Brookes, Heather 2004. A repertoire of South African quotable gestures. Journal of Linguistic
Anthropology 14(2): 186–224.
Brookes, Heather 2005. What gestures do: Some communicative functions of quotable gestures in
conversations among Black urban South Africans. Journal of Pragmatics 32: 2044–2085.
Brookes, Heather 2011. Amangama amathathu ‘The three letters’. The emergence of a quotable
gesture (emblem). Gesture 11(2): 194–217.
Brookes, Heather volume 2. Gestures and taboo. In: Cornelia Müller, Alan Cienki, Ellen Fricke,
Silva H. Ladewig, David McNeill and Jana Bressem (eds.), Body – Language – Communi-
cation: An International Handbook on Multimodality in Human Interaction. (Handbooks of
Linguistics and Communication Science 38.2.) Berlin: De Gruyter Mouton.
Bulwer, John 1974. Chirologia or the Natural Language of the Hand, etc. (and) Chironomia or the Art of Manual Rhetoric, etc. Carbondale: Southern Illinois University Press. First published [1644].
Burke, Peter 1992. The language of gesture in early modern Italy. In: Jan Bremmer and Herman
Roodenburg (eds.), A Cultural History of Gesture, 71–83. Ithaca, NY: Cornell University Press.
Calbris, Geneviève 1990. The Semiotics of French Gestures. Bloomington: Indiana University Press.
Calbris, Geneviève 2003. From cutting an object to a clear cut analysis: Gesture as the representa-
tion of a preconceptual schema linking concrete actions to abstract notions. Gesture 3(1): 19–46.
Calbris, Geneviève and Jacques Montredon 1986. Des Gestes et des Mots Pour le Dire. Paris: Clé
International.
Castelfranchi, Cristiano and Domenico Parisi 1980. Linguaggio, Conoscenze e Scopi. Bologna:
Il Mulino.
Cestero, Ana Marı́a 1999. Repertorio Básico de Signos no Verbales del Español. Madrid: Arco Libros.
Creider, Chet A. 1977. Towards a description of East African Gestures. Sign Language Studies 14:
1–20.
De Jorio, Andrea 2000. Gesture in Naples and Gesture in Classical Antiquity. A translation of La
mimica degli antichi investigata nel gestire napoletano (Fibreno, Naples 1832), with an introduc-
tion and notes by Adam Kendon. Bloomington: Indiana University Press.
Diadori, Pierangela 1990. Senza Parole: 100 Gesti degli Italiani. Rome: Bonacci Editore.
Eco, Umberto 1976. A Theory of Semiotics. Bloomington: Indiana University Press.
Efron, David 1972. Gesture, Race and Culture. The Hague: Mouton. First published [1941].
Ekman, Paul and Wallace V. Friesen 1969. The repertoire of nonverbal behavior: Categories, ori-
gins, usage, and coding. Semiotica 1: 49–98.
Ekman, Paul and Wallace V. Friesen 1972. Hand movements. Journal of Communication 22: 353–374.
Fornés, Maria Antònia and Mercè Puig 2008. El Porqué de Nuestros Gestos. La Roma de Ayer en
la Gestualidad de Hoy. Palma: Edicions Universitat de les Illes Balears.
Gelabert, Marı́a José and Emma Martinell 1990. Diccionario de Gestos con sus Usos Más Usuales.
Madrid: Edelsa.
Goffman, Erving 1974. Frame Analysis. An Essay on the Organization of Experience. Cambridge,
MA: Harvard University Press.
Green, Jerald R. 1968. Gesture Inventory for the Teaching of Spanish. Philadelphia: Chilton Books.
Hanna, Barbara E. 1996. Defining the emblem. Semiotica 112(3/4): 289–358.
Johnson, Harold G., Paul Ekman and Wallace Friesen 1975. Communicative body movements:
American emblems. Semiotica 15(4): 335–353.
Kacem, Chaouki 2012. Gestenverhalten an Deutschen und Tunesischen Schulen. Ph.D. thesis,
Technical University, Berlin. URN: urn:nbn:de:kobv:83-opus-34158. URL: http://opus.kobv.de/tuberlin/volltexte/2012/3415/
Kendon, Adam 1981. Geography of gesture. Semiotica 37(1–2): 129–163.
Kendon, Adam 1983. Gesture and speech: How they interact. In: John M. Wieman and Randall P.
Harrison (eds.), Nonverbal Interaction, 13–45. Beverly Hills, CA: Sage.
Kendon, Adam 1984. Did gesture have the happiness to escape the curse at the confusion of
Babel? In: Aaron Wolfgang (ed.), Nonverbal Behavior: Perspectives, Applications, Intercultural
Insights, 75–114. Lewiston, NY: C. J. Hogrefe.
Kendon, Adam 1986. Some reasons for studying gestures. Semiotica 62: 3–28.
Kendon, Adam 1988. How gestures can become like words. In: Fernando Poyatos (ed.), Cross-
Cultural Perspectives in Nonverbal Behavior, 131–141. Toronto: C. J. Hogrefe.
Kendon, Adam 1992. Some recent work from Italy on quotable gestures (emblems). Journal of
Linguistic Anthropology 2(1): 92–108.
Kendon, Adam 1995. Gestures as illocutionary and discourse structure markers in Southern Ital-
ian conversation. Journal of Pragmatics 23: 247–279.
Kendon, Adam 1996. An agenda for gesture studies. Semiotic Review of Books 7(3): 7–12.
Kendon, Adam 2004a. Contrasts in gesticulation. A British and a Neapolitan speaker compared.
In: Cornelia Müller and Roland Posner (eds.), The Semantics and Pragmatics of Everyday Ges-
tures. Proceedings of the Berlin Conference, April 1998, 173–193. Berlin: Weidler.
Kendon, Adam 2004b. Gesture: Visible Action as Utterance. Cambridge: Cambridge University Press.
Kendon, Adam 2008. Some reflections on the relationship between ‘gesture’ and ‘sign’. Gesture 8(3):
348–366.
Kreidlin, Grigori E. 2004. The Russian dictionary of gestures. In: Cornelia Müller and Roland
Posner (eds.), The Semantics and Pragmatics of Everyday Gestures. Proceedings of the Berlin
Conference, April 1998, 173–193. Berlin: Weidler.
Ladewig, Silva H. 2011a. Putting a recurrent gesture on a cognitive basis. CogniTextes 6. http://cognitextes.revues.org/406.
Ladewig, Silva H. 2011b. Syntactic and semantic integration of gestures into speech: Structural, cog-
nitive, and conceptual aspects. Ph.D. thesis, European University Viadrina, Frankfurt (Oder).
Ladewig, Silva H. volume 2. Recurrent gestures. In: Cornelia Müller, Alan Cienki, Ellen Fricke,
Silva H. Ladewig, David McNeill and Jana Bressem (eds.), Body – Language – Communica-
tion: An International Handbook on Multimodality in Human Interaction. (Handbooks of
Linguistics and Communication Science 38.2.) Berlin: De Gruyter Mouton.
Mallery, Garrick 2001. Sign Language among North American Indians. New York: Dover. First
published [1881].
McNeill, David 1992. Hand and Mind. What Gestures Reveal about Thought, 2nd edition. Chicago: University of Chicago Press.
McNeill, David 2000. Introduction. In: David McNeill (ed.), Language and Gesture, 1–10.
Chicago: University of Chicago Press.
McNeill, David 2005. Gesture and Thought. Chicago: University of Chicago Press.
Meo-Zilio, Giovanni 1986. Expresiones extralingüı́sticas concomitantes con expresiones gestuales
en el español de América. In: Sebastian Neumeister (ed.), Actas del IX Congreso de la Asocia-
ción Internacional de Hispanistas.
Meo-Zilio, Giovanni and Silvia Mejı́a 1980. Diccionario de Gestos: España e Hispanoamérica.
Bogota: Instituto Caro y Cuervo.
Monahan, Barbara 1983. A Dictionary of Russian Gestures. Ann Arbor, MI: Hermitage.
Morris, Desmond 2002. Peoplewatching. London: Vintage.
Morris, Desmond, Peter Collett, Peter Marsh and Marie O’Shaughnessy 1979. Gestures. Their Ori-
gins and Distributions. New York: Stein and Day.
Müller, Cornelia 1998. Redebegleitende Gesten. Kulturgeschichte – Theorie – Sprachvergleich. Ber-
lin: Arno Spitz.
Müller, Cornelia 2004. The Palm-Up-Open-Hand. A case of a gesture family? In: Cornelia Müller
and Roland Posner (eds.), The Semantics and Pragmatics of Everyday Gestures. Proceedings of
the Berlin Conference, April 1998, 233–256. Berlin: Weidler.
Müller, Cornelia 2010. Wie Gesten bedeuten. Eine kognitiv-linguistische und sequenzanalytische
Perspektive. In: Sprache und Literatur 41(1): 37–68. Munich: Fink.
Müller, Cornelia this volume. Linguistics: Gestures as a medium of expression. In: Cornelia Müller,
Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Sedinha Teßendorf (eds.), Body –
Language – Communication: An International Handbook on Multimodality in Human Interaction.
(Handbooks of Linguistics and Communication Science 38.1.) Berlin: De Gruyter Mouton.
Müller, Cornelia and Harald Haferland 1997. Gefesselte Hände. Zur Semiose performativer Ges-
ten. Mitteilungen des Deutschen Germanistenverbandes 44(3): 29–53.
Munari, Bruno 1963. Supplemento al Dizionario Italiano. Milan: Muggiani.
Nascimento Dominique, Nilma 2008. Inventario de emblemas gestuales españoles y brasileños.
Language Design 10: 5–75.
Parrill, Fey 2008. Form, meaning and convention: An experimental examination of metaphoric
gestures. In: Alan Cienki and Cornelia Müller (eds.), Metaphor and Gesture, 225–247. Amster-
dam: John Benjamins.
Paura, Bruno and Marina Sorge 2002. Comme te L’aggia Dicere? Ovvero L’arte Gestuale a Napoli.
Naples: Intra Moenia.
Payrató, Lluís 1993. A pragmatic view on autonomous gestures: A first repertoire of Catalan emblems. Journal of Pragmatics 20: 193–216.
Payrató, Lluís 2001. Methodological remarks on the study of emblems: The need for common elicitation procedures. In: Christian Cavé, Isabelle Guaitella and Serge Santi (eds.), Oralité et Gestualité: Interactions et Comportements Multimodaux dans la Communication, 262–265. Paris: Harmattan.
Payrató, Lluís 2003. What does 'the same gesture' mean? A reflection on emblems, their organization and their interpretation. In: Monica Rector, Isabella Poggi and Nadine Trigo (eds.), Gestures, Meaning and Use, 73–81. Porto: Fernando Pessoa University Press.
Payrató, Lluís 2004. Notes on pragmatic and social aspects of everyday gestures. In: Cornelia Müller and Roland Posner (eds.), The Semantics and Pragmatics of Everyday Gestures. Proceedings of the Berlin Conference, April 1998, 103–113. Berlin: Weidler.
Payrató, Lluís 2008. Past, present, and future research on emblems in the Hispanic tradition: Preliminary and methodological considerations. Gesture 8(1): 5–21.
Payrató, Lluís volume 2. Emblems or quotable gestures: Structures, categories, and functions. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Jana Bressem (eds.), Body – Language – Communication: An International Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics and Communication Science 38.2.) Berlin: De Gruyter Mouton.
Payrató, Lluís, Núria Alturo and Marta Payà (eds.) 2004. Les Fronteres del Llenguatge. Lingüística i Comunicació No Verbal. Barcelona: Promociones y Publicaciones Universitarias.
Peirce, Charles Sanders 1960. Collected Papers of Charles Sanders Peirce (1931–1958), Volume I:
Principles of Philosophy, Volume II: Elements of Logic, edited by Charles Hartshorne and Paul
Weiss. Cambridge, MA: Belknap Press of Harvard University Press.
Pérez, Faustino 2000. Diccionario de Gestos Dominicanos. Santo Domingo, Republica Domeni-
cana: Faustino Pérez.
Pike, Kenneth L. 1947. Phonemics: A Technique for Reducing Languages to Writing. Ann Arbor:
University of Michigan Press.
Poggi, Isabella 1983. La mano a borsa: Analisi semantica di un gesto emblematico olofrastico. In: Gra-
zia Attili and Pio Enrico Ricci Bitti (eds.), Comunicare Senza Parole, 219–238. Rome: Bulzoni.
Poggi, Isabella (ed.) 1987. Le Parole nella Testa: Guida a un'Educazione Linguistica Cognitivista. Bologna: Il Mulino.
Poggi, Isabella 2002. Symbolic gestures. The case of the Italian gestionary. Gesture 2(1): 71–98.
Poggi, Isabella 2004. The Italian gestionary. Meaning representation, ambiguity, and context. In:
Cornelia Müller and Roland Posner (eds.), The Semantics and Pragmatics of Everyday Ges-
tures. Proceedings of the Berlin Conference, April 1998. Berlin: Weidler.
Poggi, Isabella 2007. Mind, Hands, Face and Body: A Goal and Belief View of Multimodal Com-
munication. Berlin: Weidler.
Poggi, Isabella and Emanuela Magno Caldognetto 1997. Mani Che Parlano. Padova: Unipress.
Poggi, Isabella and Marina Zomparelli 1987. Lessico e grammatica nei gesti e nelle parole. In: Isabella Poggi (ed.), Le Parole nella Testa: Guida a un'Educazione Linguistica Cognitivista, 291–328. Bologna: Il Mulino.
Posner, Roland 2002. Everyday gestures as a result of ritualization. In: Monica Rector, Isabella
Poggi and Nadine Trigo (eds.), Gestures. Meaning and Use, 217–230. Porto: Fernando Pessoa
University Press.
Posner, Roland, Reinhard Krüger, Thomas Noll and Massimo Serenari in preparation. The Berlin
Dictionary of Everyday Gestures. Berlin: Weidler.
Poyatos, Fernando 1970. Kinésica del español actual. Hispania 53: 444–452.
Poyatos, Fernando 1981. Gesture inventories: Fieldwork methodology and problems. In: Adam
Kendon (ed.), Nonverbal Communication, Interaction, and Gesture. Selections from Semiotica,
371–400. The Hague: Mouton.
Rector, Monica and Salvato Trigo 2004. Body signs: Portuguese communication on three conti-
nents. In: Cornelia Müller and Roland Posner (eds.), The Semantics and Pragmatics of Every-
day Gestures. Proceedings of the Berlin Conference, April 1998, 195–204. Berlin: Weidler.
Ricci Bitti, Pio Enrico 1992. Facial and manual components of Italian symbolic gestures. In:
Fernando Poyatos (ed.), Advances in Nonverbal Communication, 187–196. Amsterdam: John
Benjamins.
Safadi, Michaela and Carol Ann Valentine 1990. Contrastive analysis of American and Arab non-
verbal and paralinguistic communication. Semiotica 82(3–4): 269–292.
Saitz, Robert L. and Edward J. Cervenka 1972. Handbook of Gestures. The Hague: Mouton.
Schuler, Edgar A. 1944. V for victory: A study in symbolic social control. Journal of Social Psy-
chology, 19: 283–299.
Searle, John R. 1979. Expression and Meaning. Studies in the Theory of Speech Acts. Cambridge:
Cambridge University Press.
Seyfeddinipur, Mandana 2004. Meta-discursive gestures from Iran: Some uses of the ‘Pistol Hand’.
In: Cornelia Müller and Roland Posner (eds.), The Semantics and Pragmatics of Everyday Ges-
tures. Proceedings of the Berlin Conference, April 1998, 205–216. Berlin: Weidler.
Sherzer, Joel 1991. The Brazilian thumbs-up gesture. Journal of Linguistic Anthropology 1(2):
189–197.
Sparhawk, Carol M. 1976. Linguistics and gesture: An application of linguistic theory to the study
of Persian emblems. Ph.D. thesis, The University of Michigan.
Sparhawk, Carol M. 1978. Contrastive-identificational features of Persian gesture. Semiotica 24(1/2): 49–85.
Sperber, Dan and Deirdre Wilson 1995. Relevance: Communication and Cognition, 2nd edition.
Oxford: Blackwell.
Stokoe, William C. 1960. Sign Language Structure. Buffalo, NY: Buffalo University Press.
Tumarkin, Petr S. 2002. On a dictionary of Japanese gesture. In: Monica Rector, Isabella Poggi and
Nadine Trigo (eds.), Gestures. Meaning and Use. Porto: Fernando Pessoa University Press.
Webb, Rebecca 1996. Linguistic features of metaphoric gestures. Ph.D. thesis, University of Rochester.
Wilcox, Sherman 2005. Routes from gesture to language. Revista da Abralin 4(1/2): 11–45.
Wilcox, Sherman this volume. Speech, sign, and gesture. In: Cornelia Müller, Alan Cienki, Ellen
Fricke, Silva H. Ladewig, David McNeill and Sedinha Teßendorf (eds.), Body-Language-Com-
munication: An International Handbook on Multimodality in Human Interaction (Handbooks
of Linguistics and Communication Science 38.1.) Berlin: De Gruyter Mouton.
Wollock, Jeffrey this volume. Renaissance philosophy: Gesture as universal language. In: Cornelia Mül-
ler, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Sedinha Teßendorf (eds.),
Body-Language-Communication: An International Handbook on Multimodality in Human Interac-
tion. (Handbooks of Linguistics and Communication Science 38.1.) Berlin: De Gruyter Mouton.
Wundt, Wilhelm 1900. Völkerpsychologie: Eine Untersuchung der Entwicklungsgesetze von
Sprache, Mythus und Sitte. Volume 1: Die Sprache, Part 1. Leipzig: Engelmann.
Wylie, Lawrence 1977. Beaux Gestes: A Guide to French Body Talk. Cambridge, MA: Undergrad-
uate Press.

Sedinha Teßendorf, Frankfurt (Oder) (Germany)

5. Framing, grounding, and coordinating conversational interaction: Posture, gaze, facial expression, and movement in space
1. Introduction
2. Background
3. Posture
4. Gaze
5. Facial expression
6. Movement
7. The interplay of body and talk in interaction
8. Conclusion
9. References

Abstract
This chapter examines several forms of embodied action in interaction. After discussing
the historical emergence of an interactionist approach to embodied action from early
figures in American anthropology, to the Palo Alto group, to present day conversation
analysts, it considers research on body posture, gaze, facial expression, and movement
in space for their distinct contributions to the moment-by-moment production and
management of conversational interaction. Then, the chapter examines the interplay of
these particular forms of embodied action in recurrent interactional activities, using as
examples openings and storytelling. As is demonstrated through these examples, the variety of embodied actions that participants make use of in interaction is part of an extraordinarily powerful yet nuanced toolkit for differentiating their work as particular sorts of
participants (i.e., as speaker and recipient, storyteller and story recipient, doctor and
patient, etc.), and in the particular sorts of interactional, interpersonal, and institutional
business that comprises encounters.

1. Introduction
The framing, grounding, and coordination of conversational interaction is a nuanced
and complex enterprise, one that is made possible in large part by the relative flexibility
of the human body. The head, eyes, mouth, face, torso, legs, arms, hands, fingers, and
even the feet comprise moveable elements of the human body that can be arranged
and mobilized in conjunction with talk in a potentially limitless variety of configura-
tions. These configurations convey participants’ readiness to interact; the nature and
quality of their relationships; the current and unfolding tenor of the immediate interac-
tion; as well as the moment-by-moment differentiation of their identities as speakers
and hearers, storytellers and story recipients, doctors and patients, and other such iden-
tities that are associated with a variety of interactional, interpersonal, and institutional
activities in interaction. These are activities that are constituted via the particulars of
participants’ speech and body movements as being recognizably about something, as
being directed toward some end, and they comprise the frameworks for meaning and
action in interaction.

2. Background
The study of the body and speech in interaction as a detailed, naturalistic endeavor
owes its beginnings to a confluence of figures from several disciplines and the emer-
gence of technologies – namely, film and video – capable of capturing behaviors that
appear one moment and disappear the next. By most accounts, the meeting of scholars
at Stanford University’s Center for Advanced Study in the Behavioral Sciences in 1955,
and extending via briefer meetings through the late 1960s, marks a pivotal point in this
area of study (Kendon 1990; Leeds-Hurwitz 1987), as does the work of sociologist Erving
Goffman, and later, of conversation analysts.
The Stanford group, sometimes referred to as the Palo Alto group, included such
early and mid-twentieth century figures as psychiatrists Frieda Fromm-Reichmann
and Henry Brosin; linguists Charles Hockett and Norman McQuown; and anthropolo-
gists Alfred Kroeber, Gregory Bateson and Ray Birdwhistell, among others (for a more
complete list, see Leeds-Hurwitz 1987). These figures had, in part, inherited their inter-
est in culture and communication, including gesture and body motion, and the desire to
study these phenomena closely through film, from an earlier generation of anthropolo-
gists who included Franz Boas and Edward Sapir, and, later, Margaret Mead (the latter
two being Boas’s students); they were also influenced by figures in cybernetics and informa-
tion theory. The group’s initial goal was to use film to understand the role of nonverbal
behavior in the treatment of psychiatric patients, but their work came to be associated
with the emergence of a research approach that treated communication as an integrated
system of embodied as well as linguistic behaviors that take on meaning not in isolation,
but in the contexts of other behaviors and events. The approach came to be known as
the structuralist approach to communication or “context analysis” (Kendon 1990:
15). Its influence can be seen in the work of a number of scholars working from the
1960s onward including Albert Scheflen, William Condon, Starkey Duncan, and Erik
Erikson (discussed in Leeds-Hurwitz 1987: 31–32). Kendon, one of the most prolific
contributors to an interactionist approach to the study of body and speech, was a late-
comer to the Palo Alto group and notes not only its influence on subsequent interac-
tionist research, but also the influence of Erving Goffman and conversation analysts
(Kendon 1990: 38–41 and 44–49).
On a different tack, Erving Goffman approached the study of interaction not with
the methods afforded by film and microanalysis, but rather through astute ethnographic
observation and anecdote. He is to be credited with the championing of interaction as
an object of inquiry in its own right within sociology (Goffman 1983), and of providing
an analytic apparatus for understanding the basic organization of interaction with, for
example, his conceptualizations of the working consensus; participation frameworks;
the management of dominant and subordinate involvements; and face work (Goffman
1963, 1981). With Goffman, we understand the basic performativity involved in interac-
tional processes. Goffman’s students, Harvey Sacks and Emanuel Schegloff, sometimes
at odds with their teacher, went on with their colleague Gail Jefferson to found conver-
sation analysis (see e.g., Sacks, Schegloff, and Jefferson 1974; Schegloff and Sacks 1973).
This approach, also informed by the central interest of ethnomethodology in everyday
sensemaking (Garfinkel 1967; Heritage 1984), involves the rigorously empirical and
detailed study of conversational interaction using recorded, naturally-occurring
data. It has, since the 1980s, been influential in a number of investigations into speech
and body movement carried out by such scholars as Charles and Marjorie Harness
Goodwin, Christian Heath, Jürgen Streeck, Lorenza Mondada, Curtis LeBaron, and
others.
To be considered in this chapter are the findings of some of the scholars mentioned
above and others: first for their work on how particular forms of body behavior contrib-
ute to interaction; and then, in more detail, for how these behaviors work in concert
toward the accomplishment of recurrent interactional activities such as openings and
storytelling. It should be noted that while much of the research on embodied action
in interaction to date focuses primarily on native English speakers of American and
British background, some of the work represented in this chapter draws on interactants
from other countries, such as Japan, Italy, Finland, and Papua New Guinea, suggesting,
perhaps, that at least some uses of the body in interaction are cross-culturally
consistent.

3. Posture
When two or more people interact, they arrange their bodies to communicate their or-
ientations to engagement. The “ecological huddle” in Goffman’s terms (1961), or F-for-
mation in Kendon’s (1990), is the positioning of one’s body toward another (or others)
for interaction, and in ways that convey varying degrees of involvement in any number
of other, possibly competing, activities and events. With their body arrangements,
participants create a “frame” of engagement and visibly display their alignment toward
one another as interactants.
As a number of researchers have noted, the human body provides a segmentally or-
ganized hierarchy of resources for communicating participants’ engagement in interac-
tion (Goffman 1963; C. Goodwin 1981; M. H. Goodwin 1997; Kendon 1990; Robinson
1998; Schegloff 1998). The head, torso, and legs especially can be arranged to convey
different points of attentional focus: for example, the head can be oriented in one direc-
tion, the torso in another and the legs in yet another. When these body segments are
aligned in the same direction, a single dominant orientation is communicated; when
they are not, they communicate multiple simultaneous orientations that are ranked
in accord with the relative stability of each body segment. Put another way, the most
stable of these segments, the legs, communicates a person’s dominant orientation rela-
tive to the torso and the head, while the torso communicates a more dominant orien-
tation relative to the head. Schegloff (1998) writes that when these body segments
are arranged divergently, and as such communicate multiple simultaneous involve-
ments, they convey a postural instability that projects a resolution in terms of moving,
for example, the least stable segment, the head, back into alignment with the more sta-
ble segments, the torso and the legs. Thus, a person’s fleeting and transitory involve-
ments are communicated as such relative to their more primary and long term
involvements, and this has important consequences for the forwarding and, alternately
holding off, of interaction.
Schegloff (1998) finds, for example, that co-participants to a conversation treat the
unstable, or “torqued,” body posture of their interlocutors as cause for limiting expan-
sion of a sequence of talk, as when the co-participant turns her head but not her lower
body to engage in talk; alternately, he finds that the alignment of the lower body with
the torqued head can be cause for sequence expansion. As another case in point, in
medical consultations, Robinson (1998) reports that patients entering the consultation
room may find that the doctor, who is seated at his desk, has turned his head to
greet them, although his legs and torso remain directed forward, oriented to the med-
ical records on the desk in front of him. In this way, the doctor’s body, representing a
hierarchy of differentially aligned segments, projects his initial engagement with the
patient as fleeting, and a return to the business with the records as an impending and
dominant involvement – although in the activity context of this encounter, a return
to interaction with the patient is projectably imminent. Patients are sensitive to this
matter: when the doctor turns back to the medical records, they occupy themselves
with such activities as settling in (e.g., shutting the door and taking a seat). When the
doctor is ready to begin the business proper, he will typically turn and orient his entire
body toward the patient, that is, with head, torso, and lower body simultaneously
aligned, and produce a topic initiating utterance such as “what’s the problem?” or
“what can we do for you today?”

4. Gaze
Gaze, too, is an integral element in the communication of participants’ orientations to
engagement, and works in concert with body posture. Looking at another, and another’s
looking back, is a critical step in the move from “sheer and mere co-presence” to rati-
fied mutual engagement, and people may avoid others’ gazes, and/or avoid directing
their own gaze to others, to discourage interaction (Goffman 1963). The management of
speaker-recipient roles, once interaction has begun, has been taken up by a number of
researchers (e.g., Argyle and Cook 1976; Bavelas, Coates, and Johnson 2002; Egbert
1996; C. Goodwin 1981; M. H. Goodwin 1980; Hayashi 2005; Kendon 1967, 1990; Kid-
well 1997, 2006; Lerner 2003; Rossano, Brown, and Levinson 2009; Streeck 1993, 1994).
Speakers and recipients do not typically gaze at one another continuously, but intermit-
tently: recipients gaze toward speakers as an indication of their attentiveness to talk,
and speakers direct their gaze to recipients to show that talk is being addressed to
them; recipients typically gaze for a longer duration at speakers, and speakers for
shorter duration (they tend to look away during long turns at talk as when telling a
story; C. Goodwin 1981; Kendon 1967, 1990). When speakers do not have the gaze
of a recipient, they may produce cut-offs, re-starts, and other dysfluencies until they
secure the recipient’s gaze (C. Goodwin 1981). Speakers may also produce such actions
as tapping or touching the other, bringing their own face and eyes into the other’s line
of regard, and, in some cases, even taking hold of the other’s face and turning it toward
their own; these are actions that are linked with efforts to remediate an encounter with
a resistant and/or unwilling interactant (Kidwell 2006). Recipients, too, may take action
to get a speaker to begin talk or address ongoing talk to them, for example by directing
their gaze (i.e., a show of recipiency) to the would-be speaker, making a sudden body
movement, or contacting the other’s body via some manner of touching (Heath 1986;
Kidwell 1997).

5. Facial expression
The face, while an important topic of study in psychological approaches to body com-
munication (especially, for example, in the work of Ekman), has often been overlooked
as an element in the coordination and management of conversational interaction. The
great mobility of the face, along with the speed (i.e., relative to other body parts) with
which it can be deployed in interaction, makes it an especially useful resource as both a
stand-in for, and elaborator of, talk. There is a rich line of research into the syntactic
and semantic functions of the face in conjunction with speech (e.g., Bavelas and Chovil
1997; Birdwhistell 1970; Chovil 1991; Ekman, Sorenson, and Friesen 1969). However,
the face can also be used as a means of regulating talk and other interactional activities.
Kendon (1990) writes of the face in a “kissing round” between a man and a woman sit-
ting on a park bench, examining how the face, particularly the woman’s, regulates the
approach and orientations of the male. While Kendon notes a number of types of facial
expressions (for example “dreamy look” and “innocent look”), he specifically notes
that a closed-lip smile by the woman invites kissing, while a teeth-exposed smile does
not. In this way, the woman’s face serves as a resource for projecting not only what
she will do next (i.e., kiss or not kiss), but also what she will allow the male to do. In
conversational openings, Pillet-Shore (2012) notes that the face, particularly smiling
in conjunction with greetings, is used to “do being warm” at the outset of an encounter,
and invites further interaction. The face may be displayed prominently in interaction,
particularly for the role it plays in the expression of positive affect, but it may also
be shielded in interaction, particularly when it is used in expressions of grief. Thus, par-
ticipants will shield their eyes with their hands or a tissue, turn away, or lower their
heads to prevent others from seeing their faces during emotionally painful moments
(Beach and LeBaron 2002; Kidwell 2006). The face itself, as Goffman noted, is one of
“the most delicate components of personal appearance” and integrally involved in the
interactional work by which participants show themselves via constant control of their
facial movements to be situationally present, or “in play” and alive to the obligations of
their involvements with others (Goffman 1963: 27).
In a more recent study of the face in interaction, Ruusuvuori and Peräkylä (2009)
have demonstrated that facial displays not only accompany specific elements of talk,
but can project and follow these elements both in redundant and non-redundant
ways, in effect, making use of the face to extend the temporal boundaries of an action
beyond a turn at talk. They examine the role of the face in storytelling assessments and
other types of tellings. As they report, the face may be used by the speaker to fore-
shadow a stance toward something being described, in this way preparing the listener
for how to respond. Following an utterance, the face may be used by a speaker to pur-
sue uptake by a listener who fails to respond, as when a speaker continues to smile after
completing talk. They also demonstrate that the listener may respond not only verbally
in a way that shows understanding and affiliation with a speaker’s stance, but also with a
like facial expression: in other words, listeners may reciprocate a speaker’s facial
expression as a means of producing a reciprocating stance. It has also been reported
that listeners may use facial actions in conjunction with acknowledgement tokens and
continuers such as “mh hm” and “okay”, or as stand-alone responses to another’s
talk (i.e., without accompanying verbalizations; cf. Bavelas and Chovil 1997).

6. Movement
Movement is not so much an overlooked element in the coordination and management
of conversational interaction as it is a taken-for-granted one. Someone’s approach
toward another, like gaze directed at another, is one of the most basic and pervasive
ways by which interaction is initiated and, with the person’s movement away,
terminated – a particularly powerful resource for even very young children, who are
in the pre- and early-verbal stages of language use (Kidwell and Zimmerman 2007).
Body movement as an interactional resource has been considered in other ways as
well. For example, police may strategically move their bodies toward a suspect in con-
junction with their talk to prompt a confession (LeBaron and Streeck 1997); in a public
place such as a museum, visitors are attracted to exhibits that others are attracted to,
and move into the spaces left by others when they move on to the next exhibit
(vom Lehn, Heath, and Hindmarsh 2001). Regarding a fundamental organization of body
movement, Sacks and Schegloff (2002) showed that moving bodies, including moving
hands and limbs, typically return to the place from which they started, that is, to a
“home position”.
During conversation, participants may exhibit “interactional synchrony”, that is, a
roughly similar flow of body movements such as postural shifts, positioning of limbs,
and head movements by which they make visible and regulate their involvement with
one another (Condon and Ogston 1966; Kendon 1970, 1990). Head movements have
been found to have quite diverse functions. As a semantic matter, the head can be
used with or without speech to signify an affirmative or negative response. McClave
(2000) reports on a number of additional semantic patterns: for example, in conjunction
with certain words or phrases, lateral head sweeps can be used to show inclusivity;
lateral head shakes can be used to show intensity, disbelief, and/or uncertainty (M. H.
Goodwin 1980). Head movements are produced with greater frequency by speakers,
and speakers’ head movements may trigger listeners’ head movements (McClave
2000: 874–875). Listeners also produce head movements as a demonstration of their
attention to talk. Head nods may be produced alone, or in conjunction with acknowl-
edgement tokens and continuers such as “mh hm” and “okay”. Stivers (2008) notes a
distinct difference between the use of head nods and verbal tokens. Specifically, head
nods that are placed in the mid-telling position of a story demonstrate an affiliative
stance toward that displayed via the speaker’s formulation of story events, while verbal to-
kens demonstrate alignment. Listeners may also make more affective responses with
their heads, as when they make a sudden jerk back to show surprise, or a particular
sort of comprehension, what Goodwin has called “take” (M. H. Goodwin 1980: 309).

7. The interplay of body and talk in interaction


To be considered next is how the interplay of the body behaviors discussed here – pos-
ture, gaze, facial expression, and movement – contribute to the constitution of impor-
tant interactional activities. Openings in interaction and storytelling will be examined
for how participants mobilize these body behaviors, in conjunction with talk, to set
up and coordinate frameworks for distinct types of activities with distinct types of
participation opportunities for those involved.

7.1. Openings
In interaction, participants must have some way of beginning an encounter, that is, of
indicating their interest in interacting, and their availability and willingness to do so.
Openings are critical to the initiation of interaction, not only in terms of coordinating
participants’ basic entry into an encounter, but also in terms of proposing something
about the nature of participants’ relationship to one another, the business at hand,
and, often, the tenor of the interaction to come.

7.1.1. Availability: Establishing and managing physical co-presence


Before interaction can begin in face-to-face situations, participants must first come into
one another’s physical presence. In this way, they make visible their availability to inter-
act, and can monitor others for their availability and readiness. For example, in medical
encounters, a patient coming through the door of the consultation room makes her or
himself available to interact with the doctor (Heath 1986; Robinson 1998). In service
encounters, a party’s approach toward an information desk is the first step toward
the initiation of interaction with the receptionist (Kidwell 2000). The establishment
of physical co-presence may be thought of as a pre-initiating move on the way to the
initiation of interaction for any number of face-to-face activities (Heath 1984: 250;
also, Schegloff 1979).
However, as Goffman (1963) writes, the management of physical co-presence itself is
an intricate enterprise. When people are in one another’s presence, whether intending
interaction or not, they monitor – or “glean” as Goffman writes – information about
one another (Goffman 1959). One can imagine that such situations include activities
like waiting for a bus or sitting in a class. These sorts of scenarios represent in Goff-
man’s terms, the realm of unfocused interaction, situations in which, although people
are co-present and attending consciously or unconsciously to any number of embodied
or otherwise unspoken communication phenomena by others, ratified social interaction
has yet (if at all) to take place (Goffman 1963).

7.1.2. Gaze co-ordination


The move to ratified social interaction, that is, in Goffman’s terms to focused interaction,
is one in which participants cooperate to sustain a single focus of attention, typically
through talk, but also in such activities as playing a game of chess, dancing, performing
surgery and any other activity that requires participants’ intentionally coordinated joint
action (Goffman 1963). Either concurrently with the establishment of co-presence, or
shortly thereafter, a next move toward the initiation of ratified interaction is through par-
ticipants’ coordination of gaze. Indeed, people can be co-present and withhold gaze from
another either because they do not intend interaction, or because they see that another is
pre-occupied and not yet ready for interaction. People in public settings may also quickly
gaze at another, and then gaze away, performing toward an unacquainted other a
moment of “civil inattention”, an act by which they acknowledge another’s presence
but convey that they do not intend interaction (Goffman 1963). The establishment of
co-presence plus the coordination of mutual gaze are necessary pre-conditions for parties’
entry into ratified social interaction, that is, in the move from “mere and sheer” physical
co-presence, to social co-presence (Goffman 1963; Mondada 2009; Pillet-Shore 2011).
Once these pre-conditions have been satisfied, participants then work to begin the inter-
action proper. One of the most pervasive ways this is accomplished is through greetings.

7.1.3. Greetings
Greetings may be verbal (Hello! Hey! How’s it goin’?) and/or embodied actions (waves,
head tosses, handshakes, hugs) that also typically include participants’ orientation of
their eyes and bodies toward one another and facial displays (e.g., smiles and eyebrow
flashes). Greetings proffered and returned are a way that parties acknowledge
one another when they come into one another’s presence, a fundamental means of “per-
son appreciation”, but they also open up the possibility of further interaction and are
perhaps the most frequent way that participants begin interaction. Through their lexical
and intonational verbal production, in conjunction with their embodied components,
greetings reflect and propose something about the character of a relationship: for exam-
ple, are participants strangers or casual acquaintances, or are they good friends who
have not seen one another in a long time?
Kendon and Ferber (1973; Kendon 1990), in their classic paper on greetings, describe
a recurrent sequence of behaviors by which participants come to greet one another in
naturally occurring social gatherings. In the backyard birthday party example, guest and
host proceed through distinct phases, what are termed the “distance salutation” (made
when the guest first enters through the backyard gate), the “approach”, and the “close
salutation”. In the distance salutation phase, behaviors include sighting, in which parti-
cipants visually locate one another and typically wait for a return sighting, followed by
greeting displays (e.g., a hand wave, a head toss, and/or a “hello”) and accompanying
smiles. The approach phase may occur concurrently, or shortly thereafter. This phase is
characterized by participants looking away from one another, especially as they get
close; participants may also engage in self-grooms (e.g., smoothing their hair, adjusting
their clothing) and “body crosses” (crossing one or both arms or hands in front of the
body) as they approach. Once participants are near enough to begin the close salutation,
they again look at one another and produce another greeting, often followed by or pro-
duced simultaneously with such actions as a handshake, embrace, kiss, and/or other sort
of touching; this phase is also accompanied by smiles. The authors note that greeting in-
teractions are interrelated with participants’ roles as guests and hosts, their degree of
familiarity, and their relational status. For example, a host traveling far from the center
of the party to greet a guest creates a display of respect and enthusiasm at their arrival;
guests entering into the center before being greeted create a show of familiarity, while
those who wait on the fringe to be greeted first show relative unfamiliarity.
Indeed, the very first moves in face-to-face openings enable participants to discern
whether or not they are acquainted with someone and to design their greetings and
next moves accordingly. As Pillet-Shore (2011) writes, gaze is used to do critical iden-
tification/recognition work, that is, to discern in the very first moments of interaction
whether or not participants who are coming into one another’s presence already
know each other. Participants’ distinction between the acquainted and the unac-
quainted, and the consequences for subsequent interaction, is a major organizing fea-
ture of social behavior, as noted by Goffman (1963). Pillet-Shore (2012) documents the
systematicity by which participants, upon visually locating another as an acquaintance
or not, produce greetings that are recipient designed. Greetings between acquainted
parties are produced at a relatively louder volume than surrounding speech, and
make use of such features as a higher pitch, “smiley voice” intonation in conjunction
with smiles, continuing and rising final intonation, sound stretches, and multiple greet-
ing components (verbal and embodied); these latter two features enable greetings to be
produced in overlap. Hence, acquainted participants “do being warm” and index their
familiarity, in this way conveying that their identification/recognition work has been
successful and that they may move forward in the interaction.

7.2. Storytelling
One very common sort of activity that participants engage in during interaction is story-
telling. As Sacks (1972) and Jefferson (1978) noted, stories have a distinct structure that
consists of

(i) initiation,
(ii) delivery, and
(iii) reception by the story recipients.

Each of these components is realized via the moment-by-moment changing configurations
of participants’ body behaviors and talk – storyteller’s and story recipients’ – that
work to create and sustain the participation frameworks of any given moment
(C. Goodwin 1984; M. H. Goodwin 1997).
In the following case, for example, three women (A, T, and R) are sitting around a
table. They have been playing a board game, and there has been a lull in their talk,
when one of them, A, turns to another, T, and initiates a story (discussed in Kidwell
1997). Transcription conventions can be found in the appendix.

1 A: =*did I **tell you that I met another recovering
2    M-A-***S-N volunteer this wee:k?
((* A turns, shifts gaze to T; **slaps table/ T and R shift their gaze to A
just after; ***T shakes head “no”))

A’s actions at line 1, that is, her turn toward T in addition to slapping the table and di-
recting her gaze at her, are embodied techniques that, in conjunction with her talk, des-
ignate T as her primary addressed recipient. A’s actions, however, have the effect of
eliciting a display of recipiency from both T and R: they both shift their gaze to A
although it is only T (the addressed recipient) who answers the story initiation question
that A has posed, which she does with a head shake. Story initiations are de-
signed to separate knowing from unknowing recipients, prepare recipients for the
kind of story that is being offered, and set up an extended turn space for the teller to
deliver the story (C. Goodwin 1981; Jefferson 1978; Sacks 1972). T’s action (the negative
head nod) informs A that she hasn’t heard the story, and, thus, functions as a go-ahead for
A to tell the story.
Getting the go-ahead, A assumes a distinct teller’s posture by returning her gaze
back to the center of the table and adjusting her clothing; she then places her elbows
on the table, and rests her head in her hands as she speaks at line 6 (see also C. Good-
win 1984). She maintains this position until she once again shifts her gaze to T at line 8.
Of note, however, is that it is R, the unaddressed recipient, who responds with
continuers and head nods at lines 10 and 12.

6 A: *˚someho:w,˚ (.) I don’t even remember how I -
7    O:h cuz someone, (0.2) ˚ok˚ this is a guy
8 who’s organizing **queery? which is
9 this new r[adio show
10 R: ***[hmm hmm
11 A: ****on WORTS= ((radio station name))
12 R: ***=hm [m hmm
13 A: [okay
((* A places elbows on table, head in hands; **shifts gaze to
T; ***onset of R’s head nods; ****onset of T’s head nods))

The gaze shift by A at line 8 is done as part of a reference check: A wants to confirm
that T knows what she is talking about and, in addition to shifting her gaze to T, she
produces the word “queery” with a try-marked, rising intonation (Sacks and Schegloff
1979). Getting no indication of recognition from T, she continues with an explanation of
what “queery” means in lines 8, 9, and 11 while she looks at T. Although A has not di-
rected any of her gazes to R, and thus has not treated her as someone for whom the
story is being told, it is R who responds. R produces continuers that, along with her
head nods and gaze toward A, display – and claim – recipient status; T, for her
part, makes only head nods at line 11 (Schegloff 1982). Moreover, R’s continuers and
head nods, produced in overlap with A’s explanation rather than at turn construction
unit boundaries, display that she already knows what A is talking about, a way of
demonstrating that the story-in-progress is relevant for her, too.
In sum, A uses body positioning, movement, and gaze to designate T as her primary
addressed recipient. However, R challenges this framework with her embodied actions
and vocalizations. By positioning her head nods and continuers as she does, R shows
that not only is she a recipient, but that certain story elements are familiar to her,
too, and, therefore, that she is entitled to being addressed as a recipient. These
moves and other moves by R (not shown here) work to re-shape the participation
framework such that A subsequently (albeit briefly) accommodates her as a story
recipient.

8. Conclusion
As has been discussed here, the human body provides participants with a critical
resource for accomplishing and differentiating their work as particular sorts of partici-
pants in interaction (i.e., speaker and recipient, storyteller and story recipient, doctor
and patient, etc.), and in the variety of interactional, interpersonal, and institutional ac-
tivities that comprise encounters. The sensitivities of participants to these body behav-
ioral resources speak to the fundamental sociality of a social species in which even the
most minimal of movements of the body, face, eyes, hands, head and so on are of con-
sequence for what they understand about what others are doing, and what they them-
selves are expected to do, upon occasions of their coming together. Together with talk,
these resources are part of an extraordinarily powerful yet nuanced toolkit for going
about the complex business of being human.

Appendix: Transcription conventions


Below is a list of transcription conventions developed by Gail Jefferson and used in con-
versation analytic transcriptions of talk. Embodied actions are described in double
parentheses following talk; an asterisk (*) designates the point of onset relative to
the talk. For other systems of representing embodied action, see C. Goodwin (1981),
Heath (1986), and Robinson (1998).

[ brackets indicate overlapping talk


() talk heard, but not understood
(word) a guess at the talk
(.) very brief pauses
(1.0) measured silence
wor:d colon(s) indicates elongation of prior sound
word– dash indicates cut-off word
=word equals sign indicates latched speech
word underline indicates stress on word
WORD extra loud volume
˚word˚ spoken softly
↑↓ indicate rise and fall in pitch, respectively
.hh inbreath (preceded by period)
hh outbreath

9. References
Argyle, Michael and Mark Cook 1976. Gaze and Mutual Gaze. Cambridge: Cambridge University
Press.
Bavelas, Janet, Linda Coates and Trudy Johnson 2002. Listener responses as a collaborative pro-
cess: The role of gaze. Journal of Communication 52(3): 566–580.
Bavelas, Janet and Nicole Chovil 1997. Faces in dialogue. In: James A. Russell and José Miguel
Fernandez-Dols (eds.), The Psychology of Facial Expression, 334–346. Cambridge: Cambridge
University Press.
Beach, Wayne A. and Curtis D. LeBaron 2002. Body disclosures: Attending to personal problems
and reported sexual abuse during a medical encounter. Journal of Communication 52: 617–639.
Birdwhistell, Ray L. 1970. Kinesics and Context: Essays on Body Motion Communication. Phila-
delphia: University of Pennsylvania Press.
Chovil, Nicole 1991. Discourse-oriented facial displays in conversation. Research on Language and
Social Interaction 25: 163–194.
Condon, William S. and William D. Ogston 1966. Sound film analysis of normal and pathological
behavior patterns. Journal of Nervous and Mental Disease 143(4): 338–347.
Egbert, Maria 1996. Context sensitivity in conversation analysis: Eye gaze and the German repair
initiator “bitte.” Language in Society 25: 587–612.
Ekman, Paul, E. Richard Sorenson and Wallace V. Friesen 1969. Pan-cultural elements in facial
displays of emotions. Science 164(3875): 86–88.
Garfinkel, Harold 1967. Studies in Ethnomethodology. Englewood Cliffs, NJ: Prentice-Hall.
Goffman, Erving 1959. Presentation of Self in Everyday Life. New York: Doubleday Anchor.
Goffman, Erving 1961. Encounters: Two Studies in the Sociology of Interaction. Indianapolis:
Bobbs-Merrill.
Goffman, Erving 1963. Behavior in Public Places. New York: Free Press.
Goffman, Erving 1981. Forms of Talk. Philadelphia: University of Pennsylvania Press.
Goffman, Erving 1983. The interaction order. American Sociological Review 48: 1–17.
Goodwin, Charles 1981. Conversational Organization: Interaction between Speakers and Hearers.
London: Academic Press.
Goodwin, Charles 1984. Notes on story structures and the organization of participation. In: John
Maxwell Atkinson and John Heritage (eds.), Structures of Social Action: Studies in Conversa-
tion Analysis, 225–246. Cambridge: Cambridge University Press.
Goodwin, Marjorie Harness 1980. Processes of mutual monitoring implicated in the production of
description sequences. Sociological Inquiry 50: 303–317.
Goodwin, Marjorie Harness 1997. By-play: Negotiating evaluation in storytelling. In: Gregory R.
Guy, Crawford Feagin, Deborah Schiffrin and John Baugh (eds.), Toward a Social Science of
Language: Papers in Honor of William Labov, Volume 2, 77–102. Amsterdam: John Benjamins.
Hayashi, Makoto 2005. Joint turn construction through language and the body: Notes on embodi-
ment in coordinated participation in situated activities. Semiotica 156: 21–53.
Heath, Christian 1984. Talk and recipiency: Sequential organization in speech and body movement. In: John Maxwell Atkinson and John Heritage (eds.), Structures of Social Action: Studies in Conversation Analysis, 247–265. Cambridge: Cambridge University Press.
Heath, Christian 1986. Body Movement and Speech in Medical Interaction. Cambridge: Cambridge University Press.
Heritage, John 1984. Garfinkel and Ethnomethodology. Cambridge: Polity Press.
Jefferson, Gail 1978. Sequential aspects of story telling in conversation. In: Jim N. Schenkein (ed.),
Studies in the Organization of Conversational Interaction, 213–248. New York: Academic Press.
Kendon, Adam 1967. Some functions of gaze direction in social interaction. Acta Psychologica 26:
22–63.
Kendon, Adam 1970. Movement coordination in social interaction. Acta Psychologica 32: 1–25.
Kendon, Adam 1990. Conducting Interaction: Patterns of Behavior in Focused Encounters. Cam-
bridge: Cambridge University Press.
Kendon, Adam and Andrew Ferber 1973. A description of some human greetings. In: Richard
Phillip Michael and John Hurrell Crook (eds.), Comparative Ecology and Behavior of Primates,
591–668. London: Academic Press.
Kidwell, Mardi 1997. Demonstrating recipiency: Resources for the unacknowledged recipient.
Issues in Applied Linguistics 8(2): 85–96.
Kidwell, Mardi 2000. Common ground in cross-cultural communication: Sequential and insti-
tutional contexts in front desk service encounters. Issues in Applied Linguistics 11(1):
17–37.
Kidwell, Mardi 2006. “Calm Down!”: The role of gaze in the interactional management of hysteria
by the police. Discourse Studies 8(6): 745–770.
Kidwell, Mardi and Don Zimmerman 2007. Joint attention as action. Journal of Pragmatics 39(3):
592–611.
Leeds-Hurwitz, Wendy 1987. The social history of the “Natural History of an Interview”: A multi-
disciplinary investigation of social communication. Research on Language and Social Interac-
tion 20: 1–51.
LeBaron, Curtis D. and Jürgen Streeck 1997. Built space and the interactional framing of experi-
ence during a murder interrogation. Human Studies 20: 1–25.
vom Lehn, Dirk, Christian Heath and Jon Hindmarsh 2001. Exhibiting interaction: Conduct and collaboration in museums and galleries. Symbolic Interaction 24(2): 189–216.
Lerner, Gene H. 2003. Selecting next speaker: The context-sensitive operation of a context-free
organization. Language in Society 32: 177–201.
Lerner, Gene H., Don Zimmerman and Mardi Kidwell 2011. Formal structures of practical tasks:
A resource for action in the social lives of very young children. In: Charles Goodwin, Jürgen
Streeck and Curtis D. LeBaron (eds.), Multimodality and Human Activity: Research on Human
Behavior, Action, and Communication, 44–58. Cambridge: Cambridge University Press.
McClave, Evelyn Z. 2000. Linguistic functions of head movements in the context of speech. Journal
of Pragmatics 32: 855–878.
Mondada, Lorenza 2009. Emergent focused interactions in public places: A systematic analysis of
the multimodal achievement of a common interactional space. Journal of Pragmatics 41(10):
1977–1997.
Pillet-Shore, Danielle 2011. Doing introductions: The work involved in meeting someone new.
Communication Monographs 78(1): 73–95.
Pillet-Shore, Danielle 2012. Displaying stance through prosodic recipient design. Research on
Language and Social Interaction 45(4): 375–398.
Robinson, Jeffrey David 1998. Getting down to business: Talk, gaze, and body orientation during
openings of doctor-patient consultations. Human Communication Research 25: 97–123.
Rossano, Federico, Penelope Brown and Stephen C. Levinson 2009. Gaze, questioning and cul-
ture. In: Jack Sidnell (ed.), Conversation Analysis: Comparative Perspectives, 187–249. Cam-
bridge: Cambridge University Press.
Ruusuvuori, Johanna and Anssi Peräkylä 2009. Facial and verbal expressions in assessing stories
and topics. Research on Language and Social Interaction 42(4): 377–394.
Sacks, Harvey 1972. On the analyzability of stories by children. In: John J. Gumperz and Dell
Hymes (eds.), Directions in Sociolinguistics: The Ethnography of Communication, 325–345.
New York: Holt, Rinehart and Winston.
Sacks, Harvey and Emanuel A. Schegloff 1979. Two preferences in the organization of reference
to persons in conversation and their interaction. In: George Psathas (ed.), Everyday Language:
Studies in Ethnomethodology, 15–21. New York: Erlbaum.
Sacks, Harvey and Emanuel Schegloff 2002. Home position. Gesture 2: 133–146.
Sacks, Harvey, Emanuel A. Schegloff and Gail Jefferson 1974. A simplest systematics for the orga-
nization of turn-taking for conversation. Language 50: 696–735.
Schegloff, Emanuel A. 1979. Identification and recognition in telephone openings. In: George
Psathas (ed.), Everyday Language: Studies in Ethnomethodology, 24–78. New York: Erlbaum.
Schegloff, Emanuel A. 1982. Discourse as an interactional achievement: Some uses of “uh huh”
and other things that come between sentences. In: Deborah Tannen (ed.), Analyzing Dis-
course: Text and Talk, 71–93. Washington, DC: Georgetown University Press.
Schegloff, Emanuel A. 1998. Body torque. Social Research 65: 535–586.
Schegloff, Emanuel A. and Harvey Sacks 1973. Opening up closings. Semiotica 8: 289–327.
Stivers, Tanya 2008. Stance, alignment and affiliation during story telling: When nodding is a token
of preliminary affiliation. Research on Language and Social Interaction 41: 29–55.
Streeck, Jürgen 1993. Gesture as communication I: Its coordination with gaze and speech. Com-
munication Monographs 60: 275–299.
Streeck, Jürgen 1994. Gesture as communication II: The audience as co-author. Research on Lan-
guage and Social Interaction 27: 239–267.

Mardi Kidwell, Durham, NH (USA)

6. Homesign: When gesture is called upon to be language

1. Gesture’s role in learning a spoken language
2. Gesture’s role when a model for language is not available: Homesign
3. The input to homesign
4. The next step after homesign
5. References

Abstract
When people speak, they gesture, and young children are no exception. In fact, children
who are learning spoken language use gesture to take steps into language that they cannot
yet take in speech. But not all children are able to make use of the spoken input that sur-
rounds them. Deaf children whose profound hearing losses prevent them from acquiring
spoken language and whose hearing parents have not exposed them to sign language also
use gesture, called homesigns, to communicate. These homesigns take on the most basic
functions and forms of language – lexicon, morphology, sentential structure, grammatical
categories, sentential markers for negations, questions, past and future, and phrasal struc-
ture. As such, the deaf children’s homesign gestures are qualitatively different from the
co-speech gestures that surround them and, in this sense, represent first steps in the
process of language creation.

All children who learn a spoken language use gesture. But some children – deaf children with profound hearing losses, for example – are unable to learn the spoken language
that surrounds them. If exposed to a conventional sign language, these deaf children will
acquire that language as naturally as hearing children acquire spoken language (Lillo-
Martin 1999; Newport and Meier 1985). If, however, deaf children with profound hearing
losses are not exposed to sign, they have only gesture to communicate with the hearing
individuals in their worlds.

The gestures used by deaf children in these circumstances are known as homesigns.
They are different in both form and function from the gestures that hearing children pro-
duce to communicate along with speech, and resemble more closely the signs that deaf
children of deaf parents and the words that hearing children of hearing parents learn
from their respective communities. We begin with a brief look at the gestures that hearing
children produce in the early stages of language learning, and then turn to the homesign
gestures that deaf children create to substitute for language.

1. Gesture’s role in learning a spoken language


Gesture is very often a young child’s first way of communicating with others. At a time
when children are limited in the words they know, gesture can extend the range of ideas
they are able to express. The earliest gestures children use, typically beginning around
10 months, are deictics, gestures whose referential meaning is given entirely by the con-
text and not by their form, e.g., holding up an object to draw an adult’s attention to that
object or, later in development, pointing at the object (Bates et al. 1979). In addition to
deictic gestures, children also use iconic gestures. Unlike deictics, the form of an iconic
gesture captures aspects of its intended referent and thus its meaning is less dependent
on context, e.g., opening and closing the mouth to represent a fish. These iconic ges-
tures are rare in some children, frequent in others. If parents encourage their children
to use iconic gestures, these gestures become more frequent, which then facilitates,
at least temporarily, the child’s production of words (Goodwyn, Acredolo, and
Brown 2000). The remaining types of gestures that adults produce – metaphorics
(gestures whose pictorial content presents an abstract idea rather than a concrete
object or event) and beats (small baton-like movements that move along with the
rhythmical pulsation of speech) – are not produced routinely until relatively late in
development.
The early gestures that children produce not only predate their words, they predict
them. It is, for example, possible to predict a large proportion of the lexical items that
eventually appear in a child’s spoken vocabulary from looking at that child’s earlier
pointing gestures (Iverson and Goldin-Meadow 2005). Moreover, one of the best pre-
dictors of the size of a child’s comprehension vocabulary at 42 months is the number
of different objects to which the child pointed at 14 months. Indeed, child gesture
at 14 months is a better predictor of later vocabulary size than mother speech at
14 months (Rowe, Ozcaliskan, and Goldin-Meadow 2008; Rowe and Goldin-Meadow
2009).
In addition to presaging the shape of their eventual spoken vocabularies, gesture also
paves the way for early sentences. Children combine pointing gestures with words to
express sentence-like meanings (“open” + point at box) months before they can express
these same meanings in a word + word combination (“open box”). Importantly, the age
at which children first produce gesture + speech combinations of this sort reliably pre-
dicts the age at which they first produce two-word utterances (Goldin-Meadow and
Butcher 2003; Iverson and Goldin-Meadow 2005). Gesture thus serves as a signal
that a child will soon be ready to begin producing multi-word sentences. Moreover,
the types of gesture + speech combinations children produce change over time and
presage changes in their speech (Ozcaliskan and Goldin-Meadow 2005). For example,
children produce gesture + speech combinations conveying more than one
proposition (akin to a complex sentence, e.g., “I like it” + eat gesture) several months
before producing a complex sentence entirely in speech (“I like to eat it”). Gesture
thus continues to be at the cutting edge of early language development, providing
stepping-stones to increasingly complex linguistic constructions.

2. Gesture’s role when a model for language is not available: Homesign
Children make use of gestures even if they are not learning language from their elders
but are, instead, forced to create their own language. Deaf children whose hearing
losses are so severe that they cannot learn a spoken language and whose hearing par-
ents have not exposed them to a sign language nevertheless communicate with the hear-
ing individuals in their worlds and use homesign gestures to do so (Lenneberg 1964;
Moores 1974; Tervoort 1961). Interestingly, homesigners use their gestures for the func-
tions to which conventional languages are put. They use homesigns not only to get
others to do things for them (i.e., to make requests), but also to share ideas and request
information (i.e., to make comments and ask questions). Homesigners even use their
gestures to serve some of the more sophisticated functions of language – to tell stories
(Phillips, Goldin-Meadow, and Miller 2001), to comment on their own and others’ ges-
tures, and to talk to themselves (Goldin-Meadow 1993). In this sense, the children’s
communications are qualitatively different from those produced by language-trained
apes who use whatever language they are able to develop to change people’s behavior,
not to change their ideas (see, for example, Greenfield and Savage-Rumbaugh 1991).
The homesigners’ gestures serve the functions of language.
The homesigners’ gestures also take on the forms of language. They are structured in
language-like ways despite the fact that the children do not have a usable model of a
conventional language to guide their gesture creation (Goldin-Meadow 2003). We
describe the properties of homesign that have been studied thus far in the following
sections.

2.1. Lexicon
Like hearing children at the earliest stages of language-learning, deaf homesigners use
both pointing gestures and iconic gestures to communicate. Their gestures, rather than
being mime-like displays, are discrete units, each of which conveys a particular meaning.
Moreover, the gestures are non-situation-specific – a twist gesture, for instance, can be
used to request someone to twist open a jar, to indicate that a jar has been twisted open,
to comment that a jar cannot be twisted open, or to tell a story about twisting open a
jar that is not present in the room. In other words, the homesigner’s gestures are not
tied to a particular context, nor are they even tied to the here-and-now (Morford
and Goldin-Meadow 1997). In this sense, the gestures warrant the label sign.
Homesigners use their pointing gestures to refer to the same range of objects that
young hearing children refer to using, first, pointing gestures and, later, words – and
in the same distribution (Feldman, Goldin-Meadow, and Gleitman 1978). Both groups
of children refer most often to inanimate objects, followed by people and animals. They
also both refer to body parts, food, clothing, vehicles, furniture and places, but less
frequently.

Homesigners use iconic gestures more frequently than most hearing children learn-
ing spoken language. Their iconic gestures function like nouns, verbs, and adjectives in
conventional languages (Goldin-Meadow et al. 1994), although there are fundamental
differences between iconic gestures and words. The form of an iconic gesture captures
an aspect of its referent; the form of a word does not. Interestingly, although iconicity is
present in many of the signs of American Sign Language (ASL), deaf children learning
American Sign Language do not seem to notice. Most of their early signs are either not
iconic (Bonvillian, Orlansky, and Novack 1983) or, if iconic from an adult’s point of
view, not recognized as iconic by the child (Schlesinger 1978). In contrast, deaf individuals inventing their own homesigns are forced by their social situation to create gestures that not only begin as transparent but remain so. If they did not, no one in their worlds would be able to take any meaning from the gestures they create. Homesigns
therefore have an iconic base.
Despite the fact that the gestures in a homesign system need to be iconic to be under-
stood, they form a stable lexicon. Homesigners could create each gesture anew every
time they use it, as hearing speakers seem to do with their gestures (McNeill 1992).
If so, we might still expect some consistency in the forms the gestures take simply
because the gestures are iconic and iconicity constrains the set of forms that can be
used to convey a meaning. However, we might also expect a great deal of variability
around a prototypical form – variability that would crop up simply because each situa-
tion is a little different, and a gesture created specifically for that situation is likely to
reflect that difference. In fact, it turns out that there is relatively little variability in
the set of forms a homesigner uses to convey a particular meaning. The child tends
to use the same form, say, two fists breaking apart in a short arc to mean “break”,
every single time that child gestures about breaking, no matter whether it’s a cup break-
ing, or a piece of chalk breaking, or a car breaking (Goldin-Meadow et al. 1994). Thus,
the homesigner’s gestures adhere to standards of form, just as a hearing child’s words or
a deaf child’s signs do (Singleton, Morford, and Goldin-Meadow 1993). The difference
is that the homesigner’s standards are idiosyncratic to the creator rather than shared by
a community of language users.

2.2. Morphology
Modern languages (both signed and spoken) build up words combinatorially from a
repertoire of a few dozen smaller meaningless units. We do not yet know whether
homesign has phonological structure (but see Brentari et al. 2012). However, there is
evidence that homesigns are composed of parts, each of which is associated with a par-
ticular meaning; that is, they have morphological structure (Goldin-Meadow, Mylander,
and Butcher 1995; Goldin-Meadow, Mylander, and Franklin 2007). The homesigners
could have faithfully reproduced in their gestures the actions that they actually perform.
They could have, for example, created gestures that capture the difference between
holding a balloon string and holding an umbrella. But they don’t. Instead, the children’s
gestures are composed of a limited set of handshape forms, each standing for a class of
objects, and a limited set of motion forms, each standing for a class of actions. These
handshape and motion components combine freely to create gestures, and the meanings
of these gestures are predictable from the meanings of their component parts. For
example, a hand shaped like an “O” with the fingers touching the thumb, that is, an
OTouch handshape form, combined with a Revolve motion form means “rotate an
object <2 inches wide around an axis”, a meaning that can be transparently derived
from the meanings of its two component parts (OTouch = handle an object <2 inches
wide; Revolve = rotate around an axis).
Importantly in terms of arguing that a morphological system underlies the children’s
homesigns, we note that (1) the vast majority of gestures that each deaf child produces
conforms to the morphological description for that child, and (2) this morphological
description can be used to predict the forms and meanings of the new gestures that
the child produces. Thus, homesigns exhibit a simple morphology, one that is akin to
the morphologies found in conventional sign languages. Interestingly, it is much more
difficult to impose a coherent morphological description that can account for the ges-
tures that hearing speakers produce (Goldin-Meadow, Mylander, and Butcher 1995;
Goldin-Meadow, Mylander, and Franklin 2007), suggesting that morphological struc-
ture is not an inevitable outgrowth of the manual modality but is instead a linguistic
characteristic that deaf children impose on their communication systems.

2.3. Sentence structure


Homesigners frequently combine their gestures with other gestures, unlike hearing chil-
dren who rarely produce gesture + gesture combinations (Goldin-Meadow and Morford
1985). We consider a string of gestures to be a single unit if the child does not pause or
relax his hand between gestures (Goldin-Meadow and Mylander 1984). For example, a
homesigner combined a point at a toy grape with an “eat” gesture to comment on the
fact that grapes can be eaten, and at another time combined the “eat” gesture with a
point at a visitor to invite her to lunch with the family. The same homesigner combined
all three gestures into a single sentence to offer the experimenter a snack. Interestingly,
homesign sentences convey the same meanings that young children learning conven-
tional languages, signed (Newport and Meier 1985) or spoken (Brown 1973), typically
convey with their sentences (Goldin-Meadow and Mylander 1984).
In addition, homesign sentences are structured in language-like ways. For example,
the homesigners’ gesture sentences are organized around predicate frames (e.g., x
sleeps, x goes to y, x beats y, x gives y to z) and thus are structured at an underlying
level (Goldin-Meadow 1985). Moreover, the gesture sentences often contain more
than one predicate frame or proposition and, in this sense, constitute complex sentences
(e.g., drum beat straw sip, produced to describe a scene in which a soldier is beating a
drum and a cowboy is sipping a straw; Goldin-Meadow 1982).
The homesigners’ gesture sentences also exhibit (at least) three devices for marking
who does what to whom. Homesigners indicate the thematic role of a referent by pre-
ferentially producing (as opposed to omitting) gestures for referents playing particular
roles. Homesigners in both America and China are more likely to produce a gesture for
the patient (e.g., the eaten cheese in a sentence about eating) than to produce a gesture
for the actor (e.g., the eating mouse; Goldin-Meadow and Mylander 1998).
Homesigners’ second device for indicating thematic roles is to place gestures for ob-
jects playing particular roles in set positions in a sentence. In other words, they use lin-
ear position to indicate who does what to whom (Feldman, Goldin-Meadow, and
Gleitman 1978; Senghas et al. 1997). Surprisingly, homesigners in America and China
use the same particular linear orders in their gesture sentences despite the fact that
each child is developing his or her system alone without contact with other deaf chil-
dren and in different cultures (Goldin-Meadow and Mylander 1998). The homesigners
tend to produce gestures for patients in the first position of their sentences, before ges-
tures for verbs (cheese–eat) and before gestures for endpoints of a transferring action
(cheese–table). They also produce gestures for verbs before gestures for endpoints
(give–table). In addition, they produce gestures for intransitive actors before gestures
for verbs (mouse–run).
Homesigners’ third device for indicating thematic roles is to displace verb gestures
toward objects playing particular roles, as opposed to producing them in neutral
space (at chest level). These displacements are reminiscent of inflections in conven-
tional sign languages (Padden 1983, 1990). Homesigners tend to displace their gestures
toward objects that are acted upon and thus use their inflections to signal patients. For
example, displacing a twist gesture toward a jar signals that the jar (or one like it) is the
object to be acted upon (Goldin-Meadow et al. 1994). Thus, homesign sentences adhere
to simple syntactic patterns marking who does what to whom.

2.4. Grammatical categories


Young homesigners use their morphological and syntactic devices to distinguish nouns
and verbs (Goldin-Meadow et al. 1994). For example, if the child uses twist as a verb,
that gesture would likely be produced near the jar to be twisted open (i.e., it would
be inflected); it would not be abbreviated (it would be produced with several twists
rather than one); and it would be produced after a pointing gesture at the jar (that–
twist). In contrast, if the child uses that same form twist as a noun to mean “jar”, the
gesture would likely be produced in neutral position near the chest (i.e., it would not
be inflected); it would be abbreviated (produced with one twist rather than several);
and it would occur before the pointing gesture at the jar (jar–that). Thus, the child dis-
tinguishes nouns from verbs morphologically (nouns are abbreviated, verbs inflected)
and syntactically (nouns occur in initial position of a two-gesture sentence, verbs in
second position after the patient). Interestingly, deaf homesigners’ adjectives sit some-
where in between – as they often do in natural languages (Thompson 1988). Adjectives
are marked like nouns morphologically (broken is abbreviated but not inflected) and
like verbs syntactically (broken is produced in the second position of a two-gesture
sentence).
Older homesigners also have the grammatical category subject (possibly younger
ones do, too, but this has not been investigated yet). Grammatical subjects have no simple semantic correlate, and no fixed criteria exist to categorically identify a noun phrase as a subject; instead, a set of common, multi-dimensional criteria can be applied across languages (Keenan 1976). A hallmark of subject noun phrases cross-linguistically is the range of semantic roles they display. While the subject of a sentence
will likely be an agent (one who performs an action), many other semantic roles can
be the subject. For example, the theme or patient can be a subject (The door opened),
as can an instrument (The key opened the door) or instigator (The wind opened the
door). Older homesigners studied in Nicaragua used the same grammatical device
(clause-initial position) to mark agent and non-agent noun phrases in their gestured re-
sponses, thus indicating that their systems include the category subject (Coppola and
Newport 2005).

2.5. Sentential markers for negations, questions, past and future


Homesigners’ gesture sentences contain at least two forms of sentence modification,
negation and questions (Franklin, Giannakidou, and Goldin-Meadow 2011). Young
homesigners express two types of negative meanings: rejection (e.g., when offered a car-
rot, the homesigner shakes his head, indicating that he doesn’t want the object) and
denial (e.g., the homesigner points to his chest and then gestures school while shaking
his head, to indicate that he is not at school). In addition, they express three types of
questions: where (e.g., the homesigner produces a two-handed flip when searching for
a key), what (e.g., the homesigner produces the flip when trying to figure out which
object his mother wants), and why (e.g., the homesigner produces the flip when trying
to figure out why the orange fell). As these examples suggest, different forms are used
to convey these two different meanings – the side-to-side headshake for negative mean-
ings, the manual flip for question meanings. These gestures are obviously taken from the
hearing speakers’ gestures that surround the deaf children. But the homesigners use
these gestures as sentence modulators and produce them in systematic positions
in their gesture sentences: headshakes appear at the beginning of sentences, flips at
the end.
Homesigners also use particular gestures to make reference to the past and future
(Morford and Goldin-Meadow 1997). For example, one homesigner produced a gesture,
not observed in the gestures of his hearing parents, to refer to both remote future and
past events – needing to repair a toy (future) and having visited Santa (past). The ges-
ture is made by holding the hand vertically near the chest, palm out, and making an arc-
ing motion away from the body. Another homesigner invented a comparable gesture to
refer only to past events. In addition to these two novel gestures, homesigners have
been found to modify a conventional gesture to use as a future marker. The gesture,
formed by holding up the index finger, is typically used to request a brief delay or
time-out and is glossed as wait one minute. The homesigners use the form for its conventional meaning, but they also use it to identify their intentions, that is, to signal the
immediate future. For example, one homesigner produced the gesture and then pointed
at the toy bag to indicate that he was going to go retrieve a new toy. Hearing speakers
use wait to get someone’s attention, never to refer to the immediate future. The form of
the homesigners’ gesture is borrowed from hearing speakers’ gestures but it takes on a
meaning of its own.

2.6. Phrasal structure


As noted earlier, homesigners refer to entities either by pointing at the entity, or by pro-
ducing an iconic gesture evoking some aspect of the entity. There is evidence that these
two types of noun-like gestures can be combined to form a larger unit akin to a Noun
Phrase (Hunsicker and Goldin-Meadow in press). For example, rather than point at a
penny and then at himself (that–me), or produce an iconic gesture and then point at
himself (penny–me), to ask someone to give him a penny, the homesigner produced
the iconic gesture along with the point at the penny (penny–that–me). The point + iconic
combination thus occupied the patient slot in the sentence and functioned like a single
unit. Indeed, point + iconic combinations of this sort serve the same semantic and syn-
tactic functions as pointing gestures and iconic gestures do when used on their own to
refer to objects. In other words, the larger unit substitutes for the smaller units and, in
this way, functions as a phrase.

3. The input to homesign


Homesigners, by definition, are not exposed to a conventional sign language and thus
could not have fashioned their gesture systems after such a model. They are, however,
exposed to the gestures that their hearing parents use when they talk to them. Although
the gestures that hearing speakers typically produce when they talk are not character-
ized by language-like properties (McNeill 1992), it is possible that hearing parents alter
their gestures when communicating with their deaf child. Perhaps, the deaf children’s
hearing parents introduce language-like properties into their own gestures. If so,
these gestures could serve as a model for the structure in their deaf children’s
homesigns. We explore this possibility in this section.
Hearing parents gesture when they talk to young children (Bekken 1989; Iverson
et al. 1999; Shatz 1982) and the hearing parents of homesigners are no exception.
The deaf children’s parents were committed to teaching their children to talk and
sent them to oral schools. These schools advised the parents to talk to their children
as often as possible. And when they talked, they gestured. The question is whether
the parents’ gestures display the language-like properties found in homesign, or
whether they look just like any hearing speaker’s gestures.
To find out, Goldin-Meadow and Mylander (1983, 1984) analyzed the gestures that
the primary caregiver (the mother in every case) of six homesigners produced when
talking to their deaf children. They attempted to look at the gestures through the
eyes of a child who cannot hear and thus turned off the sound and coded the mothers’
gestures as though they had been produced without speech.
Not surprisingly, all six mothers used both pointing and iconic gestures when they
talked to their children. Moreover, the mothers used pointing and iconic gestures in
roughly the same distribution as their children. However, the mothers’ use of gestures
did not resemble their children’s homesigns along many dimensions.
First, the mothers produced fewer different types of iconic gestures than their chil-
dren, and they also used only a small subset of the particular iconic gestures that
their children used (Goldin-Meadow and Mylander 1983, 1984).
Second, the mothers produced very few gesture combinations. That is, like most
English-speakers (McNeill 1992), they tended to produce one gesture per spoken clause
and rarely combined several gestures into a single, motorically uninterrupted unit.
Moreover, the very few gesture combinations that the mothers did produce did not
exhibit the same structural regularities as their children’s homesigns (Goldin-Meadow
and Mylander 1983, 1984). The mothers thus did not appear to have structured their
gestures at the sentence level.
Nor did the mothers structure their gestures at the word level. Each mother used her
gestures in a more restricted way than her child, omitting many of the handshape and
motion morphemes that the child produced (or using the ones she did produce more
narrowly than the child), and omitting completely a very large number of the hand-
shape/motion combinations that the child produced. Indeed, there was no evidence
at all that the mothers’ gestures could be broken into meaningful and consistent parts
(Goldin-Meadow, Mylander, and Butcher 1995).
Finally, the hearing mothers’ iconic gestures were not stable in form and meaning
over time while their deaf children’s homesigns were. Moreover, the hearing mothers
did not distinguish between gestures serving a noun role and gestures serving a verb
role; the deaf children did (Goldin-Meadow et al. 1994).
Did the deaf children learn to structure their homesign systems from their mothers?
Probably not – although it may have been necessary for the children to see hearing peo-
ple gesturing in communicative situations in order to get the idea that gesture can be
appropriated for the purposes of communication. But in terms of how the children
structure their homesigns, there is no evidence that this structure came from the chil-
dren’s hearing mothers. The hearing mothers’ gestures do not have structure when
looked at with tools used to describe the deaf children’s homesigns (although they
do when looked at with tools used to describe co-speech gestures, that is, when they
are described in relation to speech).
The hearing mothers interacted with their deaf children on a daily basis. We there-
fore might have expected that their gestures would eventually come to resemble their
children’s homesigns (or vice versa). But they did not. Why didn’t the hearing parents
display language-like properties in their gestures? The parents were interested in teach-
ing their deaf children to talk, not gesture. They therefore produced all of their gestures
with speech – in other words, their gestures were co-speech gestures and had to behave
accordingly. The gestures had to fit, both temporally and semantically, with the speech
they accompanied. As a result, the hearing parents’ gestures were not “free” to take on
language-like properties.
In contrast, the deaf homesigners had no such constraints. They had no productive
speech and thus always produced gesture on its own, without talk. Moreover, because
the manual modality was the only means of communication open to the children, it had
to take on the full burden of communication. The result was language-like structure.
Although the homesigners may have used their hearing parents’ gestures as a starting
point, it is clear that they went well beyond that point. They transformed the co-speech
gestures they saw into a system that looks very much like language.

4. The next step after homesign


Although homesigns are structured, they do not display all of the properties found in
natural languages, presumably because each child is developing a gesture system on
his or her own. But what would happen if individual homesigners were brought together
into a community? In 1980, a group of Nicaraguan homesigners were brought together
for the first time. Over the next three decades, a sign language with much of the gram-
matical complexity of well-established sign languages evolved. Nicaraguan Sign Lan-
guage, as this newly emergent language is called, is far more complex than any of the
homesign systems out of which it was formed.
Nicaraguan Sign Language offers us a unique opportunity to watch homesign develop
into a fully formed sign language over generations of creators. The initial step in the cre-
ation process took place when deaf children in Managua were brought together for the
first time in an educational setting. The deaf children had been born to hearing parents
and had presumably developed gesture systems in their individual homes. When brought
together, these homesigners needed to develop a common sign language. Not surprisingly,
we see many of the properties of homesign in the sign system created by this first cohort
of signers. For example, signers in the first cohort combine their signs as homesigners do, adhering
to consistent word orders to convey who does what to whom (Senghas et al. 1997).
But Nicaraguan Sign Language has not stopped there. Every year, new students
enter the school and learn to sign among their peers. This second cohort of signers
has as its input the sign system developed by the first cohort and, interestingly, changes
that input so that the product becomes more language-like. For example, second cohort
signers go beyond the small set of basic word orders used by the first cohort, introducing
new orders not seen previously in the language (Senghas et al. 1997). As a second exam-
ple, the second cohort begins to use spatial devices invented by the first cohort, but they
use these devices consistently and for contrastive purposes (Senghas et al. 1997;
Senghas and Coppola 2001).
The second cohort, in a sense, stands on the shoulders of the first. They do not need
to invent the properties of language found in homesign – those properties are already
present in their input. They can therefore take the transformation process one step fur-
ther. The Nicaraguan homesigners (and, indeed, all homesigners) take the first, and per-
haps the biggest, step: They transform their hearing parents’ gestures, which are not
structured in language-like ways, into a language-like system (Coppola et al. 1997;
see also Singleton, Goldin-Meadow, and McNeill 1995). The first and second cohorts
of Nicaraguan signers are then able to build on these properties, creating a system
that looks more and more like the natural languages of the world.
There is, however, another interesting wrinkle in the language-creation story – it
matters how old the creator is. Second cohort signers who began learning Nicaraguan
Sign Language relatively late in life (after age 10) do not exhibit these linguistic advances and, in fact, use sign systems that are no different from those used by late-learning first cohort signers (Senghas 1995; Senghas and Coppola 2001). It looks like the
creator may have to be a child to take full advantage of the input provided by the
first cohort to continue the process of language creation. Thus, we see in Nicaraguan
Sign Language that language creation depends not only on what the creator has to
work with, but also on who the creator is.
To summarize, when children are provided with a model for language, they use ges-
ture to take steps into language that they cannot yet take in speech. But when children
do not have a model for language, they use gesture to fill the void. They create a system
of homesigns that assumes the most basic forms and functions of language. These home-
sign gestures are qualitatively different from the co-speech gestures that serve as input
to the system and, in this sense, represent the first steps of language creation.

Acknowledgements
This research was supported by grant no. R01 DC00491 from NIDCD. Thanks to my
many collaborators for their invaluable help in uncovering the structure of homesign,
and to the children and their families for welcoming us into their homes.

5. References
Bates, Elizabeth, Laura Benigni, Inge Bretherton, Luigia Camaioni and Virginia Volterra 1979. The
Emergence of Symbols: Cognition and Communication in Infancy. New York: Academic Press.
Bekken, Kaaren 1989. Is there “Motherese” in gesture? Unpublished doctoral dissertation, Uni-
versity of Chicago.
Bonvillian, John D., Michael D. Orlansky and Lesley Lazin Novack 1983. Developmental mile-
stones: Sign language acquisition and motor development. Child Development 54: 1435–1445.
Brentari, Diane, Marie Coppola, Laura Mazzoni and Susan Goldin-Meadow 2012. When does a
system become phonological? Handshape production in gesturers, signers, and homesigners.
Natural Language and Linguistic Theory 30(1): 1–31.
Brown, Roger 1973. A First Language. Cambridge, MA: Harvard University Press.
Coppola, Marie and Elissa L. Newport 2005. Grammatical “subjects” in home sign: Abstract lin-
guistic structure in adult primary gesture systems without linguistic input. Proceedings of the
National Academy of Sciences 102: 19249–19253.
Coppola, Marie, Ann Senghas, Elissa L. Newport and Ted Supalla 1997. The emergence of gram-
mar: Evidence from family-based sign systems in Nicaragua. Paper presented at the Boston
University Conference on Language Development.
Feldman, Heidi, Susan Goldin-Meadow and Lila Gleitman 1978. Beyond Herodotus: The creation
of language by linguistically deprived deaf children. In: Andrew Lock (ed.), Action, Symbol,
and Gesture: The Emergence of Language. New York: Academic Press.
Franklin, Amy, Anastasia Giannakidou and Susan Goldin-Meadow 2011. Negation, questions, and
structure building in a homesign system. Cognition 118(3): 398–416.
Goldin-Meadow, Susan 1982. The resilience of recursion: A study of a communication system de-
veloped without a conventional language model. In: Eric Wanner and Lila R. Gleitman (eds.),
Language Acquisition: The State of the Art, 51–77. New York: Cambridge University Press.
Goldin-Meadow, Susan 1985. Language development under atypical learning conditions: Replica-
tion and implications of a study of deaf children of hearing parents. In: Keith Nelson (ed.),
Children’s Language, Volume 5, 197–245. Hillsdale, NJ: Lawrence Erlbaum and Associates.
Goldin-Meadow, Susan 1993. When does gesture become language? A study of gesture used as a
primary communication system by deaf children of hearing parents. In: Kathleen R. Gibson
and Tim Ingold (eds.), Tools, Language and Cognition in Human Evolution, 63–85. New
York: Cambridge University Press.
Goldin-Meadow, Susan 2003. The Resilience of Language. Philadelphia, PA: Taylor and Francis.
Goldin-Meadow, Susan and Cindy Butcher 2003. Pointing toward two-word speech in young chil-
dren. In: Sotaro Kita (ed.), Pointing: Where Language, Culture, and Cognition Meet. Mahwah,
NJ: Erlbaum Associates.
Goldin-Meadow, Susan, Cindy Butcher, Carolyn Mylander and Mark Dodge 1994. Nouns and
verbs in a self-styled gesture system: What’s in a name? Cognitive Psychology 27: 259–319.
Goldin-Meadow, Susan and Marolyn Morford 1985. Gesture in early child language: Studies of
deaf and hearing children. Merrill-Palmer Quarterly 31: 145–176.
Goldin-Meadow, Susan and Carolyn Mylander 1983. Gestural communication in deaf children:
The non-effects of parental input on language development. Science 221: 372–374.
Goldin-Meadow, Susan and Carolyn Mylander 1984. Gestural communication in deaf children:
The effects and non-effects of parental input on early language development. Monographs
of the Society for Research in Child Development 49: 1–121.
Goldin-Meadow, Susan and Carolyn Mylander 1998. Spontaneous sign systems created by deaf
children in two cultures. Nature 391: 279–281.
Goldin-Meadow, Susan, Carolyn Mylander and Cindy Butcher 1995. The resilience of combinator-
ial structure at the word level: Morphology in self-styled gesture systems. Cognition 56:
195–262.
Goldin-Meadow, Susan, Carolyn Mylander and Amy Franklin 2007. How children make language
out of gesture: Morphological structure in gesture systems developed by American and Chi-
nese deaf children. Cognitive Psychology 55: 87–135.
Goodwyn, Susan, Linda Acredolo and Catherine Brown 2000. Impact of symbolic gesturing on
early language development. Journal of Nonverbal Behavior 24: 81–103.
Greenfield, Patricia M. and E. Sue Savage-Rumbaugh 1991. Imitation, grammatical development,
and the invention of protogrammar by an ape. In: Norman A. Krasnegor, Duane M.
Rumbaugh, Richard L. Schiefelbusch and Michael Studdert-Kennedy (eds.), Biological and
Behavioral Determinants of Language Development, 235–262. Hillsdale, NJ: Erlbaum.
Hunsicker, Dea and Susan Goldin-Meadow in press. Hierarchical structure in a self-created com-
munication system: Building nominal constituents in homesign. Language.
Iverson, Jana M., Olga Capirci, Emiddia Longobardi and M. Cristina Caselli 1999. Gesturing in
mother-child interaction. Cognitive Development 14: 57–75.
Iverson, Jana M. and Susan Goldin-Meadow 2005. Gesture paves the way for language develop-
ment. Psychological Science 16: 368–371.
Keenan, Edward 1976. Towards a universal definition of subject. In: Charles N. Li (ed.), Subject
and Topic, 303–334. New York: Academic Press.
Lenneberg, Eric H. 1964. Capacity for language acquisition. In: Jerry A. Fodor and Jerrold J. Katz
(eds.), The Structure of Language: Readings in the Philosophy of Language. Englewood Cliffs,
NJ: Prentice-Hall.
Lillo-Martin, Diane 1999. Modality effects and modularity in language acquisition: The acquisi-
tion of American Sign Language. In: William C. Ritchie and Tej K. Bhatia (eds.), The Hand-
book of Child Language Acquisition, 531–567. New York: Academic Press.
McNeill, David 1992. Hand and Mind: What Gestures Reveal About Thought. Chicago: University
of Chicago Press.
Moores, Donald F. 1974. Nonvocal systems of verbal behavior. In: Richard L. Schiefelbusch and
Lyle L. Lloyd (eds.), Language Perspectives: Acquisition, Retardation, and Intervention. Balti-
more: University Park Press.
Morford, Jill P. and Susan Goldin-Meadow 1997. From here to there and now to then: The devel-
opment of displaced reference in homesign and English. Child Development 68: 420–435.
Newport, Elissa L. and Richard P. Meier 1985. The acquisition of American Sign Language. In:
Dan Slobin (ed.), The Cross-Linguistic Study of Language Acquisition, Volume 1: The Data,
881–938. Hillsdale, NJ: Lawrence Erlbaum.
Ozcaliskan, Seyda and Susan Goldin-Meadow 2005. Gesture is at the cutting edge of early lan-
guage development. Cognition 96: B101–B113.
Padden, Carol 1983. Interaction of morphology and syntax in American Sign Language. Unpub-
lished Ph.D. Dissertation, University of California at San Diego.
Padden, Carol 1990. The relation between space and grammar in ASL verb morphology. In: Ceil
Lucas (ed.), Sign Language Research: Theoretical Issues, 118–132. Washington, D.C.: Gallaudet
University Press.
Phillips, Sarah B. Van Deusen, Susan Goldin-Meadow and Peggy Miller 2001. Enacting stories,
seeing worlds: Similarities and differences in the cross-cultural narrative development of lin-
guistically isolated deaf children. Human Development 44: 311–336.
Rowe, Meredith L. and Susan Goldin-Meadow 2009. Differences in early gesture explain SES dis-
parities in child vocabulary size at school entry. Science 323: 951–953.
Rowe, Meredith, Seyda Ozcaliskan and Susan Goldin-Meadow 2008. Learning words by hand:
Gesture’s role in predicting vocabulary development. First Language 28: 185–203.
Schlesinger, Hilde 1978. The acquisition of bimodal language. In: Izchak Schlesinger and Lila
Namir (eds.), Sign Language of the Deaf: Psychological, Linguistic, and Sociological Perspec-
tives, 57–93. New York: Academic Press.
Senghas, Ann 1995. The development of Nicaraguan Sign Language via the language acquisition
process. Proceedings of Boston University Child Language Development 19: 543–552.
Senghas, Ann and Marie Coppola 2001. Children creating language: How Nicaraguan Sign Lan-
guage acquired a spatial grammar. Psychological Science 12: 323–328.
Senghas, Ann, Marie Coppola, Elissa L. Newport and Ted Supalla 1997. Argument structure in
Nicaraguan Sign Language: The emergence of grammatical devices. In: Elizabeth Hughes,
Mary Hughes and Annabel Greenhill (eds.), Proceedings of the 21st Annual Boston Univer-
sity Conference on Language Development, Volume 2, 550–561. Somerville, MA: Cascadilla
Press.
Shatz, Marilyn 1982. On mechanisms of language acquisition: Can features of the communicative
environment account for development? In: Eric Wanner and Lila R. Gleitman (eds.), Lan-
guage Acquisition: The State of the Art, 102–127. New York: Cambridge University Press.
Singleton, Jenny L., Susan Goldin-Meadow and David McNeill 1995. The cataclysmic break
between gesticulation and sign: Evidence against an evolutionary continuum of manual com-
munication. In: Karen Emmorey and Judy Reilly (eds.), Language, Gesture, and Space, 287–
311. Hillsdale, NJ: Erlbaum Associates.
Singleton, Jenny L., Jill P. Morford and Susan Goldin-Meadow 1993. Once is not enough: Stan-
dards of well-formedness in manual communication created over three different timespans.
Language 69: 683–715.
Tervoort, Bernard T. 1961. Esoteric symbolism in the communication behavior of young deaf chil-
dren. American Annals of the Deaf 106: 436–480.
Thompson, Sandra A. 1988. A discourse approach to the cross-linguistic category “adjective”. In:
John A. Hawkins (ed.), Explaining Language Universals, 167–185. Cambridge, MA: Basil
Blackwell.

Susan Goldin-Meadow, Chicago, IL (USA)

7. Speech, sign, and gesture


1. Speech, sign, and gesture
2. The linguistic study of signed languages
3. The growth of Gesture Studies
4. Cognitive Linguistics
5. The modern synthesis
6. References

Abstract
For much of history, the relationship among spoken language, signed language, and ges-
ture has been a source of contention among language scholars and philosophers of
language. The chapter examines the history of this question, beginning with the infamous
Milan conference and the debate over whether deaf children should be permitted to sign
or be required to learn to speak. Framed in Cartesian mind-body dualism, the debate
determined the scientific world view for the next 100 years. Recently, however, three
developments in the science of communication have begun to form a unified view of
the nature of human semiotic capabilities: (1) the linguistic study of signed languages;
(2) the growth of gesture studies; and (3) the new approach to linguistic theory called cog-
nitive linguistics and cognitive grammar. Each development, and the integration among
them, is described. Finally, the emerging interdisciplinary synthesis of the three areas is
discussed.

1. Speech, sign, and gesture


For much of history, the relationship among spoken language, signed language, and ges-
ture has been a source of debate and contention by scientists, philosophers, and
language scholars. Language has consistently been equated with speech. Signed lan-
guages were rarely, if ever, recognized as language; rather, they were commonly seen
as nothing more than depictive gestures. Gesture was regarded as a universal language,
more closely related to nature than is spoken language.
For deaf people who use their hands and bodies to express their language, the nature
of the relationship between these three systems is far from merely a philosophical ques-
tion. It informed a centuries-long debate about whether deaf children could be edu-
cated and become integrated into the general society. In the mid- to late 1800s this
debate centered around whether deaf children should be permitted to use signed lan-
guage, or whether signed languages should be forbidden and deaf children should be
taught using only speech and articulation training. These two forces, those who sup-
ported a manual approach and those who supported oralism, came together at the
Milan Conference of 1880. It was here that the proponents of oralism most forcefully
made their case.
Marius Magnat, the director of an oral school in Geneva at the time, made the case
for why the oralist approach should be preferred over signed language:

The advantages of articulation training [i.e., speech] […] are that it restores the deaf to
society, allows moral and intellectual development, and proves useful in employment.
Moreover, it permits communication with the illiterate, facilitates the acquisition and
use of ideas, is better for the lungs, has more precision than signs, makes the pupil the
equal of his hearing counterpart, allows spontaneous, rapid, sure, and complete expression
of thought, and humanizes the user. Manually taught children are defiant and corruptible.
This arises from the disadvantages of sign language. It is doubtful that sign can engender
thought. It is concrete. It is not truly connected with feeling and thought. […] It lacks pre-
cision. […] Sign cannot convey number, gender, person, time, nouns, verbs, adverbs, adjec-
tives, he claims. […] It does not allow [the teacher] to raise the deaf-mute above his
sensations. […] Since signs strike the senses materially they cannot elicit reasoning, reflec-
tion, generalization, and above all abstraction as powerfully as can speech. (Lane 1984:
387–388)

The president of the conference, Giulio Tarra, approached the argument from a differ-
ent perspective. Not only did he argue that speech should be preferred over signs, but
he linked signs to gesture, and claimed that neither was the proper instrument for
developing the intellect and the mind of the deaf child:

Gesture is not the true language of man which suits the dignity of his nature. Gesture,
instead of addressing the mind, addresses the imagination and the senses. Moreover, it
is not and never will be the language of society […] Thus, for us it is an absolute necessity
to prohibit that language and to replace it with living speech, the only instrument of
human thought. […] Oral speech is the sole power that can rekindle the light God
breathed into man when, giving him a soul in a corporeal body, he gave him also a
means of understanding, of conceiving, and of expressing himself. […] While, on the
one hand, mimic signs are not sufficient to express the fullness of thought, on the
other they enhance and glorify fantasy and all the faculties of the sense of imagination.
[…] The fantastic language of signs exalts the senses and foments the passions, whereas
speech elevates the mind much more naturally, with calm and truth and avoids the danger
of exaggerating the sentiment expressed and provoking harmful mental impressions.
(Lane 1984: 391, 393–394)

The arguments made by Magnat and Tarra were not new. They reflected Cartesian
philosophical ideas common at the time. Descartes distinguished two modes of
conceptualization – understanding or reasoning, and imagination:

I believe that this power of imagining that is in me, insofar as it differs from the power of
understanding, is not a necessary element of my essence, that is, of the essence of my mind;
for although I might lack this power, nonetheless I would undoubtedly remain the same
person I am now. Thus it seems that the power of imagining depends upon something dif-
ferent from me. (Descartes 1980: 90)

Cartesian thought also established the duality that separated the mind and the body:

Although perhaps […] I have a body that is very closely joined to me, nevertheless,
because on the one hand I have a clear and distinct idea of myself – insofar as I am a
thing that thinks and not an extended thing – and because on the other hand I have a dis-
tinct idea of a body – insofar as it is merely an extended thing, and not a thing that thinks –
it is therefore certain that I am truly distinct from my body, and that I can exist without it.
(Descartes 1980: 93)

Both of those philosophical themes are present in the arguments presented in favor of
oralism, which reveal long-standing preconceptions, and indeed misconceptions, about
the relation between spoken language, signed language, and gesture (see Tab. 7.1).

Tab. 7.1: The mind-body duality and speech, sign, and gesture

Mind | Body
Language | Gesture
Speech | Sign
Acquisition of ideas | Concrete
Expression of thought, instrument of thought | Cannot engender thought
Those who use it are restored to society with calm, prudence, truth (human-like) | Those who use it are defiant, corruptible, undignified (animal-like)
Precision (grammar) | Lacks grammar (number, gender, nouns, verbs, etc.); mimic
Elicits and permits reasoning, reflection, abstraction, generalization, conceptualization, rationality | Associated with the senses, material, glorifies imagination and fantasy, foments the passions, encourages harmful mental impressions
The soul, the spirit, the breath of God (aspiration and speech), res cogitans | The corporeal body, the flesh, the material world, the sensual, res extensa

The impact of these ideas was tremendous because they established the scientific world
view for the next 100 years. Scholars were left with the following deeply entrenched
assumptions about the relationship between speech, sign, and gesture:

(i) Language is of the mind; gesture is of the body
(ii) Language is expressed solely through speech
(iii) Gesture is distinct from language
(iv) Because language is speech, sign is not language; sign is gesture

Three recent developments in the science of communication have not only challenged
these views, but have also begun to synthesize into a bold new unified view of the nature
of human semiotic capabilities. These are:

(i) the linguistic study of signed languages;
(ii) the growth of gesture studies; and
(iii) the new approach to linguistic theory called cognitive linguistics and cognitive
grammar.

2. The linguistic study of signed languages


As we have noted, the linguistic status of signed languages was the focus of considerable
controversy. At the root of this controversy was the issue of whether signed languages
are natural human languages or merely gestures devoid of linguistic structure. One
manifestation of this among linguists was the assumption that signed languages lacked
duality of patterning. The words (or “signs”) of signed languages were believed to be
holistic units, depictive gestures with no internal structure.
Modern linguistic analysis of signed languages began with the pioneering work of
William C. Stokoe. In Sign Language Structure, Stokoe (1960) offered the first linguistic
analysis of American Sign Language (ASL) form, demonstrating that it exhibits duality
of patterning. Stokoe analyzed signs into three major phonological classes: handshape
(the configuration that the hand makes when producing the sign), location (the place
where the sign is produced), and movement (the movement made in producing the
sign). He termed these meaningless units of formation cheremes, the signed equivalent
of phonemes in spoken languages. Stokoe’s insight established once and for all that
American Sign Language was a true human language. His work was taken up and ex-
tended by a growing number of linguists in many other countries, who documented the
linguistic structure of many of the world’s signed languages.
One unfortunate consequence of this recognition that signed languages are natural
human languages was that sign linguists rejected any connection between signed lan-
guages and gesture. In fact, Stokoe himself did not assume a disjunction of sign and ges-
ture. To explore this idea, Stokoe (1974: 37) introduced the term gSign, the gestural
manifestation of a sign-vehicle in a semiotic system, equivalent to sSign, the manifesta-
tion of a sign-vehicle in the spoken modality. With this foundation, Stokoe then asked a
critical question: How do gSigns relate to language? Pointing out that scholars have often assumed “sSigns and language emerged fully formed, all at once, and indissolubly joined,” Stokoe suggested a non-Cartesian alternative:

If one considers a natural human sign language from inside not outside, its gSigns denote
the kinds of things that sSigns denote. […] Because the full language use of gSigns is not
generally known, it has been supposed that there must be an unbridgeable gap between
gSign as affect display and gSign as language element. Yet there may be a semiotic and
evolutionary continuum as we shall see. The habit of thinking of animal gSigns as simple
and human signs, both gestured and spoken, as complex is preposterous. […] The verte-
brates’ affect display gSign is immensely complex. Its reference is in fact to the whole
ethology, and it is moreover complex in a semiotically exploitable way. Its vehicle, deno-
tation, and denotatum are all rich material for evolutionary development. If it is
foolhardy to maintain this kind of gSign evolved directly into language, it is foolish to
dismiss it and seek elsewhere for the origins of more specialized gSigns and sSigns.
(Stokoe 1974: 36)

As we shall see, only recently has Stokoe’s original position been taken up by sign lin-
guists and gesture researchers. Even while spoken language researchers were exploring
the deep connections between language and gesture, the view remained that sign and
gesture could not be unified (Singleton, Goldin-Meadow, and McNeill 1995).

3. The growth of Gesture Studies


The Cartesian dichotomy separating language from gesture was the received wisdom
across the fields of communication and psychological science. While a few social scien-
tists, such as Ray Birdwhistell (1970) and Condon and Sander (1974), studied the tight
synchronicity of speech and gesture, linguists rarely looked at language in relation to
gesture. One notable exception was Kenneth Pike (1967), who argued that language
and gesture must be examined within a unified theory.
Two contemporary scholars who have contributed greatly to the study of the relation between
language and gesture are David McNeill (1985, 1992, 2000, 2005) and Adam Kendon
(1980, 1991, 1994, 1997, 2004). Research on co-speech gesture has increased exponen-
tially over the last several decades. The growth of the scientific study of gesture is docu-
mented in many of the chapters of this handbook. Significantly, an emerging theme in
contemporary gesture research seeks to link gesture studies to signed languages and to
cognitive approaches to linguistics (Cienki and Müller 2008a, 2008b; Ladewig 2011; La-
dewig and Bressem forthcoming; Mittelberg 2008, 2010; Müller 2004, 2008, 2009; Müller
and Cienki 2009).

4. Cognitive Linguistics
The Cartesian idea that language is of the mind, while gesture is of the body, has deeply
influenced modern linguistic theory. Some philosophers, such as Condillac, suggested
that language began as a gesture language or langage d’action, while others, in-
cluding Rousseau, Herder, and Humboldt, caricatured and ridiculed such a claim,
arguing that language could not have arisen from such natural, animalistic beginnings.
Humboldt contended that: “Language must be regarded as built into man; for as the work
of his mind, in the clarity of consciousness, it is completely inexplicable. It is of no avail
to allow for thousands and thousands of years for its invention. Language could not be
invented if its archetype were not already present in the human mind” (Wells 1987).
The notion that language must be “built-in”, that there is in fact a faculty of language
distinct from other, general cognitive abilities, is a defining assumption of modern form-
alist or generative linguistic theory. Another defining assumption of formalist ap-
proaches is that grammar and syntax can be explained independently of meaning.
Chomsky, for example, writes derisively of the matter:

A great deal of effort has been expended in attempting to answer the question: “How can
you construct a grammar with no appeal to meaning?” The question itself, however, is
wrongly put, since the implication that obviously one can construct a grammar with appeal
to meaning is totally unsupported. One might with equal justification ask: “How can
you construct a grammar with no knowledge of the hair color of the speaker?” (Chomsky
1957: 93).

Cognitive linguistics presents a radical departure from such a view. Cognitive linguistics
is an approach to language that is based on our experience of the world and the way
we perceive and conceptualize it. Cognitive linguistics adopts three foundational
hypotheses (Croft and Cruse 2004):

(i) Language is not an autonomous cognitive faculty.
(ii) Grammar is conceptualization.
(iii) Knowledge of language emerges from language use.

Under the cognitive linguistic framework, language is not dissociable from other facets
of human cognition. Knowledge of language cannot be sharply delimited and distinguished
from other kinds of knowledge and ability.
One approach to cognitive linguistics is known as cognitive grammar. In direct reply
to the more traditional assumption that grammar is a purely formal system devoid of
meaning, Langacker (2008: 3) presents the provocative claim that grammar is meaning-
ful. Grammar, according to Langacker, is symbolic in nature, where the term “symbolic”
means the pairing between a semantic structure and a phonological structure – between
a meaning and its physical manifestation. Within this theory, lexicon, morphology, and
syntax form a continuum of symbolic structures. All aspects of grammar, including
grammatical classes, grammatical markers, grammatical rules, and so forth, are symbolic
structures, having both semantic and phonological import.
According to cognitive grammar, one of the overriding aspects of grammar is that it
imposes a particular construal onto conceptual content. This ability to construe situa-
tions in myriad ways is based on imaginative and creative abilities. Cognitive linguistics
eschews purely propositional or truth-conditional accounts of meaning, and instead fa-
vors imagistic accounts. One type of such conceptual structure is a set of image schemas,
“schematized patterns of activity abstracted from everyday bodily experiences,
especially pertaining to vision, space, motion, and force” (Langacker 2008: 32).
While the cognitive linguistic approach does not rule out certain innate abilities,
cognitive grammar does insist that:

if language serves a symbolic function, establishing systematic connections between con-
ceptualizations and observable phenomena like sounds and gestures, it would seem both
natural and desirable to seek an account that grammar is itself symbolic […] From a
naïve perspective (i.e. for those who lack linguistic training), it is hard to fathom why
our species would have evolved an autonomous grammatical system independent of
conceptual and phonological content. (Langacker 2008: 6)

5. The modern synthesis


On their own, each of these three developments has significantly advanced our under-
standing of the world’s signed languages, of the nature of gesture and the role it plays in
communication and conceptualization, and of the nature of human spoken language as
an expression of general cognitive abilities that are grounded in our sensorimotor and
kinesthetic experiences.
It is in the modern synthesis of these three developments, which is only now happen-
ing, that a truly interdisciplinary understanding is beginning to emerge about the body,
language, and communication. Initially, the integration followed two tracks. In one
track, linguists (Cienki and Müller 2008a; Parrill and Sweetser 2004) began to apply
cognitive linguistic concepts and analytical tools to the study of gesture, demonstrating
that gesture can reveal in a very transparent way the nature of conceptualization.
In the second track, cognitively-trained signed language linguists began to utilize
cognitive linguistic methods to study signed languages (S. Wilcox 2007; S. Wilcox and
P. Wilcox 2010). These studies resulted in new insights into problems that had seemed
impenetrable, such as the nature of metaphor in signed languages (P. Wilcox 2000;
S. Wilcox 2007; S. Wilcox, P. Wilcox, and Jarque 2003) and the significance of iconicity
in a visual language (Perniss 2007; Pietrandrea 2002; Taub 2001; S. Wilcox 2004a). Addi-
tionally, models unique to cognitive linguistics such as blending and conceptual integra-
tion (Dudis 2004; Liddell 1998, 2003; Wulf and Dudis 2005) were applied to signed
languages, resulting in new insights into the structure of these languages.
Recently, gesture researchers and linguists have begun to integrate all three develop-
ments. One of the first publications to explore this new, integrated approach was Ges-
ture and the Nature of Language (Armstrong, Stokoe, and S. Wilcox 1995). The
interface between gesture and signed language has become the focus of several linguists
(Müller 2008; S. Wilcox 2004b). One aspect of this research has been to document the
grammaticalization process by which non-linguistic gestures become incorporated into
the linguistic system of signed languages (Janzen and Shaffer 2002; S. Wilcox 2009).
Increasingly, the disciplinary boundaries between sign linguists, gesture scholars, and re-
searchers in other fields of communication science, such as psychology and cognitive
neuroscience, are beginning to fade. As they do, the fruits of this interdisciplinary uni-
fication are beginning to shed new light on the nature of human language as embodied
communication.

6. References
Armstrong, David F., William C. Stokoe and Sherman Wilcox 1995. Gesture and the Nature of
Language. Cambridge: Cambridge University Press.
Birdwhistell, Ray L. 1970. Kinesics and Context: Essays on Body Motion Communication. Phila-
delphia: University of Pennsylvania Press.
Chomsky, Noam 1957. Syntactic Structures. The Hague: Mouton.
Cienki, Alan and Cornelia Müller (eds.) 2008a. Metaphor and Gesture. Amsterdam: John
Benjamins.
Cienki, Alan and Cornelia Müller 2008b. Metaphor, gesture, and thought. In: Raymond Gibbs
(ed.), The Cambridge Handbook of Metaphor and Thought, 483–502. Cambridge: Cambridge
University Press.
Condon, William S. and Louis W. Sander 1974. Synchrony demonstrated between movements of
the neonate and adult speech. Child Development 45(2): 456–462.
Croft, William and Alan D. Cruse 2004. Cognitive Linguistics. Cambridge: Cambridge University
Press.
Descartes, René 1980. Discourse on Method and Meditations on First Philosophy. Indianapolis:
Hackett.
Dudis, Paul G. 2004. Body partitioning and real-space blends. Cognitive Linguistics 15(2):
223–238.
Janzen, Terry and Barbara Shaffer 2002. Gesture as the substrate in the process of ASL gramma-
ticization. In: Richard Meier, David Quinto and Kearsy Cormier (eds.), Modality and Structure
in Signed and Spoken Languages, 199–223. Cambridge: Cambridge University Press.
Kendon, Adam 1980. Gesticulation and speech: two aspects of the process of utterance. In: Mary
Ritchie Key (ed.), The Relationship of Verbal and Nonverbal Communication, 207–227. The
Hague: Mouton.
Kendon, Adam 1991. Some considerations for a theory of language origins. Man 26(2): 199–221.
Kendon, Adam 1994. Do gestures communicate? A review. Research on Language and Social
Interaction 27(3): 175–200.
Kendon, Adam 1997. Gesture. Annual Review of Anthropology 26: 109–128.
Kendon, Adam 2004. Gesture: Visible Action as Utterance. Cambridge: Cambridge University
Press.
Ladewig, Silva H. 2011. Putting the cyclic gesture on a cognitive basis. CogniTextes 6 http://
cognitextes.revues.org/406. Accessed May 2012.
Ladewig, Silva H. and Jana Bressem forthcoming. New insights into the medium hand – Discovering
structures in gestures on the basis of the four parameters of sign language. Semiotica.
Lane, Harlan 1984. When the Mind Hears: A History of the Deaf. New York: Random House.
Langacker, Ronald W. 2008. Cognitive Grammar: A Basic Introduction. Oxford: Oxford Univer-
sity Press.
Liddell, Scott K. 1998. Grounded blends, gestures, and conceptual shifts. Cognitive Linguistics
9(3): 283–314.
Liddell, Scott K. 2003. Grammar, Gesture, and Meaning in American Sign Language. New York:
Cambridge University Press.
McNeill, David 1985. So you think gestures are nonverbal? Psychological Review 92(3): 350–371.
McNeill, David 1992. Hand and Mind: What Gestures Reveal About Thought. Chicago: University
of Chicago Press.
McNeill, David (ed.) 2000. Language and Gesture. New York: Cambridge University Press.
McNeill, David 2005. Gesture and Thought. Chicago: University of Chicago Press.
Mittelberg, Irene 2008. Peircean semiotics meets conceptual metaphor: Iconic modes in gestural
representations of grammar. In: Alan Cienki and Cornelia Müller (eds.), Metaphor and Ges-
ture, 145–184. Amsterdam: John Benjamins.
Mittelberg, Irene 2010. Geometric and image-schematic patterns in gesture space. In: Vyvyan
Evans and Paul Chilton (eds.), Language, Cognition, and Space: The State of the Art and
New Directions, 351–385. London: Equinox.
Müller, Cornelia 2004. Forms and uses of the palm up open hand: A case of a gesture family. In:
Roland Posner and Cornelia Müller (eds.), The Semantics and Pragmatics of Everyday Ges-
tures, 234–256. Berlin: Weidler.
Müller, Cornelia 2008. What gestures reveal about the nature of metaphor. In: Alan Cienki and
Cornelia Müller (eds.), Metaphor and Gesture, 219–248. Amsterdam: John Benjamins.
Müller, Cornelia 2009. Gesture and language. In: Kirsten Malmkjaer (ed.), The Routledge Linguis-
tics Encyclopedia, 510–518. London: Routledge.
Müller, Cornelia and Alan Cienki 2009. Words, gestures, and beyond: Forms of multimodal met-
aphor in the use of spoken language. In: Charles Forceville and Eduardo Urios-Aparisi (eds.),
Multimodal Metaphor, 297–328. Amsterdam: John Benjamins.
Parrill, Fey and Eve Sweetser 2004. What we mean by meaning: Conceptual integration in gesture
analysis and transcription. Gesture 4(2): 197–219.
Perniss, Pamela M. 2007. Space and Iconicity in German Sign Language (DGS). Max Planck Insti-
tute Series in Psycholinguistics 45. Nijmegen, the Netherlands: Max Planck Institute.
Pietrandrea, Paola 2002. Iconicity and arbitrariness in Italian Sign Language. Sign Language Stu-
dies 2(3): 296–321.
Pike, Kenneth L. 1967. Language in Relation to a Unified Theory of the Structure of Human
Behavior. The Hague: Mouton.
Singleton, Jenny L., Susan Goldin-Meadow and David McNeill 1995. The cataclysmic break
between gesticulation and sign: Evidence against a unified continuum of gestural communica-
tion. In: Karen Emmorey and Judy Reilly (eds.), Language, Gesture, and Space, 287–311. Hills-
dale, NJ: Lawrence Erlbaum.
Stokoe, William C. 1960. Sign Language Structure: An Outline of the Visual Communication Sys-
tems of the American Deaf. Buffalo, NY: University of Buffalo.
Stokoe, William C. 1974. Motor signs as the first form of language. In: Roger W. Wescott, Gordon
W. Hewes and William C. Stokoe (eds.), Language Origins, 35–50. Silver Spring, MD: Linstok
Press.
Taub, Sarah 2001. Language in the Body: Iconicity and Metaphor in American Sign Language.
Cambridge: Cambridge University Press.
Wells, George A. 1987. The Origin of Language: Aspects of the Discussion from Condillac to
Wundt. La Salle, IL: Open Court.
Wilcox, Phyllis P. 2000. Metaphor in American Sign Language. Washington, DC: Gallaudet Uni-
versity Press.
Wilcox, Phyllis P. 2007. Constructs of the mind: Cross-linguistic contrast of metaphor in verbal and
signed languages. In: Elena Pizzuto, Paola Pietrandrea and Raffaele Simone (eds.), Verbal and
Signed Languages: Comparing Structures, Constructs and Methodologies, 251–274. Berlin:
Mouton de Gruyter.
Wilcox, Sherman 2004a. Cognitive iconicity: Conceptual spaces, meaning, and gesture in signed
languages. Cognitive Linguistics 15(2): 119–147.
Wilcox, Sherman 2004b. Gesture and language: Cross-linguistic and historical data from signed
languages. Gesture 4(1): 43–75.
Wilcox, Sherman 2007. Signed languages. In: Dick Geeraerts and Hubert Cuyckens (eds.), The
Oxford Handbook of Cognitive Linguistics, 1113–1136. Oxford: Oxford University Press.
Wilcox, Sherman 2009. Symbol and symptom: Routes from gesture to signed language. Annual
Review of Cognitive Linguistics 7(1): 89–110.
Wilcox, Sherman and Phyllis Wilcox 2010. The analysis of signed languages. In: Bernd Heine and
Heiko Narrog (eds.), The Oxford Handbook of Linguistic Analysis, 739–760. Oxford: Oxford
University Press.
Wilcox, Sherman, Phyllis Wilcox and Maria J. Jarque 2003. Mappings in conceptual space: Meto-
nymy, metaphor, and iconicity in two signed languages. Jezikoslovlje 4(1): 139–156.
Wulf, Alyssa and Paul Dudis 2005. Body partitioning in ASL metaphoric binds. Sign Language
Studies 5(3): 317–332.

Sherman Wilcox, Albuquerque, NM (USA)


II. Perspectives from different disciplines

8. The growth point hypothesis of language and
gesture as a dynamic and integrated system
1. Introduction
2. The growth point
3. Growth point properties
4. Context
5. Catchments and fields of oppositions
6. Unpacking
7. Note on consciousness
8. Conclusions
9. References

Abstract
This is a sketch of the theoretical background and basic assumptions for regarding
language and gesture as an integrated system, along with an account of the empirical
observations and findings that have inspired and lend support to this theory.

1. Introduction
A dynamic view emphasizes how language integrates itself with thought and context
(discursive, interpersonal, material, etc.) – factors integral, not external, to language
considered dynamically, which fuel the real-time microgenesis of speech and gesture.
The growth point (GP), featured here, is a hypothesis concerning this dynamic system.

2. The growth point


The growth point is a cognitive package that combines semiotically opposite linguistic
categorial and imagistic components (McNeill and Duncan 2000). Tab. 8.1 summarizes
the semiotic oppositions inside a growth point. When I say that gesture “is opposed to
language”, I mean language in the sense of Saussure’s 1966 langue – the synchronically
analyzed system of differences that comprise linguistic value. When I say on the other
hand that gesture is “part of language”, I mean language in the sense of his langage –
“the semiotic totality of language in general”. The oppositions create conditions for
an imagery-language dialectic, the basic engine of thought, language and action in con-
text. A growth point is a nexus where the static and dynamic intersect. Thus both di-
mensions must be considered. In combining them, the growth point becomes the
minimal unit of the dynamic dimension itself. It is called a growth point because it is
meant to be the initial pulse of thinking-for-(and while)-speaking, out of which a
dynamic process of organization emerges. The linguistic component categorizes the
visuo-actional imagery component. The linguistic component is important since, by
categorizing the imagery, it brings the gesture into the system of language. Imagery is
equally important, since it grounds sequential linguistic categories in an instantaneous
visuospatial frame. It provides the growth point with the property of “chunking”, a hall-
mark of expert performance (see Chase and Ericsson 1981), whereby a chunk of linguis-
tic output is organized around the presentation of an image. Synchronized speech and
gesture are the key to this theoretical growth point unit.

2.1. Two dimensions, and the imagery-language dialectic


The dimensions the growth point combines have classically been called “linguistic” and
“psychological” but better (and less proprietary) terms are static and dynamic.

– The static dimension is accessed through the synchronic method, in which language is
viewed as a totality at a single theoretical instant (hence “syn-” “chronic”). Saussure
([1959] 1966) argued that only in this way, with the whole of language laid out panor-
amically, could the contrasts that define linguistic values be discerned. Saussure
famously contrasted French “mouton” to the two English words, “sheep” and “mut-
ton”, to illustrate how values depend on “differences” (here, a difference in English,
none in French) as well as reference. The values of “mouton” and “sheep” can never
be the same, despite being mutual translations and having the same references. Also
see Saussure’s recently discovered notes (Saussure 2002) and comments by Harris
(2002, 2003). “In language”, he said, “there are only differences” (Saussure 1966:
652) and this is the essence of the static dimension – linguistic objects in contrast
(“static” does not mean “stasis” – we are not speaking of moments of repose between
bursts of activity: Every linguistic event can be regarded statically or dynamically, or
as here both).
– The dynamic dimension could be termed the “activity” of language but I will call it
(invoking Merleau-Ponty 1962) “inhabiting” language with thought and action. This
term has the advantage of invoking both langue, the static system, and something that
animates it, bringing in the dynamic dimension. A historical figure associated with
this tradition is Vygotsky (1987). On this dimension, utterances come and go, emerge
and disperse in real time, and this stands in contrast to the abstraction from time of
the linguistic objects in langue.

An imagery-language dialectic in turn involves a conflict or opposition between the
dimensions, the dynamic and the static – two modes of semiosis simultaneously expres-
sive of the same idea. It comprises on one side imagery that is global-synthetic and
context-sensitive; and on the other a linguistic categorization that is listable, recurrent,
analytic and combinable. Such a combination of opposite modes of representation for
the same idea is unstable.
Hence, the dialectic seeks stability. Resolution of the conflict comes through change,
a positive force for thought and speech which is achieved by “unpacking” the growth
point into one or more stable constructions capable of presenting it. This growth
point dialectic is a model, both in its embodiment of conflict and the changes it evokes,
of the animation of language, of the dynamic dimension itself.
3. Growth point properties


3.1. Empirical base
The growth point is an empirical concept. Growth points are inferred from the totality
of communicative events, with special focus on speech-gesture synchrony and co-
expressivity. In any given instance, a growth point is a hypothesis concerning observed
speech-gesture data. To be justified, a growth point must explain:

(i) speech and gesture synchrony;
(ii) speech and gesture co-expressiveness;
(iii) that speech and gesture jointly are a psychological predicate; and
(iv) that speech and gesture embody the same idea in opposite semiotic modes (“syn-
chrony”, “co-expressiveness” and “psychological predicate” are explained in the
following sections).

With careful observation these criteria can be tested for applicability, as will be
illustrated in multiple examples below.

3.2. Minimal units


A growth point is what Vygotsky (1987) termed a “psychological unit” – a smallest
component that retains the quality of a whole, in this case the whole of an imagery-
language dialectic. “By a unit we mean a product of analysis which, in distinction
from elements, possesses all the basic properties of a whole. Further, these properties
must be a living portion of the unified whole which cannot be broken down further”
(Vygotsky, Thinking and Speech [Russian 1934 version: 9], quoted by Zinchenko 1985: 97).
Vygotsky (1987) contrasted units to “elements”, the result of reducing (in this case)
the growth point to features that do not preserve the imagery-language dialectic (sub-
personal events, Quaeghebeur and Reynaert 2010, such as word retrievals and many
others that figure in experimental studies because they are relatively easy to measure,
but are not psychological units).

3.3. An example
In Fig. 8.1, a speaker is shown recounting an episode of an animated cartoon (variously
termed the “inside ascent” or the “bowling ball episode”) in which one character,
Sylvester, attempts to reach a second character, Tweety, by climbing the inside of a
drainpipe attached to the side of a building – a pipe conveniently topping out just
where Tweety is perched in an upper story window. The cat Sylvester enters the
drainpipe at street level and starts to climb it but is thwarted when Tweety Bird
drops a bowling ball into the top. Sylvester and ball meet midway, and he swallows it.
This and other examples were collected in a standardized elicitation: A participant is
shown an approximately 8-minute long animated Tweety and Sylvester cartoon
(“Canary Row”, Warner Brothers, 1950). Immediately after viewing the cartoon, the
participant recounts the story to a second participant “as accurately and completely
as possible, as “BR” will be asked to retell the story based on your narration” (or
words to that effect). The performance is video recorded, with the speaker in full
view and at least the front half of the listener as well. The second participant is a
genuine listener, not one of the experimenters. The instructions emphasized that the
purpose of the experiment was to study storytelling and there is no mention of gesture.
(Computer art in Fig. 8.1 and later figures by Fey Parrill.)

Fig. 8.1: Gesture of “rising hollowness” with: “he go[es up through the pipe] this time”
Typographic conventions in the transcriptions of examples distinguish the phases of
gestures (Kendon 1980). The gesture phrase, the larger unit, is enclosed within
“[” and “]” brackets. The stroke, the image phase, is marked in boldface. Preparation is
the hand getting into position for the stroke and is indicated by the span from the
left bracket to the boldface (the onset of preparation is the first sign the gesture is
becoming organized); holds or cessations of movement, either prestroke, awaiting the
stroke, or poststroke, preserve the position and hand shape of the stroke after move-
ment ceases and are indicated with underlining (there was a poststroke hold over
“the” in the example); and retraction is the hand retreating to rest and is indicated
by the span to the right bracket (retraction extended through the rest of the noun
phrase, the noun “pipe”). Only the stroke phase is obligatory. Without a stroke a
gesture is not said to occur but the other phases may or may not occur.
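Since boldface and underlining do not survive plain text, the same phase information can be encoded with ad hoc markers and recovered mechanically. The markup and the small parser below are hypothetical conveniences for illustration only (here `[` `]` enclose the gesture phrase, `*…*` marks the stroke, and `_…_` marks a poststroke hold); they are not a standard annotation format.

```python
import re

def parse_gesture_phrase(annotated: str):
    """Split a bracketed gesture phrase into Kendon-style phases.

    Hypothetical markup: [...] = gesture phrase, *...* = stroke,
    _..._ = poststroke hold. Preparation and retraction are the
    spans before the stroke and after the hold, respectively.
    """
    phrase = re.search(r"\[(.*)\]", annotated).group(1)
    m = re.search(r"\*(.*?)\*(?:\s*_(.*?)_)?", phrase)
    return {
        "preparation": phrase[: m.start()].strip(),
        "stroke": m.group(1).strip(),          # the only obligatory phase
        "hold": (m.group(2) or "").strip(),
        "retraction": phrase[m.end():].strip(),
    }

phases = parse_gesture_phrase("and he go[es *up thróugh* _the_ pipe] this time")
print(phases["stroke"])      # prints: up thróugh
print(phases["retraction"])  # prints: pipe
```

Run on the Fig. 8.1 example, the parser recovers the stroke over “up thróugh,” the poststroke hold over “the,” and the retraction through “pipe,” matching the description in the text.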
In the boldfaced section of Fig. 8.1, the speaker raised her right hand upward, her
palm up and fingers and thumb spread apart: a kind of open basket shape moving
up, as illustrated (the clip is at the moment she is saying the vowel of “thróugh”).
This gesture depicts the path of motion (upward), the moving figure (the cat), and
the interiority of the path. The gesture synthesizes all this information in a single
symbol.

Tab. 8.1: Semiotic Oppositions Within a GROWTH POINT

Imagery side                                      Language side
Global: meanings of parts dependent on            Compositional: meaning of whole dependent
meaning of whole.                                 on meaning of parts.
Synthetic: distinguishable meanings in            Analytic: distinguishable meanings
single image.                                     separated.
Additive: no new syntagmatic values when          Combinatoric: new syntagmatic values
images combine.                                   when parts combine.
3.4. Co-expressiveness
The synchronous speech and gesture, “and he go[es up thróugh the pipe] this time”,
were co-expressive. The term “co-expressive” means that speech and gesture cover the
same idea unit. Co-expressiveness is distinct from “mismatch” or “supplement”, where
speech and gesture cover distinct meanings. Co-expressively with “up” the speaker’s
hand rose upward; co-expressively with “thróugh” her fingers spread outward to create
an interior space. The upward movement and the opening of the hand took place con-
currently, not sequentially, and these movements occurred synchronously with “up
thróugh,” the linguistic package that carries the same meanings. The contrastive empha-
sis on “thróugh,” highlighting interiority, is matched by the added complexity of the
gesture, the spreading of the upturned fingers. What makes speech and gesture co-
expressive is this joint realization of the idea of upward motion and interiority. Utter-
ances can occur without gestures, with imagery still present. While absence of gesture
has multiple causes, one is that the less newsworthy the information, the less materia-
lized the speech and gesture. Absence of gesture is the endpoint of this continuum.
Speech, at the same endpoint, likewise becomes simple and hackneyed and may also
briefly cease. So even if gesture is absent, imagery can be present.

3.5. And contrast


But also note the differences. “Co-expressive” also means that speech and gesture are
non-redundant. They are so in two respects. In some cases, they may express different
aspects of the idea unit. Also, even when content is the same (as it is in Fig. 8.1), they
present this idea in semiotically opposite modes: Speech divides the event into semantic
units – a directed path (“up”), plus the idea of interiority (“through”). Analytic segre-
gation further requires that direction and interiority be combined syntactically, to obtain
the composite meaning of the whole. In gesture, this composite meaning is fused into
one symbol and the semantic units are simultaneous – there is no combination. Thus,
speech and gesture, at the moment of their synchronization, were co-expressive but
semiotically non-redundant, and this sets the stage for doing one thing (conception of
the cat’s climbing up inside the pipe) in two forms – language and gesture, one analytic/
combinatoric and the other global/synthetic.
By “image” I do not imply a photo or pictogram. I mean a mode of semiosis that is
“global” and “synthetic”. The “rising hollowness” gesture has both properties.
By “global”, I mean that the gesture’s parts (= the hands/fingers/trajectory/space/
orientation/movement details of a gesture) have meanings dependent upon the mean-
ing of the gesture as a whole. The parts do not have their own meanings, and the mean-
ing of the whole is not composed out of the parts; rather, significance flows in the other
direction, downward, from whole to parts. Thus we understand that the hand as a rising
whole is Sylvester ascending; the fingers are the pipe; and the spreading of the fingers is
the idea of hollowness.
By “synthetic”, I mean that the meanings are synthesized into one symbolic form (the
rising hollowness hand). In the companion speech, the gesture’s meaning may be analyti-
cally separated into elements that spread over the surface of the sentence (“goes” + “up”).
To summarize, the gesture in Fig. 8.1 embodies several ideas determined globally
by the whole – the character (the hand itself) rising up (the trajectory) and interiority
(the open shape). The gesture does not admit decomposition. It is a minimal Vygots-
kian psychological unit, and there are no subunits with their own meanings, no repea-
table significances for the outspread fingers and upward palm. Only upward motion can
be said to have an independent meaning – it means upward, but that is all and it is not
enough to generate the meaning of the gesture whole. And even this upward meaning
acquires significance as a part of the whole (it means rising hollowness, which comes
from the whole, not from upward simple). The gesture in other words is not composed
out of parts: The parts are composed out of it. Also, the gesture is more a unified whole
than just the combination of up and through. I have tried to convey this unity with the
expression “rising hollowness” but whatever the phrase the gesture has interiority, entity,
and upward motion in one, undecomposable symbolic form. The gesture synthesized
ideas that required separation in speech – the figure in motion, the direction, and the
idea of interiority were unified in the gesture while the same ideas were distributed
into “he”, “goes up” and “thróugh” in speech.

3.6. Combinatoric (syntagmatic) potential


A third semiotic contrast fuelling thought and speech is combinatoric or syntagmatic
potential; linguistic forms possess it, gesture imagery does not, and both the potential
and the absence of it exist within the growth point.
When linguistic forms combine, new syntagmatic values emerge as part of the combination. This is a crucial property of the static dimension. Syntagmatic value (a component of Saussurian linguistic value) is a property of a combination qua combination.
A simple example is the direct object. A direct object exists only in combination with a
verb; outside this combination the noun has no such value. In “toss the ball” the direct
object property emerges, but “ball”, by itself, is not a direct object. This is referred to as
combinatoric “potential”, to recognize that a linguistic element in a growth point carries
syntagmatic value even when other linguistic elements with which it combines in the
sentence are not part of the dynamic growth point unit (we shall see such examples
later).
When gestures combine, in contrast, they add imagery but create no syntagmatic
values; there are extra details only. Fig. 8.2 is an illustration.
Fig. 8.2 shows two gestures from the same speaker that register at different levels of
detail the bowling ball as it was dropped into the top of the drainpipe. The left panel is a
low-resolution gesture when the speaker first described the episode; the right panel is
her elaboration after the listener requested clarification. The two-handed gesture is a
“combination” in the sense that it makes coordinated use of the two hands, but the meanings of the hands are determined globally: The gesture as a whole means something like “it’s a gutter pipe, he drops a bowling ball into it”, and this meaning of the whole determines the meaning of the parts. There is no new meaning arising through the combination
as such, the way that “direct object” arises. There is no equivalent in the gesture to a
higher-level syntactic phrase. In the gesture the “pipe” hand has value, not because of its
combination with the “ball” hand, but from its own role in the gesture as a whole (its
meaning descending from the meaning of the whole – the global property). Instead,
the two hands build a more complex image of the bowling ball’s trajectory, the left
hand adding the ground element, “the gutter pipe”, and its spatial relationship to the
act of thrusting – meanings it acquires on its own as part of the gesture whole.
8. The growth point hypothesis of language and gesture 141

Fig. 8.2: Illustrating gesture combination. “Down the pipe” at different levels of detail.

Left panel, the speaker’s initial description, one hand showing Tweety’s “hand” shaped over the bowl-
ing ball as it is thrust into a drainpipe; the downward thrust occurred three times:
Speaker: [and throws a bowling ball] [down in the*] [the thing]
Right panel, the second description after the listener requested clarification, an elaborated gesture
together with an expanded verbal description (the downward thrust now occurring six times):
Listener: where does he throw the bowling ball?
Speaker: [it’s one of those*] [the] [gu][tter pipes] [an’ he t][hrows the ball into the top]
In the right panel, the left hand adds pictorial detail but the value is intrinsic to the imagery and does
not arise from the left hand-right hand combination, unlike the way that a direct object arises when
a noun is combined with a transitive verb in a verb phrase.

Simultaneously, speech combinations create actual syntagmatic values (“ball” is direct object, “one of those gutter pipes” is particularized in a nominal phrase), and this forms another element of the semiotic opposition of imagery and language in the growth point.

4. Context
Another source of dynamic dimension change is the inclusion of the immediate context of speaking. It is not that a growth point “consults” the context or that the context sets parameters for it; the growth point incorporates it. Growth point and context are dependent
on each other; they are mutually constitutive. These considerations are subsumed by the
Vygotskian concept of a psychological predicate.

4.1. The psychological predicate


In a psychological (as opposed to a grammatical) predicate, newsworthy content is differ-
entiated from a context. One of Vygotsky’s examples is the following (1987: 250): There
is a crash in the next room – someone asks: “what fell?” (the answer: “the clock”); or:
“what happened to the clock?” (“it fell”). Depending on the context – here, the question –
the newsworthy reply (the psychological predicate) highlights different elements.
This logic also applies to the growth point. In forming a growth point, the speaker
shapes the background in a certain way, in order to highlight the intended differentiation
within it, much as the questioner about the falling clock shaped the context of the replies.

A psychological predicate:

(i) marks a significant departure in the immediate context; and
(ii) implies this context as background.

Regarding the growth point as a psychological predicate suggests that the mechanism
of growth point formation is differentiation of a focus from a background. Such
differentiation is validated by the very close temporal connection of gesture strokes
with the peaks of acoustic output in speech. Shuichi Nobe (1996) has documented
this connection instrumentally:

The robust synchrony between gesture strokes and the peaks of acoustic aspects suggests
that the information the gesture stroke carries has an intrinsic relationship with the accom-
panying speech information prominently pronounced with these peaks. The manifestation
of the salient information seems to be realized through the synchronization of these two
modalities (Nobe 1996: 35).

I use the terms field of oppositions and significant (newsworthy) contrast to refer to the
constructed background and the differentiation of psychological predicates (growth
points) within it. All of this is meant to be a dynamic, continuously updated process
in which new fields of oppositions are formed and new growth points or psychological
predicates are differentiated in ongoing cycles of thinking for speaking.

4.1.1. What is a “meaning”? – Two things in relation


In a growth point, a point of differentiation is what a meaning is. Say that a speaker in-
tends to describe Sylvester ascending the pipe on the inside. Merely intending it is not
enough. She must also fashion a context or field of oppositions wherein the intended
significance is newsworthy. This shaping of the context is not passive; it is the active
formulation of fields of oppositions. Where there are different fields of oppositions,
different growth points form even though the objective references are the same (see
the “inside-only” and the “outside-inside” examples in Figs. 8.3 and 8.4 of the “natural
experiment” to follow).
Control of the context by the speaking individual ensures that growth points estab-
lish meanings true to the speaker’s intentions and memory, with revisions and rejections
when self-monitoring shows they don’t fit these standards. Regarding the growth point
as a psychological predicate in a field of oppositions clarifies the sense in which we are
using the term “context”. This word has a host of meanings (see Duranti and Goodwin
1992), but for our purposes “context” is the background from which a psychological
predicate is differentiated.

4.1.2. A natural experiment


A quirk in the cartoon stimulus enables us to show in a concise way how growth points
form to differentiate newsworthy information within the immediate context. The bowl-
ing ball episode is the second of two attempts by Sylvester to reach Tweety by means of
the drainpipe. His first attempt makes use of the drainpipe as a kind of ladder or rope,

which he clambers up on the outside, but the effort fails. Undaunted, he immediately
tries a second ascent, now on the inside in a kind of stealth approach, as we have
seen. This second ascent was crowned (literally) by the bowling ball episode.
The natural experiment is the following: Describing the first attempt, the field of
oppositions would have been something like WAYS OF USING A DRAINPIPE
(since this ascent was the first mention of the pipe), and the psychological predicate
something like CLIMB IT. With the second attempt, climbing the pipe is no longer
newsworthy. It is background, and the field of oppositions would be updated, something
like: WAYS OF CLIMBING A DRAINPIPE. Now interiority would be newsworthy:
ON THE INSIDE.
If a speaker recalls both attempts, in the correct outside-inside order, the psycholog-
ical predicate relating to the second attempt should focus on interiority. This follows
from the psychological predicate concept; in the updated field of oppositions, interiority
is the newsworthy feature.
However, if a speaker recalls only the inside attempt and fails to recall the outside
attempt, or recalls both attempts but reverses their order, interiority should not be
newsworthy; it should not be a feature of the psychological predicate, even though
the speaker has perceptually registered that Sylvester did climb the pipe on the inside.
This is so because interiority does not contrast with exteriority in an inside-only or
inside-outside context. The field of oppositions would only be about climbing and the
interiority, even when registered, would be non-differentiating and not form a psycho-
logical predicate (no one has ever recalled only the outside attempt but it would also be
expected to highlight only ascent). Susan Duncan (pers. comm.) discovered the logic of this natural experiment. It is now the basis of a designed experiment she and Dan Loehr
are conducting, comparing gestures after participants have watched the standard
outside-inside order to the gestures of different participants who have watched it in
the reverse inside-outside order. They find interiority in gesture for the outside-inside
order but not for inside-outside, showing powerfully that interiority, while perceptually
present, is not significant in inside-first.

4.1.2.1. Inside alone


The first two speakers below recalled only the inside attempt. For them, interiority had
no newsworthy significance and their gestures did not contain it, even though they de-
scribed the bowling ball and its aftermath, demonstrating that they had recognized the
interiority feature.

Cel. he tries climbing up the rai]n barrel
No interiority. Hand simply rises. Cel. continued her description by explaining that “Tweety … drops a bowling bal][l down the rain b][arrel”, showing that she had registered the ascent as internal.

Den. and <um> / he tries crawling up the drainp[ipe /] to get Tweety.
Right thumb (see arrow) lifts for ascent. No interiority. Den. continued with “Tweety drops a bowling ball down the drainpipe”, again showing that the internal ascent had been registered.

Fig. 8.3: Isolated “inside” gestures do not highlight interiority.

4.1.2.2. Inside after outside


Three other speakers recalled both attempts in the correct order. In each case, the sec-
ond gesture for the second ascent highlighted interiority (Fig. 8.4). The first speaker
made the “rising hollowness” gesture; the second and third speakers extended the
index finger, not only to point but to depict the compression the plump Sylvester
had to undergo to squeeze into the pipe (as was depicted in the cartoon stimulus; the
single extended finger had significance beyond deixis and became a kind of proto-
morph for Sylvester himself; for example Fig. 8.7, line 8, where the hands depict
Sylvester rotating and have first fingers extended non-deictically). The three speakers’
outside gestures had none of these features (in Fig. 8.4, the outside gesture screenshot is shown before each inside gesture).

San. Preceding outside: Ascent and locus but no pipe indicator.
San. Interiority gesture: he goe[[ss up th][rough the pipe]] this
This is the previously cited “rising hollowness” gesture.

Viv. Preceding outside: Describes ascent but no gesture.
Viv. Interiority gesture: he<e> / tri<i>es / going [[up] [the insid][e of the drainpipe # ]]
Again, an extended index finger possibly iconically depicting the confined space inside the pipe.

Jan. Preceding outside: Ascent and locus but no pipe indicator.
Jan. Interiority gesture: [ / this time he tries to go up inside the rain gutter / ]
Another extended index finger for confinement inside the pipe.

Fig. 8.4: “Inside” gestures after “outside” all highlight the newsworthy interiority feature.

4.1.2.3. The exception that proves the rule


Thanks to observations by Susan Duncan (pers. comm.), we see that the next speaker,
Lau., is the proverbial exception that proves the rule (Fig. 8.5). She remembered
both attempts, in the correct order, but picked out a different significant contrast, a
twist conceivably triggered by a lexical error when she described the first “outside”
ascent. She erroneously said that Sylvester was climbing “a ladder”. There was no men-
tion of the pipe and her gesture seemed to depict grasping the rungs of this fictitious
ladder. By the second ascent she did mention the pipe, so the second ascent, for her,
contrasted not in terms of paths (inside versus outside) but in terms of ground elements
(a pipe versus a “ladder”). Her second-ascent hand shape and timing suggest this interpretation (hand cupped, as if around the round shape of the pipe, and synchronized with
“climbing”). The extended index finger is similar to the above gestures with an ex-
tended single finger but may not have signified interiority. The finger was aimed
more forward and may have been instead a meta-narrative indicator, foretelling the
next event as she described it, “and he climbed into the apartment”, where the forward
angle fits this construction of the episode.

Lau. (outside the pipe) and he trie[s / climbing up a la]dder #
Exception that proves the rule #1. Gesture with “climbing up a ladder.”

Lau. (inside the pipe) he tries[s cli]m[[bing up the <nn> drainspout / / ]
Exception that proves the rule #2. Gesture with “drainspout.”

Fig. 8.5: Highlighting “pipe” (not interiority) when the contrast is to the ground element (misremembering the first ascent as on a ladder).

Thus the natural experiment shows that the growth point, as a psychological predicate,
captures exactly what is newsworthy in the immediate field of oppositions and ignores
the same features when, through narrative mischance, they are not contrastive.

4.2. A designed experiment


Fey Parrill (2008) devised a method for cueing the discourse focus with a flashing arrow and
was able to change speech and gesture descriptions of events concurrently in such a way that
they coalesced around newsworthy content, creating psychological predicates on demand.
People saw the portion of the animated stimulus after Sylvester had swallowed the
bowling ball and rolled down the street with it inside him. The ball prompt was a flash-
ing arrow pointing at Sylvester’s rounded bottom end, and resulted in more ball-subject
utterances, compared to the Cat prompt (arrow pointing at his head) – for example,
“the ball rolls him down the street”, rather than “he rolls down the street” (after the Cat
prompt). The ball prompt resulted in more manner in gesture – for example, the hand
rotating, rather than a straight trajectory gesture. The induced points of focus thus
formed show newsworthy co-expressive speech and gesture.

4.3. Gestures shift timing dependent on context


Sometimes gesture form does not embody the newsworthy content but gesture timing
does. For example, one speaker used the same verb (“climbs up”) and performed sim-
ilar gestures for the outside and inside ascents, but on the second occurrence, when it
was no longer newsworthy, the gesture skipped the verb and shifted to an expression
of interiority (Fig. 8.6):

Outside (upward gesture with “climbs up”):
“and then the second part / [is he climbs up the drain]”

Inside (similar upward gesture skips “climbs up” to go with “through”):
“<uuhh> let’s see the next time / is he tries to* <uh> / tries to cliimb / [up / in / through / the] [drain* /”

Fig. 8.6: Timing shift to highlight interiority.

5. Catchments and fields of oppositions


A growth point’s context or field of oppositions can be empirically recovered from the
gestures themselves. Catchments happen when space, trajectory, hand shapes, etc. recur
in two or more (not necessarily consecutive) gestures. Catchments show the effective
contextual background and provide an empirical route to discovering it. Recurring ges-
ture imagery is examined to see if it embodies a discourse theme. For both climbing up
the outside and climbing up the inside of the drainpipe, the same spaces and trajectories
occurred (iconically depicting Sylvester’s entrance at the bottom of the pipe). The re-
currences suggest a catchment or discourse theme having to do with entering the
pipe. Verbally, too, references such as the temporal indicator, “this time”, point to
this kind of discourse unit. Newsworthy gesture content then occurs as a modification
or elaboration of the catchment, preserving its imagery while adding what is newswor-
thy. For the “up thróugh” ascent, the catchment comprised the lower periphery and
ascent itself, and the newsworthy content was the interior space.
Proof of the catchment comes from an ingenious test by Furuyama and Sekine (2007), who noticed a systematic avoidance of gestures precisely where their referential
content, had it been included, would have disrupted an ongoing catchment. The catch-
ment vetoed referential imagery that was valid according to the episode but was
thematically inconsistent with the catchment.
A good illustration of a catchment and how growth points absorb the context from it
is a set of gestures created by one of the above participants, Viv., for the inside ascent,
the subsequent encounter with the bowling ball and its aftermath. We see multiple
catchments converging to form growth points (Fig. 8.7).
We pick up the inside episode at a slightly later stage – the moment Tweety injects
the bowling ball into the pipe. In words, Viv. said, “and Tweety runs a gets a bowling b
[all and drops it down the drainpipe]”, accompanied by the downward thrusting gesture
shown in Fig. 8.7, line (2).
The first thing to notice is that the timing of the gesture stroke of (2) (boldface) is
somewhat off, if we think that gestures should line up with synchronically definable lin-
guistic constituents. The stroke excluded the lexical affiliate, the verb “drops”; it coin-
cided instead with “it down”, and in this way combined two constituents, the Figure
and Satellite (using Talmy’s 2000 categories), but excluded another, the Activating Pro-
cess, to which the Figure is actually more tightly coupled in the grammatical structure of
the sentence.
The exclusion was no accident. Preparation began at the first mention of the bowl-
ing ball in the preceding clause, which suggests that the bowling ball was part of the
discourse focus at that moment. And it continued right through the verb, suggesting
that the verb was irrelevant to this focus. Further, a brief prestroke hold seems to
have preceded “it down” (although coding of the hold varies), which, if present, tar-
geted “it down”. Finally, an unmistakable poststroke hold lasted exactly as long as it
took to complete the spoken articulation of “down”. This hold preserved the semantic synchrony of the gesture stroke with the articulation of “it down”. So the stroke
fully and exactly timed with just these two words, and actively excluded a third,
“drops”, which happens to be the closest lexical approximation to it. But why? To
explain it we must examine the catchment structure into which the “it down” growth
point fits.

(1) he tries going [[up] [the insid][e of the drainpipe #]] and
C1 One-handed gestures – items (1) and (6) – ties together references to Sylvester as a solo force.

(2) Tweety Bird runs and gets a bowling ba[ll and drops it down the drainpipe #]
C2 Two-handed SYMMETRICAL gestures – items (2), (7), (8) and (9) – groups descriptions where the bowling ball is the antagonist, the dominant force. The 2-handed symmetric gesture form highlights the shape of the bowling ball.

(3) [and / as he’s coming up]
C3 Two-handed ASYMMETRICAL gestures – items (3), (4), (5) and (6) – groups items in which the bowling ball (LH) and Sylvester (RH) are equals differing only in their direction of motion.

(4) [and the bowling ball’s coming d]]
C3 (as in (3)).

(5) [own he ssswallows it]
C3 (as in (3)).

(6) [# and he comes out the bottom of the drai]
C1 (as in (1)) AND C3 (as in (3)).

(7) [npipe and he’s got this big bowling ball inside h]im
C2 (as in (2)).

(8) [and he rolls on down] [into a bowling all]
C2 (as in (2)).

(9) [ey and then you hear a sstri]ke #
C2 (as in (2)).

Fig. 8.7: Catchments in the bowling ball and its aftermath description.

The catchments for (1) through (9) appear in hand use – right hand or left; one hand
or two; and, when two hands, same or different hand shape and/or hand position (other
catchments could be present). Each of the gesture features embodies a certain thematic
content and this content is what motivates it: C1 is about Sylvester as a moving entity
and its recurring gesture feature is a single moving hand; C2 is about the bowling ball
and what it does, and its recurring feature is a rounded shape (in gesture transcription
terms, “2 similar hands”); C3 is about the relative positions of Sylvester and bowling ball
in the drainpipe and its recurring feature involves the two hands in the appropriate
spatial configuration (“2 different hands”). The “it down” growth point was part of
the symmetrical 2-handed C2 (several later gestures were also part of it).
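The way the three catchments fall out of this feature coding can be sketched mechanically. The following is only an illustrative sketch (the feature shorthands "1H", "2SH", "2DH" and the function name are hypothetical, not the chapter's transcription system): a catchment is simply a feature value recurring in two or more, not necessarily consecutive, gestures.

```python
# Illustrative sketch: recovering catchments as recurrences of gesture
# features, using the hand-use coding of Fig. 8.7. The shorthand labels
# ("1H", "2SH", "2DH") are hypothetical, not McNeill's own notation.
from collections import defaultdict


def find_catchments(gestures, min_size=2):
    """Group gesture items by recurring feature values.

    A feature value shared by two or more (not necessarily consecutive)
    gestures counts as a catchment.
    """
    groups = defaultdict(list)
    for item, features in gestures.items():
        for value in features:
            groups[value].append(item)
    return {value: items for value, items in groups.items()
            if len(items) >= min_size}


# Hand-use coding of Viv.'s items (1)-(9): 1H = one-handed,
# 2SH = two symmetrical hands, 2DH = two different (asymmetrical) hands.
# Item (6) carries both the C1 and C3 features, as in Fig. 8.7.
viv = {1: ["1H"], 2: ["2SH"], 3: ["2DH"], 4: ["2DH"], 5: ["2DH"],
       6: ["1H", "2DH"], 7: ["2SH"], 8: ["2SH"], 9: ["2SH"]}

catchments = find_catchments(viv)
# "1H"  -> items [1, 6]        (C1: Sylvester as a solo force)
# "2SH" -> items [2, 7, 8, 9]  (C2: the bowling ball as antagonist)
# "2DH" -> items [3, 4, 5, 6]  (C3: relative positions in the drainpipe)
```

The sketch makes explicit that catchment recovery is purely distributional: the thematic interpretation of each recurrence (solo force, antagonist, relative position) still has to be supplied by the analyst.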
The occurrence of (2) in the symmetrical catchment shows that one of the factors
comprising its field of oppositions at this point was the various guises in which the bowl-
ing ball appeared in its role as an antagonist. The significant contrast in the example was
its downward motion. Because of the field of oppositions at this point, this downward
motion had significance as an antagonistic force against Sylvester. We can write this
field of oppositions and growth point, respectively, as:

Field of oppositions: An Antagonistic Force
Growth point: Bowling Ball (“it”) Downward (“down”)

These were the context and contrast, respectively. Thus, the “it down” growth point,
unlikely though it may seem as a unit from a grammatical point of view, was the cog-
nitive core of the utterance in (2) – the “it” indexing the bowling ball, and the
“down” indexing the significant contrast itself in the field of oppositions.

6. Unpacking
The final contribution to the dynamic dimension is called here “unpacking”. In the fol-
lowing, it should be evident that we are not trying to derive linguistic structures from
imagery or any other underlying psychological mechanisms. Such would undercut the
entire dialectic concept. Growth point instability is resolved by unpacking when it is
cradled in one or more syntactic constructions. The effect is to stabilize the cognition
of the growth point. A construction is a resting point par excellence (Goldberg 1995).
How a construction is selected is the source of dynamism. Since the construction will have its own context and add its own significances, selecting it involves the generation of
further meanings. The growth point and the unpacking can occur simultaneously but
they are functionally different – the core meaning in the growth point, a peripheral
meaning that presents this core in the unpacking.
The first context of “it down” we have already analyzed; it was the C2 theme in
which the bowling ball was an antagonistic force. A second context can be seen in
the fact that the two-handed gesture at (2) also contrasted with C1 – the preceding
one-handed gesture in (1) depicting Sylvester as a solo force. (1) and (2) comprised a par-
adigm of opposed forces: Sylvester versus Tweety (via the bowling ball). This contrast led
to the other parts of the utterance in (2) through a partial repetition of the utterance
structure of (1), a poetic framework within which the new contrasts were formed
(see Jakobson 1960). Contrasting verbal elements appeared in close to equivalent slots
(the match is as close as possible given that the verb in (2) is transitive while that in
(1) is intransitive):

(1) (Sylvester) up in “he tries going up the inside of the drainpipe”

(2) (Tweety) down in “and (Tweety) drops it down the drainpipe”

The thematic opposition can be summarized as an opposition of counterforces – Sylvester-up vs. Tweety-down. Our feeling that the paradigm is slightly ajar is due to
the shift from spontaneous to caused motion with “drops”. This verb does not alter
the counterforces paradigm but transfers the counterforce from Tweety to the bowling
ball, as required for the objective content of the episode.
The parallel antagonistic forces in (1) and (2) made Tweety the subject of (2), match-
ing Sylvester as subject of (1), as is appropriate for the counterforces paradigm. The
contrast of (2) with (1) thus had two effects on our target utterance. It was the source
of the verb, “drops,” and was why the subject was “Tweety,” rather than “bowling ball.”
The subject slot fulfilled Tweety’s role as agent and the verb “drops” shifted the down-
ward force theme to the bowling ball. The contrast of subjects, and the similar syntactic
frames, expressed the antagonistic forces paradigm itself. The continuing preparation
and possible prestroke hold over “drops” is thus also explained: the verb, deriving
from an antagonistic forces context, was not part of the growth point, and the stroke
was withheld until its linguistic side was present.
Also explained is the total gesture phrase – from the onset of preparation in “ball” to the completion of retraction with “pipe” – how it came into being, grew, and then ceased to be as a living cognitive unit. It is this period of time that the gesture imagery schematized.
The whole utterance, the growth point and the unpacking, was orchestrated to present
the growth point. In awareness terms, the focus was “it down” and the periphery was
awareness of the construction and its meaning as a counterforces paradigm.

7. Note on consciousness
Consciousness was once the central topic of psychology but it was swept away by behav-
iorist puritanism. Interest in it is returning after a long exile, and the growth point offers
its own perspective. Wundt ([1900] 1973) a century ago described the “sentence” as a
dynamic psychological phenomenon:

From a psychological point of view, the sentence is both a simultaneous and a sequential
structure. It is simultaneous because at each moment it is present in consciousness as a
totality even though the individual subordinate elements may occasionally disappear
from it. It is sequential because the configuration changes from moment to moment in
its cognitive condition as individual constituents move into the focus of attention and
out again one after another. (Translation by Blumenthal 1970: 21, a passage cited by
Zenzi Griffin)

Wundt’s insight is that two simultaneous phenomena occur as the “psychological structure” of the sentence: something that exists all at once, and something else – the same meaning but in another form – that is successive. Wundt had in effect noticed in these
wavering levels of consciousness the growth point and its unpacking. Growth points sur-
face in the unpacking construction as brief dynamic pulses (to use Susan Duncan’s 2006
term). The dialectic is the focus of this rhythmically situated point in the flow of speech
action (called the “L-center” in McNeill 2003). The unpacking is the sequential frame

housing it. The unpacked growth point and its construction are aligned structures that
laminate consciousness and are experienced in the dual way that Wundt described.

8. Conclusions
Growth points are the brief dynamic pulses wherein idea units take form. Housing op-
posed semiotic modes, they are intrinsically unstable and stability is sought in the form
of unpacking into grammatical constructions, which offer static dimension stability par
excellence. The sources of utterance dynamics sketched here are:

(i) the imagery-language dialectic within each growth point;
(ii) the incorporation of context into growth points as psychological predicates; and
(iii) the further meanings and contexts generated in unpacking.

The growth point model applies at the intersection of the two dimensions of language – the dynamic and static – and is itself a minimal unit of the dynamic dimension. Not described here for lack of space is the incorporation by growth points of social-interactive information; mimicry; “mind-merging”; the embodiment of social growth points in two and more bodies; the material carrier; and gestures during conversational discourse. See McNeill (1992, 2005) for most of these, McNeill et al. (2009) for mind-merging, and Quaeghebeur and Reynaert (2010) and Feyereisen (volume 2) for the material carrier.

9. References
Blumenthal, Arthur (ed. and trans.) 1970. Language and Psychology: Historical Aspects of Psycho-
linguistics. New York: John Wiley & Sons.
Chase, William G. and K. Anders Ericsson 1981. Skilled memory. In: John Robert Anderson (ed.),
Cognitive Skills and Their Acquisition, 227–249. Hillsdale, NJ: Erlbaum.
Duncan, Susan 2006. Co-expressivity of speech and gesture: Manner of motion in Spanish,
English, and Chinese. In: Proceedings of the 27th Berkeley Linguistic Society Annual Meeting,
353–370. [Meeting in 2001.] Berkeley, CA: Berkeley Linguistics Society.
Duranti, Alessandro and Charles Goodwin 1992. Rethinking context: An introduction. In: Ales-
sandro Duranti and Charles Goodwin (eds.), Rethinking Context: Language as an Interactive
Phenomenon, 1–42. Cambridge: Cambridge University Press.
Feyereisen, Pierre volume 2. Gesture and the neuropsychology of language. In: Cornelia Müller,
Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Jana Bressem (eds.),
Body – Language – Communication: An International Handbook on Multimodality in
Human Interaction. (Handbooks of Linguistics and Communication Science 38.2.) Berlin:
De Gruyter Mouton.
Furuyama, Nobuhiro and Kazuki Sekine 2007. Forgetful or strategic? The mystery of the system-
atic avoidance of reference in the cartoon story narrative. In: Susan D. Duncan, Justine Cassell
and Elena T. Levy (eds.), Gesture and the Dynamic Dimension of Language, 75–82. Amster-
dam: John Benjamins.
Goldberg, Adele 1995. Constructions: A Construction Approach to Argument Structure. Chicago:
University of Chicago Press.
Harris, Roy 2002. Why words really do not stay still. Times Literary Supplement 5182. 26 July 2002.
Harris, Roy 2003. Saussure and His Interpreters, 2nd edition. Edinburgh: Edinburgh University Press.
Jakobson, Roman 1960. Closing statement: Linguistics and poetics. In: Thomas A. Sebeok (ed.),
Style in Language, 350–377. Cambridge: Massachusetts Institute of Technology Press.
8. The growth point hypothesis of language and gesture 155

Kendon, Adam 1980. Gesticulation and speech: Two aspects of the process of utterance. In: Mary
Ritchie Key (ed.), The Relationship of Verbal and Nonverbal Communication, 207–227. The
Hague: Mouton.
McNeill, David 1992. Hand and Mind: What Gestures Reveal about Thought. Chicago: University
of Chicago Press.
McNeill, David 2003. Aspects of aspect. Gesture 3: 1–17.
McNeill, David 2005. Gesture and Thought. Chicago: University of Chicago Press.
McNeill, David and Susan D. Duncan 2000. Growth points in thinking for speaking. In: David
McNeill (ed.), Language and Gesture, 141–161. Cambridge: Cambridge University Press.
McNeill, David, Susan D. Duncan, Amy Franklin, James Goss, Irene Kimbara, Fey Parrill, Ha-
leema Welji, Lei Chen, Mary Harper, Francis Quek, Travis Rose, and Ronald Tuttle 2009.
Mind merging. In: Ezequiel Morsella (ed.), Expressing Oneself / Expressing One’s Self: Com-
munication, Language, Cognition, and Identity, 143–164. London: Taylor and Francis.
Merleau-Ponty, Maurice 1962. Phenomenology of Perception. Translated by C. Smith. London:
Routledge.
Nobe, Shuichi 1996. Representational gestures, cognitive rhythms, and acoustic aspects of speech:
A network/threshold model of gesture production. Ph.D. dissertation, University of Chicago.
Parrill, Fey 2008. Subjects in the hands of speakers: An experimental study of syntactic subject and
speech-gesture integration. Cognitive Linguistics 19(2): 283–299.
Quaeghebeur, Liesbet, Susan D. Duncan, Shaun Gallagher, Jonathan Cole and David McNeill
volume 2. Aproprioception, gesture, and cognitive being. In: Cornelia Müller, Alan Cienki,
Ellen Fricke, Silva H. Ladewig, David McNeill and Jana Bressem (eds.), Body – Language –
Communication: An International Handbook on Multimodality in Human Interaction. (Hand-
books of Linguistics and Communication Science 38.2.) Berlin: De Gruyter Mouton.
Quaeghebeur, Liesbet and Peter Reynaert 2010. Does the need for linguistic expression constitute
a problem to be solved? Phenomenology and the Cognitive Sciences 9(1): 15–36.
Saussure, Ferdinand de 1966. Course in General Linguistics. Edited by Charles Bally and Albert
Sechehaye, in collaboration with Albert Riedlinger; translated by W. Baskin. New York:
McGraw-Hill. First published [1959].
Saussure, Ferdinand de 2002. Écrits De Linguistique Général. Compiled and edited by Simon
Bouquet and Rudolf Engler. Paris: Gallimard.
Talmy, Leonard 2000. Toward a Cognitive Semantics. Cambridge: Massachusetts Institute of Tech-
nology Press.
Vygotsky, Lev S. 1987. Thought and Language. Edited and translated by Eugenia Hanfmann and
Gertrude Vakar (revised and edited by Alex Kozulin). Cambridge: Massachusetts Institute of
Technology Press.
Wundt, Wilhelm 1973. The Language of Gesture. The Hague: Mouton. First published [1900].
Zinchenko, Vladimir P. 1985. Vygotsky’s ideas about units for the analysis of mind. In: James V.
Wertsch (ed.), Culture, Communication, and Cognition: Vygotskian Perspectives, 94–118. Cambridge: Cambridge University Press.

David McNeill, Chicago (USA)



9. Psycholinguistics of speech and gesture: Production, comprehension, architecture
1. Information processing architectures
2. Challenges to information processing models
3. Empirical issues
4. Conclusion
5. References

Abstract
Since the beginnings of psycholinguistics, gestures have been considered significant parts of
the multimodal messages that are exchanged during social interactions. The first models
proposed to explain the production and comprehension of these messages took the form of
various cognitive architectures. These information-processing models explicitly distinguished
several components and processing levels, either modality-specific or “abstract”, i.e.
shared by different modalities as input to production and output of comprehension.
The assumption of such abstract conceptual representations underlying uses of speech
and gesture was challenged in two directions. In a dynamic perspective, the notion of
fixed conceptual, lexical or motor representations was criticized and meaning was sup-
posed to emerge from temporary interactions among sub-symbolic units connected in
multi-layered networks. In another, pragmatic direction, a distinction was made between
monologues and dialogues. Speech and gestures in conversations are joint actions that
require collaborative partners sharing a common ground. These various models have in-
spired experimental or quasi-experimental manipulations of several factors: the charac-
teristics of the speakers, of the addressees, of the situational context, of the speech
content, of the task demands. In this way, multiple routes were drawn to link up the
verbal, motor, and social cognitive subsystems that are involved in multimodal
communication.

1. Information processing architectures


1.1. The early stages
In the middle of the twentieth century, the influential Mathematical Theory of Commu-
nication (Shannon and Weaver 1949) oriented psycholinguistic research towards statis-
tical issues concerning the probabilities of signals and responses in message encoding
and decoding. This is the origin of the schematic diagrams that represent communica-
tion between a sender and a receiver through physical channels, both spoken-auditory
and gestural-visual. Very soon, however, the emergence of generative grammar modi-
fied conceptions about language comprehension, production and acquisition. These is-
sues were surveyed in The Psychology of Language by Fodor, Bever, and Garrett (1974).
One of their key ideas was that speech production can be viewed as the translation of an
internal message into acoustic and visual forms by means of successive If-Then rules.
Symmetrically, the output of this process serves as input for the recognition system.
The computing language of the initial message conceived in the mind differs from
any spoken language and therefore is called “mentalese” or the “language of thought”
(Fodor, Bever, and Garrett 1974: 374–377).
Such a language is based on abstract symbols, or concepts, which also code for motor
actions and for outcomes of perception. These symbols combine within propositions
(i.e. elementary logical predicate-argument structures such as “the Earth is round”).
By their abstract nature, conceptual representations can interface words, objects, and
actions. Computation assumes serial processing stages, top-down in production and
bottom-up in comprehension. In both cases, the operations take place within “mod-
ules,” i.e. autonomous components specialized for definite functions, such as syntactic
parsing or object recognition, at an intermediate level between the conceptual system
and the sensory-motor processes. Fodor, Bever, and Garrett’s (1974) proposal was thor-
oughly revised in further elaborations (e.g. the sentence-production models of Garrett
(1988) and Levelt (1989)) but some basic ideas remain. These include: the notion of
abstract conceptual representations at the beginning of the message production and
at the end of its comprehension; the notion of modularity; and the sequential timing
of information processing.

1.2. Information-processing models of gesture production and comprehension
Several investigators have suggested that the speech production model of Levelt (1989)
can be developed to account for the processing of co-verbal gestures (e.g. Krauss,
Chen, and Chawla 1996; Krauss and Hadar 1999; Krauss, Chen, and Gottesman 2000;
de Ruiter 2000, 2007; Kita and Özyürek 2003; Kita 2009). The synthetic model pre-
sented in Fig. 9.1 summarizes this type of proposal. In his model, Levelt distinguished
three successive stages in the generation of spoken messages: conceptualization, formu-
lation, and articulation. The common suggestion in the architectures for gestural infor-
mation processing is that the component responsible for grammatical and phonological
encoding in speech production has to be matched by a second production system under-
lying the planning of bodily movements. Similarly, by analogy with the component in-
volved in speech comprehension, a second recognition system should be added to allow
gesture interpretation.
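By way of illustration only, the staged flow described here (one conceptualization stage feeding parallel, modality-specific modules) can be sketched in code. The stage names follow Levelt (1989), but the data types, function bodies, and example strings are placeholder assumptions, not part of any published model:

```python
# Hypothetical sketch of a staged, modular production architecture.
# A single conceptualizer produces an abstract preverbal message that is
# then handed, top-down, to two parallel channels: speech formulation
# and gesture planning.
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Message:
    proposition: str                       # abstract, amodal content
    spatial_imagery: Optional[str] = None  # spatial/dynamic content, if any

def conceptualize(intention: str) -> Message:
    """Conceptualization: turn a communicative intention into a preverbal message."""
    return Message(proposition=intention, spatial_imagery="arc trajectory")

def formulate(msg: Message) -> str:
    """Formulation: grammatical and phonological encoding (stub)."""
    return "utterance({})".format(msg.proposition)

def plan_gesture(msg: Message) -> Optional[str]:
    """Gesture planning: map spatial/dynamic content onto a movement plan (stub)."""
    return "gesture({})".format(msg.spatial_imagery) if msg.spatial_imagery else None

def produce(intention: str) -> Tuple[str, Optional[str]]:
    # Serial and top-down: one conceptual stage, then modality-specific
    # modules operating in parallel on the same message.
    msg = conceptualize(intention)
    return formulate(msg), plan_gesture(msg)

speech, gesture = produce("the ball swings across")
print(speech, gesture)
```

The competing architectures differ mainly in where they would add links between the two channels in such a sketch: at the shared conceptualization stage, between dedicated message and action generators, or from gesture planning directly into phonological encoding.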
Beyond that, different architectures were developed with different labels to express
disagreements. In Krauss and his co-workers’ (1996, 1999, 2000) model, gesture produc-
tion is independent of the communicative intention that determines speech formulation.
A “motor planner” receives information directly from the spatial/dynamic component
of the working memory, and gestures are not supposed to fulfil communicative func-
tions. Krauss and his colleagues suggested that complex movements, which they called
“lexical gestures” (rather than “iconic” gestures), share meaning with speech content.
They hypothesized that these gestures facilitate the retrieval of word forms by a
cross-modal priming of lexical access through a connection from the gesture-production
system to the phonological-encoding component. Morsella and Krauss (2004) found
that rates of lexical gestures were higher when speakers described objects from memory
than when they described images in view. They inferred that these gestures help the maintenance of spatial information in working memory. In another study, Morsella and
Krauss (2005) recorded the electro-muscular activity of the arm during noun retrieval
from definitions and found that concrete words elicited larger response amplitudes
[Figure 9.1 here: a box-and-ellipse diagram. Knowledge stores (ellipses): internal representation of the situated context (visual, spatial, bodily, social); long-term knowledge (world, self, other minds); working memory (central executive, phonological loop, episodic buffer, visuo-spatial sketchpad); action repertoire (hierarchical schemas); lexicon (lemmas, forms). Processing components (boxes): communicative intention (conceptualization); gesture planning and formulation (grammatical encoding, phonological encoding) on the production side; gesture recognition and comprehension (sentence parsing, word recognition, phonological analysis) on the reception side; execution and articulation as output stages (gesture, speech); visual and auditory processing as input stages.]

Fig. 9.1: Information processing architecture for production and comprehension of speech and gesture: a synthetic model. (As in Levelt (1989), boxes represent processing components and ellipses represent knowledge stores.)

than did abstract words. Following Barsalou (1999), they assumed that word meaning re-
activated sensory-motor states (“perceptual symbols”) that differ from the conceptual
symbols assumed by Fodor, Bever, and Garrett (1974).
Two of Krauss et al.’s main claims – that lexical gestures help word retrieval and play
only a minor role in communication – remain disputed on the basis of available empir-
ical evidence, but they have had a strong impact on experimental investigation of ges-
ture processing (Feyereisen 2006; see also Goldin-Meadow this volume; Hostetter
2011). de Ruiter (2000, 2007) proposed an alternative model he successively called
the Sketch Model and the Postcard Architecture. In this conception, thoughts and com-
municative intentions are expressed in separate and parallel channels like the two sides
of a postcard. This model is consistent with the views developed by Kendon (2004; see
also Hadar this volume). Kendon carefully described cases of mutual adaptations in the
use of speech and gestures: sometimes, execution of a movement is interrupted by a
holding phase to fit the structure of the spoken expression, whereas, on other occasions,
speech is interrupted by silent or filled pauses to accommodate the requirements of
gestural expression. In the Postcard Architecture speakers are assumed to conceive a pre-
liminary message during the conceptualization phase and to select different communica-
tive forms. Accordingly, information that is not formulated in speech may be conveyed in
gesture or vice versa, depending on the relevance and availability of the chosen vehicle. In
support of this model, Melinger and Levelt (2004) found that the choice of performing
gestures or keeping the hands immobile influenced the lexical choices used to describe
spatial layouts. By analogy with a lexicon, de Ruiter (2000) also assumed the existence of
a “Gestuary,” a store of gesture templates. This is consistent with the notion of gesture
families proposed by Kendon (1995, 2004).
The Interface Hypothesis was initially proposed by Kita and Özyürek (2003) on the
basis of cross-linguistic evidence and then supported by experimental findings (see also
Kita 2009; Kita et al. 2007). Languages such as English, Japanese, and Turkish do not
express motion events in similar ways. For instance, the verb “to swing” has no equiv-
alent in Turkish and must be translated by a paraphrase. Kita and Özyürek (2003) found
similar differences in the gesture modality. English speakers described the swing move-
ment much more often with an arc trajectory than with a straight trajectory, whereas
Japanese and Turkish speakers used both kinds of gestures at similar rates. Accordingly,
the Interface Hypothesis consists of splitting the conceptualization stage into several
components and creating an intermediate level in which a Message Generator informs
verbal formulation and an Action Generator informs motor control. In this model, bi-
directional links between these two Generators represent reciprocal adaptations of ges-
ture forms to speech contents and vice versa. Interactions are located at this level and
not in the more abstract conceptualization stage, as in de Ruiter’s (2000) model. Feed-
back from formulation to message generation also allows cross-linguistic differences in
the way motion events are represented.
Kita (2000) and Goldin-Meadow (2003) both considered an alternative to the facil-
itation hypothesis of Krauss and his co-workers. They suggested that gesture production
can help conceptualization rather than formulation. According to Kita’s (2000) Infor-
mation Packaging hypothesis, “the production of representational gestures helps speak-
ers organize rich spatio-motoric information into packages suitable for speaking” (Kita
2000: 163). Experimental findings support this hypothesis: gesture rates are higher in conceptually demanding tasks than in tasks that are difficult only because they involve access to infrequent words (Hostetter, Alibali, and Kita 2007; Melinger and Kita 2007).
Goldin-Meadow (2003) has been particularly concerned with mismatches between gestures and speech during cognitive development. At some ages, in the transition between two stages, children express one idea verbally and a different idea in their gestures, suggesting that gestures may help them to learn a new problem-solving strategy.
Goldin-Meadow (2003) also hypothesized that gestures facilitate cognitive processing by alleviating the load on working memory. In working-memory tasks, it has been shown that if the participants' speech in the interval between presentation and recall is accompanied by gestures, recall scores for verbal and visuo-spatial materials are higher than if the participants do not gesture (see, e.g. Wagner, Nusbaum, and Goldin-Meadow 2004). Goldin-Meadow assumed that the interposed verbal task was easier (or less disturbing) when its expression was assisted by gestures.
have not been presented through box-and-arrow diagrams, but they are relevant
to the issue of production architecture, and de Ruiter (2007) included this model
under the rubric of Window Architecture to express the idea that gesture is a window
into the mind.
Most of these discussions concern the production of representational gestures in the framework of Levelt's (1989) model. Much less is known about the control mechanisms of other kinds of movements such as beat gestures in relation to speech rhythm and
variations of intonation (but see Krahmer and Swerts 2007; McClave 1994, 1998), head
movements and gaze orientation (e.g., McClave 2000), and use of facial gestures in
conversation (Bavelas and Chovil 2006). Likewise, on the comprehension side, the pro-
cesses underlying the integration of auditory and visual information have not been
made explicit (for an information-processing model of the listener, see Cutler and
Clifton 1999). Moreover, there are controversies about the reception stage at which
the gesture and speech comprehension systems interact. However, in various other di-
rections, developments in cognitive sciences have led to profound changes in the use of
processing metaphors for the mind.

2. Challenges to information processing models


2.1. Connectionism and dynamic systems
Advances in computational modelling and consideration of how the brain actually func-
tions have inspired alternative models of cognitive processing. In this respect, connec-
tionism and the dynamic-systems approach converge on alternative conceptions of
mechanisms underlying language use and action control. They have in common the
rejection of discrete representations such as concepts, lexical entries, or motor pro-
grams, replacing them by interactions among sub-symbolic entities. From such a
point of view, [mental] “representations are not abstract symbols but rather regions
of state space” (Elman 1995: 196). Grammatical constructions and action sequences
do not follow rules but rather “trajectories through state space” (Elman 1995: 215).
These two approaches also share an emphasis on the role of context in explaining
variability over occasions and in individual performance.
The connectionist architectures consist of layers of elementary units representing
input (perceptual features), output (response features), and hidden units that represent
the result of learning experience, which captures statistical input-output regularities.
Various investigators have proposed connectionist models of some (often limited) as-
pects of language comprehension and production (see Christiansen and Chater 2001
for an overview of connectionist psycholinguistics). Similarly, complex non-linguistic se-
quences such as preparing a sandwich were simulated in models that did not assume a
repertoire of discrete schemas of elementary actions. Connectionist architectures may
differ from each other in many respects, but they share some assumptions that distin-
guish them from more traditional modular approaches: distributed representations
over multiple elementary units and bi-directionality of information flow. Connectionist
models have had little impact on the study of gestures, but they question some of the
important assumptions of information processing models. In some cases, however, com-
promises have been found in hybrid models that included symbolic representations at
the more general level (intended goals) and sub-symbolic interactive activations at
the lower levels (“how the machine works”).
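As a deliberately minimal, concrete sketch of such a layered architecture, assuming nothing beyond the textbook setup of input, hidden, and output units trained by error backpropagation (everything below is illustrative and not drawn from the gesture literature):

```python
# Minimal connectionist network: knowledge is stored as distributed
# connection weights, not discrete symbols or rules.
import math
import random

random.seed(1)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class TinyNet:
    """Input layer -> hidden layer -> output layer, trained by backpropagation."""

    def __init__(self, n_in, n_hidden, n_out, lr=0.5):
        # Each row holds the incoming weights of one unit, plus a bias term.
        self.w1 = [[random.uniform(-1, 1) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [[random.uniform(-1, 1) for _ in range(n_hidden + 1)]
                   for _ in range(n_out)]
        self.lr = lr

    def forward(self, x):
        self.h = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0])))
                  for row in self.w1]
        self.o = [sigmoid(sum(w * v for w, v in zip(row, self.h + [1.0])))
                  for row in self.w2]
        return self.o

    def train_step(self, x, target):
        o = self.forward(x)
        # Error signals for output and hidden units (squared-error loss).
        d_o = [(o[k] - target[k]) * o[k] * (1 - o[k]) for k in range(len(o))]
        d_h = [self.h[j] * (1 - self.h[j]) *
               sum(d_o[k] * self.w2[k][j] for k in range(len(d_o)))
               for j in range(len(self.h))]
        for k, row in enumerate(self.w2):
            for j in range(len(self.h)):
                row[j] -= self.lr * d_o[k] * self.h[j]
            row[-1] -= self.lr * d_o[k]
        for j, row in enumerate(self.w1):
            for i in range(len(x)):
                row[i] -= self.lr * d_h[j] * x[i]
            row[-1] -= self.lr * d_h[j]
        return sum((o[k] - target[k]) ** 2 for k in range(len(o)))

# XOR: an input-output regularity that no direct input-output mapping
# (i.e. no network without hidden units) can capture.
data = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]),
        ([1.0, 0.0], [1.0]), ([1.0, 1.0], [0.0])]
net = TinyNet(2, 4, 1)
first_loss = sum(net.train_step(x, t) for x, t in data)
for _ in range(3000):
    last_loss = sum(net.train_step(x, t) for x, t in data)
for x, t in data:
    print(x, round(net.forward(x)[0], 2))
```

The hidden-layer weights are the “result of learning experience” in the sense used above: nothing in the trained network corresponds to a discrete rule or symbol.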
Dynamic Systems theory emerged from physical and mathematical descriptions
of changes at different time scales (van Gelder and Port 1995). It had a considerable
impact on the study of motor control and motor development, and also influenced mod-
els of language acquisition and spatial memory. The basic idea is that performance
emerges from the history of multiple interactions among heterogeneous components
of the system formed by the body and its environment. For instance, object-name learn-
ing is grounded in sensory-motor experience of manipulation, recognition of similari-
ties, frequency of heard names, and vocal imitation ability, all factors which change
over time (Smith and Samuelson 2003). These changes can be modelled by mathemat-
ical equations that enable the present state to be predicted from knowledge of previous
states of the system. These equations also allow predictions to be made about the results
of experiments in which critical parameters (e.g. cue strength and timing) are manipu-
lated. In dynamic accounts, behaviour does not rely on fixed representations but
emerges from embodied interactions in the context of a particular task (Barsalou,
Breazal, and Smith 2007; Smith 2005).
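The core idea, that the present state of the system is computed from its previous state by an update equation, can be shown with a deliberately simple sketch. The settling equation and all parameter names and values below are assumptions chosen for illustration, not a model taken from this literature:

```python
# Illustrative one-dimensional dynamic system: each state is predicted
# from the previous state (Euler integration of dx/dt = -k * (x - attractor)).
def simulate(x0, attractor, k=0.5, dt=0.1, steps=200):
    x = x0
    trajectory = [x]
    for _ in range(steps):
        x = x + dt * (-k * (x - attractor))  # next state from current state
        trajectory.append(x)
    return trajectory

traj = simulate(x0=0.0, attractor=1.0)
print(round(traj[-1], 3))  # the state settles at the stable fixed point: 1.0
```

Predicting the next state requires only the current state and the equation; changing the parameters k and attractor changes the whole trajectory, which is the sense in which behaviour emerges from the dynamics of the system rather than from a stored plan of the end state.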
This anti-representationalist claim and non-modular view are similar to another
theory of the interaction of gesture and speech developed on different grounds by
McNeill (1987, 1992, 2000, 2005, this volume). Like dynamicists, McNeill deeply dis-
agreed with an information processing approach to gesture. According to him, there
is neither a language of thought nor a mental image serving as an input to the machin-
ery that translates the message into a spoken and bodily output. Thought is not defined
by its content, but as a process coming into existence through a progressive develop-
ment of grammatical and imagistic forms. To describe the formulation of the utterance,
McNeill used the term “unpacking,” which suggests a segmentation of a whole and a contrast of the communicative event with its context. A central concept in this theory
is the notion of “growth point,” i.e. the minimal psychological unit of inner speech from
which gestures and speech arise. This psychological predicate is a dynamic unit (a tem-
porary activation) in two senses. First, it results from the instability of an idea, which
can be conveyed in various modes, images and linguistic forms. Second, it emerges
from a contrast with the context, i.e. the inherent background from which the psycho-
logical predicate is differentiated. McNeill’s two main criticisms of the information-
processing models are the separation of language from imagery (as in the dual-code
hypothesis) and the use of context as an external parameter, outside the model. This
complex and original theoretical work also differs from connectionist models of
language processing, which represent input as a layer of feature nodes and do not incor-
porate imagery-language dialectic as a major determinant of gesture and speech
production.
Unlike other dynamical systems approaches, McNeill’s theory is formulated in ver-
bal and not mathematical terms. However, it has inspired some computer scientists
who try to simulate human behaviour through an artificial conversational agent able
to convey information by means of speech and gesture (Sowa et al. 2008). Previous at-
tempts had been based on information processing models such as Kita and Özyürek’s
(2003) Interface Model (see e.g. Kopp, Bergmann, and Wachsmuth 2008) with separate
modules for visuo-spatial imagery and sentence formulation. In the revised model,
gestures are not produced by combining pre-defined features but are learned from
interactions with the context. Yet, this computational model did not capture the
imagery-linguistic duality assumed in the notion of a growth point.
Another source of disagreement among researchers concerns the relationship
between co-verbal gestures and action. According to McNeill (2005), “gestures arise
from the process of thinking and speaking and can arise separately (in part) from the
brain systems controlling instrumental actions” (McNeill 2005: 234). In a different
research tradition, however, hand and mouth movements are closely associated from birth through mechanisms involved in manipulation and feeding, not in communication
(see e.g., Iverson and Thelen 1999). Synchrony of gestures and speech emerges from
this primitive coupling of vocal and manual repetitive movements. Connections exist
between the low-level control mechanisms involved in determining the kinematic
parameters of object grasping and syllable pronunciation (e.g. Bernardis and Gentilucci
2006; Gentilucci and Dalla Volta 2008). On a higher level, the origins of thought in sen-
sory-motor activity orient cognition toward embodied representations: gestures express
knowledge because knowledge is grounded in the perception-for-action system used in
bodily interactions with the environment (Hostetter and Alibali 2008). From such a per-
spective, conceptual representations are nothing other than representations of modal-
ity-specific bodily states, and thus what speech and gesture convey are memory traces
of sensory-motor interactions with the environment (Barsalou 1999, 2008). Thus, the
notion of dynamic systems has inspired different architectures for gesture and speech
production and comprehension. However, advances in the philosophy of language pro-
gressively displaced the focus on semantics and syntax toward concerns for pragmatic
aspects of language use.

2.2. The pragmatic turn


Understanding an utterance is not only decoding the perceived signal but also inferring
the intention of the speaker by means of knowledge shared by the two partners. The
speakers try to elicit expected reactions, and, for that purpose, they adapt their dis-
course to the mental states of their addressees. Thus, the main uses of language are joint activities grounded in social exchanges and physical contexts. In this respect, they
fundamentally differ from other verbal tasks like reading a text or reciting from mem-
ory (Clark 1996, 1997; for a comprehensive overview, see Krauss and Fussell 1996).
Face-to-face conversation is the prototypical language setting. The architecture of the
system involves interactions between two interlocutors. This can be represented by add-
ing a mental model of the addressee in the mind of the speaker or, as Pickering and
Garrod (2004) proposed, by connecting the various components of the two systems
on multiple levels (situation model, lexicon, syntax, etc.). There is also some evidence
of imitation of gesture forms (Kimbara 2008).
A study by Clark and Krych (2004) illustrates such a pragmatic approach. Their
experiment examined the influence of signals sent by the addressee on the communica-
tive behaviour of a speaker called the director, in charge of explaining to a partner
(called the builder) how to assemble a model from a set of Lego® blocks. Four condi-
tions of visual availability were compared by means of appropriate barriers. In all cases,
the model was out of sight of the builder. In two conditions, the builder’s workspace was
visible, and in the two others it was masked. Half of the time, an aperture in the barrier
allowed facial communication, and in the other half, faces were hidden. Analyses of per-
formance indicated that workspace visibility had significant effects on performance, but
the effects of facial signals were negligible. In particular, building times were shorter
when the workspace was visible than when it was not and both partners used fewer
words overall. Some deictic expressions (here, there, like this, like that) were used
more frequently by both the director and the builder in the visible workspace condi-
tions, as were several kinds of gesture by the builder (exhibiting blocks, poising to elicit
yes/no responses, pointing) to obtain additional information. Thus, verbally and non-
verbally, the recipients sent signals about their degree of comprehension, which were
monitored by the speakers in order to adjust the form of their utterance. The process
was severely disrupted when visual signals were eliminated.
Bavelas and Chovil (2006) also assumed that the uses of language and gesture differ
in monologues and dialogues. Some gestures (called interactive gestures) integrate the
conversational partner into the process of message production and are absent in indi-
vidual narrative recall (see also Bavelas et al. 1992). These gestures refer to the inter-
locutor or to previous exchanges (through phrases such as “as you know” or “as we
said”) rather than to external elements (as the so-called topic gestures do). Bavelas
et al. (2008) designed an experiment to investigate the separate contributions of visi-
bility and dialogue by comparing the description of a picture of an eighteenth century
dress, out of sight of the addressee, in three conditions: face-to-face, by telephone, and
to a tape recorder. Analyses of speech and gestures revealed the effects of the two
manipulated factors. Gestures were more frequent in the dialogues (either face-to-face
or by telephone) than in monologues, but differed in form and content depending on
visibility conditions. Their amplitude was greater in face-to-face interaction (for
instance, by depicting the size of the dress with reference to the speaker’s own
body) and gestures were more often redundant with speech in the telephone condi-
tion. As might be expected, deictic expressions and pointing gestures were more fre-
quent when the addressee was visible. These differences related to the distinction
made by Clark (1996) between the three ways of communicating: describing by
means of symbols, indicating by pointing and deictic expressions, and demonstrating
through action.
In conclusion, psycholinguistic research is fuelled by several controversial topics
such as domain-specific modularity, the format of thought (abstract, language-like, or sensory-motor representations), directions of information flow (top-down and/or bottom-up), and communication mechanisms. In most cases, these conceptions are not really antagonistic, and hybrid models can be elaborated.

3. Empirical issues
Psycholinguistics is also an empirical discipline, and the various models can be tested
through experimental and quasi-experimental manipulation of several factors.

– The speakers. Individual differences in the use of bodily signals are massive but still
poorly explained. Other differences may relate to age, language proficiency, bilingu-
alism, or capacity to generate mental images (see, e.g. Feyereisen and Havard 1999;
Hostetter and Alibali 2007; Nicoladis 2007).
– The addressees. The knowledge that the speakers can assume in their partners plays
an important role in the use of gestures. Jacobs and Garnham (2007), for instance,
compared the number of representational gestures in successive narratives, telling either three different stories or the same story repeated, to either three different listeners or the same listener. Across trials, gesture frequency declined in the same/same condition
but remained relatively high in the same/different and different/different conditions.
These results are inconsistent with the view that gestures are performed to facilitate
lexical retrieval rather than to influence the listener. There is now growing evidence
that common ground shared between the speaker and the listener influences the use
of gesture (e.g. Gerwing and Bavelas 2004).
– The situational context. Bavelas et al. (2008) have shown that the presence and visibility of a listener influence the use of gesture through the attribution of states of knowledge. The physical arrangement of conversational partners and referents may also play a role by, for instance, affecting the relevance of pointing (Bangerter 2004).
– The speech content. The use of gestures is closely linked to spatial content and motor
activity (Alibali 2005; Hostetter and Alibali 2010). According to the Information
Packaging hypothesis, gesture production also depends on conceptual load (see
e.g., Hostetter, Alibali, and Kita 2007; Melinger and Kita 2007). The difficulty of representing multidimensional spatial layouts in linear speech is a particular instance of
conceptual load. On the comprehension side, gestures may facilitate understanding if
they refer to features such as size or spatial relationship, while other pieces of infor-
mation are not easily conveyed through gestures (Beattie and Shovelton 1999; see
also McNeill this volume).
– Timing and the issue of control. Inferences about mental architecture rely not only on
the analysis of frequencies (number of words and gestures, proportion of correct re-
sponses to audio- and audio-video messages), but also on the timing of events. It is
commonly assumed that speakers and listeners have to make choices among several
alternatives (forms of expression and divergent interpretations). Decision making is
not immediate and in the process of accumulating evidence some responses may be
more readily available than others. In Stroop-like interference paradigms, responses
are elicited from multi-dimensional stimuli in which two dimensions can be consistent
or inconsistent. For instance, Langton and Bruce (2000) presented records of an actor
saying “up” and “down” while pointing upwards or downwards and orienting the
head in the same or the opposite direction. Participants were asked to respond up
or down by focusing only on the verbal message, ignoring the visual signals. As ex-
pected, responses were faster when auditory and visual information was consistent.
When the two sources were in conflict the decision-making was slower. In dual-
task interference paradigms, experimenters compared performance in conditions in
which only one or two responses had to be selected. For instance, the participants
had to name a picture, perform an action associated with the picture, or perform
the action while naming the picture (e.g. Feyereisen 1997, 2007). Responses were
slower in the dual-task conditions, as is usual when attention has to be divided
between two simultaneous or successive activities. From such a perspective, speech
interruption or inhibition should facilitate the emergence of more elaborate gestures.

4. Conclusion
The various psycholinguistic architectures suggested to account for the use of language and bodily movements in communication share the idea that the exchange of information involves multiple levels of analysis and can take several routes. There are no reflex-like connections between ideas (intentions) and their expressions (signals), as is implied by what Bock (1996) called the “mind-in-the-mouth assumption”. Instead, making decisions about the best way to reach a goal, or about the best interpretation of the situation, involves complex computations that integrate factors of various kinds. Bodily communication lies at the intersection of at least three domains: social
cognition, language use, and sensory-motor interactions with the environment. It is no
surprise that it can be seen from multiple perspectives.
9. Psycholinguistics of speech and gesture: Production, comprehension, architecture 165

5. References
Alibali, Martha W. 2005. Gesture in spatial cognition: Expressing, communicating and thinking
about spatial information. Spatial Cognition and Computation 5: 307–331.
Bangerter, Adrian 2004. Using pointing and describing to achieve joint focus of attention in
dialogue. Psychological Science 15: 415–419.
Barsalou, Lawrence W. 1999. Perceptual symbol systems. Behavioral and Brain Sciences 22: 577–660.
Barsalou, Lawrence W. 2008. Grounded cognition. Annual Review of Psychology 59: 617–645.
Barsalou, Lawrence W., Cynthia Breazeal and Linda B. Smith 2007. Cognition as coordinated non-cognition. Cognitive Processing 8: 79–91.
Bavelas, Janet B. and Nicole Chovil 2006. Nonverbal and verbal communication: Hand gestures
and facial displays as part of language use in face-to-face dialogue. In: Valerie L. Manusov
and Miles L. Patterson (eds.), The Sage Handbook of Nonverbal Communication, 97–115.
Thousand Oaks, CA: Sage.
Bavelas, Janet B., Nicole Chovil, Douglas A. Lawrie and Allan Wade 1992. Interactive gestures.
Discourse Processes 15: 469–489.
Bavelas, Janet, Jennifer Gerwing, Chantelle Sutton and Danielle Prevost 2008. Gesturing on the
telephone: Independent effects of dialogue and visibility. Journal of Memory and Language
58: 495–520.
Beattie, Geoffrey and Heather Shovelton 1999. Mapping the range of information contained in
the iconic hand gestures that accompany spontaneous speech. Journal of Language and Social
Psychology 18: 438–462.
Bernardis, Paolo and Maurizio Gentilucci 2006. Speech and gesture share the same communica-
tion system. Neuropsychologia 44: 178–190.
Bock, Kathryn 1996. Language production: Methods and methodologies. Psychonomic Bulletin &
Review 3: 395–421.
Christiansen, Morten H. and Nick Chater 2001. Connectionist psycholinguistics: Capturing the
empirical data. Trends in Cognitive Sciences 5: 82–88.
Clark, Herbert H. 1996. Using Language. New York: Cambridge University Press.
Clark, Herbert H. 1997. Dogmas of understanding. Discourse Processes 23: 567–598.
Clark, Herbert H. and Meredyth A. Krych 2004. Speaking while monitoring addressees for under-
standing. Journal of Memory and Language 50: 62–81.
Cutler, Anne and Charles Clifton Jr. 1999. Comprehending spoken language: A blueprint of the
listener. In: Colin M. Brown and Peter Hagoort (eds.), The Neurocognition of Language, 123–
166. New York: Oxford University Press.
de Ruiter, Jan P. 2000. The production of gesture and speech. In: David McNeill (ed.), Language
and Gesture, 284–311. Cambridge: Cambridge University Press.
de Ruiter, Jan P. 2007. Postcards from the mind: The relationship between speech, imagistic ges-
ture, and thought. Gesture 7: 21–38.
Elman, Jeffrey L. 1995. Language as a dynamic system. In: Robert F. Port and Timothy Van Gelder
(eds.), Mind as Motion, 195–225. Cambridge, MA: Massachusetts Institute of Technology Press.
Feyereisen, Pierre 1997. The competition between gesture and speech production in dual-task
paradigms. Journal of Memory and Language 36: 13–33.
Feyereisen, Pierre 2006. How could gesture facilitate lexical access? Advances in Speech-Lan-
guage Pathology 8: 128–133.
Feyereisen, Pierre 2007. How do gesture and speech production synchronise? Current Psychology
Letters: Behaviour, Brain and Cognition 22(2). Published online on 9 July 2007. URL: http://
cpl.revues.org/document1561.html.
Feyereisen, Pierre and Isabelle Havard 1999. Mental imagery and production of hand gestures
during speech by younger and older adults. Journal of Nonverbal Behavior 23: 153–171.
Fodor, Jerry A., Thomas C. Bever and Merrill F. Garrett 1974. The Psychology of Language:
Introduction to Psycholinguistics and Generative Grammar. New York: McGraw Hill.
Garrett, Merrill F. 1988. Processes in language production. In: Frederick J. Newmeyer (ed.),
Linguistics: The Cambridge Survey – III. Language: Psychological and Biological Aspects,
69–96. Cambridge: Cambridge University Press.
Gentilucci, Maurizio and Riccardo Dalla Volta 2008. Spoken language and arm gestures are con-
trolled by the same motor control system. Quarterly Journal of Experimental Psychology 61:
944–957.
Gerwing, Jennifer and Janet Bavelas 2004. Linguistic influences on gesture’s form. Gesture 4:
157–195.
Goldin-Meadow, Susan 2003. Hearing Gesture: How Our Hands Help Us Think. Cambridge, MA:
Belknap Press of Harvard University Press.
Goldin-Meadow, Susan this volume. How our gestures help us learn. In: Cornelia Müller, Alan
Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Sedinha Teßendorf (eds.), Body –
Language – Communication: An International Handbook on Multimodality in Human Interac-
tion. (Handbooks of Linguistics and Communication Science 38.1.) Berlin: De Gruyter
Mouton.
Hadar, Uri this volume. Coverbal gestures: Between communication and speech production. In:
Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Sedinha
Teßendorf (eds.), Body – Language – Communication: An International Handbook on Multi-
modality in Human Interaction. (Handbooks of Linguistics and Communication Science
38.1.) Berlin: De Gruyter Mouton.
Hostetter, Autumn B. 2011. When do gestures communicate? A meta-analysis. Psychological Bul-
letin 137: 297–315.
Hostetter, Autumn B. and Martha W. Alibali 2007. Raise your hand if you’re spatial: Relations
between verbal and spatial skills and gesture production. Gesture 7: 73–95.
Hostetter, Autumn B. and Martha W. Alibali 2008. Visible embodiment: Gestures as simulated
action. Psychonomic Bulletin & Review 15: 495–514.
Hostetter, Autumn B. and Martha W. Alibali 2010. Language, gesture, action! A test of the Ges-
ture as Simulated Action framework. Journal of Memory and Language 63: 245–257.
Hostetter, Autumn B., Martha W. Alibali and Sotaro Kita 2007. I see it in my hands’ eye: Represen-
tational gestures reflect conceptual demands. Language and Cognitive Processes 22: 313–336.
Iverson, Jana M. and Esther Thelen 1999. Hand, mouth and brain: The dynamic emergence of
speech and gesture. Journal of Consciousness Studies 6: 19–40.
Jacobs, Naomi and Alan Garnham 2007. The role of conversational hand gestures in a narrative
task. Journal of Memory and Language 56: 291–303.
Kendon, Adam 1995. Gestures as illocutionary and discourse structure markers in Southern Ital-
ian conversation. Journal of Pragmatics 23: 247–279.
Kendon, Adam 2004. Gesture: Visible Action as Utterance. Cambridge: Cambridge University Press.
Kimbara, Irene 2008. Gesture form convergence in joint description. Journal of Nonverbal Behav-
ior 32: 123–131.
Kita, Sotaro 2000. How representational gestures help speaking. In: David McNeill (ed.),
Language and Gesture, 261–283. Cambridge: Cambridge University Press.
Kita, Sotaro 2009. A model of speech-gesture production. In: Ezequiel Morsella (ed.), Expressing
One Self/Expressing One’s Self: Communication, Cognition, Language, and Identity, 9–22. Lon-
don: Taylor & Francis.
Kita, Sotaro and Asli Özyürek 2003. What does cross-linguistic variation in semantic coordination
of speech and gesture reveal? Evidence for an interface representation of spatial thinking and
speaking. Journal of Memory and Language 48: 16–32.
Kita, Sotaro, Asli Özyürek, Shanley Allen, Amanda Brown, Reyhan Furman and Tomoko Ishi-
zuka 2007. Relations between syntactic encoding and co-speech gestures: Implications for a
model of speech and gesture production. Language and Cognitive Processes 22: 1215–1236.
Kopp, Stefan, Kirsten Bergmann and Ipke Wachsmuth 2008. Multimodal communication from
multimodal thinking: Towards an integrated model of speech and gesture production. Interna-
tional Journal of Semantic Computing 2: 115–136.
Krahmer, Emiel and Marc Swerts 2007. The effects of visual beats on prosodic prominence:
Acoustic analyses, auditory perception and visual perception. Journal of Memory and
Language 57: 396–414.
Krauss, Robert M., Yihsiu Chen and Purnima Chawla 1996. Nonverbal behavior and nonverbal
communication: What do conversational hand gestures tell us? In: Mark P. Zanna (ed.), Ad-
vances in Experimental Social Psychology, Volume 28, 389–450. San Diego, CA: Academic Press.
Krauss, Robert, Yihsiu Chen and Rebecca F. Gottesman 2000. Lexical gestures and lexical access:
A process model. In: David McNeill (ed.), Language and Gesture, 261–283. Cambridge:
Cambridge University Press.
Krauss, Robert M. and Susan R. Fussell 1996. Social psychological models of interpersonal com-
munication. In: E. Tony Higgins and Arie W. Kruglanski (eds.), Social Psychology: Handbook
of Basic Principles, 655–701. New York: Guilford.
Krauss, Robert M. and Uri Hadar 1999. The role of speech-related arm/hand gestures in word
retrieval. In: Lynn S. Messing and Ruth Campbell (eds.), Gesture, Speech, and Sign, 93–116.
New York: Oxford University Press.
Langton, Stephen R.H. and Vicky Bruce 2000. You must see the point: Automatic processing of
cues to the direction of social attention. Journal of Experimental Psychology: Human Percep-
tion and Performance 26: 747–757.
Levelt, Willem J.M. 1989. Speaking: From Intention to Articulation. Cambridge, MA: Massachu-
setts Institute of Technology Press.
McClave, Evelyn 1994. Gestural beats: The rhythm hypothesis. Journal of Psycholinguistic
Research 23: 45–66.
McClave, Evelyn 1998. Pitch and manual gestures. Journal of Psycholinguistic Research 27: 69–89.
McClave, Evelyn Z. 2000. Linguistic functions of head movements in the context of speech. Jour-
nal of Pragmatics 32: 855–878.
McNeill, David 1987. Psycholinguistics: A New Approach. New York: Harper and Row.
McNeill, David 1992. Hand and Mind: What Gestures Reveal about Thought. Chicago: Chicago
University Press.
McNeill, David 2000. Catchments and contexts: Non-modular factors in speech and gesture pro-
duction. In: David McNeill (ed.), Language and Gesture, 312–328. Cambridge: Cambridge
University Press.
McNeill, David 2005. Gesture and Thought. Chicago: Chicago University Press.
McNeill, David this volume. The growth point hypothesis of language and gesture as a dynamic
and integrated system. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig,
David McNeill and Sedinha Teßendorf (eds.), Body – Language – Communication: An Inter-
national Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics and
Communication Science 38.1.) Berlin: De Gruyter Mouton.
Melinger, Alissa and Sotaro Kita 2007. Conceptualisation load triggers gesture production.
Language and Cognitive Processes 22: 473–500.
Melinger, Alissa and Willem J. M. Levelt 2004. Gesture and the communicative intention of the
speaker. Gesture 4: 119–141.
Morsella, Ezequiel and Robert M. Krauss 2004. The role of gestures in spatial working memory
and speech. American Journal of Psychology 117: 411–424.
Morsella, Ezequiel and Robert M. Krauss 2005. Muscular activity in the arm during lexical retrieval:
Implications for gesture-speech theories. Journal of Psycholinguistic Research 34: 415–427.
Nicoladis, Elena 2007. The effect of bilingualism on the use of manual gestures. Applied Psycho-
linguistics 28: 441–454.
Pickering, Martin J. and Simon Garrod 2004. Toward a mechanistic psychology of dialogue.
Behavioral and Brain Sciences 27: 169–226.
Shannon, Claude E. and Warren Weaver 1949. The Mathematical Theory of Communication.
Urbana, IL: University of Illinois Press.
Smith, Linda B. 2005. Cognition as a dynamic system: Principles from embodiment. Developmen-
tal Review 25: 278–298.
Smith, Linda B. and Larissa K. Samuelson 2003. Different is good: Connectionism and dynamic
systems theory are complementary emergentist approaches to development. Developmental
Science 6(4): 434–439.
Sowa, Timo, Stefan Kopp, Susan Duncan, David McNeill and Ipke Wachsmuth 2008. Implement-
ing a non-modular theory of language production in an embodied conversational agent. In:
Ipke Wachsmuth, Manuela Lenzen and Guenther Knoblich (eds.), Embodied Communication
in Humans and Machines, 425–449. New York: Oxford University Press.
van Gelder, Tim and Robert F. Port 1995. It’s about time: An overview of the dynamical approach
to cognition. In: Robert F. Port and Tim van Gelder (eds.), Mind as Motion: Explorations in the
Dynamics of Cognition, 1–43. Cambridge, MA: Massachusetts Institute of Technology Press.
Wagner, Susan M., Howard Nusbaum and Susan Goldin-Meadow 2004. Probing the mental repre-
sentation of gesture: Is handwaving spatial? Journal of Memory and Language 50(4): 395–407.

Pierre Feyereisen, Louvain-la-Neuve (Belgium)

10. Neuropsychology of gesture production


1. The methodological challenge to investigate the neuropsychology of the production of
spontaneous gestures
2. Kimura’s neuropsychological theory on the left hemispheric specialization for
communicative gesture production
3. Evidence for a right hemispheric generation of gestures
4. Specific hand preferences for different gesture types
5. Hemispheric specialization for the production of different gesture types
6. Conclusion
7. References

Abstract
This chapter focuses on where in the brain – in terms of the right and left hemispheres –
co-speech gestures are generated. This question is not only of neurobiological relevance
but its investigation also provides an empirical basis to explore with what kind of cogni-
tive and emotional processes in the two hemispheres gesture generation may be asso-
ciated. First addressed are the methodological difficulties and the approaches currently
used to empirically investigate the question of hemispheric specialization in the produc-
tion of gestures. Second, an empirically grounded theory proposing a left hemispheric
generation of co-speech gestures is presented and critically discussed. The left hemisphere
proposition is contrasted with empirical data providing evidence for a right hemispheric
generation of gestures. The chapter concludes with a proposition on the distinct roles
of the right and left hemispheres in gesture production.
1. The methodological challenge to investigate the neuropsychology of the production of spontaneous gestures
Little is known about the neuropsychology of the production of communicative
gestures, which are spontaneously displayed in communicative situations or when talking to oneself. In these situations, many different types of communicative gestures occur, such as deictics, batons, ideographics, iconographics, kinetographics, or emblems, as classified by Efron ([1941] 1972).
The neurobiological correlates of the production of these spontaneous hand gestures
are difficult to investigate, as most of these gestures are generated implicitly, i.e., beyond
the gesturer’s awareness. In contrast, explicit gestures, which are generated within the
gesturer’s awareness, can be more readily subject to empirical investigations, because
they can be executed on command. The investigation of the neurobiological correlates
of explicit gestures now profits from the great progress in the development of neuro-
imaging methods. However, these methods are not suited for the investigation of the
production of spontaneous, implicitly generated gestures, because neuroimaging
investigations require that gestures be generated on command, at set times, and repetitively and, in order to prevent movement artifacts, that the gesturer remain immobile except for movements of the lower arms. The examination of gestures executed on command is of only limited value for understanding the production of spontaneous gestures, because the neurobiological correlates of explicit gesture production differ from
those of implicit gesture production (Bogen 2000; Buxbaum et al. 1995; Geschwind
et al. 1995; Lausberg et al. 1999; Lausberg et al. 2003; Liepmann and Maas 1907;
Marangolo et al. 1998; Rapcsak et al. 1993; Tanaka et al. 1996; Watson and Heilman
1983). While implicit gestures can be generated in the right hemisphere or in both hemispheres, there is a left hemispheric specialization for many components of explicit gesture production. Therefore, despite the current progress in neuroimaging techniques, the state-of-the-art method for examining the contribution of the right and left hemispheres to the production of spontaneously displayed gesture types remains the examination of hand preferences in healthy subjects or in split-brain patients.
The anatomical basis for inferring hemispheric specialization from hand preferences is
that the left hemisphere controls the (contralateral) right hand, and vice versa, the
right hemisphere the (contralateral) left hand. In subjects with normal neural connections, the corpus callosum, the largest neural fiber connection between the right and left hemispheres, enables each hemisphere to exert control over the ipsilateral hand as well. For example, if a right-handed person with left hemisphere language dominance intends to write with the left hand, the command is sent from the language-competent left hemisphere via the corpus callosum to the right hemisphere, which controls the left hand.
Split-brain patients offer a unique opportunity to examine hemispheric specializa-
tion in the production of spontaneous gestures, because in these patients the corpus cal-
losum is sectioned (in this chapter, the term “split-brain patients” will be used for
patients with complete callosal disconnection independently of the cause, i.e., operation
or stroke). In most cases, the operation is conducted in patients with intractable epilepsy in order to prevent epileptic seizures from spreading from one hemisphere to the other. It is noteworthy that this operation is rarely indicated; therefore, only a few split-brain subjects are available for studies. After callosal disconnection, each hand
can distinctly be controlled only by the contralateral hemisphere (Gazzaniga, Bogen,
and Sperry 1967; Lausberg et al. 2003; Sperry 1968; Trope et al. 1987; Volpe et al. 1982).
As a result, the actions of the right and left hands reflect competence or incompetence
of the contralateral hemisphere. For example, since the left hemisphere is language dominant, these patients cannot execute verbal commands with the left hand, which is controlled by the right hemisphere (left hand apraxia). In contrast, a split-brain patient’s right hand performs worse than the left hand in copying figures (right hand constructional apraxia), because the right hand is disconnected from the right hemispheric
visuo-spatial competence (e.g. Bogen 1993; Lausberg et al. 2003). Therefore, studies on
spontaneous hand preferences in split-brain patients provide valuable information
about the neurobiological correlates of the production of different gesture types.
Likewise, in healthy subjects spontaneous hand preferences reflect the activation of
the contralateral hemisphere (Hampson and Kimura 1984; Verfaellie, Bowers, and
Heilman 1988). Hampson and Kimura (1984) observed in right-handed healthy subjects
a shift from right hand use in verbal tasks toward greater left hand use in spatial tasks.
They suggested that the problem-solving hemisphere preferentially uses the motor
pathways, which originate intrahemispherically. Consequently, the right hemisphere
that primarily solves the spatial task employs the contralateral left hand. Indeed, in
behavioral laterality experiments, when resources are sufficient for both decision and
response programming, there is an advantage to responding with the hand controlled
by the same hemisphere that performs the task (Zaidel et al. 1988). Along the same lines,
right-handers prefer the left hand for self-touch gestures (Lausberg, Sassenberg, and
Holle submitted). Self-touch gestures are displayed when individuals are stressed or
emotionally engaged (Freedman and Bucci 1981; Freedman et al. 1972; Freedman
and Hoffmann 1967; Freedman 1972; Lausberg 1995; Lausberg and Kryger 2011;
Sainsbury 1955; Sousa-Poza and Rohrberg 1977; Ulrich 1977; Ulrich and Harms
1985). Further, the right hemisphere is activated more than the left during emotionally
loaded or stressful situations (Ahern and Schwartz 1979; Berridge et al. 1999; Borod
et al. 1998; Grunwald and Weiss 2007; Killgore and Yurgelun-Todd 2007; Stalnaker,
Espana, and Berridge 2009). Thus, the left hand preference for self-touch reflects the
right hemispheric activation during emotional engagement. It is noteworthy that if
the body-focused activity includes the manipulation of body-attached objects, such as
playing with a necklace, there is a significant right-hand preference (Lausberg, Sassen-
berg, and Holle submitted). This concurs well with the left hemispheric dominance for
tool use (see 5. for detailed discussion).
However, in healthy subjects the intact corpus callosum enables each hemisphere, if required, to exert control over the ipsilateral hand. Therefore, in studies on healthy subjects, other factors that might require the use of a specific hand have to be ruled out. These are as follows:

(i) handedness; handedness has recently come to be considered a multidimensional trait
(Brown et al. 2006; Corey, Hurley, and Foundas 2001; Healey, Liedermann, and
Geschwind 1986; Wang and Sainburg 2007). Right-handers typically show a left hand preference for movements that rely on the axial musculature, involve strength, and secure an accurate final position, while they prefer the right hand for movements that require dexterity, fine motor coordination, and control of trajectory speed and direction;
(ii) a semantic purpose, such as when talking about the left or right of two objects
(Lausberg and Kita 2003);
(iii) cultural conventions, such as when Arrente speakers in Central Australia use
the left hand to refer to targets that are on the left and vice versa (Wilkins and de
Ruiter 1999). Likewise, in explicit gesture production, right-handers make 68%
of reaches to left hemispace with the left hand (Bryden, Pryde, and Roy 2000);
(iv) an occupation of the right hand with some other physical activity, such as holding a
cup of coffee.
If these factors are controlled for, spontaneous hand preferences in empirical studies on healthy subjects are a good indicator of hemispheric specialization.

2. Kimura’s neuropsychological theory on the left hemispheric specialization for communicative gesture production
An important suggestion in the literature is Kimura’s hypothesis (1973a, 1973b) that
hand preference for free movements, i.e., communicative gestures, is determined by
language lateralization in the two hemispheres. Kimura (1973a) noted that right-
handers preferred the right hand in free movements that accompany speech (left :
right ratio 10:31). Among left-handers, those with right ear advantage and inferred
left hemisphere language used both hands for speech-accompanying gestures with a
slight left hand preference (left : right ratio 48:42), and those with left ear advantage
clearly preferred the left hand (left : right ratio 83:29) (Kimura 1973b). Thus, Kimura
proposed that hand preference for communicative gestures was determined by speech
lateralization in the cerebral hemispheres. More explicitly, right-handers prefer the
right hand for communicative gestures because their left hemisphere controls certain
oral (i.e., speech) as well as brachial movements (i.e., communicative gestures)
(Lomas and Kimura 1976). However, in her theory, Kimura did not clearly distinguish
between language comprehension, language production, and motor control of oral
movements. She inferred a lateralization of the control of “certain oral movements” in right-handers from dichotic listening examinations. However, this provides indirect evidence
at best, since dichotic listening tasks usually measure phonetic perception, which, in
turn, may be associated with auditory language comprehension more than with
other language functions (Zaidel, Clarke, and Suyenobu 1990). Kimura did not con-
sider handedness as a factor that could influence hand preference for communicative
gestures. She rejected the interpretation that language lateralization and handedness
as independent additive factors would explain the patterns in the three groups. In
her opinion, this assumption would not sufficiently explain the high number of right
hand gestures in left-handers with left ear advantage. Instead, she suggested that in
left-handers, both hands are used for gesticulation because of bilateral language rep-
resentation, “where speech is not unilaterally organized, gesturing should also be man-
ifested less unilaterally” (Kimura 1973b: 54). Since Kimura further suggested that
gestures and speech were controlled by a common system, her hypothesis concerning
left-handers would, strictly speaking, imply that if a left-hander gestured with his left
hand he would rely on right hemisphere language, and if he “spoke” with his left hemisphere he would use his right hand for gestures that accompany speech. This proposition would require further investigation. In addition, in Kimura’s interpretation of the data, the exclusion of handedness as an independent additive factor is not well-founded.
A further limitation of Kimura’s theory is the lack of an explanation for left hand
gestures in right-handers. Even in those studies that report a statistically significant right hand preference for right-handers, the percentage of left hand gestures among unimanual gestures ranges between 25 and 39% (Dalby et al. 1980; Kimura 1973a; Lavergne and Kimura 1987; Sousa-Poza, Rohrberg, and Mercure 1979; Stephens 1983). In addition, in several studies on hand use in communicative gestures, either right hand preference was not significant (Lausberg and Kita 2003; Lavergne and Kimura 1987), or an
equally frequent use of the right and left hands was reported (Blonder et al. 1995;
Ulrich and Harms 1979).
To summarize, there is ample evidence to question Kimura’s theory that speech and
gestures are controlled by a common motor system that in right-handers is located in
the left hemisphere.

3. Evidence for a right hemispheric generation of gestures


Since David McNeill’s original idea to examine communicative gestures in split-brain patients, such gestures have to date been analyzed in five subjects: N.G.
(Lausberg et al. 2007; McNeill 1992; McNeill and Pedelty 1995), L.B. (McNeill 1992;
McNeill and Pedelty 1995), A.A. and G.C. (Lausberg et al. 2007), and U.H. (Lausberg,
Davis, and Rothenhäusler 2000; Lausberg et al. 2007). The right-handed patients A.A.,
N.G. (and presumably L.B.) had exclusively left hemispheric language production and
left hemisphere praxis dominance. The same applies to U.H., but he was left-handed.
G.C. had bihemispheric language production and he suffered from intermanual conflict.
Therefore, he sometimes even sat on his left hand and exclusively used the right hand to
prevent an intermanual conflict. This, in turn, influenced his overall pattern of hand
preference for communicative gestures.
As stated above, in split-brain patients the right hand can only be controlled dis-
tinctly by the left hemisphere and vice versa, the left hand by the right hemisphere. Ac-
cording to Kimura’s theory, namely that speech and gestures are controlled by a
common motor system that in right handers is located in the left hemisphere, split-
brain patients with left hemispheric language production should exclusively use the
right hand for communicative gestures. However, the split-brain studies reveal that
this is not the case. In McNeill’s (1992) transcripts of N.G. and L.B. it is documented
that they gestured with both hands. A systematic investigation of hand preferences in
split-brain patients (Lausberg et al. 2007) evidenced a retest-reliable left-hand prefer-
ence for communicative gestures in the three patients A.A., N.G., and U.H., who had
left hemisphere language dominance and left hemispheric specialization for praxis.
The left-hand preference for communicative gestures cannot be interpreted as an
unusual pattern of hand preferences resulting from callosal disconnection because sim-
ilar patterns of hand preferences were observed in the two study control groups. The
patients with complete callosal disconnection did not differ from patients with partial
callosotomy and from healthy subjects with respect to number of total communicative
gestures per minute, number of right hand gestures per minute, number of left hand
gestures per minute, number of bimanual gestures per minute, and asymmetry ratio
scores (Kimura 1973a) as a measure of hand preference. Thus, despite the fact that
10. Neuropsychology of gesture production 173

the split-brain patients were unable to use the left hand on verbal command (left verbal
apraxia), they spontaneously preferred the left hand for communicative gestures. These
data show that spontaneous communicative gestures can be generated in the right
hemisphere independently of left hemispheric speech production.
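The asymmetry ratio score mentioned above as a measure of hand preference is not defined in this chapter; the following is a minimal sketch assuming the common laterality-index form (R − L)/(R + L), not necessarily Kimura’s (1973a) exact formula, with invented counts rather than data from the cited studies:

```python
def asymmetry_ratio(right: int, left: int) -> float:
    """(R - L) / (R + L): +1 = exclusive right-hand use, -1 = exclusive left.

    A common laterality index; Kimura (1973a) defines the exact score
    used in the studies cited in the text.
    """
    total = right + left
    if total == 0:
        raise ValueError("no unimanual gestures observed")
    return (right - left) / total

def rate_per_minute(count: int, duration_seconds: float) -> float:
    """Gestures per minute, the group-comparison measure used above."""
    return count * 60.0 / duration_seconds

# Invented example: 12 right-hand and 20 left-hand gestures in a 10-minute session.
print(asymmetry_ratio(12, 20))        # -> -0.25 (left-hand preference)
print(rate_per_minute(12 + 20, 600))  # -> 3.2 gestures per minute
```

A score of +1 indicates exclusive right-hand use, −1 exclusive left-hand use, and 0 no preference; per-minute rates allow comparison across sessions of different length, as in the group comparisons reported above.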
Hence, the question arises as to what determines the hand choice for spontaneous com-
municative gestures. Thus far, factors other than speech dominance or handedness
have rarely been investigated systematically. In the following sections, I will argue
that the hand choice for different gesture types reflects the hemispheric lateralization
of different cognitive and emotional functions with which the generation of the specific
gesture types is associated. The following review of empirical studies reveals distinct
patterns of hand preferences for the different gesture types.

4. Specific hand preferences for different gesture types


The review of studies on hand preferences for gesture types is complicated by the fact
that different researchers apply different gesture coding systems, the types of which
show only partial conceptual overlap. Therefore, in the following review I will use
Efron’s seminal coding system (1972) as a frame of reference. It comprises the follow-
ing categories: baton (emphasizing the beat pattern of the speech), deictic (pointing
to a real or imagined object or indicating a direction), emblematic gestures (conven-
tional signs having specific linguistic translation), physiographic with the subtypes
iconograph (depicting a form) and kinetograph (depicting a movement), and ideo-
graphic (sketching a thought pattern). Please note that there is only a partial overlap
between McNeill’s concept of metaphorics and Efron’s concept of ideographics. As
Efron’s category of ideographics has no match in other coding systems, this gesture
type is, where necessary, subsumed together with physiographics under the term
pictorial gestures in the following review.
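The category correspondences just described can be summarized in a small lookup structure; the following sketch is purely illustrative and simplifies the partial overlaps noted above:

```python
# Illustrative only: Efron's (1972) categories as described in the text, and
# the "pictorial" grouping used in this review when other coding systems offer
# no match for ideographics. Not an authoritative crosswalk between systems.
EFRON_CATEGORIES = {
    "baton": "emphasizes the beat pattern of the speech",
    "deictic": "points to a real or imagined object or indicates a direction",
    "emblematic": "conventional sign with a specific linguistic translation",
    "iconograph": "physiographic subtype: depicts a form",
    "kinetograph": "physiographic subtype: depicts a movement",
    "ideographic": "sketches a thought pattern",
}

# Physiographics (iconographs and kinetographs) plus ideographics are summed
# up under the term "pictorial gestures" in this review.
PICTORIAL = {"iconograph", "kinetograph", "ideographic"}

def is_pictorial(category: str) -> bool:
    return category in PICTORIAL

print(is_pictorial("kinetograph"))  # -> True
print(is_pictorial("baton"))        # -> False
```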
McNeill’s gesture type analysis of the split-brain patients N.G. and L.B. showed that
beats emphasizing prosody (in Efron’s terminology: batons) were performed mainly
(N.G.) or exclusively (L.B.) with the left hand, whereas iconic gestures that pictured
the verbal content (Efron’s physiographics) were performed exclusively (N.G.) and
mainly (L.B.) with the right hand (McNeill 1992; McNeill and Pedelty 1995). In the
study by Lausberg et al. (2007), the four split-brain patients A.A., N.G., G.C., and
U.H. produced batons and tosses (definition: Short up-down or circular movement of
hand with upward or outward accent) more often with the left hand than with the
right hand. Likewise, unilateral shoulder shrugs were displayed more often with
the left shoulder than with the right. In contrast, pantomime gestures (definition: The
speaker-gesturer him/herself pretends to perform a motor action, often referring to
the use of a tool, e.g. tooth brushing, or the direct manipulation of an object or in adap-
tation to an imaginary surroundings) were displayed more often with the right hand in
all four patients. In G.C. and U.H., deictics/directional gestures were produced more
often with the right hand, whereas N.G. produced them more often with the
left hand. Further analysis revealed that N.G. consistently used the right hand when
pointing to the right and the left hand when pointing to the left. A similar trend as
in N.G. was found in U.H. A.A. produced deictics/directional gestures infrequently
with no hand preference. For physiographics, there was a right hand preference in
G.C., N.G., and U.H., whereas for ideographics, there was a clear left hand preference
174 II. Perspectives from different disciplines

in N.G. and U.H. A context analysis was conducted only on U.H. (Lausberg, Davis, and
Rothenhäusler 2000). U.H.’s left hand pictorial gestures occurred in speech pauses and
reflected the ideational process (ideographics). In contrast, the right hand was used
exclusively for pictorial gestures, which matched the verbal utterance semantically
(physiographics) and temporally. U.H.’s unilateral shrugs of the left shoulder occurred
frequently and in a context of lack of knowledge and resignation, whereas the rare uni-
lateral shrugs of the right shoulder were performed when talking about the “right side”.
Furthermore, his right hand deictics only referred to the external space, whereas the left
hand deictics occurred when the patient referred to himself.
Analogous hand preferences for the specific gesture types are observed in healthy
subjects. Sousa-Poza, Rohrberg and Mercure (1979) reported that in right-handers a
right hand preference was only significant for the representational gestures (includes
all of Efron’s types except for batons), but not for the nonrepresentational gestures
(Efron’s baton). Stephens (1983) found a significant right hand preference for iconics,
a non-significant right hand preference for metaphorics, as well as a non-significant
left hand preference for beats (Efron’s baton). In a study by Blonder et al. (1995), a
right-handed control group showed a trend towards more right hand use for symbolic
gestures (Efron’s emblematic), whereas the left hand was used more often for expres-
sive gestures (Efron’s baton). In Foundas et al.’s (1995) study, a right-handed control
group showed a significant right hand preference for content gestures (includes all of
Efron’s types except for batons and partly ideographic) and for emphasis gestures
(Efron’s baton) as well as a right hand trend for fillers (overlap with Efron’s ideo-
graphic). Kita, de Condappa and Mohr (2007) reported a significant right hand prefer-
ence for deictics (identical to Efron’s category) and for depictive gestures (includes Efron’s
ideographic and physiographic), except for those depictive gestures that had a character
viewpoint in a metaphor condition. For deictics, a right hand preference has been re-
ported in healthy adults (Kita, de Condappa, and Mohr 2007; Wilkins and de Ruiter
1999) and in infants and toddlers (Bates et al. 1986; Vauclair and Imbault 2009). In a
recent study by Lausberg, Sassenberg and Holle (submitted), a distinct pattern of
hand preferences for different gesture types was found in 37 right-handed participants.
In order to collect a broad spectrum of data, the participants were examined in two dif-
ferent communicative settings, i.e., during narrations of everyday activities and during
semi-standardized interviews with personal topics. No hand preferences were found for
self-deictics, body-deictics, directions, iconographics, batons, back-tosses, palm-outs,
shrugs, and emblems. While there was a significant left hand preference for self-touch
(see 1.), a significant right-hand preference was found for pantomimes, positions (Def-
inition: The hand positions an imaginary object/subject at a specific location in an imag-
inary scene, which is projected into the gesture space.), traces (definition: The hand
traces an imaginary line or contour), deictics to external loci, kinetographs, and
body-attached object manipulation.
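The chapter does not state which statistics underlie the “significant” hand preferences reported above. As an illustration only, a simple two-sided sign test on right- versus left-hand counts is one hypothetical way such a preference could be assessed (not necessarily the method of the cited studies):

```python
from math import comb

def sign_test_p(right: int, left: int) -> float:
    """Two-sided exact sign (binomial) test of H0: p(right hand) = 0.5.

    Illustrative only -- the studies cited in the text may have used
    different statistics.
    """
    n = right + left
    k = min(right, left)
    # Probability of a split at least as extreme as the observed one,
    # in one direction, under the null hypothesis p = 0.5.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical participant: 18 right-hand vs. 4 left-hand pantomime gestures.
print(round(sign_test_p(18, 4), 4))  # -> 0.0043: a clear right-hand preference
print(sign_test_p(5, 5))             # -> 1.0: no detectable preference
```

With an invented split of 18 right-hand versus 4 left-hand pantomimes, the test yields p ≈ 0.004, i.e., a split this uneven would be unlikely if both hands were used equally often.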

5. Hemispheric specialization for the production of different gesture types


The pattern of hand preferences as reported above for the different gesture types
cannot be explained by handedness, because in all cited studies right-handers were
investigated (only the split-brain patient U.H. was a left-hander, but his left-handedness
did not substantially alter his pattern of hand preference for the different gesture types
as compared to the right-handed patients A.A. and N.G.). The current theoretical mod-
els of handedness as a multidimensional trait serve to explain the complementary func-
tions of the right and left hands during tool use and object manipulation, but they
provide little explanation of the hand choice in spontaneous gestures in commu-
nicative situations. The exception is that right-handers might prefer the right hand for
those gesture types that require a high degree of fine motor coordination and mod-
ulation of speed or direction. However, the execution of the right hand preference ges-
ture types, such as deictics, pantomimes, positions, traces, kinetographs, or iconographs
(see 4.) does not require more fine motor coordination than the execution of the gesture
types with no hand preference, such as self-deictics, body-deictics, directions, ideo-
graphics, metaphorics, batons, back-tosses, palm-outs, shrugs, and emblems. In other
words, the fact that the right and left hands are used to execute the latter gesture
types is not due to kinesic simplicity. Thus, handedness cannot sufficiently explain the
hand preferences for the different types of hand gestures, which are spontaneously dis-
played in communicative situations. With this viewpoint, I partly agree with Kimura
(1973a), who rejected handedness as an independent additive factor that could influ-
ence the hand choice for free movements and self-touch. However, I suggest that,
especially in the case of kinesically complex gestures, handedness is a co-factor that
influences the hand choice.
The following paragraphs focus on the relation between hand preferences for the dif-
ferent gesture types and hemispheric lateralization for different cognitive and emo-
tional functions, such as emotional processes, prosody, metaphorical thinking, and the
tool use competence. It is noteworthy that there is a group of gesture types, i.e., batons,
tosses, self-deictics, and shrugs, for which split-brain patients show a clear left hand pref-
erence and for which the right-handed healthy subjects show either a trend towards
more left hand use or no hand preference. This suggests that in right-handed healthy
subjects the effect of the right hemisphere generation of these gesture types, which
is strongly suggested by the split-brain data, is attenuated because the intact corpus
callosum potentially enables the right-handers to use the more dexterous right hand.
For batons, no hand preference or a trend towards more left hand use was found
(Blonder et al. 1995; Lausberg et al. 2007; Lausberg, Sassenberg, and Holle submitted;
McNeill 1992; Sousa-Poza, Rohrberg, and Mercure 1979; Stephens 1983). The same
applies to back-tosses, which set rhythmical accents just like batons (Lausberg et al.
2007; Lausberg, Sassenberg, and Holle submitted). The exception was Foundas et al.
(1995), who reported a right-hand preference for emphasis gestures in 12 healthy sub-
jects. Thus, two assumptions concerning the neuropsychology of batons and tosses are
put forward here:

(i) both hemispheres are equally competent to execute batons and tosses; or
(ii) the influence of right-handedness was attenuated by the right-hemispheric prosodic
contribution to generation of these gesture types.

Indeed, as batons and tosses emphasize prosody, it can be hypothesized that their pro-
duction is associated with the right hemispheric specialization for the production of
emotional prosody and a contribution to prosodic fundamental frequency (e.g. Schirmer
et al. 2001). Furthermore, in a functional magnetic resonance imaging (fMRI) study
the right hemisphere planum temporale was identified as a region of beat/speech inte-
gration during perception (Hubbard et al. 2009). This interpretation of a major right
hemispheric contribution to the production of batons contrasts with McNeill’s position
that “non-imagistic beats” are generated in the “image-poor, language-rich left cerebral
hemisphere” (1992: 345–7). McNeill’s conclusion is surprising, because his investigation
of N.G. and L.B. demonstrated that they mainly or even exclusively used the left hand
for beats.
Deictics are displayed more often with the right hand than with the left hand (Bates
et al. 1986; Kita, de Condappa, and Mohr 2007; Lausberg et al. 2007; Vauclair and Im-
bault 2009; Wilkins and de Ruiter 1999). Unfortunately, most of the studies do not
report the target of the deictic (left space, right space, body part, the self), which
makes it difficult to link the broad category of deictics to a specific cognitive function.
In explicit gesture production, right-handers make 68% of reaches to left hemispace
with the left hand (Bryden, Pryde, and Roy 2000) and Arrernte speakers in Central
Australia conventionally use the left hand to refer to targets that are on the left (Wilkins
and de Ruiter 1999). In contrast to the right hand preference for deictics to external
loci, no hand preference was found for body-deictics (definition: Pointing to a body
part, often accompanied by gaze at the respective body part) and self-deictics (defini-
tion: Pointing to the sternum or chest, not accompanied by gaze) (Lausberg, Sassen-
berg, and Holle submitted). In a previous study on a split-brain patient, it was
reported that the patient used his right hand for deictics, which referred to the external
space, whereas the left hand deictics occurred when the patient pointed to himself (self-
deictic) (Lausberg et al. 1999). Thus, it could be assumed that right-handers’ ten-
dency to use the right hand for deictics is attenuated when they refer to themselves.
It is plausible that the generation of self-deictics is associated with more emotional
engagement than the generation of most of the deictics, which refer to the external
space. There is ample evidence that the right hemisphere plays the dominant role
in emotional processing (Ahern and Schwartz 1979; Borod et al. 1998; Killgore and
Yurgelun-Todd 2007), and it is well established that emotional expression is stronger
on the left side of the face than on the right (Borod, Koff, and White 1983; Borod
et al. 1998; Moscovitch and Olds 1982). Likewise, Moscovitch and Olds (1982) docu-
mented a shift towards more left hand use for communicative gestures, which were
accompanied by an emotional facial expression, as compared to communicative ges-
tures with no concurrent facial expression. In the same vein, Sousa-Poza, Rohrberg,
and Mercure (1979) reported a shift towards more left hand use for communicative re-
presentational gestures when talking about personal topics versus impersonal topics.
Thus, the right hemispheric emotional engagement, which seems to be associated
with the generation of these gesture types, induces a shift toward more left hand use.
Despite the fact that shoulder shrugs can be controlled by ipsilateral motor path-
ways, in a previous study on split-brain patients a left side preference for unilateral
shrugs was documented (Lausberg et al. 2007). In healthy subjects, no side preference
was found for unilateral shrugs. Shrugs are interactive signs with an emotional conno-
tation, which often occur in the context of helplessness and resignation (Darwin [1890]
2009; Johnson, Ekman, and Friesen 1975). Thus, it is plausible that their generation is
related to the right hemispheric specialization for emotional expression. Likewise,
the split-brain patient U.H.’s unilateral shrugs of the left shoulder occurred frequently
and in a context of lack of knowledge and resignation, whereas the rare unilateral
shrugs of the right shoulder were performed when talking about the “right side” (Laus-
berg, Davis, and Rothenhäusler 2000).
The significant right hand preference for pantomime gestures (Lausberg et al. 2007;
Lausberg, Sassenberg, and Holle submitted) is in line with lesion studies and functional
neuroimaging studies, which demonstrate that the left hemisphere plays a central role in
the generation of pantomime gestures on command in right-handers and left-handers.
Split-brain patients demonstrated a left hand callosal apraxia when pantomiming on
command to visual presentation of tools (Lausberg et al. 2003). Furthermore, patients
with left hemisphere damage were more impaired in pantomiming tool use on com-
mand than right hemisphere damaged patients (De Renzi, Faglioni, and Sorgato
1982; Hartmann et al. 2005; Liepmann and Maas 1907). Neuroimaging studies demon-
strated that independently of whether the right or left hand is used, pantomime is ac-
companied by left hemisphere activation (Choi et al. 2001; Hermsdörfer et al. 2007;
Johnson-Frey, Newman-Norlund, and Grafton 2005; Moll et al. 2000; Ohgami et al.
2004; Rumiati et al. 2004). Lausberg et al. (2003) suggested that the generation of
pantomime gestures relies on the specifically left hemispheric competence to link the
movement concept for tool use with the mental representation of the tool.
There are reasons to assume a left hand preference for trace gestures,
because visuo-constructive abilities are localized in the right hemisphere. Split-brain
patients spontaneously choose the left hand for visuo-motor tasks (Graff-Radford,
Welsh, and Godersky 1987; Lausberg et al. 2007; Sperry 1968) and they show better
performance with the left hand than with the right in these kinds of tasks, e.g. when
drawing the Taylor figure. Furthermore, in right-handed healthy subjects Hampson
and Kimura (1984) observed a shift from right hand use in verbal tasks toward greater
left hand use in spatial tasks. Likewise, for position gestures a right hemisphere advan-
tage could be assumed because the right hemisphere is specialized for the conceptu-
alization of imaginary scenes in the whole gesture space, while the left hemisphere
neglects the gesture space left of the subject’s body midline (Lausberg et al. 2003).
However, for both gesture types, trace and position, a significant right hand preference
was evidenced in the recent study by Lausberg, Sassenberg, and Holle (submitted).
With regard to gesture type phenomenology, it could be hypothesized that trace and
position gestures are derived from pantomime gestures. Trace gestures can be re-
garded as body-part-as-object pantomimes, i.e., as pantomiming “drawing” with the
index finger used as if it were a pen (alternatively, in an evolutionary scenario it could
be hypothesized that first the finger was used and was then replaced by a pen [C. Müller,
personal communication]). In the same vein, the position gesture could be the panto-
mime of placing something. Thus, trace and position gestures might originate from
pantomimes or even further from tool use or direct object manipulation. However,
in contrast to the actual pantomime gestures, in which the gesturer pretends to act,
e.g. “I am drawing” or “I am positioning”, the gestural message of trace and position ges-
tures focuses on the contour, which is created, e.g. “a square”, or on the position, which is
marked in an imaginary scene, e.g. “here is [the church], and behind it, there is [the super
market]”. The present data indicate that although the gestural information is
primarily of a spatial nature, the origin of the gesture type, here the left hemi-
spheric function of pantomiming, or in other words, the “gestural mode of representation”
(Müller 2001), overrides the impact of the right hemispheric spatial contribution.
The data comparison for pictorial gestures is complicated by the fact that researchers
here use quite different concepts (Efron: iconographics, kinetographics, ideographics;
McNeill: iconic, metaphoric). In general, for pictorial gestures, there is a significant
right hand preference (Foundas et al. 1995; McNeill 1992; Lausberg et al. 2007; Sousa-
Poza, Rohrberg, and Mercure 1979; Stephens 1983). However, metaphoric use (Kita,
de Condappa, and Mohr 2007; Stephens 1983) and ideographic use (Lausberg et al.
2000; Lausberg et al. 2007) induce a shift toward more left hand use. This observation
concurs with the dominant role of the right hemisphere for the processing of
conventionalized metaphors (e.g. Ferstl et al. 2008; Mashal and Faust 2009).

6. Conclusion
The split-brain data provide evidence that spontaneous communicative gestures can be
generated in the right hemisphere independently from left hemispheric speech produc-
tion. Furthermore, split-brain patients as well as healthy subjects show distinct hand
preferences for specific gesture types. While right-handers prefer the right hand for
deictics, pantomimes, traces, positions, and concrete pictorial gestures, they prefer
the left hand, or show no hand preference, for self-deictics, batons, tosses, shrugs, and
ideographics/metaphorics. Neither handedness nor speech lateralization can explain the distinct pattern of
hand preferences. Instead, I argue that the hand preferences for the different gesture
types reflect the lateralization of cognitive and emotional functions in the left and
right hemispheres, which are associated with the production of these gesture types.
Some of the right hand preference gesture types are characterized by a close relation
to tool use, which is a primarily left hemisphere competence. In contrast, the left
hand preference gesture types are related to the right hemispheric competences for
prosody, emotional processes, and metaphorical thinking. The substantial right hemi-
spheric contribution to the generation of these gesture types attenuates the right-han-
ders’ right hand preference or even induces a left preference. Thus, I suggest that some
gestures, which are spontaneously displayed in communicative situations, directly
emerge from right hemispheric emotional processes, processes underlying prosody,
and metaphorical thinking.

7. References
Ahern, Geoffrey L. and Gary E. Schwartz 1979. Differential lateralization for positive versus neg-
ative emotion. Neuropsychologia 17: 693–698.
Bates, Elizabeth, Barbara O’Connel, Jyotsna Vaid, Paul Sledge and Lisa Oakes 1986. Language
and hand preference in early development. Developmental Neuropsychology 44: 178–190.
Berridge, Craig W., Elizabeth Mitton, William Clark and Robert H. Roth 1999. Engagement in a
non-escape (displacement) behavior elicits a selective and lateralized suppression of frontal
cortical dopaminergic utilization in stress. Synapse 32: 187–197.
Blonder, Lee Xenakis, Allan F. Burns, Dawn Bowers, Robert W. Moore and Kenneth M. Heilman
1995. Spontaneous gestures following right hemisphere infarct. Neuropsychologia 33: 203–213.
Bogen, Joseph E. 1993. The callosal syndromes. In: Kenneth M. Heilman and Edward Valenstein
(eds.), Clinical Neuropsychology, 337–408. New York: Oxford University Press.
Bogen, Joseph E. 2000. Split-brain basics: Relevance for the concept of one’s other mind. Journal
of the American Academy of Psychoanalysis 28: 341–369.
Borod, Joan C., Elissa Koff and Betsy White 1983. Facial asymmetry in posed and spontaneous
expressions of emotion. Brain and Cognition 2: 165–175.
Borod, Joan C., Elissa Koff, Sandra Yecker, Cornelia Santschi and Michael Schmidt 1998.
Facial asymmetry during emotional expression: Gender, valence, and measurement technique.
Neuropsychologia 36(11): 1209–1215.
Brown, Susan G., Eric A. Roy, Linda Rohr and Pamela J. Bryden 2006. Using hand performance
measures to predict handedness. Laterality 11(1): 1–14.
Bryden, Pamela J., Kelly M. Pryde and Eric A. Roy 2000. A performance measure of the degree of
hand preference. Brain and Cognition 44: 402–414.
Buxbaum, Laurel J., Myrna F. Schwartz, Branch H. Coslett and Tania G. Carew 1995. Naturalistic
action and praxis in callosal apraxia. Neurocase 1: 3–17.
Choi, Seong Hye, Duk L. Na, Eunjoo Kang, Kyung Min Lee, Soo Wha Lee and Dong Gyu Na
2001. Functional magnetic resonance imaging during pantomiming tool-use gestures. Experi-
mental Brain Research 139: 311–317.
Corey, David M., Megan M. Hurley and Anne L. Foundas 2001. Right and left handedness de-
fined: a multivariate approach using hand preference and performance. Neuropsychiatry, Neu-
ropsychology, Behavioural Neurology 14(3): 144–152.
Dalby, J. Thomas, David Gibson, Vittorio Grossi and Richard Schneider 1980. Lateralized hand
gesture during speech. Journal of Motor Behaviour 12: 292–297.
Darwin, Charles R. 2009. The Expression of the Emotions in Man and Animals, 2nd edition. London:
Penguin Group. First published [1890].
De Renzi, Ennio, Pietro Faglioni and P. Sorgato 1982. Modality-specific and supramodal mechan-
isms of apraxia. Brain 105: 301–312.
Efron, David 1972. Gesture, Race and Culture. Paris: Mouton. First published [1941].
Ferstl, Evelyne C., Jane Neumann, Carsten Bogler and D. Yves von Cramon 2008. The extended
language network: a meta-analysis of neuroimaging studies on text comprehension. Human
Brain Mapping 29(5): 581–593.
Foundas, Anne L., Beth L. Macauley, Anastasia M. Raymer, Lynn M. Maher, Kenneth M. Heil-
man and Lesley J. G. Rothi 1995. Gesture laterality in aphasic and apraxic stroke patients.
Brain and Cognition 29: 204–213.
Freedman, Norbert 1972. The analysis of movement behavior during clinical interview. In: Aron
W. Siegman and Benjamin Pope (eds.), Studies in Dyadic Communication, 153–175. New York:
Pergamon Press.
Freedman, Norbert and Wilma Bucci 1981. On kinetic filtering in associative monologue. Semio-
tica 34(3/4): 225–249.
Freedman, Norbert and Stanley P. Hoffmann 1967. Kinetic behaviour in altered clinical states:
Approach to objective analysis of motor behaviour during clinical interviews. Perceptual and
Motor Skills 24: 527–539.
Freedman, Norbert, James O’Hanlon, Philip Oltman and Herman A. Witkin 1972. The imprint of
psychological differentiation on kinetic behaviour in varying communicative contexts. Journal
of Abnormal Psychology 79(3): 239–258.
Gazzaniga, Michael S., Joseph E. Bogen and Roger W. Sperry 1967. Dyspraxia following division
of the cerebral commissures. Archives of Neurology 16: 606–612.
Geschwind, Daniel H., Marco Iacoboni, Michael S. Mega, Dahlia W. Zaidel, Timothy Cloughesy
and Eran Zaidel 1995. Alien hand syndrome: Interhemispheric motor disconnection due to a
lesion in the midbody of the corpus callosum. Neurology 45: 802–808.
Graff-Radford, Neill R., Kathleen Welsh and John Godersky 1987. Callosal apraxia. Neurology
37: 100–105.
Grunwald, Michael and T. Weiss 2007. Emotional stress and facial self-touch gestures. Unpublished
paper presented at the Lindauer Psychotherapietage, April, 15–27, 2007, Lindau, Germany.
Hampson, Elizabeth and Doreen Kimura 1984. Hand movement asymmetries during verbal and
nonverbal tasks. Canadian Journal of Psychology 38: 102–125.
Hartmann, Karoline, Georg Goldenberg, Maike Daumüller and Joachim Hermsdörfer 2005. It
takes the whole brain to make a cup of coffee: The neuropsychology of naturalistic actions in-
volving technical devices. Neuropsychologia 43: 625–637.
Healey, Jane M., Jaqueline Liedermann and Norman Geschwind 1986. Handedness is not a uni-
dimensional trait. Cortex 22(1): 33–53.
Hermsdörfer, Joachim, Guido Terlinden, Mark Mühlau, Georg Goldenberg and Afra M.
Wohlschläger 2007. Neural representations of pantomimed and actual tool use: Evidence from
an event–related fMRI study. NeuroImage 36: 109–118.
Hubbard, Amy L., Stephen Wilson, Daniel Callan and Mirella Dapretto 2009. Giving speech a
hand: Gesture modulates activity in auditory cortex during speech perception. Human Brain
Mapping 30(3): 1028–1037.
Johnson, Harold G., Paul Ekman and Wallace V. Friesen 1975. Communicative body movements:
American emblems. Semiotica 15: 335–353.
Johnson-Frey, Scott H., Roger Newman-Norlund and Scott T. Grafton 2005. A distributed left hemi-
sphere network active during planning of everyday tool use skills. Cerebral Cortex 15(6): 681–695.
Killgore, William D. S. and Deborah A. Yurgelun-Todd 2007. The right-hemisphere and valence
hypotheses: Could they both be right (and sometimes left)? Social Cognitive and Affective
Neuroscience 2: 240–250.
Kimura, Doreen 1973a. Manual activity during speaking – I. Right–handers. Neuropsychologia 11:
45–50.
Kimura, Doreen 1973b. Manual activity during speaking – II. Left-handers. Neuropsychologia 11:
51–55.
Kita, Sotaro, Olivier de Condappa and Christine Mohr 2007. Metaphor explanation attenuates the
right-hand preference for depictive co-speech gestures that imitate actions. Brain and Lan-
guage 101: 185–197.
Lausberg, Hedda 1995. Bewegungsverhalten als Prozeßparameter in einer kontrollierten Studie
mit funktioneller Entspannung. Unpublished paper presented at the 42. Arbeitstagung des
Deutschen Kollegiums für Psychosomatische Medizin, March 2–4, 1995, Friedrich-Schiller-
Universität Jena, Germany.
Lausberg, Hedda, Robyn F. Cruz, Sotaro Kita, Eran Zaidel and Alain Ptito 2003. Pantomime to
visual presentation of objects: Left hand dyspraxia in patients with complete callosotomy.
Brain 126: 343–360.
Lausberg, Hedda, Martha Davis and Angela Rothenhäusler 2000. Hemispheric specialization in
spontaneous gesticulation in a patient with callosal disconnection. Neuropsychologia 38:
1654–1663.
Lausberg, Hedda, Reinhard Göttert, Udo Münßinger, Friedrich Boegner and P. Marx 1999. Cal-
losal disconnection syndrome in a left-handed patient due to infarction of the total length of
the corpus callosum. Neuropsychologia 37: 253–265.
Lausberg, Hedda and Sotaro Kita 2003. The content of the message influences the hand choice in
co-speech gestures and in gesturing without speaking. Brain and Language 86: 57–69.
Lausberg, Hedda, Sotaro Kita, Eran Zaidel and Alain Ptito 2003. Split-brain patients neglect left
personal space during right-handed gestures. Neuropsychologia 41: 1317–1329.
Lausberg, Hedda, Uta Sassenberg and Henning Holle submitted. Right-handers display distinct
hand preferences for different gesture types in communicative situations: Evidence for specific
right and left hemisphere contributions to implicit gesture production.
Lausberg, Hedda, Eran Zaidel, Robyn F. Cruz and Alain Ptito 2007. Speech-independent produc-
tion of communicative gestures: Evidence from patients with complete callosal disconnection.
Neuropsychologia 45: 3092–3104.
Lausberg, Hedda and Monika Kryger 2011. Gestisches Verhalten als Indikator therapeutischer
Prozesse in der verbalen Psychotherapie: Zur Funktion der Selbstberührungen und zur Reprä-
sentation von Objektbeziehungen in gestischen Darstellungen. Psychotherapie-Wissenschaft 1:
41–55.
Lavergne, Joanne and Doreen Kimura 1987. Hand movement asymmetry during speech: No effect
of speaking topic. Neuropsychologia 25: 689–693.
Liepmann, Hugo and O. Maas 1907. Fall von linksseitiger Agraphie und Apraxie bei rechtsseitiger
Lähmung. Journal für Psychologie und Neurologie 10 (4/5): 214–227.
Lomas, Jonathan and Doreen Kimura 1976. Intrahemispheric interaction between speaking and
sequential manual activity. Neuropsychologia 14: 23–33.
Marangolo, Paola, Ennio De Renzi, Enrico Di Pace, Paola Ciurli and Alessandro Castriota-Skan-
denberg 1998. Let not thy left hand know what thy right hand knoweth. The case of a patient
with an infarct involving the callosal pathways. Brain 121: 1459–1467.
Mashal, Nira and Maria Faust 2009. Conventionalisation of novel metaphors: A shift in hemi-
spheric asymmetry. Laterality 14(6): 573–589.
McNeill, David 1992. Hand and Mind. What Gestures Reveal about Thought. Chicago: University
of Chicago Press.
McNeill, David and Laura Pedelty 1995. Right brain and gesture. In: Karen Emmorey and Judy S.
Reilly (eds.), Language, Gesture, and Space, 63–85. Hillsdale, NJ: Lawrence Erlbaum.
Moll, Jorge D., Ricardo de Oliveira-Souza, Leigh J. Passman, Fernando Cimini Cunha, Fabiana
Souza-Lima and Pedro Angelo Andreiuolo 2000. Functional MRI correlates of real and ima-
gined tool-use pantomimes. Neurology 54: 1331–1336.
Moscovitch, Morris and Janet Olds 1982. Asymmetries in spontaneous facial expressions and their
possible relation to hemispheric specialization. Neuropsychologia 20: 71–81.
Müller, Cornelia 2001. Iconicity and gesture. In: Christian Cavé, Isabelle Guaitella and Serge Santi
(eds.), Oralité et Gestualité, 321–328. Aix-en-Provence, France: L’Harmattan.
Ohgami, Yuko, Kayako Matsuo, Nobuko Uchida and Toshiharu Nakai 2004. An fMRI study of
tool-use gestures: body part as object and pantomime. Cognitive Science and Neuropsychology
15(12): 1903–1906.
Rapcsak, Steven Z., Cynthia Ochipa, Pélagie M. Beeson, and Alan B. Rubens 1993. Praxis and the
right hemisphere. Brain and Cognition 23: 181–202.
Rumiati, Raffaella I., Peter H. Weiss, Tim Shallice, Giovanni Ottoboni, Johannes Noth, Karl Zilles
and Gereon Fink 2004. Neural basis of pantomiming the use of visually presented objects. Neu-
roImage 21: 1224–1231.
Sainsbury, Peter 1955. Gestural movement during psychiatric interview. Psychosomatic Medicine
17: 454–469.
Schirmer, Annet, Kai Alter, Sonja A. Kotz and Angela D. Friederici 2001. Lateralization of pros-
ody during language production: A lesion study. Brain and Language 76: 1–17.
Sousa-Poza, Joaquin F. and Robert Rohrberg 1977. Body movements in relation to type of infor-
mation (person- and non-person oriented) and cognitive style (field dependence). Human
Communication Research 4(1): 19–29.
Sousa-Poza, Joaquin F., Robert Rohrberg and André Mercure 1979. Effects of type of information
(abstract-concrete) and field dependence on asymmetry of hand movements during speech.
Perceptual and Motor Skills 48: 1323–1330.
Sperry, Roger Wolcott 1968. Hemisphere deconnection and unity in conscious awareness. Amer-
ican Psychologist 23: 723–733.
Stalnaker, Thomas A., Rodrigo A. Espana and Craig W. Berridge 2009. Coping behaviour causes
asymmetric changes in neuronal activation in the prefrontal cortex and amygdala. Synapse 63:
82–85.
Stephens, Debra 1983. Hemispheric language dominance and gesture hand preference. Ph.D. dis-
sertation, Department of Behavioral Sciences, University of Chicago.
Tanaka, Yasufumi, A. Yoshida, Nobuya Kawahata, Ryota Hashimoto and Taminori Obayashi
1996. Diagnostic dyspraxia – Clinical characteristics, responsible lesion and possible underlying
mechanism. Brain 119: 859–873.
Trope, Idit, Baruch Fishman, Ruben C. Gur, Neil M. Sussman and Raquel E. Gur 1987. Contral-
ateral and ipsilateral control of fingers following callosotomy. Neuropsychologia 25: 287–291.
182 II. Perspectives from different disciplines

Ulrich, Gerald 1977. Videoanalytische Methoden zur Erfassung averbaler Verhaltensparameter
bei depressiven Syndromen. Pharmakopsychiatrie 10: 176–182.
Ulrich, Gerald and K. Harms 1979. Video-analytic study of manual kinesics and its lateralization in
the course of treatment of depressive syndromes. Acta Psychiatrica Scandinavica 59: 481–492.
Ulrich, Gerald and K. Harms 1985. A video analysis of the non-verbal behaviour of depressed pa-
tients before and after treatment. Journal of Affective Disorders 9: 63–67.
Vauclair, Jacques and Juliette Imbault 2009. Relationship between manual preferences for object
manipulation and pointing gestures in infants and toddlers. Developmental Science 12(6): 1060–1069.
Verfaellie, Mieke, Dawn Bowers and Kenneth M. Heilman 1988. Hemispheric asymmetries in
mediating intention, but not selective attention. Neuropsychologia 26: 521–531.
Volpe, Bruce T., John J. Sidtis, Jeffrey D. Holtzman, Donald H. Wilson and Michael S. Gazzaniga
1982. Cortical mechanisms involved in praxis: Observations following partial and complete sec-
tion of the corpus callosum in man. Neurology 32: 645–650.
Wang, Jinsung and Robert L. Sainburg 2007. The dominant and nondominant arms are specialized
for stabilizing different features of task performance. Experimental Brain Research 178: 565–570.
Watson, Robert T. and Kenneth M. Heilman 1983. Callosal apraxia. Brain 106: 391–403.
Wilkins, David and Jan P. de Ruiter 1999. Hand preference for representational gestures: A comparison of Arrernte and Dutch speakers. In: Veerle van Geenhoven and Natasha Warner (eds.), Annual
Report 1999, 51–52. Nijmegen, the Netherlands: Max Planck Institute for Psycholinguistics.
Zaidel, Eran, Jeffrey M. Clarke and Brandall Suyenobu 1990. Hemispheric independence: A par-
adigm case for cognitive neuroscience. In: Arnold B. Scheibel and Adam F. Wechsler (eds.),
Neurobiology of Higher Cognitive Function, 297–355. New York: Guilford Press.
Zaidel, Eran, Hedy White, Eanro Sakurai and William Banks 1988. Hemispheric locus of lexical
congruity effects: Neuropsychological reinterpretation of psycholinguistic results. In: Christine
Chiarello (ed.), Right Hemisphere Contributions to Lexical Semantics, 71–88. New York: Springer.

Hedda Lausberg, Cologne (Germany)

11. Cognitive Linguistics: Spoken language and gesture as expressions of conceptualization
1. Introduction
2. Metaphor
3. Metonymy
4. Schemas
5. Construal and perspective
6. Mental spaces; formal and conceptual integration/blending
7. Mental simulation
8. Conclusions
9. References

Abstract
Speakers of oral languages use not only speech but also other kinds of bodily movements
when they communicate. From the perspective of cognitive linguistics, all of these beha-
viors can provide insight into speakers’ ongoing conceptualizations of the physical world
and of abstract ideas. Increasing attention is being paid in Cognitive Linguistic (CL)
research to gesture (in particular, manual gesture) with speech in relation to such topics
as: conceptual metaphor; conceptual metonymy; (image) schemas; perspective-taking and
construal; mental spaces; formal and conceptual integration/blending theory; and mental
simulation and simulation semantics. The inclusion of gesture in cognitive linguistic
research raises not only methodological issues concerning what constitutes linguistic
data and what means can be used to analyze it, but also significant theoretical questions
about the nature of language itself, especially in relation to its dynamic character and the
degree to which grammar is multimodal.

1. Introduction
Cognitive Linguistics is a collective label for several lines of research that stem from the
1970s and 80s. Their origin was motivated by a combination of factors. On the one hand,
they grew out of opposition to existing paradigms that dominated linguistics and the
philosophy of language in academia, particularly in the United States. The strongest of
these in linguistics was, and still is, the formalist tradition, developed from the theories
of Noam Chomsky (e.g., 1965, 1995). At the same time, there was a burgeoning of research in cognitive psychology which exposed problematic aspects of tenets underly-
ing the formalist approach, such as assumptions about how people use categories, in-
cluding linguistic categories, in unconscious reasoning (see Lakoff 1987; Taylor [1989]
1995). This research also brought about work in linguistics in areas which had largely
been ignored in the past. Examples include studying metaphor as a means by which
we think and talk about one conceptual domain in terms of another (Lakoff and John-
son 1980), recognizing frames as a fundamental means by which we organize our
knowledge and express it in linguistic constructions (Fillmore 1982), theorizing about
grammar in terms of principles of cognitive processing (Langacker 1982, 1987, 1991b),
and understanding reference and discourse structure as the setting up of mental models
(Fauconnier [1985] 1994). In contrast to the emphasis on syntax in the Chomskyan
formalist tradition of linguistics, what these approaches have in common is that they
highlight semantics as a starting point for explaining linguistic structure, with meaning
understood as some form of conceptualization.
Some of the work in cognitive linguistics has focused on elements that have tradi-
tionally been studied in descriptive linguistics, falling within the categories of phonol-
ogy, morphology, syntax, etc. However, several of the principles behind this approach
have provided a basis for broadening the scope of investigation. One is the focus on
conceptualization, and since cognitive linguistics takes an encyclopedic view of meaning
(rather than assuming that word meanings are best characterized in terms of discrete
lexical entries), that very meaning may be expressed in various forms of behavior
in the context of using language. This leads us to a second important principle, which is
that cognitive linguistics is a usage-based approach to the study of language (Barlow
and Kemmer 2000; Langacker 1988). Grammar is assumed to be abstracted from recur-
ring pairings between forms of expression and meanings in usage events of language
(Langacker 2008: 457–458). The conclusion above, that the meanings conventionally
expressed in words can also appear in other forms of behavior, is also relevant here
in that there may be forms of expression beyond the words themselves which recur
in usage events to such an extent that they also gain the status of symbolic units.
Based on both of these lines of argumentation, gesture occurring with and around
talk is, and should be, included as an important source of data within cognitive linguistics
(Sweetser 2007).
This chapter will thus focus on what the field of cognitive linguistics has contributed
and could contribute to the study of bodily behavior in face-to-face communicative set-
tings and, in turn, what looking at gesture from a linguistic point of view means for the
study of language. The focus will be on spontaneous gesture used with and during talk,
concentrating primarily on manual gesture as the type that has received the most atten-
tion by cognitive linguistics researchers. The category of gesture covered here is meant
to include the range from “spontaneous movements of the hands and arms accompany-
ing speech” (McNeill 1992: 37), which Kendon (1980 and elsewhere) terms gesticula-
tion, to more symbolic, emblematic gestures (Efron [1941] 1972) for which there is a
more fixed correspondence between gestural form and conventionally accepted mean-
ing. The gestures may or may not have been used intentionally to communicate and
may or may not have been seen or perceived by the addressee as a signal. We will
see that considering the relevance of cognitive linguistics theories for the analysis of
gesture with speech not only provides new ways of interpreting the functions and
roles of gestures but also raises larger questions about what should fall within the
scope of linguistic inquiry.
A note is in order on the phrase “expressions of conceptualization” in the chapter’s
title. First, the term conceptualization rather than “concepts” is used in keeping with the
claim that this better highlights the dynamic nature of meaning in cognitive linguistics
(Langacker 2008: 30). Second, while cognitive linguistics takes meaning-expression as a
driving force behind why and how language is structured the way it is, there is not a uni-
form consensus within cognitive linguistics as to the scope of what should be included
under conceptualization with regard to language. Some take a very inclusive approach
toward this issue. In the theory of cognitive grammar, for example, “conceptualization
is broadly defined to encompass any facet of mental experience,” including sensory,
motor, and emotive experience, and apprehension of the context on various levels (lin-
guistic, physical, and social) (Langacker 2008: 30). Others qualify conceptualization in
relation to language in other ways. Slobin (1987, 1996), for example, focuses on "thinking for speaking" as the "special form of thought that is mobilized for communication"
(Slobin 1996: 76). Taking a step further, Evans (2006, 2009) emphasizes that semantic
structure is cognitively a different level of representation than conceptual structure in
general, with the former involving lexical concepts: “bundle[s] of varying sorts of
knowledge […] which are specialized for being encoded in language” (Evans 2009:
xi). In any case, what is agreed upon in cognitive linguistic approaches is that language
does not refer directly to the world but to conceptualization based upon the language
producer’s (such as the speaker’s) construal of some given situation.
The notion of words mapping to conceptualization may not be controversial. How-
ever, when we speak of gestures as mapping onto or even referring to conceptualiza-
tion, rather than referring to entities and situations in the physical world, this may at
first seem counterintuitive. Doesn’t an instance of pointing to an object refer to that
very object? From the perspective of cognitive linguistics, the gesture refers to the
speaker’s conceptualization of the object, but our conceptualizations of the physical
world are grounded on our perceptual input from the world (Barsalou 1999; Glenberg
and Robertson 2000). Note, though, that since gestures are produced in physical space,
even gestural reference to abstract conceptualizations is grounded by default via the
physical forms, movements, and locations in which the gestures are produced. In this
regard, even abstract reference is grounded with gestures.
A second introductory note is in order concerning two ways of gesturing, since they
will be mentioned repeatedly in the explication below, namely: pointing and represent-
ing. By pointing, I will be referring to the movement of a body part “which appears to
be aimed in a clearly defined direction as if toward some specific target” (Kendon 2004:
200). Since the focus here is mainly on manual gestures as used with European lan-
guages, the relevant pointing is usually performed with an extended finger, although
open-hand pointing is also possible (Kendon and Versante 2003; Kita 2003a). (See
the chapter on pointing in volume 2 for further discussion.)
Another prominent way of gesturing involves representing. Gestural representation
can be accomplished in different manners:

(i) by re-enactment of an action which one would normally do with the hand(s), such
as gripping the hand and moving it horizontally as if holding a pen and writing;
(ii) by using the hand(s) to stand for an entity by substituting for it and thus embodying
it, as when holding one’s hand next to one’s ear with the three middle fingers curled,
the thumb extended up, and the pinkie extended down, in order to represent a
telephone handset;
(iii) by positioning the hands or fingers as if holding, or held adjacent to, a two- or three-dimensional form (adjacent representation), such that a viewer could infer the shape
from the hand’s/hands’ contour (Leyton 1992: 121), e.g., as if holding a bowl
upright in the air;
(iv) or by moving in such a way that the hand(s) trace(s) a form, either two-dimension-
ally with a finger tip, or three-dimensionally with more of the hand shape, espe-
cially including the palm, essentially drawing in the air; in this way a viewer
could recover the shape from the motion of the depiction (Leyton 1992: 145).

(This analysis is an adaptation of Müller's 1998a and 1998b characterization of four gestural modes of representation: acting, representing, molding, and drawing, discussed further in Müller volume 2.) One manner can sometimes lead into the other, for example if a tracing gesture finishes with the hand(s) held still in the air, in the static form of
adjacent representation mentioned above.
These ways of gesturing can be used in the service of different functions. Below we
will focus on two main types: reference and relation to the discourse itself. Reference
can be made by pointing (in a prototypical act of deictic indication) or by gestural rep-
resentation, in the ways described above (Cienki in press). Likewise discourse-related
gestures may involve pointing, as in pointing in space that is related to the structure
of the speakers’ ongoing discourse, what McNeill, Cassell, and Levy (1993) refer to
as abstract deixis, or elements of discourse themselves may be represented – for exam-
ple, as-if held in one’s hand(s) (see McNeill’s 1992 discussion of this type as exemplify-
ing the metaphoric CONDUIT gesture for communication, presented in Reddy [1979]
1993).
As discussed below, the simple examples mentioned so far already involve many
basic theoretical notions that have been articulated in cognitive linguistics. However,
the bulk of cognitive linguistics research is based on analysis and theorizing about lan-
guage only on the verbal level. We will see here that many of the important ideas from
cognitive linguistics also relate integrally to the study of gesture. Indeed the relevance
of many of these directions in cognitive linguistics to co-verbal behavior raises ques-
tions about the degree to which, when, and in what contexts spoken language should
be analyzed as multimodal.
The following sections should be seen as a selection of some main areas from cogni-
tive linguistics which have found resonance in gesture studies. They treat the study of
metaphor, metonymy, schemas, construal and perspective, mental spaces, formal and
conceptual integration (otherwise known as types of blending), and mental simulation.

2. Metaphor
The study of metaphor was one of the areas of research which became a starting point
for cognitive linguistics as a field. The approach to metaphor which is specifically meant
here is one which treats it not just as a linguistic device, but as a means by which people
conceptualize (consciously or unconsciously) one domain in terms of another (Lakoff
1993; Lakoff and Johnson 1980, 1999; see Grady 2007 for an overview). Verbal meta-
phors are seen as linguistic expressions of underlying cross-domain mappings of con-
cepts. The conceptual mappings claimed to underlie sets of related verbal examples
are usually stated in such research in terms of a target domain (that captures what is
being talked about metaphorically) and a source domain (which generalizes over the
concept in terms of which the target is being understood). This is noted in a verbal state-
ment of the form TARGET IS SOURCE, e.g., SIMILARITY IS PROXIMITY (Grady 1997), where
the IS stands for the mapping. It follows logically from this approach, which has become
known as conceptual (or cognitive) metaphor theory (CMT), that conceptual metaphors
should not just be expressed in spoken or written words (that is: linguistic behavior) but
also in other forms of human behavior. This idea received early recognition in gesture
research in McNeill and Levy (1982), leading them to propose metaphoric gestures as
one of the four categories in a classification of manual gesture types; this reached a
wider audience with the publication of McNeill (1992).
As first discussed in Cienki (1998a), there are various ways in which metaphor in
spoken language and gesture may relate to each other. Perhaps the least surprising ex-
amples from the point of view of conceptual metaphor theory are those cases in which
the same imagery appears in words and gestures at the same time, as when one talks
about trying to take different factors into account equally as balancing all these things,
and simultaneously produces a two-handed gesture, with the flat hands, loosely open,
palm-up, one moving up while the other moves down and then vice versa. This gesture
has been interpreted (Cienki 1998a: 193–194) as reflecting a metaphorical mapping
from the source domain of weighing objects comparatively (on one’s hands as if on
the pans of a traditional pair of scales) to the target domain of considering their relative
importance. Such expression of the same metaphoric source domain in words and the
gestures need not occur simultaneously, but may happen consecutively or partially over-
lapping in time. Indeed, given the fact that gestures often slightly precede the verbal
utterances they accompany (Kendon 1980; McNeill 1992), it is not surprising that the
imagery of the source domain concept may be produced slightly earlier in the gesture
than it is expressed in words. This phenomenon can be seen as providing support for
the argument that at least some of the time, metaphors expressed verbally do involve
the conceptualization of both a target and a source domain. Some may argue that
the gesture may just be illustrating the metaphor that was expressed verbally, i.e., that
the gesture was reflecting the verbal metaphor (Bouissac 2008: 279). Perhaps more con-
vincing evidence, however, comes from the co-production of words and gestures which
highlight different metaphoric construals of a given target domain. For example, in
Cienki (2008: 14–15), a student talks about the moral distinction between wrong versus
right as black or white, but she gestures a different aspect of a scenario with an absolute
division, namely by putting the edge of her right hand, tense and flat in the vertical
plane, against the palm of her left hand, palm up and flat in the horizontal plane.
The gesture can be interpreted as reflecting a clear spatial division between two spaces
(on either side of the vertical palm); it renders the opposition of black and white in a
way which allows for iconic representation in gesture, but this ultimately involves the
use of different types of source domains (one based on the dark/light contrast, one
based on a spatial distinction), and so different metaphoric mappings.
Metaphors may appear in the words alone or in the gestures alone (Cienki 1998a).
Metaphor in spoken words but not in co-speech gesture may be found with verbal ref-
erence to a metaphoric source domain (e.g., a color, as in to be feeling blue meaning
“feeling slightly sad”) which cannot be represented iconically in the spatial medium of
gesture. In addition, we often use verbal metaphoric expressions in speech without
any coordinate gesture depicting the source domain (metaphorically used words need
not be accompanied by metaphorically used gestures). Conversely, metaphor may
appear in the gestures alone, such as when a speaker of English gestures from left to
right while describing a process that occurred. The metaphor of an event transpiring
over time from left to right draws on a spatial metaphor that is common, at least in
European cultures, with the preceding state being oriented to the left and the subse-
quent state to the right (Calbris 1985). This is consistent with the directionality of the
writing systems in these languages and also the illustration of a time line in mathemat-
ics. Yet the metaphor is not used verbally in most, if in any, European languages: for
example people do not say in English that they did something to the left to mean
they did it before something else. Indeed, as Bouissac (2008), Gibbs (2008), and Ritchie
(2009) note, a topic that awaits further investigation concerns which conceptual meta-
phors are only expressed via gestures and do not have verbal equivalents. However, this
is a research area in which the analytic metalanguage of conceptual metaphor theory
may impede progress, in that not all metaphoric mappings can be analyzed in terms
of words or phrases which name the target and source domains (in the form X IS Y).
Finally, we also know that a metaphor may exist in the language but simply not be
used at the moment, such that the target domain is expressed in words (honest person)
while the speaker produces a gesture which reflects a possible source domain for that
concept in that culture (such as in this case: a rigid, flat hand gesture with the palm
in the vertical plane, reflecting honesty as straightness, Cienki 1998b) (Cienki
1998a: 199–200). These gestures can even be as subtle as small rhythmic beat gestures,
for example in a downward direction when talking about wanting to buy something
inexpensively (a mapping of the sort LESS COST IS DOWN), even without using spatial
language from the source domain (like low price) (Casasanto 2008).
Just as conceptual metaphor theory has been one of the most popular and widely-
applied approaches to the analysis of linguistic data within cognitive linguistics, so
has it been the sub-field of cognitive linguistics which has garnered the most interest
within the field of gesture studies. Other early work in this vein applied conceptual
metaphor theory to examine possible bases for components of metaphoric gestures
(Webb 1997) and the gestural expression of metaphors for speech and thought (Sweet-
ser 1998). More recent research has drawn on conceptual metaphor theory to investi-
gate such topics as the connection of metaphor in gesture to proposed cultural
models (Cienki 2005), metaphoric representations of grammatical theories in gesture
(Mittelberg 2006), different construals of time as expressed in speech and gesture
(Cooperrider and Núñez 2009; Núñez and Sweetser 2006), gestural evidence of the psy-
chological reality of mathematical principles (Núñez 2008), and the dynamic nature of
metaphor in use and in our focus of attention (Müller 2008a, 2008b). Lakoff (2008) pro-
vides proposals as to how findings from neuroscience may explain why people gesture
metaphorically. For additional overviews on metaphor and gesture, see Cienki and
Müller (2008a, 2008b), Müller and Cienki (2009), as well as the chapter on gestures
and metaphors in volume 2 of this Handbook.

3. Metonymy
Within cognitive linguistics, the study of metonymy has long played a secondary role in
relation to research on metaphor. Yet its potential for the field of cognitive linguistics
was already put forward in Lakoff and Johnson (1980) with claims such as: “Metonymic
concepts allow us to conceptualize one thing by means of its relation to something else”
(Lakoff and Johnson 1980: 39). Subsequent research on metonymy in cognitive linguis-
tics has, for the most part, maintained this broad approach to what constitutes meto-
nymy, subsuming synecdoche under it, as proposed previously in Jakobson ([1956]
1990). The growing role of conceptual metonymy as a topic of study in cognitive linguis-
tics can be seen in collections such as Barcelona (2000), Dirven and Pörings (2002), and
Panther and Radden (1999); see Panther and Thornburg (2007) for an overview. The
connection of metonymy to gesture research from an explicitly cognitive linguistics per-
spective is as yet just beginning (Cienki 2007; Cienki and Müller 2006; Mittelberg 2006,
2008; Mittelberg and Waugh 2009), and even work on metonymy and gesture from
other theoretical approaches is limited (e.g., Bouvet 2001). (See the chapter on gestures
and metonymy in volume 2 of the Handbook for more details.)
From a contemporary cognitive linguistic perspective, “Metonymy is a cognitive pro-
cess in which one conceptual entity, the vehicle, provides mental access to another con-
ceptual entity, the target, within the same cognitive model” (Radden and Kövecses
1999: 21, building on Croft 1993, Lakoff 1987, and Langacker 1993). This definition is
aptly neutral with respect to the types of expression of, or stimulus for, the relevant
conceptual entities, be they verbal, gestural, or other types of expressions.
However, it is also clear from the existing research that metonymy as expressed in
words and metonymy in gesture actually function quite differently. The verbal expres-
sion of the metonymic vehicle is a sign or combination of signs which is part of a con-
ventionalized system of form-meaning pairings in the given language. The reduced
nature of the iconicity through which most spoken and written linguistic signs refer
has been pointed out at least since de Saussure ([1916] 1959), even if that reduction
has been qualified and put into a more balanced perspective more recently in cognitive
linguistics (see van Langendonck 2007). By contrast, iconicity is fundamental to refer-
ential gestures (Müller 1998a). Even though it is usually quite schematic in nature, it
still plays a more essential role in the visuo-spatial medium of gesture (as it does in
sign languages, Taub 2001; Wilcox 2007) than it does in spoken or written words (via
their sound or graphic symbolism).
Metonymy is a building block of both pointing and representational gestures (Cienki
and Müller 2006). Thus demonstrative pointing does not indicate the entire referent itself,
but rather a locatable index to the referent (Clark 2003: 254). To make a connection
with the cognitive linguistic terminology of Langacker (1993), the deictic gesture indi-
cates a perceptually conspicuous site as the reference point for identifying the target.
Note that this applies to reference to concrete entities as well as to the abstract deixis
mentioned in the introduction to this chapter. Speakers (at least in Europe and North
America) sometimes point at apparently empty space when referring either to a new
idea being presented or to one previously referred to by the given interlocutor. The bor-
ders of the referent space are fuzzy, so the metonymy here can be compared to pointing
at a cloud; but again, what is pointed to is a locatable index of the idea-as-space, which
in some cases can be a physical entity or space that is metonymically associated with
the idea (e.g., if you point to a space where someone recently stood when mentioning
something that s/he had uttered). Note that this metonymic pointing to the abstract in-
herently involves metaphor as well, as the idea is reified as a space (perhaps imagined as
an invisible object).
Representational gestures also inherently involve metonymy. All of the four man-
ners of representing discussed in the introduction (re-enactment, substitution, adjacent
representation, and tracing) represent only part or parts of some action or entity. This
applies to whether they are used for concrete reference or abstract (metaphoric) refer-
ence (Cienki 2007; Cienki and Müller 2006). For example, the gestural re-enactment of
writing by hand in the air actually involves no writing instrument or paper: but via me-
tonymy the action shows parts of a whole “writing scene.” A speaker’s open hands as if
tracing a three-dimensional form in the air, be it a bowl or a story as a complete whole
(e.g., in German eine runde Geschichte, a complete, literally “round”, story, Müller
1998b), actually only show parts of that form and the rest is inferred. This leads Mittel-
berg and Waugh (2009) to argue that metonymic mapping is, both semiotically and
cognitively, a prerequisite for metaphoric mapping in metaphoric gestures.
Therefore, metonymy in words and metonymy in gestures differ in how they accom-
plish reference in qualitative terms. Given the ubiquity of metonymy in referential ges-
tures, but not in all referential words, there is also a quantitative difference in the
expression of metonymy in words and gestures. One conclusion we can draw from
the above is that gesture provides a potentially rich source of data for research on
how humans employ conceptual metonymy both in their expressions and cognitively
during face-to-face interaction.

4. Schemas
The importance of schematicity in relation to how language represents meaning has
been recognized as fundamental in cognitive linguistics since the origins of the field.
See, for example, Talmy’s (1983) work on the schematic nature of the meaning of
closed-class words; Johnson’s (1987) and Lakoff ’s (1987) articulation of image schemas
as basic patterns in our physical experience (such as PATH, CONTAINER, BALANCE) which
provide the basis for understanding more abstract domains; and Langacker’s (1987)
employment of schematicity at various levels within the theory of cognitive grammar
190 II. Perspectives from different disciplines

(discussed in Tuggy 2007). Schemas are an important tool we use to conceptualize,
retain information about, and recognize physical aspects of the world that we perceive,
allowing us to use this information for abstract thought, so it is no wonder that they fig-
ure prominently in cognitive linguistic theories and analyses.
The connection of cognitive linguistic work on schemas to gesture studies has
focused on a few distinct topics. The one prompting the most interest has been the
relation of gestures to image schemas. Some of this work considers reception –
how image schemas are patterns which observers can easily access to interpret speak-
ers’ gestures (Cienki 2005) – and other research makes claims concerning production –
how image schemas may provide structural motivation for the forms found in many
spontaneous gestures. Research in the latter category includes studies on the role of
the CYCLE image schema as structuring a family of gestures among speakers of
German (but also of other languages) involving the lax hand making a rotating outward
motion at the wrist (Ladewig 2011; Teßendorf and Ladewig 2008); and the PATH schema
as underlying a number of types of dynamic tracing gestures (Williams 2008b). However,
image schemas are of course not the only schematic types of images found in gestures.
Gestures also involve schematizations of other types of imagery, for example geometric
patterns, especially if they are motivated by iconic representation of referents that
are physical objects, or of metaphoric source domains of abstract referents, such as
grammatical concepts typically explained in diagrammatic fashion (Mittelberg 2010).
Some groups of gesture types have been interpreted as schematizations of actions
(Müller and Haferland 1997; Streeck 1994, 2009), and although this research has not
necessarily relied on cognitive linguistics as a theoretical starting point, the findings
are consonant with claims made in cognitive linguistics. The examples below are
from speakers of German, English, French, and Spanish, but again, these gestures also
appear in (at least) other European languages. Such research has considered, for exam-
ple, gestural forms like the palm-up open-hand and its likely origins in the act of present-
ing something to the addressee to look at or to take (Kendon 2004: Chapter 13; Müller
2004; Streeck 1994). Another gesture involves the flat hand with the palm down, moved
horizontally in a tense fashion outward across one’s torso, or with the palm vertical and
the hand moving downward quickly in front or to the side of one’s torso; the hand, as it
were, embodies a knife making a slicing motion (Calbris 2003), even though it is only air
that is being “sliced.” There is also the brushing away gesture (Teßendorf and Ladewig
2008), in which the slightly curled fingers of one hand quickly flick outward, often done
two or three times; the same gestural form used to brush something small and unwanted (such
as crumbs or lint) off of one’s clothes is also sometimes performed in the air, not against any
surface.
As the authors above argue, the schematization of these actions in gestures provides
a kind of skeletal structure which can be applied in contexts with more abstract mean-
ings or functions. The authors above show, for example, that the palm-up open-hand can
offer arguments, the slicing gesture can accompany talk about stopping a process, and
the brushing away gesture can be used to dismiss an idea. The meaning in context can
be abstracted away from the original instrumental function of such gestures so as to be pri-
marily discourse-functional, or pragmatic in nature (Kendon 2004; Müller 2004; Streeck
and Hartge 1992), sometimes serving performative functions as found in speech acts
(Müller 1998b; Müller and Haferland 1997). This expansion of functions, presumably
a diachronic process with the use of certain gestures in a given culture, is comparable
11. Cognitive Linguistics 191

to what has been described in lexical semantics as a process of subjectification (Lan-
gacker 1990, 1991a: Chapter 12; Traugott 1986). See Traugott’s (1986: 540) lexical exam-
ple of the development of Old English hwile meaning ‘at that time’ to the Middle
English meaning of ‘during’ to Modern English while, which can be used in place of
‘although’; the adverb for time takes on the function of expressing temporal and also
textual cohesion, leading to an expression of the speaker’s view of the relation between
two situations. Use of the expression in question extends from what is concrete and
more objectively tied to a physical context of implementation (e.g., the reference to a
point in time with Old English hwile) to what is more abstract and contingent upon
the discourse context (e.g., the use of Modern English while to set up a contrast in
one’s argument, as in “While I agree with your point, I disagree with…”). Traugott
(1988) highlights the pragmatic strengthening that goes along with the increased
salience of the speaker’s (subjective) point of view. From the point of view of cognitive
linguistics, the fact that this same process can be found in the realm of gesture as in the
verbal lexicon highlights that subjectification is a general cognitive process and not one
restricted to just word etymology.
A proposal related in some ways to the one above concerning the origins of some
gestures in practical actions is the theory of mimetic schemas, which has been develop-
ing recently (Zlatev 2005, 2007, 2008; Zlatev, Persson, and Gärdenfors 2005a, 2005b)
building on a key concept of bodily mimesis from Donald (1991). Mimetic schemas
are more highly specified than image schemas in that they involve actions which hu-
mans do, such as EAT, SIT, TAKE OUT, CRAWL. They are also patterns around which we
structure knowledge, but the patterns here involve specific motoric actions at a general-
ized level of schematicity. Zlatev (2008: 220) notes how mimetic schemas can motivate
certain gestures that are consciously used for communicative purposes, particularly the
representational gestures involving re-enacting and embodying, discussed earlier, and
pointing as used for declarative purposes.
Overall, the research above reveals that ideas that we talk about can be reified in a
way which we see reflected in the schematic imagery of many gestures. This is an aspect
of conceptualization which we can learn much about from studying gesture, more so
than from analyzing verbal language alone. This is something important which the
study of gesture can bring to cognitive linguistics.

5. Construal and perspective


In cognitive linguistics, an important tenet is that linguistic meaning is equated with
conceptualization; since conceptualization is done by specific individuals, it is always
carried out according to a certain perspective. This is
in line with the usage-based approach of cognitive linguistics. Perspective is also at
the basis of the important notion of construal in cognitive linguistics – how we conceive
and linguistically portray a situation (Langacker 2008: Chapter 3). It is assumed in cog-
nitive linguistics that any linguistic expression we use inherently casts what is being
talked about in line with a particular way of understanding it. One could think of
this as framing (Fillmore 1982) from a particular perspective (for example, are those
who want to spend little money for what they buy thrifty or cheap?). The metaphor
inherent in the word perspective, that is, “seeing from a particular point of view” mapped
onto “understanding in a certain way,” becomes manifested in gesture: representational
and pointing gestures show something (concrete, or abstract via metaphor) from a par-
ticular physical perspective. This perspective can vary with gesture according to differ-
ent factors, an important one being whether the gesturer uses what is called character
viewpoint or observer viewpoint (Cassell and McNeill 1990; McNeill 1992). For exam-
ple, McNeill (1992: Chapter 7) describes how speakers gestured when telling the story
from a cartoon they viewed in which a cat tried to swing on a rope from one building to
another. Some speakers narrated the event while clasping both hands together and
moving them from near one shoulder horizontally to their other side, as if holding a
rope and swinging on it themselves (character viewpoint). Other speakers would hold
up one hand in a loose fist and move it from one side to the other, depicting the cat
swinging (observer viewpoint on the action). The difference is apparent in their word-
ing, but only subtly. One from the character-viewpoint group said “and he tries to swing
from…”, actually highlighting his eventual lack of success, while one from the observer-
viewpoint group said, “and you see him swinging…” – putting the narrator “inside the
fictive world” (McNeill 1992: 193) of the story being told. For one who hears the speak-
ers and sees their gestures, the differing perspectives are much more readily apparent than
for one who just reads the transcripts of the words they uttered. (See the discussion of
mental simulation below.)
Viewpoint is not always so directly related to the physical perspective on a scene,
however. Another construal phenomenon discussed in cognitive linguistics is the
description of static scenes in terms of dynamic processes, what Talmy (1996) has called
“fictive motion.” The argument is that in sentences such as “The path runs from the
house down the hill to the river,” the use of a verb of motion to characterize the static
position of an extended object reflects a dynamic mental scanning of it from a particular
point of view. By way of contrast, one could have described it starting from the river
and going up to the house. Evidence supporting the psychological reality of these claims
of mental scanning has come from experimental studies (Matlock 2004a, 2004b) as well
as observational research, the latter of which includes studies on gesture. For example,
McNeill (1992) and Núñez (2008) discuss how mathematicians produce gestures that
metaphorically represent certain concepts in terms of motion events – a dynamicity
which is not inherent in the formal definitions and axioms for these concepts in math-
ematics. Gesture can thus provide evidence of conceptualization in terms of fictive
motion even when fictive motion might not be evident in the conventional verbal
and written expressions used in the domain in question.
Construal always takes place against the background of the speaker’s previous
knowledge and attitudes, and so his/her personal interpretation of the topic being
talked about can play a more or less prominent role. The field of clinical psychology
has taken bodily movements into consideration for some time in research on changes
in speaker attitude (e.g., Freedman 1971). These gestures often involve movement
of the torso or the whole body – for example, torso shifts reflecting the adoption of a
different point of view (literally and metaphorically a different “stance”) towards an
issue.
In sum, taking a multimodal perspective on meaning-expression with spoken lan-
guage in the face-to-face context, we see that gestures produced in various ways can
be used (intentionally or unwittingly) by speakers to signal their viewpoint or perspec-
tive on the subject of talk in the moment. This can include their understanding of the
developing discourse structure itself, as we will see in the following section.
6. Mental spaces; formal and conceptual integration/blending


In his 1984 book, translated in 1985 and republished in 1994, Fauconnier introduced a
construct which provided a cognitively-based alternative to the abstract symbols of for-
mal logic that had predominated as tools for analyzing the meaning of discourse struc-
tures. His notion of mental spaces concerns separate domains of referential structure
which are “constructed as we think and talk for purposes of local understanding”
(Fauconnier and Turner 2002: 102). The claim made with mental space theory is that
certain linguistic forms, both lexical and grammatical, reveal a speaker’s use of various
domains of referential structure in his/her production of discourse; in turn the linguistic
forms provide cues to the listener for how to construct “mental spaces” like those of the
speaker. Examples from English include hypothetical constructions (if…, then), expres-
sions of the setting (space, time: last year…), conjunctions (however, …), and mental
imperatives (suppose…). However, just as with conceptual metaphors, if the theoretical
claim concerns conceptualization during talk, we might expect to find evidence of
this phenomenon in forms of expression other than just words. In fact, in a small-
scale study of conversations in American English (Cienki 2009), it was found that
about half of the time the use of mental-space-building words was accompanied by ges-
tures of various types, including head movements (the most frequent type), hand ges-
tures, torso movements, and gaze shifts. The question remains, though, as to how
often speakers may signal new domains of referential structure in face-to-face commu-
nication only by using gestural cues.
The integration of conceptual elements from different mental spaces has been ar-
gued to be a fundamental process of cognition (Fauconnier and Turner 1998, 2002),
with more or less conscious attention being involved. Integration can occur at the
level of forms of expression and at the conceptual level (Turner and Fauconnier 1995).
Intentional as well as inadvertent blending on the linguistic level is well known, ranging
from the simple combination of morphemes (such as “chunnel” from “Channel + tun-
nel”) to idioms (Cutting and Bock 1997) to the sentence level (Cohen 1987). Research
on gesture also shows aspects of formal and conceptual integration with the accompany-
ing speech. In Parrill and Sweetser (2004), for example, a speaker verbally describes dif-
ferent computational processes while making hand gestures which iconically represent
different qualities of the processes metaphorically via different kinds of motion. Williams
(2008a) uses the setting of a classroom lesson for children about telling time with a clock
to illustrate how meaning-making, via the integration of concepts, is expressed through
the integrated use of talk, gesture, and object manipulation. Cornelissen, Clarke, and
Cienki (2012) discuss how an entrepreneur presents a list of items in a narrative (“pro-
ducts, opportunities, invoices, cash”) and expresses them as a cycle by making a circular
motion with his hand while uttering the words. The research on gesture supports the
argument that the setting up and subsequent integration of mental spaces and other
types of concepts has multimodal dimensions, at least some of the time. (See the chap-
ter on gesture and blending in volume 2 of this Handbook for further details.)

7. Mental simulation
Since the late 1990s, the term mental simulation has rapidly gained currency in cognitive
psychology (e.g., Barsalou 1999; Glenberg and Kaschak 2002; Pecher, Zeelenberg, and
Barsalou 2004) and quickly spread to cognitive linguistics. There are slightly different uses
of the term in the literature, but here we will adhere to the characterization in Barsalou
(2003) that “conceptual processing uses reenactments of sensory-motor states – simula-
tions – to represent categories” (Barsalou 2003: 521). Barsalou elaborates that these simu-
lations are, technically speaking, partial reenactments, since although perception and
conception are similar, they are not identical. Building on this research, a view that is
increasingly gaining ground in cognitive linguistics is known as simulation semantics,
which claims that “[u]nderstanding a piece of language can entail performing mental per-
ceptual and motor simulations of its content” (Bergen 2005: 262; Bergen 2007; see also
adaptations of it in the Neural Theory of Language by Feldman and Narayanan 2004;
Lakoff 2008; and the theory of Lexical Concepts and Cognitive Models, Evans 2006, 2009).
Most of the initial work in this area has focused on written linguistic cues as prompts
for mental simulations (e.g., Glenberg and Kaschak 2002; Stanfield and Zwaan 2001).
However, more recent developments also take spoken language and gesture into
account. For example, Cook and Tanenhaus (2008) had some research participants
move a tower of disks from one peg to another, while others did the same task on a com-
puter, moving the images of the disks with a computer mouse. Both sets of participants
then described the task to new participants. Relevant differences were found in how the
people gestured while describing the task, depending on whether they had done it phys-
ically or virtually. The new participants then had to do the task – either using the same
apparatus that they heard described (physical or computer-based) or the other set-up.
The new participants were found to be influenced by the information that they had
gleaned from the gestures of the first set of participants as they tried to solve the task them-
selves, suggesting their mental simulation of what they had (consciously or unconsciously)
seen in the gestures. In addition, studies measuring event-related potentials (ERP) in the
brain provide supporting evidence that “iconic gestures activate image-specific informa-
tion about the concepts which they denote” (Wu and Coulson 2007: 244). It therefore ap-
pears that both words and gestures can serve as cues for addressees to simulate (at least
some aspects of) the content of talk when they can also see the speaker. (See also
Arbib this volume on the mirror neuron system in relation to gesture research.)
The premise that semantics should be approached in terms of the mental simulation
of the content of language offers a solution to some of the quandaries posed by remnants of
pre-cognitive linguistic theory which have maintained a foothold in cognitive linguistics.
A case in point is the adherence to words as the medium or metalanguage for many kinds
of cognitive linguistic-based semantic analyses. Simulation semantics is thus a direction
which is more consistent with the tenets of cognitive linguistics. Yet giving this approach
its due will mean taking more seriously the generally accepted view that “the conversa-
tional use of language is primary” in terms of “providing a basic model that other uses of
language mimic and adapt as needed” (Langacker 2008: 459). In line with this, the audio-
visual nature of face-to-face talk will have to be acknowledged further in research on
mental simulation, which will mean incorporating the study of gesture to a greater
degree.

8. Conclusions
The above overview highlights some of the ways in which theory from cognitive linguis-
tics has found fertile ground for development by those taking gesture with speech into
account. We also see that analysis of gestural data can raise some challenges for cogni-
tive linguistic theory, as it conventionally employs verbally-based metalanguage in
semantic analyses: the analog nature of meaning expressed in imagery, particularly in
moving images as we have with gesture, is inadequately captured in written words,
which are static, digital symbols. (Note the diagrams used for analyses in cognitive
grammar as an exception, although these are also static in nature.)
In addition, the studies surveyed here support an argument that language is at least
sometimes multimodal (cf. Müller 2009 for the more absolute claim that spoken lan-
guage is inherently multimodal). This idea has roots independent of cognitive linguis-
tics: witness Kendon’s (1980) approach to gesticulation and speech as two aspects of
the process of utterance and McNeill’s (1992: 2) claim that gesture and language are
one system. For arguments about grammar itself as multimodal, see the project Towards
a Grammar of Gesture (www.togog.org), Fricke (2008), and Harrison (2009). Building on
these works, an approach to language as sometimes multimodal seems to best account
for the varying degrees of systematicity of gesture use in contexts of face-to-face
communication.
This suggests that rather than thinking of language as a “classical” category (i.e., one
with clear boundaries) which contains verbal communication (in written, spoken, and
signed forms of language), perhaps a different concept of the category of language is
needed, namely one with a prototype structure. Research in cognitive linguistics (e.g., Lakoff
1987; Taylor’s 1995 overview) has endorsed the idea that linguistic categories on a num-
ber of levels of description (such as those of the phoneme, word, and syntactic construc-
tion) exhibit prototype effects. Rather than being definable in the classical way, in terms
of necessary and sufficient conditions, these categories exhibit fuzzy boundaries – the
difficulty of delimiting what constitutes “a word” in many languages being a prime
example (is a sometimes separable clitic a word or an affix?). In a similar way, perhaps
linguists should approach language itself as a category, with conventional verbal symbols
as its prototypical manifestation, but also recognize that what are often considered
paralinguistic features – including expressive forms like intonation and gesture with
spoken language – can also sometimes have conventionalized symbolic status. In
some cases this may be redundant with the spoken words, but at other times such
forms of expression, gesture among them, are not co-expressive with words and function on
their own merits. Whether the communicative system so viewed is still best character-
ized with the label language, or rather as something like a variably multimodal semiotic
system, remains an issue to be explored.

Acknowledgements
Work on this chapter was supported by a 2009-10 Fellowship-in-Residence at NIAS, the
Netherlands Institute for Advanced Study in the Humanities and Social Sciences.

9. References
Arbib, Michael this volume. Mirror systems and the neurocognitive substrates of bodily commu-
nication and language. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig,
David McNeill and Sedinha Teßendorf (eds.), Body – Language – Communication: An Inter-
national Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics and
Communication Science 38.1.) Berlin: De Gruyter Mouton.
Barcelona, Antonio (ed.) 2000. Metaphor and Metonymy at the Crossroads: A Cognitive Perspec-
tive. Berlin: De Gruyter Mouton.
Barlow, Michael and Suzanne Kemmer (eds.) 2000. Usage-Based Models of Language. Stanford,
CA: Center for the Study of Language and Information.
Barsalou, Lawrence W. 1999. Perceptual symbol systems. Behavioral and Brain Sciences 22: 577–660.
Barsalou, Lawrence W. 2003. Situated simulation in the human conceptual system. Language and
Cognitive Processes 18: 513–562.
Bergen, Benjamin 2005. Mental simulation in literal and figurative language understanding. In:
Seana Coulson and Barbara Lewandowska-Tomaszczyk (eds.), The Literal and Nonliteral in
Language and Thought, 255–278. Frankfurt am Main: Peter Lang.
Bergen, Benjamin 2007. Experimental methods for simulation semantics. In: Monica Gonzalez-
Marquez, Irene Mittelberg, Seana Coulson and Michael Spivey (eds.), Methods in Cognitive
Linguistics, 277–301. Amsterdam: John Benjamins.
Bouissac, Paul 2008. The study of metaphor and gesture: A critique from the perspective of semi-
otics. In: Alan Cienki and Cornelia Müller (eds.), Metaphor and Gesture, 277–282. Amsterdam:
John Benjamins.
Bouvet, Danielle 2001. La Dimension Corporelle de la Parole: Les Marques Posturomimo-
Gestuelles de la Parole, leurs Aspects Métonymiques et Métaphoriques, et leur Rôle au Cours
d’un Récit. Paris: Peeters.
Calbris, Geneviève 1985. Espace-Temps: Expression gestuelle de temps. Semiotica 55: 43–73.
Calbris, Geneviève 2003. From cutting an object to a clear-cut analysis: Gesture as the represen-
tation of a preconceptual schema linking concrete actions to abstract notions. Gesture 3(1):
19–46.
Casasanto, Daniel 2008. Conceptual affiliates of metaphorical gestures. Paper presented at the
conference Language, Communication, and Cognition, Brighton, August 2008.
Cassell, Justine and David McNeill 1990. Gesture and ground. In: Proceedings of the Sixteenth
Annual Meeting of the Berkeley Linguistics Society, 57–68. Berkeley, CA: Berkeley Linguistics
Society.
Chomsky, Noam 1965. Aspects of the Theory of Syntax. Cambridge: Massachusetts Institute of
Technology Press.
Chomsky, Noam 1995. The Minimalist Program. Cambridge: Massachusetts Institute of Technol-
ogy Press.
Cienki, Alan 1998a. Metaphoric gestures and some of their relations to verbal metaphoric expres-
sions. In: Jean-Pierre Koenig (ed.), Discourse and Cognition: Bridging the Gap, 189–204. Stan-
ford, CA: Center for the Study of Language and Information.
Cienki, Alan 1998b. STRAIGHT: An image schema and its metaphorical extensions. Cognitive Lin-
guistics 9: 107–149.
Cienki, Alan 2005. Image schemas and gesture. In: Beate Hampe (ed.), From Perception to Mean-
ing: Image Schemas in Cognitive Linguistics, 421–442. Berlin: De Gruyter Mouton.
Cienki, Alan 2007. Reference points and metonymic sources in gesture. Talk presented at the
ninth International Cognitive Linguistics Conference, Kraków, Poland, July 2007.
Cienki, Alan 2008. Why study metaphor and gesture? In: Alan Cienki and Cornelia Müller (eds.),
Metaphor and Gesture, 5–25. Amsterdam: John Benjamins.
Cienki, Alan 2009. Mental space builders in speech and in co-speech gesture. In: Ewa Jarmoło-
wicz-Nowikow, Konrad Juszczyk, Zofia Malisz and Michał Szczyszek (eds.), GESPIN: Gesture
and Speech in Interaction [CD-ROM and http://issuu.com/cognitarian/docs/cienki/1]. Poznań.
Cienki, Alan in press. Gesture, space, grammar, and cognition. In: Peter Auer and Martin Hilpert
(eds.), Space in Language and Linguistics: Geographical, Interactional, and Cognitive Perspec-
tives. Berlin: Walter de Gruyter.
Cienki, Alan and Cornelia Müller 2006. How metonymic are metaphoric gestures? Talk presented
at the German Society for Cognitive Linguistics conference, Munich, October 2006.
Cienki, Alan and Cornelia Müller (eds.) 2008a. Metaphor and Gesture. Amsterdam: John Benjamins.
Cienki, Alan and Cornelia Müller 2008b. Metaphor, gesture, and thought. In: Raymond W. Gibbs
Jr. (ed.), The Cambridge Handbook of Metaphor and Thought, 483–501. Cambridge: Cam-
bridge University Press.
Clark, Herbert H. 2003. Pointing and placing. In: Sotaro Kita (ed.), Pointing: Where Language,
Culture, and Cognition Meet, 243–268. Mahwah, NJ: Lawrence Erlbaum.
Cohen, Gerald 1987. Syntactic Blends in English Parole. Frankfurt am Main: Peter Lang.
Cook, Susan Wagner and Michael K. Tanenhaus 2008. Speakers communicate their perceptual-
motor experience to listeners nonverbally. In: Brad C. Love, Ken McRae and Vladimir M.
Sloutsky (eds.), Proceedings of the 30th Annual Conference of the Cognitive Science Society,
957–962. Austin, TX: Cognitive Science Society.
Cooperrider, Kensy and Rafael Núñez 2009. Across time, across the body: Transversal temporal
gestures. Gesture 9(2): 181–206.
Cornelissen, Joep, Jean Clarke and Alan Cienki 2012. Sensegiving in entrepreneurial contexts:
The use of metaphors in speech and gesture to gain and sustain support for novel business ven-
tures. International Small Business Journal 30: 213–241.
Croft, William 1993. The role of domains in the interpretation of metaphors and metonymies.
Cognitive Linguistics 4: 335–370.
Cutting, J. Cooper and Kathryn Bock 1997. That’s the way the cookie bounces: Syntactic and seman-
tic components of experimentally elicited idiom blends. Memory and Cognition 25(1): 57–71.
Dirven, René and Ralf Pörings (eds.) 2002. Metaphor and Metonymy in Comparison and Contrast.
Berlin: De Gruyter Mouton.
Donald, Merlin 1991. Origins of the Modern Mind: Three Stages in the Evolution of Culture and
Cognition. Cambridge, MA: Harvard University Press.
Efron, David 1972. Gesture, Race, and Culture. The Hague: Mouton. First published as Gesture
and Environment. New York: King’s Crown Press [1941].
Evans, Vyvyan 2006. Lexical concepts, cognitive models and meaning construction. Cognitive Lin-
guistics 17: 491–534.
Evans, Vyvyan 2009. How Words Mean: Lexical Concepts, Cognitive Models, and Meaning Con-
struction. Oxford: Oxford University Press.
Fauconnier, Gilles 1984. Espaces Mentaux. Paris: Minuit.
Fauconnier, Gilles 1994. Mental Spaces. Cambridge: Cambridge University Press. First published
Cambridge: Massachusetts Institute of Technology Press [1985].
Fauconnier, Gilles and Mark Turner 1998. Principles of conceptual integration. In: Jean-Pierre
Koenig (ed.), Discourse and Cognition: Bridging the Gap, 269–283. Stanford, CA: Center for
the Study of Language and Information.
Fauconnier, Gilles and Mark Turner 2002. The Way We Think: Conceptual Blending and the
Mind’s Hidden Complexities. New York: Basic Books.
Feldman, Jerome and Srini Narayanan 2004. Embodied meaning in a neural theory of language.
Brain and Language 89: 385–392.
Fillmore, Charles 1982. Frame semantics. In: Linguistic Society of Korea (ed.), Linguistics in the
Morning Calm, 111–137. Seoul: Hanshin.
Forceville, Charles and Eduardo Urios-Aparisi (eds.) 2009. Multimodal Metaphor. Berlin: De
Gruyter Mouton.
Freedman, Norbert 1971. The analysis of movement behavior during the clinical interview. In:
Aaron Siegman and Benjamin Pope (eds.), Studies in Dyadic Communication, 132–165. New
York: Pergamon.
Fricke, Ellen 2008. Grundlagen einer multimodalen Grammatik des Deutschen. Syntaktische
Strukturen und Funktionen. Habilitationsschrift, Europa-Universität Viadrina, Frankfurt (Oder).
Geeraerts, Dirk and Hubert Cuyckens (eds.) 2007. The Oxford Handbook of Cognitive Linguistics.
Oxford: Oxford University Press.
Gibbs, Raymond W. 2008. Metaphor and gesture: Some implications for psychology. In: Alan Cienki
and Cornelia Müller (eds.), Metaphor and Gesture, 291–301. Amsterdam: John Benjamins.
Glenberg, Arthur M. and Michael P. Kaschak 2002. Grounding language in action. Psychonomic
Bulletin and Review 9: 558–565.
Glenberg, Arthur M. and David A. Robertson 2000. Symbol grounding and meaning: A compar-
ison of high-dimensional and embodied theories of meaning. Journal of Memory and Language
43: 379–401.
Grady, Joseph E. 1997. Foundations of meaning: Primary metaphors and primary scenes. Ph.D.
dissertation, University of California at Berkeley.
Grady, Joseph E. 2007. Metaphor. In: Dirk Geeraerts and Hubert Cuyckens (eds.), The Oxford
Handbook of Cognitive Linguistics, 188–213. Oxford: Oxford University Press.
Hampe, Beate (ed.) 2005. From Perception to Meaning: Image Schemas in Cognitive Linguistics.
Berlin: De Gruyter Mouton.
Harrison, Simon 2009. Grammar, gesture, and cognition: The case of negation in English. Ph.D.
dissertation, Université Michel de Montaigne, Bordeaux, France.
Jakobson, Roman 1990. Two aspects of language and two types of aphasic disturbances. In: Linda
R. Waugh and Monique Monville-Burston (eds.), On Language: Roman Jakobson, 115–133.
Cambridge, MA: Harvard University Press. First published in: Roman Jakobson and Morris
Halle, Fundamentals of Language. The Hague: Mouton [1956].
Johnson, Mark 1987. The Body in the Mind: The Bodily Basis of Meaning, Imagination, and Rea-
son. Chicago: University of Chicago Press.
Kendon, Adam 1980. Gesticulation and speech: Two aspects of the process of utterance. In: Mary
Ritchie Key (ed.), The Relation between Verbal and Nonverbal Communication, 207–227. The
Hague: Mouton.
Kendon, Adam 2004. Gesture: Visible Action as Utterance. Cambridge: Cambridge University Press.
Kendon, Adam and Laura Versante 2003. Pointing by hand in “Neapolitan.” In: Sotaro Kita (ed.),
Pointing: Where Language, Culture, and Cognition Meet, 109–137. Mahwah, NJ: Lawrence Erlbaum.
Kita, Sotaro 2003a. Pointing: A foundational building block of human communication. In: Sotaro
Kita (ed.), Pointing: Where Language, Culture, and Cognition Meet, 1–8. Mahwah, NJ: Law-
rence Erlbaum.
Kita, Sotaro (ed.) 2003b. Pointing: Where Language, Culture, and Cognition Meet. Mahwah, NJ:
Lawrence Erlbaum.
Ladewig, Silva H. 2011. Putting the cyclic gesture on a cognitive basis. CogniTextes 6. Online
19 May 2011 [http://cognitextes.revues.org/406].
Lakoff, George 1987. Women, Fire, and Dangerous Things: What Categories Reveal about the
Mind. Chicago: University of Chicago Press.
Lakoff, George 1993. The contemporary theory of metaphor. In: Andrew Ortony (ed.), Metaphor
and Thought, 2nd edition, 202–251. Cambridge: Cambridge University Press.
Lakoff, George 2008. The neuroscience of metaphoric gestures: Why they exist. In: Alan Cienki
and Cornelia Müller (eds.), Metaphor and Gesture, 283–289. Amsterdam: John Benjamins.
Lakoff, George and Mark Johnson 1980. Metaphors We Live By. Chicago: University of Chicago
Press.
Lakoff, George and Mark Johnson 1999. Philosophy in the Flesh. New York: Basic Books.
Langacker, Ronald W. 1982. Space grammar, analysability, and the English passive. Language 58:
22–80.
Langacker, Ronald W. 1987. Foundations of Cognitive Grammar. Volume 1: Theoretical Prerequi-
sites. Stanford, CA: Stanford University Press.
Langacker, Ronald W. 1988. A usage-based model. In: Brygida Rudzka-Ostyn (ed.), Topics in
Cognitive Linguistics, 127–161. Amsterdam: John Benjamins.
Langacker, Ronald W. 1990. Subjectification. Cognitive Linguistics 1: 5–38.
Langacker, Ronald W. 1991a. Concept, Image, and Symbol: The Cognitive Basis of Grammar. Ber-
lin: De Gruyter Mouton.
11. Cognitive Linguistics 199

Langacker, Ronald W. 1991b. Foundations of Cognitive Grammar. Volume 2: Descriptive Application. Stanford, CA: Stanford University Press.
Langacker, Ronald W. 1993. Reference-point constructions. Cognitive Linguistics 4: 1–38.
Langacker, Ronald W. 2008. Cognitive Grammar: A Basic Introduction. Oxford: Oxford Univer-
sity Press.
Langendonck, Willy van 2007. Iconicity. In: Dirk Geeraerts and Hubert Cuyckens (eds.), The
Oxford Handbook of Cognitive Linguistics, 394–418. Oxford: Oxford University Press.
Leyton, Michael 1992. Symmetry, Causality, Mind. Cambridge: Massachusetts Institute of Technol-
ogy Press.
Matlock, Teenie 2004a. The conceptual motivation of fictive motion. In: Günter Radden and René
Dirven (eds.), Studies in Linguistic Motivation, 221–248. Berlin: De Gruyter Mouton.
Matlock, Teenie 2004b. Fictive motion as cognitive simulation. Memory and Cognition 32(8):
1389–1400.
McNeill, David 1992. Hand and Mind: What Gestures Reveal about Thought. Chicago: University
of Chicago Press.
McNeill, David, Justine Cassell and Elena Levy 1993. Abstract deixis. Semiotica 95: 5–19.
McNeill, David and Elena T. Levy 1982. Conceptual representations in language activity and ges-
ture. In: Robert J. Jarvella and Wolfgang Klein (eds.), Speech, Place, and Action, 271–295. Chi-
chester: Wiley and Sons.
Mittelberg, Irene 2006. Metaphor and Metonymy in Language and Gesture: Discourse Evidence for
Multimodal Models of Grammar. Ph.D. dissertation, Cornell University. Ann Arbor, MI: UMI.
Mittelberg, Irene 2008. Peircean semiotics meets conceptual metaphor: Iconic modes in gestural
representations of grammar. In: Alan Cienki and Cornelia Müller (eds.), Metaphor and Ges-
ture, 115–154. Amsterdam: John Benjamins.
Mittelberg, Irene 2010. Geometric and image-schematic patterns in gesture space. In: Vyvyan
Evans and Paul Chilton (eds.), Language, Cognition and Space: The State of the Art and
New Directions, 351–385. London: Equinox.
Mittelberg, Irene and Linda R. Waugh 2009. Metonymy first, metaphor second: A cognitive-semi-
otic approach to multimodal figures of thought in co-speech gesture. In: Charles Forceville and
Eduardo Urios-Aparisi (eds.), Multimodal Metaphor, 329–356. Berlin: De Gruyter Mouton.
Müller, Cornelia 1998a. Iconicity and gesture. In: Serge Santi, Isabelle Guaïtella, Christian Cavé
and Gabrielle Konopczynski (eds.), Oralité et Gestualité: Communication Multimodale, Interac-
tion, 321–328. Paris: L’Harmattan.
Müller, Cornelia 1998b. Redebegleitende Gesten. Kulturgeschichte – Theorie – Sprachvergleich.
Berlin: Arnold Spitz.
Müller, Cornelia 2004. Forms and uses of the Palm Up Open Hand: A case of a gesture family? In:
Cornelia Müller and Roland Posner (eds.), The Semantics and Pragmatics of Everyday Ges-
tures: The Berlin Conference, 233–256. Berlin: Weidler Buchverlag.
Müller, Cornelia 2008a. Metaphors Dead and Alive, Sleeping and Waking: A Dynamic View. Chi-
cago: University of Chicago Press.
Müller, Cornelia 2008b. What gestures reveal about the nature of metaphor. In: Alan Cienki and
Cornelia Müller (eds.), Metaphor and Gesture, 219–245. Amsterdam: John Benjamins.
Müller, Cornelia 2009. Gesture and language. In: Kirsten Malmkjaer (ed.), The Linguistics Ency-
clopedia, 214–217. New York: Routledge.
Müller, Cornelia volume 2. Gestural modes of representation as techniques of depiction. In: Cor-
nelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Jana Bressem
(eds.), Body – Language – Communication: An International Handbook on Multimodality in
Human Interaction. (Handbooks of Linguistics and Communication Science 38.2.) Berlin:
De Gruyter Mouton.
Müller, Cornelia and Alan Cienki 2009. Words, gestures, and beyond: Forms of multimodal met-
aphor in the use of spoken language. In: Charles Forceville and Eduardo Urios-Aparisi (eds.),
Multimodal Metaphor, 297–328. Berlin: De Gruyter Mouton.
Müller, Cornelia and Harald Haferland 1997. Gefesselte Hände. Zur Semiose performativer
Gesten. Mitteilungen des Germanistenverbandes 3: 29–53.
Núñez, Rafael 2008. A fresh look at the foundations of mathematics: Gesture and the psycholog-
ical reality of conceptual metaphor. In: Alan Cienki and Cornelia Müller (eds.), Metaphor and
Gesture, 93–114. Amsterdam: John Benjamins.
Núñez, Rafael and Eve Sweetser 2006. With the future behind them: Convergent evidence from
language and gesture in the crosslinguistic comparison of spatial construals of time. Cognitive
Science 30: 1–49.
Panther, Klaus-Uwe and Günter Radden (eds.) 1999. Metonymy in Language and Thought.
Amsterdam: John Benjamins.
Panther, Klaus-Uwe and Linda L. Thornburg 2007. Metonymy. In: Dirk Geeraerts and Hubert
Cuyckens (eds.), The Oxford Handbook of Cognitive Linguistics, 236–263. Oxford: Oxford
University Press.
Parrill, Fey and Eve Sweetser 2004. What we mean by meaning: Conceptual integration in gesture
analysis and transcription. Gesture 4: 197–219.
Pecher, Diane, René Zeelenberg and Lawrence W. Barsalou 2004. Sensorimotor simulations
underlie conceptual representations: Modality-specific effects of prior activation. Psychonomic
Bulletin and Review 11: 164–167.
Radden, Günter and Zoltán Kövecses 1999. Towards a theory of metonymy. In: Klaus-Uwe Pan-
ther and Günter Radden (eds.), Metonymy in Language and Thought, 17–59. Amsterdam: John
Benjamins.
Reddy, Michael J. 1993. The conduit metaphor: A case of frame conflict in our language about lan-
guage. In: Andrew Ortony (ed.), Metaphor and Thought, 164–201. Cambridge: Cambridge Uni-
versity Press. First published [1979].
Ritchie, L. David 2009. Review of Metaphor and Gesture ed. by Alan Cienki and Cornelia Müller
(2008). Metaphor and Symbol 24: 121–123.
Saussure, Ferdinand de 1959. Course in General Linguistics. New York: Philosophical Library.
First published Paris: Payot [1916].
Slobin, Dan I. 1987. Thinking for speaking. In: Proceedings of the Thirteenth Annual Meeting of the
Berkeley Linguistics Society, 435–445. Berkeley, CA: Berkeley Linguistics Society.
Slobin, Dan I. 1996. From “thought and language” to “thinking for speaking.” In: John Gumperz
and Stephen C. Levinson (eds.), Rethinking Linguistic Relativity, 70–96. Cambridge: Cam-
bridge University Press.
Stanfield, Robert A. and Rolf A. Zwaan 2001. The effect of implied orientation derived from ver-
bal context on picture recognition. Psychological Science 12: 153–156.
Streeck, Jürgen 1994. “Speech-handling”: The metaphorical representation of speech in gestures.
A cross-cultural study. Unpublished manuscript.
Streeck, Jürgen 2009. Gesturecraft: The Manu-facture of Meaning. Amsterdam: John Benjamins.
Streeck, Jürgen and Ulrike Hartge 1992. Previews: Gestures at the transition place. In: Peter Auer
and Aldo Di Luzio (eds.), The Contextualization of Language, 135–157. Amsterdam: John
Benjamins.
Sweetser, Eve 1998. Regular metaphoricity in gesture: Bodily-based models of speech interaction.
Actes du 16e Congrès International des Linguistes (CD-ROM), Elsevier.
Sweetser, Eve 2007. Looking at space to study mental spaces: Co-speech gesture as a crucial data
source in cognitive linguistics. In: Monica Gonzalez-Marquez, Irene Mittelberg, Seana Coulson
and Michael Spivey (eds.), Methods in Cognitive Linguistics, 201–224. Amsterdam: John
Benjamins.
Talmy, Leonard 1983. How language structures space. In: Herbert L. Pick Jr. and Linda Acredolo
(eds.), Spatial Orientation: Theory, Research, and Application, 225–282. New York: Plenum
Press.
Talmy, Leonard 1996. Fictive motion in language and “ception.” In: Paul Bloom, Mary A. Peterson,
Lynn Nadel and Merrill F. Garrett (eds.), Language and Space, 211–276. Cambridge:
Massachusetts Institute of Technology Press.
Taub, Sarah 2001. Language from the Body: Iconicity and Metaphor in American Sign Language.
Cambridge: Cambridge University Press.
Taylor, John R. 1995. Linguistic Categorization: Prototypes in Linguistic Theory. Oxford: Oxford
University Press. First published [1989].
Teßendorf, Sedinha and Silva Ladewig 2008. The brushing-aside and the cyclic gesture: Recon-
structing their underlying patterns. Talk presented at the third conference of the German Cog-
nitive Linguistics Association, Leipzig, Germany, September 2008.
Traugott, Elizabeth Closs 1986. From polysemy to internal semantic reconstruction. In: Proceed-
ings of the Twelfth Annual Meeting of the Berkeley Linguistics Society, 539–550. Berkeley, CA:
Berkeley Linguistics Society.
Traugott, Elizabeth Closs 1988. Pragmatic strengthening and grammaticalization. In: Proceedings
of the Fourteenth Annual Meeting of the Berkeley Linguistics Society, 406–416. Berkeley, CA:
Berkeley Linguistics Society.
Tuggy, David 2007. Schematicity. In: Dirk Geeraerts and Hubert Cuyckens (eds.), The Oxford
Handbook of Cognitive Linguistics, 82–116. Oxford: Oxford University Press.
Turner, Mark and Gilles Fauconnier 1995. Conceptual integration and formal expression. Meta-
phor and Symbolic Activity 10: 183–203.
Webb, Rebecca 1997. Linguistic features of metaphoric gestures. Ph.D. dissertation, University of
Rochester, New York.
Wilcox, Sherman 2007. Signed languages. In: Dirk Geeraerts and Hubert Cuyckens (eds.), The
Oxford Handbook of Cognitive Linguistics, 1113–1136. Oxford: Oxford University Press.
Williams, Robert F. 2008a. Gesture as a conceptual mapping tool. In: Alan Cienki and Cornelia
Müller (eds.), Metaphor and Gesture, 55–92. Amsterdam: John Benjamins.
Williams, Robert F. 2008b. Path schemas in gesturing for thinking and teaching. Talk presented at
the third conference of the German Cognitive Linguistics Association, Leipzig, Germany,
September 2008.
Wu, Ying Choon and Seana Coulson 2007. How iconic gestures enhance communication: An ERP
study. Brain and Language 101: 234–245.
Zlatev, Jordan 2005. What’s in a schema? Bodily mimesis and the grounding of language. In: Beate
Hampe (ed.), From Perception to Meaning: Image Schemas in Cognitive Linguistics, 313–342.
Berlin: De Gruyter Mouton.
Zlatev, Jordan 2007. Embodiment, language, and mimesis. In: Tom Ziemke, Jordan Zlatev and
Roslyn Frank (eds.), Body, Language and Mind, Volume 1: Embodiment, 297–337. Berlin:
De Gruyter Mouton.
Zlatev, Jordan 2008. The co-evolution of intersubjectivity and bodily mimesis. In: Jordan Zlatev,
Timothy R. Racine, Chris Sinha and Esa Itkonen (eds.), The Shared Mind: Perspectives on In-
tersubjectivity, 215–244. Amsterdam: John Benjamins.
Zlatev, Jordan, Tomas Persson and Peter Gärdenfors 2005a. Bodily mimesis as “the missing link”
in human cognitive evolution. Lund University Cognitive Studies 121: 1–40.
Zlatev, Jordan, Tomas Persson and Peter Gärdenfors 2005b. Triadic bodily mimesis is the differ-
ence. Behavioral and Brain Sciences 28: 720–721.

Alan Cienki, Amsterdam (Netherlands)


12. Gestures as a medium of expression: The linguistic potential of gestures
In memoriam Helmut Richter, 10.02.1935–13.01.2012

1. Introduction
2. Bühler’s organon model of language as use: A functional-linguistic framework for gesture
3. How gestures realize the three functions of the organon model: Expression, appeal,
representation
4. Gestures with a dominantly pragmatic function
5. Conclusion
6. References

Abstract
Departing from Karl Bühler’s psychological theory of language and his theory of expres-
sion, it is proposed that gestures exhibit a potential for language because they can be used
to fulfill the same basic functions as language. Gestures can express inner states and feelings;
they may regulate the behavior of others (they are designed for and directed towards
somebody); and they can represent objects and events in the world (Bühler [1934] 1982;
Müller 1998a, 1998b). Notably, these functions are co-present in any speech event, and
they can be co-present in gestures, with one of them taking center stage. The functional
approach relates gestures to three basic aspects of any communicative encounter – the
speaker, the listener, and the world we talk about. With this systematic framework in place, we
can categorize a wide range of gestures according to their dominant function: expression,
appeal, or representation.

1. Introduction
Regarding gestures as a medium of expression is inspired by Wilhelm Wundt and Karl
Bühler, both psychologists with a great interest in language and expression, in
communicative body movements, and in their relation to affective as well as linguistic
expression. In Bühler’s theory of expression, gestures figure as one expressive medium
which may in some cases take over representational functions comparable to words
(Bühler 1933, [1934] 1982, 2011). In Wundt’s Folk Psychology, gestures are considered
forms of expression that are at the roots of language evolution (Wundt 1921). This is
why Wundt discusses sign languages of the deaf and the signs of Northern American
Plains Indians as much as the manifold gestures of the Neapolitans. Wundt regards ges-
tural forms of linguistic communication as precursors of vocal language. Looking at
signed languages today, we know that hand movements can develop into a fully fledged
language. Signed languages have a grammar and lexicon, and they vary across cultures.
French Sign Language differs significantly from British, German, Danish, or Nicaraguan
Sign Language, and this means no less than that, for a sign language to develop
historically, gestures must undergo processes of grammaticalization.
Now, how is it that movements of the hands are equipped to turn into language, if
necessary? What makes them apt to do so? What are their properties as an articulatory
medium that makes them the only body part other than the vocal tract that can develop
into a full-fledged language? We know that hand and mouth movements are controlled
in the same cortical region in the brain (Wilson 1998), and it is assumed that language
only evolved in the human species after the change to bipedalism (Leroi-Gourhan 1984;
Wilson 1998). The latter assumption inspired a broad range of theories proposing a ges-
tural origin of language (Arbib and Corballis, this volume, are current proponents of
such a theory; see also Kendon 1991 and Armstrong and Wilcox 2007). Condillac’s
theory of a langage d’action (a language of action as a precursor of vocal language) is
an important example of a close intertwining of gestures and articulatory expressions
(see Copple, this volume). Kendon (2008) and McNeill (this volume) in turn propose
different theories about how vocal and gestural articulation evolved together. We
also know from praxeological accounts of gesture that the hands are the only organ
apart from the vocal tract that has developed a capacity for flexible and variable
movements with a high degree of articulatory precision (see Streeck 2009, this volume).
Notably, the hands are the techniques of the body par excellence (Mauss 1934); they are
the primary tools that ensure survival and afford the development of culture (Leroi-
Gourhan 1984). The medium “hand” appears therefore well equipped to develop into
a full-fledged language. There are at least two major reasons: a highly flexible articula-
tion (the capacity for a high differentiation of movements is a prerequisite for a com-
plex sign system) and a manifold instrumental use of the hands, which provides the
functional grounds (and infinite sources) for the creation of gestural meaning. Everyday
actions – such as opening, closing, holding, pushing, drawing lines in a soft surface or
touching objects – may serve as roots for gestural meaning (Müller 1998a, 1998b,
2010a, 2010b; Müller and Haferland 1997; Streeck 2002, 2009, this volume).
The hands’ movement properties in conjunction with their functions as instruments
to deal with the world make them a medium of expression with a potential for language.
With these properties, hands and mouth stand out against all other possible bodily
media of expression: neither the face nor the legs or feet have a comparable articulatory
freedom and instrumental and functional richness. The linguistic view on gesture proposed
here takes these medial and functional properties as a point of departure; it regards ges-
tures as a medium of expression.

2. Bühler’s organon model of language as use: A functional-linguistic framework for gesture
In his theory of expression, Bühler argues against a difference in principle between vocal and
bodily (e.g. sign) language. Instead he holds that it is only because vocal language has
become the predominant mode of communication that its representational or depictive
resources have become as fine-grained and differentiated as they are today. Had bodily
(sign) language instead become the primary mode of communication, it would have
developed a degree of differentiation similar to that of vocal language (Bühler 1933: 40). And
indeed, several decades later, sign linguistics confirmed this assumption with empirical
evidence (Stokoe 1960). While we can base our judgment on existing linguistic research
on sign languages, Bühler put forward his arguments only by applying a functional view
of gestures and words as signs within a concrete speech event. In his theory, Bühler refers
to Plato’s idea of language as an “organum”, an instrument used by a speaker to commu-
nicate something about the “things” to an addressee. In other words, Bühler’s theory takes
language use – or in his terms the concrete speech event – as a starting point. Language
functions as an instrument which sets up a relation between three fundamental aspects
of any such speech event: the speaker, the addressee and the things. The speaker uses
language as an instrument to tell his/her addressee something about the things in the world.
By specifying the particular functions of a linguistic sign in the moment of speaking,
Bühler critically evaluates and re-formulates Plato’s simple model of language use.
Situating language in the basic structure of any communicative encounter – the speaker,
the addressee and the world talked about – he establishes three basic functions of the
concrete ‘sound phenomenon’ (Schallphänomen) or the uttered (complex) linguistic
sign (e.g. this could be a word, or a phrase): it expresses, appeals, and represents. A
word that is spoken represents entities and events in the world; at the same time it ex-
presses inner states and feelings, and as a communicative sign it appeals to somebody
(and directs the behavior of somebody, or addresses somebody). Here is Bühler’s
reformulation of Plato’s model (see Fig. 12.1):

The circle in the middle symbolizes the concrete acoustic phenomenon. Three variable fac-
tors in it go to give it the rank of a sign in three different manners. The sides of the in-
scribed triangle symbolize these three factors. In one way the triangle encloses less than
the circle (thus illustrating the principle of abstractive relevance). In another way it goes
beyond the circle to indicate that what is given to the senses always receives an appercep-
tive complement. The parallel lines symbolize the semantic functions of the (complex) lan-
guage sign. It is a symbol by virtue of its coordination to objects and states of affairs, a
symptom (Anzeichen, indicium: index) by virtue of its dependence on the sender, whose
inner states it expresses, and a signal by virtue of its appeal to the hearer, whose inner
or outer behaviour it directs as do other communicative signs.
This organon model, with its three largely independently variable semantic relations, was
first expounded completely in my paper on the sentence (Bühler 1918), which begins with the
words: “What human language does is threefold: profession, triggering and representation.”
Today I prefer the terms expression, appeal and representation […]. (Bühler 2011: 34–35)

In short, for Bühler, language as use is characterized by three functions: representation,
expression, appeal (Bühler [1934] 1982: 28). Note that the English term representation
is misleading here: the German word Darstellung entails the idea of presentation or
depiction; it does not imply the idea of re-presentation. Bühler’s organon model of lan-
guage as use conceptualizes the speaker and addressee as active participants shaping
and understanding language in a communicative event. The three functions of language
are conceived as fundamentals for language and they are co-present in every speech-
sign. Bühler also uses the term appeal in a very idiosyncratic way. For him, it captures
the fact that every verbal sign that is used is addressed to somebody. Language is rooted
in a speech event and cannot be thought of other than as being directed to an addressee.
While Bühler has generally stated the basic functional equivalence of gestures and
sounds (Bühler 1933: 40, see also Müller 1998b: 87), he has not discussed gestures
in relation to the organon model. Nevertheless, when considering the functions of co-
verbal gestures closely, we find that gestures indeed may realize the three functions
too: gestures are used to express inner states, to appeal to somebody (or are addressed
towards somebody), and to represent objects and events in the world (Müller 1998a,
1998b, 2009). It is particularly Bühler’s representational function that makes gestures
different from any other articulatory medium.

[Figure: diagram of the organon model. The sign in the middle is linked to objects and states of affairs (representation), to the sender (expression), and to the receiver (appeal).]

Fig. 12.1: Bühler’s organon model of language (Bühler 2011: 35).

We cannot mold the square shape of an object or enact actions with the face, the head, or
the legs (unless these are actions of the head or the feet, or the feet are used as
instruments for sketching out forms).
Hands can do all this without any effort: they can represent entities other than themselves,
and in fact people use them all the time for these purposes. With their hands, people
mold geometric shapes and reenact instrumental actions, and by doing so they refer
either to the action or to the object acted upon. Speakers also use their hands
as objects in motion: wiggling fingers as moving legs, a flat hand for a piece of paper, or
the fist for a ball (see Müller 1998a, 1998b, 2009, 2010a; for more on gestural modes of
representation, see Müller volume 2 of this handbook).
As far as pointing is concerned, we can do this quite well with different kinds of
articulators (eyes, mouth, chin, feet), but the depiction of entities or events in the world is
difficult to do with articulators other than the hand. This is what makes their expressive
properties so unique and what they share with the vocal tract: an extremely
high level of movement differentiation and a large range of possible movements, along
with the functional property of actually exploiting the techniques of the hand for depictive
purposes. These articulatory and functional properties constitute the articulatory pre-
requisite for a complex sign system to emerge. Gestures fulfill all three functions formu-
lated in the organon model and they do so simultaneously. This is what makes gestures
so apt for language. They can talk about the world and express inner feelings and they
are directed to an addressee. The following section will illustrate how they do this.

3. How gestures realize the three functions of the organon model: Expression, appeal, representation
For a linguistic perspective on gestures, the representational function is critical. It explains
their potential for language and the tight functional integration that gestures exhibit with
the linguistic structure of the verbal part of the utterance. Because gestures can
represent entities other than themselves, and because the hands have the articulatory
flexibility to form manifold shapes, to move in a wide variety of manners, and to occupy
all kinds of possible places within a fairly large space, they are candidates for language.
Hand movements are highly articulate and they are humans’ most important instruments
to deal with the world, to shape it and to handle it (Streeck 1994, 2009, this volume). Free-
dom of form, movement, and space along with the complex functionality of hand move-
ments also provides the articulatory and functional grounds for the co-presence of the
three functions of expression, appeal, representation. Bühler argues that every word,
every phrase we utter realizes the three functions at the same time, with one of the func-
tions usually being dominant (Bühler [1934] 1982). Looking at gestures, this co-presence
is particularly clear: whenever we are depicting something, let’s say a rectangular object,
our hands will mold a rectangular shape, but the molding will be realized in a particular
manner, with a specific movement quality: we might carefully, almost tenderly, move our
hands, or perform a vigorous, energetic movement. This movement quality is an expres-
sion of our affective stance towards the object we are depicting and it is co-present with
the depictive dimension of the gesture (Kappelhoff and Müller 2011). The same holds for
the appealing function. It, too, is present in every gesture with a dominantly representational
function. Whenever gestures are performed, they are oriented towards an attending
interlocutor; they are addressed towards somebody. So, we do not simply mold a
rectangular shape in front of our body, but we tend to direct our hands towards an attend-
ing addressee (Streeck 1993, 1994). Of course, sometimes we seem to gesture for our-
selves, when sorting out complex spatial tasks, or when speaking on the phone, but
then we as speaker/gesturers are our own addressee. The following section provides
examples for gestures with dominantly representational, appealing, or expressive function.

3.1. Gestures with a dominantly representational function depicting concrete actions, objects or events
In the first example we see a gesture that depicts the actual unpacking of a suitcase
(Fig. 12.2, example 1). Here both hands repeatedly re-enact the unpacking movements.
In the second example the speaker traces the shape of a picture frame, with repeated
movements of the extended index fingers. He acts as if sketching invisible lines in the
air (Fig. 12.2, example 2).
In the third example we see how hands may be used to act as if molding all kinds
of different shapes, such as boxes, bowls, buildings, balls, or picture frames (Fig. 12.3,
example 3). Here both hands act as if molding a round horizontal shape in front of the
speaker’s body. In doing so they depict the shape of a round bench. The last example
of a gesture referring to a concrete object, action, or event is the gestural depiction of a
car crash (Fig. 12.3, example 4). The hands may become “sculptures” of objects, actors,
and actions. The speaker uses her clenched right hand to represent the moving car, while
her laterally oriented, flat left hand depicts the wall against which the car has crashed.

3.2. Gestures with a dominantly representational function referring to abstract actions, objects, or events: metaphoric gestures
Gestures that look very similar to the ones used when people talk about concrete ob-
jects, events, or actions are very often used to represent (and refer to) abstract notions
and concepts. Frequently, they depict the source domain of a verbal metaphoric
[Figure panels. Example 1: both hands reenact the unpacking movement, which is repeated once after the sentence is finished; utterance: “Er packte aus.” (‘He unpacked.’). Example 2: both extended index fingers trace the shape of a picture frame with a repeated stroke; utterance: “lo tenia con una marco redondo” (‘it had a round shape’).]

Fig. 12.2: Gestures with a dominantly representational function depicting concrete action and
object (example 1: acting as if unpacking a suitcase and 2: acting as if tracing a circle).

[Figure panels. Example 3: both hands mold, in front of the body, the shape of a round bench; utterance: “Sie war rund.” (‘It was round.’). Example 4: the right hand, a clenched fist, represents a car in motion, bashing against a wall that is represented by the vertically oriented, extended left hand; utterance: “Er fuhr es gegen die Wand.” (‘He drove it against the wall.’).]

Fig. 12.3: Gestures with a dominantly representational function depicting a concrete object and
event (example 3: acting as if molding a round object and 4: right hand represents object in
motion, left hand represents flat, vertically oriented object).

expression. In the next example (Fig. 12.4, example 5), a woman is shown who enacts an
unpacking movement with both hands, very similar to the one that we have described in
Fig. 12.2, example 1. However, here the unpacking refers to the ‘unpacking’ of secrets
(Geheimnisse auspacken); figuratively, this German idiom refers to the telling of secrets.
What we see here is a typical case of a so-called metaphoric gesture (for an overview of
metaphor and gesture, see Cienki and Müller 2008a, 2008b).

[Figure. Both hands enact an unpacking movement; the gestures depict the revealing, “auspacken” (lit. unpacking), of a secret; utterance: “Er packte aus.” (‘He unpacked.’).]

Fig. 12.4: Gesture with a dominantly representational function referring to abstract action
(example 5: acting as if “unpacking”, e.g. revealing a secret).

Speakers also use their hands to outline abstract paths. In example 6 (Fig. 12.5), the
woman traces the ups and downs of her first love relationship, starting from the left-hand
side. The gesture is again used metaphorically: the invisible line depicts the emotional
and temporal course of her first relationship.

[Figure panel: the extended left index finger goes up and traces a wavy line from left to right to metaphorically depict the course of the relationship; speech: aber es ging ne es startete und flachte dann so weiter ab (‘but it went, well, it started and then flattened out like that’); gesture phases: preparation, stroke, stroke.]

Fig. 12.5: Gestures with a dominantly representational function depicting an abstract event: metaphoric gestures (example 6: tracing the course of a relationship; example from Müller 2007, 2008a, 2008b).
12. Gestures as a medium of expression: The linguistic potential of gestures 209

Time as space, this omnipresent metaphorical structure, shows up very frequently in gestures. Here we see a case in which a portion of time, a segment of time, is treated and conceived of as a rectangular object. The lecturer in example 7 (Fig. 12.6) molds a small rectangular object, which depicts a segment of time as an object in space. Example 8 (Fig. 12.6) again shows a gesture that enacts the source domain of a verbal metaphoric idiom: Er fuhr es gegen die Wand, ‘he ruined it’; in a literal translation, she says: ‘He drove it against the wall’. Her left open hand represents a further unspecified object in motion, which hits a wall, represented by the open right hand. The hitting of the hands depicts the failing of an enterprise.

[Figure panels: Example 7: the hands mold a rectangular object and depict the scope of time, embodying the source domain of the German idiom “den zeitlichen Rahmen vorgeben”, lit. “setting the frame of time”; speech: Sie geben den zeitlichen Rahmen vor. (lit. ‘They set the time frame.’, fig. ‘They determine the course of events.’); gesture phases: preparation, stroke, retraction, rest position. Example 8: the hand represents an object in motion clashing against an obstacle; the gesture depicts the crashing of an enterprise by embodying the source domain of the German idiom “gegen die Wand fahren”, lit. “driving against the wall”; speech: Er fuhr es gegen die Wand. (lit. ‘He drove it against the wall.’, fig. ‘He ruined the enterprise.’); gesture phases: preparation, stroke, post-stroke hold.]

Fig. 12.6: Gestures with a dominantly representational function depicting an abstract object and
event: metaphoric gestures (example 7: molding scope of time and 8: crashing enterprise).

Note, however, that the gestures described above are not simple reflections of something out there in the real world. Instead, they are mediated by processes of conceptualization, interaction, and the depictive possibilities of visible bodily movement. Notably, the four different examples of gestural depiction reflect the four basic techniques of the hands for the creation of gestural signs, elsewhere termed gestural modes of representation: acting, molding, tracing, and representing (see Müller volume 2 of this handbook and 1998a, 1998b, 2004, 2009, 2010a, 2010b).

3.3. Gestures with a dominant function of expressing emotions


In the category of emotion expression we find gestures such as raising the hands and
arms to express joy and triumph (Fig. 12.7, example 9), covering the face to express
sadness or grief (Fig. 12.7, example 10), or moving the fist downward to express
anger or rage (Fig. 12.7, example 11).
Note that these gestures all come with a particular facial expression (for an introduction to the Facial Action Coding System, FACS, see Waller and Pasqualini this volume) and with a particular bodily posture. They are actually expressions of the entire body. This is especially obvious in the case of the bodily expression of joy and triumph. When victories are celebrated, as for instance in sports (after the scoring of a goal in soccer, see our example 9, Fig. 12.7), we see that the whole body moves upwards: the trunk is stretched, the arms are raised, the head moves up, the eyes are widened, and the corners of the mouth move up (in a smile). In situations of extreme relief and joy, as for instance after a very important World Cup soccer game, persons often jump up repeatedly, run around with arms raised, and may even be lifted by their companions and carried around “high up” as the winner. This prominence of the upward movement brings to mind the conceptual metaphor HAPPY IS UP (Lakoff and Johnson 1980), which might actually be based on a conceptual metonymy. It points to the experiential roots of the conceptual metaphor and of the manifold verbal expressions that relate to joy and incorporate this idea: uplifting behavior, rising moods, and high spirits.
When, for instance in conversations, such a full bodily expression is reduced to a hand movement, this implies a process of cognitive abstraction on the side of the speaker/gesturer: while the basic meaning is maintained, the form is reduced. What we observe here are processes of schematization, which can be considered a first step in the emergence of linguistic form-meaning relations. This process involves cognitive metonymy, in which the hands (or arms) stand in a pars pro toto relation to the full bodily expression.
Expressing sadness or grief by covering the face or the eyes with the hands also goes along with a full bodily expression (see our example 10, Fig. 12.7). Here, the body takes on an overall downward attitude or posture: the head is bent down, and the shoulders and upper body slump as well. In cases of very strong grief and despair, we may even find that people’s legs no longer support them, so that they fall to the ground and “break down”. Here the second part of the metaphoric pair that Lakoff and Johnson described comes to mind: SAD IS DOWN; again we see the bodily experience to which such metaphors relate.
When expressing anger or rage, people in the Western hemisphere can be observed to use their clenched fist in an energetic and highly dynamic downward movement, either hitting a table or hitting “the air”. Again, this hand gesture can be considered a metonymic derivation from a full-fledged bodily expression in which the entire body builds up a tension that is then released with the dynamic fist movement; sometimes speakers even jump up and down with this angry fist movement (a pattern often used to depict the emotional expressions of cartoon characters).
It is important to bear in mind that the kinds of expressive bodily movements de-
scribed above differ from symptomatic bodily expressions of emotions such as crying,
blushing, turning pale, and trembling. The important difference here is that in the cases

[Figure panels: Example 9: both arms raised, fists clenched, expressing joy and triumph; gesture phases: preparation, stroke, post-stroke hold. Example 10: the right hand closes the eyes and covers the face, expressing sadness and grief; gesture phases: preparation, hold. Example 11: clenched fists bang energetically downwards, expressing anger and even rage; speech: ohne dass ihm das bewusst gewesen wäre (‘without him being aware of it’); gesture phases: repeated preparation–stroke cycles.]

Fig. 12.7: Gestures with a dominant function of expressing emotions: raising arms expressing joy
and triumph (example 9), covering face expressing sadness and grief (example 10), banging fists
expressing anger and rage (example 11).

of banging the fist, raising the hands, and covering the face, the body movements are symbolic: they are culturally shaped, conventionalized, willful expressions of emotion. They may have physiological and experiential roots, but they are not purely symptomatic forms of behavior. However, as we have mentioned above, all gestural movements have an expressive dimension: gestures with a representational function and gestures with a dominantly appealing function all have an expressive dimension, which resides primarily in their movement qualities. Gestures with a dominant function of expressing emotions also differ with regard to movement qualities; here the movement qualities express differences in the degree of joy, anger, or sadness. While the dominant function of these gestures appears to be the expression of basic emotions, the movement qualities seem to be more on the side of what emotion psychology and neuro-cognitive science would characterize as feelings (Damasio 2003).

3.4. Gestures with a dominant function of appealing to others


Gestures with a dominantly appealing function are used to regulate the behavior of
others. They can be addressed to a large audience (see for instance the calming gesture
of a speaker in the British Parliament in example 12, Fig. 12.8). We also observe them in smaller-scale face-to-face interactions, for instance when used to signal to an interlocutor to be quiet (see the index finger crossing the lips in example 13, Fig. 12.8), or when asking someone to come close (as with the hand-wave gesture of the fish seller at the market in example 14, Fig. 12.8).
In terms of Jakobson’s extension of Bühler’s functions, this would be the conative function of language (Auer 1999; Jakobson 1960; see Eschbach 1984 for a discussion of Jakobson and Bühler). From the perspective of speech-act theory (Searle 1969), these gestures can be described as having a dominantly perlocutionary function.

4. Gestures with a dominantly pragmatic function


Notably, Bühler’s model of language as an organon does not capture all types of gestures that we encounter in everyday interactions. Although it is rooted in what he terms the concrete speech event (“das konkrete Sprechereignis”; Bühler [1934] 1982: 28), it does not consider language use as communicative action. Bühler’s model is primarily a semiotic model. It further develops Saussure’s bipolar characterization of the linguistic sign into signifiant and signifié by relating the linguistic sign to the situation of a concrete speech event. Signs are not detached from use, as in Saussure’s langue (language as a system of signs) (de Saussure 1967); rather, they connect to the world (objects, events, actions), to a speaker, and to an interlocutor (see also Eschbach 1984; Auer 1999).
The organon model of language thus roots the concept of the sign in an idealized speech event, but it does not regard language in use as communicative action. Bühler’s model clearly lacks an understanding of speaking as action. Although Bühler apparently met Wittgenstein, his thinking was very much rooted in German ‘expression psychology’ (“Ausdruckspsychologie”; for an overview see Krumhuber et al. this volume; Wallbott 1982; Wallbott and Asendorpf 1982; Wallbott and Helfrich 1986), Gestalt psychology, behaviorism, and the emerging European structuralism. He formulated his theory of language before ordinary language philosophy developed, and with it an understanding of speaking as action. This might account for Bühler’s disregard of the pragmatic dimensions of linguistic signs.
When we look at gestures, however, we cannot dismiss their actual embedding within an utterance. Gestures used along with speech are not elements of a sign system that can be regarded and analyzed out of context. Gestures are spontaneously integrated within ongoing utterances. They are a part and a form of talking, realizing referential, performative, modal, and discursive functions (for Müller’s functional classification of gestures see Müller 1998b: 110–113; for the modal function of gestures see Müller and Speckmann 2002 and volume 2 of this handbook). This is why Adam Kendon has characterized them as visible actions which may function as utterances (see the title of his book, Kendon 2004).

[Figure panels: Example 12: both hands, palms down, are moved downwards to calm the audience in parliament; speech: Of course I know how he hates... Example 13: the right index finger is placed across the lips to say “be quiet”; speech: Sei ruhig! (‘Be quiet!’). Example 14: with his flat hand oriented downwards and a stretched arm, a seller at the fish market waves passers-by to come close and listen to his price offers; speech: Hallo, komm ran hier! (‘Hey, come close here!’); gesture phases: repeated preparation–stroke cycles.]

Fig. 12.8: Gestures with a dominant function of appealing to interlocutors (conative or perlocutionary function) (example 12: open hands moved down to calm an audience; example 13: index finger across the lips to request silence from an interlocutor; example 14: waving towards the body, asking the interlocutor to approach).

Therefore, with this theoretical shift, we find that there is a group of rather frequently used gestures that cross-cuts Bühler’s functional approach. Kendon has referred to them as gestures used with a pragmatic function, understood in a rather general sense as “any of the ways in which gestures may relate to features of an utterance’s meaning that are not a part of its referential meaning or propositional content” (Kendon 2004: 158). He distinguishes three main kinds of pragmatic functions: Müller’s modal and performative gestures and those with a parsing function (Kendon 2004: 159). There is quite a body of research on pragmatic gestures (see volume 2 of this handbook for an overview; Kendon 1995, 2004; Streeck 1994, 2009; Teßendorf 2008, to mention just a selection), but for now we want to focus on one very prominent type: the performative gestures.
There are gestures whose primary function is to execute a speech act. These gestures function like performative verbs: when saying I swear, the action of swearing is realized; when saying I bless you, the action of blessing is realized; when offending somebody by saying Fuck you, the offense is realized. Interestingly, most of these verbal performatives go along with a highly conventionalized performative gesture: swearing (with an open palm raised vertically), blessing (sketching the Christian cross), offending (showing somebody the erect middle finger, palm turned towards the speaker). Notably, these gestures have a status as legal acts (swearing) or as actions that may in certain circumstances be prosecuted as an offense (the German police can charge someone with showing the erect middle finger to another driver). Historically, some of these verbal performatives might have been verbalizations of gestural performative acts in the first place (for an example, see the case of the medieval honoring gesture: Müller and Haferland 1997 and volume 2 of this handbook). Performative gestures in general are extremely common and widespread, and they are often fully conventionalized speech acts (mostly characterized as emblems; see Teßendorf this volume).
This indicates once more that gestures are in principle functionally comparable to verbal signs: they may re-present something other than themselves while at the same time expressing some inner state, being addressed towards somebody, and executing speech acts and other communicative activities (Müller 1998b). From a more strictly speech-act theoretical point of view (Austin 1962; Searle 1969), it is obvious that every gesture is a communicative action: some primarily express propositional content (gestures with a representational function, or referential gestures), some primarily realize illocutionary force (gestures with a performative function), while others mainly execute perlocutionary effects. These functional properties indicate that gestures embody the functional and articulatory seeds of language.

5. Conclusion
Regarding gestures as a medium of expression advocates an embodied and functional perspective on gestures. It opens up pathways to a linguistic understanding of gestures: the understanding of gestures as linguistic derives in part from their functions in language, and this can be uncovered through close analysis of gestural forms. In short, the linguistic potential of gestures is grounded in their properties as a medium of expression (see Bressem, Ladewig and Müller this volume and Müller, Ladewig and Bressem this volume for more detail). To be clear, co-verbal gestures are not a language of their own; they are integrated with speech and are part and parcel of multimodal utterances. They are embodied seeds of language, and when developing into signed languages, this potential for language, which of all human articulators only the mouth and the hands carry, comes to light.

Acknowledgment
For the presentation of the examples, we do not give a full form-meaning analysis as proposed by the ToGoG method of gesture analysis (see Müller, Ladewig and Bressem volume 2 and Bressem, Ladewig and Müller this volume for our Linguistic Annotation System). Instead, for the illustration of the different gesture functions, we account for the overall gestural form gestalt and the temporal placement of the gesture in relation to speech.
We thank Mathias Roloff for providing the drawings (www.mathiasroloff.de) and the
Volkswagen Foundation for supporting this work with a grant for the interdisciplinary
project “Towards a grammar of gesture: evolution, brain and linguistic structures”
(www.togog.org).

6. References
Arbib, Michael A. this volume. Mirror systems and the neurocognitive substrates of bodily com-
munication and language. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig,
David McNeill and Sedinha Teßendorf (eds.), Body – Language – Communication: An Inter-
national Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics and
Communication Science 38.1.) Berlin: De Gruyter Mouton.
Armstrong, David F., and Sherman E. Wilcox 2007. The Gestural Origin of Language. New York:
Oxford University Press.
Auer, Peter 1999. Sprachliche Interaktion: Eine Einführung anhand von 22 Klassikern. Tübingen:
Niemeyer.
Austin, John L. 1962. How to Do Things With Words. Cambridge: Harvard University Press.
Bressem, Jana, Silva H. Ladewig and Cornelia Müller this volume. Linguistic Annotation System
for Gestures. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill
and Sedinha Teßendorf (eds.), Body – Language – Communication: An International Hand-
book on Multimodality in Human Interaction. (Handbooks of Linguistics and Communication
Science 38.1.) Berlin: De Gruyter Mouton.
Bühler, Karl 1933. Ausdruckstheorie: Das System an der Geschichte aufgezeigt. Jena: Fischer.
Bühler, Karl 1982. Sprachtheorie: Die Darstellungsfunktion der Sprache. Stuttgart: Fischer. First
published [1934].
Bühler, Karl 2011. Theory of Language: The Representational Function of Language. Amsterdam:
John Benjamins.
Cienki, Alan and Cornelia Müller 2008a. Metaphor and Gesture. Amsterdam: John Benjamins.
Cienki, Alan and Cornelia Müller 2008b. Metaphor, Gesture and Thought. In: Raymond W. Gibbs
(ed.), The Cambridge Handbook of Metaphor and Thought, 484–501. Cambridge: Cambridge
University Press.
Copple, Mary this volume. Enlightenment philosophy: Gestures, language, and the origin of
human understanding. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig,
David McNeill and Sedinha Teßendorf (eds.), Body – Language – Communication: An Inter-
national Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics and
Communication Science 38.1.) Berlin: De Gruyter Mouton.
Corballis, Michael this volume. Mirror systems and the neurocognitive substrates of bodily com-
munication and language. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig,
David McNeill and Sedinha Teßendorf (eds.), Body – Language – Communication: An Inter-
national Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics and
Communication Science 38.1.) Berlin: De Gruyter Mouton.
Damasio, Antonio R. 2003. Looking for Spinoza. Joy, Sorrow, and the Feeling Brain. Orlando:
Harcourt.
Eschbach, Achim 1984. Bühler-Studien. Frankfurt am Main: Suhrkamp.
Jakobson, Roman 1960. Linguistics and Poetics. In: Krystyna Pomorska and Stephen Rudy (eds.),
Language in Literature, 62–94. Cambridge, MA: Harvard University Press.

Kappelhoff, Hermann and Cornelia Müller 2011. Embodied meaning construction. Multimodal
metaphor and expressive movement in speech, gesture and feature film. Metaphor in the Social
World 1(2): 121–135.
Kendon, Adam 1991. Some considerations for a theory of language origins. Man (The Journal of
the Royal Anthropological Institute) 26(2): 199–221.
Kendon, Adam 1995. The Open Hand. Manuscript. Albuquerque.
Kendon, Adam 2004. Gesture: Visible Action as Utterance. Cambridge: Cambridge University Press.
Kendon, Adam 2008. Signs for language origins? Public Journal of Semiotics 2(2): 2–29.
Krumhuber, Eva, Susanne Kaiser, Arvid Kappas and Klaus R. Scherer this volume. Body and
speech as expression of inner states. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H.
Ladewig, David McNeill and Sedinha Teßendorf (eds.), Body – Language – Communication:
An International Handbook on Multimodality in Human Interaction. (Handbooks of Linguis-
tics and Communication Science 38.1.) Berlin: De Gruyter Mouton.
Lakoff, George and Mark Johnson 1980. Metaphors We Live By. Chicago: University of Chicago Press.
Leroi-Gourhan, André 1984. Hand und Wort: Die Evolution von Technik, Sprache und Kunst.
Frankfurt am Main: Suhrkamp.
Mauss, Marcel 1934. Les techniques du corps. Journal de Psychologie 32(3–4): 271–293.
McNeill, David this volume. Gesture as a window onto mind and brain, and the relationship to
linguistic relativity and ontogenesis. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H.
Ladewig, David McNeill and Sedinha Teßendorf (eds.), Body – Language – Communication:
An International Handbook on Multimodality in Human Interaction. (Handbooks of Linguis-
tics and Communication Science 38.1.) Berlin: De Gruyter Mouton.
Müller, Cornelia 1998a. Iconicity and Gesture. In: Serge Santi, Isabelle Guaitella, Christian Cavé
and Gabrielle Konopczynski (eds.), Oralité et Gestualité: Communication Multimodale, Interaction, 321–328. Montréal: L’Harmattan.
Müller, Cornelia 1998b. Redebegleitende Gesten: Kulturgeschichte – Theorie – Sprachvergleich.
Berlin: Arno Spitz.
Müller, Cornelia 2004. Forms and Uses of the Palm Up Open Hand: A Case of a Gesture Family?
In: Cornelia Müller and Roland Posner (eds.), Semantics and Pragmatics of Everyday Gestures,
234–256. Berlin: Weidler.
Müller, Cornelia 2007. A dynamic view on gesture, language and thought. In: Susan D. Duncan,
Justine Cassell and Elena T. Levy (eds.), Gesture and the dynamic dimension of language. Es-
says in honor of David McNeill. Amsterdam: John Benjamins.
Müller, Cornelia 2008a. Metaphors Dead and Alive, Sleeping and Waking. A Dynamic View. Chi-
cago: University of Chicago Press.
Müller, Cornelia 2008b. What gestures reveal about the nature of metaphor. In: Alan Cienki and
Cornelia Müller (eds.), Metaphor and Gesture, 219–245. Amsterdam: John Benjamins.
Müller, Cornelia 2009. Gesture and Language. In: Kirsten Malmkjaer (ed.), The Linguistic Ency-
clopedia, 214–217. Abingdon: Routledge.
Müller, Cornelia 2010a. Mimesis und Gestik. In: Gertrud Koch, Christiane Voss and Martin Vöh-
ler (eds.), Die Mimesis und ihre Künste, 149–187. Munich: Wilhelm Fink.
Müller, Cornelia 2010b. Wie Gesten bedeuten. Eine kognitiv-linguistische und sequenzanalytische
Perspektive. Sprache und Literatur 41(1): 37–68.
Müller, Cornelia volume 2. Gestural modes of representation as techniques of depiction. In:
Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Jana
Bressem (eds.), Body – Language – Communication: An International Handbook on Multi-
modality in Human Interaction. (Handbooks of Linguistics and Communication Science
38.2.) Berlin: De Gruyter Mouton.
Müller, Cornelia and Harald Haferland 1997. Gefesselte Hände: Zur Semiose performativer
Gesten. Mitteilungen des Deutschen Germanistenverbandes 3: 29–53.
Müller, Cornelia, Silva H. Ladewig and Jana Bressem this volume. Towards a grammar of gesture:
A form based view. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David
12. Gestures as a medium of expression: The linguistic potential of gestures 217

McNeill and Sedinha Teßendorf (eds.), Body – Language – Communication: An International


Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics and Commu-
nication Science 38.1.) Berlin: De Gruyter Mouton.
Müller, Cornelia, Silva H. Ladewig and Jana Bressem volume 2. Methods of linguistic gesture analysis. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David
McNeill and Jana Bressem (eds.), Body – Language – Communication: An International Hand-
book on Multimodality in Human Interaction. (Handbooks of Linguistics and Communication
Science 38.2.) Berlin: De Gruyter Mouton.
Müller, Cornelia and Gerald Speckmann 2002. Gestos con una Valoración Negativa en la Conver-
sación Cubana. In: Lucrecia Escudero Chauvel and Monica Rector (eds.), DeSignis 3, 91–103.
Buenos Aires: Gedisa.
Saussure, Ferdinand de 1967. Grundfragen der allgemeinen Sprachwissenschaft. Berlin: De Gruyter.
Searle, John R. 1969. Speech Acts. Cambridge: Cambridge University Press.
Stokoe, William 1960. Sign Language Structure. Buffalo, NY: Buffalo University Press.
Streeck, Jürgen 1993. Gesture as communication 1: Its coordination with gaze and speech.
Communication Monographs 60(4): 275–299.
Streeck, Jürgen 1994. ‘Speech-handling’: The metaphorical representation of speech in gestures. A
cross-cultural study. Manuscript. Austin.
Streeck, Jürgen 2002. A body and its gestures. Gesture 2(1): 19–44.
Streeck, Jürgen 2009. Gesturecraft: The Manu-facture of Meaning. Amsterdam: John Benjamins.
Streeck, Jürgen this volume. Praxeology of Gestures. In: Cornelia Müller, Alan Cienki, Ellen
Fricke, Silva H. Ladewig, David McNeill and Sedinha Teßendorf (eds.), Body – Language –
Communication: An International Handbook on Multimodality in Human Interaction. (Hand-
books of Linguistics and Communication Science 38.1.) Berlin: De Gruyter Mouton.
Teßendorf, Sedinha 2008. From everyday action to gestural performance: metonymic motivations of
a pragmatic gesture. Presentation given at the Seventh International Conference on Research-
ing and Applying Metaphor (RaAM7), Cáceres (Spain).
Teßendorf, Sedinha this volume. Emblems, quotable gestures, or conventionalized body move-
ments. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and
Sedinha Teßendorf (eds.), Body – Language – Communication: An International Handbook
on Multimodality in Human Interaction. (Handbooks of Linguistics and Communication Sci-
ence 38.1.) Berlin: De Gruyter Mouton.
Wallbott, Harald G. 1982. Contributions of the German ‘expression psychology’ to nonverbal
behavior research. Part III: Gaze, gestures, and body movement. Journal of Nonverbal Behav-
ior 7(1): 20–31.
Wallbott, Harald G. and Jens B. Asendorpf 1982. Contributions of the German ‘Expression Psy-
chology’ to nonverbal communication research. Part I: Theories and concepts. In: Journal of
Nonverbal Behavior 6(3): 135–147.
Wallbott, Harald G. and Hede Helfrich 1986. Contributions of the German ‘expression psychol-
ogy’ to nonverbal behavior research. Part IV: The voice. Journal of Nonverbal Behavior 10(3):
187–204.
Waller, Bridget M. and Marcia Smith Pasqualini this volume. Analysing facial expression using the facial action coding system (FACS). In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Sedinha Teßendorf (eds.), Body – Language –
Communication: An International Handbook on Multimodality in Human Interaction. (Hand-
books of Linguistics and Communication Science 38.1.) Berlin: De Gruyter Mouton.
Wilson, Frank R. 1998. The Hand. How It Shapes the Brain, Language, and Human Culture. New
York: Pantheon Books.
Wundt, Wilhelm 1921. Völkerpsychologie: Eine Untersuchung der Entwicklungsgesetze von
Sprache, Mythus und Sitte. Erster Band. Die Sprache, 142–257. Leipzig: Engelmann.

Cornelia Müller, Frankfurt (Oder), Germany



13. Conversation analysis: Talk and bodily resources for the organization of social interaction
1. Introduction
2. The naturalistic use of audio and video recording technologies
3. The methodic mobilization of a range of resources in social interaction
4. Multimodal analyses of turns, sequences and activities
5. Conclusion
6. References

Abstract
This chapter describes the perspective of Conversation Analysis on embodied action as it is intertwined with talk. Conversation Analysis works on naturalistic video data documenting the endogenous organization of social activities in their ordinary settings. Its purpose is to study social interaction as it is organized collectively by the co-participants, in a locally situated way, and as it is built incrementally through its temporal and sequential unfolding, by mobilizing a large range of vocal, verbal, visual, and embodied resources, which are publicly displayed and monitored in situ. This chapter explicates some features of this approach. First, in order to analyze the organization of social interaction as an indexical, contingent, and emergent accomplishment, Conversation Analysis adopts a methodology for collecting data that is based on fieldwork and video recordings in naturalistic settings. Second, these data are transcribed in a detailed way, taking into consideration embodied features as they are made relevant by the participants, since they constitute the multimodal resources participants use for the emergent and local organization of their action. These multimodal resources include not only gesture but also body posture and body movement. Third, the analysis of these data focuses particularly on the sequential organization of action, which shapes the temporality of turns, sequences, and activities.

1. Introduction
Conversation Analysis (CA) looks at the endogenous organization of social activities in
their ordinary settings: it considers social interaction as it is organized collectively by
the co-participants, in a locally situated way, and as it is built incrementally through
its temporal and sequential unfolding, by mobilizing a large range of vocal, verbal,
visual and embodied resources, which are publicly displayed and monitored in situ.
The analysis of these features focuses on action as an indexical, contingent, and emergent accomplishment. This has consequences for how the empirical work is conceived and the data are collected, i.e. for what is observed and how it is documented. Conversation Analysis’ naturalistic approach demands the study of “naturally occurring activities” as they unfold in their ordinary social settings. This highlights the fact that for a detailed analysis of their relevant endogenous order, recordings of actual situated activities are necessary.

2. The naturalistic use of audio and video recording technologies


The detailed consideration of multimodal resources (gesture, gaze, head movements,
facial expressions, body postures, manipulation of objects, etc.) depends in a crucial
way on the technologies for documenting social action. As was mentioned above,
conversation analysts work on the basis of audio or video recordings documenting
naturally-occurring practices. Historically, the majority of the first studies in conversa-
tion analysis used audio recordings, and so it is no accident that the first systematic anal-
yses were based on telephone conversations. By using such data, the analyst could be
certain that participants were not relying on their mutual visual access. As Schegloff
(2002: 288) puts it:

For studying co-present interaction with sound recording alone risked missing embodied
resources for interaction (gesture, posture, facial expression, physically implemented ongo-
ing activities, and the like), which we knew the interactants wove into both the production
and the interpretation of conduct, but which we as analysts would have no access to. With
the telephone data, the participants did not have access to one another’s bodies either, and
this disparity was no longer an issue.

Nevertheless, video began to be used very early on in the history of conversation ana-
lysis. As early as 1970, Charles and Marjorie Harness Goodwin in Philadelphia carried
out film recordings of everyday dinner conversations and other social encounters. After
1973, these recordings were used by Jefferson, Sacks and Schegloff in research seminars,
and then also in published papers. In 1975, at the Annual Meeting of the American
Anthropological Association, Schegloff presented a paper on "home position", co-authored
with Sacks, who had died in a car accident a few weeks earlier. This
was an early attempt to describe bodily action systematically. In 1977, Charles Goodwin
presented his dissertation at the Annenberg School of Communications in Philadelphia
(later published as C. Goodwin 1981). The dissertation was based on approximately
50 hours of videotaped conversations in various settings (C. Goodwin 1981: 33).
Thus early work by both Emanuel Schegloff (1984) and Charles Goodwin, as well as
by Christian Heath (1986), used film materials in order to analyze and understand how,
in co-present interaction, humans mobilize, in an orderly and situated way, a large range of verbal,
auditory and visual resources in order to produce intelligible – accountable – actions, as
well as to interpret publicly displayed and mutually available actions (Streeck 1993). In
an important way, this early work was convergent with some of the assumptions made
by pioneers in gesture studies, such as Kendon (1990) and McNeill (1981), who had ar-
gued that gesture and talk are not separate “modules” for communication but originate
from the very same linguistic, cognitive and social mechanisms. Further developments of
conversation analysis in the 1980s were characterized by an increasing interest in
data gathered in institutional settings, opening up a program of comparative studies of
speech exchange systems which differ from conversation and are characterized by distinc-
tive and restrictive speakers’ rights (cf. Drew and Heritage 1992: 19; Sacks, Schegloff, and
Jefferson 1974: 729). Studies of institutional settings from the 1980s onwards cover,
among other things, court hearings, medical interactions in primary care, and news inter-
views, partially recorded on video. After the identification of the fundamental structures
of turn-taking in ordinary conversation – considered as the basic form of intersubjective
relation in the social world, the primary form of interaction to which the child is exposed
220 II. Perspectives from different disciplines

and socialized, and thus the fundamental locus for the emergence, development and
change of language – these studies engaged in a comparison between conversation and
institutional talk, the latter being seen as presenting more formal, restricted, specialized,
and constrained sets of practices for the organization of action (Drew and Heritage 1992).
Within this group of work, Heath (1986) was one of the few authors explicitly to address
multimodal features of social interaction documented by video data – such as gaze, ges-
ture, body posture, movement, manipulation of artifacts, spatial arrangements, etc. These
features become central in the so-called workplace studies emerging in the 1990s, in
which video data are central for the documentation of complex professional activities
which are characterized by the use of technologies and artifacts and by a variety of spatial
distributions of the participants. Two projects play an important role in this context: the
first is the study of an airport initiated by L. Suchman at Xerox PARC, which included
intensive videotaping of various locations, sometimes using as many as seven cameras
focused on various persons at work, but also producing close-up views of the documents
and screens they were using (C. Goodwin and M.H. Goodwin 1996: 62; Suchman 1993).
The second is the study of the line control rooms of the London Underground initiated by
C. Heath (Heath and Luff 1992; Heath and Luff 2000), focusing on the coordination of
multiple co-present and distant activities. Other settings studied included an archaeological
field site (C. Goodwin 1994, 1997) and a scientific research vessel (C. Goodwin 1995). Later on, numerous
studies followed, on surgery (Koschmann et al. 2007; Mondada 2003, 2007b), television
studios (Broth 2008, 2009), emergency calls and call centres (Fele 2008; Mondada 2008;
M. Whalen and Zimmerman 1987; J. Whalen 1995), pilots’ cockpits (Nevile 2004), etc.
This work implemented new ways of recording data, prompting a reflection about
the way in which data were collected (C. Goodwin 1993; Heath 1997; Heath, Hindmarsh,
and Luff 2010; Mondada 2006b), and about the complexity of embodied action, includ-
ing not only gesture but also object manipulations (Streeck 1996), use of documents, and
use of technologies, as well as complex forms of multi-party interaction, beyond mutually
focused encounters and within particular ecologies and environments, all demanding
specific documentation techniques and often more than one camera.

3. The methodic mobilization of a range of resources in social interaction

Conversation analysis deals with the methodic way in which participants organize social
interaction. The very notion of “method” comes from “ethnomethodology” (Garfinkel
1967), referring to the fact that participants build the intelligibility of their action in an
orderly (that is, “methodic”) and publicly recognizable (that is, “accountable”) way,
which is both systematic and indexical, both transcending context and taking into
account a diversity of contexts (both context-free and context-shaped, Heritage
1984). As a starting observation, Sacks, Schegloff and Jefferson (1974) observed that
participants interact by smoothly alternating turns at talk, minimizing both pauses
and overlaps. This prompted a general interest in the methodic practices and resources
through which this finely-tuned coordination happens. Since the very first studies, a
range of resources have been explored, concerning firstly linguistic resources, such as
syntax, prosody, and meaning, but then also embodied resources, such as gesture,
gaze, head movements, nods, facial expressions, and body postures.

Some studies focus on a single one of these resources and explore its systematic character: for
example, Schegloff (1984) studies gestures produced by speakers, C. Goodwin (1981)
absent vs. mutual gaze prompting re-starts at the beginning of the turn, Mondada
(2007a) pointing as displaying a speaker's imminent self-selection, Stivers (2008)
nods as expressing affiliation in storytelling, and Peräkylä and Ruusuvuori (2006) facial
expressions as manifesting alignment and affiliation in assessing sequences.
Other studies consider the coherent and coordinated complexity of embodied conduct:
for example, Heath (1989) considers together gaze, body posture and body ma-
nipulations, Streeck (1993) gesture and gaze, and Mondada and Schmitt (2010) and
Hausendorf, Mondada, and Schmitt (2012) the coordination of a range of multimodal
resources, from gesture to body position and the distribution of bodies in space. This
emphasis on global Gestalts also invites researchers to investigate the entire body
and its adjustments to other bodies in its environment, taking into account object ma-
nipulations and body movements within the environment (C. Goodwin 2000). Recently,
the consideration of mobility in interaction, comprising walking, driving, and flying,
has emphasized the importance of considering the entire body (Haddington, Mondada,
and Nevile 2013).
What emerges from these studies is the necessity of going beyond the study of single
"modalities" coordinated with talk and of taking into consideration the broader em-
bodied and environmentally situated organization of activities (Streeck, Goodwin, and
LeBaron 2011).
Charles Goodwin’s research stands as a prime example of a multimodal interactional
perspective: he has done pioneering work in conceptualizing social action in a multimo-
dal environment, and this work was later developed across a rich diversity of settings in
workplace studies. His early work shows how the coordination of gaze, as well as the
mutual availability of the participants, has configuring effects on turn and sequence con-
struction (C. Goodwin 1981). Later on, he shows that, in addition to talk and embodiment,
it is crucial to consider other “semiotic fields”, such as the environment and the material
artifacts that surround interactants, which constitute “contextual configurations”, and to
look at how these are used as resources for producing and recognizing actions and for ac-
complishing meaningful activities (C. Goodwin 2000). For example, C. Goodwin (2000)
shows how during a game of hopscotch a player can use the hopscotch grid as a multi-
modal resource for producing and challenging action, e.g. accusing the other player of
throwing a beanbag in the wrong square and thereby of having breached the rules. In
another example, C. Goodwin (2003) provides a detailed analysis of how a graphic struc-
ture in the soil, on which two archaeologists focus their attention, acts as a crucial
resource in the production of the complex embodied activity in which they are engaged.
C. Goodwin argues that the structure (in a similar way to the grid above) “provides orga-
nization for the precise location, shape and trajectory of the gesture” (C. Goodwin 2003:
20). Consequently, different modalities (language, gestures and the body) and the envi-
ronment elaborate upon each other, are mutually interdependent, and are relevant to
the organization of an activity in interaction.

4. Multimodal analyses of turns, sequences and activities


Conversation analysis focuses on the interactional order as it is achieved by the parti-
cipants at all levels of sequential organization. Participants engage step-by-step in the
construction of their turns, which are formatted online, in an emergent way, taking into
consideration the responses of the co-participants (C. Goodwin 1979) and the contingen-
cies of the interactional context. Turns unfold in a systematic way, based on the projec-
tions and the normative expectations characterizing the organization of the sequence,
first within the fundamental structure of the adjacency pair, constituted by a first pair
part making relevant and expectable a second pair part (as in the pair question/answer,
Schegloff and Sacks 1973), and second within possible pre-sequences and post-expansions
which make it more complex (Schegloff 2007). Participants mobilize all the resources at
hand for the intelligible, mutually accountable organization of the sequentiality of inter-
action; consequently, all the levels of sequential analysis have been explored, showing
that sequentiality is configured not only by talk but also by a range of embodied
resources.
Turn-construction and turn-taking basically rely on multimodal resources for the
organization of recognizable unit completions as well as transition-relevance places
(Lerner 2003). Speakers can display their imminent self-selection by a range of multi-
modal resources used as "turn-entry devices" – such as the "[a]-face", the palm-up gesture
(Streeck and Hartge 1992), the pointing gesture (Mondada 2007a) or other complex
embodied manifestations (Schmitt 2005) – to show that the person is about to speak.
Speakers also multimodally display the completion of turn-constructional units
(TCUs), as well as of turns, projecting completion by the trajectory of their gesture
or its retraction (Mondada 2007a), which is used as a “turn-exit device”. More generally,
Schegloff (1984: 267) suggests that the pre-positioning of gestures is a way of creating a
“projection space” within an ongoing utterance. Turns can also be expanded, not only by
adding syntactically fitted materials, but also gesturally (C. Goodwin 1979, 1981). This in-
cludes gestures achieving the collaborative construction of turns, in a similar way to the
collaborative construction of utterances (Bolden 2003; Hayashi 2005; Iwasaki 2009).
Not only turn-construction but also sequence organization relies on multimodal
resources, as demonstrated by studies on assessments (C. Goodwin and M.H. Goodwin
1987; Haddington 2006; Lindström and Mondada 2009), on word searches (M.H. Goodwin
and C. Goodwin 1986; Hayashi 2003), and on repair (Fornel 1991; Greiffenhagen and
Watson 2007), not to speak of sequences co-constructed by aphasic speakers and their
partners (C. Goodwin 2004).
Opening and closing sequences, as well as transition sequences from one activity to
another, are typically organized in an embodied way. The opening of an interaction is
achieved not only by the first words spoken or the response to a summons, but, even
before participants begin to speak, by an adjustment and arrangement of their bodies
in the material environment, in such a way that an “F-formation” (Kendon 1990) appro-
priate for the imminent activity is constituted. This in turn prompts the participants to
progressively organize and assemble their bodies within the local environment, building
the relevant “interactional space” of the encounter (Mondada 2005, 2009; Hausendorf,
Mondada, and Schmitt 2012). The participants can even be engaged in different courses
of action – or in multi-activity – as displayed, for example, by “body-torqued” postures
(Schegloff 1998), in which the upper part of the body is oriented towards a particular
participation framework and the lower part to another set of relevances. Conversely,
towards the closings of the encounter, the interactional space dissolves (De Stefani
2006, 2010; Robinson 2001). The importance of bodily movements in transitions be-
tween one episode of interaction and another, often concomitant with the manipulation
of objects and artifacts, also displays the embodied orientation of the participants
towards the organization of the interaction (Heath 1986; Modaff 2003; Mondada
2006a).
More generally, turns and sequences are closely monitored in the interactive con-
struction of actions; for example, co-participants constantly orient to recipiency
(M.H. Goodwin 1980), mutual attention (Heath 1982), and the achievement of mutual
understanding (Koschmann 2011; Sidnell 2006). This in turn opens up different oppor-
tunities to participate (C. Goodwin and M.H. Goodwin 2004), both in everyday situa-
tions (see the analyses of by-play by M.H. Goodwin 1997, or of stance and participation
by C. Goodwin 2007) and in professional ones (see the analyses of embodied participa-
tion in meetings by Ford 2008; Markaki and Mondada 2011; and Deppermann, Schmitt,
and Mondada 2010).

5. Conclusion
Conversation analysis is engaged in the naturalistic study of social interaction. It focuses
on the local mobilization of multimodal resources (gesture, gaze, head movements,
facial expressions, body postures and body movements) by the participants, who
thereby organize the public accountability and finely-tuned coordination of their ac-
tions. Audible and visible resources, relying on language and the body, are exploited –
and also transformed in their exploitation – in both a situated and a systematic way
within sequentiality. Sequentiality concerns turn design, the building of sequences,
action formats and the organization of larger activities.
On the basis of video recordings of naturally occurring social activities, the conver-
sation-analytic multimodal approach to social interaction has encompassed an increasingly
wide and complex range of resources in accounting for different levels of organization,
starting from gesture and gaze, then integrating head movements and facial expressions,
and finally taking more and more seriously the issues of position, arrangement and
movement of bodies in interaction. The analysis of these complex Gestalts deals not
only with everyday face-to-face conversation, but also with ordinary mobile interac-
tions – such as walking together – and interactions among larger groups of participants
like debates or public assemblies. It also describes interaction in activities mediated by
technologies and involving the manipulation of objects, artifacts and tools.
In this sense, conversation analysis deals with the study of the complexity of human
activity, including the widest range of multimodal resources, as they are considered,
produced and interpreted locally by the participants engaged in action.

6. References
Bolden, Galina B. 2003. Multiple modalities in collaborative turn sequences. Gesture 3(2):
187–212.
Broth, Mathias 2008. The studio interaction as a contextual resource for TV-production. Journal of
Pragmatics 40(5): 904–926.
Broth, Mathias 2009. Seeing through screens, hearing through speakers: Managing distant studio
space in television control room interaction. Journal of Pragmatics 41: 1998–2016.
Deppermann, Arnulf, Reinhold Schmitt and Lorenza Mondada 2010. Agenda and emergence: Con-
tingent and planned activities in a meeting. Journal of Pragmatics 42: 1700–1712.
De Stefani, Elwys 2006. Le chiusure conversazionali nell’interazione al banco di un supermercato.
In: Yvette Bürki and Elwys De Stefani (eds.), Trascrivere la Lingua. Dalla Filologia all’Analisi
Conversazionale, 369–403. Bern: Lang.
De Stefani, Elwys 2010. Reference as an interactively and multimodally accomplished practice:
Organizing spatial reorientation in guided tours. In: Massimo Pettorino, Antonella Giannini,
Isabella Chiari and Francesca Dovetto (eds.), Spoken Communication, 137–170. Newcastle
upon Tyne: Cambridge Scholars Publishing.
Drew, Paul and John Heritage 1992. Talk at work: Interaction in institutional settings. In: Paul
Drew and John Heritage (eds.), Talk at Work, 418–469. Cambridge: Cambridge University
Press.
Fele, Giolo 2008. The collaborative production of responses and dispatching on the radio: Video
analysis in a medical emergency call centre. Forum: Qualitative Social Research 9(3): art. 40.
Ford, Cecilia 2008. Women Speaking Up: Getting and Using Turns in Workplace Meetings. New
York: Palgrave.
Fornel, Michel de 1991. De la pertinence du geste dans les séquences de réparation et d’interrup-
tion. In: Bernard Conein, Michel de Fornel and Louis Quéré (eds.), Les Formes de la Conver-
sation, Vol. 2, 119–154. Paris: CNET.
Garfinkel, Harold 1967. Studies in Ethnomethodology. Englewood Cliffs, NJ: Prentice-Hall.
Goodwin, Charles 1979. The interactive construction of a sentence in natural conversation. In:
George Psathas (ed.), Everyday Language: Studies in Ethnomethodology, 97–121. New York:
Irvington.
Goodwin, Charles 1981. Conversational Organization: Interaction between Speakers and Hearers.
New York: Academic Press.
Goodwin, Charles 1993. Recording interaction in natural settings. Pragmatics 3(2): 181–209.
Goodwin, Charles 1994. Professional vision. American Anthropologist 96(3): 606–633.
Goodwin, Charles 1995. Seeing in depth. Social Studies of Science 25(2): 237–274.
Goodwin, Charles 1997. The blackness of black: Color categories as situated practice. In: Barbara
Burge, Clotilde Pontecorvo, Lauren Resnick and Roger Säljö (eds.), Discourse, Tools and Rea-
soning: Essays on Situated Cognition, 111–140. New York: Springer Verlag.
Goodwin, Charles 2000. Action and embodiment within situated human interaction. Journal of
Pragmatics 32: 1489–1522.
Goodwin, Charles 2003. The semiotic body in its environment. In: Justine Coupland and Richard
Gwyn (eds.), Discourses of the Body, 19–42. New York: Palgrave/Macmillan.
Goodwin, Charles 2004. A competent speaker who can’t speak: The social life of aphasia. Journal
of Linguistic Anthropology 14(2): 151–170.
Goodwin, Charles 2007. Participation, stance and affect in the organization of activities. Discourse
and Society 18(1): 53–73.
Goodwin, Charles and Marjorie Harness Goodwin 1987. Concurrent operations on talk: Notes on
the interactive organization of assessments. Pragmatics 1(1): 1–55.
Goodwin, Charles and Marjorie Harness Goodwin 1996. Seeing as a situated activity: Formulating
planes. In: David Middleton and Yrjö Engestrom (eds.), Cognition and Communication at
Work, 61–95. Cambridge: Cambridge University Press.
Goodwin, Charles and Marjorie Harness Goodwin 2004. Participation. In: Alessandro Duranti
(ed.), A Companion to Linguistic Anthropology, 222–244. Oxford: Blackwell.
Goodwin, Marjorie Harness 1980. Processes of mutual monitoring implicated for the production
of description sequences. Sociological Inquiry 50(3–4): 303–317.
Goodwin, Marjorie Harness 1997. By-play: Negotiating evaluation in storytelling. In: John Baugh,
Crawford Feagin, Gregory. R. Guy and Deborah Schiffrin (eds.), Towards a Social Science
of Language: Papers in Honour of William Labov, Volume 2, 77–102. Philadelphia: John
Benjamins.
Goodwin, Marjorie Harness and Charles Goodwin 1986. Gesture and coparticipation in the activ-
ity of searching for a word. Semiotica 62(1–2): 51–75.
Greiffenhagen, Christian and Rod Watson 2007. Visual repairables: Analyzing the work of repair
in human-computer interaction. Visual Communication 8(1): 65–90.
Haddington, Pentti 2006. The organization of gaze and assessments as resources for stance taking.
Text & Talk 26(3): 281–328.
Haddington, Pentti, Lorenza Mondada and Maurice Nevile (eds.) 2013. Mobility and Interaction.
Berlin: De Gruyter.
Hausendorf, Heiko, Lorenza Mondada and Reinhold Schmitt (eds.) 2012. Raum als Interaktive
Ressource. Tübingen, Germany: Narr.
Hayashi, Makoto 2003. Joint Utterance Construction in Japanese Conversation. Amsterdam: John
Benjamins.
Hayashi, Makoto 2005. Joint turn construction through language and the body: Notes on embodi-
ment in coordinated participation in situated activities. Semiotica 156(1/4): 21–53.
Heath, Christian 1982. The display of recipiency: An instance of sequential relationship between
speech and body movement. Semiotica 42: 147–167.
Heath, Christian 1986. Body Movement and Speech in Medical Interaction. Cambridge: Cambridge
University Press.
Heath, Christian 1989. Pain talk: The expression of suffering in the medical consultation. Social
Psychology Quarterly 52(2): 113–125.
Heath, Christian 1997. Analysing work activities in face to face interaction using video. In: David
Silverman (ed.), Qualitative Research. Theory, Method and Practice, 266–282. London: Sage.
Heath, Christian, Jon Hindmarsh and Paul Luff 2010. Video in Qualitative Research. London:
Sage.
Heath, Christian and Paul Luff 1992. Collaboration and control: Crisis management and multime-
dia technology in London Underground Line Control Rooms. Journal of Computer Supported
Cooperative Work 1(1–2): 69–94.
Heath, Christian and Paul Luff 2000. Technology in Action. Cambridge: Cambridge University
Press.
Heritage, John C. 1984. Garfinkel and Ethnomethodology. Cambridge: Polity Press.
Iwasaki, Shimako 2009. Initiating interactive turn spaces in Japanese conversation: Local projec-
tion and collaborative action. Discourse Processes 46: 226–246.
Kendon, Adam 1990. Conducting Interaction: Patterns of Behavior in Focused Encounters. Cam-
bridge: Cambridge University Press.
Koschmann, Timothy (ed.) 2011. Understanding Understanding. Special issue of Journal of Prag-
matics 43.
Koschmann, Timothy, Curtis LeBaron, Charles Goodwin, Alan Zemel and Gary Dunnington
2007. Formulating the triangle of doom. Gesture 7(1): 97–118.
Lerner, Gene H. 2003. Selecting next speaker: The context-sensitive operation of a context-free
organization. Language in Society 32: 177–201.
Lindström, Anna and Lorenza Mondada (eds.) 2009. Research on Language and Social Interaction.
(Special issue on Assessments) 42: 4.
Markaki, Vassiliki and Lorenza Mondada 2011. Embodied orientations towards co-participants in
multinational meetings. Discourse Studies 13(6): 1–22.
McNeill, David 1981. Action, thought, and language. Cognition 10: 201–208.
Modaff, Daniel P. 2003. Body movement in the transition from opening to task in doctor-patient
interviews. In: Philipp Glenn, Curtis D. LeBaron and Jenny Mandelbaum (eds.), Studies in
Language and Social Interaction: In Honor of Robert Hopper, 411–422. Mahwah, NJ: Erlbaum.
Mondada, Lorenza 2003. Working with video: How surgeons produce video records of their ac-
tions. Visual Studies 18(1): 58–72.
Mondada, Lorenza 2005. La constitution de l’origo déictique comme travail interactionnel des
participants: Une approche praxéologique de la spatialité. Intellectica 2–3(41–42): 75–100.
Mondada, Lorenza 2006a. Participants’ online analysis and multimodal practices: Projecting the
end of the turn and the closing of the sequence. Discourse Studies 8: 117–129.
Mondada, Lorenza 2006b. Video recording as the reflexive preservation-configuration of phenom-
enal features for analysis. In: Hubert Knoblauch, Jürgen Raab, Hans-Georg Soeffner and
Bernt Schnettler (eds.), Video-Analysis. Methodology and Methods. Qualitative Audiovisual
Data Analysis in Sociology, 51–68. Frankfurt am Main: Lang.
Mondada, Lorenza 2007a. Multimodal resources for turn-taking: Pointing and the emergence of
possible next speakers. Discourse Studies 9(2): 195–226.
Mondada, Lorenza 2007b. Operating together through videoconference: Members’ procedures ac-
complishing a common space of action. In: Stephen Hester and David Francis (eds.), Orders of
Ordinary Action, 51–68. Aldershot: Ashgate.
Mondada, Lorenza 2008. Using video for a sequential and multimodal analysis of social interac-
tion: Videotaping institutional telephone calls. FQS (Forum: Qualitative Sozialforschung /
Forum: Qualitative Social Research) (www.qualitative-research.net/) 9(3).
Mondada, Lorenza 2009. Emergent focused interactions in public places: A systematic analysis of
the multimodal achievement of a common interactional space. Journal of Pragmatics 41:
1977–1997.
Mondada, Lorenza and Reinhold Schmitt (eds.) 2010. Situationseröffnungen: Zur Multimodalen
Herstellung Fokussierter Interaktion. Tübingen: Narr.
Nevile, Maurice 2004. Beyond the Black Box: Talk-in-Interaction in the Airline Cockpit. Aldershot:
Ashgate.
Peräkylä, Anssi and Johanna Ruusuvuori 2006. Facial expression in an assessment. In: Hubert
Knoblauch, Jürgen Raab, Hans-Georg Soeffner and Bernt Schnettler (eds.), Video-Analysis.
Methodology and Methods. Qualitative Audiovisual Data Analysis in Sociology, 127–142.
Frankfurt am Main: Peter Lang.
Robinson, Jeffrey 2001. Closing medical encounters: Two physician practices and their impli-
cations for the expression of patients’ unstated concern. Social Science & Medicine 53: 639–656.
Sacks, Harvey, Emanuel A. Schegloff and Gail Jefferson 1974. A simplest systematics for the orga-
nization of turn-taking for conversation. Language 50: 696–735.
Schegloff, Emanuel A. 1984. On some gestures’ relation to talk. In: J. Maxwell Atkinson and John
Heritage (eds.), Structures of Social Action, 266–296. Cambridge: Cambridge University Press.
Schegloff, Emanuel A. 1998. Body torque. Social Research 65(3): 535–586.
Schegloff, Emanuel A. 2002. Beginnings in the telephone. In: James E. Katz and Mark Aakhus
(eds.), Perpetual Contact: Mobile Communication, Private Talk, Public Performance, 284–
300. Cambridge: Cambridge University Press.
Schegloff, Emanuel A. 2007. Sequence Organization in Interaction: A Primer in Conversation Ana-
lysis, Volume 1. Cambridge: Cambridge University Press.
Schegloff, Emanuel A. and Harvey Sacks 1973. Opening up closings. Semiotica 8: 289–327.
Schmitt, Reinhold 2005. Zur multimodalen Struktur von turn-taking. Gesprächsforschung 6: 17–61
(www.gespraechsforschung-ozs.de).
Sidnell, Jack 2006. Gesture in the pursuit and display of recognition: A Caribbean case study.
Semiotica 156(1–4): 55–87.
Stivers, Tanya 2008. Stance, alignment and affiliation during storytelling: When nodding is a token
of affiliation. Research on Language and Social Interaction 41(1): 31–57.
Streeck, Jürgen 1993. Gesture as communication I: Its coordination with gaze and speech. Com-
munication Monographs 60: 275–299.
Streeck, Jürgen 1996. How to do things with things: Objets trouvés and symbolization. Human
Studies 19(4): 365–384.
Streeck, Jürgen and Ulrike Hartge 1992. Previews: Gestures at the transition place. In: Peter Auer
and Aldo di Luzio (eds.), The Contextualization of Language, 135–157. Amsterdam: John
Benjamins.
Streeck, Jürgen, Charles Goodwin and Curtis LeBaron (eds.) 2011. Embodied Interaction: Lan-
guage and Body in the Material World. Cambridge: Cambridge University Press.
Suchman, Lucy 1993. Technologies of accountability: Of lizards and airplanes. In: Graham Button
(ed.), Technology in Working Order: Studies of Work, Interaction and Technology, 113–126.
London: Routledge.
Whalen, Jack 1995. Expert systems versus systems for experts: Computer-aided dispatch as a
support system in real-world environments. In: Peter J. Thomas (ed.), The Social and Interac-
tional Dimensions of Human-Computer Interfaces, 161–183. Cambridge: Cambridge University
Press.
Whalen, Marylin and Don H. Zimmerman 1987. Sequential and institutional contexts in calls for
help. Social Psychology Quarterly 50: 172–185.

Lorenza Mondada, Lyon (France), Basel (Switzerland)

14. Ethnography: Body, communication, and cultural practices

1. Ethnography as a method
2. Early research on cultural differences in bodily practices and representations
3. Newer developments
4. The body as object
5. The body as means of expression
6. References

Abstract
This article gives an overview of studies that approach the body and its role for commu-
nication and cultural practices from an ethnographic perspective. First, the ethnographic
perspective itself is presented as a cultural practice that dates back to Antiquity and con-
sists in describing alien bodies and communicative practices. Secondly, the article presents
the early anthropological interest in cultural differences in the uses and representations of
the body that started with studies by Franz Boas in the US and Marcel Mauss in France.
Most innovatively, Gregory Bateson began to use videotaping as an empirical starting
point, thus deviating from the literary orientation of classical ethnography. Thirdly, the
article provides a summary of newer developments in the field of anthropology that
address the body and its cultural production from the perspectives of ritual, medical and
political anthropology as well as from the anthropology of the senses. Fourthly, ethnogra-
phers have often reported that bodies have become the object of estheticization in art, but
also through bodily mutilation. Fifthly and finally, since bodies are extensively used for
communicative purposes, the article summarizes recent findings of cultural differences in
gestural communication detected through the detailed studies of video footage.

1. Ethnography as a method
As a rule, the term "ethnography" refers to the practices and products of describing
alien peoples, their folkways, and their cultural achievements. Commonly, Herodotus of
Halicarnassus, who in his “Histories” described, amongst others, the appearances and
cultural particularities of the "Androphagi" of Northwestern Europe, the "Hyperboreans"
of Northeastern Europe, and the "Macrobians" of Northeastern Africa, is men-
tioned as founding father of this tradition. His descriptions frequently refer to the
bodies of the “Others,” depicting them as “headless and breast-eyed,” “diminutive”
or “giant,” or as the “tallest and handsomest of all men” (cf. Asheri 2007). The commu-
nicative abilities of the others were also among the issues that appeared worthy of men-
tion. The “dog-headed” people of the Indian mountains, e.g., are said by early traveller
Ctesias in the 5th century BC to “speak no language, but bark like dogs, and in this
manner make themselves understood by each other. (…) They understand the Indian
language but are unable to converse, only barking or making signs with their hands
and fingers by way of reply” (Freese 1920: 115).
Ethnography as practiced in the scientific anthropology that emerged in the 19th and 20th centuries classically adopts a holistic standpoint: instead of merely listing extraordinary cultural phenomena, as earlier accounts did, it relates small case descriptions to broader cultural information in order to detect the logical interconnectedness
of cultural phenomena. For example, communicative practices are related to social
hierarchies, political power structures, and religious values or to models derived from
modes of economic subsistence so that they become comprehensible for outsiders.
“Ethnography,” however, also relates to a specific research method that is similarly
known as participant observation and fieldwork. Ever since Malinowski (1922) launched this research practice, participant observation has aimed at accessing the "native's point of view," i.e., at gaining first-hand insider knowledge by participating in the everyday life of the people studied. To achieve this, the ethnographer,
firstly, invests a lot of time in his or her research, one year of fieldwork often being stan-
dard. Second, the ethnographer also engages with his or her own body as research
instrument in order to gain first-hand experiences that permit him or her to closely
understand the members of the group under study (cf. Madden 2010).
The preferred type of data used by ethnographers is composed of field notes and dia-
ries, which, in a textual form, preserve the experiences and observations made by the
ethnographer in the field. Ethnographic interviews often complement these data.
In the early 1960s, contemporaneously with the emergence of interactional sociology, ethnomethodology, conversation analysis, sociolinguistics, and other approaches,
the Ethnography of Speaking (later ethnography of communication) was launched. In a
foundational paper, Hymes (1962) called for a research program that focuses on “the
situations and uses, the patterns and functions, of speaking” not only as a methodolog-
ical device, but as a new scientific focus on it as a social and cultural “activity in its own
right” (1962: 16). In his outline, Hymes accorded two overarching goals to his endeavor.
First, it was a decidedly empirical enterprise, i.e. it called for the production of thorough
ethnographic reports about “the many different ways of speaking” (Sherzer and Dar-
nell 1972: 549) that exist within every community. This call was extensively answered
by a number of scholars, some of whom had been Hymes' students.
However, Hymes did not envisage a merely descriptive project. Second, he argued
that, in the end, “systematic descriptions can give rise to a comparative study of the
cross-cultural variations in a major mode of human behavior (a “comparative speaking”
beside comparative religion, comparative law, and the like), and give it its place in
theory; for the contribution to other kinds of concern, such as studies of the formation
of personality in early years” (Hymes 1962: 17).
14. Ethnography: Body, communication, and cultural practices 229

The ethnographic descriptions that resulted from Hymes’ appeal addressed all kinds
of topics that a “speech event” might encompass. The methodological strategy to focus
on speech events entails that neither actors, nor structures, nor situations, nor types of
behavior, but instead realized instances of verbal exchange are viewed as the units of
inquiry. Gumperz has defined speech events as “units of verbal behavior bounded in
time and space” (Gumperz 1982: 165) and “longer strings of talk each of which is
marked by a beginning, middle and an end” (Gumperz 1992: 44).
In order to methodologically approach speech events, Hymes (1972: 58–65), in his
famous mnemonic S-P-E-A-K-I-N-G acronym, has named eight elements to be focused
on: Setting, Participants, Ends, Acts, Keys (in the Goffmanian sense as framing de-
vices), Instrumentalities, Norms, and Genres. Thus, the ethnography of speaking and
communication was projected in a broad fashion so as to encompass bodily practices
as well.
Most of the ethnographic descriptions produced in this tradition provide detailed ac-
counts of speech events and genres, as well as of models and taxonomies of speaking
concepts. But, as critical voices have objected, they failed to advance to the comparative
level of the Hymesian endeavor and lost themselves in providing detailed descriptions
(see Duranti 1988; Keating 2001). In contrast to conversation analysis, the ethnography of speaking adopts a relativist stance, resisting all attempts at generalization with counter-examples from the people they study. The ethnography of speaking has also been criticized for a non-empiricist stance insofar as its studies only seldom actually quote unedited talk. As Moerman (1988: 10) states, for example, in Bauman
and Sherzer’s seminal volume (1974), only two chapters do so, one of them being
provided by conversation analyst Harvey Sacks.
Only in recent years, especially in a new approach called "micro-ethnography," have recordings and thorough analyses of situations unaffected by the researcher also been included as data in ethnographic descriptions. The term "micro-ethnography"
had first been used by scholars who aimed at studying “ ‘big’ social issues through an
examination of ‘small’ communicative behaviour” (LeBaron 2005: 494; cf. Streeck
and Mehus 2005: 381), as present, for example, in the moment-by-moment behaviors
whereby social stratification was accomplished in the classroom (first Smith 1966).
That is, micro-ethnographers pay thorough attention to the “small means whereby
events were jointly accomplished” which they see as “the building blocks of micro-
cultures enacted and constituted collectively” (LeBaron 2005: 494). “Micro-ethnography”
today has adopted an ethnomethodological and praxeological orientation (Streeck and
Mehus 2005: 385–386, 387–393) and aims at identifying the methods of establishing
order and the means of co-constructing meaning in the minutiae of social interaction.
Thus, in contrast to classical ethnography, which has a long-standing tradition of in-
cluding the ethnographer’s subjectivities and literary skills in ethnographic descriptions,
the micro-ethnographic approach is radically empiricist. It tries to base any analysis on
concrete hard data provided in particular by video recordings and complemented by ob-
servations, interviews, and document research. Communication and interaction are con-
sidered as constitutive for social reality. The analysis of such processes, constituted by
the visible and audible behaviors of social actors and embedded in a specific social
and material environment, is therefore its chief goal. Human bodies in interaction
are viewed not as intentional actors or physical apparatuses that enclose inner states,
but as living persons who are enculturated and socialized, thus embodying cultural
principles and preferences in their bodily practices. Micro-ethnographers thus assume
that social and cultural macro-phenomena are embodied and sustained in micro-
processes of interaction. As Schegloff (1988: 100) puts it, “relative to their domain
[the entities microethnographers are concerned with] are not ‘micro,’ and the elements
of conduct taken up in their analyses are not ‘details,’ i.e. small relative to the normal
size of objects in that domain. They are just the sort of building blocks out of which
talk-in-interaction is fashioned by the parties to it; they are the ordinary size.”

2. Early research on cultural differences in bodily practices and representations
As mentioned at the beginning of this text, people have always been curious about the
bodies and bodily practices of the others they encountered while traveling for eco-
nomic, political, or other reasons. Hewes (1974) has provided an account of bodily prac-
tices, especially of gestures, that are reported from situations of cultural contact in the
context of the European Age of Discovery. The descriptions mainly refer to emblematic
gestures used for typical human needs such as drinking and eating, but also depict some
spontaneous ways of conveying ideas with bodily means. Frobisher, for example, who
explored the Arctic islands north of Hudson’s Bay in the late 16th century, reports
from the interactions with the local Inuit that they communicated to him by “making
signes with three fingers, and pointing to the Sunne, that they meant to returne within
3 days, untill which time we heard no more of them, & about the time appinted they
returned …” (quoted in Hewes 1974: 10). Of their communicative problems, he stated:
“And if they have not seene the thing whereof you aske them, they will wincke, or
cover their eyes with their hands, as who would say, it hath been hid from their
sight. If they understand you not whereof you aske them, they will stop their eares”
(Hewes 1974: 11).
Professional anthropologists in later times, however, did not extensively study the
body culture of the societies they were engaged with. One reason for this is that this
task was usually taken over by physical anthropologists who studied “racial” variations
in anatomy and their relation to culture. A further reason for the exclusion of this topic might be that cultural anthropology was born precisely out of a profound skepticism
towards biological explanations of culture (cf. Boas 1914), so that, as a counter-reaction,
cultural anthropology was more interested in the mental worlds of the people studied
than in their bodies. Only in the late 20th century, with the premise of viewing the body as a cultural and social construction, did the anthropology of the body become an important
issue in anthropology again.
There were, however, some classical anthropological topics such as ritual and dance
that also implied studying bodily practices. Boas, for example, studied the winter dances
of the Kwakiutl of Northwestern America (1897: 431–499). While at the beginning he
sometimes used his own body to mimetically represent these dances ethnographically
in museums (Hinsley and Holm 1976), he later on, in 1930, also used a camera to film
them in the field (Ruby 1980). Boas’ student Efron ([1941] 1972) was among the first
to study gesture from a cultural anthropological perspective. In his book on the adaptation processes of the gesture culture of Jews on the Lower East Side of New York City, he described such culture-specific gestural forms as "buttonholing" (i.e., boring with
the index-finger through the buttonhole of the coat of the interlocutor) along with
idiosyncratic “contactual” and other gestures (Efron 1972: 91–92).
In France, Durkheim (1912) emphasized that the human body is the central object
of society. According to him, society is only able to exist as a collective entity because
it is able to appropriate the sensing bodies of its individuals. This is mainly achieved
through rituals in which society conveys specific sentiments to the bodily individual
while the human individual gains experiences that convince it of the power and profit
of society. Durkheim’s student, Marcel Mauss ([1935] 1973: 73), even more explicitly
treated the human body as the “main locus of culture” and drew attention to the phe-
nomena of its cultural training, which he, preceding Bourdieu, called “habitus.” His
ideas became central for any ethnography of the body: Cultures develop, as Mauss
(1973: 75) views it, “techniques of the body,” using the body both as “man’s first and
most natural technical object, and at the same time technical means.” Every act, how-
ever asocial, or even antisocial, is culturally laden; “there is perhaps no ‘natural way’
for the adult” (1973: 74). For example, “[t]he positions of the arms and hands while
walking form a social idiosyncrasy, they are not simply a product of some purely
individual, almost completely psychical arrangements and mechanisms” (Mauss 1973:
72). Concerning embodied knowledge and abilities, as Mauss (1973: 83) states, the
anthropologist reaches the limits of understanding: "Nothing makes me so dizzy as
watching a Kabyle going downstairs in Turkish slippers (babouches). How can he
keep his feet without the slippers coming off? I have tried to see, to do it, but I can’t
understand.”
Thus, with his new attention to the body as instrument of “cultivation,” Mauss iden-
tified a whole “new field of studies: masses of details which have not been observed, but
should be, constitute the physical education of all ages and both sexes” (1973: 78).
Mauss, however, only focused on the “techniques of the body” in singular activities
and, apart from military marching, did not pay attention to processes in which several
bodies are coordinated in their activities. He largely neglected those practices in which
processes of intersubjectivation are generated and performed by the socialized and
enculturated body.
Another student of Durkheim’s, Robert Hertz (1907), made an early study –
preceding structuralist approaches – about the power of collective representations to
unite groups. Drawing on funeral rituals, he developed the theory that these represen-
tations often refer to universal body schemata such as right and left. These obvious
physical orderings are subsequently transferred to culture-specific symbolic meanings
such as good and bad, right and wrong, virtuous and sinful, endowing them with an
apparent naturalness.
From a methodological perspective, interestingly, ethnographic film was used in France much earlier than in the US. In fact, its ethnographic usage is as old as
cinema itself: in 1895, the same year the Lumière brothers held the world’s first public
film screenings, French physician Felix-Louis Regnault filmed the pottery-making tech-
niques of a Wolof woman at the Exposition Ethnographique de l’Afrique Occidentale in
Paris. Subsequent films were devoted to the cross-cultural study of movement: climbing
a tree, squatting, walking, by Wolof, Fulani, and Diola men and women. Regnault also
published a methodological paper, which differentiated his aims from those of the Lu-
mière brothers (Regnault 1931). In contrast to them, Regnault regarded the camera as a
scientific instrument that was able to fix transient human events for further analysis.
He went so far as to claim that ethnography only attains the precision of a science
through the use of such instruments (MacDougall 1998: 179) and also proposed the for-
mation of anthropological film archives (Brigard 1975: 15–16). However, it took some
time before these ideas were realized.
In the US-American tradition, Boas’ initiative to study the body from a non-
biological perspective was continued, for one, by Erving Goffman and his interactionist
sociology (1963, 1967). Goffman, however, stood in an even closer relationship to the
traditions of the Chicago School (Park, Hughes), Durkheim, and Julian Huxley
when he studied the dramaturgies of the human persona and its body in public and private spaces, in total institutions and loose encounters, thereby focusing on face-to-face
interaction and applying ethnographic methods.
More directly related to Boas was the work of Gregory Bateson and Margaret
Mead (1942), Mead having been a student of Boas, and Bateson having studied
with Alfred Haddon, who was the first to use the technique of filming in the field
during the famous Torres Straits Expedition (1898/1899) (Brigard 1975: 16). Bateson
and Mead were the first to fully exploit the potentials of visual methods in ethno-
graphic research in what they termed an “experimental innovation” (1942: xi).
They took thousands of photographs and shot footage for several ethnographic
films in a Balinese village in order to use them as a primary research method and not, as had sometimes been done before, as mere visual illustrations for written accounts. By artfully arranging these photographs in their monograph and by examining their films frame by frame, they attempted to reveal the unspoken, even the
unspeakable aspects of culture so as to eventually get knowledge of what it means
and how it feels to be a Balinese body, and to engage with the world through Bali-
nese senses (cf. Streeck n.d.). Some time earlier, Bateson (1936) had used the term
“ethos” for this tacit bodily sociality and culturality.
The experimental methodological endeavor in their joint book had grown out of
their dissatisfaction with the established descriptive ethnography that they found
“far too dependent upon idiosyncratic factors of style and literary skill” (Bateson
and Mead 1942: xi). Their own project, in contrast, was designed to help solve the difficulties of the ethnographic method "to communicate those intangible aspects of culture which had been vaguely referred to as its ethos" (Bateson and Mead 1942:
xi; orig. emph.). As they found many cultural phenomena in Bali to be tacit (most prominently, perhaps, the practices of education, to which they dedicated a significant part of their book), Bateson and Mead felt that they had found in visual analysis a way to show how the Balinese "through the way in which they, as living persons, moving, standing, eating, sleeping, dancing, and going into trance, embody culture, how gesture and posture are expressive of a people's character" (Bateson and Mead 1942: xii).
Bateson was later an important part of what came to be known as the “Natural His-
tory of an Interview” (NHI) group. This group constituted itself around the project of
experimentally interpreting a video recording of a psychiatric interview (provided by
Bateson) from a range of different disciplinary perspectives, including psychiatry, linguistics, and anthropology. They paid attention not only to what was said in this interview, but particularly to how it was said, including bodily ways of communication.
A number of approaches that later on became quite influential, such as, for example,
Ray Birdwhistell’s “kinesics,” Albert Scheflen’s “context analysis,” Adam Kendon’s
interactional approach to gesture, Erik Erikson's theory of psychosocial development, or Starkey Duncan's social psychology of face-to-face interaction, grew out of,
or were influenced by, this joint experiment (cf. Leeds-Hurwitz 1987).
Thus, the Natural History of an Interview initiative paved the way for those approaches in visual anthropology that aimed at using camera and film not only as a documentary instrument for illustrative, educational, and archival reasons (as is usually practiced in "ethnographic filmmaking"), but as an observation device for methodological
and analytical purposes (as applied in interaction and practice research).

3. Newer developments
A branch of research that – following Durkheim – took a relatively early interest in the human body and its practices is ritual studies. Particularly after the focus of ritual
studies turned away from asking for meaning towards concentrating on practice (see
Bell 1992), students of ritual began to include bodily practices into their scope of
research. For example, bodily practices in initiation rituals (such as “ritualized homo-
sexuality” [Herdt 1993] in Melanesian societies), the dissolution of bodily integrity in
illness and trance healing (as, e.g., breath, pain, and emotion in the shamanistic prac-
tices of the Sherpa in the Nepal Himalayas, cf. Desjarlais 1992) or the transformation
of living bodies into dead matter (Connor 1995) were studied.
Ritual studies triggered the creation of the anthropology of performance in the 1960s
that, after decades in which the mental life was granted theoretical privilege, focused
on the bodily-material aspects of culture, such as dance and movement, but also on
face-to-face interaction, verbal practices, as well as acting and theatrical performance
(Schechner 1988). Ethnographers of performance often study aesthetic languages of
bodies across cultural divides. Hahn (2007), in a study of traditional Japanese
dance, pays particular attention to the ways of body-to-body transmission of tacit
knowledge, and how they influence the sense of self. She argues that bodily knowledge
contributed to the construction of “boundaries of existence” that define physical and
social worlds. Sometimes, bodily practices are related to memory. Nelson (2008), for
example, examines how contemporary Okinawa Japanese storytellers, musicians, and
dancers engage with the legacies of a cruel Japanese colonial era and the devastations
of World War II by recalling memories, restructuring bodily experiences and practices,
and rethinking actions as they work through the past in order to re-arrange the
present.
A second contemporary anthropological sub-discipline interested in bodily practices
is medical anthropology. Its main objective consists in describing culture-specific as-
sumptions about normality and the conceptualization of health and illness, thus arguing
against the universal reification of disease entities advanced by the natural sciences.
Medical anthropology ethnographically analyses ideas about health (often including
or even centering on psychic health) as well as the healthy body, its substances, and
its epistemic state (McCallum 1996). Scholars such as Turner (1968) insist that the effi-
cacy of healing rituals is grounded in the cultural framing of subjectively experienced
sensations. To this effect, Farmer, e.g., interprets “bad blood” and “spoiled milk” in
rural Haiti as “moral barometers that submit private problems to public scrutiny”
(Farmer 1988: 62). Health and illness are thus sometimes seen as extensions of the
individual body to society and its morals (consider, e.g., discussions about AIDS as
divine retribution).
A third topic addressed by the ethnography of the body is concerned with the ways
by which human bodies are positioned in space. Especially in hierarchical societies,
these positionings are guided by norms that imply clear social roles endowed with rights
and duties of expression (see Duranti 1994; Wolfowitz 1991). Moreover, as Duranti
(1992) has stated, words, body movements, and lived space are sometimes – as in West-
ern Samoa – interconnected in interactional practice. The words used in ceremonial
greetings, as he says, cannot be fully understood without reference to sequences of
acts that include bodily movements in reference to symbolically laden space, as the per-
formance of ceremonies is located in and at the same time constitutive of the sociocul-
tural organization of space inside the house. According to Duranti’s theory, Samoan
practices of “sighting” embody language and space through “an interactional step
whereby participants not only gather information about each other and about the setting but also engage in a negotiated process at the end of which they find themselves
physically located in the relevant social hierarchies and ready to assume particular
institutional roles” (Duranti 1992: 657).
Thus, by way of cross-cultural comparison, the anthropology of the body has identified variations of body concepts and practices and their relation to culture. Often its findings also entail a critique of Western epistemological categories, such as those inherent in the Cartesian dualism of body and soul, which have proved not to be universal (cf. Lock 1993).
Further and more recent studies on the body include such topics as the cultural con-
struction of the self and the other, the forms and functions of emotions, specific cultures
of subjectivation, concepts and constructions of gender, as well as modalities and forms
of power and resistance as they are embedded (or, to put it in the Foucauldian jargon of
these approaches, “inscribed”) in the body (cf., e.g., Butler 1993; Lock 1993). “Body
politics” has thus developed to be an important concept of those anthropological orien-
tations, even though the methods applied in these fields often do not meet the standard
expected by scholars who are used to dealing with detailed transcriptions of social events
(cf. Antaki et al. 2003).
Part of the newer anthropology of the body is, furthermore, the anthropology of the
senses that has emerged since the 1990s. In reaction to media studies that claimed the
superiority of cultures that structurally emphasize the visual sense (cf. Classen 1997),
anthropological studies were conducted that explore the whole range of the sensory
practices in other cultures, focusing on the tactile, gustatory, and olfactory but also
on the visual and auditory realm (cf. Howes 1991, 2004). They even study variations
of the “sixth sense” (Howes 2009) and contrast the Aristotelian “five sense sensorium”
with different cultural models of up to 17 senses (Synnott 1993: 155). Some of these stu-
dies also explore the usage of the senses within practical activities in a very detailed
manner (see, e.g., Goodwin 1994).
Most recently, and with great methodological rigor, developments in medical technology, for example in surgery or genetic manipulation, that have influenced the ways in which bodies are conceptualized have become an important topic for ethnographers of the
body. Some of these studies address the ways bodies are dissociated and reconfigured
in the operation room when part of their expressive functions are delegated to technol-
ogies (e.g. Mol 2001). Other studies deal with changes in body concepts that are
provoked by an increasing blending of human bodies and technologies based on
robotic developments (e.g., Haraway 1991; Hogle 2005).

4. The body as object


In the ethnographic tradition, the body has also been an important topic in the study
of material culture. As ethnographic evidence suggests, the value of artistic objects
and artifacts, in fact, is not necessarily related to aesthetic categories. Rather, as Gell
(1998) advances, persons or social agents are in certain contexts substituted for by art
objects. Their agency, the power to influence their viewers, to make them act as if they
are engaging not with dead matter, but with living persons, is thus a central issue in
ethnography (Gell 1998: 5).
The relation between human expressiveness and objects is sometimes reversed. As
Leenhardt (1979) states, drawing on extensive fieldwork among the New Caledonians,
ancestors of chiefs are always represented with exaggerated mouths and long tongues
sticking out. Objects that are presented as a gift, on the other hand, are often viewed
as part of the communicative practice of their givers. Thirdly, objects are also sometimes
used within communicative practices (such as sweet plants that are used as arguments
to symbolize peace and harmony in discussions in Papua New Guinea; cf. Bateson 1936:
126) or as devices that replace utterances in order to achieve indirection (such as
clothes adorned with mottos in Central and East Africa, cf. Beck 2001; Parkin 2004).
However, not only art objects are viewed as part of communicative practice; the
human body itself is also used in an objectified way as a medium of communicative
practice. Most commonly, rituals such as courtly ceremonies, as well as clothing, para-
phernalia, and symbols used in the same context, have been investigated as forms of
bodily communication (Althoff 2002; Stollberg-Rilinger 2011). But body ornamentation (scarification, tattooing, etc.) is also a well-explored topic in ethnographic descriptions. Turner (1980) states that the Kayapo of Brazil aim at detaching themselves from nature and socializing themselves through the use of body adornments (including lip plates and tattoos).
The external surface of the body becomes the “common frontier of society” (1980: 113)
and every individual “becomes a microcosm of the Kayapo body politic” (1980: 121).
As LaTosky (2006) has claimed in relation to the Mursi of Southern Ethiopia, lip-plates
might also serve to shape individual identities and gain self-esteem. In Brazil, conversely,
the body can adopt the function of common ground in a highly hybrid society (see
Edmonds 2011; Freyre 1986).

5. The body as means of expression


In contrast to the Darwinian argument that bodily expressions (in his case, particularly facial expressions) are genetically determined and therefore both natural and
universal, the ethnography of the body has demonstrated that they are rather to be
viewed as culturally shaped and socially acquired. However, it was only very recently
that bodily practices have been documented in publications in ways that permit readers
to directly access the data analyzed and to closely study bodily phenomena, as it is
common today in some methodological currents of interactional linguistics and sociol-
ogy such as conversation analysis, the study of embodied situated activities, and
microethnography. The "multimodal turn" in the ethnography of the body, if it was performed at all, came relatively late. Ethnography, as the most important method of practice research, is constantly confronted with two dangers: that of excessively referring to the knowledge of the researcher as an actor (ultimately leading to the stance of using oneself as an informant), and that of restricting oneself to positive behavioral data without any access to the subjective meaning of the actors (which ultimately leads back to a behavioristic position). Since micro-ethnography today combines both methods, it is a rather complex, laborious, and time-consuming endeavor (cf. Farnell 1994).
This might be the reason why it has been applied only very tentatively in research on other cultures, where the acquisition of a foreign language and the corresponding body practices is additionally required, given that micro-ethnography is already intricate enough in the ethnographers' own culture.
Even though there are a few studies which compile gestures in non-Western cultures (e.g., Baduel-Mathon 1971; Creider 1977), ethnographically contextualized studies of the inherently embedded nature of bodily forms of communication that are well grounded in empirical data gained from video recordings remain exceptional to this day. There are, however, some ethnographic, cross-cultural studies on gesture, as
for example on pointing gestures such as lip- and eye-click-pointing (Enfield 2001;
Orie 2009; Sherzer 1973) or absolute or relative directional space (Haviland 1993,
2000), as well as on gesture taboos (Kita and Essegbey 2001), emblematic gestures
(Brookes 2001, 2004; Sherzer 1991), co-speech gesturing (Eastman 1992; Enfield
2003), iconic gestures (Enfield 2005; Haviland 2007; Streeck 2009), and individual
gestures (Streeck 2002). Most recently, ethnographic studies grounded in video data have begun to explore the interactional and practical consequences of culture-specific concepts of the senses (Meyer 2011).

6. References
Althoff, Gerd 2002. The variability of rituals in the middle ages. In: Gerd Althoff, Johannes Fried
and Patrick J. Geary (eds.), Medieval Concepts of the Past. Ritual, Memory, Historiography,
71–87. Cambridge: Cambridge University Press.
Antaki, Charles, Michael Billig, Derek Edwards and Jonathan Potter 2003. Discourse analysis
means doing analysis: A critique of six analytic shortcomings. Discourse Analysis Online 1
[http://www.shu.ac.uk/daol/articles/v1/n1/a1/antaki2002002-paper.html].
Asheri, David 2007. A Commentary on Herodotus: Books I–IV. Oxford: Oxford University Press.
Baduel-Mathon, Céline 1971. Le langage gestuel en Afrique Occidentale. Recherches Bibliogra-
phiques. Journal de la Société des Africanistes 41(2): 203–249.
Bateson, Gregory 1936. Naven. Cambridge: Cambridge University Press.
Bateson, Gregory and Margaret Mead 1942. Balinese Character: A Photographic Analysis. New
York: New York Academy of Sciences.
Bauman, Richard and Joel Sherzer (eds.) 1974. Explorations in the Ethnography of Speaking.
London: Cambridge University Press.
Beck, Rose-Marie 2001. Ambiguous signs: the role of the ‘kanga’ as a medium of communication.
Afrikanistische Arbeitspapiere 68: 157–169.
Bell, Catherine M. 1992. Ritual Theory, Ritual Practice. New York: Oxford University Press.
Boas, Franz 1897. The Social Organization and the Secret Societies of the Kwakiutl Indians. Wash-
ington: Government Printing Office.
Boas, Franz 1914. Kultur und Rasse. Leipzig: Veit.
14. Ethnography: Body, communication, and cultural practices 237

Brigard, Emilie de 1975. The history of ethnographic film. In: Paul Hockings (ed.), Principles of
Visual Anthropology, 13–43. The Hague: De Gruyter Mouton.
Brookes, Heather J. 2001. O clever ‘He’s streetwise.’ When gestures become quotable. The case of
the clever gesture. Gesture 1(2): 167–184.
Brookes, Heather J. 2004. A repertoire of South African quotable gestures. Journal of Linguistic
Anthropology 14(2): 186–224.
Butler, Judith 1993. Bodies That Matter: On the Discursive Limits of “Sex”. New York: Routledge.
Classen, Constance 1997. Foundations for an anthropology of the senses. International Social
Science Journal 153: 401–412.
Connor, Linda H. 1995. The action of the body on society: Washing a corpse in Bali. Journal of the
Royal Anthropological Institute 1(3): 537–559.
Creider, Chet A. 1977. Towards a description of east African gestures. Sign Language Studies 14:
1–20.
Desjarlais, Robert R. 1992. Body and Emotion: The Aesthetics of Illness and Healing in the Nepal
Himalayas. Philadelphia: University of Pennsylvania Press.
Duranti, Alessandro 1988. Ethnography of speaking: Toward a linguistics of the praxis. In: Frederick
J. Newmeyer (ed.), Linguistics: The Cambridge Survey, Volume 4: Language: The Socio-Cultural
Context, 210–228. Cambridge: Cambridge University Press.
Duranti, Alessandro 1992. Language and bodies in social space: Samoan ceremonial greetings.
American Anthropologist 94: 657–691.
Duranti, Alessandro 1994. From Grammar to Politics: Linguistic Anthropology in a Western
Samoan Village. Berkeley: University of California Press.
Durkheim, Emile 1912. Les Formes Élémentaires de la Vie Religieuse: Le Système Totémique en
Australie. Paris: Alcan.
Eastman, Carol M. 1992. Swahili interjections: Blurring language-use/gesture-use boundaries.
Journal of Pragmatics 18: 273–287.
Edmonds, Alexander 2011. Pretty Modern: Beauty, Sex, and Plastic Surgery in Brazil. Durham:
Duke University Press.
Efron, David 1972. Gesture, Race and Culture. The Hague: Mouton. First published [1941].
Enfield, N. J. 2001. Lip pointing: A discussion of form and function with reference to data from
Laos. Gesture 1(2): 185–212.
Enfield, N. J. 2003. Producing and editing diagrams using co-speech gesture: Spatializing nonspa-
tial relations in explanations of kinship in Laos. Journal of Linguistic Anthropology 13(1): 7–50.
Enfield, N. J. 2005. The body as a cognitive artifact in kinship representations: Hand gesture dia-
grams by speakers of Lao. Current Anthropology 46(1): 51–81.
Farmer, Paul 1988. Bad blood, spoiled milk: Bodily fluids as moral barometers in rural Haiti.
American Ethnologist 15(1): 62–83.
Farnell, Brenda M. 1994. Ethno-graphics and the moving body. Man 29(4): 929–974.
Freese, John H. 1920. Photius: The Library. London: Society for Promoting Christian Knowledge.
Freyre, Gilberto 1986. The Masters and the Slaves: A Study in the Development of Brazilian Civilization. Berkeley: University of California Press. First published [1933].
Gell, Alfred 1998. Art and Agency. Oxford: Clarendon.
Goffman, Erving 1963. Behavior in Public Places. Notes on the Social Organization of Gatherings.
Glencoe: Free Press.
Goffman, Erving 1967. Interaction Ritual. Essays on Face-to-Face Behavior. New York: Doubleday
Anchor.
Goodwin, Charles 1994. Professional vision. American Anthropologist 96: 606–633.
Gumperz, John J. 1982. Discourse Strategies. Cambridge: Cambridge University Press.
Gumperz, John J. 1992. Contextualization revisited. In: Peter Auer and Aldo Di Luzio (eds.), The
Contextualization of Language, 39–54. Amsterdam: John Benjamins.
Hahn, Tomie 2007. Sensational Knowledge: Embodying Culture through Japanese Dance. Middle-
town, CT: Wesleyan University Press.
238 II. Perspectives from different disciplines

Haraway, Donna J. 1991. Simians, Cyborgs, and Women: The Reinvention of Nature. New York:
Routledge.
Haviland, John B. 1993. Anchoring, iconicity, and orientation in Guugu Yimithirr pointing ges-
tures. Journal of Linguistic Anthropology 3: 3–45.
Haviland, John B. 2000. Pointing, gesture spaces, and mental maps. In: David McNeill (ed.),
Language and Gesture, 13–46. Cambridge: Cambridge University Press.
Haviland, John B. 2007. Gesture. In: Alessandro Duranti (ed.), A Companion to Linguistic
Anthropology, 197–221. Malden, MA: Blackwell.
Herdt, Gilbert H. (ed.) 1993. Ritualised Homosexuality in Melanesia. Berkeley: University of
California Press.
Hertz, Robert 1907. Contribution à une étude sur la représentation collective de la mort. Année
Sociologique 10: 48–137.
Hewes, Gordon W. 1974. Gesture language in culture contact. Sign Language Studies 1: 1–34.
Hinsley, Curtis M. and Bill Holm 1976. A cannibal in the national museum: The early career of
Franz Boas in America. American Anthropologist 78: 306–316.
Hogle, Linda F. 2005. Enhancement technologies and the body. Annual Review of Anthropology
34: 695–716.
Howes, David (ed.) 1991. The Varieties of Sensory Experience: A Sourcebook in the Anthropology
of the Senses. Toronto: University of Toronto Press.
Howes, David (ed.) 2004. Empire of the Senses: The Sensual Culture Reader. Oxford: Berg.
Howes, David (ed.) 2009. The Sixth Sense Reader. Oxford: Berg.
Hymes, Dell H. 1962. The ethnography of speaking. In: Thomas Gladwin and William C. Sturte-
vant (eds.), Anthropology and Human Behavior, 13–53. Washington, DC: The Anthropology
Society of Washington.
Hymes, Dell H. 1972. Models of interaction of language and social life. In: John J. Gumperz and
Dell Hymes (eds.), Directions in Sociolinguistics: The Ethnography of Communication, 35–71.
New York: Holt.
Keating, Elizabeth 2001. The ethnography of communication. In: Paul Atkinson, Amanda Coffey,
Sara Delamont, Lyn Lofland and John Lofland (eds.), Handbook of Ethnography, 285–301.
London: Sage.
Kita, Sotaro and James Essegbey 2001. Pointing left in Ghana: How a taboo on the use of the left
hand influences gestural practice. Gesture 1: 73–94.
LaTosky, Shauna 2006. Reflections on the lip-plates of Mursi women as a source of stigma and
self-esteem. In: Ivo Strecker and Jean Lydall (eds.), Perils of Face: Essays on Cultural Contact,
Respect and Self-Esteem in Southern Ethiopia, 382–397. Berlin: Lit.
LeBaron, Curtis 2005. Considering the social and material surround: Toward microethnographic
understandings of nonverbal behavior. In: Valerie Manusov (ed.), The Sourcebook of Non-
verbal Measures, 493–506. Mahwah, NJ: Erlbaum.
Leeds-Hurwitz, Wendy 1987. The social history of the natural history of an interview: A multidis-
ciplinary investigation of social communication. Research on Language and Social Interaction
20: 1–51.
Leenhardt, Maurice 1979. Do Kamo: Person and Myth in the Melanesian World. Chicago: Univer-
sity of Chicago Press.
Lock, Margaret 1993. Cultivating the body: Anthropology and epistemologies of bodily practice
and knowledge. Annual Review of Anthropology 22: 133–155.
MacDougall, David 1998. Transcultural Cinema. Princeton, NJ: Princeton University Press.
Madden, Raymond 2010. Being Ethnographic: A Guide to the Theory and Practice of Ethnogra-
phy. London: Sage.
Malinowski, Bronislaw 1922. Argonauts of the Western Pacific: An Account of Native Enterprise
and Adventure in the Archipelagoes of Melanesian New Guinea. London: Routledge.
Mauss, Marcel 1973. Techniques of the body. Economy and Society 2(1): 70–88. First published
[1935].
McCallum, Cecilia 1996. The body that knows: From Cashinahua epistemology to a med-
ical anthropology of lowland South America. Medical Anthropology Quarterly 10(3):
347–372.
Meyer, Christian 2011. Körper und Sinne bei den Wolof Nordwestsenegals. Eine mikroethnogra-
phische Perspektive. Paideuma 57: 97–120.
Moerman, Michael 1988. Talking Culture: Ethnography and Conversation Analysis. Philadelphia:
University of Pennsylvania Press.
Mol, Annemarie 2001. The Body Multiple: Atherosclerosis in Practice. Durham, NC: Duke
University Press.
Nelson, Christopher 2008. Dancing with the Dead: Memory, Performance, and Everyday Life in
Postwar Okinawa. Durham, NC: Duke University Press.
Orie, Olanike Ola 2009. Pointing the Yoruba way. Gesture 9(2): 237–261.
Parkin, David 2004. Textile as commodity, dress as text: Swahili kanga and women’s statements.
In: Ruth Barnes (ed.), Textiles in Indian Ocean Societies, 47–67. London: Routledge.
Regnault, Félix-Louis 1931. Le rôle du cinéma en ethnographie. La Nature 59: 304–306.
Ruby, Jay 1980. Franz Boas and early camera study of behavior. The Kinesis Report 3:
6–11, 16.
Schechner, Richard 1988. Performance Theory. New York: Routledge.
Schegloff, Emanuel A. 1988. Goffman and the analysis of conversation. In: Paul Drew and An-
thony J. Wootton (eds.), Erving Goffman. Exploring the Interaction Order, 89–135. Cambridge:
Polity Press.
Sherzer, Joel 1973. Verbal and non-verbal deixis: The pointed lip gesture among the San Blas
Cuna. Language in Society 2: 117–131.
Sherzer, Joel 1991. The Brazilian Thumbs-Up Gesture. Journal of Linguistic Anthropology 1(2):
189–197.
Sherzer, Joel and Regna Darnell 1972. Outline guide for the ethnographic study of speech use. In:
John Joseph Gumperz and Dell H. Hymes (eds.), Directions in Sociolinguistics, 548–554. New
York: Harper & Row.
Smith, Louis M. 1966. The Micro-Ethnography of the Classroom. St. Louis: Central Midwestern
Regional Educational Laboratory.
Stollberg-Rilinger, Barbara 2011. Much ado about nothing? Rituals of politics in early modern
Europe and today. Bulletin of the German Historical Institute 48: 9–24.
Streeck, Jürgen 2002. A body and its gestures. Gesture 2(1): 19–44.
Streeck, Jürgen 2009. Gesturecraft. The Manu-facture of Meaning. Amsterdam: John Benjamins.
Streeck, Jürgen (n.d.). Balinese Hands. http://jurgenstreeck.net/bali/
Streeck, Jürgen and Siri Mehus 2005. Microethnography: The study of practices. In: Kristine Fitch
and Robert Sanders (eds.), Handbook of Language and Social Interaction, 381–404. Mahwah,
NJ: Erlbaum.
Synnott, Andrew 1993. The Body Social. London: Routledge.
Turner, Terence 1980. The social skin. In: Jeremy Cherfas and Roger Lewin (eds.), Not Work
Alone: A Cross-Cultural View of Activities Superfluous to Survival, 112–140. Beverly Hills,
CA: Sage.
Turner, Victor W. 1968. The Drums of Affliction: A Study of Religious Processes among the
Ndembu of Zambia. Oxford: Clarendon Press.
Wolfowitz, Clare 1991. Language Style and Social Space: Stylistic Choice in Suriname Javanese.
Urbana: University of Illinois Press.

Christian Meyer, Bielefeld (Germany)


15. Cognitive Anthropology: Distributed cognition and gesture
1. Introduction
2. Distributed cognition
3. Using the body to coordinate elements in a functional system
4. The affordances of gesture as a representational medium
5. Using hands to create and coordinate representational states
6. Using hands to propagate functional systems across generations
7. Conclusion
8. References

Abstract
All human cognition is distributed. It is, of course, distributed across networks of neurons
in different areas of the brain, but also frequently across internal and external representa-
tional media, across participants in interaction, and across multiple spans of time. While
most research on distributed cognition has focused on complex activities in technology-
laden settings, the principles apply just as well to everyday cognitive activities. Studies of
distributed cognition reveal that bodily activity – especially gesture – plays a central role
in coordinating the functional systems through which cognitive work gets accomplished.
Gesture does more than externalize thought; it is often part of the cognitive process itself.
Gestures create representations in the air, enact representational states on and over other
media, and bring states in different media into coordination to produce functional out-
comes. Gesture also plays a central role in propagating functional systems – associated
cultural practices, cognitive models, and forms of coordination – across generations,
while adapting them to the particulars of new problem situations. In so doing, gesture
helps to sustain and enhance the cognitive sophistication of the human species.

1. Introduction
A child told to share candies with her sibling doles them out one at a time, saying “one
(for you), one (for me); two, two; three, three….” Or she spreads them across a table
and points at candies in succession while saying “one, two, three….” Or her mother
shows her how to slide candies across the table two at a time while counting “two,
four, six….” Each scenario involves hand actions: one doling out, one pointing, and one demonstrating.
Are these actions gestures?
What we call gestures and how we study them reflect particular theories of human
cognition and communication. The traditional view in cognitive science is that humans
think internally through propositional logic and/or mental imagery and express their
thoughts to others through language, spoken or written. Spoken messages are accompa-
nied by paralinguistic cues, such as vocal tone, facial expression, or body language, that
signal emotional state or stance toward what is being said. Head movements signal
agreement or disagreement, and hands support or supplement the content of spoken
messages. These signals help listeners unpack utterances and recover their propositional
and/or imagistic content. From this point of view, communicating is a matter of
encoding and decoding messages transmitted from sender to receiver: what since Reddy
(1979) has been called the “conduit metaphor” of communication. This view of cogni-
tion and communication served as the basis for most research in cognitive science from
the mid-1950s to the late 20th century, until other views, including that of distributed
cognition, called this account increasingly into question.
Against this backdrop, gesture re-emerged as a topic of research due primarily to the
pioneering studies of Adam Kendon (1972, 1980) and David McNeill (1985, 1992).
Kendon and McNeill both study the expressive hand movements that accompany
speech, which Kendon (1980) calls “gesticulation,” and how they relate to spoken con-
tent. For both researchers, a primary unit of analysis is the utterance, a communicative
act consisting of a speech-gesture ensemble expressing related content and bounded as
a single intonation unit (as in Chafe 1994). For Kendon (2004), speech and gesture are
separate streams coordinated in the process of utterance, while for McNeill (2005),
speech and gesture arise together, in a dialectic of language and imagery, from a single
idea or “growth point.” Gesture, like speech, is a means for expressing or externalizing
thought in the mind of the speaker, although it can also mark or regulate aspects of the
discourse. Kendon’s data consist primarily of recordings of conversations and some
guided tours, while McNeill’s consist mostly of experimental participants narrating
events seen in a cartoon or film or recalled from a fairy tale. The gestures examined
in these studies are produced in the air in the space in front of the speaker or, in the
case of pointing gestures, directed toward objects or locations in the surrounding
space. The ground-breaking studies of Kendon and McNeill contrast in some respects
with the workplace studies typical of distributed cognition research, where gestures
over representational artifacts are common and where gesture and speech are directed
toward the accomplishment of joint activity as well as the development of mutual
understanding.
Studies of distributed cognition are closer to the work of Jürgen Streeck (2009) and
Charles Goodwin (2000), researchers from the tradition of conversation analysis who
study the communicative practices of people engaged in work and life activities in
the culturally rich settings they ordinarily inhabit. These researchers study gesture as
practice – as part of what people do and how they go about doing it – rather than as
expressions of interior mental life. They take a particular interest in times when people
coordinate with one another to develop a shared understanding of a problematic situ-
ation or to overcome snags in the flow of activity; here gesture comes to the fore. In
these studies, gesture is entwined with practical action, so that gestures are frequently
produced on or over objects in an “environmentally coupled” way (Goodwin 2007) or
with objects in hand (as in LeBaron and Streeck 2000). Gesture may not be singled out
for study but may instead be considered one of many factors shaping the construction of
meaning in situ, including the structure of the activity, aspects of the setting, the position
and orientation of participants’ bodies (including access to each other’s actions), mutual
orientation toward objects, shared knowledge or history of activity, content and struc-
ture of the preceding discourse, and, of course, the talk that participants produce
(Goodwin 2000). Attention is paid to conversational moves of various kinds (even inac-
tion), and meaning is seen as emergent from the relations between talk, gesture, arti-
facts, and situated aspects of the discourse rather than from the unpacking of utterances.
Along with a focus on practice, distributed cognition research shares with these stu-
dies the use of micro-ethnography as a method of inquiry. Data consist of recordings of
activity in real-world settings where the researcher is a participant-observer. Episodes
of recorded activity are analyzed in detail – moment by moment, frame by frame –
to reveal the subtle processes of coordination through which activity is accomplished
and through which participants jointly construct meaning. Interpretations of the video
data are warranted by evidence gathered through traditional ethnographic inquiry. The
form of any particular analysis depends on the research questions posed and the concep-
tual framework employed in the study. In distributed cognition research, this micro-
ethnographic approach is known as “cognitive ethnography” (Hutchins 2003; Williams
2006). Its goal is “to study how cognitive activities are accomplished in real-world set-
tings” (Hutchins 2003): what resources are brought to bear and how they are coordinated
to produce targeted outcomes. In other words, cognitive ethnography is close study of the
phenomena that cognitive scientists seek to explain: human cognition in natural activity.
As an approach focused on close observation and micro-analysis, cognitive ethnography
can be combined with other methods of inquiry to enhance the ecological validity of ex-
periments or to inform the design of simulation studies (Hutchins 2003). Together, these
approaches help us triangulate toward a better understanding of human cognition.
Cognitive ethnographic studies of distributed cognition show that bodily actions, in-
cluding gesture, play a central role in real-world cognitive activity. This article reviews
key tenets of distributed cognition, briefly describes the role of the body in distributed
cognitive functional systems, and highlights the affordances of gesture as a representa-
tional medium. It then discusses findings from cognitive ethnographic studies that illus-
trate critical functions of the hands in human cognition: creating and coordinating
representational states in functional systems and guiding conceptualization to propa-
gate functional systems across generations. The article concludes with brief implications
for the study of gesture as a unique and powerful human capability.

2. Distributed cognition
The term “distributed cognition” refers not to a type of cognition but to a perspective
for understanding cognition generally. As described by its leading proponent, Edwin
Hutchins, all human cognition is distributed. Some cognitive accomplishments rely
solely on interactions among neural networks in diverse areas of the brain, while others,
including the most significant human accomplishments, involve coordination of internal
structures and processes with structures in the world we engage with our bodies and
modify to suit our purposes. Through such functional couplings, we use our Stone Age
brains to lead Space Age lives.
Among the chief insights of distributed cognition is the benefit to be gained by not
defining the boundaries of the cognitive system too narrowly. If we consider cognitive
processes of reasoning, decision-making, and problem-solving to be those “that involve
the propagation and transformation of representations” (Hutchins 2001: 2068; see also
Hutchins 1995a: 49), then we must also consider that many of the most important repre-
sentations lie outside the heads of individuals, embedded in sociotechnical systems of
human activity. By incorporating relevant aspects of the material setting and social
organization into the unit of analysis, we can study how cognitive systems function
through “the propagation of representational state across a series of representational
media” and how representational states are propagated “by bringing the states of the
media into coordination with one another” (Hutchins 1995a: 117). Once we have an
understanding of how such a distributed system functions, we are then in a position to
make claims about what must be happening inside the heads of individuals to make the
system function. Working from the outside in, we can “[refine] a functional specification
for the human cognitive system” (Hutchins 1995a: 371) while avoiding the danger
of over-attributing internal structure, that is, of claiming that more of the world is
internally represented than is necessary to support adaptive behavior.
With respect to the field of cognitive science, this view of distributed cognition re-
tains a sense of cognition as computation while dispensing with the notion of cognition
as internal symbol processing. Cognition is foremost active, engaged, and embodied.
Although it can play out covertly in imagined perception and action, the remarkable
human capacity to ponder derives from a history of bodily engagement with the
world. Human cognition is also, to a vastly greater extent than in other species, a mix-
ture of the biological and cultural. Despite limited capacities for attention, memory,
perception, and processing, humans are capable of stunning achievements primarily
because “cultural practices orchestrate the coordination of low-level perceptual and
motor processes with cultural materials to produce particular higher-level cognitive pro-
cesses” (Hutchins 2010: 434). Distributed cognition views culture as, among other
things, “an adaptive process that accumulates partial solutions to frequently encoun-
tered problems” (Hutchins 1995a: 354). These partial solutions structure systems of
activity in which humans engage and through which they develop. Bodily actions, in-
cluding gestures, bring these functional systems into coordination and perpetuate them
across generations. As they do so, the systems adapt to changes in the cognitive ecology:
to different environments, emerging technologies, new forms of social organization, and
changes in cultural values and practices.

3. Using the body to coordinate elements in a functional system


While most research on distributed cognition has examined complex sociotechnical sys-
tems such as ship navigation (Hutchins 1995a), commercial fishing (Hazlehurst 1994),
air traffic control (Halverson 1995), or piloting jet aircraft (Holder 1999; Hutchins
1995b; Hutchins and Klausen 1996), the basic concepts are equally evident in mundane
activities like time-telling (Williams 2004). Take, for example, the counting of objects as
portrayed in the introduction. Counting is a familiar cultural practice for determining
quantity. It addresses the question “How many…?” by producing a number that corre-
sponds to a quantity of objects. Determining quantity is a frequently encountered prob-
lem for which culture has accumulated partial solutions: various counting practices that
use bodily action to coordinate the elements of functional systems. Cultural practices
for counting are highly conventionalized, but any situated use of counting must be
improvised, in that the form of counting practice, once chosen, must be adapted to
and coordinated with the particulars of the setting and situation, including such things
as the type and arrangement of the objects to be counted and their physical presence or
absence.
As an illustration, consider the three common ways to count objects illustrated in
Fig. 15.1 (analyzed in Williams 2008c). The first, shown in Fig. 15.1(a), is to touch ob-
jects in succession while uttering number names in memorized sequence: “one, two,
three…”; the number name uttered when the last object is touched corresponds to
the quantity of objects counted. Here the body provides the coordination necessary
for the distributed cognitive system to function: the speech system utters numeric labels
in succession while the manual system moves the hand with extended index finger (in
the handshape prototypical for pointing) from object to object, touching each object
at precisely the instant when a numeric label is uttered. An essential part of this coor-
dination is imposing a path along which the hand moves so that it touches every object
exactly once. This system for computing quantity requires the coordination of brain,
body, and world (Clark 1997): it combines conceptualization (object perception, a cog-
nitive model for a specific cultural practice, a conceptual path), bodily action (speaking
and touching), and environmental structure (a configuration of objects) into a func-
tional ensemble. The system can fail in several ways: missing number names, miscoor-
dinating uttering with touching, mistaking objects to be included, losing track of the
counting path, etc. Errors can be reduced through improvised adaptations: a child
counting dots in a circle, for example, held the tip of her left index finger at one dot
while using the tip of her right index finger to touch each subsequent dot around
the circle as she counted aloud; marking the start of the counting path in this
way made it easier to discern its end (Williams unpublished data). A common adapta-
tion is to modify the environment before counting: to rearrange objects into a line
or array in order to facilitate a simpler counting path. These manual actions before
counting reduce errors in the execution of the functional system, making it more
robust. These examples show how conventional cultural practices must be adapted to
the particulars of setting and situation when distributed cognitive functional systems
are instantiated in real human activity. Actions of the hands are critical to these
situation-specific adaptations.

[Figure: three panels showing (a) sequential touching, (b) relocating objects, (c) using finger proxies]

Fig. 15.1: Three functional systems for counting objects

In the form of counting practice described above, the functional system operates
through a series of touch-points synchronized with speech; this looks like gestural action
without intent to communicate. The gestural quality is even more apparent when the
manual actions are reduced to a series of points (with no contact) while number tags
are uttered. With further reduction, the system operates with no hand action at all, repla-
cing it with gaze shifting: looking at (fixating) objects in succession while uttering number
names subvocally. These are varied instantiations of a common counting practice realized
through different bodily actions, some more overtly gestural than others.

A second conventional way to count objects, shown in Fig. 15.1(b), involves changing
the location of objects while uttering number tags. Examples from Williams (2008c)
include picking up and placing traffic cones, sliding coins across a table two at a time,
and dropping buttons into a bag. Again coordination of the functional system is
achieved through manual action synchronized with speech, but here the movements
look more like practical actions than gestures: grasping, lifting, sliding, placing, and
dropping objects, all performed not to accomplish some practical end on or with an
object but rather to accomplish the cognitive goal of computing quantity.
A third way to count objects, shown in Fig. 15.1(c), again appears gestural in form:
raising fingers or touching fingers successively (for example, to a surface) while uttering
names of non-present entities. Here the fingers are proxies for objects being counted.
Examples from Williams (2008c) include raising fingers while reciting the alphabet to
identify the 18th letter and touching and raising fingers while naming family members
to determine the number of people for a dinner reservation. In this functional system,
the hand configuration is modified in coordination with object-naming; the configura-
tion produced when the final object is named represents the total number of objects.
This final configuration can be identified using associations from childhood counting
practice, or the finger-raising sequence can be repeated while reciting number tags
until the target configuration, held in visual working memory, reappears. The manual
actions in this case are neither practical nor communicative: they are cognitive actions
that encode representational states during the execution of a computational process.
If we call them gesture, as I believe we should, then they are gesture for cognition –
specifically, gesture for problem-solving rather than word-finding or thinking-for-speaking,
which are cognitive functions claimed for gesture when it is considered in relation to
speech.
The distributed cognitive functional systems described in this section all accomplish
computation through sequenced actions of the hands (or eyes) that coordinate spoken
labels with objects or their proxies. The manual actions that bring these systems into
coordination cross distinctions between practical action (physically moving objects),
communicative action (pointing), and cognitive action (counting on fingers). A single
form, an index-finger point, can serve different functions (cognitive, communicative,
or both simultaneously) while a single function, coordinating number tags with objects,
can be accomplished by different forms (looking, pointing, touching, sliding, picking up
and placing, etc.). Whether manipulative or gestural in appearance, the hand actions de-
scribed here serve the common purpose of coordinating elements in a functional system
to produce a computational outcome.

4. The affordances of gesture as a representational medium


Human hands are the first tools of representation. Streeck (2009: 39–58) describes the
capabilities of hands that form the basis for practical actions and gestural movements.
From the perspective of distributed cognition, hands can represent, can produce repre-
sentations in other media, and can propagate representational state from one medium
to another, including to or from themselves. In many respects, this puts hands at the
center of human cognition, with heads as controllers of hands and interpreters of the
states they produce. Hands act on and modify the world: they manipulate objects, rear-
range them, shape them, assemble or disassemble them, transport them, and employ
246 II. Perspectives from different disciplines

them as tools to act on other objects (to draw, write, carve, and so on). These are com-
monly regarded as practical actions, but they may equally be cognitive actions, helping
us perceive the affordances of objects (Kirsh and Maglio 1994) or prepare the environ-
ment for intelligent action (Kirsh 1995), as in the example of lining up objects before
counting them. Hands also interact with the world without modifying it: they bring
attention to objects, highlight their relevant features, and annotate or elaborate their
structure. These are environmentally coupled gestures (Goodwin 1994, 2007) whose sig-
nificance derives from the culturally constituted spaces in which they are performed.
Finally, hands depict directly, in the space in front of the speaker, using conventional
gestural practices: they enact schematic actions; they evoke imagined objects through
enactments or through schematic acts of drawing, outlining, or molding; and they
model objects and their interactions (see Müller 1998: 114–126 and Streeck 2008 for dis-
cussion of gestural modes of depiction). Acting on objects, gesturing over objects, and
gesturing in the air seem like different sorts of hand actions, but from the perspective of
distributed cognition, they often serve similar or closely related purposes. The purpose
of a given hand action may become apparent only when it is considered in terms of the
functional system being instantiated to accomplish a particular outcome.
That human hands modify environments, manufacture artifacts, and use tools to
achieve desired ends is well known and widely regarded as fundamental to human
life. More specific to human cognitive achievements are hands’ entrained abilities to
create representational states in physical media. Hands create representational states
through culturally shared practices of sketching, drawing, and writing, as well as
through more specialized practices like painting, sculpting, carving, or crafting. In
much of the world today, hands create representational states in electronic media
through historically recent practices such as keyboarding, mousing, and using touchpads
and touchscreens. Where physical or electronic media are absent, or where the skills
to employ them are lacking, hands rely upon themselves to represent: they use their
own physicality to materialize conceptual content. Indeed, this ability may be integral
to all the others. Hutchins (2010) claims that “humans make material patterns into re-
presentations by enacting their meanings” (Hutchins 2010: 434), and hands are humans’
primary tools for enactment.
Given this array of potential means for representation, it is worth asking: What are
the affordances of hands that lead to their being employed for depiction when other
media might instead be chosen? Because hands are parts of our bodies, they are always
“ready at hand”: they can be brought into action quickly and can produce representa-
tional states faster than these could be engendered in other media. In contrast with writ-
ing and drawing, hands represent relations in three-dimensional space and can move
while representing, enacting the dynamics of processes. Multiple changing relations
are especially hard to visualize, requiring complicated physical models or clockworks,
flat (2-D) video recordings or animations, or high-technology systems for motion cap-
ture or 3-D visualization. Hands can conjure 3-D relations and dynamics directly in
space – in so far as a partial, schematic depiction annotated by speech is sufficient to
the demands of the situation and the complexity of what needs to be represented.
Using the hands depictively also brings processes of all sizes and scopes, from the
cosmic to the microscopic, into “human scale”: the scale at which we directly perceive
and act in the world (Fauconnier and Turner 2002: 312). Two types of human scale are
important in gesture research: one in which the gesturer inhabits a space and acts
15. Cognitive Anthropology: Distributed cognition and gesture 247

subjectively within it, called “character viewpoint,” and another in which the gesturer
models objects and interactions in the space in front of his body, called “observer view-
point” (McNeill 1992: 118–125). A speaker adopts character viewpoint if she enacts
steering a car while describing an automobile accident; she adopts observer viewpoint
if she uses her hands to model two cars colliding, a depiction she views from outside the
space of action in mutual orientation with her interlocutor, who views it from another
angle. Observer viewpoint, in particular, enables us to take processes at any imaginable
scale and to portray them in the perceivable, reachable space in front of our bodies and
thus to “dominate” them (Latour 1986: 21). And because our bodies are mobile – able
to bend, reach, turn, walk, and so on – we can transport gestural representations into
and out of co-location with states in other media, thereby linking or coupling them.
This puts hands as representational tools squarely at the center of the coordinative
processes necessary for cognitive functional systems to achieve their outcomes.
Finally, what must be noted in this discussion of the affordances of gesture as a re-
presentational medium is the limited durability, the non-persistence, of gestural repre-
sentations. Gestures have a greater material presence than words, but while they can be
sustained briefly to support perception and reasoning, they vanish as soon as the hands
are put to other uses. This is their most significant contrast with other physical media,
yet the affordances of gesture enable it to be used in conjunction with durable media
to achieve outcomes more powerful than either could achieve on its own.

5. Using hands to create and coordinate representational states


The examples discussed below are taken from studies of distributed cognition in various
settings. They demonstrate how gesture is used to represent and to coordinate represen-
tations in functional systems for accomplishing cognitive activity.

5.1. Creating provisional representations during joint imagining


An example that illustrates the coordination of gesture with other representational
media comes from the situation of naval quartermasters plotting lines of position on
a navigation chart (Hutchins 1995a, 2010). A navigation chart is a computational
device: a line drawn on the chart restricts the possible locations of the ship with respect
to the surroundings; a second, intersecting line determines this location uniquely; in
standard practice, a third line forms a small triangle whose magnitude corresponds to
the margin of error or indeterminacy in the position fix (Hutchins 1995a: 61). The nav-
igation chart itself incorporates the residua of cognitive processes extending across mul-
tiple scales of time, from the momentary actions of the navigation team plotting the fix;
to the earlier actions of team members updating the chart; to the actions of others, dis-
tributed across time and space, mapping the world represented in the chart; to the ori-
gins of the representational systems and conventions (latitude/longitude, compass
directions, and so on) that enable the outcomes of mapping expeditions to be combined;
to basic practices of counting and measurement whose roots lie in the ancient world.
These different timescales of activity are apparent in the means through which repre-
sentational structure is layered on the chart (Hutchins 1995a: 165–167). The outcomes
of centuries of past activity are captured in the printed features of the chart: in its lines,
shapes, scales, and labels. Updates to the chart (new landmarks or underwater hazards,
turn bearings or danger depth contours, etc.) are added in ink before the chart is em-
ployed in navigation. When the chart is in use, plotted lines of position and projected
future positions are marked in pencil. Finally, when navigators consider possible land-
marks for the next position fix, they trace projected lines of position on the chart with
their index fingers, leaving no marks. The significance of these gestural traces emerges
not simply from the composites of gesture and speech (the utterances) but from the
layering of the gestures, construed by speech, on the meaningful space of the chart in
the context of the mutually understood activity being jointly pursued. As Hutchins
(2006) notes: “The meanings of elements of multimodal interactions are not properties
of the elements themselves, but are emergent properties of the system of relations
among the elements” (Hutchins 2006: 381). These gestural traces of possible lines of
position are part of an embodied process of joint imagining: perceiving in a “hypothet-
ical mode” (Murphy 2004: 269) while acting in a “subjunctive mood” (Hutchins 2010:
438). The fleeting quality and lack of physical imprint of the gestures suit precisely
the nature of the cognitive task at hand: considering, but not committing to, possible
courses of action, and using these considerations as the basis for a decision that will
determine future action.

5.2. Adding a third dimension and motion dynamics for scientific visualization

In the example of tracing imaginary lines of position on a navigation chart, gesture
layers structure onto an existing material representation, adding constraint to isolate
an outcome. In the next example, from a meeting in a scientific laboratory (Becvar,
Hutchins, and Hollan 2005), gesture extracts representational state from a flat represen-
tation and transforms it, adding a third spatial dimension and motion dynamics. The
result is a human-scale, hand-based model for theorizing about molecular interactions,
again in a hypothetical mode. In this case, the principal investigator in a chemistry lab-
oratory has just projected a ribbon diagram of the thrombin molecule on an overhead
projector, as shown in Fig. 15.2(a). She calls attention to the many loops in the diagram
(“see how you have all these little loops: this loop, this loop, this loop, and this loop”),
pointing to examples on the transparency with her left index finger, silhouetted on the
projected image, as she identifies them. She begins to say “all kin’ of ” and then breaks
off her speech, whereupon she lifts her left hand from the transparency into the air,
palm outward with fingers outstretched, and says “in three-dimensional space they’re
like this”; this moment is shown in Fig. 15.2(b). Her left hand has extracted representa-
tional state, the basic morphology of the thrombin molecule, from the ribbon diagram
and reproduced it in three dimensions in the space in front of her body. Her hand is the
molecule, and her fingers now represent the loops she has identified. She holds this hand
configuration just below eye level in clear view of the audience she is addressing, a typ-
ical position in gesture space for depictive (iconic) gestures intended for mutual orien-
tation. Then she elaborates this 3-D model. First, she points with her right index finger
to a location between the finger-loops, saying “an’ that’s the active site”; here her left
hand models the molecule using a body-part-as-object form of depiction, while her
right hand indexes a specific site on this molecule using the conventional form of
index-finger pointing associated with individuating a reference object or location.
This two-handed gesture complex, in coordination with the accompanying speech, ac-
complishes the multifaceted purpose of representing molecular structure, making it
available for visual scrutiny, and focusing attention on a detail in that structure that
is critical to understanding how the molecule functions. In a further elaboration, the
speaker adds motion dynamics to the 3-D model. She says, “And so our new theory
is that thrombomodulin does something like this,” pausing briefly to contract and
expand her fingers, “or like this,” pausing again to rotate her fingers from side to
side. In this portion of the discourse, the speaker uses her hand-as-molecule to enact
and thereby simulate possible forms of molecular motion resulting from the binding
of thrombomodulin. These simulations are, like the lines-of-position example, not yet
committed to but hypothetical. In subsequent elaborations, she places the back of
her right hand against the back of the hand-as-molecule to enact the binding of throm-
bomodulin to the back side of thrombin, and then she uses rapid movement of her right
hand toward the interior of the hand-as-molecule to enact the rapid binding of another
protein to the active site. Throughout these depictions, her left hand models the throm-
bin molecule and its dynamics while her right hand alternates between indexing parts
of the molecule and modeling other molecules’ interactions with it.

(a) thrombin diagram (b) thrombin hand model (c) thrombin hand model (6 months later)

Fig. 15.2: Representational gesture as a cognitive artifact for scientific reasoning

This example clearly illustrates how gestural depiction becomes a component of sci-
entific reasoning. The speaker’s hand-as-molecule gesture creates a stable, visually
accessible, dynamically reconfigurable, 3-D model of a functioning molecule at human
scale. Her gestural elaborations of that model serve to highlight invisible elements and
depict imperceptible processes, all in the hypothetical mode. By using her hand move-
ments to create these representations, the speaker also brings her own embodied expe-
rience with tangible objects, felt movements, and visuospatial perception into play,
providing a bodily basis for sensing connections or making discoveries about molecular
dynamics. The gestural model takes on the status of a cognitive artifact: a representa-
tional or computational tool that is part of a cognitive functional system – in this case,
a system for reasoning about molecular interactions. This gestural model proves crucial
to the work of the group, as evidenced by two observations: first, that other members of
the group reproduce the hand-as-molecule gesture during subsequent discussion, and
second, that the hand-as-molecule gesture is produced independently and spontaneously
by a lab member (not the original speaker) six months later in an interview when she de-
scribes the goals of the project, as shown in Fig. 15.2(c). In both the science laboratory
and ship navigation examples, gestural enactments serve as components of hypothetical
thinking as well as ways of sharing that thinking with others, demonstrating how hands
are used as tools for reasoning as well as communication.

5.3. Coordinating representational states to construct a shared object of knowledge

Another study of a science lab (Alač and Hutchins 2004) reveals additional ways in
which gestural actions participate in processes of thinking as well as communicating.
Here the focus is on functional magnetic resonance imaging (fMRI) researchers inter-
preting brain images. As Goodwin observes in “Professional Vision” (1994): “An event
being seen, a relevant object of knowledge, emerges through the interplay between a
domain of scrutiny… and a set of discursive practices… being deployed within a specific
activity” (Goodwin 1994: 606: emphasis in original). In the functional magnetic reso-
nance imaging lab, the domain of scrutiny consists of color images displayed on com-
puter monitors; these images have been produced by scans of participants’ brains as
they viewed visual stimuli. The researchers employ a set of discursive practices, includ-
ing gestures, to transform these images into objects of knowledge, namely, into repre-
sentations of brain areas and their levels of activity. As is often the case in these
kinds of studies, the process is laid bare through the interaction between an expert
and a novice, where the expert teaches the novice how to “see” the phenomena of inter-
est – that is, how to enact a functional system through which the objects of knowledge
are made manifest. Processes that for experts are largely covert (though not entirely
internal) are made overt in these interactions, and processes of thinking are opened
up into processes of communicating, with frequent production of gesture.
In contrast with the preceding example, gesture here is employed less as a means for
directly representing than as a means for coordinating representational states in differ-
ent media: in the brain images displayed on the computer screen; on a paper chart with
hand-drawn diagrams of the visual field (what the participant saw) and of retinotopy
space (a map of visual cortex); and in the talk produced by the expert as she draws, ges-
tures, and engages with her interlocutor. In one such coordinative sequence, the expert:

(i) touches the brain image on the computer screen while identifying the location she
touches as the “center”;
(ii) rotates the hand-drawn chart to align it with the image and points to a location on
the chart she identifies as “right here”;
(iii) holds the chart up next to the computer screen while saying “and when we look at
this map it looks something like that”;
(iv) traces the outline of the primary visual area on the chart with her index and middle
finger while saying “so V1 is going to be in the center”; and then
(v) transposes her hand – maintaining the tracing handshape – to the brain image on
the computer screen where she executes a matching two-fingered trace, shown in
Fig. 15.3(a), six times in rapid succession while saying “it’s gonna be this pie shape;
it’s probably covering approximately this area” (Alač and Hutchins 2004: 646).
Here the coordinative function of gesture is quite apparent. Pointing and tracing highlight
structures in external representational media whose conceptual identity is invoked by
speech. Maintaining handshape while moving the hand from one culturally constructed
space (the chart) to another (the computer image) and repeating the gesture form in
the new space together establish a conceptual link between the highlighted states of
the two media; these states are construed, named, and related by the accompanying
speech (“this,” “here,” “the center,” “V1,” “like that,” “so,” etc.) to produce the relevant
object of knowledge, namely, seeing part of the colored image as V1, the primary visual
cortex. “Seeing-as” is a cognitive accomplishment, the outcome of a discursive process in
which gesture weaves conceptual content into material patterns in the physical world.

(a) outlining gesture transposed from chart to image
(b) squeezing gesture overlaid on chart (palms moving inward)

Fig. 15.3: Gestures propagating and coordinating representational states

In other parts of the interaction, gesture is used depictively, as in the hand-as-molecule
example, to add dimensionality and dynamics to what is being represented. In Fig.
15.3(b), the expert places her hands, wrists together, palms inward, fingers outstretched
in a V-shape, on top of bold lines on the chart of retinotopy space as she says “take
these two meridians,” and then she reduces the angle between her palms as she says
“as if you were squeezing them together into the pie shape” (Alač and Hutchins
2004: 642), where “the pie shape” refers to a wedge outlined on the chart. Here the con-
junction of chart, hands, and speech construes the action as simulating the inward
movement of the meridian lines as they are squeezed to the boundaries of a section
in retinotopy space. As Alač and Hutchins point out, this squeezing has “no real-
world referent” in that it corresponds to “no real action wherein the visual space is
effectively squeezed and transformed into the retinotopically organized visual areas”
(Alač and Hutchins 2004: 643). The enacted squeezing is a metaphorical action: a
human-scale physical enactment of a more diffuse, invisible process through which
the visual field comes to be represented in visual cortex. The novice is directed to imag-
ine the action “as if” it were happening while the gesture enacts the process, as in the
earlier examples, in a hypothetical mode.
In the examples described above, meanings are produced not additively, as the
aggregation of meanings in separate media, but emergently, as the interrelation of states
in media brought into coordination through gesture. Gesture coordinates with visible
media through co-location and with audible media through co-timing, while it can
also coordinate with both through perceived similarities of form. Proximity and syn-
chrony are precise achievements of skilled human action that weaves media into mean-
ing. As hands enter culturally constituted spaces, they form shapes and perform
movements that take on meaning in relation to the structures and representational con-
ventions that govern those spaces. How the gestures are to be interpreted, whether in
relation to an artifact itself or to what that artifact represents, depends upon the con-
strual provided by speech and by shared knowledge of the situation. In Hutchins and
Palen (1997), for example, a pilot’s gestures on and over an instrument panel in the
flight deck are variously interpreted as actions taken on the panel and as events occur-
ring in the aircraft’s fuel system (Hutchins and Palen 1997: 37). Hands also transport
meaningful state from one constructed space – one semiotic field – to another, as
when the functional magnetic resonance imaging researcher transfers a traced outline
from the labeled chart to the brain image. Moving a representational state into a
new semiotic field transforms how that state is seen. Hutchins gives the example of a
navigator moving his dividers (a tool that can be set to span a particular interval)
from a line segment on a navigation chart to a printed scale where the distance traveled
in three minutes (1500 yards) can be read as a speed (15 knots) by ignoring the two
trailing zeroes (1995a: 151–152; further analyzed in 2010: 429–434). Whether what is
moved from one space to another is a physical tool or a configured hand makes little
difference; what matters is the semiotic shift. Of course, hands also create representa-
tions in their own semiotic space, in the air in front of the speaker, in relation to spoken
content, as when the scientist uses her hand to model the shape and movements of a
molecule. Through these processes, hands play a crucial role in producing and elabor-
ating “multimodal meaning complexes” (Alač and Hutchins 2004: 637) in the interac-
tions through which joint activities are accomplished and through which participants
come to share understanding.
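The divider-and-scale reading mentioned above compresses a small piece of arithmetic. Spelled out (using the standard navigator's approximation of 2,000 yards to the nautical mile, a detail not given in the text), it runs:

```latex
\text{speed} \;=\; \frac{1500\ \text{yd}}{3\ \text{min}}
\times \frac{60\ \text{min/hr}}{2000\ \text{yd/nmi}}
\;=\; 15\ \text{knots}
```

Because one knot corresponds to roughly 100 yards per three minutes, the yards-in-three-minutes figure divided by 100 — that is, with its two trailing zeroes ignored — is the speed in knots; this is the computation that the printed scale embodies and the semiotic shift accomplishes.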

5.4. Coordinating representational states across multiple participants


The multimodal meaning complexes described up to this point appear to result from the
actions of a single participant at a time, but this need not be the case. In the examples of
presenting to a group and of teaching a novice, the speaker is more knowledgeable, is
acting in the expert role, and is holding the floor during the analyzed segment of dis-
course. In the work situations studied by Hutchins, Goodwin, Streeck, and others, par-
ticipants frequently engage in familiar activities in known settings with mutually
understood goals and overlapping knowledge. The resulting high degree of intersubjec-
tivity and more balanced participation make it increasingly likely that the gestures and
speech of different participants will mutually elaborate one another. In an examination
of three pilots interacting in a training situation where one (American) is instructing the
other two (Japanese) in a Boeing aircraft procedure, Hutchins and Nomura (2011) find
multiple instances where gesture produced by one participant develops meaning in rela-
tion to talk produced by another. In the title of their paper, Hutchins and Nomura refer
to this phenomenon as “collaborative construction of multimodal utterances.” In the
cases they describe, the collaborative construction is directed toward creating a shared
conceptual object: a sequence, cause/effect, comparison, etc. The gestures produced by
one pilot while another speaks simulate interaction with aircraft control systems by en-
acting virtual actions on imagined objects, or else they model aircraft responses using
the hands or body with outstretched arms to simulate changes in the airplane’s orien-
tation or dynamics. The gestural enactments variably precede (with a hold), coincide
with, or quickly follow their lexical affiliates in the other’s speech. Their purpose
seems to be to display intersubjective understanding through demonstrated action (in
anticipation of or in response to the other’s verbalization) and/or to practice the proce-
dure being described, possibly as an aid to future recall. Listener gestures provide vis-
ible evidence that the listener inhabits a conceptual world in common with the speaker.
In so doing, they require a commitment to particulars of the situation not evident in
speech. Gestural enactments evoke an imagined setting (the flight deck of a jet aircraft),
a role (pilot flying), and a vantage point (in the right or left seat), including details such
as the location and operation of aircraft controls. Aspects of the setting “are brought
forth as implied elements in an imagined world of culturally meaningful action”
(Hutchins and Nomura 2011: 41), and gestures “are coupled to elements of that ima-
gined environment” (Hutchins and Nomura 2011: 42). The speaker also appears to
modify his on-going talk as a consequence of the other’s gestures, variously confirming
or correcting the apparent interpretations or omitting items that have already been es-
tablished, such that a lexical affiliate for the other’s gesture may never be spoken.
Hutchins and Nomura claim that “the participants are engaged simultaneously in two
kinds of projects: they are enacting conceptual objects of interest (what they are talking
about), and they are conducting a social interaction. While these objects are analytically
separable, in action, they are woven into the same fabric” (Hutchins and Nomura 2011:
40). In these examples, talking and gesturing in coordination with others appears to be a
way of establishing common ground for the discourse while simultaneously establishing
the objects of knowledge that define the work of the profession.
Finally, it is worth pointing out that the most significant phenomena described
here – the coupling of gesture with other media, and the coordinated production of
talk and gestures by separate participants – are precluded by an experimental method-
ology in which one participant who has seen a video narrates the events, without access
to any material resources, to another who has not. These phenomena are also less likely
to be observed in studies of conversations where participants tell stories about people,
happenings, and objects not present. This point is not meant to diminish the many in-
sights that are gained from such studies. It is, however, meant to press the claim that the
primordial home for gesture is in mutual, consequential activity in culturally con-
stituted settings. Gesture in such activity is likely to be performed in relation to
other media and in close interaction with other participants, and it is likely to serve a
functional role in cognition that goes beyond the expression of internal content.

6. Using hands to propagate functional systems across generations

We have considered ways in which people use their hands to create and coordinate re-
presentational states to accomplish cognitive activities, individually and in collaboration
with others. With frequently recurring tasks, these practices can become highly conven-
tionalized, although, as we have seen, they are always adapted to the particulars of si-
tuations. Taking a broader perspective, we may now ask: How are distributed cognitive
functional systems – coordinations of cognitive models, artifacts, and cultural practices –
propagated across generations? Here again gesture plays a crucial role.
We have already seen one illustration of the propagation of cultural practices in Alač
and Hutchins’ (2004) example of the experienced functional magnetic resonance imaging
researcher teaching the novice how to interpret brain images. Here the expert used point-
ing and tracing to highlight shapes in the visible media (the chart and the image) while
her speech profiled conceptual entities and relations manifested in those shapes. Keeping
a fixed handshape while moving her hand from one semiotic space (the chart) to another
(the image) and repeating the gestural form helped establish a conceptual link between
elements in the two spaces. By compressing analogy into identity (Fauconnier and Turner
2002: 314–315, 326), the novice learns to see the shape on the computer screen as a cor-
tical map in the participant’s brain. It may be that the expert routinely uses her hands to
accomplish this seeing on her own, perhaps by tracing outlines on the images she is ex-
amining, but when she teaches the novice, she performs these actions overtly, opening the
process to scrutiny. She also annotates her task-relevant actions with additional gestures,
speech, and shifts of gaze from work objects to her addressee, monitoring the novice’s
responses as she explicitly guides him in where to look (how to attend), what to see
(how to conceptualize what is being viewed), and what to do (how to act). The expert shifts
projects from accomplishing to teaching. This shift is evident in how she orients her body
and how she uses her hands and her talk to engage her interlocutor as well as the tools of
her trade. In her instructional discourse, the expert demonstrates how to find, interpret,
coordinate, and employ relevant states of representational media to accomplish the
work of a functional magnetic resonance imaging researcher.
A more commonplace example of shifting projects from doing to teaching, of opening
functional systems to scrutiny, and of guiding the conceptualization of novices can be found
in adults teaching children to tell time (Williams 2008a, 2008b). Expert time-tellers look at
an analog clock and read the time with gaze-fixing and slight gaze-shifting from one clock
hand to another as the only visible evidence of a cognitive process unfolding. It is doubtful
that any novice could learn to read the clock simply by watching an expert do it. Children
learn to read the clock because adults who are proficient time-tellers provide them with
active instruction: pointing to structures and tracing paths on the clock face, highlighting
elements, relations, and processes while construing them with speech, and shifting gaze
from the clock face to the learner to monitor attention and seek signs of confusion or
comprehension. While the child practices reading times on the clock, the adult monitors
and provides prompts, confirmation or correction, and additional instruction as needed.
Through this form of social interaction, children learn to see meaningful structure on
the clock face and to interpret that structure in relation to human activity and to a con-
ventional system of time measurement. Seeing time on the clock is another cognitive
accomplishment entrained by the gestural weaving of material and conceptual worlds.
As a brief example, consider the fragment of instruction shown in Fig. 15.4 (Williams
2008a). Here the teacher says “now another way that we say it, is we count by fives,
when we move this from number to number; there’s five minutes between each number”
while she enacts a hypothetical process of counting on the clock. If we break this fragment
into segments, we see the dynamic mapping of conceptual content to the clock face as
mediated by gesture. While saying “now another way that we say it,” the teacher moves
the minute hand to the 12, positioning the hand at the starting point for a clock-counting
process. When she says “is we count by fives,” she activates a cognitive model for counting
that is familiar to her first grade class: touching objects (sets of five elements) while utter-
ing “five, ten, fifteen….”. Accompanying her statement “when we move this” is a shift of
15. Cognitive Anthropology: Distributed cognition and gesture 255

gaze to where the tip of her right index finger rests on the minute hand; this construes
the minute hand as the thing-to-be-moved, namely as the pointing finger that will touch
each object-set as it is counted. The next part of her utterance, “from number to num-
ber,” defines numbers as the object-sets to be counted; here she touches the large
numerals on the clock face in sequence, making it clear which numbers she is referring
to, while the form of her gestural movement enacts a canonical counting motion, boun-
cing from one number to the next clockwise around the dial. The gesture alone provides
the origin, direction/path, and manner of the counting motion, which is notably not the
continuous, steady movement of a clock hand but the intermittent, bouncing movement
of a human hand touching objects while counting. The same gesture continues during
the next statement, “there’s five minutes between each number,” a statement that acti-
vates a cognitive model for the conventional system of time measurement, in which an
hour is divided into 60 minutes, and maps an interval of five minutes to the space
between adjacent numbers on the clock. In this example, a single gesture in coordina-
tion with two verbal statements sets up mappings from two cognitive models: a mapping
from objects in the counting model to numbers on the clock face, and a mapping from
units of time in the time measurement model to intervals of space on the clock face. The
second mapping conjoins with the first to implicitly generate a third mapping: linking
units in the system of time measurement (minutes) to elements of the object-sets
being counted (five minutes per object-set). All of this is accomplished through the
coordination of gaze, gesture, speech, and a culturally constituted artifact, all carefully
orchestrated to guide the novice’s conceptualization (Williams 2008b).
Once these mappings are established, the teacher performs the counting process by
grasping the minute hand and moving it to the 5, the 10, and the 15, pausing momen-
tarily at each while saying “five, ten, fifteen….” If the children have succeeded in mak-
ing the correct conceptual mappings, they will see the clock hand as a counting finger
that touches each number in sequence while the elapsed minutes are counted. This,
in microcosm, is how conventional functional systems get propagated, sustaining the
cognitive accomplishments of the human species.
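The counting-by-fives mapping that the children eventually internalize can be sketched in a few lines of code (an illustrative sketch only; the function name and its numeral parameter are inventions of this sketch, not notation from Williams):

```python
def minutes_at(numeral):
    """Read the minute hand by counting by fives: each numeral 1-12 on the
    dial stands for one object-set of five minutes, so counting 'five, ten,
    fifteen, ...' up to the numeral gives the minutes past the hour."""
    if not 0 <= numeral <= 12:
        raise ValueError("dial numerals run 1-12 (0 stands for the 12 position)")
    spoken = [5 * n for n in range(1, numeral + 1)]  # the spoken counting sequence
    return spoken[-1] if spoken else 0

# The teacher's demonstration: moving the hand to the 3 while saying
# "five, ten, fifteen" reads off fifteen minutes past the hour.
```

Once particular times are memorized, the expert strategy replaces the counting loop with direct naming, mirroring the shift from counting to naming described above.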

Fig. 15.4: Gestures guiding conceptual mapping

Given this instruction, the children must perform the activity with diminishing help
until they are able to instantiate the functional system successfully in appropriate
256 II. Perspectives from different disciplines

contexts with little effort; only then would we say that they have mastered the practice.
Once they become proficient and use the system repeatedly, they will come to recognize
the hand configurations and numeric labels as standing for particular five-minute times
(oh five, ten, fifteen, and so on), and they will shift strategies from counting to directly
naming these times, retaining counting as a backup strategy should memory fail them.
A new functional system will emerge, one that supports more efficient conduct of the
activity while it reduces the cognitive demands on the individual coordinating the sys-
tem to produce the intended outcome. The expert system will differ from the novice sys-
tem, but the counting-based practice will continue to be retained by our culture as a
stepping-stone because it enables the sustained successful performance through
which the memory-based ability arises.

7. Conclusion
This article has presented evidence for gesture’s role in: (1) coordinating the functional
systems through which cognitive work gets done, and (2) propagating those systems
across generations. In purposeful human activity, participants gesture not simply to
express but to accomplish. The familiar conduit metaphor of communication proves
inadequate for studying meaning-making in situated activity because it obscures the
ways gesture operates in distributed systems for human cognition. Even where the
focus of study is exclusively on speech, the conduit metaphor tends to mislead because,
as Hutchins (2006) points out, “it is easier to establish a meaning for words embedded
with gestures that are performed in coordination with a meaningful shared world than it
is to establish meanings for words as isolated symbols” (395). That humans can commu-
nicate solely through words is clear, but that such communication should be regarded
as prototypical is clearly mistaken. Recognizing this, leading gesture researchers like
Kendon and McNeill have argued that gesture, like speech, is part of utterance.
Researchers who study distributed cognition find it more productive to treat gesture
as part of the functional systems through which cognitive outcomes are accomplished.
If we expand the unit of analysis to encompass aspects of the setting, of mutual orien-
tation and (inter-)action, and of shared knowledge and the unfolding of goal-directed
activity, then we stand a better chance of understanding and appreciating the critical
role that gesture plays in human cognition and communication.

8. References
Alač, Morana and Edwin Hutchins 2004. I see what you are saying: Action as cognition in fMRI
brain mapping practice. Journal of Cognition and Culture 4(3): 629–661.
Becvar, L. Amaya, James Hollan and Edwin Hutchins 2005. Representational gestures as cogni-
tive artifacts for developing theory in a scientific laboratory. Semiotica 156(1/3): 89–112.
Chafe, Wallace 1994. Discourse, Consciousness, and Time: The Flow and Displacement of Con-
scious Experience in Speaking and Writing. Chicago: University of Chicago Press.
Clark, Andy 1997. Being There: Putting Brain, Body, and World Together Again. Cambridge: Mas-
sachusetts Institute of Technology Press.
Fauconnier, Gilles and Mark Turner 2002. The Way We Think: Conceptual Blending and the
Mind’s Hidden Complexities. New York: Basic Books.
Goodwin, Charles 1994. Professional vision. American Anthropologist 96(3): 606–633.
Goodwin, Charles 2000. Action and embodiment within situated human interaction. Journal of
Pragmatics 32: 1489–1522.
Goodwin, Charles 2007. Environmentally coupled gestures. In: Susan D. Duncan, Justine Cassell
and Elena T. Levy (eds.), Gesture and the Dynamic Dimension of Language: Essays in Honor
of David McNeill, 195–212. Amsterdam: John Benjamins.
Halverson, Christine 1995. Inside the cognitive workplace: New technology and air traffic control.
Ph.D. dissertation, Department of Cognitive Science, University of California, San Diego.
Hazlehurst, Brian 1994. Fishing for cognition: An ethnography of fishing practice in a community
on the west coast of Sweden. Ph.D. dissertation, Departments of Anthropology and Cognitive
Science, University of California, San Diego.
Holder, Barbara 1999. Cognition in flight: Understanding cockpits as cognitive systems. Ph.D. dis-
sertation, Department of Cognitive Science, University of California, San Diego.
Hutchins, Edwin 1995a. Cognition in the Wild. Cambridge: Massachusetts Institute of Technology
Press.
Hutchins, Edwin 1995b. How a cockpit remembers its speeds. Cognitive Science 19: 265–288.
Hutchins, Edwin 2001. Distributed cognition. In: Neil J. Smelser and Paul B. Baltes (eds.), Inter-
national Encyclopedia of the Social & Behavioral Sciences, 2068–2072. Oxford: Elsevier.
Hutchins, Edwin 2003. Cognitive ethnography. Plenary address at the 25th meeting of the Cogni-
tive Science Society, Boston, MA, July 31–August 2.
Hutchins, Edwin 2006. The distributed cognition perspective on human interaction. In: Nick J. En-
field and Stephen C. Levinson (eds.), Roots of Human Sociality: Culture, Cognition and Inter-
action, 375–398. Oxford: Berg.
Hutchins, Edwin 2010. Enaction, imagination, and insight. In: John Stewart, Olivier Gapenne and
Ezequiel A. Di Paolo (eds.), Enaction: Towards a New Paradigm for Cognitive Science, 425–
450. Cambridge: Massachusetts Institute of Technology Press.
Hutchins, Edwin and Tove Klausen 1996. Distributed cognition in an airline cockpit. In: Yrjö En-
geström and David Middleton (eds.), Cognition and Communication at Work, 15–34. New
York: Cambridge University Press.
Hutchins, Edwin and Saeko Nomura 2011. Collaborative construction of multimodal utterances.
In: Jürgen Streeck, Charles Goodwin and Curtis LeBaron (eds.), Multimodality and Human
Activity: Research on Human Behavior, Action, and Communication, 29–43. Cambridge:
Cambridge University Press.
Hutchins, Edwin and Leysia Palen 1997. Constructing meaning from space, gesture, and speech.
In: Lauren B. Resnick, Roger Säljö, Clotilde Pontecorvo and Barbara Burge (eds.), Discourse,
Tools, and Reasoning: Essays on Situated Cognition, 23–40. Berlin: Springer-Verlag.
Kendon, Adam 1972. Some relationships between body motion and speech: An analysis of an
example. In: Aaron Siegman and Benjamin Pope (eds.), Studies in Dyadic Communication,
177–210. Elmsford, NY: Pergamon Press.
Kendon, Adam 1980. Gesticulation and speech: Two aspects of the process of utterance. In: Mary
Ritchie Key (ed.), The Relationship of Verbal and Nonverbal Communication, 207–227. The
Hague: Mouton.
Kendon, Adam 2004. Gesture: Visible Action as Utterance. Cambridge: Cambridge University
Press.
Kirsh, David 1995. The intelligent use of space. Artificial Intelligence 73: 31–68.
Kirsh, David and Paul Maglio 1994. On distinguishing epistemic from pragmatic actions. Cognitive
Science 18(4): 513–549.
Latour, Bruno 1986. Visualization and cognition: Thinking with eyes and hands. Knowledge and
Society: Studies in the Sociology of Culture Past and Present 6: 1–40.
LeBaron, Curtis D. and Jürgen Streeck 2000. Gestures, knowledge, and the world. In: David
McNeill (ed.), Language and Gesture, 118–138. Cambridge: Cambridge University Press.
McNeill, David 1985. So you think gestures are nonverbal? Psychological Review 92(3):
350–371.
McNeill, David 1992. Hand and Mind: What Gestures Reveal about Thought. Chicago: University
of Chicago Press.
McNeill, David 2005. Gesture and Thought. Chicago: University of Chicago Press.
Müller, Cornelia 1998. Redebegleitende Gesten: Kulturgeschichte – Theorie – Sprachvergleich.
Berlin: Arno Spitz.
Murphy, Keith M. 2004. Imagination as joint activity: The case of architectural interaction. Mind,
Culture, and Activity 11(4): 267–278.
Reddy, Michael J. 1979. The conduit metaphor: A case of frame conflict in our language about lan-
guage. In: Andrew Ortony (ed.), Metaphor and Thought, 284–297. Cambridge: Cambridge
University Press.
Streeck, Jürgen 2008. Depicting by gesture. Gesture 8(3): 285–301.
Streeck, Jürgen 2009. Gesturecraft: The Manu-facture of Meaning. (Gesture Studies 2.) Amster-
dam: John Benjamins.
Williams, Robert F. 2004. Making meaning from a clock: Material artifacts and conceptual blend-
ing in time-telling instruction. Ph.D. dissertation, Department of Cognitive Science, University
of California, San Diego.
Williams, Robert F. 2006. Using cognitive ethnography to study instruction. In: Sasha A. Barab,
Kenneth E. Hay and Daniel T. Hickey (eds.), Proceedings of the 7th International Conference
of the Learning Sciences, Volume 2, 838–844. International Society of the Learning Sciences
(distributed by Lawrence Erlbaum Associates).
Williams, Robert F. 2008a. Gesture as a conceptual mapping tool. In: Alan Cienki and Cornelia
Müller (eds.), Metaphor and Gesture, 55–92. Amsterdam: John Benjamins.
Williams, Robert F. 2008b. Guided conceptualization: Mental spaces in instructional discourse. In:
Todd Oakley and Anders Hougaard (eds.), Mental Spaces in Discourse and Interaction, 209–
234. Amsterdam: John Benjamins.
Williams, Robert F. 2008c. Situating cognition through conceptual integration. Paper presented at
the 9th conference on Conceptual Structure, Discourse, and Language, Case Western Reserve
University, Cleveland, OH, October 18–20.

Robert F. Williams, Appleton, WI (USA)

16. Social psychology: Body and language in social interaction

1. Introduction
2. General social-psychological paradigms for interpersonal communication
3. Specific social-psychological models for bodily communication
4. Towards an embodied social psychology
5. References

Abstract
This contribution aims to provide an up-to-date picture of the theoretical models on com-
munication and bodily communication in social psychology, from the classical Encoding/
Decoding paradigm for explaining interpersonal communication in general to the con-
temporary Communication Accommodation Theory and Parallel Process Model for
the explanation of bodily communication in social context. In introducing the paradigms,
this chapter focuses on the role of bodily communication in social interaction, in partic-
ular on the main social-psychological processes involved in social situations, such as
social perception, judgment, attribution, attitude, identity, interaction (both interpersonal
and inter-group). The chapter analyzes contemporary dual-system models in social
psychology for the explanation of social behaviours: according to these models, social
behaviours are activated by either reflective or impulsive processes which act in parallel.
The authors apply this type of model to the explanation of bodily communication.
Finally, the importance of an embodied social-psychological perspective is
outlined.

1. Introduction
This chapter briefly reviews social psychology’s approach to the study of bodily and
linguistic communication in social interaction; it is organized in three main sections.
The second section describes the main broad theoretical paradigms used in studying
bodily and linguistic communication. The third section then describes the specific
social-psychological models; they are illustrated with the main topics covered by
social-psychological studies, thus focusing on the role that body and/or language
plays in each of the main processes involved in social interaction.
Social psychology, in focusing on psychological aspects of social life, views language
(or communication in general) and social life as intertwined: “just as language use
pervades social life, the elements of social life constitute an intrinsic part of the way lan-
guage is used” (Krauss and Chiu 1998: 41). Thus on the one hand, for social psychology,
language characterizes all the classical objects of social psychology: social perception,
attribution, attitude, identity, both interpersonal and intergroup social interaction
(e.g., Kruglanski and Semin 2007). On the other hand, while linguists focus on language
as an abstract structure, social psychology focuses on the social context (e.g., participants’
definition of the social situation, participants’ perceptions of others) affecting actual
language use, and thus communication.
Social psychology recognizes that not all communication involves language. A broad
definition of communication, accepted at least within mainstream social psychology,
views it as a process involving exchanges of representations (Sperber and Wilson
1986). Language (either speech or writing) is one possible medium and form for repre-
sentations, but it is not the only one, since bodily features could also do the same work.
Following Krauss and Chiu (1998), a common case in human communication involves
two persons acting as two information-processing devices: one person modifies the
physical environment of the other (say, through perturbations of the air molecules
caused by speech), so that the second constructs mental representations similar
to the first person’s. Starting from
such a basic conception, four “paradigms” or “models” of interpersonal communication
can be identified in terms of their different characterizations of the process by which
representations are conveyed in the communication process (following Krauss
and Chiu 1998; see also Krauss and Fussell 1996): Encoding/decoding, Intentionalist,
Perspective-taking, Dialogic paradigm.

2. General social-psychological paradigms for interpersonal communication

Each paradigm is a broad theoretical perspective which is characterized by certain as-
sumptions and an emphasis shared by the different investigators in approaching the
study of communication. These paradigms, or some of them or a variation of them,
are usually displayed by mainstream handbooks of social psychology when discussing
communication or language studies’ state of the art developments in a review chapter.
Usually, a given contribution or author deals with just one paradigm, or some variation
of it; on many occasions, a specific contribution simply assumes a paradigm or merely
shares some of its tenets. All these paradigms have roots, to varying degrees,
in non-psychological disciplines, from mathematics and cybernetics to linguistics and
semiotics, from philosophy of language to sociology. All of them have then been
adapted or transposed to psychology in order to study human or animal communication
phenomena and processes.

2.1. Encoding/decoding paradigm


In this conception (roots in Wiener 1948; Shannon and Weaver 1949), representations
are conveyed by a code. A code is a system that maps a set of concrete features (the
signifier, e.g., the letters or phonemes of the word C-A-T) onto a set of meanings
(the signified, i.e., the concept, the small domesticated feline): Morse code is one of
the simplest kinds of codes, where the mapping between signifier and signified is
one-to-one (and vice versa). A representation is transformed through a code, such as
language, into a signifier (encoding) that can be transmitted from one person to the
other and transformed back into a representation (decoding) by the recipient.
In the case of language, linguistic representations are the means through which
mental representations are conveyed and shared among speakers and addressees. The
two main assumptions are: a) the meaning of a message is completely specified by its
elements (this is implicit in the definition of a code); b) encoding and decoding are
two autonomous and independent processes that realize communication. Even if lan-
guage can be likened to a code, at least to a certain degree, these two assumptions
have been questioned and basically rejected by the subsequent paradigms as sufficient
features for describing communication. One of the main problems with this conception
is its inability to explain why and how the same message can sometimes, or even often,
be understood to mean different things in different contexts. Moreover,
even keeping the context constant, the same message can come to mean different
things to different addressees. In fact, evidence indicates that when a speaker designs a
message, s/he attempts to take the addressee’s properties into account (e.g., Bell 1980;
Fussell and Krauss 1989a; Graumann 1989). A communication paradigm relying just
on encoding and decoding, without any element of the context being part of the code,
cannot explain why the same encoding can at different times yield different decodings.
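The paradigm’s two assumptions, and the limitation just noted, can be made concrete in a toy sketch (the mapping and function names below are illustrative inventions, not drawn from the literature):

```python
# A code in the strict encoding/decoding sense: a fixed one-to-one mapping
# between signifieds (meanings) and signifiers (transmittable forms).
CODE = {"cat": "-.-. .- -", "dog": "-.. --- --."}   # Morse-style signifiers
INVERSE = {signal: meaning for meaning, signal in CODE.items()}

def encode(meaning):      # speaker's autonomous process
    return CODE[meaning]

def decode(signal):       # recipient's autonomous process
    return INVERSE[signal]

assert decode(encode("cat")) == "cat"
# decode() takes no context argument: the same signal always yields the same
# meaning, which is exactly why a pure code cannot model context-dependent
# interpretation.
```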

2.2. Intentionalist paradigm


In the encoding/decoding paradigm, meanings are considered to be properties of mes-
sages. However, alternative views consider that a recipient goes beyond the words to
grasp a speaker’s intended meaning. In such a perspective, communication requires an
exchange of communicative intentions, and messages are the vehicles by which such
exchange is accomplished. Communicative intentions cannot always be mapped one-
to-one onto words, as the previous paradigm would expect. Rather, speakers have to
select one out of a variety of potential alternative formulations in any given situation;
consequently, the addressee has to make an inferential step in order to derive the
speaker’s communicative intention. Such an intentionalist paradigm rests on two basic
ideas of pragmatic theory: the cooperative principle and speech act theory.
The philosopher Grice (1989) specified the principles speakers use to convey, and ad-
dressees use to identify, communicative intentions which are expressed implicitly, i.e.,
not simply on the basis of the literal meaning of the expressions used. Participants in
a conversation implicitly adhere to a set of conventions (the “cooperative principle”),
treating messages as conforming to four general rules or maxims: quality (a truthful
message), quantity (as informative as required, but no more), relation (relevant),
and manner (clear, brief, orderly). Listeners expect speakers to adhere to these maxims
and communicators use this expectation when they produce and comprehend messages.
Departures and violations from these maxims and relative expectations are considered
deliberate and as conveying a meaning which is different from the literal one: e.g., “an
utterance like ‘It’s nice to see someone who finds this topic so stimulating,’ said about a
student who has fallen asleep during a lecture, is understood to have been intended
ironically” (Krauss and Chiu 1998: 44).
Within philosophy of language, speech act theory (Austin 1962; Searle 1969, 1985)
also contributed to the rise of the intentionalist approach. Any utterance comprises
three types of acts: a locutionary act (the act of uttering a specific sentence with a spe-
cific conventional meaning), an illocutionary act (the act of demanding, asserting, prom-
ising, etc., through the use of a specific locution), and a perlocutionary act (an attempt
to have a particular effect on the addressee). The same illocutionary and perlocutionary
force can be realized via a range of different locutions. For example, the act of request-
ing a person to close a door can be performed by any of the following: “Shut the door,”
“Would you mind closing the door,” “Did you forget to shut the door?,” “Can you think
of any reason why we should keep the door open?,” “I’m having trouble hearing you
because of all the noise in the hall,” “Do you feel a draft?,” and so on (Krauss and
Chiu 1998: 44). Each one of these utterances has a different literal meaning: however,
they all could be understood as a request to close the door, given the appropriate con-
text. Only the first utterance is a so-called direct speech act, namely an utterance where
the locutionary and illocutionary forces are the same (i.e., the literal and the intended
meanings overlap); all the other utterances are indirect speech acts (the locutionary and
illocutionary forces are different as are the literal and the intended meaning in each
utterance).
Empirical research on intentionalist approaches has focused largely on testing
comprehension in addressees (e.g., models of indirect speech act comprehension), while
the processes by which speakers draw on the cooperative principle and speech acts in
formulating messages have been largely overlooked.
In the intentionalist paradigm, messages are considered vehicles for conveying
speakers’ communicative intentions. However, the specific perspective of the recipient
can be different and therefore the reconstruction of the communicative intention under-
lying the message could be based on different interpretative contexts. Therefore, on the
one hand, the same message can have different meanings for different recipients, and,
on the other hand, speakers consequently try to take their addressees’ perspectives into
account in formulating their messages (Krauss 1987).

2.3. Perspective-taking paradigm


The relativity of a message’s meaning, given the addressees’ and speakers’ specific
perspectives, has deep roots in social psychology, going back at least to George Herbert
Mead (1934): communication is based on one’s own capacity to anticipate how others
would respond to her/his own behaviours (including communicative ones). This is done
by taking the role of the other, i.e., taking into account the interlocutor’s point of view
and viewing oneself from the other person’s perspective. The intentionalist paradigm
somehow implicitly recognizes this issue: for example, when the Gricean maxim
of quantity requires the message to be adequately informative (i.e., neither incomplete
nor redundant), it implies that the addressee’s level of information is taken into account
(similar arguments hold true for other Gricean maxims as well).
Examples of features taken into account in considering an addressee’s perspective
when producing messages, i.e., when communicating with him/her, include the
addressee’s and speaker’s relative spatial positions in referential communication, or
social-category memberships in attitudinal communication.
Another issue pertains to the various levels of language use to which perspective tak-
ing could apply: the phonological level (degree of care in articulating a word related to
the addressee’s presumed familiarity with it); the syntactical level (e.g., an expert mem-
ber simplifying sentence structure for a novice, like a parent with her/his child); the lex-
ical choice level (e.g., literal and conventional vs. figurative and idiosyncratic when
addressing names to others vs. oneself).
Two processes operate to adapt one’s own communication to the interlocutor
(Krauss and Chiu 1998): one employs heuristics to derive an addressee’s perspective
from rather stable indices, such as the salience of group or category membership in the
situational context; the other derives the addressee’s perspective from emerging
features in the ongoing social interaction, such as attention. The quality of the gener-
ated messages therefore improves when a speaker is provided with feedback (either
audible or visible) about her/his own previous messages, or when the speaker is tempo-
rarily put in the listener’s role. In everyday conversation, a close and continuous feed-
back is usually present. Here, a process of mutual reference among the interlocutors is
evident. Examination of successive referring expressions (words or phrases
speakers use to refer to people, objects, events, relationships, and so on) shows
that the expressions tend to become fewer, shorter, and less detailed:
this is interpreted as support for the idea that conversational partners construct shared
perspectives in the process of communication (Hardin and Higgins 1996).
On the production side, evidence, even from different theoretical perspectives, shows
that speakers take their addressees’ perspectives into account in the formulation of the
messages (e.g., Clark and Murphy 1982; Edwards and Potter 1992; Fussell and Krauss
1989a; Graumann 1989). On the comprehension side, evidence shows that messages de-
signed for a specific audience communicate less well to a different one (Fussell and
Krauss 1989b), and that having the chance to provide feedback helps the listener
comprehend the speaker’s messages (Kraut, Lewis, and Swezey 1982).
2.4. Dialogic paradigm


The three paradigms mentioned above, though differing in many respects, locate the
meaning in one of the elements of the communication process: either in the message
(according to the encoding/decoding paradigm), in the speaker’s intentions (for the
intentionalist paradigm), or in the addressee’s point of view (for the perspective-
taking paradigm). On the contrary, in the dialogic paradigm meaning is regarded
as an emergent property of the participants’ joint activity. While for the first three
paradigms communication is described in terms of participants’ individual acts of
production and comprehension, in the dialogic paradigm communication is a colla-
borative process that produces shared meanings. For example, feedback is not simply
a mechanism by which addressees help speakers to generate more informative
messages; it is rather an intrinsic part of the process by which the meanings of mes-
sages are established (Krauss and Chiu 1998). This paradigm has been largely im-
ported from discourse analysis and conversational analysis (e.g., Edwards and
Potter 1992) The mercurial reality of everyday conversational exchanges and discur-
sive versions appears to be much more chaotic and “promiscuous” than the ideal
exchange forms based on separate and autonomous language or communicative
processors and acts considered by the previous three paradigms. According to the dialogic
paradigm, participants in conversational sequences work jointly to achieve a common
purpose that would be impossible individually: namely, a state of inter-subjectivity.
Bringing their own divergent social realities, participants work together with a mutual
commitment to a “temporarily shared social world” (Rommetveit 1980: 76): inter-
subjectivity is created and maintained with continuous modifications, thanks to acts of
communication.
Evidence has been mostly qualitative and descriptive (e.g., Edwards and Potter 1992;
Marková and Foppa 1991; Marková, Graumann, and Foppa 1995). The collaborative
model (e.g., Clark 1996) pursued an experimental approach: an act of reference
is accomplished first via a presentation, when the utterance is produced, and then an
acceptance, when the participants come to agree that the message has been understood.
Each phase can comprise several acts of speaking, all aiming at ensuring that the mean-
ing has been coordinated. The meaning of an utterance emerges from the process of
interaction, and the meaning of an expression is what participants, mostly implicitly,
agree it to mean (Krauss and Chiu 1998).
A distinctive feature of this paradigm compared to the others is its different
view of communication and of participants’ cognitive processes, as well as of
language-world and language-mind relations. In the individualistically oriented
paradigms, perceptions are precursors to communication and exist independently of it.
In the inter-individually oriented dialogic paradigm, perceptions of the world (as well as of the
mind) are themselves a by-product of communication, since they derive from the
state of mutual orientation and the way in which people talk about the world. Here,
communication has a preeminent role in the construction of the mind and the world,
rather than the other way around as in the above mentioned paradigms.
Another distinctive feature is the intrinsically pragmatic emphasis: communication is
intended as an action, even when it serves merely descriptive functions, i.e., even when
it merely consists in making reference, describing properties of the world or of the
mind. Its outcome is the making of the world and the mind, via a joint shared activity
264 II. Perspectives from different disciplines

among participants to the social interaction. The social construction of the world and
the mind is possible thanks to communication (e.g., from Berger and Luckmann
1966 to Gergen 1985 and Edwards and Potter 1992).
It must be noted that this view applies not simply to talk-in-interaction, but to any
form of communication, even when no interlocutors are present. In fact, a potential
addressee or audience is always at stake even when writing or when arguing with
oneself (Billig 1987).
Though this paradigm has long roots in the past, it has recently found new, perhaps
unexpected, vitality in social neuroscience: results show the power of language
and communication in shaping the mental and social reality through specific brain
changes (e.g., Paquette et al. 2003; Siegel 1999).

3. Specific social-psychological models for bodily communication


Under the above mentioned “umbrella” paradigms, more specific social-psychological
theoretical models flourished, either within one paradigm or on the borders between
them with some form of eclectic approach. Using these theoretical models, classical
social-psychological processes have been studied, such as interpersonal perception, pre-
judices and stereotypes, attitudes and interpersonal behaviours, intergroup relations
and social identity. The main theoretical models which chronologically inspired social-
psychological research in the last decades concerning bodily communication are briefly
summarized in the next four sub-sections.

3.1. Theory of interpersonal adaptation


The theoretical models for the study of bodily communication in social psychology have
been primarily devoted to the role of bodily communication in social interaction and to
how people’s body conducts influence others’ behaviours. The main theoretical models
have therefore focused primarily on the phenomenon of bodily or “nonverbal adapta-
tion”: people perform interactive adjustments (interpersonal adaptation) in their bodily
movements. An individual changes his/her own behaviours to make them more or less
similar to those of his/her communication partner. These patterns of reciprocity or
divergence occur in response to increases or decreases in behaviours of involvement
and intimacy. Intimacy indicators are: interpersonal distance, contact, smile, body orien-
tation, and mutual gaze. These indicate willingness to communicate and send messages
of physical and psychological intimacy and interpersonal warmth. One of the first the-
oretical discussions of interpersonal adaptation is the Equilibrium Theory (Argyle and
Dean 1965). The interlocutors try to maintain the status quo of the intimacy degree of
the relationship: through bodily communicative channels, they attempt to maintain an
involvement level congruent with the intimacy level of their relation. If the verbal or
bodily intimacy expressions are too strong or too weak, compensatory adjustments will
occur to maintain or re-establish an adequate level of involvement. If a
very intimate approach disturbs the intimacy equilibrium between the partners, one
can compensate by turning one’s eyes away or decreasing the amount of smiles and
gazes to the other. At other times, just with the aim of communicating one's own inter-
nal state or a feeling for the other, people can do the inverse, that is, return the bodily
intimacy level proposed by the interlocutor.
16. Social psychology: Body and language in social interaction 265

A theoretical model subsequent to the Equilibrium Theory is the Nonverbal Expectancy
Violation Theory (Burgoon 1978): partners have expectations about the relational
meaning of their own and partners’ bodily behaviours. Violation of these expectations
could cause different types of reaction by the communicator, depending on the valence
(positive or negative) that s/he gives to such a violation. The violation valence is posi-
tive when actual behaviours are evaluated more favourably than the expected ones. For
example, in the case of unexpected bodily behaviours of approaching, which indicate
the desire for intimacy, an individual giving a positive valence to this violation activates
reciprocity behaviours (in the example, other bodily behaviours of intimacy). In the
opposite case where the other’s behaviour violates relational expectancies and such
violation is negatively valued, compensatory responses will be activated (e.g., in the
example, interpersonal distance will be increased).
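The valence rule just described lends itself to a toy formalization. The sketch below is only illustrative: the intimacy scale, tolerance threshold, and function names are assumptions, not part of Burgoon's (1978) own formulation:

```python
def evt_response(expected: float, actual: float, valence: float,
                 tolerance: float = 0.5) -> str:
    """Toy sketch of the Nonverbal Expectancy Violation Theory prediction.

    expected/actual: intimacy level of the partner's bodily behaviour on an
    arbitrary scale; valence: the perceiver's evaluation of the violation
    (positive = favourable, negative = unfavourable).
    """
    if abs(actual - expected) <= tolerance:
        return "no violation"  # behaviour matches expectations
    # A favourably valued violation triggers reciprocity (matching the
    # partner's behaviour); an unfavourable one triggers compensation
    # (countering it, e.g., increasing interpersonal distance).
    return "reciprocity" if valence > 0 else "compensation"

# An unexpectedly intimate approach, welcomed by the perceiver:
print(evt_response(expected=3.0, actual=7.0, valence=+1.0))  # reciprocity
# The same approach, negatively valued:
print(evt_response(expected=3.0, actual=7.0, valence=-1.0))  # compensation
```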
Another theoretical model that considers the reciprocity as the dominant tendency
among the interlocutors is the Interaction Adaptation Theory (IAT, Burgoon, Stern, and
Dillman 1995). According to this model, interactional behaviours are explained in
terms of required, expected and desired levels. People come into an interaction with
a set of requirements, expectancies and desires that strongly influence their initial interactive
behaviours. Requirements are based on biological factors, needs and strong emotional states.
Expectancies are based on standards and social requirements. Desires are highly per-
sonal and include goals, preferences and dislikes. Requirements, expectancies and desires
form the structure of each relationship. The interaction between these three elements
produces a kind of interactional position (IP), which represents the behavioural attitude
that an individual will take with respect to the subsequent or ongoing interaction. The
discrepancy between the interactional position of an individual and the actual behav-
iour of her/his interlocutor will be predictive of any reciprocity or compensation, de-
pending on the valence of both the interactional position and behaviour. If a behaviour
is desirable (positive valence) and higher than the interactional position, reciprocity will
be expected (e.g., if you strongly desire a more intimate interaction and the other,
through her/his bodily behaviour, responds positively, you will reciprocate). However,
when the interactional position exceeds the actual behaviour of the interlocutor, that is,
if the latter's actual conduct is not in the same direction as the expectancies, requirements
and desires, there will be compensation by the interlocutor. These predictions may
suggest the possible outcomes of an interaction. In other
words, if there is some knowledge or understanding of the interactional position of
the interlocutor, a person can adjust or adapt his/her bodily behaviour in order to get
the desired interactional results. Studies on these relations have shown that people’s
ability to adapt their behaviour to others' behaviour predicts outcomes in child
development (Malatesta and Haviland 1982), marital satisfaction (Gottman 1993),
and job interview outcomes (Siegman and Reynolds 1982).
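The interplay of required, expected and desired levels can likewise be caricatured in code. The theory does not fix how the three levels combine into the interactional position; the unweighted mean and the decision rule below are illustrative assumptions:

```python
def interactional_position(required: float, expected: float,
                           desired: float) -> float:
    """Toy interactional position (IP): an unweighted mean of the three
    levels (equal weighting is an assumption, not part of IAT)."""
    return (required + expected + desired) / 3.0

def iat_prediction(ip: float, actual: float, positive_valence: bool) -> str:
    """Sketch of the IAT reciprocity/compensation prediction."""
    if positive_valence and actual > ip:
        return "reciprocity"   # partner exceeds what was sought: converge
    if ip > actual:
        return "compensation"  # partner falls short of the IP
    return "maintenance"

ip = interactional_position(required=4, expected=5, desired=6)  # 5.0
print(iat_prediction(ip, actual=7, positive_valence=True))  # reciprocity
print(iat_prediction(ip, actual=3, positive_valence=True))  # compensation
```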

3.2. Communication Accommodation Theory


From this theoretical background developed Communication Accommodation Theory
(CAT; Giles and Wadleigh 1999), according to which individuals use strategic com-
munication to negotiate social distance. People adapt themselves to the others’ behav-
iour, behaving in a way more or less similar to their counterparts (convergence/
divergence).
Convergence is the strategy by which people adjust their communication to make
their visual, vocal or verbal behaviours more similar to the behaviours of their interac-
tional partners. Generally, convergence improves the effectiveness of communication:
the more two interlocutors are bodily similar, the more they will like and respect
each other and the more social rewards, including mutual acceptance, will ensue. The
convergence will be associated with favourable evaluations if the accommodative inten-
tion is favourably perceived (not as acquiescence or imitation). Convergence can also
occur in power relationships: in situations of interactional symmetry, both partners con-
verge, while in socially asymmetrical situations, only the powerless person converges
towards the powerful one. Communication Accommodation Theory suggests that the
convergence reflects the need for social integration with the other (or with a social
group or category). In this way, it is a reflection of the desire for approval.
Divergence, on the contrary, refers to the way in which the communicator accentu-
ates the differences between himself and the others through bodily behaviours. The rea-
sons for divergence, often of a social nature, may involve, for example, personal disdain
for another and emphasis on one’s own social or group identity. Borrowing the assump-
tions of Social Identity Theory (Tajfel 1978), Communication Accommodation Theory
assumes that, in situations of intergroup comparison, bodily behaviours emphasizing
differences (e.g., clothing, mannerisms, voice, accent, posture, gestures etc.) are used by
people to mark difference and distinction with respect to an opposing or simply
different group, the out-group. In this context, divergence would not be directed to
the individual, as it would rather refer to a group identity; indeed, particular bodily be-
haviours (accents, gestures, gait, etc.) may be perceived as a divergence by a receiver
belonging to another group. Divergence can also be positively perceived, for example,
in mixed-gender interaction. A man speaking with a woman uses a deeper
tone, while a woman uses higher tones with a man, as compared to when they inter-
act with persons of their own gender. The same can be said for other bodily behaviours
(posture, gait, gestures) that indicate gender identity or courtship. Moreover, even the
mere maintenance of one's initial bodily style during the interaction, without "adapting"
it to the partner, is a divergence strategy, though in this case one negatively evaluated by
the recipients.
In general, therefore, interlocutors have expectations regarding the optimal levels of
convergence (or divergence), which may be based on stereotypes, social interaction
norms, and guidelines for acceptable behaviour in specific situations.

3.3. Parallel process model


More recently, social psychology has tended to move towards a comprehensive theoret-
ical explanation of bodily communication, able to integrate encoding and decoding in a
single model. One example is the Parallel Process Model (Patterson 2001), which, start-
ing from social judgment and perception theories (e.g., Fiske 1992; with “competence”
and “warmth” as the two main interpersonal judgement dimensions, e.g., Fiske, Cuddy,
and Glick 2007) and theories on social behaviour (e.g., Bargh 1997), postulates that in
social situations communicators are simultaneously senders and recipients of bodily
messages. These messages are aimed, more or less consciously, at specific social goals
and purposes. The parallel process model combines encoding and decoding processes
in one system led by a common set of determinants and mediation processes. On
the one hand, there are social judgments about others (through more or less auto-
matic processes), and on the other hand there are social behaviours (more or less
automatic too).
This model includes: (i) determinants; (ii) the social environment; (iii) cognitive-
affective mediators.

(i) Determinants are concerned with biological, cultural and personality aspects. The
first reflects the role of adaptive evolution, including communication patterns, such
as the positive and protective response towards babies’ faces that is beneficial
for their survival. Cultural and personality aspects may increase the variability
in communication: cross-cultural differences and personality styles influence and
characterize bodily communication.
(ii) The social environment concerns the interactional setting as well as the interlocutor.
The environment determines the stimuli to be judged and toward which to
react. Sometimes, the communicator, with his/her own specificity, chooses the
communication partner and the setting. This selection process, combined with
the personal determinants, could affect both evaluation and behavioural processes,
bringing, through homogeneity and similarity between the interlocutors, more
accurate social judgments about others, but also behavioural coordination in the
interaction.
(iii) If social and environmental determinants provide a framework for the interaction,
cognitive-affective mediators are the processes that drive the evolution of commu-
nication. These include personal dispositions, goals, emotional states, interpersonal
expectancies and cognitive resources. The goals are perhaps the most important
mediators, because they are cognitive representations of desired states toward
which people tend, and they can influence the investment of cognitive resources
based on the type of information revealed and on the processing depth of this
information. Affective states and mood are a combination of temporary disposi-
tions and can affect both the formation of social judgments and the bodily style
adopted. Finally, cognitive resources refer to cognitive investments available to
the interaction, which can be variously distributed among oneself, partners,
setting, conversation topic, etc.

People evaluate others through bodily signals and react or act through bodily beha-
viours. In these activities, individuals are led by specific social goals. The dynamic rela-
tionship between the parallel social judgment and behavioural processes is constrained
by the influence of the determinants (biology, culture, and personality) and of the
social environment. Goals are critical in directing the two processes. Once goals are
activated by environmental stimuli, they consciously or unconsciously direct the course
of the two processes (social judgment and behaviour). In such a process, both cognitive
and affective mediators take part. The availability of cognitive resources and their
application in communication are important elements in this parallel processing.
When automatic processes are not available, or they are inappropriate or they do
not work, cognitive effort is then required in making social judgments and in managing
social behaviour. Furthermore, in these processes (judgments, behaviour and cognitive
mediation), people are often influenced by their current affective state, emotions or
mood.
3.4. Automatic and deliberative processes


One of the most recent theoretical and empirical developments in social psychology is
the so-called dual-process models, in areas such as cognition, attitude, self-concept, decision
making and action (e.g., Fazio 1990; Strack and Deutsch 2004). In general, these models
distinguish between explicit and implicit, impulsive and reflective, spontaneous and
reasoned processes, derived from automatic associative processes or intentional delib-
erative processes. Evidence suggests that some judgments of others can be developed
automatically, outside conscious awareness. For example, Devine (1989) found that a
racial stereotype is automatically activated when facing a person of a particular ethnic
group. The development of research on automatic processes has important implications
for the understanding of communication. While verbal communication mainly involves
controlled processing, bodily communication often occurs automatically. Of course,
depending on the circumstances, people may also invest a great deal of effort in
controlling their own bodily behaviours and in trying to understand and interpret
others’ bodily behaviours. Fazio’s (1990) Motivation and Opportunity as Determinants
(MODE) model follows this trend. Developed for the explanation of social attitudes’
influence on behaviour, social perception and attribution, it can be applied to the for-
mation of interpersonal impressions through the perception of bodily behaviours and
to the influence of these impressions on bodily behavioural responses. According to
this model, motivation and opportunities, defined as cognitive resources and time,
determine whether the attribution process is deliberate or automatic. Psychological
processes (cognitive as well as behavioural) are neither purely spontaneous nor purely
intentional, but they are mixed processes involving both automatic and controlled com-
ponents. Any component controlled within a mixed sequence requires that one is
motivated to engage in the cognitive task (processing, attribution, etc.) and that s/he
has the opportunity (time and cognitive resources) to do so. Such mixed processes
are relevant for a number of issues regarding implicit measures of attitudes and
social-cognitive processes (the methodological aspect of automatic and deliberate
processes is further elaborated in Maricchiolo et al. this volume).
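Read as a processing gate, the MODE claim — that attribution is deliberate only under sufficient motivation and opportunity — can be sketched as follows. The 0–1 scales and the threshold are illustrative assumptions, not part of Fazio's (1990) formulation:

```python
def mode_processing(motivation: float, time_available: float,
                    cognitive_resources: float,
                    threshold: float = 0.5) -> str:
    """Toy gate after the MODE model: processing is deliberate only when
    both motivation and opportunity (time plus cognitive resources) are
    sufficiently high; otherwise it runs automatically."""
    opportunity = min(time_available, cognitive_resources)
    if motivation > threshold and opportunity > threshold:
        return "deliberate"
    return "automatic"

# High motivation and ample opportunity allow deliberate attribution:
print(mode_processing(motivation=0.9, time_available=0.8,
                      cognitive_resources=0.7))  # deliberate
# Under time pressure the automatic component takes over:
print(mode_processing(motivation=0.9, time_available=0.2,
                      cognitive_resources=0.7))  # automatic
```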
The two-system model proposed by Strack and Deutsch (2004) moves in the same
direction in order to explain social behaviour. Behaviours are the effect of the joint
operation of two separate information processing systems: reflective and impulsive pro-
cesses, which represent and process information differently and follow different
operative principles. The reflective system generates behavioural deci-
sions based on knowledge of the facts and experience, analysis of benefits and of likely
consequences, while the impulsive system generates behaviour through automatic
associative links and motivational orientations. The two systems operate in parallel,
although there is an asymmetry: the impulsive system is always engaged in automatic pro-
cessing (by itself or in parallel to the operations of the reflective system), while the
reflective system may not be so. The latter also requires a high amount of cognitive
skills, thus distraction or extremely high or low levels of arousal may interfere with
the action of this system. On the contrary, the impulsive system requires little cognitive
ability, and it can control behaviour in sub-optimal conditions. Consequently, the pro-
cesses of the reflective system are more easily disturbed than those of the impulsive
system. The internal elements of the two systems are connected to each other by different
types of relationships. In the reflective system, they are connected by semantic
relationships to which a truth value is assigned. In the impulsive system, relationships
are associative links, based on principles of contiguity and similarity.
What, then, are the precursors of behaviour? The two systems use different operations
to generate behaviour. In the reflective system, behaviour is the result of a decision driven
by assessments of the future state in terms of benefits and likelihood of obtaining them
through the behaviour. In the impulsive system, behaviour is generated by the strength of
the activation of the behavioural patterns. The precursors of impulsive behaviour do not
imply knowledge of valences and expectancies. Perception is directly linked to beha-
viour, although not in the form of simple reflexes. The representation of a movement
evokes to some degree the actual movement to which it relates. This is consistent with
the assumption that in the impulsive system conceptual content and behavioural sche-
mata are bi-directionally related. This applies to the conceptual representations of the
behavioural antecedents or consequences as well as to the behaviour itself.
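The contrast between the two precursors can be caricatured in code: the reflective system as an expected-value decision over benefits and likelihoods, the impulsive system as a winner-take-all over activation strengths of behavioural schemata. This is a deliberately crude sketch of Strack and Deutsch's (2004) distinction; all names and numbers are illustrative:

```python
def reflective_choice(options):
    """Reflective system: pick the behaviour with the highest expected
    value, i.e. benefit times likelihood of obtaining it (a simplification
    of the reflective decision process)."""
    return max(options, key=lambda o: options[o][0] * options[o][1])

def impulsive_choice(activations):
    """Impulsive system: the most strongly activated behavioural schema
    wins; no valence or expectancy computation is involved."""
    return max(activations, key=activations.get)

# Reflective: approaching promises a large but very unlikely benefit.
print(reflective_choice({"approach": (10.0, 0.1), "withdraw": (2.0, 0.9)}))
# Impulsive: a smile has strongly pre-activated the approach schema.
print(impulsive_choice({"approach": 0.8, "withdraw": 0.2}))
```

With these numbers the two systems diverge: the reflective computation favours withdrawal (2.0 × 0.9 = 1.8 beats 10.0 × 0.1 = 1.0), while the pre-activated approach schema wins impulsively — the kind of conflict the asymmetry between the two systems can produce.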
There is a substantial body of studies supporting the link between conceptual and
motor contents (see Dijksterhuis and Bargh 2001 for a review). This concept is highly
applicable to bodily behaviours, especially in relation to verbal communication. Often,
the bodily cues, especially speech-related hand gestures, are the motor aspect (precur-
sor of verbal communication) of conceptual content. These representations consist of
both a conceptual and a dynamic base: to communicate, representation must be revived
not only in its conceptual aspects, but also in its dynamic aspects, that is, emotional, pos-
tural and motor activity (Rimé et al. 1984). In a motor-cognitive perspective, verbal
expression involves or includes motor phenomena, and gesture is merely the observable
part of the current representation operated by the speaker. This implies that there is no
expressive activity without a certain level of motor activity. Adapting the two-system
model to this view within communication means that the impulsive system would
control the motor activity, while the reflective system controls the cognitive system.
These theoretical positions linking action and communication have recently found
confirmation in the evidence on mirror neurons, according to which there is a neu-
ronal correspondence, within an observer, of the hand movement performed by the
other party (i.e., the observer activates the same neurons that would be triggered if s/
he was the agent of the same action: Gallese and Goldman 1998). According to Rizzo-
latti and Arbib (1998: 188, 190–191) this system of correspondence between observation
and execution in some way provides a bridge between “doing” and “communicating.”
In general, when the dual models of implicit and deliberate processes, and the
variables that determine their functioning, are applied to bodily communication, the Motivation
and Opportunity as Determinants model (Fazio 1990) seems effective for understand-
ing the decoding (perception, interpretation, attribution) of bodily communication;
while the two-system model (Strack and Deutsch 2004) seems useful in explaining as-
pects of the encoding (i.e., the performance of bodily behaviours). The Motivation
and Opportunity as Determinants model could indeed be useful to explain how,
through the activation of more or less intentional (or voluntary) processes, impressions
and social judgments on others are formed or activated during the perception of their
bodily behaviours. The two-system model would be useful in order to explain how two
independent and parallel systems, impulsive and reflexive, work to generate bodily
behaviours. Moreover, given the assumed predominantly automatic (not deliberate)
nature of bodily behaviour, it would be reasonable to assume a greater influence of
the impulsive system in its activation. Nevertheless, sometimes perception and
evaluation of bodily behaviours also occur in an automatic and unconscious way. The
origin of perception and the production of bodily communication could therefore be
based primarily on automatic associations among elements contiguous and similar in
nature rather than on the cognitive processing and analysis of benefits and likely con-
sequences of judgments and behaviours. Of course, some social-psychological variables
could work as mediators in the automaticity of these processes, such as motivation,
available cognitive resources and time, accessibility of conceptual content and meanings,
purposes, expectations and valences, as well as specific communication skills.
For example, seeing our interlocutor smile can automatically elicit a positive judg-
ment and generate, through the impulsive system and in a non-deliberative way, bodily
behaviours of approaching or replying to the smile. The automaticity of these
processes can be based on our motivational orientation regarding:

(i) processing of positive information conveyed by the smile;
(ii) perception of the approach signal;
(iii) past experience of positive emotional states elicited by the smile;
(iv) behavioural reaction of approach (as opposed to avoidance).

On the other hand, the involvement of variables such as, for example, motivation to
understand the intentions and expectations of others and/or cognitive resources avail-
able for decoding the smiling behaviour can generate deliberate processes of interpre-
tation and, therefore, moderate the automaticity of our positive judgment of the other.
For example, we will try to understand if the other wants to obtain something from us, if
we are motivated to do so and/or if we can understand and relate her/his smile to the
interactional situation or with other verbal behaviours or otherwise. In addition, our
goals and expectations with respect to the communicative situation, as well as the ana-
lysis of the benefits and possible consequences, can activate the parallel intervention of
the reflective system in the planning and implementation of the bodily behaviour sche-
mata in reaction to the smile of the other (e.g., smile, reply, gaze, approach, contact or,
conversely, avoidance, withdrawal, escape).
In conclusion, the latest theoretical developments in the social-psychological expla-
nation of social-cognitive processes in general can also be applied to the understanding
and prediction of bodily behaviours. Understanding the deliberative and impulsive
components and processes which lie at the origin of bodily behaviours, and of their per-
ception and evaluation, is a key step for social psychology, as well as for understanding
the importance of bodily behaviours within any kind of interpersonal relationship
in general. For example, within the gestural
domain, it has now been experimentally shown that some kinds of gestures (conversational
and ideational vs. self-manipulation) have a more positive effect on the perception of
competence and composure of the speaker as well as on positive features of the mes-
sage (Maricchiolo et al. 2009). However, it is presently untested whether these
effects rest on automatic/impulsive and/or deliberative/reflective processes. Future
research should clarify such issues.

4. Towards an embodied social psychology


Finally, one of the most recent trends in social psychology literature looks at the pivotal
and pervasive role that the body would have for all social psychological issues and
processes. Cognitive and affective elaboration of social objects, such as attitude, stereo-
type, social perception and emotion, involves bodily states or movements as well as the
brain’s modality-specific systems for perception and action (Barsalou et al. 2003).
Such a process is called the embodiment of social cognition. Embodied cognition
theories evolved from advances in social cognition toward comprehensive accounts of
embodied phenomena that, traditionally, have been difficult to explain. Embodiment
underlies social information processing when the perceiver interacts with actual social
objects (online cognition) and when the perceiver represents social objects in their
absence (offline cognition; Niedenthal et al. 2005). Imitation of another person’s
happy facial expression is an example of online embodiment. On the other hand, under-
standing the word “happiness” or recalling a happy past experience by recruiting mod-
ality-specific systems in the present is an example of offline embodiment. Another
example is the importance of motor behaviour in attitudes. An attitude towards another
person is often shown by specific behaviours, such as distance, body orientation, as well
as posture assumed during the interaction. Moreover, bodily responses during interac-
tion with novel objects or persons influence later-reported attitudes and impressions
(Tom et al. 1991). According to this theoretical approach, bodily postures (relaxed/
tense) and movements (approach/avoidance) are associated with (positive/negative)
inclinations and action tendencies toward objects and persons. Furthermore, these incli-
nations and tendencies influence attitudes toward or perception about those objects and
persons (see Bonaiuto, De Dominicis, and Ganucci Cancellieri volume 2; Maricchiolo,
Bonaiuto, and Gnisci volume 2 for one example within the domain of social power and
one within that of political orientation, respectively). Thus, attitudes and impressions
would seem to be determined, at least in part, by embodied responses (Niedenthal
et al. 2005). This process would occur when people process symbolic entities, such as
words. Cognitive elaboration of concepts, such as recognition, memory, and understand-
ing, is maximally efficient when relevant conceptual information is consistent with cur-
rent embodiments (Chen and Bargh 1999). Thus, mimicry, imitative movements, and/
or postural synchrony during interaction are embodiments of positive attitude or per-
ception of the other (Bernieri and Rosenthal 1991; Chartrand and Bargh 1999); more-
over, these embodied processes would facilitate social perception, cooperation and
empathy (Neumann and Strack 2000). Finally, even the mere simulation through body
movements and poses of an embodiment can affect physiological and psychological
states. A recent study (Carney, Cuddy, and Yap 2010) shows that high power bodily dis-
plays, such as open, expansive postures, can produce changes in cortisol and testosterone
secretion as well as one’s own power self-perception and behaviours; for example, sit-
ting on a chair and leaning towards the left or the right would polarize our political ori-
entation towards, respectively, the left or the right of the political attitudes spectrum
(Oppenheimer and Trail 2010).
Such embodiment phenomena can originate in long-term processes, such as the Spa-
tial Agency Bias, according to which there is a link between the perception of social
agency and the reading and writing direction of the culture to which the person belongs;
in left-to-right writing cultures, people tend to attribute more agency to left-to-right
postures and movements, while in right-to-left writing cultures the bias is reversed. The
general idea is that agentic targets (i.e., performing an action) are systematically asso-
ciated with a left position, with recipient targets to their right. The resulting direction of
the action is rightward. This idea, initially proposed by Chatterjee (2002) was then
applied from a social-psychological perspective, with agency interpreted as a fundamental
characteristic of stereotype contents (Abele et al. 2008). The association between the
rightward spatial vector and agency is linked to the direction of writing/reading which in
fact is rightward in Western cultures. Cultural factors are therefore moderators of this
bias (e.g., Suitner and Maass 2008).
These recent trends therefore show how, within contemporary social psychology, the
body – and related issues of space – is gaining a central position within the conceptual
and theoretical frames of a number of authors and theories in order to innovatively re-
address the full range of social-psychological topics, from social perception and evalua-
tions, to stereotypes, prejudices and attitudes; from persuasion and social influence to
group and intergroup behaviour.

5. References
Abele, Andrea E., Mirjam Uchronski, Caterina Suitner and Bogdan Wojciszke 2008. Towards an
operationalization of the fundamental dimensions of agency and communion: Trait content rat-
ings in five countries considering valence and frequency of word occurrence. European Journal
of Social Psychology 38: 1202–1217.
Argyle, Michael and Janet Dean 1965. Eye contact, distance and affiliation. Sociometry 28: 289–304.
Austin, John L. 1962. How to Do Things with Words. Oxford: Clarendon Press.
Bargh, John A. 1997. The automaticity of everyday life. In: Robert S. Wyer (ed.), Advances in
Social Cognition, 1–61. Mahwah, NJ: Lawrence Erlbaum.
Barsalou, Lawrence W., Paula Niedenthal, Aron Barbey, and Jennifer Ruppert 2003. Social
embodiment. Psychology of Learning and Motivation 43: 43–92.
Bell, Allan 1984. Language style as audience design. Language in Society 13: 145–204.
Berger, Peter and Thomas Luckmann 1966. The Social Construction of Reality. New York:
Doubleday.
Bernieri, Frank J. and Robert Rosenthal 1991. Interpersonal coordination: Behavioral matching
and interactional synchrony. In: Robert S. Feldman and Bernard Rimé (eds.), Fundamentals
of Nonverbal Behavior, 401–432. Cambridge/New York: Cambridge University Press.
Billig, Michael 1987. Arguing and Thinking. A Rhetorical Approach to Social Psychology. Cam-
bridge: Cambridge University Press.
Bonaiuto, Marino, Stefano De Dominicis and Uberta Ganucci Cancellieri volume 2. Gestures,
postures, gaze, and movement in work and organization. In: Cornelia Müller, Alan Cienki,
Ellen Fricke, Silva H. Ladewig, David McNeill and Jana Bressem (eds.), Body-Language-
Communication: An International Handbook on Multimodality in Human Interaction. (Hand-
books of Linguistics and Communication Science 38.2.) Berlin: De Gruyter Mouton.
Burgoon, Judee K. 1978. A communication model of personal space violations: Explication and an
initial test. Human Communication Research 4: 129–142.
Burgoon, Judee K., Lesa Stern and Leesa Dillman 1995. Interpersonal Adaptation: Dyadic Inter-
action Patterns. New York: Cambridge University Press.
Carney, Dana R., Amy J. C. Cuddy and Andy J. Yap 2010. Power posing: Brief nonverbal displays
affect neuroendocrine levels and risk tolerance. Psychological Science 21: 1363–1368.
Chartrand, Tanya L. and John A. Bargh 1999. The chameleon effect: The perception-behavior link
and social interaction. Journal of Personality and Social Psychology 76: 893–910.
Chatterjee, Anjan 2002. Portrait profiles and the notion of agency. Empirical Studies of the Arts
20(1): 33–41.
Chen, Mark and John A. Bargh 1999. Consequences of automatic evaluation: Immediate behavior
predispositions to approach or avoid the stimulus. Personality and Social Psychology Bulletin
25: 215–224.
16. Social psychology: Body and language in social interaction 273

Clark, Herbert H. 1996. Using Language. Cambridge: Cambridge University Press.


Clark, Herbert H. and Gregory L. Murphy 1982. La visée vers l’auditoire dans la signification et la
référence. Bulletin de Psychologie 35: 767–776.
Devine, Patricia G. 1989. Stereotypes and prejudice: Their automatic and controlled components.
Journal of Personality and Social Psychology 56: 5–18.
Dijksterhuis, Ap and John A. Bargh 2001. The perception-behavior expressway: Automatic ef-
fects of social perception on social behavior. Advances in Experimental Social Psychology
33: 1–40.
Edwards, Derek and Jonathan Potter 1992. Discursive Psychology. Thousand Oaks, CA: Sage.
Fazio, Russell H. 1990. Multiple processes by which attitudes guide behavior: The MODE model
as an integrative framework. Advances in Experimental Social Psychology 23: 75–109.
Fiske, Susan T. 1992. Thinking is for doing: Portraits of social cognition from daguerreotype to la-
serphoto. Journal of Personality and Social Psychology 63: 877–889.
Fiske, Susan T., Amy J. C. Cuddy and Peter Glick 2007. Universal dimensions of social cognition:
Warmth and competence. Trends in Cognitive Sciences 11: 77–83.
Fussell, Susan R. and Robert M. Krauss 1989a. The effects of intended audience on message pro-
duction and comprehension: Reference in a common ground framework. Journal of Experi-
mental Social Psychology 25: 203–219.
Fussell, Susan R. and Robert M. Krauss 1989b. Understanding friends and strangers: The effects
of audience design on message comprehension. European Journal of Social Psychology 19:
509–526.
Gallese, Vittorio and Alvin Goldman 1998. Mirror neurons and the simulation theory of mind-
reading. Trends in Cognitive Sciences 2: 493–501.
Gergen, Kenneth J. 1985. The social constructionist movement in modern psychology. American
Psychologist 40: 266–275.
Giles, Howard and Paul Mark Wadleigh 1999. Accommodating nonverbally. In: Laura K. Guer-
riero, Joseph A. DeVito and Michael L. Hecht (eds.), The Nonverbal Communication Reader:
Classic and Contemporary Readings, 2nd edition, 425–436. Prospect Heights, IL, USA:
Waveland Press.
Gottman, John M. 1993. The roles of conflict engagement, escalation or avoidance in marital inter-
action: A longitudinal view of five types of couples. Journal of Consulting and Clinical Psychol-
ogy 61(1): 6–15.
Graumann, Carl Friedrich 1989. Perspective setting and taking in verbal interaction. In: Rainer
Dietrich and Carl Friedrich Graumann (eds.), Language Processing in Social Context, 95–
112. Amsterdam: North-Holland.
Grice, H. Paul 1989. Studies in the Way of Words. Cambridge, MA: Harvard University Press.
Hardin, Curtis D. and Edward Tory Higgins 1996. Shared reality: How social verification makes
the subjective objective. In: Edward Tory Higgins and Richard M. Sorrentino (eds.), Handbook
of Motivation and Cognition: The Interpersonal Context, Volume 3, 28–85. New York: Guilford Press.
Krauss, Robert M. 1987. The role of the listener: Addressee influences on message formulation.
Journal of Language and Social Psychology 6: 81–97.
Krauss, Robert M. and Chi-Yue Chiu 1998. Language and social behavior. In: Daniel T. Gilbert,
Susan T. Fiske and Gardner Lindzey (eds.), The Handbook of Social Psychology, Volume II,
4th edition, 41–88. Boston: McGraw Hill.
Krauss, Robert M. and Susan R. Fussell 1996. Social psychological models of interpersonal com-
munication. In: Edward Tory Higgins and Arie W. Kruglanski (eds.), Social Psychology Hand-
book of Basic Principles, 655–701. New York: Guilford Press.
Kraut, Robert, Steven H. Lewis and Lawrence W. Swezey 1982. Listener responsiveness
and the coordination of conversation. Journal of Personality and Social Psychology 43:
718–731.
Kruglanski, Arie W. and Gün Semin 2007. The epistemic bases of interpersonal communication.
In: Miles Hewstone, Henk A. W. Schut, John B. F. de Wit, Kees van den Bos and Margaret S.
Stroebe (eds.), The Scope of Social Psychology: Theory and Applications, 107–120. New York:
Psychology Press.
Malatesta, Carol Zander and Jeannette M. Haviland 1982. Learning display rules: The socializa-
tion of emotion expression in infancy. Child Development 53: 991–1003.
Maricchiolo, Fridanna, Angiola di Conza, Augusto Gnisci and Marino Bonaiuto this volume. De-
coding bodily forms of communication. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H.
Ladewig, David McNeill and Sedinha Teßendorf (eds.), Body-Language-Communication: An
International Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics
and Communication Science 38.1.) Berlin: De Gruyter Mouton.
Maricchiolo, Fridanna, Augusto Gnisci, Marino Bonaiuto and Gianluca Ficca 2009. Effects of dif-
ferent types of hand gestures in persuasive speech on receivers’ evaluations. Language and
Cognitive Processes 24(2): 239–266.
Maricchiolo, Fridanna, Marino Bonaiuto and Augusto Gnisci volume 2. Body movements in polit-
ical discourse. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill
and Jana Bressem (eds.), Body-Language-Communication: An International Handbook on
Multimodality in Human Interaction. (Handbooks of Linguistics and Communication Science
38.2.) Berlin: De Gruyter Mouton.
Marková, Ivana and Klaus Foppa (eds.) 1991. Asymmetries in Dialogue. Hemel Hempstead,
England: Harvester Wheatsheaf.
Marková, Ivana, Carl Friedrich Graumann and Klaus Foppa (eds.) 1995. Mutualities in Dialogue.
Cambridge: Cambridge University Press.
Mead, George Herbert 1934. Mind, Self, and Society. Chicago: University of Chicago Press.
Neumann, Roland and Fritz Strack 2000. “Mood contagion”: The automatic transfer of mood
between persons. Journal of Personality and Social Psychology 79: 211–223.
Niedenthal, Paula M., Lawrence W. Barsalou, Piotr Winkielman, Silvia Krauth-Gruber and
Francois Ric 2005. Embodiment in attitudes, social perception, and emotion. Personality and
Social Psychology Review 9(3): 184–211.
Oppenheimer, Daniel M. and Thomas E. Trail 2010. Why leaning to the left makes you lean to the
left: Effect of spatial orientation on political attitudes. Social Cognition 28: 651–661.
Paquette, Vincent, Johanne Lévesque, Boualem Mensour, Jean-Maxime Leroux, Gilles Beaudoin,
Pierre Bourgouin and Mario Beauregard 2003. Change the mind and you change the brain: Ef-
fects of cognitive-behavioral therapy on the neural correlates of spider phobia. NeuroImage 18:
401–409.
Patterson, Miles L. 2001. Toward a comprehensive model of non-verbal communication. In: Wil-
liam Peter Robinson and Howard Giles (eds.), The New Handbook of Language and Social
Psychology, 159–176. Chichester: John Wiley & Sons.
Rimé, Bernard, Loris Schiaratura, Michel Hupet and Anne Ghysselinckx 1984. Effects of relative
immobilization on the speaker’s nonverbal behavior and on the dialogue imagery level. Moti-
vation and Emotion 8: 311–325.
Rizzolatti, Giacomo and Michael A. Arbib 1998. Language within our grasp. Trends in Neurosciences 21: 188–194.
Rommetveit, Ragnar 1980. Prospective social psychological contributions to a truly interdisciplin-
ary understanding of ordinary language. Journal of Language and Social Psychology 2: 89–104.
Searle, John R. 1969. Speech Acts. Cambridge: Cambridge University Press.
Searle, John R. 1985. Expression and Meaning: Studies in the Theory of Speech Acts. Cambridge:
Cambridge University Press.
Shannon, Claude Elwood and Warren Weaver 1949. The Mathematical Theory of Communication.
Urbana: University of Illinois Press.
Siegel, Daniel J. 1999. The Developing Mind. New York: Guilford Press.
Siegman, Aron W. and Mark A. Reynolds 1982. Effects of mutual invisibility and topical inti-
macy on verbal fluency in dyadic communication. Journal of Psycholinguistic Research 12:
443–455.
Sperber, Dan and Deirdre Wilson 1986. Relevance: Communication and Cognition. Cambridge,
MA: Harvard University Press.
Strack, Fritz and Roland Deutsch 2004. Reflective and impulsive determinants of social behavior.
Personality and Social Psychology Review 8(3): 220–247.
Suitner, Caterina and Anne Maass 2008. The role of valence in the perception of agency and com-
munion. European Journal of Social Psychology 38: 1073–1082.
Tajfel, Henri (ed.) 1978. Differentiation between Social Groups: Studies in the Social Psychology
of Intergroup Relations. London: Academic Press.
Tom, Gail, Paul Pettersen, Teresa Lau, Trevor Burton and Jim Cook 1991. The role of overt head
movement in the formation of affect. Basic and Applied Social Psychology 12: 281–289.
Wiener, Norbert 1948. Cybernetics: Or Control and Communication in the Animal and the
Machine. Paris: Hermann.

Marino Bonaiuto, Rome (Italy)


Fridanna Maricchiolo, Rome (Italy)

17. Multimodal (inter)action analysis: An integrative methodology
1. Introduction
2. History of theory and methodology
3. Analysing multimodal (inter)action: Methodology
4. Conclusions
5. References

Abstract
This article is an introduction to the theoretical and methodological backgrounds of
multimodal (inter)action theory. The aim of this theory is to explain the complexities
of (inter)action, connecting micro- and macro-levels of analysis, focusing on the social
actor. The most important theoretical antecedent, mediated discourse analysis (see
Scollon 1998, 2001b), is presented with its key concepts mediated action and modes.
It is shown how action is used as the unit of analysis and how modes are understood
in multimodal (inter)action analysis – as complex cultural tools, as systems of
mediated action with rules and regularities and different levels of abstractness. Subse-
quently, methodological basics are introduced, such as lower-level, higher-level and fro-
zen action; modal density, which specifies the attention/awareness of the social actor; and
horizontal and vertical simultaneity of actions. Horizontal simultaneity can be plotted on
the heuristic model of the foreground-background continuum of attention/awareness. Vertical
simultaneity of actions comprises the central layer of discourse (immediate actions), the
intermediate layer (long-term actions) and the outer layer (institutional or societal con-
texts). In short, it is sketched how multimodal (inter)action analysis aims to answer ques-
tions about the interconnection of the different modes on a theoretical as well as on a
practical level.
1. Introduction
Multimodal (inter)action analysis is an interdisciplinary methodology that integrates
verbal and non-verbal actions (e.g., spoken language and gesture, posture, or gaze) as
well as objects in the material world (e.g., computers, cell phones, toys or pieces of
furniture) and the environment itself (e.g., the layout of a room, a city or a park). With this
methodology, we also integrate psychological notions such as feelings and levels of
attention/awareness as they reveal themselves phenomenologically in (inter)action.
Feelings may be displayed phenomenologically in a social actor’s facial expression,
and attention/awareness may be analysed through the modal intensity and/or complex-
ity of an action that is performed. However, before going into the details of how multi-
modal (inter)action analysis allows us to investigate these lower- or higher-level
actions and include the world of objects and the environment in the analysis, I would
like to begin by giving some historical and theoretical background. Every methodology
has an underlying theory, and while theories are often implicit when we write about
methodology, I would like to illustrate how theory and methodology have developed
and how they connect in multimodal (inter)action analysis to build a solid foundation
for multimodal inquiry.

2. History of theory and methodology


Multimodal (inter)action analysis has antecedents that made its development possible.
The primary antecedent is mediated discourse analysis (Scollon 1998, 2001b); other
antecedents are interactional sociolinguistics (Goffman 1959, 1963, 1974; Gumperz
1982; Tannen 1984) and semiotics (Kress and Van Leeuwen 2001; Van Leeuwen 1999).
As the primary antecedent, mediated discourse analysis has had the strongest influ-
ence on multimodal (inter)action analysis (Norris 2004, 2011a), which has kept the pri-
mary features, building upon them and expanding them in the direction of a primarily
qualitative multimodal methodology.
Multimodal (inter)action analysis, just as mediated discourse analysis, takes an
action as its unit of analysis. An action (a notion Scollon adopted from Wertsch 1998 and,
ultimately, Vygotsky 1986) is a social actor acting with and/or through a mediational means
or a cultural tool. An action, in this view, is not analysable without analysing the con-
stant tension that is created through the social actors using various and always multiple
mediational means as they perform the action.
Scollon (1998, 2001a, 2001b) theorised that an action occurs at a site of engagement,
which is the window opened up by the social actor(s) and the mediational means when
an action is being performed. A site of engagement is not a place, but does include a
spatial element; it is not a time, but does include a temporal element as well; it is not
the cultural tools, but does include the particularities of cultural tools that are being
used; it is not the social actors, but includes the psychological and physical makeup
of the social actors performing the action. A site of engagement emphasises the one-
time occurrence of actions and thus works against any possibility of reification.
However, Scollon clearly recognized that not all actions are simply and only one-
time actions, but that actions also link to practices. A practice, as Scollon (2001a, 2001b)
argued, is an action with a history, and handing was one practice that he investigated at
great length. When speaking of a practice, Scollon was speaking of repeated actions.
However, even when doing this, he postulated that each practice (or repeated action) is
actually performed at a particular time in a particular place (or a site of engagement),
thus making a practice also an action.
Because of this dual property – a property of history and a property of the immediate –
the concepts become interlinked, theoretically linking the micro to the macro.

2.1. From mediated discourse analysis to multimodal (inter)action analysis
Now I explicate the development from mediated discourse analysis to multimodal
(inter)action analysis, demonstrating the first step towards a multimodal mediated
theory, and also the methodological framework of multimodal (inter)action analysis.
For multimodal analysis, the concept of modes is highly relevant; generally, we can
say that modes are produced by social actors for social actors’ use and are a heuristic
concept that allows us to talk about the world in theoretical terms.
In mediated discourse analysis, modes are viewed as complex cultural tools, which
comprise one part of the interconnected unit of social actor acting with or through cul-
tural tools. Cultural tools are not viewed as necessarily having the same underlying
properties or being analysable in the same way, but are viewed as an integral part of
social action. In mediated discourse analysis (Norris and Jones 2005), language is a
mode which social actors use to act in the world, but it is a mode that may essentially
differ in property from other modes.
In this way, one can say that language, gesture, or posture, for example, are semiotic
systems or systems of representation with rules and regularities attached to them, as
Kress and Van Leeuwen (2001) have argued and as I myself have argued before (Norris
2004). Language in this sense is primarily a system of representation: Whether we speak
or write, we do so in order to represent thought, experiences, past or future, and much
more. Further, language is an abstract system of representation. But not all modes are
systems of representation and not all modes are abstract. Certainly, all modes do rep-
resent something at some level or other, but some modes are used more by social actors
as systems of representation than are others.
Surely, the concept of mode is an important one. Modes are easily recognisable by all
social actors as some kind of system. When we think of language, gesture, furniture, or
music, we all agree that we know what we are speaking about, even though there are
great cultural differences regarding each one of these modes and great differences
between the modes.
When thinking further about modes as systems with rules and regularities, we can
easily define walking as a mode, since there are clear rules and regularities attached:
one walks by placing one foot in front of the other at a particular pace which is neither
too fast nor too slow; and, again, everyone would agree that we all know and under-
stand that there is some kind of system at play that allows us to speak of the mode
of walking exactly because there are rules and regularities attached to it.
But is walking also a mode of representation, and is it abstract? Walking, I would like
to claim, is not primarily used in order to represent, but is used primarily to move a
social actor from some point A to a different point B (even though a person does
also represent as they are walking); also, the mode of walking is not abstract, but is
instead concrete.
Thus, we can see that the mode of language is similar to the mode of walking in that
both possess rules and regularities (though vastly different), but we have to agree that
this is the only similarity.
The challenge for the area of multimodality is to account for the differences in modes
while simultaneously building an overarching theoretical umbrella that encompasses all
modes, building on their similarities.

2.1.1. Multimodal mediated theory


There are two primary notions underlying the discussion below:

(i) a mode is produced/learned/acquired in use by social actors for social actors' use; and
(ii) a mode is a heuristic concept that allows us to talk about the world in theoretical
terms.

A mode, as I have emphasized before, is not real, is not countable, and has to be defined
and re-defined depending upon the focus of study (Norris 2004, 2011c). However, in
loose terms, we can speak about modes (without clearly defining them) to gain a shared
(though vague) understanding of what we are referring to. For example, we can speak of
modes such as furniture, gesture, or images, and we all have some shared understanding
of what is meant. The reason we can speak of the mode of furniture, gesture, or images
is that there are certain rules and regularities that structure these modes. Yet all modes
differ greatly in their materiality and also in their structure.
A theory of mediated action allows us to account for the differences in all modes,
while also allowing us to integrate their similarities – if we add a new notion to
mediated discourse theory. Here, we want to keep in mind that most important for
the theory of mediated action is the notion that a mediated action is a social actor acting
with/through mediational means or cultural tools (throughout the chapter, the terms
cultural tool and mediational means are used interchangeably).
Taking this primary feature of mediated discourse analysis a step further, I define
modes as systems of mediated action. Further, modes may be concrete or abstract
systems of mediated action, where concrete modes are located on one end of a contin-
uum and abstract modes on the other and where most modes have some aspects of
abstractness as well as concreteness, allowing for a fuzzy distinction. The systems are
recognisable by the rules and regularities attached to the mediated actions.
As a mode, furniture is a system of mediated action because furniture is made by
social actors for social actors’ use. The mode of furniture does not exist without social
actors, and neither does any other mode. A system of mediated action (in this case the
mode of furniture) is always and only an ephemeral concept that allows us to talk about
(in this case) all or some kind of furniture. The actual pieces of furniture that social
actors use (or produce), however, are cultural tools.
Similarly, when speaking about the mode of gesture as a system of mediated action,
we refer to a concept. When speaking about a gesture that is actually being performed,
the gesture consists of a social actor using multiple mediational means.
By theorizing modes as systems of mediated action, the mere definition of the term
mode includes the irreducible tension between social actor and mediational means. This
tension is easily missed when defining modes as semiotic systems or as systems of rep-
resentation; but it is also missed when we define modes as mediational means or cul-
tural tools because in either case the system seems dislodged from social actors. Yet,
it is social actors that multimodal (inter)action analysis is interested in and who are
at the heart of communication, not semiotic systems or systems of representation nor
mediational means or cultural tools.
The term mode is thus defined as a concrete or abstract system of mediated action,
whereby most modes lie somewhere on a continuum between these two points of concrete
and abstract. A more concrete system would be the mode of walking, while a
more abstract system would be the mode of language.
This point of view directly grows out of mediated discourse theory (Scollon 2001a,
2001b), which is based on the following three principles:

(i) The principle of social action: Discourse is best conceived as a matter of social ac-
tions, not systems of representation or thought or values.
(ii) The principle of communication: The meaning of the term “social” in the phrase
“social action” implies a common or shared system of meaning. To be social an
action must be communicated.
(iii) The principle of history: “Social” means “historical” in the sense that shared meaning
derives from common history or common past. (Scollon 2001b: 6–8 [italics original])

Defining a mode as a system of mediated action naturally follows from the first princi-
ple, which highlights the social action as primary. Further, it encompasses the second
and the third principles, which highlight the aspect of shared meaning, action, and his-
tory. Common meaning or common past is constructed through mediated actions that
are taken in the world.
However, it also deviates from mediated discourse theory, as a mediated action in
mediated discourse theory is always and only a one-time irreversible action taken in
the world. By speaking of modes as systems of mediated action, I superimpose a con-
cept onto the irreversible one-time notion of a mediated action: A system of mediated
action is a heuristic concept that allows us to express the idea of something overarching,
such as furniture or language, walking or gesture, and directly embeds the importance
of social actors within this concept.
In this view, then, language is a system of mediated action; images are conceived as a
system of mediated action; walking is seen as a system of mediated action, and so on.
Each of these systems of mediated action (or modes) is viewed as a system that has
come about through the historical conglomeration of mediated actions, which always
and irreducibly link social actors to the cultural tools used to perform social action in
the world. In other words, modes are built and re-built, shaped and changed by social
actors through use. They are not viewed as systems of representation that exist in and
by themselves, but rather are viewed as systems that are produced and re-produced by
social actors in (inter)action.
When investigating modes in use, we can see that each system is recognizable
through the rules and regularities attached to it. But in this view, where a mode is a sys-
tem of mediated action, we have explicitly unpacked the irreducible tension between
social actors and cultural tools, allowing us to theorize modes in quite different ways
than before. When defining a mode as a system of mediated action with rules and
regularities attached to it, making the irreducible tension between social actor and med-
iational means explicit, we can see that the rules and regularities of a mode may be
more attached to the mediational means or cultural tools or may be more attached
to the social actor. When thinking about the mode of language, we see that language
is an abstract system of mediated action, where we can find a great number of rules
and regularities that have become embedded in the cultural tools of semantics, syntax,
etc. Differently, when thinking about the mode of walking, we find that the rules and
regularities of the modes are largely (though not entirely) linked to the social actor’s
body.
While we can make many sentences – and even many nonsensical sentences – and
have all of them still recognizable by others as language, the ways in which we can
walk, so that our walking is recognizable as such by others, are largely constrained
by our bodies. Certainly, language is also constrained by our bodies, but for the average
social actor language is much less constrained by the body than, for example, walking is.
Feet have to be placed in a certain way, not only because that is what the mode affords,
but precisely because the body highly constrains the mode. Further, language is a mode
that allows us to communicate in highly abstract ways, while walking is a much more
concrete mode, which is not usually used in highly abstract communicative ways,
even though the action of walking always does communicate to others.
While language is very well studied and we can easily adopt what we know about
language for a multimodal analysis, many other modes – with the exception of
maybe gesture and some aspects of gaze – are much less understood. Our interest in
multimodal (inter)action analysis, however, is not only to gain a better understanding
of particular modes; our interest is to gain a better understanding of how modes play
together in human (inter)action.

3. Analysing multimodal (inter)action: Methodology


In multimodal (inter)action analysis, we focus on social actors and view all actions as
(inter)actions no matter if social actors are (inter)acting with other social actors, with
objects, or with the environment.

3.1. Unit of analysis: Lower-level, higher-level and frozen action


The unit of analysis is the mediated action, of which three kinds are distinguished. The
first is the lower-level action, a mode's smallest meaning unit, such as an utterance for
the mode of language, a gaze shift for the mode of gaze, or a step for the mode of
walking. The second is the higher-level action, which is made up of a multitude of chained
lower-level actions, such as a dinner, which comprises many utterances, gaze shifts,
postural shifts and instances of object handling. The third is the frozen action, a
lower-level and/or higher-level action that is frozen in material objects and the environment,
such as an open office door entailing the frozen action of someone having opened
the door. Lower-level and higher-level actions constitute each other in (inter)action: just
as a dinner is made up of many chains of lower-level actions, the lower-level actions are
produced because of the higher-level action that is being produced. The lower-level
action of pointing to the pepper mill on the table, for instance, builds an aspect of the
higher-level action we may call having dinner, just as the higher-level action of having dinner allows
the lower-level action of pointing to the pepper mill on the table to come about. All
actions are demarcated by their beginning and ending points, and when investigating
actions in this way, we soon discover that there are many levels of higher-level action
(Norris 2011a). When analysing a dinner, for example, we can investigate the dinner
from beginning to end, but we can also investigate parts of the dinner as demarcated
higher-level actions by analysing the beginning and ending points, such as the beginning
and ending of food being brought to the table, or the beginning and ending of a topic
being discussed, the beginning and ending of someone eating dessert, or the beginning
and ending of a person addressing someone at the table. Thus, the extent and level of a
higher-level action always depend upon the focus of study. If one investigates addresses
during dinner, one investigates each address as a higher-level action, taking into
consideration not only the verbal but also the non-verbal actions, the objects that are
handled, the food that is eaten, and the table and chairs that are used during the
addresses under study. If one investigates complete dinners, one takes the same range of
verbal and non-verbal actions, objects, food, and furniture into consideration for the
complete dinner as a higher-level action. The essence in multimodal
(inter)action analysis is always the linking of the multitude of modes that are used in
(inter)action when social actors act and communicate.

3.2. Modal density foreground-background continuum of attention/awareness
With a focus on social actors, multimodal (inter)action analysis embeds a psychological
notion of attention/awareness. Generally, we can say that when a social actor pays atten-
tion to an action, the social actor is also aware of it, and if a social actor is aware of an
action, the social actor is paying (at least some) attention to it. Attention/awareness is
viewed as a continuum, and the amount that a social actor pays attention to an action
is understood as analysable through the notion of modal density.
Modal density comes about through a social actor's modal use, i.e. the phenomenological
(audible, visible, and otherwise perceivable) production of an action, and gives
insight into the social actor's levels of attention/awareness. Modal density is the
modal intensity or the modal complexity that a social actor uses to produce a certain
higher-level action.

3.2.1. Horizontal simultaneity of actions: A family dinner


Let us imagine a family having dinner. During this dinner, five social actors are present:
the mother, the father, their ten-year-old son, their seven-year-old daughter, and their
one-year-old son, who is seated in a highchair next to the father. At this time, let us turn
to the attention/awareness of the father, who is feeding the baby, while the baby is also
self-feeding to some extent. There are pieces of vegetables, small pieces of meat, and a
few noodles on the highchair tray that the baby picks up and eats. While the baby is busy
arranging the food items on the tray, the father is feeding the baby some soup from his
own plate. Simultaneously, he is telling his partner (the mother) about his day and
listening to her talking about hers. Further, he includes the ten-year-old boy and the seven-year-old
282 II. Perspectives from different disciplines

daughter in the conversation, also asking about their day. While much talk overlaps, the
father interacts with each of the other social actors at the table, and he is also eating
dinner himself.
When we analyse this dinner, we can see that the father is paying focused attention
to the conversation between himself and his partner: modal density in the conversation
between these two social actors is high, produced primarily through the intensity of
the mode of language. His spoken language is fast-paced with a slightly raised pitch, and
the conversation continues for a long stretch during the dinner. During the exchange,
he mostly looks at the baby, coordinating his feeding with the baby's self-feeding and
babbling. Once in a while, he looks up and gazes at his daughter, who is telling the
story of her day; he pays her some attention while he is engaged with the mother
and the baby, and makes a remark or asks her a question. He also sometimes gazes
at his older son, who is listening, but is mostly focused upon the food in front of him.
What we find here is that the father engages in five (inter)actions, or higher-level
actions, simultaneously: he pays focused attention to his partner, pays attention to a
lesser degree to the baby, to a still lesser degree to his daughter, and to an even lesser
degree to his older son, and eats his own dinner without paying much attention to it at
all. In order to produce all of these simultaneous actions, he uses the modes of gaze,
gesture, spoken language, posture, proxemics, object handling, and furniture. Some
(inter)actions are constructed more through the mode of language, others more through
the mode of gaze or the mode of object handling, but all are produced through all of the
modes in one way or another, so that the modes play together differently for each
(inter)action.
When thinking about (inter)action in this way, we can see that social actors are often
engaged in several higher-level actions at the very same time. As analysts, we can read
the levels of attention/awareness that the father pays to each of the other social actors
off of the modal density that he uses as he engages with the others. While the father (as
any other social actor) can only give one higher-level action his focused attention at one
time, he can give differentiated attention to quite a few higher-level actions. The (inter)
actions that he engages in can then be plotted on the heuristic model called the
foreground-background continuum of attention/awareness (Fig. 17.1).
[Figure: five (inter)actions ordered by modal density along an axis of decreasing
attention/awareness: (inter)action with the mother, with the baby, with the daughter,
with the son, and eating]

Fig. 17.1: The father's attention/awareness levels plotted on a heuristic foreground-background continuum of attention/awareness.
This model of a modal density foreground-background continuum allows us to visualise
the various phenomenologically produced higher-level actions, i.e. (inter)actions, that a
social actor is involved in simultaneously. These are what we call horizontal simultaneous
actions. We can determine the amount of attention/awareness that a social
actor (in this case the father) pays to each higher-level action by assessing the amount
of modal density that he employs in the construction of each higher-level action.
The stronger the (phenomenological) modal density, the stronger the attention/awareness
that he pays to the action. It is important to note that levels of attention
can shift and change quite quickly (see Norris 2011b for more information).
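The ordering that the continuum captures can be sketched, purely as an illustration and not as part of the methodology itself, by ranking (inter)actions according to modal density; the numeric scores below are invented placeholders, since modal density is assessed qualitatively:

```python
# Hypothetical sketch: the five (inter)actions from the dinner example,
# each paired with an invented modal-density score (higher = denser).
actions = {
    "(inter)action with the mother": 9,  # focused attention, foreground
    "(inter)action with the baby": 7,
    "(inter)action with the daughter": 5,
    "(inter)action with the son": 3,
    "eating": 1,                         # little attention, background
}

# Order the actions from foreground (highest modal density) to
# background (lowest), reproducing the layout of Fig. 17.1.
continuum = sorted(actions, key=actions.get, reverse=True)

for rank, action in enumerate(continuum, start=1):
    print(f"{rank}. {action} (modal density {actions[action]})")
```

Read top to bottom, the printed list corresponds to the decreasing attention/awareness axis of the continuum.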

3.2.2. Vertical simultaneity of actions


Besides horizontal simultaneous actions, social actors also engage in vertical simulta-
neously produced higher-level actions. This means that each higher-level action that a
social actor produces is embedded in more than one layer of discourse. In multimodal
(inter)action analysis (Norris 2011a), we distinguish between three layers of discourse:

(i) the central layer of discourse (comprised of the immediate actions performed by a
social actor);
(ii) the intermediary layer of discourse (comprised of the long-term actions that social
actors produce within their network(s)); and
(iii) the outer layer of discourse (comprised of actions performed within the institu-
tional or societal contexts).

Often, all three vertical layers of discourse overlap, so that they are difficult to disen-
tangle. However, when it comes to misunderstandings or ruptures of some kind in
(inter)action, we can often find the answer to the problems in the different vertical
layers of discourse and their normative expectations. For example, let us think about
the father in our example above. There he is engaged in conversation with his partner,
feeding the baby, and, at times, (inter)acting with the two older children. On the
central layer of discourse, his actions all make sense, and on the intermediary layer of
discourse, within his family network, his actions all run smoothly and are comprehensible
to all others involved.
But now imagine that the mother believes in a strict gender differentiation, where
she feeds the baby. In this instance, the father’s actions (his central layer of discourse)
would clash with his intermediary layer of discourse (his family or network discourse),
and instead of a smooth and happy dinner, a completely different and quite strained
(inter)action between the two adults would emerge. Thus, while the different layers
of discourse are often invisible in smooth (inter)actions because they overlap completely,
they become visible when they diverge and the (inter)action becomes strained as a
result. Of course, such ruptures in (inter)action can also come about when the
outer layer of discourse disconnects from any of the other layers of discourse.

4. Conclusions
Multimodal (inter)action analysis is a method that is theoretically strongly grounded
in mediated discourse analysis. With this method, we try to analyse the complexities
in (inter)action, neither shying away from a minute and detailed micro analysis of
lower-level actions that social actors perform, nor shying away from connecting these
micro analyses to the various layers of discourse from micro to macro.
When we define modes as systems of mediated action, we highlight that modes are
conceptual notions which grow out of and change within (inter)actions. Modes are pro-
duced by social actors for social actors’ use. They are learned, developed, and changed
through (inter)action, and embed rules and regularities. Rules and regularities may be
more attached to the social actor’s body, as is the case for the mode of walking; or may
be more attached to the cultural tool, as is the case for the mode of language. Where
exactly rules and regularities can be found within a mode is of great interest to a multi-
modal (inter)action analyst, because it shows that we cannot treat all modes in the same
way. Modes have different affordances and limitations, and these differences have to be
taken into account when investigating multimodal (inter)action.
In multimodal (inter)action analysis, we take the mediated action as our unit of analysis,
as we believe that humans first and foremost act in the world. Several actions are often
performed simultaneously, rather than only consecutively, and simultaneity is a notion that
multimodal (inter)action analysis investigates not only on a horizontal, but also on a
vertical level.
Beyond this, there are many different elements in (inter)action that can be analysed
using multimodal (inter)action analysis, from transcription and the theorizing of modes,
modal hierarchies, and modal interconnection in (inter)action to the analysis of identity
production (Norris 2002, 2007, 2011a, 2011c). Especially for the analysis of identity
production, several methodological concepts have been developed (Norris 2011a). As an
example, we can speak of identity elements being produced as social actors perform
a higher-level action: the father in our example above produces a father identity as he
feeds the baby, or a partner identity as he discusses his day with the mother. He
produces an immediate identity element (as he performs the actions on the central
layer of discourse, i.e. the lower-level and higher-level actions that he performs); he
also produces a continuous identity element (as he performs the actions within the
intermediary layer of discourse, i.e. the family discourse in our example); and he
performs a general identity element (as he performs the actions on the outer layer of
discourse, i.e. institutional or societal gender discourse in our example).
Multimodal (inter)action analysis tries to shed new light on human communication
in a vast array of settings. Researchers have used multimodal (inter)action analysis
to study computer mediated communication (Örnberg Berglund 2005); to analyse
aspects of advertising (White 2010, 2011) and interactions between and among journal-
ists and public relations professionals (Sissons 2011); or to study Aipan art making
(Frommherz 2011).
We have further studied interaction in an elementary school classroom (Norris
2003), and have used multimodal (inter)action analysis to study doctor-patient interac-
tions, and music lessons, as well as traffic police officers and workplace practices (Norris
2004). On a more theoretical level, multiparty interactions (Norris 2006) have been
studied, as well as rhythm in (inter)action (Norris 2009) and gesture in relation to
language (Norris 2011c).
While we have gained some knowledge about multimodal (inter)action, multimodal
(inter)action analysis is still a young methodology that opens up many old questions to
new scrutiny: How do social actors perform actions? Which lower-level actions are nec-
essary to construct a higher-level action? Which lower-level actions are required
because of the construction of a higher-level action? How many actions can a social
actor perform simultaneously?
But we will also wish to answer questions such as: How do the various modes play
together in interaction? Are there rules and regularities attached to modal aggregates?
If modes, as shown in Norris (2011c), fluctuate in hierarchies, what does this mean for
the mode of language, the mode of gesture, or the mode of object handling? Further,
we will want to think about the practical impact that we can have on the basis of our
findings and ask ourselves: Can we teach social actors about the various layers of discourse
and the impact that these have on their everyday actions? How can we use
our new knowledge in a constructive way to alleviate miscommunication and improve
communication?
Multimodal (inter)action analysis seeks answers to these and many other questions
by integrating the study of language with context and the social actor's use of
embodied modes such as gesture, gaze, or posture, always taking into account and trying
to build on research not only in linguistics, but also in the area of non-verbal behaviour
and gesture (as for example by Argyle and Cook 1976; Birdwhistell 1970; Dittmann
1987; Ekman and Rosenberg 1997; Kendon 2004; McNeill 1992, 2000, 2005).

5. References
Argyle, Michael and Mark Cook 1976. Gaze and Mutual Gaze. Cambridge: Cambridge University
Press.
Birdwhistell, Ray L. 1970. Kinesics and Context. Essays on Body Motion Communication. Phila-
delphia: University of Pennsylvania Press.
Dittmann, Allen T. 1987. The role of body movement in communication. In: Aron W. Siegman
and Stanley Feldstein (eds.), Nonverbal Behavior and Communication, 2nd edition, 37–63.
Hillsdale, NJ: Lawrence Erlbaum.
Ekman, Paul and Erika Rosenberg (eds.) 1997. What the Face Reveals. Basic and Applied Studies
of Spontaneous Expression Using the Facial Action Coding System (FACS). Oxford: Oxford
University Press.
Frommherz, Gudrun 2011. Sacred time: Temporal rhythms in Aipan practice. In: Sigrid Norris
(ed.), Multimodality and Practice: Investigating Theory-in-Practice-through-Methodology, 66–
81. New York: Routledge.
Goffman, Erving 1959. The Presentation of Self in Everyday Life. Garden City, NY: Doubleday.
Goffman, Erving 1963. Behavior in Public Places. New York: Free Press of Glencoe.
Goffman, Erving 1974. Frame Analysis. New York: Harper and Row.
Gumperz, John 1982. Discourse Strategies. Cambridge: Cambridge University Press.
Kendon, Adam 2004. Gesture: Visible Action as Utterance. Cambridge: Cambridge University
Press.
Kress, Gunther and Theo Van Leeuwen 2001. Multimodal Discourse: The Modes and Media of
Contemporary Communication. London: Edward Arnold.
McNeill, David 1992. Hand and Mind: What Gestures Reveal about Thought. Chicago: University
of Chicago Press.
McNeill, David (ed.) 2000. Language and Gesture. Cambridge: Cambridge University Press.
McNeill, David 2005. Gesture and Thought. Chicago: University of Chicago Press.
Norris, Sigrid 2002. The implication of visual research for discourse analysis: transcription beyond
language. Visual Communication 1(1): 97–121.
Norris, Sigrid 2003. Autonomy: multimodal literacy events in an immersion classroom. Annual
Meeting of the American Association of Applied Linguistics. Washington, DC, March
21–23.
Norris, Sigrid 2004. Analyzing Multimodal Interaction: A Methodological Framework. London:
Routledge.
Norris, Sigrid 2006. Multiparty interaction: A multimodal perspective on relevance. Discourse Stu-
dies 8(3): 401–421.
Norris, Sigrid 2007. The micropolitics of personal national and ethnicity identity. Discourse and
Society 18(5): 653–674.
Norris, Sigrid 2009. Tempo, Auftakt, levels of actions, and practice: Rhythms in ordinary interac-
tions. Journal of Applied Linguistics 6(3): 333–356.
Norris, Sigrid 2011a. Identity in Interaction: Introducing Multimodal Interaction Analysis. Berlin:
De Gruyter Mouton.
Norris, Sigrid (ed.) 2011b. Multimodality and Practice: Investigating Theory-in-Practice-through-
Methodology. New York: Routledge.
Norris, Sigrid 2011c. Three hierarchical positions of deictic gesture in relation to spoken language:
A multimodal interaction analysis. Visual Communication 10(2): 129–147.
Norris, Sigrid and Rodney H. Jones (eds.) 2005. Discourse in Action: Introducing Mediated Dis-
course Analysis. London: Routledge.
Örnberg Berglund, Therese 2005. Multimodality in a three-dimensional voice chat. In: Jens All-
wood, Beatriz Dorriots and Shirley Nicholson (eds.), Proceedings from the Second Nordic Con-
ference on Multimodal Communication, April 7–8: 303–316.
Scollon, Ron 1998. Mediated Discourse as Social Interaction. London: Longman.
Scollon, Ron 2001a. Action and text: Toward an integrated understanding of the place of text in
social (inter)action. In: Ruth Wodak and Michael Meyer (eds.), Methods in Critical Discourse
Analysis, 139–183. London: Sage.
Scollon, Ron 2001b. Mediated Discourse: The Nexus of Practice. London: Routledge.
Sissons, Helen 2011. Multi-modal exchanges and power relations in a public relations department.
In: Sigrid Norris (ed.), Multimodality and Practice: Investigating Theory-in-Practice-through-
Methodology, 35–49. New York: Routledge.
Tannen, Deborah 1984. Conversational Style: Analyzing Talk among Friends. Norwood, NJ:
Ablex.
Van Leeuwen, Theo 1999. Speech, Music, Sound. London: Macmillan Press.
Vygotsky, Lev S. 1987. Thought and Language. Edited and translated by Eugenia Hanfmann and
Gertrude Vakar (revised and edited by Alex Kozulin). Cambridge: Massachusetts Institute of
Technology Press.
Wertsch, James V. 1998. Mind as Action. Oxford: Oxford University Press.
White, Paul 2010. Grabbing attention: The importance of modal density in advertising. Visual
Communication 9(4): 371–397. London: Sage.
White, Paul 2011. Reception as social action: The case of marketing. In: Sigrid Norris (ed.), Multi-
modality and Practice: Investigating Theory-in-Practice-through-Methodology, 138–152. New
York: Routledge.

Sigrid Norris, Auckland (New Zealand)


18. Body gestures, manners, and postures in literature


1. Introduction: Nonverbal communication in literature
2. What kinesics in literature truly includes, how we perceive it in reading, and the
temporal dimension of body movements and positions
3. Perceiving the various occurrences of kinesics in literature, intrasystemic and
intersystemic co-structuration, and parakinesic qualifiers
4. Gestural phrasing, simultaneity, congruence-incongruence, inter-masking behaviors
5. Anticipatory, hidden, phonic, and object-related kinesic behaviors in literature
6. Gaze and smiles in literary characters and their deeper interactive functions
7. The reader’s interaction with the characters’ speaking face
8. The reader’s internal and external personal “oralization” of a literary text
9. Body gestures, manners and postures in “literary anthropology”
10. References

Abstract
Whether for research or for attaining full enjoyment of the experience of reading a novel,
for instance, it would be unrealistic and rather shortsighted to identify gestures, manners
and postures by themselves, without acknowledging the other co-occurring or alternating,
conscious or unconscious, communicative bodily sign systems, simply as they manifest
themselves in everyday interactive or noninteractive situations (i.e. proxemic behaviors,
bodily chemical, thermal and dermal activities, object-related behaviors); apart from
the fact that, at the very least, they occur as co-systems within the triple structure of
speech: verbal language (the words said) – paralanguage (how those words sound, and
other word-like utterances) – kinesics (movements and still positions co-occurring or
alternating with those words). This article, while referring to more extensive treatments
of its various topics, is therefore meant, with appropriate textual examples, as a reference
model, for researchers or serious readers, of all that ought to be regarded in literature as
kinesics and its varying sensory perception, its internal structuring, its qualifying charac-
teristics, and the different ways in which the sensitive reader, as recreator, should be able
to perceive its explicit and implicit occurrences in the writer’s created text, discussing also
the common phenomenon of our mute or sound “oralization” of the text as part of the
reading act. In addition, it suggests the application of this topic in the area of literary
anthropology.

1. Introduction: Nonverbal communication in literature


The explicit or implicit presence in literature of kinetic body behaviors, that is, kinesics
(gestures, manners, postures), is an extremely important part in the creative – recreative
processes between writers and their readers and, very specifically, in what has been
studied as the reading act, an unsuspectedly complex series of processes (Poyatos
2008). Therefore, studying them in isolation – as has been done much too often – or
perceiving only them while reading, would be quite unrealistic and would seriously
hamper the researcher’s task, or any intelligent reading, since they occur mostly in inti-
mate interrelationship, though often hidden, with other nonverbal bodily sign systems.
In fact, kinesics is, above all, one of the three mutually inherent (and more often than
not mutually conditioning) components of interactive speech: verbal language (words) –
paralanguage (voice modifying features and independent word-like utterances) –
kinesics, circumstantially associated also with other somatic or extrasomatic signs like
tear-shedding, blushing, or clothes (Poyatos 2002a, 2002b), as well as with
biophysicopsychological, cultural, socioeconomic and educational conditioning elements (Poyatos
2002a: 124–130). Only keeping this in mind will make it possible to focus here, due
to editorial limitations, on bodily movements and positions in literature alone.
A literary work, most typically a novel, shows the following verbal and nonverbal
components: written verbal exchanges, paralinguistic descriptions and transcriptions,
kinesic descriptions, proxemic (interpersonal and person-environment spatial relation-
ships) descriptions, other described or evoked personal signs (chemical (smells, flavors),
thermal (temperature), shape, size, consistency and strength, weight, color), sensible de-
scriptions or implicit evocations of any natural, built or artifactual environment (sound
and silence, movement and stillness, chemical signs, temperature, spaces, volumes,
shapes, consistency, light, darkness) (Poyatos 2002a: 32–48; 2003c: 3–12, 16–24), not
to be dissociated from kinesics either in literary studies or in the actual reading act.
This multi-channel reality shows researchers and sensitive readers: that the literatures
of the different cultures are an invaluable source of data for the study of kinesics
and other nonverbal bodily signs in anthropology, sociology, psychology, etc. (Poyatos
2002d); and that serious research on narrative, dramatic or poetic texts requires an
interdisciplinary approach to all the sign systems just mentioned (Poyatos 2002c).
Expressiveness in real life, and in that other reality we recreate in our personal read-
ing act, depends greatly on our universal as well as culture-specific kinesic behaviors.
They constitute a sort of unique kinetic vocabulary (i.e., movement and associated still-
ness) – with possible grammatical functions in the verbal-nonverbal flow of speech –
that can express visually and nonvisually what would be ineffable otherwise (Poyatos
2002a: 104–112).
Nonverbal communication in literature as a research area began to receive systematic
treatment in the 1970s by Poyatos (1972, 1977) and has proliferated since then (e.g.,
Korte 1993; Marmot Rein 1986; Portch 1985; and by the contributors to Poyatos’
1988, 1992 and 1997 edited volumes), more recently by Poyatos’ endeavors to establish
a realistic all-inclusive model for the analysis of the various nonverbal aspects of liter-
ature in its creative stages, its readers’ recreation, and its interlinguistic-intercultural
translation (2002c, 2002d, 2008). In order to deal with the specific areas and points
which should widen the readers’ perspectives and spur further interdisciplinary
research, this brief overview will concentrate on the novel.

2. What kinesics in literature truly includes, how we perceive it in reading, and the temporal dimension of body movements and positions
Before focusing on literature (but drawing already from it) we should briefly identify
the components and functions of kinesics, realistically defined as: Conscious and un-
conscious psychomuscularly-based body movements and intervening or resulting still
positions, either learned or somatogenic, of visual, visual-acoustic, tactile and
kinesthetic perception, which, whether isolated or combined with the linguistic and
paralinguistic structures and with other somatic and object-manipulating behavioral
systems, possess intended or unintended communicative value (on kinesics, Poyatos
2002b: Chapter 5; 2002c: Chapter 4; 2002e).
Thus defined, its observation and study includes more than what is usually acknowl-
edged. First of all, its three basic categories:

– gestures, both conscious and unconscious, mainly of the head, the face alone, includ-
ing gaze, and the extremities: “Miss Crane’s thin red nostrils quivered with indigna-
tion” (Wolfe LHA: XXV);
– manners, that is, how we perform a gesture or adopt a posture, but also “social man-
ners” (eating, smoking, shaking hands, donning or doffing a garment, kind of gait):
“Mr F.’s Aunt [after eating a piece of toast] then moistened her ten fingers in slow
succession at her lips, and wiped them in exactly the same order on the white hand-
kerchief ” (Dickens LD: II, IX), “Paulie swang, catching him unexpectedly in the
jaw from the side. The fellow staggered” (Farrell YMSL: I, V);
– postures, delimiting, and caused by, movements, with which they articulate in a com-
municative continuum to be acknowledged as we read, as we must silences with
respect to sounds: “Harran fell thoughtful, his hands in his pockets, frowning moodily
at the toe of his boot” (Norris O: I, V). There are two kinds: dynamic postures, when
a basically static posture contains a moving element, or the whole body moves in a
posture: “She stood on one foot, caressing the back of her leg with a bare instep”
(Steinbeck GW: XX); and contact postures, involving another person or object:
“NAPOLEON […] [He sits down at the table, with his jaws in his hands, and his
elbows propped on the map]” (Shaw MD).

We should also include within kinesics things like:

– movements and positions of eyes and eyelids: “Mrs. Kearney rewarded his very flat
final syllable with a quick stare of contempt” (Joyce M);
– the hidden hands, whether still or moving: “Godfrey stood, still with his back to the
fire, uneasily moving his fingers among the contents of his side-pockets, and looking
at the floor” (Eliot SM: III);
– the heaving chest, an uncontrollable manner not necessarily dissociated from words,
paralanguage and other kinesic behaviors: “She [Ruth, in court] was pale, angry,
almost sullen, and her breast heaved” (Grey RT: X);
– a sudden stiffening of the body as thunder claps or something startles us: “She felt her
sister-in-law stiffen with nervousness and clasp her little bag tightly [in her excitement
when Morris gets up]” (Woolf Y: 1891);
– shuddering in disgust or horror, or shivering from cold or emotion: “CABOT [pats her
on the shoulder. She shudders]” (O’Neill DUE: III, iv);
– something as expressive as door-knocking (with personal and cultural characteristics)
or door-slamming: “[EBEN rushes out and slams the door – then the outside front door
[…]” (O’Neill DUE: I, ii);
– a person’s eloquent stride: “Elmer watched Jim plod away, shoulders depressed, a
man discouraged” (Lewis EG: XXX, V);
– the many object-manipulating gestures: “ ‘Rosedale – good heavens!’ exclaimed Van
Alstyne, dropping his eye-glass” (Wharton HM: XIV), “Dr. Winskill […] his elbows
on his desk sliding a silver pencil backwards and forwards from hand to hand”
(Wilson ASA: I, IV).

There are also less obvious, but relevant, occurrences, mainly: microkinesics, meaning-
ful movements and still positions of small magnitude: “Adam’s face was bent down, and
his jawbone jutted below his temples from clenching” (Steinbeck EE: XXIV, I), specif-
ically as micropostures, adopted by minor body parts (as in delicately feminine holding
of a cup with fingers of both hands loosely around it): “NAPOLEON [exasperated, clasps
his hands behind him, his fingers twitching […] This woman will drive me out of my
senses [To her] Begone” (Shaw MD).
As for our perception of body movements and positions, it can take place:

– visually, by directly looking at the object, or through intended or unintended peripheral
vision (90˚ on each side of the sagittal plane, thus a total of 180˚, and 150˚ on the
vertical, obviously experienced by narrative characters);
– audibly, as with applause, clapping someone’s back, footsteps: “Ralph strode the cor-
ridor haughtily … He did not know he was mutely saying […] ‘I’m not the greasy and
tattered hobo who arrived in town this morning’ […]./ His heels clicked aggressively
on the shiny stone pavement” (Lewis M: XXV);
– tactually, as people hug, kiss or shake hands, the additional touch receptors for pres-
sure, heat, cold and perhaps pain becoming also active: “She [Mrs. Driffield] held out
her hand […] and when I took it gave mine a warm and hearty pressure” (Maugham
CA: V);
– kinesthetically (the sense of kinesthesis, through muscles, tendons, nerves, and joints,
constantly used to hold things, negotiate a flight of stairs without looking down, etc.),
as when another person’s movements are communicated through a mediating shared
couch, adding to words or silence intimate sensory perception of his or her move-
ments (e.g., tremors of anxiety, fidgeting, preening): “as I sat opposite to her at
work, I felt the table tremble. Looking up, I saw my little maid shivering from
head to foot” (Dickens BH: XXXI).

However, direct sensory perception is constantly complemented and enriched by
synesthesia (the psychological process whereby one type of sensory stimulus produces
on a body part other than the stimulated one a secondary subjective sensation from
a different sense), by which we can imagine kinesics by hearing the sound it produces
or through kinesthetic perception. Thus, the more sensitive literary readers will
acknowledge synesthesially implicit undescribed kinesics mostly through accompanying
words and paralanguage, or through reference to a specific activity: “soft spurts alter-
nating with loud spurts came in regular succession from within the shed, the obvious
sounds of a person milking a cow” (Hardy FMC: III), this being, of course, liable to fail in
any interlinguistic-intercultural translation (Poyatos 2008: Chapter 8).
In addition, kinesic behaviors possess a temporal dimension and project themselves
not only by vividly remembering, for instance, someone’s conversational gestures, but
by leaving significant physical traces: “I could read all that in the dust; and I [Sherlock
Holmes] could read that as he walked he grew more and more excited. That is shown by
the increased length of his strides” (Conan Doyle SS: I, IV).

3. Perceiving the various occurrences of kinesics in literature, intrasystemic and intersystemic co-structuration, and parakinesic qualifiers
Within gestures, manners and postures, we further distinguish between: free, not invol-
ving other body parts or objects: “Mrs. Archer raised her delicate eyebrows in the par-
ticular curve that signified: ‘The butler – ’ ” (Wharton AI: V); and bound, as when our
hands touch other body parts or when we interact contactually with another person or
object: “He lay down on his back on the wooden floor [of a truck] and he pillowed his
head on his crossed hands, and his forearms pressed against his ears” (Steinbeck
GW: XXI).
There is, besides, a triple-phase itinerary in the occurrence of any gesture, manner
or posture:

(i) formative phase, initiated in different static positions to later continue its course
(sometimes a “manner,” as with how we fold or unfold our arms): “The serenity
of her expression [Sally’s] was altered by a slight line between the eyebrows; it
was the beginning of a frown” (Maugham OHB: XXI);
(ii) central phase, either dynamic (e.g. shaking the hand in the French “O, là, là!”) or
static (e.g. holding one’s temples with thumb and forefinger while trying to remember):
“ ‘I mean it, sir. Please don’t worry about me.’ I sort of put my hand on his
shoulder. ‘Okay?’ I said” (Salinger CR: II);
(iii) and dissolving or disarticulating phase, before initiating the next behavior, as with
the residual smile after a laugh, or a hand receding after executing the “firm” gesture
in: “In the momentary firmness of the hand that was never still – a firmness
inspired by the utterance of these last words, and dying away with them – I saw
the confirmation of her earnest tones” (Dickens BH: LX).

We should also mention three more aspects of kinesics:

(i) the interrupted kinesic behaviors, at times as eloquent as unuttered words: “John
wearily swung his leg over the pommel, but did not at once dismount. His clear
gray eyes were wonderingly riveted upon the hunter” (Grey MF: XIX);
(ii) the intrasystemic relationships, among different body parts: “ ‘I know, old man,’ he
[Burlap] said, laying his hand on the other’s shoulder […] ‘I know what being
hard up is’ […] another friendly pat and a smile. But the eyes expressed nothing”
(Huxley PCP: XIII);
(iii) the intersystemic relationships, as with kinesics and tears as in: “She [Jennie] rose./
‘Oh,’ she exclaimed, clasping her hands and stretching her arms out toward him.
There were tears of gratefulness in her eyes” (Dreiser JG: VII).

Kinesic behaviors are variously affected by five parakinesic qualifiers, profusely illu-
strated in literature, which reveal anthropological, sociopsychological and clinical
292 II. Perspectives from different disciplines

aspects and can modify the meaning of the message, besides revealing cultural or socio-
educational backgrounds:

– intensity, or muscular tension: “Robin stirred his coffee furiously” (Wilson ASA:
I, IV), “ ‘No!’ Viciously, Warren Trent stubbed out his cigar” (Hailey H: “Tuesday,”
2), often the intensity of different systems combining in one expression: “A
broad-shouldered man […] spatted his knee with his palm. ‘I know it […]!’ he
cried” (Steinbeck GW: XXIV);
– pressure, exerted in varying degrees (distinct from the cutaneous sensations of touch,
pain, heat and cold, all parts of the sense of touch) on people or inanimate objects:
“Nothing could be stronger, more dependable, more comforting, than the pressure
of his fingers on her arm […] They laughed intimately […] he picked up her hand”
(Lewis EG: VII, IV), “There was a pause intense and real […] Then Gerald’s fingers
gripped hard and communicative into Birkin’s shoulders, as he said:/ ‘No, I’ll see to
this job through, Rupert […]’ ” (Lawrence WL: XIV);
– range: “Rising from his seat, Dismukes made a wide, sweeping gesture, symbolical of
a limitless expanse” (Grey WW: VIII);
– velocity, or temporal dimension: “He drank up his tea. Some drops fell on his little
pointed beard. He took out his large silk handkerchief and wiped his chin impati-
ently” (Woolf Y: 1880), “It was a slow smile […] a very sensual smile and it made
her heart melt in her body” (Maugham PV: II);
– duration of the behavior, distinct from (though closely related to) speed: “a strange
recital. She [June] heard it flushing painfully, and, suddenly, with a curt handshake,
took her departure” (Galsworthy MP: II, IV).

These qualifiers lend movements and positions their specific meaning and affect also
what is being expressed verbally: “[after Rachel takes her leave of her cousin God-
frey] He waited a little by himself, with his head down, and his heel grinding a hole
slowly in the gravel walk; you never saw a man look more put out” (Collins M:
“First Period,” IX).

4. Gestural phrasing, simultaneity, congruence-incongruence,
inter-masking behaviors
Gestures, like words, can occur in strings of perfectly coherent kinephrasal
constructions, as shown in the following example in which the hands are twice ar-
ticulated with the knees (once producing an eloquent quasi-paralinguistic sound
as well), along with the expression in both gaze and face, then the arms and a
new form of eye behavior followed by a meaningful postural change: “Mr. Weller
planted his hands on his knees, and looked full in Mr. Winkle’s face, with an expres-
sion of countenance which showed that he had not the remotest intention of being
trifled with./ […] having accompanied this last sentiment [verbally expressed] with an
emphatic slap on each knee, folded his arms with a look of great disgust, and threw
himself back in his chair, as if awaiting the criminal’s defence” (Dickens PP:
XXXVIII).

Two or more gestures can coincide as:

– simultaneous single-meaning gestures in the same or different body area, the elements
of that multiple expression complementing and even qualifying each other: “Miriam
looked up. Her mouth opened, her dark eyes blazed and winced, but she said nothing.
She swallowed her anger and her shame, bowing her dark head” (Lawrence SL: VII);
– simultaneous multiple-meaning gestures in the same or different body area, the simul-
taneity of gesture and words lending a special tone to the verbal expression by the
dominant superimposition of the former: “Mr Tigg, who with his arms folded on
his breast surveyed them, half in despondency and half in bitterness” (Dickens
MC: VII).

This multiplicity can imply either congruence or incongruence among the various bodily
components (which should sensitize us to the interaction processes on deeper levels), as
with the congruence seen in: “ ‘I won’t bear it. No, I won’t’ he said, clenching his hand
with a fierce frown” (Beecher Stowe UTC: III).
Directly related to this are inter-gesture masking behaviors, in which, more often
than we imagine, we try to conceal a kinesic behavior (which already conveys a specific
emotion or thought) by camouflaging or masking it, even as consciously as we
would the meaning of our words (cf. Ekman 1981, referring only to the face). We
may try to mask that feeling or emotion with another one we do not feel, incongruence
thus being quite obvious: “Ataity made a disparaging grimace; but through the mask of
contempt his brown eyes shone with pleasure” (Huxley EG: XV).
But we may also, with a neutral or indifferent countenance, unsuccessfully try to
mask what we feel, even betraying a complex mixture of feelings: “As soon as he
[Mr. Dawson] set eyes on the patient [Laura] I saw his face alter. He tried to hide it,
but he looked both confused and alarmed” (Collins WW: 390).
Naturally, we can add paralanguage and even verbal language, since how words are
chosen and said can have a strong bearing on these more or less subtle masking pro-
cesses, for the three components of speech, language-paralanguage-kinesics, are often
mutually complementary in dissimulation and feigning acts: “ ‘He has everything to
do with it as far as I am concerned,’ March answered, with a steadiness that he did
not feel” (Howells HNF: IV, VII). In fact, signs like coughing, breaking eye contact,
laughter, blowing out smoke, etc., can perform these intended masking functions.

5. Anticipatory, hidden, phonic, and object-related kinesic
behaviors in literature
A peculiar characteristic of conversational movements and positions is anticipatory ki-
nesics – which Kendon (1990) noted only for gestures, saying that “gesture waits for
speech” – even in the strictly phonetic speech movements that are often conspicuously
formed in advance of the verbal or paralinguistic sound they will produce: “ ‘Mr
Jadwin!’ she exclaimed. ‘[…] Why, I hardly know the man […]’/ But Mrs. Cressler
shook her head, closing her eyes and putting her lips together./ ‘That don’t make any
difference, Laura. Trust me […]’ ” (Norris P: II).

But there are also anticipatory manners and postures: “She [Rachel] approached
Mr Godfrey at a most unlady-like rate of speed […] her face […] unbecomingly
flushed” (Collins M: “Second Period,” “First Narrative,” II); in fact, any type of bodily
signs (blushing, heaving, etc.) can announce and determine the tone of the subsequent
interaction.
Since we perceive our own kinesics, if we are conscious of it, we cannot ignore pos-
itive or negative hidden gestures, mainly facial and manual, often unseen by others,
which we want to conceal more or less consciously, though the gesture nevertheless
is still there: “Ruthie mushed her face at his back, pulled out her mouth with her fore-
finger, slobbered her tongue at him” (Steinbeck GW: XX), “He [Basil Ramson] ground
his teeth a little as he thought of the contrasts of the human lot” ( James B: III), “he
wanted to hold her hand and tell her with quick little pressures that they were sharing
the English countryside” (Lewis B: IX), “[Soames] walked on faster, clenching his
gloved hands in the pockets of his coat” (Galsworthy IC: II, II).
Our repertoires of sound-producing gestures, which we can label phonic gestures,
that is, phonokinesics, acquire a language-like quality and are very distinctly differentiated,
both culturally and personality-wise: “ ‘Order, order, gentlemen,’ cried
Magnus, remembering the duties of his office and rapping his knuckles on the
table” (Norris O: II, IV), “ ‘Come into money, have you?’ he [the man at the
shop] cried, chuckling and slapping his thigh with a loud report” (Markandaya NS:
XXVIII).
We should also consider the many object-related gestures in which we manipulate
something as part of a compound kinesic behavior: “ ‘Rosedale – good heavens!’ ex-
claimed Van Alstyne, dropping his eye-glass” (Wharton HM: XIV), “Dr. Winskill
[…] sat in his consulting room, his elbows on his desk sliding a silver pencil backwards
and forwards from hand to hand” (Wilson ASA: I, IV).

6. Gaze and smiles in literary characters and their deeper
interactive functions
As part of the speaking face (Poyatos 2002a: Chapter 3; 2002b: 236–244), the move-
ments and postures of the pupils, eyelids and eyelashes (and even the eyebrows) artic-
ulate intimately with other body parts in true gestalts, blending with words and their
paralinguistic qualifiers as well as with other kinesic behaviors: “a caressing sound in
his deep, rich voice, a delightful expression in his kind, shining blue eyes, which
made you feel very much at home with him” (Maugham PV: XIV).
Consciously or unconsciously on the part of the beholder, there are also, in interaction,
intersystemic blends involving gaze and, for instance, perfume or cosmetics, as well as
other behavioral signs (e.g. tearful eyes, blushing). Besides, the regard of some persons
seems to come from more than just two organs of vision: “her eyes with their long lashes
were so starry and yet so melting that it gave you a catch at the heart to look into them”
(Maugham PV: VIII).
Adding bodily contact to gaze can increase the intimate nature of an interaction:
“She […] turned her face full on me, and reaching across the table, laid her hand firmly
on my arm./ ‘Not because you are a teacher […]” (Collins WW: 96).

Besides other important aspects of smiles (Poyatos 2002b: 211–213), we
should mention:

– how they function as gestures, qualifying what is being said, blending with other
forms of expression and playing a crucial role in how people feel about one another:
“During the colloquy Jennie’s large blue eyes were wide with interest. Whenever he
looked at her she turned upon him such a frank, unsophisticated gaze, and smiled in
such a vague, sweet way, that he could not keep his eyes off of her for more than a
minute at a time” (Dreiser JG: I), “She had a terrifically nice smile. She really did.
Most people have hardly any smile at all, or a lousy one” (Salinger CR: VIII);
– how a smile can be seen in other parts of the face besides the lips, particularly the
eyes: “She smiled with her lips and with her eyes […] ‘Why not?’ asked his wife,
her blue eyes still pleasantly smiling” (Maugham CA: V);
– the fact that they can qualify whole stretches of “smiling speech”: “Mr Pecksniff
smilingly explained the cause of their common satisfaction” (Dickens MC: VI);
– that they can blend, at first sight incongruously, with other forms of expression:
“Rose was unable to continue for a moment […] smiling through her tears, she
said” (Wilson ASA: I, II); and specifically with gaze: “His face [William de Mille’s]
was severe even in repose, and his mouth firm in preoccupation. But the lights
blazed behind the eyes and his lips were cross-hatched with lurking smiles” (de
Mille DP: V);
– that smiles can act as disclaimers: “ ‘Are you penitent?’ […]/ ‘Heart-broken!’ he
answered, with a rueful countenance – yet with a merry smile just lurking within
his eyes and about the corners of his mouth” (A. Brontë TWF: XXIV);
– that someone’s negative impression on us can change when that person smiles:
“Jonathan’s smile, which came quickly, accompanied by a warm light in the eyes,
relieved Helen of an unaccountable repugnance she had begun to feel toward the
borderman” (Grey LT: III);
– that there is also an “anatomical smile,” which can betray an unfelt feeling: “ ‘Poor
Goggler! How fiendish we were to him!’/ ‘That’s why I’ve always pretended I
didn’t know who he was,’ said Staithes, and smiled an anatomical smile of pity
and contempt” (Huxley EG: XX).

The smile, in sum, is perhaps the human gesture which affects us the most: “Those
lips are thine – thy own sweet smiles I see,/ The same that oft in childhood solac’d
me” (Wm. Cowper On the Receipt of My Mother’s Picture, 1.1).

7. The reader’s interaction with the characters’ speaking face


Facial features, both static and dynamic (Poyatos 2002a: 63–74; 2002b: 332–333) are
essential conditioners of the characters’ kinesic configuration and of our perception
of, and interaction with them, and appear in four categories:

(i) permanent (position, size and shape of brows, eyelids and eyelashes, nose, cheeks,
mouth, forehead, chin and mandible, to which can be added the long-term pres-
ence of a beard or moustache, conspicuous sideburns, or hairdo): “with fat little
squirrel cheeks and a mouth perpetually primed in contemptuous judgement […]
in all ways smug and insufferable” (Doctorow WF: XIII);
(ii) changing (formed over time by aging, work, suffering, hardships or motor habits,
such as wrinkles and folds, blotches, deformations, etc., usually acting in interaction
as intellectually evaluated components): “His neck was ridged with muscles
acquired from a life-long habit of stiffening his jaw and pushing it forward”
(MacLennan TS: I, XI),
(iii) dynamic (subject to our positive or negative perception as part of the triple struc-
ture language-paralanguage-kinesics, so important, for instance, for the formation
of first impressions): “SAMUEL KAPLAN […] twenty-one, slender, with dark, unruly
hair and a sensitive, mobile face” (Rice SS: I); and
(iv) artificial, actually enhancements or de-emphasizers of the face (e.g., glasses and
sunglasses, or status-identifying symbolic marks): “Wearing the vermilion mark
of marriage at the central parting of her hair, as a woman must, she would gain
freedom, freedom to live her own way” (Bhattacharya HHRT: XXVI).

A novelist may merely suggest a character’s facial signs, which, together with a not always
available kinesic configuration, are all that we have to go by in order to carry out our own
task as readers. But as we engage in bringing those characters to life, we tend to broaden
the concept of the “speaking face” to include, for instance, not only the eyes and lips as
objects of visual attention – and therefore qualifiers of personal interaction – but mainly
the characters’ hands, which in general act close to the face and in close articulation with
facial expressions and their functions: “your hands [Isabel’s] are your most fascinating
feature. They are so slim and elegant […] the infinite grace with which you use them
[…] They’re like flowers sometimes and sometimes like birds on the wing. They’re
more expressive than any words you can say” (Maugham RE: V, IV).

8. The reader’s internal and external personal “oralization”
of a literary text
Having at least summarized the reality of the presence of kinesics in literature, this
overview must also acknowledge – for it becomes an important part of
our reading act – the fact that very frequently when we read, particularly a novel or
play, we give utterance to, that is, exteriorize, the words the writer wrote for us (or, un-
fortunately, those with which the translator rendered the original ones), which, natu-
rally, include the mouth and face movements involved in their delivery. We do this:

(i) mentally only, hearing in our mind their sounds as we articulate them internally;
(ii) half-articulating the sounds of words, still inaudibly to others, although partly vis-
ibly, since the lips are not parted;
(iii) articulating the sounds fully, but mutteringly, making them audible and visible
through the movements of our slightly parted lips;
(iv) uttering the words in a fully audible and visible way.

We can speak, therefore, of mute oralization and sound oralization, the latter being
the common inclination in many readers when reading a compelling text, just as we may
catch ourselves mirroring the characters’ speech movements while watching the more
emotional scenes of a film; to say nothing of oralizing a letter by someone we do not
know personally and have never seen, or perhaps have seen once or seldom, or someone
we do know quite well and just “see” how he or she would say those words, particularly
in a love letter, a piece of epistolary literature which so often is not only oralized
but reenacted, dramatized, emotionally uttered, with excruciating restraint if we are
forced to read it in front of witnesses such as airplane or train seatmates.
give free rein to our blissful identification in word and gesture with the beloved, re-
reading key words and phrases, pausing to let our imagination fly to him or her who
spoke and is speaking again, precisely like that, to us readers. That is, indeed, full ora-
lization at its best, accompanied by our own emotional reactions to the sender’s ima-
gined verbal and nonverbal speech: “She did not read the letter [Philip’s]; she heard
him utter it, and the voice shook her with its old strange power […] Philip’s letter
had stirred all the fibres that bound her to the calmer past” (Eliot MF: VII, V). Here
are two very special instances of epistolary oralization: “I wake filled with thoughts
of you. Your portrait and the intoxicating evening we spent yesterday have left my
senses in turmoil. Sweet, incomparable Josephine, what a strange effect you have on
my heart! […] a thousand kisses; but give me none in return, for they set my blood
on fire” (Napoleon to his future wife Josephine, December 1795); “My angel, my all,
my very self […] My heart is full of many things to say to you – Ah! – there are mo-
ments when I feel that speech is nothing after all – cheer up – remain my true, only
treasure, my all as I am yours” (Beethoven to an unknown woman, July 6, 1812).
Onomatopoeic words will trigger our own phonic articulation better: “There is a
grumbling sound and a clanking and jarring of keys. The iron-clamped door swung
heavily back” (Conan Doyle SF: V); “The silences between them were peculiar.
There would be the swift, slight ‘cluck’ of her needle, the sharp ‘pop’ of his lips as he
let out the smoke, the warm sizzle on the bars as he sat in the fire” (Lawrence SL:
III). Also, the combination of onomatopoeic and other types of words that trigger
our oralization: “[in a Kentucky tavern, 19th century] a jolly, crackling, rollicking fire,
going rejoicingly up a great wide chimney […] the calico window curtain flapping and
snapping in a good stiff breeze of damp raw air” (Beecher Stowe UTC: XI); “See
that steamer out there? […]/ ‘Yes,’ said Suzanne with a little gasp. She inhaled her
breath as she pronounced this word which gave it an airy breathlessness which had a
touch of demure pathos in it. ‘Oh, it is perfect!’ ” (Dreiser G: III, VII).

9. Body gestures, manners and postures in “literary
anthropology”
Our nonverbal repertoires, including kinesics, constitute an important area within liter-
ary anthropology, defined and developed by Poyatos as: the systematic study of the doc-
umentary and historical values of the cultural signs contained in the different
manifestations of each national literature, particularly narrative literatures (Poyatos
1988), for instance:

– as interactive activities, culture-specific or pancultural: “[at the rural ‘square dance’]
the old people on the edge of the floor took up the rhythm, patted their hands softly,
and tapped their feet; and they smiled gently and they caught one another’s eyes and
nodded [as a greeting or approval]” (Steinbeck GW: XXIV);
– as ritualized activities, with crosscultural similarities and differences and even inter-
cultural borrowings: “ ‘So it is,’ cried Tigg, kissing his hand in honour of the sex”
(Dickens MC: XXVII), “ ‘But Huzoor!’ said Hari, touching the foreman’s black
boots with his hand and taking the touch of the beef hide to his forehead. ‘Be
kind’ ” (Anand C: 198–199);
– as social manners, varying socially and historically: “He sank into a chair, laid his hat
and gloves on the floor beside him in the old-fashioned way” (Wharton AI: X);
– as task-performing acts, as in eating, drinking, smoking, etc., and associated behaviors:
“Rap with the bottom of your pint for more liquor [at the inn]” (Hardy FMC: XLII),
“he smoked a cigarette which he held between his thumb and forefinger, palm up,
in the European style” (Doctorow R: VI);
– as somatic, random and emotional acts, as in: “the man blew his nose into the palm of
his hand and wiped his hand in his trousers” (Steinbeck GW: XVI).

It has been seen to what extent creative literature offers a wealth of explicit and implicit
instances of interactive and noninteractive nonverbal occurrences, and that pursuing
the study of visual body behaviors independently of the other somatic signs would be
quite unscholarly.

10. References
Anand, Mulk Raj 1972. Coolie. New Delhi: Orient Paperbacks. C
Beecher Stowe, Harriet 1998. Uncle Tom’s Cabin. New York: Signet. First published [1852]. UTC
Bhattacharya, Bhabani 1955. He Who Rides a Tiger. New Delhi: Hind Pocket Books. HHRT
Brontë, Anne 1979. The Tenant of Wildfell Hall. London: Penguin Books. First published [1848].
TWF
Collins, Wilkie 1974. The Woman in White. London: Penguin Books. First published [1860]. WW
Collins, Wilkie 1986. The Moonstone. London: Penguin Books. First published [1868]. M
Conan Doyle, Arthur 1930a. The Sign of Four. The Complete Sherlock Holmes by Sir Arthur
Conan Doyle, with a Preface by Christopher Morley, Volume I. Garden City, NY: Doubleday.
SF
Conan Doyle, Arthur 1930b. A Study in Scarlet. The Complete Sherlock Holmes by Sir Arthur
Conan Doyle, with a Preface by Christopher Morley, Volume I. Garden City, NY: Doubleday.
SS
Cowper, William 1905. On the Receipt of My Mother’s Picture. Poems. London. First published
[1798]. OR
Dickens, Charles 1836–1837. Pickwick Papers. New York: Dell. PP
Dickens, Charles 1968. Martin Chuzzlewit. Harmondsworth: Penguin. First published [1843–1844].
MC
Dickens, Charles 1985. Bleak House. London: Penguin Books. First published [1853]. BH
Dickens, Charles 1973. Little Dorrit. Harmondsworth: Penguin. First published [1856–1857]. LD
Doctorow, Edgar Lawrence 1985. Ragtime. New York: Ballantine Books, Fawcett Crest. First pub-
lished [1975]. R
Doctorow, Edgar Lawrence 1985. World’s Fair. New York: Fawcett Crest. WF
Dreiser, Theodore 1963. Jennie Gerhardt. New York: Laurel. First published [1911]. JG
Dreiser, Theodore 1967. The Genius. New York: New American Library of Canada, Signet Books.
First published [1915]. G

Ekman, Paul 1981. Mistakes when deceiving. Annals of the New York Academy of Sciences 364:
269–278.
Eliot, George 1860. The Mill on the Floss. New York: Dutton; London: Dent. Everyman’s
Library. MF
Eliot, George 1992. Silas Marner. New York: Bantam Books. First published [1861]. SM
Farrell, James T. 1935. The Young Manhood of Studs Lonigan. New York: Vanguard Press. YMSL
Galsworthy, John 1968. The Man of Property. New York: Charles Scribner’s Sons. First published
[1906]. MP
Galsworthy, John 1959. Justice. Contemporary Drama: 9 Plays. New York: Charles Scribner’s Sons.
First published [1910]. J
Galsworthy, John 1994. In Chancery. Hertfordshire: Wordsworth Editions. First published [1920]. IC
Grey, Zane 1945. The Last Trail. Philadelphia: Blakiston Company. First published [1909]. LT
Grey, Zane 1961. The Rainbow Trail. New York: Pocket Books. First published [1915]. RT
Grey, Zane 1990. The Man of the Forest. New York: Harper and Row. First published [1920]. MF
Grey, Zane 1923. Wanderer of the Wasteland. New York: Grosset and Dunlop. WW
Hailey, Arthur 1966. Hotel. New York: Bantam. First published [1965]. H
Hardy, Thomas 1971. Far from the Madding Crowd. London: Pan Books. First published [1874].
FMC
Howells, William Dean 1960. A Hazard of New Fortunes. New York: Bantam Books. First pub-
lished [1890]. HNF
Huxley, Aldous 1928. Point Counter Point. New York: Avon Books. PCP
Huxley, Aldous 1961. Eyeless in Gaza. New York: Bantam Books. First published [1936]. EG
Joyce, James 1947. A Mother, Dubliners. The Portable James Joyce. New York: Viking Press. First
published [1914]. M
Kendon, Adam 1990. Conducting Interaction: Patterns of Behavior in Focused Encounters. Cam-
bridge: Cambridge University Press.
Korte, Barbara 1997. Body Language in Literature. Toronto: University of Toronto Press; Tübin-
gen, Germany: A. Francke.
Lawrence, D.H. 1960. Sons and Lovers. New York: New American Library, Signet Books. First
published [1913]. SL
Lawrence, D.H. 1950. Women in Love. New York: Random House, Modern Library. First pub-
lished [1921]. WL
Lewis, Sinclair 1961. Babbitt. New York: New American Library, Signet Classic. First published
[1922]. B
Lewis, Sinclair 1926. Mantrap. New York: Grosset and Dunlap. M
Lewis, Sinclair 1954. Elmer Gantry. New York: Dell. First published [1927]. EG
MacLennan, Hugh 1967. Two Solitudes. Toronto: Macmillan of Canada, Laurentian Library. First
published [1945]. TS
Markandaya, Kamala 1954. Nectar in a Sieve. New York: New American Library. NS
Maugham, William Somerset 1942. Of Human Bondage. New York: Random House, Modern
Library. First published [1915]. OHB
Maugham, William Somerset 1978. The Painted Veil. London: Pan Books. First published [1925].
PV
Maugham, William Somerset 1960. Cakes and Ales. Harmondsworth: Penguin. First published
[1930]. CA
Maugham, William Somerset 1943. The Razor’s Edge. Philadelphia: Blakiston Company. RE
Mille, Agnes de 1952. Dance to the Piper. New York: Bantam Books. DP
Norris, Frank 1971. The Octopus. New York: Bantam Books. First published [1901]. O
Norris, Frank 1956. The Pit. New York: Grove Press; London: Evergreen Books. First published
[1903]. P
O’Neill, Eugene 1974. Desire under the Elms. Masterpieces of the Drama. New York: Macmillan.
First published [1924]. DUE

Portch, Stephen R. 1985. Literature’s Silent Language. New York: Peter Lang.
Poyatos, Fernando 1972. Paralenguaje y kinésica del personaje novelesco: Nueva perspectiva en el
análisis de la narración. Revista de Occidente 113–114: 148–170.
Poyatos, Fernando 1977. Forms and functions of nonverbal communication in the novel: A new
perspective of the author-character-reader relationship. Semiotica 21(3/4): 295–337.
Poyatos, Fernando 1983. New Perspectives in Nonverbal Communication: Studies in Cultural
Anthropology, Social Psychology, Linguistics, Literature and Semiotics. Oxford: Pergamon
Press.
Poyatos, Fernando (ed.) 1988. Literary Anthropology: New Approaches to People, Signs and Lit-
erature. Amsterdam: John Benjamins.
Poyatos, Fernando (ed.) 1992. Advances in Nonverbal Communication: Sociocultural, Clinical,
Esthetic and Literary Perspectives. Amsterdam: John Benjamins.
Poyatos, Fernando 1997. Nonverbal Communication and Translation: New Perspectives and Chal-
lenges in Literature, Interpretation and the Media. Amsterdam: John Benjamins.
Poyatos, Fernando 2002a. Nonverbal Communication across Disciplines, Volume I: Culture, Sen-
sory Interaction, Speech, Conversation. Amsterdam: John Benjamins.
Poyatos, Fernando 2002b. Nonverbal Communication across Disciplines, Volume II: Paralanguage,
Kinesics, Silence, Personal and Environmental Interaction. Amsterdam: John Benjamins.
Poyatos, Fernando 2002c. Nonverbal Communication across Disciplines, Volume III: Narrative Lit-
erature, Theater, Cinema, Translation. Amsterdam: John Benjamins.
Poyatos, Fernando 2008. Textual Translation and Live Translation: The Total Experience of Non-
verbal Communication in Literature, Theater and Cinema. Amsterdam: John Benjamins.
Rain, A. Marmot 1986. La Communication Non-Verbale chez Maupassant. Paris: Nizet.
Rice, Elmer 1959. Street Scene. Contemporary Drama: Nine Plays. Edited by Ernest Bradlee Watson
and William Benfield Pressey. New York: Charles Scribner’s Sons. First published [1929]. SS
Salinger, Jerome David. 1960. The Catcher in the Rye. New York: Signet Books. First published
[1951]. CR
Shaw, Bernard 1958. The Man of Destiny. Bernard Shaw: Seven One-Act Plays. Baltimore: Pen-
guin Books. First published [1896]. MD
Steinbeck, John 1964. The Grapes of Wrath. New York: Bantam Books. First published [1939].
GW
Steinbeck, John 1953. East of Eden. Harmondsworth: Penguin. First published [1952]. EE
Wharton, Edith 1905. The House of Mirth. New York: New American Library, Signet Classic. HM
Wharton, Edith 1997. The Age of Innocence. Mineola, NY: Dover. First published [1920]. AI
Wilson, Angus 1956. Anglo-Saxon Attitudes. New York: New American Library, Signet. ASA
Wolfe, Thomas 1929. Look Homeward, Angel. New York: Modern Library, Random House. LHA
Woolf, Virginia 1973. The Years. Harmondsworth: Penguin. First published [1937]. Y

Fernando Poyatos, University of New Brunswick (Canada)


III. Historical dimensions

19. Prehistoric gestures: Evidence from artifacts
and rock art
1. Introduction
2. Reconstructing gestures from the archaeological record
3. The depiction of gestures
4. Prehistoric hands
5. References

Abstract
All artifacts involve gestures not only as a prerequisite of their very production but also in
the ways they are used for technical or symbolic purposes. Prehistoric artifacts from stone
tools to decorated cliffs and caves are the only record we have of the gestures of early
humans. Most prehistoric gestures have to be inferred but some are occasionally repre-
sented in parietal paintings and small statues. However, the most direct evidence of pre-
historic gestures is the presence of printed or stenciled hands in caves, and on cliffs or
boulders. Whether these hands can be construed as indexical, iconic, or symbolic signs,
their abundance, ubiquity, and antiquity bear witness to irrefutable traces left by human
gestures in the deep time of Homo sapiens’s cognitive evolution.

1. Introduction
Gestures are by nature ephemeral, dynamic objects which could not be preserved by
any means until the modern technologies allowing visual recording were invented. At
most, two- or three-dimensional representations can reproduce postures or freeze
expressive gestures by selecting features combining directionality, amplitude, and finger
configurations. It can be safely assumed that anatomically modern humans and, presum-
ably, their Neanderthal cousins fostered rich gesture repertories in their social interac-
tions and in their manipulation of, confrontations with, and imitations of the objects,
both animate and inanimate, in their environment. We can only imagine the specific
gestures they were making to communicate feelings and attitudes toward each other,
and the forms of visual communication that prevailed during collective hunting, war-
fare, and the transmission of transformative techniques applied to vegetal, animal,
and mineral materials. Presupposing such gestures is broadly constrained, on the one
hand, by the range of movements that the human body makes possible and, on the
other hand, by the nature of the objects which must be transformed to be functional.
These necessarily included crushing, pounding, knapping, breaking, hitting, molding,
and tying, among many other goal-oriented behaviors. Such plausible general assumptions remain nevertheless rather vague and hypothetical, and observing the daily life of surviving hunter-gatherer cultures can only help figure out past gestures without providing any hard evidence regarding the way things were, say, thirty thousand years ago
or more, the time about which the archaeological record provides proof of socio-cultural activities. There exist, however, material traces of these prehistoric gestures
in the form of artifacts which at least offer some objective clues about the manipulative
techniques which produced them.

2. Reconstructing gestures from the archaeological record


The step-by-step process leading from the selection of a rough flint to a functional stone tool with a sharp edge was dubbed chaîne opératoire (‘sequence of operational moves’)
by French archaeologists. Through experimenting with material to achieve the same re-
sults that prehistoric humans did, it could be shown that there is only one series of
moves performed in succession that leads to standardized types of objects. Picking
up a rough stone and evaluating its properties, holding it in the right position, then
knapping it at the right angle with another stone to obtain the desired intermediary
modified fragment constitutes a gestural path which is constrained by human physiol-
ogy and cognition, as well as by the physical properties of the mineral concerned,
whether it is flint, chert, or obsidian (Delson et al. 2000). By comparing stone tools
which were produced during the various stages of the palaeolithic and neolithic periods, we have gained reliable knowledge of the kind of technical gestures that produced the resulting artifacts. We can assume that such series of technical movements
were schematically reproduced to form a repertory of gestures referring to particular
stone making activities, and we can even plausibly imagine that some metaphorical ges-
tures were thus generated during multimodal human interactions (Schick and Toth
1993).
When we consider sculptures, bas-reliefs, beads, and pottery, the range of gestures
which can be reconstituted is vastly expanded. The production of coloring material
through crushing, pounding, and mixing minerals and plants, and applying the resulting
product on objects or on bodies are further evidence of complex concatenations of ges-
tures whose dynamic patterns necessarily had to be transmitted across generations
through both observational learning and demonstrative teaching.
Artifacts provide another reliable access to the repertory of gestures of prehistoric
humans: their functional and symbolic uses imply behaviors which are dynamically con-
strained by the physical and structural properties of these artifacts. Throwing a stone
which has been modified so that it meets some ballistic requirements, harpooning a
fish at the right angle, threading beads or pierced shells and displaying the resulting
ornament on oneself or others, all these culturally learned actions can be performed
with stylistic variations but only within the limits determined by the dimensions, shapes,
and weights of the artifacts concerned. As with modern humans, there is no doubt that
these populations were referring to such actions and artifacts through imitative gestures
in their daily interactions.
Moreover, there is little doubt either that anatomically modern humans were using
language to communicate about issues that mattered to the social cohesion of their
groups and to the success of their hunting and warfare strategies. It can be safely as-
sumed that they had lexicons to refer to each other and to relevant parts of their envi-
ronment, as well as to the actions that were performed to achieve their goals.
Contemporary humans commonly produce iconic gestures when they utter verbs such
as “to harpoon” or “to aim at,” or when they mention that someone had a beautiful
necklace. Words most often come as multimodal utterances, and this provides a ground
for mutual understanding among groups which do not share the same language.
Finally, burials offer irrefutable glimpses of a family of gestures which are essentially
ritualistic: the carrying of the body to its resting place; its coloring with ochre and dec-
orating with beads; its disposition in a particular posture on the ground; the depositing
of symbolic artifacts within the burial space. All this provides strong evidence of ges-
tures which were more of a symbolic than technological nature (Vialou 1998: 116).
Inferring gestures from artifacts and evidence of rituals does not require stretching
the imagination beyond the realm of the highly probable, if not the absolutely certain.

3. The depiction of gestures


Carved ivory and antlers, petroglyphs and decorated pebbles, cliff and cave paintings
yield still more gesture-relevant data. Indeed, it takes a sophisticated hand-eye coordi-
nation to execute images that bear uncontroversial resemblance to the animals which
were present in the environment of prehistoric people. We can extrapolate from
these pictorial achievements the high degree of agility of their upper limbs and fingers,
the precision of their grip, and the exploitation of these learned versatile movements for
visual and haptic social interactions (Marzke 1997). But, in these cases too, we have to
rely on inferences to elaborate virtual representations of prehistoric socio-cultural
gestures.
There is indeed a dearth of represented human gestures in the prehistoric record, but
there are nevertheless some paintings and carvings which depict postures and gestures
in a schematic manner. Ivory and antler statues are constrained, of course, by the cylin-
drical shape within which they must be carved. There are nevertheless some options
regarding the positions of the arms as they are engraved on the surface of the artifacts,
and some non-organic material such as serpentine allowed for more freedom to repre-
sent limbs protruding from the trunk in various positions (e.g., Bednarik 2001: 176).
In rare cases, bone or ivory figurines show evidence of multi-component construction with arms being added on the sides of the main body. These, however, are rigid and cannot
yield any sense of expressivity (e.g., White 2003: 139). But among the numerous pre-
historic feminine sculptures some are carved as bas-reliefs in the rock, a medium
which is less restrictive and allows for significant representations of gestures such as
the “Venus” from Laussel (France) whose left hand rests on her upper abdomen and
right arm gracefully elevates a (hollow?) wisent horn toward her face either as an offer-
ing or in order to drink its contents (Vialou 1998: 11). The markings on the side of the
horn – 13 regular notches – have suggested to some archaeologists that the gesture may
refer to a ritual.
Similar considerations can be adduced from the skeletons uncovered in the attitudes in which they were arranged at the burial sites. This is the closest hint we can get of deliberate cultural gestures, even if these were normatively molded by the communities rather than dynamically generated by the bodies themselves. They
are often arranged in a fetal position, with arms crossed on the lower abdomen (e.g., Via-
lou 1998: 116). Other positions and orientations are found in the archaeological record.
Cave and cliff paintings as well as petroglyphs offer a richer, albeit sparse repertory
of typical upper limb configurations. The “dancing shaman” of the Trois-Frères cave in
France, for example, extends his arms forming an expressive angle parallel to his left
raised thigh (White 2003: 57). There are also instances of rock engravings showing what
can be legitimately construed as evidence of meaningful gestures such as the upper
body of a man whose arms are extended in front of him forming a gentle raising
curve with palms facing the ground (Vialou 1998: 112). In other, perhaps more recent
rock paintings, schematic leg and arm positions are evocative of (trance?) dancing with
bodies arced and arms thrown back (e.g., White 2003: 164).
These are mere glimpses, but future discoveries might further document the range of
gestures which formed the dynamic cultural repertories of these populations.

4. Prehistoric hands
But the most vivid evidence of prehistoric gestures is found in the numerous panels of
hand stencils and hand prints which are displayed, often in large numbers, on parietal
surfaces (e.g., Barrière 1976; Cummins and Midlo 1961). They are direct, indexical signs
of deliberate human actions, and they presuppose the complex technologies of stencil-
ing, painting, and printing. Replications have explored various methods to produce
these marks, such as blowing a coloring liquid from the mouth over the hand resting on the rock surface for stenciling. Some hands are also smeared with a painting material
and applied following a printing technique. However, the meaning and purpose of
such behaviors remain elusive, and the morphological characteristics of many of these indexical hand signs are puzzling. The only certainty is that prehistoric humans in areas well distributed across the whole planet have consistently produced sets of such hands, sometimes in large numbers and at other times more sparingly, in combination with figurative and abstract signs (e.g., Manhire 1998). This brings us as close
as can be to the physical presence of prehistoric gestures: the preparation of the hand and
its deliberate application on a mineral support on which it remained permanently or at
least until today, several tens of thousands of years later. It should be noted that this practice
is still observed among Australian aborigines (Gunn 1998; Wright 1985).
But there is more. Not all hands on these panels look the same. First, there are
variations of techniques, as noted above, as well as chromatic diversity (red, white,
black, and ochre). Second, not all five fingers are visible, as if some had been cut at
the first or second joint, or – and this would be remarkable evidence of meaningful
gesture – as if some fingers had been selectively bent before applying the color. Several
hypotheses have been proposed to explain such concentrations of hand representations
and their morphological diversity. Each one assumes a complex set of gestures performed within a rich socio-cultural context to explain the archaeological record.
But two gesture-relevant features must be distinguished: first, the fact that humans pro-
duced these clusters of stenciled and printed hands by using coloring material and press-
ing their palms against the rocks; secondly, the fact that in many cases some fingers or
parts of fingers are missing, thus creating series of incomplete hand images. Two
extreme theories have been proposed to explain the latter data: a naturalistic view which claims that diseases and accidents can simply account for the absent digits (Barrière 1976), and a linguistic interpretation which contends that the variations
shown might bear witness to the existence of a kind of sign language (Bouissac 1997).
The first hypothesis is based on medical data listing all the known pathologies which
can affect human fingers, suggesting that at least parts of the populations which produced
such images of their hands were seriously crippled by diseases and accidents. The other
offers a vision of a fairly sophisticated social life in which alternate forms of language
could emerge and generate a kind of hand writing based on sign languages. The only
evidence that could support the first theory would be the discovery of a critical number
of skeletons with missing phalanges. As to the second hypothesis, only a systematic par-
sing of the data could decide whether the variations are random or reveal some form of
combinatorial order with the iteration of recurring sequences (Bouissac 1994). But, so
far, these hypotheses have remained mere virtual possibilities and have not been the
object of serious investigation.
There are, however, some cases in which simple observation reveals intriguing prop-
erties of hand clusters. In their first illustrated account of the Cosquer cave, Clottes and
Courtin (1996) describe seven types of stenciled hands they have recorded among a set
of decorated panels in the cave:

(i) whole left hand with all fingers intact (10 examples);
(ii) whole right hand with all fingers intact (3 examples);
(iii) left hand with little finger folded (2 examples);
(iv) left hand with little and ring fingers folded (15 examples);
(v) left hand with little, ring, and middle fingers folded (6 examples);
(vi) left hand with all fingers folded except the thumb (1 example);
(vii) right hand with all fingers folded except the thumb (1 example) (Clottes and
Courtin 1996: 77).

This limited set offers less diversity than what can be observed in the Gargas cave in
which 16 types have been described (Barrière 1976). Gargas has been thoroughly
described (e.g., Leroi-Gourhan 1967), but Clottes and Courtin’s account is not the result of an exhaustive recording in the Cosquer cave, whose multiple galleries had not yet been fully explored when their book was published. Describing
types is a first step. More significant would be to record the way in which these types
combine in the panels in which they are often mixed with animal figures. For instance,
Clottes and Courtin (1996: 73) reproduce the engraving of a horse whose front part is
surrounded and partially covered with seven hands which combine four different types
according to the following sequence: types 1, 4, 2, 3, 4, 4, 1. It is hardly plausible that
these stenciled hands belonged to differently crippled people. Is it not more probable
that they are evidence of a form of sign language?
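Bouissac’s call for a systematic parsing of such sequences can be illustrated with a minimal computational sketch. The fragment below is purely illustrative and not part of any published method: the function name ngram_counts and the variable panel are assumptions of this sketch, and the sequence encodes the seven stencils around the Cosquer horse using Clottes and Courtin’s type numbers. Counting recurring n-grams is the simplest conceivable test for the “iteration of recurring sequences”:

```python
from collections import Counter

def ngram_counts(sequence, n):
    """Count every contiguous run (n-gram) of length n in a sequence of hand types."""
    return Counter(tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1))

# Hypothetical reading of the seven stencils surrounding the Cosquer horse,
# using the type numbers of Clottes and Courtin (1996).
panel = [1, 4, 2, 3, 4, 4, 1]

# Single-type frequencies: type 4 dominates this small panel.
print(ngram_counts(panel, 1))  # Counter({(4,): 3, (1,): 2, (2,): 1, (3,): 1})

# Pairs occurring more than once would be candidate "recurring sequences".
recurring = {g: c for g, c in ngram_counts(panel, 2).items() if c > 1}
print(recurring)  # {} -- no pair recurs in a sequence of only seven hands
```

With a corpus of only seven hands the test is of course vacuous, which underlines why any serious analysis of combinatorial order would require much larger databases of recorded panels.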
But even if such extreme theories are discarded as improbable fancies, the fact remains
that humans have performed these complex cultural behaviors consisting of elaborating
techniques of printing and stenciling, and producing images on rock surfaces which
have variously persisted until today. Whether we can reliably infer from these facts the
ancient existence of more complex gesture systems will depend, on the one hand, on fur-
ther archaeological discoveries and, on the other hand, on the elaboration of investigative
methods based on the parsing of large databases which are still to be constructed.

5. References
Barrière, Claude 1976. L’Art Pariétal de la Grotte de Gargas. Avec la collaboration de Ali Sahly et des élèves de l’Institut d’Art Préhistorique de Toulouse. Translated from the French by W.A. Drapkin. Oxford: British Archaeological Reports.
Bednarik, Robert G. 2001. Rock Art Science: The Scientific Study of Palaeoart. Turnhout:
Brepols.
Bouissac, Paul 1994. Art or script? A falsifiable semiotic hypothesis. Semiotica 100(2/4): 349–367.
Bouissac, Paul 1997. New epistemological perspectives for the archaeology of writing. In: Roger
Blench and Matthew Spriggs (eds.), Archaeology and Language I: Theoretical and Methodolog-
ical Orientations, 53–62. London: Routledge.
Clottes, Jean and Jean Courtin 1996. The Cave beneath the Sea: Palaeolithic Images at Cosquer,
translated by M. Garner. New York: Harry N. Abrams.
Cummins, Harold and Charles Midlo 1961. Finger Prints, Palms, and Soles. New York: Dover.
Delson, Eric, Ian Tattersall, John A. Van Couvering and Alison S. Brooks (eds.) 2000. Encyclope-
dia of Human Evolution and Prehistory. New York: Garland.
Gunn, Roger G. 1998. Patterned hand prints: A unique form from Central Australia. Rock Art
Research 15(2): 75–80.
Leroi-Gourhan, André 1967. Les mains de Gargas. Essai pour une étude d’ensemble. Bulletin de
la Société Préhistorique Française 63(1): 107–122.
Manhire, Anthony 1998. The role of hand prints in the rock art of the south-western cape. South
African Archaeological Bulletin 53: 98–108.
Marzke, Mary W. 1997. Precision grips, hand morphology, and tools. American Journal of Physical
Anthropology 102: 91–110.
Schick, Kathy D. and Nicholas Toth 1993. Making Silent Stones Speak: Human Evolution and the
Dawn of Technology. New York: Simon and Schuster.
Vialou, Denis 1998. Prehistoric Art and Civilization. Translated by Paul G. Bahn. New York:
Harry N. Abrams.
White, Randall 2003. Prehistoric Art: The Symbolic Journey of Humankind. New York: Abrams.
Wright, Bruce 1985. The significance of hand motif variations in the stenciled art of the Australian
aborigines. Rock Art Research 2(1): 3–19.

Paul Bouissac, Toronto (Canada)

20. Indian traditions: A grammar of gestures in classical dance and dance theatre
1. Introduction
2. Natyasastra and its contents
3. Nrittahasta – pure dance gestures
4. Hastabhinaya – gestural expression of hands
5. Devahastas – gestures depicting gods
6. The gaze – integrating hands and eyes
7. Conclusion
8. References

Abstract
Extensive use of gestures is a widely practiced tradition in the dance and dance theatre
forms of India. While articulations of various body parts, like hands, eyes, waist, hips, neck, head, etc., are considered gestures and discussed in ancient treatises like the Natyasastra (Bharata, ca. 3rd century BC), it is the use of hands, or the hasta, called
Hastabhinaya that continues to enjoy particular importance in physical expression. These
hand gestures are employed in the dynamics of meaning-construction. The forms or hand
shapes have become conventionalized through long practice and their description in
many ancient treatises, often leading to the notion that such practice is essentially cultur-
ally specific, ritualistic and prescriptive. The actual practiced living traditions however
reveal that gestures are more than just an outer representation of conventionalized
forms. A form-based approach from the perspective of movement studies and current lin-
guistic research on gestures that analyses the actual movement methodology used in the
process of meaning construction enables insight into hitherto unexamined qualitative aspects of Indian performing arts. The hand form may thus derive its meaning from
the content or context it is used in, but conveys meaning essentially through its articulation,
which incorporates the hand shape as well as the movement or placement of the hand in
any given context. Such a movement analytic and linguistic approach, which looks at the
movement principles employed in the execution of hand gestures, also enables a differen-
tiation of content, which in Indian dance traditions is culturally specific, from form, thus
revealing underlying principles like conceptualization, cognition and embodiment.

1. Introduction
Performing arts in India are known for their extensive use of gestures, specifically in the dance and dance theatre traditions. While articulations of various body parts like hands, eyes, waist, hips, neck, head, etc. have been considered gestures and discussed
in ancient treatises like the Natyasastra (Bharata, ca. 3rd century BC), it is the use of
hands, the hasta, called Hastabhinaya, that became the most prominent form of physical expression,
seen in dance traditions even today. A distinction is drawn between gestures conveying the meaning of an accompanying text or song and those used in
non-narrative dance, i.e. to ornament movement for aesthetic purposes. Forms or
hand shapes have become conventionalized through long practice and their description
in many ancient treatises, often leading to the notion that such practice is essentially
culturally specific and ritualistic. Also, the hand shape itself is mistaken for the gesture
that conveys the meaning, thus confusing form for content. The actual practiced living
traditions however reveal that gestures are more than just an outer representation of
conventionalized forms. The movement technique used to bring forth the semantic
intent the performer wants to achieve and transmit to the spectators varies depending
on various factors. The hand form may derive its meaning from the content or context
it is used in, but conveys meaning essentially through its articulation, which incorpo-
rates the hand shape as well as the movement or placement of the hand in any given
context. A form-based approach from the perspective of movement studies and cur-
rent linguistic research on gestures that analyses the actual movement methodology
used in the process of meaning construction enables insight into hitherto unexamined qualitative aspects of Indian performing arts. Laban/Bartenieff Movement Studies, also known as Laban Movement Analysis and Bartenieff Fundamentals, developed by dancer, choreographer and philosopher Rudolf von Laban (1879–1958) and
his disciple Irmgard Bartenieff (1900–1981) in Germany, UK and USA, which have
evolved into a multi-layered movement analytic system, provide tools for movement
execution, observation and analysis. In the current linguistic research on gestures, the
form-based approach using gestural modes of representation provides a methodology
for linguistic gesture analysis (Müller 1998, 2009: 514–515). These works analyze criteria that lead to the execution of gestures and trace the origins of gestures to mundane everyday practices of the hand. These two approaches are deemed appropriate for
the current analysis, which – due to the scope of this article – will be restricted to a
few hand gestures as used in the dance theatre styles Kuchipudi and Bharatanatyam
of South India. Indian performing arts employ hand gestures in the dynamics of mean-
ing-making through physical expression, which appear to be culturally specific and
prescriptive. The movement analytic and linguistic approach will look at the move-
ment principles employed in their execution, enabling a differentiation of content,
which in Indian dance traditions is culturally specific, from form, which could be
meta-cultural, following principles like conceptualization, cognition and embodiment.
Section 2 will briefly explain the contents of the earliest treatise Natyasastra, and look
at some passages relevant for the present analysis. A description and analysis of ges-
tures in non-narrative dance follows in section 3, of narrative dance in section 4 and
after briefly introducing how hand forms are used for the representations of gods in
section 5, an example will illustrate in section 6 the integration of hands and eyes.

2. Natyasastra and its contents


Natyasastra, perhaps the earliest known treatise on performing arts, gives answers to
questions with regard to Natya, defined as follows in chapter 1, verse 116: “No wise
utterance, no means to achieve learning, no art or craft, and no useful device is omitted
or ignored in it” (translation by a Board of Scholars: 1.116). Natya deals with the representation of expression found in the world. Nritta (Ch. 4), or dance, and abhinaya (Ch. 8), which makes inner feelings and emotions obtainable, together with drama constitute natya. Abhinaya (8.1–7) is specifically defined as the
art of expression using body, speech, ornamentation and inner conceptual disposition
(8.9). In the volume translated by a Board of Scholars, the root meanings of the syllables in this term are defined as making something obtainable and arriving at an interface.
This term thus shows a close relationship to the understanding of gestures in gesture
research. Nritta, on the other hand, is the aspect of dance that does not carry meaning.
The movements here are purely ornamental and meant for pictorial aesthetic pleasure
(4.13–16). Siddhi or the impact of such representation is attained when similar feelings
are evoked in the spectator. Ideally, the expression of feelings and emotions, when
combined appropriately, i.e. using worldly realistic portrayal lokadharmi, and stylized
depiction natyadharmi, should lead to the experience of the sensate world of Rasa, a
term which can roughly be translated as ‘relish’. It is an experience, which becomes a
collective mood, encompassing the performer and the spectators.
Natyasastra explains what a performer employs or needs to employ to represent any
meaning in its fullest. A whole range of bhavas ‘emotions’, their causes and effects
and the physical manifestation of emotional states are discussed in chapters 6 and 7.
Chapters 8 to 13 explain how those conversant with the performing arts would repre-
sent such physical manifestations with the help of gestures of various body parts,
gaits and stances. Among these, Chapter 9 deals completely with the gestures of the
hands. The opening verses introduce the theme as explaining the activity of the
hands in their capacity to enhance the performance of a play, their characteristics
and their application as they are commonly used. There is a distinction made between
(i) gestures used in drama, i.e. for narrative purposes where samyutha (combined) and
asamyutha (non-combined) hands are used to convey a meaning and
(ii) gestures in dance, Nrittahastas, which are used for ornamenting a movement and
providing aesthetic experience.

Subsequent verses list the nomenclature given to the hands in each of these categories,
the description of their hand shape, movement and/or positioning, and their multi-functional usage. These names continue to be used even today. They often derive from objects similar in shape to that formed by the hand, but the meaning that is derived from their usage need not always have a semantic correspondence with the name or
shape. In combined gestures and dance gestures, the names often denote movement
and/or spatial aspects, as will be discussed in the analysis of these in the following sections.
Further, five different positions of the hands are identified: palm up, circular motion, ob-
lique, stable and palm down (9.176–177). Of particular interest are the verses 155 to 159
which appear useful for the present analysis. Here, Bharata explicitly says that while he ex-
plains the application of gestures in association with different ideas, such application can be
used elsewhere based on the personal judgement of the actor, bearing in mind the form, the significance of the movement, and the class, that is, the social stratum to which the character belongs.
However, since every gesture can be used to convey an idea, and many commonly used ges-
tures have meanings, he suggests that these can be used as one pleases to depict emotions
and their activities, the suitability of their meaning being important. These verses support
the idea that meaning construction is the basic function and movement technique plays a
significant role in making the semantic content obtainable, and that gestures in Natya are abstracted from common use found in the world. These verses also enable an understanding of the conceptually oriented approach to the use of gestures in Indian performing arts.
The primary text, Natyasastra, influenced later treatises that shaped regional dance
and drama styles in subsequent centuries. These styles evolved over time and continue
to be practiced in various forms such as dance, dance theatre or folk theatre, combin-
ing movement, gestures, song, spoken text, music and/or other features as governed by
context and creativity. One of these styles, that integrates all these features, is the
Kuchipudi Yakshagana tradition, which has evolved since the 13th century in the north-eastern parts of the state of Andhra Pradesh. The 20th century has seen its evolution into a
sophisticated and refined dance theatre style. The style known today as Bharatanatyam
evolved in the temples and courts of South India. It focuses on precise linearity of move-
ment and subtle expression. Its present repertoire dating back to the 18th century has
been revived and refined by various practitioners in the 20th century. While Kuchipudi
follows the Natyasastra in its entirety, Bharatanatyam primarily follows a secondary
text, the “Abhinayadarpana” of Nandikeswara dated around the 11th century AD. In
the following analysis the Nrittahastas are based partly on the definitions found in the pri-
mary text and as practised in Kuchipudi today and partly on the activity of hands in pure
dance of Bharatanatyam. The analysis of gestural expression with narrative content is
based on the secondary text and as used in Bharatanatyam.

3. Nrittahasta – pure dance gestures


As mentioned in section 2 above, the hand gestures in pure dance are called Nrittahastas.
The only description given of these hands is that, as opposed to hands used to convey
a meaning, they are used in karanas. Karanas are defined in Chapter 4 (30–34) as the
combined movement of the hands and feet. They are basic units of nritta defined
above. Ghosh (1967/2007) translates this term as ‘dance hands’ and comments
that – as the meaning implies – they are used in dance and at times also in combina-
tion with the other two varieties of hand gestures for ornamental purposes. In the
Natyasastra, the nomenclature and description of these hands often specify all or
any of the following:

(i) Spatial relationships of hands to body and/or to each other;
(ii) The hand form;
(iii) The movement of hands;
(iv) Sometimes imagery enabling an understanding of such movement.

3.1. Nrittahastas of Kuchipudi


In the practiced Kuchipudi tradition, one sees how the nrittahastas defined in the
Natyasastra are executed. The first Nrittahasta for example, called caturasra, follows
the description given, namely “Two khatakamukha hands are to be held eight angulas
away from the chest while the shoulders and elbows are on the same level.” (Translation
by a Board of Scholars: 148). The specification here is of the hand shape used (khatakamukha), the distance from the body (eight angulas is in practice executed by stretching
the arm) and the exact description in relation to the body (level). In a gesture called
urahparshvardhamandali, the name has the words uras ‘chest’ and parshva ‘side’, denot-
ing in the name itself where the hands move. The description then specifies the hand
shapes to be used. It corresponds with the movement as seen in Kuchipudi, where
the hands alternate from a position in front of the chest to the side. Not all gestures
however seem to correspond in their execution with the description of the movement
that is defined in the treatise. Also, later texts often have other descriptions for the
same names, revealing yet other practical traditions. These variations suggest that
while the gestures described in the treatises build a theoretical basis for gesture execu-
tion in pure dance, the actual living tradition has always had a larger variety of move-
ments with corresponding hand shapes, which continue to be created and re-created
constantly.

3.2. Nrittahastas of Bharatanatyam


In the secondary text that is used in Bharatanatyam, the Nrittahastas are specified by
the hand forms used for this purpose. While hand forms mentioned in the treatise are
restricted to a few, the movement itself is not specified. In the practised tradition, the
spatial aspect of the movement called a Hastakshetra is defined as the movement of
the hand from one sthanaka ‘position’ to the other in space. It reveals greater freedom
in execution, enabling various combinations. The initiating and end positions of the
hands and the hand forms used therein, which have become conventionalized through
practice, however, seem to follow certain principles of body organization and spatial
cognition, which will be analyzed in the following section.

3.3. Analysis of Nrittahastas


As cited above, there is a position of the hands in space in some relationship to the
body. In most cases, there is also a specified hand form. In actual practice, both
Bharatanatyam and Kuchipudi use the space around the body extensively. Such usage
can be analysed using the Laban/Bartenieff movement analytic system, which defines
four categories that govern movement, namely Body, Effort, Shape and Space. They
enable observation, analysis, execution and experience of movement. Since space in
relationship to body seems to be the most obvious factor in the description of the nrit-
tahastas, I will restrict myself to some of the space harmonic principles postulated by
Laban in his Choreutics (Laban 1966) and organizing principles and concepts that
underlie movement of the human body, developed by Bartenieff and her disciples
(Hackney 1998) called Bartenieff Fundamentals.
The hands seen in Kuchipudi remain for the most part in the near and middle reach
range around the body, incorporating movements of the sides of the body and flexions
at the waist. There is the impression of circular movements when hands come back to
the body to start a new movement phrase. This movement technique emphasizes graceful movements that – when executed accordingly – flow out of the torso. Thus, hand
movements not only move at specified levels, as seen above, but also integrate the
upper body. The spatial relationships used reveal a correspondence to body organizing
principles, whereby the connection being established is predominantly between the core
or centre of the body at the navel and the distal edges defined as the extremities like
hands, feet and head. This is one of the Patterns of Total Body Integration defined
by Hackney (1998: 51–83). Another is the Breath Core. Fluidity of the movements is
thus enabled by breath, which integrates the core and the distal edges being articulated.
The aesthetics of this style with a preference for grace and flow, therefore, reveal body
organization principles that enhance fluidity, which in turn, I propose, imply an incorpo-
ration of such aesthetics with a specific purpose, namely to lead from outer articulation to
an inner experience of subtle energy attributed to breath, a concept known from the prac-
tice of physical disciplines like yoga. The notion of navel, nabhi, as organic centre also
finds reference in ancient texts when referring to the universe (Vatsyayan 1996: 53).
In the case of Bharatanatyam, it is the far reach range of the hand that is often emphasized, revealing a preference for the distal edges and a crisp linearity in movement. Hands tend clearly towards Spatial Pulls, which according to the Space category
of Laban Movement Analysis are specific points in space corresponding to the corners
of crystalline forms. This concept of movement space, namely relating inherent move-
ment principles of the body with crystalline forms found in nature was mathematically
modelled by Laban (1966; Groff 1990: 141–158) based on movement observation in
relation to the three-dimensionality of the body of up-down, side-side and backward-
forward. Fig. 20.1 demonstrates a movement pattern found in Bharatanatyam where
the hand forms called khatakamukha and alapadma are employed for an opening
and closing movement. The fingers also open and close with a specific directionality
to arrive at the specified hand shapes. The hands are held at a distance of one measure
or tala from the chest, defined in practice as the distance between the tips of the out-
stretched thumb and index finger. As the hands open out towards the Spatial Pulls
side-high and side-low, they stretch the arms in the process, providing counter-tension
not only between each other, but also between the hand and the outstretched foot as

illustrated. Counter-tension is required for stability and equilibrium. These are concepts
again – like breath mentioned above – that are integral to the Indian world view
(Vatsyayan 1996: 56). It is also noteworthy that the spatial relationships of the hastas corre-
spond to some extent with the classification of gesture space used in the analysis of
co-speech gestures (McNeill 1992: 328). For example, the initiation of movement in
pure dance of Bharatanatyam, as seen above, is almost always in front of the chest
corresponding to Centre Centre in gesture analysis.
Using Bartenieff Fundamentals for analysis, one can see how the use of hands in the
Indian context goes further. Clarity of hand shape coupled with Spatial Pulls, a clear
Spatial Intent, i.e. the directional intention of the movement, and a movement phrasing
that conventionally gets initiated at the hands influence the movement of the upper
body at the neuromuscular level. Initiating movement with hand shapes involving an
exact placement of each finger, as seen in the above example, integrates the upper
body at a very subtle and deep level, because deep connectivity patterns exist between
the fingers, the ulna and radius and the scapula, going further down to the tailbone
(Hackney 1998: 157–158, 231–244; Hartley 1989: 170–174; Myers 2004: 154–165).
Hand gestures – when executed as defined – hence enable a greater movement range
involving the whole upper body, seen in flexions, rotations and spiralling movements.
For example, in other variations of the movement illustrated in Fig. 20.1, the hands
could move towards other Spatial Pulls like side-low-front and side-high-back, invol-
ving a spiralling of the body. Considering that Bharatanatyam uses all spatial pulls
around the body, a further analysis of dance movements (Ramesh 2008) along these

Fig. 20.1: Pure dance movement in Bharatanatyam



lines, too elaborate to present here, suggests a correlation with the principles of ancient Indian architecture called Vastusastra, where natural forces, determined amongst others by geographic directions, influence the wellbeing of man (Schmieke 2000). This would perhaps shed more light on what has been termed the auspicious relevance of
nritta in the Natyasastra. The visible traditional precision of these movements in geo-
metric space, thus analyzed, clearly places the culturally specific notion of aesthetics
in a broader sense of kinaesthetic understanding, without denying its embeddedness
within the framework of a specific world view. Vatsyayan (1996, 1997) discusses the
role the human body plays in the Indian world view. The understanding and specifica-
tion of such a role is furthered by the analysis of Indian movement aesthetics based on
Laban/Bartenieff Movement Studies and linguistic gesture analysis.

4. Hastabhinaya – gestural expression of hands


In this section, I will illustrate and analyze the use of hands in narrative and expressive
contexts based on some of the 28 non-combined gestures defined in the treatise
Abhinayadarpana. For the purpose of this analysis, they have been taken from the
Bharatanatyam repertoire.
When using hands in narrative dance, the expression is the function of the movement,
allowing for a much more varied, playful and creative use to suit the narrative (Ramesh
1982: 71). Hands are used to depict a variety of contexts such as actions, objects, concrete
and abstract ideas, emotions, and more (Ramesh 1982: 68–82). When being used in asso-
ciation with the expression of emotions, they reveal a rich insight into metaphors and me-
tonyms. At the same time, these hands or hastas do not stand alone. They get their
specific meaning only when executed with corresponding illustrative exactness within
the context of a narrative and textual content. Often, they are more than their obvious
depiction. Showing a bird sitting on a tree for instance is not necessarily just a bird sitting
on a tree. It gets associated with moods, feelings, spring or whatever the imagery of the
performer associates with the bird. For the most part, the gestures are used very similarly to
co-speech gestures that occur naturally. Therefore, these gestures can be defined using the
gestural modes of representation, such as acting, representing, drawing and modelling as
classified in gesture analysis (Müller 1998: 123; 2009: 514) as some examples in Tab. 20.1
illustrate. The following sections give a detailed analysis of how some of these examples are actually executed while constructing the envisaged meaning. The examples
illustrate how the idea of gestures being a mimetic medium (Müller 2009) is a living
practiced tradition in performing arts in the Indian context:

Tab. 20.1: Analysis of some gestures used in the Bharatanatyam repertoire depicting actions, objects, concrete and abstract ideas, emotions, and more.

Name of Hasta | Acting | Modeling | Drawing | Representing
Kapittha | Milking, holding/playing cymbals, offering incense or light | Draping of robes | Eye make-up | Bird, lotus in Goddess Lakshmi’s hands
Alapadma | Playing ball, rotation | Mountain, world | Face, body, limbs, daylight | Lotus flower, pot, ball, mirror, moon, city
Pataka | Sweeping, rain | Clouds, waves | Whirlpool | Sword, shield
Mrgasirsa | Playing the instrument vina, beckoning | Pot belly, roof of a house | Floor ornamentation | House, sorrow
Kartarimukha | Falling | Temple tower, creeper, braid | Eyes | Eyes, difference of opinion

4.1. Example 1 – Kapittha


The gesture called Kapittha is, due to its hand form, suitable in the context of holding, grasping or clasping. Descriptions of some usages, like holding flowers at the time of making love, grasping the end of robes, draping of cloth or offering incense or light, suggest varied contextual origins. The following examples will illustrate how the gesture is

(i) In the case of the cymbals, the shape of the gesture with the forefinger placed over
the tip of the stretched thumb and the other fingers closed into a clasp refers to the
holding of cymbals. In actual performative use, the gesture is acting or enacting the
playing of the cymbals, by either striking the bent forefinger of the right kapittha
hand against the open left palm or striking both the kapittha hands against each
other. The illustrative exactness of the act of playing the cymbals re-creates the
actual movement when playing them, with the difference that a stylized gesture
is used to enhance its visual impact. When striking the gesture against the left
palm, the representation is of cymbals, which are small and rather heavy and
give rhythm for dance. Here the wrists are placed firmly in Centre Centre, per-
forming the striking action with quick, strong and bound movements as when hold-
ing and striking with something small and heavy to produce a specific sound.
Striking both the kapittha hands against each other is mostly used in the context
of cymbals accompanying a prayer. In such a case the movement is light and
free, suggesting the lightness of corresponding cymbals. In a broader metonymic
sense, the act of playing them could also stand for dance or prayer.
(ii) For milking the cow, the forefingers of both the kapittha hands are raised and low-
ered alternately with the acting hands moving downwards vertically, so as to illus-
trate a grasp and downward pull, thus enacting the milking process. One may ask why the act of milking is depicted, since not all mundane activities find expres-
sion in this way. Popular narratives are the cowherd stories involving Krishna. The
story of his stealing butter is usually an example of elaboration of a theme, which
shows the whole process of butter-making. The Natyasastra, however, does not refer to the stories of Krishna; the first play performed using natya is mentioned as the churning of the milk ocean. This narrative refers to Vishnu, the sustainer of the
world who rests on the milk ocean. An example that reveals the use of the linguis-
tic principle of contiguity is the depiction of the milk ocean, where the action of
milking preceding a gesture depicting waves is enacted with the hands.
(iii) An example that is commonly used in actual practice and finds no reference for
such a usage in the treatise is the use of the kapittha hand to illustrate a bird.
Here, the hand shape represents the bird implying usage due to similarity in
shape. The positions of the hand in gesture space can vary corresponding to the
context in which a bird gets depicted. Higher up in far reach space shows for
e.g. the bird perched on a tree. A precise placement of the wrist at the same
time enhances the meaningful execution of the gesture. Closer to the body it
could be used to express the emotion of compassion with a corresponding patting
or feeding movement with the other hand.
(iv) In denoting the goddess Lakshmi (Fig. 20.2) both hands in kapittha are placed with
firmly grounded wrists at the level of the shoulders. The hands here depict the
holding of lotus flowers, in a strict sense as an acting gesture. They also depict
the lotus flowers in a metonymic sense and when placed as mentioned above,
the whole stance represents Lakshmi in an iconographic representation. The
hands stand for the lotuses Lakshmi holds and the positioning of the hands stands
for Lakshmi. When using only one hand, it is derived from the social context of
ladies or queens holding a flower and is used to portray a woman.

Fig. 20.2: Goddess Lakshmi



4.2. Example 2 – Hamsasya


The gesture called hamsasya, with the tips of the thumb and index finger touching, is
used as per Abhinayadarpana to denote amongst others assurance, giving instructions,
small and tender objects like pearls or jasmine flowers, act of painting or horripilation.
It is a co-speech gesture found in other cultures (Kendon 2004: 238–247). It is a mudra
‘imprint’ or ‘stamp’, another term for hand shapes used for a specific purpose, primar-
ily spiritual in function. One such mudra is the chinmudra with a spiritual contextual
origin, representing the union of the human and divine. Some of the meanings con-
structed by using this gesture in the actual practiced tradition today clearly imply
such a contextual origin, as seen in the depiction of the Buddha, purity or inner self.
The following exemplify its movement execution in correspondence with meaning.

(i) When showing small or tender objects like pearls, jasmine flowers or the wick of a lamp, it is often used as an acting gesture, illustrating the object being depicted through the context of an action, like plucking a flower.
(ii) To illustrate a painting, the gesture can be used as a sketching gesture, tracing the rim of the frame, or as an acting gesture in the actual movement of painting,
drawing or writing.
(iii) When depicting concepts like truth or time, the movement sketches a vertical line
downwards (Fig. 20.3).

Fig. 20.3: Time, truth. (I = Initiation, E = End)



4.3. Example 3 – Suchi


This hand is shaped with the index finger stretched and the other fingers closed into a
fist. The gestures performed with this hand are examples again where concrete and
abstract ideas get represented through a differentiated placement or movement of
the hand, which would best suit the spatial and visual association to such an idea.

(i) The act of threatening is executed by shaking the outstretched index finger of this
hand shape at the level of the shoulder with the corresponding expression in the
face, coming from a social context. A threat can also be expressed by pointing
the finger towards the person or persons.
(ii) It could denote the world by tracing spiralling patterns with the index finger in a moulding gesture above the head, while the eyes are used in a deictic sense referring to the vast expanse of the universe. The extended index finger of the suchi hand is also used to show the world or earth in a large horizontal circular
movement at the level of the shoulder. The imagery of a spiralling universe or a
round earth being depicted here reveals insight into the nature of the world. If
being more specific and wanting to denote the concept of three worlds, another
gesture called trishula with its three outstretched fingers denoting the number
three is used, either as a preceding gesture or by holding it with the left hand,
while the right hand executes the spiralling movement.
(iii) As a deixis to point at something. For instance if a city or location is held with ala-
padma of the left hand, the right suchi hand points at it.
(iv) To represent death, when held at Centre Centre, index finger pointing up. If this
placement is combined with the other hand palm down in alapadma over the
head it represents an umbrella. The same gesture, however with the alapadma
hand held in a lateral position, represents the sun.
(v) To represent beauty, the index finger is used sketching the shape of the eyebrows.

At this point, it is important to mention that the technique of using gesture space in
front of the body using precise positions corresponds to observations made in co-
speech gestures (McNeill 1992: 378). The fact that such a placement has the purpose
of attracting the attention of the spectator to the gesture is fully utilized in Indian per-
forming arts. The above examples also reveal that the gestures are not just the hand
forms symbolic of what is being represented, but are the actual enacting of the de-
picted. At times, the gestures are used with an over-layering of modes, similar to pat-
terns of gesture production seen in co-speech gestures. For example, the gesture for
eyes both sketches the shape of the eye and represents it at the same time. Also, as
seen in some of the above examples, where gestures are executed simultaneously or
sequentially creating a phrase, the semantic intent made available is due to contiguity.
Such meaning-giving proximity is called sthithibheda in linguistic understanding in
India. Also, in these phrases the left hand is often used to denote the object and the
right hand to denote the action. In the case of pure dance gestures or nrittahastas,
the movement is mirrored on the other side equalizing the body halves and establishing
the experience of verticality, a concept also crucial in the Indian worldview (Vatsyayan
1996: 51–53, 1997: 10–11).

5. Devahastas – gestures depicting gods


The hands for depicting gods, coined Devahasta, are examples of gestures which are
used symbolically. Gods are identified by their attributes, symbolized by the objects
they hold in their hands. These objects are depicted with the help of specific prescribed
hand forms, either by representing them or using a hand form that holds the objects.
The placement of the hands in relation to the body is also prescribed, corresponding
with iconographic representations of the deities. Siva, for instance, can be distinguished
when a meditative pose is embodied with the hamsasya hands placed on the knees,
palms up, representing the posture taken during meditation. Or when each of the
hands takes a representing gesture in iconic similarity with a trident and a deer, held
at shoulder level on each side of the body in far reach space. When the hands embody
a flute, or the holding of a flute, Krishna is represented. If holding a bow and arrow,
which resembles the actual holding of bow with the left and the arrow with the right
hand, it would be Rama. In a narrative context, an acting gesture of shooting the
arrow with the right hand may be used to enact for example the destruction of an
enemy. When the hands embody gestures with iconic similarity to a conch and discus,
held at near reach space at the shoulders, one recognizes Vishnu, and so forth. These
examples of symbolic gestures enable a differentiation between symbolic and general
modes of depiction.

6. The gaze – integrating hands and eyes


“Where the hands go, the eyes follow. Where the eyes go, thoughts follow. Where the thoughts go, feelings follow, and where feelings are present, Rasa is experienced” (Nandikesvara, 11th century AD). While bhava ‘feelings’ are shown in the face, the gestures suggest various details. For a precisely learnt and executed gesture to transport both thoughts and feelings at the same time, the use of the eyes in association with
the hands is crucial. This applies as much to pure dance gestures as it does to expressive
dance. In pure dance, the gaze refers to the spatial intent of the hands and enhances the
spatial pulls involved. In the case of a narrative, the emotive content becomes evident
only when the face and eyes get integrated in the process of gesture production. While
plucking a jasmine flower, the reaction to the smell of the flower expressed in the eyes/
face adds a further detail to the object being depicted. Or in more subtle and refined
narratives one makes the mood obtainable in the act of plucking the flower, which
could be for the purpose of using it for prayer, adoration, ornamentation or to express
the feeling of the season it grows in with a corresponding gesture of the eyes showing an
emotion. The imagery created depends upon the ability of the performer to visualize
such a situation. In a mother-child relationship for example, there are various moods
or reactions one can associate with such a relationship, like an exasperated, loving, patient or overjoyed mother, or a cranky, playful, or sleepy child. The performer can creatively weave in little details. The eyes of the mother can convey loving won-
der, while her hands describe the child’s curls or eyes. Her annoyance or exasperation,
expressed in her body attitude and eyes can be combined with her hands showing a
rocking movement of putting the child to sleep. The feeling gets associated with the cor-
responding detail when the actual integration of the gestures of the hands, eyes, stance
etc. occurs. Normally, a combination of action and narration takes place, depending

upon whether the performer is describing the child from the mother’s perspective or
becomes that child. The challenge in the performative context is the progression
from a learnt and thus conventionalized gesture language to producing gestures as if
they were in a spontaneous co-speech context – a reversal of sorts of Kendon’s
Continuum (McNeill 2005: 5–7).

7. Conclusion
Presented above are but a few examples from the rich repertoire of using gestures in
Indian performing arts. The small cross-section of selected gestures illustrated here
allows us to assume that the gestures of the hand in the Indian context are as rich
and varied as the capacity of the human mind to use imagery for expression. There is
also a process of stylization seen in the use of gestures to express and communicate
abstract ideas of spatial cognition. Though treatises describe mythological origins of
the gestures, the examples above also reveal that there has been an abstraction of ges-
tures from real life contexts. The analysis reveals that a cultural specificity seen in the
usage of the gestures is due to the contents depicted in Indian dance rather than due to
form. The actual execution follows principles that are cognitive in nature, involving spa-
tial relationships, visualization and imagery, which underlie conceptualization. Thus, the function that becomes evident is that gestures abstract and embody a move-
ment event that establishes outer expression, external spatial relationships and internal
connectivity patterns. They depict a speech act, allow a thought, a feeling, or an emo-
tion to materialize – become obtainable and transmittable in the act of performance,
as the term abhinaya suggests. This also reveals that abhinaya even in its traditional
origins, as mentioned in texts and seen in practice has always been a conscious and
structured methodology to mirror the feelings and concepts in the mind.

8. References
Bharatamuni 1987. The Natya Sastra. English Translation by a Board of Scholars. Delhi: Sri Sat-
guru Publications.
Ghosh, Manomohan 1967/2007. Natyasastra (A Treatise on Ancient Indian Dramaturgy and Histri-
onics), Volume 1. Varanasi: Chowkhamba Sanskrit Series Office.
Ghosh, Manomohan 1975. Nandikesvara’s Abhinayadarpanam. Calcutta: Manisha.
Groff, Ed 1990. Laban Movement Analysis: An historical, philosophical and theoretical perspec-
tive. Unpublished Manuscript.
Hackney, Peggy 1998. Making Connections. Total Body Integration through Bartenieff Fundamen-
tals. New York: Routledge.
Hartley, Linda 1989/1995. Wisdom of the Body Moving. An Introduction to Body-Mind Centering.
Berkeley, CA: North Atlantic Books.
Kendon, Adam 2004. Gesture. Visible Action as Utterance. Cambridge: Cambridge University
Press.
Laban, Rudolf 1966. The Language of Movement. A Guidebook to Choreutics. Boston: Plays, Inc.
McNeill, David 1992. Hand and Mind. What Gestures Reveal about Thought. Chicago/London:
University of Chicago Press.
McNeill, David 2005. Gesture and Thought. Chicago: University of Chicago Press.
Müller, Cornelia 1998. Redebegleitende Gesten. Kulturgeschichte – Theorie – Sprachvergleich.
Berlin: Berlin Verlag.

Müller, Cornelia 2009. Gesture and language. In: Kirsten Malmkjaer (ed.), Routledge Linguistics
Encyclopedia, 214–217. Abingdon: Routledge.
Myers, Thomas W. 2004. Anatomy Trains. Myofasziale Meridiane. Munich: Urban & Fischer.
Ramesh, Rajyashree 1982. Krishna zeig Dein Gesicht! Erzählender Tanz aus Indien. In: Johannes
Merkel and Michael Nagel (eds.), Erzählen. Die Wiederentdeckung einer Vergessenen Kunst.
Geschichten und Anregungen: Ein Handbuch, 68–82. Reinbek, Germany: Rowohlt.
Ramesh, Rajyashree 2008. Culture and cognition in Bharatanatyam. Integrated Movement Stu-
dies Certification Program Application Project. Unpublished Manuscript.
Schmieke, Marcus 2000. Die Kraft Lebendiger Räume. Das Große Vastu-Buch. Aarau, Switzer-
land: AT Verlag.
Vatsyayan, Kapila 1996. Bharata. The Natyasastra. Delhi: Sahitya Akademi.
Vatsyayan, Kapila 1997. The Square and the Circle of the Indian Arts. New Delhi: Abhinav
Publications.

Rajyashree Ramesh, Frankfurt (Oder) (Germany)

21. Jewish traditions: Active gestural practices in religious life

1. Introduction
2. Gestures accompanying Torah learning/reading among Yemenite Jews
3. Handclapping as part of the prayer in the Breslov Hassidic movement
4. Putting on the phylacteries, kissing the mezuzah, writing sacred texts on parchment
5. Kissing gestures
6. Shokeling
7. References

Abstract
Of all the richness of Jewish gestural culture, this entry focuses on a number of currently
active traditional bodily practices, which appear to be the beating heart of this culture.
The practices under discussion are: the laying of phylacteries, mezuzah kissing, Torah
scribing, gestures that accompany Torah reading, touching/kissing the Western Wall,
prostrating on graves of Tzadiks, applause and hands waving in the prayer, and
some others. In these practices, the cultural knowledge opens up for the retaining/
advancing cultural work, thus uniting their symbolic and pragmatic functions. The
technical and anthropological descriptions of the practices are followed by cultural-
rhetorical analysis, which discovers a metaphorical mechanism at their basis. A meta-
phorical identification is established between a real person and the ideal figure of the
perfect worshiper, or between a physical body and a sacred one (the Book, the
Word), or between a real space and a sacred one (Holy Land). Some of the mentioned
practices, being strongly built into complex cognitive-physical-lingual activities, enter everyday communication and oratorical behavior as bodily techniques, as observed in
the closing part of the entry.

1. Introduction
The prominent role played by language(s) in Jewish culture, especially in Jewish reli-
gious culture, is well known. One should not forget, however, that in the course of hun-
dreds of years, the Jewish tradition integrated numerous gestural practices, some of
which have been retained and included in the actual religious activities (performed
both by men and, to a lesser extent, by women).
the lingual ones and to the language itself as the communication basis.
The present entry discusses living gestural practices associated with communicative
traditions in Jewish culture. Everyday gesticulation and purely ritual gestures will not
be considered, for mainly methodological reasons. Ritual gestures in the strict and
narrow sense constitute an immense store of cultural knowledge, but their communica-
tive function is limited to rigid ceremonial occasions in which each participant transmits
his predictable knowledge as a kind of meta-addresser who addresses a kind of meta-
addressee. Communicativity is here stretched to the limit of its ability. A good example
of this kind of gesture is the ritual of the Priests’ Blessing. The priests perform the bles-
sing gesture with their heads and hands covered by their prayer shawls while the con-
gregation stands with downturned eyes. Uri Ehrlich has done a study of gestures in
canonical prayer customs, their meanings and their origins in Jewish law (halacha)
and culture (Ehrlich 1998).
One reason why gesticulation is an important element in everyday communication is
that it has become detached from its cultural-symbolic roots and inserted into an imme-
diate communicative situation, to whose operative needs it is subordinated: definition of
self in others’ eyes, definition and management of the situation, influence and control,
an aid to speech (Goffman 1959). Naturally, one finds unique elements of everyday
gesticulation among different national, local and ethnic groups, but such elements do not
necessarily reflect a national or cultural character. David Efron's classical study (which
showed how the gestures of New York Jews were re-shaped in accordance with their
environment) is still relevant to this issue today (Efron 1972).
Traditional gestural practices are “trans-communicative systems”, vital non-verbal
information systems which are personal and cultural, pragmatic and symbolic all at
once. Cultural information is here made available to everyday cultural activity which
simultaneously both preserves cultural memory and advances the skills of its
actualization. As is known, certain gestures constitute entire cultural archives, feeding
valuable information to be kept and passed on through intra-cultural communication
channels. However, some of these gestures, mostly ritual ones, do not take part in active
social rhetoric. In terms of hierarchy, their "dialect" belongs to such a high register that
no social configuration would allow its translation into a lower, everyday cultural
dialect – its pragmatic utilization. These gestures pay for their high status with an
absolutely static condition. Everyday gestures lacking cultural memory, on the other
hand, cannot be kept in high codes, but enjoy almost unlimited dynamics. Mutual
translatability of high- and low-register cultural dialects is typical for a defined group
of gestures integrated in active traditional body practices and techniques, such as
crossing oneself in Western Christian cultures, or lifting one's hat as a greeting. Such
practices of rituality within everyday routine are culture's beating heart. In the present
entry, some of the more unique practices of this type within Jewish culture will be
presented, proceeding from the rare to the more common.
322 III. Historical dimensions

We shall start by discussing two gestural customs, each typical of its community. The
first is the Torah reading gestures of Yemenite Jews – a large community characterized
by ancient uniform roots (reaching back, some maintain, to the destruction of the First
Temple), by a wide diaspora, and by considerable cultural heterogeneity today. The
second is the handclapping of Breslov Hassids – a relatively young community (its
founder, Rabbi Nachman of Breslov, died in 1810), consolidated and homogeneous.
Naturally, the two communities and their customs were selected for analysis because
the gestural dimension is especially essential in them. The intent behind addressing one
originally Middle-Eastern and one Eastern-European community is to create a colorful
picture, without the pretense of developing the subject fully or reaching conclusions
concerning the cultural uniformity of the customs at hand.
After that we shall turn to customs widespread among observant Jewish
communities – putting on the phylacteries, kissing the mezuzah, writing sacred texts on
parchment – and we shall see that gesture is their constitutive mechanism. The gestural
essence of kissing the Holy Scriptures, the Western Wall, etc., which will be discussed
further on, is all too clear; the explanation of its communicative and symbolic
functioning, however, will require further theoretical effort. To sum up, we shall examine
the gesture constituting perhaps the most distinctive characteristic of an observant
Jew – swaying during the service, or, in other words, shokeling (from the Yiddish for "to shake").

2. Gestures accompanying Torah learning/reading among Yemenite Jews
The custom of using manual gestures when reading from a Torah scroll has been studied
in the two contexts in which it takes place: in the interaction between teacher and stu-
dents when learning the Torah; and in the interaction between the Torah reader and his
assistant (ba’al ha-qore) in the synagogue during the Torah reading ritual (Katsman
2007). These motions do not only help in the correct recital of the text; they also pro-
vide a visualization and a spatialization of the sacred text, a kind of empirical actualiza-
tion of the memory of revelation. The fact that the scene is not meant for the eyes of the
public-at-large but rather for the intimate internal context of reading a sacred text and
for the relationship between the two people at its center, strengthens the similarity
between Yemenite Jewish gestures and Buddhist and Hindu ritual dance gestures
(mudras). Torah learning is a mechanism of creating cognitive-bodily micro-dynamic
programs in the actual interaction between teacher and student, and also a mechanism
for the assimilation of these programs by the student until he can perform them auto-
matically and unconsciously. The learning consists of creating links between different
types of action: voice, hand and body movements. A Torah verse is built as a complex
dynamic construct in a physical-cognitive multidimensional space. The gestures are not
normative; they are learned as part of a normative practice, become an integral part of
ritualistic traditional behavior, and are transferred from one generation to the next, but
they do not undergo canonization and are performed freely and almost unconsciously.
The gestures constitute a body technique that at a very early age shapes the system of
behaviors which make up the reading. They are an integral part of the reading, a phys-
ical-cognitive habit.
21. Jewish traditions: Active gestural practices in religious life 323

3. Handclapping as part of the prayer in the Breslov Hassidic movement
Among members of the Breslov Hassidic movement, handclapping during prayer has
come to possess a unique conceptual and mystical importance, especially during the
holiday prayers or in special prayers commemorating specific events or held in certain
places, such as during the pilgrimage to Rabbi Nachman’s shrine at Uman. Handclap-
ping accompanies certain words or verses which appear in the prayer. This custom, in
contrast to the gestures among Yemenite Jews, has a known source: Rabbi Nachman’s
specific instructions, as formulated in his own writings.

(i) Handclapping as preparation for the utterance of prayer: “To prepare and repair
the mouth”: “When a man awakens himself with his hands, [the wings] awaken,
the wings of the lung, where speech is formed. But still there is need to prepare
and repair the mouth, to accept speech inside it, and by striking one palm against
the other, by this the mouth is made” (Nachman 2008: 141:45).
(ii) Handclapping turns prayer into a coupling (in the mystical-symbolic sense of the
word, probably as unification of man with the Sacred, Holy Land etc.): “So there
will not be a difference between the prayer and coupling” (Nachman 2008: 141:46).
(iii) Handclapping makes prayer able to mitigate (in the original – “to sweeten”,
lehamtik) Heavenly sentences concerning the fate of the individual or the
community (Nachman 2008: 141:46).
(iv) Handclapping for achieving unity (Nachman 2008: 141:46).
(v) Handclapping as prophecy in prayer, as an aid in evoking a prophetic imagination
with which to represent the image of God (Mark 2003: 290).
(vi) Handclapping creates “air of the Land of Israel”: “By what will you be granted
that your prayer shall be in the air of the Land of Israel? (…) It is by means of
handclapping that prayer is in the air of the Land of Israel (…) and by handclap-
ping he thus lives in the air of the Land of Israel” (Nachman 2008: 141:44). “Hand-
clapping by the person who prays can arouse the imagination and create a kind of
enclave similar to the Land of Israel, which maintains a space of imagination and
prophecy that cloaks the praying person and helps him attain a prayer that is like
prophecy” (Mark 2003: 291).

This custom thus serves to symbolically originate an ideal praying person. A similar
mechanism can be seen in the Torah reading gestures of Yemenite Jews: Their symbolic
function is to produce an ideal reader/worshipper. To use Kenneth Burke’s rhetorical
terminology, handclapping creates an identification of the worshipper with the absolute
praying person, an identification of the prayer space with the space of the Land of
Israel, an identification of speech with prayer. However, the Breslov conception
of gesture contains another dimension as well: within the rhetorical identity there is also
a latent non-identity (as Paul Ricoeur has written), which is typical of a metaphorical
connection. Non-identity, or the dismantlement of identity, is the second, complementary
objective of the handclapping gesture; it is what leads the supplicant to sacral
"madness" and ecstasy. Gesture here functions as a metaphorical link between
the ideal and the empirical (on the relation between gesture and metaphor see: Cienki
and Müller 2008). Here the gestures neither express, nor signify, nor symbolize, but
rather originate and maintain the ideal I-other figure, the figure of the perfect worship-
per. The functioning of a gesture as a metaphor can be presented with the help of the
Groupe µ model, as a combination of two synecdoches (Dubois et al. 1981: 106–110):

Real figure ——(reducing synecdoche)——>> Gesture <<——(reducing synecdoche)—— Ideal figure

4. Putting on the phylacteries, kissing the mezuzah, writing sacred texts on parchment
The same mechanism is operative also in practices of putting on the phylacteries and
kissing the mezuzah, as explained below. The main commandments relating to the
phylacteries and the mezuzah are "You shall bind them" and "You shall write them" – in
other words, a kind of symbolic act of binding. Thus, in the "Hear O Israel" passage,
one of the texts found in both phylacteries and mezuzahs, we find:

Hear, O Israel: The Lord our God, the Lord is one! You shall love the Lord your God with
all your heart, with all your soul, and with all your strength. And these words which I com-
mand you today shall be in your heart. You shall teach them diligently to your children,
and shall talk of them when you sit in your house, when you walk by the way, when you
lie down, and when you rise up. You shall bind them as a sign on your hand, and they
shall be as frontlets between your eyes. You shall write them on the doorposts of your
house and on your gates (Deuteronomy 6: 4–9).

One puts on the phylacteries at least once a day. The phylacteries are two leather boxes
(called “houses”) containing pieces of parchment with Torah verses written on them;
they are tied with leather strips, one to the arm and the other to the head. A mezuzah
is a box that contains a parchment with two texts concerned with the mezuzah com-
mandments. One kisses the mezuzah every time one crosses the threshold of nearly
every building or room: usually this involves touching the mezuzah with one’s fingers
and then kissing those fingers (see also next section). The performance of the com-
mandment consists in the gesture of binding to the body itself (as in the case of the phy-
lacteries) or by means of the body (as in the case of the mezuzah); the difference is not
significant in the matter at hand.
Writing a Torah scroll is also a binding gesture of sorts. It is only through the gesture
of writing the Torah scroll that a person can really “bind” the holy letters to his body.
This gesture, too, functions as a metaphorical operator: On the one hand it is a synec-
doche of writing (by entailment and causality), and on the other hand it is a synecdoche
of the body (by entailment, proximity, part/whole and specific/general):

Body ——(reducing synecdoche)——>> Gesture <<——(reducing synecdoche)—— Writing

In these acts, of writing a Torah scroll and putting on the phylacteries, the gestures are
similar to mudras. One meaning of the word mudra, according to a proposal put forth
by Otto Francke, is “script” or “the art of reading”. Fritz Hommel, too, hypothesizes
that the etymology of mudra goes back to the Babylonian word musaru, “script”, in
which the “s” was replaced by “z” when the word entered the Persian language:
musaru – muzra – mudra (Eliade 1969: 367–368). As in the Middle Ages, the ultimate
object of all these gestures is God (Schmitt 1992: 80–81). The essence and objective of
the binding gesture is reading by the body: The mind reconstructs the invisible written
text (which is enclosed inside the casings of the phylacteries and the mezuzahs) both at
the propositional (the text’s words) and the visual (the appearance of the text written
on the parchment) levels. This is the reason why the writing rules must be so strictly
adhered to: The actual written text must conform absolutely to the image of the prayer
text which is evoked in an observant Jew at the time of binding/touching. If the written
text does not fit the mental image, the speech act, the prayer, could be deemed void or
failed. The correct and successful performance of the binding gesture thus confirms the
prayer’s correct and successful performance; in other words it, again, creates the perfect
worshipper.

5. Kissing gestures
We shall now examine a number of acts which can ostensibly be defined as kissing ges-
tures: Kissing a mezuzah, kissing the phylacteries, kissing a Torah scroll, kissing the
Western Wall. These gestures are components of religious or quasi-religious, ritual or
quasi-ritual (as in the case of kissing the Western Wall) practices. We can point to a
body technique which underlies all these gestures and which constitutes their psycho-
cultural motive, but does not exhaust the gestures; this may be called the practice’s tech-
nical component. The technical component of the above-mentioned gestures would
seem at first glance to be the gesture of kissing. Indeed, kissing is an important and
frequently-encountered body technique in many cultures, which is apparently rooted
in the instinctive and innate act of suckling. But in order to define an instinctive act
as a body technique, it is necessary to describe the cultural development which it un-
dergoes in this or that culture. And in fact, a more careful analysis of the gestures in
question shows that their technical component is not the kissing. Note that in every
one of these gestures the act of kissing is accompanied or mediated by a hand move-
ment: the person touches the mezuzah with his hand and kisses the hand (1); he holds
the phylactery case and brings it to his mouth (2); in the case of a Torah scroll, tech-
nique no. 1 is used when the scroll is taken out of the Ark and returned to it; with a
printed Torah technique no. 2 is used; at the Western Wall both techniques are used on
non-ritual/canonical occasions: Either one touches the Wall with one’s hand which is
then kissed, as in the case of the mezuzah, or one touches the Wall and at the same
time also kisses it. It is thus the hand movement which provides the key to sorting the
given gestures according to their technical-cultural functions. Clearly, therefore, the
gesture of touching with the hand is the technical component of these gestures. In
our case we can define the kiss as the practice's core gesture. The core gesture contains
the raw energy which feeds future gestures, but not the genetic information – the
technical manual which determines their growth and their cultural role.
The body technique which is thus learned and stored in memory at a very young age
is a kiss which is accompanied/mediated by a hand movement and a touch. Every one of
the gestures in the group discussed here involves the hand/mouth touching the words
of the Torah. The gesture of touching the Western Wall is no exception. Most of the
customs associated with the Western Wall imitate synagogue-related behaviors in many
details. In fact, the Western Wall is the ultimate mizrah (a synagogue’s eastern wall) of
the Jewish liturgy. The Western Wall is treated with the same marks of respect as the
Ark containing the Torah scrolls in the synagogue. For example, one does not turn
one's back on it without first retreating a few steps. Such manifestations of trepidation and
respect are directed at the sacred center represented by the Holy Book, here repre-
sented by the Western Wall. Thus observant Jews implement towards the Western
Wall the same body technique which they have learned in childhood to use in the syn-
agogue, after this new application was approved by the rabbinical authorities. Touching/
kissing the words of the Torah has a double symbolic function, which is in fact one:
Writing and binding. The words of the "Hear O Israel" passage are themselves a speech
act, similar to a blessing or oath – a performative in John Austin's (1962) terminology:
saying the words constitutes their performance. In other words, by pronouncing them they
are metaphorically bound to the heart simultaneously with their non-metaphorical
binding to the arm and the head in the ceremony of putting on the phylacteries. The
primary gesture here is the metaphorical one, while the physical gesture is its de-meta-
phorization, which takes place in the process of the realization of the ideal worshipper.
Kissing or touching the word ties it to the body, writes it on the body (in and by means
of the body) and turns the human body into a Holy Book, into an embodiment of holy
love. This is a gesture of love: The love of God for man as embodied in the Torah’s
words encounters and becomes one with man’s love for God as realized in the touch/
kiss/writing of these words.

6. Shokeling
Back-and-forth movements of the body during prayer have become one of the most
familiar attributes of an observant Jew, to a large measure due to the efforts of the
Jews themselves, who sought and found prayer gestures that differ from those used
by other faiths. The movement consists of a series of very light bows (with bending
at the waist), slight twists of the body left-right, and from time to time – minor bending
of knees before the bowing (everything according to specific phrases of the prayer). It
usually proceeds at a more-or-less free pace, except for some specific cases (especially
during the Eighteen Blessings, the central part of the three main daily prayers) in
which deeper bows are required. This is all that has remained of the bows, kneelings
and prostrations which characterized the ancient pre-exilic Jewish liturgy. The
movement that is supposed to express the heat of faith is usually quite restrained; only
rarely, especially in certain hassidic and mystical circles, does it attain a truly ecstatic
level. The same goes for the hand movements which accompany prayer, for example
striking the chest when reciting the tahanun (plea). The movements’ restrained nature
endows them with a rational and somewhat reserved character; yet it is also what
makes them more similar to everyday and rhetorical gesticulation. The shokeling
today constitutes a body technique, which is learned and imprinted on memory in
childhood and youth, through constant, thrice-daily practice. It can be assumed that
this technique is characteristic of males rather than of females, since the obligation
of prayer basically applies only to the former. Furthermore, one may observe that
women for the most part shokel more slowly and less vigorously than men (Heilman
1976: 218).
It is important to note that the movement is learnt in tandem with the typical
recital tune used in prayer or reading the Torah, and in a specific social and circum-
stantial frame, which may with considerable justice be defined as an epideictic rhetor-
ical situation (e.g. a situation of public teaching). This body technique has been
transferred from prayer to the lessons in yeshivas, where it is associated with the
behavior of teachers and students when teaching, learning, reading and speaking.
Nowadays one can see and hear rabbis, saintly or just plain observant Jews slightly
bending their torso back-and-forth and speaking with singsong intonation (imitating
prayer or Torah reading) when they give a speech, eulogy or sermon, explain their
position, or debate on occasions which are very far removed from prayer, on topics
which may have nothing whatever to do with faith or religion. Furthermore, their
audience can be seen to move in tempo with them; this can become a mass phenom-
enon when the speech or sermon is given by very revered (usually hassidic) rabbis and
holy men. And since for most people prayer involves reading a prayer book, and most
religious studies are based on reading books, the body technique of shokeling has pe-
netrated into the behavior accompanying the reading of books in general, even non-
religious books in circumstances which have no connection with the liturgy. Perhaps,
however, a distinction should be made between the yeshiva student, who “never seems
to stop shokeling, even when he stands in conversation with another”, and “the modern
Jew” who shokels only during “moments of most intense” prayer (Heilman 1976:
218–219).
What are the symbolic and pragmatic functions of this technique? A well-known
opinion is that of the Rabbi from The Kuzari by Judah Halevi: in the past, when not
everyone could have his own prayer book, people bent towards the book in turns to
have a look at it. This explanation, an entirely imaginary one, can be understood in the
context of the inter-confessional dispute described in The Kuzari: the Rabbi's response
avoids drawing an analogy between the gesture in Jewish custom and non-Jewish
rituals, as well as insulting other religions or representing Jews in a defensive,
opportunistic light. However, this assumption is of particular interest for us, since it
connects gesture with book reading. Even if it lacks any real basis, it
reflects a deep cultural logic, in which, on the one side, a need “to touch” the sacred
words stipulates the gesture performance, and, on the other side, the gesture perfor-
mance stipulates the possibility of the reading. The desire directed to God is embodied
in the gesture of desire to read the book thus enabling a worshipper to fulfill his or her
duty (and a congregation – its duty) and perform the ritual to the best advantage.
The main symbolic meaning of shokeling during prayer is well-known; it expresses
the supplicant’s self-abasement and submission before God or the Divine Presence.
When prayer is perceived as a discourse or dialogue, the accompanying gestures rein-
force and illustrate its interactive nature, that most complex and mysterious aspect of
prayer. The shokeling, together with other movements, is meant to actualize the
motto of "All my bones shall say" (Psalms 35:10), that is, to make the body an active
participant in prayer (which can also be understood as an observance of the
commandment "You shall love the Lord your God (…) with all your strength"). There are
also other symbolic explanations based on various biblical verses, such as the comparison
of the human soul to the candle of God (Proverbs 20:27), and the description of the
trembling of the People of Israel when they were given the Torah at Mount Sinai
(Exodus 20:15).
The gestures thus have the purpose of creating the (collective) image of the perfect,
ideal supplicant or, in the concepts of Jewish mysticism, the image of the ideal bride
(the congregation of Israel), one fit to accost the ideal, heavenly Groom. The shokeling
in fact creates the stage of sanctity and fear of God, so that the interaction among peo-
ple here and now symbolizes/stages/reconstructs the occasion of the revelation, which
was the originating interaction between God and man. The shokeling, similarly to the
handclapping in Rabbi Nachman's approach, makes present the sanctity of the Land of
Israel and the Temple wherever it is performed, after (and since) the Temple was
destroyed and the People of Israel exiled from its land. Now another Temple is built in
its stead, in the body of every Jew at prayer. And just as the Temple was the social, and
not just the religious, center of the people, so also the bow has the purpose of bringing
about the social and national unity of the congregation by means of the body. This is
the source of this gesture’s social “mission” in the profane sphere, which constitutes
its pragmatic function. The gesture thus turns into a body technique which comes
back into action on every social-epideictic, religious and quasi-religious, sermonizing
and teaching occasion, and unifies the participants by turning them into ideal pilgrims
to the Temple. After all, any Jewish congregation is traditionally called a “holy
congregation”.
To sum up, the Jewish traditions preserve numerous quasi-ritual gestural practices
(some of which turned into body techniques) that are charged with rich symbolic-
cultural meaning and integrated into everyday activities. As can be seen from the
analysis of some of these practices, their main function is to shape an ideal worshipper. A
gesture functions as a metaphor, merging one’s real character with the ideal one that
fills a human being with sanctity (of the Holy Land or the Holy Scriptures). The ges-
tures thus serve as a means of spreading sanctity wherever they are performed.
When these gestures, by turning into bodily techniques, permeate non-religious spheres
of life, they act so as to produce the ideal cultural community. Since the present article
has not explored the subject in full detail, nor attempted to do so, it can only
be assumed, with a certain degree of confidence, that despite geographical and historical
differences between various Jewish communities, they share the same conception of ges-
ture and of its function in ritual and in everyday cultural (religious or quasi-religious)
communication.

7. References
Austin, John L. 1962. How to do Things with Words. Oxford: Clarendon.
Cienki, Alan and Cornelia Müller (eds.) 2008. Metaphor and Gesture. Amsterdam: John
Benjamins.
Dubois, Jaques, Francis Edeline, Jean-Marie Klinkenberg, Philippe Minguet, François Pire and
Hadeline Trinon 1981. A General Rhetoric. Translated by Paul B. Burrell and Edgar M. Slotkin.
Baltimore: Johns Hopkins University Press.
Efron, David 1972. Gesture, Race and Culture. The Hague: Mouton.
Ehrlich, Uri 1998. ‘All My Bones Shall Say’: The Non-verbal Language of Prayer. In Hebrew,
Jerusalem: Magness Press.
Eliade, Mircea 1969. Yoga: Immortality and Freedom. Translated by Willard R. Trask. Princeton,
NJ: Princeton University Press.
Goffman, Erving 1959. The Presentation of Self in Everyday Life. Harmondsworth: Penguin.
Heilman, Samuel C. 1976. Synagogue Life: A Study in Symbolic Interaction. Chicago: The Univer-
sity of Chicago Press.
Katsman, Roman 2007. Gestures accompanying Torah learning/recital among Yemenite Jews.
Gesture 7(1): 1–19.
Mark, Zvi 2003. Mysticism and Madness: The Religious Thought of Rabbi Nachman of Bratslav. In
Hebrew, Tel Aviv: Am Oved.
Nachman Ben Simcha, of Breslov 2008. Likutei Moharan. In Hebrew, Modiin: Tiferet ha-Nahal
Institute.
Schmitt, Jean-Claude 1992. The rationale of gestures in the West: A history from the 3rd to the 13th
centuries. In: Fernando Poyatos (ed.), Advances in Nonverbal Communication: Sociocultural,
Clinical, Esthetic, and Literary Perspectives, 77–95. Amsterdam: John Benjamins.

Roman Katsman, Ramat Gan (Israel)

22. The body in rhetorical delivery and in theater: An overview of classical works
1. Introduction: Delivery, delivery, delivery
2. Aristotle: “No one teaches geometry in this way”
3. Theophrastus: Mirrored emotions
4. Stoics and Epicureans: The language of nature
5. Rhetoric for Herennius: Mapping the voice
6. Cicero: Controlling body language
7. Quintilian (35–100 CE): On “natural technique”
8. Conclusion
9. References

Abstract
This chapter analyzes views on deportment found in classical texts. The author argues that
while most of the extant texts focus on rhetorical delivery, they also offer insights into
theories of acting, and more general ideas about the meaning of gesture and deportment.
In addition to the standard texts of Aristotle and Quintilian, the chapter discusses fragmen-
tary and indirect testimonies to Theophrastus’ mirror theory of emotions, Stoic and Epicu-
rean views on body language, as well as Roman theorists’ attempts to map both voice and
gesture. The author concludes that a close reading of technical treatises and fragments
points to the existence of what one might term an ancient theory of human communication,
concerned with emotions, the genesis of language, and nuances of meaning.

1. Introduction: Delivery, delivery, delivery
Delivery was termed hypocrisis in Greek and actio in Latin. Both terms point to
the connection between rhetorical delivery and the skill of the theatre professional,
known as hypocrites or actor. Taking a cue from Cicero’s term eloquentia corporis or
“bodily eloquence,” we can assume that rhetorical delivery implied the same disciplined
use of the body as the art of oratory did of language. Consequently, in order to maxi-
mize the efficiency of the message he delivered, the orator had to hone both his verbal
and non-verbal skills to a high level of artistry – and artificiality. It is possible that by the
time of Quintilian the orator would have planned in advance every aspect of his per-
formance, from finding convincing proofs down to the right moment to touch his cheek,
much like the director and performer of a one-actor play (see Sonkowsky 1959:
272–273).
The earliest testimonies on the study of delivery suggest that a skilled use of hypoc-
risis was believed from the beginning to be extremely effective in public speaking. An
anecdote about Demosthenes (384–322 BCE), repeated (among others) by Cicero in
Brutus (142), Orator (56), and On the Orator (3.213) and by Plutarch in the Lives of
the Ten Orators (845a–b), bears witness to these cultures’ perception of bodily eloquence
as an integral element of the art of public speaking:

One day, when Demosthenes was walking home disheartened after he had been hissed out
of the assembly, (…) the actor Andronicus cheered him up very much by telling him that
his speeches were good, but his delivery needed work; Andronicus then recited from mem-
ory what Demosthenes had just said in the assembly. As a result, Demosthenes was con-
vinced to entrust himself to Andronicus [for training]. From then on, whenever someone
asked him what was the most important aspect of the art of rhetoric, Demosthenes replied,
“Delivery;” “And the second?” “Delivery;” “And the third?” “Delivery.” Plutarch in the
Lives of the Ten Orators (845a–b)

Some ancient theorists of rhetoric come close indeed to the radical statement that the
anecdote ascribes to Demosthenes; for example, Theophrastus (371–287 BCE) claimed
that delivery was the single most important element of the art, whereas Quintilian (35–
100 CE) went as far as to state that how one spoke was more important than what one
said (11.3). Others, however, disagreed; for example, Aristotle (384–322 BCE), De-
mosthenes’ exact contemporary, omitted the subject of delivery from his treatise on
Rhetoric, arguing that it was concerned mostly with manipulating the audience’s emo-
tions and, since it had been borrowed from actors, that it was altogether trivial. Cicero
(106–43 BCE), for his part, extolled the importance of delivery in On the Orator, en-
couraging his readers to learn from actors, yet warning them against acting as though
they were on stage (3.213–3.220). The art of rhetorical delivery, it follows, was consid-
ered both vital and entirely negligible, both similar to and different from the craft of the
hypocrites or actor. In order to disentangle these contradictions, I analyse in this essay
theories of delivery formulated in ancient treatises of rhetoric, from Aristotle’s dismis-
sive comments to Quintilian’s apology, identifying two main theoretical trends, Peripa-
tetic and Stoic, the latter having received hardly any attention in scholarly literature on
ancient delivery.
For most of the 20th century, classical scholarship on ancient rhetorical theory
showed an Aristotelian leaning: the most influential monographs mentioned deliv-
ery only briefly. The dismissive treatment of the Roman treatises on delivery in
Clarke’s classic Rhetoric at Rome ([1953] 1996) is typical of this trend. Clarke (1996:
35–6) reluctantly summarizes the account of delivery found in Rhetoric for Herennius,
adding that he hopes that the author’s obsession with delivery was not representative
of contemporary Roman thought (see Kennedy 1972, 1980, 1994).

22. The body in rhetorical delivery and in theater: An overview of classical works

The mid 1990s, however, brought a radical change in critics’ attitude towards body language.
Two monographs on delivery in the Roman world, Gleason’s brilliant analysis of rhetorical
performance of masculinity (1995) and Aldrete’s positivist attempt to reconstruct rhetorical
gestures (1999), appeared simultaneously with two groundbreaking books on Homer, La-
teiner’s (1995) and Boegehold’s (1999). These were followed by Gunderson’s literary
and psychoanalytical reading of rhetorical declamations (2003) and Corbeill’s extensive
critical analysis of the functions of gesture in Roman culture (2004). (For a more de-
tailed discussion of recent work on rhetorical delivery in Rome, see Hall 2007: 234).
In these works, the reader will find a perceptive context-oriented analysis of non-
verbal communication in classical literature, rhetorical practice, and ritual as well
as theoretical reflections on the nature of embodied communication (see especially
Corbeill (2004: 109–136) and his use of Bourdieu’s theory of habitus). In the present
essay, I steer away both from an empirical study of the orator’s deportment and its his-
torical conditions and from modern theory, and instead focus on the ancient writers’ at-
tempts to theorize the role of the human body in communication. This, as will be
apparent shortly, entails a discussion of both theatre and public speaking.

2. Aristotle: “No one teaches geometry in this way”


Despite his reluctance to discuss delivery, Aristotle is, in fact, the first writer on record
to elaborate the theoretical rudiments that influenced all later critics down to Cicero
and Quintilian. The only Pre-Aristotelian treatise on rhetoric we have, Alcidamas’
On Those who Write Speeches or On Sophists, argues that nothing replaces extempora-
neous speaking. This work, however, contains no comments on delivery (Edwards 2007:
47–57; O’Sullivan 1992: 42–66). In what is, then, the earliest extant attempt to concep-
tualize delivery (see Fortenbaugh 1985: 269–270), Aristotle describes this new field as
pertaining to the performer’s use of voice, facial expression, gesture, dress, and deport-
ment (Rhetoric 1408b); he stresses that all these facets of communication are expected
to convey emotions (pathe) that suit the circumstances of the speech and the speaker’s
persona (1413b9–1413b10) (see Fig. 22.1). His reasons for not discussing delivery in
detail are stated at the beginning of book three of the Rhetoric (1403b20–1404a19).
This much-quoted passage merits a detailed consideration here.
Delivery

1. Voice

1.1 Volume
1.1.1 Loud
1.1.2 Quiet
1.1.3 Medium

1.2 Pitch
1.2.1 High
1.2.2 Low
1.2.3 Medium

1.3 Rhythm

2. Deportment

Fig. 22.1: Aristotle on Delivery


332 III. Historical dimensions

The section on hypocrisis is a part of Aristotle’s discussion of diction (lexis), in which
he painstakingly explains to his reader that, in addition to knowing what to say, one
must also know how to put the right content into the right words. In addition to these
two main considerations – what to say and how to put it into words – there is, Aristotle
admits, a third topic, which he will not discuss, namely, how to properly deliver the
right content put into the right words:

…and the third topic, concerning delivery, has a great potential, but has not yet received
attention.
Delivery influenced the art of tragedy and rhapsodic recitation late, for at first the poets
acted in their tragedies themselves. Furthermore, it is clear that this aspect is present in
rhetorical as well as poetic art, just as some people, including Glaucon of Teos, have
demonstrated.
Now, delivery is manifest in voice, how to use it[s volume] according to each emotion, such
as, loud or quiet or in between, and how to use the tones [of voice], such as high and low
and in between, and how to use rhythms for each emotion. For there are three matters that
people study: volume, pitch, and rhythm. And these people take prizes in nearly all com-
petitions, and just as actors now have a greater success than poets there [i.e., in recitations
of epic and tragedy], so also [it happens] in political competitions because of the corruption
of the citizens [that people skilled in delivery prevail]. But there is no treatise about these
things, given that even the study of diction began only recently.
In fact, carefully considered, this matter [of delivery] is rather trivial. However, since [not
only delivery but] the whole study of rhetoric pertains to influencing opinions [rather than
attaining true knowledge], we must deal with it, not because it is right, but because it is
necessary. Seeing that the just thing to do in a speech is to seek nothing beyond [facts]
and to arouse neither sorrow nor joy, it is just to compete by means of facts themselves.
Consequently, all things outside the demonstration itself are superfluous.
Nevertheless, as we have said, delivery has a great potential because of the corruption of
our audiences. This necessity [to discuss delivery] is, however, only a small part of the
whole teaching of diction. For it does make a certain difference in clarifying things whether
we say things in this way or another, but not such a great difference, since all these things
[linked to delivery] merely pertain to putting on a show for the audience. This is why no
one teaches geometry in this way. Now, when this thing [i.e. delivery] comes [into fashion
among orators] it will do the same [to their art] as it did to the art of acting; in fact, certain
writers have attempted to say something about this, such as Thrasymachos in his Laments.
(Aristotle Rhetoric: 1403b3–1404a7)

Aristotle’s justification for refusing to discuss delivery combines an outline of the rise of delivery, a brief
account of previous research and its scope, and the rationale for his omission of this
topic; I will discuss them in that order. The philosopher’s agenda is quite clear from
his quasi-historical account. In associating the increase of interest in delivery with the
fashion for poetic texts (both epic and dramatic) to be recited by people other than
their authors, Aristotle seems to be contrasting the performers’ studied delivery with
the authors’ presumably spontaneous rendition of their own work. Delivery, he implies,
is an essentially mimetic art that originally developed as a skill of actors imitating emo-
tions that they did not experience. As such, delivery risks skewing public debates, offer-
ing an unfair advantage to speakers willing and able to manipulate their audience’s
emotions. It is likely that, as Fortenbaugh suggests, when writing about delivery as an
instrument of corruption, Aristotle might have had in mind Athenian demagogues
such as Cleon (Fortenbaugh 2007: 119).
In the light of Aristotle’s theory, speakers who honed their delivery skills
(Demosthenes, presumably, among them) imitated actors in their attempts to elicit
an emotional response from their audience and were quite successful. According to
Aristotle, this practice had prompted several fifth-century theorists, including two stu-
dents of the Sophist Gorgias, Glaucon of Teos and Thrasymachus of Chalcedon, to dis-
cuss rhetorical delivery in their writings. None of the fifth-century rhetoricians composed
a complete treatise on this matter, and their work is now lost, so that we cannot
even ascertain that these authors used the term hypocrisis to speak of delivery (see
Sifakis 2002: 161 n.34). But it is to their observations that Aristotle probably alludes
when he explains that the study of delivery covers the volume and tone of voice, the
rhythm of speech, and the correlation between these vocal parameters and emotions.
It is worth noting that this approach bears a strong resemblance to modern psychological
research, as exemplified by Harrigan, Rosenthal and Scherer (eds.) 2005. (See
especially Juslin and Scherer’s analysis of vocal parameters (2005: 119) as compared
to the vocal categories in Fig. 22.2 below.)
According to Aristotle’s value judgment, however, such emotional involvement was
entirely inappropriate for public speaking, which, he tells us, should ideally consist in
presenting objective accounts of facts to audiences who would judge facts only, much
like in a discussion of geometry. Aristotle was thus fully aware of the growing importance
of bodily communication in rhetoric and proposed a definition of delivery
that included visual as well as acoustic effects. His objections to hypocrisis are
chiefly ethical: The art of using the body to simulate and stimulate emotions studied
by Sophists, he objects, brings neither the speaker nor the audience closer to truth
(as defined in Platonic terms; see Fortenbaugh 1986).

3. Theophrastus: Mirrored emotions


It would be fascinating to know how Theophrastus (371–287 BCE), Aristotle’s succes-
sor as the head of the Peripatus, circumvented his teacher’s objections in his own trea-
tise On Delivery. The work is now lost, and only a very short summary in Prefatory
Remarks to Hermogenes’ On Issues authored by Athanasius (4th–5th century CE)
allows us to deduce what On Delivery might have contained (see Rabe 1913: 3–8; see
Kennedy 2005). Athanasius states that, just like Demosthenes, Theophrastus “claimed
that delivery was for an orator the greatest help in persuasion, and named as principles
(archai) of this art both the emotions (pathe) of the soul and the perception of them, so
that the movement of the body and the tone of the voice tend to be in harmony with the
entire science (episteme)” (see also Fortenbaugh 2007).
Theophrastus seems to have built his theory on some of the premises we found in
Aristotle’s Rhetoric. He apparently considered both voice and deportment as means
of hypocrisis and assumed that certain tones of voice and movements of the body
were naturally associated with certain emotions (Aristotle Rhetoric 1403b22). Theo-
phrastus further claimed (or so Athanasius suggests) that the art had two principles,
emotions and our perception of them, and that they had to be in harmony with “the
entire science.” The fragment does not specify which science is meant or whose
emotions and awareness would constitute the principles of the art. It seems, however,
plausible that the science in question pertains to emotions, and that the orator was ex-
pected to produce non-verbal signs to which the audience would respond with apposite
emotions (see Achard 2000: 2). Plutarch’s brief mention that Theophrastus explained
that voice could “both express and elicit sorrow, pleasure, and passion” confirms the
assumption that the audience was expected to empathize with the speaker (Table
Talk 623a–b). What the Peripatetic school proposed after Aristotle was, then, in all
likelihood, a psychological, audience-oriented theory of voice and deportment, which as-
sumed that the audience would mirror the emotions demonstrated by the speaker. This
rule of mirrored emotions would have implied that the speaker’s body is a dangerous
instrument capable of manipulating the listener’s response.

4. Stoics and Epicureans: The language of nature


As Quintilian notes in The Education of the Orator (3.1.15), the influence of Aristotle
and Theophrastus was such that “philosophers, especially the first among the Stoics and
Peripatetics, studied rhetoric with an even greater zeal than the rhetoricians them-
selves.” We know that the Stoic treatises usually contained a chapter
on delivery (Diogenes Laertius 7.43), and the snippets of their writings on the topic
bear witness to a theory rather different from that devised by the Peripatetic school.
The most explicit testimony to this theory is the following fragment from the writings
of Chrysippus (3rd century BCE): “I think that we should pay attention not only to
an honest and natural logic of speech, but also, in addition to speech, we should pay
attention to the proper elements of delivery with respect to the tones of voice that
impose themselves, and the expressions of the face and the hands” (Plutarch St. rep.
1047A–B; see Long and Sedley 1987 vol. 2: 189).
Chrysippus, it seems, thought of non-verbal communication as intrinsically related to
thought and speech – not only to emotions, of which the Stoics disapproved. (For mod-
ern views on the connection between speech and gesture, see McNeill 2005: 22–59). He
also mentions, unlike all earlier sources, hand gesture as an essential means of expres-
sion. Furthermore, he intimates that there would be no need to study delivery, since the
right choices would simply be obvious to an honest speaker. Perhaps the reference to
honest and natural logic of speech can be read as an echo of a polemic against a dishon-
est and unnatural use of both language and body, but we can only speculate about the
philosophical debates around different views on delivery.
One conjecture we can make with some certainty, however, is that the Epicureans
held a position akin to the one found in the fragment of Chrysippus, namely that nonverbal
expression was natural and reflected directly all the processes that took place in
the mind, not just emotions. The Epicureans even engaged in evolutionary speculations.
Lucretius in On the Nature of Things (5.1022) alludes to the connection between gesture
and language acquisition in early childhood: “Thanks to their gestures and stammering
speech children appeal for compassion.” Epicureans not only observed that young chil-
dren communicate all sorts of concepts (e.g., direction) by means of gestures before
they acquire the ability to speak, but also deduced that human communication might
have historically progressed from gesture to spoken word: “But nature has forced peo-
ple to express diverse sounds of tongue, and necessity engraved names on things not in
any other way than inability to speak can be observed to induce infants to make
gestures in order to point with their finger things that are present” (Lucretius On the
Nature of Things: 5.1028).
Despite their scarcity, these testimonies to Greek thought on delivery allow us to risk
a tentative historical outline. The earliest theories proposed by Sophists drew upon the
art of reciting dramatic and epic poetry and focused on the voice and its ability to
express emotion. Aristotle, despite his misgivings, laid foundations for further study
of rhetorical delivery that focused on emotional associations of both voice and deport-
ment; Theophrastus continued in his steps, analyzing audiences’ emotional responses to
non-verbal signals. Next to this approach focused on public performance, we find tan-
talizing traces of broader theoretical thought on the role of the body in communication;
the Stoics expressed interest in the connection between thought, gesture, and speech,
while the Epicureans apparently reflected on the role of gesture in the development
of language. Although our sources for the broader theory are drastically limited, it is
hard not to notice the contrast between the Aristotelian argument that hypocrisis
should be dismissed as a mimetic activity that exploited emotions, and the Stoic belief
that bodily expression was natural and encompassed all activities of the mind. These
theories would have entailed rather different constructs of the speaker’s body. The
Peripatetic theory assumes that the performer/speaker needs to learn how to use his body
and project efficacious signs of emotions. The Stoic and Epicurean theories, as far as we
can conjecture, held that the body itself “knew” what to do. These concepts would, in
turn, have fostered different attitudes towards delivery, the first one leading to mistrust
towards the body’s histrionic abilities, the second, projecting enthusiasm for the alleg-
edly universal and natural language of the body. These two attitudes compete and at
times converge in the writings of Roman rhetoricians.

5. Rhetoric for Herennius: Mapping the voice


Two detailed discussions of delivery, by Lucius Plotius Gallus (see Suetonius, Lives of
Rhetoricians 5.2) and Nigidius Figulus (see Quintilian E.O. 11.3.143), were among the
earliest Roman attempts at systematizing oratory; these treatises, written at the begin-
ning of the first century BCE, are now lost, but we do have a survey of delivery in the
Rhetoric for Herennius, a general handbook (once attributed to Cicero), more or less
contemporary with Plotius and Nigidius. Working around 90 BCE (see Caplan 1954),
the author of Rhetoric for Herennius situates delivery among the other aspects of rhetoric
– invention, style, artistic arrangement of parts, and memorization – asserting that, even
if delivery is not the most important aspect of the art, it is a separate issue deserving
careful consideration (3.19). The author boasts that his discussion of this issue will be the
first, as all earlier writers on the topic deemed it “hardly possible to describe lucidly
voice, facial expression, and deportment, because these things concern our feelings
(sensus)” (Rhetoric for Herennius).
The anonymous author attempts to achieve this feat by creating numerous cate-
gories, sub-categories, and sub-sub-categories to pinpoint the communicative capacities
of the human voice (see 3.23–3.25; Fig. 22.2, left). He then presents the movements of
the body that befit various tones, implying that the body is an instrument subordinate to
voice and text (Fig. 22.2, right). “The speaker,” he writes, “should deliver his introduc-
tion in a calm and gentle voice, and in narration should resort to pauses as often as
possible, reserving his strength for the stretches of unbroken speech necessary in
conclusions” (3.21–3.22). The face is hardly mentioned and postures are treated as a
topic subordinate to vocal registers. In Rhetoric for Herennius 3.26, the author offers
his only specific advice on the subject of facial expression: “it is thus proper for the
face to express modesty and earnestness.”
Delivery

1. Quality of the voice

1.1 Volume (magnitudo): is innate
1.2 Firmness (firmitudo): results from practice
1.3 Flexibility (mollitudo): is the focus of rhetorical training

1.3.1 Colloquial (sermo)
1.3.1.1 Dignified (dignitas): calm and reserved (almost verging on tragic delivery) = I
1.3.1.2 Explicative (demonstratio): calm, subdued, frequent pauses = II
1.3.1.3 Narrative (narratio): tempo varies from rapid to slow, from reproachful to kind, from joyful to sad – depending on subject = III
1.3.1.4 Witty (iocatio): gently quivering, with a suggestion of a smile = IV

1.3.2 Debate (contentio)
1.3.2.1 Sustained (continuatio): fast, uninterrupted, with slightly increased volume = V
1.3.2.2 Broken (distributio): loud exclamations, frequent pauses = VI

1.3.3 Emphasis (amplificatio)
1.3.3.1 Hortatory (cohortatio): moderately loud, very fast = VII
1.3.3.2 Pleading (conquestio): restrained voice, deep tone, frequent pauses = VIII

2. Movement of the body

2.1 Face (vultus): modest and animated
2.2 Body movement (gestus): neither too elegant nor too rustic

Gestus corresponding to the tones of voice:
I. Dignified (dignitas): stay in position, lightly moving the right hand; face expressing emotion corresponding to the subject
II. Explicative (demonstratio): body inclined forward
III. Narrative (narratio): stay in position, lightly moving the right hand; face expressing emotion corresponding to the subject (= I)
IV. Witty (iocatio): face expressing joy; no gestures
V. Sustained Debate (continuatio): quick gesture of the arm, keen glance
VI. Broken Debate (distributio): extend arm very quickly, walk up and down, stamp the right foot, adopt a fixed look
VII. Hortatory Emphasis (cohortatio): slower and more deliberate gesticulation than the quick gesture of the arm; keen glance
VIII. Pleading Emphasis (conquestio): slap your thigh, beat your head; use a calm and uniform gesture, a sad, subdued tone, and a disturbed expression

Fig. 22.2: Rhetoric for Herennius on Delivery

In this system, the orator’s autonomy is severely limited by the network of rules and
subsidiary rules that he must follow when using his body in the service of his text.
This mechanical approach coincides with a fear that the voice of the speaker who
would follow prescriptions unthinkingly might become too similar to that of an actor:
“it will be proper to use the calmest and most restrained voice possible [speaking]
with full throat, in such a way, however, that we should not move from the conventions
of rhetoric into tragic [delivery]” (3.24). The advice on deportment (gestus) contains
similar words of caution: “Deportment should be neither elegant nor clumsy, so that
we do not look like actors (histriones) or seasonal workers.”
The wealth of technical detail is, it seems, proportional to the degree of anxiety about
the status of a speaker who would follow this advice to the letter. Cicero’s writings on
delivery, composed a little later than the Rhetoric for Herennius, are less concerned with
detail and instead express faith in the speaker’s judgment.

6. Cicero: Controlling body language


Like some of the Sophist theorists of delivery, Cicero was a practitioner with vast expe-
rience; he also had an interest in the history of oratory, which he understood as the
history of great orators. Most of his systematic comments on the art of rhetoric are ex-
pressed in works entitled On the Orator (55 BCE) and Orator (46 BCE). In both of
these works, Cicero concurs with the view ascribed to Demosthenes that delivery is the
dominant factor in oratory (Orator 56, On the Orator 3.213). Although he does so with
a touch of embarrassment, Cicero perceives rhetorical delivery as being akin to “the friv-
olous skills of stage actors” (On the Orator, 3.213). (On the connection between acting and
oratory in Roman theory, see Dutsch 2002, 2007; Fantham 2002; Graf 1994).
But Cicero, in addition to being an experienced speaker and keen critic of acting,
was also a learned man versed in Greek philosophical writings. Thus his theory combines
a fascination with the potential of skilled delivery with strong Peripatetic influences
and an underlying conviction that human bodies naturally transmit information. When in-
dicating, at the beginning of On the Orator (Cicero 1.18), that delivery (actio) is the art of
controlling (moderari) the movement of the whole body (motus corporis), hand gesture
(gestus), facial expression (vultus), as well as the strength and modulation of voice, Cicero
is probably drawing on Peripatetic sources. The source for his statement in On Invention
that delivery consists in “adjusting voice and body according to the status of the topic and
speech” is probably similar (Cicero 1.9). Like Peripatetic theorists, Cicero also discusses
voice in the greatest detail and views delivery as primarily concerned with emotions:
“every emotion (motus animi) has received from nature, as it were, its own facial expres-
sion, its own sound and its own gesture” (On the Orator 3.216). These remarks do not
form, however, an all-encompassing system comparable to the one found in the Rhetoric
for Herennius. The closest Cicero comes to a systematic description of the use of voice
is in Crassus’ lecture, in which the speaker states that one needs to begin with the most
natural tone, then gradually raise the pitch (On the Orator 3.227).
What we have not seen in the earlier writings is Cicero’s perception of the act of im-
plementing these principles as a manifestation of the speaker’s mastery over his body.
This exercise in self-control comes into focus in the discussion of delivery in The Orator:

The orator will use movement in such a way that there will be nothing superfluous; when
gesticulating he will remain straight and tall; his pacing back and forth should be controlled
(moderata) and never for a long time or far. There should be no effeminacy in his bending
of the neck, no twirling of fingers, no counting of the rhythm with fingertips. He will instead
be in control of himself (se ipse moderans) [as manifest in] a manly posture of his entire
torso and both his sides, extending his arm in debate and dropping it in calmer moods.
(Cicero, Orator 59)

The key word in this passage is moderari, “to manage, control.” Cicero represents the
orator both as the puppeteer and the puppet, his mind controlling the entire body and
forcing it to broadcast, simultaneously through several channels, the strength, stability,
and self-control expected of an elite Roman man (on Roman masculinity, see Gleason
1995 and Gunderson 2003). As long as this elite man uses his body as a cipher for his
status, as he would have been socialized to do, he will not need the detailed instructions
of the kind offered in Rhetoric for Herennius.

7. Quintilian (35–100 CE): On “natural technique”


Book Eleven of Quintilian’s On the Education of the Orator contains the most extensive
and, as Fantham (1982) has argued, original discussion of an orator’s deportment that
has come down to us from antiquity. Although he is aware that pronuntiatio referred
originally to voice, Quintilian follows Cicero’s usage (see On the Orator 3.59.222 and
Orator 17.55) in applying this term to both voice and deportment. Thus defined, delivery
of a speech is, according to Quintilian, more important than its content (see Dutsch
2002). (See Fig. 22.3) “[Delivery] has an extraordinarily powerful effect in oratory: since
the effect that a speech has on all listeners depends on what they hear – the quality of
the speeches that we compose in our minds is not as important as the way in which we
deliver them.”
1. Voice (vox)

Part of speech – voice quality:
Introduction: quiet
Narration: colloquial
Proof: colloquial/witty
Argument: sharper, more emphatic
Digression: gentle/sweet/relaxed
Epilogue: a) recapitulation: even, short, clear-cut clauses; b) asking mercy: sweet and melancholic

Mood – voice quality:
Joy: joyful
Encouragement: strong
Hostility: rather slow
Compliments: gentle and submissive

2. Face (vultus)

Eyes (no expressions described): joy, sadness, intensity, indifference, pride, anger, threat, flattery, submission
Eyebrows: anger (contracted), sadness (dropped), joy (raised), consent (dropped), refusal (raised), derision, contempt, loathing (not specified)
Blush/pallor
Nose (no meaning assigned): twitching, wrinkling
Lip movement (no meaning assigned): smacking, curling

3. Deportment (habitus corporis)

Head nods (no movements described): consent, refusal, affirmation, modesty, hesitation, surprise, indignation
Neck: straight (reliability), bent (subservience), stiff (arrogance)
Hands:
[Emotions] aversion, fear, joy, sorrow, confession, hesitation, regret
[Actions] demand, promise, summon, dismiss, threaten, supplicate, question, negate
[Deictic] measure, quantity, number, time, deictic adverbs

Fig. 22.3: Quintilian on Delivery

Unlike the author of Rhetoric for Herennius and (to a lesser degree) Cicero, Quintilian
does not privilege voice in his account. On the contrary, he offers a far more extensive
account of gesture and deportment than he does of voice. He writes with contagious
enthusiasm about the expressive potential of every small section of the human body
(eyebrows, fingers, mouth, neck). Like the Peripatetics and Cicero, he subscribes to
the opinion that voice is directly connected to the human soul and has the power to
convey – and provoke – every emotion. He also assumes that the audience’s emotional
response is of paramount importance to the speaker and compares the work of the ora-
tor to that of an actor. Quintilian’s discussion of voice contains, in addition to the old
Peripatetic prescription that the tone of voice must befit circumstances, advice on the
speaker’s manly self-management (reminiscent of Cicero’s comments). We read, for
example, that only strong emotions (grief, anger, indignation, pity) are appropriate
for the orator and that therefore he should shun the Eastern custom of using a singsong
voice in courts (3.58). In the perception of the relationship between the voice and the
body’s visual aspects, however, Quintilian differs from the author of the Rhetoric for
Herennius in a subtle but important way.
Whereas the anonymous author treats the body as subordinate to the voice, Quinti-
lian claims that “the demeanor (gestus) agrees (consentit) with the voice, and together
with it, obeys the mind” (11.3.65). He thus presents gesture as intimately connected to
voice and speech – but not dependent on voice for guidance – and goes on to point out
that gestus can convey meaning without words:

For, indeed, not only hands, but also nods of the head can convey our will; the mute use it
instead of language and we can be moved by pantomime dancing without a sound. It is also
possible to understand the disposition of individuals from their face and gait. Furthermore,
although animals have no language, one can sense their anger, joy, and affection in their
eyes and in certain other signs. (Quintilian 11.3.66)

In this excerpt, Quintilian departs from his earlier (Peripatetic) definition of rhetorical
delivery (as concerned chiefly with emotions) and draws attention to the body’s other
possibilities, such as expressing consent or storytelling. He also alludes to sign language
and seems to assume that non-verbal communication is to some degree used by every
living creature.
The rhetorician seems to be making no distinction between voluntary communica-
tion and involuntary signs that can be read by others who can interpret a person’s dis-
position from their face and gait. Instead, he suggests that the gestures which are
endowed with meaning and “come out naturally along with voice” must be distin-
guished from the “imitative” gestures used by pantomime actors (11.3.88). The most
complex code among his different facets of natural gestus is that of hand gestures. Quintilian
writes that these almost surpass the spoken word in their intricacy:

As for hands, without which delivery is lame and weak, I can hardly tell you of how many
movements they are capable, for they almost surpass the multiplicity of words. For,
whereas other parts of the body merely help the speaker, the hands so to say, speak them-
selves. Don’t we use them to ask, promise, call, dismiss, threaten, supplicate; don’t we
express revulsion, fear, questions, and negations? Don’t we show joy, sadness, doubt, con-
fession, regret, measure, quantity, number, and time? Don’t the hands have the power to
encourage and forbid, approve, admire, and show shame? Do they not, in indicating people
and places fulfill the function of adverbs and pronouns? (Quintilian 3.85–3.87)

As I argued elsewhere, Quintilian’s description of the variety of gestures (11.3.86–
11.3.87), although apparently chaotic, dissimulates a theoretical framework (see Dutsch
2002: 264–267, especially Figure 1 and Table 1). In addition to the Peripatetic staple of
emotions ( joy, sadness), Quintilian assumes that gestures can convey actions (such as to
ask, or to promise) and that they sometimes stand for demonstrative adverbs and pro-
nouns, as well as numerals. His reference to adverbs and pronouns is of paramount
importance, because, as Quintilian himself points out, these are concepts derived
from Stoic grammars. The Peripatetics, Quintilian writes in Book One, introduced
340 III. Historical dimensions

the rudimentary distinction between verbs and nouns, classifying all other words as
“conjunctions.” The Stoics later added several new parts of speech, including the parti-
ciple and the adverb (1.4.18–1.4.20). Quintilian’s references to adverbs and pronouns in
his comments on the language of gesture, relying as they do on Stoic theory of language,
thus probably reflect a Stoic theory of gesture. The fact that both references to Stoic
sources and comparisons between gesture and parts of speech are absent from Cicero’s
comments on delivery and from The Rhetoric for Herennius (see Achard 2000: 12),
would confirm the thesis that Quintilian’s comparison is indeed derived from Stoic
thought.
Quintilian is also intensely preoccupied with the possibility that the speaker’s body
might look (as well as sound) “unnatural” (another Stoic motif): the head must be carried “naturally,” since carried too high it expresses arrogance; too low, a slavish disposition
(11.3.68–11.3.69; see Varwig 1976). The eyes need careful attention, as they are natu-
rally expressive of one’s emotions. Although Quintilian thinks that those (probably
Stoic theorists) who argue that the best delivery owes absolutely nothing to art go
too far, his insistence on natural delivery (11.3.10) is quite remarkable and has no par-
allel in the Peripatetic sources. The problem of authenticity arises, for example, when
Quintilian considers whether the emotions displayed by the orator are “true” or “arti-
ficial” and concludes that they are positioned somewhere between truth and fiction. He
advises the future orator that “it is best to feel the affect and imagine pictures of things
and be moved as though by genuine emotions” (11.3.61–11.3.62).
Ultimately, all these precautions serve three purposes: to win the audience’s
approval by displaying manly self-control, to persuade them by the intensity of one’s
conviction, and to stir their emotions. The Roman rhetorician thus arrives at a workable
compromise between the linguistic and psychological approaches, suggesting both that
the body has an innate capacity for communicating diverse concepts in several different
ways and that the speaker must take into special consideration his audience’s likely
emotional response to his own body language. In addition to his focus on nature, Quin-
tilian is also aware of the cultural dimensions of gesture. He often compares his recom-
mendations for a Roman orator with what is deemed acceptable in other contexts, i.e.
among “Greek” or “Eastern” orators, among Roman orators in earlier times, or among
actors on the tragic and comic stage or in pantomime performances.

8. Conclusion
Because delivery required the orator to use his body to represent willfully – rather than
express spontaneously – his thoughts, this particular aspect of the orator’s work was in
the eyes of ancient theorists reminiscent of the task of a dramatic performer. Ancient
writers were acutely aware of the similarities between these two stylized bodily idioms.
The notion that a man speaking to his fellow citizens on important public or legal mat-
ters had to resort to methods akin to acting displeased certain rigorous thinkers, includ-
ing Aristotle, but the awareness of the rules governing the use of voice, posture, and
facial expression in oratory sparked a larger debate on the function of the body in
human communication. By the time of Quintilian, the theory of delivery was a sophis-
ticated discipline that combined a psychology of emotions with a semantic theory that
held that nonverbal communication encompassed several modes (posture, face, gesture,
and voice). Through these modes a speaker was believed to convey information
22. The body in rhetorical delivery and in theater: An overview of classical works 341

that was usually linked with spoken utterances. Sometimes, however, especially in the
case of hand gesture, the body was believed to speak a language of its own. Governed
by its own grammar, this language had multiple registers (everyday interaction, pub-
lic speaking, pantomime, sign language) and regional dialects (Rome versus the
“East”).
In sum, while the ancient texts we have at our disposal are focused on technical de-
tails useful for public speaking, they also bear witness to a much broader intellectual
discourse that encompassed not only emotions, but also the genesis of language, and
the very nature of human communication.

9. References
Achard, Guy 2000. Pathos et passions dans l’Ad Herennium et le De inuentione. In: Andreas
Haltenhoff and Fritz-Heiner Mutschler (eds.), Hortus Litterarum Antiquarum: Festschrift für
Hans Armin Gärtner zum 70. Geburtstag, 1–17. Heidelberg: Carl Winter.
Aldrete, Gregory S. 1999. Gestures and Acclamations in Ancient Rome. Baltimore: Johns Hopkins
University Press.
Aristotle 1992. The Art of Rhetoric. Translated and edited by Hugh Lawson-Tancred. New York:
Penguin Classics.
Boegehold, Alan L. 1999. When a Gesture Was Expected: A Selection of Examples from Archaic
and Classical Greek Literature. Princeton, NJ: Princeton University Press.
Caplan, Harry 1954. [Cicero]: Ad Herennium de ratione dicendi. Cambridge, MA: Harvard Uni-
versity Press.
Cicero, Marcus Tullius 1949. On Invention. Translated by H. M. Hubbell. Cambridge, MA: Loeb
Classical Library.
Cicero, Marcus Tullius 2001. On the Ideal Orator. Translated by James M. May and Jakob Wisse.
New York: Oxford University Press.
Cicero, Marcus Tullius 2004. Cicero’s Brutus or History of Famous Orators; also His Orator, or
Accomplished Speaker. Translated by E. Jones. Whitefish, MT: Kessinger.
Clarke, Martin Lowther 1996. Rhetoric at Rome: A Historical Survey. Revised edition with intro-
duction by D.H. Berry. London: Routledge. First published [1953].
Corbeill, Anthony 2004. Nature Embodied. Gesture in Ancient Rome. Princeton, NJ: Princeton
University Press.
Diogenes Laertius 1853. The Lives and Opinions of Eminent Philosophers. Translated by Charles
Duke Yonge. London: Henry G. Bohn.
Dutsch, Dorota 2002. Towards a grammar of gesture: A comparison between the types of hand
movements of the orator and the actor in Quintilian’s Institutio Oratoria 11.3.85–184. Gesture
2(2): 259–281.
Dutsch, Dorota 2007. The language of gesture in the illuminated manuscripts of Terence. Gesture
2: 39–70.
Edwards, Michael 2007. Alcidamas. In: Ian Worthington (ed.), A Companion to Greek Rhetoric,
47–57. Malden, MA: Blackwell.
Fantham, Elaine R. 1982. Quintilian on performance. Phoenix 36: 243–263.
Fantham, Elaine R. 2002. Orator and/et actor. In: Patricia E. Easterling and Edith Hall (eds.),
Greek and Roman Actors: Aspects of an Ancient Profession, 362–376. Cambridge: Cambridge
University Press.
Fortenbaugh, William W. 1985. Theophrastus on delivery. In: William W. Fortenbaugh, Pamela M.
Hubby and Anthony A. Long (eds.), Theophrastus of Eresus: On His Life and Work, 269–288.
New Brunswick, NJ: Transaction.
Fortenbaugh, William W. 1986. Aristotle’s platonic attitude toward delivery. Philosophy and
Rhetoric 19: 242–254.
Fortenbaugh, William W. 2007. Aristotle’s art of rhetoric. In: Ian Worthington (ed.), A Companion
to Greek Rhetoric, 107–123. Malden, MA: Blackwell.
Gleason, Maud W. 1995. Making Men: Sophists and Self-Presentation in Ancient Rome. Princeton,
NJ: Princeton University Press.
Graf, Fritz 1994. Gestures and conventions: The gestures of Roman actors and orators. In: Jan
Bremmer and Herman Roodenburg (eds.), A Cultural History of Gesture, 36–58. Cambridge:
Polity Press.
Gunderson, Erik 2003. Declamation, Paternity, and Roman Identity: Authority and the Rhetorical
Self. Cambridge: Cambridge University Press.
Hall, Jon 2007. Oratorical delivery and the emotions: Theory and practice. In: William Dominik
and Jon Hall (eds.), A Companion to Roman Rhetoric, 218–234. Malden, MA: Blackwell.
Harrigan, Jinni A., Robert Rosenthal and Klaus R. Scherer (eds.) 2005. The New Handbook of
Methods in Nonverbal Behavior Research. Oxford: Oxford University Press.
Juslin, Patrik N. and Klaus R. Scherer 2005. Vocal expression of affect. In: Jinni A. Harrigan, Ro-
bert Rosenthal and Klaus R. Scherer (eds.), The New Handbook of Methods in Nonverbal
Behavior Research, 65–135. Oxford: Oxford University Press.
Kennedy, George A. 1972. The Art of Rhetoric in the Roman World, 300 B.C. – A.D. 300. Prince-
ton, NJ: Princeton University Press.
Kennedy, George A. 1980. Classical Rhetoric and Its Christian and Secular Tradition from Ancient
to Modern Times. Chapel Hill: University of North Carolina Press.
Kennedy, George A. 1994. A New History of Classical Rhetoric. Princeton, NJ: Princeton Univer-
sity Press.
Kennedy, George A. 2005. Invention and Method: Two Rhetorical Treatises from the Hermogenic
Corpus. The Greek Text. Edited by Hugo Rabe, translated with Introduction and notes by
George A. Kennedy. Atlanta: Society of Biblical Literature.
Lateiner, Donald 1995. Sardonic Smile: Nonverbal Behavior in Homeric Epic. Ann Arbor: Univer-
sity of Michigan Press.
Long, Anthony A. and David N. Sedley 1987. The Hellenistic Philosophers, Volume 2: Greek and
Latin Texts with Notes and Bibliography. Cambridge: Cambridge University Press.
Lucretius 2011. On the Nature of Things. Translated by Frank O. Copley. New York: Norton.
McNeill, David 2005. Gesture and Thought. Chicago: Chicago University Press.
O’Sullivan, Neil 1992. Alcidamas, Aristophanes and the Beginnings of Greek Stylistic Theory.
(Hermes Einzelschriften 60.) Stuttgart, Germany: Franz Steiner.
Plutarch 1969. Moralia, Volume 8: Table Talk, Books 1–6. Translated by P.A. Clement and H.B.
Hoffleit. Cambridge, MA: Loeb Classical Library.
Plutarch 2010. Plutarch’s Lives, Volume 5 (Lives of the Ten Orators). Charleston, SC: Nabu Press.
Quintilian 1856. Education of an Orator. Translated by John Selby Watson. London: Henry G.
Bohn.
Rabe, Hugo 1913. Hermogenes Opera. Leipzig: Teubner. Reprinted Stuttgart: Teubner [1969].
Sifakis, Gregory Michael 2002. Looking for the actor’s art in Aristotle. In: Patricia E. Easterling
and Edith Hall (eds.), Greek and Roman Actors: Aspects of an Ancient Profession, 148–164.
Cambridge: Cambridge University Press.
Sonkowsky, Robert P. 1959. An aspect of delivery in ancient rhetorical theory. Transactions and
Proceedings of the American Philological Association 90: 256–274.
Tranquillus, C. Suetonius 2010. The Lives of the Twelve Caesars: Grammarians and Rhetoricians.
Whitefish, MT: Kessinger.
Unknown 1994. Rhetorica ad Herennium. Translated by Theodor Nüsslein. Mannheim, Germany:
Artemis and Winkler.
Varwig, Freyr Roland 1976. Der Rhetorische Naturbegriff bei Quintilian. Heidelberg: Carl Winter.

Dorota Dutsch, Santa Barbara, CA (USA)


23. Medieval perspectives in Europe: Oral culture and bodily practices 343

23. Medieval perspectives in Europe: Oral culture and bodily practices
1. Oral culture in the Middle Ages
2. Bodily signs: Structure and semantics
3. Areas of application
4. References

Abstract
Prior to the invention of book printing, Western culture had no efficient storage medium
that served to unburden human memory. Instead of writing, a mnemotechnics based on the visual perception of bodily movements took over the functions of orienting, identifying, and stabilizing the social order in the medieval period. In medieval instructions on
ecclesiastic and secular norms of behavior, the categorization of the most widespread
and relevant bodily practices was ordered according to the various functions of the
body parts involved, such as neck, back, and knee muscles, arms, hands, lips, and facial
muscles. Naturally, such a system of signifiers (like the prostration, the genuflection with
inclined body, the genuflection with erect body, bowing to the waist, bowing to the chest,
the foot kiss, the knee kiss, the shoulder kiss, the hand kiss, etc.), which will be reconstructed here on the basis of various source corpora, could not encompass all the motion
sequences that took place in the context of different interactions. Nevertheless, it has
proven useful to treat socially meaningful actions as communicative acts within the
contexts of 1) religion, 2) law, 3) ceremonial and 4) etiquette.

1. Oral culture in the Middle Ages


Prior to the invention of book printing, Western culture had no efficient storage
medium that served to unburden human memory. Based on this observation, Walter
Ong constructed his theory of orality, according to which orally communicating peoples
know only what they can bear in memory. According to Ong (1982), oral culture is a
form of culture in which “no one can ‘look up’ anything.” Its members “perceive
words as sounds” that exist only in the moment of their emergence. Primary orality
is thus organized “situationally rather than abstractly” (Ong 1982: 11; McLuhan 1960:
207). As soon as one attempts to grasp Ong’s notion of orality through theories of
cognition and communication, however, one invariably encounters a series of
inconsistencies.
To begin with, Ong’s distinction between “literate” and “pre-literate” applies to
material types of media, whereas his dichotomy of “seeing” vs. “hearing” refers to
the nature of human channels of perception. It would definitely be wrong to assume that our ancestors’ sensory intake capacities differed from ours. In the hierarchy of the
channels of perception and communication, sight is accorded a leading role because it
expands and defines the physiological and cognitive functions of other sensorimotor
subsystems, such as hearing, taste, smell, and touch (Loenhoff 2001: 171). The eye,
which enables such different functions as planning, programming, and geographic
surveying, is the primary precondition for all cognition, control, and cultural semiosis.
In the context of pre-literate culture, the hierarchy of modes of cognition was no differ-
ent from that of other ages. In the Middle Ages, too, human cognitive processes were
guided primarily by the eye. Instead of writing, however, a mnemotechnics based on
the visual perception of bodily movements took over the functions of orienting, identi-
fying, and stabilizing the social order in the medieval period.
Secondly, Ong’s notion of “orality” narrows the context of communication insofar as
it refers only to the exchange of information using language between a speaker and a
recipient. Such a narrowed approach completely excludes from consideration the posi-
tion of the body toward the addressee, for example, as well as the role of eye contact (or
the lack thereof), openness of arms and legs, shoulder position, in other words, all those
behavioral variables which always co-determine the course and character of communica-
tive acts. In order to understand the specificity of oral culture in the Middle Ages, we
must thus comprehend the structural and semantic relationship of verbal and non-verbal
signs to one another and their meaning in the context of pre-literate communication.

2. Bodily signs: Structure and semantics


In what follows, the “sign” will be treated in accordance with general semiotic theory as
the relationship between signifier and signified. If we define the signifier as the percep-
tible movement of a body part, the signified can be defined as the context-bound mean-
ing of such a movement. The relationship of various signifiers to one another and to
their signifieds is characterized by a number of peculiarities in the Middle Ages. In
order to better determine them we will analyze late Roman (Alföldi 1935: 103–111)
as well as Byzantine (De ceremoniis, 10th century) instructions for court ritual, early
medieval instructions for ecclesiastic prayer practices, particularly those of Humbert de
Romans, Petrus Cantor, and St. Dominic (Schmitt 1992: 274–304; Trexler 1987: 43), as
well as descriptions of encounters between medieval diplomats (Voss 1987: 134–136).
In medieval instructions on ecclesiastic and secular norms of behavior, the categorization of the most widespread and relevant bodily practices was ordered according to
the various functions of the body parts involved, such as neck, back, and knee muscles,
arms, hands, lips, and facial muscles. Naturally, such a system of signifiers, which will be
reconstructed here on the basis of various source corpora, could not encompass all the
motion sequences that took place in the context of different interactions. Nevertheless,
it has proven useful to treat socially meaningful actions as communicative acts.

2.1. According to body position


(i) The prostration (prostratio), i.e. lying prostrate on the ground
(ii) The lying prostration (prostratio venia)

2.2. According to the function of the knee joints


The genuflection (genuflexio)

(i) The genuflection with inclined body (genuflexio proclivis)


(ii) The genuflection with erect body (genuflexio recta)

2.3. According to the function of the hip joints


Bowing (inclinatio)

(i) Bowing to the waist (inclinatio plena)


(ii) Bowing to the chest (inclinatio semiplena)

(i) inclinationes (ad renes): semiplena (minor) vs. plena
(ii) genuflexiones (ad genua): recta (cum corpore erecto super genua) vs. proclivis (cum corpore prostrato)
(iii) prostrationes (ad talos): prostratio (idem quod genuflexio proclivis) vs. venia (cum toto corpore)

Fig. 23.1 Medieval practices of bowing

2.4. According to the function of the cervical joints


The head nod (capite nutare, nutus)

2.5. According to the function of the arm joints


The embrace (amplexus)

2.6. According to the function of the elbow joints


Greeting by hand

(i) The handshake (dextram dare, porrigere)


(ii) Removing one’s hat, i.e. baring one’s head (caput aperire)
(iii) Blowing a kiss (basia iactare)

2.7. According to the function of the lips


The kiss (osculum)

(i) The foot kiss (osculum pedum)


(ii) The knee kiss (osculum genuum)
(iii) The shoulder kiss (osculum ulnae)


(iv) The hand kiss (osculum manui)
(v) The cheek and mouth kiss (osculum oris)
(vi) The forehead kiss (osculum frontis)

2.8. According to the function of other facial muscles


(i) Winking at someone (nictatio)
(ii) Raising one’s eyebrows, etc.

The basic body techniques listed above can be subdivided into more complex cate-
gories. For instance, the term proskynesis was generally used to refer to the combination
of the genuflection with a kiss on the feet, knees, or hand, or, somewhat less frequently,
to a combination of bowing and kissing the corner of a garment or a hand. In regard to
the nature of the relationship between signifier and signified, we can further distinguish
between iconic vs. indexical and symmetric vs. asymmetrical bodily signs.

2.9. Iconic vs. indexical bodily signs


While iconic signs postulate a relationship of similarity between signifier (e.g. a portrait)
and signified (a face), indexical signs presuppose a relationship between part and whole.
Iconic bodily signs can be mastered without extensive schooling and are thus repro-
duced relatively easily. Their social reproduction is subject to the principle of analogy.
A simple demonstration generally suffices to ensure the continued use of a given sign.
The recognition of indexical signs, by contrast, relies on the human ability to construct
causal and concessive relationships between a trait (e.g. smoke) and the entire context
of traits (fire). This ability presupposes experience and is acquired over the course of
more extensive training.
In the Middle Ages, the semiotic distinction between “iconic” and “indexical” was
symbolically implemented in a storage medium accepted by the whole of society: the
icon. Depicted on icons, techniques such as the genuflection or the bow possessed
such exceptional potential for communal integration because they could be adopted
by representatives of all social layers, as well as by foreign migrants, without much
effort. In situations of multilingual oral communication, the function of social orienta-
tion was shifted primarily to iconic bodily movements. Consequently, in the context of
religious and ceremonial communication, nonverbal sign systems were to some extent
more important than verbal expressions. The rule of silence (silentium), which increased
the believer’s focus on the optical perception of body movements, was therefore an
important component of the rituals of Roman emperors and Christian priests even in
the early period (Alföldi 1935: 38; Treitinger 1969: 32).
In this context, the emergence of “indexicality” can be understood as a doubly
motivated process. On the one hand, both styles of behavior and media of representing
the elites, such as oil painting and engraving, proliferated and grew more diverse in the
modern era. On the other hand, the noblesse of the modern age increasingly insisted
on the ancienneté of its forms of representation: everything that counted was to have
a certain age.

In this manner, courtly society created a system of gestures that, so to speak, did not
function autonomously but referred to the canonical gestures of Christian iconography.
For instance, one did not fully remove one’s hat but only had to raise it briefly or only
touch the brim slightly, as can be observed in the “chapeau” gesture that derives from
courtly etiquette and was preserved by the educated middle class. Accordingly, in acts
of courtly reverence of the seventeenth century, one could slightly bend one’s leg to
refer to something that icons expressed through full genuflection.

2.10. Symmetric vs. asymmetric bodily signs


The distinction between symmetry and asymmetry proposed here refers to the level of
signifiers, i.e. bodily movements. Symmetrical body movements are executed by each
participant of a given interaction either synchronically (e.g. shaking hands, raising
arms for an embrace) or with a delay (turn taking) (e.g. exchanging caresses, bowing
to one another). Asymmetrical body techniques are only performed by one of the
participants of the interaction (e.g. blowing a kiss).
The distinction between symmetry and asymmetry is not identical to the distinction
between equality and inequality, which belongs to the level of signifieds (Schürmann
1994: 159–196). Modern society retains constructions aimed at distinguishing between
equality and inequality even when, for example, symmetry is enacted on the level of phys-
ical contact. A handshake is no longer a sign of equalization today. In the Middle Ages,
however, symmetric and asymmetric body techniques served to symbolically indicate
socially valid hierarchies. In the context of an estate-based society, the relationship
between signifiers and signifieds rested on the principle of similarity and analogy. The ver-
tical stratification of the human body was projected onto social hierarchies, and vice versa.
Symmetrical body techniques thus occurred primarily in diplomatic relations, com-
munication among rulers, as well as in monasteries, as in the example of the ritual
kiss of peace (osculum fidei et pacis), during which the monks approached one another
symmetrically and aligned their cheeks. In general, however, symmetry was not a part
of everyday routine in an estate-based society. Symmetrical uses of the body were anal-
ogous to a legally valid process of establishing equal status. When the German King
Konrad III (1138–1152) refused to genuflect before the Byzantine Emperor Manuel I
“due to the honor of the Roman Empire” (ob honorem Romani imperii), it was de-
manded that both rulers ride toward one another and exchange kisses on horseback
(ut […] ex parilitate convenientes sedendo se et osculando salutarent) (see Fuhrmann
1993: 127 with reference to Arnold von Lübeck’s report in the Chronica Slavorum).
The omission of asymmetrical communication techniques such as bowing to breast level (inclinatio semiplena) or to the ground (ad terram) was considered a norm violation in the context of medieval etiquette, since social hierarchies at the time (unlike today) were indicated through symbolic translations into vertical distances between bodies. In acts of obeisance that adhered to the norm, asymmetric gestures of homage could not be omitted (Althoff 1997: 202).
istics of the modern era consisted in its tendency to conceal the explicit semantics of
bodily signs and in the decreased expressive force of body positions with regard to
social relationships of equality and inequality. The transparency of pre-modern spaces
of interaction, in which those who were older, richer, or more powerful were assured
deep genuflection, increasingly dissolved between the sixteenth and nineteenth
centuries. Most often, this process occurred not under pressure from below but, on the
contrary, by the initiative of the upper classes. The following resolution by Kaiser
Joseph II of January 10, 1787 is indicative of this trend. In it, it was decreed

[…] that henceforth, […] the kiss on the hand which men and women offer to the highest of
lords of the highest archducal house, just like genuflections in reverence and the kneeling
before anyone and in all cases is to be ceased entirely, […] likewise no one, […] who wishes
to request or otherwise petition anything, shall no longer kneel, for this is no fitting action
between men, but is to be reserved for God alone. (Schreiner 1990: 128)

Body-related decrees from the eighteenth century illustrate the dissolution of tradi-
tional, visually oriented modes of perception in which social hierarchies were projected
onto the vertical body axis and depicted by analogy. In the sphere of power, a spatial
symbolism of “left” and “right” (the political expressions left and right derive from the seating of the French National Assembly of 1789, where supporters of the revolution sat to the president’s left and supporters of the king to his right) became established by the eighteenth century, while the
symbolism of “above” and “below” in self-representations of powerful political figures
increasingly faded into the background. The new politics aimed at concealing inequal-
ities surfaced in the rhetoric of the “universally human”. It aimed to block the observation
of differences in status and rank on the level of the body.

3. Areas of application
The physical behavior that determined the course of most everyday interactions in the
Middle Ages, such as prayer, conflict resolution, inviting and receiving, greeting and
taking leave, was subject to conscious societal standardization. With regard to medieval
bodily practices, we will examine four sets of norms: religious, legal, ceremonial, and
etiquette norms.

(i) Norms of transcendental communication (worship) fell under the authority of


religious ethics.
(ii) Norms related to murder, incest, theft, and insulting of the ruler, which were essen-
tially based on punitive measures, were codified in the form of laws, carved into
stone, and cut into wood.
(iii) Whereas the law was concerned primarily with precedents, i.e. with what had
already occurred, ceremonial instructions were anticipatory norms that aimed at
preventing the conditions for an unwanted precedent.
(iv) Norms of etiquette, in turn, can be placed in an intermediary zone between ceremonial and courtesy. While ceremonial norms were concerned with observing
hierarchies, etiquette norms strove to integrate into the system those who were
considered border crossers, such as couriers, messengers, and pilgrims. Anticipa-
tory norms of propriety were (like legal norms) of an idealistic nature. In other
words, they were not meant to govern the actions of everyone in every situation.

3.1. Religion
The common conception of the Middle Ages as an age of superstition is the result of
mistaken interpretations of bodily worship practices, which continue to be viewed as
demanding an exceptional degree of physical exertion. In fact, the use of the body in
medieval religious rites was subject to a strict system of usage rules (adoratio). Aside
from worship practices, the signified adoratio also encompassed other forms of tran-
scendental communication, such as those between sacred rulers (i.e. emperors) and
their subjects, between artists and patrons, but also between lover and beloved. Eccle-
siastic rituals, however, drew on the reverse of the adoratio in the form of expressions of
humility.
In the context of physical worship rituals, the signifiers were strictly divided into
prostrations (prostrationes), genuflections (genuflexiones), and bows (inclinationes). In
the context of antiquity as well as in the medieval tradition, the prostration was most
often understood as an extended genuflection. This is particularly true of the position
known as genuflexio proclivis, in which the head of the kneeling subject touches the
ground. Humbert de Romans (early thirteenth century) treats the prostratio as identical
to the genuflexio proclivis. Petrus Cantor (1197) also describes the prostration under the
title genuflexiones (see Schmitt 1992: 286). We can infer that the lying body position was
viewed as an amplification of the genuflection in liturgical ceremony. In my classifica-
tion, the term “prostration” only refers to the prostrate position. It is distinct from
the genuflection with its various semantic connotations (see Humbert, early thirteenth
century: 167).
Even though the classifications of prayer postures conducted in the Middle Ages
were based on the knowledge of rituals inherited from antiquity, significant differences
from the latter occurred as well. As a pose whose function was restricted to sacral com-
munication, the prostration symbolized primarily an exaggerated fear of God. As the
religious prostratio venia, this pose was widely used in the Middle Ages. With regard
to the suitability of various body positions for prayer, Petrus Cantor (1197) deemed
“most excellent,” “most serious,” and “most useful” those types of prayer in which
“chest, stomach, arms, knees, upper thighs and toes touch the earth” (“Inter omnes
autem modos orandi iste est quasi melior et fere utilior: iacere in solo, ita quod os et pectus et venter et brachia et genua nec non et crura atque digiti contingant terram” – see
Petrus Cantor 1197: 233; see also Trexler 1987: 190–233; Schmitt 1992: 290). In the
beginning of the thirteenth century, Humbert de Romans also included the full prostra-
tion in his classification of different prayer types. However, he condemned the habit of
“certain laymen” to extend their arms in the form of the cross of Christ and to kiss the
ground. The rejection may be explained by the association of kissing the ground with
pagan practices of earth worship. The prayer instructions of Humbert de Romans
and Petrus Cantor show that the liturgical forms of prayer in Catholic prayer practices
of the twelfth and thirteenth centuries were treated analogously to rhetorical figurae.
Similarly to rhetoric and the art of theatre, the impression of true versus false (e.g.
feigned) holiness (ostentatio sanctitatis) depended on the assumption of a certain
body posture (eloquentia corporis). Lying prostrate was interpreted as an extreme
expression of devoutness in Catholic worship (albeit in a positive sense).
Aside from their concern with technique, twelfth-century prayer manuals consis-
tently distinguish between the space of private and the space of public prayer (Trexler
1987: 43). In this regard, Petrus Cantor refers to certain contemporaries who were
“ashamed to pray lying” in public (“[…] dicit […] verecundor orare […] in terra
prostrates”, Petrus Cantor 1197: 838–839). Moreover, the religious semantics of the
prostrate position is played off against the use of this technique in court ritual:
350 III. Historical dimensions

individuals who are accustomed to falling prostrate before a tyrant ought instead to
prostrate themselves before God (“Itaque cum prosternas te ante pede alicuius tiranni
pro modica pena fugienda, potius teneris te proicere ante deum […]”, Petrus Cantor
1197: 799–802; see also Trexler 1987: 47). Trexler notes: “the opposition of knights to
kneeling and especially prostration is today and was even then cited as a cultural char-
acteristic of the medieval West” (Trexler 1987: 47). The symbolism of prayer techniques
was used to overcome the controversial ethical attitudes of the armed nobility on the
one hand and of the monkhood on the other.
Beginning in the late Middle Ages, genuflection with folded hands gained promi-
nence as a competing and more strictly regimented posture during public prayer. In
the West, the symbolically generalizing semantics of worship (adoratio) generally
applied to genuflexio, and not only to the prostrate position (Bolkestein 1929: 31). In
the context of worship, the genuflection was a well-documented phenomenon through-
out all of Europe since antiquity. In addition to the prostration, it can be considered
the basic element of the genuflecting adoratio technique. It must be noted, however,
that Greek “Gods of state,” in contrast to later depictions of Christ, were generally
not worshipped by genuflection (Bolkestein 1929: 29). The motif of kneeling that ap-
pears on ancient Greek vases is associated primarily with female figures. Kneeling
girls essentially rested on their heels without inclining their upper body forward (Walter
1910: 244).
The genuflection only gradually established itself in the context of ancient Roman
courtly ceremony. Only in the third century did the Christian Church recognize the gen-
uflection before the Roman emperor as a valid ceremony, which in turn led the imperial
state to legitimize genuflection in Christian prayer practices (Alföldi 1935: 77). From
then on, the practice of genuflection was preserved in particular by those who had at-
tacked it most vigorously during confrontations with Roman emperors: the Christian
clergy. This was achieved by means of a reinterpretation of a New Testament passage
according to which the Three Holy Kings had worshipped the Infant Jesus on their
knees as both God and King. Bishops and priests in particular came to feel included
in both state and church hierarchies by the adoratio practice. The history of medieval
rule saw many periods during which the pope genuflected before the emperor, or,
much more frequently, the emperor genuflected before the pope. Thus Pope Leo III
performed the adoratio before Charles the Great (et post laudes ab apostolico more
antiquorum principum adoratus est – Ann. regni Franc. a. 801). It remains unclear
whether the adoratio in this context must be interpreted as full prostration, genuflec-
tion, or bowing. In view of the Roman adoratio practices known to us, the use of the
prostration (prostratio venia) by the pope before the emperor is rather unlikely. Bowing
was a common greeting ritual among free citizens and was thus less appropriate to the
context of the adoratio. Genuflection thus appears as the most likely option. Since the
coronation of Charles the Great, the proskynesis of the pope before the emperor had
not been repeated (see Folz 1964: 175; Kelly 1986: 98). In the late Middle Ages, it
was the emperor, however, who was expected to kiss the pope’s feet during the corona-
tion ceremony. The year 1209 even saw the introduction of a double foot kiss whenever
the pope himself performed the coronation (Wirth 1963: 176; Hoffmann 1990: 44–46).
Prayer and acts of penitence involving repeated prostrations (metanoeae) became
almost completely obsolete in Western Catholicism over the course of the twentieth
century. By contrast, they continue to be practiced until today on Mount Athos and
23. Medieval perspectives in Europe: Oral culture and bodily practices 351

among the Old Believers of the Russian Orthodox Church (Baumgartner 1967: 225). In
prayer instructions for Russian Old Believers edited in the late nineteenth and early
twentieth century, for instance, prostrations are prescribed for the beginning and end
of worship services. Eastern Orthodox instructions for prayer correspondingly give
the following suggestions: “When you are to make prostrations do not beat your head
against the floor of the church or home. Simply bend your knees and lower your head,
but do not strike the ground with it. Move both of your hands away from your heart
and properly place them on the floor. Do not extend them like two axes […]” (Syn
cerkovnyj 1894: 25–27; Robson 1995: 49ff.).

3.2. Law
The medieval act of jurisdiction comprised both acoustic and gestural aspects of the
speech act and was interpreted as a physical act. Since antiquity, the proper connection
of physical and verbal expressions in the context of legal eloquence was determined by
the rhetorical teaching of actio/pronuntiatio. Instructions on the use of gestures in the
context of medieval law can be inferred from the Sachsenspiegel, for instance. The illus-
trations in this thirteenth-century manuscript served to depict instructive patterns of
behavior within the courtroom. Thus, for instance, one illustration depicts the Gaugraf,
who invites the Bauermeister (or head of the township, indicated by his straw hat) to
speak through a polite hand gesture (his right hand describes an offering movement
in the direction of a group of four men). Three Landsassen stand behind the Bauerme-
ister, their hands lowered with palms facing inward in a gesture of reverence and patient
expectation. A fourth Landsasse, not quite conscious of his behavior, expresses a neg-
ative position (see Die Sprache der Hände: 10). Illustrations such as these served to
familiarize plaintiffs with the nature of legal procedures.
Although late Roman rhetoric in its most widespread form generally permitted and
approved of the use of hand gestures in the pronuntiatio, it also made allowances for the
alternative to the speaking hand. In this context, it was noted that the toga of the
ancient Greeks did not feature a pad. Judges and lawyers of the time, whose arms
were kept inside the garment, had to resort to other forms of gesture. While the clothing
of the “old orators” concealed both hands, in later garments one hand remained
free for gesticulation. In Quintilian’s Institutionis Oratoriae Libri XII (see there 11, 3:
84–124ff.) restrictions on gestures concern primarily the right but not the left hand
(manus sinistra), most likely under the assumption that right-handers held the toga
with their left hand (Maier-Eichhorn 1989: 114ff.).
The tradition of medieval rhetoric preserved the “hand-out-” and “hand-in-” posi-
tions as placeholders for two contradictory meanings, each of which could claim a cer-
tain validity. Whereas the first position was viewed as the embodiment of action, the
second signified a demonstrative renunciation of action and was thus a symbol of
sober-minded self-restraint (Fleckner 1997: 124ff.).
The tendency of medieval judges to conceal their hands in their garments can be
gleaned from the works of numerous Baroque authors. In his instruction on the use
of rhetorical gestures (Chironomia 1644), for instance, John Bulwer indicated that
the hand that was concealed and kept inside the garments was an expression of “mod-
esty” and “frugal pronunciation.” This interest in early Christian ethics of modesty
found its expression in increased significance of Hellenistic iconography. The statue
of the Athenian orator Aeschines (ca. 390–314 BC), who is depicted with his hand resting inside
his dress, was considered an authoritative example. In his speech against Timarchos,
Aeschines invoked the ideal of the public orator of past times, who had addressed
the people like Solon with their hands tucked into the folds of their garments. “They
were too modest,” Aeschines stressed, “to remove their hand from their garments
during their speeches” (Aeschines, Against Timarchos: 25).
Not just the act of jurisdiction but every speech act was interpreted as a bodily act in
the context of oral culture. The application of legal sanctions rested on a complex sys-
tem of correspondences through which the degree of damage done was translated into
harm inflicted on the offender’s body. High treason was thus punishable by beheading and
theft by cutting off a hand. Even the invalidation of a speech act had a physical dimen-
sion. It occurred on a non-verbal level, by means of forced exile, legal censorship of
speech, and not least through the removal of one’s tongue.
Up until the late eighteenth century, legal sources did not distinguish between bodily
versus speech acts in their treatment of cases that led to the tearing or cutting out of the
tongue (see Keller [1921] 198; Quanter 1901: 175–178; Schuhmann 1964: 39). Thus, the
“illocutionary” power and the “perlocutionary” injury caused by “profane” statements
were suppressed directly on the level of the body. Because the word of God was considered
community-fostering, the existence of the community itself was seen as threatened
by the presence of blasphemy. The increasing standardization of word and sentence
semantics within the context of Franciscan and Dominican education helped develop
the preconditions for formalizing legal codes, including laws about speech.
The intensification of the punishment for blasphemy (gotzschwür) is highly characteristic
of the late Middle Ages and the eve of the Reformation (see Schuhmann
1964: 37). Norm violations due to abusive expressions or curse words directed against
God and the saints, or deeds or gestures directed toward holy objects, were punishable
by pillory or removal of the tongue, depending on the corresponding legal measure.
Furthermore, the cutting out of the tongue was frequently used in the case of other
“foul” expressions, such as perjury, and occasionally because of false allegations, libel,
and extortion. The tongue was either removed from the throat in its entirety or, as in
the sixteenth century in Memmingen, shortened by the width of two thumbs: “Im Jahr
1368 in vigilia Magni wart Heinrich der Schwertfurb […] gestellt in den branger und dar-
nach wart im die zung uz dem hals geschnitten und die stat ewiglich verboten […]” (“In
the year 1368 in vigilia Magni Heinrich der Schwertfurb was placed onto the pillory,
whereafter his tongue was cut from his throat, and he was forever banished from the
town.” Buff 1877: 166).
The unification of punitive measures demonstrated, for instance, by the Carolina
(1532), or Peinliche Halsgerichtsordnung, of Emperor Charles V, encompassed forms
of punishment for “criminal” acts of speech that were now increasingly treated in the
context of national law: “Abschneidung der zungen Offentlich in branger [Pranger]
oder halßeisen gestellt, die zungen abgeschnitten, und darzu biß auf kundtlich erlaubung
der oberhandt [Obrigkeit] auß dem landt verwisen werden soll.” (“Cutting of tongues:
[they are] to be publicly placed on the pillory or into the iron collar, their tongues
cut out, and until public permission by the authorities expelled from the land.” Cited
after Carolina § 198; see Quanter 1901: 176; Hensel 1979: 74).
The transformation of the legal order of Modern Europe was reflected in the increas-
ing value ascribed to conversational rhetoric (Linke 1996: 129). The process by which
social life became increasingly codified created the background conditions that trans-
formed verbal media into the only possible means of communication and conflict reso-
lution. Whereas the stability of the medieval social order largely rested on acquired
body techniques, modernity put greater emphasis on verbal skill, which increasingly
dominated society as a technique of information exchange (Fauser 1991: 123). The
art of conversation, differentiated from both law and ceremony, became a trans-societal
value over the course of the eighteenth century and was even accorded the role of the
primary medium in social communication. Christian Thomasius encapsulated this trans-
formation in a single sentence: “The basis of all societies is conversation” (Thomasius
1710: 108).
Close attention to the historical relationship between conversational practices and
punishment by removing the tongue, with a particular stress on the interrelation
between rhetorical and brute-force mechanisms of blocking speech, shows that the
social rhetoric of early modern Europe integrated methods of communication that
aimed at cancelling out speech acts into social practices. They enabled the cancelling
out of speech acts primarily on the “rhetic” level, by drawing on rhetorical patterns
such as the following: “Let’s change the subject,” “Let’s not talk about that now”
(Freidhof 1992: 22b).
Early modern conversation manuals propagated polite expressions that, for instance,

(i) contained information about the receptivity of the listener (“I don’t want to hear
anything about that”); or
(ii) limited the amount of information communicated (“Enough of it,” “Let’s drop the
subject”); or
(iii) led the conversation in a new direction (“Let’s talk about something else
instead”); or
(iv) imputed to the speaker a pragmatic, or “illocutionary,” objective not intended by
his conversational move (“You must be joking,” “You can’t be serious,” “What a
question”).

One dialogue in a bilingual conversation manual reads as follows: “Sir, you’ve put on
considerable weight since your marriage” – “Surely you are joking!” The form of
such dialogues reflects the actual everyday demand for non-violent prohibitions on
speech and corrections of speech intentions in frequently recurring conversational
situations. In line with Radtke’s observation, it must be noted that most conversation
manuals before the early seventeenth century were rather rudimentary. Only at the begin-
ning of the seventeenth century did notable foreign-language grammar books begin to
be published more frequently. Both the didactic literature used in foreign-language
teaching and the basic grammars of the seventeenth century were rather conventional in design.
These foundations were essentially adopted in the eighteenth century and the basic
approaches were retained (Radtke 1994: 48–56).

3.3. Ceremony
Ceremony served to symbolically depict and thus stabilize social hierarchies. On the
level of signifiers, the ceremonial normativization of behavior focused on bodily (proxemic)
distance and asymmetric body techniques such as the bow and the genuflection.
On the level of signifieds, ceremonial systems of proximity aimed primarily to depict
a temporal difference between generations (“older” vs. “younger”) as a spatially con-
strued difference (“far” vs. “near”). Each member of a given community was assigned
a firm position and a fixed distance vis-à-vis the other within its system of pedigree.
Secondly, ceremonial distance established itself as protective distance in the context
of male rulers. A distance of over three meters increased the difficulty of quickly using a
bared weapon such as a dagger or sword. Accordingly, as late as the sixteenth century,
Western diplomats were led toward Ottoman rulers while being held at both arms (“Die
zwen Bassa khamen unnd namen den Grauen [den Grafen Niclas Salm]. Jeglicher bey
ainem Armb, unnd füerten den zw dem Khaiser […]”, Herberstein 1517: 334). Begin-
ning in the late seventeenth century, this also became the valid way to approach rulers
in the Russian court (“Beym Handkuss wurden wir an die Arme gefasst […]”, Kämpfer
[1683] 1827).
Thirdly, ceremonial distance served to depict the asymmetry of social status relation-
ships. The sequence of steps during the approach was usually determined by a unipolar
(as opposed to a bipolar) vector, so that only one of the parties could move. As a result,
the mutual approach of individuals of higher and lower rank was evaluated in different
ways – in the case of the former as the right to a patronizing show of mercy; in the latter
as the duty to show respect. The same rule of asymmetry was valid both on the non-
verbal and on the verbal level.
Fourthly, ceremonial distance was considered a non-dialogical form of representa-
tion. Only the higher-ranking party was allowed to shorten the distance, whereas the
“gemein man” (“common man”) was forced to observe the indicated spatial distance.
Those who were accorded a lesser role in the discourse according to their rank were
neither allowed to initiate nor to conclude the conversation, nor to freely choose and
elaborate the topic. In other words, free movement in the context of the conversation
was forbidden. Ceremonial distance consequently served to block the mechanism of
interpersonal rhetoric.
The respect quotient of a ceremonial obeisance was inversely proportional to the
extent of verbal communication. Put differently, the less was said during ceremonial
acts of reverence, the more momentous the obeisance. As a rule, ceremonial shows
of obeisance with sacral semantics were only executed non-verbally. The ceremonial
silence that was mandatory in the presence of the Roman emperor was characteristic
of such cases. When the emperor approached, silence (silentium) was enforced. The
meetings of the emperor’s privy council were therefore also referred to as “Silentium.”
The office of the “Silentiarii” gradually developed into an important state organ. From
the point of view of comparative cultural studies, however, it must be noted that
ceremonial silence was much more strictly observed at the Byzantine, and later the
Muscovite, Court than in the Roman Empire.
Fifthly, ceremonial distance was implemented as a gaze-specific distance that
enabled public representations of power to be perceived visually. The possibility of po-
sitioning oneself at viewing distance to the ruler was perceived as a sign of privilege in
all early- and pre-modern societies. At the same time, in Byzantium, Rome, Vienna, and at
the Muscovite Court it was forbidden to turn one’s back to the ruler during an audience
with the emperor or tsar. Following the rules summarized in the Byzantine ceremonial
codex De Ceremoniis Aulae Byzantinae (tenth century), patricians had to back away
from the emperor toward the door in reverse (opisthopodeı̂) following proskynesis,
so that their face remained turned toward the emperor (De Ceremoniis I: 32; II: 56;
Vogt II: 160).
Alföldi’s 1935 [1970] studies on the genesis of court ceremony in the West have
shown without doubt that rituals such as observing a ceremonial distance to the ruler
and performing asymmetrical gestures of obeisance (such as genuflection) slowly
emerged within the context of late Roman imperial ceremony. It is symptomatic that
the introduction of proskynesis at the court of Diocletian (284–305 AD) occurred
against the background of a far-reaching codification of administrative and everyday
life. Aside from the submissive genuflection, such standardizing measures included
the introduction of a uniform trade tax, the subdivision of sovereign territory, the intro-
duction of price regulation, the standardization of coinage, and finally the far-reaching
bureaucratization of the administrative apparatus. “Principate” was replaced by “Dom-
inate” and henceforth the Emperor expected to be honored as Dominus rather than as
the Princeps among equals.
It was as late as the third century AD that genuflection gained entrance into the sys-
tem of government of late antiquity, and later that of the Middle Ages. Diocletian, who
ruled from 284 to 305 AD, may have been the first emperor to firmly incorporate it as
part of court ritual. Quite symptomatically, one of the most important preconditions for
this was the decline of the greeting rituals exchanged between the emperor and his se-
nators, which involved a mutual kiss and emphasized equality. Tiberius (14–37 AD)
had already replaced the daily early-morning receptions with the simultaneous
arrival of all council members, apparently because of administrative concerns. During
this transition, the “day-to-day kisses” (cotidiana oscula) were also abolished as the
epitome of salutatio, which had the effect of increasingly subjecting greetings to the
control of state authorities (Alföldi 1935: 41).
Henceforth the genuflection (genuflexio) became the epitome of imperial ceremony.
Emperor and king kneeled during the unction, vassals kneeled during commendation
and obeisance, and knights kneeled while being knighted (see Amira 1943). Aside
from the common distinction between falling on one knee (simplex) and on both
knees (duplex), the question as to whether one should kneel with erect or lowered
upper body continued to lead to disagreements. In the first version (genuflexio recta),
the upper body was aligned orthogonally to the axis of the knee. In the second variant
(genuflexio proclivis) the upper body of the kneeling subject was inclined forward toward
the ground. The resulting inclined angle was often so large that the head touched the
ground, as was the case in the Chinese kau-tau and most likely in the Russian “bowing
down” (čelobitie) (see Sittl 1890: 160). Western rules for the knave and page service,
for instance, prescribed kneeling on one knee before every lord, “whoever he may be”:
“Hold up youre hede, and knele but on oone kne // To youre sovereyne or lorde, whedir
he be” (The Babees Book V. 62ff., cited after Mahr 1911: 18). In contrast to Western
courts, at the Moscow court of the seventeenth century it was customary for the head
to touch the ground during the adoration of the Tsar. (According to reports of the
time, Muscovite diplomats during a reception on October 10, 1654 “each time bowed with
their foreheads down to the ground”, “sich idesmalhs mit der stirn gantz auff die erden gen-
eigt.” This form of reverence was accordingly titled “reverence in the Muscovite manner”,
“die reverentz auff die moscowitische arth”).
While the genuflection became a part of papal and imperial ritual in the early Middle
Ages, use of the foot kiss generally remained restricted to acts of homage before the
pope (Schreiner 1990: 118). Due to the resistance of the West European knighthood
and West European kings, the foot kiss was rarely used as a ritual of paying obeisance
to the ruler in the West Roman Empire. In the few extant references to it, this gesture
only appears as an act of submission: thus the knights of Milan kissed the feet of
Barbarossa in 1182. In the East Roman Empire, however, the foot kiss was practiced
regularly during acts of obeisance to the emperor and the patriarch. It is likely that
the Roman popes adopted the foot kiss based on Byzantine court ritual, since, begin-
ning in the seventh century, the deacons who read the gospels during the papal mass
approached the throne after reading in order to kiss the pope’s feet. In the context
of papal elections, too, both ecclesiastic and secular laymen kissed the feet of the
newly elected head of the church.
According to Schreiner, the earliest documents that prove beyond a doubt that even
high-ranking secular persons in the West kissed the Pope’s feet date to the ninth cen-
tury. However, the foot kiss was not promoted to an obligatory part of imperial crown-
ing ceremonial until the beginning of the twelfth century. For instance, during the
procession leading up to the coronation on February 12th 1111, Heinrich V kissed
the feet of Paschalis II, who was waiting before the gates of St. Peter. Papal demands
for the foot kiss ritual were often justified by references to biblical role models, but pri-
marily through Christological associations. (See Ps 3, 11: Request to kiss God’s feet;
Lk 7, 38; 45: Mary Magdalene kissing the feet of Christ; Mt 28, 9: Proskynesis of the
women before the risen Christ; Acts 10, 25: the Centurion Cornelius kissing the feet
of St. Peter, Schreiner 1989: 1063.)
Aside from the foot kiss, a number of other foot-centered acts of obeisance hovered
near the boundary of the allowed and the reasonable. These, for instance, included the
washing of feet and the so-called “strator service” performed at the stirrups. In the
twelfth and thirteenth centuries, a rivalry arose between the papal and princely powers
in the West over the symbolic territory of the corresponding rituals of obeisance. As
a consequence, since the Middle Ages foot-related obeisance gestures corresponded
far more closely than the genuflection to a “border of degradation,” the transgression
of which could cause irreversible damage to the self-perception of those involved.
Such gestures presumably presupposed a well-calculated mutual understanding between
the environment and those affected and were mandated primarily in the context of
national law and rituals of legal ratification.
In all likelihood, the manner of execution of the foot kiss, the washing of the feet,
and the strator service was never completely spontaneous. Most likely, the “script”
was always agreed upon in advance by both communicating parties (Ostrogorsky
1935: 200). But even a provisional agreement could hardly ensure the full predictability
of the ritual in cases where the subject in the lower position seized the opportunity to diverge
into a ritual of evasion. For example, King Frederick Barbarossa’s refusal to provide
Pope Hadrian IV his services as stableman (1155) was almost certainly a diplomatic
trap for which the pope was not prepared. He responded to the king’s unwillingness
to hold the stirrups by refusing him the kiss of peace. The damaged frame of communi-
cation had to be repaired by means of a corrective exchange, until the King finally as-
sented to perform the strator service in the form of a simple ceremony. The ritual of
kissing the feet, which consisted of a ceremonial approach on horseback followed by
dismounting, kneeling, and kissing the feet, therefore generally presupposed the “script-safe”
behavior of the involved parties. Apparently, decisions about details of staging were less
the responsibility of the emperor or the pope than of the masters of ceremony, who,
among other things, were to ensure an illusion of spontaneity and veracity. Such is the
impression conjured by the description of a foot kiss ritual performed in 1177 by
Emperor Frederick I Barbarossa before Pope Alexander III:

Frederick approached the Pope, removed his red robe, threw himself onto the ground before
Alexander with arms stretched apart, kissed his feet [“like the first of the apostles” – deos-
culatus eius tanquam primi apostolorum pedibus] and then his knees. Touched to tears, the
pope rose slightly, helped the emperor stand up, embraced his head with both hands, gave
him the kiss of peace [pacis osculum] and bade him sit by his side. (Althoff 1997: 229–258)

As this stylized eyewitness report shows, the aim of the foot kiss was less to indicate real
positions of secular and ecclesiastical power than to elicit assertions about the legiti-
macy of a ruler’s sovereignty from the audience. Ultimately, this approach served to
reveal the channels of communication between primary and secondary social elites.
Correspondingly, the foot kiss offered to the pope by the emperor was justified in var-
ious ways by contemporary opinion: on the one hand, it was viewed as a simple cere-
mony, on the other, as one of the acts of reverence befitting the first apostles, and
thirdly, as an act of obeisance due only to God. Explanations for the rationale behind
Barbarossa’s gesture vary from source to source. According to Bosso, the emperor per-
formed the foot kiss “according to the model of the first of the apostles,” i.e. Saint Peter.
According to Romuald, the gesture was directed exclusively at God (Deum in Alexan-
dro venerans). According to the annals of Pisa, Barbarossa offered due reverence to his
spiritual father (li fece debita reverenzia come a Padre spirituale) (see Hack 1999: 654).
Based on the thesis that in most documented cases the function of the foot kiss ritual
was ostentatious and directed toward the medieval urban public, we can assume that
the termination of this ritual (with the last papal coronation, that of Charles V in
Bologna, 1530) was sanctioned less by its participants than by the audience.
The intensive reformatory polemics against the kissing of the pope’s feet coincided
with a tendency to revalue the middle-class body and to reduce the physical strain of
obeisance rituals. In his Tractatus de Ecclesia, Jan Hus characterized the foot kiss demanded
by the popes as an expression of anti-Christian conceit. The heresies of which Hus was
accused during the Council of Constance (1414–1418) also included his notion that the
pope merited neither the title sanctissimus nor the foot kiss. In the year 1468, Emperor
Frederick III refused to kiss the feet of Pope Paul II because otherwise he “might not
have preserved the majesty of the empire.” Furthermore, Martin Luther’s 1520 treatise
“to the Christian nobility of the German nation” condemned the “Fußkussen des
Bapsts” (“kissing of the pope’s feet”) as “Endchristlich exempel” (“an anti-Christian
example”) (see Hus 1413: 98; Schreiner 1990: 123–124ff.). The tendency toward obscur-
ing ostentatious gestures, which also manifested itself in secret state policies and the
clothing habits of the Jesuits, suggested that the faith of both the Catholic Church
and secular power in iconic depictions of canonical devotional gestures had lessened
significantly toward the mid-sixteenth century.

3.4. Etiquette
In the Middle Ages, etiquette was applied in situations where the boundaries of differ-
ent social systems had been crossed. This was the case with migrants, such as couriers
and pilgrims, who travelled from land to land. The necessity of incorporating such border
crossers in a foreign system of relationships led to a series of conversion mechanisms that
enabled a partial equalization of insiders and outsiders.
In the following, I suggest a distinction among three types of equalization: equalization
in the sense of leveling, equalization in the sense of ritual inclusion, and equalization in
the sense of identical means of exchange.
The first aspect, leveling, refers to a symbolic inversion of social hierarchies of
power. Most medieval leveling rituals, such as the washing of feet, occurred primarily
in monasteries, but spread from there into secular space in the form of Christological
rituals. A specific example of leveling in governmental greeting rituals is the exemption
of subjects from genuflection, i.e. by raising the supplicants from the ground (allevatio
supplicis). The increasing tendency toward social leveling in public bodily practices was
characterized by the desemanticization of greeting and parting phrases that stress sta-
tus. This is the case in the German servus (Latin ‘the slave’) and the Italian ciao,
which derives from schiavo (likewise ‘the slave’) (see Prati 1936: 240ff.).
The second aspect of the notion of equality encompasses the semantics of inclusion.
In most traditional cultures, the latter corresponds to a symbolic restoration of primor-
dial kinship relations, such as by ritual “fraternization.” In such cases, “equality” is
established by force of belonging to the same extended family or community of Chris-
tian believers. This process is exemplified by the fraternal kiss of peace, the so-called
osculum fidei et pacis, which was characteristic of medieval peace agreements. In socie-
ties where communicative accessibility was reserved for a few houses of nobility related
through marriage, the symbols of “primordial” relationship generally had legal status.
They therefore guaranteed the preservation of social boundaries (Voss 1987: 134–
136). The treatment of the excommunicated in the Middle Ages shows that the so-called “brotherly kiss” was also considered a sign of inclusion. In addition to the
kiss of peace, excommunicated subjects were to be refused company at the table, polite
address, greeting words, as well as conversation (“In quinque maxime vitandi sunt
excommunicati, videlicet in mensa, oratione, salutatione osculo pacis et in colloquio”
Summa de Confessione des Petrus von Poitiers, around 1200, cited after Fuhrmann
1993: 118).
The third aspect of equalization implies the identity of the rules of the game and the
means of exchange, and thus presupposes the notion of communication as an exchange.
One example of this phenomenon is the English “handshake.”
Ritualized inclusion into the family was among the preferred forms of equalization
in the Middle Ages. Only in modern Europe did the unification of means of exchange
become of greater importance. Documents prove that in the Middle Ages the kiss of
peace and the fraternal kiss (osculum pacis) were exchanged between strangers (see
Strätz 1999: 1590ff.). But outside of the church, too, it was typical for “community mem-
bers to exchange the fraternal kiss after communal prayer and before the beginning of
the Eucharistic part of the mass” (Schreiner 1990: 101ff.). According to popular belief,
the ritual of the fraternal kiss can be traced to a rule that the Apostle Paul imparted
to the faithful: “Greet one another with the holy kiss [Salutate invicem in osculo
sancto]” (This is nearly a fixed expression in the letters of St. Paul: Rom. 16, 16; I. Cor. 16, 20; II. Cor. 13, 12; I. Thess. 5, 26, here: invicem fratres omnes) (see
Voss 1987: 139). It is also very likely, however, that the legal-protocolary symbolic
power with which this kiss was vested, not only in the normativized ecclesiastic space
23. Medieval perspectives in Europe: Oral culture and bodily practices 359

of communication in the Middle Ages, resulted from the nature of pre-modern socie-
ties. In the latter, social boundaries were determined not by formal organizations, but
primarily by sensorily perceptible interactions in the context of exchanging gifts. One
can even assume with great likelihood that medieval princely law served as an essential
precondition for the spread of this ecclesiastic gesture. In the context of the fraternal
cheek kiss, we can refer to the law of the “family of kings,” according to which the mem-
bers of the ruling family were required to address each other using the brotherly “you”
(“Du”) and to kiss. Dölger indicates “that the kiss between kings was a well-known
institution in the Middle Ages” (Dölger 1976: 34–69). In the meeting of kings in
Ruodlieb, which largely reflects eleventh century ceremonial, according to Voss, the
use of the kiss as a greeting between rulers is also entirely taken for granted (Voss
1987: 142). A typical example of later exchanges was the meeting between Heinrich
VI and Philipp II in Milan (1191), where the German emperor received the French
ruler in osculo pacis (Voss 1987: 139). This form of greeting by kiss is also documented
in the context of East- and West-Franconian encounters, such as in the meetings
between Otto II and Lothar (in 980), who kissed simultaneously as family members
and as rulers (osculum sibi dederunt).
The reception by kiss and embrace became a firm component of knightly and diplo-
matic encounters in the late thirteenth century. In contemporary literature, the great
significance of “brotherliness” is expressed in greeting scenes of a highly emotional
nature. We must distinguish, however, between the greeting kiss exchanged while stand-
ing and the expressions of obeisance that took place between sovereign and vassal dur-
ing feudal transactions. Mentions of physical contact between vassal and sovereign
disappear from the records around the end of the sixteenth century. According to
Old French law, the vassal kneeled and placed his hands into the hands of the sovereign
(immixio manuum), upon which the latter raised the vassal from his knees and kissed
him on the mouth (see Chénon 1924: 137). On the verbal level, ritualized formulas
affirmed the semantics of mutual trust during feudal transactions (see “Vers 1260, Le
Livre de jostice et de plet décrit le rite de l’hommage en ces termes: Cil qui requiert
doit joindre les mains et dire: Sire, ge deviens vostre home, etc […] Et li sires doit re-
spondre: Et ge vos recef à home, que ge foi vous porterai, comme à mon home, et vos en
bese en nom de foi” cited after Chénon 1924: 138). Such encounters involved not only
the cheek kiss, but kisses on eyes and lips as well. “Plus de c[ent] fois li baise // et la
bouche et le nez” (cited after Schultz 1889: 521). See also the same document in Heck-
endorn (1970: 3). For instance, after Hartmann von Aue’s Gawein and Iwein recognized one another, they kissed each other on eyes, cheeks, and mouth a thousand times. See: “Si
underkusten tausentstunt // Ougen wangen unde munt” (Hartmann von Aue, Iwein,
V. 7503/4). On this, see also Heckendorn (1970: 3ff). See also the following commentary
by Schreiner: “The writers of Middle High German lyric and epic poetry profited
from the fact that ‘munt’ and ‘stunt’ rhymed. Apparently there was no shortage of
lovers who kissed the eyes, cheeks, and mouth of their beloved ‘tusent-stunt’, ‘dusent
stund’ or ‘tausend stunt, êtlicher stunt, an der stunt, bı̂ der stunt’ and ‘zô der stunt’ ”
(Schreiner 1990: 104; see also Jones 1966: 195–210). In Wolfram von Eschenbach’s
Parzival, Gahmuret kisses his knights in greeting: “Dô kuste er die getriuwen […]”
(Wolfram von Eschenbach Parzival II: 99; V: 7).
Aside from male greeting rituals, medieval literature contains numerous scenes in
which the maid-of-honor responds to the greeting of the knight with a kiss on the
cheek. In the Middle Ages, this gesture was presumably not entirely free from the pos-
sible erotic connotations of courtly love (see the scene depicting the greeting of the
cousin of Gahmuret, during which the queen lovingly kisses the knight: “von der küneginne rîch // si kuste den degen minneclîch”) (Wolfram von Eschenbach Parzival I: 47;
V: 28; 29; 30; 48; V: 1; 2). It must be added, however, that non-erotic symbols, such as
collective feasts and the giving of alms, could also occur in a ritualized relationship to
the female kiss. According to the notions of pre-modern economics, the female greeting
kiss was thus associated less with eroticism and more with ritualized acceptance into an
extended family.
In this function, the lady’s kiss was incorporated into the sign system of medieval ars
donandi: it symbolized the wealth and military might of the feudal lord and implicitly
referred to his reproductive ability. Most often, the lord invited his male guests to be
greeted with a kiss by his wife. Accordingly, medieval poetry tended to raise the female
kiss to the equivalent of payment for knightly virtues. In the context of the dominant
symbolic order of medieval society, the knight’s ascent along the social ladder was usu-
ally depicted symbolically by shortening the ceremonial distance vis-à-vis the sovereign.
In Wolfram von Eschenbach’s Parzival, Gahmuret, victorious against Razalic, requests
that the latter approach and kiss his wife: “gêt näher, mı̂n hêr Razalic: // ir sult küssen
mı̂n wı̂p” (Wolfram von Eschenbach Parzival I: 46; V: 1, 2; Heckendorn 1970: 3ff). In
the work of an anonymous Rhenish poet (around 1180), the queen steps out of the
women’s chambers (the bower) in order to approach the knights of King Rother and
to kiss them (“die vrouwe alsô lossam // kuste den hêren // do schiet her danne mit
êren // ûz van der kemenâte […] die konigı̂n ginc umbe // unde kuste besunder // alle
Rôtheres man” König Rother around 1180). And when Parzival meets his teacher’s
daughter, he is likewise permitted to approach in a rather symbolic manner, whereby
“he kissed her on the mouth” (“kuste er sie an den munt”) (Wolfram von Eschenbach
Parzival III: 176; V: 9; Heckendorn 1970: 3ff).
In the modern era, the integration of feudal economic units into the relationship
between absolutist courts and the dissolution of knightly familial ties led to a reevalua-
tion of fraternal greeting rights during court rituals. A tendency toward the semantic
relativization of the fraternal kiss in the ecclesiastic context developed at the same
time as this gesture was gradually displaced from the secular stage. The significance
of the kissing ritual, in effect considered inscrutable, was increasingly undermined
by questions about its meaning, i.e. by questions about the extent to which a holy
kiss (osculum sanctum) could be distinguished from a sinful kiss between lovers
(Hieronymus Epistolae 22: 16; see also Fuhrmann 1993: 117ff.; Sittl 1890: 79). The
kiss of peace was increasingly abandoned in the interest of preserving one’s image.
The early twelfth century saw the propagation of a rule according to which the kiss
of peace could be exchanged only within a given house of nobility. Such restrictions
on public exchanges of kisses on the mouth may also have been indirectly related to
the appearance of the so-called “kissing plaque” (osculatorium or instrumentum
pacis). It was now no longer necessary to exchange kisses; “symbolic mediacy was
replaced by a consecrated device” (Schreiner 1990: 101). This was a small plate made
of metal or marble that featured the image of Christ, which was to be kissed. The
osculatorium began to fulfill a symbolic function in church ritual by representing the
lips of the community members that were to be kissed.
Aside from this process of displacement, devotional literature of the early modern
period reveals an increasing tendency to differentiate among various meanings of the
kiss. For instance, Johannes von Paltz (died 1511) distinguishes between three forms
of the osculum corporale: a kiss that can be recommended (osculum commendabile),
a kiss that can be excused (osculum excusabile), and a kiss that must be treated with
disdain (osculum detestabile). The greeting kiss (osculum receptionis) belongs to the
recommended gestures, insofar as it appears justified by Old Testament examples.
The osculum cognationis is a forgivable kiss, which takes place between mother and
son. Contemptible kisses include those which express affectation (simulatio), treachery
(dolositas), and sensual desire (libido) (Johannes von Paltz around 1500: 155–157; cited
after Schreiner 1990: 110).
In the early modern period, the handshake replaced the kiss of peace as a sign of egalitarian treatment. The increasing popularity of the handshake for initiating communication must be viewed in the context of the general expansion of early modern communicative intercourse, where encounters with strangers and the process of initiating communication were increasingly normativized and formalized. This process of
normativization was apparently set off by the fact that individual persons – merchants
and ambassadors, migratory knights and religious missionaries – were increasingly lifted
from the context of primordial and ceremonial power hierarchies. Prior to a tourna-
ment, knights who did not know each other greeted one another by lifting their visor
and showing an unprotected right hand regardless of status and rank. The primary
focus of the event was on comparing their performance. Quite symptomatically, one
symbol of honor consisted in exchanging positions as well as weapons with the oppo-
nent. In a similar way, the use of the handshake as a greeting among merchants served
to affirm the equality of communicative strategies outside their range of application
within power hierarchies (Mahr 1911: 41).
While extending the right hand in greeting was primarily the privilege of feudal lords
and their progeny in the early Middle Ages, sixteenth century diplomatic protocols indi-
cate that this gesture, at least in England, had already become a component of diplo-
matic greeting rituals. Already in the work of Saxo Grammaticus we read,
according to Kolb, about the encounter between Frederick I and Waldemar in Lübeck
(1181), which included an embrace, a kiss, and a handshake [“Siquidem inprimis eum
amplexu atque osculo decentissime veneratus, mox dextra honorabiliter apprehensa”]
(Kolb 1988: 99). Apparently, shaking the right hand was part and parcel of greeting ri-
tuals in encounters between rulers both in the early and late Middle Ages. In this
regard, Voss (1987) takes issue with Wielers (1959), who argues that the use of the
term dextras dare in the context of meetings between rulers could only be traced to
the eighth century (Voss 1987: 139ff.).
The frequency with which diplomatic correspondence since the late sixteenth cen-
tury mentions the hand greeting in the context of diplomatic meetings suggests that
emblematic gestures at the time became abstracted from their concrete semantics of
approach, affirmation, and peaceful intentions, and were increasingly formalized. As
a result of this formalization of diplomatic practice, offering one’s hand became a sim-
ple signal for initiating communication. In this form, it continues to be used regularly at
the beginning of conversations, regardless of the ranks of the interlocutors, and even in
cases where the conversation concerns a disagreement or conflict of interests.
4. References
Aeschines 1908. Aeschinis Orationes. Leipzig: Teubner. First published [4th century BC].
Alföldi, Andreas 1970. Die Monarchische Repräsentation im Römischen Kaiserreiche. Darmstadt,
Germany: Wissenschaftliche Buchgesellschaft. First published [1935].
Althoff, Gerd 1997. Spielregeln der Politik im Mittelalter. Kommunikation in Frieden und Fehde.
Darmstadt: Wissenschaftliche Buchgesellschaft.
Amira, Karl von 1905. Die Handgebärden in den Bilderhandschriften des Sachsenspiegels. Munich:
Akademie der Wissenschaft.
Bolkestein, Hendrik 1979. Theophrastos’ Charakter der Deisidaimonia als Religionsgeschichtliche
Urkunde. Religionsgeschichtliche Versuche und Vorarbeiten 21(2): 1–21. Giessen: Töpelmann.
First published [1929].
Buff, Adolf 1877. Verbrechen und Verbrecher zu Augsburg in der zweiten Hälfte des 14. Jahrhun-
derts. Zeitschrift des Historischen Vereins für Schwaben und Neuburg 4(2): 1–108.
Chénon, Emile 1924. Le rôle juridique de l’osculum dans l’ancien droit français. Mémoires de la
Société des Antiquitaires de France Serie 8, Volume 6, 124–155.
Dölger, Franz 1976. Die “Familie der Könige” im Mittelalter. In: Franz Dölger, Byzanz und die
Europäische Staatenwelt, 34–69. Darmstadt: Wissenschaftliche Buchgesellschaft.
Fauser, Markus 1991. Rhetorik und Umgang. Topik des Gesprächs im 18. Jahrhundert. In: Alain
Montandon (ed.), Über die Deutsche Höflichkeit: Entwicklung der Kommunikationsvorstellun-
gen in den Schriften über Umgangsformen in den Deutschsprachigen Ländern, 117–143. Char-
lottesville: University of Virginia Press.
Fleckner, Uwe 1997. Napoleons Hand in der Weste: Von der ethischen zur politischen Rhetorik
einer Geste. Daidalos 64: 122–129.
Folz, Robert 1964. Le Couronnement Impérial de Charlemagne: 25 Décembre 800. Paris: Gallimard.
Freidhof, Gerd 1992. Typen dialogischer Kohärenz und Illokutions-Blockade. Zeitschrift für Sla-
wistik 37(2): 215–230.
Fuhrmann, Horst 1993. Willkommen und Abschied. Über die Begrüßungs- und Abschiedsrituale
im Mittelalter. In: Wilfried Hartmann (ed.), Mittelalter. Annäherungen an eine Fremde Zeit,
111–139. Regensburg: Schriftenreihe der Universität Regensburg.
Hack, Achim Thomas 1999. Das Empfangszeremoniell bei Mittelalterlichen Papst-Kaiser-Treffen.
Vienna: Böhlau.
Heckendorn, Heinrich 1970. Wandel des Anstandes im Französischen und Deutschen Sprachgebiet.
Basel: Lang.
Hensel, Gerd 1979. Geschichte des Grauens. Deutscher Strafvollzug in 7 Jahrhunderten. Altendorf:
Lector-Verlag.
Herberstein, Sigmund 1855. Selbstbiographie Sigmunds Freiherrn von Herberstein. In: Theodor
von Karajan (ed.), Fontes Rerum Austriacarum, Abt. 1, Volume 1, 67–396. Vienna: Böhlau.
First published [1517].
Hoffmann, Thomas 1990. Knien vor dem Thron und Altar. Friedrich Nicolai und die Orthopädie
des aufrechten Ganges vor den Kirchenfürsten. In: Bernd J. Warneken (ed.), Der Aufrechte
Gang. Zur Symbolik einer Körperhaltung, 42–49. Tübingen: Tübinger Vereinigung für
Volkskunde.
Hus, Jan 1413. Tractatus de ecclesia. Edited by Samuel Harrison Thomson. Cambridge: Heffer, 1956.
Jones, George Fenwick 1966. The kiss in Middle High German literature. Studia Neophilologica
38: 195–210.
Kämpfer, Engelbert 1827. Diarium Itineris ad Aulam Muscoviticam indeque Astracanum suscepti
Anno 1683. Edited by Friedrich Adelung. [First published in 1683]. Saint Petersburg.
Keller, Albrecht 1921. Der Strafrichter in der Deutschen Geschichte. Bonn/Leipzig. [Reprinted: Hildesheim].
Kelly, John Norman Davidson 1986. The Oxford Dictionary of Popes. Oxford: Oxford University Press.
Kolb, Werner 1988. Herrscherbegegnungen im Mittelalter. Bern: Lang.
Linke, Angelika 1996. Sprachkultur und Bürgertum. Zur Mentalitätsgeschichte des 19. Jahrhun-
derts. Stuttgart: Metzler.
Loenhoff, Jens 2001. Die Kommunikative Funktion der Sinne: Theoretische Studien zum Verhältnis von
Kommunikation, Wahrnehmung und Bewegung. Konstanz: UVK Verlagsgesellschaft.
Mahr, August 1911. Formen und Formeln der Begrüßung in England von der Normannischen Eroberung bis zur Mitte des 15. Jahrhunderts. Heidelberg: Lang.
Maier-Eichhorn, Ursula 1989. Die Gestikulation in Quintilians Rhetorik. Bern: Lang.
McLuhan, Marshall 1960. Five sovereign fingers taxed the breath. In: Edmund Carpenter and
Marshall McLuhan (eds.), Explorations in Communication: An Anthology, 207–208. Boston:
Beacon Press.
Ong, Walter J. 1982. Orality and Literacy. The Technologizing of the Word. London: Methuen.
Ostrogorsky, Georgije 1935. Zum Stratordienst des Herrschers in der byzantinisch-slawischen
Welt. Seminarium Kondakovium: Recueil d’études (Archéologie, histoire de l’art, études by-
zantines) 7: 187–204.
Paltz, Johannes von 1983. Coelifodina. In: Christoph Burger (ed.), Werke 1 (Spätmittelalter und Reformation. Texte und Untersuchungen 2), 155–157. Berlin: Walter de Gruyter. First published [1500].
Petrus Cantor [1197] 1987. De oratione et speciebus illibus. In: Richard Trexler (ed.) The
Christian at Prayer. An Illustrated Prayer Manual Attributed to Peter the Chanter, 165–235.
Binghamton, NY: Medieval and Renaissance Texts and Studies.
Prati, A. 1936. Nomi e soprannomi di genti indicanti qualità e mestieri. Archivum Romanicum 20: 200–256.
Quanter, Rudolf 1901. Die Schand- und Ehrenstrafen in der Deutschen Rechtspflege. Eine Krimi-
nalistische Studie. Dresden: Aalen.
Radtke, Edgard 1994. Gesprochenes Französisch und Sprachgeschichte. Tübingen: Niemeyer.
Reiske, Johann Jacob (ed.) 1751–1766 Constantini Porphyrogeniti Imperatoris De Ceremoniis Aulae
Byzantinae Libri Duo. Graece et latine e Recensione Io. Leipzig. [Reprinted 1829: Bonn, Weber].
Robson, Roy R. 1995. Old Believers in Modern Russia. DeKalb: Northern Illinois University Press.
Romans, Humbert von 1888–1889. Expositio super constitutiones fratrum praedicatorum. In:
Joachim Joseph Berthier (ed.), Opera Omnia, Volume 2, 160–167. Stuttgart: Klett Cotta. [cited
by Schmitt 1992: 286ff.]
Schmitt, Jean-Claude 1992. Die Logik der Gesten im Europäischen Mittelalter. Translated by Rolf
Schubert and Bodo Schulze. Stuttgart: Klett Cotta.
Schneider Brakel, Franz and Jürgen W. Braun 2005. Die Sprache der Hände. Mainz: Schmidt.
Schreiner, Klaus 1990. Er küsse mich mit dem Kuss seines Mundes (Osculetur me osculo oris sui,
Cant.1, 2). Metaphorik, kommunikative und herrschaftliche Funktion einer symbolischen Han-
dlung. In: Hedda Ragotzky and Horst Wenzel (eds.), Höfische Repräsentation. Das Zeremoniell
und die Zeichen, 89–132. Tübingen: Niemeyer.
Schuhmann, Helmut 1964. Der Strafrichter. Seine Gestalt – seine Funktion. Kempten: Verlag für
Heimatpflege.
Schultz, Alwin 1889. Das Höfische Leben zur Zeit der Minnesinger. Leipzig: Zeller.
Schürmann, Thomas 1994. Tisch- und Grußsitten im Zivilisationsprozess. Münster: Waxmann.
Sittl, Karl 1890. Die Gebärden der Griechen und Römer. Leipzig, Germany: Teubner.
Strätz, Hans Wolfgang 1999. Kuss. In: Robert-Henri Bautier (ed.), Lexikon des Mittelalters, Vol-
ume 5, 1590–1592. Stuttgart: Metzler.
Syn cerkovnyj [Сын церковный] 1894. Moscow: Tipografija edinovercev.
The Babee’s Book [end of the 1500s] (including: Urbanitas, The Young Children’s Book, The Lytylle Childrenes Lytil Book, Hugh Rhoder’s Boke of Nurture, John Russell’s Book of Nurture, The Boke of Kervynge, The Boke of Curtasye, Symone’s Lesson of Wysedome for all Maner Chyldren). London. [Reprinted 1868].
Thomasius, Christian 1710. Kurtzer Entwurff der Politischen Klugheit, Sich Selbst und Andern in allen
Menschlichen Gesellschafften Wohl zu Rathen und zu einer Gescheiden Conduite zu Gelangen.
Frankfurt/Leipzig: Athenäum. [Reprinted 1971: Frankfurt am Main].
Treitinger, Otto 1969. Die Oströmische Kaiser- und Reichsidee nach ihrer Gestaltung im Höfischen
Zeremoniell. Vom Oströmischen Staats- und Reichsgedanken. Bad Homburg: Gentner.
Trexler, Richard 1987. The Christian at Prayer. An Illustrated Prayer Manual Attributed to Peter
the Chanter (d. 1197). Binghamton, NY: Medieval and Renaissance Texts and Studies.
Viller, Marcel, Ferdinand Cavallera, Joseph de Guibert, André Rayez and Charles Baumgartner
(eds.) 1967. Dictionnaire de Spiritualité Ascétique et Mystique. Doctrine et Histoire. Paris: Gabriel
Beauclesne.
Voss, Ingrid 1987. Herrschertreffen im Frühen und Hohen Mittelalter. (Beihefte zum Archiv für
Kulturgeschichte 26.) Cologne: Böhlau.
Vogt, Albert (ed.) 1939–1967. De Ceremoniis, T. 1, Livre 1, chapitres 1–46(37). Paris: Les Belles Lettres.
Vogt, Albert (ed.) 1939–1967. De Ceremoniis, T. 1, Livre 1, chapitres 47(38)–92(83). Paris: Les Belles Lettres.
Walter, Otto 1910. Kniende Adoranten auf attischen Reliefs. Jahreshefte des Österreichischen
Archäologischen Instituts in Wien 13: 229–244. Vienna: Rohrer.
Wielers, Margret 1959. Zwischenstaatliche Beziehungen im Frühen Mittelalter. Ph.D. dissertation.
Munich.
Wirth, Karl August 1963. Imperator Pedes Papae deosculator. Ein Beitrag zur Bildkunst des 16.
Jahrhunderts. In: Hans Martin Freiherr von Erffa and Elisabeth Herget (eds.), Festschrift für
Harald Keller zum 60. Geburtstag. Darmstadt: Roether.
Wolfram von Eschenbach 1952. Parzival. Edited by Karl Lachmann. Berlin: Walter de Gruyter.
[First published in the early 1200s].

Dmitri Zakharine, Konstanz (Germany)

24. Renaissance philosophy: Gesture as universal language
1. Introductory
2. Gesture in Late Renaissance rhetoric and poetics
3. Giovanni Bonifacio
4. Francis Bacon
5. John Bulwer
6. Georg Philipp Harsdörffer
7. Conclusions
8. References

Abstract
The idea of gesture as a universal language, an inheritance from classical rhetoric, took
on new importance around the mid-16th century. Among the relevant factors were a
heightened emphasis on nonverbal media in the propaganda wars of the Reformation
and Counterreformation; associated reforms in the teaching of rhetoric that gave stron-
ger theoretical grounding to the affective use of voice and gesture; the rediscovery of
Aristotle’s Poetics (Halliwell 1986) and its catalyzing effect on contemporary rhetoric,
emphasizing affect and extending the reach of rhetorical theory to the nonverbal arts;
the addition of a dynamic physiognomics, drawing on basic principles of psychology,
physiology and medicine, to the traditional physiognomics based on static anatomical
features; growing need of Europeans to communicate with peoples of unknown tongues;
and the role of gesture in the development of “civil conversation” from a canon of courtly
manners to a code of bourgeois civility and polite conversation based in ethics. By the
mid-17th century, gesture was understood as a universal language in a number of interre-
lated ways. Authors particularly influential in these developments were Giovanni Pierio
Valeriano Bolzani (1477–1558), Giambattista della Porta (1535–1615), Gian Paolo Lo-
mazzo (1538–1592), Giovanni Bonifacio (1547–1635), Francis Bacon (1561–1626), Francis
Junius (1589–1677), John Bulwer (1606–1656), and Georg Philipp Harsdörffer (1607–1658).

1. Introductory
In the Renaissance, as in the Middle Ages, gesture and its depiction continued to serve
as a sort of universal language in religious painting, stained glass, sculpture, sacred
drama, and ritual (Barasch 1987; Baxandall 1972: 61, 1982: 234 note 34; Davidson
2001; Schmitt 1991), all the more effective amid widespread illiteracy. In logic and grammar, sign
theory had been concerned mainly with arbitrary signs, but a theory of natural signs,
including signs of passions, had long been central to medicine (Wollock 1990: 17–22;
1997: 97–152; 2002: 243–249; 2012: 843). While gesture received pragmatic and usually
cursory treatment, if any, in late medieval rhetorical handbooks (Knox 1990: 102–109,
115–16; 1996: 379–387; Müller 1998: 42–45; Rehm 2002: 34), medieval logic and theol-
ogy had a rich tradition of sign theory, including natural, nonverbal signs, based on
Augustine’s De Doctrina Christiana Book II.1–8 (Burrow 2002: 1–4; Jackson 1969; Manetti
1993: 157–168, see 142–143) – still important to Bulwer in 1644 (Wollock 2011: 42–43).
Among classical works known in the Middle Ages, the ps.-Ciceronian Rhetorica ad
Herennium and mutilated texts of Cicero’s De Oratore and Quintilian’s Institutio Ora-
toria presented gesture as a universal language mirroring the movements of the soul
(Enders 1992, 2001). A complete Quintilian was discovered in 1416; Cicero’s De Oratore
and Orator, in 1421 (Rehm 2002: 33–36; G. Rossi 2003).

2. Gesture in Late Renaissance rhetoric and poetics


Gesture attracted new interest in the 15th and 16th centuries (Müller 1998: 45–47). This
has been attributed to the religious dissension of the time (Knox 1990: 113–114), a gen-
eral trend toward specialization (Knox 1990: 114–115), and growing emphasis on
method (Knox 1990: 115–121; see Knox 1996: 384–388) – all closely related, since
method fostered specialization, and religious conflict was a major stimulus to both. In
the wake of Reformation efforts to reorganize learning, Ramists for the Protestant
side, Jesuits for Rome, searched the specialized topic of gesture for general principles
on which to ground new teaching methods. But methodological reform does not fully
account for the communicative program of the Counterreformation: there were other
cultural factors – the rediscovery of Aristotle’s Poetics, extension of rhetorical and poet-
ical principles to the visual arts, social demands for “civil conversation”, and growing
contact with peoples of unknown tongues.

2.1. Aristotle’s Poetics as catalyst in Renaissance rhetoric


It was in the mid-16th c. that the Poetics of Aristotle first “entered the blood stream of
criticism” (Lauter 1964: 39); indeed “the sixteenth-century Italians can be said to have
invented Aristotle’s Poetics” (Hathaway 1962: 6). Interpretation of this treatise fol-
lowed prevalent conceptions of classical rhetoric and the poetics of Horace (Vickers
1991), making new connections and giving new emphasis to old ones. Thus the Poetics
became the catalyst for a distinct group of philosophical problems touching on gesture.
Questions about opsis (i.e. “spectacle”, visual aspects of theatrical performance, includ-
ing actorial representation), hupokrisis (representation, acting), and psychagogia (liter-
ally “soul-leading”, or affective aspects of rhetoric) (Halliwell 1986: 337–343; Janko
1987: 153–154), were integrated with already-established debates on gesture as part
of hupokrisis (Latin actio) in rhetoric and ekphrasis (vivid visual description) in both
poetry and rhetoric.
Voice and gesture are voluntary, bodily actions expressing the emotions and inner
workings of the mind. In oratory, they comprise the art of delivery (performance), gen-
erally but not always used together. As an art, writes Aristotle (Rhetoric III.1), delivery
had been neglected in rhetoric; it developed in poetics as actors and rhapsodes replaced
the poets themselves as interpreters (Rhetoric 1403b33). Aristotle’s view of gesture is
decidedly critical. To the finest taste, delivery itself is vulgar (phortikos, Rhetoric III.1,
1404a1). At Poetics 61b27–29–62a12, he reports the opinion of some critics that
the use of gesture makes tragedy more vulgar than epic (Poetics 62a3–6; Davis 1992:
156; Janko 1987: 153). Although spectacle is a constituent part of tragedy and comedy,
it is the least artistic and the least connected with the poetic art: for the true ends
of tragedy can be achieved by simply reading the poetry aloud (Poetics 50b16–20,
62a11–13).
Renaissance commentators, e.g. Robortello (1548) and Castelvetro (1570), steeped in the rhetorical tradition, disputed this view (Puttfarken 2005: 64, 65). Had not Cicero
and Quintilian praised the dignity and necessity of gesture (Dutsch 2002; Fögen 2009;
Hall 2004)? “Every motion of the mind, said Cicero in a famous passage (De Oratore
III), has its own face and voice and gesture”, and at a minimum, these expressions must
not be at odds with what is expressed. Did not Cicero hold the comedian Roscius in the
greatest admiration for his superlative mimicry (Duncan 2006: 174–175)? Had not Aris-
totle himself recommended that the poet write with the requirements of staging in
mind; that he plan a play around the key gestures (Poetics, 55a29–32; see Halliwell
1986: 89; Scott 1999)?

2.2. The poetics of painting and other arts


The pictorial power of words, already a prominent theme in classical poetics, was sup-
ported by the doctrine of ekphrasis (Graf 1995), Horace’s ut pictura poesis, emphasizing
the effect of enargeia ‘vividness’ on the phantasia and the passions. While no theory of
painting had come down from antiquity, Leone Battista Alberti (1404–1472) was one of
the first to see that ekphrasis could also apply the other way, making the verbal arts a
model for painting. In Della Pittura, 1435–36 (book III on the education of the artist) he
urged painters to ground their theory in the principles of rhetoric (Barasch 1997: 302).
Aristotle, in a complex discussion of mimesis in the Poetics (Halliwell 1986: 109–137,
esp. 123–125, 132–135), compares poetry, acting and painting (Poetics 47a18, 48a5,
48b10–19, 50a26–29, 50b1, 54b8–15, 60b8). In the Renaissance, the growing fascination
with painting and sculpture led philosophers and critics to see them as liberal rather
than mechanical arts. This tendency, which Rehm (2002: 35–36) already detects in the
Renaissance reception of Quintilian, raised the intellectual status of visual modes of
communication, including gesture, which was central to the “language” of the visual
arts, and greatly “poeticized” the idea of how to make a painting. All this had a tremen-
dous influence on the painting and art criticism of the later Renaissance and Baroque
(Badt 1959; LeCoat 1975; Popp 2007: 63–77; Preimesberger 1987: 104–106, 110–115;
Puttfarken 2005: 57–69; Schröder 1992: 95–110; Varriano 2006: 101–113). It also
meant that not only the scholar, but also the gentleman, ought to appreciate painting.
These ideas merged with prescriptions for “decent” gesture in the theory of “civil con-
versation” (i.e. civil comportment, social ethics), first found in its Renaissance form in
Erasmus, De civilitate (1530; Bremmer 1991: 28 at n.41; see Hübler 2001: 171–202).
The overall effect was to centralize gesture as a meeting point of the arts of language,
the visual arts, music (Popp 2007: 72–77) and ballet (Goodden 1986: 112–119; Gualandri
2001a,b; Schroedter 2004: 171–175, 260 n.43), and in this sense as well, gesture became a
universal “language”, the imitative art par excellence, capable of a speech-like articu-
lation and temporality, the rhythmic disposition of music, and the iconicity and emo-
tional impact of the visual arts, all with the immediacy of embodied action. Thus, the
Renaissance saw gesture not only as an intensifier expressing the universally intelligible
passions behind words, but as a universal language in itself. This wider view, extending
the rhetoric of affect to nonverbal media, would characterize all the arts of the Baroque.

2.3. Reformation and Counterreformation: Ramists and Jesuits


The Counterreformation contributed greatly to the idea of gesture and visual art as a
universal language (Barasch 1997; see Lühr 2002: 26–28). While the Tridentine stan-
dardization of the Mass, including its gestures (Knox 1996: 387), was important, there
was also need for a broad theoretical defense of images, because the iconoclastic ten-
dencies of the Protestant Reformation found considerable support in patristic litera-
ture. The first treatise after the council of Trent (1545–1563) to formulate official
views on the function of art was Cardinal Gabriele Paleotti’s Discorso Intorno alle Ima-
gini Sacre e Profane (Bologna 1582). Barasch (1997: 12–15) finds these doctrines devel-
oped in the Trattato della Pittura e Scultura: Uso et Abuso Loro (Florence 1652) by the
Jesuit Giovanni Domenico Ottonelli and the painter Pietro da Cortona. “Images, we read at the
beginning of the Trattato, are a language common to all men. Whoever cannot show and
express his concetti della mente in words, can still do it in images (…) a lingua commune,
intelligible to everyone. There is a ‘mute eloquence’ (muta eloquenza) [Tasso] that
lends particular power to everything that is ‘said’ in images” (Barasch 1997: 15).

2.3.1. Jesuits
The Jesuits were the advance troops of the Counterreformation. As Jesuit rhetorician
Nicolas Caussin (1583–1651) explained in De Eloquentia Sacra et Humana (1626), soph-
istry is all splendid surface; but genuine oratory is an expression of deep and genuine
feeling cultivated by the speaker drawing on the natural sources of the passions and
conveying them to his audience. Thus in delivery, the expressive qualities of voice
and gesture must be genuine and perfectly suited to the content (Campbell 1993:
61–62). The Jesuits underscored this old teaching of Cicero and Quintilian by putting
heavy emphasis on the art of delivery (Rehm 2002: 36–37). In Book IX, Caussin treats
voice and articulation (including their defects), as well as gesture, in great detail. He
ends the treatment of gesture by referring readers to the Vacationes Autumnales
(1620) of the Jesuit Louis de Cressolles (1568–1634), a work entirely devoted to the
use of voice and gesture in oratory (Campbell 1993: 64–65; Fumaroli 1981). These
are only the two best-known of a whole library of Jesuit rhetorical texts that emphasize
delivery (Conley 1990: 152–157; Fumaroli 2002: 279–391; Knox 1990: 111–114).
Quintilian had sharply distinguished rhetorical gesture from actorial mimicry
(Müller 1998: 27–28, 31–42); but the Renaissance was moving toward a wider theory
of nonverbal expression (see Fumaroli 2002: 305, 317, 357–360; Müller 1998: 45–47;
Percival 1999). Although there was practically no theoretical literature on acting as
such in the 16th or 17th century, Jesuit rhetorical delivery greatly influenced contempo-
rary acting (Golding 1986; Goodden 1986; Gros de Gasquet 2007; Niefanger 2011;
Zanlonghi 2002).

2.3.2. Ramists
Pierre de la Ramée (Petrus Ramus, 1515–1572), professor of rhetoric and philosophy at
Paris, an opponent of scholastic and Aristotelian dialectics, developed a method to reor-
ganize all fields of learning, principally through a reform of logic and rhetoric (Conley
1990: 127–133, 140–143). The Ramists were influential in Protestant countries and, for a
relatively short time (1575–1625), in their universities. Ramus and his associate Omer
Talon (Audomarus Talaeus, 1510–1562) reduced rhetoric to elocution (style, i.e. tropes
and figures of speech) and action (voice and gesture), discarding invention and dispo-
sition – on the grounds that these properly belonged to logic – as well as memory (incor-
porated under disposition). While elocution got overwhelming attention, a few, like
Schonerus (Lazarus Schöner, 1543–1607) were interested in action. Johannes Althusius
(1563–1638) went further: reasoning that elocution applied as much to written as to oral
composition, he removed action from rhetoric to ethics, treating it under “civil conversation” (Althusius 1601: 103–119). Others, like Alsted and Keckermann, were Aristotelians who used Ramist techniques of organization. (On Ramist influence in the study
of gesture, see Knox 1990: 116–125.) Despite their great differences, both Jesuit and
Ramist methods had the effect of giving greater emphasis to the expressive-affective
aspects of rhetoric, including pronunciation and gesture (Koch 2008: 321, n.31). Of
course, Quintilian and Cicero were the main sources for both.

3. Giovanni Bonifacio
Giovanni Bonifacio (1547–1635), from a noble family at Rovigo, studied law and rhet-
oric at the University of Padua, the center of Poetics scholarship (Benzoni 1967, 1970).
One of his teachers was the important Aristotelian theorist Antonio Riccoboni (1541–
1599), with whom he kept in touch in later life (Frischer 1996: 81 n.38; Mazzoni 1998:
214, n.32). As a lawyer, Bonifacio was professionally concerned with the art of rhetoric;
but he was also involved with the theatre as playwright, critic, and actor, even writing a
discourse on tragedy (Padua, 1624). In 1616 he published L’arte de’ Cenni, a 600-page
compendium of gestures arranged by parts of the body from head to foot, dedicating it
to the Accademia Filarmonica of Verona, of which he was a member. He also belonged
to literary academies in Treviso, Venice and Padua.
A likely inspiration for Bonifacio was the famous and much reprinted Hieroglyphica
of Giovanni Pierio Valeriano Bolzani (1477–1558) (Valeriano 1556), which devotes one
section to the hieroglyphic interpretation of the parts of the body and their gestures
(Dorival 1971; Fumaroli 1981: 238; P. Rossi [1983] 2006: 78). Another important source
was certainly the Trattato dell’Arte della Pittura (1584) of the neoplatonic theorist Gian
Paolo Lomazzo (1538–1592), Book II of which is on “Actions and Gestures” (Aikema
1990: 104–105). Bonifacio emphasizes “that both poet and painter are capable of ren-
dering almost every emotion. The painter[’s] … task [was] to ‘portray the gestures
and movements, and thus the emotions of people’; his work should consequently be
understood by everyone without exception” (Aikema 1990: 105).
L’Arte de’ Cenni is not a handbook of rhetoric or acting technique or social behavior,
but a compendium for everyone interested in gesture – philosophers, critics, dramatists,
orators, poets, choreographers – and especially painters (Aikema 1990: 105; Gualandri
2001b: 395, 401; Popp 2007: 65–66; Puttfarken 2005: 6–67, 108–109). Bonifacio’s inten-
tion was to present a dynamic physiognomics, an art of discerning the inner state of the
soul from visible bodily signs, of great use for behaving with civility in all social situa-
tions. The art should also be of great value to merchants and explorers encountering
unknown peoples, since the affective message of a painting is as clear to Asians and
Africans as to Europeans (Knox 1996: 391–392, 395). Bonifacio finds gesture more sin-
cere than speech, of greater antiquity, dignity, naturalness, and universality. Though
truth may be simulated, he says (echoing Lomazzo), it is much easier to dissemble
with speech (Casella 1993: 335–340).
The arrangement, while systematic and compendious, is anatomical rather than func-
tional. This isolates the gestures from one another: there are no laws of combination, no
syntax. It is a lexicon without a grammar. As David Clement (1754) wrote, Bonifacio
does not so much teach us to speak by signs, as to know what ideas the ancients attached
to the parts of the body in their various movements. This lack of functional perspective
differentiates the work from both the rhetorical and the courtesy literature, which
describe gestures as instances of general rules of delivery or of social behavior (Casella
1993: 344).

4. Francis Bacon
Francis Bacon (1561–1626) discusses bodily expression in De Augmentis Scientiarum
IV.1 (the 1623 Latin translation and enlargement of The Advancement of Learning,
1605), and manual gesture at VI.1. These brief discussions would prove extremely influ-
ential, especially in the 18th century. Book IV is part of Bacon’s proposal for a new
science of humanity. At IV.1 he notes that while Aristotle in the Physiognomics pro-
vides a detailed treatment of the human body at rest, he ignores the body in motion,
i.e. gesture and expression. Yet this had long been discussed in the medical literature,
and the shift in physiognomics from an exclusive focus on structure to signs of transient
passions can already be seen in the opening chapters of Book II of Lomazzo’s Treatise
(English translation 1598), which discusses the concordance of body and soul on the
basis of the Galenic theory of the four humors (Gualandri 2001; Rykwert 1996: 92),
and in the De Humana Physiognomonia (1586) of Giambattista della Porta (1535–
1615), which devotes a chapter to the semiotics of gesture and expression, again
supported by humoral theory (Percival 1999: 16–17; Rykwert 1996: 40–45).
Bacon’s treatment of manual gesture near the beginning of Book VI (“The Doctrine
of Delivery”, covering logic, grammar and rhetoric) comes out of the logical tradition
(Aristotle, De Interp I.1) rather than the rhetorical (Wollock 2002: 232, 242–243).
Signs need not stand for words, they can stand directly for things, for “whatever may
be split into differences, sufficiently numerous for explaining the variety of notions, pro-
vided these differences are sensible, may be a means of conveying the thoughts from
man to man: for we find that nations of different languages, hold a commerce, in some
tolerable degree, by gestures” (see Knox 1990: 127, 130–132; Müller 1998: 52, 61).
To illustrate this, Bacon draws a parallel between gesture and hieroglyphic: while the
latter is permanent and the former transient, both resemble what they signify. This or-
dering of gesture with hieroglyphic recalls the renaissance fascination with emblemata
and imprese (inspired by Egyptian hieroglyphs, which were interpreted symbolically).
Bacon probably got the idea from Pierio’s Hieroglyphica, and possibly from another
work of Porta, the Ars Reminiscendi (1602), which connects hieroglyphics and gestures
with Ciceronian memory images, “those animated pictures which are recalled into the
imagination to represent a fact or a word” (P. Rossi 2006: 77; see 77–79, 107–109). In the
Jesuit ethnographic literature, Bacon had found Chinese ideographs described as a
“real character” which, rather than resembling what they signify, denote real relations.
Bacon believed that the perfection of such a system, symbolizing the verified “notes of
things”, could serve as a universal, philosophical language (Wollock 2002: 231–236).
Though it would require much labor to learn, the logical relations of the graphic sche-
mata would replace iconicity as an aid to memory (see his discussion of the ars memor-
ativa at DA V.5, the immediately preceding section; and Wollock 2002: 252, n.4). This
idea is complemented by the desideratum, immediately following, of a philosophical,
universal grammar based on a comprehensive study of all natural languages. British lan-
guage reformers would avidly pursue the real character (Wollock 2002: 239–240; 2011:
39–41).
In the scholarly literature on the real character, Bacon’s brief discussion of gesture is
usually seen as a mere introduction to his actual suggestion for a universal language and
advancement of science. The universality of gesture is that of primitive origins, related
to fable and ancient processes of imagination, memory and communication – what Vico
would later call the “poetic character”, the primal generator of both language and myth
(Bedani 1989: 35–43; Cantelli 1976: 49–63; Singer 1989; Wollock 2002: 232 at n.2, see
239–240). Others have argued, however, that the hieroglyphic principle is important
to Bacon’s own method (Stephens 1975, esp. 70–71, 121–171; Altegoer 2000, esp. 82–83,
107–112).

5. John Bulwer
John Bulwer (1606–1656) was a London physician who wrote five books (1644a, b, 1648,
1649, 1650, 1653) and an unpublished manuscript (Wollock 1996: 16–19) on the body as
a means of communication. A self-styled Baconian, Bulwer was inspired by DA IV.1,
on the mutual influence of body and soul, a topic Aristotle had ignored; nor was it
Aristotle, as he further notes in the introduction to Chironomia (Bulwer 1644b:
24–26), who had developed “manual rhetorique” into an art, but rhetoricians them-
selves, along with painters and actors: the first that “collected these Rhetoricall motions
of the Hand into an Art (…) was surely Quintilian”. Bulwer also admired the Jesuits
who, inspired by Quintilian, expanded the role of gesture in oratory (Müller 1998: 46–47,
52, 53). He notes that Gerard Vossius (see Conley 1990: 159–162) reviews Quintilian’s
manual precepts; and disagrees with those – surely thinking of Althusius here – who rel-
egate gesture to ethics, because there is a distinction between actio moralis or civilis and
the action the Greeks call hupokrisis (i.e. delivery), “accom[m]odated to move the affec-
tions of the Auditours”, although such gestures “doe presuppose the Aethique precepts
and the lawes of civill conversation”. Talaeus (the first Ramist rhetorician) “prefers these
Canonicall gestures before the artifice of the Voyce”, but his commentator Claudius Minos
(Claude Mignault, 1536–1606) allows this only in communication between “Nations of
divers tongues”. Lazarus Schöner, a commentator on Talaeus, expresses a wish for
“Types and Chirograms, whereby this Art might be better illustrated then by words”,
which Bulwer has “here attempted to supply…” Indeed the Chironomia is the first
book to contain pictorial illustrations of the manual gestures of classical oratory.
Calling Bartholomaeus Keckermann (1572–1609, see Conley 1990: 157–159) “no
better than a precisi[a]n in Rhetorique” for giving precedence to the voice over gesture
even while admitting that “the Jesuites (known to be the greatest proficients in
Rhetorique of our times) instruct their disciples” in gesture as an art, Bulwer remarks
“how wonderfully they have improved and polished this kind of ancient Learning”,
which “appeares sufficiently by the Labours of three eminent in this facultie”, i.e.,
[Jean] Voel, Cressolles, and Caussin. Despite Conley’s observation (1990: 157) that Jesuit
rhetoric had little influence on English forensic oratory, the ceremonial style of worship
of the Church of England under Archbishop William Laud (1573–1645), which Bulwer
fully supported, emphasized the body and the senses in worship and helped define
what has often been called (originally by its detractors in the mid-19th century), the
“Laudian counterreformation” (see Parry 2006: 190–191; Wollock 2011: 68–72; 1996: 8–11).
While Chirologia/Chironomia, though limited to the hands, resembles Bonifacio’s
Cenni in many ways, there is no evidence that Bulwer knew the work or that it was
known in England at all. The similarities can be accounted for by English developments
that parallel the continental ones discussed above. In this connection, the influence
of Francis Junius the younger (1589–1677) has gone virtually unnoticed. Junius, of
Huguenot ancestry, born in Heidelberg, raised at Leiden, friend of Grotius and
brother-in-law of Gerard Vossius, was a paragon of Caroline intellectual culture. He
lived in England from 1620 to 1642 as librarian to the Earl of Arundel, and on the
fall of Charles I joined the earl in exile in the Netherlands. Junius’s De Pictura Veterum
(1637, revised English edition, The Painting of the Ancients 1638) is a classical source-
book on gesture and expression to facilitate the construction of a classical theory
of painting. Like Bulwer, Junius was also responding to the attacks of Puritan icono-
clasts on painting, theatre, and ceremonial worship (Wollock 2011: 68–72). He drew
on Sidney’s Arcadia as well as his Defense of Poesie, the first work in English to
show the influence of Aristotle’s Poetics (Dundas 2007). Junius’s work must have
been of great use to Bulwer, who writes in the preface to Chironomia:

I never met with any Rhetorician or other, that had picturd out one of these Rhetoricall
expressions of the Hands and fingers; or (…) any Philologer that could exactly satisfie
me in the ancient Rhetoricall postures of Quintilian. Franciscus Junius in his late Transla-
tion of his Pictura veterum, having given the best proofe of his skill in such Antiquities, by a
verball explanation thereof. (Bulwer 1644b: 26)

Like Lomazzo and Bonifacio, Junius wrote his De Pictura Veterum because, in Bulwer’s
words, the “Painter, or Carver, or Plastique” needs to know “the naturall and artificiall
properties of the Hand … for as the History [i.e. narrative] runnes and ascribes
passions to the Hand, gestures and motions must come in with their accommodation”
(Bulwer 1644b: 58). Just as Bonifacio had been influenced by Lomazzo, so was Junius
(Hard 1951: 238–239, n.8); and, so it would seem, was Bulwer himself. A single garbled
reference to “Palomatius” [sic] “in proport.” (Bulwer 1644b: 100) seems second-hand,
but the rare word “motist” is more telling: this was Haydocke’s rendering of Lomazzo’s
motista (Oxford English Dictionary: “A person skilled in depicting or describing move-
ment”): the earliest recorded use in English. A dedicatory verse to Chirologia addresses
Bulwer as “motistarum clarissimo” (Bulwer 1644a: unnumb.); the phrase “cunning mo-
tist” occurs in William Diconson’s dedicatory verse to the same work (Bulwer 1644a:
unnumb.) and to Pathomyotomia (Bulwer 1649: end matter); once in the text of
Chirologia (Bulwer 1644a: 172), twice in Chironomia (Bulwer 1644b: 24, 58), and
twice in Philocophus (Bulwer 1648: dedication, unnumbered, 150). Haydocke/Lomazzo
was one of the first books to advocate appreciation of painting as part of the education
of an English gentleman (235–236), which links it with the Inns of Court culture in
which Bulwer flourished.
To De Augmentis IV.1, Bulwer added the perspective of DA VI.1, bringing in Bacon’s
gesture/hieroglyph comparison and frequently citing Pierio’s Hieroglyphica. But Bulwer’s
attachment to the renaissance tradition of gesture as a universal language did not extend
to the artificial “real character”. Gesture is already “an universall character of reason (…)
generally understood and knowne by all Nations, among the formall differences of their
Tongue” (Bulwer 1644a: 3). For “[t]his naturall Language of the Hand (…) had the hap-
pinesse to escape the curse at the confusion of Babel” (Bulwer 1644a: 7). The first to take
Bacon’s hint about hieroglyphic and gesture, seeing the latter as the more fundamental of
the two, Bulwer foreshadows Giambattista Vico, William Warburton, and the French phi-
losophers influenced by Warburton (Rosenfeld 2004: 37–43) in their quest for the origins
of language – though none of them knew Bulwer’s work.
As we have seen from Lomazzo, Porta and Bonifacio, Bulwer was not the first to dis-
cuss the function of the humors and passions in the body’s signification of the soul.
There are some basic psychophysical observations in his 1644 publications (e.g. his idea
that gesture precedes utterance in the passage from thought to speech agrees with
Cressolles, see Golding 1986: 151), but it is mainly in his next two books that Bulwer de-
velops psychophysiology as the universal basis of gesture and expression. In Philocophus
(1648) he investigates the sensorimotor aspects of speech, a tradition, also based on
humoral physiology, highly compatible with that on gesture (Wollock 2002, 2012). Bulwer
was the only physician among the renaissance gesture writers, and the only one to
discuss psychophysiology in detail according to the teachings on voluntary motion of
Aristotle and Galen (see Wollock 1990: 11–22; 1997: 113–150; 2012: 844–851). Thus,
although Bulwer’s Chirologia/Chironomia, like Bonifacio’s Arte de’ Cenni, may be
called a lexicon without a grammar, Bulwer’s lexicon (glossing all the gestures as
verbs) is backed up by a psychophysiology (see analysis in Hübler 2001: 350–361).
Bulwer did not explore the connections of speech and gesture except from the stand-
point of rhetorical delivery (nor did anyone else at the time; see McNeill 2005: 13–15),
but his review of the interrelation of the senses in the general theory of action provides
a groundwork (Wollock 2012; see Müller 1998: 77–82, discussing McNeill). It is also
striking that he concludes his anatomy of the expressive muscles of the head with the
musculature of the speech organs (Bulwer 1649: 228–241), which links directly to the
more extensive discussion of articulatory phonetics in Philocophus (Bulwer 1648: 1–54).
Indeed, the psychophysiological processes common to both gesture as universal language
and gesture as transient iconic sign, point to the substantial links between them; and to
have provided such a framework for further investigation is Bulwer’s most important,
though long unappreciated, contribution to language theory.
It is an interesting question how, or even whether, Bulwer connects with the
Baconian language reformers (Wollock 2011). Recalling Bacon’s scheme of advance-
ment from natural to artificial real character is Knox’s suggestion (1990: 129–136;
1996: 395–396) that investigators of artificial, universal language saw the universal
language of gesture as a precedent. However, the universality of gesture lies in its nat-
ural, iconic, imitative character, the universality of human expression, the human psy-
chophysiology that supports it, and its natural relationship to speech. The artificial “real
character” prescinds from the imagination, emotions, and any ratio of bodily communi-
cation. Thus, in opting for the artificial real character over more iconic forms of com-
munication, the artificial language researchers abandoned body as first principle of
expression, in favor of a supposedly independent rationality of the external world
(Wollock 2002: 242–243, 249–252).
Whether or not the natural basis of language belongs to language itself, a satisfactory
theory of language needs to take strict account of it. While mental imagery and affect
remained central to rhetoric and aesthetics, both mechanistic psychology and formalist
linguistics, in a trend not so much anti-visual as anti-sensory in general, excluded them.
And indeed this fits with both the puritan iconoclasm, or fear of iconicity (except for
utilitarian use), as well as with Cartesian hyper-abstraction. Thus, Bulwer’s lack of influ-
ence in the three centuries after his death is due to cultural and intellectual changes that
would make England unreceptive to his approach, as well as to the fact that his work,
available only in English, was virtually unknown abroad.

6. Georg Philipp Harsdörffer


Harsdörffer (1607–1658) was a Nuremberg jurist, poet, litterateur, polymath and trans-
lator. A member of the patrician class, Harsdörffer, who had traveled in Italy in 1627,
saw it as his task to popularize the urbane culture – the “civil conversation” (i.e. civil
conduct, civility) of the Italian academies – for German society, which had nothing sim-
ilar. His Frauenzimmer Gesprächspiele (Ladies’ Conversation Games), in eight volumes
(1641–1649), not only presents what he considered the best of contemporary European
learning and culture, but does so in the easily-digested form of conversational dialogues
and games, exemplifying polite social discourse.
Harsdörffer distinguished four means of expression: speech, writing, picture and gesture, all highly cultivated in the Italian literary and scientific academies (Battafarano
1990: 78). In several passages in the first volume he speaks of painting and gesture as
lingua universale (Harsdörffer [1644] 1968, I: 63f, I: 71). In volume four (Harsdörffer
1968, IV: 268–273), he presents a critical commentary on Bonifacio’s account of gesture,
in the context of the “civile conversazione” tradition of Stefano Guazzo and others
(Agazzi 2000: 33–37; Bonfatti 1983; see Lievsay 1961). In volume seven (1647), he dis-
cusses gesture as an original, universal language and its cultural refinement (Locher
1991: 263–265). Courtesy books frequently discussed gesture, and its value for polite
conversation was stressed by Bonifacio, who (as Harsdörffer emphasizes) was himself
a member of the Accademia Filarmonica of Verona. In another work, the Poetischer
Trichter (1648–1653), Harsdörffer provides valuable insights into the use of gesture
on the stage; he also discusses the core problems on the relations of poetry with the
other arts, with which we began this article (Niefanger 2011).
Bonifacio was unknown in England, and Bulwer was virtually unknown on the con-
tinent. In the mid-1650s, however, Harsdörffer, or his associates, had access to a copy of
Chirologia/Chironomia. (See excerpts in the Teutsche Secretarius, I, 1656: 707–718.)
Harsdörffer’s enthusiasm for Bonifacio represents, in the German context, the
beginning of a transition to the culture of wit, taste, and polite conversation (Agazzi
2000: 33–37; Bannasch 2007; Bonfatti 1983; Hübler 2001: 171–202; Knox 1990: 383–
384; Locher 1991; Müller 1998: 53–55). From this perspective, not only is gesture
mute speech, but speech and voice are audible gesture (see Hübler 2007: 147–170;
Wollock 1979, 1982: 195–223).

7. Conclusions
Renaissance ideas on gesture foreshadow the 18th century, and to some extent even
Romanticism (see Vico, Herder). Important for us today is not so much the literal
question whether gesture is a universal language, as the fact that in this period gesture
called attention to linguistic processes that are certainly universal – psychophysiological
processes common to verbal and nonverbal thought – that were often overlooked,
downplayed, or even denied in 20th-century linguistics.

8. References
Agazzi, Elena 2000. Il Corpo Conteso: Rito e Gestualità nella Germania del Settecento. Milan: Jaca
Book.
Aikema, Bernard 1990. Pietro della Vecchia and the Heritage of the Renaissance in Venice. Flor-
ence: Istituto universitario olandese di storia dell’arte.
Altegoer, Diana B. 2000. Reckoning Words: Baconian Science and the Construction of Truth in
English Renaissance Culture. Madison, NJ: Fairleigh Dickinson University Press; London: As-
sociated University Presses.
Bacon, Francis 1857–1870. De augmentis scientiarum. In: James Spedding, Robert Leslie Ellis and
Douglas Denon Heath (eds.), The Works of Francis Bacon 1: 431–837. London: Longman. First
published [1623].
Badt, Kurt 1959. Raphael’s Incendio del Borgo. Journal of the Warburg and Courtauld Institutes 22
(1/2): 35–59.
Bannasch, Bettina 2007. Zwischen Jakobsleiter und Eselsbrücke: Das “Bildende Bild” im Emblem-
und Kinderbilderbuch des 17. und 18. Jahrhunderts. Göttingen, Germany: Vandenhoeck u.
Ruprecht.
Barasch, Moshe 1987. Giotto and the Language of Gesture. Cambridge Studies in the History of
Art. New York: Cambridge University Press.
Barasch, Moshe 1997. Language of art: Some historical notes. In: Moshe Barasch. Language of
Art: Studies in Interpretation, 10–26. New York: New York University Press.
Battafarano, Italo Michele 1990. Harsdörffers “Frauenzimmer Gesprächspiele”: Frühneuzeitliche
Zeichen- und (Sinn)Bildsprachen in Italien und Deutschland. In: Volker Kapp (ed.), Die
Sprache der Zeichen und Bilder: Rhetorik und Nonverbale Kommunikation in der Frühen Neu-
zeit, 77–88. Marburg, Germany: Hitzeroth.
Baxandall, Michael 1972. Painting and Experience in Fifteenth-Century Italy: A Primer in the
Social History of Pictorial Style. Oxford: Clarendon Press.
Baxandall, Michael 1982. The Limewood Sculptors of Renaissance Germany. New Haven, CT:
Yale University Press.
Bedani, Gino 1989. The origins of language, “natural signification” and “onomathesia”. In: Vico
Revisited. Orthodoxy, Naturalism and Science in the Scienza Nuova, 35–51. Oxford: Berg.
Benzoni, Gino 1967. Giovanni Bonifacio (1547–1635), erudito uomo di legge e devoto. Studi
Veneziani 9: 247–312.
Benzoni, Gino 1970. Bonifacio, Giovanni. In: Dizionario Biografico degli Italiani, 12: 104–197.
Rome: Enciclopedia Italiana Treccani.
Bonfatti, Emilio 1983. Vorläufige Hinweise zu einem Handbuch der Gebärdensprache im
deutschen Barock: Giovanni Bonifacios “Arte de’ Cenni” (1616). In: Joseph P. Strelka and
J. Jungmair (eds.), Virtus et Fortunae: Zur Literatur zwischen 1400 und 1700. Festschrift H.-
G. Roloff, 393–405. Bern: Lang.
Bonifacio, Giovanni 1616. L’arte de’ Cenni, con la quale, Formandosi Favella Visibile, si Tratta
della Muta Eloquenza. Vicenza: Francesco Grossi.
Bremmer, Jan 1991. Walking, standing and sitting in ancient Greek culture. In: Jan Bremmer and
Herman Roodenburg (eds.), A Cultural History of Gesture, 15–35. Ithaca, NY: Cornell Univer-
sity Press.
Bulwer, John 1644a. Chirologia, or the Natvrall Langvage of the Hand. London: T. Harper.
Bulwer, John 1644b. Chironomia, or the Art of Manuall Rhetoricke. London: T. Harper.
Bulwer, John 1648. Philocophus, or the Deafe and Dumbe Man’s Friend. London: Humphrey Moseley.
Bulwer, John 1649. Pathomyotomia, or a Dissection of the Significant Muscles of the Affections of
the Mind. London: Humphrey Moseley.
Bulwer, John 1650. Anthropometamorphosis: Man Transform’d, or the Artificial Changeling. Lon-
don: J. Hardesty.
Bulwer, John 1653. Anthropometamorphosis. Second, enlarged edition. London: William Hunt.
Burrow, John Anthony 2002. Gestures and Looks in Medieval Narrative. Cambridge: Cambridge
University Press.
Campbell, Stephen F. 1993. Nicolas Caussin’s “Spirituality of Communication”: A meeting of
divine and human speech. Renaissance Quarterly 46(1): 44–70.
Cantelli, Gianfranco 1976. Myth and language in Vico. In: Giorgio Tagliacozzo and Donald Phillip
Verene (eds.), Giambattista Vico’s Science of Humanity, 47–63. Baltimore: Johns Hopkins University Press.
Casella, Paola 1993. Un dotto e curioso trattato del primo Seicento: L’arte de’Cenni di Giovanni
Bonifaccio. Studi Secenteschi 34: 331–407.
Conley, Thomas M. 1990. Rhetoric in the European Tradition. Chicago: University of Chicago Press.
Davidson, Clifford, ed. 2001. Gesture in Medieval Drama and Art. Early Drama, Art, and Music
Monograph. Kalamazoo, MI: Medieval Institute Publications.
Davis, Michael 1992. Aristotle’s Poetics: The Poetry of Philosophy. Savage, MD: Rowman and
Littlefield.
Dorival, Bernard 1971. Philippe de Champaigne et les Hiéroglyphiques de Pierius. Revue de l’Art
11: 31–41.
Duncan, Anne 2006. Performance and Identity in the Classical World. Cambridge: Cambridge Uni-
versity Press.
Dundas, Judith 2007. Sidney and Junius on Poetry and Painting: From the Margins to the Center.
Cranbury, NJ: Associated University Presses.
Dutsch, Dorota 2002. Towards a grammar of gesture. A comparison between the types of hand
movements of the orator and the actor in Quintilian’s Institutio Oratoria 11.3.85–184. Gesture
2(2): 259–281.
Enders, Jody 1992. Rhetoric and the Origins of Medieval Drama. Ithaca, NY: Cornell University
Press.
376 III. Historical dimensions

Enders, Jody 2001. Of miming and signing. In: Claude Davidson (ed.), Gesture in Medieval Drama
and Art, 1–25. Kalamazoo, MI: Medieval Institute Publications.
Fögen, Thorsten 2009. Sermo corporis: Ancient reflections on gestus, vultus and vox. In: Thorsten
Fögen and Mireille M. Lee (eds.), Bodies and Boundaries in Graeco-Roman Antiquity, 15–44.
Berlin/New York: Walter de Gruyter.
Frischer, Bernard 1996. Rezeptionsgeschichte and Interpretation: The Quarrel of Antonio Ricco-
boni and Niccolò Cologno about the structure of Horace’s Ars Poetica. In: Helmut Krasser and
Ernst A. Schmidt (eds.), Zeitgenosse Horaz: Der Dichter und seine Leser seit zwei Jahrtausen-
den, 68–116. Tübingen: Gunter Narr.
Fumaroli, Marc 1981. Le corps éloquent: Une somme d’actio et pronuntiatio rhetorica au XVII
siècle, les Vacationes autumnales du P. Louis de Cressoles (1620). Dix-Septième Siècle
33(132): 237–264.
Fumaroli, Marc 2002. L’âge de L’éloquence: Rhétorique et “Res Literaria” de la Renaissance au
Seuil de L’époque Classique. Geneva: Droz.
Golding, Alfred S. 1986. Nature as symbolic behavior: Cresol’s Autumn Vacations and early
Baroque acting technique. Renaissance and Reformation 10(1): 147–157.
Goodden, Angelica 1986. Actio and Persuasion: Dramatic Performance in Eighteenth-Century
France. Oxford: Clarendon Press; New York: Oxford University Press.
Graf, Fritz 1995. Ekphrasis: Die Entstehung der Gattung in der Antike. In: Gottfried Boehm and
Helmut Pfotenhauer (eds.), Beschreibungskunst – Kunstbeschreibung. Ekphrasis von der
Antike bis zur Gegenwart, 143–155. Munich: Fink.
Gros de Gasquet, Julia 2007. Rhétorique, téatralité et corps actorial. XVIIe Siècle: Revue Trimes-
trielle (236): 501–519.
Gualandri, Francesca 2001a. Affetti, Passioni, Vizi e Virtù: La Retorica del Gesto nel Teatro Del
’600. Milan: Peri.
Gualandri, Francesca 2001b. Le geste scénique dans “Le nozze di Teti e Peleo”. In: Marie-Thérèse
Bouquet-Boyer (ed.), Les Noces de Pélée et de Thetis: Venise, Paris, 1654; actes du colloque de
Chambéry et de Turin, 3–7 novembre 1999, 391–405. Bern: Lang.
Hall, Jon 2004. Cicero and Quintilian on the oratorical use of hand gestures. Classical Quarterly
54(1): 143–160.
Halliwell, Stephen (ed.) 1986. Aristotle’s Poetics. Chicago: University of Chicago Press.
Hard, Frederick 1951. Some interrelations between the literary and the plastic arts in 16th and
17th century England. College Art Journal 10(3): 233–243.
Harsdörffer, Georg Philipp 1968–1969. Frauenzimmer Gesprächspiele. Edited by Irmgard Bou-
cher. Tübingen: Max Niemeyer. First published [1641–1649].
Harsdörffer, Georg Philipp 1656–1659 Der teutsche Sekretarius. Das ist aller Cantzeleyen/ Studir
und Schreibstuben Nützliches/fast Nothwendiges und zum Drittenmal Vermehrtes Titular- und
Formularbuch (2 Vol.). Nürnberg: Endter.
Hathaway, Baxter 1962. The Age of Criticism: The Late Renaissance in Italy. Ithaca, NY: Cornell
University Press.
Hübler, Axel 2001. Das Konzept “Körper” in den Sprach- und Kommunikationswissenschaften.
Tübingen: Francke.
Hübler, Axel 2007. The Nonverbal Shift in Early Modern English Conversation. Amsterdam: John
Benjamins.
Jackson, B. Darrell 1969. The theory of signs in St. Augustine’s De Doctrina Christiana. Revue des
Eludes Augustiniennes 15: 9–49.
Janko, Richard (ed.) 1987. Aristotle, Poetics: With the Tractatus Coislinianus, Reconstruction of
Poetics II, and the Fragments of the On Poets (Book 1). Indianapolis: Hackett.
Knox, Dilwyn 1990. Ideas on gesture and universal languages c. 1550–1650. In: John Henry and
Sarah Hutton (eds.), New Perspectives on Renaissance Thought: Essays in the History of
Science, Education and Philosophy in Memory of Charles B. Schmitt, 101–136. London:
Duckworth.
24. Renaissance philosophy: Gesture as universal language 377

Knox, Dilwyn 1996. Giovanni Bonifacio’s L’arte de’ cenni and Renaissance ideas of Gesture. In:
Mirko Tavoni (ed.), Italia ed Europa nella Linguistica del Rinascimento. Atti del Convegno In-
ternazionale, Ferrara 20–24 March 1991, 2: 379–400. Modena: Panini.
Koch, Erec 2008. The Aesthetic Body: Passion, Sensibility, and Corporeality in Seventeenth-Cen-
tury France. Newark: University of Delaware Press.
Lauter, Paul (ed.) 1964. Theories of Comedy. Garden City, NY: Doubleday Anchor.
LeCoat, Gerard G. 1975. The Rhetoric of the Arts, 1550–1650. Bern: Herbert Lang.
Lievsay, John L. 1961. Stefano Guazzo and the English Renaissance, 1575–1675. Chapel Hill: Uni-
versity of North Carolina Press.
Locher, Elmar 1991. Harsdörffers Deutkunst. In: Italo Michele Battafarano (ed.), Georg Philipp
Harsdörffer. Ein Deutscher Dichter und Europäischer Gelehrter, 243–265. Bern: Lang.
Lühr, Berit 2002. The Language of Gestures in Some of EI Greco’s Caravaggio Altarpieces. Ph.D.
dissertation, Department of History of Art, University of Warwick, UK.
Manetti, Giovanni 1993. Theories of the Sign in Classical Antiquity. Translated by C. Richardson.
Bloomington: Indiana University Press.
Mazzoni, Stefano 1998 L’Olimpico di Vicenza: un Teatro e la sua “Perpetua Memoria”. Florence:
Le Lettere.
McNeill, David 2005. Gesture and Thought. Chicago: University of Chicago Press.
Müller, Cornelia 1998. Redebegleitende Gesten: Kulturgeschichte, Theorie, Sprachvergleich. Berlin:
Spitz.
Niefanger, Dirk 2011. Gebärde und Bühne. In: Stefan Keppler-Tasaki and Ursula Kocher (eds.),
Georg Philipp Harsdörffers Universalität: Beiträge zu einem Uomo Universale des Barock, 65–
82. Berlin: de Gruyter.
Parry, Graham 2006. The Arts of the Anglican Counter-Reformation: Glory, Land and Honour.
Woodbridge, UK: Boydell Press.
Percival, Melissa 1999. The Appearance of Character: Physiognomy and Facial Expression in Eigh-
teenth-Century France. Leeds, U.K: Maney & Son for the Modern Humanities Research
Association.
Popp, Jessica 2007. Sprechende Bilder – Verstummte Betrachter: zur Historienmalerei Domenichi-
nos (1581–1641). Cologne: Böhlau.
Preimesberger, Rudolf 1987. Tragische Motive in Raffaels Transfiguration. Zeitschrift für Kunst-
geschichte 50: 89–115.
Puttfarken, Thomas 2005. Titian and Tragic Painting: Aristotle’s Poetics and the Rise of the Modern
Artist. New Haven, CT: Yale University Press.
Rehm, Ulrich 2002. Stumme Sprache der Bilder. Berlin: Deutscher Kunstverlag.
Rosenfeld, Sophia A. 2004. A Revolution in Language: The Problem of Signs in Late Eighteenth-
Century France. Stanford, CA: Stanford University Press.
Rossi, Giovanni 2003. Rhetorical role models for 16th to 18th century lawyers. In Olga Telle-
gen-Couperus (ed.), Quintilian and the Law, 81–94. Leuven, Belgium: Leuven University
Press.
Rossi, Paolo 2006. Logic and the Art of Memory: The Quest for a Universal Language. Translated
by Stephen Clucas, 2nd ed. London: Continuum. First published [1983].
Rykwert, Joseph 1996. The Dancing Column: On Order in Architecture. Cambridge: Massachu-
setts Institute of Technology Press.
Schmitt, Jean-Claude 1991. The rationale of gestures in the West; third to thirteenth centuries. In:
Jan Bremmer and Herman Roodenburg (eds.), A Cultural History of Gesture, 59–70. Ithaca,
NY: Cornell University Press.
Schröder, Volker 1992 “Le langage de la peinture est le langage des muets”: remarques sur un
motif de l’esthétique classique. In: René Démoris (ed.), Hommage à Elizabeth Sophie Chéron:
Texte et Peinture à l’Âge Classique, 95–110. Paris: Presses de la Sorbonne Nouvelle.
Schroedter, Stephanie 2004. Vom “Affect” zur “Action”. Würzburg: Königshausen & Neumann.
378 III. Historical dimensions

Scott, Gregory 1999. The poetics of performance. The necessity of spectacle, music, and dance in
Aristotelian tragedy. In: Salim Kemal and Ivan Gaskell (eds.), Performance and Authenticity in
the Arts, 15–48. Cambridge: Cambridge University Press.
Singer, Thomas C. 1989. Hieroglyphs, real characters, and the idea of natural language in English
seventeenth-Century thought. Journal of the History of Ideas 50(1): 49–70.
Stephens, James 1975. Francis Bacon and the Style of Science. Chicago: University of Chicago Press.
Valeriano Bolzani, Giovanni Pierio 1556. Hieroglyphica, sive De sacris Aegyptiorum literis com-
mentarii. Basel: Michael Isengin.
Varriano, John L. 2006. Caravaggio: The Art of Realism. Pennsylvania: Pennsylvania State Press.
Vickers, Brian 1991. Rhetoric and poetics. In: Charles B. Schmitt and Quentin Skinner (eds.), The
Cambridge History of Renaissance Philosophy, 715–745. Cambridge: Cambridge University Press.
Wollock, Jeffrey 1979. An articulation disorder in seventeenth-century Germany. Journal of Com-
munication Disorders 12: 303–320.
Wollock, Jeffrey 1982. Views on the decline of apical r in Europe: historical survey. Folia Linguis-
tica Historica 3: 185–238.
Wollock, Jeffrey 1990. Communication disorder in renaissance Italy: An unreported case analysis
by Hieronymus Mercurialis (1530–1606). Journal of Communication Disorders 23: 1–30.
Wollock, Jeffrey 1996. John Bulwer’s (1606–1656) place in the history of the deaf. Historiographia
Linguistica 23(1/2): 1–46.
Wollock, Jeffrey 1997. The Noblest Animate Motion: Speech, Physiology and Medicine in Pre-Car-
tesian Linguistic Thought. Amsterdam: John Benjamins.
Wollock, Jeffrey 2002. John Bulwer (1606–1656) and the significance of gesture in 17th-century
theories of language and cognition. Gesture 2(2): 227–258.
Wollock, Jeffrey 2011. John Bulwer and the quest for a universal language, 1641–1644. Historio-
graphia Linguistica 37(1/2): 37–84.
Wollock, Jeffrey 2012. Psychological theory of John Bulwer. In: Robert W. Rieber (ed.), Encyclo-
pedia of the History of Psychological Theories, 2: 839–856. New York: Springer Science.
Zanlonghi, Giovanna 2002. Teatri di Formazione: Actio, Parola e Immagine nella Scena Gesuitica
del Sei-Settecento a Milano. Milan: Vita e Pensiero.

Jeffrey Wollock, New York, NY (USA)

25. Enlightenment philosophy: Gestures, language, and the origin of human understanding
1. Introduction
2. Condillac
3. French sign language
4. Conclusion
5. References

Abstract
Bodily forms of communication were a core topic in the 18th century debate about the
origin of language. At issue was whether God, nature, or man was responsible for creating
our language faculty. The debate called for new ideas about how language and thought
are related in order to advance learning. It brought the Enlightenment philosophers
into confrontation with discourse traditions stemming from the Bible, Aristotle, and Des-
cartes. Drawing upon the work of the British empiricist Locke, Condillac argued that
bodily sensations are transformed and create the mind and all the knowledge it contains.
He hypothesized that the language of action was a precursor to speech and gave rise to
the language faculty which, rather than the soul, distinguishes us from other animals.
His sensualism was highly influential and inspired the pioneering work of de l’Epée,
the inventor of the gestural method of educating the deaf in France, and his successor,
Sicard. By considering what role the body may have in forming the mind, Enlightenment
philosophy shifted the perspectives on what language is and how its communication
function is intimately related to cognition.

1. Introduction
Enlightenment philosophy dawned in France in the mid 18th century. It was character-
ized by a critical questioning of traditional beliefs and values underlying the political
and social structures of that time. Humanism and the Reformation had challenged
the authority of the Roman Catholic Church, as well as the political conventions that
depended on it, such as the divine right of kings. Science was heralded as a new savior
promising to deliver truth. Referring to Aristotle’s Organon [Works on Logic] (trans. [1930] 1960, [1955] 1992, [1938] 1996), Francis Bacon (1561–1626) wrote the Novum Organum Scientiarum [New Instrument of Science] ([1620] 2011), announcing that the new empirical method was to replace the old dialectic method of discovering truths. The
“light of reason” was to reveal the laws of nature through experimentation and obser-
vation to advance knowledge and, hence, improve physical well-being. As contempo-
rary European languages were replacing Latin as the medium of scholarly discourse,
Bacon saw a major obstacle to scientific inquiry in what he called idola fori ‘idols of
the market-place’, misconceptions arising from the common use of language, e.g. a
whale is a fish (see Trabant 2003: 122–131). Philosophers across Europe sought ways
to resolve these semantic problems, and their inquiries raised questions about the relation
between language and knowledge.
One approach was to try to eliminate idola fori by analyzing concepts and clarifying
definitions, thus reforming natural languages for philosophical discourse. Another was
to bypass words as the vehicle of philosophical knowledge by inventing a universal lan-
guage comprised of symbolic characters whose meanings would have a direct link with
reality and thus mirror the process of reasoning perfectly (see Harris and Taylor [1989]
1997: 110–125). Gesture became a topic in this discussion due to its mimetic nature. The
idea that it formed the very first language was advanced by Vico ([1744] 1990), Condil-
lac ([1746] 2001) and Rousseau ([1781] 1968). It was corroborated by the reports of tra-
velers, especially in the New World, where gesture proved to be a way of surmounting
the language barrier confronting Europeans in their encounters with Native Americans,
who were commonly viewed as primitive humans. The efficiency of gesture in establish-
ing communication implied that the original, natural, universal language of mankind
may have been gestural (see Kendon [2004] 2005: 22). The deaf were also commonly
equated with primitive man. Diderot (1713–1784) expressed the view that the gestural
communications of the deaf may give clues to the original structure of thought in Lettre
sur les Sourds et Muets ([1751] 1978) (see Fischer 1990: 35–58; Kendon 2005: 38). Their
signs attracted scholarly interest as the Abbé de l’Epée showed how they could be used
to teach French to the deaf and thus educate them.
Inquiry into the question of the origin of language unavoidably raised polemical issues
concerning religion and society, because it concerned man’s nature and his position
in the universe. At issue was whether God, nature, or man was responsible for our lan-
guage capacity. By using thought experiments to speculate about the historical origin of language (the birth of the language faculty in our ancestors in the empirically inaccessible past), philosophers aimed to gain insight into the perpetual origin of language, from which each and every instance of language use springs, and thus to throw light on how we obtain
knowledge. The debate called for new ideas about how language and knowledge are
related in order to advance learning and cultivate a new image of man, which brought
the Enlightenment philosophers into confrontation with three discourse traditions. As a
result, philosophy underwent a process of emancipation to free itself from dogmas (see
Aarsleff 2001: xiii–xv; Trabant 1996: 44–49, 2001: 6–9, 2003: 15–20, 29–34, 131–136):

(i) The Bible: According to Genesis, God created the first man, Adam, a fully evolved
human being equipped with the language capacity. The primary function of lan-
guage was cognition, which Adam exercised in naming the animals; according to
myth lingua Adamica possibly contained divine knowledge due to its divine origin
(see Aarsleff 1982: 281–283; Willard 1989: 134–137). Communication is a second-
ary function of language that can lead us astray from God’s will; rhetorical lan-
guage, appealing to and expressing passions (needs, desires, feelings, emotions),
caused the loss of Paradise, and so the body was instrumental in committing the
Original Sin.
(ii) Aristotle’s linguistic conception in De Interpretatione [On Interpretation] (trans.
[1938] 1996): The mind naturally receives conceptual images of the world; these
are the same for all human beings and constitute thought. The sole function of lan-
guage is the communication of thought. Different languages are different ways of
materializing the same universal ideas; the link between words and ideas is arbi-
trary and conventional (see Trabant 1996: 47–49; 2003: 29–34).
(iii) Cartesian dualism in Discours sur la Méthode ([1637] 1960): The mind is the soul
and it differentiates us from other animals and machines; it receives true, innate
ideas from God. The sole function of language is the communication of thought.
Natural movements which express passions are not part of language (Descartes
1960: 97).

2. Condillac
The Essai sur l’Origine des Connaissances Humaines ([1746] 1973) by Etienne Bonnot de Condillac (1714–1780) caused intense debate in the second half of the 18th century
(see Aarsleff 1982: 146–209; Ricken 1989: 301–309). There, he argues that the langage
d’action ‘the language of action’ gave rise to the human language faculty, which distin-
guishes us from other animals. His argument draws upon the concept of actio ‘action,
delivery’ in the Greek and Roman rhetorical tradition, which had been gaining in
importance since the 16th century (see Aarsleff 2001: xviii–xxiii; Kendon 2005: 20–38).
His proposals were further developed in Grammaire ([1775] 1970a) and La Logique
([1780] 1970b).
2.1. Condillac’s relationship to Locke


In his Essay Concerning Human Understanding ([1690] 1996), John Locke (1632–1704)
dismissed the myth that lingua Adamica was a code containing divine knowledge of the
laws of nature (see Willard 1989: 149). He countered Descartes by refuting the exis-
tence of innate ideas, arguing that knowledge has a natural, physical origin in experi-
ence and observation rather than a divine origin. He viewed the new-born mind as empty, like a “white paper void of all characters” (Locke 1996: 45), which becomes inscribed with simple ideas via two sources: (bodily) sensations and (mental) reflection.
His conception of thought is the mental manipulation of ideas independently of
words. Simple ideas combine to form complex ideas, and words tie ideas together
(see Locke 1996: 248). Whereas the mind passively receives its simple ideas, it actively
allocates names (see Locke 1996: 91). Hence words have an arbitrary and conventional
relation to their constituent ideas. In this respect, his view is Aristotelian (see Harris
and Taylor 1997: 131; Trabant 2003: 169). The vital difference is that he argues that
we do not have the same ideas or concepts of the world, because language is acquired
through individual experience and communication with others. Thus, individual varia-
tions in the collections of ideas tied together by words are bound to occur within a lan-
guage community, and even within an individual, as time goes by (see Locke 1996: 271).
Condillac selected key points that Locke had made, systematically organized the ex-
tracted material, reviewed and built on it. He undoubtedly admired Locke but criti-
cized him for assigning only a marginal importance to words in his epistemology. He
argued that ideas cannot combine without the intervention of signs and it is words, in
particular, that enable us to internally manipulate our ideas of the external world at
will, and thus gain control over it. He reduces the complexity of knowledge acquisition
to one single principle: “Ideas connect with signs, and it is, as I will show, only by this
means that they connect among themselves” (Condillac 2001: 5). Whereas Locke took
cognitive processes to be innate, Condillac argued that not only our ideas but also the development of our mental faculties depends on sensory data. By inventing a scenario in
which sign usage led to the origin of language, he aimed to show how language could
have a constitutive role in establishing and developing cognition (see Ricken 1989:
289; Trabant 2003: 171). While the expressive elements that enter into human commu-
nication are irrelevant to Locke’s epistemology, they are central to Condillac’s theory
that the language of action primed the mind for the language capacity that enables
true knowledge to be gained (see Aarsleff 2001: xv–xvii). Condillac thus surpasses
Locke’s empiricism and dismisses the need for the Cartesian gap between body and
mind.

2.2. Model of the perpetual origin of language


Condillac replaced Locke’s dual sources of knowledge (sensations and reflection) with
just one: transformed sensations create the mind and all the knowledge it contains. The
transformation of sensations generates increasingly powerful mental faculties in a lad-
der-like progression: perception, consciousness, attention, reminiscence, imagination,
contemplation, memory, reflection, abstraction and analysis, understanding and, on
the final rung, reason. His model delineates a scale that spans the gap between nature
at the bottom of the ladder and culture at the top, reflecting a phylogenetic process.
Countering Cartesian dualism, he asserted that all animals have a soul (mind) equipped
with all the mental faculties up to and including imagination. But only humans have
access to the higher cognitive processes beyond this threshold. Memory enables man
“to gain mastery of his own imagination and to give it new exercise” (Condillac
2001: 40). This voluntary use of the imagination makes the vital difference: It gives
us the ability to draw analogies between the world around us and our reactions to it,
which guides the development of our semiotic capacity. He distinguished three types
of signs (see Condillac 2001: 36):

(i) Accidental signs are chance repetitions of perceptions that unintentionally evoke
the same ideas;
(ii) Natural signs are sounds and movements that express affective states and are
established by nature;
(iii) Instituted signs are man-made and established by convention.

2.3. Story of the historical origin of language


The transition from natural signs to instituted signs is the crux of his evolutionary sce-
nario. Our semiotic capacity to create instituted signs is said to have primed the human
mind for language and rational thought, thus enabling our ancestors to climb beyond
the realm of nature and aspire to culture. A feedback mechanism between sign inven-
tion and intellectual growth enabled the mind to gain independence from external ob-
jects by enriching itself with its own internal objects upon which to reflect and hence to
develop the human capacity for reason.
Condillac’s story avoids confronting biblical dogmas directly. His scenario is situated
after the Flood, and so after Babel, where lingua Adamica was presumably lost, and
tells how the use of signs solved a communication problem. His protagonists are two
children of opposite sex who began to live together, which gave them the opportunity
to exercise their cognitive faculties. By naturally associating perceptions with cries of
emotion and bodily movement, they helped each other:

They usually accompanied the cries with some movement, gesture, or action that made the
expression more striking. For example, he who suffered by not having an object his needs
demanded would not merely cry out; he made as if an effort to obtain it, moved his head,
his arms, and all parts of his body. Moved by this display, the other fixed the eyes on the
same object, and feeling his soul suffused with sentiments he was not yet able to account
for to himself, he suffered by seeing the other suffer so miserably. From this moment he
feels that he is eager to ease the other’s pain, and he acts on this impression to the extent
that it is within his ability. Thus by instinct alone these people asked for help and gave it.
(Condillac 2001: 114–115)

Like the Roman poet Lucretius (98–55 BC), who integrated gesture into a heathen
version of Genesis in De Rerum Natura V [On the Nature of Things] (trans. [1924]
1992), Condillac assumed that sympathy is an innate human attribute. Crucially, the
first sign was naturally created in the observer’s mind (see Aarsleff 2001: xxv): He
linked what his senses were telling him and then acted instinctively to help his compan-
ion. Thereafter the children communicated about the world using cries and movements.
The use of natural bimodal signs established new relationships – between them as they
interacted, and between them and the external world as they tackled survival problems
together – leading to enhanced cognition and the first language: the language of action.
This emotion-based form of interaction and reference generated higher mental facul-
ties. The intentional use of natural signs gave them control over their memory, which –
together with the imagination and contemplation – established the faculty of analogy.
The ability to imagine and thus create retrievable signs brought about a mental libera-
tion from immediate circumstances and drove the transition from nature to culture.
Hence our semiotic capacity, rather than a soul, as in Cartesian dualism, differentiates
us from other animals.
As the number of bimodal signs in usage increased, so did the capacity for gaining
knowledge. Condillac does not say why the voice took over this function, but bimodal
signs were gradually superseded by mono-modal ones as dance and speech evolved sep-
arately (Fig. 25.1). Dance is conceived as a “strong and noble” (Condillac 2001: 118)
way of communicating to compensate for the limitations of primordial speech. Bodily
attitudes and actions later became subjected to rules in the danse des gestes ‘dance of
gestures’ which conveyed semantic content, and which, in turn, gave rise to the danse
des pas ‘dance of steps’ to express “certain states of mind, especially joy” (Condillac
2001: 118), as exemplified in Italian pantomime. Condillac celebrates the various
branches of the Greek and Roman rhetorical tradition (Condillac 2001: 120–155) as fruit-
ful evolutionary developments in the natural history of our semiotic capacity (see Trabant
2003: 172). Since “gestures, dance, prosody, declamation, music, and poetry” are “closely
interrelated as a whole and to the language of action which is their underlying principle”
(Condillac 2001: 156), aesthetics preceded epistemology, and imagination preceded
reason (see Aarsleff 2001: xvi). He thus exonerates the passions of the charge that, as in the Biblical account of the Fall, they infiltrated linguistic communication to the detriment of humanity.
[Fig. 25.1, a branching diagram: accidental signs, natural signs, and instituted signs; the language of action (movement + cries) divides into dance and speech, with the dance of gestures giving rise to the dance of steps.]

Fig. 25.1: Condillac’s (2001) conception of language evolution

Regarding the evolution of speech, he puts forward no clues as to how duality of pat-
terning could have emerged from cries of emotion. He addresses the problem of the
transition from unarticulated sounds to articulated speech mainly as a material (pho-
netic) rather than a structural (syntactic) or conceptual (semantic) one: The number
of words increased by chance, and young children learnt to pronounce more of them
while their vocal equipment was still flexible. This timely exercise extended their use
into adulthood and established speech as the dominant mode of communication. He
does not suggest what drove the transition – simply that “the use of articulated sounds
became so easy that they prevailed” (Condillac 2001: 116).
In the Essai (Condillac 2001), movement, gesture, and action only play a role in the
early stages of linguistic and cognitive development. In this respect, Condillac discusses
the case of a congenitally deaf-mute man from Chartres whose hearing was suddenly
restored at the age of 23. At first, he was reported to have been surprised at hearing
the sound of bells. Then, having silently listened to people’s conversations for some
months, and repeated in private the words he had heard, one day, he spontaneously
began to speak. Condillac concluded that, while still deaf, the young man was mentally
comparable to a “wild” (feral) child of nature, because deafness had isolated him from
speakers, to whom he could communicate only his essential needs by means of gesture,
and that without speech, his mind had remained in a primitive state (see Condillac 2001:
85–87). He was thus unable to climb the ladder from nature to culture in Condillac’s
model. However, Condillac modified his judgement of the mental capacities of the deaf
and upgraded his view of the semiotic potential of gesture in a later work, Grammaire
(1970a: 359–360), after visiting the Abbé de l’Epée’s school for the deaf in Paris and
witnessing his pupils’ abilities (see Fischer 1993: 431–437; 2011: 13–14). Epée’s analytic
method of using manual signs as the primary medium for educating the deaf convinced
Condillac that they can indeed grasp abstract concepts and thus achieve normal levels
of intellectual development. In Grammaire, he treats the language of action primarily as
a gestural phenomenon at length, and defines gesture as encompassing movements
of the arms, head, and the whole body, as well as facial and ocular expressions (see
Condillac 1970a: 354–355). Nature is said to have determined the first signs and
paved the way for us to imagine new ones. Consequently, we could express all our
thoughts in gestures just as well as we do in words (see Condillac 1970a: 357). He dif-
ferentiates two languages of action, “one natural, whose signs are given by a conforma-
tion of the organs; and the other artificial, whose signs are given by analogy. The former
is necessarily very limited; the latter is quite able to render understandable all of
man’s thoughts” (Condillac 1970a: 359, as quoted in Seigel 1995: 106). By “natural”
Condillac means signs determined by biological constitution, hence all animals have
a language of action specific to their species. By “artificial” he means man-made
signs whose semantic content has been analyzed and whose forms are determined by
analogy, as Epée’s methodical signs were, thus conceding that the language use required
to develop the intellect is, in fact, modality independent (see Fischer 2011: 13–14).
Crucially, he views analysis and analogy as basic complementary principles underlying
language and knowledge acquisition, and which originated in the language of action
(see Condillac 1970a: 365; Fischer 1993: 431–433; Harris and Taylor 1997: 139–154).

2.4. Analytic function of gesture


Condillac conceives of gesture as a natural form of expression that communicates holi-
stically by simultaneously integrating feelings and thoughts, whereas verbal language
analyses the contents of a message into components and arranges these in a linear
sequence (see Harris and Taylor 1997: 144). In the Essai, he states that “a single gesture
is often equivalent to a long sentence” (Condillac 2001: 141) and that early humans
learnt to analyze the contents of their minds by making the transition from holistic to
analytic gestures – from simultaneous to successive linguistic sequencing. Then the com-
munication channel began to switch between visual-kinesic and auditory-vocal modal-
ities. Speech was a by-product that became easier in the long run through habitual
usage.
Condillac discusses the transition from a natural language of gestural signs to an arti-
ficial language of speech in more depth in Logique (Condillac 1970b). There, he pro-
poses how holistic gestures gave rise to analytic signs in the mind of the viewer. Our
ancestors obey nature. They gesticulate without a plan. The viewer who “listens with
his eyes” (Condillac 1970b: 404, my translation) will not understand/hear (Condillac
uses the French verb entendre, which can have either meaning) what the signer is trying
to communicate if he does not “decompose” (analyze) what he sees. It is natural for him
to observe movements sequentially: He attends to the most striking movements first,
then to others, and thus mentally converts his holistic perception of an action into a lin-
ear sequence of distinct movements, each of which is coupled with a distinct idea. These
early humans realized that such decomposed actions are easier to understand than hol-
istic ones. An unconscious cognitive process then becomes conscious and deliberate. By
decomposing his gestures, the evolutionary signer decomposes his thought into its con-
stituent ideas to clarify it for himself. Others understand him because he understands
himself. Repetition reinforces the habit, and gestural language naturally evolves into
an analogical method for analyzing thought: “I say method because the succession of
movements will not be made arbitrarily and without rules. For since gesture is the effect
of one’s needs and circumstances, it is natural to decompose it in the order given by
those needs and circumstances. Although this order can and does vary, it can never
be arbitrary” (Condillac 1970b: 404–405, as quoted in Harris and Taylor 1997: 148).

2.5. Analogical nature of gesture


The mimetic potential of gesture is central to Condillac’s hypothesis, and isomorphism
underlies the analytic processing he proposes (see Harris and Taylor 1997: 148–154).
A decomposed gesture comprises components, each of which corresponds in form
and relations to an idea of which it is a sign. The co-analysis of gesture and thought
is based on visual analogy: “Since the whole pattern of gesture is the picture of the
whole thought, partial gestures are so many pictures of ideas that make up a part of
this whole. Thus, if this man decomposes these partial gestures, he will likewise decom-
pose the partial ideas they are signs of, and he will continually construct new and
distinct ideas” (Condillac 1970b: 405, as quoted in Harris and Taylor 1997: 148–149).
Gestural analysis enhanced the capacity to form analogies which, thanks to the
power of the human imagination, promoted creative thinking (see Takesada 1982:
47–58). As the signer’s natural capacity for analysis became methodical, it enabled
him to create a language device based on analogies drawn between extracts of reality
and gestural forms. Absolutely arbitrary (randomly chosen) signs would not have
been understood. The more transparent the analogy between a sign and a referent,
the clearer and more precise the idea that the sign conveyed (see Condillac 1970b: 406).
Condillac argues that speech could only have been imagined by being modeled on
the language of action by forming analogies. The movement changed medium. Kinesic
patterns primed sound patterns. Intonational amplitude reflected the amplitude of body
386 III. Historical dimensions

movements and substituted for them. The first languages relied heavily on intonation because of
the limited articulatory abilities of the first speakers. Sounds were intentionally varied
in tone to create semantic distinctions, as in tone languages like Chinese. As more
words were created, less “singing” was required to make semantic distinctions
(see Condillac 2001: 120–122).
In Grammaire, Condillac (1970a) elaborates on how the first vocal lexicon could
have expanded to include elements other than cries, equivalent to interjections, and
onomatopoeic expressions. He suggests that names for things that do not make noises
were formed analogously through a synesthetic transfer from the visual-gestural to the
audio-oral modality (see Condillac 1970a: 366–367), an idea that recalls how Plato
related mimetic movement and sound symbolism (Cratylus, trans. [1926] 1996: 133–
147). It is a contentious issue whether duality of patterning could have evolved from
interjections and onomatopoeic expressions. In both cases, they are holophrastic sounds
which are typically outside the phonetic range of a language system and not syntacti-
cally integrated into sentential structures; furthermore, whereas interjections have no
semantic dimension, onomatopoeic expressions have no pragmatic dimension (Trabant
1998: 115–146).
Condillac reasons that, originally, instituted signs were based on the principle of
analogy, and chains of analogies expanded the lexicon over time, i.e. they were moti-
vated signs with links that formed metaphoric matrices. The older the language, the
more it preserved characteristic traces of the language of action and exercised the imag-
ination. These traces fade as a language evolves diachronically. Hence, Ancient Greek
conveyed more vivid imagery than the relatively abstract French of his day (see Con-
dillac 2001: 141). He argues that in reforming natural languages for philosophical
inquiry, the principles of analysis and analogy must be applied if clear and precise
ideas are to be conveyed.

2.6. Natural word order


Condillac challenged the rationalist view that the subject-predicate (Subject Verb
Object) order of forming propositions was natural (see Aarsleff 2001: xxix–xxxi; Ricken
1989: 293; Trabant 2003: 176–177). He proposed that the natural word order of the first
vocal language was object-verb (OV): “The most natural order of ideas caused the
object to be placed before the verb, as in ‘fruit to desire’ ” (Condillac 2001: 157).
Here again, the language of action was formative. The referent (Object) occupied posi-
tion one because it was initially indicated by a gesture to draw the listener’s attention to
what the speaker was talking about. The verb (V) in position two communicated what
the speaker felt / thought about the referent. The subject (S) came to occupy position
three later. Conjugated verb forms sprouted from infinitives to substitute for deictic gestures, and other parts of speech followed. As purely vocal language was phased in, gesture filled in lexical gaps, articulating between articulated words. Syntax thus emerged
at the interface between two types of sign employing different media.
In Condillac’s view, it is simply habit that makes the word order Subject Verb Object
feel so natural to French native speakers that they fall prey to the préjugé ‘prejudice,
preconception’ that Subject Verb Object corresponds to the natural sequence of per-
ceptions and thoughts (see Condillac 2001: 173). Word order varies according to the
génie ‘genius’ of a language, i.e. its individual character that reflects the collective
character of the people who speak it (see Condillac 2001: 185–195; Trabant 2003: 175–
177). Génie is a consequence of the climate that the speakers live in and the type of gov-
ernment they have. It blossoms in the hands of great writers and withers in those of
mediocre ones. It is captured in the combination of ideas that words comprise, and
the connotations attached to them, as well as in the syntax that governs how words
may be combined to form text. For Condillac, language diversity reflected the many
possible ways of configuring thoughts, and all are equally natural (see Ricken 1989:
293). He considered the different grammatical structures of different languages to
offer different advantages. The inflectional forms of Latin declensions give a flexibility to word order, which allows a wide margin of stylistic creativity that is wonderful for composing poetry. Its génie gives full rein to the imagination. In contrast, the lack of declensions in French imposes the word order Subject Verb Object to establish grammatical relations between the nouns in a sentence. This yields constructions that link
perceptions and thoughts well. Its génie gives it simplicity and clarity that makes it
ideal for abstract reasoning. Condillac considered analysis and imagination to be antagonistic cognitive processes: a language that favors the one does so at the expense of the
other. A good analytic language, like French, exercises the reasoning that underpins
good philosophy, whereas a language that exercises the imagination more, like Latin,
inspires good poetry. “The most perfect language lies in the middle, and the people
who speak it will be a nation of great men” (Condillac 2001: 192), because they would
not only produce good science, but they would also bear the hallmark of good culture
by creating good art.

2.7. Summary
In the Essai, Condillac (2001) tells “a communicative story to explain the genesis of a
cognitive device” (Trabant 2001: 5), arguing that subjective expression gave rise to
objective denotation. Reacting to instinctive pity resulted in semiogenesis. Gaining
mental control over natural signs led to the invention of conventional ones. Semiogen-
esis led to glottogenesis. In Grammaire (Condillac 1970a) and Logique (Condillac
1970b), he develops the idea that gestural language established the faculty of analogy
by giving holistic thought an analytic linear structure and led to speech via cross-
modal transfer. The analogical principle underlying the language of action provided a
blueprint for metaphoric matrices to build up the lexicon.
In sum, Condillac shook all three discourse traditions with regard to the origin of
language. His recognition, inspired by Locke, that language influences the way that peo-
ple think marked a break with the Aristotelian tradition. He re-embodied the human
mind that Cartesian rationalism had severed from the body by proposing that it is gener-
ated by transforming sensations. God is sidestepped rather than eliminated in his hypoth-
esis of how man made language, allowing him to make a significant contribution to the
advancement of philosophy outside the realm of theology. His epistemology and theory
of language origins came to be known as sensualist (or sensationalist) philosophy.

3. French sign language


Interestingly, Descartes viewed language as modality independent. It did not matter to
him which bodily organs are used to express thought. Recognizing that people who are
unable to hear or speak invent their own signs to communicate their thoughts, he considered sign language to be a form of language as valid as speech (Descartes
1960: 96–97; Trabant 2003: 135–136). But, generally, up to the 16th century, “the deaf
were almost universally relegated to the class of idiots and madmen” (Berthier
[1840] 2006: 164). This view changed radically in the 18th century thanks to the pioneer-
ing work of enlightened French teachers and their deaf pupils, who proved the falsity of
this verdict of the speaking majority.

3.1. De l’Epée
The abbé Charles-Michel de l’Epée (1712–1789) is known as the inventor of the ges-
tural method of educating the deaf. In the 1760s he founded, and privately funded,
the first public school for the deaf in the world, in Paris. It was then commonly believed
that without speech, intellectual development was just not possible (see Berthier 2006:
168). This is why teaching the deaf to speak was considered the first priority if they were
to be integrated into society. Epée polemicized against this practice. He argued that, in
order to teach the deaf to think, it was better to teach them written French first, by com-
municating in the modality they naturally use among themselves, just as he had been
taught Latin by means of his own native language (see Lane [1984] 2006: 49). He learnt
the sign language of his pupils but thought that it lacked rules (Lane 2006: 7). So he in-
vented a system of signes méthodiques ‘methodical signs’ for analyzing and representing
lexical and grammatical elements of French. Words were written on the blackboard or
cards and first fingerspelled to memorize them. Words were correlated with ideas by
means of gesture. A word with a concrete meaning, like “door”, was explicated deicti-
cally. A word with an abstract meaning was explicated gesturally by analyzing its con-
stituent ideas, each of which was shown by means of a “natural” (motivated) sign for a
concrete referent that the pupils already understood. Hence, the meanings of complex
metaphysical terms were “made visible” using “natural signs, or those made natural by
analysis” (Epée 1776: Part II, 38; as quoted in Fischer 1993: 435). As an example,
Fischer (1993: 433) gives Epée’s (1776) analysis of the complex idea of je crois
‘I believe’ into simple ones that were expressed gesturally:

I believe =
    I say “yes” with the mind. (I think that “yes”.)
    I say “yes” with the heart. (I like to think that “yes”.)
    I say “yes” with the mouth.
    I have not seen, and do not see now, with my eyes.

Here one recognizes Condillac’s underlying principles of analysis and analogy for creating artificial signs in the language of action (see Fischer 1993: 431–437; 2011: 14), as Condillac himself acknowledged. Four years after Epée had published his methodology in Institution des Sourds et Muets par la Voie des Signes Méthodiques (1776), Condillac praised his method of instruction in Grammaire, stating that the ideas it conveyed
were “more exact and more precise” (Condillac 1970a: 359, my translation) than those
usually acquired with the help of hearing. The subtitle of Epée’s (1776) book, “A work
that contains the project of a universal language, by the intermediary of natural signs
subjected to a method” (my translation) indicates his belief in the superior efficiency
of gestural signs. Epée later quoted Condillac’s remark in the book’s second edition,
published in 1784 (Fischer 1993: 434), but his belief that gestural signs could be per-
fected to become the universal language sought by philosophers did not gain support.
However, his work thrived, and many schools throughout Europe were founded on
his method of using sign to educate the deaf, marking a breakthrough in recognizing
their status as human beings with normal intellectual capacities (see Lane 2006: 50).

3.2. Sicard
The abbé Roch-Ambroise Cucurron Sicard (1742–1822) succeeded Epée as the princi-
pal of the school for the deaf in Paris in 1789 and established it as the model for many
others that opened in Europe and America (Lane 2006: 9). Although admiring Epée as
a great pioneer, Sicard was critical of his method, which in his view made the deaf into
“automatic copyists of signed French into written French without any understanding of
what they were writing” (Lane 2006: 81), so they were unable to compose their own
sentences (see Sicard [1803] 2006: 94). To redress this deficit, Sicard extended Epée’s
method, explicitly applying Condillac’s sensualistic philosophy in a kind of metaphysical
experiment (see Fischer 1993: 437–444; 2011: 15–19).
Sicard considered the uneducated deaf to be “absolute savages” (Sicard 1808: 1: 15–16;
as quoted in Fischer 1993: 442), who required a method for learning French that corre-
sponded to their “primitive” state, so they could become human and hence part of soci-
ety (see Fischer 1993: 437–439). In his Cours d’Instruction d’un Sourd-Muet de
Naissance (2006), he describes how he educated his first and most famous pupil, Jean
Massieu (1772–1846), in a process that parallels Condillac’s evolutionary transition
from natural to artificial signs. He systematically applied the principle of analogy to
show that a drawing (motivated signifier) can be replaced by a word (conventional sig-
nifier) to represent something, and the principle of analysis to signify part-whole rela-
tions, for example, a “tree” is composed of “roots”, “trunk”, and “branches” (see Sicard
2006: 111), thus enabling Massieu to classify his growing knowledge. Gesture was used
to explicate simple concrete terms, then complex abstract ones. In contrast to Epée,
Sicard never mastered the sign language of his pupils (Lane 2006: 10), and he stressed
the importance of deaf pupils supplying their own manual signs to “replace spoken
language” (Sicard 2006: 97), as these were most suited to their primitive level of intel-
lectual development. Sicard’s approach is demonstrated in his Théorie des Signes
(1808), a dictionary in two volumes, inspired by an idea of Epée’s (Berthier 2006:
185). Volume one gives examples of how concrete terms are decomposed into their sim-
ple constituent ideas and explicated by enacting pantomimic scenes, for which deaf
pupils provided the sign sequences described by Sicard. Volume two treats abstract
terms and grammar.
In the words of Roch-Ambroise Auguste Bébian (1789–1834), a hearing person who
attended Sicard’s school and became fluent in the sign language of his deaf school
friends, Sicard may have completed and perfected Epée’s approach by giving the
deaf a way of obtaining a satisfactory grammatical translation of French:

But, one senses, the more profoundly these signs decompose the sentence – thus revealing
the structure of French – the further they get away from the language of the deaf, from
their intellectual capacities and style of thinking. That is why the deaf never make use
of these signs among themselves; they use them in taking word-for-word dictation, but to
explain the meaning of the text dictated, they go back to their familiar language. (Bébian
[1817] 2006: 148)

Bébian ended the practice of educating the deaf in a manual version of French rather
than their own sign language (Lane 2006: 127). It took the intelligence and persever-
ance of deaf people themselves to prove the true value of their native language. To
name but a few outstanding deaf pioneers: Pierre Desloges (1747–1799) wrote the
first book by a deaf person, in which he defended Epée’s method and discussed the
sign language of his Parisian deaf community (see Lane 2006: 28–29). Laurent Clerc
(1785–1869) was a school friend of Bébian’s who left France for America, where he
co-founded the first school for the deaf in 1817 with Thomas Hopkins Gallaudet
(1787–1851) (see Lane 2006: 9–10, 127–128). And Ferdinand Berthier (1803–1886), a
pupil of Clerc’s, achieved full professorship at the age of 26, was a prolific French
author, and the first deaf person to receive the Legion of Honor, the highest decoration
in France (see Lane 2006: 161).
Despite the success of the manual approach to educating the deaf, the 19th century saw a return to emphasizing speech and lip reading, a fate that was sealed by the Congress of Milan in 1880, which resolved that the deaf should first and foremost learn
to speak. Sign language was banished from classrooms worldwide until the mid 20th
century, when it attracted the attention of language scientists (see Lane 2006: 1–4).

3.3. Summary
Condillac’s sensualist philosophy inspired the attempts of Epée and Sicard to teach the
deaf French by translating written words into gestural signs and thus facilitate their
social integration. This led to recognizing the superiority of their own sign language
for educational purposes. However, the light thrown on the génie of their language in
the 18th century almost faded to extinction at the end of the 19th century but was
rekindled in the mid 20th century.

4. Conclusion
By giving the body a central role in forming the mind, Enlightenment philosophy
shifted the perspectives on what language is and how its communication function is in-
timately related to cognition. Considered as a vestige of our phylogenetic past, gesture
became a focus of interest in the debate on the origin of language. In Condillac’s treat-
ment of the language of action that underpinned his natural history of man, gesture was
constitutive in forming the first naturally motivated signs, and in creating blueprints for
man-made signs and hence language, which promoted creative thinking and intellectual
evolution. The analogical and analytic principles underlying his conception of the lan-
guage of action found application in Epée’s and Sicard’s experiments with French sign
language, proving that the deaf are indeed capable of reason. Despite the acclaimed
natural potential of gesture for conveying clear and precise ideas, the proposal that
sign language could be developed into a universal philosophical language was not
pursued.

5. References
Aarsleff, Hans 1982. From Locke to Saussure. Essays on the Study of Language and Intellectual
History. Minneapolis: University of Minnesota Press.
Aarsleff, Hans 2001. Introduction. In: Etienne Bonnot de Condillac, Essay on the Origin of Human
Knowledge, xi–xxxviii. Translated and edited by Hans Aarsleff. Cambridge: Cambridge Univer-
sity Press.
Aristotle 1960. Posterior Analytics. Edited and translated by Hugh Tredennick. Topica. Edited and
translated by E. S. Forster. The Loeb Classic Library. Cambridge, MA: Harvard University
Press. First published [1930].
Aristotle 1992. On Sophistical Refutations, On Coming-to-Be and Passing Away. Translated by
E. S. Forster; On the Cosmos. Translated by D. J. Furley. The Loeb Classic Library. Cambridge,
MA: Harvard University Press. First published [1955].
Aristotle 1996. On Interpretation. In: Categories. On interpretation. Edited and translated by
Harold P. Cooke, Prior Analytics. Edited and translated by Hugh Tredennick. The Loeb Classic
Library. Cambridge, MA: Harvard University Press. First published [1938].
Bacon, Francis 2011. The Works of Francis Bacon. A New Edition. Edited and translated by
Basil Montagu. 1825–1836. 16 Volumes. London: British Library, Historical Print Editions.
First published [1620].
Bébian, Roch-Ambroise Auguste 2006. Essai sur les sourds-muets et sur le langage naturel. In:
Harlan Lane (ed.), The Deaf Experience. Classics in Language and Education, 129–160. Wash-
ington, DC: Gallaudet University Press. First published [1817].
Berthier, Ferdinand 2006. Les sourds-muets avant et depuis l’Abbé de l’Epée. In: Harlan Lane (ed.),
The Deaf Experience. Classics in Language and Education, 163–203. Washington, DC: Gallau-
det University Press. First published [1840].
Condillac, Etienne Bonnot de 1970a. Grammaire. In: Œuvres Complètes, Volume 5, 351–625. Reprint
of the Paris edition 1821–1822. Geneva: Slatkine Reprints. First published [1775].
Condillac, Etienne Bonnot de 1970b. La logique. In: Œuvres Complètes, Volume 15, 319–463. Reprint
of the Paris edition 1821–1822. Geneva: Slatkine Reprints. First published [1780].
Condillac, Etienne Bonnot de 1973. Essai sur l’Origine des Connaissances Humaines, Ouvrage où
l’on Réduit à un Seul Principe tout ce qui Concerne l’Entendement. Edited by Charles Porset.
Auvers-sur-Oise, France: Galilée. First published [1746].
Condillac, Etienne Bonnot de 2001. Essay on the Origin of Human Knowledge. Translated and edi-
ted by Hans Aarsleff. Cambridge: Cambridge University Press. First published [1746].
De l’Epée, Charles-Michel 1776. Institution des Sourds et Muets par la Voie des Signes Méthodi-
ques. Ouvrage qui Contient le Projet d’une Langue Universelle, par l’Entremise des Signes Nat-
urels Assujettis à une Méthode. Paris: Nyon l’Aı̂né.
De l’Epée, Charles-Michel 2006. La véritable manière d’instruire les sourds et muets, confirmée
par une longue expérience. In: Harlan Lane (ed.), The Deaf Experience. Classics in Language
and Education, 51–72. Washington, DC: Gallaudet University Press. First published [1784].
Descartes, René 1960. Discours de la Méthode pour bien Conduire la Raison et Chercher la Vérité
dans les Sciences. Paris: Garnier. First published [1637].
Diderot, Denis 1978. Lettre sur les sourds et muets à l’usage de ceux qui entendent et qui parlent.
Edited and presented by Jacques Chouillet. In: Œuvres Complètes, Volume 4. Edited by Jean Fabre,
Herbert Dieckmann, Jacques Proust, and Jean Varloot. Paris: Hermann. First published [1751].
Fischer, Renate 1990. Sign language and French Enlightenment. Diderot’s “Lettre sur les sourds
et muets”. In: Siegmund Prillwitz and Tomas Vollhaber (eds.), Current Trends in European
Sign Language Research. Proceedings of the Third European Congress on Sign Language
Research, Hamburg, July 26–29, 1989, 35–58. Hamburg: Signum Press.
Fischer, Renate 1993. Language of action. In: Renate Fischer and Harlan Lane (eds.), Looking
Back. A Reader on the History of Deaf Communities and Their Sign Languages, 429–455. Ham-
burg: Signum Press.
Fischer, Renate 2011. Der gestische Sprachursprung – Szenarien um 1800. Das Zeichen 87: 12–22.
Harris, Roy and Talbot J. Taylor 1997. Landmarks in Linguistic Thought I. The Western Tradition
from Socrates to Saussure. 2nd edition. London: Routledge. First published [1989].
Kendon, Adam 2005. Gesture. Visible Action as Utterance. Reprint with corrections. Cambridge:
Cambridge University Press. First published [2004].
Lane, Harlan 2006. The Deaf Experience. Classics in Language and Education. Washington, DC:
Gallaudet University Press. First published [1984].
Locke, John 1996. An Essay Concerning Human Understanding. Abridged and edited by John W.
Yolton. London: Everyman. First published [1690].
Lucretius Carus, Titus 1992. De Rerum Natura V. Translated by W. H. D. Rouse and revised by
Martin Ferguson Smith. The Loeb Classic Library. Cambridge, MA: Harvard University
Press. First published [1924].
Plato 1996. Cratylus. In: Cratylus. Parmenides. Greater Hippias. Lesser Hippias. Translated by
H. N. Fowler. The Loeb Classic Library. Cambridge, MA: Harvard University Press. First
published [1926].
Ricken, Ulrich 1989. Condillac: Sensualistische Sprachursprungshypothese, geschichtliches
Menschen – und Gesellschaftsbild der Aufklärung. In: Joachim Gessinger and Wolfert von
Rahden (eds.), Theorien vom Ursprung der Sprache, 287–311. Berlin: De Gruyter Mouton.
Rousseau, Jean-Jacques 1968. Essai sur l’Origine des Langues où il est Parlé de la Mélodie et de
l’Imitation Musicale. Edited by Charles Porset. Paris: Nizet. First published [1781].
Seigel, Jules Paul 1995. The Enlightenment and the evolution of a language of signs in France and
England. In: Nancy S. Struever (ed.), Language and the History of Thought, 91–110. Rochester,
NY: Boydell and Brewer.
Sicard, Roch-Ambroise Cucurron 2006. Cours d’instruction d’un sourd-muet de naissance, pour
servir à l’éducation des sourds-muets, et qui peut être utile à celle de ceux qui entendent
et qui parlent. In: Harlan Lane (ed.), The Deaf Experience. Classics in Language and Educa-
tion, 83–126. Washington, DC: Gallaudet University Press. First published [1803].
Sicard, Roch-Ambroise Cucurron 1808. Théorie des Signes pour l’Instruction des Sourds-Muets,
dédié à S. M. l’Empereur et Roi. Suivi d’une notice sur l’enfance de Massieu. 2 volumes.
Paris: L’Imprimerie de l’Institution des Sourds-Muets.
Takesada 1982. Imagination et langage dans l’essai sur l’origine des connaissances humaines de
Condillac. In: Jean Sgard (ed.), Condillac et les Problèmes du Langage, 47–57. Geneva:
Slatkine.
Trabant, Jürgen 1996. Thunder, girls, and sheep, and other origins of language. In: Jürgen Trabant
(ed.), Origins of Language, 39–69. Budapest: Collegium Budapest.
Trabant, Jürgen 1998. Artikulationen. Historische Anthropologie der Sprache. Frankfurt am Main:
Suhrkamp.
Trabant, Jürgen 2001. Introduction: New perspectives on an old academic question. In: Jürgen
Trabant and Sean Ward (eds.), New Essays on the Origin of Language, 1–17. Berlin: De Gruy-
ter Mouton.
Trabant, Jürgen 2003. Mithridates im Paradies. Kleine Geschichte des Sprachdenkens. Munich:
C. H. Beck.
Vico, Giambattista 1990. Princı̀pi di Scienza Nuova d’Intorno alla Comune Natura della Nazioni.
Third edition. Edited by Andrea Battistini. Milan: Mondadori. First published [1744].
Willard, Thomas 1989. Rosicrucian sign lore and the origin of language. In: Joachim Gessinger and
Wolfert von Rahden (eds.), Theorien vom Ursprung der Sprache, Volume 1: 131–157. Berlin:
De Gruyter Mouton.

Mary M. Copple, Berlin (Germany)



26. 20th century: Empirical research of body, language, and communication
1. Introduction
2. Wilhelm Wundt: A semiotic classification of gestures
3. David Efron: Gestures as expression of culture
4. The post-war period and the paradigm of nonverbal communication – gesture and speech
as separate channels
5. Gesture studies in the 70s and 80s: The emergence of modern research on gestures
6. Modern gesture research: A wanderer between fields
7. Conclusion
8. References

Abstract
The chapter presents a concise overview of empirical research on body, language, and communication in the 20th century. Starting from a discussion of Wundt’s semiotic classification of bodily behavior (Wundt 1921) and the work of Efron (1972) on gestures as expressions of culture, the paper moves on to empirical research on body, language, and communication in the post-war period and the decline of interest from the 1950s to the 1970s under the research paradigm of nonverbal communication. After briefly discussing the work of Pike (1972) and Birdwhistell (1970), the paper traces the emergence of modern gesture research in the 70s and 80s and introduces the main research paradigms established, in particular within gesture studies. The paper then discusses modern-day gesture research within the fields of psycholinguistics/psychology, interaction studies, linguistics and semiotics, cognitive linguistics, conversation analysis and the ethnography (of communication), as well as the field of artificial intelligence, showing that present research on body, language, and communication is a wanderer between disciplines, first and foremost characterized by its interdisciplinary nature.

1. Introduction
At the end of the 19th century, research on body, language, communication, and in par-
ticular gestures reached a respectable and central position in the sciences. Works of
scholars such as Mallery (1972 [1881]), Tylor (1964 [1870]), and de Jorio (2000 [1832])
contributed substantively to ongoing scientific interest and were at the heart of
central scientific questions.
However, this interest declined substantially after the influential work of Wilhelm Wundt (1921) at the beginning of the 20th century. Topics of the 19th century, such as the evolution of language, the nature of sign languages, and the universality and naturalness of gestures, were no longer at the center of scientific interest, and only a few works on body, gestures, and language emerged in the early 20th century. Only at the beginning of the 1970s did the interest in gestures rise again. Based on several influential publications on gestures, language, and the body, a diversified range of research characterized by its interdisciplinary nature developed. Today, research on
body, language and communication wanders between different scientific fields such as
Anthropology, Psychology, (Cognitive) Linguistics, Semiotics, Conversation Analysis, Neurology, Primatology, and Artificial Intelligence.

2. Wilhelm Wundt: A semiotic classification of gestures


In the first volume of his tome Völkerpsychologie: Eine Untersuchung der Entwicklungs-
gesetze von Sprache, Mythos und Sitte. Band 1 Die Sprache (1921), the psychologist
Wilhelm Wundt traces the development of language from expressive movements
(“Ausdrucksbewegungen”). By comparing gestures of the deaf, traditional gestures of
North American Indians and Cistercian monks as well as coverbal and traditional
gestures of Neapolitan speakers, Wundt proposes a semiotic classification of gestures
(“Gebärden”) which aims at their psychological nature and the various cognitive prin-
ciples and mechanisms involved in their sign creation processes (see Müller, Ladewig,
and Teßendorf in preparation for a discussion of the term “Gebärden”). Wundt’s goal is
thereby an etymology: a classification of “Gebärden” from a genetic viewpoint (Wundt
1921: 162).
Gestures, according to Wundt, developed from expressive movements that originally served to express emotions and lacked any overt communicative character. Yet when observed by others, expressive movements evoke similar feelings, and in this way emotions or affective states come to be shared among others. The development of gestures and sign languages thereby offers a way of examining psychological as well as sign
creation processes operating in the development of language by demonstrating similar
or comparable processes of symbolization. They inform us about the nature of language
and show how even arbitrary signs may evolve from simple iconic relations through
abstract processes of conceptualization, reflecting characteristics of the inherent language
ability.
Along with the classification of gestures that traces their development from expres-
sive movements via processes of metonymical or metaphorical abstraction to arbitrary
signs, Wundt proposes a highly differentiated account of gestures. By distinguishing the
various types of gestures according to their mode of denotation, i.e., denotation
("Bezeichnung") versus co-denotation ("Mitbezeichnung"), Wundt (1921) is able to
trace the semiotic processes taking effect in gestures. For the first time, Wundt
convincingly shows the complex and interwoven semiotic processes that come into
effect when moving from perceived objects to imaginary objects and their symbolic
representation in gestures.
Wundt’s classification represents an essential contribution to the study of gestures
even until today, as many of his ideas and concepts have been taken up only recently
(Fricke 2007; Mittelberg 2006; Teßendorf 2008). Moreover, it laid the basis for many
classification systems of gesture to come (see Efron [1941] 1972; Ekman and Friesen
1969; Fricke 2007).
Wundt’s work remains the only promising work at the beginning of the 20th century.
Wundt’s work appeared at a time in which interest in gestures and topics associated
with the study of gestures became insignificant. The decline in topics of the 19th cen-
tury, the abandonment of sign languages from discussion, the rise of behaviorism, an
ever-growing interest in psychology and the study of behavioral aspects that were
beyond the conscious control of the individual, fostered the move away from the
26. 20th century: Empirical research of body, language, and communication 395

body and gestures (see Kendon 1982, 2004b). When gestures or bodily forms of
expression were of interest, it was only with respect to what they might reveal about the
underlying motivations, character, or personality of the individual (see Krout 1935; Wolff
1945), or with respect to the expression of emotion (Allport and Vernon 1933; Bruner and
Tagiuri 1954).
With the rise of structural linguistics and its focus on abstract aspects
of the language system, gestures also had no appropriate place in the science
of language. Anthropologists such as Franz Boas and Edward Sapir accepted gestures
as part of a wide variety of communicative behavior. Sapir, for instance, stated that "among the pri-
mary communicative processes of society may be mentioned: language, gestures, in its
widest sense" (Sapir 1949: 104), and noted that gestures may play a role in determin-
ing the meaning of utterances (Sapir 1949). In general, however, gestures were not con-
sidered part of linguistics proper. Bloomfield's dismissal of gestures may be seen as
the standard view in linguistics at that time: "To some extent, individual gestures are
conventional and differ for different communities", but "most gestures scarcely go
beyond an obvious pointing and picturing" (Bloomfield [1933] 1984: 39).

3. David Efron: Gestures as expression of culture


An exception at that time is David Efron's study Gesture, Race and Culture (1972) on
the everyday use of gestures by assimilated and non-assimilated Eastern European Jews and
Southern Italian immigrants in the New York of the 1930s.
Efron designed his study with the goal of empirically testing theories of racially deter-
mined forms of communication, aiming to answer the question of whether gestural
behavior is innate or culturally determined. Efron's study is therefore characterized by
a broad empirical basis and by methodological diversity and accuracy. It is the first study
that tries to account for gestures on both quantitative and qualitative levels: quantita-
tively through counts of formal aspects of gesture use, analyses of motion
pictures, and the use of graphs and charts; and qualitatively through direct
observation of gestural behavior in natural situations, i.e., cafés and public places, and
the use of sketches and notations (Efron 1972: 41, 66).
Efron shows culture-specificity in a range of different aspects of gesture use, such as
the parts of the body used for gesturing, the size, and axes of movements as well as the
distance and use of space between speakers. For example, Italians and Jews differ in the
radius of their gestures (large, whole arm movements vs. wrist movements), the tempo
of movement (smooth vs. abrupt direction changes) as well as type of gestures used
(symbolic vs. discursive gestures). Moreover, Efron shows a decrease of traditional
gestural behavior along with an adaptation of gestural behavior to American
culture. "These results, tentative as they are, point to the fact, that gestural behavior,
or the absence of it, is, to some extent at least, conditioned by factors of a socio-
psychological nature […].” (Efron 1972: 160) The use of gesture is thus determined
culturally and not biologically.
Apart from its statement on the cultural basis of gestures, the study proposes a ges-
ture classification that would become the basis for the majority of classifications to follow.
Moreover, it places gesture studies on a new empirical level and must thus clearly
be seen as a milestone in the research on body, language, and communication.
4. The post-war period and the paradigm of nonverbal communication – gesture and speech as separate channels
Thinking about the body, language, and communication in the post-war period is substan-
tially influenced by cybernetics (see Ruesch and Bateson 1951) and leads to a conception of
human behavior which views communication as an exchange via verbal and/or nonverbal
channels with clearly different functions. While speech (digital sign) is allocated to express
and transmit content, bodily signals (analog sign), such as gestures, facial expressions, etc.,
are viewed as expressing solely emotional states as well as personality traits (Argyle 1989;
Bateson 1968; Ruesch and Kees 1956; Scherer and Walbott 1984; Watzlawick, Bavelas, and
Jackson 1969). This perception not only leads to a “splitting of the subject matter into ver-
bal and nonverbal signs, which is suggested anyway by the term ‘nonverbal communica-
tion’, but also allocates distinct functions to both areas of expression.” (Müller 1998: 66,
translation JB) Moreover, the interrelation of both modalities is lost sight of in these stu-
dies, as bodily expressions are analyzed independently of the verbal utterance, resulting in
usually very general statements about gesture: gestures illustrate and exemplify or modify
and replace the verbal utterance (Scherer and Walbott 1984).
The term "nonverbal communication", first used by Ruesch and Kees (1956),
not only becomes the primary term for the understanding
of communication in various scientific fields but, more importantly, becomes domi-
nant and even hinders the way of thinking about and the means of studying ges-
tures until today (see for example Matschnig 2007; Pease and Pease 2003).
An attempt at systematizing “nonverbal” behavior, and especially gestures, within this
paradigm was presented by Ekman and Friesen (1969). Similar to Wundt (1921), Ekman
and Friesen aim at investigating the biological and cultural anchoring of nonverbal behavior,
and discuss how a particular nonverbal behavior is acquired and made part of the individual
behavioral repertoire. Ekman and Friesen develop a classification of nonverbal behavior
which, among others, investigates gestures, facial expressions, body positions, etc. on the
basis of their origin (acquisition and inclusion in the behavioral repertoire), usage and cod-
ing (semiotic relation). Based on this threefold differentiation, Ekman and Friesen make a
distinction between emblems, illustrators, regulators (nonverbal behavior which regulates the
flow of interaction), affective displays (movements triggered by the seven basic emotions),
and adaptors (e.g., learned movements to satisfy bodily needs or to execute bodily actions).
Their main interest is thereby focused on emblems, i.e., gestures which have a "direct verbal
translation, or dictionary definition" (Ekman and Friesen 1969), are known to the members
of a culture, and are used consciously and deliberately. In subsequent years, Ekman and
Friesen’s work on emblems inspires a large body of emblem collections from different lan-
guages and cultures (for an overview see Müller 1998; Kendon 2004b; Payrato volume 2;
Teßendorf this volume). Apart from becoming one of the best-known classifications of ges-
ture, it has become the one with the most far-reaching consequences for the understanding
and description of gestures (see Fricke 2007; Müller 1998 for a detailed discussion).

5. Gesture studies in the 70s and 80s: The emergence of modern research on gestures
“Despite the growth of linguistics” in the first half of the 20th century, “and a greatly
increased concern with communication, on the other, gesture remained largely
unstudied because it was left without a theoretical framework into which it could be
really fitted.” It “fell between stools.” (Kendon 2004b: 72)
This situation gradually changes at the beginning of the 1970s, as research on the
body, language and especially gestures returns in a number of scientific fields (e.g.,
anthropology, linguistics, and psychology), leading to the development of new perspec-
tives and theoretical frameworks for the study of gestures (see Kendon 1982, 2004b for
a detailed discussion).
Pike (1967, 1972) and Birdwhistell (1970, 1974, 1975) are among the first to put for-
ward proposals for bringing together body and language in the study of communication
from a theoretical as well as methodological viewpoint. Pike and Birdwhistell object
to the view of communication as divided and analyzed into discrete and indepen-
dent channels (e.g., verbal and nonverbal). Rather, communication must be under-
stood as a shared and common system of behavior patterns, a "multichannel system"
(Birdwhistell 1979: 193). Based on this assumption, Pike and Birdwhistell both approach
this question with structuralist methods, yet each with a different aim and scope.
Pike (1967, 1972) develops a theoretical framework and method with which verbal and
nonverbal activities can be conceived of as a structural whole (Hübler 2001: 220).

The activity of man constitutes a structural whole, in such a way that it can not be subdi-
vided into neat “parts” or “levels” or “compartments” with language in a behavioral com-
partment insulated in character, content, and organization from other behavior. Verbal and
nonverbal activity is a unified whole, and theory and methodology should be organized or
created to treat it as such. (Pike 1967: 26)

Birdwhistell (1970, 1974, 1975), on the other hand, develops his program of “kinesics”, a
science of communication by bodily behavior (gestures, facial expressions, etc.). In the
view of Birdwhistell, bodily behavior can be analyzed analogously to verbal utterances.
With the help of descriptive and structural linguistics, he examines bodily behavior
on the different levels of language (phonetics, phonology, morphology, and syntax),
and subdivides bodily expressions into distinct hierarchical units (kines, kinemes,
kinemorphs, complex kinemorphs, and kinemorph constructions).
Another step that moves the analysis of body and speech into the focus of
attention is the work by Condon and Ogston (1966, 1967). Based on microanalyses of
motion pictures, Condon and Ogston are able to show a “precise correlation between
changes of body motion and the articulated patterns of the speech stream.” (Condon
and Ogston 1967: 227) This correlation of body movements and speech encompasses
different hierarchical levels of speech, such that body movements may align with
“sub-phones”, phones, syllables, words, phrases, and higher-level units. This systematic
co-occurrence of body movements with speech seems to be based on the segmental
breath pulse (see Furuyama 2002; Tuite 1993), leading to the impression that “the
body dances in time with speech” (Condon and Ogston 1967: 225).
This tight interrelation of speech and body movements is picked up by Ken-
don (1972), who puts forward an examination of various types of bodily movement and shows
that "just as the flow of speech may be regarded as an hierarchically ordered set of
units, so we may see the patterns of body motion that are associated with it
as organized in a similar fashion, as if each unit of speech has its ‘equivalent’ in
body motion.” (Kendon 1972: 204) Contrary to Condon and Ogston, Kendon goes
beyond the mere correlation of units of speech with units of bodily expressions. Rather,
he shows that the larger the speech unit, the more body parts are involved, and con-
cludes: “it is as if the speech production process is manifested in two forms of activity
simultaneously: in the vocal organs and also in the bodily movement, particularly
movements of the hands and arms.” (Kendon 1972: 205) Speech and gesture are
“under the guidance of the same controlling mechanism” (Kendon 1972: 206), and
even more so “appear together as manifestations of the same process of utterance.”
(Kendon 1980: 208)
A similar view is expressed by McNeill (1979, 1985, 1986). Gestures “share
with speech a computational stage; they are, accordingly, parts of the same psycho-
logical structure” (McNeill 1985: 350). Moreover, gestures allow insights into
the mental representation on which speech is based. Gestures are a “channel of
observation onto the speaker’s mental representation.” (McNeill 1986: 108; see
also McNeill 1985)
Kendon’s works and in particular McNeill’s (1985) statement of gestures and speech
sharing a “computational stage” triggers a lively debate on the relation and production
of gesture and speech (Butterworth and Hadar 1989; Feyereisen 1987; McNeill 1989).
Initiated by this debate, publications on gestures abruptly increased in the 90s and
the topic of gesture and speech being guided by one mental process becomes the
main paradigm of the newly arising gesture research.

6. Modern gesture research: A wanderer between fields


6.1. Body, language, and communication in the field of
psycholinguistics and psychology: Gestures as a “window
onto thinking.”
David McNeill’s book Hand and Mind (1992) sets off a new era in gesture research as
it integrates for the first time the implications of the interwoven nature of gesture and
speech (see Condon and Ogston 1967; Kendon 1972, 1980) into a theory of utterance
production. McNeill argues that “gestures are an integral part of language much
as are words, phrases, and sentences – gesture and language are one system.”
(1992: 2) Gesture and speech, according to McNeill, "arise from a single process of
utterance formation" resting upon an "imagistic" (expressed in gestures) and a "lin-
guistic" (expressed in speech) side. Imagistic and linguistic thinking are brought
together by means of a self-organizing process, i.e., a dialectic of thinking being ex-
pressed in the "growth point." Although gesture and speech express the same con-
cept, the modes of expression of the two differ. While speech splits meaning up
into segments and a linear order, gestures are multidimensional, "global
and synthetic and never hierarchical" (McNeill 1992: 19). Unlike speech, "the
meaning of the 'parts' of the gesture are determined by the meaning of the
whole." (McNeill 2005: 10) Co-speech gestures are idiosyncratic, obligatorily tied
to speech, do not show linguistic characteristics, are non-conventionalized, and
always depend on the thinking of the individual speaker (McNeill 1992, 2005).
McNeill’s interest in gestures is centered on their cognitive foundations. He aims
at a conceptual framework that includes gesture and language while showing their
similarities as well as differences. McNeill thereby conceives of gestures as a
“window onto thinking” (McNeill and Duncan 2000: 143), a way of inspecting pro-
cesses of conceptualization and thinking, as gesture displays "mental content, and
does so instantaneously, in real-time.” (McNeill and Duncan 2000: 143)
The orientation of gesture research on questions of conceptual processes, produc-
tion, reception, and the relation between speech and gesture as initiated by McNeill
constitutes one of the main topics in modern gesture research (for an overview see
for example Duncan, Cassell, and Levy 2007). Studies address how gesture and speech
relate to each other on the level of production, i.e., regarding speech-gesture production
models (de Ruiter 2000, 2006, 2007; Kita 1990, 2000; Seyfeddinipur 2006), whether ges-
tures aid the speech production process (Alibali, Kita, and Young 2000; Beattie and
Rima 1994; Beattie and Shovelton 2002b; Esposito, McCullough, and Quek 2001;
Hadar, Dar, and Teitelman 2001; Hadar and Pinchas-Zamir 2004; Krauss and Hadar
1999; Krauss, Dushay, Chen, et al. 1995; Krauss, Chen, and Chawla 1996; Mayberry
and Jaques 2000; Seyfeddinipur 2006; Tuite 1993) or are communicative themselves
and fulfill an important interactive function (Alibali and Heath 2001; Bavelas 1994;
Bavelas and Chovil 2000; Bavelas, Chovil, Coates, et al. 1995; Bavelas, Kenwood, and
Phillips 2002; Beattie and Shovelton 1999, 2002a, 2007; Furuyama 2002; Gerwing and
Bavelas 2004; Gullberg and Holmquist 1999, 2006; Iverson and Goldin-Meadow 1998;
Kimbara 2007; Kuhlen and Seyfeddinipur 2007; Levy and Fowler 2000; Özyürek
2000, 2002; Tabensky 2001). Comparisons of the conceptualization and expression of
ideas and concepts in gesture and speech, particularly across different languages (Allen,
Özyürek, Kita, et al. 2007; Duncan 1996; Kita and Özyürek 2003; Özyürek, Kita, and
Allen 2001) as well as questions on the acquisition of gestures are also being addressed
in psychological studies on gestures (Butcher, Goldin-Meadow, and McNeill 2000;
Goldin-Meadow 1993; Guidetti and Nicoladis 2008; Gullberg 1998; McCafferty 2004;
Negueruela, Lantolf, Rehn Jordan, et al. 2004). The production and reception of ges-
tures are investigated from neurological perspectives (Duncan 2002; Kelly and Goldsmith 2004; Kita
and Lausberg 2007; Lausberg, Cruz, Kita, et al. 2003; Lausberg and Kita 2003; McNeill
1992, 2005), as are structural relations between gestures and speech
(Furuyama, Takase, and Hayashi 2002; Kita, van Gijn, and van der Hulst 1998; Loehr
2006; McClave 1998 inter alia).

6.2. Body, language, and communication in the field of interaction studies, linguistics, and semiotics: Forms, functions, and the internal structure of gestures
Unlike psychological or psycholinguistic studies, which concentrate on the concep-
tual level of gestures, studies within the field of interaction studies, linguistics, and semi-
otics address the interaction and interrelations of gesture and speech, as well as
structures, and characteristics of the medium “gesture” itself.
One of the first monographs within this framework is that of Calbris (1990). In her
book The Semiotics of French Gestures, Calbris aims “to establish the structural seman-
tics of coverbal gesturing through a semiotic analysis of the French gestural system.”
(Calbris 1990: xv) The author investigates how gestures are organized as physical
movements and shows how these forms can be used as a means of expressing concrete
as well as abstract meaning. Gestures are investigated from within, i.e., with regard to their
specific structures, functions, and the symbolic mechanisms involved in meaning-making
processes. Accordingly, Calbris develops a physical-semantic classification of gestures
which takes into account

(i) the physical components of gestures,
(ii) the semantic fields of gestural meaning, and
(iii) the relation of physical components and semantic fields to each other.

Central to the work of Calbris (1990, 2003a, 2003b, 2008) is the assumption that gestures
can be decomposed into different features, i.e., physical components such as straight,
curved, or circular movements, which recur with rather stable gestural meanings. Based
on this assumption, Calbris sets up semantic groups for particular movement types,
for instance, and suggests, even further, that it is possible to set up a gesture dictionary
based on recurring gestural forms and functions. For the first time, Calbris systemati-
cally addresses the question of contrasts in the physical forms of gestures and their
possible semiotic implications by relying on a large body of different gestures from dif-
ferent discourse types.
Apart from the idea of decomposing gestures into separate features that can be
understood as building blocks of gestural forms and meanings, Calbris’ discussion of
how the representation of objects or concrete actions can serve as metaphors for abstract
concepts is equally interesting. Gestural representations of concrete objects or actions,
according to Calbris, are representations based on abstraction: the concrete action or
object being represented in the gestures solely functions as a symbol for the concept
to be represented. “Even in evoking a concrete situation, a gesture does not reproduce
the concrete action, but the idea abstracted from the concrete reality. The dimensions of
objects are mimed according to symbolic norms." (Calbris 1990: 115) (cf. Kendon
2004b; Mittelberg 2006; Müller 1998, 2010; Streeck 2009)
Questions of gestural representation and abstraction are also addressed by Müller in
her book Redebegleitende Gesten: Kulturgeschichte, Theorie, Sprachvergleich (1998).
Gestures, according to Müller, display all three of Bühler’s (1934) language categories
and thus, just as speech, have the functions of expressing, making an appeal and repre-
senting (Müller 1998: 10; this volume). Yet, what is particular for gestures, and therefore
differentiates them from other sign systems, is the ability to represent objects, events,
and the like – a function only found in speech and gesture. Müller, thus, singles out ges-
tures with dominantly representational function and, based on these, develops four
modes of representation, i.e., processes of gestural sign creation based on the handling
of objects in the world. In the case of acting, the hands represent the performance of an
action, such as opening a window. In the case of modeling, the hands recreate an
object, a bowl for instance, by modeling it three-dimensionally. In the case of
drawing, the hand recreates the bowl not plastically in three dimensions but in a
two-dimensional way. In the case of representing, the hand itself becomes the object
to be represented, such as a finger used for writing on a piece of paper (Müller 1998:
115–120). In a recent revision of her earlier proposal, Müller refines the four modes
into two, i.e., acting and representing (Müller, Ladewig, and Teßendorf in preparation;
see also Müller this volume).
Each of the gestural modes of representation is tied to a specific form of abstraction
from the object of perception (e.g., two-dimensional vs. three-dimensional). Gestures
are thus "concrete, i.e., figuratively transformed abstractions of perceived reality and
abstract ideas." (Müller 1998: 124, translation JB) The gestural modes thereby reflect
a particular perception of the observed object, driving speakers to isolate particular
aspects of form. The different gestural modes thus lead to differing gestural representations
of objects, and even to the particular use of hand shapes and movements (Müller 1998:
120). In the case of the gestural modes, “form features of the hands, particular hand
postures and movements of the hands constitute meaning." (Müller 1998: 120, transla-
tion JB) In connection with the modes of representation, Müller therefore also ex-
presses first thoughts on other aspects of gestural structure, such as the relation of
gestural forms and possible meanings, which are taken up in later publications (Müller 2004;
see also the section below for further studies on this aspect).
Another major publication within the interactional paradigm on language, body
and communication is Kendon’s Gesture: Visible Action as Utterance (Kendon 2004b).
The book, a synopsis of Kendon's numerous studies and work on gestures, addresses
major research questions with a focus on gestures and speech in interac-
tion, and covers semiotic, linguistic, and cultural factors of spontaneous as well as
conventionalized gestures.
Gestures, as Kendon already put forward in his papers of 1972 and 1980, are integral
parts of the utterance "and are partnered with speech as a part of the speaker's final
product." (Kendon 2004b: 5) Starting from this assumption, Kendon examines the rela-
tion of speech and gesture in its diversity under the overarching theme of speech and
gesture jointly creating utterance ensembles. Kendon examines these utterance ensem-
bles in a variety of aspects. By studying the structural relation of speech and gesture, i.e.,
the position and timing of gestures and speech, Kendon points out the joint creation of
speech and gestures, leading to a mutual adaptation of the modes to each other. Gestural
expressions can be modified to fit the structure of speech, and speech may be modified
to meet the requirement of gestures (Kendon 2004b: 135ff.; see also Seyfeddinipur
2006). Furthermore, the variety and differences in the range of semantic relations of
speech and gesture as well as the semantic interaction of the two modes are foci of Ken-
don's work (Kendon 2004b: 158ff., 176ff.). Gestures fulfill a variety of functions, depend-
ing on the communicative context. Similar to Müller, Kendon therefore proposes a
functional classification of gestures including referential, pragmatic, and interactive
gestures (Kendon 2004b: 158ff.). Yet, a particular focus is placed on gestures with prag-
matic function (see also Kendon 1995, 2004a). Kendon aims at a thorough investigation
of the range of forms and functions of pragmatic gestures and thereby develops the idea
of so-called "gesture families".

When we refer to families of gestures we refer to groupings of gestural expressions that
have in common one or more kinesic or formational characteristics. […] [E]ach family
not only shares in a distinct set of kinesic features but each is also distinct in its semantic
themes. The forms within these families, distinguished as they are kinesically, also tend to
differ semantically although, within a given family, all forms share in a common semantic
theme. (Kendon 2004b: 227)

Using this concept of the gesture family, Kendon investigates several families of ges-
tures, such as the grappolo or bunch, the ring, and gestures of the open hand, and is
able to show that functional differences in the different gestures correspond to
differences in the type and manner of their execution. Similar to Calbris (1990, 2003a,
2003b, 2008), Kendon is able to semantically group gestures based on differences in
their forms. He shows that features such as hand shape and movement contrast with
one another in ways that reveal differences in meaning. The idea that gestures are made up
of features used recurrently by speakers, features that partially carry and convey
meaning on their own, is one of the major tenets of Kendon's work.
In her study on the palm up open hand, Müller (2004) proposes a similar analysis yet
takes the conclusions about the relation of form, meaning, and function in gestures a bit
further. Based on the idea of the gesture family, Müller is able to show that the group of
gestures using the flat hand is structured around a formational and functional core
(hand shape and orientation of the palm), while different types of movement (e.g., ro-
tating, lateral) serve internal meaning differentiation. The variations in form, ac-
cording to Müller, are thereby not idiosyncratic, but rather correspond to a
closed set of forms, as they only involve the feature "movement". Concluding, Müller
adds:

the fact that the incorporated features appear to be semiotically unrelated to the features
of hand shape and orientation suggests that indeed two independent form-function ele-
ments are joined in one gesture: the Palm Up Open Hand (hand shape and orientation)
with Rotating or Lateral Motion (motion pattern). This indeed points towards a rudimen-
tary morphology based on purely iconic principles in pragmatic or performative co-speech
gestures. (Müller 2004: 254)

Studies on pragmatic gestures and the question of possible form-meaning relations in
gestures set off a new focus within the study on gestures. Numerous studies investigate
the forms, functions and gestural structures and patterns of “recurrent gestures”
(Becker 2004, 2008; Brookes 2001, 2004, 2005; Calbris 2003a, 2003b, 2008; Ladewig
2006, 2010, 2012, volume 2; Bressem and Müller volume 2; Müller and Speckmann
2002; Neumann 2004; Seyfeddinipur 2004; Teßendorf 2005, 2008).
Initiated by such first studies (Birdwhistell 1970; Calbris 1990, 2003a, 2003b; Kendon
2004a, 2004b; Müller 2004; Sparhawk 1978; Webb 1996, 1998), and the idea of possible
form-meaning relations in gestures, recently more and more studies address gestures
with the question of possible “linguistic structures”, i.e., patterns and structures in ges-
tures. Studies within this paradigm focus on the forms, structures, and functions of
co-speech gestures alone, investigating a possible “grammar of gesture” (see Müller,
Bressem and Ladewig this volume). One aim of these studies is to provide an encom-
passing account of the structural properties of gestures, such as linear and simultaneous
structures of gestures, processes of gestural sign creation, and structures of meaning and
references in gestures (Bressem 2012; Bressem and Ladewig 2011; Fricke 2007, 2012;
Ladewig 2006, 2010, 2011, 2012; Ladewig and Bressem forthcoming; Müller 1998;
Müller, Ladewig, and Teßendorf in preparation; Müller volume 2; Teßendorf 2008).
Another focus is on the possible integration of co-speech gestures into theories of
(vocal) grammar, i.e., a “multimodal grammar” (Fricke 2007, 2012, this volume).
In analogy to the phonaestheme of speech, Fricke (2012) develops the idea of the
kinaestheme, i.e., a submorphematic unit conveying a context-dependent form-meaning
relationship. Fricke's concept takes earlier attempts at formulating the semanticization
in gestures a step further (see Birdwhistell 1970; Becker 2004, 2008; Kendon 2004a;
Müller 2004) and incorporates it into an overall multimodal theory of grammar. Fricke
argues for the semanticization and typification of gestures, the existence of recursive
structures in gestures as well as the applicability of constituent structures to the linear
progression of gestures. Moreover, gestures have the potential to be used in an attrib-
utive function to the verbal utterance, as they limit the extension of the noun referent
(Fricke 2012) in the same way as attributes do in speech. Studies approaching gestures
from the perspective of a grammar of gesture and/or a multimodal grammar make up
a still new field within gesture studies.

6.3. Body, language, and communication in the field of cognitive linguistics: Gestures as evidence for the bodily basis of language
A rather new, but very active field in the study of body, language, and communication is
found within cognitive linguistics, in which gestures are viewed as multimodal evidence
for general conceptual structures and mechanisms. In order to free the study of language from the bounds of speech alone and to “escape the linguistic circularity” (Murphy 1996: 184), researchers regard gestures as the externalization “of internalized imagery and experiences with the physical and social world” (Mittelberg 2010), what Mittelberg calls “exbodiment”.
abstract knowledge domains (Calbris 2003a; Cienki 1998b, 2005; Ladewig 2006, 2010,
2011; McNeill 1992; Mittelberg 2006, 2010; Mittelberg and Waugh 2009; Müller 1998,
2003, 2004, 2007, 2008; Müller and Tag 2010; Parrill 2008; Sweetser and Parrill 2004;
Nuñez 2004; Taub 2001) and make abstract phenomena graspable as they mediate
between the concrete and the abstract. Gestures are seen as multimodal evidence for iconicity and metaphor, and may even reveal source-domain information not found in
speech (Cienki 1998a, 2008; Cienki and Müller 2008; McNeill 1992, 2005, 2008; Mittel-
berg 2006, 2008; Mittelberg and Waugh 2009; Müller 1998, 2004, 2008; Müller and
Cienki 2009; Müller and Tag 2010; Nuñez 2004; Sweetser 1998; Webb 1996, 1998,
inter alia), thus contributing substantially to the understanding of metaphoric processes
in general.
As, according to the cognitive linguistics view, language use is grounded in bodily experience in and with the physical world, the visual-tactile modality of gestures also offers valuable insights into the study of image schemas (Calbris 2008;
Cienki 1998a, 2005; Ladewig 2006, 2011; Mittelberg 2006, 2010; Williams 2004) or
motor patterns (Mittelberg 2006). As image schemas, i.e., “recurring dynamic patterns
of our perceptual interactions and motor programs” (Johnson 1987: xiv) operating
at the pre-conceptual level, evolve through “perception, motoric movements, and
handling of objects in the real world” (Johnson 1987: 29), the gestural modality is
ideally suited for their expression. Analyses focusing on image schemas and motor patterns in gestures have led to new and complementary insights into their structures and mechanisms.
The notion of conceptual metonymy (Gibbs 1994; Lakoff and Johnson 1980; Panther
and Thornburg 2007), on the other hand, has provided valuable insights into questions of gestural sign formation, in particular for the interpretation of metaphoric gestures
(Mittelberg 2006, 2010; Mittelberg and Waugh 2009; Müller, Ladewig and Teßendorf
404 III. Historical dimensions

in preparation), but also for interactive processes in the production and perception of
gestures (Ladewig and Teßendorf in preparation).

6.4. Body, language, and communication in the fields of conversation analysis and the ethnography (of communication): Gestures and the organization of interaction in the real world
Studies within the field of conversation analysis approach the analysis of gestures and, even more generally, bodily behavior including eye gaze, body posture, and facial expression, with the aim of describing their use and function in the setup and regulation of interaction. These studies seek to show that speech alone is not sufficient for understanding communicative and interactional processes, and that the rules and mechanisms regulating interaction cannot be adequately investigated if gestures and other bodily behavior are excluded (Schmitt 2007a, 2007b).
In general, central concepts and mechanisms of conversation analysis, such as turn taking (Sacks, Schegloff, and Jefferson 1974) and repair (Sacks, Schegloff, and Jefferson 1977), are evaluated with respect to the use of gestures. Accordingly, studies investigate the role of gestures in the negotiation of the participants’ status in
interaction (Goodwin 2000a, 2000b, 2003; Heath 1992; Heath and Luff 1992; Müller
1998; Streeck 1994). Gestures’ role for the turn-taking mechanism and thus the alterna-
tion of possible speakers in conversations is another major field of investigation
(Bavelas, Chovil, Lawrie, et al. 1992; Bohle 2007; Hayashi, Mori, and Takagi 2002;
Mondada 2007; Müller 1998, 2004; Schmitt 2005; Streeck 1995; Streeck and Hartge
1992; Weinrich 1992). Repair mechanisms in conversations (Fornel 1992; Müller 1994;
see also Seyfeddinipur 2006), the relation of gestures and pauses in speech (Bohle
2007; Müller and Paul 1999; Schmitt 2005; Schönherr 1997, 2001; see also Ladewig
2006), as well as gestures’ role and function in word searches (Müller 1994; Müller
and Paul 1999; see also Ladewig 2006, 2010) constitute another area of focus within conversation analysis. Gestures’ influence on and relevance for turn construction and turn-constructional units are also being investigated (Fornel 1992; Schmitt 2005; Streeck and Hartge 1992). A fundamental challenge for all studies within the paradigm of conversation analysis is to prove the interactive and communicative relevance of gestures and other bodily behavior for the participants in interaction (see Streeck 1993, 1994).
A new but rapidly developing strand within conversation analysis moves away from simply investigating the role of bodily behavior in known mechanisms and instead approaches existing concepts and mechanisms with the goal of broadening their scope to
capture the multimodal complexity of communication and interaction (Deppermann
and Schmitt 2007; Mondada and Schmitt 2009; Mondada 2007; Müller and Bohle
2007; Schmitt 2005, 2007a, 2007b; Schmitt and Deppermann 2007).
A yet different perspective is pursued in practice-based approaches to the study of
the body and communication (Heath 1992; LeBaron and Streeck 2000; Streeck 1996,
2007, 2009). Based on the framework of conversation analysis, the ethnography (of
communication) (Hymes 1962), and context analysis (Kendon 1990; Scheflen 1973),
gestures are conceived of as a “family of human practices: not as a code or symbolic sys-
tem or (part of) language, but as a constantly evolving set of largely improvised,
heterogeneous, partly conventional, partly idiosyncratic, and partly culture-specific, partly universal practices of using the hands to produce situated understandings.”
(Streeck 2009: 5) Studies in this area investigate the ways in which gesture contributes
to human understanding, i.e., shared understanding of the world at hand (Streeck 2009:
6). Gestures are investigated in terms of how gestural meaning-making processes evolve
through interaction with the world. The focus is on how “conversational hand gestures
ascend from ordinary, non-symbolic exploratory and instrumental manipulations of the
world of matter and things.” (LeBaron and Streeck 2000: 119) The hands’ experience of interacting with the world is thereby the basis for their representational ability.
“[T]he hands are knowing signifiers: they bring schematic practical knowledge acquired
through action in the real world to bear upon the symbolic realm of gesticulation.”
(Streeck 1994: 3) Similar to the understanding of gestures from a cognitive linguistic
perspective, gestures are embodied, i.e., based on bodily experiences in and with the
world. Cognitive activities and concepts are multimodal and experience is stored in
the sensory modalities.

6.5. Body, language, and communication in the field of artificial intelligence: Gestures and their modeling and implementation in artificial agents
Investigations within the field of artificial intelligence and embodied agents aim at the
empirical study of human multimodal behavior and the conception and simulation of
computational models in virtual humans. The major aim is to build animated embodied
agents that have the ability to generate human-like multimodal utterances as well as
interact on multimodal levels with humans (Cassell, Stone, Douville, et al. 1994; Kette-
bekov and Sharma 2001; Kipp 2004; Kipp, Neff, Kipp, et al. 2007; Kopp, Tepper, and
Cassell 2004; Kopp and Wachsmuth 2004; Kopp, Bergmann, and Wachsmuth 2008;
Latoschik 2000; Wachsmuth 2005 inter alia).
Given this perspective, studies within this paradigm need to describe gestures, and bodily behavior in general, with regard to their forms and functions in order to allow for a lifelike reproduction in embodied agents. Accordingly, analyses pose similar research
questions to those found in linguistic and semiotic approaches to gesture. Research
questions focus on forms, meaning construction, the linear succession and composition-
ality of gestures, as well as the selection, composition, representation, and distribution
of meaning as it comes to be expressed in speech and gesture (Arendson, van Doorn,
and de Ridder 2007; Bergmann and Kopp 2006, 2007; Cassell and Prevost 1996; Harling
and Edwards 1997; Kopp, Tepper, Ferriman, et al. 2007; Martell 2005; Quek, McNeill,
Bryll, et al. 2002; Sowa 2006; Tepper, Kopp, and Cassell 2004 inter alia).
Similar to linguistic and semiotic analyses, proposals in the artificial intelligence paradigm conceive of coverbal gestures as made up of analytically discrete elements, proposing a strict combinatorics of gestural forms along with their meanings. Gestural forms
along with their possible meanings are extracted and gathered in gestionaries or gesture
lexicons. Newer accounts even pursue a stronger separation of gestural forms and
meanings, and move away from the gestionary approach with a limited set of gestures –
which convey predefined meanings – towards a separate description of gestural forms
and their associated meaning (see Tepper, Kopp, and Cassell 2004).
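The contrast between the gestionary approach and the separate description of forms and meanings can be sketched in a short, purely hypothetical code example; the gesture forms, feature names, and meanings below are invented for illustration and do not stem from any of the systems cited above:

```python
# Hypothetical sketch of two ways of representing gesture knowledge in an
# embodied agent; all forms, features, and meanings are invented examples.

# (1) Gestionary approach: a closed lexicon pairing each gestural form
#     directly with one predefined meaning.
GESTIONARY = {
    "palm-up-open-hand": "presenting an idea",
    "cyclic-wrist-rotation": "ongoing process",
}

def lookup(form):
    """Retrieve the predefined meaning of a gesture form, if listed."""
    return GESTIONARY.get(form)

# (2) Separated approach: gestural forms are described independently of
#     meanings and paired with them only at generation time.
FORM_FEATURES = {
    "palm-up-open-hand": {"handshape": "flat", "orientation": "palm up"},
    "cyclic-wrist-rotation": {"movement": "circular", "repetition": True},
}

def select_form(meaning_features):
    """Pick the form whose description overlaps most with the intended meaning."""
    def overlap(form):
        return len(set(FORM_FEATURES[form].items()) & set(meaning_features.items()))
    best = max(FORM_FEATURES, key=overlap)
    return best if overlap(best) > 0 else None
```

In the first variant, the set of expressible meanings is fixed in advance; in the second, any meaning whose features match some form description can be gesturally realized, which corresponds to the move away from predefined meanings described above.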
7. Conclusion
In the course of time, and in particular since the 1970s, the study of the body, language,
and communication has evolved into an independent area of research. The journal
GESTURE, founded by Adam Kendon and Cornelia Müller, has celebrated its 10th
anniversary, and the International Society for Gesture Studies (ISGS) will be hosting
its sixth conference. In the past years, a number of research centers focusing on gestures
have been established, such as the Berlin Gesture Center, the Nijmegen Gesture Center
at the Max Planck Institute for Psycholinguistics, and the Manchester Gesture Center.
The field nowadays moves between disciplines and is first and foremost characterized by its interdisciplinary nature. Studies on gesture and other bodily forms of behavior are finding attentive ears in several disciplines and their respective journals. Its
interdisciplinary approach has become one of the field’s major strengths, turning it
into an innovative area of research.

8. References
Alibali, Martha W. and Dana C. Heath 2001. Effects of visibility between speaker and listener on
gesture production: Some gestures are meant to be seen. Journal of Memory and Language 44:
169–188.
Alibali, Martha W., Sotaro Kita and Amanda Young 2000. Gesture and the process of speech pro-
duction: We think, therefore we gesture. Language and Cognitive Processes 15(6): 593–613.
Allen, Shanley, Asli Özyürek, Sotaro Kita, Amanda Brown, Reyhan Furman and Tomoko Ishi-
zuka 2007. Language-specific and universal influences in children’s syntactic packaging of Man-
ner and Path: A comparison of English, Japanese, and Turkish. Cognition 102: 16–48.
Allport, Gordon W. and Philip E. Vernon 1933. Studies in Expressive Movement. New York:
MacMillan.
Arendson, Jeroen, Andrea J. van Doorn and Huib de Ridder 2007. When and how well do people see the onset of gestures? Gesture 7(3): 305–342.
Argyle, Michael 1989. Körpersprache und Kommunikation. Paderborn, Germany: Junfermann.
Bateson, Gregory 1968. Redundancy and coding. In: Thomas A. Sebeok (ed.), Animal Communication: Techniques of Study and Results of Research, 614–626. Bloomington: Indiana University Press.
Bavelas, Janet Beavin 1994. Gestures as part of speech: Methodological implications. Research on
Language and Social Interaction 27(3): 201–221.
Bavelas, Janet Beavin and Nicole Chovil 2000. Visible acts of meaning. An integrated message model
of language in face-to-face dialogue. Journal of Language and Social Psychology 19(2): 163–193.
Bavelas, Janet Beavin, Nicole Chovil, Linda Coates and Lori Roe 1995. Gestures specialized for
dialogue. Personality and Social Psychology Bulletin 21(4): 394–405.
Bavelas, Janet Beavin, Nicole Chovil, Douglas A. Lawrie and Allan Wade 1992. Interactive
Gestures. Discourse Processes 15: 469–489.
Bavelas, Janet Beavin, Trudy Johnson Kenwood and Bruce Phillips 2002. An experimental study
of when and how speakers use gesture to communicate. Gesture 2(1): 1–17.
Beattie, Geoffrey and Rima Aboudan 1994. Gesture, pauses and speech. An experimental investigation of the effects of changing social context on their precise temporal relationships. Semiotica 99(3/4): 239–272.
Beattie, Geoffrey and Heather Shovelton 1999. Do iconic hand gestures really contribute anything
to the semantic information conveyed by speech? An experimental investigation. Semiotica
123(1/2): 1–30.
Beattie, Geoffrey and Heather Shovelton 2002a. Iconic hand gestures and the predictability of
words in context in spontaneous speech. British Journal of Psychology 91: 473–491.
Beattie, Geoffrey and Heather Shovelton 2002b. Lexical access in talk: A critical consideration of
transitional probability and word frequency as possible determinants of pauses in spontaneous
speech. Semiotica 141(1/4): 49–71.
Beattie, Geoffrey and Heather Shovelton 2007. The role of iconic gesture in semantic communi-
cation and its theoretical and practical implications. In: Susan D. Duncan, Justine Cassell and
Elena Tevy Levy (eds.), Gesture and the Dynamic Dimension of Language (vol. 1), 221–241.
Philadelphia: John Benjamins.
Becker, Karin 2004. Zur Morphologie redebegleitender Gesten. MA thesis. Institut für deutsche
und niederländische Philologie, Freie Universität Berlin.
Becker, Karin 2008. Four-feature-scheme of gesture: Form as the basis of description. Unpub-
lished manuscript.
Bergmann, Kirsten and Stefan Kopp 2006. Verbal or visual? How information is distributed across
speech and gesture in spatial dialog. In: David Schlangen and Raquel Fernandez (eds.), Pro-
ceedings of brandial 2006, the 10th Workshop on the Semantics and Pragmatics of Dialogue,
90–97. Potsdam.
Bergmann, Kirsten and Stefan Kopp 2007. Co-expressivity of speech and gesture: Lessons for
models of aligned speech and gesture production. Symposium at the AISB Annual Conven-
tion: Language, Speech and Gesture for Expressive Characters, 153–158.
Birdwhistell, Ray 1970. Kinesics and Context. Philadelphia: University of Pennsylvania Press.
Birdwhistell, Ray 1974. The language of the body: The natural environment of words. In: Albert Silverstein (ed.), Human Communication, 203–220. Hillsdale, NJ: Lawrence Erlbaum.
Birdwhistell, Ray 1975. Background considerations of the study of the body as a medium of
“expression”. In: Jonathan Benthall and Ted Polhemus (eds.), The Body as a Medium of
Expression, 34–58. New York: Dutton.
Birdwhistell, Ray 1979. Kinesik. In: Klaus R. Scherer and Harald G. Wallbott (eds.), Nonverbale Kom-
munikation. Forschungsberichte zum Interaktionsverhalten, 102–202. Weinheim, Germany: Beltz.
Bloomfield, Leonard 1984. Language. New York: Holt. First published [1933].
Bohle, Ulrike 2007. Das Wort Ergreifen, das Wort Übergeben: Explorative Studie zur Rolle
Redebegleitender Gesten in der Organisation des Sprecherwechsels. Berlin: Weidler.
Bressem, Jana 2012. Repetitions in gesture: Structures, functions, and cognitive aspects. Ph.D. disser-
tation. Faculty of Social and Cultural Sciences, European University Viadrina, Frankfurt (Oder).
Bressem, Jana and Silva H. Ladewig 2011. Rethinking gesture phases. Semiotica 184(1/4): 53–91.
Bressem, Jana and Cornelia Müller volume 2. The family of AWAY gestures. In: Cornelia Müller, Alan
Cienki, Ellen Fricke, Silva Ladewig, David McNeill and Jana Bressem (eds.), Body – Language –
Communication. An International Handbook on Multimodality in Human Interaction (Handbooks
of Linguistics and Communication Science 38.2). Berlin and Boston: De Gruyter Mouton.
Brookes, Heather 2001. O clever ‘He’s streetwise’. When gestures become quotable. Gesture 1(2):
167–184.
Brookes, Heather 2004. A repertoire of South African quotable gestures. Journal of Linguistic
Anthropology 14(2): 186–224.
Brookes, Heather 2005. What gestures do: Some communicative functions of quotable gestures in
conversations among Black urban South Africans. Journal of Pragmatics 32: 2044–2085.
Bruner, Jerome S. and Renato Tagiuri 1954. The perception of people. In: Gardner Lindzey (ed.),
Handbook of Social Psychology, Vol. 2, 634–654. Reading, MA: Addison-Wesley.
Bühler, Karl 1934. Sprachtheorie. Die Darstellungsfunktion der Sprache. Jena, Germany: Gustav
Fischer.
Butcher, Cynthia, Susan Goldin-Meadow and David McNeill 2000. Gesture and the transition
from one to two word speech: When hand and mouth come together. In: David McNeill
(ed.), Language and Gesture, 235–258. New York: Cambridge University Press.
Butterworth, Brian and Uri Hadar 1989. Gesture, speech, and computational stages: A reply to McNeill. Psychological Review 96(1): 168–174.
Calbris, Geneviève 1990. The Semiotics of French Gestures. Bloomington: Indiana University Press.
Calbris, Geneviève 2003a. From cutting an object to a clear cut analysis. Gesture as the representa-
tion of a preconceptual schema linking concrete actions to abstract notions. Gesture 3(1): 19–46.
Calbris, Geneviève 2003b. Multireferentiality of coverbal gestures. In: Monica Rector, Isabella
Poggi and Nadine Trigo (eds.), Gestures, Meaning and Use, 203–207. Porto: Universidade Fer-
nando Pessoa.
Calbris, Geneviève 2008. From left to right…: Coverbal gestures and their symbolic use of space.
In: Alan Cienki and Cornelia Müller (eds.), Metaphor and Gesture, 27–53. Amsterdam: John
Benjamins.
Cassell, Justine and Scott Prevost 1996. Distribution of semantic features across speech and ges-
ture by humans and computers. In: Proceedings of the Workshop on the Integration of Gesture
in Language and Speech. October 7–8, Wilmington.
Cassell, Justine, Matthew Stone, Brett Douville, Scott Prevost, Brett Achorn, Mark Steedman,
Norm Badler and Catherine Pelachaud 1994. Modeling the interaction between speech
and gesture. In: Ashwin Ram and Kurt Eiselt (eds.), Proceedings of the Sixteenth Annual Con-
ference of the Cognitive Science Society, 153–158. Atlanta, Georgia (USA): Lawrence Erlbaum.
Cienki, Alan 1998a. Metaphoric gestures and some of their relations to verbal metaphoric expres-
sions. In: Jean-Pierre Koening (ed.), Discourse and Cognition: Bridging the Gap, 189–204. Stan-
ford: Center for the Study of Language and Information Publication.
Cienki, Alan 1998b. Straight: An image schema and its transformations. Cognitive Linguistics 9:
107–149.
Cienki, Alan 2005. Image schemas and gestures. In: Beate Hampe and Joseph E. Grady (eds.), From
Perception to Meaning: Image Schemas in Cognitive Linguistics, 421–442. Berlin: De Gruyter
Mouton.
Cienki, Alan 2008. Why study metaphor and gesture? In: Alan Cienki and Cornelia Müller (eds.),
Metaphor and Gesture, 5–25. Amsterdam: John Benjamins.
Cienki, Alan and Cornelia Müller (eds.) 2008. Metaphor and Gesture. Amsterdam: John Benjamins.
Condon, William S. and William D. Ogston 1966. Sound film analysis of normal and pathological
behavior patterns. Journal of Nervous and Mental Disease 143(4): 338–347.
Condon, William S. and William D. Ogston 1967. A segmentation of behavior. Journal of Psychiatric Research 5: 221–235.
de Jorio, Andrea 2000. Gesture in Naples and gesture in classical antiquity. A translation of La
mimica degli antichi investigata nel gestire napoletano (Fibreno, Naples [1832]) and with an
introduction and notes by Adam Kendon. Bloomington: Indiana University Press.
de Ruiter, Jan Peter 2000. The production of gesture and speech. In: David McNeill (ed.), Language and Gesture, 284–311. Cambridge: Cambridge University Press.
de Ruiter, Jan Peter 2006. Can gesticulation help aphasic people speak, or rather, communicate?
Advances in Speech-Language Pathology 8(2): 124–127.
de Ruiter, Jan Peter 2007. Postcards from the mind: The relationship between speech, imagistic gesture and thought. Gesture 7(1): 21–38.
Deppermann, Arnulf and Reinhold Schmitt 2007. Koordination. Zur Begründung eines neuen
Forschungsgegenstandes. In: Reinhold Schmitt (ed.), Koordination. Analysen zur Multimoda-
len Interaktion, 15–54. Tübingen, Germany: Narr.
Duncan, Susan D. 1996. Grammatical form and “thinking for speaking” in Mandarin Chinese and
English: An analysis based on speech-accompanying gestures. Ph.D. dissertation, Department
of Psychology, University of Chicago.
Duncan, Susan D. 2002. Gesture, verb aspect, and the nature of iconic imagery in natural dis-
course. Gesture 2(2): 183–206.
Duncan, Susan D., Justine Cassell, and Elena T. Levy (eds.) 2007. Gesture and the Dynamic
Dimension of Language: Essays in Honor of David McNeill. Philadelphia: John Benjamins.
Efron, David 1972. Gesture, Race and Culture. Paris: Mouton. First published in [1941].
Ekman, Paul and Wallace V. Friesen 1969. The repertoire of nonverbal behavior: Categories, ori-
gins, usage and coding. Semiotica 1: 49–98.
Esposito, Anna, Karl E. McCullough and Francis Quek 2001. Disfluencies in gesture: Gestural correlates to filled and unfilled speech pauses. Proceedings of IEEE Workshop on Cues in Communication. Kauai, Hawaii.
Feyereisen, Pierre 1987. Gestures and speech, interactions and separations: A reply to McNeill.
Psychological Review 94(4): 493–498.
Fornel, Michel de 1992. The return gesture: Some remarks on context, inference, and iconic ges-
ture. In: Peter Auer and Aldo di Luzio (eds.), The Contextualization of Language, 159–176.
Amsterdam: John Benjamins.
Fricke, Ellen 2007. Origo, Geste und Raum: Lokaldeixis im Deutschen. Berlin: Walter de Gruyter.
Fricke, Ellen 2012. Grammatik multimodal: Wie Wörter und Gesten zusammenwirken. Berlin: De
Gruyter Mouton.
Fricke, Ellen volume 1. Towards a unified grammar of gesture and speech: A multimodal
approach. Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig and David McNeill
(eds.), Body – Language – Communication: An International Handbook on Multimodality in
Human Interaction. (Handbooks of Linguistics and Communication Science 38.1). Berlin:
De Gruyter Mouton.
Furuyama, Nobuhiro 2002. Prolegomena of a theory of between-person coordination of speech
and gesture. International Journal Human-Computer Studies 57: 347–374.
Furuyama, Nobuhiro, Hiroki Takase and Koji Hayashi 2002. An ecological approach to intra- and
inter-personal coordination of speech, gesture, and breathing movements. Proceedings of the
First International Workshop on Man-Machine Symbiotic Systems, 169–199. Kyoto, Japan.
Gerwing, Jennifer and Janet Beavin Bavelas 2004. Linguistic influences on gesture’s form. Gesture
4: 157–194.
Gibbs, Raymond W., Jr. 1994. The Poetics of Mind: Figurative Thought, Language, and Under-
standing. New York: Cambridge University Press.
Goldin-Meadow, Susan 1993. When does gesture become language? A study of gesture used as a
primary communication system by deaf children of hearing parents. In: Kathleen Rita Gibson
and Tim Ingold (eds.), Tools, Language and Cognition in Human Evolution, 63–85. Cambridge:
Cambridge University Press.
Goodwin, Charles 2000a. Action and embodiment within situated human interaction. Journal of
Pragmatics 32: 1489–1522.
Goodwin, Charles 2000b. Practices of seeing: Visual analysis: An ethnomethodological approach. In:
Theo van Leeuwen and Carey Jewitt (eds.), Handbook of Visual Analysis, 157–182. London: Sage.
Goodwin, Charles 2003. Pointing as situated practice. In: Sotaro Kita (ed.), Pointing: Where Lan-
guage, Culture and Cognition Meet, 217–241. Hillsdale, NJ: Erlbaum.
Guidetti, Michele and Elena Nicoladis 2008. Introduction to special issue: Gestures and commu-
nicative development. First Language 28(2): 107–115.
Gullberg, Marianne 1998. Gesture as a Communication Strategy in Second Language Discourse: A
Study of Learners of French and Swedish. Lund, Sweden: Lund University Press.
Gullberg, Marianne and Kenneth Holmqvist 1999. Keeping an eye on gestures: Visual perception of gestures in face-to-face communication. Pragmatics and Cognition 7(1): 35–63.
Gullberg, Marianne and Kenneth Holmqvist 2006. What speakers do and what listeners look at. Visual attention to gestures in human interaction live and on video. Pragmatics and Cognition 14(1): 53–82.
Hadar, Uri, Rivka Dar and Amit Teitelman 2001. Gesture during speech in first and second lan-
guage: Implications for lexical retrieval. Gesture 1(2): 151–165.
Hadar, Uri and Lian Pinchas-Zamir 2004. The semantic specificity of gesture: Implications
for gesture classification and function. Journal of Language and Social Psychology 23(2):
204–214.
Harling, Philip and Alistair Edwards 1997. Hand tension as a gesture segmentation cue. In: Philip
Harling and Alistair Edwards (eds.), Progress in Gestural Interaction. Proceedings of Gesture
Workshop ’96, 75–88. Berlin: Springer.
Hayashi, Makoto, Junko Mori and Tomoyo Takagi 2002. Contingent achievement of co-tellership
in a Japanese conversation: An analysis of talk, gaze, and gesture. In: Cecilia E. Ford, Barbara
A. Fox, and Sandra A. Thompson (eds.), The Language of Turn and Sequence, 81–122. Oxford:
Oxford University Press.
Heath, Christian 1992. Gesture’s discreet tasks: Multiple relevancies in visual conduct and in the
contextualization of language. In: Peter Auer and Aldo di Luzio (eds.), The Contextualization
of Language, 101–127. Amsterdam: John Benjamins.
Heath, Christian and Paul Luff 1992. Collaboration and control: Crisis management and multime-
dia technology in London Underground Line Control Rooms. Journal of Computer Supported
Cooperative Work 1(1–2): 69–94.
Hübler, Axel 2001. Das Konzept “Körper” in den Sprach- und Kommunikationswissenschaften.
Tübingen, Germany: Francke.
Hymes, Dell 1962. The ethnography of speaking. In: Thomas Gladwin and William C. Sturtevant
(eds.), Anthropology and Human Behavior, 13–53. Washington, DC: Anthropological Society
of Washington DC.
Iverson, Jana and Susan Goldin-Meadow 1998. The Nature and Functions of Gesture in Children’s Com-
munication: New Directions for Child and Adolescent Development. San Francisco: Jossey-Bass.
Johnson, Mark 1987. The Body in the Mind: The Bodily Basis of Meaning, Imagination, and Reason. Chicago: University of Chicago Press.
Kelly, Spencer and Leslie Goldsmith 2004. Gesture and right hemisphere involvement in evaluat-
ing lecture material. Gesture 4: 25–42.
Kendon, Adam 1972. Some relationship between body motion and speech. In: Aron W. Siegman
and Benjamin Pope (eds.), Studies in Dyadic Communication, 177–216. Elmsford, NY: Perga-
mon Press.
Kendon, Adam 1980. Gesticulation and speech: Two aspects of the process of utterance. In: Mary
Ritchie Key (ed.), Nonverbal Communication and Language, 207–277. The Hague: Mouton.
Kendon, Adam 1982. The study of gestures: Some observations on its history. Recherches Semio-
tiques 2(1): 44–62.
Kendon, Adam 1990. Conducting Interaction. Patterns of Behavior in Focused Encounters. Cam-
bridge: Cambridge University Press.
Kendon, Adam 1995. Gestures as illocutionary and discourse structure markers in Southern Ital-
ian conversation. Journal of Pragmatics 23: 247–279.
Kendon, Adam 2004a. Contrasts in gesticulation: A Neapolitan and a British speaker compared.
In: Cornelia Müller and Roland Posner (eds.), The Semantics and Pragmatics of Everyday
Gestures, 173–193. Berlin: Weidler.
Kendon, Adam 2004b. Gesture. Visible Action as Utterance. Cambridge: Cambridge University Press.
Kettebekov, Sanshzar and Rajeev Sharma 2001. Toward natural gesture/speech control of a large
display. In: Roderick Little and Laurence Nigay (eds.), Engineering for Human-Computer
Interaction, 221–234. Heidelberg: Springer.
Kimbara, Irene 2007. Indexing locations in gesture: Recalled stimulus image and interspeaker
coordination as factors influencing gesture form. In: Susan D. Duncan, Justine Cassell and
Elena Tevy Levy (eds.), Gesture and the Dynamic Dimension of Language (vol. 1), 213–220.
Philadelphia: John Benjamins.
Kipp, Michael 2004. Gesture Generation by Imitation: From Human Behavior to Computer Char-
acter Animation. Boca Raton, FL: Dissertation.com.
Kipp, Michael, Michael Neff, Kerstin H. Kipp and Irene Albrecht 2007. Towards natural gesture
synthesis: Evaluating gesture units in a data-driven approach to gesture synthesis. Intelligent
Virtual Agents 7: 15–28.
Kita, Sotaro 1990. The Temporal Relationship between Gesture and Speech: A Study of Japanese-
English Bilinguals. Chicago: University of Chicago.
Kita, Sotaro 2000. How representational gestures help speaking. In: David McNeill (ed.), Lan-
guage and Gesture, 162–185. Cambridge: Cambridge University Press.
Kita, Sotaro and Hedda Lausberg 2007. Speech-gesture discoordination in split brain patients’
left-hand gestures: Evidence for right-hemispheric generation of co-speech gestures. Cortex
8: 131–139.
Kita, Sotaro and Asli Özyürek 2003. What does cross-linguistic variation in semantic coordination
of speech and gesture reveal? Evidence for an interface representation of spatial thinking and
speaking. Journal of Memory and Language 48: 16–32.
Kita, Sotaro, Ingeborg van Gijn and Harry van der Hulst 1998. Movement phases in signs and co-
speech gestures and their transcription by human encoders. In: Ipke Wachsmuth and Martin
Fröhlich (eds.), Gesture, and Sign Language in Human-Computer Interaction, 23–35. Berlin:
Springer.
Kopp, Stefan, Kirsten Bergmann and Ipke Wachsmuth 2008. Multimodal communication from
multimodal thinking—towards an integrated model of speech and gesture production. Interna-
tional Journal for Semantic Computing 2(1): 115–136.
Kopp, Stefan, Paul Tepper and Justine Cassell 2004. Towards an integrated microplanning of lan-
guage and iconic gesture for multimodal output. In: International Conference on Multimodal
Interaction, 04 October 13–15. Center County, Pennsylvania: Pennsylvania State College.
Kopp, Stefan, Paul Tepper, Kimberley Ferriman, Kristina Striegnitz and Justine Cassell 2007.
Trading spaces: How humans and humanoids use speech and gesture to give directions. In:
Toyoaki Nishida (ed.), Conversational Informatics. An Engineering Approach, 133–160. Chi-
chester: John Wiley.
Kopp, Stefan and Ipke Wachsmuth 2004. Synthesizing multimodal utterances for conversational
agents. Computer Animation and Virtual Worlds 15(1): 39–52.
Krauss, Robert M., Yihsiu Chen and Purnima Chawla 1996. Nonverbal behavior and nonverbal communication: What do conversational hand gestures tell us? In: Mark Zanna (ed.), Advances in Experimental Social Psychology, 389–450. San Diego: Academic Press.
Krauss, Robert, Robert Dushay, Yihsiu Chen and Francis Rauscher 1995. The communicative
value of conversational hand gestures. Journal of Experimental Social Psychology 31: 533–552.
Krauss, Robert M. and Uri Hadar 1999. The role of speech-related arm/hand gestures in word
retrieval. In: Lynn S. Messing and Ruth Campbell (eds.), Gesture, Speech and Sign, 93–116.
Oxford: Oxford University Press.
Krout, Maurice H. 1935. The social and psychological significance of gestures (a differential ana-
lysis). The Pedagogical Seminary and Journal of Genetic Psychology 47: 385–412.
Kuhlen, Anna and Mandana Seyfeddinipur 2007. From speaker to speaker: Repeated gestures
across speakers. Paper presented at the Berlin Gesture Center Colloquium, August
29, 2007. Berlin.
Ladewig, Silva H. 2006. Die Kurbelgeste – konventionalisierte Markierung einer kommunikativen
Aktivität. MA thesis, Institut für deutsche und niederländische Philologie, Freie Universität
Berlin.
Ladewig, Silva H. 2010. Beschreiben, suchen und auffordern – Varianten einer rekurrenten Geste.
In: Sprache und Literatur 41(1) (105): 89–111.
Ladewig, Silva H. 2011. Putting the cyclic gesture on a cognitive basis. CogniTextes 6. http://cognitextes.revues.org/406.
Ladewig, Silva H. 2012. Syntactic and semantic integration of gestures into speech: Structural, cog-
nitive, and conceptual aspects. Ph.D. dissertation. Faculty of Social and Cultural Sciences, Euro-
pean University Viadrina, Frankfurt (Oder).
Ladewig, Silva H. and Jana Bressem forthcoming. New insights into the medium ‘hand’: Discover-
ing recurrent structures in gestures. Semiotica.
Ladewig, Silva H. and Sedinha Teßendorf in preparation. Collaborative metonymy – Co-constructing
meaning and reference in gestures.
Lakoff, George and Mark Johnson 1980. Metaphors We Live By. Chicago: University of Chicago Press.
Latoschik, Marc Erich 2000. Multimodale Interaktion in Virtueller Realität am Beispiel der Virtuel-
len Konstruktion. Bielefeld, Germany: Technische Universität Bielefeld.
412 III. Historical dimensions

Lausberg, Hedda, Robyn F. Cruz, Sotaro Kita, Eran Zaidel and Alan Ptito 2003. Pantomime to
visual presentation of objects: Left hand dyspraxia in patients with complete callosotomy.
Brain 126: 343–360.
Lausberg, Hedda and Sotaro Kita 2003. The content of the message influences the hand choice in
co-speech gestures and in gesturing without speaking. Brain and Language 86: 57–69.
LeBaron, Curtis and Jürgen Streeck 2000. Gestures, knowledge, and the world. In: David McNeill
(ed.), Language and Gesture, 118–138. Cambridge: Cambridge University Press.
Levy, Elena T. and Carol A. Fowler 2000. The role of gestures and other graded language forms in
the grounding of reference in perception. In: David McNeill (ed.), Language and Gesture,
215–234. Cambridge: Cambridge University Press.
Loehr, Dan 2006. Gesture and Intonation. Washington, DC: Georgetown University Press.
Mallery, Garrick 1972. Sign Language among North American Indians Compared with that among
Other Peoples and Deaf-Mutes. The Hague: Mouton. First published [1881].
Martell, Craig 2005. FORM: An experiment in the annotation of the kinematics of gesture. Ph.D.
dissertation, University of Pennsylvania, Pennsylvania.
Matschnig, Monika 2007. Körpersprache: Verräterische Gesten und Wirkungsvolle Signale. Mu-
nich: Gräfe und Unzer.
Mayberry, Rachel I. and Joselynne Jaques 2000. Gesture production during stuttered speech. In:
David McNeill (ed.), Language and Gesture 199–214. Cambridge: Cambridge University Press.
McCafferty, Stephen G. 2004. Space for cognition: Gesture and second language learning. Interna-
tional Journal for Applied Linguistics 14(1): 148–165.
McClave, Evelyn 1998. Pitch and manual gesture. Journal of Psycholinguistic Research 27(1): 69–89.
McNeill, David 1979. The Conceptual Basis of Language. Hillsdale, NJ: Erlbaum.
McNeill, David 1985. So you think gestures are nonverbal? Psychological Review 92(3): 350–371.
McNeill, David 1986. Iconic gestures of children and adults. Semiotica 62: 107–128.
McNeill, David 1989. A straight path – to where? Reply to Butterworth and Hadar. Psychological
Review 96(1): 175–179.
McNeill, David 1992. Hand and Mind. What Gestures Reveal about Thought. Chicago: University
of Chicago Press.
McNeill, David 2005. Gesture and Thought. Chicago: University of Chicago Press.
McNeill, David 2008. Unexpected metaphors. In: Alan Cienki and Cornelia Müller (eds.), Meta-
phor and Gesture, 185–199. Amsterdam: John Benjamins.
McNeill, David and Susan D. Duncan 2000. Growth points in thinking-for-speaking. In: David
McNeill (ed.), Language and Gesture, 141–161. Cambridge: Cambridge University Press.
Mittelberg, Irene 2006. Metaphor and metonymy in language and gesture: Discourse evidence for
multimodal models of grammar. Ph.D. dissertation, Cornell University, Ithaca, NY. Ann Arbor, MI: UMI.
Mittelberg, Irene 2008. Peircean semiotics meets conceptual metaphor: Iconic modes in gestural
representations of grammar. In: Alan Cienki and Cornelia Müller (eds.), Metaphor and Ges-
ture, 145–184. Amsterdam: John Benjamins.
Mittelberg, Irene 2010. Geometric and image-schematic patterns in gesture space. In: Vyv Evans
and Paul Chilton (eds.), Language, Cognition, and Space: The State of the Art and New Direc-
tions, 351–385. London: Equinox.
Mittelberg, Irene and Linda Waugh 2009. Multimodal figures of thought: A cognitive-semiotic
approach to metaphor and metonymy in co-speech gesture. In: Charles Forceville and Eduardo
Urios-Aparisi (eds.), Multimodal Metaphor, 329–356. Berlin: De Gruyter.
Mondada, Lorenza 2007. Interaktionsraum und Koordinierung. In: Reinhold Schmitt (ed.), Koor-
dination. Analysen zur Multimodalen Interaktion, 55–94. Tübingen: Narr.
Mondada, Lorenza and Reinhold Schmitt 2009. Situationseröffnungen. Zur Multimodalen Herstel-
lung Fokussierter Interaktion. Tübingen: Narr.
Müller, Cornelia 1994. Cómo se llama …? Kommunikative Funktionen des Gestikulierens in
Wortsuchen. In: Peter Paul König and Helmut Wiegers (eds.), Satz-Text-Dikurs, 71–80. Tübin-
gen, Germany: Niemeyer.
26. 20th century: Empirical research of body, language, and communication 413

Müller, Cornelia 1998. Redebegleitende Gesten: Kulturgeschichte, Theorie, Sprachvergleich. Berlin:
Arno Spitz.
Müller, Cornelia 2003. On the gestural creation of narrative structure: A case study of a story told
in conversation. In: Monica Rector, Isabella Poggi and Trigo Nadine (eds.), Gestures: Meaning
and Use, 259–265. Porto: Universidade Fernando Pessoa.
Müller, Cornelia 2004. Forms and uses of the Palm Up Open Hand. A case of a gesture family? In:
Cornelia Müller and Roland Posner (eds.), The Semantics and Pragmatics of Everyday Gestures,
233–256. Berlin: Weidler.
Müller, Cornelia 2007. A dynamic view on metaphor, gesture and thought. In: Susan D. Duncan,
Justine Cassell and Elena T. Levy (eds.), Gesture and the Dynamic Dimension of Language.
Essays in Honor of David McNeill, 109–116. Amsterdam: John Benjamins.
Müller, Cornelia 2008. Metaphors. Dead and Alive, Sleeping and Waking. A Cognitive Approach to
Metaphors in Language Use. Chicago: University of Chicago Press.
Müller, Cornelia 2010. Wie Gesten bedeuten. Eine kognitiv-linguistische und sequenzanalytische
Perspektive. Zeitschrift für Sprache und Literatur 41(1): 37–68.
Müller, Cornelia volume 2. Singular and recurrent gestures. In: Cornelia Müller, Alan Cienki,
Ellen Fricke, Silva Ladewig, David McNeill and Jana Bressem (eds.), Body – Language –
Communication. An International Handbook on Multimodality in Human Interaction (Hand-
books of Linguistics and Communication Science 38.2). Berlin and Boston: De Gruyter Mouton.
Müller, Cornelia and Ulrike Bohle 2007. Das Fundament fokussierter Interaktion: Zur Vorbereitung
und Herstellung von Interaktionsräumen durch körperliche Koordination. In: Reinhold Schmitt
(ed.), Koordination. Analysen zur Multimodalen Interaktion, 129–167. Tübingen: Narr.
Müller, Cornelia, Jana Bressem and Silva H. Ladewig this volume. Towards a grammar of ges-
tures: A form-based view. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig
and David McNeill (eds.), Body – Language – Communication: An International Handbook
on Multimodality in Human Interaction. Handbooks of Linguistics and Communication
Science (HSK 38.2). Berlin: De Gruyter Mouton.
Müller, Cornelia and Alan Cienki 2009. Words, gestures, and beyond: Forms of multimodal met-
aphor in the use of spoken language. In: Charles Forceville and Eduardo Urios-Aparisi (eds.),
Multimodal Metaphor, 297–328. Amsterdam: John Benjamins.
Müller, Cornelia and Ingwer Paul 1999. Gestikulieren in Sprechpausen. Eine konversations-
syntaktische Fallstudie. In: Hartmut Eggert and Janusz Golec (eds.), … Wortlos der Sprache
mächtig. Schweigen und Sprechen in Literatur und Sprachlicher Kommunikation, 265–281.
Stuttgart, Germany: Metzler.
Müller, Cornelia, Silva H. Ladewig and Sedinha Teßendorf in preparation. Gestural modes of mime-
sis – revisited. Mimetic techniques and cognitive-semiotic processes driving gesture creation.
Müller, Cornelia and Gerald Speckmann 2002. Gestos con una valoración negativa en la conver-
sación cubana. DeSignis 3: 91–103.
Müller, Cornelia and Susanne Tag 2010. Attention, foregrounding, and embodied activation:
Multimodal metaphors in spoken discourse. Cognitive Semiotics 6: 85–120.
Murphy, George L. 1996. On metaphoric representation. Cognition 60(2): 173–204.
Negueruela, Eduardo, James P. Lantolf, Stefanie Rehn Jordan and María José Gelabert 2004. The
“privative” function of gesture in second language speaking activity: A study of motion verbs
and gesturing in English and Spanish. International Journal for Applied Linguistics 14(1): 113–147.
Neumann, Ragnhild 2004. The conventionalization of the ring gesture in German discourse. In:
Cornelia Müller and Roland Posner (eds.), The Semantics and Pragmatics of Everyday Ges-
tures, 217–223. Berlin: Weidler.
Núñez, Rafael E. 2004. Embodied cognition and the nature of mathematics: Language, gesture,
and abstraction. In: Kenneth D. Forbus, Dedre Gentner and Terry Regier (eds.), Proceedings
of the 26th Annual Conference of the Cognitive Science Society, 36–37. Mahwah, NJ: Lawrence
Erlbaum.
Özyürek, Asli 2000. The influence of addressee location on spatial language and representational
gestures of direction. In: David McNeill (ed.), Language and Gesture, 64–83. Cambridge: Cam-
bridge University Press.
Özyürek, Asli 2002. Do speakers design their cospeech gestures for their addressees? The effects of
addressee location on representational gestures. Journal of Memory and Language 46: 688–704.
Özyürek, Asli, Sotaro Kita and Shanley Allen 2001. Tomato man movies: Stimulus kit designed to
elicit manner, path and causal constructions in motion events with regard to speech and ges-
tures. Nijmegen, the Netherlands: Max Planck Institute for Psycholinguistics, Language and
Cognition group.
Panther, Klaus-Uwe and Linda Thornburg 2007. Metonymy. In: Dirk Geeraerts and Hubert Cuyckens
(eds.), The Oxford Handbook of Cognitive Linguistics, 236–264. Oxford: Oxford University Press.
Parrill, Fey 2008. Form, meaning and convention: An experimental examination of metaphoric
gestures. In: Alan Cienki and Cornelia Müller (eds.), Metaphor and Gestures, 225–247. Amster-
dam: John Benjamins.
Pease, Allan and Barbara Pease 2003. Der tote Fisch in der Hand und andere Geheimnisse der Kör-
persprache. Berlin: Ullstein.
Pike, Kenneth 1967. Language in Relation to a Unified Theory of the Structure of Human Behav-
ior. The Hague: Mouton.
Pike, Kenneth 1972. Towards a theory of the structure of human behavior. In: Ruth M. Brend
(ed.), Kenneth L. Pike – Selected Writings, 106–116. The Hague and Paris: Mouton.
Quek, Francis, David McNeill, Robert Bryll, Susan Duncan, Xin-Feng Ma, Cemil Kirbas, Karl E.
McCullough and Rashid Ansari 2002. Multimodal human discourse: Gesture and speech. Asso-
ciation for Computing Machinery, Transactions on Computer-Human Interaction 9(3): 171–193.
Ruesch, Jürgen and Gregory Bateson 1951. Communication: The Social Matrix of Psychiatry. New
York: Norton.
Ruesch, Jürgen and Weldon Kees 1956. Nonverbal Communication: Notes on the Visual Percep-
tion of Human Relations. Berkeley: University of California Press.
Sacks, Harvey, Emanuel A. Schegloff and Gail Jefferson 1974. A simplest systematics for the orga-
nization of turn-taking for conversation. Language 50: 696–735.
Schegloff, Emanuel A., Gail Jefferson and Harvey Sacks 1977. The preference for self-correction
in the organization of repair in conversation. Language 53: 361–382.
Sapir, Edward 1949. Selected Writings in Language, Culture and Personality. Edited by David
Mandelbaum. Berkeley: University of California Press.
Scheflen, Albert 1973. Communicational Structure. Bloomington: Indiana University Press.
Scherer, Klaus R. and Harald G. Wallbott 1984. Nonverbale Kommunikation. Forschungsberichte
zum Interaktionsverhalten. Weinheim: Beltz.
Schmitt, Reinhold 2005. Zur multimodalen Struktur von turn-taking. Gesprächsforschung 6: 17–61
(www.gespraechsforschung-ozs.de).
Schmitt, Reinhold (ed.) 2007a. Koordination. Analysen zur Multimodalen Interaktion. Tübingen:
Narr.
Schmitt, Reinhold 2007b. Von der Konversationsanalyse zur Analyse multimodaler Interaktion.
In: Heidrun Kämper and Ludwig M. Eichinger (eds.), Sprach-Perspektiven, 395–417. Tübingen:
Narr.
Schmitt, Reinhold and Arnulf Deppermann 2007. Monitoring und Koordination als Voraussetzun-
gen der multimodalen Konstitution von Interaktionsräumen. In: Reinhold Schmitt (ed.), Koor-
dination. Analysen zur Multimodalen Interaktion, 95–128. Tübingen: Narr.
Schönherr, Beatrix 1997. Syntax – Prosodie – nonverbale Kommunikation. Empirische Untersu-
chungen zur Interaktion Sprachlicher und Parasprachlicher Ausdrucksmittel im Gespräch. Tü-
bingen, Germany: Niemeyer.
Schönherr, Beatrix 2001. Paraphrasen in gesprochener Sprache und ihre Kontextualisierung durch
prosodische und nonverbale Signale. Zeitschrift für Germanistische Linguistik 29: 332–363.
Seyfeddinipur, Mandana 2004. Meta-discursive gestures from Iran: Some uses of the “Pistol
Hand”. In: Cornelia Müller and Roland Posner (eds.), The Semantics and Pragmatics of Every-
day Gestures, 205–216. Berlin: Weidler.
Seyfeddinipur, Mandana 2006. Disfluency: Interrupting Speech and Gesture. Max Planck Institute
Series in Psycholinguistics, 39. Nijmegen, NL: Radboud Universiteit Nijmegen.
Sowa, Timo 2006. Understanding Coverbal Iconic Gestures in Object Shape Descriptions. Berlin:
Akademische Verlagsgesellschaft.
Sparhawk, Carol 1978. Contrastive-Identificational features of Persian gesture. Semiotica 24(1/2):
49–86.
Streeck, Jürgen 1993. Gesture as communication I: Its coordination with gaze and speech. Com-
munication Monographs 60(4): 275–299.
Streeck, Jürgen 1994. Gesture as communication II: The audience as co-author. Research on Lan-
guage and Social Interaction 27(3): 239–267.
Streeck, Jürgen 1995. On projection. In: Esther Goody (ed.), Social Intelligence and Interaction,
87–110. Cambridge: Cambridge University Press.
Streeck, Jürgen 1996. How to do things with things. Objets trouvés and symbolization. Human Stu-
dies 19: 365–384.
Streeck, Jürgen 2007. Geste und verstreichende Zeit. Innehalten und Bedeutungswandel der “bieten-
den Hand”. In: Heiko Hausendorf (ed.), Gespräch als Prozess, 157–180. Tübingen: Narr.
Streeck, Jürgen 2009. Gesturecraft: The Manu-facture of Meaning. Amsterdam: John Benjamins.
Streeck, Jürgen and Ulrike Hartge 1992. Previews: Gestures at the transition place. In: Peter Auer and
Aldo di Luzio (eds.), The Contextualization of Language, 138–158. Amsterdam: John Benjamins.
Sweetser, Eve 1998. Regular Metaphoricity in Gesture: Bodily-Based Models of Speech Interaction.
London: Elsevier.
Sweetser, Eve and Fey Parrill 2004. What we mean by meaning: Conceptual integration in gesture
analysis and transcription. Gesture 4(2): 197–219.
Tabensky, Alexis 2001. Gesture and speech rephrasings in conversation. Gesture 1(2): 213–235.
Taub, Sarah F. 2001. Language from the Body. Iconicity and Metaphor in American Sign Language.
Cambridge: Cambridge University Press.
Tepper, Paul, Stefan Kopp and Justine Cassell 2004. Context in context: Generating language and
iconic gesture without a gestionary. AAMAS ’04 Workshop on Balanced Perception and Action
for Embodied Conversational Agents: 79–86.
Teßendorf, Sedinha 2005. Pragmatische Funktionen spanischer Gesten am Beispiel des “Gestos de
Barrer”. MA thesis, Institut für deutsche und niederländische Philologie, Freie Universität Berlin.
Teßendorf, Sedinha 2008. Pragmatic and metaphoric gestures – combining functional with cognitive
approaches. Unpublished manuscript.
Tuite, Kevin 1993. The production of gesture. Semiotica 93(1/2): 83–105.
Tylor, Edward B. 1964. Researches into the Early History of Mankind and the Development of Civ-
ilization. Edited and abridged with an introduction by Paul Bohannan. Chicago: University of
Chicago Press. First published [1870].
Wachsmuth, Ipke 2005. Multimodale Interaktion in Mensch-Maschine-Systemen. In: Christiane
Steffens, Manfred Thüring and Leon Urbas (eds.), Zustandserkennung und Systemgestaltung.
6. Berliner Werkstatt Mensch-Maschine-Systeme, 1–6. Düsseldorf: VDI.
Watzlawick, Paul, Janet Beavin Bavelas and Don D. Jackson 1969. Menschliche Kommunikation –
Formen, Störungen, Paradoxien. Bern: Huber.
Webb, Rebecca 1996. Linguistic features of metaphoric gestures. In: Lynn Messing (ed.), Proceed-
ings of WIGLS. The Workshop on the Integration of Gesture in Language and Speech. October
7–8, 1996, 79–95. Newark, DE: Applied Science and Engineering Laboratories.
Webb, Rebecca 1998. The lexicon and componentiality of American metaphoric gestures. In:
Christian Cave, Isabelle Guaitelle and Serge Santi (eds.), Oralité et Gestualité: Communication
Multimodale, Interaction, 387–391. Montreal: L’Harmattan.
Weinrich, Lotte 1992. Verbale und Nonverbale Strategien in Fernsehgesprächen: Eine Explorative
Studie. Tübingen: Niemeyer.
Williams, Robert F. 2004. Making meaning from a clock: Material artifacts and conceptual blend-
ing in time-telling instruction. Ph.D. dissertation, Department of Cognitive Science, University
of California, San Diego.
Wolff, Charlotte 1945. A Psychology of Gesture. London: Methuen.
Wundt, Wilhelm 1921. Die Sprache. Völkerpsychologie: Eine Untersuchung der Entwicklungsge-
setze von Sprache, Mythus und Sitte. Erster Band. Leipzig: Kröner.

Jana Bressem, Chemnitz (Germany)

27. Language – gesture – code: Patterns of movement
in artistic dance from the Baroque until today
1. The Baroque: Dancing literalness
2. Dance in the 19th century: The body as a systematized instrument
3. Modern Dance: Communicating emotion
4. Dance since the 1960s: De-Gesturing the motion
5. References

Abstract
To talk about artistic dance as a language is a delicate issue. It is delicate above all
because language refers first and foremost to communication, and thus evokes the
impression that dance is per se “understandable” in a global sense, since it expresses
itself before an audience not through words but through bodily motion. Though protagonists
of Modern Dance, such as Mary Wigman, speak of the “language of dance” (Wigman
1963), dance’s modes of transmitting content or meaning differ markedly from the verbal
mode. Even the idea of meaning as a hermeneutic category has been partly abandoned in
European contemporary dance since the 1990s.
While dance in the baroque era and in classical ballet from the 19th century onwards
still holds on to movement as a codified language, the choreographers of Modern Dance
in the 20th century are less focused on a vocabulary identifiable by aesthetic convention
and instead approach the modes of language in a more expressive way. The following
foray through dance from the Baroque until today traces how the notion of dance as a
codified system developed across the different epochs.

1. The Baroque: Dancing literalness


Baroque stage dance in Europe is usually framed historically in the period from 1660 until
1750 and is marked by the creation of dance manuals, in which dance for the first time
appeared as text in an academic field (Franko 1993: 3). However, Mark Franko also
takes into consideration the period before, often called Mannerism (1580–1660), an
interval in history where dance claimed a certain autonomy from text and from
significations of the body that went with it. Regarded as burlesque dance, this period
was merely a short interlude in the history of the performing arts; afterwards dance
was again fixed in its role as an interlude within opera (Franko 1993: 3–5). Art in the
Baroque era served a highly social function. José Antonio Maravall, cited by Franko,
describes art of that time as “the gesticulating submission of the individual to the
confines of the social order” (Franko 1993: 4). In France, “per-
mission” for those expressions was given by the king, namely Louis XIV, who estab-
lished the court ballet in order to legitimate “the monarch in his double status as
real and ideal body,” thus serving as the warrant of political power (Franko 1993: 3;
Lee 2002: 66). Along with this representational turn, Sarah R. Cohen highlights the
“[t]ransformation of aristocratic performance into an aesthetic product that a wider
public could appropriate” (Cohen 2000: 166). As a further effect, rules were established
that turned dance into an art form in need of professionals and – going along with its
reception as art on stage – removed it from being a social amusement physically available
to everybody (Jeschke 1996: 93). Under the regency of Louis XIV, dance
developed into a regulated system of steps and positions, assisted by Pierre Rameau’s
determination of the five foot positions in ballet, recorded in 1725 (Rameau 1967:
11–22), as well as the signification of designated steps like “balancé, jeté, pirouette
[and] entrechat” that still exist today (Lee 2002: 82).
Already a century before, dance was understood as a written text and as a represen-
tation of aristocratic power (Franko 1993: 36). Accordingly, stage dance produced a dis-
ciplined body executing strict codes of movement, fixed in written, in the sense of the
written word, choreo-graphed instructions, such as the famous Orchesography by
Thoinot Arbeau in 1589 (Lepecki 2006: 6–7). Consequently, André Lepecki formulates
the assumption that “the modern body revealed itself as a linguistic entity” (Lepecki
2006: 7). Exemplary for this phenomenon of dance in the early baroque period is the
so-called geometrical dance, focusing on specific shapes and figures of the moving
body which the audience was supposed to literally “read” as a text (Franko 1993:
15–16), supported by a flattened dimension of the event presented on stage: “Geomet-
rical dance […] acquired its name from geometrical and symbolic patterns that were de-
signed to be seen from above as if they were horizontal or flat on a page” (Franko 1993:
21). The dancers’ bodies even formed words executed through specific postures
(Jeschke 1996: 89): A prominent printed example is the Alfabeto Figurato by Giovanni
Battista Bracelli in 1624 (Fig. 27.1; Franko 1993: 17). Thus dance in the Baroque be-
comes a regulated model of reading as well as a written, choreo-graphed text itself,
at least since Raoul Auger Feuillet established a notational system for dance (Feuillet
1700, Jeschke 1996: 91).
From the middle of the 17th century, dance manuals appeared, especially in France,
Germany and Great Britain, that gave instructions and thereby codified the danced
movement on stage, generating “[…] a technology that creates a body disciplined to
move according to the commands of writing” (Lepecki 2006: 6). Thus, stage dance
was established as an art form following fixed rules and patterns according to positions
of the body itself, the steps as well as arms, hands and fingers. Apart from regulating
movement according to the social system, an important goal was to give dance the sta-
tus of an accepted art form: Claude François Ménéstrier claims that dance expresses
itself through movement in the same way that painting does so through its own specific
means (Ménéstrier 1682: 41). The system now established by various authors regulates
Fig. 27.1: Giovanni Battista Bracelli, Alfabeto Figurato. From: Bizzarie di varie figure (1624), Bib-
liothèque nationale, Paris

the body’s motion in every possible position. John Weaver for instance gives exact
directives about how to stand and to use gravity and balance in order to execute a cer-
tain set of movements that followed (Weaver 1721: 97). Kellom Tomlinson turns this
initial anatomical need into a social posture essential to literally marking one’s own
position and representational status in society, by giving very detailed advice on how to
stand, e.g., during a conversation:

[…] when we stand in Company, are when the weight rests much on one Foot, as the other,
the Feet being considerably seperated or open, the Knees [sic!] streight [sic!], the Hands
placed by the side in a genteel Fall or natural Bend of the Wrists, and being in an agreeable
fashion or Shape about the joint or bend of the Hip, with the Head gracefully turning to
the Right or Left, which completes a most Heroic Posture. (Tomlinson 1735: 4)

Wendy Hilton provides a detailed overview of the use of different body parts, e.g.,
the arms, according to the social dance conventions, including Tomlinson’s instructions
(Hilton 1981: 133–171). Stephanie Schroedter shows how those positions and gestures
change meaning during the course of time: A slightly different shape of the arms or
fingers, for example, was enough to expose a dancer as not being up-to-date with the
current social conventions (Schroedter 2008: 425).
Apart from the choreographed guidelines involving a stern code of bodily attitudes
necessary to communicate in social settings, the project of many dance authors was to
establish a vocabulary appropriate to express emotions in dance. Ménéstrier in partic-
ular reflects about the “affects of the soul” as a primary function of artistic dance
(Ménéstrier 1682: 161). In addition to the movement, which is usually denominated
as the steps, the “expressions” are the vehicles to transport emotions and passions
(Ménéstrier 1682: 158; Jeschke 1996: 95–96). The bodily attitudes involved in the pro-
duction of affects such as love, hate, grief or terror are more complicated than the sim-
ple execution of steps. These attitudes also bear a certain ambivalence: on the one hand,
Ménéstrier asks for preferably “natural” postures which shall be able to trigger the
imagination of the audience, and on the other hand encloses them in a codified and
regulated system according to the laws of “rhetoric,” providing almost a catalogue
of appropriate gestures in relation to the expressed emotion (Jeschke 1996: 96–97;
Huschka 2006: 115; Fig. 27.2). Thus a literal language of affects and emotions becomes
an intrinsic part of dance as a movement system on stage, claiming the potential of
making its contents clearly understandable for the spectators.

Fig. 27.2: Gerard Lairesse: Terror (1640–1711)

About 80 years later, Jean Georges Noverre strengthens the idea that dancers should
not only be accurate interpreters of formalized figures but also should support the
mode of emotional expression within the dance. In his Lettres sur la Danse (1760), he
postulates the “truthful expression” that should withdraw from the mere abstraction
of danced figures but tend to unify body, mind and soul, thus claiming an aesthetics
of authenticity avant la lettre (Huschka 2006: 107). However, this “naturalness” is again
a constructed one, produced by a necessary literacy that lets the whole dance “speak”
and links it again to language, expressed in Noverre’s statement “[w]ords become use-
less, everything will speak, each movement will be expressive, each attitude will depict a
particular situation, each gesture will reveal a thought, each glance will convey a new
sentiment; [sic!] everything will be captivating, because all will be a true and faithful
imitation of nature” (Noverre 1951: 52). With his proclaimed danse d’action (Noverre
1951: 52), he provides the basis for ballet as an autonomous art form that literally
can “speak for itself” through its movement and gestures. Still, ballet remains bound to the
opera, but Noverre prepares the ground for ballet to be, some 100 years later, no longer
a mere appendix to opera plays, nor to be “parked” in an interlude position.
Noverre promotes the change from the more abstract ballet de cour, serving social
functions, into the ballet en action, making pantomime a key element (Schroedter
2004: 411).

2. Dance in the 19th century: The body as a systematized instrument
Carlo Blasis’ famous treatise Traité Élémentaire de la Danse, as well as his Code of Terp-
sichore (1828), were among the key works for the academization of ballet (see also
Jeschke, Wortelkamp, and Vettermann 2005: 173). In the latter work, he develops the
idea of dance as being a universal language of emotion and puts a special focus on
the gesture in ballet – a position that had a huge impact on the widespread reception
of dance as a language to be understood all over the world, no matter which culture
it belongs to. Though contemporary choreographers, as well as dance theoreticians
and sociologists, have shown that dance is always tied to its cultural context, the
above-mentioned conception of dance is still prominent today, and therefore Blasis
shall be cited more extensively:

By gesture we present to the eyes all that we cannot express to the ears; it is a universal
interpreter that follows us to the very extremities of the globe […] Speech is the language
of reason: it convinces our minds; tones and gestures form a sentimental discourse that
moves the heart. Speech can only give utterance to our passions, by means of reflection
through their relative ideas. Voice and gesture convey them to those we address, in an
immediate and direct manner. In short, speech, or rather the words which compose it, is
an artificial institution, formed and agreed upon between men, for a more distinct recipro-
cal communication of their ideas; whilst gesture and the tone of voice are […] the dictio-
nary of simple nature; they are a language innate in us, and serve to exhibit all that
concerns our wants and the preservation of our existence; for which reason they are
rapid, expressive, and energetic. Such a language cannot but be an inexhaustable source
to an art whose object is to move the deepest sensations of the soul! (Blasis 1976: 113–114)

In favouring the pantomime (Blasis 1976: 114), he differentiates between two kinds of
gestures: “natural” and “artificial” ones (Blasis 1976: 114). The former are meant to
comprise all those expressions that are linked to the “sentiments,” the latter are related
to the “moral world” (Blasis 1976: 114). Akin to Noverre, Blasis opts for a system of
gestural conventions that shall be able to transmit the expressed emotions via an arti-
ficial language presented on stage. However, he partly quits the field of “direct”
understanding evoked by pantomimic imitation and introduces the idea of a repertory
of gestural conventions that the spectator needs to learn by “theatrical habit,” thus
being able to decode certain “symbolic signs” as the “things they represent” (Blasis
1976: 155). Thus Blasis develops a system of gestures that is both an artificial means of
representation and an intelligible mode of communication, transmitted by the symbolic
language of theatrical dance. Still, the need for a hermeneutic understanding of
ballet is striking in Blasis’ explanations and, in its prominence, is reminiscent of the
exigencies of baroque dance: “The Ballet-master should set the gestures, attitudes and
steps exactly to the rythm of the tunes, and so manage that each sentiment expressed
may be responsive to the measure” (Blasis 1976: 127). Hence the dancer develops
into an artificial figure on stage, perfectly trained with regard to posture and balance
and utilizing the principles of geometry to clarify the bodily positions in dance ( Jeschke,
Wortelkamp, and Vettermann 2005: 190–191). In his treatises, Blasis “codified the tech-
nique of ballet as it then existed and laid the foundations of many of the great teaching
methods of later times,” as Ivor Guest remarks (Guest 1966: 16). Terms like equilibrium
and arabesque, as well as the broad variations of pirouettes and jumps, had been
unknown until then (Guest 1966: 16–17), leading to a new understanding of physical
grace in dance (Pappacena 2007: 98–99). As an important effect, the dancer more and
more becomes her/his own sculptor, training the body as the “material” of her/his art
(Brandstetter 2004: 57). Hence, ballet became a “storytelling genre deeply entrenched
in language,” as Marian Smith points out (Smith 2000: 122) – a language systematically
codified in physical gestures and leading to a canonization and institutionalization of
ballet (Pappacena 2007: 95).
In keeping with the narrativity of ballet and a new ideal of grace and proportion,
female protagonists became the main focus in 19th-century ballet, embodying
characters linked to one of the most famous protagonists of European ballet of that
time: Marie Taglioni. Owing to her performances, the traditional character types of the
opera stage – noble, demi-caractère and comique, each bound to a certain bodily
appearance – vanished in favour of the “taglionization” of the Romantic
Ballet (Guest 1966: 18–19). With Taglioni, the “ideal of graceful style” was introduced
on the stages of the Paris opera world, spreading quickly all over Europe (Guest 1966:
79). An important contribution to the rise of Romantic Ballet and its female protago-
nists was the ideal of elevation, prominently featured by the dance master August
Bournonville, who proclaimed the “légèreté” in dance, paradoxically achieved through
sincere daily training practice (Falcone 2007: 283–286). Bournonville also added the
five positions of the body – en face, effacé, croisé, épaulé and renversé, according to
the body’s alignment towards the audience – to the system of steps founded by Rameau
(Falcone 2007: 289). In conjunction with the required effortlessness, the
characters in ballet often featured female apparitions, thus strengthening the myth of
Romantic Ballet and its central themes of fugitive love – as, for example, the eponymous
principal character of La Sylphide (Brandstetter 2007), choreographed by Filippo
Taglioni (1832), the father of Marie, whose fame was grounded in dancing the title
role (Anderson 1992: 79).
During the course of the 19th century, dance develops as an independent art form
and slowly distances itself from the needs of storytelling, as Théophile Gautier remarks
as early as 1848: “the real, unique and eternal theme for the ballet is the dance itself”
(in Smith 2000: 97).
422 III. Historical dimensions

3. Modern Dance: Communicating emotion


Art at the turn of the century is marked by a double figure that goes along with a general
crisis of culture in Europe (Brandstetter 1995b: 49). On the one hand, language no longer
seems to provide assured knowledge and increasingly fails to serve as an appropriate
means of describing aesthetic experience, as Hugo von Hofmannsthal argues in his fic-
tional Letter of Lord Chandos to Francis Bacon (1902). In that letter, Lord Chandos
complains about his inability to get a hold of the cultural phenomena surrounding him,
expressed by an increasing destruction of thought and language caused by an overload
of books, literature and therefore (linguistic) signs (Brandstetter 1995b: 50–51): “Rather,
the abstract words which the tongue must enlist as a matter of course in order to bring out
an opinion disintegrated in my mouth like rotten mushrooms” (Hofmannsthal 2005: 121).
Some years later, Hofmannsthal gets to know the US-American dancer Ruth St. Denis:
their ongoing dialogue sparks his new approach towards creative aesthetic potential.
St. Denis speaks of dance as a “means of communication between soul and soul – to
express what is too deep, too fine for words.” (St. Denis 1979: 22) Thus dance seems to
be the way out of the crisis of signification, opening a new “model of creation,” as
Gabriele Brandstetter summarizes (Brandstetter 1995b: 17, 56). Beyond dance as an appro-
priate mode of aesthetic articulation in a time of semiotic uncertainty, Stéphane Mal-
larmé argues decidedly against the moving gesture that “speaks” (in Smith 2000: 97)
and develops the notion of dance as an “écriture corporelle”: it is not the gestures that
denote certain narrative contents; rather, dance and its movement itself appears as a sign
on stage, transforming the dancer into a medium of corporeal writing (Brandstetter 1995b: 21).
On the other hand, dancers like Isadora Duncan and Mary Wigman consistently
distance themselves from the codified, geometrical and hence “unnatural” ideal of
classical ballet (Duncan 2008: 35; Wigman 1963: 33–35), thus preparing
the ground for Modern Dance. Some of the key features of Duncan’s dances are
wide, flowing clothes (in opposition to the constricting corset costumes of ballet),
dancing barefoot (Manning 1993: 34) and the absence of any codified movement
vocabulary linked to ballet (Brandstetter 1995b: 155). Instead, Duncan’s “first idea of
movement of the dance, certainly came from the rhythm of the waves […] [i]t is the
alternate attraction and resistance of the law of gravity that causes this wave move-
ment” (Duncan 2008: 9). Besides, Duncan draws her inspiration from Renaissance
paintings, like those of Sandro Botticelli, and especially from motifs on ancient
Greek vases (Brandstetter 1995b: 72–74, 111). However, it is important to note that
Duncan’s idea of a “natural body” is actually a model of nature shaped by the
cultural ideal of ancient Greek artists (Brandstetter 1995b: 47; Schulze 1999: 53).
With the withdrawal from codified, academic patterns in ballet, an increasing
focus on communicating emotion and reaching the inner world of human beings,
as St. Denis phrases it, can be observed during the unfolding of Modern Dance over the
first half of the 20th century. Martha Graham decidedly opposes the idea of storytelling
in dance and focuses instead on communication as a means of articulating changed human
needs in modern times (Graham 1979: 50–51), often linked to freeing the body from the
restrictive conditions of industrialization (Brandstetter 1995a):

There is a difference of inspiration in the dance today. Dance [is] no longer performing its
function of communication. By communication is not meant to tell a story or to project an
idea, but to communicate experience by means of action and perceived by action. […] The
departure of the dance from classical and romantic delineations was not an end in itself,
but the means to an end. […] The old forms could not give voice to the more fully awa-
kened man. They had to undergo metamorphosis – in some cases destruction – to serve
as a medium for a time differently organized. (Graham 1979: 50)

Referring to the new motile expressions necessary for Modern Dance, Mary Wigman
distinguishes between “stage dance,” where pantomime is used to “interpret[] mean-
ingful action,” and the so called “absolute dance […] independent of any literary-in-
terpretative content; it does not represent, it is; and its effect on the spectator who is
invited to experience the dancer’s experience is on a mental-motoric level, exciting
and moving” (Wigman 1963: 35–36). Wigman articulates the idea of the affect of
e/motion being transmitted directly to the audience, a phenomenon John Martin describes
as “[m]etakinesis” at the beginning of the 1930s (J. Martin 1969: 14): “emotional experience
can express itself through movement directly” (J. Martin 1969: 18). However, Yvonne
Hardt shows that emotion in dance is not a state of free-floating feelings but is achieved
through a long-lasting process of bodily discipline in the rehearsal studio (Hardt 2006: 143).
Wigman’s idea of a “language of dance” thus accentuates the symbolic level of dance
movement (Wigman in Schikowski 1930), and contemporary dance critic Hans
Brandenburg even conceives of her work as presenting pure form in terms of a “phe-
nomenology of movement […] following no other form than the one emerging from
the movement itself ” (Brandenburg 1921: 201–202; see also Wigman 1963: 12;
Manning 1993: 18). However, Rudolf von Laban suggests a system of notation to rec-
ord the new “languages” of Modern Dance. Since the explicit vocabulary provided
by ballet is lacking here, he responds to the need to visualize the movement
patterns of dance in an appropriate way, taking into consideration the levels of the
body in space, the movements of the extremities and their motile qualities, called efforts
(Laban 1966).
Far from the idea of an absolute dance, Doris Humphrey insists on dance acting in a
social field: “Just self-expression, provided that can be had at all, is certainly not accept-
able” (Humphrey 1959: 31). She differentiates between a repertory of gestures, among
them the so-called “ritual gesture” and the “social gesture,” like hand shaking, embra-
cing and so on (Humphrey 1959: 115–118). Those gestures are not necessarily meant to
be pantomimic but serve as a foil for creating dance themes and motifs by stylizing the
movement, and should always be considered within their corporal history (Humphrey
1959: 115, 121–124). Humphrey also develops a set of “emotional gesture[s],” not mim-
icking certain feelings but exploring the body’s shape as caused by them. The situation of
grieving, for instance, makes the body contract, wrapping the arms around the torso,
etc. Those explorations serve as a starting point for the development of “patterned
emotions” in dance (Humphrey 1959: 118), and are partially reminiscent of the postures
Ménéstrier suggested three hundred years before.
Apart from that, it has to be recognized that in this historical period dancers
and choreographers, often female, are less interested in a literal mimesis of contents or
emotions: they present themselves not only as the authors (choreographers) of
their own dances but also express themselves in writing, in accompanying
self-reflective commentaries and manifestos (Brandstetter 1995b: 37–38).

4. Dance since the 1960s: De-Gesturing the motion


Dance in the 1960s experiences a significant caesura. Especially the foundation of the
Judson Dance Theatre in New York brings a radical change in dance performances.
Artists like Yvonne Rainer refuse any expressive or spectacular dramaturgy or appear-
ance of a “natural” body within their performances (Banes 1987: 15–18) – as articulated
in her famous NO-manifesto (1965; Rainer 1974: 51) – and instead focus on the formal
structures of movement as well as a democratization of dance through the chosen mate-
rial, e.g. by introducing movements from everyday life (Banes 1987: 44, 65), and through
the (sometimes participatory) relation between performers and spectators (R. Martin 1998:
29–34).
After choreographers like Pina Bausch used the spoken word on stage in ways that
involved the dancers’ biographical selves (Schlicher 1987: 212–135), a certain mistrust
towards dance as a means of communicating sense can be observed in the European
dance scene since the 1990s. In line with the strategies of the Judson Church protagonists,
the focus often lies not on what is represented and transmitted on stage but on how
dance projects itself in the field of the performing arts, highlighting the critical
self-reflection of the body, the movement and its presentation, and often leading
to a certain minimalism on stage (Husemann 2002: 17). Choreography is no longer de-
fined via the arrangement of bodies on stage but becomes a reflective instance, bringing
thinking into motion (Husemann 2002: 8).
To place artistic dance in the field of communication therefore requires an alternative
approach that frees movement from fixed interpretative conditions.
The model of gesture, as Giorgio Agamben puts it into play, can serve as a foil to under-
stand European contemporary dance as a field of bodily, theatrical and discursive nego-
tiation. Agamben argues against a model of gestures that serve a specific purpose as
well as against gestures executed merely as an end in themselves. Instead he gives the
following definition:
“What characterizes gesture is that in it nothing is being produced or acted, but rather
something is being endured and supported” (Agamben 2004: 110). Thus dance as ges-
ture, following Agamben, focuses on its own mediatized character “of corporeal move-
ments” (Agamben 2004: 110). In this respect, he withdraws from the idea of gesture as a
simple bodily means of communication and allocates it to a meta-reflective sphere that
positions all human beings as ontologically related to language: “The gesture is
[…] communication of a communicability. It has precisely nothing to say because
what it shows is the being-in-language of human beings as pure mediality” (Agamben
2004: 111).
In that respect, the French choreographer Jérôme Bel questions body, movement
and choreography as almost equivalent elements in a network of significances. By
using minimal, often slowed motions that avoid any spectacular dramaturgy, and a neu-
tral, non-emotional attitude, for instance in his performances Nom donné par l’auteur
(1994) or Jérôme Bel (1995) (Siegmund 2006: 318–319), he shows and simulta-
neously reflects on the body as a producer of theatrical signs on stage. Moreover, chore-
ography is no longer bound to the body alone as the indicator of what should
be perceived: “All of the objects are speaking elements of a choreography,” and conse-
quently the dancers themselves become variable objects on stage (Siegmund 2006: 320).
Thus Bel unsettles dance as a universally available code and shifts it into a
“latent position” (Siegmund 2006: 323).
The Europe-based US-American choreographer Meg Stuart follows a different strategy
of draining movement of sense and predictability. Often, the dancers almost lose
their bodily control on stage, and socially conventionalized gestures are blurred or
emerge as being empty rituals, e.g. in her piece Visitors Only (2003), where a party
scene gets totally out of hand. The movements of the dancers are infused with tics
reminiscent of Tourette’s syndrome, thus refusing to conform to “socially appropriate”
behaviour. Also, spoken language itself is cracked into fragmented weasel words that
no longer communicate but instead transform the body onstage into a transit station
of alienated sense (Foellmer 2009: 271).
This stream in European contemporary dance hence no longer favours closed narra-
tives or the expressionist display of emotions but questions its own conditions in the field of
art, and demands that the audience reflect upon its own being as individuals
and as part of a community in today’s society.

5. References
Agamben, Giorgio 2004. Notes on gesture. In: Hemma Schmutz and Tanja Widman (eds.), That
Bodies Speak Has Been Known for a Long Time, 105–114. Cologne: Walther König.
Anderson, Jack 1992. Ballet and Modern Dance. A Concise History. Hightstown: Princeton Book
Company.
Banes, Sally 1987. Terpsichore in Sneakers. Post-Modern Dance. Hanover: Wesleyan University
Press.
Blasis, Carlo 1976. The Code of Terpsichore: A Practical and Historical Treatise on the Ballet, Dan-
cing and Pantomime; with A Complete Theory of the Art of Dancing: Intended As Well for the
Instruction of Amateurs as the Use of Professional Persons (1828), Facsimile of the James Bul-
lock edition, London, 1928. New York: Dance Horizons.
Brandenburg, Hans 1921. Der Moderne Tanz. Munich: Georg Müller.
Brandstetter, Gabriele 1995a. Tanz-Avantgarde und Bäder-Kultur – Grenzüberschreitungen
zwischen Freizeitwelt und Bewegungsbühne. In: Erika Fischer-Lichte (ed.), TheaterAvant-
garde, Wahrnehmung – Körper – Sprache, 123–155. Tübingen: A. Francke.
Brandstetter, Gabriele 1995b. Tanz-Lektüren. Körperbilder und Raumfiguren der Avantgarde.
Frankfurt am Main: Fischer.
Brandstetter, Gabriele 2004. “The code of Terpsichore.” Carlo Blasis’ Tanztheorie zwischen Ara-
beske und Mechanik. In: Gabriele Brandstetter and Gerhard Neumann (eds.), Romantische Wis-
senspoetik. Die Künste und die Wissenschaften um 1800, 49–72. Würzburg: Königshausen and
Neumann.
Cohen, Sarah R. 2000. Art, Dance and the Body in French Culture of the Ancien Régime. Cam-
bridge: Cambridge University Press.
Duncan, Isadora 2008. Der Tanz der Zukunft (1903). In: Magdalena Tzaneva (ed.), Isadora Dun-
cans Tanz der Zukunft. 10 Stimmen zum Werk von Isadora Duncan, 33–50. Berlin: LiDi
EuropEdition.
Falcone, Francesca 2007. Between tradition and innovation: Bournonville’s vocabulary and style.
In: Gunhild Oberzaucher-Schüller (ed.), Souvenirs de Taglioni. Bühnentanz in der Ersten
Hälfte des 19. Jahrhunderts, Vol. 2, 283–292. Munich: Kieser.
Feuillet, Raoul-Auger 1700. Chorégraphie ou l’Art de décrire la Dance par Caractères, Figures et
Signes Demonstratifs. Paris: Michel Brunet. Reprint 1979. Hildesheim: Georg Olms.
Foellmer, Susanne 2009. Am Rand der Körper. Inventuren des Unabgeschlossenen im Zeitgenös-
sischen Tanz. Bielefeld: transcript.
Franko, Mark 1993. Dance as Text. Ideologies of the Baroque Body. Cambridge: Cambridge Uni-
versity Press.
Graham, Martha 1979. Graham 1937. In: Jean Morrison Brown (ed.), The Vision of Modern
Dance, 49–53. New Jersey: Princeton Book Company.
Guest, Ivor 1966. The Romantic Ballet in Paris. London: Pitman and Sons.
Hardt, Yvonne 2006. Reading emotions. Lesarten des Emotionalen am Beispiel des modernen
Tanzes in den USA. In: Margit Bischof, Claudia Feest and Claudia Rosiny (eds.), e_motion,
139–155. Hamburg: Lit.
Hilton, Wendy 1981. Dance of Court and Theater. The French Noble Style 1690–1725. Princeton,
NJ: Princeton Book Company.
Hofmannsthal, Hugo von 2005. The Lord Chandos Letter and Other Writings. Selected and trans-
lated by Joel Rotenberg. New York: New York Review Books.
Humphrey, Doris 1959. The Art of Making Dances. New York: Rinehart.
Huschka, Sabine 2006. Der Tanz als Medium von Gefühlen. Eine historische Betrachtung. In:
Margit Bischof, Claudia Feest and Claudia Rosiny (eds.), e_motion, 107–122. Hamburg: Lit.
Husemann, Pirkko 2002. Ceci Est de la Danse. Choreographien von Meg Stuart, Xavier Le Roy
und Jérôme Bel. Norderstedt: Books on Demand.
Jeschke, Claudia 1996. Körperkonzepte des Barock – Inszenierungen des Körpers und durch den
Körper. In: Sibylle Dahms and Stephanie Schroedter (eds.), Tanz und Bewegung in der Bar-
ocken Oper, 85–105. Innsbruck: StudienVerlag.
Jeschke, Claudia, Isa Wortelkamp and Gabi Vettermann 2005. Arabesken. Modelle “fremder”
Körperlichkeit in Tanztheorie und -inszenierung. In: Claudia Jeschke and Helmut Zedelmaier
(eds.), Andere Körper – Fremde Bewegungen. Theatrale und Öffentliche Inszenierungen im
19. Jahrhundert, 169–210. Münster: Lit.
Laban, Rudolf 1966. Choreutics. London: Macdonald and Evans.
Lee, Carol 2002. Ballet in Western Culture. A History of Its Origins and Evolution. New York:
Routledge.
Lepecki, André 2006. Exhausting Dance. Performance and the Politics of Movement. New York:
Routledge.
Manning, Susan A. 1993. Ecstasy and the Demon. Feminism and Nationalism in the Dances of
Mary Wigman. Berkeley: University of California Press.
Martin, John 1969. The Modern Dance. New York: Dance Horizons. First published [1933].
Martin, Randy 1998. Critical Moves. Dance Studies in Theory and Politics. Durham, NC: Duke
University Press.
Ménéstrier, Claude François 1682. Des Ballets Anciens et Modernes selon les Règles du Théatre.
Paris: Rene Giugnard.
Noverre, Jean Georges 1951. Letter VII. In: Jean Georges Noverre (ed.), Letters on Dancing and
Ballet (Lettres sur la Danse et les Ballets, 1760), 49–55. Translated by Cyril W. Beaumont, 1930.
London: Beaumont.
Pappacena, Flavia 2007. Dance terminology and iconography in early nineteenth century. In: Gun-
hild Oberzaucher-Schüller (ed.), Souvenirs de Taglioni. Bühnentanz in der Ersten Hälfte des 19.
Jahrhunderts, Vol. 2, 95–112. Munich: Kieser.
Rainer, Yvonne 1974. Parts of Some Sextets. Some retrospective notes on a dance for 10 people
and 12 mattresses called “Parts of Some Sextets,” performed at the Wadsworth Atheneum,
Hartford, Connecticut and Judson Memorial Church, New York, March 1965 (1965). In:
Yvonne Rainer (ed.), Yvonne Rainer: Work 1961–73, 45–51. Halifax: Press of the Nova Scotia
College of Art and Design.
Rameau, Pierre 1967. Le Maître à Danser (1725). Facsimile of the Paris edition. New York:
Broude Brothers.
Schikowski, John 1930. Absoluter Tanz. In: Mainzer Anzeiger, 19.1.1930, Mary Wigman
Archive, Akademie der Künste Berlin, folder N – S.
Schlicher, Susanne 1987. TanzTheater. Traditionen und Freiheiten. Pina Bausch, Gerhard Boh-
ner, Reinhild Hoffmann, Hans Kresnik, Susanne Linke. Reinbek bei Hamburg: Rowohlt.
Schroedter, Stephanie 2004. Vom “Affect” zur “Action.” Quellenstudien zur Poetik der Tanzkunst
vom Späten Ballet de Cour zum Frühen Ballet en Action. Würzburg: Königshausen and
Neumann.
Schroedter, Stephanie 2008. The French art of dancing as described in the German dance instruc-
tion manuals of the early 18th century. In: Stephanie Schroedter, Marie Thérèse Mourey, and
Giles Bennett (eds.), Baroque Dance and the Transfer of Culture around 1700, 412–448. Hilde-
sheim, Germany: Georg Olms.
Schulze, Janine 1999. Dancing Bodies Dancing Gender. Tanz im 20. Jahrhundert aus der Perspek-
tive der Gender-Theorie. Dortmund: Edition Ebersbach.
Siegmund, Gerald 2006. Abwesenheit. Eine Performative Ästhetik des Tanzes. William Forsythe,
Jérôme Bel, Xavier Le Roy, Meg Stuart. Bielefeld: transcript.
Smith, Marian 2000. Ballet and Opera in the Age of “Giselle.” Princeton, NJ: Princeton University
Press.
St. Denis, Ruth 1979. The dance as life experience. In: Jean Morrison Brown (ed.), The Vision of
Modern Dance, 21–25. New Jersey: Princeton Book Company.
Tomlinson, Kellom 1735. The Art of Dancing Explained by Reading and Figures. London. Reprint
1970. Westmead: Gregg International.
Weaver, John 1721. Anatomical and Mechanical Lectures upon Dancing. London: J. Brotherton
and W. Meadows.
Wigman, Mary 1963. Die Sprache des Tanzes. Stuttgart: Battenberg.

Susanne Foellmer, Berlin (Germany)

28. Communicating with dance: A historiography of aesthetic and anthropological
reflections on the relation between dance, language, and representation
1. Introduction: Dance as a field of investigating bodily communication
2. Anthropological and aesthetic discourses on dance
3. Overview of models for “reading” dancing
4. Historical development
5. Deconstructing the division between aesthetics and anthropology
6. Exposition
7. References

Abstract
The article presents an overview of the aesthetic and anthropological reflections on the
“language” of the body on stage from classical times to the present. It discusses the his-
toriography of these two academic fields and their corresponding traditions by placing
them in conversation with one another. The article focuses on a few paradigmatic
concepts and methods that are often applied when artists and scholars think about the
relation of dance with regard to the body, language, and representation.

1. Introduction: Dance as a field of investigating bodily communication
Dance is a social and artistic practice that communicates most directly through the
body. Although not all cultures and languages have a specific word or concept that
signifies “dance” as something distinct from other cultural practices like theatre or
ritual, phenomena that can be recognized as dance appear in all social, cultural and
historical contexts (Schlesier 2007; Kaeppler 1996). Despite this significant
presence in human social practice and interaction, dance has only become a major
field of academic investigation since the mid-1980s with an increasing interest in bodily
practices as a central part of human culture, knowledge and social formation (Kamper
and Wulf 1982; Lorenz 2000). Along with the concurrent development and establish-
ment of dance studies and dance anthropology came the deconstruction of the simplistic
but pervasive notion of dance as a universal language, a notion that had dominated
dance discourse since the 18th century. Dance has become a paradigmatic field for
investigating and exemplifying the complex and shifting relationships between body
movement, language and communication.

2. Anthropological and aesthetic discourses on dance


In the Western context there is hardly any historical or contemporary text on dance that
does not problematize to some extent the relationship between dance, writing and
bodily communication (e.g. Blasis 1828; Brandstetter and Klein 2006; Laban 1922;
Martin [1933] 1965; Noverre 1727). This is due to a lack of a unifying system of dance nota-
tion; despite several attempts, such a system was never developed for dance as it was for
music, for instance (Jeschke 1983). Aesthetic studies of dance have always stressed its
ephemeral nature that distinguishes it from more stable artistic products such as texts
and paintings. While writing on Western dance has historically perceived ephemerality
as a negative aspect and seen this transitory character as the main reason why dance
held a minor status among the arts (Blasis 1828; Noverre 1727), this performative aspect
of dance has received considerable reformulation with an increasing interest in the body
as a means to transmit and store cultural knowledge (Buckland 2006; Taylor 2003).
With this academic development, the aesthetic and anthropological discourses on
dance began more consciously to reflect each other (Brandstetter and Wulf 2007).
Until the 1990s, these had been seen as quite distinct: while aesthetics focused predo-
minantly on dance as an art performed on stage, from ballet to modern and con-
temporary dance, anthropology looked at those forms of dance considered social,
traditional or ethnically marked, mostly studying dance within Non-Western contexts.
In anthropological discourses, dance did not stand for ephemerality; rather, it
exemplified tradition or a universal human practice and experience. Accordingly,
the relation between communication and tradition was conceptualized quite differently
there, focusing on questions like: How does dance symbolize or even “express” cultural
traits of a society? How can dance be used to understand a specific culture? What is
the function of dance? (Kaeppler 1996: 309–310).
Despite this differentiation that has been upheld between Western artistic dance
and ethnically marked dance, representations of “other” cultures clearly existed in Wes-
tern stage dance. Furthermore, there exists a significant, yet mostly unacknowledged
influence of non-Western dance on so-called Western artistic practice (Dixon Gottschild
1998; Savigliano 1995). The distinction between Western and Non-Western dance
was also challenged when newer studies revealed that Non-Western dance was never
the expression of a “natural” social behaviour, as the paradigm of tradition had ren-
dered it, but instead a complex, rule-based artistic practice with its own artis-
tic development (Buckland 2006; Williams 1991). With the deconstruction of the
pervasive notion that had linked ethnically marked dances to the realm of tradition
and locality (whereas Western artistic practice had been perceived as innovative and
universal), the separation between the two scholarly fields became obsolete. Through
the influence of a postcolonial framework, the search for “origin” was exposed as a
Western cultural invention (Hall 1999).
For this reason, this text provides a broad overview and historiography of these two
academic fields and traditions by placing them in conversation with each
other. As these are vast academic fields with an enormous diversity, this text focuses
on a few paradigmatic concepts and methods applied when artists and scholars have
thought about the relation of dance in regard to the body, language, and representation.

3. Overview of models for “reading” dancing


Generally, three major tendencies can be identified in how the relationship between
dance and communication has been conceptualized. While none of these standpoints
is usually applied exclusively when investigating dance, one typically dominates the
understanding and reading of it.
The first approach locates dance in a semiotic framework. Words are assigned to spe-
cific movements within dance styles that have codified movement vocabularies (e.g. bal-
let: plié; kathak: padma kosha), and they often stand for a codified meaning or meaning
potential. This is especially the case when dance focuses on a gestural movement vocab-
ulary or pantomimic aspects, or when the movements are interpreted as gestures. This
framework is also apparent in the symbolic readings of dance in many anthropological
contexts, where cultural performances become “texts” to be read (Williams 1991), or in
semiotic analysis of artistic dance that uses a linguistic terminology to describe dance
components (for instance seeing the shape of a movement as a noun, the action as a
verb and the quality as an adverb) (Adshead 1988; Hutchinson Guest 2005). It is also
apparent when modes of representation of dance are framed along the lines of mimesis,
resemblance or replication (Foster 1986).
Secondly, dance may be seen as a language of its own and distinct from the rational-
ity of language: Accordingly, dance communicates differently from verbal or written
language and bypasses intellectual reflection. This interpretation developed most
clearly at the end of the 19th century, when the body was (re)discovered as a site of
knowledge – a knowledge that was often considered to have been lost through the pro-
cess of civilization and industrialization (Bücher 1898; Laban 1922; Sachs 1937). The
focus on the distinctness and kinaesthetic means of communication developed in the
context of modern dance, in anthropological studies that were clearly linked to a per-
spective on Non-Western dance forms as more “natural”, and also within an aesthetic
discourse that opted for the distinctness of art based on its primary modes of expression
(Martin [1933] 1965; Wigman 1935). This perspective, which sees dance as distinct
from language, is also influential in early phenomenological investigations of this art
(Langer 1953; Sheets-Johnstone 1966). Since the 1990s, such a phenomenological
reading has become more situated in its cultural contexts and is thus highly
intertwined with the third form of reading dance (Cohen Bull 1997; Sklar 2008).
In this third reading, dance is considered – like all social practices – to be based in
performative actions both distinct from language and also shaped by a social order
that can only be understood through language. The formation of the body, its practice
and how we can perceive it are interwoven with language and power structures. This
viewpoint is taken within the field of social anthropology, where dance is considered
within a larger social context (Spencer 1985). It is also shared by those who conceptualize
the body within a theoretical framework that has been highly influenced by the studies
of Michel Foucault (Foster 1996; Desmond 1997; Franko 1995).

4. Historical development
Anthropology is a product of both the Enlightenment and the increasing process of colonization since the 18th century, which provoked the study and comparison of people in different spaces and the investigation of long-term human development. Those who first
used the term were not particularly interested in dance, although Georg Forster in his
“Leitfaden zu einer künftigen Geschichte der Menschheit” clearly integrated the dance
of so-called “wilde” (savage) peoples into his humanistic vision that saw dance as mankind’s first step progressing toward civilization (Forster [1789] 1974: 191).

4.1. Universalistic perspectives


The interest in studying and comparing human development in the 18th century coin-
cided with a significant increase in dance publications. The dance treatises of that period
started to systematize and analyze dance and to suggest cultural differences, although the
study of dances of other cultures was not yet an explicit goal. Most prominently, dance
of ancient cultures became increasingly interesting to the aesthetic discourse in the 18th
century, as the writers and dance masters (Cahusac 1754; Gallini 1762; Noverre 1727)
believed that they could learn from “earlier” cultures about the development of their
own dance forms. They were convinced that these studies could enhance the expressive
quality of 18th-century dance practice. The 18th-century treatises on dance made a plea for a newer, more expressive and narrative form of dancing that culminated in the ballet d’action, evoking and referencing predominantly Greek theatre as the ideal
model. Because of its reliance on mime, this was envisioned as a more expressive
medium than the noble dance that had developed at the Italian and French courts
since the 16th century and which functioned as the negative backdrop against which
a new dance aesthetic was developed (Foster 1996).
The 18th- and 19th-century dance treatises did not pay attention to the differences or
diversity of ancient dancing and usually conflated the “chorei” (as part of the Greek
theatre, signifying dancing, but also more generally the choir and the place where it
was performed) and the “pantomimos” (a solo dance form mostly performed in the
Hellenistic and Roman period) into simply “the Ancient dance” (for a more detailed
assessment of Greek dance and its historiography, see Ley 2003). Writing on dance
as it appeared in Homer, Aristotle, Aristophanes, Herodotus and Plato was thus consulted not in order to describe ancient dance but in order to find universal traits.
28. Communicating with dance 431

By the 19th century, the focus was clearly directed toward gestures, which by then were
considered “undoubtedly, the very soul and support of the Ballet” (Blasis 1828: 111).
Blasis, whose Code of Terpsichore is considered the first and most comprehensive systematization of ballet steps and still informs ballet today, based his distinction between natural gestures (for emotions) and artificial gestures (for abstract concepts) on the Greek distinction, widely circulated in publications, between phora, considered a gesture for the expression of emotions and actions, and schemata, considered a gesture expressing the characteristics of a person or thing (Blasis 1828: 133; Lawler 1964; Ley 2003: 476). For 18th- and 19th-century writers, dancers needed to learn this language of gestures in order to enhance expressivity (Blasis 1828; Cahusac 1754: 2–32; Gallini 1762; Noverre 1727: 28). The belief that dance is a universal language that could
be studied with scientific means did not yet signify that it was also considered natural.

4.2. Evolutionistic and cathartic interpretations – into the 20th century


While the interest in Ancient or Greek culture remained alive into the 20th century, the
explicit study of the dances of other cultures became more paramount as anthropolog-
ical studies, in a positivistic manner, started to collect the songs and dances of different
cultures all over the world from the end of the 19th century onward (Bücher 1898;
Spencer 1862). Anthropological studies, more specifically, those compiling dances of
cultures believed to be “primitive,” were still underwritten by the notion that these dances marked an early stage in the evolution of mankind; “primitive” was here equated with primeval. In the 19th and well into the 20th century, this so-called
primitive dance became clearly divorced from artistic dance as it was performed in
the West and came to stand in for a more natural, bodily expression that was not yet
“tamed,” or restricted through the process of civilization (Sachs 1937). Especially
those dances that clearly portrayed the abandon of restriction, rhythmic stamping
and compulsive movements of the hips came to stand in for a more direct form of com-
munication and the experience of religious ecstasy that had been lost to the civilized
world. The transformative power that was assigned to dance by philosophers like
Nietzsche ([1872] 1972: 25) and anthropologists like Spencer (1862: 234) or Durkheim (1912) echoed a critique of civilization that was important for the rise of the new body
culture movement and modern dance.
Early 20th century dance discourse relied on this liberating discourse that suggested
that society could be renewed through the creation of a more communal feeling evoked
by dancing. The new dance, which rebelled against the ballet and its codified move-
ments, envisioned the true or “absolute” dance as a means to heal the evils of industri-
alization and civilization (Baxmann 2000; Hardt 2004). Rudolf von Laban, who was one
of the central figures in the development of Ausdruckstanz (expressive dance) in
Germany, not only claimed that all humans could dance but was also crucial in initiating the development of movement choirs (Laban 1922). This was an improvised group dance form that could be practiced by both lay and professional dancers and whose importance lay in the communal feeling the dance was meant to evoke. This was paralleled
by a paradigmatic shift in how dance was believed to communicate. Instead of gestures
or attention given to narration, the somatic or kinaesthetic reactions to movements
were foregrounded. Because people shared a common experience, they could also
sympathize physically with others through performed movements. While the German
discourse on dance relied on a terminology that saw people “swinging” or “radiating”
(schwingen) with each other (Laban 1922), echoing in its language the importance of
the invention of the telegraph, this phenomenon of pre-intellectual communication
was termed in Anglophone dance discourse “muscular sympathy” (Martin [1933]
1965). Similar to other modern art forms, dance started to focus on movement as the
main means of communication and rejected older forms of narration, representation
and codified movements. Central figures in historical modern dance in the 1920s to
1940s like Martha Graham or Mary Wigman conceptualized body language as some-
thing distinct from verbal communication that resisted notation and believed in deeply
emotional and archetypal movement states that correlated to specific human feelings
(Martin [1933] 1965; Wigman 1935).
Even after the universalistic and evolutionistic discourses ceased to dominate anthropology and other fields of inquiry, their influence was kept alive in the study of dance through Curt Sachs’ World History of Dance (Sachs 1937;
Schlesier 2007: 141). His initial definition of dance articulated what many modern dancers
throughout the 20th century believed: “The dance breaks down the distinctions of body
and soul, of abandoned expression of the emotions and controlled behaviour, of social life
and the expression of individuality, of play, religion, ballet and drama – all distinctions
that a more advanced civilization has established” (Sachs 1937: 3). Originally published
in 1933 and already translated into English in 1937, World History of Dance remained
one of the standard books in the field despite severe critiques of its methodological framing and racist underpinnings (Kaeppler 1996; Schlesier 2007; Williams 1991; Youngerman 1974). Quite divergent schools of anthropology
drew on this study, including the founders of dance anthropology in the United States
like Franz Boas and his daughter Franziska Boas, who represent a more empirical tradi-
tion focussing on cultural relativism (Boas 1944), or those in the context of social anthro-
pology who investigate the function of dance in society and reject the non-utilitarian
perspective of dance that cathartic theories of dance had voiced (Spencer 1985).

5. Deconstructing the division between aesthetics and anthropology
The state of dance anthropology led the Australian Drid Williams to write a polemic
evaluation of how dance anthropology had been written up to this point (Williams
1991). Her series of lectures gives one of the most detailed critiques of dance anthropological writing at that time, deconstructing the notion that dance and the dancer in general could simply be equated with a “natural” body, arguing for the complexity and artistry of the dances of other cultures, and rejecting the naturalizing expressionism of the body. “If we must use the term ‘expression’ in relation to the dance then we
might better say that when we see dance anywhere in the world, what we are seeing is
an expression of the choreographer’s knowledge of human feelings, ideas, lives and the
universe” (Williams 1991: 21).

5.1. Linguistic approaches


The linguistic approach and the influence of literary theory on the aesthetic discourse
also become apparent with two texts that have reshaped dance studies: Susan Foster’s
Reading Dancing (1986) and Janet Adshead’s Dance Analysis (1988). Published only
two years apart, these two studies coincide with the advent of the academic establish-
ment of dance studies in Anglophone academia. In their comparative approaches, they argued for a competence in reading dance that needed to be learned in a similar way to language. Foster, who began with the working methods of four contemporaneous
choreographers, clearly demonstrated how “wrong” expectations of how a dance com-
municates might make the dance inaccessible for a viewer even within a Western
concert dance context. Rather, she suggests that in order to understand dance one
needs to understand its different modes of representation (Foster 1986). Adshead en-
couraged a cross-cultural perspective, applying her movement oriented dance analysis
to a comparison between modern dance, English Folk dancing and Tongan dance
(Adshead 1988).
However, dance studies predominantly kept aesthetic and anthropological investigations apart up to the mid-1990s, because the establishment of a more theoretically founded dance scholarship coincided with, and took its inspiration from, so-called post-modern dance, which was bound to a highly modernistic artistic discourse that opted for the independence of art. Influential choreographers, such as Merce Cunningham, and the protagonists of the Judson Dance Theater, such as Yvonne Rainer, Steve
Paxton, and Trisha Brown, wanted to stage dance that presented movement for movement’s sake, avoiding classical narration as well as emotional expressivity (Banes 1987). By performing movement that was pedestrian, without any organic organisation and flow, and performed in a casual manner, they not only questioned the boundaries of
what defines dance but challenged the convention that dance should communicate or
represent at all. They also destabilized the inevitable correlation between specific ges-
tures or movements and their symbolic or emotional meanings. The focus was drawn to
the context in which gestures and movements could appear and attain a symbolic mean-
ing on stage at all (Hardt 2006). Despite such a broad view of the potential of how
movement could signify, aesthetic discourse on dance participated in defending and explaining post-modern dance’s rejection of former expressivity in dance, and it clearly followed the modernistic argumentation of the dancers that separated dance from representation (Banes 1987; Franko 1995).

5.2. Comparative and postcolonial approaches


It was only with the deconstruction of this classic modernist artistic discourse that an
ethnographically based methodology was employed to analyze Western dance and
that Non-Western dance could also be seen as artistic avant-garde practice. While
Joann Wheeler Kealiinohomoku’s 1969 essay “An anthropologist looks at ballet as a
form of ethnic dancing” had already introduced a perspective that considered Western stage dance within an anthropological framework, such a perspective only became more established about 25 years later. This was the result both of the deconstruction of the binaries of “high” and “low” art and of the influence of postcolonial discourse on
dance studies.
Not only ballet, which for quite a long time was repeatedly seen as a prime example for studying the so-called process of civilization and the increasing refinement
of European society since the 16th century (Brandstetter 2005; Franko 1993; Weickmann
2002), but also dances considered modern were revisited for their connection to a wider
social context and cultural exchanges. The focus is now also geared towards the inter-
dependence of social and artistic dance, as, for instance, Claudia Jeschke demonstrates
in case studies that both dance genres shared similar movement motifs, dynamics and
preferences for body parts within their historical period (Jeschke 1999). These studies
indicate that the division between the aesthetic and the anthropological perspective with respect to their focus on locality is slowly being eroded.
Provoked by the increasing process of globalization, the focus is now geared toward
how dance cultures have always travelled through different cultural contexts and how
they are reshaped and in this process are assigned new meanings. For instance, Ruth
St. Denis and her staging of Indian Dances, Mary Wigman’s interest in early Germanic
cultures, Nijinsky’s scandalous Le Sacre du Printemps, which marked the advent of modernism in ballet in 1913 by evoking pagan rituals, or Martha Graham’s appropriation of
Native American dance were now discussed in the context of colonisation, cultural bor-
rowings and hybrid dance cultures (Franko 1995; Jeschke 1999; Manning 1993). More
recent studies have also revealed African American influence on a wide range of dances
and American culture in general, including the neoclassic ballet of the founder of
American ballet, Balanchine (Dixon Gottschild 1998).
This cultural hybridity can also be studied both as dance migrates through different
contexts and also as it is appropriated by different social classes within a similar locality,
as for instance in the development of social dances like Tango (Savigliano 1995), and
HipHop (Desmond 1997; Klein and Friedrich 2004) or the spread of “African dance”
classes all over the world. When these dance forms travel throughout the world and
especially into white middle class contexts, most often the movements of the hips, the
looseness of the legs, or the forward bend of the upper body are replaced by a focus
on figuration of steps and a more upward and confined movement of the body
(Desmond 1997). An exemplary case study is Marta Savigliano’s Tango and the Political
Economy of Passion (1995), which traces how Tango travelled from the working class
pubs in Argentina to the bourgeois night life in Europe at the beginning of the 20th
century and later back to Argentina, where the dance was (re)appropriated by a
wider Argentinean middle class. Savigliano demonstrates how Tango had very different
connotations and meanings for different groups of dancers: while it allowed the white bourgeoisie in Europe to express what was considered suppressed and to indulge a fascination with the exotic, the (re)appropriation of Tango back into Argentina was linked
to what Savigliano calls auto-exotism and the need to define a cultural heritage. With
this, stable notions of culture have finally been put aside.
In general, how dance cultures come to stand in for a national movement has been
traced through a critique of the Folk as signifying tradition. Folk dance is now consid-
ered part of the cultural invention of tradition that accompanied nation-state building in Europe (Baxmann and Cramer 2005; Buckland 2006). The
European folk dance traditions share a similar history with, for instance, the Indian Bharatha Natyam, considered a classical Indian dance but only revived and reinvented in the way it exists today in the process of Indian liberation from
British colonization (Chatterjee 2004; Meduri 2004).
These studies have also challenged the notion that saw traditionally and ethnically
marked dances as always expressing and symbolizing (one of the reasons they were
not considered as avant-garde). For instance, while in Kathak – a classical Indian
dance form – the hand gestures (mudras) have specific meaning potential, what they end up signifying depends not only on the context and the part of the dance in which they are performed; they can also be used in a highly decorative manner in the so-called nritta (non-narrative) sequences of the dance, which do not attempt to signify. In this sense,
then, classical Indian dance parallels the structure of the ballet d’action, where sections
that simply demonstrate physical virtuosity (the so called white sections in ballet) are
alternated with more narrative sections (Katrak 2008). However, while the virtuosic sections in both dances may function outside a more classical understanding of communication in the sense of narration, they are highly implicated in the representational value of dance: they demonstrate physical ability and skilled performance, separating the dancer from the audience. This is an expression of a society that values a specific understanding of virtuosity (Foster 1986).

5.3. Phenomenological approaches to reading dancing


The tendency toward linguistic models of dance interpretation throughout the 1980s and 1990s left the more performative elements of dancing bodies, which had characterized earlier studies, out of focus. With the performative turn in the humanities, these methods have more recently become of greater interest once again. In a dance phenomenological approach that reaches back to Susan Langer’s philosophical study Feeling and Form
(1953), dance’s kinaesthetic, visual, tactile, and auditory sensations are considered a
crucial part of the choreographic and social meaning (Sheets-Johnstone 1966). While
Langer’s book and earlier phenomenological studies still relied on a universal under-
standing of the body and its senses, newer studies in the field point out that perception
cannot be understood as simply natural or intuitive. Rather, they propose that a specific
dominance in the modality of sensing is structured by social norms and artistic practices.
Deidre Sklar argues that speaking of the pre-linguistic does not imply the pre-cultural. In different social, cultural and historic circumstances, people learn to emphasize and value
different sensory details of form and quality and different ways of processing somato-
sensory information (Sklar 2008: 95). Proprioception and kinaesthesia are the central
categories in a dance analysis that has striven to detach itself from merely sight-oriented studies of the form and shape of dance. Accordingly, attention is paid to different cultural sensory sensibilities. For instance, Cynthia Cohen Bull has demonstrated that sight
or seeing, touch or sensing, and sound and hearing respectively structure and inform
how ballet, contact improvisation and Ghanaian Dance are taught and perceived
(Cohen Bull 1997).

6. Exposition
The investigation of dance and dance practice has been shifting between preferring either the performative or the representational mode and “nature” of dancing; accordingly, it does not allow one to postulate a specific or universal way in which dance can communicate. Rather, the different discourses on dance encourage a comparative and contextualized analysis and, unfortunately, for most of history have only allowed for research into how discourse has framed the understanding of dance instead of how audiences or participants might have experienced the practice of dance.
7. References
Adshead, Janet (ed.) 1988. Dance Analysis: Theory and Practice. London: Dance Books.
Banes, Sally 1987. Terpsichore in Sneakers. Post-Modern Dance. Middletown, CT: Wesleyan Uni-
versity Press.
Baxmann, Inge 2000. Mythos: Gemeinschaft. Körper- und Tanzkulturen in der Moderne. Munich:
Fink.
Baxmann, Inge and Franz Anton Cramer 2005. Deutungsräume: Bewegungswissen als Kulturelles
Archiv der Moderne. Munich: Kieser.
Blasis, Carlo 1828. The Code of Terpsichore. A Practical and Historical Treatise on the Ballet, Dancing, and Pantomime with a Complete Theory of the Art of Dancing. London: Bulcock.
Boas, Franziska (ed.) 1944. The Function of Dance in Human Society. First Seminar. New York:
Dance Horizons.
Brandstetter, Gabriele 2005. The code of Terpsichore. The dance theory of Carlo Blasis: Mechan-
ics as the matrix of grace. Topoi 24(1): 67–79.
Brandstetter, Gabriele and Gabriele Klein (eds.) 2006. Methoden der Tanzwissenschaft. Modellanalysen am Beispiel von Pina Bauschs “Le Sacre du Printemps”. Bielefeld: transcript.
Brandstetter, Gabriele and Christoph Wulf (eds.) 2007. Tanz als Anthropologie. Berlin: Fink.
Bücher, Karl 1898. Arbeit und Rhythmus, 3rd edition. Leipzig: Teubner.
Buckland, Theresa Jill (ed.) 2006. Dancing from Past to Present. Nation, Culture, Identities. Mad-
ison: University of Wisconsin Press.
Cahusac, Louis de 1754. La Danse Ancienne et Moderne ou Traité Historique de la Danse. A La
Haye, France: Chez Jean Neaulme.
Chatterjee, Ananya 2004. Constructing a historical narrative for Odissi. In: Ann Albright and Ann
Dils (eds.), Rethinking Dance History. A Reader, 143–156. London: Routledge.
Cohen Bull, Cynthia J. 1997. Sense, meaning, and perception in three dance cultures. In: Jane Des-
mond (ed.), Meaning in Motion. New Cultural Studies of Dance, 269–287. Durham, NC: Duke
University Press.
Desmond, Jane (ed.) 1997. Meaning in Motion. New Cultural Studies of Dance. Durham, NC:
Duke University Press.
Dixon Gottschild, Brenda 1998. Digging the Africanist Present. London: Greenwood.
Durkheim, Emile 2001. The Elementary Forms of Religious Life. Translated by Carol Cosman.
Oxford: Oxford University Press. First published [1912].
Forster, George 1974. Leitfaden zu einer künftigen Geschichte der Menschheit. In: Siegfried
Scheibe (ed.), Georg Forsters Werke: Kleine Schriften zu Philosophie und Zeitgeschichte,
Volume 8. Berlin: Akademie Verlag. First published [1789].
Foster, Susan Leigh 1986. Reading Dancing. Berkeley: University of California Press.
Foster, Susan Leigh 1996. Choreography and Narrative: Ballet’s Staging of Story and Desire. Bloo-
mington: Indiana University Press.
Franko, Mark 1993. Dance as Text: Ideologies of the Baroque Body. Cambridge: Cambridge Uni-
versity Press.
Franko, Mark 1995. Dancing Modernism: Performing Politics. Bloomington: Indiana University
Press.
Gallini, Giovani 1762. A Treatise on the Art of Dancing. London: R. Dodsley.
Hall, Stuart 1999. Kulturelle Identität und Globalisierung. In: Karl Hörning and Rainer Winter
(eds.), Widerspenstige Kulturen. Cultural Studies als Herausforderung, 393–439. Frankfurt am
Main: Suhrkamp.
Hardt, Yvonne 2004. Politische Körper: Ausdruckstanz, Choreographien des Protests und die Ar-
beiterkulturbewegung in der Weimarer Republik. Münster: Lit.
Hardt, Yvonne 2006. Reading emotions: Lesarten des Emotionalen am Beispiel des modernen
Tanzes in den USA (1945–1965). In: Margrit Bischof, Claudia Feest and Claudia Rosiny
(eds.), e-motions. Jahrbuch der Gesellschaft für Tanzforschung 16: 139–155. Münster: Lit.
Hutchinson Guest, Ann 2005. Labanotation. The System of Analyzing and Recording Movement,
4th edition. New York: Routledge.
Jeschke, Claudia 1983. Tanzschriften: Die Illustrierte Darstellung eines Phänomens von den Anfän-
gen bis zur Gegenwart. Bad Reichenhall: Comes Verlag.
Jeschke, Claudia 1999. Tanz als BewegungsText. Analysen zum Verhältnis von Tanztheater und
Gesellschaftstanz (1910–1965). Tübingen: Niemeyer.
Kaeppler, Adrienne L. 1996. Dance. In: David Levinson and Melvin Ember (eds.), Encyclopedia
of Cultural Anthropology, Volume 1, 309–313. New York: Routledge.
Kamper, Dietmar and Christoph Wulf (eds.) 1982. Die Wiederkehr des Körpers. Frankfurt am
Main: Suhrkamp.
Katrak, Ketu H. 2008. The gestures of Bharata Natyam: Migration into diasporic contemporary
Indian dance. In: Carrie Noland and Sally Ann Ness (eds.), Migrations of Gesture, 217–240.
Minneapolis: University of Minnesota Press.
Kealiinohomoku, Joann Wheeler 1969/70. An anthropologist looks at ballet as a form of ethnic
dance. Impulse 20: 24–33.
Klein, Gabriele and Malte Friedrich 2004. Is This Real? Die Kultur des HipHop. Frankfurt am
Main: Suhrkamp.
Laban, Rudolph von 1922. Die Welt des Tänzers. Fünf Gedankenreigen. Vienna: W. Seifert.
Langer, Susan 1953. Feeling and Form. A Theory of Art Developed from Philosophy in a New Key.
New York: Charles Scribner’s Sons.
Lawler, Lillian 1964. The Dance of Ancient Greece. London: Adam and Charles Black.
Ley, Graham 2003. Modern visions of Greek tragic dance. Theatre Journal 55: 467–480.
Lorenz, Maren 2000. Leibhaftige Vergangenheit. Einführung in die Körpergeschichte. Tübingen:
edition discord.
Manning, Susan 1993. Ecstasy and the Demon: Feminism and Nationalism in the Dances of Mary
Wigman. Berkeley: University of California Press.
Martin, John 1965. The Modern Dance. New York: Dance Horizon. First published [1933].
Meduri, Avanthi 2004. Bharatha Natyam – what are you? In: Ann Cooper Albright and Ann Dils
(eds.), Moving History / Dancing Cultures. A Dance History Reader, 103–113. Middletown, CT:
Wesleyan University Press.
Nietzsche, Friedrich 1972. Die Geburt der Tragödie aus dem Geist der Musik. In: Giorgio Colli and Mazzino Montinari (eds.), Werke. Kritische Gesamtausgabe, 17–152. New York: De Gruyter. First published [1872].
Novack, Cynthia 1990. Sharing the Dance. Contact Dance: Contact Improvisation and American
Culture. Madison: University of Wisconsin Press.
Noverre, Jean Jacques 1727. Lettres sur la Danse et sur les Ballets, Précédées d’une Vie de l’Au-
teur, par André Levinson. Paris. (Letters on Dancing and Ballet. New York: Dance Horizons,
1803.)
Sachs, Curt 1937. World History of Dance. New York: Norton.
Savigliano, Marta 1995. Tango and the Political Economy of Passion. Boulder, CO: Westview
Press.
Schlesier, Renate 2007. Kulturelle Artefakte in Bewegung. Zur Geschichte der Anthropologie des
Tanzes. In: Gabriele Brandstetter and Christoph Wulf (eds.), Tanz als Anthropologie, 132–145.
Berlin: Fink.
Sheets-Johnstone, Maxine 1966. The Phenomenology of Dance. London: Dance Books.
Sklar, Deidre 2008. Remembering kinesthesia: An inquiry into embodied cultural knowledge. In:
Carrie Noland and Sally Ann Ness (eds.), Migrations of Gesture, 85–111. Minneapolis: Univer-
sity of Minnesota Press.
Spencer, Herbert 1862. First Principles. London: Watts.
Spencer, Paul (ed.) 1985. Society and the Dance. The Social Anthropology of Process and Performance. Cambridge: Cambridge University Press.
Taylor, Diana 2003. The Archive and the Repertoire. Performing Cultural Memory in the Americas.
Durham, NC: Duke University Press.
Weickmann, Dorion 2002. Der dressierte Leib. Kulturgeschichte des Balletts (1580–1870). Frankfurt
am Main: Campus.
Wigman, Mary 1935. Deutsche Tanzkunst. Dresden: Reißner.
Williams, Drid 1991. Ten Lectures on Theories of the Dance. London: Scarecrow Press.
Youngerman, Suzanne 1974. Curt Sachs and his heritage: A critical review of world history of the
dance with a survey of recent studies that perpetuate his ideas. CORD News 6: 6–19.

Yvonne Hardt, Cologne (Germany)

29. Mimesis: The history of a notion


1. Introduction
2. Antiquity and its continuation
3. Reorientation of the theory of mimesis
4. The twentieth century
5. Outlook: New findings
6. References

Abstract
Mimesis, a term that is usually translated as “imitation,” has been central to the fields of
arts, aesthetics, and poetry. However, mimesis is not only a theoretical term. The mimetic
ability is fundamental to all realms of human action and understanding. Mimetic pro-
cesses may be described as the way humans interact with the world and how they recreate
the existing world through their own capacity for giving form. This paper presents an
overview of the history of scholarly reflections on the notion of mimesis from Aristotle
to the present. Three main periods are focused on: a) the period spanning from its first
formulations by Plato and Aristotle in antiquity to the early modern period, b) the period
in which mimesis is declared meaningless by rationalist and idealist philosophies, while
still playing an important social role, and c) the rehabilitation of the concept in the
20th century. The paper concludes by presenting new findings from neuroscience and
developmental psychology.

1. Introduction
In the history of philosophy, mimesis is usually understood as an aesthetic category and
translated as “imitation”. This understanding prevents one from seeing the wide field of
application and manifold meanings that mimesis has been given in the course of the
centuries. However, mimesis plays a decisive role in almost every area of human imag-
ination and action, thought and language. The mimetic ability is part of the conditio
humana and is central to human understanding. In the course of history, it unfolds its
semantic spectrum with terms such as expression and portrayal, as well as mimicry, imitatio, representation, game and ritual.
Mimetic processes may be described as the way human beings interact with the
world in which they live. They take in the world through their senses, yet they do not passively endure it but respond to it with constructive measures: what they receive
from the world is formed by their own action. The mimetic ability has to be discovered
or constructed through and with the model. In mimetic acts, the subject recreates the
existing world through its own capacity for giving form. In this respect, mimesis is two-
fold: a creation of semblance and a social or artistic formation in another medium. To
the extent that mimesis in fact creates what it copies, it precedes the epistemological
distinction between truth and falsehood. As a productive activity, mimesis belongs
to the domain of practice. It is not a theoretical term; its aim is not knowledge but
world making (Goodman 1978). What is created and how this is achieved depends
on the competence of the world maker; therein lies its creative aspect.
The conceptual history of the term mimesis lacks clear delineations and exhibits a
certain resistance to theoretical appropriation. Insofar as the concept of mimesis is
linked to actions, productive activities and processes, the artificiality and acuity of scientific definitions are not conducive to it. Mimetic processes combine experientially
acquired practico-technical skills with capacities for practical cognition and evaluation.
With the beginning of the predominance of rational thought and the postulate of the
singularity of the subject, the concept of mimesis loses its outstanding position in the
history of thought, which it had occupied prior to Cartesian rationalism and idealism.
The history of the concept of mimesis may be divided into three periods:

(i) the long period spanning from its first formulations by Plato and Aristotle to the
early modern period;
(ii) the period in which mimesis is declared meaningless by rationalist and idealist philosophies, while still playing an important social role;
(iii) the rehabilitation of the concept in the 20th century by authors such as Walter
Benjamin, Theodor W. Adorno, René Girard, and Jacques Derrida (the first important work about mimesis in literature can be found in Auerbach 1953; see recent
studies on the concept of mimesis: Gebauer and Wulf 1995; Girard 1977; Girard
1986; Halliwell 2002; Kablitz 1998; Kardaun 1993; Koch 2010; Melberg 1995; Petersen
2000; Schönert 2004; Scholz 1998; Taussig 1993).

In more recent times, a rediscovery of mimesis may be observed in cultural studies, the
social sciences as well as in the neurosciences.

2. Antiquity and its continuation


2.1. Plato, Aristotle
In Plato’s work, mimesis has several heterogeneous meanings: aside from mimicry, rep-
resentation and expression one also encounters emulation, transformation, the creation
of similarity, the production of appearances and illusion (see Koller 1954). A unified
conception is not apparent. Depending on the context, the term is applied and evalu-
ated in disparate ways. The Republic emphasises the outstanding importance of mimesis
440 III. Historical dimensions

for education, where it is mainly accomplished through the emulation of models (Plato
1991). A young person strives to become similar to the model. The representation of
bad models is potentially dangerous as it may spoil the youth. Poetic representations
stimulate the mimetic capacity and thus initiate transformations and alterations. This,
however, does not imply that they are easily subordinated to pedagogical and social ob-
jectives. They rather run the risk of developing in an uncontrolled and rampant way and
may thus lead to unwanted side effects. This is the reason why Plato calls for a regulation
of poetry and of the models it portrays.
In Plato’s work, the central significance of mimesis for art, poetry and music is
already intimated. Mimesis is attributed with the capacity of creating a world of illusion.
Imitation is conceived of as the ability to produce images, not things. The images’ defin-
ing feature is their resemblance to things and objects, in which the real and the imag-
inary intermingle. Insofar as images are defined through resemblance, they belong to
the world of appearances by visualizing something, which they themselves are not.
They thus occupy an intermediary space between being and nonbeing.
Plato’s work contains comprehensive, yet also contradictory accounts of mimesis and
a scathing criticism of its truth-value (see Havelock 1963). It can be assumed that Pla-
to’s heterogeneous conception is connected to the transition from oral to literate cul-
ture (see Ong 1982). From this perspective, Plato’s condemnation of mimesis may be
linked to his efforts to replace speech with conceptual discourse. The fraught combination of oral with literate culture, in which a transformation of language and thought announces itself, remains characteristic of Plato. With the dissemination of writing and
reading, a number of mimetic particularities of oral culture lose their significance.
Other mimetic capabilities, connected to writing and reading, come to the fore (see
Havelock 1982). It is likely that Aristotle’s allocation of mimesis to the domains of
poetry, art and music may only be adequately understood in the context of the
“literateness” of his philosophy.
In his Poetics, Aristotle (1987) points out that man is distinguished from other ani-
mals through his mimetic ability. Mimesis is innate and already shows itself in earliest
childhood. With its help, man is capable of learning even before he can develop other
forms of world appropriation. The specific form this ability might take depends on how
and by means of which contents it develops. Aristotle differentiates between two as-
pects of mimesis: in the first, drawing on Plato, he emphasises the significance of mime-
sis for the creation of images. He then develops his own conception of literary mimesis
(see Halliwell 1986). In his understanding, mimesis does not merely aim at the recre-
ation of a given state of affairs but also aspires towards transformation, particularly
towards beautification, improvement and a universalization of individual traits.
In poetry, mimesis represents the possible and the universal. The Poetics interprets it
as ‘fable’ or ‘plot’ (Ricoeur 1984: 52ff.). In tragedy, it is aimed at the dramatization and
embodiment of the speaking and acting human being and may be grasped as the capac-
ity for poetic representation, which expresses itself in linguistic and imaginary plot out-
lines. Mimesis creates fictive worlds, with no direct relation to reality. It is less through
the individual elements of the plot than through artistic organisation that the cathartic
effects of tragedy are produced (see Goldstein 1966). In contrast to Plato, who dreads
the negative consequences of models, Aristotle regards their mimetic emulation as a
chance to diminish their force. A confrontation with models, not their evasion, is thus
to be desired.
Plato’s and Aristotle’s elaborations develop the semantic horizon of mimesis, which
all other discussions of this concept up to the present day will refer to.

2.2. Middle Ages, early modern period: Mimesis as imitatio and as expression of power
During a long period of time, which encompasses the entire Middle Ages up until the
early modern period, mimesis is understood as the recreation of a model. The highest
entity to be emulated is god, who is also regarded as the source of all creativity. Creative
emulation is not so much understood as a creation of works but rather as a spiritual
matter, particularly in the imitatio Christi. Mimesis is primarily characterised by the fol-
lowing attributes: it is a reproduction according to an idea; it constitutes a relation of
succession in connection to a template; it generates a similarity with the model (see Flasch 1965) and – an important idea in the Renaissance – it possesses verisimilitude.
There are three significant aspects for an understanding of medieval mimesis. The
first aspect concerns the relation of representation. It is dominated by the question of
the relation between models and the results of mimesis. As no direct statements
about god and Jesus are possible, all existing ideas and images of them refer back to
a relation of representation, which every new illustration also needs to relate to. That
is to say, representations always refer to prior representations. The second aspect con-
sists in the observation and examination of processes, which generate artistic and liter-
ary representations. Thirdly, there is an interest in the results of these mimetic processes
and the investigation of the innovative potential of medieval imitatio.
In the early modern period, the concept of mimesis is historicised and a relation to
power is established. It is not only in the 17th century but already in the Renaissance
that the question concerning the connection to history is broached: what is the signifi-
cance of Greek and Roman Antiquity for the present? Do their cultural achievements
constitute pinnacles, which can at best be imitated? Or is the present the highpoint of
the development? There is a controversial debate between the proponents of humanism
and those of the modern age. The latter refer to the experimental method, new discoveries
in the natural sciences and the achievements of the present, in relation to which the
knowledge of the “ancients” is depreciated. These discussions lead to the historical re-
lativization of cultural works and thus to the relativization of mimesis’ claim to validity,
which still constitutes the paradigm for the appropriation of cultural products (see
Nelson 1977).
Contemporary poetics discuss the significance of handed down rules for literary pro-
duction. In addition to the criteria derived from Horace, similitude and verisimilitude
become important factors for the evaluation of literary texts. The discussions about
style and the value of literary works are expressions of the fight for social control
over symbolic expression and the authoritative worldview. From this perspective, the
caustic debates about the adherence to rules and the mimesis of traditional prototypes
becomes intelligible. In connection with these quarrels, a literary subjectivism, whose
origins date back to the late Middle Ages, is articulated for the first time. Appealing
to the mimesis of the “ancients”, the possibility of developing a personal style is ex-
plored. Protected by the authority of the ancient authors, the Renaissance poets
develop their personal modes of expression beyond the demands of the ecclesiastical
and secular authorities. The call for mimesis thus leads to a formal obligation in literary
representation while enabling a subjective expression of the individual.
With the dissemination of the book and an increase of readers, texts acquire an
unprecedented importance. The question of how the reading of texts may be influenced
through commentaries and guidelines becomes highly relevant. Through the immediate
reference of one text to another, intertextuality becomes an issue (see Fumaroli 1980).
What does it mean for literary production that texts mimetically relate to other texts
and thus create new texts? What is the relation between textuality and the significant
literary practice of fragmentation, conspicuous in the work of Montaigne (1958)? The
systematic use of fragments certainly constitutes a modern principle of literary compo-
sition and is accounted for by the insufficiency of human knowledge. In Montaigne it
exists in correlation with self-referentiality, in which the “I” is always ultimately
referred back to itself (see Starobinski 1985). This self-referentiality constitutes the
central mimetic framework for the understanding of the modern subject.

3. Reorientation of the theory of mimesis


3.1. The 17th century: Mimesis between ancient heritage
and state orchestration
In the 17th century, as was already the case in the Renaissance, the ancient poetic and
rhetorical models are upheld under strict normative control. In contrast to former
epochs, aesthetics is given a different position in society, especially in France, where
it begins to occupy a new function within the power structure (see Reiss 1982).
Under Louis XIV, poets, musicians, historiographers, architects and painters serve the
institution of power. The unfolding of subjectivity in artistic mimesis is integrated
into the subject of the absolutistic state (Du Bellay 1968).
Apart from military strength and financial resources, the ruler needs the power of sym-
bolic representation (see Apostolidès 1985; Boileau-Despréaux 1683; Girard 1977). The
grandeur, absoluteness and legitimation that the king aspires to cannot be achieved with-
out representation. Mimesis is given a constructive function within the absolute monar-
chy of France: it produces a world of political aesthetics (Elias 1983). This occurs
primarily in architecture, garden design, theatre, painting and historiography.
The Querelle des Anciens et des Modernes (Gillot 1968), the big controversy about
the relation of the present to the ancient world, shows how strong the desire was to sub-
ordinate oneself to the normativity of the historical and aesthetic heritage, to look for
support in the non-political authorities of the past and to thereby authorise one’s power
(see Gillot 1968). Against this attempt on the part of the Anciens, there is the position of
the Modernes, who no longer conceive of the present as a parallel to the ancient
world. For them, the age of Louis XIV is a historical and political novelty, which
they oppose to antiquity as a world in its own right. Mimesis has finally ceased to be
at the service of tradition and to always function in the same way (Auerbach 1953:
370ff.). The Modernes regard France in opposition, as a competitor even, to the ancient
world. Following the defence of the modern position, a realisation and ascertainment of
one’s own aesthetic practice occurs.
In the 17th century a new conception of symbolic systems emerges: one realizes that
the signs can be made to depend on a will that is capable of enforcing them as a
symbolic order in its own right (Foucault 1970). The representations thus generated
lay claim to universal validity. They are applied in all symbolic domains with state-
supporting functions and are thus conflated into an all-encompassing fiction. The object
of these fictionalizing representations is the power of the king and of the state. Yet this
all-encompassing demand is met with a profound scepticism towards the representa-
tional function of language and painting, especially in the works of Racine, La Roche-
foucauld, Pascal and Madame Lafayette (see Lyons 1982; Stierle 1984; Todorov 1982).
Other thinkers, too, question the mimetic representation of reality, most notably Des-
cartes, who supplants representational thinking with the scientific method (see Ehrard
1970; Fischer-Lichte 1989: 70; Judowitz 1988).

3.2. The Enlightenment and the 19th century


In the Age of Enlightenment, the mimesis of nature and the nature of mimesis become
the objects of intense art-theoretical debates. In the first phase of this argument in
France, England and Germany the concept is narrowed down to imitation (see Boyd
1968: 98ff.), while the concept of nature is increasingly expanded (see Putnam 1981).
The debate is centred on the question as to what kind of world is created through artis-
tic mimesis: what are its characteristics? How closely is it linked to social reality? Is it
permitted to refer to a possible world other than the existing one?
In the second half of the 18th century, the debate enters a new phase. Now the man-
ner in which artistic mimesis occurs is discussed (see Herrmann 1970; Hohner 1976;
Huyssen 1981). The discussion about method leads to a much more precise elaboration
of the problem and concerns both technical and theoretical aspects: how, with the aid of
signs, are symbolic worlds fabricated? What rules are employed? What restrictions have
to be taken into consideration? In place of a vague philosophy of mimesis, referencing
ancient poetries and poetics, there emerges a way of thinking, based on semiotics, art
criticism and judgements of taste (Bourdieu 1989). Aristotle still remains the great
authority, yet the petrified Neo-Aristotelianism of 17th century French dramatic theory
is replaced by an interpretation that no longer wants to adhere to the letter, but to the
intention of the Poetics.
In the Age of Enlightenment, drama comes to occupy a completely different posi-
tion than in earlier epochs. The concept of catharsis is taken as a basis to turn the
stage into an exceptional space, where the “cult of feelings” of the bourgeoisie can
unfold (Szondi 1979). Dramatic plots are made transparent in relation to inner mo-
tives, desires and feelings (Zimbardo 1986). Emotionalisation is the main character-
istic of the aspiring social class. This idea is theatrically illustrated and validated by
endowing it with ethical traits and attributing universal significance to it, especially to the idea of compassion. The capacity for sentiment and feeling is portrayed on
stage as characteristic of the bourgeoisie. Theatrical mimesis is instrumental in giving
emotional expression a generally recognizable scenic configuration and in fashioning the
interior life of the bourgeois by means of a codification of public expression (see Elias
1978; Foucault 1979). Due to the mimesis of emotions, drama participates in the self-
legitimation of the bourgeoisie; the moral playhouse becomes a court judging the
dominant politics.
In the theatre, models of aesthetic and emotional conduct are acted out, which
become increasingly important for the bourgeoisie’s self-conception. Mimesis becomes
the arbiter between inner life and the public (see Habermas 1991; Sennett 1977).
It extends its realm into the domain of emotion, yet subordinates emotional expression
to taste and verisimilitude. The Aristotelian categories are interpreted by the anthro-
pology of the emerging bourgeois society, one of the problems being that bourgeois
society is not yet fully formed. This leads to a special situation, in which theatrical
mimesis participates in actively shaping a society and its anthropology – with its central
conceptions of man, humanity, morality and free will.
Theatrical mimesis develops an impact far beyond the stage. It transforms the every-
day into a dramatic text, in which the individuals play their part (see Geyer 2007; Will-
ems 2009). Acting becomes a social category. For Diderot and Lessing, the two
spokesmen of bourgeois dramatic art, there exist ideal models in nature, which authors
and actors need to grasp (Lessing 1962). Both adhere to an idea of a universal natural
order given to mimesis, yet both are already on their way to a new conception (see Be-
laval 1950; Lacoue-Labarthe 1989). They recognize that semiotic systems and media
not only influence what is portrayed but also decisively participate in its creation (see
Genette 1969; Goodman 1968).
Towards the end of the 18th century, the interest of art theoreticians in the concept
of mimesis begins to wane. Contemporaneous with this decline of interest, mimetic pro-
cesses become increasingly important for the creation of the social world. Mimesis
slowly develops into an all-encompassing, yet theoretically barely recognized social
category.

3.3. Mimesis as a principle for world making in novel and society


In the 19th century, mimetic techniques single out individual cases from the totality of
society; their starting point is a methodological individualism. On the surface of the
social interaction of bourgeois society, new codes are established, which, in the form
of precisely defined indices, express the individual’s inner states. Literature opens a
gateway from society’s exterior manifestation into its interior. Its concern is to give
an account of reality, which is superior to all other interpretations of society.
The lives of the bourgeois are also dominated by mimetic practices. In the game of
social players, world making practices emerge, which utilise principles analogous to the
ones employed in the construction of a novel. In Balzac, for instance, there is a literary
mimesis of social mimesis, of society’s common mode of world making (see Lukács
1952). The author thus elevates, dramatizes and justifies social reality. Yet there are
numerous novelists who, in contrast to Balzac, view social mimesis with scepticism
(see Prendergast 1986). They realize that the world making of bourgeois society is sub-
ject to social constraints, which they exemplify by means of their literary characters. In
their novels they engage in a clear-sighted distancing from social mimesis and thus from
their societies (to be observed in Stendhal, Flaubert and Dostoyevsky; see Girard 1966).
By means of their literary writing, they open up the possibility of freeing oneself from
the power of social mimesis. Their literary portrayal of a social portrayal is a return to
the medium as well as to the activities of author and reader, to the writing, reading and
imagining – a reflexive position vis-à-vis the constitutive processes of literature. The
result is a tendency towards the disintegration of the novel (e.g. in Flaubert: L’Éduca-
tion Sentimentale). The portrayal of the portrayal effaces itself. By making mimesis
visible, the novel falls silent.

4. The twentieth century


4.1. Mimesis as access to world, language and writing
In the works of Walter Benjamin and Theodor W. Adorno, the concept of mimesis is
rediscovered and described as an indispensable human ability: it enables access to
the world and to others; it is the precondition for understanding and facilitates a partial
overcoming of the subject-object dichotomy. With the aid of mimesis the world is
“translated” into images and thus made available to memory and to the imagination
(Benjamin 2006: 84). The gradual repression of the mimetic capacity in civilisation
leads to a loss of immediacy in relation to the world and to a reduction of the expressive
aspects of language in favour of its semantic content and instrumental function (Hor-
kheimer and Adorno 2002). According to Benjamin’s speculative interpretation, certain
elements of the mimetic relation to the world have been handed over to writing as an
“archive of non-sensual correspondences”. To avoid an impoverishment of the relation
to the world and to others, these elements in reading and writing have to be decoded
and revived with the aid of the mimetic capacities available to man (Fittler 2005).
The significance of mimetic processes for ontogenesis is exemplarily illustrated in
Benjamin’s Berlin Childhood. At its centre is the child’s development, which results
from a mimetic interaction with its environment. Through the latter the child becomes
similar to the many places, objects and constellations that surround it and thereby ex-
pands the limits of its subjectivity. By means of language and imagination, it creates an
inner outside world, which is enriched through subsequent mimetic processes. Its
mimetic capacities facilitate “living experiences” and, closely linked to the former,
the possibility of happiness. They constitute a corrective to the conceptual thought
that deprives it of vitality (Adorno 1984). With the increasing domination of man
over nature, the subject has also suppressed part of its nature. The mimetic capacity
cannot reverse this process; still it can make it bearable by allowing for aesthetic
experiences and other “non-conceptual cognitions” (Adorno 1978).
In philosophy, writing and reading are always interpreted mimetically. According to
Jacques Derrida, they always refer to something which precedes them: writing to what
has already been written, reading to what has already been read. Texts are never origi-
nals but always “doubles”; they result from compositions, additions, and intersections
with other texts. They have no origin but begin in mimetic situations: every origin is
a repetition. Writing can be copied an infinite number of times and is open to different
interpretations. Texts are generated and disseminated; their dissemination and utilisa-
tion elude every attempt at control; they refuse univocity (Derrida 1981); they are con-
nected to play, to the simulacrum, to concealment. The mimetic approach to texts
differs from imitation and simulation through an element of difference. Its aim is not
the production of sameness but the creation of similarity; it enables differences and
thus productive freedom.

4.2. Social Mimesis


In the social sciences and in cultural studies there has been a growing awareness in the
last decades that mimetic processes, which had been overlooked for a long time, play a
highly significant role in many social situations. This will be illustrated by using the
example of rituals and games.
Rituals refer to earlier rituals that have already been performed, without being mere
copies of the latter. Every performance of a ritual is a new enactment that entails a modification of earlier ritualistic activities. In mimetic processes, a mimetic
relation with earlier ritualistic arrangements is established (Gebauer and Wulf 2003;
Wulf 2005). The dynamics of ritual push toward repetition as well as difference. Repe-
tition of ritualistic action never entails an exact reproduction of the past, but leads to
the creation of a new situation, in which the difference to prior realisations of the ritual
constitutes a constructive element (Wulf et al. 2004).
Games, akin to plays and rituals, are also mimetically structured. Every performance
of a game is unique. At the same time, it constitutes a mimetic repetition and an update of the cultural knowledge that is contained in the numerous games and applied in their activities, for which games represent mnemonic possibilities (Gebauer and Wulf 1998).
Through games, people enter special realms of experience and modify their own as
well as their society’s cultural knowledge. At the same time, by virtue of their partici-
pation, they are transformed into players: games create their players and, insofar as
they foster social practice and integration, they expand the players’ social knowledge
and their ability to act competently – yet only within a small fragment of the entirety
of social possibilities. Their actions are based on an “as-if ” and in this sense they are
both serious and playful. If this double meaning is lost, the game becomes serious
and may cause violence.
In games, the sense of ludic activity develops before the players notice it. It emerges
without entering consciousness. Depending on the kind of game – free play, institutio-
nalised games, competition, dramatics or gambling – the sense that emerges within the
game is always different. Roger Caillois (1961) describes these different game varieties
as agon (competition), alea (chance), mimicry (mask) and ilinx (intoxication) and sets
them in relation to the societies in which they are played. Despite their significant dif-
ferences, games are always characterised by a mimetic reference to the world outside
the game. The analysis of games therefore generates an insight into the social and
cultural reality of a society (see Gebauer et al. 2004).

5. Outlook: New findings


In recent research in the neurosciences and in developmental psychology, semblance
and imitation occupy a surprisingly prominent position in the interpretation of the behaviour of humans and eutherian mammals. Research on specific motor neurons in macaques, the so-called mirror neurons, indicates an immediate comprehension of the researcher's actions on the part of the monkey (see Iacoboni 2008; Rizzolatti and Sinigaglia 2008). A whole series of neurobiological analyses assumes that in direct interaction an immediate understanding of another person's emotions occurs, a bodily resonance, which enables a comprehension of his/her emotional state or situation (see recent research in Bastiaansen, Thioux and Keysers 2009; Enticott et al. 2008;
Hennenlotter et al. 2009).
In the works of the developmental psychologist Michael Tomasello, the essential dif-
ference between human and animal behaviour lies in the child’s ability to create a
mimetic relation between itself and the person conducting the experiment (Tomasello
1999, see chapters 3 and 4). This is neither the case in the interaction between monkeys
amongst themselves nor in the interaction between monkey and researcher (Tomasello
2008). By now, there no longer seems to be any doubt concerning the biological foun-
dations of mimetic behaviour in humans, whereas there is no consensus about their
relation to the formation of social mimetic processes.

6. References
Adorno, Theodor W. 1978. Minima Moralia. Translated by Edmund Jephcott. London: Verso.
Adorno, Theodor W. 1984. Aesthetic Theory. Translated by C. Lenhardt. London: Routledge.
Apostolidès, Jean-Marie 1985. Le Prince Sacrifié. Théâtre et Politique au Temps de Louis XIV.
Paris: Minuit.
Aristotle 1987. Poetics. Translated by Stephen Halliwell. Chapel Hill: University of North Carolina
Press.
Auerbach, Erich 1953. Mimesis. The Representation of Reality in Western Literature. Translated by
Willard R. Trask. Princeton, NJ: Princeton University Press.
Bastiaansen, Jojanneke A., Marc Thioux and Christian Keysers 2009. Evidence for mirror systems
in emotions. Philosophical Transactions of the Royal Society B 364(1528): 2392–2404.
Belaval, Yvon 1950. L’Esthétique sans Paradoxe de Diderot. Paris: Gallimard.
Benjamin, Walter 2006. Problems in the sociology of language. In: Howard Eiland and Michael W.
Jennings (eds.), Walter Benjamin: v. 3: Selected Writings, 1935–1938. Cambridge, MA: Harvard
University Press.
Boileau-Despréaux, Nicolas 1683. Art of Poetry. Translated by Sir William Soames. London: Bent-
ley and Magnes.
Bourdieu, Pierre 1989. The historical genesis of a pure aesthetic. Journal of Aesthetics and Art
Criticism 46: 201–210.
Boyd, John D. 1968. The Function of Mimesis and Its Decline. Cambridge, MA: Harvard Univer-
sity Press.
Caillois, Roger 1961. Man, Play and Games. Translated by Meyer Barash. New York: Free Press of
Glencoe.
Derrida, Jacques 1981. The double session. In: Jacques Derrida, Dissemination, 187–237. Trans-
lated by Barbara Johnson. Chicago: University of Chicago Press.
Du Bellay, Joachim 1968. La Défense et Illustration de la Langue Française. In: Hubert Gillot (ed.),
La Querelle des Anciens et des Modernes en France. Geneva: Slatkine.
Ehrard, Jean 1970. L'Idée de Nature en France à l'Aube des Lumières. Paris: Flammarion.
Elias, Norbert 1978. The Civilizing Process. Translated by Edmund Jephcott. New York: Urizon
Books.
Elias, Norbert 1983. The Court Society. Translated by Edmund Jephcott. New York: Pantheon
Books.
Enticott, Peter G., Kate E. Hoy, Sally E. Herring, Patrick J. Johnston, Zafiris J. Daskalakis and
Paul B. Fitzgerald 2008. Mirror neuron activation is associated with facial emotion processing.
Neuropsychologia 46: 2851–2854.
Fischer-Lichte, Erika 1989. Semiotik des Theaters. Eine Einführung, Vol. 2. Tübingen: Narr.
Fittler, Doris M. 2005. Ein Kosmos der Ähnlichkeit: Frühe und Späte Mimesis bei Walter Benjamin.
Bielefeld: Aisthesis.
Flasch, Kurt 1965. Ars imitatur naturam. Platonischer Naturbegriff und mittelalterliche Philoso-
phie der Kunst. In: Kurt Flasch (ed.), Parusia. Festgabe für Johannes Hirschberger, 265–306.
Frankfurt am Main: Minerva.
Foucault, Michel 1970. The Order of Things: An Archaeology of the Human Sciences. New York:
Vintage Books.
Foucault, Michel 1979. Discipline and Punish: The Birth of the Prison. Translated by Alan Sher-
idan. New York: Vintage Books.
Fumaroli, Marc 1980. L'Âge de l'Éloquence. Rhétorique et "Res Literaria" de la Renaissance au Seuil de l'Époque Classique. Geneva: Droz.
Gebauer, Gunter, Thomas Alkemeyer, Uwe Flick, Bernhard Boschert and Robert Schmidt 2004.
Treue zum Stil. Die Aufgeführte Gesellschaft. Bielefeld: transcript.
Gebauer, Gunter and Christoph Wulf 1995. Mimesis. Culture, Art, Society. Berkeley: University of
California Press.
Gebauer, Gunter and Christoph Wulf 1998. Spiel, Ritual, Geste. Reinbek: Rowohlt.
Gebauer, Gunter and Christoph Wulf 2003. Mimetische Weltzugänge. Soziales Handeln – Rituale
und Spiele – Ästhetische Produktionen. Stuttgart: Kohlhammer.
Genette, Gérard 1969. Figures II. Paris: Seuil.
Geyer, Paul 2007. Die Entdeckung des modernen Subjekts. Anthropologie von Descartes bis Rous-
seau. Würzburg: Königshausen & Neumann.
Gillot, Hubert 1968. La Querelle des Anciens et des Modernes en France. Geneva: Slatkine.
Girard, René 1966. Deceit, Desire and the Novel: Self and Other in Literary Structure. Translated
by Yvonne Freccero. Baltimore: Johns Hopkins University Press.
Girard, René 1977. Violence and the Sacred. Baltimore: Johns Hopkins University Press.
Girard, René 1986. The Scapegoat. Baltimore: Johns Hopkins University Press.
Goldstein, Harvey D. 1966. Mimesis and catharsis reexamined. Journal of Aesthetics and Art Crit-
icism 24: 567–577.
Goodman, Nelson 1968. Languages of Art: An Approach to a Theory of Symbols. Indianapolis:
Bobbs-Merrill.
Goodman, Nelson 1978. Ways of Worldmaking. Hassocks, UK: Harvester Press.
Habermas, Jürgen 1991. The Structural Transformation of the Public Sphere. Translated by Tho-
mas Burger. Cambridge: Massachusetts Institute of Technology Press.
Halliwell, Stephen 1986. Aristotle’s Poetics. London: Duckworth.
Halliwell, Stephen 2002. The Aesthetics of Mimesis: Ancient Texts and Modern Problems. Prince-
ton, NJ: Princeton University Press.
Havelock, Eric A. 1963. Preface to Plato. Cambridge, MA: Harvard University Press.
Havelock, Eric A. 1982. The Literate Revolution in Greece and Its Cultural Consequences. Prince-
ton, NJ: Princeton University Press.
Hennenlotter, Andreas, Christian Dresel, Florian Castrop, Andreas O. Caballos-Baumann, Afra
M. Wohlschläger and Bernhard Haslinger 2009. The link between facial feedback and neural
activity within central circuitries of emotion – new insights from botulinum toxin-induced de-
nervation of frown muscles. Cerebral Cortex 19(3): 537–542.
Herrmann, Hans Peter 1970. Naturnachahmung und Einbildungskraft. Bad Homburg: Athenäum.
Hohner, Ulrich 1976. Zur Problematik der Naturnachahmung in der Ästhetik des 18. Jahrhunderts.
Erlangen: Palm & Enke.
Horkheimer, Max and Theodor W. Adorno 2002. Dialectic of Enlightenment. Translated by Ed-
mund Jephcott. Stanford, CA: Stanford University Press.
Huyssen, Andreas 1981. Das Versprechen der Natur. Alternative Naturkonzepte im 18. Jahrhun-
dert. In: Reinhold Grimm and Jost Hermand (eds.), Natur und Natürlichkeit. Stationen des
Grünen in der Deutschen Literatur, 1–18. Königstein: Athenäum.
Iacoboni, Marco 2008. Mirroring People: The New Science of How We Connect with Others. New
York: Farrar, Straus and Giroux.
Judowitz, Dalia 1988. Subjectivity and Representation in Descartes. The Origin of Modernity. Cam-
bridge: Cambridge University Press.
Kablitz, Andreas (ed.) 1998. Mimesis und Simulation. Freiburg im Breisgau: Rombach.
Kardaun, Maria 1993. Der Mimesisbegriff in der Griechischen Antike: Neubetrachtung eines Um-
strittenen Begriffs als Ansatz zu einer Neuen Interpretation der Platonischen Kunstauffassung.
Amsterdam: North-Holland.
Koch, Gertrud (ed.) 2010. Die Mimesis und ihre Künste. Munich: Fink.
29. Mimesis: The history of a notion 449

Koller, Hermann 1954. Die Mimesis in der Antike. Nachahmung, Darstellung, Ausdruck. Bern: A.
Francke.
Lacoue-Labarthe, Philippe 1989. Typography: Mimesis, Philosophy, Politics. Cambridge, MA:
Harvard University Press.
Lessing, Gotthold Ephraim 1962. Hamburg Dramaturgy. Translated by Victor Lange. New York:
Dover.
Lukács, Georg 1952. Balzac und der Französische Realismus. Berlin: Aufbau-Verlag.
Lyons, John D. 1982. Speaking in pictures, speaking of pictures. Problems of representation in
the seventeenth Century. In: John D. Lyons and Steven G. Nichols Jr. (eds.), Mimesis:
From Mirror to Method, Augustine to Descartes, 166–187. Hanover, NH: University of
New England Press.
Melberg, Arne 1995. Theories of Mimesis. Cambridge: Cambridge University Press.
Montaigne, Michel de 1958. The Complete Essays. Translated by Donald M. Frame. Stanford, CA:
Stanford University Press.
Nelson, Benjamin 1977. Die Anfänge der modernen Revolution in Wissenschaft und Philosophie.
Fiktionalismus, Probabilismus, Fideismus und katholisches Prophetentum. In: Benjamin Nel-
son (ed.), Der Ursprung der Moderne, 94–139. Frankfurt am Main: Suhrkamp.
Ong, Walter J. 1982. Orality and Literacy: The Technologizing of the Word. London: Routledge.
Petersen, Jürgen H. 2000. Mimesis – Imitatio – Nachahmung: Eine Geschichte der europäischen
Poetik. Munich: Fink.
Plato 1991. The Republic of Plato, 393c. Translated by Allan Bloom. New York: Basic Books.
Prendergast, Christopher 1986. The Order of Mimesis. Balzac, Stendhal, Nerval, Flaubert. Cam-
bridge: Cambridge University Press.
Putnam, Hilary 1981. Reason, Truth and History. Cambridge: Cambridge University Press.
Reiss, Timothy 1982. Power, poetry and the resemblance of nature. In: John D. Lyons and Steven
G. Nichols, Jr. (eds.), Mimesis: From Mirror to Method, Augustine to Descartes, 215–247. Han-
over, NH: University Press of New England.
Ricoeur, Paul 1984. Time and Narrative, Vol. 1. Chicago: University of Chicago Press.
Rizzolatti, Giacomo and Corrado Sinigaglia 2008. Mirrors in the Brain. Oxford: Oxford University
Press.
Scholz, Bernhard F. (ed.) 1998. Mimesis: Studien zur literarischen Repräsentation. Tübingen, Ger-
many: Francke.
Schönert, Jörg (ed.) 2004. Mimesis – Repräsentation – Imagination: Literaturtheoretische Positio-
nen von Aristoteles bis zum Ende des 18. Jahrhunderts. Berlin: de Gruyter.
Sennett, Richard 1977. The Fall of Public Man. New York: Alfred A. Knopf.
Starobinski, Jean 1985. Montaigne in Motion. Translated by Arthur Goldhammer. Chicago: Uni-
versity of Chicago Press.
Stierle, Karlheinz 1984. Das bequeme Verhältnis. Lessings “Laokoon” und die Entdeckung des
ästhetischen Mediums. In: Gunter Gebauer (ed.), Das Laokoon-Projekt. Pläne einer semio-
tischen Ästhetik, 23–58. Stuttgart, Germany: Metzler.
Szondi, Peter 1979. Die Theorie des Bürgerlichen Trauerspiels im 18. Jahrhundert. Edited by Gert
Mattenklott. Frankfurt am Main: Suhrkamp.
Taussig, Michael T. 1993. Mimesis and Alterity: A Particular History of the Senses. New York:
Routledge.
Todorov, Tzvetan 1982. Theories of the Symbol. Translated by Catherine Porter. Ithaca, NY: Cor-
nell University Press.
Tomasello, Michael 1999. The Cultural Origins of Human Cognition. Cambridge, MA: Harvard
University Press.
Tomasello, Michael 2008. The Origins of Human Communication. Cambridge: Massachusetts
Institute of Technology Press.
Willems, Herbert (ed.) 2009. Theatralisierung der Gesellschaft. Soziologische Theorie und Zeit-
diagnose. Wiesbaden: Verlag für Sozialwissenschaften.
Wulf, Christoph 2005. Zur Genese des Sozialen. Mimesis, Performativität, Ritual. Bielefeld:
transcript.
Wulf, Christoph, Birgit Althans, Kathrin Audehm, Constanze Bausch, Benjamin Jörissen,
Michael Göhlich, Ruprecht Mattig, Anja Tervooren, Monika Wagner-Willi and Jörg
Zirfas 2004. Bildung im Ritual. Schule, Familie, Jugend, Medien. Wiesbaden: Verlag für
Sozialwissenschaften.
Zimbardo, Rose A. 1986. A Mirror to Nature. Transformations in Drama and Aesthetics 1660–1732.
Lexington: University Press of Kentucky.

Gunter Gebauer, Berlin (Germany)


Christoph Wulf, Berlin (Germany)
IV. Contemporary approaches

30. Mirror systems and the neurocognitive substrates of bodily communication and language
1. Introducing mirror neurons and mirror systems
2. Direct and indirect pathways
3. From mirror neurons to language
4. Language within the framework of bodily communication
5. Putting it together
6. References

Abstract
The Mirror System Hypothesis for the evolution of the language-ready brain proposes
that the ability to produce and recognize hand movements for practical goals may have
underwritten the evolution of brain systems supporting manual gesture, with further
evolution yielding protosign and protospeech. The chapter thus introduces mirror neurons
and mirror systems, stressing the role of direct and indirect pathways in supporting
imitation, and then traces the posited path from mirror neurons to language, placing
language within the framework of bodily communication. Ontogenetic ritualization in apes
is assessed in relation to human gestural communication, noting the role of human
emotional and intentional dispositions in supporting communication that goes beyond the
communicative repertoire of apes lacking human contact, and pointing the way to deixis.

1. Introducing mirror neurons and mirror systems


We may seek to understand the human brain by studies of neurological patients and by
brain imaging of both normal subjects and patients. However, such studies are limited
to the relative activity of large regions of brain in different tasks or to the effects
of gross brain damage. To delve into the details of neural circuitry we must turn to
single-cell neurophysiology of animals, correlating the firing of individual neurons
with what an animal is sensing, thinking, or doing. Our closest “evolutionary cousin”
for which detailed neurophysiology as well as behavioral data are available is the
macaque monkey. The last common ancestor of macaque and human lived perhaps
20 million years ago, whereas the last common ancestor of chimpanzee and human
lived around 6 to 7 million years ago – but single-cell neurophysiology on chimpanzees
is not allowed. Thus for those of us who wish to place human communication in an evo-
lutionary context, the chosen data sets are behavior for humans, chimpanzees (and
other apes) and macaques (and other monkeys), brain imaging for humans and now,
to a lesser extent, chimpanzees and macaques, and detailed neurophysiology of maca-
ques. The discovery of mirror neurons in the brains of macaques opens up the approach
to an understanding of the evolution of bodily forms of communication and language
that this article sets forth.
Before going further we introduce a cast of brain regions (in both macaque and
human) which will play roles in what follows:
Motor cortex is the region of cerebral cortex that can send detailed motor commands
to, for example, the muscles of the hands, thus supporting skilled movements.
Premotor cortex is the region of cerebral cortex just in front of motor cortex. It pro-
vides neural codes for actions and movements that can be sent to motor cortex to be
elaborated into instructions for the musculature.
Broca’s region is an area of the human brain which, in the left hemisphere, is
traditionally associated with speech production.
The basal ganglia form a subcortical region traditionally associated with the
sequencing of movement.
The insula is a region deep within the folds of cerebral cortex. It plays a role in
diverse functions related to emotion.
With this, let us turn to mirror neurons and see their role in action and communica-
tion, while also emphasizing the cooperation of many other brain regions as well in sup-
porting these functions: A mirror neuron in the brain of creature C for an action A is a
neuron that fires both

(i) when C executes A and actions similar to A, but not when C executes dissimilar
actions, and
(ii) when C observes another creature execute an action more-or-less similar to A.
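The firing conditions (i) and (ii) can be captured in a toy sketch; the `MirrorNeuron` class, the string-based `similarity` measure, and the firing threshold are illustrative inventions, not models drawn from the neurophysiology:

```python
from dataclasses import dataclass

def similarity(a: str, b: str) -> float:
    """Toy action-similarity score: 1.0 for identical action labels,
    0.5 for actions in the same family (shared prefix before ':',
    e.g. 'grasp:pinch' vs 'grasp:power'), else 0.0."""
    if a == b:
        return 1.0
    if a.split(":")[0] == b.split(":")[0]:
        return 0.5
    return 0.0

@dataclass
class MirrorNeuron:
    preferred: str          # the action A the neuron is tuned to
    threshold: float = 0.4  # how similar an action must be to drive firing

    def fires(self, action: str, performer: str) -> bool:
        # (i) fires when the creature itself executes A or a similar action;
        # (ii) fires when it observes another creature do so.
        # In both cases firing depends on action similarity, not on who acts.
        return similarity(action, self.preferred) >= self.threshold

mirror = MirrorNeuron("grasp:pinch")
print(mirror.fires("grasp:pinch", "self"))   # True: own execution of A
print(mirror.fires("grasp:power", "other"))  # True: observed similar grasp
print(mirror.fires("reach", "other"))        # False: dissimilar action
```

The property the sketch preserves is that firing depends on what action occurs, not on who performs it; a canonical neuron, by contrast, would additionally require that the performer be the creature itself.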

We need single-cell recording to tell whether or not a neuron has these properties. Mir-
ror neurons were first recorded in Parma from an area called F5 in the premotor cortex
of the brains of macaque monkeys, and were mirror neurons for particular kinds of
grasp, e.g., neurons that fired in association with a precision pinch but not a power
grasp or vice versa (Gallese et al. 1996). Subsequent modeling advanced the view
that mirror neurons gained their specificity through learning (Oztop and Arbib 2002),
a view amply confirmed by empirical data showing that some mirror neurons could
respond to tearing paper versus breaking peanuts, with some responsive to the sound
as well as the sight of the action (audiovisual neurons; Bonaiuto, Rosta, and Arbib
2007; Kohler et al. 2002), while other mirror neurons could be formed – but only
after extensive training of the monkey – that were responsive to tool use (Umiltà
et al. 2008). Moreover, mirror neurons have been found in macaques for oro-facial ac-
tions, both ingestive actions and some related communicative gestures like lip-smacks
and teeth chatters (Ferrari et al. 2003), and it has been suggested that the insula may
contain mirror neurons for the expression of disgust (Wicker et al. 2003, see Rizzolatti
and Craighero 2004 for an extensive review and references to the primary literature on
macaque mirror neurons and human mirror neuron systems).
In humans, our data come from brain imaging, not single neuron recording. We say a
brain region is a mirror system for a class A of actions if, relative to a suitable control
condition, it exhibits significant activation both for humans executing actions of class A
and humans observing other people execute actions of class A. Brain imaging has re-
vealed that the human brain has a mirror system for grasping – regions that are more
highly activated both when the subject performs a range of grasps and observes a
range of grasps (Grafton et al. 1996, Rizzolatti et al. 1996). Following this early work
there has been an explosion of papers on the imaging of various human mirror systems,
but very few papers exploring mirror neurons in the macaque, and the latter primarily
from the Parma group and their colleagues.
The terms mirror system or mirror neuron system can give the false impression that
observed activity must be due primarily to firing of mirror neurons (though in some
cases it might be). Quite apart from other issues, this ignores the fact that brain regions
such as F5 in the macaque which contain mirror neurons also contain canonical neurons
(which fire as the monkey executes actions, but not during observation of actions of
others) and other neuron types as well, so that finer-grained analysis is needed to
tease apart the contribution of different classes of neurons in the brain activation
seen in diverse studies.
Complementing this concern is the fact that much discussion of action understanding
focuses on the brain regions thought to contain mirror neurons, excluding the role of
other brain regions. While it is true that mirror neuron activity correlates with observing
an action, it is probably false that such activation is sufficient for understanding the
movement. A possible analogy might be to observing a bodily gesture in a foreign
culture – one might be able to recognize much of the related movement of head, body,
arms and hands that constitute it, yet be unable to understand what it means within
the culture.
Rizzolatti and Sinigaglia (2008: 137) assert that numerous studies (e.g., Calvo-Merino
et al. 2005) “confirm the decisive role played by motor knowledge in understanding
the meaning of the actions of others.” The cited study used functional Magnetic Reso-
nance Imaging (fMRI) brain imaging to study experts in classical ballet, experts in ca-
poeira (a Brazilian dance style) and inexpert control subjects as they viewed videos of
ballet or capoeira actions. They found greater bilateral activations in various regions
including “the mirror system” when expert dancers viewed movements that they had
been trained to perform compared to movements they had not. Calvo-Merino et al.
(2005) assert that “their results show that this ‘mirror system’ integrates observed
actions of others with an individual’s personal motor repertoire, and suggest that
the human brain understands actions by motor simulation.” However, nothing in the
study shows that the effect of expertise is localized entirely within the mirror system.
Indeed, the integration they posit could well involve “indirect” influences from the
prefrontal cortex.
Rizzolatti and Sinigaglia (2008: 137) go on to say that the role played by
motor knowledge does not preclude that these actions could be “understood with
other means”. Buccino et al. (2004) used fMRI to study subjects viewing a video, with-
out sound, in which individuals of different species (man, monkey, and dog) performed
ingestive (biting) or communicative (talking, lip smacking, barking) acts. In the case of
biting there was a clear overlap of the cortical areas that became active in watching
man, monkey and dog, with activation in areas linked to mirror systems. However,
while the sight of a man moving his lips as if he were talking induced strong “mirror
system” activation in a region that corresponds to Broca’s area, the activation was weak
when the subjects watched the monkey lip smacking, and disappeared completely when
they watched the dog barking. Buccino et al. (2004) conclude that actions belonging to
the motor repertoire of the observer (e.g., biting and speech reading) are mapped on
the observer’s motor system, whereas actions that do not belong to this repertoire
(e.g., barking) are recognized without such mapping. However, in view of the distribu-
ted nature of brain function, I would suggest that the understanding of all actions
involves general mechanisms which need not involve the mirror system strongly – but
that for actions which are in the observer’s repertoire these general mechanisms may be
complemented by activity in the mirror system which enriches that understanding by
access to a network of associations linked to the observer’s own performance of such
actions.
[Diagram: boxes “F5 Mirror – Recognize Actions” and “F5 Canonical – Specify Actions”
linked by pathway (e); “Observe” feeds the mirror box, “Act” issues from the canonical
box, and both connect to a “Scene Interpretation & Action Planning / Action
Understanding” system.]
Fig. 30.1: The perceptuomotor coding for both observation and execution contained in F5 region
for manual actions in the monkey is linked to “conceptual systems” for understanding and plan-
ning of such actions within the broader framework of scene interpretation. The interpretation and
planning systems themselves do not have the mirror property save through their linkage to the
actual mirror system.

Summarizing this discussion, Fig. 30.1 emphasizes (a) that F5 (and presumably any
human homologue labeled as a “mirror system”) contains non-mirror neurons (here
the canonical neurons are shown explicitly), and (b) that it functions only within a
broader context provided by other brain regions for understanding and planning of ac-
tions within a broader framework of scene interpretation. The direct pathway (e) from
mirror neurons to canonical neurons for the same action may yield “mirroring” of the
observed action, but is normally under inhibitory control. In some social circumstances,
a certain amount of mirroring is appropriate, but the total lack of inhibition exhibited in
echopraxia and echolalia (Roberts 1989) – compulsive repetition of observed actions or
heard words – is pathological.

2. Direct and indirect pathways


The earlier example of “recognizing” a bodily gesture in a foreign culture yet not under-
standing it can be illuminated by considering a conceptual model derived from the
study of apraxia. De Renzi (1989) reports that some apraxics exhibit a semantic deficit –
they have difficulty both in classifying gestures and in performing familiar gestures on
command – yet may be able to copy the pattern of movement of such a gesture without
“getting its meaning”. We call this residual ability low-level imitation to distinguish it
from imitation based on recognition and “replay” of a goal-directed action. Rothi,
Ochipa, and Heilman (1991) thus proposed a dual route imitation learning model
(Fig. 30.2).

– The direct route, supporting imitation of meaningless and intransitive gestures, con-
verts a visual representation of limb motion into a set of intermediate limb postures
or motions for subsequent execution (low-level imitation).
– The indirect route recognizes and then performs known actions.
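The two routes can be sketched as a simple dispatch; the gesture labels, the toy `input_praxicon`, and the posture-matching pseudo-commands are all invented for illustration:

```python
# Indirect route's store of known, goal-directed gestures:
# gesture label -> motor program retrieved for "replay".
input_praxicon = {
    "wave": ["raise_arm", "oscillate_hand"],
    "salute": ["raise_arm", "hand_to_brow"],
}

def imitate(observed_label, observed_trajectory):
    """Dual-route sketch: the indirect route recognizes a known gesture
    and performs the stored action; the direct route copies the observed
    limb postures without accessing meaning (low-level imitation)."""
    if observed_label in input_praxicon:
        # Indirect route: recognize, then replay the stored action.
        return ("indirect", input_praxicon[observed_label])
    # Direct route: convert the visual representation of limb motion
    # into intermediate postures for subsequent execution.
    return ("direct", [f"match_posture({p})" for p in observed_trajectory])

print(imitate("wave", ["p1", "p2"]))               # known gesture: indirect route
print(imitate("meaningless_gesture", ["p1", "p2"]))  # unknown: direct route
```

In these terms, the semantic deficit De Renzi describes corresponds to losing the indirect route while the direct, meaning-free route survives.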
[Diagram: Auditory/Verbal, Tactile or Visual/Object, and Visual/Gestural Inputs feed
Auditory Analysis, Tactile or Visual Analysis (with an Object Recognition System), and
Visual Analysis respectively. On the language side, a Phonological Input Lexicon and
Phonological Output Lexicon link to Semantics (Action); on the praxis side, an Input
Praxicon and Output Praxicon do the same. The Phonological Output Lexicon feeds a
Phonological Buffer, whereas the praxis side shows only “???” where a corresponding
action buffer would stand. Both sides terminate in Innervatory Patterns and Motor
Systems; “Direct Route” arrows bypass the lexicon and praxicon.]
Fig. 30.2: A dual route imitation learning model balancing language and praxis. The “?” indicates
that the right-hand side should be augmented by an “action buffer”, as described later in this chap-
ter. The upright dashed oval emphasizes the bidirectional link between lexicon and semantics
(Arbib and Bonaiuto 2008; adapted from Rothi, Ochipa, and Heilman 1991).

In terms of observable motion, imitation of an action (acting upon an object) and pan-
tomime of that action (in the absence of an object) may appear the same. However, imi-
tation is the generic attempt to reproduce movements performed by another, whether
to master a skill or simply as part of a social interaction. By contrast, pantomime is per-
formed with the intention of getting the observer to think of a specific action or event.
It is essentially communicative in its nature. The imitator observes; the pantomimic
intends to be observed.
In a novel pantomime, the pantomimic is acting out, perhaps in reduced form, the
mental rehearsal of an action – but often in the absence of the objects toward which
the action is normally directed. The effort may thus be guided by imaginary affordances
(i.e., imagining those aspects of the object which could be involved in an action) and/or
motor memory (of the movements involved in the action). Even here there is a distinc-
tion. One may pantomime an action in one’s own repertoire using the same effectors
that one uses for the action itself, though perhaps with somewhat reduced and less
articulated movements, given the absence of real objects. Or one may pantomime
an action not within one’s repertoire. This might involve both a visual memory (per-
haps missing essential details) of an observed action, and a mapping of observed
effectors onto one’s own. One might pantomime the movement of the wings of a
butterfly by making one’s hands flutter, while pantomiming the wings of a soaring
gull might employ the full length of the arms held relatively fixed as the top of the
body sways.
While much discussion of the mirror system stresses that it is part of the motor sys-
tem, it must be more strongly stressed that action and perception are interwoven. The
ability to recognize an action must rest on appropriate processing of visual or other sen-
sory input derived from observation of someone (whether self or other) performing the
action. For a specific example, consider the goal of writing a cursive lower case letter
“a”. One can use a long pole with a paint brush on the end to write on a wall in
what, despite little skill, is recognizably one’s own handwriting even if one has never
attempted the task before and thus the necessary actions are not within the motor rep-
ertoire. The “secret” is that we have a visual, effector-independent, representation of
the letter in terms of the relative size and position of marker strokes. Making the
strokes is highly automatized in the case of a pen or pencil but the visual recognition
can be invoked to control the novel end-effector of the pole-attached paintbrush,
using visual feedback to match this novel performance against our visual expectations.
Thus the generic goal of writing the “a” will invoke an automatized effector-specific
motor schema if the tool is a pen or pencil, but an “effector-independent” vision-
based representation can be used for feedback control of a specific, albeit unfamiliar,
effector – focusing on the trajectory and kinematics of the end-effector of the writing
action, whatever it may be. More generally, then, the ability to recognize the perfor-
mance of an action must precede acquisition of the action by imitation if indeed it is
novel. At first, the recognition provides feedback for visual control of the novel action.
With practice, “internal models” will be developed for the use of the novel effector, in-
creasing the speed and accuracy of the movement (Arbib et al. 2009). With increasing
skill, feedforward comes to dominate. The feedback becomes less and less exercised in
controlling movement but is still there to handle unexpected perturbations (Hoff and
Arbib 1993).
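The shift from feedback-guided to feedforward control with practice can be sketched as a weighted blend; the scalar `skill` parameter and the proportional gain are illustrative assumptions, not part of the cited internal-model literature:

```python
def control_step(target, position, skill, feedforward_cmd):
    """Blend feedforward and feedback control of one movement step.
    skill in [0, 1]: 0 = pure visual feedback (a novel effector, e.g. the
    pole-attached paintbrush), 1 = dominated by a learned internal model,
    with feedback retained to correct unexpected perturbations."""
    feedback_cmd = 0.5 * (target - position)  # proportional visual correction
    return skill * feedforward_cmd + (1 - skill) * feedback_cmd

# A novice relies entirely on visual feedback; a practiced writer mostly
# executes the learned command but still corrects residual error.
print(control_step(target=1.0, position=0.0, skill=0.0, feedforward_cmd=1.0))  # 0.5
print(control_step(target=1.0, position=0.0, skill=0.9, feedforward_cmd=1.0))  # ~0.95
```

As `skill` grows, the feedback term contributes less and less in routine performance, yet it remains available whenever the movement is perturbed.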

3. From mirror neurons to language


The lexicon is the collection of words in a language or the repertoire of a speaker or
signer. The phonological input lexicon of Fig. 30.2 contains the perceptual schemas
for recognizing each spoken word in the lexicon on hearing it, while the phonological
output lexicon contains the motor schemas for pronouncing each spoken word in the
lexicon. Note that one might be able to recognize a word when pronounced in diverse
accents, yet repeat it in one’s own accent. This contrasts with the direct path in which
one tries to imitate how a word is pronounced, a strategy that can also work for non-
words or words that one does not know. If a word is in the lexicon, then its recognition
and production (in the phonological lexicon) are to be distinguished from what the
word actually means, given in Fig. 30.2 by the link between the lexicon and the box
for semantics.
Rothi, Ochipa, and Heilman (1991) coined the term praxicon to denote the collec-
tion of praxic actions (practical actions upon objects; the type of action impaired in
apraxia) within the repertoire of a human (or animal or robot). Thus the input praxicon
corresponds, approximately, to the mirror neurons that recognize actions, while the out-
put praxicon adds the canonical neurons involved in the performance of actions. These
form the indirect pathway. The direct pathway reminds us that we can also imitate cer-
tain novel behaviors that are not in the praxicon and cannot be linked initially to an
underlying semantics.
For Rothi, Ochipa, and Heilman (1991), the language system at the left of Fig. 30.2
simply serves as a model for their conceptual model of the praxis system at the right,
with semantics playing a bridging role (see Itti and Arbib 2006 for a discussion on
how to extend semantics from objects to “scenes” structured by actions). But in this
section, our challenge is to understand the relevance of mirror systems to an account
of how the system on the left evolved “atop” the system on the right. The portion of
human frontal cortex that serves as the “mirror system for grasping” is located in or
near Broca’s area, traditionally associated with speech production. (See Petrides and
Pandya 2009 for a more subtle analysis than was available ten years earlier of the
macaque homologues of Broca’s area, and the related connectivity in macaque and
human. These new findings have yet to be used to update the analyses reviewed in
the present article.) What could be the connection between these two characteriza-
tions? Inspired in part by findings that damage to Broca’s area could affect deaf
users of signed languages (Poizner, Klima, and Bellugi 1987), not just users of spoken
language, and by earlier arguments for a gestural origin of language, Giacomo Rizzo-
latti and I (Arbib and Rizzolatti 1997; Rizzolatti and Arbib 1998; Rizzolatti and
Arbib 1999) suggested that Broca’s area evolved atop (and was thus not restricted
to) a mirror system for grasping that had already been present in our common ancestor
with macaques, and served to support the parity property of language – that what is
intended by the speaker is (more or less) understood by the hearer, including the
case where “speaker” and “hearer” are using a signed language. This Mirror-System
Hypothesis provided a neural “missing link” for those theories that argue that commu-
nication based on manual gesture played a crucial role in human language evolution
(Armstrong, Stokoe, and Wilcox 1995; Armstrong and Wilcox 2007; Corballis 2002;
Hewes 1973). Many linguists see “generativity” as the hallmark of language, i.e., its abil-
ity to add new words to the lexicon and then to combine them hierarchically through
the constructions of the grammar to create utterances that are not only novel but
can also be understood by others. This contrasts with the fixed repertoire of monkey
vocalizations. However, we stress that such generativity is also present in the repertoire
of behavior sequences whereby almost any animal “makes its living.” It is strange,
then, that Rothi, Ochipa, and Heilman’s (1991) original figure includes a phonological
buffer for putting words together, but omits an “action buffer” for putting actions
together.
Clearly, the Mirror System Hypothesis does not say that having a mirror system is
equivalent to having language. Monkeys have mirror systems but do not have language,
and we expect that many species have mirror systems for varied socially relevant beha-
viors. Moreover, since monkeys have little capacity for imitation – much depends on
how imitation is defined (Visalberghi and Fragaszy 2001; Voelkl and Huber 2000) –
more than a macaque-like mirror system is needed for the purpose of imitation. Further,
the ability to copy single actions is just the first step towards imitation, since
humans have what I call complex imitation (Arbib 2002) which combines a perceptual
skill – “parsing” a complex movement into more or less familiar pieces – with a motor
skill that exploits it, performing the corresponding composite of (variations on) familiar
actions. Crucial to this is the recognition of the subgoals to which various parts of the
behavior are directed (Wohlschläger, Gattis, and Bekkering 2003). I have argued
(Arbib 2002) that chimpanzees (and, presumably, the common ancestor of human
and chimpanzees) have simple imitation: they can imitate short novel sequences
through repeated exposure, whereas only humans have complex imitation.
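The perceptual half of complex imitation – “parsing” a movement into more or less familiar pieces – can be sketched as greedy segmentation against a repertoire of known sub-actions; the action labels and the two-step lookahead are invented for illustration:

```python
# Familiar pieces: tuples of movement primitives -> recognized action.
known_actions = {
    ("reach",): "reach",
    ("close_fingers", "lift"): "grasp_and_lift",
    ("rotate_wrist",): "turn",
}

def parse_movement(stream):
    """Greedily segment an observed movement stream into the longest
    familiar chunks; primitives outside the repertoire are left to be
    approximated by direct low-level copying."""
    parsed, i = [], 0
    while i < len(stream):
        for length in (2, 1):  # prefer longer known chunks
            chunk = tuple(stream[i:i + length])
            if len(chunk) == length and chunk in known_actions:
                parsed.append(known_actions[chunk])
                i += length
                break
        else:
            parsed.append(f"copy({stream[i]})")  # not in the repertoire
            i += 1
    return parsed

print(parse_movement(["reach", "close_fingers", "lift", "wiggle", "rotate_wrist"]))
# ['reach', 'grasp_and_lift', 'copy(wiggle)', 'turn']
```

Recognizing the chunk boundaries – the subgoals to which parts of the behavior are directed – is what distinguishes this from copying single actions one by one.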
With this background, I can summarize one hypothetical sequence leading from
manual gesture to a language-ready brain (Arbib 2005, 2012) as follows:

(i) pragmatic action directed towards a goal object;
(ii) simple imitation of such actions (as we see in the next section, this can be
exploited by apes to develop a limited repertoire of communicative gestures);
(iii) complex imitation of such actions (which has evolved since humans and chimpan-
zees diverged from their common ancestor);
(iv) pantomime in which similar actions are produced in the absence of any goal object
and extended to actions outside the pantomimic’s repertoire (which provides an
open-ended semantics that can be expanded as needed);
(v) conventionalized gestures divorced from their pragmatic origins (if such existed):
in pantomime it might be hard to distinguish a movement signifying “flying” from
one meaning “a [flying] bird”, thus providing an “incentive” for coming up with an
arbitrary gesture to distinguish the two meanings. This suggests that the emer-
gence of symbols occurred when the ambiguity or “cost” of pantomiming proved
limiting;
(vi) the increasing use of vocalization to complement manual and facial gestures;
(vii) the use of such abstract gestures for the formation of compounds which can be
paired with meanings in more or less arbitrary fashion yielding a syntax integrated
with an increasingly compositional semantics.

My current view is that stages (iii)–(vi) and a rudimentary (pre-syntactic) form of (vii)
were present in pre-human hominids, but that the “explosive” development of (vii) that
we know as language depended on cultural evolution well after biological evolution had
formed modern Homo sapiens. This remains speculative, and one should note that bio-
logical evolution may have continued to reshape the human genome for the brain even
after the skeletal form of Homo sapiens was essentially stabilized, as it certainly has
done for skin pigmentation and other physical characteristics. However, the fact that
most people can master any language if raised appropriately, irrespective of their
genetic heritage, shows that these changes are not causal with respect to the structure
of language.
Work continues both to fill in the details of the Mirror System Hypothesis and to
develop a neurolinguistics which assesses mechanisms for language processing in the
human brain in relation to this evolutionary perspective. Itti and Arbib (2006) discuss
how to extend semantics from objects to “scenes” structured by actions (these ideas
have been developed in the treatment of semantic representations and template con-
struction grammar by Arbib and Lee 2008). Arbib and Bonaiuto (2008) suggest that
analysis of the action buffer (the missing link of Fig. 30.2) might employ a method
called Augmented Competitive Queuing (Bonaiuto and Arbib 2010). I have then
suggested (briefly) how this might be related to a buffer for utterance formulation
based on construction grammar. Details lie outside the scope of this article (but see
Lee and Barrès 2013).
30. Mirror systems and the neurocognitive substrates 459

4. Language within the framework of bodily communication


While the previous section has stressed an evolutionary view of language within the
framework of bodily communication, it is clear that the use of the body for communication extends far beyond language. Thus, the “secret” of language may be sought within a framework broader than the emergence of lexicon and grammar, one
which marks the transition from (i) the observation of another’s movements as a cue
to action (e.g., a dog seeing another dog digging may look for a bone) in which “com-
munication” of the actor’s intentions to the observer is a mere side-effect of an action
intended for a practical end to (ii) explicitly communicative actions which (whether
consciously or not) are directed at the observer rather than acting upon objects in
the physical world. Thus, the duality of Fig. 30.2 between language and praxis can be
extended to include a third component of communicative gestures for bodily commu-
nication generally and manual gesture more specifically that has an important but rel-
atively small overlap with language and praxis. This invites the neologism somaticon for
the repertoire of communicative acts employing the body, many of which (unlike the
lexicon) are neither conscious nor codified for use across a community.
We can see a form of this in the great apes, but with the restriction that any one group
in the wild has a very limited set of gestures – there seems to be no conscious realization that
one can create new gestures as needed. While some gestures are no doubt side-effects of
the animal’s own behavior, it is clear that some gestures are purposefully communicative.
Manual and bodily gestures are called “auditory” if they generate sound (though not with the vocal cords), while tactile gestures include physical contact with the recipient. Visual gestures generate a mainly visual effect with no physical contact. In captivity, all apes rarely use their visual gestures unless the recipient is visually attending. Tellingly, both
wild and captive populations use tactile and (in the case of African great apes) auditory
gestures to attract the attention of someone who is not looking at them. (See Arbib,
Liebal, and Pika 2008: 1057, for primary references and further discussion.)
Great apes certainly have unlearned gestures, which suggests that the production
of at least some species-typical gestures is due to genetic predisposition, triggered by
commonly available individual learning conditions, while the occasions for use have
to be learned, as is the case for vocal development in monkeys (Seyfarth and Cheney
1997). However, there is good evidence that great apes can also invent or individually
learn new gestures. Idiosyncratic gestures – each used by only a single individual – have been observed in great apes; as such, it seems unlikely that they were genetically determined or socially learned, and yet they were used to achieve a particular social goal such as play, and most often elicited a response from the recipient (Pika, Liebal, and Tomasello 2003).
Turning to gestures used by at least two individuals, perhaps one as sender and the other
as receiver, Tomasello and Call (1997) argue that these are learned via ontogenetic ritualization, an individual learning process whereby two individuals shape each other’s
behavior in repeated instances of an interaction to develop a novel communicative
gesture. The general form of this type of learning is:

(i) Individual A performs behavior X;


(ii) Individual B reacts consistently with behavior Y;
(iii) Subsequently, B comes to anticipate A’s performance of X on the basis of its initial prefix, and performs Y in response to the prefix alone; and
460 IV. Contemporary approaches

(iv) Subsequently, A anticipates B’s anticipation and produces the prefix in a ritualized
form (waiting for a response rather than completing X) in order to elicit Y.
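
As an illustration only, these four steps can be caricatured in a toy two-agent simulation (a hypothetical Python sketch; the behavior names and the pairing threshold are illustrative assumptions, not part of any model cited in this article):

```python
# Toy simulation of ontogenetic ritualization, steps (i)-(iv) above.
# Agent A starts by performing the full behavior X; agent B reacts with Y.
# After enough witnessed pairings each side anticipates the other, so A
# comes to produce only the ritualized prefix of X, and B responds to it.
# The threshold and behavior names are illustrative assumptions.

THRESHOLD = 3  # pairings needed before an agent starts to anticipate


class Agent:
    def __init__(self, name):
        self.name = name
        self.pairings = 0  # successful X/Y pairings witnessed so far

    def anticipates(self):
        return self.pairings >= THRESHOLD


def interact(a, b):
    """One round of interaction; returns (A's act, B's response)."""
    # (iv) Once A anticipates B's anticipation, it produces only the prefix.
    act = "prefix-of-X" if a.anticipates() else "full-X"
    # (ii) B reacts with Y to the full behavior; (iii) once B anticipates,
    # the initial prefix alone is enough to elicit Y.
    response = "Y" if (act == "full-X" or b.anticipates()) else None
    if response == "Y":  # both agents register another successful pairing
        a.pairings += 1
        b.pairings += 1
    return act, response


A, B = Agent("A"), Agent("B")
history = [interact(A, B) for _ in range(5)]
# Early rounds pair the full behavior with Y; later rounds pair the
# ritualized prefix with Y, i.e., a novel communicative gesture.
```

The sketch makes one point of the argument concrete: no new signal is invented by either agent alone; the gesture emerges from each agent's anticipation of the other's established behavior.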

For example, play hitting is an important part of the rough-and-tumble play of chimpan-
zees, and many individuals come to use a stylized arm-raise to indicate that they are
about to hit the other and thus initiate play (Goodall 1986). However, since there
are group-specific gestures – performed by the majority of individuals of one group,
but not observed in another – it would seem that even if ontogenetic ritualization
were involved in “first use” of a gesture, its spread throughout a group would involve
some form of simple imitation of gestures.
Pointing (using the index finger or extended hand) has only been observed in chim-
panzees interacting with their human experimenters (e.g., Leavens, Hopkins, and Bard
1996; Leavens, Hopkins, and Thomas 2004) as well as in human-raised or language-
trained apes (e.g., Gardner and Gardner 1969; Patterson 1978; Woodruff and Premack
1979), but never between conspecifics. Since both captive and wild chimpanzees share
the same gene pool, Leavens, Hopkins, and Bard (2005) argued that the occurrence
of pointing in captive apes is attributable to environmental influences on their com-
municative development. A related suggestion is that apes do not point for conspeci-
fics because the observing ape will not have the motive to help or inform others or
to share attention and information (Tomasello et al. 2005). One hypothesis that
I find attractive is this: A chimpanzee reaches through the bars to get a banana but cannot grasp it. However, a human does something another chimpanzee would not
do – recognizing the ape’s intention, the keeper gives him the banana. As a result,
the ape soon learns that a point (i.e., the failed reach) is enough to get the pointed-
to object without the exertion of trying to complete an unsuccessful reach. This is a
variation on ontogenetic ritualization that depends on the fact that humans provide
an environment which is far more responsive than that provided by conspecifics in
the wild:

(i) Individual A attempts to perform behavior X to achieve goal G, but fails – achieving
only a prefix Z;
(ii) Individual B infers goal G from this behavior and performs an action that achieves
the goal G for A;
(iii) In due course, A produces Z in a ritualized form to get B to perform an action that
achieves G for A.
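
For illustration only, these three steps can likewise be caricatured in a toy simulation (a hypothetical Python sketch; the behavior names, effort costs, and helper rule are illustrative assumptions, not part of any cited model):

```python
# Toy sketch of the ritualization variant in steps (i)-(iii) above.
# The ape's failed reach leaves only a prefix Z; a human keeper infers
# the goal G and achieves it, so the ape learns that a cheap ritualized
# point suffices. Names, costs, and the helper rule are assumptions.

def helper_responds(behavior):
    # (ii) The human infers goal G from any reach-like behavior and
    # achieves it for the ape; a wild conspecific would not respond.
    return behavior in ("failed-reach-Z", "ritualized-point")


def episode(knows_point):
    # (i) A full reach fails, achieving only the prefix Z, at high effort;
    # (iii) once the contingency is learned, the prefix is produced alone.
    behavior = "ritualized-point" if knows_point else "failed-reach-Z"
    goal_achieved = helper_responds(behavior)
    effort = 1 if knows_point else 10  # pointing is far cheaper than straining
    return behavior, goal_achieved, effort


knows_point = False
log = []
for _ in range(4):
    behavior, got_goal, effort = episode(knows_point)
    log.append((behavior, got_goal, effort))
    if got_goal:
        knows_point = True  # the human's response reinforces the prefix
```

The design choice worth noting is that the learning depends entirely on the helper rule: replace `helper_responds` with a conspecific that ignores reach-like behavior and the point never emerges, which is the contrast drawn in the text.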

Let me call this human-supported ritualization. Intriguingly, it is far more widespread


than just for apes. The cats in my house have developed a rich system of communication
with my wife and me. “Stand outside a door and meow” means “let me in”; “stand
inside a door” will, depending on posture, signal “just looking through the glass” versus
“let me out”. A tentative motion down the hall towards the room with the food dishes
means “feed me”. And so on. Generally, then, the main forms of communication are
vocalizations that attract attention, and “motion prefixes” where the cat begins a per-
formance with the intention that the human recognize the prefix and do what is neces-
sary to enable the action to proceed to completion. Such performances succeed only
because humans, not the cats, have evolved a form of cooperative behavior that
responds to such signals.

Building on this, let me suggest three types of bodily communication – while admitting that the range of articles in this Handbook, Body – Language – Communication, shows that this is a very small sample:

4.1. Body mirroring


Body mirroring establishes (unconsciously) a shared communicative space that has
no meaning in itself, but in which turn-taking can be situated (e.g., Oberzaucher and
Grammer 2008). This is like echolalia/echopraxia (recall pathway (e) of Fig. 30.2) but
is non-pathological if restricted to “basic body mirroring” that does not transgress social
conventions (which can inhibit a subordinate from mirroring a superior, while allowing
it for a close acquaintance). It pertains more to the direct route than to the indirect
route of Fig. 30.2. Again, we see a role for inhibitory control.

4.2. Emotional communication


We have evolved facial gestures for the expression of emotions (Darwin 1872) and it has
been argued that a mirror system for such expressions could account for empathy when
pathway (e) induces in us the motor state for the emotion (the simulation theory,
Gallese and Goldman 1998). Other findings suggest that the ability to recognize the
auditory and visual details of a person’s actions may be a factor in empathy (Gazzola,
Aziz-Zadeh, and Keysers 2006). The role of the insula in a mirror system for disgust
(Wicker et al. 2003) suggests that emotional communication may involve diverse mirror
systems. However, in line with Fig. 30.1 and the discussion of Buccino et al. (2004),
I would not see “emotion recognition” as the sole province of a set of mirror systems.
If I were to see someone advancing toward me with a knife, I would recognize his anger
yet feel fear; I would not empathize as I struggled to protect myself.
Dapretto et al. (2006) studied children imitating facial expressions of emotion, and
found that increasing scores on measures for autism spectrum disorder (ASD) corre-
lated with decreasing activation of the mirror system during the imitation task. How-
ever, the children with ASD successfully imitated the facial expressions! This
suggests that normally developing children conduct the task via an indirect path (the
emoticon, but not in the e-mail sense) in which they recognize the emotion and express
it themselves, whereas the ASD child must increasingly depend on the direct path
to reproduce the expression as a meaningless aggregate of facial movements, devoid
of initial meaning – just as we might imitate a meaningless grimace.

4.3. Intentional communication


In most cases, body-mirroring proceeds spontaneously, as does emotional interaction.
By contrast, gestures – like language – may be used as part of intentional communica-
tion, where the actions of the “communicator” are more-or-less chosen for their effect
on the “communicatee”. In the cases of the apes and cats discussed above, such
intended communication seems primarily instrumental, trying to elicit some immediate
action located in the here and now. By contrast, humans can employ intentional
communication for a vast range of speech (and gesture and sign) acts – sharing inter-
ests, telling stories, registering degrees of politeness, asking questions, and on and on.
Among the most challenging questions for the evolution of language is its ability to sup-
port “mental time travel” (Suddendorf and Corballis 1997, 2007), the ability to recall
the past and to plan and discuss possible outcomes for the future. But this takes us
beyond the scope of this article.

5. Putting it together
A partial view of what we have achieved in this article is provided by Fig. 30.3. The top
box combines the ability to recognize and perform small, meaningless actions (the
direct pathway) with the ability to recognize and perform familiar goal-directed actions
(the indirect pathway, including the input and output praxicons). A parallel set of mir-
ror systems would support the expression and recognition of emotions (the input and
output emoticons). However, we must go “beyond the mirror” (BtM) to sequence
actions, bringing in, for example, the basal ganglia. Given the data on monkeys and
apes, we must invoke other systems beyond the mirror to imitate actions. Apes seem
capable of simple imitation, but further evolution beyond the chimpanzee-human
common ancestor was needed to support complex imitation and the intentional use
of pantomime for intended bodily communication.

[Fig. 30.3 diagram: three linked boxes. Top box – “Mirror System for Actions” (mirror neurons, canonical neurons, & more): Recognize / Act, & BtM for Compounds. An arrow labeled “Evolution: From Praxis to Communication” (“Not a Flow of Data”) links it to the middle box – “Mirror System for Words” (mirror neurons, canonical neurons, & more): Hear/See, Say/Sign, & BtM for Constructions. Bottom box – “Embodied Semantics: Schema networks”: Perceive / Act, perceptuo-motor schema assemblage.]
Fig. 30.3: A view of the language-ready human brain informed by the Mirror System Hypothesis
(loosely adapted from Arbib 2006). See text for details.

However, as stressed in Fig. 30.1, the recognition of an action is only a small part of
understanding the meaning of the action within the broader context of scene perception
(not to mention mental time travel) and action planning. The bottom box summarizes
the role of schema networks and assemblages of perceptual and motor schemas in
providing the embodied semantics (as discussed in the next paragraph) that supports
this broader understanding. The line linking the top and bottom boxes then indicates
both the separation and coordination between the planning of actions based on
recognition of current circumstances and goals, and the invocation of specific parameters
in adapting motor control to the specific objects with which interaction is taking place.
The theory of embodied semantics, however, needs to be treated with some caution.
For many authors (e.g., Feldman and Narayanan 2004; Gallese and Lakoff 2005) the
notion is that concepts are represented in the brain within the same sensory-motor cir-
cuitry on which the enactment of that concept relies. By contrast, Fig. 30.3 separates the
semantic pathway (at the bottom) from the basic sensorimotor pathways of the mirror
system (both mirror neurons and other neurons involved in the detailed specification
of actions). The suggestion is that knowing about objects and actions (and thus being
able to plan actions) must be distinguished from the detailed muscle control for
those actions. Nonetheless, the suggestion is that our semantics is grounded in our abil-
ity as embodied actors to interact with the world and the people around us. Then, as we
develop (both ontogenetically and phylogenetically), the concepts that can be referred
directly to physical and social interactions scaffold increasingly abstract concepts that
are, at best, tenuously related to the original sensorimotor processing (Arbib 2008).
Finally, the middle box shows the mirror system for words-as-articulatory-actions as
having evolved from the mirror system for actions (via pantomime, the coupling of ges-
ture and vocalization, and the conventionalization of communicative actions), rather
than being directly driven by it. Although not shown in the diagram, gestures remain
available to complement language (as in the co-speech gestures of McNeill 2005), but
the signs of signed language are the “words” of a fully-formed human language and
so, as part of a rich integrated system of syntax and semantics, are to be distinguished
from gestures more closely related to action. Duality of patterning then involves the
ability to put meaningless sounds together (the direct pathway; phonology) and the
ability to directly access words as appropriate sound assemblages (the input phonological
lexicon) and produce words as articulatory patterns (the output phonological lexicon).
For signed languages, the conventions of handshape, location and movement provide
the equivalent “phonology” (Stokoe 1960). We again need a linkage, now between the
middle box and the bottom box, to link words to their meanings. Again, we must go
beyond the mirror – applying various constructions (using syntax to build semantics)
to combine words into hierarchically structured utterances that express the richness of
our embodied semantics and the abstractions that grow from it, both working with
and moving beyond the capacities of non-linguistic bodily communication.

6. References
Arbib, Michael A. 2002. The mirror system, imitation, and the evolution of language. In: Kerstin
Dautenhahn and Chrystopher L. Nehaniv (eds.), Imitation in Animals and Artifacts. Complex
Adaptive Systems, 229–280. Cambridge: Massachusetts Institute of Technology Press.
Arbib, Michael A. 2005. From monkey-like action recognition to human language: An evolution-
ary framework for neurolinguistics (with commentaries and author’s response). Behavioral and
Brain Sciences 28: 105–167.
Arbib, Michael A. 2006. Aphasia, apraxia and the evolution of the language-ready brain. Apha-
siology 20: 1–30.
Arbib, Michael A. 2008. From grasp to language: Embodied concepts and the challenge of
abstraction. Journal of Physiology-Paris 102(1–3): 4–20.
Arbib, Michael A. 2010. Mirror system activity for action and language is embedded in the inte-
gration of dorsal and ventral pathways. Brain and Language 112: 12–24.

Arbib, Michael A. 2012. How the Brain Got Language: The Mirror System Hypothesis. New York:
Oxford University Press.
Arbib, Michael A. and James B. Bonaiuto 2008. From grasping to complex imitation: Modeling
mirror systems on the evolutionary path to language. Mind & Society 7: 43–64.
Arbib, Michael A., James B. Bonaiuto, Stéphane Jacobs and Scott H. Frey 2009. Tool use and the
distalization of the end-effector. Psychological Research 73: 441–462.
Arbib, Michael A. and JinYong Lee 2008. Describing visual scenes: Towards a neurolinguistics
based on construction grammar. Brain Research 1225: 146–162.
Arbib, Michael A., Katja Liebal and Simone Pika 2008. Primate vocalization, gesture, and the evo-
lution of human language. Current Anthropology 49(6): 1053–1076. (See http://www.journals.
uchicago.edu/doi/full/1010.1086/593015#apa for access to supplementary materials and videos.)
Arbib, Michael A. and Giacomo Rizzolatti 1997. Neural expectations: a possible evolutionary
path from manual skills to language. Communication and Cognition 29: 393–424.
Armstrong, David F., William C. Stokoe and Sherman E. Wilcox 1995. Gesture and the Nature of
Language. Cambridge: Cambridge University Press.
Armstrong, David F. and Sherman E. Wilcox 2007. The Gestural Origin of Language. Oxford:
Oxford University Press.
Bonaiuto, James B. and Michael A. Arbib 2010. Extending the mirror neuron system model, II:
What did I just do? A new role for mirror neurons. Biological Cybernetics 102: 341–359.
Bonaiuto, James B., Edina Rosta and Michael A. Arbib 2007. Extending the mirror neuron system
model, I: Audible actions and invisible grasps. Biological Cybernetics 96: 9–38.
Buccino, Giovanni, Fausta Lui, Nicola Canessa, Ilaria Patteri, Giovanna Lagravinese, Francesca
Benuzzi, Carlo A. Porro and Giacomo Rizzolatti 2004. Neural circuits involved in the recog-
nition of actions performed by nonconspecifics: An FMRI study. Journal of Cognitive Neu-
roscience 16(1): 114–126.
Calvo-Merino, Beatriz, Daniel E. Glaser, Julie Grèzes, Richard E. Passingham, and Patrick Hag-
gard 2005. Action observation and acquired motor skills: An FMRI study with expert dancers.
Cerebral Cortex 15(8): 1243–1249.
Corballis, Michael C. 2002. From Hand to Mouth, the Origins of Language. Princeton, NJ: Prince-
ton University Press.
Dapretto, Mirella, Mari S. Davies, Jennifer H. Pfeifer, Ashley A. Scott, Marian Sigman, Susan Y.
Bookheimer, and Arco Iacoboni 2006. Understanding emotions in others: Mirror neuron dys-
function in children with autism spectrum disorders. Nature Neuroscience 9(1): 28–30.
Darwin, Charles 1872. The Expression of the Emotions in Man and Animals. Republished 1965. Chicago: University of Chicago Press.
De Renzi, Ennio 1989. Apraxia. In: Francois Boller and Jordan Grafman (eds.), Handbook of
Neuropsychology, Vol. 2, 245–263. Amsterdam: Elsevier.
Feldman, Jerome and Srinivas Narayanan 2004. Embodied meaning in a neural theory of lan-
guage. Brain & Language 89(2): 385–392.
Ferrari, Pier Francesco, Vittorio Gallese, Giacomo Rizzolatti and Leonardo Fogassi 2003. Mirror
neurons responding to the observation of ingestive and communicative mouth actions in the
monkey ventral premotor cortex. European Journal of Neuroscience 17(8): 1703–1714.
Gallese, Vittorio, Luciano Fadiga, Leonardo Fogassi and Giacomo Rizzolatti 1996. Action recog-
nition in the premotor cortex. Brain 119: 593–609.
Gallese, Vittorio and Alvin Goldman 1998. Mirror neurons and the simulation theory of mind-
reading. Trends in Cognitive Science 2: 493–501.
Gallese, Vittorio and George Lakoff 2005. The brain’s concepts: The role of the sensory-motor
system in conceptual knowledge. Cognitive Neuropsychology 22: 455–479.
Gardner, R. Allen and Beatriz T. Gardner 1969. Teaching sign language to a chimpanzee. Science
165: 664–672.
Gazzola, Valeria, Lisa Aziz-Zadeh and Christian Keysers 2006. Empathy and the somatotopic
auditory mirror system in humans. Current Biology 16(18): 1824–1829.

Goodall, Jane 1986. The Chimpanzees of Gombe: Patterns of Behavior. Cambridge, MA: Harvard
University Press.
Grafton, Scott T., Michael A. Arbib, Luciano Fadiga and Giacomo Rizzolatti 1996. Localization of
grasp representations in humans by positron emission tomography. 2. Observation compared
with imagination. Experimental Brain Research 112(1): 103–111.
Hewes, Gordon W. 1973. Primate communication and the gestural origin of language. Current
Anthropology 12(1–2): 5–24.
Hoff, Bruce and Michael A. Arbib 1993. Models of trajectory formation and temporal interaction
of reach and grasp. Journal of Motor Behavior 25(3): 175–192.
Itti, Laurent and Michael A. Arbib 2006. Attention and the minimal subscene. In: Michael A.
Arbib (ed.), Action to Language via the Mirror Neuron System, 289–346. Cambridge: Cam-
bridge University Press.
Kohler, Evelyn, Christian Keysers, M. Alessandra Umiltà, Leonardo Fogassi, Vittorio Gallese and
Giacomo Rizzolatti 2002. Hearing sounds, understanding actions: Action representation in
mirror neurons. Science 297(5582): 846–848.
Leavens, David A., William D. Hopkins and Kim A. Bard 1996. Indexical and referential pointing
in chimpanzees (Pan troglodytes). Journal of Comparative Psychology 110(4): 346–353.
Leavens, David A., William D. Hopkins and Kim A. Bard 2005. Understanding the point of point-
ing. Current Directions in Psychological Science 14(4): 185–189.
Leavens, David A., William D. Hopkins and Roger K. Thomas 2004. Referential communication
by chimpanzees (Pan troglodytes). Journal of Comparative Psychology 118(1): 48–57.
Lee, Jinyong and Victor Barrès 2013. From visual scenes to language and back via Template Con-
struction Grammar. Neuroinformatics.
McNeill, David 2005. Gesture and Thought. Chicago: University of Chicago Press.
Oberzaucher, Elisabeth and Karl Grammer 2008. Everything is movement: On the nature of em-
bodied communication. In: Ipke Wachsmuth, Manuela Lenzen and Guenther Knoblich (eds.),
Embodied Communication in Humans and Machines, 151–177. Oxford: Oxford University
Press.
Oztop, Erhan and Michael A. Arbib 2002. Schema design and implementation of the grasp-related
mirror neuron system. Biological Cybernetics 87(2): 116–140.
Patterson, Francine 1978. Conversations with a gorilla. National Geographic 134(4): 438–465.
Petrides, Michael and Deepak N. Pandya 2009. Distinct parietal and temporal pathways to the homologues of Broca’s area in the monkey. Public Library of Science Biology 7(8): e1000170.
Pika, Simone, Katja Liebal and Michael Tomasello 2003. Gestural communication in young gor-
illas (Gorilla gorilla): Gestural repertoire, learning and use. American Journal of Primatology
60(3): 95–111.
Poizner, Howard, Edward S. Klima and Ursula Bellugi 1987. What the Hands Reveal about the
Brain. Cambridge: Massachusetts Institute of Technology Press.
Rizzolatti, Giacomo and Michael A. Arbib 1998. Language within our grasp. Trends in Neuros-
ciences 21: 188–194.
Rizzolatti, Giacomo and Michael A. Arbib 1999. From grasping to speech: Imitation might pro-
vide a missing link: Reply. Trends in Neurosciences 22(4): 152.
Rizzolatti, Giacomo and Laila Craighero 2004. The mirror-neuron system. Annual Review of Neu-
roscience 27: 169–192.
Rizzolatti, Giacomo, Luciano Fadiga, Massimo Matelli, Valentino Bettinardi, Eraldo Paulesu, Da-
niela Perani and Ferruccio Fazio 1996. Localization of grasp representations in humans by
PET: 1. Observation versus execution. Experimental Brain Research 111(2): 246–252.
Rizzolatti, Giacomo and Corrado Sinigaglia 2008. Mirrors in the Brain: How Our Minds Share Ac-
tions, Emotions, and Experience. Translated from the Italian by Frances Anderson. Oxford:
Oxford University Press.
Roberts, Jaqueline M. 1989. Echolalia and comprehension in autistic children. Journal of Autism
and Developmental Disorders 19(2): 271–281.

Rothi, Lesley J., Cynthia Ochipa and Kenneth M. Heilman 1991. A cognitive neuropsychological
model of limb praxis. Cognitive Neuropsychology 8: 443–458.
Seyfarth, Robert M. and Dorothy L. Cheney 1997. Some general features of vocal development in
nonhuman primates. In: Charles T. Snowdon and Martine Hausberger (eds.), Social Influences
on Vocal Development, 249–273. Cambridge: Cambridge University Press.
Stokoe, William C. 1960. Sign Language Structure: An Outline of the Visual Communication Sys-
tems of the American Deaf. Buffalo, NY: University of Buffalo.
Suddendorf, Thomas and Michael C. Corballis 1997. Mental time travel and the evolution of the
human mind. Genetic, Social and General Psychology Monographs 123(2): 133–167.
Suddendorf, Thomas and Michael C. Corballis 2007. The evolution of foresight: What is mental
time travel, and is it unique to humans? Behavioral and Brain Sciences 30: 299–351.
Tomasello, Michael and Josep Call 1997. Primate Cognition. New York: Oxford University Press.
Tomasello, Michael, Malinda Carpenter, Josep Call, Tanya Behne and Henrike Moll 2005. Under-
standing and sharing intentions: The origins of cultural cognition. Behavioral and Brain
Sciences 28: 1–17.
Umiltà, M. Alessandra, L. Escola, Irakli Intskirveli, Franck Grammont, Magali Rochat, Fausto
Caruana, Ahmad Jezzini, Vittorio Gallese and Giacomo Rizzolatti 2008. When pliers become
fingers in the monkey motor system. Proceedings of the National Academy of Sciences of the
USA 105(6): 2209–2213.
Visalberghi, Elisabetta and Dorothy M. Fragaszy 2001. Do monkeys ape? Ten years after. In: Ker-
stin Dautenhahn and Chrystopher L. Nehaniv (eds.), Imitation in Animals and Artifacts, 471–
500. Cambridge: Massachusetts Institute of Technology Press.
Voelkl, Bernhard and Ludwig Huber 2000. True imitation in marmosets. Animal Behaviour 60:
195–202.
Wicker, Bruno, Christian Keysers, Jane Plailly, Jean-Pierre Royet, Vittorio Gallese and Giacomo
Rizzolatti 2003. Both of us disgusted in my insula: The common neural basis of seeing and feel-
ing disgust. Neuron 40(3): 655–664.
Wohlschläger, Andreas, Merideth Gattis and Harold Bekkering 2003. Action generation and
action perception in imitation: an instance of the ideomotor principle. Philosophical Transac-
tions of the Royal Society of London 358: 501–515.
Woodruff, Guy and David Premack 1979. Intentional communication in the chimpanzee: The
development of deception. Cognition 7: 333–352.

Michael A. Arbib, University of Southern California, Los Angeles, CA (USA)

31. Gesture as precursor to speech in evolution


1. Introduction
2. Modern developments
3. From hand to mouth
4. Conclusion
5. References

Abstract
Although it is commonly assumed that language evolved from primate calls, evidence
now converges on the alternative view that it emerged from manual gestures. Signed
languages are recognized as true languages, and manual gestures accompany and add
meaning to regular speech. The mirror system in primates, involving neural circuits spe-
cialized for both the production and perception of manual grasping, may have formed the
platform from which a manual form of communication evolved. Great apes in captivity
cannot be taught to speak, but they have acquired manual systems that have at least some
language-like properties and even in the wild, ape gestures are more flexible and inten-
tional than are their vocalizations. In hominins, bipedalism probably led to even greater
flexibility of gestures. A dramatic increase in brain size during the Pleistocene may have
signalled the emergence of grammatical features, but language probably remained pri-
marily gestural, with added facial and vocal elements, until the emergence of Homo sa-
piens some 200,000 years ago, when speech emerged as the dominant mode. The practical
advantages of speech over gesture, including the freeing of the hands, help explain the
dramatic rise of sophisticated tools, bodily ornamentation, art, and perhaps music, in
our own species.

1. Introduction
Suppose we had no voice or tongue, and wanted to communicate with one another, should
we not, like the deaf and dumb, make signs with the hands and head and rest of the body?
(from Plato’s Cratylus, 360 BC)

Signed languages have been recognized for well over 2,000 years. Interest seemed to
pick up from the 17th century, though, when Francis Bacon ([1605] 1828) suggested
that the signed languages of the deaf and dumb were superior to spoken language,
since (or so he thought) they did not require grammar, the curse of Babel. The English
physician John Bulwer wrote in 1644 of “the natural language of the hand,” and was the
first to propose that the deaf might be educated through lip reading. Signed languages
were not studied formally, however, until the work of Abbé Charles-Michel de l’Épée
in the 1770s when he developed a sign language for use in a private school for the deaf
in Paris. This language was amalgamated with signs used by the deaf population in the
US to form the basis for American Sign Language (ASL).
The French philosopher Géraud de Cordemoy ([1668] 1972) called gestures “the first
of all languages,” noting that they were universal and understood everywhere. The idea
that gestures were an evolutionary precursor to spoken language was further developed
in the 18th century, although constrained by religious doctrine proclaiming language to
be a gift from God. The Italian philosopher Giambattista Vico ([1744] 1953) accepted
the Biblical story of the divine origin of speech, but proposed that after the Flood hu-
mans reverted to savagery, and language emerged afresh with the use of gestures and
physical objects, only later incorporating vocalizations. Condillac ([1746] 1971) also be-
lieved that language evolved from manual gestures but, as an ordained priest, he felt
compelled to disguise his theory in the form of a fable about two children isolated in
the desert after the Flood. They began by communicating in manual and bodily ges-
tures, which were eventually supplanted by vocalizations. Condillac’s compatriot
Jean-Jacques Rousseau (1781) later endorsed the gestural theory more openly.
In the 19th century, Charles Darwin referred to the role of gestures in his book
The Descent of Man: “I cannot doubt that language owes its origins to the imitation
and modification of various natural sounds, and man’s own distinctive cries, aided by
signs and gestures” (Darwin [1871] 1896: 86; emphasis added). Shortly afterwards,
the philosopher Friedrich Nietzsche, in his (1878) book Human, All Too Human, wrote
as follows:

Imitation of gesture is older than language, and goes on involuntarily even now, when the
language of gesture is universally suppressed, and the educated are taught to control their
muscles. The imitation of gesture is so strong that we cannot watch a face in movement
without the innervation of our own face (one can observe that feigned yawning will
evoke natural yawning in the man who observes it). The imitated gesture led the imitator
back to the sensation expressed by the gesture in the body or face of the one being imi-
tated. This is how we learned to understand one another; this is how the child still learns
to understand its mother (…) As soon as men understood each other in gesture, a symbol-
ism of gesture could evolve. I mean, one could agree on a language of tonal signs, in such
a way that at first both tone and gesture (which were joined by tone symbolically) were
produced, and later only the tone. (Nietzsche [1878] 1986: 219)

This extract also anticipates the later discovery of the mirror system, discussed below.
In 1900, Wilhelm Wundt wrote a two-volume work on speech, and argued that a uni-
versal sign language was the origin of all languages. He wrote, though, under the mis-
apprehension that all deaf communities use the same system of signing, and that signed
languages are useful only for basic communication, and cannot communicate abstract
ideas. We now know that signed languages vary widely from community to community,
and can have all of the communicative sophistication of speech (e.g., Emmorey 2002).

2. Modern developments
The gestural theory began to assume its modern shape with an article published in 1973
by the anthropologist Gordon W. Hewes. He too drew on evidence from signed lan-
guages, and also referred to contemporary work showing that great apes were unable
to learn to speak but could use manual gestures in a language-like way, with at least
moderate success. It was not until the 1990s, though, that the idea began to take
hold. Pinker and Bloom’s (1990) influential article on language evolution made no men-
tion of Hewes’ work, but was followed by an increasing number of publications that
picked up the gestural theme (e.g., Armstrong, Stokoe, and Wilcox 1995; Corballis
1992, 2002; Rizzolatti and Arbib 1998). Three books published since 2007 converge
on the gestural theory, but from very different perspectives, one based on signed lan-
guages (Armstrong and Wilcox 2007), one on gestural communication in great apes
(Tomasello 2008), and one on the mirror system in the primate brain (Rizzolatti and
Sinigaglia 2008). I discuss each in turn.

2.1. Signed language


The grammatical and semantic sophistication of signed language is well illustrated by
Gallaudet University in Washington, DC, where university-level instruction is conducted
entirely in American Sign Language (ASL). Children exposed to signed languages from
infancy learn them as easily and naturally as those exposed to speech learn to speak, even
going through a stage of manual “babbling.” Although manual babbling has been taken
as evidence that language is essentially amodal (Petitto, Holowka, Sergio, et al. 2004),
others have used evidence from sign language to argue that language, whether manual
31. Gesture as precursor to speech in evolution 469

or vocal, is basically gestural rather than amodal, with manual gestures taking prece-
dence in human evolution (e.g., Armstrong and Wilcox 2007). In principle, at least, lan-
guage could once have existed in a form consisting entirely of manual and facial
gestures, comparable to present-day signed languages.
One difference between signed and spoken languages is that many of the signs are
iconic, representing the actual shapes of objects or sequences of events. In Italian
Sign Language some 50% of the hand signs and 67% of the bodily locations of signs
stem from iconic representations, in which there is a degree of spatiotemporal mapping
between the sign and its meaning (Pietrandrea 2002). In American Sign Language, too,
some signs are purely arbitrary, but many more are iconic (Emmorey 2002). For exam-
ple, the sign for “erase” resembles the action of erasing a blackboard, and the sign for
“play piano” mimics the action of actually playing a piano. The iconic nature of signed
languages may seem to contravene the “arbitrariness of the sign,” considered by Ferdi-
nand de Saussure (1916) to be one of the defining properties of language. But arbitrari-
ness is more a consequence of the language medium, and of expediency, than a
necessary property of language per se.
Speech, unlike manual signing, requires that the information be linearized, squeezed
into a sequence of sounds that are necessarily limited in terms of how they can capture
the physical nature of what they represent. The linguist Charles Hockett (1978) put it
this way:

(…) when a representation of some four-dimensional hunk of life has to be compressed
into the single dimension of speech, most iconicity is necessarily squeezed out. In one-
dimensional projections, an elephant is indistinguishable from a woodshed. Speech per-
force is largely arbitrary; if we speakers take pride in that, it is because in 50,000 years
or so of talking we have learned to make a virtue of necessity. (Hockett 1978: 274–275)

In any case, not all signs are iconic, and they can also be iconic without being
transparently so. Seemingly iconic signs often cannot be guessed by naïve observers (Pizzuto
and Volterra 2000). They also tend to become less iconic and more arbitrary over
time, in the interests of speed, efficiency, and the constraints of the communication
medium. This process is called conventionalization (Burling 1999). Once the principle
of conventionalization is established, there is no need to retain an iconic component,
or even to depend on visual signals. We are quick to learn arbitrary labels, whether
for objects, actions, emotions, or abstract concepts. Manual gesture may still be neces-
sary to establish links in the first place – the child can scarcely learn the meaning of the
word dog unless her attention is drawn to the animal itself – but there is otherwise no
reason why the labels themselves need not be based on patterns of sound. Of course
some concepts, such as the moo of a cow or miaow of a cat, depend on sound rather
than sight. Pinker (2007) notes a number of newly minted examples: oink, tinkle,
barf, conk, woofer, tweeter. But most spoken words bear no physical relation to what
they represent.
The symbols of signed languages are less constrained. The hands and arms can mimic
the shapes of real-world objects and actions, and to some extent lexical information can
be delivered in parallel instead of being forced into rigid temporal sequence. Even so,
conventionalization allows signs to be simplified and speeded up, to the point that many
of them lose most or all of their iconic aspect. The American Sign Language sign for
home was once a combination of the sign for eat, which is a bunched hand touching
the mouth, and the sign for sleep, which is a flat hand on the cheek. Now it consists
of two quick touches on the cheek with a bunched handshape, so the original iconic
components are all but lost (Frishberg 1975).
Two signed languages have emerged, apparently de novo, in recent times, and have
provided some insights into how language might have evolved in the first place. One
arose in Nicaragua, where deaf people were isolated from one another until the Sandi-
nista government in 1979 created the first schools for the deaf. Since then, the children
in these schools have invented their own sign language, which has developed into
the system now called Lenguaje de Signos Nicaragüense (LSN). In the course of
time, Lenguaje de Signos Nicaragüense has changed from a system of holistic signs
to a more combinatorial format. For example, the children were told a story of a cat
that swallowed a bowling ball, and then rolled down a steep street in a “waving, wob-
bling manner.” Asked to sign the motion, some children indicated the motion holisti-
cally, moving the hand downward in a waving motion. Others, however, segmented
the motion into two signs, one representing downward motion and the other represent-
ing the waving motion, and this version increased after the first cohort of children had
moved through the school (Senghas, Kita, and Özyürek 2004).
The other newly-minted sign language arose among the Al-Sayyid, a Bedouin com-
munity in the Negev Desert in Israel. Some 150 of the people in this community of
about 3,500 have inherited a condition leaving them profoundly deaf, and although
these are in the minority, their sign language, Al-Sayyid Bedouin Sign Language
(ABSL), is widely used in the community, along with a spoken dialect of Arabic. Al-
Sayyid Bedouin Sign Language is now in only its third generation of signers. A feature
of Al-Sayyid Bedouin Sign Language is that it lacks the equivalent of phonemes or mor-
phemes. Each signed word is essentially a whole, not decomposable into parts (Aronoff,
Meir, Padden, et al. 2008). In this respect, it seems to defy what has been called duality
of patterning, or combinatorial structure at two levels, the phonological and the gram-
matical. Hockett (1960) listed duality of patterning as one of the “design features” of
language, so its absence in Al-Sayyid Bedouin Sign Language might be taken to
mean that Al-Sayyid Bedouin Sign Language is not a true language. Yet, Hockett him-
self recognized that the design features did not appear all at once, and that duality of
patterning was probably a latecomer. In the early stages of language development,
whole signs or sounds can be used to refer to distinct entities, such as objects or actions,
but as the required vocabulary grows, individual signs or sounds become increasingly
indistinguishable. The solution is then to form words that are combinations of elements,
and duality of patterning is born (Hockett 1978). Al-Sayyid Bedouin Sign Language,
then, may be regarded as a language in the early stages of development. Aronoff con-
cludes that words are the primary elements of all languages, and do not acquire phonol-
ogy or morphology until they are thrust upon them; as he put it, “In the beginning was
the word” (Aronoff 2007: 803).
Other signed languages, such as American Sign Language, do have the equivalent of
phonemes, sometimes called cheremes, but more often referred to simply as phonemes
(Emmorey 2002). It is reasonable to suppose that Al-Sayyid Bedouin Sign Language
will evolve duality of patterning in the course of time. The study of both Lenguaje
de Signos Nicaragüense and Al-Sayyid Bedouin Sign Language has therefore yielded
insights into the manner in which language develops structure, perhaps not so much
through some innately given universal grammar (Chomsky 1975) as through a gradual
and pragmatic process of grammaticalization (Heine and Kuteva 2007; Hopper and
Traugott 1993).

2.2. Communication in apes


The gestural theory is also supported by studies of communication in nonhuman pri-
mates, and in particular our closest nonhuman relatives, chimpanzees and bonobos.
Such evidence is of course indirect, since present-day chimpanzees and bonobos
have themselves evolved since the last common human-chimpanzee ancestor some
6 or 7 million years ago. Given the general consensus that chimpanzees and bonobos
are incapable of true language, there is the added problem that language must have
appeared de novo at some point in hominine evolution, and intermediate forms are
not available. Nevertheless, the chimpanzee and bonobo probably provide the best
available proxies for the communicative abilities of the earliest hominines, and provide
a starting point for the construction of evolutionary scenarios.
One reason to suppose that language originated in gestures is that nonhuman pri-
mates have little if any cortical control over vocalization, but excellent cortical control
over the hands and arms (Ploog 2002). Over the past half-century, attempts to teach the
great apes to speak have been strikingly unsuccessful, but relatively good progress
has been made toward teaching them to communicate by a form of manual sign lan-
guage, as in chimpanzee Washoe (Gardner and Gardner 1969), or by pointing to visual
symbols on a keyboard, as in the bonobo Kanzi (Savage-Rumbaugh, Shanker, and
Taylor 1998). These visual forms of communication scarcely match the grammatical
sophistication of modern humans, but are a considerable advance over the restricted
nature of the speech sounds that these animals make. The human equivalents of primate
vocalizations are probably emotionally-based sounds like laughing, crying, grunting, or
shrieking, rather than words.
Of course vocalization is ubiquitous in the animal kingdom, including primates, and
even Kanzi vocalizes incessantly while communicating on the keyboard or making ges-
tures. In most cases, though, animal calls are fixed, and not susceptible to learning,
which is of course a prerequisite for language. Studies suggest that primate calls do
show limited modifiability, but its basis remains unclear, and it is apparent in subtle
changes within call types rather than the generation of new call types (Egnor and
Hauser 2004). Moreover, where language is flexible and conveys propositional informa-
tion about variable features of the world, animal calls are typically stereotyped, and
convey information in circumscribed contexts. In both chimpanzees and bonobos
facial/vocal gestures have been shown to be much more tied to specific contexts, and
therefore less flexible, than manual gestures (Pollick and de Waal 2007). Other studies
have shown that, unlike vocal calls, the communicative bodily gestures of gorillas (Pika,
Liebal, and Tomasello 2003), chimpanzees (Liebal, Call, and Tomasello 2004), and bo-
nobos (Pika, Liebal, and Tomasello 2005) are subject to social learning and are sensitive
to the attentional state of the recipient – both prerequisites for language.
Tomasello (2008) nevertheless notes that although sensitivity to the attentional state
of the recipient is a necessary prerequisite for language, it is not sufficient, at least in
the Gricean sense of language as a cooperative enterprise directed at a joint goal
(Grice 1975). Chimpanzee gestures seem largely confined to making requests that are
self-serving; for example, a chimp may point to a desirable object that is just out of
reach, with the aim of getting a watching human to help. True language requires the fur-
ther step of shared attention, so that the communicator knows not only what the recip-
ient knows or is attending to, but knows also that the recipient knows that the
communicator knows this. This kind of recursive mind-reading enables communication
beyond the making of simple requests to the sharing of knowledge, a distinctive property
of language.
Tomasello (2008) goes on to summarize work on pointing in human infants, suggest-
ing that shared attention and cooperative interchange emerge from about one year of
age. The one-year-old points to objects not only to request them, but also to indicate
things that an adult is already looking at, in the apparent understanding that attention
to the object is shared. Infants therefore seem already to understand sharing and shared
communication before language itself has appeared. In this, they have already moved
beyond the communicative capacity of the chimpanzee. Even so, Tomasello’s work re-
veals an essential continuity between the pointing behaviour of chimpanzees and that of
human infants.
It is also worth making the point that manual gestures provide a much more natural
means of communicating about events in the world than vocal sounds do. Pointing
refers naturally to objects and events, providing the link between specific gestures – or
words – and real-world entities. The hands and arms provide a natural vehicle for
mimicry and description, since they are free to move in four-dimensional space (three of
space, one of time). Indeed the whole body can be used to pantomime events. Of
course, as we have seen, vocalizations can be used to mimic some sounds in onomato-
poeic fashion, but the natural world presents itself much more as a visuo-tactile entity
than as an acoustic one.

2.3. The mirror system


The gestural theory was boosted in the 1990s by the discovery of mirror neurons in the
primate brain (Rizzolatti and Sinigaglia 2008). Mirror neurons, first recorded in area F5
in the ventral premotor cortex of the monkey, are activated both when the animal
grasps an object and when it observes another individual making the same movement
(Rizzolatti, Fadiga, Gallese, et al. 1996). For the gestural theory, the important point is
that F5 lies within the homologue of Broca’s area in humans (Rizzolatti and Arbib
1998), an area long associated with the production of speech (Broca 1861), suggesting
that this area was specialized for manual action before its involvement in speech.
Indeed this work has led to a reappraisal of the role of Broca’s area in humans. It
now appears that Broca’s area can be divided into Brodmann areas 44 and 45, with
area 44 considered the true analogue of area F5, and that area 44 is involved not
only in speech, but also in motor functions unrelated to speech, including complex
hand movements, and sensorimotor learning and integration (Binkofski and Buccino
2004).
The relationship between representations of actions and spoken language is further
supported by neuroimaging studies, which show activation of Broca’s area when people
make meaningful arm gestures (e.g., Buccino, Binkofski, Fink, et al. 2001), or even
imagine them (e.g., Hanakawa, Immisch, Toma, et al. 2003). Patients with Broca’s apha-
sia show deficits, not only in speech, but also in the encoding of actions performed by
humans (Fazio, Cantagallo, Craighero, et al. 2009). They are not impaired in the encod-
ing of physical actions. Broca’s area, then, appears to be specialized for both the
production and the understanding of human action, including vocal action.
In nonhuman primates, though, the mirror system seems not to incorporate vocaliza-
tion (Rizzolatti and Sinigaglia 2008), although it is receptive to acoustic input. Kohler,
Keysers, Umiltà, et al. (2002) recorded neurons in area F5 of the monkey responding to
the sounds of manual actions, such as tearing paper or cracking peanuts. Significantly,
there was no response to monkey calls. This is consistent with evidence that vocaliza-
tions in nonhuman primates are under limbic rather than neocortical control (Ploog
2002), and are therefore not (yet) part of the mirror system. Even in the chimpanzee,
voluntary control of vocalization appears to be limited, at best (Goodall 1986).
Mirror neurons are part of a more general “mirror system” involving regions of the
brain other than F5. In monkeys, cells in the superior temporal sulcus (STS) also
respond to observed biological actions, including grasping (Perrett, Harries, Bevan,
et al. 1989), although few if any respond when the animal itself performs an action.
F5 and superior temporal sulcus connect to area PF in the inferior parietal lobule,
where there are also neurons, known as “PF mirror neurons,” that respond to both
the execution and perception of actions (Rizzolatti, Fogassi, and Gallese 2001). This ex-
tended mirror system in monkeys largely overlaps with the homologues of the wider
cortical circuits in humans that are involved in language, leading to the notion that
language evolved out of the mirror system (see Arbib 2005 for detailed speculation).

3. From hand to mouth


We must still reconcile the evidence for gestural origins with the fact that language in
Homo sapiens is predominantly vocal. At some point in our evolution, the mirror sys-
tem must have incorporated vocalization, and our hominine forebears must have gained
a much higher degree of voluntary control over voiced sounds. Some have regarded this
transition as a major hurdle for gestural theory to surmount, including Hewes (1996)
himself. The linguist Robbins Burling wrote that “the gestural theory has one nearly
fatal flaw. Its sticking point has always been the switch that would have been needed
to move from a visual language to an audible one” (Burling 2005: 123).

3.1. Speech as gesture


Part of the answer to this lies in the growing realization that speech itself is a gestural
system, so that the switch is not so much from vision to audition as from one kind of
gesture to another. The notion of speech as gesture underlies the so-called motor theory
of speech perception, which holds that speech sounds are perceived in terms of how they
are produced, rather than on the basis of acoustic analysis (Liberman et al. 1967). The
issue leading to the motor theory was not the lack of information in the acoustic signal,
but rather the fact that individual phonemes are perceived as invariant despite extreme
variability in the acoustic signal. Liberman, Cooper, Shankweiler, et al. (1967) proposed
that invariance lay instead in the articulatory gestures. As Galantucci, Fowler, and
Turvey (2006) recently put it, “perceiving speech is perceiving gestures” (Galantucci,
Fowler, and Turvey 2006: 361). Transcranial magnetic stimulation (TMS) administered
to the motor cortex controlling the lips and tongue has been shown to disrupt the
perception of phonemes produced by these articulators, strongly supporting the motor
theory (D’Ausilio, Pulvermüller, Salmas, et al. 2009) – and of course implicating the
mirror system.
The idea of speech as a gestural system has led to the concept of articulatory pho-
nology (Browman and Goldstein 1995). Speech gestures comprise movements of six
articulatory organs: the lips, the velum, the larynx, and the blade, body, and root of
the tongue. Each is controlled separately, so that individual speech units comprise
different combinations of movements. In further support of the motor theory, the speech
signal cannot be decoded by means of visual representations of the sound patterns, as in
a sound spectrogram, but can be discerned in mechanical representations of the gestures
themselves, through X-rays, magnetic resonance imaging, and palatography (Studdert-
Kennedy 2005). The transition from manual gesture to speech, then, can be regarded as
one occurring within the gestural domain, with manual gestures gradually replaced by
gestures of the articulatory organs.
Part of the transition probably involved gestures of the face, and especially the
mouth. Some neurons in area F5 in the monkey fire when the animal makes movements
to grasp an object with either the hand or the mouth (Rizzolatti, Camarda, Fogassi,
et al. 1988). Petrides, Cadoret, and Mackey (2005) have identified an area in the mon-
key brain rostral to premotor area 6 that is involved in control of the orofacial muscu-
lature. This area is also considered a homologue of part of Broca’s area. The close
neural associations between hand and mouth may originally have served eating rather than
communication, but were later exapted for gestural and finally vocal language. The connection
between hand and mouth can also be demonstrated behaviorally in humans (Gentilucci,
Benuzzi, Gangitano, et al. 2001). Even the kinematics of speech itself are altered, de-
pending on whether the speaker grasps a large or a small object, or even watches
another individual grasping such objects (see Gentilucci and Corballis 2006 for review).

3.2. An evolutionary scenario


The hominines split from the line leading to chimpanzees and bonobos some 6 to
7 million years ago. These early hominines were distinguished by bipedalism, freeing
the hands from involvement in locomotion, expanding their potential role in gestural
communication. There have been many speculations as to the origins of bipedalism,
but one possibility is that it was driven by the adaptive advantages of improved
communication.
It is unlikely, though, that communication attained language-like properties until the
Pleistocene, which largely overlaps the emergence of the genus Homo, and is usually
dated from about 1.8 million years (or earlier) to about 10,000 years ago. With the
global shift to cooler climate, much of southern and eastern Africa became more
open and sparsely wooded. The hominines were then not only more exposed to attack
from dangerous predators, such as saber-tooth cats, lions, and hyenas, but also obliged
to compete with them as carnivores. The solution was not to compete on the same
terms, but to establish a so-called “cognitive niche” (Tooby and DeVore 1987), relying
on social cooperation and intelligent planning for survival. This may explain the dra-
matic increase in brain size. In the early hominines, brain size was about the same, rel-
ative to body size, as that of the present-day great apes, but from the emergence of the
genus Homo it increased rapidly, and had doubled by about 1.2 million years ago.
It reached a peak about 700,000 years ago, and remains about three times the size
expected in a great ape of the same body size (Wood and Collard 1999).
Communication was no doubt a critical element of the cognitive niche. Survival de-
pended on ability to encode, and no doubt express, information as to “who did what to
whom, when, where, and why” (Pinker 2003: 27). The problem is that the number of
combinations of actions, actors, locations, time periods, implements, and so forth, that
define episodes becomes very large, and a system of holistic calls to describe those epi-
sodes rapidly taxes the perceptual and memory systems. Syntax may then have emerged
as a series of rules whereby episodic elements could be combined.
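The combinatorial pressure described here can be made concrete with a toy calculation (the element counts below are arbitrary, chosen purely for illustration): a holistic system needs one distinct call per episode, so its repertoire grows multiplicatively with the number of distinguishable elements, whereas a combinatorial system needs only one sign per element and grows additively.

```python
# Toy illustration (hypothetical numbers): a holistic repertoire needs one
# distinct signal per episode, so it grows multiplicatively; a combinatorial
# (syntactic) system needs one sign per element, so it grows only additively.
actors, actions, locations, times = 10, 10, 10, 10

holistic_repertoire = actors * actions * locations * times      # one call per episode
combinatorial_repertoire = actors + actions + locations + times # one sign per element

print(holistic_repertoire)       # 10,000 distinct holistic calls
print(combinatorial_repertoire)  # 40 combinable signs
```

Even at these modest counts the holistic repertoire rapidly outstrips any plausible perceptual and memory system, which is precisely the pressure under which syntax is argued here to have emerged.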
The expressiveness of modern signed languages tells us that manual gestures would
have been sufficient to carry the communicative burden. A gradual switch from hand to
mouth was probably driven by the increasing involvement of the hands in manufacture,
and perhaps in transporting belongings or booty. Manufacture of stone tools, considered
a conceptual advance beyond the opportunistic use of sticks or rocks as tools, appears in
the fossil record from some 2.5 million years ago (Semaw, Renne, Harris, et al. 1997).
From some 1.8 million years, Homo erectus began to migrate out of Africa into Asia
and later into Europe, and the Acheulian industry emerged, with large bifacial tools
and handaxes, marking a significant advance over the simple flaked tools of the earlier
industry.
With the hands increasingly involved in such activities, the burden of communica-
tion may have shifted to the face, which provides sufficient diversity of movement
and expression to act as a signaling device. Signed languages involve communicative
movements of the face as well as of the hands (Sutton-Spence and Boyes-Braem 2001),
and eye-movement recordings suggest that signers watching signed discourse focus
mostly on the face and mouth, and relatively little on the hands or upper body (Muir
and Richardson 2005). In American Sign Language, facial expressions and head move-
ments can turn an affirmative sentence into a negation, or a question, as well as providing
the visual equivalent of prosody in speech (Emmorey 2002).
The incorporation of vocalization into the gestural system may then have been a rel-
atively small step. One advantage is that voicing makes gestures of the tongue, normally
partly hidden from view, recoverable by the receiver through audition rather than
vision. But vocal communication has other advantages. Unlike signed language, it can
be carried on at night, or when the line of sight between sender and receiver is blocked.
Communication at night may have been critical to survival in a hunter-gatherer society.
The San, a modern hunter-gatherer society, are known to talk late at night, sometimes
all through the night, to resolve conflict and share knowledge (Konner 1982). Speech is
much less energy-consuming than manual gesture. Signing can impose quite severe
physical demands, whereas the physiological costs of speech are so low as to be nearly
unmeasurable (Russell, Cerny, and Stathopoulos 1998). Speech adds little to the energy
cost of breathing, which we must do anyway to sustain life.
But it was perhaps the freeing of the hands for other adaptive functions, such as car-
rying things, and the manufacture of tools, that was the most critical. Vocal language
allows people to use tools and at the same time explain verbally what they are doing,
leading perhaps to pedagogy (Corballis 2002). Indeed, this may explain the dramatic
rise of more sophisticated tools, bodily ornamentation, art, and perhaps music, in our
own species. This is sometimes linked to the Late Stone Age dating from about
50,000 years ago (Klein 2008), but more likely originated from around 75,000 years
ago in Africa (Jacobs, Roberts, Galbraith, et al. 2008), shortly before the human dis-
persal from Africa at around 55,000 to 60,000 years ago (Mellars 2006). These develop-
ments, associated with Homo sapiens, are often attributed to the emergence of language
itself, but I suggest that the critical innovation was not language, but speech.
It is unclear precisely how the neurophysiological change allowing vocalization to
become fully incorporated into the mirror system occurred. One possible lead comes
from the forkhead box transcription factor, FOXP2. A mutation of this gene has
resulted in a severe deficit in vocal articulation, and a functional magnetic resonance
imaging study revealed that members of a family affected by the mutation, unlike their
unaffected relatives, showed no activation in Broca’s area while covertly generating
verbs (Liégeois, Baldeweg, Connelly, et al. 2003). Although highly conserved in mam-
mals, the forkhead box transcription factor 2 gene underwent two mutations since the
split between hominine and chimpanzee lines. One estimate placed the more recent
of these at “not less than” 100,000 years ago (Enard, Przeworski, Fisher, et al. 2002).
Contrary evidence, though, comes from a report that the mutation is also present in the
DNA of a 45,000-year-old Neandertal fossil (Krause, Lalueza-Fox, Orlando, et al.
2007), suggesting that it goes back as much as 700,000 years, to the common
human-Neandertal ancestor. But this is challenged, in turn, by Coop, Bullaughey, Luca,
et al. (2008), who used phylogenetic dating of the haplotype to re-estimate the time at
a mere 42,000 years ago. Further clarification of this issue, and of the precise role of fork-
head box transcription factor 2 in vocal articulation, may lead to a better understanding
of when and why we came to talk, and how this impacted on human culture.

4. Conclusion
Evidence from signed languages, ape communication, and the mirror system strongly
point to the idea that language originated in manual gestures. The shift to speech
was probably progressive, with increasing involvement of the face, and gradual incorpo-
ration of vocal elements. Even today, manual and facial gestures accompany speech
(McNeill 2005). It is not clear when and how speech eventually became dominant,
but the switch may have been restricted to Homo sapiens, and the consequent freeing
of the hands may explain the extraordinary flowering of manufacture and art in human
culture, and why we eventually prevailed over the Neandertals, who were driven to
extinction some 12,000 years ago (Corballis 2004).

5. References
Arbib, Michael A. 2005. From monkey-like action recognition to human language: An
evolutionary framework for neurolinguistics. Behavioral and Brain Sciences 28: 105–168.
Armstrong, David F., William C. Stokoe and Sherman E. Wilcox 1995. Gesture and the Nature of
Language. Cambridge: Cambridge University Press.
Armstrong, David F. and Sherman E. Wilcox 2007. The Gestural Origin of Language. Oxford:
Oxford University Press.
Aronoff, Mark 2007. In the beginning was the word. Language 83: 803–830.
Aronoff, Mark, Irit Meir, Carol A. Padden and Wendy Sandler 2008. The roots of linguistic orga-
nization in a new language. Interaction Studies 9: 133–153.
Bacon, Francis 1828. Of the Proficience and Advancement of Learning, Divine and Human.
London: Dove. First published [1605].
Binkofski, Ferdinand and Giovanni Buccino 2004. Motor functions of the Broca’s region. Brain
and Language 89: 362–389.
Broca, Paul 1861. Remarques sur le siège de la faculté de la parole articulée, suivies d’une obser-
vation d’aphémie (perte de parole). Bulletin de la Société d’Anatomie (Paris) 36: 330–357.
Browman, Catherine P. and Louis F. Goldstein 1995. Dynamics and articulatory phonology. In:
Tim van Gelder and Robert F. Port (eds.), Mind as Motion. Explorations in the Dynamics of
Cognition, 175–193. Cambridge: Massachusetts Institute of Technology Press.
Buccino, Giovanni, Ferdinand Binkofski, Gereon R. Fink, Luciano Fadiga, Leonardo Fogassi, Vit-
torio Gallese, Rüdiger J. Seitz, Karl Zilles, Giacomo Rizzolatti and Hans-Joachim Freund 2001.
Action observation activates premotor and parietal areas in a somatotopic manner: An fMRI
study. European Journal of Neuroscience 13: 400–404.
Bulwer, John 1644. Chirologia: or the Natural Language of the Hand. London.
Burling, Robbins 1999. Motivation, conventionalization, and arbitrariness in the origin of lan-
guage. In: Barbara J. King (ed.), The Origins of Language: What Nonhuman Primates Can
Tell Us, 307–350. Santa Fe, NM: School of American Research Press.
Burling, Robbins 2005. The Talking Ape. New York: Oxford University Press.
Chomsky, Noam 1975. Reflections on Language. New York: Pantheon.
Condillac, Étienne Bonnot de 1971. An Essay on the Origin of Human Knowledge. Translated by
T. Nugent. Gainesville, FL: Scholars Facsimiles and Reprints. First published [1746].
Coop, Graham, Kevin Bullaughey, Francesca Luca and Molly Przeworski 2008. The timing of selection at the human FOXP2 gene. Molecular Biology and Evolution 25: 1257–1259.
Corballis, Michael C. 1992. On the evolution of language and generativity. Cognition 44: 197–226.
Corballis, Michael C. 2002. From Hand to Mouth: The Origins of Language. Princeton, NJ: Prin-
ceton University Press.
Corballis, Michael C. 2004. The origins of modernity: Was autonomous speech the critical factor?
Psychological Review 111: 543–552.
Cordemoy, Géraud de 1972. A Philosophical Discourse Concerning Speech. Delmar, New York:
Scholars’ Facsimiles and Reprints. First published [1668].
Darwin, Charles 1896. The Descent of Man and Selection in Relation to Sex. London: William
Clowes. First published [1871].
D’Ausilio, Alessandro, Friedemann Pulvermüller, Paola Salmas, Ilaria Bufalari, Chiara Beglio-
mini and Luciano Fadiga 2009. The motor somatotopy of speech perception. Current Biology
19: 381–385.
Egnor, S.E. Roian and Marc D. Hauser 2004. A paradox in the evolution of primate vocal learn-
ing. Trends in Neurosciences 27: 649–654.
Emmorey, Karen 2002. Language, Cognition, and Brain: Insights from Sign Language Research.
Hillsdale, NJ: Erlbaum.
Enard, Wolfgang, Molly Przeworski, Simon E. Fisher, Cecilia S.L. Lai, Victor Wiebe, Takashi Ki-
tano, Anthony P. Monaco and Svante Pääbo 2002. Molecular evolution of FOXP2, a gene in-
volved in speech and language. Nature 418: 869–871.
Fazio, Patrik, Anna Cantagallo, Laila Craighero, Alessandro D’Ausilio, Alice C. Roy, Thierry
Pozzo, Ferdinando Calzolari, Enrico Granieri and Liciano Fadiga 2009. Encoding of human
action in Broca’s area. Brain 132: 1980–1988.
Frishberg, Nancy 1975. Arbitrariness and iconicity in American Sign Language. Language 51: 696–
719.
Galantucci, Bruno, Carol A. Fowler and Michael T. Turvey 2006. The motor theory of speech per-
ception reviewed. Psychonomic Bulletin and Review 13: 361–377.
Gardner, R. Allen and Beatrix T. Gardner 1969. Teaching sign language to a chimpanzee. Science
165: 664–672.
Gentilucci, Maurizio, Francesca Benuzzi, Massimo Gangitano and Silvia Grimaldi 2001. Grasp
with hand and mouth: A kinematic study on healthy subjects. Journal of Neurophysiology
86: 1685–1699.
478 IV. Contemporary approaches

Gentilucci, Maurizio and Michael C. Corballis 2006. From manual gesture to speech: A gradual
transition. Neuroscience & Biobehavioral Reviews 30: 949–960.
Goodall, Jane 1986. The Chimpanzees of Gombe: Patterns of Behavior. Cambridge, MA: Harvard
University Press.
Grice, Paul 1975. Logic and conversation. In: Peter Cole and Jerry L. Morgan (eds.), Syntax and
Semantics, Vol. 3: Speech Acts, 43–58. New York: Academic Press.
Hanakawa, Takashi, Ilka Immisch, Keiichiro Toma, Michael A. Dimyan, Peter Van Gelderen and
Mark Hallett 2003. Functional properties of brain areas associated with motor execution and
imagery. Journal of Neurophysiology 89: 989–1002.
Heine, Bernd and Tania Kuteva 2007. The Genesis of Grammar. Oxford: Oxford University Press.
Hewes, Gordon W. 1973. Primate communication and the gestural origins of language. Current
Anthropology 14: 5–24.
Hewes, Gordon W. 1996. A history of the study of language origins and the gestural primacy
theory. In: Andrew Lock and Charles R. Peters (eds.), Handbook of Human Symbolic Evolu-
tion, 571–595. Oxford: Clarendon Press.
Hockett, Charles F. 1960. The origins of speech. Scientific American 203(3): 88–96.
Hockett, Charles F. 1978. In search of Jove's brow. American Speech 53: 243–315.
Hopper, Paul J. and Elizabeth C. Traugott 1993. Grammaticalization. Cambridge: Cambridge Uni-
versity Press.
Jacobs, Zenobia, Richard G. Roberts, Rex F. Galbraith, Hilary J. Deacon, Rainer Grün, Alex
Mackay, Peter Mitchell, Ralf Vogelsang and Lyn Wadley 2008. Ages for the Middle Stone
Age of Southern Africa: Implications for human behavior and dispersal. Science 322: 733–735.
Klein, Richard G. 2008. Out of Africa and the evolution of human behavior. Evolutionary Anthro-
pology 17: 267–281.
Kohler, Evelyne, Christian Keysers, M. Alessandra Umiltà, Leonardo Fogassi, Vittorio Gallese
and Giacomo Rizzolatti 2002. Hearing sounds, understanding actions: Action representation
in mirror neurons. Science 297: 846–848.
Konner, Melvin 1982. The Tangled Wing: Biological Constraints on the Human Spirit. New York:
Harper.
Krause, Johannes, Carles Lalueza-Fox, Ludovic Orlando, Wolfgang Enard, Richard E. Green, Hernán A. Burbano, Jean-Jacques Hublin, Catherine Hänni, Javier Fortea, Marco de la Rasilla, Jaume Bertranpetit, Antonio Rosas and Svante Pääbo 2007. The derived FOXP2 variant of modern humans was shared with Neandertals. Current Biology 17: 1908–1912.
Liberman, Alvin M., Franklin S. Cooper, Donald P. Shankweiler and Michael Studdert-Kennedy
1967. Perception of the speech code. Psychological Review 74: 431–461.
Liebal, Katja, Josep Call and Michael Tomasello 2004. Use of gesture sequences in chimpanzees.
American Journal of Primatology 64: 377–396.
Liégeois, Frédérique, Torsten Baldeweg, Alan Connelly, David G. Gadian, Mortimer Mishkin and
Faraneh Vargha-Khadem 2003. Language fMRI abnormalities associated with FOXP2 gene
mutation. Nature Neuroscience 6: 1230–1237.
McNeill, David 2005. Gesture and Thought. Chicago: University of Chicago Press.
Mellars, Paul 2006. Going east: New genetic and archaeological perspectives on the modern
human colonization of Eurasia. Science 313: 796–800.
Muir, Laura J. and Iain E.G. Richardson 2005. Perception of sign language and its application to visual communications for deaf people. Journal of Deaf Studies and Deaf Education 10: 390–401.
Nietzsche, Friedrich W. 1986. Human, All Too Human: A Book for Free Spirits. Translated by R. J.
Hollingdale. Cambridge: Cambridge University Press. First published [1878].
Perrett, David I., Mark H. Harries, Rachel Bevan, S. Thomas, P.J. Benson, Amanda J. Mistlin,
Andrew J. Chitty, Jari K. Hietanen and J.E. Ortega 1989. Frameworks of analysis for the
neural representation of animate objects and actions. Journal of Experimental Biology 146:
87–113.
Petitto, Laura A., Siobhan Holowka, Lauren E. Sergio, Bronna Levy and David J. Ostry 2004.
Baby hands that move to the rhythm of language: Hearing babies acquiring sign language bab-
ble silently on the hands. Cognition 93: 43–73.
Petrides, Michael, Geneviève Cadoret and Scott Mackey 2005. Orofacial somatomotor responses
in the macaque monkey homologue of Broca’s area. Nature 435: 1325–1328.
Pietrandrea, Paola 2002. Iconicity and arbitrariness in Italian Sign Language. Sign Language Studies 2: 296–321.
Pika, Simone, Katja Liebal and Michael Tomasello 2003. Gestural communication in young gorillas (Gorilla gorilla): Gestural repertoire, learning, and use. American Journal of Primatology 60: 95–111.
Pika, Simone, Katja Liebal and Michael Tomasello 2005. Gestural communication in subadult bo-
nobos (Pan paniscus): Repertoire and use. American Journal of Primatology 65: 39–61.
Pinker, Steven 2003. Language as an adaptation to the cognitive niche. In: Morten H. Christiansen
and Simon Kirby (eds.), Language Evolution, 16–37. Oxford: Oxford University Press.
Pinker, Steven 2007. The Stuff of Thought. London: Penguin Books.
Pinker, Steven and Paul Bloom 1990. Natural language and natural selection. Behavioral and
Brain Sciences 13: 707–784.
Pizzuto, Elena and Virginia Volterra 2000. Iconicity and transparency in sign languages: A cross-
linguistic cross-cultural view. In: Karen Emmorey and Harlan Lane (eds.), The Signs of Lan-
guage Revisited: An Anthology to Honor Ursula Bellugi and Edward Klima, 261–286. Mahwah,
NJ: Lawrence Erlbaum.
Plato 360 BCE. Cratylus. Translated by Benjamin Jowett. http://www.ac-nice.fr/philo/textes/Plato-Works/16-Cratylus.htm
Ploog, Detlev 2002. Is the neural basis of vocalisation different in non-human primates and Homo
sapiens? In: Timothy J. Crow (ed.), The Speciation of Modern Homo Sapiens, 121–135. Oxford:
Oxford University Press.
Pollick, Amy S. and Frans B.M. de Waal 2007. Ape gestures and language evolution. Proceedings of the National Academy of Sciences 104: 8184–8189.
Rizzolatti, Giacomo and Michael A. Arbib 1998. Language within our grasp. Trends in Neurosciences 21: 188–194.
Rizzolatti, Giacomo, Rosolino Camarda, Leonardo Fogassi, Maurizio Gentilucci, Giuseppe Lup-
pino and Massimo Matelli 1988. Functional organization of inferior area 6 in the macaque mon-
key. II. Area F5 and the control of distal movements. Experimental Brain Research 71: 491–507.
Rizzolatti, Giacomo, Leonardo Fadiga, Vittorio Gallese and Leonardo Fogassi 1996. Premotor
cortex and the recognition of motor actions. Cognitive Brain Research 3: 131–141.
Rizzolatti, Giacomo, Leonardo Fogassi and Vittorio Gallese 2001. Neurophysiological mechanisms underlying the understanding and imitation of action. Nature Reviews Neuroscience 2: 661–670.
Rizzolatti, Giacomo and Corrado Sinigaglia 2008. Mirrors in the Brain. Oxford: Oxford University
Press.
Rousseau, Jean-Jacques 1781. Essai sur l'Origine des Langues. Geneva. Published posthumously by A. Belin, Paris, 1817.
Russell, Bridget A., Frank J. Cerny and Elaine T. Stathopoulos 1998. Effects of varied vocal inten-
sity on ventilation and energy expenditure in women and men. Journal of Speech, Language,
and Hearing Research 41: 239–248.
Saussure, Ferdinand de 1916. Cours de Linguistique Générale. Edited by C. Bally and A. Seche-
haye. Lausanne: Payot.
Savage-Rumbaugh, Sue, Stuart G. Shanker and Talbot J. Taylor 1998. Apes, Language, and the
Human Mind. New York: Oxford University Press.
Semaw, Sileshi, Paul Renne, John W.K. Harris, Craig S. Feibel, Raymond L. Bernor, N. Fesseha and Ken Mowbray 1997. 2.5-million-year-old stone tools from Gona, Ethiopia. Nature 385: 333–336.
Senghas, Ann, Sotaro Kita and Asli Özyürek 2004. Children creating core properties of language:
Evidence from an emerging sign language in Nicaragua. Science 305: 1780–1782.
Studdert-Kennedy, Michael 2005. How did language go discrete? In: Maggie Tallerman (ed.), Lan-
guage Origins: Perspectives on Evolution, 48–67. Oxford: Oxford University Press.
Sutton-Spence, Rachel and Penny Boyes-Braem (eds.) 2001. The Hands Are the Head of the
Mouth: The Mouth as Articulator in Sign Language. Hamburg: Signum-Verlag.
Tomasello, Michael 2008. The Origins of Human Communication. Cambridge: Massachusetts
Institute of Technology Press.
Tooby, John and Irven DeVore 1987. The reconstruction of hominid evolution through strategic
modeling. In: W. G. Kinzey (ed.), The Evolution of Human Behavior: Primate Models, 183–237.
Albany: State University of New York Press.
Vico, Giambattista 1953. La Scienza Nuova. Bari: Laterza. First published [1744].
Wood, Bernard and Mark Collard 1999. The human genus. Science 284: 65–71.
Wundt, Wilhelm 1900. Die Sprache. Leipzig: Engelmann.

Michael C. Corballis, Auckland (New Zealand)

32. The co-evolution of gesture and speech, and downstream consequences
1. Introduction
2. Window on the mind
3. “Gesture-first”
4. Mead’s Loop
5. Natural selection of Mead’s Loop
6. The brain that evolved
7. Whence syntax?
8. Origin and ontogenesis, and the return of “gesture-first”
9. Conclusions
10. References

Abstract
This article sketches the "gesture-first" and the gesture-and-speech ("Mead's Loop") theories of language evolution, considers how a capacity for syntax could have arisen during linguistic encounters, and draws out the implications of the foregoing for current-day language ontogenesis as it recapitulates steps in language phylogenesis, including both a gesture-first stage and a separate gesture-and-speech stage.

“The actuation of language evolution lies within the interacting individuals, under the
pressure of a host of external ecological factors”. (Mufwene 2010)

1. Introduction
The origin of language, a prodigal topic, has returned to respectability after long exile.
Discoveries in linguistics, brain science, primate studies, children’s development and
elsewhere have inspired new interest after the infamous 19th Century ban (actually,
bans) on the topic. The topic can be approached from many angles. Most common
seems to be the comparative – differences and resemblances between humans and
other primates. A related approach is to consider the brain mechanisms underlying
communicative vocalizations and/or gesture. These have been recorded directly in
some primate species and can be compared to humans on some performance measures
thought to depend on similar brain mechanisms. Or a linguistic angle – the key features
of human language and whether anything can be said of how they came to be and
whether other animal species show plausible counterparts. Approaches are combined
in comparing human language to vocalizations, gestures and/or the instructed sign lan-
guage use of, say, orang-utans or chimps. I will take a third approach, gestures, which
also has its devotees, but I shall diverge from past approaches in crucial ways. This
approach is not inconsistent with the others, and in fact will draw on them at various
places.
My guiding idea and fundamental divergence is the following: Gestures are compo-
nents of speech, not accompaniments but actually integral parts of it. Much evidence
supports this idea, but its full implications have not always been recognized. The growth
point (GP) hypothesis is designed to explicate this integral linkage. A key insight is that
speech on the one hand and gesture on the other, when combined in a growth point,
bring semiotically opposite modes of representation together into a single package or
idea unit. The growth point unity of opposite semiotic modes, gesture and speech, cre-
ates a specific form of human cognition that animates language, giving it a dynamic
dimension intersecting the static one accessed by intuitions of linguistic form. This semi-
otic opposition, and the processes to resolve it, propel thought and speech forward. In
this chapter, I develop a sustained argument concerning the evolutionary implications
of the idea that gestures are integral parts of what evolved to make language possible
and how they did so. Fundamentally, what evolved are new forms of thinking and acting
which orchestrate vocal and manual movements in the brain areas in charge of such
movements, and these new orchestrations are based on the synthesis of speech and ges-
ture. The growth point explains this unit of thought and action, and the theme of this
chapter is to uncover some of the evolutionary selections that made this possible.
So what did it take in human evolution to have this system of communicative actions
in which global gesture imagery and “digital” encoded linguistic forms join in smoothly
performed actions? In its essentials the argument of these “notes” is that gesture and
speech are integrated as one dynamic system: this dynamic system is the outcome of
evolutionary selection. Accordingly, explanations of the origin of language can be
judged whether or not they are able to predict it. The “gesture-first” hypothesis of
recent fame does not. I offer a different explanation, called Mead’s Loop, that lays
the neural groundwork for language and gesture to be integrated as we observe it.

2. Window on the mind


The essential claim in the “window onto the mind” article is that language includes ges-
ture; it is not just that it would be interesting to include gesture – it is inseparable from
language in fact and is not an “add-on” (Kendon 2008b). Taking gesture into account,
we see that language is a dynamic system for combining imagery and encoded categor-
ial content during real-time communication. The minimal unit of this dynamic process is
termed the growth point (GP). So any model of how language began would need to
explain how this system of united semiotic oppositions – global and synthetic imagery
and analytic and combinatoric language combined to make a single idea unit – evolved.
To leave it out is to misconstrue what evolution produced. This idea is my main meth-
odological approach – to test a theory of language origin for whether (or not) it explains
the dual semiotic system of imagery and conventional code. A popular current theory,
appearing over and over in a veritable avalanche of recent books, is what I dub “ges-
ture-first”. Despite this name, the primatologist, neuroscientist, developmental psychol-
ogist, anthropologist, sign-language linguist, regular linguist, computer scientist, etc.
proponents of gesture-first seemingly have not carefully analyzed gestures with speech
but rely instead (it appears) on folk culture portrayals. So, they do not recognize a key
point: that language is misconstrued if it is not seen as a duality of semiotic opposites.
In this article, we take up gesture-first and an alternative to be explained called
Mead’s Loop. Gesture-first states that the initial form of language was unspoken – it
was a gesture or a sign language. I show that gesture-first – to put it delicately – is
unlikely to be true, because it is unable to capture the connections of speech and ges-
ture that we, the living counter-examples, display: it “predicts” what did not evolve (that
gesture withered when speech arose) and does not predict what did evolve – that ges-
ture is an integral part of speaking. A theory that says what didn’t happen did, and what
did happen didn’t, cannot be generally true, to say the least. That so many have adopted
it I explain as due to the above-mentioned folk (and fabricated) beliefs about gestures.
The alternative is “Mead’s Loop”, after George Herbert Mead (1974), which does
account for the integrated system of speech-gesture. It also provides a framework for
a remarkably wide range of other phenomena that stem from it, as would be expected
from a founding event. According to Mead’s Loop, speech and gesture were naturally
selected jointly. The dual semiosis is intrinsic to the origin of language. Without gesture, speech would not have evolved, and without speech, gesture would not have evolved either – contrary to both "speech-first" and "gesture-first". Mead's Loop creates a "thought-
language-hand link” in the area of the brain that orchestrates complex actions – Broca’s
Area according to much evidence. In completing Mead’s Loop, mirror neurons under-
went a further evolution from the primate norm, a recalibration or “twist” in which they
came to respond, not only to the actions of others but to one’s own gestures, as if from
another. Thus they became capable of using the imagery in gesture to orchestrate vocal
and manual actions. When one makes a gesture, Mead’s Loop mirrors it and brings its
significance into the process of vocal and manual action orchestration, tying together
thought, speech and gesture, and moving the orchestration of actions in Broca’s Area
away from their original vegetative (vocal tract) and manipulative (manual) signifi-
cances. This explains how speech and gesture actions can be orchestrated jointly by sig-
nificances other than those of the actions themselves. And it shows how imagery and
linguistic encoding were united from the beginning. At the same time, Mead’s Loop im-
parted an apparent social reference of gestures – enabling the actions they orchestrate
to mesh with the emerging socio-cultural system of langue, the “social fact” of the lan-
guage system. Many scenarios can be imagined in which Mead’s Loop would have been
selected, but the one that seems most important and likely to have undergone natural
selection is adult-infant instructional interaction, with adults (mothers) the locus of the
natural selection. In this situation, Mead’s Loop gives the adult a sense of being an
instructor as opposed to being just a doer with an onlooker the chimpanzee way. As
many have observed, entire cultural practices of childrearing depend upon this sense.
3. “Gesture-first”
The popularity of this theory of the origin of language was sparked in part by a genuine
discovery, the discovery of mirror neurons – although it far antedates this discovery,
going back to at least the 18th C. (see Harris and Taylor 1989). It says the first step
of language was not speech, nor speech with gesture, but was gesture alone. Gestures
in this theory are regarded as mimicry of real world actions, hence the connection to
mirror neurons. In some versions, the gestures are said to have been structured as a
sign language. This view has inspired a surprising range of enthusiasts (see for instance Arbib 2005; Armstrong and Wilcox 2007; Corballis 2002; Rizzolatti and Arbib 1998; Tomasello 2008; Volterra, Caselli, Capirci, et al. 2005) and has taken on something like the status of a default assumption about the place of gesture in the origin of language – and it has attracted a few Doubting Thomases. We (the co-authors of McNeill, Duncan, Cole, et al. 2008) are among the latter, along with Woll (2009), Dessalles (2008) and possibly a few others.
Gesture-first is correct in asserting that language could not have come into existence
without gesture: on this point, we agree. The error lies in the explanation for why that
was so; that gesture had to precede speech, hence “gesture first”, but, we say, if this did
take place (and we do not deny it) it could not have led to human language. Instead, it
would have landed at a different point on the Gesture Continuum, pantomime. The
Gesture Continuum (formerly “Kendon’s Continuum”, renamed at Kendon’s request):

Gesticulation → Language-slotted → Pantomime → Emblems → Sign language

As one goes from gesticulation to sign language the relationship of gesture to speech
changes:

– Speech presence declines.
– Language-like properties increase.
– Socially regulated signs replace spontaneously generated form-meaning pairs.

Language-slotted gestures look like gesticulations but differ in the manner of integrating
with speech. They enter a grammatical slot, semiotically merge with speech, and acquire
syntagmatic values from it; gesticulation in contrast is gesture co-produced obligatorily
with speech but semiotically opposed to it. Pantomime is gesture without speech, often
in sequences and usually comprised of simulated actions. Sign languages are full linguis-
tic codes with their own static dimension. This terminology follows for the most part
Kendon (1980).
There is some evidence described later in children’s early ontogenesis that such a
phase evolved, but if so it was able to support at most a semi-language. A separate
and different evolution, Mead’s Loop, created the dynamic dimension that we now see.

3.1. The problem with it


Even if mirror neurons were a factor in the origin of language, our basic claim is that a
primitive phase in which communication was by gesture or sign alone, if it existed, could
not have evolved into the kind of speech-gesture combinations that we observe in
ourselves today. We don’t deny that such a phase could have existed, but do claim that it
could not have been a true protolanguage; it has difficulty explaining how gestures combine with speech to comprise language, as we observe it. Assuming gesture-first, a transition to later speech would not have led to speech-gesture units synchronized at points where they are co-expressive of newsworthy content, but to mimicry/pantomime or perhaps a sign language – with no speech-gesture units at all. We ourselves are living contradictions of gesture-first.
We say that “gesture-first” incorrectly predicts that speech would have supplanted
gesture, and fails to predict that speech and gesture became a single system. It thus
meets, along with the growth point, the requirement of falsifiability and (unlike the
growth point) is falsified – twice in fact.

3.1.1. Did gesture scaffold speech, then speech supplant it?


The problem created by gesture-first is that, when speech emerges, an intermediate
“cross-over” phase must be posited in which speech supplanted the original gesture lan-
guage. Henry Sweet (and presumably Henry Higgins) believed that speech emerged
when mouth parts mimicked manual gestures, which then disappeared:

Gesture (…) helped to develop the power of forming sounds while at the same time help-
ing to lay the foundation of language proper. When men first expressed the idea of “teeth”,
“eat”, “bite”, it was by pointing to their teeth. If the interlocutor’s back was turned, a cry
for attention was necessary which would naturally assume the form of the clearest and most
open vowel. A sympathetic lingual gesture would then accompany the hand gesture which
later would be dropped as superfluous so that ADA or more emphatically ATA would
mean “teeth” or “tooth” and “bite” or “eat”, these different meanings being only gradually
differentiated. (Sweet 1888: 51, emphasis added)

(Thanks to Bencie Woll, pers. comm., for bringing this passage to my attention).
Rizzolatti and Arbib (1998) also see gesture as fading once speech has emerged:
“Manual gestures progressively lost their importance, whereas, by contrast, vocalization
acquired autonomy, until the relation between gestural and vocal communication
inverted and gesture became purely an accessory factor to sound communication” (Riz-
zolatti and Arbib 1998: 193). More recently, Stefanini, Caselli, and Volterra (2007: 219,
referring to Gentilucci and Dalla Volta 2007 and others): “the primitive mechanism that
might have been used to transfer a primitive arm gesture communicative system from
the arm to the mouth…” which again implies supplantation.
Most recently of all, Armstrong and Wilcox (2007), in The Gestural Origin of Lan-
guage, imply supplantation, as it were, by silence. They do not, as far as I have been able
to tell, consider how speech came into being at all after sign language (as they identify
gesture-first); but skipping it does not skirt the mystery of how speech supplanted ges-
ture and still ended up being integrated with it. And Tomasello (2008), thinking in terms
of primate and very young (one-year and less) human infant gestures, likewise puts
forth “gesture-first” with supplantation (referring to ontogenesis, but with the sugges-
tion that something similar took place in phylogenesis): “Infants’ iconic gestures
emerge on the heels of their first pointing … they are quickly replaced by conventional
language … because both iconic gestures and linguistic conventions represent symbolic
ways of indicating referents” (Tomasello 2008: 323). It is not that the transitions he
mentions do not occur, but that they are insufficient to lead to the incorporation of
gesture as we see it in current-day human speech.

3.1.2. Pantomime?
Michael Arbib (2005) advocates an origin role for pantomime. Pantomime is considered to follow directly from mirror neuron action. In Arbib's proposal the initial communicative actions were symbolic replications of actions of self, others and entities,
and these pantomimes could later “scaffold” speech. Merlin Donald (1991) likewise
posited mimesis as an early stage in the evolution of human intelligence. There may
indeed have been pantomimes without vocalizations used for communication at the
dawn, in which case pantomime could have had its own evolution, landing at a different
place on the Gesture Continuum, as seen today in its own temporal relationship to
speech: substituting for rather than synchronizing with it.
The distinguishing mark of pantomime compared to gesticulation, the point on the
Gesture Continuum where speech and gesture combine, is that the latter, but not the
former, is integrated with speech. Gesticulation is an aspect of speaking. The speaker
constructs a gesture-speech combination. In pantomime this does not occur. There is
no co-construction with speech, no co-expressiveness, timing is different (if there is
speech at all), and no duality of semiotic modes. Pantomime, if it relates to speaking
at all, does so, as Susan Duncan (pers. comm.) points out, as a “gap filler” – appearing
where speech does not. Movement by itself offers no clue to whether a gesture is ges-
ticulation or pantomime; what matters is whether or not two modes of semiosis com-
bine to co-express one idea unit simultaneously. It is conceivable that pantomime is
something an ape-like brain would be capable of, and was a proto-system already in
place in the last common chimp(say)-human ancestor, some 8 million years back
(while it could have been a step on the way, it would not identify the events that led
to language specifically).

3.1.3. Why does speech have to supplant gesture in gesture-first?


Michael Arbib, in his 2005 Behavioral and Brain Sciences article, argues that the original gesture code "scaffolded" speech. This may seem a step away from supplantation, but gesture still either withers or retreats into the background; in either case it is not part of speaking itself.
The reason why supplantation, overt or hidden, is inescapable in gesture-first is
found in the very core assertions of the theory. Gesture, according to gesture-first, is
a stand-alone code, unintegrated with speech; speech when it appears is another such
code; and two unintegrated codes do not integrate. The whole logic is to picture one
code coming after another, never to create speech and gesture as a single integrated sys-
tem of semiotic opposites. For this opposition, gesture or speech must not be a code, or
must be a code of a different kind. In current-day humans, gestures are semiotic oppo-
sites of encoded speech – global/synthetic vs. segmented/combinatoric, etc. It is not a
saving grace to point to gestures in nonlinguistic primates, and say these are the gestures
that preceded language, for of course they are not – they are gestures that do not appear
with language, and there is nothing in them to show how they could lead to it without
encountering the same obstacle of supplantation that we have been detailing.
Corballis (2002), in an argument for speech supplanting a gesture-first language, points out the advantages of speech over gesture in a system of communication.
There is the ability to communicate while manipulating objects and to communicate
in the dark. Less obviously, speech reduces demands on attention, he argues, since in-
terlocutors do not have to look at one another (Corballis 2002: 191). However, these
qualities are irrelevant for gesture-first. This is because there are also positive reasons
for gestures not being language-like; speech is the default medium for linguistic encod-
ing. All across the world, languages are spoken/auditory unless there is some interfer-
ence to this channel (deafness, acoustic incompatibility, religious practices, etc.); none
are primarily visual/gestural. Susan Goldin-Meadow, Jenny Singleton and I (1996)
once proposed that gesture is non-linguistic because it is better than speech for imagery:
gesture has multiple dimensions on which to vary, while speech has only the dimension
of time:

We speculate that having segmented structure in the oral modality as we currently do
leaves the manual modality free to co-occur with speech and to capture the mimetic
aspects of communication along with speech. Thus, our current arrangement allows us to
retain along with a segmented representation, and in a single stream of communication,
the imagistic aspects of the mimetic that are so vital to communication (Goldin-Meadow,
McNeill, and Singleton 1996: 52).

These Notes argue, as is plain by now, that imagery is not just “vital” but is language in
part, no less so than the segmented verbal atoms with which it combines to form growth
points.
Given this asymmetry, even though speech and gesture were jointly selected, it
would still work out that speech is the medium of linguistic segmentation. As a result,
natural selection for gesture-speech combinations would still have had speech handling
the linguistic semiotic. Sign languages – their existence as full linguistic systems –
impress many as a justification for positing gesture-first, but in fact, historically and
over the world, manual languages are found only when speech is unavailable; the
discrete semiotic then shifts to the hands. So it is not that gesture is incapable of carrying
a linguistic semiotic, it is that speech (to us visually disposed creatures) does not carry
the imagery semiotic.

3.2. Models of supplanting and scaffolding


To look at what happens when two codes co-occur and why it is inappropriate to say
that a duality of semiotic modes exists in this situation, we have two models: Aboriginal
signs performed with speech, and bilingual American Sign Language signs and speech;
in neither case is there the formation of packages of semiotic opposites. When a pairing
of semiotically equivalent gesture and speech is examined in these two models, the
two actively avoid speech-gesture combinations at co-expressive points. They either
repel each other in time or functionality or both, and do not coincide at points of
co-expressivity. This could be because, when speech and gesture are both encoded, each
is an action-orchestration scheme of its own, and we non-robots are unable to orchestrate
more than one encoded stream at a time. The growth point meets this one-action
restriction, since it combines gesture and speech into a single action, the gesture being
the orchestration framework for speech.
32. The co-evolution of gesture and speech, and downstream consequences 487

3.2.1. Warlpiri sign language


Warlpiri sign language is used by women when they are under speech bans but also
sometimes along with speech even though it is normally a speech substitute. This latter
usage lets us see what may have occurred at the hypothetical gesture or sign-speech
crossover. Here is one example from Kendon (1988):

Signs: grab-stroke of female-stroke husband-stroke of niece-stroke

Speech: grab hold of her yourself husband of my niece

The spacing is meant to show relative durations, not that signs were performed with
temporal gaps. What happens is that at the beginning of each phrase speech and
sign start out together but immediately fall out of synchrony, and then reset (there
is one reset in the example). The process of separation then occurs again. So, accord-
ing to this model, speech-gesture synchrony would be systematically interrupted at the
crossover. Yet synchrony of co-expressive speech and gesture evolved.

3.2.2. English-American Sign Language bilinguals


A second model is Emmorey, Borinstein, and Thompson’s (2005) observations of the
frequent pairings of signs and speech by hearing American Sign Language/English bi-
linguals. While 94% of such pairings are signs and words translating each other, 6% are
not mutual translations. In the latter, sign and speech collaborate to form sentences,
half in speech, half in sign. For example, a bilingual says, “all of a sudden [LOOKS-
AT-ME]” (from a Sylvester and Tweety cartoon narration; capitals signify signs simul-
taneous with speech). This could be “scaffolding” but notice that it does not create the
combination of unlike semiosis that we are looking for. Signs and words are of the same
semiotic type – segmented, analytic, repeatable, listable, and so on. There is no global-
synthetic component, and no built-in merging of analytic/combinatoric forms with
global synthesis. Of course, American Sign Language/English bilinguals have the
ability to form growth point-style cognitive units. But if we imagine a transitional species
evolving this ability, the Emmorey, Borinstein, and Thompson (2005) model suggests
that scaffolding did not lead to growth point-style cognition; on the contrary, it implies
two analytic/combinatoric codes dividing the work. If we surmise that an old panto-
mime/sign system did scaffold speech and then withered away, this leaves us unable
to explain how gesticulation, with the special cognitive process we have described,
emerged and became engaged with speech. We conclude that scaffolding, even if it
occurred, would not have led to current-day speech-gesticulation linkages.

3.3. Absorbing pantomime and pointing


It is possible that two kinds of gesture – pantomime and possibly pointing – evolved
apart from language. This may seem another form of gesture-first but for the reasons
just recounted they could not have been precursors of our current-day language-gesture
system. And there are ontogenetic reasons to suggest that pointing and character view-
point gestures, in the end, are effects of language, not steps toward it. What is crucial is
that there be developed the semiotic contrasts of the global-synthetic to the analytic-
combinatoric modes that can fuel thought and speech, and pantomime and pointing
possibly could not do this. However, once language with gesture was established, pan-
tomime and pointing could be absorbed by it, pantomime becoming integrated as “char-
acter viewpoint” or CVPT gestures, pointing as deixis, both of the concrete and the
“abstract” metaphoric kinds. In ontogenesis, pointing and pantomime precede, speak-
ing follows and absorbs. Both emerge well in advance of encoded speech and, for a
time, alternate with it, patterns that exhibit the initiation and then absorption of
these separate gesture modes. The integration of pantomime with language to make
CVPT seems to occur not before age 3 and possibly even later, and this is also the
earliest age at which there is any evidence of growth points. Pointing occurs throughout
this development but is so imprecisely timed with speech, even with adults, that it is
hard to say whether it does or does not participate in any sort of growth point process.
Abstract or metaphoric pointing to help organize discourse is definitely absent from
pre-growth point children. So there are various reasons to regard pantomime and point-
ing as products of their own evolutions which are, now, absorbed by a separately
evolved dynamic linguistic process.

3.4. Final word on gesture-first


The route from gesture-first, in which one code takes over from another, does not lead
to current humans. Gesture-first is admirable in that it can be falsified, à la Popper, and
it is false – it predicts what did not evolve (a language of pantomime or a sign language)
and does not predict what did evolve (our uniquely human language-gesture integra-
tion). It is possible that there was a gesture-first phase of human evolution. Although
it could not lead to language-gesture integration, it has left a residue in current-day hu-
mans that appears at certain points in children’s language development, which at these
points shows the very non-integration that makes gesture-first unlikely (putting it
mildly) or impossible as a precursor to language itself.
Whether you are persuaded by this overall argument depends, ultimately, on taking
seriously the idea that gesture and speech comprise a single multimodal system, that ges-
ture is not an accompaniment, ornament, or supplement to speech, but is actually part of
speaking. The growth point hypothesis is designed to articulate this unified speech-
gesture system as the minimal unit of the imagery-language dialectic. When we look
at models of speech-gesture cross-overs of the kind that, in theory, gesture-first would
have encountered when speech began to supplant the original gesture language, we do
not find conditions for the growth point dialectic, but instead non-co-expressiveness or
mutual exclusion of speech and gesture.
Joining the damage is Woll’s (2009) argument that not only does gesture-first leave
gestures unable to integrate with speech but it also blocks, within speech itself, the
arbitrary pairing of signifiers with signifieds that is characteristic of (or, for Saussure
1916, definitional of) a linguistic code.
The upshot is that gesture-first, for all its appeal, has little to do with the origin
of spoken language; it explains, at best, the evolution of pantomime, a different point
altogether on the Gesture Continuum.

4. Mead’s Loop
We now consider Mead’s Loop, an explanation of how the imagery-language dialectic
and growth point could plausibly have been naturally selected at the origin. Mead’s
Loop is a hypothesis concerning mirror neurons and how they were altered. According
to Mead’s Loop what had to evolve is a new ability for mirror neurons to self-respond
to one’s own gestures. The idea is inspired by George Herbert Mead who proposed that
the significance of a gesture depends on the gesture arousing in oneself the same
response as it does in others (Mead 1974: 47). Mead wrote this much earlier, probably
in the 1920s. I first saw this description in a paper-bound typescript by Mead (with pen-
ciled annotations) on the shelves of the University of Chicago library, under the title
Philosophy of Gesture. The pamphlet has since vanished, presumably sequestered in
some special collection. Mead’s Loop treats one’s own gesture as a social stimulus,
and thus explains why gestures occur preferentially in a social context of some kind
(face-to-face, on the phone, but not speaking to a tape recorder, Cohen 1977). A social
reference is intrinsic to mirror neurons. The Mead’s Loop “twist” is that one’s own ges-
tures are treated as having the same social reference. In so doing, it opens the mirror
neuron response to the range of significances that gesture is able to carry and brings
an inherent “Other” orientation to gesture – both essential properties of what evolved.
Without Mead’s Loop, mirror neurons respond to the actions of others with the sig-
nificances they have as actions: with it, they take on a range of different meanings, the
meanings of gesture. So, according to Mead’s Loop, part of human evolution was that
mirror neurons came to respond to one’s own gestures, including their imagery, as if
from another person.
To form Mead’s Loop, speech and gesture had to evolve together. Gesture-first and
speech-first (another alternative) are equally inadequate to the job. At the motor level,
Mead’s Loop provides a way for significant imagery – carried by gesture – to orches-
trate speech motor control. Gestures in the early hominid line would, with Mead’s
Loop, co-opt the vocal action circuits. With Mead’s Loop, gesture gains the property of
chunking: a chunk of linguistic output organized around significant imagery, rather
than, as with unmodified mirror neurons (shared by all primates), the significance of
instrumental actions qua action.

4.1. A more immediate step


Rizzolatti and Arbib (1998) linked mirror neurons to case grammar – a high-level gram-
mar of semantic connections, such as action-on-object. However, this linkage bypasses
more immediate steps, which (in our view) were more likely in any case to have been
the objects of natural selection. Moreover, their proposal risks the “movie-in-reverse”
fallacy that also affects evolutionary psychology, by starting from a current-day linguis-
tic function and seeking an origin of this very function in a hypothetical evolution that
selected it.
What would a more immediate step have been? Adam Kendon describes the link in
the following terms,

we still need to add something that allows us to understand how the actions of the mouth
and associated vocalisation came to be available, as it were, so that they could be recruited
into the referential gesture function. For this to be possible an elaborate and voluntary
control of the vocal system must already have been in place. In other words, a scenario
for the evolution of the human speech apparatus and its neuro-motor control systems is
also needed (Kendon 2009 ms., p. 29)

Mead’s Loop provides the connection. Through Mead’s Loop, gesture imagery enters
the part of the brain, Broca’s area, where complex actions are orchestrated, and pro-
vides a new basis of action, imagery, and does so in such a way that the neuro-motor
control system that Kendon mentions is a product rather than a precondition for the
recruitment of vocal tract orchestration by gesture function.
Mead’s Loop is a necessary but far from sufficient step toward language at the
dawn – the evolution of a brain link between thought, language and hand that enables
the orchestration of movements by significances other than those of the actions them-
selves. Other steps could have arisen after the Mead’s Loop step, and in fact would
require it. Some of these steps may have acquired biological adaptations, real genetic
changes; a few possibilities are suggested later.
Four steps comprise our Mead’s Loop hypothesis:

(i) Primates in general have mirror neurons but, by themselves, they mirror only the
intentional actions of others.
(ii) Mead’s Loop brings gestures into the same areas that have mirror neurons.
(iii) The gestures bring their own significances. They also have the quality of referring
to the Other – with them comes an intrinsic social presence.
(iv) These significances (including social references) then become the orchestrating
schemas of the once vegetative movements, making them into speech movements.

It is meant to explain:

(i) The synchronization of gesture with vocalization on the basis of shared meaning
(= the only way they synchronize)
(ii) The co-opting by meanings of brain circuits that orchestrate sequential actions and
(iii) The occurrence of gestures preferentially in a social context of some kind (face-to-
face, on the phone, but not speaking to a tape recorder – two modern challenges to
Mead’s Loop).

That Mead’s Loop treats imagery as a social stimulus explains point (iii). That mirror
neurons complete Mead’s Loop in a part of the brain where action sequences are
organized means that the two kinds of sequential actions, speech and gesture, converge
and that meaning in imagery is the integrating component; this explains points (i) and
(ii). Hence, gesture and speech necessarily co-evolved in the Mead’s Loop hypothesis,
and this is the chief incompatibility with the gesture-first theory.

4.1.1. Compared to theory of mind


Mead’s Loop is not theory of mind. Mead’s Loop adaptation brings self-awareness of
one’s own behavior as social, not a theory of the cognitions and intentions of another
(theory of mind could have evolved, possibly earlier, from straight mirror neurons
with their general primate functionality of recognizing the intentional actions of
others).

4.1.2. Compared to imitation


Nor is Mead’s Loop imitation. A report in Science (2008) summarizes a controversy
among primatologists over whether chimps learn through imitation (with the consensus
being that they do), but Mead’s Loop is not about this. In a learning situation, imitation
and Mead’s Loop can function collaboratively but each has a different role; the Loop
fortifies the “instructor” with self-awareness. Imitation provides a response mechanism
by the learner. Thus Loop and imitation can work together but are not the same.

5. Natural selection of Mead’s Loop


5.1. Speech and gesture selected together
Speech and gesture would have evolved jointly to establish Mead’s Loop: this, in
quotes, is a “prediction” of events in the remote past (see Volterra, Caselli, Capirci,
et al. 2005).

Fig. 32.1: Image thanks to William Hopkins.

5.2. The kinds of gestures Mead’s Loop could have co-opted


Mead’s Loop assumes a creature performing significant gestures of the kind seen now in
the great apes, chimps and bonobos in particular. The plausibility of the Mead’s Loop
hypothesis is bolstered if gestures like those could have been precursors that had the
potential to be responded to by the one making them as if from another. Such gestures
could have been present in the Mead’s Loop creature as well.
One precursor is suggested by the discovery that chimpanzees show a right-hand
dominance for gestures only when movements co-occur with vocalization (Hopkins
and Cantero 2003). Barring independent evolution, such combinations could also
have existed with the last common human-chimp ancestor and would have provided
raw material for co-opting the motor area by imagery – or raw material from even fur-
ther back: Fogassi and Ferrari (2004) have identified neural mechanisms in monkeys for
associating gestures and meaningful sounds, which they suggest could be a pre-adaptation
for articulated speech.
It is the case that chimps produce gestures and adapt new gestures for communica-
tive purposes (see Call and Tomasello 2007). Mead’s Loop also could use these kinds of
manual signals as raw material. What made us human according to this model crucially
depended at one point on such gestures, which Mead’s Loop altered fundamentally. Ob-
servations by Amy Pollick (see for instance Pollick and de Waal 2007) of bonobos at the
San Diego Zoo and elsewhere reveal a use of pantomimic/iconic gestures to induce
movements by other bonobos (but not, so far as I know, ever humans). The gesture re-
plicates the motion the performer desires from the other bonobo. She does not replicate
the entire movement with all her limbs, but symbolizes it in a hand movement, forming
an iconic gesture.
The significance of the gestures in the illustrations was clear to both producer and
recipient, and was to get the second chimp to move. The gestures were totally inde-
pendent of vocalization (for there was none) but when we imagine such gestures in
the Mead’s Loop creature, vocalization would also be present (like the Hopkins
chimp).

Fig. 32.2: Four panels: start of swing gesture; end of swing gesture; start of second
swing (other bonobo moves); end of second swing (other moving). Images thanks to
Amy Pollick.



So this kind of gesture could have been raw material for the mirror neuron “twist”. That
it has not happened (at least, not yet) with the bonobo does not mean it couldn’t have
been available as raw material for the creature in which we are presuming language to
have originated.

5.3. When and how the GP evolved


The phrase “the dawn of language” suggests that language burst forth at some definite
point, say 150–200 thousand years ago, when the prefrontal expansion of the human
brain was completed (Deacon 1997). But the process has elements that may have
begun long before – 5 million years ago for bipedalism, on which things gestural
depend – although I am told by Erica Cartmill (pers. comm.) that gorillas, the least bi-
pedal of the great apes, use manual gestures, even bimanual ones; so bipedalism per-
haps is not as crucial for gesture as one would suppose; nonetheless, it seems
necessary for continuous gesturing. I date to 2 million years ago the expansion of the
forebrain and the selection of self-responsiveness of mirror neurons, with the resulting
reconfiguring of areas 44/45, based on Wrangham (2001) and Deacon (1997), who
date humanlike family life to then. I imagine that this form of living was itself the
product of changes in reproduction patterns, female fertility cycles, child rearing, neoteny, all
of which must have been emerging over a long period before. So what this says is that
language as we know it emerged over 1.8 million years and that not much has changed
since the 150K–200K landmark of reconfiguring Broca’s Area within the mirror
neurons/Mead circuit (although this date could overlook continuing evolution: there are
hints that the brain has changed since the dawn of agriculture and urban living; Lahn
group: see Evans, Gilbert, Mekel-Bobrov, et al. 2005).
The Mead’s Loop model doesn’t say what might have been the precursor to language
before 2 million years ago – Lucy and all. It would have been something an apelike
brain is capable of. There are many speculations about this – Kendon (1991), for exam-
ple, proposed that signs emerged out of ritualized incipient actions. Natural gesture sig-
nals in modern apes have an incipient action quality as well, the characteristic of which
is that an action is cut short and the action-stub becomes a signifier: a kind of
metonymy. The slow-to-emerge precursor from 5 million years ago to 2 million years ago
may have built up a gesture language that derived from instrumental actions as envi-
sioned in gesture-first. It would have been an evolution track leading to pantomime.
But the human brain evolved further to create a system in which gesture fuses with vo-
calization; this was a fundamental shift. This further evolution (post 2 million years ago)
changed the character of gesture and vocalization to become what we see today, and
is – we suppose – the basis of language in its current form.

5.4. Scenarios
Natural selection for Mead’s Loop could arise whenever sensing oneself as a social
object is advantageous – as for example when imparting information to infants,
where it gives the adult the sense of being an instructor as opposed to being just a
doer with an onlooker (the chimpanzee way). Entire cultural practices of childrearing
depend upon this sense (Tomasello 2008). Obviously, there would have been a range
of such instruction scenarios: passing on skills, planning courses of action, cultural
inculcations of all sorts, including “Machiavellian” – or “Macachiavellian”, as Dario
Maestripieri (2007) terms it – scheming. The mother-infant scenario links to MacNeilage
and Davis’ (2005) proposal that “the first words might have been parental terms
formed in the Baby Talk context of parent-infant interaction” (MacNeilage and
Davis 2005: 183). Some self-awareness as an agent is necessary for any advantage of
Mead’s Loop to take hold. The adult must be sensitive to its own gestures as social ac-
tions. A role of self-aware agency also appears ontogenetically. The mother-infant sce-
nario has traceable implications for current-day language development by children. A
major step described below is the development by the child of the self-awareness of
acting as an agent (see Hurley 1998), something that could reflect an origin scenario to
which Mead’s Loop selection could have connected.
Erica Cartmill (pers. comm.) observes in orang-utans what could have been a self-
aware agency precursor if it also had occurred in the Mead’s Loop creature – mothers
visually monitoring their infants when the infants are performing the same actions as
she does; similarly, Washoe shaped signs in the hands of her adopted son – which also
could give the evolution of Mead’s Loop something from which to launch. On the
other hand, mothers in these species may not monitor the infant’s response to her ac-
tions, which is the innovation Mead’s Loop would favor directly. Such precursors could
have existed in the creature undergoing Mead’s Loop selection as well.
All of this suggests a scenario of organized family life by bipedal creatures inducing
changes in brain configuration and function. An important selection pressure for Mead’s
Loop would be the family circle (not only “man the hunter” or “man the tool maker”) –
this setting offering a role for Mead’s Loop where one’s own meaningful movements
are treated as a social stimulus, especially as an engine of childrearing and infant care.
Sensing oneself as a social object would be naturally selected if it improved the
inculcation of children, who would then cope better. They would have the advantages this
provides, carry the genetic disposition to Mead’s Loop, and in turn pass it on. The focus of
the selection pressure would be mothers; their infants, male and female, would benefit
from better cultural inculcation. Sarah Hrdy (2009) has highlighted a related enriching
aspect of early human family life, the ability to engage in collective infant rearing. Such
a practice stands in sharp contrast to chimpanzee infant-rearing, where infants, if left
without their mothers, are neglected and are even vulnerable to attacks by adults in
the same group. Group rearing of infants, including the infants of other parents, clearly
demands (and hence, naturally selects) responding to one’s own movements as social
objects in Mead’s Loop. The selection of vocalizations plus gestures with which to cre-
ate growth points could co-opt any vocalization-gesture preadaptations that may have
existed and push them into areas 44 and 45 to fuse into meaning-orchestrated
movements. This evolution cannot be tied to a single step. It took place through many
converging steps over a long period. Had a crucial step been missing, family life in
particular, it would never have occurred at all.

5.5. Time line


A proposed time line is as follows.

(i) To pick a date, the evolution of a thought-language-hand link started 5 million
years ago with the emergence of habitual bipedalism in Australopithecus. This
freed the hands for manipulative work and gesture, but it would have been only the
beginning. Even earlier there were presumably preadaptations such as an ability to
combine vocal and manual gestures, and the sorts of iconic/pantomimic gestures
we saw in bonobos, but not yet an ability to orchestrate movements of the vocal
articulators by gestures.
(ii) The period from 5 to 2 million years ago – Lucy et al. and the long reign of
Australopithecus – would have seen the emergence of precursors of language,
something an apelike brain would be capable of, such as the kind of protolanguage
Bickerton attributes to apes, very young children and aphasics (Bickerton 1990);
also, ritualized incipient actions become signs at this stage as described by Kendon
(1991).
(iii) Starting about 2 million years ago with the advent of Homo habilis and later Homo
erectus, there commenced the crucial selection of self-responsive mirror neurons
and the reconfiguring of areas 44 and 45, with a growing co-opting of actions by
language, this emergence being grounded in the appearance of a humanlike family
life with the host of other factors shaping this major change (including cultural in-
novations like the domestication of fire and cooking). Recent archeological find-
ings strongly suggest that hominids had control of fire, had hearths, and cooked
800 thousand years ago (Goren-Inbar, Alperson, Kislev, et al. 2004). Another cru-
cial factor would have been the physical immaturity of human infants at birth and
the resulting prolonged period of dependency giving time, for example, for cultural
exposure and growth points to emerge at leisure, an essential delay pegged to the
emergence of self-aware agency. Thus, the family as the locus for evolving the
thought-language-hand link seems plausible.
(iv) Along with this sociocultural revolution was the expansion of the forebrain from
2 million years ago, described by Deacon (1997), and a complete reconfiguring
of areas 44 and 45, including Mead’s Loop, into what we now call Broca’s area.
This development was an exclusively hominid phenomenon and was completed
with Homo sapiens about 200–100 thousand years ago (if it is not continuing;
see Donald 1991).

Considering the timeline, protolanguage and then language itself seem to have
emerged over five million years (far, therefore, from a big bang). Meaning-controlled
manual and vocal gestures, synthesized, as we currently know them, under meanings
as growth points, emerged over the last two million years. The entire process may
have been completed not more than 100 thousand years ago, a mere 5,000 human
generations, if it is not continuing.

6. The brain that evolved


In the brain model that current-day gesture and speech integration implies, the neuro-
gesture system involves both the right and left sides of the brain in a choreographed
operation with the following parts: The left posterior temporal speech region of Wer-
nicke’s area supplies categorial content, not only for comprehension but for the creative
production of verbal thought; this content becomes available to the right hemisphere,
which seems particularly adept at creating imagery to capture discourse content (see
McNeill this volume and Feyereisen volume 2). The right hemisphere could also play
a role in the creation of growth points. This is plausible, since growth points depend on
the differentiation of newsworthy content from context and require the simultaneous
presence of linguistic categorial content and imagery, both of which seem to be avail-
able in the right hemisphere. The frontal cortex also might play a role in constructing
fields of oppositions and psychological predicates, and supply these contrasts to the
right hemisphere, there to be embodied in growth points. The prefrontal cortex devel-
ops slowly in children (Huttenlocher and Dabholkar 1997), and in fact does not reach a
level of synaptic density matching the visual and auditory cortex until about the same
age that self-aware agency appears and with it, per hypothesis, growth points differen-
tiated from fields of oppositions. Everything (right hemisphere, left posterior hemi-
sphere, frontal cortex) converges on the left anterior hemisphere, specifically Broca’s
area, and the circuits specialized there for action orchestration. Broca’s area may
also be the location of two other aspects of the imagery-language dialectic. At least,
such would be convenient – or more than convenient: a study by Ben-Shachar, Hendler,
Kahn, et al. (2003) shows an fMRI response in Broca’s area for grammaticality
judgments. Grammaticality judgments, obviously, depend on linguistic intuitions. The
same study found verb complexity registering in the left posterior superior temporal
sulcus area. In the brain model, the left posterior area provides the categorial content
of growth points – the generation of further meanings via constructions and their
semantic frames, and intuitions of formal completeness to provide dialectic “stop
orders”. All of these areas are “language centers” of the brain in this model.
The brain model also explains why the language centers of the brain have classically
been regarded as limited to two important patches – Wernicke’s and Broca’s areas. If the
model is on the right track, contextual background information must be present to acti-
vate the broad spectrum of brain regions that the model describes. Single words and de-
contextualized stimuli of the kinds employed in most neuropsychological tests would not
engage the full system; they would be limited to activating only a small part of it.

7. Whence syntax?
The origin and natural selection of static dimension features have been a shining light
on the hill to some, a bugbear to others, ever since Lenneberg (1967) posited a biological
capacity for language, and Chomsky (1965) applied the idea to syntax. The emergence
sui generis of morphs (McNeill and Sowa 2011) and syntagmatic values (Gershkoff-
Stowe and Goldin-Meadow 2002) in gestures when speech is denied suggests, at a
broad level, an urge to divide and organize meaningful symbols syntactically. In this sec-
tion, we consider the role that Mead’s Loop could have played in laying a basis for this
push to syntax and what, given this role, it would consist of.

7.1. The bioprogram


The topic brings us face to face with the question of the bioprogram for syntax (to use
Bickerton’s 1990 term), a hypothesis that has been a hotly disputed battlefield of claims
and counter-claims. On one side are “universalists” who seek universal, possibly innate
and evolved language structures. On the other are many who dispute the entire
approach (anthropologists and developmental psychologists seem especially well repre-
sented), seeking diversity rather than universality (see papers in Bowerman and Brown
32. The co-evolution of gesture and speech, and downstream consequences 497

2008 for many excellent examples of this counter-position). Susan Goldin-Meadow
(2003), focusing on features that emerge in children’s language despite severe limita-
tions on input due to deafness, has found an intermediate position that refers to the
“resilient” properties of language in the untutored, spontaneous signs and simple
grammars invented by small deaf children growing up in hearing, nonsigning households.
Cognitive linguistics seemingly rejects the idea of an innate syntax, while seeking
universal roots in cognition, an innovative reaction to much of the formalism and con-
ceptual isolationism of the universalist school. I hope to strike here the much-offered
but rarely achieved “reasonable balance”. No dogmatic statements on either side.
I aim, however, perhaps not so reasonably, to adopt an original position that stems
from what seemingly none of the linguistic warriors has considered, although it is an
idea that has been available for some decades (Browman and Goldstein 1990), namely:
syntax is first and foremost a system of organizing speech and manual-gestural actions.
The gist of this section is to conclude that the bioprogram, as we construct it, originated
in actions, but language altered these actions internally and created, in effect, a new
form of action that exists only within language. There are source actions of basically
two kinds: a) vegetative (food intake and breathing) and b) practical (manual) actions.
Both kinds of action were fundamentally altered from their original forms by Mead’s
Loop so that they could participate in growth points, and in this find some reason to
seek universal patterns.

7.2. Shareability
Mead’s Loop explains one step at the origin of language. It enabled gestures to orches-
trate actions both manual and vocal with significances other than the actions them-
selves. Additionally, Mead’s Loop, where gesture assumes the guise of a social other,
plants in the brain the seed of what Freyd (1983) in her innovative paper called Share-
ability – constraints on information that arise because it must be shared; constraints
because:

It is easier for an individual to agree with another individual about the meaning of a new
“term” (or other shared concept) if that term can be described by: (a) some small set of
the much larger set of dimensions upon which things vary; and (b) some small set of dimen-
sional values (or binary values as on a specific feature dimension). Thus, terms are likely to
be defined by the presence of certain features. (Freyd 1983: 197, italics in original)

Shareability produces discreteness, repeatability, and portability – the linguistic semi-
otic within growth points opposed to the global and synthetic semiotic of imagery. In
her concluding footnote, Freyd speculates that shareability may be relevant to the in-
trapsychic workings of individual minds as well as to the interpsychic relations between
individuals (as shareability primarily asserts). She introduces the concept of communi-
cating intrapsychic modules to envision this, but seems herself uncomfortable with this
move. Mead’s Loop provides a place for shareability (hence discreteness, etc.) without
modularity. The virtual otherness of gesture does it. Thus, given Mead’s Loop and the
reconfiguration of Broca’s area it caused, a selection pressure arises to find a compan-
ion static system with the property of shareability to fulfill the dual semiotic of the
growth point.
498 IV. Contemporary approaches

“Social-fact” or linguistic encodings arise in the act of sharing information, creating
in this process a “discreteness filter” such that the semiotic properties of segmentation
and combination arise automatically. In growth points, such encodings (initially simple)
already interlock with imagery. Sentences, whatever their complexity, stabilize growth
points by adding information because selection for shareability would have led to mor-
phology (discreteness) and syntagmatic values (combinations of morphs), both making
langue-like analytic decompositions of information and semiotic oppositions of imagery
in growth points possible. So in Mead’s Loop, the orchestration provided by gesture
imagery would be paired, from the start, with static structure forms of some kind.

7.3. Syntax as action


In a growth point, every utterance is plurifunctional. This was part of the origin of lan-
guage, with a new form of human action. The functions are apportioned to language
(the combinatoric-analytic semiotic that produces sequences of awareness, see our
Wundt quote in No. 45, section 7) and gesture (the imagery semiotic, which produces
holistic awareness and the overall schema of the speech action); the two functions
are, as Wundt described them, always present – the gesture the action pulse, the linguis-
tic encoding the micro-movements orchestrated within the gesture. We now consider
the linguistic orchestration in this package.
Mead’s Loop sees syntax in terms of action-orchestration, a socio-culturally predict-
able way of moving the vocal apparatus, the hands, posture, etc. so that they can be di-
rected and maintained over time and encounters (see Browman and Goldstein 1990;
MacNeilage and Davis 2005). The discovery of the FOXP2 gene points to the centrality
of action control at the foundation of language. It is not a “language gene” but appears
to code for a transcription factor affecting the expression of possibly many other genes
(see Konopka, Bomar, Winden, et al. 2009). The mutation in the KE family that led to
its discovery affects fine motor control, speech articulation and other actions, as well as
syntax, which, we are saying here, is a culturally maintained action schema. As a gene
affecting fine-tuned action control, it would influence the raw material on which Mead’s
Loop and its new form of action worked, but the Mead’s Loop innovation itself was
probably something else genetically (for an accessible and clarifying discussion of the
FOXP2 discovery, see MacAndrew http://www.evolutionpages.com/FOXP2_language.
htm, accessed 02/13/09).
Syntax then, in this perspective, comprises action patterns that have been established
and carried forward in a given historical/cultural tradition. Considering actions to be the
primary linguistic medium of evolution (perception having its own course), with sound
the external emblem, syntax and other langue patterns can be regarded as the means for
orchestrating the oral-laryngeal-manual-corporal (including whole-body) actions of
speakers. Syntax has other functions, in our lights most crucially, as a system of unpack-
ings usable in different contexts of speaking, including with strangers, without having
each time to construct a system anew. The growth point as action orchestrator explains
what Susan Duncan has called production “pulses,” pulses being foci of effort and, cog-
nitively and verbally, the next-forward steps. There are three pulses (at least), which
align in time – speech, centered on a prosodic peak; gesture, centered on a gesture
stroke; and significance, centered on a point of newsworthy content in context – all
co-expressive of the same idea unit. Gestures were critical for bringing this triple
orchestration under control. The growth point underlies all three pulses; in fact it ex-
plains the three and their time alignment in terms of this one pulse, the growth point
itself.

7.4. Two action theories


It therefore seems likely that human language emerged out of human actions. The
Mead’s Loop hypothesis explains how these actions were changed as a result of lan-
guage origin. I consider here two theories of the actions that could have been modified
this way. Peter MacNeilage in his 2008 book traces the evolution of speech to jaw move-
ments during the ingestion of food. With phonation, this basic maneuver creates conso-
nant-vowel syllables from mandible closings and openings. The structures thus formed
are called frames (manifestations of which occur in infant babbling). Subsequent varia-
tions lead to full syllable repertoires, and at this point are called “frames with content” –
specific tunings of how, to what extent and for how long, etc. the opening/closing
movements occur (the theory thus assumes that ontogenesis recapitulates this aspect
of phylogenesis). Mead’s Loop adds to this explanation a bridge from mandible move-
ments, significant at first as ingestion, to actions orchestrated symbolically. Frame/con-
tent no longer has its basic significance as opening/closing the jaw, but is organized by
significances with no necessary relationship to those actions, and is now orchestrated by
the phonotactic/lexical patterns of a specific language. And Mead’s Loop says that man-
ual gestures were crucial in the merging of jaw movements and meanings, making
speech itself possible.
Careful experiments by Sahin et al. (2009) reveal some of the features of originally
vegetative actions reconstituted as linguistic actions. To describe the experiments
in MacNeilage’s terms, frames add bursts of action foci to mandible openings/closings
as they are integrated into sequences defined linguistically. The approach devised by
Sahin et al. (2009) seems promising for discovering how imagery may influence motor
orchestration. For example, imagery loaded with plurality or perfectivity (the “gram-
matical” pulse in their experiment), could cause grammatical coding, when it is part
of a growth point rather than the unpacking, to occur sooner in their observed sequence
of “distinct neuronal activity for lexical (~200 milliseconds), grammatical (~320 millise-
conds), and phonological (~450 milliseconds) processing, identically for nouns and verbs,
in a region activated in the same patients and task in functional magnetic resonance
imaging” (Sahin, Pinker, Cash, et al. 2009: 445).
We can add further insight from Adam Kendon’s description of gestures as ritualized
actions:

I argue that the manual actions that are coordinated with speech that we see in modern
humans, are derived from forms of practical action. That is, they are “ritualized” versions
of grasping, reaching, holding, manipulating, and so forth. Given their intimate connection
with speaking, I make a speculative leap, and argue that speaking itself is derived from
practical actions. Acts of utterance are to be understood, ultimately, as derived versions
of action system ensembles mobilized to achieve practical consequences. The origin of lan-
guage, that is to say, derives by a process of “symbolic transformation” of forms of action
that were not communicatively specialized to begin with. Speech and associated gestur-
ing does not descend from communicative vocalizations and visible displays but from
manipulatory activities of a practical sort. (Kendon 2008a: introduction)

This action hypothesis also leaves an explanatory gap that Mead’s Loop fills. It cannot
explain speech and gesture as new kinds of action in the human domain – the “process
of ‘symbolic transformation’ ” is unexplained and seems to presuppose such a process.
As with MacNeilage, there is no incompatibility but the growth point and Mead’s Loop
complete the picture. Practical actions cease to be actions of that kind and become, some-
how, actions of a new kind. The human brain evolved a “symbolic transformation” with
which to orchestrate manual and vocal actions by significances other than those of the
actions themselves. The natural selection of Mead’s Loop explains how this happened.
Without Mead’s Loop, the significance of the action remains that of the original action.
It is not clear what “ritualization” as an explanation is meant to cover exactly, but if
it is streamlining the forms of actions and making them smaller, there is this gap that
Mead’s Loop fills. Alternatively, “ritualization” could be the new significance orches-
trating the action, and then it is another name for Mead’s Loop – the “speculative
leap” – explaining the selective mechanism. In any case, once Mead’s Loop was estab-
lished, manual actions entered into dialectics with the co-evolving linguistic encodings
and thus fundamentally changed their character from their instrumental (“practical”)
action originals.

7.5. Scenarios for syntax


Selection pressure for shareability arises when creatures with some communicative
potential encounter others with some other communicative potential, and there is
pressure to coordinate or distinguish action schemes. This invokes shareability, the
analytic decomposition of growth points. It also leads to the creation of form standards
that enable speakers to recognize others as either like or
different from themselves in orchestrating actions. “The Encounter” has different
emphases depending on whether it is inter- or intragroup. The latter, intragroup encoun-
ter, would include adults interacting with each other as well as with their own and other
people’s infants, the situation in which Mead’s Loop is selected. This process has been
studied in the unique situation of an emerging sign language. Here language changes.
The intergroup encounter, on the other hand, could favor the opposite, shibboleths
and other discrimination forms to set the other apart and conserve one’s own way of
acting. Other-discrimination protects and fortifies the cultural action-schemes of a lan-
gue. Intergroup encounters lead to change and to resistance to change, and we can look
at both in currently observable settings.

7.5.1. Nicaraguan sign language


In the late 1970s the first ever school exclusively for the deaf was established in Nica-
ragua. Children who previously had lived in isolated communities in rural areas with no
contact with other deaf persons or sign language came together for the first time. This
intragroup encounter was the setting for a remarkable new language whose emergence
has been documented on video by the efforts of dedicated linguists and psycholinguists –
Judy Kegl (1994), Ann Senghas and Marie Coppola (2001), among others. As the lan-
guage developed over successive generations its character changed and moved ever
closer to patterns that also appear in established sign languages of the world (without,
however, any significant input from those sign languages). The process is a model of the
first stages of language emergence in its unfolding of shareable information. To quote
one of the researchers, Ann Senghas (2003):

[N]ew form-function mappings arise among children who functionally differentiate pre-
viously equivalent forms. The new mappings are then acquired by their age peers (who
are also children), and by subsequent generations of children who learn the language,
but not by adult contemporaries. As a result, language emergence is characterized by a
convergence on form within each age cohort, and a mismatch in form from one age cohort
to the cohort that follows. In this way, each age cohort, in sequence, transforms the lan-
guage environment for the next, enabling each new cohort of learners to develop further
than its predecessors. (Senghas 2003: 511)

The importance of the cohort, and the localization of each successive version of the sign
language within an age peer group, show the role of the encounter in the creation of
languages and language drift generally. The successive changes are systematic, not ran-
dom, and show the importance of shareability – as the language changed over succes-
sive generations it developed the ever greater analysis of form-function mappings
that Senghas describes.

7.5.2. Psycho-Babel: The spread and diversification of languages


In the Biblical story, the Tower of Babel was a construction so arrogantly tall that the
Deity was provoked to scatter mankind into a confusion of languages; before the disas-
ter all people lived harmoniously and spoke one and the same language. If we take the
Biblical story as a parable of migration it is not as far-fetched as one might suppose. The
insight is that migration breeds diversification; the further the migration and the more
encounters it produces, the greater the diversification.
We can explain this on the basis of Mead’s Loop and the growth point. What
happens within individual speakers when languages spread? I mean at the level of
the individual brain and psychology; the spread of Indo-European would have included
this process whatever it is. Modern opinion ties migration to the invention and spread of
agriculture starting about 11,400 years ago, but creatures with presumably a fully mod-
ern microanatomy (if not anything like modern cultures) were already dispersing over
the range eventually inhabited by modern-day Indo-European (IE) speakers and
beyond to Southeast Asia and modern Australia as well as northward, and were subject
to displacements or assimilations by migrating others.
Given the brain model, is there anything we can deduce about what might be a present-
day relic of the first orchestrations that Homo sapiens would have created as people
moved and bumped into each other? Is such a hypothesis even conceivable? I no
doubt too incautiously put forth ideas for the emergence of three pieces:

(i) Mapping meanings onto temporal orders seems primitive – the thought-language-
hand link provides a new way to orchestrate movements, and these movements
take place in time. Thus, a language in which meanings map onto time alone is
one candidate for early linguistic form. More derived are morphological complexes
that dissociate meaning from temporal order. Other discourse and social forces
then are free to orchestrate vocal and manual actions. It is perhaps no mistake
that languages spoken in the least accessible areas, thus taking the longest to
reach, such as the arctic, are elaborate and distant from the presumed mapping
through action orchestrations of meanings onto time.
(ii) Combinations of constructions likewise would be coordinated through temporal
loci – the embedded parts temporally isolated from the embedding parts, which
in turn are continuous in time except for other embeddings. This creates levels
but always holds the temporal sequences together. Thus, a language in which recur-
sion is handled by holding pieces in memory until an embedded piece is complete
is another candidate for an early form. Notice that both the recursion and the basic
mapping mechanisms assume that the creatures employing them have some way to
know when some form is “complete” or “incomplete” – that is, have standards of
form. So something like a system of langue was in place.
(iii) Similarly, the spatial zone of Indo-European is the area through which migrants
would have passed, hence is likely to be covered with relic features. Glacial condi-
tions would have blocked paths to the north but east-west extensions were open.
Shields (1982), citing earlier authorities, proposed that the earliest form of Indo-
European was isolating, like Chinese (also very early), which implies a time-
sequenced base for grammatical meaning – in terms of the hypothesis, orchestrations
without further superstructure.

A simple hypothesis would be that the Indo-European type is one of the least elabo-
rated. From geography, its area would have been the first to be entered, hence it would
have the greatest likelihood of retaining time-based orchestrations. More outlying
areas – the Arctic, the southern ocean, the New World – would have developed more
adorned languages. This makes sense from the dispersion viewpoint, since these remote
regions would have been the last to be reached through migrations; the indigenous
people encountered along the way are long gone, but the impacts of those encounters are
still visible, and long-distance migrants will have absorbed the greatest number of them. Each
encounter creates pressure to stabilize the dynamic of language and thought and this
presumably tends to add embellishments. Perhaps there is some such correlation –
more remote, more departure from temporal sequencing. If this sounds too much
like English to be true, the reasons for the deduction are not English but considerations
of possible brain mechanisms. Are the languages with the greatest time depth in the orig-
inal eastward migration route also ones with the most temporally sequenced relations?
The Sapiran division of languages according to how they combine meanings into single
words – analytic or isolating (e.g., Chinese), synthetic (e.g., Latin), polysynthetic (e.g.,
Inuit) (Sapir 1921: 128), may reflect degrees of adornment of the basic brain orchestration
plans. There are of course all kinds of possible complications of this simple pattern. Mul-
tiple waves of migration and changes from other sources can induce changes on top of
existing layers. Indo-European itself presumably was altered to become inflecting after
the first migrations. Recent DNA analyses (Science news item, issue of 4 Sept. 2009,
p. 1189, vol. 325) suggest that the first farmers in Europe were migrants, replacing
indigenous hunter-gatherers, and modern European populations descended from these
migrants; per the hypothesis, this would have further altered the language plan as well. There must
also be other forces changing languages that might move them in random directions, fur-
ther camouflaging ancestral forms. But the main point is that many hypotheses concern-
ing the dispersal of language – the Tower of Babel story – can be traced to the effects of
migration encounters on the mechanisms of orchestration as described by Mead’s Loop.

7.5.3. Language loyalty – what it is (and why it persists)


A related theme and predictable consequence of language divergence is personal iden-
tification with one’s own language. Given divergence, intergroup contacts provoke de-
fenses. In this identification, language seems to be not just “language” but in some
fashion part of one’s being. While there must be many reasons for language identifica-
tion or loyalty, including historical, political, religious and ideological reasons, it also is a
product of how language originated and later developed. The “I”/“me” inhabitance
built into the growth point provides a foundational linkage of one’s sense of self to
language. All these features make for “language loyalty”, which from this perspective
is better termed language inhabitance, with the result that one spots others whose
mode of being in language may be like or unlike one’s own, and sees – but a short
step – difference as a threat.
The phenomenon (built into us by Mead’s Loop) was perhaps crucial for group sur-
vival but it also is a source of evils – shibboleths, bigotry, and linguo-xenophobia are
among its diseased forms. Far more benignly, 5–6 year-olds, at the time of emerging
self-awareness of agency but before politics and ideology if not religion, consider it
more likely that an English-speaking child they are introduced to will grow up to be
an adult of a different race still speaking English, than an adult of the same race speak-
ing a different language (Kinzler and Dautel 2012), suggesting both a charmingly low
priority for race and language inhabitance by age 5–6. What of native bilinguals? Are
they subject to a divided inhabitance? In a sense, yes: informants tell me (Pamela
Perniss, a native speaker of English and German) that, to themselves, they seem to be either “in” one
language or the other and whichever language it is, the language appears to them a
whole. There is inhabitance in two modes but each is experienced singly, one at a
time (paradoxically, then, translation from one language to the other is difficult: it re-
quires shifting entire modes of being). The same principle applied to successive bilingu-
alism (the more typical kind, the “second” language being acquired second) can result
in the new language supplanting the first as the mode of being. And people who master
a prestige dialect can find their being irreversibly altered (the Eliza Doolittle effect),
not for show or because they are rejected by their former milieu but because their
mode of being has shifted (for a vivid description of this kind of experience, see the
introductory section of “Speaking in Tongues”, by Zadie Smith 2009: 41–44). These ef-
fects, according to Mead’s Loop, are products of how language itself came to be and its
connection to action and modes of being.

7.6. Language-trained animals in the light of Mead’s Loop


In chimps and bonobos we glimpse the groundwork but not the presence of Mead’s
Loop, both the beginning of linkages of hand movements with vocalizations and an
emerging power to orchestrate sequences of movements under significances other
than those of the actions themselves. All of this is what selection for Mead’s Loop
with its “twist”, its use of gesture imagery as a production unit, its inherent social ref-
erence, and its incorporation of “lived experience”, could have worked on at the origin;
but it is not yet Mead’s Loop or language.
My approach to the ape language studies is to seek in them possible precursors to
language. The logic is: whatever their ape filter retains of the human-designed languages
to which they have been exposed possibly corresponds to a precursor. Steven Pinker
famously pooh-poohed the ape-language studies as “trained animal acts” (1994).
While the communicative miracle of Kanzi and others is not “human language”,
Pinker’s sound-bite masks the precursors of human language revealed in these studies.
Perhaps the most significant difference between language-trained animals and human
natural language stems ultimately from a lack of Mead’s Loop and, as a result, a lack
of growth points, and hence no unpacking process, despite having sequences of
signs of some kind. When Washoe or Kanzi outputs strings of signs or button presses,
there is nothing corresponding to the unpacking of growth points that is an inherent
part of the human engagement of thought with language. This unpacking of growth
points seems a cognitive innovation on the evolutionary stage, a trait others do not pos-
sess. Unpacking gives rise to syntax and could be a pattern of human cognition more
broadly, beyond the sentence. Chimps can learn to read Arabic numerals and recite
them (in the form of button presses) but they show no ability to go farther, for example,
to use their knowledge to actually count. Similarly, chimps who have received language
from their experimenters do not enlarge their universe of knowledge with it. I see in this
non-expansion of cognition striking evidence that a language-engendered mode of cog-
nition seems absent from the chimp’s mental world. The chimp brain is locked out of it,
for all its powers of alert attention and inference.
Beginning with the Gardners’ (1969, 1971) attempt to teach American Sign Lan-
guage (or something like it) to Washoe, the direction of thinking about the potential
of non-human primates for acquiring aspects of language entered a new phase. Whereas
earlier attempts with chimpanzees had emphasized speech and failed to elicit anything
like it, the Gardners reasoned that the problem was a peripheral limitation of the chim-
panzee vocal tract, and that the animals could master a different mode of language,
manual sign language. Washoe did indeed learn a number of signs, and spontaneously
began stringing them together to form complex references. However, the absence of
speech is significant, for it shows that, in the chimp, discovering new patterns of action
orchestration in language does not occur despite exposure to human speech. Nonetheless,
sign sequences might be an alternate kind of action orchestration and, from a
Mead’s Loop vantage point, be important for what they do and do not exhibit.
Some years back (McNeill 1974), long before I perceived anything of the existence of
Mead’s Loop, I collected all the examples of Washoe’s sign sequences that I could glean
from various lists and examples the Gardners had cited, and concluded that she did
indeed produce sequences in certain regular ways (as well as producing numerous
signs themselves). I found three orders (two of which the Gardners had identified as
well). After eliminating various redundancies and sequences due to mechanical con-
straints, they boiled down to one pattern that Washoe truly followed; this was
Addressee-Action-Non-Addressee. I wrote of this sequence: “The chimpanzee may
therefore have imposed her own formula on the sentence structures she observed her
handlers using. Washoe’s formula does not capture what the handlers themselves en-
coded (agent, action, recipient), but instead emphasizes a novel relationship as far as
grammatical form is concerned, that of an interpersonal or social interaction
(addressee-non-addressee)” (McNeill 1974: 83).
The Addressee-Action-Non-Addressee order covers social reference – the same
area as Mead’s Loop – but is based directly on social interaction. Basically,
Addressee-Action-Non-Addressee is mimicry, usually featuring Washoe’s begging
or demanding something from another. Washoe and her handlers could see the same
sequences of signs yet relate them to different underlying meanings, addressee rather
than agent, etc. So for Washoe, meaning packages orchestrating actions are those of
the interacting parties, not those of actional imagery. This points out both a similarity
and a crucial inter-species difference that could set one sort of stage for Mead’s
Loop. The same action sequences covered by non-Mead’s Loop and Mead’s Loop
minds could have offered the creature we are attempting to reconstruct a bridge. An
ancestral creature in the right family milieu, with this chimp-like capability, could
start to use begging in totally new ways, once selection pressures for Mead’s Loop
came to be felt in interacting with infants, as described earlier.
In keeping with an absence of Mead’s Loop, Washoe and other signing apes, as far
as I know, never perform gestures with their signs, as human signers do (Duncan 2005;
Liddell 2003; Bechter 2009). This, despite the fact that non-signing apes gesture
and vocalize concurrently and show a right hand dominance when they do (Hopkins
and Cantero 2003). This vocalization and hand dominance is another precursor but
not yet language.
Kanzi, one of the celebrated bonobo subjects of Sue Savage-Rumbaugh’s language
learning experiments (Savage-Rumbaugh, Shanker, and Taylor 1998), was highly suc-
cessful at learning an artificial system of communication utilizing a keyboard with but-
tons corresponding to lexical words (each button with a distinctive visual sign). At first
glance, there is nothing in this performance that points to Mead’s Loop. The motions of
Kanzi’s hands (pressing buttons) are not re-orchestrated movements, as for us are
speech (originally vegetative) and gesture (originally manipulative); they remain pressing-down movements. But there are also, in the videos of Kanzi that I have seen, rapid sequences of different key presses in which orchestration of order by communicative significance seems to be present. Re-orchestration is not a full explanation of what Kanzi
may have achieved, but it does suggest that another precursor of Mead’s Loop could
have existed in our last common ancestor with the chimpanzee line, namely an ability
to orchestrate sequences of movements (the hands only) by meanings. Kanzi also shows
an ability to understand spoken English commands. However, this offers little clue to
where he stands regarding Mead’s Loop. Comprehension of speech is so multifaceted
and language itself so varied a factor in it, that it is difficult to conclude anything
very definite from these feats regarding precursors. The potential role of multiple fac-
tors seems particularly germane to Kanzi’s spoken English comprehension. While the
language comprehension tests Savage-Rumbaugh, Shanker, and Taylor (1998) con-
ducted excluded extralinguistic cues that could have steered Kanzi to correct choices
(such as the human handler’s gaze), we nonetheless see in some cases, in the videos,
Kanzi reaching for the correct item in advance of hearing the critical word identifying
it, so other cues seemingly were available to him despite all precautions – we know nei-
ther what nor how many. Given the multifaceted character of speech comprehension,
this is perhaps not too surprising even with conscientious control efforts. However,
there is no indication whatsoever that Kanzi uses imagery as the orchestration medium,
so Mead’s Loop is remote from the chimp (here, a bonobo) on this point as well. The
general realization that nonhuman primates are unable to orchestrate speech sounds (as
has been documented now for many decades, and was indeed part of the original moti-
vation for using sign language and a keyboard) shows the other limit of the chimp brain
for language: it’s not just that ape mouth-anatomy differs (a higher larynx, for one
506 IV. Contemporary approaches

thing), it is that apes do not have the brain circuitry to orchestrate these mouth-part
movements around gesture imagery, nor the circuitry to use imagery to do the orches-
trating. So Mead’s Loop would have no equivalent in their brain functioning, even
though precursors may exist, as Kanzi’s fluent sequential button-pressing and Washoe’s
sign sequences suggest.
This limit on orchestration doesn't apply to Alex the parrot, who apparently was able
to orchestrate actions of the syrinx and possibly tongue to produce speech-like sounds
(Pepperberg 2008), but the last common ancestor of humans and parrots is so far back
that this achievement can shed no light on human evolution and Mead’s Loop, espe-
cially given its apparent total absence in many other creatures also descended on the
mammalian side (cows, for example). The similarities are perhaps due instead to
some kind of convergent evolution, possibly tied to mimicry tendencies, shared by
parrots and humans (but, complicating this explanation, equally shared by apes).
Adam Kendon in 1991, summarizing several chimpanzee studies, wrote as follows:

Evidently, then, chimpanzees in wild and semi-wild conditions refer each other to fea-
tures of the environment by means of a sort of eye and body pointing, they do sometimes
give evidence of partially acting out possible, rather than actual courses of action, they
are able to grasp the nature of the information their own behaviour provides to others
and to modify it accordingly if it suits their purposes, and in respect to some kinds of ges-
tural usage, as we have seen, they are able to employ them in new contexts with new
meanings [referring here to attested cases of infants lifting their arms overhead as a signal of non-aggression, based on a natural postural adjustment to being groomed]. They
are on the edge of developing a use of gesture that is representational. The studies in
which chimpanzees have been taught symbolic usage, whether of gestures or of keyboard
designs, not only confirm the cognitive capacities that these observational studies imply
but also show that chimpanzees can use behaviour representationally if shown how.
(Kendon 1991: 212)

“Chimpanzees, then, seem on the verge of developing a language, yet they have not
done so. What is missing? What holds them back?” (Kendon 1991: 212).
Kendon proposes an answer: chimpanzee social life is full of “parallel actions”
but has little in the way of collaboration. This is in line with Mead’s Loop. Chimpanzees
have not experienced the pressure, as we imagine early humans did, to develop Mead’s
Loop and growth points, and the discreteness, repeatability, and portability foundations
of a syntax of action orchestration with which to stabilize them. And this is why we
stand alone.

7.7. Summary of the static dimension


The gist of this section has been that much of the static dimension is cultural and his-
torical, shaped over time by intragroup and intergroup encounters, especially during mi-
grations. Important aspects of the static dimension – morphs, syntagmatic values out of
combinations, constructions and embeddings – are products of shareability, the form of
information decomposition and standardization that arises when information must be
shared. These static dimension aspects are candidate elements of the linguistic “bio-
program”, and appear now in experiments where new morphs and syntagmatic values
emerge spontaneously and immediately, without models.
8. Origin and ontogenesis, and the return of “gesture-first”


In this concluding section we consider language ontogenesis for the light it can shed on
phylogenesis, assuming that ontogeny in part recapitulates phylogeny. Traces of three
distinct evolutions, now combined but separate in origin, seem discernible this way –
a residue of a “gesture-first”, a syntactic age, and last the dawn of the growth point
and Mead’s Loop. From the viewpoint of Mead’s Loop, these three distinct stages or
“ages” of ontogenetic development can be discerned as recapitulations of phylogenetic
steps at the origin of language. It is here that “gesture-first” may have limited validity.
There was not one origin of language but two at least. One stage is gesture-first with
an alternation of gesture with vocalization, and speech-gesture combinations that are
not co-expressive but make semantic packages that later speech supplants. The gestures
are mostly pointing and pantomime. A “dark age” follows, a transition in keeping with
gesture-first where speech supplants gesture and its output drops. Pantomime and
pointing continue to dominate but at the same time, thanks to multiple pathways of on-
togenesis by what is, after all, a fully evolved human infant, in an un-gesture-first way
speech also is orchestrated by simple constructions. Finally Mead’s Loop activates and the
growth point emerges, and “gesture-first” lands as expected at the pantomime spot on the
Gesture Continuum. So, in sequence: from age 1 to 2 roughly, pantomime and scaffolding
(pace Arbib); from age 2 to 3 or 4 roughly, a transitional “dark age”; and from age 3 or 4
to school age and beyond, language-thought-combinations. In this scenario, gesture-first
creates what was predicted for it. The “dark age” transition is out of reach for gesture-
first and exists as a kind of limbo before the advent of Mead’s Loop and growth point,
but is the dawn of patterned speech reflecting the evolved language acquisition abilities
of current-day children, in particular the possible evolution of a capacity for construction
formation as outlined earlier. An ontogenetic overlap of separate evolutions is not sur-
prising, and it is one reason why the ontogeny-recapitulates-phylogeny pattern cannot
be a general explanation of ontogenesis and applies only selectively.
The “age of pantomime and scaffolding” is dominated by two kinds of gesture, pan-
tomime and pointing. Both have distinct timing relationships with speech that separate
them from the later emergence of co-expressive, speech-synchronized, dual semiotic
growth point gestures. This could be the ontogenetic recapitulation of a “gesture-
first” stage of phylogenesis – if there was such a stage, it has its echo in today’s child
language development. And it is limited, as gesture-first itself would have been, and
does not lead to the growth point. It yields pantomime and later speech and gesture
in alternation; and then speech supplants gesture, again as the gesture-first theory
assumes, and gestures drop off then in frequency.
Pantomime and pointing alternate with speech in children below age 2 to form
meaningful combinations (Butcher and Goldin-Meadow 2000; Goldin-Meadow and
Butcher 2003). Speech and gesture are not synchronized and gesture and vocalization
order varies. These gestures appear to “scaffold” speech (à la Michael Arbib). For
example, Goldin-Meadow and Butcher found that speech-gesture combinations foreshadow
the child’s initial word-word combinations, which appear a few weeks later with the
same semantic relationships. A child who pointed at an object and said “go” would, a cou-
ple of weeks later, produce word-word combinations with “go” plus object names. The
early gesture-word combinations cover a range of semantic relations: “open” + points
at drawer, “out” + holds up toy bag, “hot” + points at furnace, “no” + points at box,
“monster” + two vertical palms spread apart (=big) (Goldin-Meadow and Butcher 2003,
Table 3). There is some uncertainty over how universal any such single pattern is. The
review of early child language development of more than 50 languages from around the
world in Bowerman and Brown (2008) makes clear that there are various ways of en-
tering this domain of linguistic form, partially summarized by “bootstrapping”. This en-
compasses semantic and syntactic bootstrapping: these are regarded as rival claims
about infant predispositions for the forms of language, although in fact they seem
closely related, emphasizing one side or the other of the linguistic sign, signifier or sig-
nified. Bootstrapping from either is expected to lead to other areas of language. One
hypothesis holds that certain semantic signified patterns evolved – such as actor-
action-recipient, from Pinker (1989); the other that syntactic signifier patterns did –
such as subject-verb-object, from Gleitman (1990); both are said to provide entrée to
the rest of language. Of course, both or neither, as Slobin (2009) suggests in his review,
may have evolved – the picture is murky, to say the least. Each language offers its own obstacles and routes into its static forms, tailored to local features, and this
is not surprising if what is being developed at this stage are not only static structures but
action templates – how to organize motions of the vocal apparatus like those of
grownups.
“Dark age” transition. “Dark” because so little is known of gesture performance
once patterned speech starts, and also because it appears to be an age at which syntax
relates to thought not via inhabitance, as later, but as a kind of elaborated word learning
(constructions being like “words” with internal slots into which other words can insert;
see Goldberg 1995). Constructions can run off as denotative references and shared per-
formances with adults (Werner and Kaplan 1963) but not yet as modes of thinking for
speaking. Gestures predictably then appear to decline in frequency. I imagine they are
largely pantomimic, but this is not actually known. Looking at the linguistic record, co-
piously described in contrast, the child appears to be engaged in building up action tem-
plates. The first 3 years roughly, for the child, is devoted to the fascinating exercise of
doing as adults do vocally. But it is not yet the cognition of integrated gesture-speech;
that is yet to come.
The “age of language-thought combinations” is the start of the growth point and the
effects of Mead’s Loop. This “age” commences not earlier than age 3–4, and is linked to
the development of the sense of self-aware agency to which the selection of Mead’s
Loop was tied, and which onsets in current-day ontogenesis at around age 4.

9. Conclusions
Three separate evolutions converge in one ontogenetic process – gesture-first, the origin of
syntax and the dynamic inhabitance of language the growth point provides. The new age
of speculation regarding the old age of language origin may have reached a level where
scientific hypotheses and falsifiability tests can be formulated. “Gesture-first” is such a
hypothesis. It is falsifiable in the Popper sense and is false as a whole, because it cannot
account for the current-day observed combination of speech and gesture, but it may reappear in the earliest stages of current-day child language acquisition, which retraces the steps in language origin: gesture-first; the syntax origin (which comes “too early” now, producing the “dark age” of constructions without presumed inhabitance); and the emergence of the growth point and this inhabitance with the arrival of self-aware agency.
10. References
Arbib, Michael A. 2005. From monkey-like action recognition to human language: An evolution-
ary framework for neurolinguistics. Behavioral and Brain Sciences 28: 105–124.
Armstrong, David and Sherman Wilcox 2007. The Gestural Origin of Language. Oxford: Oxford
University Press.
Bechter, Frank Daniel 2009. Of deaf lives: Convert culture and the dialogic of ASL storytelling.
Unpublished Ph.D. Dissertation, University of Chicago.
Ben-Shachar, Michal, Talma Hendler, Itamar Kahn, Dafna Ben-Bashat and Yosef Grodzinsky
2003. The neural reality of syntactic transformations: Evidence from functional magnetic res-
onance imaging. Psychological Science 14: 433–440.
Bickerton, Derek 1990. Language and Species. Chicago: University of Chicago Press.
Bowerman, Melissa and Penelope Brown (eds.) 2008. Crosslinguistic Perspectives on Argument
Structure: Implications for Learnability. New York: Taylor and Francis.
Browman, Catherine P. and Louis Goldstein 1990. Tiers in articulatory phonology, with some im-
plications for casual speech. In: John Kingston and Mary E. Beckman (eds.), Papers in Labo-
ratory Phonology I: Between the Grammar and Physics of Speech, 341–376. Cambridge:
Cambridge University Press.
Butcher, Cynthia and Susan Goldin-Meadow 2000. Gesture and the transition from one- to two-
word speech: When hand and mouth come together. In: David McNeill (ed.), Language and
Gesture, 235–257. Cambridge: Cambridge University Press.
Call, Josep and Michael Tomasello 2007. The Gestural Communication of Apes and Monkeys. New
York: Lawrence Erlbaum.
Chomsky, Noam 1965. Aspects of the Theory of Syntax. Cambridge: Massachusetts Institute of
Technology Press.
Cohen, Akiba A. 1977. The communicative function of hand illustrators. Journal of Communica-
tion 27: 54–63.
Corballis, Michael C. 2002. From Hand to Mouth: The Origins of Language. Cambridge, MA: Har-
vard University Press.
Deacon, Terrence W. 1997. The Symbolic Species: The Co-evolution of Language and the Brain.
New York: Norton.
Dessalles, Jean-Louis 2008. From metonymy to syntax in the communication of events. Interaction
Studies 9(1): 51–65.
Donald, Merlin 1991. Origins of the Modern Mind: Three Stages in the Evolution of Culture and
Cognition. Cambridge, MA: Harvard University Press.
Duncan, Susan 2005. Gesture in signing: A case study in Taiwan Sign Language. Language and
Linguistics 6: 279–318.
Emmorey, Karen, Helsa B. Borinstein and Robin Thompson 2005. Bimodal bilingualism: Code-
blending between spoken English and American Sign Language. In: James Cohen, Kara T.
McAlister, Kellie Rolstad and Jeff MacSwan (eds.), Proceedings of the 4th International Sym-
posium on Bilingualism, 663–673. Somerville, MA: Cascadilla Press.
Evans, Patrick D., Sandra L. Gilbert, Nitzan Mekel-Bobrov, Eric J. Vallender, Jeffrey R. Anderson, Leila M. Vaez-Azizi, Sarah A. Tishkoff, Richard R. Hudson and Bruce T. Lahn 2005. Microcephalin, a gene regulating brain size, continues to evolve adaptively in humans. Science 309: 1717–1720.
Feyereisen, Pierre volume 2. Gesture and the neuropsychology of language. In: Cornelia Müller,
Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Jana Bressem (eds.), Body-
Language-Communication: An International Handbook on Multimodality in Human Interac-
tion. (Handbooks of Linguistics and Communication Science 38.2.) Berlin: De Gruyter
Mouton.
Fogassi, Leonardo and Pier Francesco Ferrari 2004. Mirror neurons, gestures and language evolu-
tion. Interaction Studies 5: 345–363.
Freyd, Jennifer J. 1983. Shareability: The social psychology of epistemology. Cognitive Science 7:
191–210.
Gardner, Beatrix T. and R. Allen Gardner 1971. Two-way communication with an infant chimpan-
zee. In: Allan Martin Schrier and Fred Stolnitz (eds.), Behavior of Nonhuman Primates, vol-
ume 4: 117–184. New York: Academic Press.
Gardner, R. Allen and Beatrix T. Gardner 1969. Teaching sign language to a chimpanzee. Science
165: 664–672.
Gentilucci, Maurizio, and Riccardo Dalla Volta 2007. The motor system and the relationship
between speech and gesture. Gesture 7: 159–177.
Gershkoff-Stowe, Lisa, and Susan Goldin-Meadow 2002. Is there a natural order for expressing
semantic relations? Cognitive Psychology 45(3): 375–412.
Gleitman, Lila 1990. The structural sources of verb meanings. Language Acquisition 1: 3–55.
Goldberg, Adele 1995. Constructions: A Construction Approach to Argument Structure. Chicago:
University of Chicago Press.
Goldin-Meadow, Susan 2003. The Resilience of Language: What Gesture Creation in Deaf Children
Can Tell Us about How All Children Learn Language. New York: Taylor and Francis.
Goldin-Meadow, Susan and Cynthia Butcher 2003. Pointing toward two-word speech in young
children. In: Sotaro Kita (ed.), Pointing: Where Language, Culture, and Cognition Meet, 85–
107. Mahwah, NJ: Erlbaum.
Goldin-Meadow, Susan, David McNeill and Jenny Singleton 1996. Silence is liberating: Removing
the handcuffs on grammatical expression in the manual modality. Psychological Review 103:
34–55.
Goren-Inbar, Naama, Nira Alperson, Mordechai E. Kislev, Orit Simchoni, Yoel Melamed, Adi
Ben-Nun and Ella Werker 2004. Evidence of hominid control of fire at Gesher Benot Ya’aqov,
Israel. Science 304: 725–727.
Harris, Roy and Talbot J. Taylor 1989. Landmarks in Linguistic Thought: The Western Tradition
from Socrates to Saussure. New York: Routledge.
Hopkins, William D. and Monica Cantero 2003. From hand to mouth in the evolution of language:
The influence of vocal behavior on lateralized hand use in manual gestures by chimpanzees
(Pan troglodytes). Developmental Science 6: 55–61.
Hrdy, Sarah Blaffer 2009. Mothers and Others: The Evolutionary Origins of Mutual Understand-
ing. Cambridge, MA: Harvard University Press.
Hurley, Susan 1998. Consciousness in Action. Cambridge, MA: Harvard University Press.
Huttenlocher, Peter R. and Arun S. Dabholkar 1997. Developmental anatomy of prefrontal cor-
tex. In: Norman Krasnegor, G. Reid Lyon and Patricia S. Goldman-Rakic (eds.), Development
of the Prefrontal Cortex: Evolution, Neurobiology, and Behavior, 69–83. Baltimore: Brookes.
Kegl, Judy 1994. The Nicaraguan Sign Language Project: An overview. Signpost 7: 24–31.
Kendon, Adam 1980. Gesticulation and speech: Two aspects of the process of utterance. In: Mary
Ritchie Key (ed.), The Relationship of Verbal and Nonverbal Communication, 207–227. The
Hague: Mouton.
Kendon, Adam 1988. Sign Languages of Aboriginal Australia: Cultural, Semiotic and Communi-
cative Perspectives. Cambridge: Cambridge University Press.
Kendon, Adam 1991. Some considerations for a theory of language origins. Man 26: 199–221.
Kendon, Adam 2008a. Review of Call and Tomasello (eds.) The gestural communication of apes
and monkeys. Gesture 8: 375–385.
Kendon, Adam 2008b. Some reflections of the relationship between ‘gesture’ and ‘sign’. Gesture 8:
348–366.
Kendon, Adam 2011. Some modern considerations for thinking about language evolution: A dis-
cussion of “The Evolution of Language by Tecumseh Fitch”. Public Journal of Semiotics 3(1):
79–108.
Kinzler, Katherine D. and Jocelyn B. Dautel 2012. Children’s essentialist reasoning about lan-
guage and race. Developmental Science 15(1): 131–138.
Konopka, Genevieve, Jamee M. Bomar, Kellen Winden, Giovanni Coppola, Zophonias O. Jonsson,
Fuying Gao, Sophia Peng, Todd M. Preuss, James A. Wohlschlegel and Daniel H. Geschwind
2009. Human-specific transcriptional regulation of CNS development genes by FOXP2. Nature
462: 213–218.
Lenneberg, Eric H. 1967. Biological Foundations of Language. New York: Wiley.
Liddell, Scott K. 2003. Grammar, Gesture, and Meaning in American Sign Language. Cambridge:
Cambridge University Press.
MacNeilage, Peter F. 2008. The Origin of Speech. Oxford: Oxford University Press.
MacNeilage, Peter F. and Barbara L. Davis 2005. The frame/content theory of evolution of speech:
A comparison with a gestural-origins alternative. Interaction Studies 6: 173–199.
Maestripieri, Dario 2007. Macachiavellian Intelligence: How Rhesus Macaques and Humans Have
Conquered the World. Chicago: University of Chicago Press.
McNeill, David 1974. Sentence structure in chimpanzee communication. In: Kevin Connolly and
Jerome Bruner (eds.), The Growth of Competence, 75–94. New York: Academic Press.
McNeill, David this volume. Gesture as a window onto mind and brain, and the relationship to
linguistic relativity and ontogenesis. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H.
Ladewig, David McNeill and Sedinha Teßendorf (eds.), Body – Language – Communication:
An International Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics
and Communication Science 38.1.) Berlin: De Gruyter Mouton.
McNeill, David, Susan D. Duncan, Jonathan Cole, Shaun Gallagher and Bennett Bertenthal 2008.
Growth points from the very beginning. Interaction Studies (special issue on proto-language,
edited by Derek Bickerton and Michael Arbib) 9(1): 117–132.
McNeill, David and Claudia Sowa 2011. Birth of a morph. In: Gale Stam and Mika Ishino (eds.), In-
tegrating Gestures: The Interdisciplinary Nature of Gesture, 27–47. Amsterdam: John Benjamins.
Mead, George Herbert 1974. Mind, Self, and Society from the Standpoint of a Social Behaviorist
(C. W. Morris ed. and introduction). Chicago: University of Chicago Press.
Mufwene, Salikoko S. 2010. ‘Protolanguage’ and the evolution of linguistic diversity. In: Feng Shi
and Zhongwei Shen (eds.), The Joy of Research: A Festschrift for William S.-Y. Wang, 283–310.
Shanghai: Shanghai Jiaoyu Chubanshe [Shanghai Education Press].
Pepperberg, Irene M. 2008. Alex and Me: How a Scientist and a Parrot Discovered a Hidden World
of Animal Intelligence–and Formed a Deep Bond in the Process. New York: Collins.
Pinker, Steven 1989. Learnability and Cognition: The Acquisition of Argument Structure. Cam-
bridge: Massachusetts Institute of Technology Press.
Pinker, Steven 1994. The Language Instinct. New York: Harper Perennial.
Pollick, Amy S. and Frans B.M. de Waal 2007. Ape gestures and language evolution. Proceedings
of the National Academy of Sciences 104: 8184–8189.
Rizzolatti, Giacomo and Michael A. Arbib 1998. Language within our grasp. Trends in Neuros-
ciences 21: 188–194.
Sahin, Ned T., Steven Pinker, Sydney S. Cash, Donald Schomer and Erik Halgren 2009. Sequential
processing of lexical, grammatical and phonological information within Broca’s Area. Science
326: 445–449.
Sapir, Edward 1921. Language: An Introduction to the Study of Speech. New York: Harcourt,
Brace and World.
Saussure, Ferdinand de 1916. Cours de Linguistique Générale. Paris: Payot.
Savage-Rumbaugh, Sue, Stuart G. Shanker and Talbot J. Taylor 1998. Apes, Language, and the
Human Mind. New York: Oxford University Press.
Science Magazine 2008. Report on a conference on primate behavior and human universals. Issue
of 25 Jan. 318: 404–405.
Senghas, Ann 2003. Intergenerational influence and ontogenetic development in the emergence of
spatial grammar in Nicaraguan Sign Language. Cognitive Development 18: 511–531.
Senghas, Ann and Marie Coppola 2001. Children creating language: How Nicaraguan Sign Lan-
guage acquired a spatial grammar. Psychological Science 12: 323–328.
Shields, Kenneth, Jr. 1982. Indo-European Noun Inflection: A Developmental History. University
Park: Pennsylvania State University Press.
Slobin, Dan I. 2009. Review of M. Bowerman and P. Brown (eds.) “Crosslinguistic perspectives on
argument structure: Implications for learnability.” Journal of Child Language 36: 697–704.
Smith, Zadie 2009. Speaking in tongues. New York Review of Books, February 26. http://www.
nybooks.com/articles/archives/2009/feb/26/speaking-in-tongues-2/?pagination=false&printpage=
true.
Stefanini, Silvia, Maria Cristina Caselli, and Virginia Volterra 2007. Spoken and gestural production
in a naming task by young children with Down syndrome. Brain and Language 101: 208–221.
Sweet, Henry 1888. A History of English Sounds from the Earliest Period. Oxford: Clarendon Press.
Tomasello, Michael 2008. Origins of Human Communication. Cambridge: Massachusetts Institute
of Technology Press.
Volterra, Virginia, Maria Cristina Caselli, Olga Capirci and Elena Pizzuto 2005. Gesture and the emergence and development of language. In: Michael Tomasello and Dan I. Slobin (eds.), Beyond Nature-Nurture: Essays in Honor of Elizabeth Bates, 3–40. Mahwah, NJ: Erlbaum.
Werner, Heinz and Bernard Kaplan 1963. Symbol Formation. New York: John Wiley. [Reprinted
in 1984 by Erlbaum].
Woll, Bencie 2009. Do mouths sign? Do hands speak? Echo phonology as a window on language
genesis. In: Rudolf Botha and Henriette de Swart (eds.), Language Evolution: The View from
Restricted Linguistic Systems, 203–224. Utrecht, the Netherlands: Lot.
Wrangham, Richard W. 2001. Out of the pan, into the fire: How our ancestors’ evolution depended
on what they ate. In: Frans de Waal (ed.), Tree of Origin: What Primate Behavior Can Tell Us
about Human Social Evolution, 119–143. Cambridge, MA: Harvard University Press.

David McNeill, Chicago, IL (USA)

33. Sensorimotor simulation in speaking, gesturing, and understanding

1. Sensorimotor simulation in speaking, gesturing, and understanding
2. Mental simulation and the body
3. Sensorimotor simulation in speaking and understanding
4. Gesture as sensorimotor simulation
5. Vocal gesture
6. Evolutionary perspective on gesture as simulation
7. Conclusion
8. References

Abstract
This chapter examines empirical studies and theory related to the hypothesis that embo-
died simulations of sensorimotor imagery play a crucial role in the construction of mean-
ing during conversation. We review some of the substantial body of work implicating
sensorimotor simulations in language comprehension and also some more recent work
suggesting that simulations are similarly involved in the production of gestures. Further-
more, we consider two less standard lines of research that are argued to be relevant to the
sensorimotor simulation hypothesis. The first is new empirical evidence demonstrating
the pervasive use of vocal gesture in communication. Because of their direct integration
with speech, these gestures offer a unique window into the immediate conceptualizations
that motivate the online production of spoken language. Second, we take an evolutionary
perspective on simulations and examine the use of iconic gestures by the great apes. These
reports reveal the phylogenetic emergence of simulation as the basis for the production of
iconic gestures. Overall, this chapter offers a more unified understanding of how sensor-
imotor simulations figure into productive and receptive communication and across the
spectrum from language to gesture.

1. Sensorimotor simulation in speaking, gesturing, and understanding

Communicating thoughts requires that people do something with their bodies, whether
it be waving their hands, flashing their eyebrows, moving their tongues and lips, or nod-
ding their heads. Each of these bodily activities is traditionally understood as an out-
ward translation of inner thought processes. We speak or gesture in particular ways
as a code to signal to others what we are thinking, with particular systematic spoken
and bodily gestures typically viewed as having arbitrary connections with the messages
we hope others to understand.
This view of language and communication, where the body serves as the conduit for
the expression of thought and meaning, does not address why we have the words and
gestures we employ nor how these express the meanings they typically communicate.
Our argument in this chapter is that spontaneous iconic gestures and even highly con-
ventionalized systems of language and gesture are continually linked to ongoing expe-
riential simulations of sensorimotor imagery. Under this view, articulatory and manual
movements associated with language and gesture are produced as the physical manifes-
tations of these simulation processes. Furthermore, one’s ability to interpret both
speech and gesture as being specifically meaningful in context includes partly simulating
what others must be doing by their use of speech and gesture. A crucial part of this view
is that the human body serves not as a mere conduit for the transmission of meaning,
but plays an essential role in how we imaginatively simulate meaningful ideas. Humans
conceptualize all kinds of imagery through the multimodal actions of their bodies, and
we argue that this ability lies at the foundation of linguistic and gestural meaning. We
refer to this claim as the “sensorimotor simulation hypothesis” and in this chapter
describe some of the theoretical debate and empirical research related to the role
that sensorimotor simulation plays in speaking, gesturing, and understanding.
In addition to reviewing the more standard literature on embodied simulation, the
chapter also considers two additional lines of research that we argue are relevant to
the sensorimotor simulation hypothesis. First we look at recent evidence for the perva-
sive use of vocal gesture in communication. These gestures, because of their incorpora-
tion into the articulatory gestures of speech, offer a unique window into the immediate
conceptualizations that motivate the online production of spoken language. Second, we
take an evolutionary perspective on simulations and examine reports of the use of ico-
nic gestures by the great apes. It is argued that these reports, spanning almost 100 years,
reveal the phylogenetic emergence of simulation as the basis for the production of
iconic gestures.
2. Mental simulation and the body


Mental simulation is an idea that has attracted the attention of many contemporary cog-
nitive scientists in their study of perception, thought, and language. Consider, for exam-
ple, a single act of visual perception in which a person looks at a coffee cup sitting on a
table. One proposal for how people perceive the cup suggests that our perceptual sys-
tems have evolved to facilitate our interaction with a real, three-dimensional world.
Perception does not take place solely in the eyes or brain of the perceiver, but rather
is an act of the whole animal, the act of perceptually guided exploration of the environ-
ment, the function of which is to keep the perceiver in touch with the environment and
to guide action, not to produce inner experiences and representations (Gibson 1979).
More specifically, at any given moment, the environment affords a range of possibilities,
called affordances. A person looking at a coffee cup may, for instance, imagine different
ways of interacting with the cup, including reaching to grasp the cup, picking it up, and
tilting it in just the right way to sip its contents. One could also do other things while
observing a coffee cup, such as imagining it as something to pick up and throw, some-
thing to sit on, to balance on one’s head, and so on. Most generally, perception and
action are deeply intertwined, such that perceiving anything in the world is very
much a matter of anticipating different embodied interactions with these things, includ-
ing other people. We perceive by simulating embodied possibilities, and then act on
them given our adaptive needs in context (Noë and O’Regan 2002).
Mental simulation is often understood as the “reenactment of perceptual, motor, and
introspective states acquired during interactions with world, body, and mind” (Barsalou
2008: 618). Much behavioral research demonstrates that even higher-order conceptual
processing involves sensorimotor simulations. For example, people’s categorization be-
haviors (e.g., recalling as many instances of fruit as possible) reflect their sensorimotor
imagining of themselves in different real-world situations (e.g., walking down the pro-
duce aisle in a grocery store), rather than their simply retrieving exemplars from a taxonomic list in their conceptual knowledge (e.g., recalling all fruits stored in memory) (Vallée-Tourangeau, Anthony, and Austin 1998). People also appear to simulate
how different concepts may be combined (e.g., a half smile is different from a half
watermelon), such that concepts are better understood as being created on the fly
through sensorimotor simulation as opposed to being passively stored as a set of
abstract features in memory (Wu and Barsalou 2001).
Our argument is that both language and gesture are produced and understood in
terms of highly imagistic simulation processes, yet scholars differ in their characteri-
zation of simulation processes in human cognition. Some psycholinguists adopt the
idea that simulation in language use involves neural processes ordinarily having nonlin-
guistic functions (e.g., perception and action) (Havas, Glenberg, and Rinck 2007), a
view that closely links simulation to brain states. Others embrace an ecological, embo-
died view in which mental simulations emerge from the dynamical interactions of
brains, bodies, and world (Gibbs 2006a). Under this latter perspective, simulation is not
purely mental or neural, but a process that involves and affects many full-bodied sensations.
For instance, a classic social psychological study asked people in the U.S. to unscramble
mixed up wording in individual sentences that contained words referring to elderly peo-
ple (e.g., grey, old, wrinkled, tired, Florida, wise) (Chartrand and Bargh 1999). After com-
pleting this task, participants were videotaped as they left the experimental room.
33. Sensorimotor simulation in speaking, gesturing, and understanding 515

The result of interest is that people walked significantly slower after reading words referring to the elderly than in other conditions. Thus, reading words with certain meanings shaped people’s subsequent walking behavior in ways consistent with the multimodal simulation processes required to understand those words. As Zwaan
(2004: 38) argued, “comprehension is not the manipulation of abstract, arbitrary and
amodal symbols, a language of thought. Rather, comprehension is the generation of
vicarious experiences making up the comprehender’s experiential repertoire.”

3. Sensorimotor simulation in speaking and understanding


One of the earliest theories on sensorimotor influences on language understanding is
seen in work on the motor theory of speech perception (Galantucci, Fowler, and Turvey
2006). This theory maintains that listeners use their own articulatory motor programs to
interpret spoken language. Thus, considerable evidence shows that perception of pho-
nemes is accomplished not simply by analysis of physical acoustic patterns but through
the articulatory events that produce them, such as movements of the lips, tongue, and so on. People hear speech sounds by imagining producing the stimuli they hear. One analysis of different speech perception and production tasks proposed that listeners “focus on acoustic change, because changing regions of the sound spectrum best reveal the gestural constituency of talker’s utterances” (Fowler 1987: 577). More recent research claims that
phonetic primitives are gestural, and not abstract features (Browman and Goldstein
1995). Articulatory gestures are unified primitives characterizing phonological patterns,
in addition to capturing something about the activity of the vocal articulators. Our
repository of words in the mental lexicon is more specifically composed of dynamically
specified gestures.
In recent years, experimental psycholinguists have studied various ways that the sen-
sorimotor system functions in the higher order structures of language involved in under-
standing meaningful utterances. One trend in this work is to explicitly explore how
overt and covert bodily movement shapes people’s interpretation of meaning. Thus,
some research demonstrates that appropriate bodily actions (e.g., making a hand ges-
ture with the thumb and fingers touching) facilitate semantic judgments for action
phrases such as “aim a dart” (Klatzky, Pellegrino, McCloskey, et al. 1989). In addition,
Glenberg and Kaschak (2002) demonstrate what they call the action-sentence compat-
ibility effect. In one experiment, participants made speeded sensibility judgments for
sentences that implied action either toward or away from the body (e.g. “Close the
drawer” implies action of pushing something away from the body). Participants indi-
cated their judgment by use of a button box which contained a line of three buttons per-
pendicular to their body. Presentation of the sentence was initiated when the participant
pressed the center button, and yes or no responses (i.e., sensible or not sensible) were
made with the two remaining buttons, requiring action either away from or toward the
body. Glenberg and Kaschak (2002) found an interference effect, such that comprehen-
sion of a sentence implying action in one direction interfered with a sensibility response
made in the opposing direction. This effect was interpreted as evidence that under-
standing language referring to action recruits the same cognitive resources needed to
actually perform the action.
Another study investigated whether people mentally represent the orientation of
a referent object when comprehending a sentence (Stanfield and Zwaan 2001).
Participants were presented with sentences that implicitly referred to the orientation of
various objects (e.g. “Put the pencil in the cup” implies a vertical orientation of the pen-
cil). After each sentence, a picture was presented, to which participants answered
whether the pictured object had been in the previous sentence. For pictures that
were contained in the previous sentence, the picture’s orientation varied as to whether
or not it matched the orientation implied by the sentence (e.g., a pencil was presented
in either a vertical or horizontal orientation). Overall, participants responded faster to
pictures that matched the implied orientation than to mismatched pictures and sen-
tences. This empirical finding suggests that people form analog representations of ob-
jects during ordinary sentence comprehension, which is consistent with the simulation
view of linguistic processing.
Studies also suggest that sensorimotor simulations during language comprehension
can be shaped by emotion. Earlier brain imaging experiments showed that observing
(perception) and imitating (action) emotional facial expressions activate the same emotion-related neural areas, as well as relevant motor parts of the mirror neuron system (Carr,
Iacoboni, Dubeau, et al. 2003). Generating facial expressions also primes people’s rec-
ognition of others’ facially conveyed emotions (Niedenthal, Brauer, Halberstadt, et al.
2001). These findings on simulation and emotion recognition have been extended to
language comprehension. For instance, one set of studies evoked positive or negative
emotions in participants by having them either hold a pen with their teeth (i.e., producing a smile) or grip a pen with their lips (i.e., producing a frown) (Havas, Glenberg, and Rinck 2007). As people were either smiling or frowning, they made speeded judgments as to whether different sentences were either pleasant (e.g., “You and your lover embrace after a long separation”) or unpleasant (e.g., “The police car
rapidly pulls up behind you, siren blaring”). Response times to make these judgments
showed that people made pleasant judgments faster when smiling than when frowning
and made unpleasant judgments faster when frowning than when smiling. Subsequent
experiments revealed that these effects could not be replicated when people made
speeded lexical decisions to isolated words that were either pleasant (e.g., “embrace”)
or unpleasant (e.g., “police”), suggesting that emotional simulation during language
comprehension operates most strongly at the level of full phrases or sentences. Overall,
sensorimotor simulations appear to be critical parts of language understanding as
people create meaningful and emotional interpretations of linguistic expressions.
One concern with the psycholinguistic research on word and sentence processing is
that simulations may be important in understanding concrete actions and objects, but
not necessarily abstract ideas, such as “justice” or “democracy.” Yet there is much
work in cognitive linguistics showing that people understand at least some abstract con-
cepts in embodied metaphorical terms (Gibbs 2006a; Lakoff and Johnson 1999). More
specifically, abstract ideas, such as “justice,” are structured in terms of metaphorical
mappings where the source domains are deeply rooted in recurring aspects of embodied
experiences (i.e., achieving justice is achieving physical balance between two entities).
Many abstract concepts are presumably structured via embodied metaphors (e.g.,
time, causation, spatial orientation, political and mathematical ideas, emotions, the
self, concepts about cognition, morality) across many spoken and signed languages
(see Gibbs 2008). Systematic analysis of conventional expressions, novel extensions,
patterns of polysemy, semantic change, and gesture all illustrate how abstract ideas
are grounded in embodied source domains. Thus, the metaphorical expression “John
ran through his lecture” is motivated by the embodied metaphor of “mental achievement is physical motion toward a goal” (a submetaphor derived from “change is motion”). These
cognitive linguistic findings provide relevant evidence showing that abstract concepts
are partly structured through embodied simulation processes (Gibbs 2006b).
A second response to the concern about simulation and abstract concepts comes
from different psycholinguistic research demonstrating that embodied conceptual me-
taphors motivate people’s use and understanding of metaphoric language related to var-
ious abstract concepts (Gibbs 2006a, 2006b). These experimental studies indicate that
people’s recurring embodied experiences often play a role in how people tacitly
make sense of many metaphoric words and expressions. In fact, some of these studies
extend the work on simulative understanding of non-metaphorical language to proces-
sing of metaphorical speech. Gibbs, Gould, and Andric (2006) demonstrated how peo-
ple’s mental imagery for metaphorical phrases, such as “tear apart the argument,”
exhibit significant embodied qualities of the actions referred to by these phrases (e.g.,
people conceive of the “argument” as a physical object that when torn apart no longer
persists). Wilson and Gibbs (2007) showed that people’s speeded comprehension of
metaphorical phrases like “grasp the concept” is facilitated when they first make, or
imagine making, in this case, a grasping movement. Bodily processes appear to enhance
the construction of simulation activities to speed up metaphor processing, an idea that
is completely contrary to the traditional notion that bodily processes and physical
meanings are to be ignored or rejected in understanding verbal metaphors. Further-
more, hearing fictive motion expressions implying metaphorical motion, such as
“The road goes through the desert,” influences people’s subsequent eye-movement
patterns while looking at a scene of the sentence depicted (Richardson and Matlock
2007). This suggests that the simulations used to understand the sentence involve a particular simulated motion of what the road does, which interacts with people’s eye movements.
Experimental findings like these emphasize that people may be creating partial, but
not necessarily complete, sensorimotor simulations of speakers’ metaphorical messages
that involve moment-by-moment “what must it be like” processes, such as grasping, that
make use of ongoing tactile-kinesthetic experiences (Gibbs 2006b). These simulation
processes operate even when people encounter language that is abstract, or refers to
actions that are physically impossible to perform, such as “grasping a concept,” because
people can metaphorically conceive of a “concept” as an object that can be grasped.
One implication of this work is that people do not just access passively encoded con-
ceptual metaphors from long-term memory during online metaphor understanding,
but perform online simulations of what these actions may be like to create detailed
understandings of speakers’ metaphorical messages (Gibbs 2006b).

4. Gesture as sensorimotor simulation


During an 1880 conference held in Milan to advocate oral language education for the
deaf, one proponent, Giulio Tarra, argued:

Gesture is not the true language of man which suits the dignity of his nature. Gesture,
instead of addressing the mind, addresses the imagination and the senses (…) Thus, for
us it is an absolute necessity to prohibit that language and to replace it with living speech,
the only instrument of human thought (…) Oral speech is the sole power that can rekindle
the light God breathed into man when, giving him a soul in a corporeal body, he gave him
also a means of understanding, of conceiving, and of expressing himself (…) mimic signs
(…) enhance and glorify fantasy and all the faculties of the imagination (Lane 1984:
391; from Wilcox 2004: 120–121).

The empirical research reviewed above on the active involvement of the sensorimotor
system in online language understanding reveals that Tarra was not so much mistaken
about the nature of gesture, but rather about the nature of language. Language, through
the imaginative processes of simulation, is deeply grounded in movement and the
senses, which form the basis for the construction of linguistic meaning. Yet, ironically,
as evidence has accumulated for this view on language, empirical research on gesture
has been slow to adopt a distinctly simulation perspective, despite its intuitive appeal.
One reason for this imbalance may be found in the asymmetry between different ap-
proaches to studying meaning in language compared to gesture. While language re-
searchers interested in meaning have tended to focus on comprehension, gesture
researchers have tended to pay more attention to production, which does not so easily
lend itself to the traditional dependent measures used to examine the online processes
involved in meaningful communication. Thus, when gesture comprehension has been measured, this has largely been by way of offline, targeted assessments of information uptake.
The intuitive basis for a simulation theory of gesture stems from its clear correspon-
dence to sensorimotor imagery, a point that has also received substantial empirical
attention over the years (McNeill 1992, 2005). One line of research has documented
a positive correlation between spatial processing and gesture production. For example,
Lavergne and Kimura (1987) found that people gesture more frequently when conver-
sing about spatial topics compared to neutral or verbal topics. Another study asked par-
ticipants to describe animated action cartoons, and found that speech-accompanying
gestures were nearly five times as likely to occur with phrases containing spatial prepositions as with those without spatial content (Rauscher, Krauss, and Chen 1996). A study
by Hostetter and Alibali (2007) investigated the influence of individual differences in
spatial skills on gesture production, with the finding that people with strong spatial skills
and weak verbal skills gestured most frequently. Finally, neuropsychological research
shows that stroke patients who suffered visuospatial deficits gestured less than matched
controls (Hadar, Burstein, Krauss, et al. 1998).
Other studies have revealed a positive relationship between gesture and imagery
more broadly. In one experiment, Rimé and colleagues had participants engage in a 50-minute conversation while seated in an armchair that restrained the movement of their head, arms, hands, legs, and feet (Rimé, Schiaratura, Hupet, et al. 1984). Analysis
of their dialogue revealed that while people’s movement was restricted, the content of
their speech showed a significant decrease in the amount of vivid imagery compared to
freely moving controls. Another study asked participants to describe a cartoon after, in
one condition, watching it, or in a second condition, reading it in narrative form (Hos-
tetter and Hopkins 2002). People gestured more frequently after watching the cartoon,
presumably because of the richer imagery in the non-verbal presentation. Last, Beattie
and Shovelton (2002) investigated the properties of verbal clauses that were likely to
occur with iconic gestures. Participants narrated cartoons, and afterwards, the clauses
in these narrations were rated by other subjects for their imageability. Clauses
associated with gestures were rated as more highly imageable.
These various studies demonstrate that gesture is highly correlated with communica-
tion about imagery-laden topics that span a range of motor, visual, and spatial imagery.
Indeed, these findings are not at all surprising as one considers the highly imagistic
nature of gestures themselves, which are inherently visible, spatially-oriented motor ac-
tions. Yet the studies also raise the question of how sensorimotor imagery might relate
to gesture within the momentary processes involved in communicative interaction. One
recent proposal – the Gesture as Simulated Action hypothesis – offers a potential
answer, arguing for the idea that gestures arise from the simulation of motor imagery
(Hostetter and Alibali 2008). According to this view, simulations of action-related
thoughts lead to the activation of neural premotor action states, which then has the
potential to spread to motor areas. This spreading activation comes to be realized as
the overt action of representative gesture.
Hostetter and Alibali’s Gesture as Simulated Action hypothesis is clearly related to
our claim about the importance of sensorimotor simulation in speaking, gesturing, and
understanding. Yet the Gesture as Simulated Action hypothesis may have some limita-
tions. The first is not critical, but concerns the scope of the gestural behaviors the Ges-
ture as Simulated Action hypothesis actively attempts to explain. Explicitly the theory
concerns “representative gestures,” which are said to include iconic and deictic gestures.
However, deictics receive minimal consideration, and more broadly, the theory does not
address language-related gestures, including spoken prosody, manual beat gestures, or
iconic “vocal” gestures (see below), nor the conventionalized gestures of speech and
sign. Given the tight temporal coordination and semantic coherence between spoken
language and gesture (McNeill 1992, 2005; Perlman in press), we emphasize the impor-
tance of developing a comprehensive theory of gesture aimed to account for the range
of these gestural behaviors. This point gains weight as researchers increasingly observe
a more graded distinction between modality-independent notions of gesture and
language that rests largely on a continuous quality of conventionalization.
A second limitation of the Gesture as Simulated Action account is in its exclusive
emphasis on simulated action as the foundation of gesture. The theory argues that simu-
lated motor imagery is the critical conceptual ingredient for gesture production, and
that gestures related to non-motor, perceptual imagery might arise, but only through
co-activation with afforded actions or by simulated perceptual actions like eye move-
ments or tactile exploration. Though one cannot deny that action is an essential element
of many, if not all, gestures (though consider a gestural depiction of stillness and expand
from there), the fact that gestures are composed of substance and context is likewise
essential to the simulation process. To fully appreciate the simulative processes involved
in the production of gesture, it is critical to attend to the full process of how the body is
imaginatively and often metaphorically used to constitute an endless variety of simu-
lated entities and qualities. As McNeill (1992) observed, “The hand can represent a
character’s hand, the character as a whole, a ball, a streetcar, or anything else (…) In
other words, the gesture is capable of expressing the full range of meanings that arise
from the speaker” (McNeill 1992: 105). Moreover, as we describe below, the cross-
modal sorts of representations involved in common vocal gestures demonstrate a com-
plexity of simulation that is just not adequately captured by the simplifying notion of
“simulated action.”
This capacity is easily demonstrated even within a conceptually simple context. For
example, consider a study by Lausberg and Kita (2003), which investigated hand pref-
erence for observer-viewpoint iconic gestures. Participants watched and described ani-
mations of two shapes interacting with each other, with one shape oriented on the left
side and the other on the right. In one condition, participants verbally described the
event as they also produced many speech-accompanying gestures, and in a second con-
dition, they depicted the event silently. Lausberg and Kita were interested in whether
the verbal condition would elicit, through left hemispheric activation associated with
speech, more right-handed gestures. Indeed, the findings showed only a minimal effect
induced by speech, and instead, by far the most important factor influencing hand
choice was whether the represented block was oriented on the left or the right side
of the animation. Not surprisingly, when the represented block was on the left, speakers
tended to use their left hand, and when the block was on the right, speakers tended to
use their right hand. In addition, participants also produced numerous bimanual ges-
tures in which the hands worked together to embody the spatial relationship between
the blocks. This empirical finding provides a nice and simple illustration of the essential
role played by substance and context, in addition to action, in how simulative processes
give rise to gesture.
These difficulties with the Gesture as Simulated Action hypothesis do not dampen
our enthusiasm for the general idea that gesture is both produced and understood in
terms of ongoing sensorimotor simulations. By providing a detailed formulation of
their hypothesis, Hostetter and Alibali open the way for direct, empirical testing of a
simulation-based theory of gesture, helping to incorporate gesture into an important
theory for the comprehension of meaning in language. Notably, by linking the sensor-
imotor simulations of language comprehension to gesture production, the hypothesis
brings together the meaning-making processes involved in language and gesture, as
well as comprehension and production.
The value of their proposal can be seen in one recent study that examined the degree
of detail incorporated into the production of a gesture, and in turn, how much of this
detail is conveyed to the listener (Cook and Tanenhaus 2009). Participants solved the
Tower of Hanoi problem, either with a stack of weights or on a computer, and after-
wards described the solution to a listener. (In this problem, a stack of disks is arranged
bottom up from largest to smallest on the leftmost of three pegs. The goal is to move all
of the disks to the rightmost peg, moving only one disk at a time and without ever pla-
cing a larger disk on top of a smaller one.) Analysis of the trajectory of speakers’ ges-
tures revealed that these were finely tuned to the actual trajectory involved in solving
the problem, with differences reflecting the differently afforded constraints of the real-
weight versus computer version of the task. Moreover, listeners were sensitive to this
information, which transferred into matching trajectories when listeners later per-
formed the computer version of the problem themselves. Cook and Tanenhaus (2009)
interpret these analog differences in gesture production and comprehension as evidence
for the activation of perceptual-motor information that is involved in the actual perfor-
mance of the task. More generally, they suggest that this finding is consistent with a
sensorimotor simulation account of gesture production and comprehension.
Cook and Tanenhaus provide a compelling interpretation of their results, and we
anticipate much future empirical research focused in this direction. Yet, returning to
the current formulation of the Gesture as Simulated Action hypothesis, we also
reiterate the need for a more comprehensive account that emphasizes the full embodied
and contextualized complexity of these simulation processes and the meaningful actions
that result. Given evidence for the multimodal nature of language and gesture, it is cru-
cial to understand not just the action aspect of gesture, but how the body is incorporated
into the construction of meaning during these activities. To illustrate this point, we next
describe some recent research and examples related to the production of gestures
within the vocal modality.

5. Vocal gesture
When people talk, they commonly pattern their voice in a variety of ways to iconically
depict an aspect of their subject matter. Many times these iconic correspondences are
created within the domain of sound, with the sounds of our voice imitating the sounds
of our environment. One familiar example is when quoting another person’s speech, we
often imitate certain characteristics of that person’s voice, such as their emotional state,
accent, and tonal quality (Clark and Gerrig 1990). Nonhuman animals can also be
“quoted,” such as when imitating the high-pitched barks of a yapping dog (perhaps
in combination with a one-handed gesture of the dog’s mouth opening and closing).
And we are similarly inclined to imitate the sounds of inanimate subjects too, an ability
that often proves useful when taking a malfunctioning car to the mechanic. It seems that
people produce and comprehend these sorts of iconic vocalizations so naturally that
we hardly even notice as they are seamlessly woven into our speech.
Close observation reveals further that iconic vocalizations go well beyond simple
pantomimic imitations of other sounds. Indeed, spoken words and phrases often take
on prosodic patterns that reflect aspects of their so-called semantic meaning, often ex-
tending these iconic correspondences across modalities through processes of abstrac-
tion and metaphor. For example, one might describe “a looong snake,” expressing
the snake’s physical length and size by iconically accenting the adjective with extended
duration and low pitch. Or in contrast, the phrase “a quicklittlebug” might be uttered
with a fast tempo and high pitch to convey the bug’s rapid movement and size. Building
from such observations, scholars have recently begun to argue that these co-speech vo-
calizations are, in fact, the same qualitative sort of behavior as manual gestures (Em-
morey 1999; Liddell 2003; McNeill 2005; Okrent 2002; Perlman in press). Below we
describe some of the empirical research on vocal gesture and discuss its implications
for an integrative, simulative account of gesture and language.
Bolinger (1983, 1986) was one of the first to recognize an iconic quality to spoken
prosody and to point to the close relationship between intonation and gesture. He
saw intonation as “part of a gestural complex whose primitive and still surviving func-
tion is the signaling of emotion” (Bolinger 1986: 195). Ohala (1994) too observes a link
between intonation and the more primitive emotional vocalizations shared with other
mammals, which is expressed in what he calls the “frequency code.” According to
Ohala, high-frequency vocalizations signal apparent smallness and, by extension, a non-threatening, submissive, or subordinate attitude, and low-frequency vocalizations signal
apparent largeness and thus threat, dominance, and self-confidence. Although both
scholars stress the expansion of this iconic intonational system through processes
of ritualization and metaphor, recent research nevertheless indicates that this view of
intonation and gesture is too narrow. By focusing on the iconic expression of
emotion, Bolinger and Ohala neglect intonation’s incorporation into more imagistic,
representational gestures.
Recent empirical research has begun to document the prevalent use of representa-
tive vocal gesture within various experimental and more naturalistic contexts. An
early series of studies investigated people’s production of vocal gestures, or what the
authors called “analog acoustic expressions” (Shintel, Nusbaum, and Okrent 2006).
One group of participants described the movement of an animated dot moving up or
down with the phrases, “It is going up” or “It is going down.” Another group of parti-
cipants simply read these same sentences as they were presented on a computer screen.
Analysis of people’s speech revealed that the final words of these phrases, “up” and “down,” were spoken with a higher or lower fundamental frequency, respectively, both when spoken to describe the dot and when simply read.
A second study in this series had participants describe animated dots as they moved
at either a fast or slow rate to the left or right. Participants were instructed to use the
phrases “It is going left” or “It is going right” to describe the dot, without explicit men-
tion of its speed. Participants nevertheless spoke the phrase with an overall shorter
duration for fast-moving dots and longer duration for the slow-moving dots. Moreover,
when these descriptions were replayed for listeners to guess whether the utterance had
been spoken in description of a fast or slow moving dot, their accuracy was significantly
correlated with phrase duration, indicating sensitivity to the prosodic information.
A different series of experiments examined whether modulations in speech rate con-
tribute to a speaker’s mental representation of a described event as in motion or still
(Shintel and Nusbaum 2007). Building on an experiment by Zwaan and Yaxley
(2002), participants listened to sentences (e.g., “The horse is brown”) spoken at a fast
or slow rate and then indicated whether a pictured object had been mentioned in the
sentence. Critically, the picture presented the object either in a stationary position or
in motion (e.g., a horse standing still or running), with the idea that fast-spoken sen-
tences would contribute a sense of movement in the listener’s representation and facil-
itate responses to in-motion pictures. This prediction was confirmed: participants
were faster in responding to compatible trials (fast rate with an in-motion picture, or
slow rate with a still picture). According to Shintel and Nusbaum, this finding suggests that speech
rate (an “analog acoustic expression”) can contribute to a listener’s analog perceptual
representation of a described event. Analogically conveyed motion information can
influence listeners’ representations about described objects, even when information is
conveyed exclusively in the prosodic properties and not the propositional content of
the sentence.
The regular use of vocal gesture has also been demonstrated in a more naturalistic
setting. Perlman (in press) investigated the spontaneous use of iconic speech rate by
having participants watch and describe a series of short video clips showing fast or
slow-paced events. For each description that made explicit mention of speed, speech
rate measurements were made for the full utterance, as well as for speed-related adver-
bial phrases. The analysis showed that speakers generally spoke faster or slower in their
full descriptions of fast or slow events, respectively, and additionally, they spoke
adverbial phrases about “fast” events faster than adverbials about “slow” ones.
Perlman suggests that these two separate speech rate effects may arise as the man-
ifestation of two different sorts of simulation-related processes. The more general shift
in speech rate is suggested to arise from a background simulation of the event, reflecting
33. Sensorimotor simulation in speaking, gesturing, and understanding 523

speakers’ imaginative engagement with the tempo of the action as they proceed through
their description, scanning and profiling specific details to highlight different aspects of
the message. On the other hand, the adverbial phrase-specific effect is qualitatively
compared to the more commonly observed manual gestures that are produced concur-
rently with speech. The vocal gesture emerges precisely as the speaker is conceptualiz-
ing and communicating about speed as the profiled aspect of the message. We propose
here that this simulative focus is the motivating force that drives both the conventional
articulatory gestures of the adverbial phrase and simultaneously, the iconic increase in
the rate with which the words are spoken. These conventional and iconic forms of ges-
ture are dynamically integrated together as they are simultaneously activated by the
focused concept of speed as it is contextualized within the simulation of the event.
Yet, ongoing work in this paradigm indicates that vocal gesture is only part of the
story, and that the notion of multimodal iconic gesture is probably more apt (Perlman
in press). In a current study, subjects come into the lab in pairs and take turns watching
and describing to each other short video clips of various animals engaging in different
activities. Preliminary analysis of audio and video shows that vocal gestures, including
the spontaneous rhythms and intonational patterns of speech, are often performed in
precise temporal and semantic coordination with iconic manual gestures. Importantly,
in many instances, gestures may not be arbitrarily synchronized with speech, but rather,
both gesture and speech are performed in iconic synchrony with an ongoing simulation
of the event being described.
To illustrate this synchrony, consider the following excerpt from the description of a
video clip of a large fish floating around in an aquarium. The fish drifts up to the surface
of the water and then suddenly gulps down a bug.

(1) it was this big fish [kind of hanging out, he was floating slowly up to the top
and he ate some…thing…]
(1) (2)

Preparation: raises and holds right hand with thumb and fingers pinched together as a fish
and its mouth.
Manual iconic (1): right hand rises slowly upward and pauses
Manual iconic (2): on “ate” the right hand thrusts forward, thumb and fingers spread-
ing open and closing like the mouth of the attacking fish. On “some” the fingers and hand
retract back and are held to the end of the utterance.
Vocal iconic (1): speech is slowed down and low in intensity reflecting the fish’s man-
ner of floating to the surface. The slowing is most marked in extended duration of the
vowel /o/ in the adverb “slowly.” Pitch steadily rises, peaking on the word “top.”
Vocal iconic (2): speech suddenly increases in tempo and intensity in synchrony with
the stressed syllable of “ate.”
This person’s description demonstrates the multimodal nature of gesture, as a seemingly
single unified expression incorporates iconic manual and vocal gestural elements
with the conventional articulatory gestures of speech. It is additionally interesting to
note that the sudden nature of the fish’s attack, although it is quite apparent in the cor-
responding manual and vocal gestures, is not entirely discernible from just the semantics
of the words. A clue, however, is provided by the grammar of the utterance, in which the
temporally extended “float,” expressed by the progressive aspect, is contrasted to the
punctuated “eat,” expressed in the past tense. Thus, corresponding elements of the fish’s
manner of motion, the temporal contour of the motion, and even perhaps its upward
direction are all manifested synchronously within iconic manual gesture, iconic prosody,
lexical semantics, and even higher-order syntactical structures.
Unlike manual gestures (except when they are integrated with signing), vocal
gestures are, in a sense, parasitic on the phonological form of an utterance.
That is to say, these gestures must be interwoven within the phonological material pro-
vided by the forms of spoken words. How is it that particular segments, during the
online moments of speech production, are modified to accentuate certain iconic quali-
ties? One possibility is that phonological aspects of a word form maintain some latent
potential to take on a quality of iconicity, which may, in some instances become acti-
vated in relation to the contextualized dynamics of an utterance. This idea borrows
from Wilcox’s (2004) theory of cognitive iconicity and Müller’s (2008) notion of activation
as it plays out in the triadic structure of metaphor.
Cognitive iconicity adopts Langacker’s (1987) claim that semantic and phonological
poles (i.e., semantic meaning and phonological form) each reside within semantic space,
which is itself a subset of the full expanse of conceptual space. Wilcox describes that,
“The phonological pole reflects our conceptualizations of pronunciations, which
range from the specific pronunciation of actual words in all their contextual richness
to more schematic conceptions, such as a common phonological shape shared by all
verbs, or a subset of verbs, in a particular language” (Wilcox 2004: 122). Wilcox explains
that cognitive iconicity is not an objective similarity relation between a form and its sig-
nified referent, but rather a constructed relation between two structures in a multidi-
mensional conceptual space. He also notes that metaphor can act as a “worm hole”
through this space, functioning to shorten the distance between the phonological and
semantic poles.
A similar schema exists in the triadic structure of metaphor, which involves two
meaning structures of source and target and a relation between them. Considering a tra-
ditional view distinguishing dead and living metaphors, Müller (2008) proposes a dyna-
mical alternative in which the metaphorical relation between two concepts does not
need to be fully active or fully opaque, but instead can be activated to a greater or lesser
degree with the dynamics of each instantiated usage. Applying this framework to the
triadic structure of cognitive iconicity, it follows that the iconic relation between the
semantic and phonological poles may lie dormant and become more or less activated
dependent on the dynamics of usage. When activated, these iconic relations become
accentuated and take form as vocal gesture.
Consider for example, the saliently extended duration of /o/ in the pronunciation of
the word “slooowly” in the above example (1). In this case, the duration of the /o/ was
particularly activated, observable through its exaggeration. However, for comparison,
consider how one might articulate the word “slowly” in the phrase, “a ssloowly sslither-
inng sssnake.” In this case the /o/ is still articulated with some extended duration, but in
addition, the frication of the /s/ is extended too, in part because of the alliteration, but
surely also in part because of the onomatopoeic hiss that we associate with snakes. Thus one can
see how iconic relations between a phonological form and an aspect of its meaning
might become differentially activated within each dynamical usage.
Vocal gestures seen in examples like (1) above offer a special window into the full
extent to which speech and gesture are integrated together as they manifest from the
same simulative processes. Furthermore, these gestures imply that the embodied repre-
sentations that arise during these simulative processes are profoundly multimodal. Var-
ious concepts, such as those related to speed, manner of movement, size, and verticality,
are spontaneously embodied in the movements of our hands and also in the movements
of our vocal tract, and indeed, in the right context, body parts ranging across much of
our anatomy. The frequent and casual use of iconic vocal gestures in particular suggests
that humans have a special knack for conceiving of their experience in terms of iconic
movements of the vocal tract, often through cross-modal abstractions and metaphor. As
McNeill (1992: 12) puts it, “Gestures are like thoughts themselves,” pointing to the idea
that the embodiment of thoughts through gesture is an essential aspect of the very
nature of how humans think. Vocal gestures may also be thoughts, with conventional
linguistic gestures reflecting conventionalized aspects of embodied thought patterns.

6. Evolutionary perspective on gesture as simulation


We now change gears slightly to consider, from the perspective of the sensorimotor
simulation hypothesis, the evolutionary roots of language and gesture. Our goal
in presenting this evolutionary perspective is to gain insight into the more
fundamental processes that might be involved in sensorimotor simulations and the
bodily activations that arise from them.
If, as we argue, language and gesture are produced and understood through the
bodily movements arising from sensorimotor simulations, then what might be the evo-
lutionary origin of this communicative ability? Can we observe in our great ape cousins
the precursor of representative gesture as sensorimotor simulation? Starting from the
simulation hypothesis, one might begin by making some predictions about the charac-
teristics and qualities of the gestures we would expect to find. Generally one might
expect the most phylogenetically primitive representative gestures to be those that
manifest from the most imaginatively simple of simulations. Thus, for example, the sim-
ulation ought to involve the gesturer’s own perspective and body, and not someone
else’s, let alone the gesturer’s body imagined as something entirely different. In addi-
tion, one would expect the contextual and afforded elements of the simulation to be
largely present in the gesturer’s immediate perceptual experience – the more distant
the element, presumably the more difficult it would be to imagine. In sum, primitive
representative gestures ought to be tightly connected – in form, meaning, and context –
to the presently afforded instrumental and attentive actions that are available to the
gesturer. Our following review of some of the existing reports of spontaneous iconic
gestures by the great apes shows that, by and large, these gestures do in fact tend to
exhibit these sorts of properties (Crawford 1937; Köhler 1925; Savage-Rumbaugh,
Wilkerson, and Bakeman 1977; Tanner, Patterson, and Byrne 2006; Tanner and Byrne
1996).
Kendon (1991), in his examination of the origins of representative action, begins by
directing his reader to the classic work of Köhler (1925) and his observations of the par-
tial enactments of imagined actions by chimpanzees. Kendon points to this work, as we
follow suit, to emphasize the tight relationship between an instrumental action, the
mental imagining of that action, and the partially enacted gestures that manifest during
the imagined act. Almost a century ago, Köhler observed that some chimpanzee gestures
originate from a process that we might in present terms refer to as simulated action. He
describes several instances in which partial actions arise in non-communicative contexts
when the chimpanzee is merely imagining his participation in an action, such as when he
observes another performing a well-rehearsed act.
Köhler captures one interesting example of this in a photograph (Plate IV), in which
a chimpanzee, Sultan, exhibits what Köhler calls a “sympathetic left hand.” The chimp
had earlier mastered the skill of stacking up a series of wooden boxes to acquire a
highly desired banana hanging from the ceiling. In the photograph, Sultan is caught ob-
serving another chimpanzee accomplishing the same task just at the moment when the
second chimp has reached the top and is grabbing for the banana. Sultan is seen staring
intently up at the scene, his left hand reaching into the air. Though we of course cannot
know for sure what Sultan was thinking about, it is certainly plausible to consider that
the chimp was imagining himself up on top of the box grabbing for that banana, just as
he had practiced many times before. Relevant to the simulation hypothesis, notice that
the task was both highly familiar to Sultan as well as fully present perceptually. More-
over, one can only presume that the highly desirable nature of the food served to entice
Sultan’s imagination into action.
Importantly, Köhler also observed that partial actions sometimes arise from
imagined actions within the context of social interaction. He writes:

[A] considerable proportion of all desires is naturally expressed by slight initiation of the
actions which are desired. Thus, one chimpanzee who wishes to be accompanied by
another, gives the latter a nudge, or pulls his hand, looking at him and making the move-
ments of “walking” in the direction desired. One who wishes to receive bananas from
another initiates the movement of snatching or grasping, accompanied by intensely plead-
ing glances and pouts. (…) In all cases their mimetic actions are characteristic enough to be
distinctly understood by their comrades. (Köhler 1925: 307–308)

In each of these circumstances, it appears that the gesturing chimp desires an interactive
outcome and partially enacts a gesture that, if it were carried out to instrumental com-
pletion, would function to bring the outcome into being. Indeed, this ability to imagine
the performance of an instrumental act upon a social interactant, but then to partially
inhibit that act’s performance, appears crucial in the origin of these possible precursors
of representative gesture. And as Köhler notes, comprehension of such partial actions
comes naturally, facilitated by their mimetic resemblance and presumably their
contextual relevance to an afforded instrumental action.
More recent studies have also reported the use of spontaneously produced iconic
gestures by great apes, often occurring in contexts of tactful social engagements in
which one ape is trying to influence the movement and position of an interlocutor (Sav-
age-Rumbaugh, Wilkerson, and Bakeman 1977; Tanner and Byrne 1996). For example,
Savage-Rumbaugh and colleagues documented the gestures used by bonobo chimpanzees
as they coordinated copulatory positions with one another (promiscuous sex is a
common behavior of bonobos). They observed that many of the gestures that immedi-
ately preceded copulatory bouts bore an iconic quality and could generally be placed
into three categories: positioning motions (actual physical, gentle movements to
move the recipient’s body or limbs), touch plus iconic hand motions (limb or body
part is lightly touched and then movement is indicated by an iconic hand motion),
and iconic hand motions (simply indicates via an iconic hand movement, without
touch). Interestingly, this ordering of increasing abstraction correlated negatively
with the gestures’ frequency of occurrence, suggesting that those gestures closest to
instrumental action were the easiest to perform. One could reason that the more ab-
stractly iconic gestures, further removed from the immediate context of instrumental
action, would place a greater load on the imagination.
Another study by Tanner and Byrne (1996) observed the use of iconic gestures
between two captive gorillas at the San Francisco Zoo. In this case, a 13-year-old
adult male Kubie was recorded using a variety of iconic tactile and visual gestures to
encourage play and direct interactive movement with Zura, a 7-year-old female. Of particular
interest here, Tanner and Byrne note how special conditions in the zoo enclosure
made coercion ineffective and propose that these conditions were a motivating force for
the production of the gestures. For instance, a door to an indoor pen was opened wide
enough to allow Zura to fit through but not her larger companion, permitting Zura to
escape if Kubie was too forceful. Additionally, a second, older silverback male was part
of the troop, which meant that Kubie had to be especially charming so as to engage
Zura without drawing the other silverback’s attention. Thus, again, we find that iconic
gestures arise in contexts closely connected with instrumental action, in which one ape
desires to bring about a certain outcome with an interactant, but must exercise restraint
for purposes of social tact.
Evidence shows also that with rich human social interaction and enculturation, apes
develop a markedly expanded capacity to produce more abstracted iconic gestures
(Tanner, Patterson, and Byrne 2006). In many of these cases, the expansion of the imag-
ination and its distinct role within particular gestures is obvious. For example, consider
some of the iconic gestures produced by the language-trained bonobos Kanzi and Mu-
lika in which they “made twisting motions toward containers when they needed help in
opening twist-top lids” and “hitting motions toward nuts they wanted others to crack
for them” (Savage-Rumbaugh, Kelly, Rosa, et al. 1986: 218). Both of these gestures dif-
fer from those described above in how they incorporate the imagined physical manip-
ulation of objects that are not available to immediate tactile experience. Notably
though, the simulated objects are available to visual perception, and as Kanzi and Mulika
perform these gestures, the preposition “toward” in these descriptions indicates that
their visual attention is clearly drawn to these objects.
produce such sophisticated iconic gestures with the same degree of facility if the objects
were located outside of their perceptual purview. (See Köhler 1925 for interesting ex-
amples from the domain of problem solving in which direct perceptual access to the ele-
ments involved in a solution is necessary for a chimp to conceive of the solution.)
Finally, a simulation-based account of gesture has implications for more domain-general
cognitive processes, the evolution of which is typically assumed to be a necessary precondition
for the use of iconic gestures. A traditional view of iconic gestures assumes that their
production and comprehension depend on highly developed cognitive abilities related to
imitation and theory of mind. For example, Tomasello (2008: 203) reasons:

To use an iconic gesture one must first be able to enact actions in simulated form [a more
deliberate notion of simulation than we intend in our usage], outside their normal instrumen-
tal context – which would seem to require skills of imitation, if not pretense. But even more
importantly, to comprehend an iconic action as a communicative gesture, one must first
understand to some degree the Gricean communicative intention; otherwise the recipient
will suppose that the communicator is simply acting bizarrely, trying to run like an antelope
or to dig a hole for real when the context is clearly not appropriate. (Tomasello 2008: 203)
This line of reasoning, in combination with empirical research, has led researchers, such
as Tomasello and his colleagues, to dismiss on a priori grounds the possibility that the
great apes use iconic gestures (Call and Tomasello 2007; Pika 2007; Tomasello 2008).
They argue that the great apes have only minimal abilities to imitate and to share
communicative intentions, and thus they simply cannot use iconic gestures.
However, from the perspective of the simulation hypothesis, the use of iconic ges-
tures, although spontaneous, creative, and socially-minded, does not require an ability
for deliberate imitation and pantomime or hard Gricean social-cognitive reasoning.
As we have seen above, reports of iconic gestures by the great apes describe them as
used, not “outside their normal instrumental context,” but directly within it (Tomasello
2008: 203). Moreover, these iconic gestures are comprehended without necessarily a
reflective understanding of the gesturer’s “communicative intention,” but more directly
through an activated sense of the full action within a context that is rife with expecta-
tion of exactly that sort of action. We suggest that these gestures appear to reflect an
emerging ability to perform increasingly imaginative, sensorimotor simulations and to
modulate their iconic motor activations towards communicative expression. According
to the simulation hypothesis, this capacity is interwoven at the core of the cognitive
skills leading to the origin of our own, dramatically more sophisticated system of
socially-tuned simulations which are foundational in the motivation of our gesture
and language.

7. Conclusion
Our main hypothesis is that experiential simulations of sensorimotor imagery are fun-
damental to the conceptual processes that underlie the use of gesture and language and
the construction of meaning during conversational interaction. According to this view,
one’s ability to interpret meaning during conversation resides largely in the ability to
simulate the thoughts and ideas of one’s interlocutor through the expressive movements
of their speech and gesture. These simulative processes are also involved in the produc-
tion of language and gesture. Articulatory movements, whether of the vocal tract, the
hands, or potentially any other part of the body, are produced by the activations that
arise during a sensorimotor simulation. Indeed, these bodily activations are, in the
sense of Vygotsky’s (1986) notion of a “material carrier,” an essential aspect of the
thought itself. An important implication of this idea is that our embodied, sensorimotor
experience plays a crucial role in the formation of the concepts and the meanings we
construct and express during the online moments of conversation. Critically, these con-
cepts and meanings appear to be interwoven across modalities, and often involve the
creation of schematic and metaphorical cross-modal correspondences.
Although we do not, as yet, have a firm idea of all the constraints on simulation
processes, and the extent to which they create simplified, as opposed to complex, mean-
ings, we suggest that these processes include aspects of full-bodied experiences, and are
critical to understanding the minds of others. People are likely to be quite flexible in the
level of details they create during sensorimotor simulation, depending on their imme-
diate motivations and goals, the social context, the linguistic material to be understood,
and the task. The process of constructing sensorimotor simulations is constrained in much
the same way as other fundamental cognitive operations in the pursuit of meaning. People
will create simulations rich enough to enable them to infer sufficiently relevant
meanings and impressions, while also trying to minimize the cognitive effort needed to
produce meaningful effects. In some cases, the meanings, emotions, and impressions one
infers when understanding a speaker’s utterance will be relatively crude, primarily
because this set of products will be “good enough” for the purposes at hand. At
other times, people may engage in more elaborate, even highly strategic simulation pro-
cesses as they tease out numerous meanings and impressions from an utterance in con-
text, such as when reading novel metaphors in poetry. Interestingly, these more
elaborate instances of communication often seem to be ones that invoke more richly
iconic expression and interpretation.
Finally, there is one important challenge that must be addressed to achieve a more
unified understanding of how sensorimotor simulations figure into the production and
comprehension of language and gesture. This challenge rests in the need to resolve
the qualitative distinctions that are often assumed to distinguish linguistic communica-
tion from gesture and other so-called paralinguistic forms of expression. On the surface,
language appears to be a completely different sort of behavior from gesture, and lan-
guage scholars have long believed that its use depends on specialized cognitive pro-
cesses. More specifically, language is typically characterized as a conventional system
of discrete, arbitrary forms that are strung together by the phonological and syntactic
rules that comprise duality of patterning (e.g., Hockett 1960; Jackendoff 1994; Pinker
1994). In contrast, gesture appears to be almost diametrically opposite. Gestures are
idiosyncratic, iconic, and analog in form; they lack syntactic combinatorial rules and cannot
be analyzed into anything resembling phonological components (McNeill 1992). Ges-
tures seem naturally molded to our thoughts and it is intuitive how they might manifest
directly from the bodily activations of sensorimotor simulations. Language, on the other
hand, is generally thought to be a symbolic code for thought and thus to involve pro-
cesses of encoding/decoding or often “mapping” (e.g., Glenberg and Kaschak 2002)
between thought and linguistic form.
Though on the surface these distinct properties may appear to reflect differences of
quality, a new framework has emerged that instead considers them as a set of continua
(McNeill 1992, 2005). The properties of language and gesture occupy opposing proto-
typical ends of these continua and the intermediate properties of various other forms
of communication, such as pantomimes and emblems, are positioned in between. Build-
ing on this perspective, evidence suggests that the most critical difference between lan-
guage and gesture may relate more to conventionality than to any particular formal
differences intrinsic to language per se. Under this view, the formal properties asso-
ciated with language – duality of patterning, arbitrary symbolism, and categorical
form – may simply be emergent properties of functional constraints on a conventional
communication system.
Crucial to this idea are various empirical observations demonstrating how a func-
tional need to establish conventional communication appears to lead quite naturally
to the properties characteristically associated with language. For example, deaf children
raised without access to a system of sign language naturally create their own linguistic
homesign system (Goldin-Meadow and Feldman 1977). Over a child’s development,
initially idiosyncratic iconic gestures become conventionalized into more discrete
word-like forms, which the child combines together by simple syntactic rules. In fact,
this pattern is so robust that it can be induced within just a few minutes in a laboratory
setting. McNeill (1992) describes a study in which speakers told fairytales to a partner,
but were permitted to use only manual gestures and no words. McNeill notes how
“Within 15 or 20 minutes a system has emerged that includes segmentation, composi-
tionality, a lexicon, a syntax with paradigmatic oppositions, arbitrariness, distinctiveness,
standards of well-formedness, and a high degree of fluency” (McNeill 1992: 66).
Other evidence for this natural pathway between language and gesture comes from
studies documenting the residual iconicity found in sign languages. One study examin-
ing 1,944 signs in Italian Sign Language found that fifty percent of handshape occur-
rences and sixty-seven percent of body locations had clear iconic motivations
(Pietrandrea 2002). Indeed, sign language scholars have traced some of the historical
routes by which conventional signs originate from iconic gesture (Wilcox 2004). Fur-
thermore, the fact that a substantial amount of this iconicity persists even in mature lan-
guages suggests that it continues to play an active role in their ongoing development.
Although it is less transparent, there is also evidence to suggest that vocally-based ico-
nicity similarly persists to some degree in spoken languages (Hinton, Nichols, and
Ohala 1994).
Perhaps the most crucial difference between gesture and language will turn out to be
mostly a matter of convention. (This is not to say, of course, that humans do not have a
special knack for acquiring the conventional; clearly, conventional actions and behav-
ioral routines of all sorts abound in human culture.) As we have seen, this quality of
convention is fluid and, importantly, bidirectional. Gestures can become more linguistic
under certain functional constraints, and likewise, in certain contexts, linguistic forms
can become more like gesture. Such is the case with poetry, both spoken and signed,
and more mundanely, it is the case with prosodic vocal gestures. Given this slippery
distinction, it seems less plausible to view language, in the standard way, as a
symbolic, arbitrary code into which thoughts are encoded, transmitted, and then
decoded back. A more parsimonious account might consider instead that language is
produced and understood in the same way as gesture, with the only significant difference
being that linguistic gestures arise from the conventionalized aspects of simulated sen-
sorimotor imagery. Moreover, it takes only casual observation to witness the completely
ordinary way in which language and gesture are seamlessly integrated with pantomimes
and emblems, as well as music, dance and everything in between. Underlying it all,
we argue, is the ability to interactively engage our minds and bodies in imaginative
simulations of sensorimotor imagery.

8. References
Barsalou, Lawrence W. 2008. Grounded cognition. Annual Review of Psychology 59: 617–645.
Beattie, Geoffrey and Heather Shovelton 2002. What properties of talk are associated with the
generation of spontaneous iconic hand gestures? British Journal of Social Psychology 41:
403–417.
Bolinger, Dwight 1983. Where does intonation belong? Journal of Semantics 2(2): 101–120.
Bolinger, Dwight 1986. Intonation and Its Parts: Melody in Spoken English. Palo Alto, CA: Stan-
ford University Press.
Browman, Catherine and Louis Goldstein 1995. Dynamics and articulatory phonology. In: Timo-
thy van Gelder and Robert F. Port (eds.), Mind as Motion, 175–193. Cambridge: Massachusetts
Institute of Technology Press.
Call, Josep and Michael Tomasello 2007. The Gestural Communication of Apes and Monkeys.
London: Lawrence Erlbaum.
33. Sensorimotor simulation in speaking, gesturing, and understanding 531

Carr, Laurie, Marco Iacoboni, Marie-Charlotte Dubeau, John C. Mazziotta and Gian Luigi Lenzi
2003. Neural mechanisms of empathy in humans: A relay from neural systems for imitation to
limbic areas. Proceedings of the National Academy of Science USA 100: 5497–5502.
Chartrand, Tanya and John Bargh 1999. The chameleon effect: The perception-behavior link and
social interaction. Journal of Personality and Social Psychology 76(6): 893–910.
Clark, Herbert H. and Richard J. Gerrig 1990. Quotations as demonstrations. Language 66(4):
764–805.
Cook, Susan Wagner and Michael K. Tanenhaus 2009. Embodied communication: Speakers’ ges-
tures affect listeners’ actions. Cognition 113(1): 98–104.
Crawford, Meredith P. 1937. The cooperative solving of problems by young chimpanzees. Compar-
ative Psychology Monograph 14(2): 1–88.
Emmorey, Karen 1999. Do signers gesture? In: Lynn Messing and Ruth Campbell (eds.), Gesture,
Speech, and Sign, 133–159. New York: Oxford University Press.
Fowler, Carol A. 1987. Perceivers as realists; talkers too. Journal of Memory and Language 26(5):
574–587.
Galantucci, Bruno, Carol A. Fowler and M. T. Turvey 2006. The motor theory of speech percep-
tion reviewed. Psychonomic Bulletin and Review 13(3): 361–377.
Gibbs, Raymond W., Jr. 2006a. Embodiment and Cognitive Science. New York: Cambridge University Press.
Gibbs, Raymond W., Jr. 2006b. Metaphor interpretation as embodied simulation. Mind and Language 21(3): 434–458.
Gibbs, Raymond W., Jr. (ed.) 2008. Cambridge Handbook of Metaphor and Thought. New York:
Cambridge University Press.
Gibbs, Raymond W., Jr., Jessica J. Gould and Michael Andric 2006. Imagining metaphorical ac-
tions: Embodied simulations make the impossible plausible. Imagination, Cognition, and Per-
sonality 25(3): 221–238.
Gibson, James Jerome 1979. An Ecological Approach to Visual Perception. Boston: Houghton
Mifflin.
Glenberg, Arthur M. and Michael P. Kaschak 2002. Grounding language in action. Psychonomic
Bulletin and Review 9(3): 558–565.
Goldin-Meadow, Susan and Heidi Feldman 1977. The development of language-like communica-
tion without a language model. Science 197(4301): 401–403.
Hadar, Uri, Aaron Burstein, Robert M. Krauss and Nachum Soroker 1998. Ideational gestures
and speech: A neurolinguistic investigation. Language and Cognitive Processes 13: 59–76.
Havas, David A., Arthur M. Glenberg and Mike Rinck 2007. Emotion simulation during language
comprehension. Psychonomic Bulletin and Review 14(3): 436–441.
Hinton, Leanne, Johanna Nichols and John J. Ohala 1994. Sound Symbolism. Cambridge: Cam-
bridge University Press.
Hockett, Charles F. 1960. The origin of speech. Scientific American 203: 89–97.
Hostetter, Autumn B. and Martha W. Alibali 2007. Raise your hand if you’re spatial: Relations
between verbal and spatial skills and representational gesture production. Gesture 7(1): 73–95.
Hostetter, Autumn B. and Martha W. Alibali 2008. Visible embodiment: Gestures as simulated
action. Psychonomic Bulletin and Review 15(3): 495–514.
Hostetter, Autumn B. and William D. Hopkins 2002. The effect of thought structure on the pro-
duction of lexical movements. Brain and Language 82(1): 22–29.
Jackendoff, Ray 1994. Patterns in the Mind: Language and Human Nature. New York: Basic Books.
Kendon, Adam 1991. Some considerations for a theory of language origins. Man 26(2): 602–619.
Klatzky, Roberta L., James W. Pellegrino, Brian P. McCloskey and Sally Doherty 1989. Can you
squeeze a tomato? The role of motor representations in semantic sensibility judgments. Journal
of Memory and Language 28: 56–77.
Köhler, Wolfgang 1925. The Mentality of Apes. London: Routledge and Kegan Paul.
532 IV. Contemporary approaches

Lakoff, George and Mark Johnson 1999. Philosophy in the Flesh. The Embodied Mind and Its
Challenge to Western Thought. New York: Basic Books.
Lane, Harlan 1984. Where the Mind Hears: A History of the Deaf. New York: Random House.
Langacker, Ronald W. 1987. Foundations of Cognitive Grammar, Vol. 1: Theoretical Foundations.
Stanford, CA: Stanford University Press.
Lausberg, Hedda and Sotaro Kita 2003. The content of the message influences the hand choice in
co-speech gestures and in gesturing without speaking. Brain and Language 86(1): 57–69.
Lavergne, Joanne and Doreen Kimura 1987. Hand movement asymmetry during speech: No effect
of speaking topic. Neuropsychologia 25: 689–693.
Liddell, Scott K. 2003. Sources of meaning in ASL classifier predicates. In: Karen Emmorey (ed.), Per-
spectives on Classifier Constructions in Sign Languages, 199–219. Mahwah, NJ: Lawrence Erlbaum.
McNeill, David 1992. Hand and Mind. What Gestures Reveal about Thought. Chicago: University
of Chicago Press.
McNeill, David 2005. Gesture and Thought. Chicago: University of Chicago Press.
Müller, Cornelia 2008. Metaphors Dead and Alive, Sleeping and Waking. Chicago: University of
Chicago Press.
Niedenthal, Paula M., Markus Brauer, Jamin B. Halberstadt and Ase H. Innes-Ker 2001. When
did her smile drop? Contrast effects in the influence of emotional state on the detection of
change in emotional expression. Cognition and Emotion 15(6): 853–864.
Noë, Alva and J. Kevin O’Regan 2002. On the brain-basis of visual consciousness: A sensorimotor
account. In: Alva Noë and Evan Thompson (eds.), Vision and Mind, 367–398. Cambridge: Mas-
sachusetts Institute of Technology Press.
Ohala, John J. 1994. The frequency code underlies the sound-symbolic use of voice pitch. In:
Leanne Hinton, Johanna Nichols and John J. Ohala (eds.), Sound Symbolism, 325–347. Cam-
bridge: Cambridge University Press.
Okrent, Arika 2002. A modality-free notion of gesture and how it can help us with the morpheme
vs. gesture question in sign language linguistics. In: Richard P. Meier, Kearsy Cormier and
David Quinto-Pozos (eds.), Modality and Structure in Signed and Spoken Language, 175–198.
Cambridge: Cambridge University Press.
Perlman, Marcus in press. Talking fast: The use of speech rate as iconic gesture. In: Fey Parrill,
Vera Tobin and Mark Turner (eds.), Meaning, Form, and Body. Stanford, CA: Center for the
Study of Language and Information Publications.
Pietrandrea, Paola 2002. Iconicity and arbitrariness in Italian Sign Language. Sign Language Stu-
dies 2(3): 296–321.
Pika, Simone 2007. Gestures in subadult gorillas (Gorilla gorilla). In: Josep Call and Michael Tomasello
(eds.), The Gestural Communication of Apes and Monkeys, 41–67. New York: Lawrence Erlbaum.
Pinker, Steven 1994. The Language Instinct. New York: William Morrow.
Rauscher, Frances H., Robert M. Krauss and Yihsiu Chen 1996. Gesture, speech and lexical
access: The role of lexical movements in speech production. Psychological Science 7(4): 226–231.
Richardson, Daniel and Teenie Matlock 2007. The integration of figurative language and static de-
pictions: An eye movement study of fictive motion. Cognition 102: 129–138.
Rimé, Bernard, Loris Schiaratura, Michel Hupet and Anne Ghysselinckx 1984. Effects of rela-
tive immobilization on the speaker’s nonverbal behavior and on the dialogue imagery level.
Motivation and Emotion 8(4): 311–325.
Savage-Rumbaugh, E. Sue, Kelly McDonald, Rosa A. Sevcik, William D. Hopkins and Elizabeth
Rupert 1986. Spontaneous symbol acquisition and communicative use by pygmy chimpanzees
(Pan paniscus). Journal of Experimental Psychology: General 115(3): 211–235.
Savage-Rumbaugh, E. Sue, Beverly J. Wilkerson and Roger Bakeman 1977. Spontaneous gestural
communication among conspecifics in the pygmy chimpanzee (Pan paniscus). In: Geoffrey H.
Bourne (ed.), Progress in Ape Research, 97–116. New York: Academic Press.
Shintel, Hadas and Howard C. Nusbaum 2007. The sound of motion in spoken language: Visual
information conveyed by acoustic properties of speech. Cognition 105(3): 681–690.
Shintel, Hadas, Howard C. Nusbaum and Arika Okrent 2006. Analog acoustic expression in
speech communication. Journal of Memory and Language 55: 167–177.
Stanfield, Robert A. and Rolf A. Zwaan 2001. The effect of implied orientation derived from ver-
bal context on picture recognition. Psychological Science 12(2): 153–156.
Tanner, Joanna E. and Richard W. Byrne 1996. Representation of action through iconic gesture in
a captive lowland gorilla. Current Anthropology 37(1): 162–173.
Tanner, Joanna E., Francine G. Patterson and Richard W. Byrne 2006. The development of spon-
taneous gestures in zoo-living gorillas and sign-taught gorillas: From action and location to
object representation. Journal of Developmental Processes 1: 69–102.
Tomasello, Michael 2008. Origins of Human Communication. Cambridge: Massachusetts Institute
of Technology Press.
Vallee-Tourangeau, Frederic, Susan H. Anthony and Neville G. Austin 1998. Strategies for gener-
ating multiple instances of common and ad hoc categories. Memory 6(5): 555–592.
Vygotsky, Lev S. 1986. Thought and Language. Edited and translated by Eugenia Hanfmann and
Gertrude Vakar, revised and edited by Alex Kozulin. Cambridge: Massachusetts Institute of
Technology Press.
Wilcox, Sherman 2004. Cognitive iconicity: Conceptual spaces, meaning, and gesture in signed lan-
guages. Cognitive Linguistics 15(2): 119–147.
Wilson, Nicole L. and Raymond W. Gibbs 2007. Real and imagined body movement primes met-
aphor comprehension. Cognitive Science 31: 721–731.
Wu, Ling and Lawrence Barsalou 2001. Grounding concepts in perceptual simulation: I. Evi-
dence from property generation. Unpublished manuscript.
Zwaan, Rolf A. 2004. The immersed experiencer: Toward an embodied theory of language com-
prehension. In: Brian H. Ross (ed.), The Psychology of Learning and Motivation, 35–62. New
York: Academic Press.
Zwaan, Rolf A., Robert A. Stanfield and Richard H. Yaxley 2002. Language comprehenders men-
tally represent the shape of objects. Psychological Science 13(2): 168–171.

Marcus Perlman, Santa Cruz, CA (USA)


Raymond W. Gibbs Jr., Santa Cruz, CA (USA)

34. Levels of embodiment and communication


1. Introduction
2. Embodiment and the semiotic hierarchy
3. Communication in the perspective of cognitive semiotics
4. Conclusions
5. References

Abstract
The notion of “embodiment” is popular and useful for the cognitive sciences, but it needs
to be made more precise. This chapter describes four different levels of embodiment,
from the perspective of a cognitive semiotics, focusing on meaning. On the most basic
level, the biological body, meaning is co-extensional with life. The lived body, however,
concerns the experiencing self. The significational body engages in sign use and the ex-
tended body concerns the normative level involved in shared sign systems. Communication
thus has a different nature on each of these levels of embodiment. On the level of the bio-
logical body, communication consists of reactions to bodily states, such as cries. For the ex-
periencing self, communication involves intentional movements. The third level of
embodiment brings in communicative actions such as gestures (including vocal ones) in
which a sign relationship is intentionally communicated. The fourth level involves system-
atic use of external representations, as in signed or spoken (and written) languages. This
framework provides a coherent, cognitive semiotic approach for using, understanding,
and researching the notions of embodiment and communication.

1. Introduction
After being dominated for a long time by formal and computational approaches, there
is currently a growing consensus in the cognitive sciences that “the body shapes the
mind” (Gallagher 2005), as well as human communication. However, behind the
term “embodiment” lie many different concepts (see Krois, Rosengren, Steidele, et al.
2007; Ziemke, Zlatev and Frank 2007). Focusing on the embodiment of meaning, this
chapter outlines four different kinds of embodiment: biological, phenomenological,
significational (sign-based) and extended. Since the notion of communication, which has a
considerably older ancestry, is even more ambiguous, I propose a general definition and
four corresponding levels, illustrating these with examples from the comparative and
developmental literatures.
One conclusion is that a purely biological view of embodiment can only account for
the lowest level of communication. By complementing the biological perspective of the
body with a phenomenological one (Husserl [1952] 1989; Merleau-Ponty 1962), focus-
ing on “the lived body” (Leib), we can accommodate crucial dimensions of communi-
cation such as agency and intention. Furthermore, a phenomenological semiotics can
provide a sign concept that is not “Cartesian” or “solipsist”, but rather grounded in
the acts of the lived body, and furthermore on the roles of symbolic artifacts and exter-
nal representations (Donald 1991; Sinha 2009; Sonesson 2007a, 2007b, 2009). The pro-
posal is meant to counteract a danger inherent in focusing too much on the “highest”
level of meaning and communication: a devaluing of the foundational role of the
real, living and lived human body in bringing forth a human, and humane, world.

2. Embodiment and the semiotic hierarchy


2.1. What is “embodiment”?
As often pointed out (e.g. Wallace, Ross, Davies, et al. 2007; Ziemke, Zlatev and Frank
2007), the so-called classical information-processing paradigm within the cognitive
sciences operated with a disembodied notion of the “mind/brain”, on the metaphor
of the digital computer (e.g. Chomsky 1965; Fodor 1981; Jackendoff 1983; Pinker
1994). In reaction to this conception and its many problems, the notion of embodiment
has been something of a rallying call for those looking, with good reasons, for a “radical”
alternative:

Radical embodiment … [is] radically altering the subject matter and theoretical framework
of cognitive science. (Clark 1999: 22)
We propose a radically different view. We will argue that conceptual knowledge is
embodied, that is, it is mapped within the sensory-motor system (Gallese and Lakoff
2005: 456).
Embodied cognition offers a radical shift in explanations of the human mind – a
Copernican revolution in cognitive science – you might say, which emphasizes the way cog-
nition is shaped by the body and its sensorimotor interaction with the world (Lindblom and
Ziemke 2007: 129–130).

Despite initial optimism (see Varela, Thompson and Rosch 1991), however, a widely
accepted, coherent interdisciplinary theoretical framework for the study of human
meaning, communication and thinking based on the notion of embodiment has been
lacking (see Zlatev 2007). One important reason behind this is the ambiguity of the
term “embodiment” itself. The cognitive psychologist Wilson (2002) writes of “six
views of embodied cognition” and the cognitive scientist Ziemke (2003) of “six differ-
ent notions of embodiment”, with the two sets only partially overlapping. Rohrer (2007:
348) states: “By my latest count the term “embodiment” can be used in at least twelve
different important senses with respect to our cognition”, which are different from the
“senses” highlighted by Wilson and Ziemke. Such non-convergence is hardly surprising,
since for a consistent classification one first needs to ask: what (X) is it that is (claimed
to be) embodied in what (Y)? One can find at least the following terms substituting for
the variables in this schema:

X = mind, language, meaning, concepts, thinking, the self…


Y = the biological body, robot bodies, neural networks, (sensorimotor areas in) the brain,
sensorimotor interactions, image schemas, the cognitive unconscious, the phenomenal
body, artifacts, practices, signs…

Related to the problem of ambiguity is that of overextension: Are all aspects of the
(human) mind embodied, and if so, are they done so in the same way? Some strong
statements have been made to this effect:

Image schematic structure is the basis for our understanding of all aspects of our percep-
tion and motor activities. […] Conceptual Metaphor Theory proposes that all abstract con-
ceptualization works via conceptual metaphor, conceptual metonymy, and a few other
principles of imaginative extension. (Johnson and Rohrer 2007: 33, 38, my emphasis)

It is not surprising that such claims have been met with skepticism (Haser 2005; Sinha
2009; Zlatev 2007).

2.2. The embodiment of meaning


In my own research (Zlatev 1997, 2003, 2005, 2007, 2009), as well as that of some others
(Emmeche 2007; Sonesson 2007a, 2007b) a first step to making the concept of embo-
diment manageable has been to narrow down X to meaning. Given the philosophical
problems that have surrounded meaning, this move may at first seem to be rather unpro-
ductive. However, with the rapprochement of ideas from semiotics, theoretical biology,
phenomenology and enactive cognitive science (Brier 2008; Gallagher 2005; Gallagher
and Zahavi 2008; Thompson 2007; Stjernfelt 2007), there are grounds for optimism that
a “unified bio-cultural theory of meaning” (Zlatev 2003) may indeed be possible.
Attempting to synthesize work from cybernetics to linguistics, Zlatev defined meaning
as “the relationship between an organism and its environment, determined by […]
value” (Zlatev 2003: 258). More recently, this idea has been generalized into the frame-
work of the Semiotic Hierarchy (Zlatev 2009). In brief: meaning exists if and only if
there is: (a) a subject S, (b) a subject-internal value system V and (c) a world W in
which the subject is embedded. A particular phenomenon (p) will have a given meaning
M for S, according to the formulation given in (i).

(i) M (p, S) = W(p) * V(p, S)

To rephrase, the meaning of a given phenomenon, for a given subject, will be deter-
mined by the type of world in which the phenomenon appears and the value of the phe-
nomenon for the subject. If either p is not within W or its value for S is nil, p will be
meaningless for S. Depending on the nature of (a), (b), and (c), four levels of meaning
can be defined, summarized in Tab. 34.1, and explained in the remainder of this section.

Tab. 34.1: Summary of the four levels of meaning of The Semiotic Hierarchy (Zlatev 2009)
Level Subject World Value system
1 Organism Umwelt Biological
2 Minimal self Natural Lebenswelt Phenomenal
3 Enculturated self Cultural Lebenswelt Significational (Sign-based)
4 Linguistic self Universe of discourse Normative
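Formulation (i) can be glossed as a simple predicate: a phenomenon is meaningful for a subject only if it appears in that subject's world and has a non-zero value for that subject. A minimal sketch, using von Uexküll's tick as the example (all names and numeric values here are ours, purely illustrative):

```python
from dataclasses import dataclass
from typing import Callable, Set

# Illustrative gloss of (i): M(p, S) = W(p) * V(p, S).
# W(p) is 1 if the phenomenon appears in the subject's world, else 0;
# V(p, S) is the subject-internal value of p for S.

@dataclass
class Subject:
    world: Set[str]                # W: the subject's Umwelt/Lebenswelt
    value: Callable[[str], float]  # V: subject-internal value system

def meaning(p: str, s: Subject) -> float:
    w = 1.0 if p in s.world else 0.0  # W(p)
    return w * s.value(p)             # M(p, S) = W(p) * V(p, S)

# von Uexküll's tick: butyric acid is highly meaningful; a melody is
# meaningless because it does not occur in the tick's Umwelt at all.
tick = Subject(
    world={"butyric_acid", "warmth"},
    value=lambda p: {"butyric_acid": 1.0, "warmth": 0.8}.get(p, 0.0),
)
assert meaning("butyric_acid", tick) == 1.0
assert meaning("melody", tick) == 0.0
```

Either factor being zero renders p meaningless for S, exactly as the gloss following (i) states.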

2.2.1. The biological body


As proposed originally by von Uexküll ([1940] 1982) the most fundamental subject S is
a biological organism, even of the simplest kind. Its world W is that of the Umwelt – the
part of the larger environment that is picked out by a value system V, which is geared,
either innately or through learning, for the organism’s survival and reproduction.
Only organisms, and not artificially created machines, have a set of closely related prop-
erties: autopoiesis (Maturana and Varela 1980), identity-world polarity (Thompson
2007), and an intrinsic value system (Edelman 1992), serving their own interests, rather
than optimizing some externally defined function. No artificial system has these proper-
ties, and hence the only kind of body able to give rise to meaning is the living, biological
body: meaning is co-extensional with life (Zlatev 2003).

2.2.2. The lived body


However, the subject of biology, the organism, is not necessarily an experiencing sub-
ject. The living body is not identical with the lived body (Husserl’s Leib) (see Husserl
[1952] 1989; Merleau-Ponty 1962). The relationship between the organism-subject
and the phenomenon (e.g. the “smell” of the animal picked up by the tick in the famous
example of von Uexküll), is intrinsically meaningful for the tick, but this is not sufficient
to conclude that it has subjective experience (pace the claims of von Uexküll). At the
same time, the proto-intentional relationship inherent in the organism-Umwelt polarity,
i.e. the biological directedness of the organism-subject toward phenomena which it “ex-
periences” (due to its intrinsic value system) as meaningful, even if non-phenomenally,
is a plausible ground for the emergence of consciousness (as primitive sentience) in
evolution (see Popper 1962, 1992; Thompson 2007; Zlatev 2003, 2009).
On the level of phenomenal value/meaning, there is not only a biologically meaning-
ful Umwelt, but a phenomenal Lebenswelt in which the subject is immersed. The subject
S is here a “minimal self” (Gallagher 2005), with (at least) affective and perceptual con-
sciousness, which is intentional (i.e. directed) towards whatever is perceived. The other
sense of “intention”, related to agency and volition, involves having a body image,
unifying (at least) haptic, proprioceptive and visual experience of one’s own body (Gal-
lagher 2005), giving higher animals (i.e. at least mammals) and human infants a “sense
of self” (Stern [1985] 2000), capable of acting purposefully on their surroundings.

2.2.3. The significational body


Non-human animals (without special “enculturation” in a human culture and pro-
grammes of reinforcement) and infants younger than 9 months are, however, not capable of
using and interpreting signs, understood as triadic structures, involving representamen,
object, and interpreter, following Peirce (1958: 2.229, my emphasis): “a sign, or repre-
sentamen, is something which stands to somebody for something in some respect or
capacity. […] The sign stands for something, its object.” Thus, a subject (S) uses R (re-
presentamen) to signify O (object), if and only if (1) R and O are connected: in perceiving
or enacting R, S conceives of O, (2) the relation is asymmetrical (R → O, not O → R) and
(3) R and O are differentiated: R is qualitatively different from O for S.
This can be illustrated clearly in the case of pictorial signs. Investigating apes’
understanding of pictures, Persson (2008) distinguishes between (a) sur-
face mode, in which only the marks of lines and colour are perceived (Bildding, in Hus-
serl’s terminology), (b) reality mode, in which the picture is confused with the object it
represents (Sujet), e.g. a picture of a banana treated as an “odd banana”, and (c) pictorial
mode, in which the Bildding is seen as an expression with a certain kind of content (Bil-
dobjekt) which can, but need not represent a particular object (Sujet). Only in the case
of (c) does the subject see the picture as a sign. The sign concept of such a phenomen-
ological semiotics (Sonesson 1989, 2007a, 2009) is a generalization of this, since it can
involve other semiotic resources such as gestures, symbolic play, pantomime, theatre –
and at least the content words of language.
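The three defining conditions on the sign relation can be restated as a checklist. A toy sketch (the naming is ours, not a formalization from the chapter), using Persson's picture modes as test cases:

```python
from dataclasses import dataclass

# Toy checklist for the triadic sign relation: a subject's pairing of
# representamen R and object O counts as a sign only if all three
# conditions from the text hold.
@dataclass
class Candidate:
    connected: bool       # (1) in perceiving/enacting R, S conceives of O
    asymmetrical: bool    # (2) R -> O, but not O -> R
    differentiated: bool  # (3) R is qualitatively different from O for S

def is_sign(c: Candidate) -> bool:
    return c.connected and c.asymmetrical and c.differentiated

# Persson's "reality mode": the picture is confused with the banana it
# depicts, so R and O are not differentiated -- not yet a sign.
reality_mode = Candidate(connected=True, asymmetrical=True,
                         differentiated=False)
# "Pictorial mode": the Bildding is seen as an expression with a content.
pictorial_mode = Candidate(connected=True, asymmetrical=True,
                           differentiated=True)

assert not is_sign(reality_mode)
assert is_sign(pictorial_mode)
```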
This concept of the sign is phenomenological since it is the consciousness of the sub-
ject (S) that makes both the differentiation and the connection possible (Zlatev 2008b,
2009). While it is logically possible for signs to emerge individually, outside of acts
of communication, as envisaged by Piaget ([1945] 1962), they are typically learned
socially, through imitation and communication. They become stable, and eventually
conventional (i.e. commonly known) in a “symbolic” culture. Thus, the subject S of
this level is an enculturated subject, and the world W is not only the directly perceived
and acted upon natural Lebenswelt, but also a culturally mediated one, not replacing,
but augmenting the first (Zahavi 2003). Note, however, that not all aspects of culture
imply signification (sign use), see Sonesson (2009). Still, signs are the most salient
part of human cultures, and thus, a human cultural Lebenswelt is essentially based on
sign use.
The role of the biological and lived bodies of the previous two levels appears to be
fundamental for (the emergence of) signification (sign use) in two ways: through (whole
body) imitation (Piaget 1962) and the most basic forms of sign use in ontogeny and
possibly in evolution: iconic and deictic gestures, both being aspects of bodily mimesis
(Donald 1991, 2001; Zlatev 2007, 2008a, 2008c).

2.2.4. The extended body


At the highest level of meaning (and latest in evolution/history and ontogeny), the Life-
world of subjects as “linguistic selves” is extended to include not only the pre-sign
meanings (e.g. those of direct perception) and pre-linguistic signs (e.g. mimetic rituals),
but all those denizens of Popper’s (1962, 1992) “world 3”: cultural beliefs, myths, scien-
tific theories, political ideologies, novels, poems, internet forums, blogs etc. which are
made possible by language. This can be called a “universe of discourse” (Sinha
2004). A key aspect of meaning on this level, absent (in any fully realized way) earlier
is normativity: the meanings expressed by language and its many derivative forms are
communicated in ways that obey public, commonly known criteria of correctness, or
“rules” (Itkonen 1978, 2003, 2008; Wittgenstein 1953; Zlatev 2008a).
But in which way does meaning on this level relate to embodiment? Unlike the self-
evident role of the body (biological, lived/imagined and significational/expressive) on
the previous three levels, with the ascent of language, and especially external represen-
tations such as notations, pictures and diagrams, the role of the human body here is less
obvious. Thus, in one sense, one could argue that meaning at this level becomes “disem-
bodied”. But we could also describe this as a matter of “extended embodiment… as-
pects and features of the experientially or ecologically significant, non-corporeal
world” (Sinha and Jensen de López 2000: 24). Normativity is a property not only of lan-
guage, but of all cultural artifacts (from chairs to bank notes), with their “status func-
tions” (Searle 1995). Some of these meanings may be analyzed in terms of “cultural
affordances” (Sonesson 2007a), but others – most clearly again notations and diagrams –
are (systems of) signs, where the sign vehicles (representamina) have “gained a body of
their own” (Sonesson 2007b). Thus, an exclusive focus on language and the “universe
of discourse” for the highest level of the Semiotic Hierarchy might be misleading,
and we could use the term “extended body” to stand for all those modes of meaning
and communication that both transcend the limits of the human body, and link bodily
experience to the wider world of culture, in a global “semiosphere” (Lotman 1990).

3. Communication in the perspective of cognitive semiotics


3.1. What is “communication”?
Dance and Larson (1976) list 126 different definitions answering this question. In an
attempt to “clarify this muddy concept by outlining a number of basic elements used
to distinguish communication” (Littlejohn 1999: 6), Dance has singled out three dimen-
sions according to which concepts of communication differ, as discussed in a compre-
hensive overview (Littlejohn 1999), from which the definitions (ii–vii) are taken
(Dance 1970: 6–7).

(ii) Communication is the process that links discontinuous parts of the living world to
one another. (Ruesch 1957: 462)
(iii) The means of sending military messages, orders, etc. as by telephone, telegraph, radio,
couriers. (The American College Dictionary. New York: Random House. 1964: 224)
(iv) Those situations in which a source transmits a message to a receiver with conscious
intent to affect the latter’s behaviors. (Miller 1966: 92)
(v) It is the process that makes common to two or several what was the monopoly of
one. (Gode 1959: 5)
(vi) Communication is the verbal interchange of a thought or idea. (Hoben 1954: 77)
(vii) Communication is the transmission of information. (Berelson and Steiner 1964:
254)

The first dimension is generality, with (ii) being (arguably) much too general, and
(iii) much too concrete. The second dimension is intentionality, with (iv) requiring “con-
scious intent”, and (v) not. The third concerns whether definitions presume success of
communication, or not. The definition in (vi) does so (along with the over-restrictive
requirement that interchange be “verbal”), while (vii) focuses on the “transmission”,
but does not require that the message is successfully received or understood.
Littlejohn (1999: 7) provides an informative division of “theories of communication”
into five major types: (a) structural-functional theories deriving from system theory,
semiotics, linguistics and discourse studies, (b) cognitive and behavioral theories from
the cognitive and biological sciences, (c) interactionist theories from ethnomethodology
and related forms of social studies, (d) interpretive theories from hermeneutics and phe-
nomenology which “celebrate subjectivism or the preeminence of individual experi-
ence” (Littlejohn 1999: 15) and (e) critical theories, often based on Marxism, which
“focus on issues of inequality and oppression” (Littlejohn 1999: 15). Most importantly,
Littlejohn points out the different theories’ strengths and weaknesses, and shows that
these camps focus on complementary aspects of communication: (a) and (c) on the
social-cultural dimension, with (a) zeroing in on structures, while (c) on processes.
Both (b) and (d) focus on the individual, but from different perspectives: (b) from
the third-person perspective of “objective” observation, while (d) from the first-person
perspective, typical for the humanities. Finally, while (a–d) all focus on understanding
(and possibly explaining) communication, (e) attempts further to use such knowledge
in order to change communicative practices and structures for the better. It is hard
not to agree with Littlejohn’s conclusion: “These genres are more than theory types.
They also embody philosophical commitments and values and reflect the kind of
work that different theorists believe is important” (Littlejohn 1999: 16).
Given this diversity, it may appear naïve to propose a “unified theory of communi-
cation”. Still it is possible to apply the synthetic theory of meaning outlined in the pre-
vious section in order to distinguish levels of communication, corresponding to the
levels of meaning and embodiment. The ambitions of the emerging paradigm of cogni-
tive semiotics involve precisely the combination of the social-cultural and the individual
approaches, the scientific “third-person” and the experiential “first-person” perspec-
tives, as expressed on the home site of the journal with the same name:

The first of its kind, Cognitive Semiotics is a multidisciplinary journal devoted to high qual-
ity research, integrating methods and theories developed in the disciplines of cognitive
science with methods and theories developed in semiotics and the humanities, with the
ultimate aim of providing new insights into the realm of human signification and its
manifestation in cultural practices. (www.cognitivesemiotics.com)

Such a multidisciplinary approach goes against ideological borders that leave us
with a one-sided (or else incoherent) world-view concerning phenomena lying at
the very core of what defines us as human beings: meaning and communication.
Thus, we may propose a definition of communication that lies in an intermediary
position with respect to Dance’s (1970) three dimensions and Littlejohn’s five
theory types:

(viii) Communication is the transmission of meanings between two or more subjects by
means of different (primarily bodily) expressions.

With respect to generality, (viii) is clearly less abstract than (ii) by requiring that the
communicating entities be subjects, rather than “parts of the living world” such as neurons
or hormones. At the same time it is clearly general, rather than domain-specific
like (iii). With respect to “intentionality” it is non-committal, allowing subdivision
into intended and non-intended transmission of meanings – in two different ways to
be explicated below. As for “success”, it uses the (often criticized) notion of transmission,
as in (vi), but unlike (vi) it does not focus solely on the “sender”, but on both parties
(“between”). Also unlike (vi), it does not concern only verbal meaning. At the same
time, it does not require in all cases that the sender’s meaning (rather than “information”)
be identical with that of the receiver, as in (vii), thus allowing for individual
interpretation and collective negotiation.

3.2. Levels of communication


The potentially most problematic term in the definition in (viii) is that of “meanings”,
but in line with the general meaning theory outlined in Section 2, this is neither used in
a general, vague sense, nor is it specific to a particular level or type of meaning, e.g. “verbal”.
In fact, meaning as defined in the Semiotic Hierarchy is a wider concept than communication,
since it also involves phenomena in the world, especially in the first two
levels of Umwelt (the meaningful environment) and Lebenswelt (the world accessible
to consciousness) that are not produced by another subject. For communication in
the sense of (viii), the phenomenon seen (or understood) as meaningful by one subject
is always produced by another subject, without or with volition (the distinction between
levels 1 and 2), without or with signification (level 2 vs. level 3), without or with normative
value (level 3 vs. level 4).
These levels of communication correspond to those of the Semiotic Hierarchy, and
the levels of embodiment described in Section 2. Furthermore, cutting across these
levels, divisions can be made, depending on the kind of “materiality” of the commu-
nicative signals and the perceptual modalities of their perception. Tab. 34.2 sum-
marizes this taxonomy, with categories of communicative signals to be explained in
what follows.
34. Levels of embodiment and communication 541

Tab. 34.2: Levels of communication, corresponding to the four levels of meaning of the Semiotic
Hierarchy, and the levels of embodiment (see Section 2), with categories of communicative signals
from the different communicative modalities, or “channels”

Level   Subject             Embodiment           Bodily-Visible/Haptic    Vocal-Audible       Material-Visible/Audible
1       Organism            Biological           Bodily reactions         Cries               Traces
2       Minimal self        Phenomenological     Intention movements,     Directed calls      Marks
                                                 attention getters
3       Enculturated self   Significational      Gesture, pantomime       “Vocal gestures”    Early picture comprehension
4       Linguistic self     Normative/Extended   Signed language          Spoken language     Writing, external representations

3.2.1. Bodily reactions, cries and traces


On the lowest level of the hierarchy, behaviors of a given organism affect the behavior
of another, and thus serve as communicative signals, but are produced automatically, as
part of processes of bodily regulation. Through evolutionary processes that are fairly
well understood (Hauser 1996), these behaviors become selected not only for their reg-
ulative function but also for the way they affect other animals, and thus become
communicative without any (necessary) mediation of awareness.
In the “bodily-visible” channel, piloerection (hair-raising) in mammals may be taken
as an example of a completely involuntary bodily reaction. It makes the animal appear
larger in size and thus more dangerous to an intruder. Many animal cries, such as dogs’
barking, have an analogous role in the vocal-audible modality. It has been argued that
barking is the result of the tension dogs feel when placed in a potentially threatening
situation from which they cannot, or because of conflicting motivations will not, flee – a
reaction that played a key role in their domestication by our ancestors over the past
10,000 years (Lord, Feinstein and Coppinger 2009). Finally, “extra-bodily” non-
volitional communicative signals such as urination – in Tab. 34.2 called traces – are
used by a variety of territorial animals, including certain fish (Almeida, Miranda,
Frade, et al. 2005) as biological “status” signals to competitors and potential mates.
What is common to these three types of communicative signals is that they serve as
“symptoms” of the biological state of the “sender”, honestly (e.g. sexual status) or not
(e.g. size), and are in principle no different for the “receiver” from other aspects of the
Umwelt. However, with the possible exception of the chemical communication of fish in the
third example, these communicative signals, while produced completely “unconsciously”,
i.e. non-voluntarily, involve (almost certainly) phenomenal experience. Thus, they would
seem to involve not only basic, Umwelt-level meaning, but also a basic, pre-cultural and
pre-significational Lebenswelt. This is indicative of the difficulties of separating the levels
in actual cases, even though the distinctions can be maintained analytically.

3.2.2. Intention movements, attention getters, directed calls and marks


The communicative signals of the next level have been most extensively studied among
the non-human primates, and especially the great apes: animals for which the presence
of conscious experience and purposive action can hardly be in doubt (see Beshkar 2008;
Zlatev 2009). Furthermore, the signals discussed here are produced with the purpose of
influencing the behavior of conspecifics, thus amounting to the definition given in (4).
However, while being both “intentional” (i.e. volitional) and “communicative” this
does not amount to a strong notion of intentional communication (Grice 1989), which
requires a higher-level intention: not only to influence the behavior of the receiver,
but that the receiver understands the sender’s intended meaning, a form of third-level
mentality: “I wish that you understand that I mean X in producing Y” (Zlatev
2008a). This implies at least some mastery of signification, which would bring us to
the next level of the hierarchy (Section 3.2.3).
While all species of great apes have been shown capable of signification given special
tutoring and human enculturation (chimpanzees and bonobos: Savage-Rumbaugh 1998;
gorillas: Patterson 1980; orangutans: Miles 1990), their spontaneous communicative
signals are not true signs and do not amount to “intentional communication” as defined
above (see Deacon 1997; Tomasello 2008), though that claim has been contested
(e.g. Savage-Rumbaugh 1998).
In the bodily-visual modality, there has been considerable recent interest in ape ges-
tures (Call and Tomasello 2007; Pika 2008), and considerable individual and intra-
species group variation, implying learning, has been shown. Leavens and colleagues
(e.g. Leavens, Hopkins and Bard 2008) have documented the widespread presence of
spontaneous “pointing” in captive apes of all species, mostly to human receivers, but
also among themselves in some special conditions. However, Tomasello (2008) and
Pika (2008) argue that such gestures are qualitatively different from those of children
in their second year of life. To put it simply in the terminology of this paper, while
children’s deictic and iconic gestures involve sign use, those of the apes do not, but can be
categorized as either intention movements (IMs), attention getters (AGs) or a combination
of both. IMs arise from so-called ontogenetic ritualization: e.g. a pulling of the
other’s body in a desired direction becomes toned down to a gentle tug with time,
since the receiver has learned to respond adequately to the initial part of the sender’s
action, allowing it to become “stylized”. Apes also demonstrably understand that the
other needs to attend to such IMs for them to be efficient communicative signals, and
hence when the receiver is facing another direction, the sender will usually produce
AGs – either in the bodily-haptic channel (touching, patting) or in the vocal-audible
one (calling) – in order to gain the receiver’s attention prior to producing IMs. Tomasello
(2008) argues that ape pointing, which in non-enculturated individuals is always to
desired objects and most often food, arises precisely in this way.
Vocal calls as AGs have already been mentioned as an example of communicative
signals on this level within the vocal-audible modality. Unlike IMs and non-vocal
AGs, however, most ape calls seem not to be learned (“socially transmitted”) but to be
species-general, “innate” signals. Hence Tomasello (2008) argues that ape bodily-visible
communicative signals, and not calls, were the likely stepping stone for the evolution of
language: an argument for the “gesture-first” position within the prolonged debate with
“speech-first” theorists (see Johansson 2005). This is plausible, but ape (and dolphin)
vocal signals are not to be easily dismissed. In the case of the most studied non-
human species in primatology, chimpanzees, calls have been shown to be of two
types: “broadcast” and “proximal”. The first, such as the “food-cry” are high-pitched,
not addressed to anyone in particular, apparently involuntary (Deacon 1997), and
while their communicative function is often agonistic (pro-social) rather than (only)
antagonistic, they seem to be closer to the cries of level 1. The second type of calls
are low-pitched, directed to particular individuals, voluntarily produced and intended
to have a particular effect, e.g. consoling a distressed relative. It has also been shown
that the two types lead to different brain-activation patterns: more localized to the
right hemisphere for the directed, proximal calls (Taglialatela, Russell, Schaeffer,
et al. 2009). Therefore, it seems that Tomasello (2008) underestimates the complexity
of ape vocal abilities, by treating them basically as a level 1 phenomenon (cries):
non-voluntary and unlearned.
However, what was stated in the beginning of this subsection, that both bodily and
vocal spontaneous animal signals are not communicative signs, remains unchallenged.
The calls signaling different types of predators (leopard, eagle, snake) produced by vervet
monkeys, which received much attention at the beginning of the 1990s, are now nearly
unanimously agreed to be “broadcast” signals, serving their communicative functions,
without being either learned or intentional (both in the sense of “voluntary” and “di-
rected”) to another (see Cheney and Seyfarth 2005). Surprisingly, an interesting case
for possible spontaneous sign use by non-humans, made by Savage-Rumbaugh (1998)
has not received much attention in the literature. It concerns the third type of channel:
the external-visible one. During troop migrations, wild bonobos have been observed to
break and leave branches at path crossings, possibly signaling to other members of the
troop following them the direction that they had taken. The suggestion is that given the
ecological context (dense vegetation preventing visual contact, predators that would
be informed in the case of vocal signaling), this was a strategy consciously chosen by
some individual troop members, and then became socially transmitted. If this interpre-
tation were to be confirmed with more cases and better documentation, the branches
would be almost literally “pointers” (i.e. deictic signs) fulfilling the conditions for signi-
fication and intentional communication given earlier. Unsurprisingly, Tomasello (1999)
is skeptical, since breaking tree branches and dragging them is part of the behavioral
repertoire of the species, and is used for a variety of “display” functions. Still, this is
a good example, if only as a thought experiment of sorts, alerting us that communication
“in the wild” could take forms and modalities that are not readily apparent to us. The
classification in Tab. 34.2 takes an intermediary position (between Tomasello and Sav-
age-Rumbaugh) calling the branches along the path marks: it is at least possible that the
natural branch breaking behavior may become “ritualized” in a particular troop, so that
some members voluntarily leave them along the way so as to influence the behavior of
those following in a (literally) desired direction, though without involving intentional
communication proper, i.e. involving a higher-level intention for their intended meaning
to be understood.

3.2.3. Gestures, pantomime, “vocal gestures” and picture comprehension


Given the lack of clear evidence for spontaneous sign use in non-human animals, and
the highly circumstantial and much debated evidence from paleontology and archeol-
ogy for good examples of pre-linguistic, but nevertheless significational meaning and
communication, we need to turn to early childhood. To define the border between “pre-
linguistic” and the “linguistic” child is not unproblematic. The first words appear
around the first birthday, but developmental psychologists from Vygotsky (1962) to
the present disagree on whether they are truly “symbolic” (i.e. signs as here defined)
rather than “indices” (Piaget 1962) that could be understood as level 2 communicative
signals, i.e. associations between a vocalization and a desired object or event. It is only
with the “vocabulary spurt” around the middle of the second year, and clearly by
20 months, that it becomes generally uncontested that the child has made the entry into
language.
Considering the period between 9 and 18 months, on the other hand, there is strong
evidence that the (typically developing) child has become a sign user, foremost in the
bodily-visible channel (Acredolo and Goodwyn 1990; Bates, Camaioni and Volterra
1975; Bates 1979; Blake, Osborne, Cabral, et al. 2003; Carpenter, Nagell and Tomasello
1998; Liszkowski, Carpenter, Henning, et al. 2004; Piaget 1962). This period can be sub-
divided in two stages. The child’s first bodily communicative signals are dyadic (e.g. rais-
ing the hands to express the wish to be picked up) and when triadic, function as requests
for objects. Thus they resemble the “gestures” of the great apes discussed in the previ-
ous sub-section. In a recent review, Pika asks a pertinent question “Gestures of apes
and pre-linguistic human children: Similar or different?” and concludes that there are
both similarities and differences: “Many human gestures are … used to direct the atten-
tion and mental states of others to outside entities… Apes also gesture… but use these
communicative means mainly as effective procedures in dyadic interactions to request
action from others” (Pika 2008: 131–132). While the ontogenetic progression needs to
be more carefully studied, especially in a cross-cultural perspective, it seems that the
gestures that are specific for human children and which “direct the attention of others
to some third entity, simply for the sake of sharing interest in it or commenting on it”
(Pika 2008: 131) appear clearly from about 13–14 months. It is in part a terminological
issue, but Tab. 34.2 reserves the term gesture for those (human-specific) bodily
expressions which (a) “stand” for an actual or imagined object, action or event,
and (b) in which this sign relationship is intentionally communicated. It is possible
to have (a) without (b), as in “private” symbolic play or reenactment, but (b) clearly
requires (a). Since the relationship between expression and meaning is not (yet) con-
ventional, there are two ways in which the meaning can be “transmitted”: on the
basis of resemblance between R(epresentamen) and O(object) (see Section 2.2.3),
i.e. through iconic gestures, and through declarative (as opposed to imperative) point-
ing. Children’s iconic gestures are performed from a “character viewpoint” rather
than an “observer viewpoint” (McNeill 2005) or a “first-person perspective” rather
than a “third-person perspective” (Zlatev and Andrén 2009). Thus, there is no
clear distinction between iconic gestures and “pantomime” before the emergence
of language.
What about the other two channels on this level? While this is controversial, I would
suggest that prior to the vocabulary spurt around the middle of the second year, the
child’s first “words” serve a subordinate role to gestural communication, as a supple-
ment to the multimodal communicative signal (Clark 1996). Their conventional refer-
ential function is not yet clear to the child and they serve as “vocal gestures” – a role
more sophisticated than that of directed calls (level 2), but not yet part of a linguistic
system (level 4). This is consistent with the growing acceptance of the view of “gesture
as the cradle of speech” (see Acredolo and Goodwyn 1990; Iverson and Goldin-
Meadow 1998; Lock and Zukow-Goldring 2010).
Finally, a similar development seems to occur in the channel of external-visual
representations during the period 9–20 months. Children’s motoric skills are not yet
mature enough to produce pictorial signs (i.e. representational drawings)
even by the end of this period. But if we look at studies of picture perception and
understanding, we find a transition from a “reality mode” (see Persson 2008)
in which the picture is confused with the object depicted, to the beginning of a “pic-
torial mode” in which the picture is understood to be a pictorial sign (at least by
18 months), though difficulties in establishing (and maintaining in memory) such
so-called “dual representation” (see DeLoache 2004) persist until the end of the third
year, depending on the type of sign vehicle used and on the nature of the experimental
paradigm.

3.2.4. Language: signed, spoken and systematic external representations


Following a transitional period of one-word utterances supplemented non-redundantly
with gestures to form word-gesture combinations (Iverson and Goldin-Meadow 1998),
toward the end of the second year most children become increasingly proficient in com-
bining linguistic expressions, and learning their internal relationships. Depending on
their social environment more than on their perceptual capacities, this takes place
either in the spoken or in the “bodily-visible” channel, i.e. they become at first appren-
tices and eventually masters of either a spoken or a signed language. In some respects
this is a “constructive” process (Tomasello 2003), and studies of the spontaneous emer-
gence of Nicaraguan Sign Language (NSL) among deaf school-children who were being
taught Spanish through lip-reading and writing (e.g. Senghas, Kita and Özyürek 2004)
have shown that children not only spontaneously acquire an existing language, but are
capable of co-creating one across several generations (“cohorts”) of interacting signers.
Still, common to both the “acquisition” and the “construction” perspectives is that what
is ultimately established is a “socially-shared symbolic system” (Nelson and Shaw 2002)
or a “conventional-normative semiotic system for communication and thought” (Zlatev
2008a). Such definitions of language include the two key features distinguishing
language from pre-verbal gestures (from which it gradually emerged in the case of
NSL), as well as from most representational images: (a) conventionality, in the sense of
signs and their relations being commonly known and normative (Itkonen 2003, 2008)
and (b) systematicity, most evident (and studied) on the sentence level as “grammar”,
but also on the level of discourse.
With literacy, to which children are introduced from 3 to 7 years of age, depending
on the educational practices of different (literate) societies, a “universe of discourse”
opens up. “Externally embodied” signs, and sign complexes such as (verbal) texts, (com-
plex) pictures, numerical and graphic representations – in various combinations and
media – constitute a considerable, if not the major, part of the meaningful world for
an increasing number of people, in our increasingly technologized societies. At the
same time, the “lower” levels of meaning and communication continue to operate in
parallel, and it would be both an intellectual mistake, committed by representatives
of post-structuralism such as Derrida (1976), and a grave social mistake to de-value
them, by either playing down their importance, or else attempting to assimilate them
to the higher levels.
4. Conclusions
About a decade ago, assessing the status quo within linguistic and philosophical seman-
tics, cognitive science and semiotics, I made the following pessimistic pronouncement:

Our conception of meaning has become increasingly fragmented, along with much else in
the increasing “postmodernization” of our worldview. The trenches run deep between dif-
ferent kinds of meaning theories: mentalist, behaviorist, (neural) reductionist, (social) con-
structivist, functionalist, formalist, computationalist, deflationist… And they are so deep
that a rational debate between the different camps seems impossible. The concept is
treated not only differently but incommensurably within the different disciplines. (Zlatev
2003: 253)

This served as the motivation for attempting to formulate “an outline of a unified bio-
cultural theory of meaning”, giving a foundational place to life (rather than machines),
and proposing various hierarchies of meaning in evolution and development, which in
a broadly continuous framework could also accommodate qualitative changes. Such
ambitions may have been somewhat premature, but since then, several impressive at-
tempts at integrational theories of meaning have been proposed (Brier 2008; Emmeche
2007; Sonesson 2007b; Stjernfelt 2007) as well as a rapprochement between phenomen-
ology and cognitive science (Gallagher and Schmicking 2010; Gallagher and Zahavi
2008; Thompson 2007). The appearance of the journal Cognitive Semiotics can be
seen as a reflection of the same need to counter the fragmentation described in the
quotation above. The Semiotic Hierarchy (Zlatev 2009) and its extension to the notion
of embodiment and communication explored in this chapter are in line with these
developments.
What is common to all these approaches is an effort to assert “the primacy of the
body”, but without falling into any form of biological reductionism in which the body
(with focus on the brain) is treated as a kind of physical object, a sophisticated machine.
Another common motivation is a desire to point out that “higher levels” of meaning,
communication and intersubjectivity presuppose lower ones: evolutionarily, develop-
mentally, but also “synchronically”. Meaning and communication are rooted in the bio-
logical, lived and significational bodies interacting with their respective “worlds” (see
Tab. 34.1). This is important since neglecting the body in theorizing leads to distorted
accounts involving at one extreme beliefs in innate “language organs”, and at another
extreme, claims that “everything is a text”. Still much worse would be a cultural
devaluing of the living and lived body in an over-technological society and “globalized”
world. This could potentially lead to the experience of a vacuum of meaning and a
breakdown in communication.

5. References
Acredolo, Linda and Susan Goodwyn 1990. Sign language among hearing infants: The spontane-
ous development of symbolic gestures. In: Virginia Volterra and Carol J. Erting (eds.), From
Gesture to Language in Hearing and Deaf Children, 68–78. Berlin: Springer.
Almeida, Olinda G., António Miranda, Pedro Frade, Peter C. Hubbard, Eduardo N. Barata and
Adelino V. M. Canário 2005. Urine as a social signal in the Mozambique Tilapia (Oreochromis
mossambicus). Chemical Senses 30 (suppl 1): i309–i310.
Bates, Elizabeth 1979. The Emergence of Symbols. Cognition and Communication in Infancy. New
York: Academic Press.
Bates, Elizabeth, Luigia Camaioni and Virginia Volterra 1975. The acquisition of performatives
prior to speech. Merrill-Palmer Quarterly 21: 205–226.
Berelson, Bernard and Gary A. Steiner 1964. Human Behavior. New York: Harcourt, Brace and
World.
Beshkar, Majid 2008. Animal consciousness. Journal of Consciousness Studies 15(3): 5–34.
Blake, Joanna, Patricia Osborne, Marlene Cabral and Pamela Gluck 2003. The development of
communicative gestures in Japanese infants. First Language 23(1): 3–20.
Brier, Søren 2008. Cybersemiotics: Why Information Is Not Enough. Toronto: University of
Toronto Press.
Call, Josep and Michael Tomasello 2007. The Gestural Communication of Apes and Monkeys.
Mahwah, NJ: Lawrence Erlbaum.
Carpenter, Malinda, Katherine Nagell and Michael Tomasello 1998. Social cognition, joint atten-
tion, and communicative competence from 9 to 15 months. Monographs of the Society of
Research in Child Development 63(4). Boston, MA: Blackwell.
Cheney, Dorothy L. and Robert M. Seyfarth 2005. Constraints and preadaptations in the earliest
stages of language evolution. Linguistic Review 22: 135–159.
Chomsky, Noam 1965. Aspects of the Theory of Syntax. Cambridge: Massachusetts Institute of Tech-
nology Press.
Clark, Andy 1999. An embodied cognitive science? Trends in Cognitive Sciences 3(9): 345–351.
Clark, Herbert H. 1996. Using Language. Cambridge: Cambridge University Press.
Dance, Frank E. 1970. The “concept” of communication. Journal of Communication 20:
201–220.
Dance, Frank E. and Charles E. Larson 1976. The Foundations of Human Communication: A The-
oretical Approach. New York: Holt, Rinehart & Winston.
Deacon, Terrence 1997. The Symbolic Species: The Co-evolution of Language and the Brain. New
York: Norton.
DeLoache, Judy S. 2004. Becoming symbol-minded. Trends in Cognitive Sciences 8: 66–70.
Derrida, Jacques 1976. Of Grammatology. Baltimore: Johns Hopkins University Press.
Donald, Merlin 1991. Origins of the Modern Mind: Three Stages in the Evolution of Culture and
Cognition. Cambridge, MA: Harvard University Press.
Donald, Merlin 2001. A Mind so Rare: The Evolution of Human Consciousness. New York:
Norton.
Edelman, Gerald M. 1992. Bright Air, Brilliant Fire: On the Matter of the Mind. New York: Basic
Books.
Emmeche, Claus 2007. On the biosemiotics of embodiment and our human cyborg nature. In: Tom
Ziemke, Jordan Zlatev and Roslyn M. Frank (eds.), Body, Language, Mind: Volume 1,
Embodiment, 411–430. Berlin: De Gruyter Mouton.
Fodor, Jerry A. 1981. Representations. Cambridge: Massachusetts Institute of Technology Press.
Gallagher, Shaun 2005. How the Body Shapes the Mind. Oxford: Oxford University Press.
Gallagher, Shaun and Daniel Schmicking 2010. Handbook of Phenomenology and Cognitive
Sciences. Dordrecht, the Netherlands: Springer.
Gallagher, Shaun and Dan Zahavi 2008. The Phenomenological Mind: An Introduction to Philos-
ophy of Mind and Cognitive Science. London: Routledge.
Gallese, Vittorio and George Lakoff 2005. The brain’s concepts: The role of the sensori-motor sys-
tem in conceptual knowledge. Cognitive Neuropsychology 22: 445–479.
Gode, Alexander 1959. What is communication? Journal of Communication 9: 3–20.
Grice, Paul 1989. Meaning. In: Paul Grice, Studies in the Way of Words, 213–223. Cambridge,
MA: Harvard University Press.
Haser, Verena 2005. Metaphor, Metonymy and Experientialist Philosophy. Berlin: De Gruyter
Mouton.
Hauser, Marc D. 1996. The Evolution of Communication. Cambridge: Massachusetts Institute of
Technology Press.
Hoben, John B. 1954. English communication at Colgate re-examined. Journal of Communication
4: 72–92.
Husserl, Edmund 1989. Ideas Pertaining to a Pure Phenomenology and to a Phenomenological
Philosophy, Second Book. Dordrecht, the Netherlands: Kluwer. First published [1952].
Itkonen, Esa 1978. Grammatical Theory and Metascience: A Critical Inquiry into the Philosophical
and Methodological Foundations of “Autonomous” Linguistics. Amsterdam: John Benjamins.
Itkonen, Esa 2003. What is Language? A Study in the Philosophy of Linguistics. Turku, Finland:
Turku University Press.
Itkonen, Esa 2008. The central role of normativity for language and linguistics. In: Jordan Zlatev,
Timothy P. Racine, Chris Sinha and Esa Itkonen (eds.), The Shared Mind: Perspectives on In-
tersubjectivity, 279–306. Amsterdam: John Benjamins.
Iverson, Jana M. and Susan Goldin-Meadow 1998. The Nature and Functions of Gesture in Chil-
dren’s Communication. San Francisco: Jossey-Bass.
Jackendoff, Ray S. 1983. Semantics and Cognition. Cambridge: Massachusetts Institute of Technol-
ogy Press.
Johansson, Sverker 2005. Origins of Language: Constraints on Hypotheses. Amsterdam: John
Benjamins.
Johnson, Mark and Tim Rohrer 2007. We are live creatures: Embodiment, American Pragmatism
and the cognitive organism. In: Tom Ziemke, Jordan Zlatev and Roslyn M. Frank (eds.), Body,
Language and Mind. Vol 1: Embodiment, 17–54. Berlin: De Gruyter Mouton.
Krois, John M., Mats Rosengren, Angela Steidele and Dirk Westerkamp 2007. Embodiment in
Cognition and Culture. Amsterdam: John Benjamins.
Leavens, David A., William D. Hopkins and Kim A. Bard 2008. The heterochronic origins of
explicit reference. In: Jordan Zlatev, Timothy P. Racine, Chris Sinha, and Esa Itkonen (eds.),
The Shared Mind: Perspectives on Intersubjectivity, 187–214. Amsterdam: John Benjamins.
Leavens, David A. and Timothy P. Racine 2009. Joint attention in apes and humans: Are humans
unique? Journal of Consciousness Studies 16(6–8): 240–267.
Lindblom, Jessica and Tom Ziemke 2007. Embodiment and social interaction: A cognitive science
perspective. In: Tom Ziemke, Jordan Zlatev, and Roslyn M. Frank (eds.), Body, Language and
Mind. Vol 1: Embodiment, 129–163. Berlin: De Gruyter Mouton.
Liszkowski, Ulf, Malinda Carpenter, Anne Henning, Tricia Striano and Michael Tomasello
2004. Twelve-month-olds point to share attention and interest. Developmental Science 7:
297–307.
Littlejohn, Stephen W. 1999. Theories of Human Communication. Belmont, CA: Wadsworth.
Lock, Andrew and Patricia Zukow-Goldring 2010. Prelinguistic communication. In: Gavin J.
Bremner and Theodore D. Wachs (eds.), The Wiley-Blackwell Handbook of Infant Develop-
ment, 2 Volume Set, 2nd Edition. Oxford: Wiley-Blackwell.
Lord, Kathryn, Mark Feinstein and Raymond Coppinger 2009. Barking and mobbing. Behavioural
Processes 81(3): 358–368.
Lotman, Yuri M. 1990. Universe of Mind: A Semiotic Theory of Culture. London: I. B. Tauris.
Maturana, Humberto R. and Francisco J. Varela 1980. Autopoiesis and Cognition: The Realization
of the Living. Dordrecht, the Netherlands: Reidel.
McNeill, David 2005. Gesture and Thought. Chicago: University of Chicago Press.
Merleau-Ponty, Maurice 1962. Phenomenology of Perception. London: Routledge.
Miles, H. Lyn 1990. The cognitive foundations for reference in a signing orangutan. In: Sue T.
Parker and Kathleen R. Gibson (eds.), “Language” and Intelligence in Monkeys and Apes, 511–539.
Cambridge: Cambridge University Press.
Miller, Gerald R. 1966. On defining communication: Another look. Journal of Communication 16:
90–112.
Nelson, Katherine and Lea K. Shaw 2002. Developing a socially shared symbolic system. In: James
P. Byrnes and Eric Amsel (eds.), Language, Literacy and Cognitive Development, 27–57. Hills-
dale, NJ: Lawrence Erlbaum.
Patterson, Francine 1980. Innovative use of language in a gorilla: A case study. In: Katherine Nelson
(ed.), Children’s Language. Vol. 2, 497–561. New York: Gardner Press.
Paukner, Annika, Stephen J. Suomi, Elisabetta Visalberghi and Pier F. Ferrari 2009. Capu-
chin monkeys display affiliation toward humans who imitate them. Science 325(5942):
880–883.
Peirce, Charles S. 1958. Collected Papers of Charles Sanders Peirce. Cambridge, MA: Harvard Uni-
versity Press.
Persson, Tomas 2008. Pictorial Primates: A Search for Iconic Abilities in Great Apes. Lund,
Sweden: Lund University Cognitive Studies, 136.
Piaget, Jean 1962. Play, Dreams and Imitation in Childhood. New York: Norton. First published
[1945].
Pika, Simone 2008. Gestures of apes and pre-linguistic human children: Similar or different? First
Language 28(2): 116–140.
Pinker, Steven 1994. The Language Instinct. New York: William Morrow.
Popper, Karl 1962. Objective Knowledge. Oxford: Oxford University Press.
Popper, Karl 1992. In Search of a Better World: Lectures and Essays from Thirty Years. London:
Routledge.
Ruesch, Jurgen 1957. Technology and social communication. In: Lee Thayer (ed.), Communica-
tion: Theory and Research, 452–481. Springfield, IL: Thomas.
Savage-Rumbaugh, Sue 1998. Scientific schizophrenia with regard to the language act. In: Jonas
Langer and Melanie Killen (eds.), Piaget, Evolution and Development, 145–169. Mahwah,
NJ: Lawrence Erlbaum.
Searle, John 1995. The Construction of Social Reality. London: Allen Lane.
Senghas, Ann, Sotaro Kita and Asli Özyürek 2004. Children creating core properties of language:
Evidence from an emerging sign language in Nicaragua. Science 305: 1779–1782.
Sinha, Chris 2004. The evolution of language: From signals to symbols to system. In: Kimbrough
D. Oller and Ulrike Griebel (eds.), Evolution of Communication Systems: A Comparative
Approach, 217–235. Cambridge: Massachusetts Institute of Technology Press.
Sinha, Chris 2009. Objects in a storied world: Materiality, normativity, narrativity. Journal of Con-
sciousness Studies 16(6–8): 167–190.
Sinha, Chris and Kristine Jensen de López 2000. Language, culture and the embodiment of spatial
cognition. Cognitive Linguistics 11(1–2): 17–41.
Sonesson, Göran 1989. Pictorial Concepts. Lund, Sweden: Lund University Press.
Sonesson, Göran 2007a. The extensions of man revisited. From primary to tertiary embodiment.
In: John M. Krois, Mats Rosengren, Angela Steidele and Dirk Westerkamp (eds.), Embodi-
ment in Cognition and Culture, 27–56. Amsterdam: John Benjamins.
Sonesson, Göran 2007b. From the meaning of embodiment to the embodiment of meaning: A
study in phenomenological semiotics. In: Tom Ziemke, Jordan Zlatev and Roslyn M. Frank (eds.),
Body, Language and Mind. Vol 1: Embodiment, 85–127. Berlin: De Gruyter Mouton.
Sonesson, Göran 2009. The view from Husserl’s lectern: Considerations on the role of phenomen-
ology in cognitive semiotics. Cybernetics and Human Knowing 16(3–4): 107–148.
Stern, Daniel N. 2000. The Interpersonal World of the Infant: A View from Psychoanalysis and De-
velopmental Psychology. New York: Basic Books. First published [1985].
Stjernfelt, Frederik 2007. Diagrammatology. Dordrecht, the Netherlands: Springer.
Taglialatela, Jared P., Jamie L. Russell, Jennifer A. Schaeffer and William D. Hopkins 2009. Visua-
lizing vocal perception in the chimpanzee brain. Cerebral Cortex 19(5): 1151–1157.
Thompson, Evan 2007. Mind in Life: Biology, Phenomenology and the Sciences of Mind. Cam-
bridge, MA: Harvard University Press.
550 IV. Contemporary approaches

Tomasello, Michael 1999. The Cultural Origins of Human Cognition. Cambridge, MA: Harvard
University Press.
Tomasello, Michael 2003. Constructing a Language. A Usage-Based Theory of Language Acquisi-
tion. Cambridge, MA: Harvard University Press.
Tomasello, Michael 2008. Origins of Human Communication. Cambridge: Massachusetts Institute
of Technology Press.
Varela, Francisco, Evan Thompson and Eleanor Rosch 1991. The Embodied Mind. Cambridge:
Massachusetts Institute of Technology Press.
von Uexküll, Jakob 1982. The theory of meaning. Semiotica 42(1): 25–82. First published [1940].
Vygotsky, Lev S. 1962. Thought and Language. Cambridge: Massachusetts Institute of Technology
Press.
Wallace, Brendan, Alastair Ross, John B. Davies and Tony Anderson 2007. The Mind, the Body
and the World. Psychology after Cognitivism? Exeter, UK: Imprint Academic.
Wilson, Margaret 2002. Six views of embodied cognition. Psychonomic Bulletin and Review 12(4):
625–636.
Wittgenstein, Ludwig 1953. Philosophical Investigations. Oxford: Basil Blackwell.
Zahavi, Dan 2003. Husserl’s Phenomenology. Stanford, CA: Stanford University Press.
Ziemke, Tom, Jordan Zlatev and Roslyn M. Frank 2007. Body, Language and Mind. Vol 1:
Embodiment. Berlin: De Gruyter Mouton.
Ziemke, Tom 2003. What’s that thing called embodiment? In: Richard Alterman and David Kirsh
(eds.), Proceedings of the 25th Annual Meeting of the Cognitive Science Society, 1305–1310.
Mahwah, NJ: Lawrence Erlbaum.
Zlatev, Jordan 1997. Situated Embodiment: Studies in the Emergence of Spatial Meaning. Stock-
holm: Gotab.
Zlatev, Jordan 2003. Meaning = Life + (Culture): An outline of a unified biocultural theory of
meaning. Evolution of Communication 4(2): 253–296.
Zlatev, Jordan 2005. What’s in a schema? Bodily mimesis and the grounding of language. In: Beate
Hampe (ed.), From Perception to Meaning: Image Schemas in Cognitive Linguistics, 313–343.
Berlin: De Gruyter Mouton.
Zlatev, Jordan 2007. Embodiment, language and mimesis. In: Tom Ziemke, Jordan Zlatev and Ro-
slyn M. Frank (eds.), Body, Language and Mind. Vol 1: Embodiment, 297–337. Berlin: De
Gruyter Mouton.
Zlatev, Jordan 2008a. The co-evolution of intersubjectivity and bodily mimesis. In: Jordan Zlatev,
Timothy P. Racine, Chris Sinha and Esa Itkonen (eds.), The Shared Mind: Perspectives on In-
tersubjectivity, 215–244. Amsterdam: John Benjamins.
Zlatev, Jordan 2008b. The dependence of language on consciousness. Journal of Consciousness
Studies 15: 34–62.
Zlatev, Jordan 2008c. From proto-mimesis to language: Evidence from primatology and social
neuroscience. Journal of Physiology 102: 137–152.
Zlatev, Jordan 2009. The Semiotic Hierarchy: Life, Consciousness, Signs, Language. Cognitive
Semiotics 4: 169–200.
Zlatev, Jordan 2010. Phenomenology and cognitive linguistics. In: Shaun Gallagher and Daniel
Schmicking (eds.), Handbook on Phenomenology and Cognitive Science, 415–446. Dordrecht,
the Netherlands: Springer.
Zlatev, Jordan and Mats Andrén 2009. Stages and transitions in children’s semiotic development.
In: Jordan Zlatev, Mats Andrén, Marlene Johansson Falck and Carita Lundmark (eds.), Studies
in Language and Cognition, 380–401. Newcastle: Cambridge Scholars.
Zlatev, Jordan, Timothy P. Racine, Chris Sinha and Esa Itkonen 2008. The Shared Mind: Perspec-
tives on Intersubjectivity. Amsterdam: John Benjamins.

Jordan Zlatev, Lund (Sweden)


35. Body and speech as expression of inner states


1. Introduction
2. General concepts of German Expression Psychology
3. Selected authors
4. Relevance for current research
5. References

Abstract
This article provides a sketch of the theoretical framework of German Expression
Psychology (GEP) and discusses the forms and functions of bodily and verbal types
of communication that express inner states. Starting with a brief historical overview, we
discuss general concepts of the German Expression Psychology framework, in particular
with respect to the definition of expression, the relationship between expression and its
subject, and the perception of expression. Within each of these areas special attention
is given to the face, body and voice as indicators of inner states. Following this general
overview of German Expression Psychology, we focus on the contribution of three
selected authors, namely, Philipp Lersch, Paul Leyhausen and Egon Brunswik, who
have been particularly influential in the field of German Expression Psychology. For
Lersch, we consider the co-existential relationship between affect and expression, the de-
tailed anatomical description of expressions, as well as the analysis of dynamic aspects of
expressions. Leyhausen added an ethological perspective on expressions and perceptions.
Here, we focus on the developmental aspects of expression and impression formation,
and differentiate between phylogenetic and ontogenetic aspects of expression. Brunswik’s
Lens Model allows a separation between distal indicators on the part of the sender and
proximal percepts on the part of the observer. Here, we discuss how such a model can
be used to describe and analyze nonverbal communication on both the encoding and de-
coding side. Drawing on the presentation of all three authors, we outline the general
relevance of German Expression Psychology for current research, specifically with
respect to the definition and function of expressions and perceptions, and existing
approaches to the study of verbal and nonverbal behavior.

1. Introduction
Since Darwin (1872), the concept of expression has played a central role in the psycho-
logical understanding of emotions. As in other domains, the field of expression research
has undergone several changes and transformations. Interestingly, these conceptual and
methodological shifts have been linked to different research traditions in Europe and
the USA. German Expression Psychology as reviewed in this chapter encompasses
all research on nonverbal behavior that has been performed by German-speaking
expression psychologists (Asendorpf and Wallbott 1982). In this research vein, topics
such as facial expression, gestures and body movements were active areas of inquiry
and part of the curriculum in psychology at German universities until the sixties (see
Scherer and Wallbott 1990). However, just like the study of emotions, which had lost
its appeal over preceding decades of psychological research, expression psychology
soon all but disappeared from the field. The approach used by many traditional
German expression psychologists such as Lersch, Kirchhoff, Klages and
Leonhard was criticized for being speculative and unscientific in nature. Many empirical
psychologists rejected this type of philosophical and phenomenological analysis. As a
result, German Expression Psychology disappeared at the end of the sixties and was re-
placed in the seventies by a new research domain called “nonverbal communication
research.” This new line of research, which was influenced primarily by American psy-
chology, was driven by a research model based on systematic and objective measure-
ment. With its development, a new generation of psychologists working on expressive
behavior emerged in Germany and Europe, almost independently of the classic field
of “expression psychology” (Asendorpf and Wallbott 1982).
The present paper aims to review important concepts and ideas of German Expres-
sion Psychology which no longer receive much attention but have contributed to our
understanding of nonverbal communication. A complete overview of German Expres-
sion Psychology and its contribution to each channel of communication (face, body,
voice) can be found in a series of special issues on German Expression Psychology
by Asendorpf, Wallbott and Helfrich published in the Journal of Nonverbal Behavior
in 1982 and 1986. In the present paper, therefore, an exhaustive description of German
Expression Psychology will not be provided. Instead, following a short summary of Ger-
man Expression Psychology based on the articles just referred to, we will focus on the
contribution of three selected authors, namely Lersch, Leyhausen and Brunswik. Each
of these authors has added his own definition and perspective to the field of expression
psychology. Drawing on the description of the work of all three authors, we will outline
the extent to which German Expression Psychology is still relevant for current
research on nonverbal behavior.

2. General concepts of German Expression Psychology


German Expression Psychology represents one of the first research traditions to sys-
tematically investigate expressive behavior. In this tradition, the main focus was on
the expression as an observable behavior or act. The emotion-eliciting processes
were only rarely studied by experimental manipulation. Rather, contents of expressions
were generally considered to be “permanent or transient psychic states, such as affec-
tive states, personality characteristics, such as ‘temperaments’ (Temperament), and
(rarely) social attitudes” (Asendorpf and Wallbott 1982: 137). Expression was conse-
quently a more fixed concept in which the expressive act with its observable features
was the crucial research issue. Common to the definitions of expression was the notion
of genuineness or spontaneity. Expressions were regarded mainly as externalizations
of inner-psychic states or traits, and therefore spontaneous and genuine in nature
(Asendorpf and Wallbott 1982).
The ultimate goal of expression psychology was to develop diagnostic tools. For this
purpose, there were attempts to establish a lexicon or inventory of emotional expres-
sions that assigned meaning to each type of expression (see Klages 1926; Lersch
1957, 1961). However, the procedures used were based mainly on case studies and per-
sonal observations. Moreover, the emphasis of many of these investigations lay in the
subjective description of perceived cues and impression formation. The lack of objective
measurement was particularly severe in research on facial expressions (Asendorpf 1982).
For body movements, gait and gestures, measurement techniques ranged from subjec-
tive descriptions and judgments to more objective approaches (Wallbott 1982). How-
ever, even in the case of complex and demanding techniques (such as time-way
graphs, light paths, and stereo photography), there was a tendency to interpret the cap-
tured data in subjective and evaluative terms (Scherer and Wallbott 1990). Similarly, for
voice and speech, a distinction was drawn between a genetic, acoustic or auditory level
and an impressionistic or phenomenological one. Most research interests, however,
focused on the phenomenological approach with its holistically perceived qualities of
spoken messages. In contrast, electro-acoustical methods of recording on the auditory
level were treated with caution and considered useful only in combination with a
holistic approach (Helfrich and Wallbott 1986).
Clearly, the focus of German Expression Psychology was not on the abstraction of
single variables and their objective measurement but rather on the supposed meaning
of expressions. Although an expressive act could have more than one meaning, the rela-
tionship between the expression and its subject was regarded rather as constant and in-
variant. In many cases, verbal and nonverbal phenomena were seen as indicative of
individual characteristics such as affective states and personality attributes (Asendorpf
and Wallbott 1982); German Expression Psychology was thereby diagnostic in its
approach. This was particularly pronounced in the case of research on gait and gestures.
Here, the primary interest was not in the expression of emotion, but more in the expres-
sion of character or personality itself (Wallbott 1982). In addition to the functional
value of body movements, it was thought that valuable information about the person’s
character could be extracted. In consequence, research interests were mainly concerned
with behavior characteristics and typological approaches to personality such as those
suggested by Kretschmer: for example, the pyknic, athletic and leptosome types (see
Kretschmer 1940). A similar approach applied to research on voice and speech. Although
objective views existed that treated vocal features merely as indicators of speech con-
tent, most studies adopted a subjective person-related view. That is, an association
was made between expressive features and the underlying personality traits of the
speaker. Research attempts consequently focused on the identification and interpreta-
tion of different speech types and vocal characteristics in relation to personality or by
attributing certain traits by simply listening to the voice (Helfrich and Wallbott 1986).
The basis for such associative links between expression and personality were mainly
semantic analogies. That is, the person-specific characteristics were inferred from a
semantic description of the expression. In most cases, the meaning of the expression
was derived by analogy from the function of the respective action (Asendorpf and Wall-
bott 1982). For example, eyelid droop has the function of diminishing the visual input
from outside. By analogy, it follows that a person with lowered eyelids should have a
passive and unfocused attitude towards the world (Lersch 1961). Unfortunately, opera-
tionalizations for “passive” and “unfocused” were seldom given. Moreover, such infer-
ence-by-analogy syllogisms were not subject to empirical research that included
standardized tests. Such analogical-metaphorical conclusions, close to the concepts of
naïve psychology, were responsible for much of the later criticism of German Expres-
sion Psychology as being unscientific and subjective in nature (Asendorpf and Wallbott
1982).
Within German Expression Psychology, concepts of expression and perception were
highly interlinked. That is, no clear distinction was made between expression and
impression. Rather, both concepts were seen as a gestalt – a holistic perceptual pattern.
As with expression, conclusions were often drawn by analogy and not directly traceable
(Asendorpf and Wallbott 1982). However, more explicit theories also existed that
tried to determine the effects of specific behavioral cues on impression formation.
This was particularly the case for research on the face. Here, the main focus of German
Expression Psychology consisted of the perception and imitation of facial expressions
(Asendorpf 1982). Specifically, the role of certain facial areas and components in evok-
ing behavioral responses was of research interest. Such component studies consisted of
the systematic variation of specific facial characteristics using schematized stimuli or
drawings. Other experiments explored the relative contribution of static (physiog-
nomic) and dynamic (pathognomic) cues, or facial and verbal cues in impression forma-
tion (Asendorpf 1982). A crucial concept was that of imitation in which the feelings
experienced by the perceiver (due to the mimicry of the sender’s expression) were
thought to lead to the attribution of feelings to the sender. In order to describe this
expression-impression process, constructs such as expression, perception and attribu-
tion were important. Although the focus was mainly on the perceived cues and their
imitation, this approach was not far from Brunswik’s Lens Model (see Asendorpf
and Wallbott 1982; Brunswik 1956). Here also a distinction was made between externa-
lization, perception and inference. In the Lens Model, a more fine-grained analysis was
conducted with respect to the type of eliciting emotion, the cues expressed and the im-
pressions formed. The majority of studies in German Expression Psychology failed to
reach such a detailed and objective level of description; in consequence, experimental
designs were often unrepresentative with respect to their methodology.
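The tripartite structure of Brunswik’s Lens Model (externalization, perception, inference) lends itself to a quantitative sketch. The following minimal Python example computes the three correlations a lens-model analysis distinguishes: ecological validity (inner state to distal cue), cue utilization (cue to observer judgment), and achievement (state to judgment). All data and variable names are invented for illustration and are not drawn from the sources discussed here.

```python
import statistics

def pearson(xs, ys):
    # Plain Pearson correlation, written out to keep the sketch dependency-free.
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical data: for each sender, a self-reported state intensity, one
# measured distal cue (e.g. smile intensity), and an observer's judgment.
state    = [1, 2, 2, 4, 5, 5, 6, 7]
cue      = [0.1, 0.3, 0.2, 0.5, 0.6, 0.7, 0.6, 0.9]
judgment = [1, 1, 3, 3, 5, 6, 6, 6]

ecological_validity = pearson(state, cue)      # state -> cue (externalization)
cue_utilization     = pearson(cue, judgment)   # cue -> percept (inference)
achievement         = pearson(state, judgment) # overall accuracy of the observer
```

Separating the three coefficients is precisely what allows encoding and decoding to be analyzed independently: a cue can be validly externalized yet ignored by observers, or heavily used despite low ecological validity.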

3. Selected authors
In the following, we will focus on the concepts and theories of three authors influential
in the field of German Expression Psychology, namely Philipp Lersch, Paul Leyhausen,
and Egon Brunswik.

3.1. Philipp Lersch


After obtaining a doctorate in German literature, Lersch studied psychology and phi-
losophy under the supervision of Erich Becher, Moritz Geiger and Alexander Pfänder. For
Lersch (1961), defining expression required the distinction between inside and outside.
According to him, contents of the outside world were perceived in a way that was mean-
ingful to the inner state of the person. In this sense, expressions always contained a
meaning that could be understood through interpretation of sensory signals of an
inner state.
The relationship between expression and inner state (affect) was thought to be inter-
dependent and co-existential. Both the sensory phenomenon as expressive signal and its
mental contents were grounded in this co-existence. Lersch rejected the dualist view of
body and inner state. For him, expression and affect were not connected in a causal way.
Rather, he considered them to be two parts of the same process, together constituting a
whole. The scream of terror, the blushing in shame, and the cries of mourning were all
part of a psychophysical unity of the living mental body. Common to the definitions of
expression was the notion of spontaneity which was highly regarded in the field of
German Expression Psychology. Expressive acts were thought of as spontaneous signals
of inner-psychic states. Given the continuous fluctuation of feelings and mental con-
tents, expressive signals were proposed to be continuously changing. By this definition,
Lersch distinguished spontaneous expressions from other expressive signals that were
static or causal in their nature (e.g., handwriting). The latter phenomena were not con-
sidered to be direct, but rather seen as traces of past expressive acts. A distinction was
also drawn between physiognomic cues that were due to the morphological character-
istics of the face and recurring expressive acts that had been solidified as a mimic trace
in the face. Although Lersch’s concept of expression included the notion of temporal
continuity, the relationship between expression and inner state or personality trait
was regarded as constant and permanent. Thus, variability in the meaning of expres-
sions was restricted, and the role played by social and interpersonal factors was hardly
considered.
An important facet of Lersch’s approach was the analysis of the impressions (Ein-
drucksanalyse) people formed when viewing facial expressions. Given the fleeting
nature of most expressions, he argued that the impressions formed would mostly be
vague. That is, people would not be able to provide information about the basis on
which they formed their impressions. Tracing these impressions back to their phenom-
enological roots was therefore seen as a crucial task in expression psychology. Specifi-
cally, it was felt that the objective basis of impressions should be made conscious.
According to Lersch, such an approach would allow a differentiation between those
impressions that were based on morphological characteristics of the face and those
that were based on expressive acts of the individual. Lersch took great care in describ-
ing the changes in appearance of the upper and lower face based on an anatomical-
physiological analysis and provided a detailed anatomical overview of 19 different facial
muscles and their involvement in various facial actions. For the upper face, he described
actions of the eyes (i.e., eyelid raise, squinting, lids drooping, gazing) and the eyebrows
(i.e., horizontal furrows, vertical furrows). For example, in squinting, the eye aperture is
narrowed due to the contraction of two muscle innervations that are opposite to each
other: the inner eye circle muscle and the eye cover fold. For the lower face, he described
actions of the lips (e.g., lip pressing, open lips) and mouth corners (e.g., lip corner depres-
sing, smiling). For example, in lip pressing, the lower lip is pressed against the upper lip,
thereby narrowing the red parts of the lips and pinching them slightly inwards.
Although the anatomical basis of such expressive features was objectively deter-
mined, the ultimate goal was to define their diagnostic meaning. This went beyond
the description of visible and anatomical changes in the face, aiming to determine
the underlying psychological meaning of expressive acts. Which contents of the con-
scious mind are expressed? What is the psychological purpose of showing expressions?
By answering those questions, Lersch aimed to specify the meaning of expressions as
signals of inner states. Moreover, he speculated about their characterological sense
and wanted to find out what certain behaviors meant with respect to the characteristics
of the whole person. According to Lersch, squinting, for example, had the function of
more consciously observing the world as well as protecting oneself from the perception
of the outside world. Due to its dual nature, it was supposedly found in shy and
dishonest people and in those who feel guilty. Similarly, lip pressing was supposed to occur
in states of determination, defiance, decisiveness, and distrust. Lersch speculated about
regularities and rules according to which an expression could be explained. Using such
an approach, he aimed to set up an inventory that assigned meaning to each type of
expression. Such descriptions were highly subjective and influenced by semantic analo-
gies as was common in the field of German Expression Psychology. Operationalizations
for constructs such as dishonesty or guilt were seldom given. Moreover, such inferences
were often based on personal observations and derived from the functional value of
expressive acts.
In his article about the theory of facial expressions, Lersch (1957) proposed four laws
for the occurrence of expressive acts. The law of organ function defined facial expres-
sions as purposeful behavior specific to the function of a relevant organ within the
body (e.g., disgust reaction). Expressions could also occur without experiencing a bodily
reaction, but in the context of inner states that were similar to that experience (e.g., dis-
pleasure, reluctance, law of symbolic transference). According to the law of demonstra-
tive accentuation, certain expressions occurred to amplify feelings and attitudes of social
significance (i.e., lip pressing as a sign to demonstratively underline one’s determina-
tion). The law of contrasting accentuation proposed that expressions of maximally dif-
ferent appearance were accompanied by inner states that were opposite to each
other (i.e., smiling in laughter and lip corner depressing in melancholy). From these
descriptions of the four laws, it becomes evident that some of them were based on Dar-
win’s (1872) two principles of serviceable associated habits and of antithesis. However,
Lersch did not argue for an exclusive evolutionary explanation of each law on its own,
acknowledging that expressions could be understood by more than one law.
In his attempt to interpret expressive acts as signs of inner states, Lersch also paid
attention to the dynamic configuration of facial expressions such as the intensity and
the movement of expressions. According to him, these would provide greater depth
of information in addition to the static configuration of facial displays. Lersch distin-
guished between two types of movement: round-fluent movements, which were charac-
terized by smooth occurrence, no abrupt changes and slow offsets, and angular-stiff
movements. In the latter, the expression started quickly and was held rigidly before
abruptly returning to a neutral baseline. According to Lersch, such varying movements
could apply to facial displays (e.g., laughing), as well as to actions of the eyes and the
head. Their meaning was thought to be crucial for understanding both the inner state of
the person and possible action intentions towards the outer world. Although the inter-
pretation of movement characteristics was far from being objective and was biased by
typological approaches (e.g., Kretschmer’s 1940 cyclothymes and schizothymes), impor-
tant aspects of the nature of the underlying emotion were addressed. In this context,
first attempts were made at explaining angular-stiff movements as signs of an inhibitory
or voluntary act, and round-fluent motions as signs of an emotionally genuine act.

3.2. Paul Leyhausen


Paul Leyhausen, a student of Konrad Lorenz, performed extensive behavioral studies
on several species, most commonly felines (see Peters 2000). He was convinced that
an understanding of bodily expressions in humans could only be achieved using a com-
parative approach. By focusing on the interaction between genetically encoded origins
of behaviors and experience and learning, Leyhausen emphasized notions that are still
popular in present approaches to nonverbal behavior that are at the intersection of bio-
logical psychology, evolutionary psychology, and social neuroscience. One of the key
elements in Leyhausen’s arguments was the notion that expression and impression de-
veloped, to a certain degree, independently of one another and that their relationship
was then shaped by the evolutionary pressures that social communication put on the
respective systems (e.g., Leyhausen 1967). Specifically, he argued that the origin of
expression was phylogenetically much earlier than the development and the genetic
transmission of perceptual mechanisms that would be able to respond to expression.
This notion offers explanations for certain (mis-)communication processes that are rel-
evant for any current researcher interested in the communication of affect and intent.
Leyhausen saw the possibility of predicting future behavior and adapting/reacting
accordingly as the most important function of the communication process.
Leyhausen was unusual in assuming that expression refers to behavioral
manifestations of action tendencies that do not correspond to the currently pre-
dominant behavior but that reflect competing processes. Thus, the expression was
considered to be the blend of two or more conflicting behaviors or behavioral tenden-
cies. Take the example of the arching of a cat’s back in an aggressive context. Darwin
interpreted this expression as an attempt to increase its size following the principle of
antithesis (appear big when aggressive and small when submissive; see Darwin 1872: 56
and Figure 8: 58). However, following Lorenz’s analysis of dog behavior, Leyhausen
(1956, cited in 1967) demonstrated that the cat’s arched back is more likely to be the
consequence of a co-activation of two behavioral tendencies – flight and fight in this
context – and only appears as a gesture of threat due to subjective human interpreta-
tion. Nonetheless, sometimes behaviors can be observed that do not seem to serve a
purpose under conditions of competing action tendencies. Leyhausen argued that
when it was impossible to follow any of these action tendencies, displacement activities
(Übersprungshandlungen) could occur. A chicken in a fight or flight situation might
start pecking at the ground as if there were food. Humans might scratch their heads
when in behavioral conflict. Such behaviors would have been difficult for Darwin to
explain using his three principles guiding the origins of expressions.
When addressing the issue of the impression, Darwin believed that experiences dur-
ing one’s lifetime could somehow be transmitted to the next generation, not unlike the
mechanisms that Jean-Baptiste Lamarck proposed (see also Cornelius 1996). This left
various possibilities open concerning the development of impression. Leyhausen in
turn benefited from knowledge of genetic mechanisms that were unavailable at the
time of Darwin. He was able to reject Lamarckian transmission of experience and in
turn focus on an evolutionary analysis, considering how impression mechanisms
might have evolved and what the features of such mechanisms would be. This included
the complex relationship of genotype and phenotype and features such as the stability
or the variance of features in a given population in a given environment. A central fea-
ture of the genetically transmitted capacity for impression relates to the concept of the
innate releasing mechanism (IRM). Based on studies of several species, Leyhausen ar-
gued that different individual aspects capable of triggering an innate releasing mecha-
nism follow an additive logic, rather than pattern (Gestalt) principles. This is an
important assumption because it opens doors for the strategic control of an interaction
partner’s innate releasing mechanism to achieve certain (social) goals (see also Fridlund
1994). Interestingly, in his analysis of communication in humans, Leyhausen considered
aspects including fashion or stylized behaviors, such as mannerisms of speech or gesture,
as well as behavioral traces, such as writing.
558 IV. Contemporary approaches

Leyhausen strongly emphasized the need for comparative research, citing the long
time period that must have been required for impression processes to evolve. He re-
jected views that focused only on humans and consequential fallacies, such as the con-
fusion of homologies and analogies in the analysis of morphological structures or
behaviors. In analyzing the communication process Leyhausen cast a wide net, consid-
ering behaviors of species as simple as protozoa (paramecia), ticks, or fish, as precursors
to flexible impression and recognition mechanisms in humans. Of particular interest was
his distinction – in higher organisms only – of “angeborenes Ausdruckserfassen,” or Ein-
drucksfähigkeit on the one hand and “erworbenes Ausdruckserfassen,” or Ausdrucksver-
stehen on the other. He referred to an innate grasp of expression as the capacity for
impression and the acquired grasp of expression as understanding of expression. The
use of the German term erfassen, which can literally be translated as ‘grasping,’ proves
to be a particularly felicitous choice of words. Contemporary advances concerning the
role of a mirror system in the pre-motor cortex (e.g., Rizzolatti and Craighero 2004) that
may play a considerable role in the communication of affect and intention (see also
Kappas and Descôteaux 2003) could indeed have more to do with grasping than think-
ing. In other words, there may be a more physical, action-oriented sense of the expres-
sion of other individuals rather than a mental construct such as matching a schema.
Such notions go back to Lipps’ theory of empathy and feature strongly in expression
psychology (e.g., Lersch 1940 and Rothacker 1941, cited in Leyhausen 1967).

3.3. Egon Brunswik


Egon Brunswik was an assistant in Karl Bühler's Psychological Institute (for a survey of Bühler's theory of language, see Müller this volume) and a fellow-student of Konrad Lorenz; he received his PhD in 1927. In 1933, he met Edward C. Tolman
in Vienna, and in 1935/1936 he received a Rockefeller fellowship enabling him to
visit the University of California. He remained at Berkeley where he became an assis-
tant professor of psychology in 1937 and a full professor in 1947. Brunswik (1952, 1956)
formulated a number of theoretical and methodological principles with regard to the
study of perception. He applied these principles to a range of phenomena including per-
ceptual constancies and illusions and also to the attribution of psychological traits based
on observation of the exterior of an individual. With respect to the concept of expres-
sion, Brunswik conducted only one study on the relationship between facial features of
schematic drawings and a series of impression characteristics (mood, age, character,
likeability, beauty, intelligence, energy; Brunswik and Reiter 1937, reprinted in Bruns-
wik 1956). However, his methodological and theoretical framework has important im-
plications for studying body and speech as expression of inner states. These implications
were further highlighted by Scherer (1978, 2003), who proposed a modified version of
Brunswik’s Lens Model for research into non-verbal communication, in particular vocal
communication of emotions.
In his Lens Model framework, Brunswik defined two main methodological and theoretical concepts: representative design and ecological validity. Brunswik
argued that psychological processes are adapted in a Darwinian sense to the environ-
ments in which they function (Dhami, Hertwig, and Hoffrage 2004). He proposed the
method of representative design to capture these processes by sampling random stimuli
from the environment or creating stimuli in which environmental properties are
35. Body and speech as expression of inner states 559

preserved. He criticized classical methods of experimental design for applying the de-
mands of representativeness to the number of subjects but not to the number of objects
(stimuli). Furthermore, he questioned the feasibility of experimental designs that select
and isolate one or a few independent variables that are varied systematically whilst
extraneous variables are held constant. In his view, the concept of representativeness
is fundamental to generalization.
Brunswik’s methodological principles were closely entwined with his theoretical out-
look. Brunswik introduced the term ecological validity to indicate the degree of corre-
lation between a proximal (e.g., retinal) cue and the distal (e.g., object) variable to
which it is related. For example, in a perceptual task, ecological validity refers to the
objectively measured correlation between, say, vertical position and size of an object
(larger objects tend to be higher up in the visual field) over a series of situations. Or,
in the domain of emotional expressions, one may compare the ecological validity of
the cue “smiling” with the cue “reported happiness” as indicators of a person-object’s
emotional state. An important aspect of Brunswik’s approach was his conviction that
the environment to which an organism must adapt cannot be perfectly predicted
from the proximal cues. A particular distal stimulus does not always imply specific prox-
imal effects and vice versa. Proximal cues are only probabilistic indicators of a distal
variable. The proximal cues are themselves interrelated, thus introducing redundancy
into the environment.
Scherer (1978, 2003) has suggested modeling the relationship between emotions and
expressive cues in a Brunswikian perspective. In this approach, the relationship
between the underlying emotion and the expressive cues is probabilistic and many
redundant cues are potentially available for both the expression and communication
of emotion. Brunswik would probably question the current dominance of “modern” dis-
crete emotion theories as represented by Ekman, Izard, and their respective collabora-
tors (e.g., Ekman 1992; Izard 1991). Discrete emotion theorists claim that there are only
a limited number of fundamental or basic emotions and that for each of these there ex-
ists a prototypical, innate, and universal expression pattern. In this basic emotion view,
the association is deterministic and only allows minor variations (e.g. a motor program
might be carried out only partially). In the Brunswikian probabilistic view, several
expressive cues will be related independently to the emotional reaction, each with a de-
fined probability. On different occasions, under different circumstances, different com-
binations of cues might be used to express the same emotion. The extended lens model
integrates the study of the production (expression) and of the perception (impression)
of emotional expressions.
In Scherer’s model (see Scherer 2003, Figure 1: 229), the internal states – in this case,
the emotions – are exteriorized in the form of distal indicators corresponding, in the
case of vocal communication, to the acoustic characteristics of the voice. The notion
of externalization covers both intentional communication of internal states and involun-
tary behavioral and physiological reactions. At the operational level, the internal states
are represented by criterion values and the distal indicators by indicator values. These
distal indicators are represented proximally by percepts which are the result of percep-
tual processing on the part of the observer. These percepts can be evaluated by percep-
tual judgments expressed as scores on psychophysical scales or dimensions. The
correlations between indicator values and perceptual judgments are designated by
the term representation coefficient, and indicate the degree of precision of the projection
of distal indicators in the perceptual space of the individual. The attribution of a state is
the result of inferential processes based on perception of the distal indicators. These at-
tributions can be evaluated by obtaining new judgments on psychological dimensions
from observers. The correlations between perceptual judgments and attributions are re-
presented in the model by utilization coefficients which provide a measure of the utili-
zation (or the weighting) of each index that is perceived when a state is inferred. The
accuracy of these attributions in relation to the objectively observed state of the indi-
vidual is defined at an operational level by the correlation between criterion values
and attributions (coefficients of accuracy).
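The layered, correlational logic of these coefficients can be illustrated with a small numerical sketch. The following toy example uses entirely hypothetical data and variable names (not drawn from Brunswik's or Scherer's studies): each coefficient of the extended lens model is simply a correlation between two adjacent layers — internal state, distal indicator, percept, and attribution.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical toy data for eight senders and one vocal cue:
criterion   = [1, 2, 3, 4, 5, 6, 7, 8]                   # internal state (e.g. self-reported arousal)
indicator   = [1.1, 2.0, 2.8, 4.2, 5.0, 5.9, 7.1, 7.9]  # distal indicator (e.g. measured pitch)
percept     = [1.0, 2.2, 3.0, 4.0, 5.2, 6.0, 7.0, 8.1]  # observers' percept of that cue
attribution = [1.3, 1.9, 3.2, 4.1, 4.8, 6.2, 6.9, 8.0]  # observers' inferred arousal

ecological_validity = pearson(criterion, indicator)    # state -> distal indicator
representation      = pearson(indicator, percept)      # distal indicator -> percept
utilization         = pearson(percept, attribution)    # percept -> attribution
accuracy            = pearson(criterion, attribution)  # state -> attribution
```

On real data each layer is measured across many senders and observers, and the model is typically estimated over several cues at once; the sketch only shows how each named coefficient links two adjacent layers of the model.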
It is important to keep in mind that an emotion may be reliably attributed to a facial
or a vocal display despite this emotion not being present in the sender. Conversely,
when receivers cannot reliably attribute emotions to senders, this does not demonstrate
that senders do not express emotions. Such an imperfect relationship between the en-
coding process and the attribution process has been demonstrated by Hess and Kleck
(1994) using dynamic facial expressions, either posed or elicited by affectively evocative
materials. Subjects were able to accurately report the cues they employed in assessing
the perceived spontaneity or deliberateness of expressions. However, these cues were
not always valid determinants of posed and spontaneous expressions. In fact, partici-
pants were relatively poor at identifying expressions of the two types, and this low dis-
crimination accuracy was found to be a result of the consistent use of these invalid cues.
Reynolds and Gifford (2001) reported a similar observation for the expression and per-
ception of intelligence in nonverbal displays. They showed that intelligence, as measured
by a test, was correlated with a few nonverbal cues that could be observed and measured
in video recordings. Participants who were requested to rate intelligence on the basis of
those videos were relatively consistent in using another set of cues to make inferences
about intelligence. However, their ratings were not influenced by the cues that were
related to the intelligence scores, and they did not make accurate inferences.

4. Relevance for current research


German Expression Psychology has contributed important concepts and ideas to the
study of nonverbal communication. These often neglected aspects of German Expres-
sion Psychology are of crucial interest for today’s research on the face, body and voice.
The impact of Lersch's work certainly lay in his careful description of changes of
appearance in the face. Although such descriptions were often incomplete and contami-
nated with inferences about meaning (see Ekman 1979), they contributed to more
extensive methods of measuring facial behavior developed later. One of the currently
most widely used descriptive systems is the Facial Action Coding System (FACS) by
Ekman and Friesen (1978) (for an account of the Facial Action Coding System, see Waller and Smith Pasqualini this volume). It describes all possible visually distinguishable
facial movements (Ekman and Friesen 1982b). Similarly to Lersch’s approach, it is
based on the anatomy of facial action. However, no inferences are made about the
meanings of facial behavior, and measurements are based on non-inferential terms
(Ekman 1982). The Facial Action Coding System also allows for the analysis of dynamic
phases such as onset, apex and offset time. Lersch was certainly one of the first re-
searchers who pointed to the importance of dynamic aspects in facial expressions.
His idea that facial motion carries crucial information about the person and the
person’s expression can be found in much recent research on facial dynamics (e.g., Ambadar, Schooler, and Cohn 2005; Bassili 1978; Wehrle, Kaiser, Schmidt, et al. 2000). In particular, the distinction between angular-stiff and round-fluent movement
has relevance for today’s study of fake and genuine expressions (see Ekman and Friesen
1982a; Krumhuber and Kappas 2005; Krumhuber, Manstead, and Kappas 2007).
Leyhausen argued for a clear separation between the study and the analysis of
expression and impression. In this respect, his approach was not far from Brunswik’s
concept (e.g., Brunswik 1956; see also Kappas, Hess, and Scherer 1991). However,
the ethological approach Leyhausen represents has pointed to a certain asymmetry
between the origins and, in consequence, the plasticity of expression and impression
processes in humans. The discovery of a mirror system in the brain (e.g., Rizzolatti
and Craighero 2004) and systematic studies on imitative behavior (e.g., in the shape
of the Chameleon effect, e.g., Chartrand and Bargh 1999) are concordant with Leyhau-
sen’s analysis that the impressions gathered from bodily expressions are much less flex-
ible than the expressions themselves. Research by Todorov and his colleagues has
demonstrated how strikingly fast impressions are formed, with individuals showing great confidence within fractions of a second (Todorov, Mandisodza, Goren, et al.
2005; Willis and Todorov 2006). These findings suggest rapid automatic effects that are
consistent with the type of mechanisms postulated by Leyhausen. In other words, this
is not about what faces or bodily expressions really say, but what people think they
say. This can be attributed to our evolutionary past, where it was useful to act quickly
in response to certain signs. A closer analysis of Leyhausen’s predictions might be highly
productive for furthering our understanding of “old questions” about issues such as mis-
communication, misattribution, or insensitivity to deception based on expressive cues.
These questions are currently difficult to answer when the attempt is based on an analysis
that centers only on humans in the here and now and does not take phylogenetic trajec-
tories into account. Leyhausen’s work may also prove useful for current and developing
questions linked to computer-mediated communication, for instance what level of realism
is really needed for artificial entities, such as agents, avatars, or embodied robots.
Brunswik’s lens model quickly found use and support at the time of its appearance, particularly through Hammond (1955), who remains to this day a strong
defender of this paradigm. Hammond’s initial application of the lens model was in
the area of judgment analysis (or decision making), using the paradigm to analyze
clinical (diagnostic) judgments by psychiatrists and psychologists. Holzworth (2001)
has presented a brief review of the research regarding analysis of judgments from a
Brunswikian perspective. In this review, Holzworth finds that research on judgment analysis quickly became oriented towards problems of interpersonal perception. In this area, Albright and Malloy (2001) go as far as to state: “All research
on interpersonal perception is Brunswikian, because the use of real people as targets of
judgment invokes the principle of representative covariation. Because Brunswik was
the first to conduct a theoretically based and comprehensive […] study of social percep-
tion, he was the originator of the interpersonal approach to social perception research”
(Albright and Malloy 2001: 330–331). Brunswik was certainly ahead of his time and was
criticized by gestalt psychologists (e.g. Kurt Lewin) as well as by behaviorists, experi-
mental psychologists and statisticians. However, in the present day, Brunswik’s emphasis
on the importance of the environment is reflected in the increasing development of
“ecological psychology,” as illustrated by the work of Barker (1968).

German Expression Psychology remained an active research area until the sixties
and devoted much interest to what is now termed nonverbal communication. Unfortunately, its concepts and theories are often overlooked in today's research. With our presentation of three selected authors within the field of German Expression Psychology, namely Lersch, Leyhausen and Brunswik, we have sought to demonstrate how rich and wide-ranging their ideas have been. These ideas not only reject the notion of discrete
prototypes of emotional expressions but also highlight the complexities in impression
formation and management. German Expression Psychology therefore has an impor-
tant status in providing the framework for much later empirical work on nonverbal
behavior. In this sense, it may be an ancestor of modern nonverbal communication
research, but it continues to contribute to today’s perspectives on bodily and verbal
forms of behavior to express inner states.

5. References
Albright, Linda and Thomas E. Malloy 2001. Brunswik’s theoretical and methodological contribu-
tions to research in interpersonal perception. In: Kenneth R. Hammond and Thomas R. Stew-
art (eds.), The Essential Brunswik: Beginnings, Explications, Applications, 328–331. New York:
Oxford University Press.
Ambadar, Zara, Jonathan W. Schooler and Jeffrey F. Cohn 2005. Deciphering the enigmatic face:
The importance of facial dynamics in interpreting subtle facial expressions. Psychological
Science 16: 403–410.
Asendorpf, Jens 1982. Contributions of the German “Expression Psychology” to nonverbal com-
munication research, Part II: The face. Journal of Nonverbal Behavior 6: 199–219.
Asendorpf, Jens and Harald G. Wallbott 1982. Contributions of the German “Expression Psychol-
ogy” to nonverbal communication research, Part I: Theories and concepts. Journal of Nonver-
bal Behavior 6: 135–147.
Barker, Roger G. 1968. Ecological Psychology: Concepts and Methods for Studying the Environ-
ment of Human Behavior. Stanford, CA: Stanford University Press.
Bassili, John N. 1978. Facial motion in the perception of faces and of emotional expression. Journal
of Experimental Psychology: Human Perception and Performance 4: 373–379.
Brunswik, Egon 1952. The Conceptual Framework of Psychology. International Encyclopedia of
Unified Science, Volume 1, Number 10. Chicago: University of Chicago Press.
Brunswik, Egon 1956. Perception and the Representative Design of Psychological Experiments.
Berkeley: University of California Press.
Brunswik, Egon and Lotte Reiter 1937. Eindrucks-Charaktere schematisierter Gesichter.
Zeitschrift für Psychologie 142: 67–134.
Chartrand, Tanya L. and John A. Bargh 1999. The chameleon effect: The perception-behavior link
and social interaction. Journal of Personality and Social Psychology 76: 893–910.
Cornelius, Randolph. R. 1996. The Science of Emotion. Upper Saddle River, NJ: Prentice Hall.
Darwin, Charles 1872. The Expression of the Emotions in Man and Animals. London: John Murray.
Dhami, Mandeep, Ralph Hertwig and Ulrich Hoffrage 2004. The role of representative design in
an ecological approach to cognition. Psychological Bulletin 130: 959–988.
Ekman, Paul 1979. Non-verbal and verbal rituals in interaction. In: Mario von Cranach, Klaus
Foppa, Wolfgang Lepenies and Detlev Ploog (eds.), Human Ethology: Claims and Limits of
a New Discipline, 169–202. Cambridge: Cambridge University Press.
Ekman, Paul 1982. Methods for measuring facial action. In: Klaus R. Scherer and Paul Ekman
(eds.), Handbook of Methods in Nonverbal Behavior Research, 45–90. Cambridge: Cambridge
University Press.
Ekman, Paul 1992. Facial expressions of emotion: New findings, new questions. Psychological
Science 3: 34–38.
Ekman, Paul and Wallace V. Friesen 1978. The Facial Action Coding System. Palo Alto, CA: Con-
sulting Psychologists Press.
Ekman, Paul and Wallace V. Friesen 1982a. Felt, false and miserable smiles. Journal of Nonverbal
Behavior 6: 238–252.
Ekman, Paul and Wallace V. Friesen 1982b. Measuring facial movement with the Facial Action
Coding System. In: Paul Ekman (ed.), Emotion in the Human Face. 2nd edition, 178–211. Cam-
bridge: Cambridge University Press; Paris: Editions de la Maison des Sciences de l’Homme.
Fridlund, Alan J. 1994. Human Facial Expression: An Evolutionary View. San Diego, CA: Aca-
demic Press.
Hammond, Kenneth R. 1955. Probabilistic functioning and the clinical method. Psychological
Review 62: 255–262.
Helfrich, Hede and Harald G. Wallbott 1986. Contributions of the German “Expression Psychol-
ogy” to nonverbal communication research, Part IV, The voice. Journal of Nonverbal Behavior
10: 187–204.
Hess, Ursula and Robert E. Kleck 1994. The cues decoders use in attempting to differentiate emotion-elicited and posed facial expressions. European Journal of Social Psychology 24: 367–381.
Holzworth, R. James 2001. Judgment analysis. In: Kenneth R. Hammond and Thomas R. Stewart
(eds.), The Essential Brunswik: Beginnings, Explications, Applications, 324–327. New York:
Oxford University Press.
Izard, Carroll E. 1991. The Psychology of Emotions. New York: Plenum Press.
Kappas, Arvid and Jean Descôteaux 2003. Of butterflies and roaring thunder: Nonverbal commu-
nication in interaction and regulation of emotion. In: Pierre Philippot, Erik J. Coats and Robert
S. Feldman (eds.), Nonverbal Behavior in Clinical Settings, 45–74. New York: Oxford Univer-
sity Press.
Kappas, Arvid, Ursula Hess and Klaus R. Scherer 1991. Voice and emotion. In: Robert S. Feldman
and Bernard Rimé (eds.), Fundamentals of Nonverbal Behavior, 200–238. Cambridge: Cam-
bridge University Press.
Klages, Ludwig 1926. Grundlagen der Charakterkunde. Leipzig: J. A. Barth.
Kretschmer, Ernst 1940. Körperbau und Charakter. Berlin: J. Springer.
Krumhuber, Eva and Arvid Kappas 2005. Moving smiles: The role of dynamic components for the
perception of the genuineness of smiles. Journal of Nonverbal Behavior 29: 3–24.
Krumhuber, Eva, Antony S. R. Manstead and Arvid Kappas 2007. Temporal aspects of facial dis-
plays in person and expression perception. The effects of smile dynamics, head-tilt and gender.
Journal of Nonverbal Behavior 31: 39–56.
Lersch, Philipp 1940. Seele und Welt. Leipzig: J. A. Barth.
Lersch, Philipp 1957. Zur Theorie des mimischen Ausdrucks. Zeitschrift für Experimentelle und
Angewandte Psychologie 4: 409–419.
Lersch, Philipp 1961. Gesicht und Seele. Munich: E. Reinhardt.
Leyhausen, Paul 1956. Verhaltensstudien an Katzen. Zeitschrift für Tierpsychologie, Beiheft 2.
Leyhausen, Paul 1967. Biologie von Ausdruck und Eindruck. Psychologische Forschung 31:
113–227.
Müller, Cornelia this volume. Gestures as a medium of expression: The linguistic potential of ges-
tures. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and
Sedinha Teßendorf (eds.), Body-Language-Communication: An International Handbook on
Multimodality in Human Interaction. (Handbooks of Linguistics and Communication Science
38.1.) Berlin: De Gruyter Mouton.
Peters, Gustav 2000. Nachruf: Paul Leyhausen (1916–1998). Bonner Zoologische Beiträge 49: 179–189.
Reynolds, D’Arcy J. and Robert Gifford 2001. The sounds and sights of intelligence: A lens model
channel analysis. Personality and Social Psychology Bulletin 27: 187–200.
Rizzolatti, Giacomo and Laila Craighero 2004. The mirror-neuron system. Annual Review
of Neuroscience 27: 169–192.
Rothacker, Erich 1941. Die Schichten der Persönlichkeit (2. Auflage). Leipzig: J. A. Barth.
Scherer, Klaus R. 1978. Personality inference from voice quality: The loud voice of extroversion.
European Journal of Social Psychology 8: 467–487.
Scherer, Klaus R. 2003. Vocal communication of emotion: A review of research paradigms. Speech
Communication 40: 227–256.
Scherer, Klaus R. and Harald G. Wallbott 1990. Ausdruck von Emotionen. In: Klaus R. Scherer
(ed.), Enzyklopädie der Psychologie. Band C/IV/3 Psychologie der Emotion, 345–422. Göttingen:
Hogrefe.
Todorov, Alex, Anesu N. Mandisodza, Amir Goren and Crystal C. Hall 2005. Inferences of com-
petence from faces predict election outcomes. Science 308: 1623–1626.
Wallbott, Harald G. 1982. Contributions of the German “Expression Psychology” to nonverbal
communication research. Part III: Gait, gestures, and body movement. Journal of Nonverbal
Behavior 7: 20–32.
Waller, Bridget and Marcia Smith Pasqualini this volume. Analysing facial expression using Facial
Action Coding Systems (FACS). In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H.
Ladewig, David McNeill and Sedinha Teßendorf (eds.), Body-Language-Communication: An
International Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics
and Communication Science 38.1.) Berlin: De Gruyter Mouton.
Wehrle, Thomas, Susanne Kaiser, Susanne Schmidt and Klaus R. Scherer 2000. Studying the dy-
namics of emotional expression using synthesized facial muscle movements. Journal of Person-
ality and Social Psychology 78: 105–119.
Willis, Janine and Alex Todorov 2006. First impressions: Making up your mind after 100 ms expo-
sure to a face. Psychological Science 17: 592–598.

Eva Krumhuber, Bremen, Germany
Susanne Kaiser, Geneva, Switzerland
Arvid Kappas, Bremen, Germany
Klaus R. Scherer, Geneva, Switzerland

36. Fused Bodies: On the interrelatedness of cognition and interaction
1. Introduction
2. Theoretical background
3. Conversation Analysis as method for doing Fused Bodies research
4. Cognition and Fused Bodies
5. Empirical evidence
6. References

Abstract
This chapter introduces the Fused Bodies (FB) approach to sense-making in social inter-
action. Using ethnographic observation and microanalyses of naturally occurring social
interaction as its empirical basis, Fused Bodies focuses on sense-making in face-to-face
encounters as the integrated whole of interactionally relevant, mutually oriented-to
body movements at any given time during the interaction. Fused Bodies emphasizes
that interactionally consequential body movements are more than just “behavior”; they
are acts of sense which matter to the participants. Hence, though heavily indebted to the
concepts and principles of analysis in ethnomethodology and conversation analysis,
Fused Bodies aims at going beyond these disciplines by 1) focusing on and emphasizing the role of whole bodies in interaction and 2) explicitly considering the interactional work done by co-participants as both action and cognition. The latter point is both programmatic, in that Fused Bodies aims at empirically investigating “cognition” as a member’s matter, and empirical, in that the sense of interactional actions and the interactional machinery per se have been found to be significant matters to participants, which they invest interactional work in establishing, negotiating, repairing, and re-establishing. Finally, Fused Bodies hypothesizes that body movements, including talk,
may in and of themselves index common sense knowledge and constitute interactional
cognitive work.

1. Introduction
Fused Bodies (FB) is a new approach to sense-making in social interaction. The term
sense is understood to cover all the kinds of meanings an action in social interaction can have, i.e. referential, conceptual, reflective, etc. The term also covers our understanding of meaning as bodily grounded, and it highlights that meaning does not reside in analysts’ models, schemas, or notations, but in the way the sensing body acts, perceives the actions of others and itself, forms its actions, and so on.
From the perspective of Fused Bodies, sense is the total, integrated unit at any given
moment in space and time when mutually oriented, interacting, physical and knowl-
edgeable bodies, in concert, systematize their movements. Movements are understood here as any resources that interacting bodies make use of to achieve sense: language, gesture,
gaze, posture, manipulation of objects or other bodies, non-linguistic sounds, silence,
and more. In other words, the unit of sense-making does not have to be communication
in a traditional sense, involving language as the primary means of achieving sense. The
unit may also be completely non-linguistic. An analogy which highlights the Fused
Bodies focus on the coordinated work that whole bodies do (not just mouths, brains
and hands) is a couple of skilful tango dancers. Each dancer is constantly “tuned in”
to his or her partner, monitoring carefully the movements of the other, not just through
gazing but through the whole body, and every movement is done in precise coordination
with the movements of the other. In and through their coordination and timing, the
dancers become one whole, a gracefully moving unit which is two and one at the
same time. A tango requires mastery of particular “exotic” movements but it shares
with everyday social interaction the general human talent for performing interactional
coordination and timing with the body, to act as two in a graceful unit. The analogy
between a tango and everyday social interaction is of course to be understood at a cer-
tain level of abstraction. On the specific descriptive level, they are two different phe-
nomena of social interaction, two different kinds of social, interactional activities
governed by different specific rules and possibilities and “design” of actions. Social
interaction fuses bodies – in time and space and through sound, vision, coordination,
touch, smell, temperature, and more – and through this fusion social interaction
becomes possible and sociality becomes a fundamental fact of our lives.
Fused Bodies is thus focusing on movements that are turned into social actions by
interacting human bodies. Further, its interests concern when and how these
movements are turned into social actions and thus how sense is achieved in interaction.
What is characteristic about the Fused Bodies approach to social interaction is the way
in which it understands social actions as being inherently social and cognitive, or rather
socio-cognitive. Social actions are the means in and through which a common under-
standing is worked at and established. They are not merely “events”, “things that hap-
pen” or “behavior” in a traditional behaviorist sense. Social actions carry out
understanding. They demonstrate the understanding that they “do”. Furthermore,
they do not have a life of their own. They are being constructed, carried out, done, re-
cognized and understood by sense-making human bodies. Hence, in contrast to other
names for approaches to social interaction and communication such as Conversation
Analysis, Discourse Analysis and Discursive Psychology, the name Fused Bodies
profiles the doers of the social action.
The Fused Bodies approach thus endorses a fundamental mutual dependency and
reflexivity between sociality and cognition: On the one hand, understanding is only pos-
sible in and through sociality or on the basis of sociality. Yet on the other hand, an event
can only be a social action – that is an event that is understandable as a social action,
done by a sense-making human being and understood by a sense-making human being –
if there is cognition. When human beings, in conversation-analytic jargon (see section 2 below), produce an action, treat an action and understand an action “as” something, they
(in Fused Bodies jargon) cognize that action.
From Fused Bodies’ research interests in understanding follows an interest in knowl-
edge. It follows from its approach to sense-making that Fused Bodies describes and ana-
lyzes knowledge as a social achievement too. The Fused Bodies interest in knowledge
thus concerns how knowledge is achieved socially and how interacting human bodies’
actions bear upon social knowledge.

2. Theoretical background
2.1. Body-language
The study of language as disembodied linguistic structure (morphological, syntactical,
semantic, symbolic, pragmatic) has a long history that dates back to ancient Greece
and includes the most prominent proponents of the modern science of linguistics:
Chomsky (1957), Austin (1962), and Searle (1962) (for a discussion of Aristotle’s con-
ceptualization of the relation between the structure of language and the structure of
reality, see Edel (1996)). The Fused Bodies approach, however, follows a more recent
line of research which embeds language as an integral part of the embodied, interac-
tional making of sense (e.g. Goodwin 2000a, 2000b, 2003; Goodwin and Goodwin
1986; Kendon 1980, 1993, 1994, 2000, 2002, 2004; Laursen 2002; LeBaron and Streeck
2000; Schegloff 1984, 1998; Streeck 1993, 2002). In the Fused Bodies program, this en-
tails that “language” itself, its production and understanding, is a visible bodily doing on
a par with gaze, gesture, head movements and movements of parts of the face (e.g. eye
brows), body posture, object manipulation, and non-verbal sounds. All bodily doings, or movements as we shall call them, are in different ways potential resources for interaction. It should be emphasized that since the term movements covers a wide range of different resources that contribute to sense-making in different ways, these types of movements should not be understood or treated as the same. Hence, we recognize the distinct
36. Fused Bodies: On the interrelatedness of cognition and interaction 567

symbolic nature of most verbal utterances as being different from, say, the way in which a body leans towards another while whispering a secret. Yet both are part of the same integrated whole of sense-making, both are also bodily movements, and neither the uttering of words nor other movements of the body can a priori be said to be either symbolic or not. The uttering of words may in some contexts be almost un-symbolic, and the movement of other body parts may take on a symbolic function. The local context, and not the type of movement itself, decides whether it is symbolic or not and how it differs from other body movements.
Fused Bodies’ research interests concern resources which amount to actions that are
recognized as actions for interaction by the participants in interaction. The approach
focuses on how sense is achieved by the participants of interaction in and through con-
certed action in time and space. The Fused Bodies program has a lot in common with
and inherits a lot from multi-modal analyses of interaction in the vein of Ethnometho-
dology (EM) and Conversation Analysis (CA) (Heritage 1984; Hutchby and Wooffitt
1998; Silverman 1998).
Studies in this area have shown how for instance gestures and headshakes are orga-
nized as activities, how they are organized in relation to speaking and how they contrib-
ute to the total meaning of the utterances and/or actions of which they are a part. Yet
other studies (Streeck 1993) reveal a coordination of gaze and gesture in the way speakers turn their gaze to their gesture (an iconic gesture) before uttering a specific central word that is semantically coherent with the meaning of the gesture. And Bavelas,
Coates, and Johnson (2002) have studied how the speaker’s gaze coordinates the inter-
actional work between speaker and recipient. Goodwin (1981) shows how the speaker
while speaking orients himself towards receiving the recipient’s gaze (see also Kendon
2004), and investigates (Goodwin 2000b, 2003) pointing as a situated activity which en-
tails among other “semiotic resources” “a body visibly performing an act of pointing”
(Goodwin 2000b: 69, see also Goodwin and Goodwin 1986). Finally, Schegloff (1998)
shows how “body torque” can “impinge on the conduct of the participants and shape
the way they interactively produce talk” (Schegloff 1998: 536).
However, though the Fused Bodies program inherits most of the methodological and
conceptual agenda of Ethnomethodology and Conversation Analysis (see below) and
has in common with these studies the focus on the employment and integration of mul-
timodal resources in interactional sense-making, it attempts at the same time to over-
come the tendency that has existed in these research traditions to treat language or
rather “talk” as the primordial “carrier” of sense. Thus, the Fused Bodies program en-
dorses an approach that studies face-to-face interaction without a priori analytic divi-
sions of aspects or resources of sense-making. Instead, understanding or sense is
conceptualized as an interactionally achieved, social, integrated whole consisting of
concerted, recognizable and understandable actions that are composed of the bodily
movements that happen to be employed.

2.2. Phenomenology, cognition and Ethnomethodology


Whereas many contemporary approaches to the interrelatedness of interaction and cog-
nition are based on considering how empirical interactional studies and existing cogni-
tive theories may or may not fit together or aim at rethinking concepts or findings from
the “opposing” field (e.g. Discursive Psychology, Edwards and Potter 1992) in the terms
of one’s own field, the Fused Bodies program in principle aims at starting with a blank slate. This means that its research agendas are not motivated by the ambition to verify, undermine or rethink specific existing cognitive theories, concepts and terminology. Instead, it proposes a redefinition of cognition as such that allows for empirically discovering new kinds of cognizing and new kinds of cognitive phenomena. By following this agenda, Fused Bodies studies may of course converge with existing or previous cognitive studies that have different points of departure. And,
needless to say, the Fused Bodies program does still draw on the same language and
the same general labeling of things in the world, which means that Fused Bodies studies
may eventually focus on potential phenomena such as metaphor, mind, conceptualization,
or frames without being answerable to any specific cognitive theory.
An influential line of thinking which has inspired both cognitive and ethnomethodological approaches to sense-making, and by which the Fused Bodies approach is inspired as well, is the phenomenological understanding of how things can have an existence for human beings. In that understanding, things can have an existence in and through the processes
by which they are constructed. Within “traditional” phenomenology these processes are
features of the individuals’ perception of the world. Cognitive studies take the indivi-
duals’ perception of the world as a point of departure and study these processes typi-
cally as mental processes going on within the minds or brains of individuals. The
individual human being is thus often conceptualized as an enclosed psychological entity
who, by means of communication, can reveal his thought contents to his surroundings.
Ethnomethodology has adopted the phenomenological view on how things come into
being, that is: On how things are constructed as “being”. Ethnomethodological studies,
however, view this construction as fundamentally social and observable. Things come
into being through human beings’ concerted actions. Only through a joint effort do
things become a part of human beings’ social world. Ethnomethodological studies
thus adopt a praxeological and procedural approach to how social human beings in
and through concerted actions construct or orient towards things as being a part of
social reality. With this approach, ethnomethodological studies investigate how social
human beings recognize actions as being part of procedures of sense-making through
the (re)use, (re)construction and (re)establishment of an action (of being exactly that
action) in systematic ways. What is of interest to ethnomethodological studies is conse-
quently not “cognitive” or “neurological” “processing” per se but the systematic occur-
rence of recognizable methods and techniques by which human beings construct a social
order and make sense of their social world. Thus, Ethnomethodology disregards any
attempt to account for social behavior by reference to “mental” processes which may
generate such behavior. Instead, Ethnomethodology views social behavior as the
locus of sense-making in and of itself. It should be noted that human beings are not
thus viewed as mindless social agents (or “cultural dopes”; Garfinkel 1967: 68). On
the contrary, social behavior indexes knowledge. Hence, ethnomethodological interests concern social human beings’ knowledge and thus their use of categories and notions of their social world. Mental or neurobiological categories such as cognition,
mind, thinking and feeling have a place in the ethnomethodological analysis only in
so far as these categories are oriented to or constructed by the participants themselves
(in and through their actions) as features of their social world. The Fused Bodies program shares with Ethnomethodology the perspective on phenomena that describes these in terms of how they come to be a part of social human beings’ life world.
However, whereas ethnomethodological studies focus on the social structures as these are achieved in and through the actions of members of society and as existing only as such,
the Fused Bodies program to an even greater extent emphasizes the integral and defin-
ing role of the doers of these structures. Thus, in the Fused Bodies program actions and
social structure are seen as the bodily movements that constitute them.

3. Conversation Analysis as method for doing Fused Bodies research
One of the key micro-analytic disciplines within which ethnomethodological concerns
have been pursued is Conversation Analysis (Heritage 1984, 1987; Maynard and
Clayman 1991). This line of research shares with Ethnomethodology the interest in
the procedures in and through which members of society achieve the social structures
that make up their social reality. Conversation Analysis focuses on how such structures
are achieved in social interaction. The latter is conceived of as being a phenomenon in
its own right, meaning that phenomena of interaction (interactional phenomena) can be
accounted for by reference to the dynamics of the interaction. Hence, interaction is
driven by conditions inherent to it. Conversation Analysis studies these in terms of
“generic interactional structures” and in terms of the so-called interactional machinery
(Psathas 1995) or apparatus (Sacks [1966] 1995; Sacks, Schegloff, and Jefferson 1974).
This includes a concern for the interactional procedures and techniques in and through
which participants in interaction establish this machinery. In other words, Conversation
Analysis studies how participants order their interactions. The focus is on the systemat-
ically employed and – to the participants – recognizable methods, procedures, patterns
and techniques in and through which the participants achieve such interactional order.
Central to Conversation Analysis studies is the analysis of how participants achieve
such order through concerted actions and how common understanding, or rather intersubjectivity, is thus a social interactional achievement. It should be noted that, with the Conversation Analysis methods, the Fused Bodies program requires that "sense-making" be dependent on recognizability. To achieve recognizability, the participants
organize movements in specific ways. One central principle in this organization is the
placement or the positioning of movements in relation to other previously produced
movements, i.e. in relation to the context of the currently produced movements. In
that context, the movements are understood and treated by the participants as a social
action. Hence, the categorization of movements as resources for social interaction as well as actions for social interaction is in the Fused Bodies program constrained by the participants’ own orientations. Furthermore, in conversation analytic and ethnomethodological approaches the relation between social actions on the one hand and the contexts in which they occur on the other is viewed as being characterized by reflexivity: This
means that social contexts are constructed and understood through social actions on
the one hand while on the other hand the actions are understood through these
contexts. By context we then mean the local, sequential context.
Fused Bodies analyses are based on empirical studies of “naturally occurring” social
interaction in order to get as close as possible to how people make and achieve sense
through bodily movements in and as social actions. Designed experiments or task-
driven interactions generated by the researcher are not used as data (unless the focus
is on these as naturally occurring social situations). The social interactions are filmed
and then transcribed according to a set of transcription symbols which draw in part
on transcription conventions as developed by the mother of Conversation Analysis,
Gail Jefferson (Schenkein 1978; for a discussion of methodological and practical aspects of using this style for transcribing verbal interaction, see Ten Have 1999).
Traditional Conversation Analysis data was typically audio-taped and, as mentioned
above, the analysis focused on the oral sounds produced by the participants. The Fused
Bodies approach with its interests in “bodies-in-interaction” naturally relies on video-
taped data. It must thus include in its transcriptions all potentially relevant body move-
ments (see Hougaard and Hougaard 2009 for examples). Data is analyzed on the basis
of both the original footage and the transcriptions.

4. Cognition and Fused Bodies


Though firmly rooted in the behavioral, micro-analytic frameworks of Ethnomethodol-
ogy and Conversation Analysis, the Fused Bodies program does not exclude (some
notion of) cognition in the study of social interaction. However, the notions of cognition
that are entertained in the Fused Bodies approach are shaped and constrained by the
praxeological and empirical basis of the approach. This has so far led the Fused Bodies
approach to consider the following notion of cognition: Cognition is understood as part
and parcel of the recognized-as-social actions which participants perform with their
bodies. These actions are not just patterned kinesthetic movements; they are flesh and blood, and they are understood as being meaningful by the participants themselves.
Hence, it is understood by the participants themselves that these actions index compe-
tences such as for instance knowledge, thinking, memory and language (see next section
for an example). The program thus encompasses ordinary people’s ascribing cognitive competences to fully competent members’ actions (or not). This interest is not confined to people "talking" cognition, i.e. discussions of "memory" (often labeled "folk psychology"); rather, it aims at describing how such cognitive ascriptions are built into sequences of bodily actions in subtle ways. The Fused Bodies approach
thus reverses the approach of most traditional cognitive science. It does not ask from an etic perspective what cognitive "abilities", "processes", or "competences" people
must “run” or “possess” to behave in certain observed ways; it instead asks from an
emic perspective how we – analysts and participants – can understand cognition as it
emerges from the concerted, social behavior of ordinary people in ordinary settings.
With this, Fused Bodies treats action as cognition or action as mind. The human
mind is in this program essentially social. Participants build their understanding of
some (other’s) actions into their own actions. Hence, the way(s) in which some bodily
action (X) may be understood is dealt with by the co-participant in some next action
(Y) as s/he orients towards the prior action in subtle ways. Understanding is then
achieved by participants in interaction together. Together they “act” things “into
being”. Fused Bodies does not deny the existence of individual perception or individual thinking; these we consider obvious facts that are also demonstrable as participants’ own understanding of each other (consider the question “What are you thinking about?”). However,
such “individual thinking” is not assumed to be omnipresent in that it is connected
to or a part of every action and/or situation. “Individual thinking” is considered to
be a specific activity, whereas “social thinking”/understanding is part and parcel of
every activity – even of the “individual” one. That is, Fused Bodies has it that all our
knowledge as individuals is not individual but acquired in and through social interaction
with and about the world we are a part of (Hougaard and Hougaard 2009). Thus, the
Fused Bodies framework proposes a conception of mind which is fundamentally social
(see also Coulter 1979 and Wittgenstein [1953] 2001). The perception of “pain” may be
individual when stepping on a sharp stone barefooted. The conception or theoretical or
practical knowledge of the neuro-physical sensation we know as “pain” is however
achieved in and through the social determination of it. Thus, cognition is social and
so is the human mind.
As a consequence, the radical Fused Bodies hypothesis has it that the social mind
precedes the Husserlian “subject”. This hypothesis is based partly on the concepts and empirical foundation of the Fused Bodies program and partly on the unsustainability of the Husserlian subject understood as a purely individual experience of being (Husserl [1913] 1977; see also Eco 1997). The Fused Bodies program hypothesizes, as
opposed to traditional phenomenology, that the experience of “being” is dependent
on the co-existence of other “beings”. No child is born in isolation. Also, it follows
from this that the program does not leave room for a notion of cognition as an “auton-
omous” phenomenon that interacts autonomously with other “minded processes” such
as perception. Autonomous cognition is by definition non-human or pre-human. Consider for instance a child who, though not born in isolation, is still “brought up” by wolves. We may recognize that child as a “human being” because of its biological potential for being a member of human society, but we consider its way of understanding the
world un-human. Had the child displayed an understanding of the world that corre-
sponded to a “human” perspective, this would have been evidence of autonomous cog-
nition. We speculate that what is autonomous is our orientation towards other
“individual beings” – most of all “individuals like us”. As some of the empirical studies that gave rise to the Fused Bodies program show, people orient towards people (see
section 5. below). Thus, human beings’ way of having a world, having knowledge and being minded is based on this social orientation.
The Fused Bodies program presents a non-reductionist approach to cognition. The
cognitive sciences have produced many attempts to demarcate the “cognitive sphere”,
for instance the individual’s “inside” or “mind”, the individual brain or the individual
body and brain. As described above, the notion of cognition entertained in the Fused
Bodies approach is constrained by the documentary method of interpretation, i.e. what
the participants themselves (together) understand as the cognitive constitution of their social world, whether in terms of talking about cognition (“memory”, “recognition” and the like) or in terms of building cognition into actions in interaction (alluding to, or relying on, cognition). This notion of cognition, though clearly methodologically constrained, in principle allows for various different notions of cognition or
an all-encompassing notion of cognition as for instance involving any thinkable (and
unthinkable) bodily movement. In other words: The Fused Bodies research interests
concern how participants in interaction treat actions not just as actions but as actions
carried out by knowledgeable, mindable, feeling human beings and members of society.
Anything within this that is understood intersubjectively as cognition is of interest to
Fused Bodies.
It should be made clear that the Fused Bodies program distinguishes between actual
empirical analyses and findings in studies of social interaction and the hypotheses and
philosophy of human knowledge and the human mind that have been sketched above. The latter are a changeable spin-off of the former, and the empirical studies of bodies
that fuse in social interaction to achieve intersubjectivity in a current activity remain
the core of the Fused Bodies program. Nonetheless, the Fused Bodies program can
indeed generate studies of sense-making in solitary settings where, using methods other than those developed in connection with Conversation Analysis, the focus is on how the
processes and procedures of sense-making orient towards a social, normative constitution
of the sense-maker’s world.

5. Empirical evidence
The Fused Bodies program emerges from ethnographic and interactional studies of naturally occurring (face-to-face) interaction in many different settings. Studies of how participants – ordinary people, natives, non-natives, (non-)disabled people, children, adults, parents, businesspeople, teachers, pedagogues and more – make sense have indicated that sense-making is neither an autonomous, individual, pre-planned cognitive endeavor nor a behaviorist social machinery of talk. Sense-making is an accomplishment that
is worked and arrived at by the participants in a social process. These studies have also indicated that the social process, and the arrived-at understanding of whatever the ongoing interactional business is, matter to the participants.

5.1. Understanding matters


Studies in interaction have generally – we believe – shown that interactional meaning
and practices matter to people. Rasmussen (1998, 2000) has shown, for instance, how
business people work at and accomplish an “intercultural business environment” in
which different and diverging cultural actions of each participant are treated as inter-
culturally acceptable social ways of doing things. So for instance, the way in which the
same two participants address each other may differ depending on whether a German
speaker calls a Danish speaker (using first names for identification and as recognition
items) or whether a Danish speaker calls a German speaker (using family names
for identification and as recognition items). Even more so, the participants were
shown to consistently make use of these culturally diverging frames in one and the
same interaction – one (A) addressing the other (B) and identifying third persons
(C) with first names, whereas the other (B) made use of family names for the same
purposes.
Brouwer, Rasmussen, and Wagner (2004), however, have shown how understanding in other native–non-native interactions matters to the participants in different ways.
The authors show how the native participants deal with actions carried out by non-
native speakers that in these sequential environments have many implications. In
the excerpt below (Brouwer, Rasmussen, and Wagner 2004: 79–80), the non-native
participant and business partner makes the call and asks if “Heinrich” (first name)
is “at home”. This occasions some problems for the native co-participant who answers
the phone: “Heinrich’s ‘right’ name is ‘Herr Heinrich’” (Mr. Heinrich); Mr. Heinrich
is actually at home (as he is sick); Mr. Heinrich is thus not “in the house” which is a
relevant understanding of the caller’s question since the caller calls the company in
which Mr. Heinrich is employed. The called native co-participant chooses to deal
with these implications (in line 7) by first establishing the way the question (posed by
the non-native caller in line 5) is understood, and then answering the understood
question:
Excerpt:

1 D:  .hhh haben sie Heinrich zu hause;
      Have you Heinrich at home
      ‘Is Heinrich around’
2     (0.4)
3 G:  bitte sehr?
      Pardon?
4     (0.3)
5 D:  haben sie Heinrich zu hause;
      Have you Heinrich at home
      ‘Is Heinrich around’
6     (0.6)
7 G:  .h der herr Heinrich. der is nicht im hause. nein
      the mister Heinrich he is not in the house no
      ‘mister Heinrich is not in the building no’

The example above illustrates how the understanding of different actions doing dif-
ferent things matters to participant G. Instead of just treating the most likely under-
standing of D’s action (line 5) as “is Mister Heinrich in the house” by responding
“no” in line 6 or 7, G repairs “Heinrich” with “Mr. Heinrich” and repairs “at
home” with “in the house”, demonstrating that it matters what kind of action and
possible understanding is being dealt with subsequently. Furthermore, the construc-
tion of the relationship between the caller and the called and between the answerer
(G) and the called matters (“Heinrich” or “Mr. Heinrich” to whom?) as well. Notice that “it matters” is itself done in interaction. It is not an “internal” matter. Furthermore, the understanding of “what matters” is collaboratively worked at and accomplished by both participants.
Thus, a lot seems to be at stake for the participants. This may of course be – and has
been – described in terms of the actions they have carried out. However, we hold that
such an approach does not grasp the nature of “what matters”. That meaning and sense-making matter – what means something to people, what they care about – is not part and parcel of what can simply be labeled “interactional machinery”. In order to capture
and describe participants’, members of society’s, understanding of their world, the
Fused Bodies approach proposes a framework which does not conceptualize this under-
standing merely as “mechanics.” Hence the Fused Bodies approach is not in line with
traditional behaviorist thinking.

5.2. Bodily actions are recognizable social actions too


Crucially, what matters to the participants is not just dealt with verbally. Spoken
language is movement of the body, but this is only one way in which bodies move to
produce systematic, recognizable action-in-interaction, social action. Hougaard and
Hougaard (2008) – in analyzing the social interaction between a boy with cerebral
palsy and his pedagogue – show how meaningful interaction may proceed locally by
way of non-symbolic (non-signifying) body movements that are

(i) projected by talk,
(ii) subsequently defined by the pedagogue in and through talk, or
(iii) neither framed, nor projected, nor subsequently defined by talk.

Hence, the understanding that matters to the participants is produced by entire bodies (on gesture studies, see also section 2.1.), and sometimes non-verbal movements alone
do the job of making meaningful interaction (see also Kendon 1992; Lerner and
Zimmerman 2003). Participants may monitor whole bodies as they interact and they
may, in principle, use any part of their body in understanding the ongoing interaction
(see also Streeck 1993, 2002; LeBaron and Streeck 2000). It is of course central to a
framework that proposes to also include (non-verbal) movements as social actions to
be able to distinguish between body movements that are being treated as movements
for social interaction and non-interactional body movements. In the studies mentioned
above, the analyses show how the participants in interaction deal with such movements
(or not). On the basis of these studies, Fused Bodies proposes to approach the question
as a “members’ problem”, i.e. as a problem that members have to and actually do deal
with in practical ways. Hougaard and Hougaard (2009) show how participants orient
towards bodies as being “interactionally set” or not. In their analysis of social interac-
tion between a person suffering from aphasia and her speech therapist, it is shown how
the aphasic person is sometimes oriented to as “falling out” of the interaction. In the course of a word search, her gaze drops to the surface of a desk between her and the therapist, and the otherwise very lively, gesturing woman sits completely still. In response to this, the speech therapist also sits completely still, while however gazing
at his patient. Hence, the co-participants, by stopping all bodily movement momen-
tarily, bring the interaction to a halt. This is done non-verbally, in and through a coor-
dination of body movements and monitoring (by the therapist) of the other’s body. For
that moment, the bodies are not in an interactionally developing mode. Once the stillness has been co-composed (with the therapist’s responding stillness), what follows does not carry the interaction onward until the stillness is broken by the aphasic woman.
Hence, what bodies do during periods of interaction may or may not contribute to
the development of the interaction, and whether or not some body movement contri-
butes to the development of the interaction can be determined by whether it is oriented
to by the participants as doing so. If the participants orient towards bodily movements as social actions, then the understanding of these matters to them. The understanding of these social actions is worked at by the co-participants and is thus a concern to them.
Importantly, not only the contents of social actions matter. In an analysis of an au-
tistic child who is building a Lego tower together with his father, Brouwer et al. (in
prep.) found that the parent corrects the child if the child (Arthur, an invented name) initiates a turn-at-building (putting a block on the tower) when it is the parent’s turn (“Daddy’s turn–Arthur’s turn”). This example – though it may appear strange to many parents at first glance – confirms what seems to have been found in the analysis of fighting for turns-at-talk (Sacks, Schegloff, and Jefferson 1974), namely that the turn-taking machinery matters. Considerable work can be seen to be put into negotiating the
basic elements of the interaction, whereby participants show an understanding of the
machinery as a meaningful thing that they do, as opposed to some purely behavioral
phenomenon which just “runs”. Furthermore, since the machinery in the case of the
parent and the autistic child building a Lego tower is carried out in and through non-
verbal body movements, the example shows that not only can non-verbal body movements be social and interactionally consequential actions; they matter too, in the same way that verbal body movements (talk-in-interaction) matter.
The latter point led Hougaard and Hougaard (2008) to propose the hypothesis that
participants in interaction may achieve a common sense understanding in and through
sequentially organized non-verbal body movements. Actually, body movements may be
a part of ordinary people’s common sense knowledge in the same way that systematically employed verbal methods for structuring talk-in-interaction are believed to be.
This hypothesis gets its weight in particular from the observation of sequences of inter-
action consisting only of body movements (which is not signing, as in a sign language of
the Deaf), where a first body movement is responded to as a meaningful action (X) in
and through a subsequent body movement (Y). The particular types of body movement
sequences studied by Hougaard and Hougaard (2008) are so-called transfer sequences
(see also Lerner and Zimmerman 2003) in which one participant offers another participant an object which the second participant then takes. Supporting this hypothesis, of
course, requires empirical research showing that non-verbal body movements can be
robust structures of social interaction in and of themselves.

6. References
Austin, John 1962. How to Do Things with Words: The William James Lectures Delivered at Har-
vard University in 1955. Edited by James O. Urmson. Oxford: Clarendon.
Bavelas, Janet B., Linda Coates and Trudy Johnson 2002. Listener responses as a collaborative
process: The role of gaze. Journal of Communication 52(3): 566–579.
Brouwer, Catherine E., Dennis Day, Ulrika Ferm, Anders Hougaard, Gitte Hougaard and Gunilla
Thunberg in preparation. Structures in interactions between children with disability and their
parents, or: Treating the actions of children with disabilities as sensible.
Brouwer, Catherine E., Gitte Rasmussen and Johannes Wagner 2004. Embedded corrections in
second language talk. In: Rod Gardner and Johannes Wagner (eds.), Second Language Conver-
sations, 75–92. London: Continuum.
Chomsky, Noam 1957. Syntactic Structures. New York: De Gruyter.
Coulter, Jeff 1979. The Social Construction of Mind. London: Macmillan.
Eco, Umberto 1997. Kant og Næbdyret. [Danish translation. Kant e l’ornitorinco. Milan: R.C.S.
Libri S.p.A] Copenhagen: Forum.
Edel, Abraham 1996. Aristotle and His Philosophy. New Brunswick, NJ: Transaction.
Edwards, Derek and Jonathan Potter 1992. Discursive Psychology. London: Sage.
Garfinkel, Harold 1967. Studies in Ethnomethodology. Englewood Cliffs, NJ: Prentice-Hall.
Goodwin, Charles 1981. Conversational Organisation – Interaction between Speakers and Hearers.
New York: Academic Press.
Goodwin, Charles 2000a. Gesture, aphasia, and interaction. In: David McNeill (ed.), Language
and Gesture, 84–98. Cambridge: Cambridge University Press.
Goodwin, Charles 2000b. Pointing and the collaborative construction of meaning in aphasia. Texas
Linguistic Forum 43: 67–76.
Goodwin, Charles 2003. Conversational frameworks for the accomplishment of meaning in apha-
sia. In: Charles Goodwin (ed.), Conversation and Brain Damage, 90–116. Oxford: Oxford Uni-
versity Press.
576 IV. Contemporary approaches

Goodwin, Charles and Marjory Goodwin 1986. Gesture and coparticipation in the activity of
searching for a word. Semiotica 62: 51–75.
Heritage, John 1984. Garfinkel and Ethnomethodology. Cambridge: Polity Press.
Heritage, John 1987. Ethnomethodology. In: Anthony Giddens and Jonathan H. Turner (eds.),
Social Theory Today, 224–272. Cambridge: Polity Press.
Hougaard, Anders R. and Gitte Rasmussen Hougaard 2008. Do body movements display com-
mon sense knowledge? Paper presented at Language Culture and Mind Conference III,
Odense, Denmark.
Hougaard, Anders R. and Gitte Rasmussen Hougaard 2009. Fused Bodies: Sense-making as a
phenomenon of interacting, knowledgeable bodies. In: Hanna Pishwa (ed.), Social Cognition
and Language, 47–78. Berlin: De Gruyter Mouton.
Husserl, Edmund 1977. Ideen zu einer reinen Phänomenologie und Phänomenologischen Philoso-
phie. Erstes Buch: Allgemeine Einführung in die reine Phänomenologie. 1. Halbband: Text der
1.–3. Auflage. Reprint, edited by Karl Schuhmann, The Hague: Martinus Nijhoff. First pub-
lished Halle (Saale): Max Niemeyer [1913].
Hutchby, Ian and Robin Wooffitt 1998. Conversation Analysis. Cambridge: Polity Press.
Kendon, Adam 1980. Gesticulation and speech: Two aspects of the process of utterance. In: Mary
R. Key (ed.), The Relationship of Verbal and Nonverbal Communication, 207–227. The Hague:
Mouton.
Kendon, Adam 1992. The negotiation of context in face-to-face interaction. In: Alessandro Dur-
anti and Charles Goodwin (eds.), Rethinking Context: Language as an Interactive Phenomenon,
323–334. Cambridge: Cambridge University Press.
Kendon, Adam 1993. Human gesture. In: Tim Ingold and Kathleen R. Gibson (eds.), Tools, Lan-
guage and Cognition in Human Evolution, 43–62. Cambridge: Cambridge University Press.
Kendon, Adam 1994. Do gestures communicate? A review. Research on Language and Social
Interaction 27(3): 175–200.
Kendon, Adam 2000. Language and gesture: Unity or duality? In: David McNeill (ed.), Language
and Gesture, 84–98. Cambridge: Cambridge University Press.
Kendon, Adam 2002. Some uses of the head shake. Gesture 2(2): 147–183.
Kendon, Adam 2004. Gesture: Visible Action as Utterance. Cambridge: Cambridge University
Press.
Laursen, Lone 2002. Kodeskift, gestik og sproglig identitet på internationale flersprogede arbejds-
pladser [Code-switching, gesture and identity in international work places]. PhD dissertation,
University of Southern Denmark.
LeBaron, Curtis D. and Jürgen Streeck 2000. Gestures, knowledge and the world. In: David
McNeill (ed.), Language and Gesture, 118–138. Cambridge: Cambridge University Press.
Lerner, Gene H. and Don H. Zimmerman 2003. Action and the appearance of action in the con-
duct of very young children. In: Phillip Glenn, Curtis D. LeBaron and Jenny Mandelbaum
(eds.), Studies in Language and Social Interaction: In Honor of Robert Hopper, 441–457.
Mahwah, NJ: Lawrence Erlbaum.
Maynard, Douglas W. and Steven E. Clayman 1991. The diversity of ethnomethodology. Annual
Review of Sociology 17: 385–418.
Psathas, George 1995. Conversation Analysis. The Study of Talk-in-Interaction. Thousand Oaks,
CA: Sage.
Rasmussen, Gitte 1998. The use of forms of address in intercultural business conversation. Revue
de Sémantique et Pragmatique 3: 57–72.
Rasmussen, Gitte 2000. Zur Bedeutung Kultureller Unterschiede in Interlingualen und Interkul-
turellen Gesprächen. Munich: Iudicium.
Sacks, Harvey 1995. The baby cried. The mommy picked it up. In: Gail Jefferson (ed.), Lectures
on Conversation, 236–242. Oxford: Blackwell. First published [1966].
Sacks, Harvey, Emanuel A. Schegloff and Gail Jefferson 1974. A simplest systematics for the
organisation of turn-taking for conversation. Language 50: 696–735.
37. Multimodal interaction 577

Schegloff, Emanuel 1984. On some gesture’s relation to talk. In: J. Maxwell Atkinson and John
Heritage (eds.), Structures of Social Actions: Studies in Conversation Analysis, 266–296. Cam-
bridge: Cambridge University Press.
Schegloff, Emanuel 1998. Body torque. Social Research 65(3): 535–596.
Schenkein, Jim 1978. Studies in the Organization of Conversational Interaction. New York: Aca-
demic Press.
Searle, John 1962. Meaning and speech acts. Philosophical Review 71(4): 423–432.
Silverman, David 1998. Harvey Sacks: Social Science and Conversation Analysis. Cambridge: Pol-
ity Press; New York: Oxford University Press.
Streeck, Jürgen 1993. Gesture as communication. Communication Monographs 60: 275–299.
Streeck, Jürgen 2002. A body and its gestures. Gesture 2(1): 19–44.
Ten Have, Paul 1999. Doing Conversation Analysis: A Practical Guide. Thousand Oaks, CA: Sage.
Wittgenstein, Ludwig 2001. Philosophische Untersuchungen/Philosophical Investigations. Oxford:
Blackwell. First published Oxford [1953].

Anders R. Hougaard, Odense (Denmark)


Gitte Rasmussen, Odense (Denmark)

37. Multimodal interaction


1. Introduction
2. Multimodal resources
3. Various resources mobilized within the time of the interaction
4. Materiality: Object manipulation as a prolongation of gesture
5. Spatiality: The establishment and transformation of interactional spaces
6. Mobility: Bodies in movement
7. Conclusion: Challenges
8. References

Abstract
This chapter sketches some of the embodied feature characterizing social interaction as a
locally situated and mutually intelligible achievement of the participants. It focuses on the
interactional dimension and relevance of gesture and other embodied conduct. After a
short presentation of the issues raised by the study of multimodal interaction, the chapter
focuses on the notion of “resource”, which covers both conventional forms – such as
grammar – and less standardized and more opportunistic means that are used by parti-
cipants to build the intersubjective accountability of their actions – such as gesture,
gaze, head movements, facial expressions, body posture, and body movements. In this
way, the notion of “multimodal resource” invites to take into consideration the irremedi-
able indexicality of linguistic resources, as well as the systematic and methodic use of em-
bodied resources. The chapter describes some features of these resources, particularly
focusing on their temporal and sequential unfolding in an emergent and incremental
way. Issues such as the timed coordination between participants in interaction, the emer-
gent time of talk, and the synchronization of multiple simultaneous embodied actions
are discussed. Finally, the chapter deals with larger issues, going beyond gesture and
578 IV. Contemporary approaches

concerning the body as a whole: the way in which material objects can be considered in
this perspective, as well as the importance of considering not only actions involving static
bodies but also mobile activities.

1. Introduction
Social interaction is the primordial site of language use – its “home habitat” (Schegloff
1996) – as well as the context and activity in and through which children and adults
learn a new language and socialize. Therefore, interaction is the fundamental site for
observing how language and, more generally, communication works in both a situated
and a systematic way, for describing how social relations, and also emotions and cogni-
tive processes, develop and unfold in real time and in real settings, and for studying the
resources on which participants rely in order to communicate together.
In face-to-face interaction, participants not only speak together, but also gesticulate
and move their bodies in meaningful and coordinated ways. Gesture studies have shown
that gestures in conversation originate through the same process that produces words
(Kendon 1980; McNeill 1985). Made predominantly by speakers, but strongly oriented
to their partners (Schegloff 1984), gestures are finely synchronized with the structure of
discourse (Müller 1998) and of talk in interaction (Kendon 2004; Bohle 2001), display-
ing the “same improvisational quality as do words in conversation” (Bavelas et al. 1992:
470), and they are finely tuned with the conduct of the co-participants to whom they are
addressed. This has prompted gesture studies to investigate “interactive gestures”
(Bavelas et al. 1992) in dialogue – that is, gestures that do not refer to the topics at
hand but instead refer to the interlocutor, monitoring shared understanding and estab-
lishing common ground (Clark 1996), seeking agreement, and also maintaining conver-
sation and regulating the turns at talk. These gestures, which typically take the form of
a pointing towards the interlocutor or more complex hand shapes such as exposed
palm, offering open hand, etc., belong to the range of visibly-embodied resources
that participants mobilize in order to build a systematic order of social interaction.
Interest in how human interaction works, not only in its ordinary social and every-
day, but also in its professional and institutional settings, has prompted the study of
video recordings of naturally occurring activities, aimed at understanding how partici-
pants smoothly achieve the finely-tuned complex coordination of their actions. On
the basis of naturalistic data, conversation analysis, inspired by Garfinkel’s ethnometho-
dology (1967), has focused on human interaction as endogenously and methodically or-
ganized. This means it is not considered as being governed by external norms and rules,
but as being locally achieved by the participants, based on micro-practices that are both
context-free and context-shaped (Heritage 1984) – such as practices for self-selecting,
for beginning a new turn, for recognizing transition-relevance points (Sacks, Schegloff,
and Jefferson 1974; Lerner 2003), or for repairing troubles (Schegloff, Jefferson, and
Sacks 1977). These practices are organized by the participants in a publicly accountable
way, that is, in a way which is intelligibly produced for, and interpreted by, them as
action unfolds in real time. This public accountability is built through the mobilization
of a range of resources, which includes language and gesture but which also integrates
other aspects of bodily conduct, such as body postures and movements. The booming
literature on multimodal resources (“multimodality” being conceived in this perspec-
tive as comprising language, gesture, gaze, head movements, facial expressions, body
37. Multimodal interaction 579

postures, and – increasingly – object manipulation, technology and body movements


within space) shows that participants exploit both conventional forms and improvised
and occasioned means to produce the intelligibility of their actions and coordinate
with the actions of others.
In this brief review, we first reflect on the notion of “resource”. Then, by reviewing
some of the literature, mainly focusing on conversation analysis and ethnomethodology,
we show how studies of social interaction have progressively taken into consideration not
only gesture and gaze but also body movements and the constraints and potentialities of
interactional space and of material environment.

2. Multimodal resources
The notion of “resource” covers both conventional forms – such as grammar – and less
standardized and more opportunistic means that are used by participants to build the
intersubjective accountability of their actions. Thus, the notion of “resource” invites
us to take into consideration the irremediable indexicality of linguistic resources, as
well as the systematic and methodic use of embodied resources. This avoids us reifying
certain well-studied (mostly linguistic) resources and ignoring other less-studied ones,
or extracting the resources from the context in which they are situated. The use of a
resource is reflexive in an ethnomethodological sense (Garfinkel and Sacks 1970;
Heritage 1984), and refers to the fact that action is reflexive to the circumstances of
its organization, both adjusting to those circumstances and transforming them. Reflex-
ivity makes what is being used – in a local, contingent, and opportunistic way – into an
organizationally relevant detail. The value and meaning of a resource is context-
dependent, being related both to the sequential organization of social interaction and
to the situated occasion of its use. In return, a resource also shapes the particular inter-
pretation of the context that is made adequate and relevant at that particular point.
Moreover, the contextually-specific use of a resource might also shape its form and
intelligibility as a resource that will be available in the future – thus prompting language,
and more generally semiotic, change.
What Schegloff says about language can be generalized for other multimodal
resources:

The central prospect, then, is that grammar stands in a reflexive relationship to the orga-
nization of a spate of talk as a turn. On the one hand, the organizational contingencies of
talking in a turn […] shape grammar – both grammar as an abstract, formal organization
and the grammar of a particular utterance. On the other hand, the progressive grammatical
realization of a spate of talk on a particular occasion can shape the exigencies of the turn as
a unit of interactional participation on that occasion, and the grammatical properties of a
language may contribute to the organization of turns-at-talk in that language and of the
turn-taking device by which they are deployed. (Schegloff 1996: 56)

For linguistic resources, the notion of reflexivity means that grammatical constructions
and other linguistic forms are used by interlocutors by exploiting various available fea-
tures, but are also re-configured within their very use. Ultimately, linguistic resources
can be seen as being shaped by repeated configuring uses within interaction – the
repeated mobilization of forms in given sequential environments for the practical
purpose of achieving a given interactional action being a grounding force of
580 IV. Contemporary approaches

grammaticalization and entrenchment. This is particularly important for multimodal re-


sources: the flexibility, dynamicity and plasticity of resources are fundamental as they
constantly change and evolve in the real time of the interaction.

3. Various resources mobilized within the time of the interaction


Very early on, the very precise temporality of co-verbal gestures was described in ges-
ture studies. For example, speaker’s gestures generally slightly precede their lexical af-
filiates (Kendon 1980; McNeill 1992; Schegloff 1984). Their peculiar timing is the result
of interactive work, by which talk and gesture, or talk and posture, are organized in a
way that aligns them temporally, for example either by delaying talk to adjust to gesture
or the reverse (Condon 1971; Kendon 2004: 135). Co-occurrence of gesture and speech
has been widely documented in gesture studies, in approaches suggesting that talk and
gesture originate in the same conceptual structure (McNeill 1985), studying talk and
gesture as “composite signals” (Clark 1996: 156), as belonging to an “integrated mes-
sage model”, and showing that gesture and facial displays are used simultaneously
with words, being mobilized together to produce “visible acts of meaning” (Bavelas
and Chovil 2000) or “visible action as utterance” (Kendon 2004). However, despite
these studies, the way in which multimodal resources as a whole (meaning a wider
range of embodied resources than just gesture) are mobilized within multiple temporal
and sequential relationships in natural interaction remains to be systematically investi-
gated, focusing not only on the speaker but also on the actions of the addressees and on
the entire participation framework (Goodwin 1981, 2000, 2007b).
This enlarged approach of multimodality – considered as the integrated study of all
the relevant linguistic, embodied, and material resources participants mobilize for orga-
nizing social interaction in an audible-visible intelligible way – can benefit from the con-
ceptualization of temporality and sequentiality within interactional linguistics and
conversation analysis.Interactional linguistics and conversation analysis have shown
the need to “retemporalize” language (Auer, Couper-Kuhlen, and Müller 1999): far
from being a set of potentialities actualized within an abstract system of possible com-
binations, language is considered as being used, changed, and structured by its situated
use in real time. The focus on real time and on the online production and interpretation
of language within social interaction invites us to see linguistic structures in talk as
incremental (Auer 2009) and emergent (Hopper 1987). Turns are interactively designed
by the participants moment by moment in an interactive way (Goodwin 1979), the
speaker orienting to and taking into consideration the conduct of the co-participants,
reflexively adjusting her emergent turn to what they are (or are not) doing. Turns are
formatted step by step through an orderly series of sequential positions, such as pre-
beginnings and turn-initial positions and so on until pre-completion positions, comple-
tion positions and even post-completion positions (Schegloff 1996). These sequential
positions are interactively achieved, in a flexible and contingent way (Ford 2004), con-
structing units of talk which are a dynamic, negotiated, and collective achievement.
Within interactional linguistics, the grammatical features of turn-constructional units
and the interplay of multiple dimensions have been taken into consideration, from
syntax to prosody (Ochs, Schegloff, and Thompson 1996; Hakulinen and Selting 2005;
Couper-Kuhlen and Selting 1996). The integration of multimodality enlarges this
multiplicity, by considering, along with talk, gesture, gaze, facial expression, posture,
37. Multimodal interaction 581

and body movements. Every one of these dimensions unfolds in time too, concurrently
with talk, constituting the sequential embodied organization of action. Therefore, the
embodied design of turns within interaction integrates not only gesture but also the
way in which the turns are formatted according to the presence or absence of gaze of
the co-participant on the speaker. So, Schegloff (1984) observes that the fact, that ges-
ture prefigures the meaning that will be conveyed by speech, enables the recipient to
build an understanding step by step of the turn, thus creating a projection space that
permits him to timely respond and act in a relevant and emergent way as the turn un-
folds. Goodwin (1981) shows that this progressivity can be delayed: the speaker pro-
duces re-starts when he notices that the co-participant is not gazing at him and, in
turn, these re-starts work both as a delaying device, suspending the progression of
the turn until the gaze is secured, and as an attention-getting device, requesting the
attention of the co-participant. Streeck (1993) finds a gesture-gaze pattern in his data
documenting descriptions in dialogue, which consists in the fact that “the speaker
looks at her own gesture before the word that carries the key information, returns
gaze to the listener once the keyword is uttered” (Streeck 1993: 234): in this way, ges-
tures become “objects of attention” which are offered for inspection to both the listener
and the speaker, the gaze marking the communicative relevance of the gesture. More
generally, gestures are coordinated with the state of attention of the participants:
they are closely coordinated with a monitoring of what the other participants do
(M.H. Goodwin 1980), they are sensitive to the participation framework in which the
speaker can either find or miss a recipient (Goodwin 1979), and their trajectory is orga-
nized after having secured the visibility of their target, through a preliminary rearran-
gement of the local environment. In this respect, studies of deictic reference
(Hindmarsh and Heath 2000a; Goodwin 2003; Mondada 2007) show that prior to the
use of referential expressions the speaker engages in intensive interactional work, in
order not only to get the relevant attention of the co-participants but also to (re)arrange
their entire bodies in such a way that the deictic action (achieved through linguistic and
gestural resources) lies within the focus of attention of the participants. As Goodwin
(2003) shows, pointing and talking are formatted together by taking into consideration
the surrounding space, the activity in which the participants are engaged, and the par-
ticipants’ mutual orientation. In another study, using the example of archaeologists ex-
cavating soil, Goodwin (2000) shows how participants actively constitute a visual field
which has to be scrutinized, parsed and understood together by the co-participants in
order to find out where the speaker is pointing. The archaeologists juxtapose language,
gesture, tools (such as trowels), and graphic fields (such as maps) on a domain of scru-
tiny, which is surrounding them but is also being delimitated by the very act of referring
to it. In this sense, gestures are environmentally coupled (Goodwin 2007a), and not
used as a separated resource coming from the exterior world into a pre-existing context:
the domain of scrutiny is transformed and reorganized by the very action of pointing,
done within the current task. As Hindmarsh and Heath (2000a) show, these gestures,
and body movements amplifying them, are realized in a way that is recipient-designed,
that is they indicate and even display the referent for the co-participants, at the relevant
moment, when the referent is visible for them. Pointing gestures are “produced and
timed with respect of the activities of the co-participants, such that they are in a position
to be able to see the pointing gesture in the course of its production” (Hindmarsh and
Heath 2000a: 1868). Thus, the organization of the gesture and the body of the speaker
582 IV. Contemporary approaches

are adjusted to the recipient, in order to guide him in the material environment and
towards the referent. Since recipients display their understanding and grasp of the
action going on, speakers adjust to the production of these expressions, or to their
absence or delay.
This mutual orientation involves not only talk and gesture but also the entire body,
gazing on and bending towards the object (Hindmarsh and Heath 2000a) and, more
radically, actively rearranging the surrounding environment. Mondada (2007) shows
how speakers, prior to the utterance of the deictic, dispose their bodies within space,
reposition objects within space, and even restructure the environment. The deictic
and the pointing gesture are produced only after participants have organized the dispo-
sition of the spatial context. Thus, deictic words and gestures are not merely adapting to
a pre-existing and immutable context; they are part of an action which actively renews
and changes the context, rearranging the interactional space in the most appropriate
way for the pointing to take place.In these cases, the emergent and progressive tempor-
ality of talk and action is suspended, delayed, or postponed until the conditions for joint
attention or a common focus of attention are fulfilled. The emergent organization of
talk and action concerns not only gesture and gaze but also the moving body, the sur-
rounding space and the material environment. What emerges from these contributions
is the necessity to go beyond the study of single “modalities” coordinated with talk, and
to take into consideration the broader embodied and environmentally situated organi-
zation of activities (Streeck, Goodwin, and LeBaron 2011). Consequently, in what fol-
lows we sketch some fields that are currently being investigated and which open up the
variety of multimodal resources to be considered within social interaction: materiality,
spatiality and mobility.

4. Materiality: Object manipulation as a prolongation of gesture


Gestures are not meaningful per se in isolation but only in context, and more precisely
as “environmentally coupled” (Goodwin 2007a). For instance, Goodwin shows that in
order to analyze the turn “She sold me this. But she didn’t sell me this (0.2) or tha:
t”, one has to take into account “the integrated use of language, the body and objects
in the world” (Goodwin 2009: 107). Actually, the speaker is holding a jar in his hand,
which has been bought on the internet. The missing part of the object is made visible
by rotating the hand at the bottom of the jar, and the words are made meaningful by
the articulation between talk, gesture and the object. As Goodwin notes, gestures
coupled to phenomena in the environment are pervasive in many settings. Even
more radically, LeBaron and Streeck (2000) consider that “hands-on interactions
with things” and tactile manipulations of objects constitute the experiential grounding
of more abstract and symbolic conversational gesture: symbols embody experiences
that have emerged in situated action (LeBaron and Streeck 2000: 136) and “hands
learn how to handle things before they learn how to gesticulate” (LeBaron and Streeck
2000: 137; see also Streeck 2009). Besides ordinary objects, video analyses in ethno-
methodology and conversation analysis have paid special attention to those particular
objects that are texts and visualizations – which are both objects that are handled,
grasped, and manipulated with the hands, and semiotic objects that can be read. In med-
ical settings, the way doctors turn to such objects during the consultation (Heath 1986;
DiMatteo et al. 2003; Robinson 1998) or produce texts and files (Heath 1982; Heath and
37. Multimodal interaction 583

Luff 1996) has been documented. Likewise, the reflexive constitution of artifacts and
inscriptions by the way in which they are locally mobilized within the work activity
(Hindmarsh and Heath 2000b; Suchman 2000; Mondada 2006) and during scientific ac-
tivities (Ochs, Gonzales, and Jacoby 1996; Roth and Lawless 2002) has been studied.
Goodwin’s study of the use of the Munsell chart by archaeologists (Goodwin 1999) de-
tails this complex web of multimodal resources in an exemplary way. He describes the
work of archaeologists excavating soil and examining its color, holding a coding form to
be filled with soil description and the Munsell chart allowing the comparison between
different shades of color. The mobilization of the chart is done in an ordered way,
aligned with other movements such as the arrangement of the bodies, the participants’
gaze, their pointing gestures, and the holding of the trowel, as well as with the partici-
pants’ talk, which together make it possible to compare the colors of the Munsell chart
with the color of the soil sample. The participants agree, disagree, and negotiate the
final color to inscribe on the form. “Seeing” color with the Munsell chart is not the auto-
matic result of a procedure; it is a situated achievement needing the prior alignment of
the local action space and, thus, requiring time. In turn, the analysis of the use of the
Munsell chart is based on various video views of the archaeologists’ action, some of
them being quite narrowly framed close-ups, and the reproduction of various materials
(an example of a soil description form, various pictures of the Munsell chart, and the
Munsell book). In summary, attention to the objects permits the investigation not
only of the manual actions of the hands but also the way in which those actions are an-
chored within the environment – not forgetting the common focus of attention these
manipulated objects create among the co-participants.

5. Spatiality: The establishment and transformation of


interactional spaces
The focus on bodies and objects entails a renewed interest in the environment and in
the way in which embodied conducts adapt, exploit and transform the features of the
material surroundings of the action. In this sense, space is not just a pre-existing envi-
ronment but is a configuration created by the embodied disposition, orientation, and
arrangement of the participants within the interaction. Early on, Goffman (1963,
1964) showed that body arrangements in space create temporary territories with chan-
ging boundaries. These territories are recognized by the participants involved in the
encounter, and also by bystanders. The positions of the bodies delimit a temporary
“ecological huddle” (Goffman 1964) which materializes the “situated activity system”.
These arrangements constitute what Goffman (1963) calls “focused gatherings”, which
are defined by mutual orientation and shared attention, as displayed by body positions,
postures, gaze and addressed gestures. This interest in temporary territories, and in their
effectiveness, is shared by Ashcraft and Scheflen (1976). On the basis of video-taped
encounters in private and public settings, they observe that “the unoccupied space in
the center of the group nevertheless becomes a claimed territory. Others outside the cir-
cle customarily recognize the territory.” (Ashcraft and Scheflen 1976: 7). Kendon (1977,
1990: 248–249) conceptualizes this territory by using the notion of “F-formation”. Using
this notion, he refers to how different body positions and orientations build an arrange-
ment favoring a common focus of attention and the engagement in a joint activity. In
his work, Goodwin (2000, 2003, 2007a) insists on the mutual relationships between
584 IV. Contemporary approaches

embodied actions and material environment, defining what he calls “contextual config-
urations”. If the analysis of talk has to take into consideration the embodied actions of
the participants, the study of gesture or body postures cannot be developed in isolation,
but has to describe the way in which the structure of the environment contributes to the
organization of the interaction. Drawing on these inspirations, Mondada (2007, 2009,
2011a) proposes that “interactional space” is constituted through the situated, mutually
adjusted and changing arrangements of the participants’ bodies within space. This pro-
duces a configuration relevant to the activity they are engaged in, their mutual attention
and their common focus of attention, the objects they manipulate and the way in which
they coordinate in joint action. This interactional space is constantly being established
and transformed within the activity (De Stefani and Mondada 2007; LeBaron and
Streeck 1997; Mondada 2009a, 2011a; De Stefani 2011; Hausendorf, Mondada, and
Schmitt 2012). The transformation of interactional space is achieved by the bodily ar-
rangements of the participants constituting mobile configurations and mobile forma-
tions. This is even more the case with interactional spaces constituted through and
within mobile activities such as walking, driving, biking, etc.

6. Mobility: Bodies in movement


The focus on space has prompted an observation of what the entire body does in an
interaction. This in turn has recently drawn attention towards bodies in motion.
Much previous research had predominantly focused on interactions in static settings
or within a local site, and less attention was paid to interactions occurring in a mobile
situation, either with the bodies in motion, as in walking, or with participants moving in
a car (Haddington, Keisanen, and Nevile 2012) or even in an airplane (Nevile 2004).
Walking in interaction is an interesting case of mobility, because it involves the entire
body. Studies on walking emerge within an increasing interest in the distribution of
bodies within the environment, adjusting to the complex ecology of public spaces (Mon-
dada 2009a), complex workplaces, or other activity contexts (Heath and Luff 2000) and
creating the dynamic “interactional space” of the exchange (Hausendorf, Mondada,
and Schmitt 2012). Walking practices are interesting because

(i) they show the importance of the entire body, and not only of gesture or of the
upper parts of the body for the organization of social interaction,
(ii) they concern a mobile body, and not only a static one,
(iii) they are methodically organized within interaction (such as in walking together),
and
(iv) they are chronologically and sequentially finely-tuned with the organization of
talk, reflexively contributing to its intelligibility.

Early works on walking already describe “doing walking” as a methodic practice and a
concerted accomplishment (Ryave and Schenkein 1974: 265). Members achieve walking
together, being recognized both as a “vehicular unit” (Goffman 1971: 8) and as “withs”
(Goffman 1971: 19). In walking together, participants organize their concerted action
both within their group – by maintaining proximity and pace, speeding up and slowing
down, managing turns and stopping together (see Haddington, Mondada, and Nevile in
press; De Stefani 2011) – and with respect to other passers-by – while navigating within
37. Multimodal interaction 585

a crowd, avoiding collisions, adjusting to the trajectory of others, and even making
accountable their interruptions of trajectory (Lee and Watson 1993; Watson 2005).
Two mobile units can also converge, for example when various couples or groups
meet and merge, thereby constituting one unique interactional space (Mondada
2009a). Conversely, people can also display that they are not “withs”, exhibiting civil
inattention and minimizing the effects of co-presence (Goffman 1971; Sudnow
1972). As noted by Ryave and Schenkein, the fact that these challenges are resolved
in unproblematic ways reveals “the nature of the work executed routinely by partic-
ipant walkers” (Ryave and Schenkein 1974: 267). Moreover, collective walking activ-
ities are organized by being oriented in a finely-tuned way to the organization of talk
and even to the details of the emergent construction of turns and sequences at talk:
Relieu (1999) shows how turn-design is sensitive to the spatial ecology encountered
by speakers talking and walking, and Mondada (2009a) shows how the first turn of
an encounter is finely designed with respect to the walking body of the co-participant.
Walking practices, such as walking away (Broth and Mondada forthcoming) or run-
ning away (Deppermann, Schmitt, and Mondada 2010), orient to transition-relevance
points and to transitions from one activity to the other; conversely, they achieve the
accountability of these transitions and contribute towards the achievement of these
transitions in a publicly visible way with which all co-participants can align – or possibly disalign.

7. Conclusion: Challenges
Multimodal interaction opens up an extremely rich field of investigation, expanding prior knowledge of social interaction, which was initially based on audio recordings favoring talk and on videos focusing on specific embodied resources, such as gesture. A wider
notion of the multimodal resources mobilized by participants for building their account-
able actions includes language, gesture, gaze, facial expressions, body posture, body
movements such as walking, and embodied manipulations of artifacts. This enlarged
vision opens up various challenges, both methodologically and theoretically. Method-
ologically, the study of relevant details concerning the entire body challenges the way
in which social action is documented, firstly through video recordings of naturally occur-
ring interactions in their ordinary social settings, and secondly through transcripts and
other forms of annotation. The documentation of participants engaged in mobile activ-
ities within complex settings, involving not only their bodies but also various material
and spatial environmental details, requires video recordings and video technologies
that are relevantly adjusted to the activities observed (see Mondada 2012). The repre-
sentation of complex conduct involving a variety of multimodal resources also chal-
lenges traditional, more linear, transcripts, and requires more and more sophisticated
annotation and alignment tools (like ELAN, ANVIL or CLAN). Analytically, this
rich documentation makes the reconstruction of the moment-by-moment temporality
of the emerging interaction complex. Multimodality is characterized by multiple tem-
poralities and multiple sequentialities operating at the same time. Multiple details
unfold both simultaneously and successively – and even multiple courses of action, since simultaneous activities are often involved (such as in body-torqued positions,
Schegloff 1998; but also in various parallel actions such as working-and-overhearing,
M.H. Goodwin 1996; talking-and-operating, Mondada 2011b; talking-and-eating,
Mondada 2009b; and talking-and-driving, Haddington, Keisanen, and Nevile 2012).


Therefore, the conception of the emergent temporality of social interaction becomes increasingly complex, inviting the study of the social organization of embodied conducts
both in the specificity of their ecology and in their systematicity, the ordered interplay
of complex arrays of multimodal resources, and the participants’ orientation towards
these multiple details of embodied action.

8. References
Ashcraft, Norman and Albert Scheflen 1976. People Space: The Making and Breaking of Human
Boundaries. New York: Anchor.
Auer, Peter 2009. Online Syntax. Thoughts on the temporality of spoken language. Language
Sciences 31: 1–13.
Auer, Peter, Elizabeth Couper-Kuhlen and Frank Müller 1999. Language in Time. The Rhythm
and Tempo of Spoken Interaction. Oxford: Oxford University Press.
Bavelas, Janet and Nicole Chovil 2000. Visible acts of meaning. An integrated message model of
language in face-to-face dialogue. Journal of Language and Social Psychology 19(2): 163–193.
Bavelas, Janet, Nicole Chovil, Douglas A. Lawrie and Allan Wade 1992. Interactive gesture. Dis-
course Processes 15: 469–489.
Bohle, Ulrike 2001. Das Wort Ergreifen – das Wort Übergeben. Berlin: Weidler.
Broth, Mathias and Lorenza Mondada in press. Walking away: The embodied achievement of
activity closings in mobile interaction. Journal of Pragmatics.
Clark, Herbert 1996. Using Language. Cambridge: Cambridge University Press.
Condon, William S. 1971. Speech and body motion synchrony of the speaker-hearer. In: David L.
Horton and James Jenkins (eds.), Perception of Language, 150–173. Columbus: Merrill.
Couper-Kuhlen, Elizabeth and Margret Selting 1996. Prosody in Conversation: Interactional Studies. Cambridge: Cambridge University Press.
Deppermann, Arnulf, Reinhold Schmitt and Lorenza Mondada 2010. Agenda and emergence:
Contingent and planned activities in a meeting. Journal of Pragmatics 42: 1700–1712.
De Stefani, Elwys 2011. ‘Ah Petta Ecco, io Prendo Questi che mi Piacciono’. Agire come Coppia al
Supermercato. Un Approccio Conversazionale e Multimodale allo Studio dei Processi Decisio-
nali. Roma: Aracne.
De Stefani, Elwys and Lorenza Mondada 2007. L’organizzazione multimodale e interazionale del-
l’orientamento spaziale in movimento. Bulletin Suisse de Linguistique Appliquée 85: 131–159.
DiMatteo, Robin, Jeffrey Robinson, John C. Heritage, Melissa Tabbarrah and Sarah Fox 2003.
Correspondence among patients’ self-reports, chart records, and audio/videotapes of medical
visits. Health Communication 15: 393–413.
Ford, Cecilia E. 2004. Contingency and units in interaction. Discourse Studies 6(1): 27–52.
Garfinkel, Harold 1967. Studies in Ethnomethodology. Englewood Cliffs, NJ: Prentice-Hall.
Garfinkel, Harold and Harvey Sacks 1970. On formal structures of practical actions. In: John C. McKinney and Edward A. Tiryakian (eds.), Theoretical Sociology, 337–366. New York: Appleton-Century-Crofts.
Goffman, Erving 1963. Behavior in Public Places: Notes on the Social Organization of Gatherings. New York: Free Press.
Goffman, Erving 1964. The neglected situation. American Anthropologist 66(6): 133–136.
Goffman, Erving 1971. Relations in Public: Microstudies of the Public Order. New York: Harper
and Row.
Goodwin, Charles 1979. The interactive construction of a sentence in natural conversation. In: George
Psathas (ed.), Everyday Language: Studies in Ethnomethodology, 97–121. New York: Irvington.
Goodwin, Charles 1981. Conversational Organization: Interaction between Speakers and Hearers.
New York: Academic Press.
Goodwin, Charles 1999. Practices of color classification. Mind, Culture and Activity 7(1–2): 62–82.
Goodwin, Charles 2000. Action and embodiment within situated human interaction. Journal of
Pragmatics 32: 1489–1522.
Goodwin, Charles 2003. Pointing as situated practice. In: Sotaro Kita (ed.), Pointing: Where Lan-
guage, Culture and Cognition Meet, 217–241. Hillsdale, NJ: Lawrence Erlbaum.
Goodwin, Charles 2007a. Environmentally coupled gestures. In: Susan Duncan, Justine Cassell and Elena T. Levy (eds.), Gesture and the Dynamic Dimension of Language, 195–212. Amsterdam: John Benjamins.
Goodwin, Charles 2007b. Participation, stance and affect in the organization of activities. Dis-
course and Society 18(1): 53–73.
Goodwin, Charles 2009. Things, bodies, and language. In: Bruce Fraser and Ken Turner (eds.),
Language in Life, and a Life in Language: Jacob Mey – A Festschrift, 106–109. Bingley, UK:
Emerald.
Goodwin, Marjorie Harness 1980. Processes of mutual monitoring implicated for the production
of description sequences. Sociological Inquiry 50(3–4): 303–317.
Goodwin, Marjorie Harness 1996. Informings and announcements in their environments: Prosody
within a multi-activity work setting. In: Elizabeth Couper-Kuhlen and Margret Selting (eds.),
Prosody in Conversation: Interactional Studies, 436–461. Cambridge: Cambridge University
Press.
Haddington, Pentti, Tiina Keisanen and Maurice Nevile 2012. Meaning in Motion: Interaction
in Cars. Semiotica 192 (special issue).
Hakulinen, Auli and Margret Selting 2005. Syntax and Lexis in Conversation. Studies on the Use of
Linguistic Resources in Talk-in-Interaction. Amsterdam: John Benjamins.
Hausendorf, Heiko, Lorenza Mondada and Reinhold Schmitt 2012. Raum als Interaktive Ressource. Tübingen: Narr.
Heath, Christian 1982. Preserving the consultation: Medical record cards and professional con-
duct. Sociology of Health and Illness 4: 56–74.
Heath, Christian 1986. Body Movement and Speech in Medical Interaction. Cambridge: Cambridge
University Press.
Heath, Christian and Paul Luff 1996. Documents and professional practices: “bad” organisational reasons for “good” clinical records. In: Mark S. Ackerman (ed.), Proceedings of the 1996 ACM Conference on Computer Supported Cooperative Work, November 16–20, 1996, 354–363. New York: ACM.
Heath, Christian and Paul Luff 2000. Technology in Action. Cambridge: Cambridge University
Press.
Heritage, John C. 1984. Garfinkel and Ethnomethodology. Cambridge: Polity Press.
Hindmarsh, John and Christian Heath 2000a. Embodied reference: A study of deixis in workplace
interaction. Journal of Pragmatics 32: 1855–1878.
Hindmarsh, John and Christian Heath 2000b. Sharing the tools of the trade: The interactional
constitution of workplace objects. Journal of Contemporary Ethnography 29(5): 523–562.
Hopper, Paul 1987. Emergent grammar. Berkeley Linguistic Society 13: 139–157.
Kendon, Adam 1977. Studies in the Behavior of Face-to-Face Interaction. Lisse, the Netherlands:
Peter de Ridder Press.
Kendon, Adam 1980. Gesture and speech: Two aspects of the process of utterance. In: Mary
Ritchie Key (ed.), Nonverbal Communication and Language, 207–277. The Hague: Mouton.
Kendon, Adam 1990. Conducting Interaction. Patterns of Behavior in Focused Encounters. Cam-
bridge: Cambridge University Press.
Kendon, Adam 2004. Gesture. Visible Action as Utterance. Cambridge: Cambridge University Press.
LeBaron, Curtis D. and Jürgen Streeck 1997. Built space and the interactional framing of experi-
ence during a murder interrogation. Human Studies 20: 1–25.
LeBaron, Curtis D. and Jürgen Streeck 2000. Gestures, knowledge and the world. In: David
McNeill (ed.), Language and Gesture, 118–138. Cambridge: Cambridge University Press.
Lee, John R. E. and D. Rod Watson 1993. Regards et attitudes des passants. Les arrangements de visibilité de la locomotion. Annales de la Recherche Urbaine 57–58: 101–109.
Lerner, Gene H. 2003. Selecting next speaker: The context-sensitive operation of a context-free
organization. Language in Society 32: 177–201.
McNeill, David 1985. So you think gestures are nonverbal? Psychological Review 92(3): 350–371.
McNeill, David 1992. Hand and Mind: What Gestures Reveal about Thought. Chicago: University
of Chicago Press.
Mondada, Lorenza 2006. Participants’ online analysis and multimodal practices: Projecting the
end of the turn and the closing of the sequence. Discourse Studies 8: 117–129.
Mondada, Lorenza 2007. Interaktionsraum und Koordinierung. In: Arnulf Depperman and Rein-
hold Schmitt (eds.), Koordination. Analysen zur Multimodalen Interaktion, 55–94. Tübingen,
Germany: Narr.
Mondada, Lorenza 2009a. Emergent focused interactions in public places: A systematic analysis of the
multimodal achievement of a common interactional space. Journal of Pragmatics 41: 1977–1997.
Mondada, Lorenza 2009b. The methodical organization of talking and eating: Assessments in din-
ner conversations. Food Quality and Preference 20: 558–571.
Mondada, Lorenza 2011a. The interactional production of multiple spatialities within a participa-
tory democracy meeting. Social Semiotics 21(2): 283–308.
Mondada, Lorenza 2011b. The organization of concurrent courses of action in surgical demonstra-
tions. In: Jürgen Streeck, Charles Goodwin and Curtis D. LeBaron (eds.), Embodied Interaction,
Language and Body in the Material World, 207–226. Cambridge: Cambridge University Press.
Mondada, Lorenza 2012. The conversation analytic approach to data collection. In: Jack Sidnell and Tanya Stivers (eds.), Handbook of Conversation Analysis, 32–56. Malden, MA: Wiley-Blackwell.
Müller, Cornelia 1998. Redebegleitende Gesten. Kulturgeschichte – Theorie – Sprachvergleich.
Berlin: Berlin Verlag.
Nevile, Maurice 2004. Beyond the Black Box: Talk-in-Interaction in the Airline Cockpit. Aldershot,
UK: Ashgate.
Ochs, Elinor, Patrick Gonzales and Sally Jacoby 1996. When I come down I’m in the domain state:
Grammar and graphic representation in the interpretive activity of physicists. In: Elinor Ochs,
Emanuel A. Schegloff and Sandra A. Thompson (eds.), Interaction and Grammar, 328–369. Cam-
bridge: Cambridge University Press.
Ochs, Elinor, Emanuel A. Schegloff and Sandra A. Thompson (eds.) 1996. Interaction and Grammar. Cambridge: Cambridge University Press.
Relieu, Marc 1999. Parler en marchant. Pour une écologie dynamique des échanges de paroles.
Langage et Société 89: 37–68.
Robinson, Jeffrey David 1998. Getting down to business: Talk, gaze, and body orientation during
openings in doctor-patient consultation. Human Communication Research 25: 98–124.
Roth, Wolff-Michael and Daniel V. Lawless 2002. When up is down and down is up: Body orien-
tation, proximity, and gestures as resources. Language in Society 31: 1–28.
Ryave, A. Lincoln and James Schenkein 1974. Notes on the art of walking. In: Roy Turner (ed.),
Ethnomethodology, 265–274. Harmondsworth, UK: Penguin.
Sacks, Harvey, Emanuel A. Schegloff and Gail Jefferson 1974. A simplest systematics for the orga-
nization of turn-taking for conversation. Language 50: 696–735.
Schegloff, Emanuel A. 1984. On some gestures’ relation to talk. In: J. Maxwell Atkinson and John
Heritage (eds.), Structures of Social Action, 266–296. Cambridge: Cambridge University Press.
Schegloff, Emanuel A. 1996. Turn organization: One intersection of grammar and interaction. In:
Elinor Ochs, Emanuel A. Schegloff and Sandra A. Thompson (eds.), Interaction and Grammar,
52–133. Cambridge: Cambridge University Press.
Schegloff, Emanuel A. 1998. Body torque. Social Research 65(3): 535–586.
Schegloff, Emanuel A., Gail Jefferson and Harvey Sacks 1977. The preference for self-correction
in the organization of repair in conversation. Language 53: 361–382.
Streeck, Jürgen 1993. Gesture as communication I: Its coordination with gaze and speech. Com-
munication Monographs 60: 275–299.
Streeck, Jürgen 2009. Gesturecraft: The Manufacture of Understanding. Amsterdam: John
Benjamins.
Streeck, Jürgen, Charles Goodwin and Curtis D. LeBaron 2011. Embodied Interaction, Language
and Body in the Material World. Cambridge: Cambridge University Press.
Suchman, Lucy 2000. Making a case: “Knowledge” and “routine” work in document production.
In: Paul Luff, John Hindmarsh and Christian Heath (eds.), Workplace Studies. Recovering
Work Practice and Informing System Design, 29–45. Cambridge: Cambridge University Press.
Sudnow, David 1972. Temporal parameters of interpersonal observation. In: David Sudnow (ed.),
Studies in Social Interaction, 259–279. New York: Free Press.
Watson, Rod 2005. The visibility arrangements of public space: Conceptual resources and methodo-
logical issues in analysing pedestrian movements. Communication & Cognition 38(1–2): 201–227.

Lorenza Mondada, Basel (Switzerland)

38. Verbal, vocal, and visual practices in conversational interaction
1. Introduction
2. Theoretical considerations: Multi-modality
3. On some relations/parallels between verbal, vocal, and visual practices
4. Goals and methodology
5. Sample analysis
6. Conclusion
7. References

Abstract
Drawing on the theories and methodologies of Conversation Analysis, Interactional Lin-
guistics and research in Multimodality, this article explores some aspects of the relation-
ship between verbal, vocal and visual practices in social interaction. After giving an
outline of some results of previous research on the use of, and the relationship between, verbal, vocal and visual resources in social interaction in general, and after outlining the
goals and methodology of this paper, I will present a sample analysis of an extract
from a video-taped conversation. In this analysis, I will make explicit and demonstrate
how participants in interaction use verbal, vocal and visual cues in co-occurrence and
concurrence in order to organize their interaction and, in this particular case, make affec-
tivity interpretable for the recipient, who then is expected to respond, and in the example
shown responds affiliatively.

1. Introduction
For a long time, research on language and interaction, in the field of Conversation Ana-
lysis (CA) and neighboring approaches such as Interactional Linguistics (IL), has
focussed on the verbal aspects of “talk-in-interaction” (Schegloff 1996). As a result,
many researchers who did not want to feel trapped by the intricacies of complex
face-to-face data chose telephone conversations for their data bases. These represent
nicely delimited interactions between two participants, with clear beginnings and end-
ings, which can be validly studied without needing to worry about the visual channel.
While many of their analyses focus on verbal interaction, Conversation Analysis practi-
tioners have also investigated, where necessary, gaze, gesture and body posture (see,
e.g., Lerner 2003; Schegloff 1984, 2005). As interlocutors on the telephone normally
(still) cannot see each other, analysis of only the audio channel for this kind of data
seems a safe and justified choice. As, at least on the surface, there does not seem to be a
great difference between participants’ use of linguistic and interactional devices in tele-
phone and face-to-face conversations, these data have often been used side-by-side in
data bases for doing Conversation Analysis (see, e.g., Sacks, Schegloff, and Jefferson
1974; Schegloff 1997) and Interactional Linguistics (see Couper-Kuhlen and Selting 2001).
Analyses in these approaches, based on these data types, have provided many impor-
tant insights into the sequential and linguistic details of the management of conversa-
tional interaction. During the last decades, research in these traditions has seen,
among other things, progress in the analysis and description of the fundamental me-
chanisms and sequences of interaction, such as the construction of units and turns,
the organization of turn-taking, repair, preference organization, the contextualization
of practices and actions, etc. (For introductions, see Hutchby and Wooffitt 1998, ten
Have 1999, Schegloff 2007). Research has focussed on the role of linguistic structures
such as, most notably, prosody and syntax in talk-in-interaction, underpinning many
Conversation Analysis sequential analyses with linguistic details. (See Couper-Kuhlen
and Selting 1996, 2001; Couper-Kuhlen and Ford 2004; Hakulinen and Selting 2005.)
More recently, however, researchers such as C. Goodwin and M. H. Goodwin have maintained that talk on the telephone constitutes a very restricted semiotic environment that reflects only a small part of what we normally engage in in our daily lives and routines
(see C. Goodwin and M. H. Goodwin 1987; C. Goodwin 2000, 2007, 2010, etc.). In con-
trast to conventional telephone conversations, face-to-face interactions normally
involve both the audio as well as the visual channel of communication, often in work-
place or other environments where participants manipulate objects. The analysis of this
kind of everyday interaction requires, besides sequential and linguistic analyses of the
verbal and vocal aspects of interaction, also the analysis of what has been called the
nonverbal or visual aspects of interaction, i.e. the analysis of what has come to be called
the “multimodality” of interaction.
Since technological developments now allow easy video-recording and processing of video data of interaction, this has become the first choice for data collection.
In this paper, I will try to show how participants in interaction use verbal, vocal and
visual cues in co-occurrence and concurrence in order to organize their interaction.
After clarifying my terminology in the remainder of this section, in section 2 I will outline some relevant aspects of the history of the field and some theoretical considera-
tions in general, while in section 3, I will present some results of previous research on
the relation between verbal, vocal, and visual practices. After outlining the goals and
methodology of this paper in section 4, I will present a sample analysis in section 5.
Finally, in section 6, I will draw conclusions.
Before going into more detail, it is necessary to clarify the terminology used in the title and in the remainder of this paper.
I will conceive of verbal practices and aspects of “talk-and-other-conduct-in-interaction”
(Schegloff 2005) as comprising the use of resources from the domains of rhetoric, lexico-
semantics, syntax, and segmental phonetics and phonology.
Vocal practices encompass the use of resources from the domains of prosody and
voice-quality. In this sense, the conception of vocal is co-extensive with the Firthian
conception of prosody (see Firth 1957). Firth conceived of prosodies as all types of syn-
tagmatic relationships between syllables that are not determined by the structure of
words and utterances. These include syllable structures, stress/accentuation, tone, qual-
ity and quantity, and, where applicable, also phenomena like glottalization, aspiration,
nasalization, whisper etc. Following Firth, all suprasegmental phenomena that are
constituted by the interplay of pitch, loudness, duration and voice quality can be under-
stood as prosodic, as long as they are used – independently of the language’s segmental
structure – as communicative signals (see Couper-Kuhlen 2000: 2; Kelly and Local 1989;
Selting 2010b).
In face-to-face interaction, visual practices of the participants also play an important
role. Yet, there is a problem of terminology here. The term nonverbal in expressions
such as nonverbal communication or nonverbal activities has been criticized because
it implies an unwanted verbal bias of research. Yet the term visual practices or visual
aspects, which is often used as an alternative, is problematic too, since it also includes
some aspects of articulatory phonetics and prosody that we would not want to include
as meaningful interactional devices: for instance, lip movements characteristic of the
production of segmental sounds like high front versus high back vowels ([i] vs. [u]), bi-
labial or labio-dental plosives, nasals, or fricatives. At the same time, lip movements are
realized to produce particular voice qualities, such as spread lips, “pursed lips”, for smil-
ing voice or as cues to suggest irony or the like respectively. Do we count these as con-
comitant cues of the particular voice quality – or as visual cues in themselves? Where,
thus, is the borderline between articulatory phonetic and visual cues? On the one hand,
this problem demonstrates that the cues which scientific practice and terminology want to
precisely define and allocate are deployed in co-occurrence and concurrence in the reality
of multimodal interaction. The allocation of cues to different categories is an analytic
problem, not a practical one for the participants in interaction. The only relevant issue
for participants is: which of the multiplicity of cues is interactionally relevant? Neverthe-
less, on the other hand, in order to enhance scientific generalizations, we need to isolate,
classify and categorize the different kinds of cues deployed in multimodal interaction. In
the following, I will use the terms visual practices and visual resources, referring to re-
sources from the domains of gaze, facial expression, gesture, posture, object manipula-
tion, etc. as long as they are deployed for the signalling of interactional meaning.

2. Theoretical considerations: Multi-modality


For a long time, the segmental-verbal, phonetic-prosodic-vocal and visual aspects of
talk and interaction have been studied separately, even in different disciplines and dif-
ferent departments of our research institutions. Predominantly, there was (and still is) a
mere juxtaposition of research in conversation analysis and gesture research (see also
Bohle 2007: 157).
Until quite recently, many students of primarily verbal discourse and interaction, in
encountering the necessity of dealing with other than the verbal aspects of communica-
tion, have been concerned with relationships between the verbal and non-verbal parts
of the messages or actions studied (see, e.g., Schönherr 1993, 1997). Only recently has
this changed, as Sidnell (2006: 379f.) notes: “current work on multimodality focuses on
questions of integration (or “reassembly” as Schegloff 2005 put it) by putting at the
forefront the question of how different modalities are integrated so as to form coherent
courses of action.”
When we look at video recordings of social interaction, we can see, as C. Goodwin
(2000) phrases it,

that the construction of action through talk within situated interaction is accomplished
through the temporally unfolding juxtaposition of quite different kinds of semiotic re-
sources, and that moreover through this process the human body is made publicly visible
as the site for a range of structurally different kinds of displays implicated in the constitu-
tion of the actions of the moment. (Goodwin 2000: 1490)

Stivers and Sidnell (2005), following Enfield (2005), distinguish between the vocal/aural
and visuospatial modalities, the vocal/aural modality encompassing spoken language in-
cluding prosody, the visuospatial modality including gesture, gaze, and body postures.
They point out that they do not want to rate the modalities differently with respect
to their relevance:

by looking at interaction from a multimodal perspective we do not mean to privilege one modality over another (e.g., visuospatial over vocal/aural) but rather to suggest that much
can be gained from examining a turn-at-talk for where it is situated vocally (e.g., sequen-
tially, prosodically, syntactically) as well as visuospatially (e.g., body orientation, facial
expression, accompanying gestures), and that different modalities should not, a priori,
be treated as more or less important. (Stivers and Sidnell 2005: 2)
If we are to move towards a theory of social interaction, we will need to understand not
only how the vocal modality works but how the different channels and modalities work
together as well as the mechanics that underlie such co-operation (Stivers and Sidnell
2005: 15; see also Schmitt 2007: 399).

Yet, with respect to how precisely the concurrent modalities are understood as working
together by participants in interaction, Levinson (2006) points out that synchrony alone
cannot explain this:

Face-to-face interaction is characterized by multi-modal signal streams – visual, auditory, haptic at the receiving end, and kinesic, vocal and motor/tactile at the producing end.
These streams present a “binding problem” – requiring linking of elements which belong
to one another across time and modality (e.g. a gesture may illustrate words that come
later, a hand grasp may go with the following greeting). (Levinson 2006: 46)
Careful inspection of video records shows that synchrony alone won’t do the trick of hook-
ing up the bits in the different signal streams: gestures, facial expressions, nods and the like
can come earlier or later than the words they go with. There seems thus to be a significant
“binding problem” in hooking up the signals that go together. If temporal binding is not
sufficient, what will do the trick? (Levinson 2006: 53).
Sidnell proposes that it is the recognizable construction, interpretation and common
engagement in activities such as word search (see M. H. Goodwin and C. Goodwin
1986), or the reenactment of actions or events during tellings in interaction (Sidnell 2006)
that enables participants to relate various modalities to one another, in part by seeing
them as part of a larger system of activity (see Sidnell 2006: 404):

To investigate multimodality, one needs to pay attention to the level of structured activ-
ities: those situated activity systems within which analysts and the coparticipants encounter
gestures, directed gaze, and talk working together in a coordinated and differentiated way.
This is a unit of interaction that is relatively discrete; has a beginning, middle, and end; and
provides a structure of opportunities for participation. (Sidnell 2006: 380)

This is thus the site where Conversation Analysis, Interactional Linguistics and
Multimodality can meet to fulfill their goals together.

3. On some relations/parallels between verbal, vocal, and visual practices
Research in Interactional Linguistics has revealed many parallel structures between the
verbal and vocal practices and devices for the construction of turn-constructional units
and turns in talk-in-interaction. The verbal structures of utterances, in particular lexico-
semantic and syntactic structures, are deployed to build syntactic units for pragmatic ac-
tions in their sequential context. These lexico-syntactic structures necessarily co-occur
with prosodic structures. In most cases, the co-occurrence of structures from these
two linguistic systems is used to make actions interpretable that are realized via pro-
jected and provisionally completed turn-constructional units (TCUs; Sacks, Schegloff,
and Jefferson 1974; Selting 2000). In some cases, the abandoning of projected units
results in fragments of units (Selting 2001).
There are some characteristics of the syntax and prosody of units in spoken interaction
that function in a parallel fashion and thus reveal shared principles of unit and turn
construction, e.g.:

– New turn-constructional units begin with the beginning of new syntactic units, e.g.
clauses, phrases, single-word units – and also with the beginning of new prosodic
units, often contextualized as new via prosodic breaks and pitch step ups or step
downs or faster syllables at the beginning of new units.
– There may be pre-positioned items, syntactically as well as prosodically formattable
in different ways between the poles of independent/exposed units and items integrated
into the turn-constructional units thus begun, deployed in order to project and focus
the unit-to-come.
– Before pauses within or after syntactically possible complete units, level or slightly
rising pitch may be deployed in order to hold the turn and project continuation.
– After pauses in the middle of units, projected clauses can be continued – just as
projected contours and previous loudness can be continued after the pause.
– Syntactic constructions can be expanded after first and further possible completion
points – just as the prosody after possible completion points can be expanded, by
594 IV. Contemporary approaches

adding further words or phrases continuing the prior prosody. These may be formatted
in different ways between the poles of prosodically integrated items and independent/
exposed post-positioned units.
– At syntactically possible completion points, level or slightly rising pitch can be used in
order to hold the turn and project continuation across the turn-constructional unit
boundary, thus projecting another turn-constructional unit to come.

All this shows that both syntax and prosody can be deployed to construct turn-construc-
tional units as flexible entities that may be adapted to the local exigencies of the inter-
action (see Auer 1991, 1996; Couper-Kuhlen 2007; Schegloff 1996; Selting 1995a, 1995b,
1996, 2000, 2001, 2008).
Müller (2009) gives the following general characterization of the relation between
verbal and gestural parts of utterances in interaction:

Gestures are part and parcel of the utterance and contribute semantic, syntactic and
pragmatic information to the verbal part of the utterance whenever necessary. As a
visuo-spatial medium, gestures are well suited to giving spatial, relational, shape,
size and motion information, or enacting actions. Speech does what language is
equipped for, such as establishing reference to absent entities, actions or events or es-
tablishing complex relations. In addition, gestures are widely employed to turn verbally
implicit pragmatic and modal information into gesturally explicit information. (Müller
2009: 517)

Bohle (2007) investigated the role of gestures in the organization of turn taking in
German conversations. She found a number of parallel properties between gesture and
prosody. In particular:

– Gesture phrases have, like intonation units/phrases, flexible beginnings and endings:
they can be expanded, interrupted and continued, and abandoned (see Bohle 2007:
274).
– Gestures which are prepositioned in relation to the verbal-vocal units that they are
co-expressive with, i.e. constructed before the verbal-vocal units have begun, may
be used to project the unit and its continuation (see Bohle 2007: 274).
– During pauses, or in response to competitive incomings (French and Local 1983),
gestures may be deployed to project more-to-come and/or claim turn continuation;
they may thus be used as turn-holding devices, both locally within as well as more
globally across turn-constructional units (Bohle 2007: 275f.).
– In transition relevance spaces, no turn-holding devices are used. Rather, what we find
is pragmatic completion, syntactic completion, ending intonation, and the returning
of the hands and arms into a rest position. One or more of these devices can be shifted
to constitute slightly incongruous structures at the beginning or ending of units, in
order to exploit them for purposes such as the projection of continuation or the
negotiation of turn taking (see Bohle 2007: 280).

Bohle (2007: 277) summarizes the parallel practices from prosody and gesture in a table
(see Tab. 38.1; my translation from German, MS):

Tab. 38.1: Parallel practices from prosody and gesture according to Bohle (2007: 277)

Function: Utterance completion
  Prosodic practice: ending intonation
  Gestural practice: withdrawal of hands into rest position

Function: Possible completion of a unit
  Prosodic practice: completion of intonation contour
  Gestural practice: completion of gesture phrase

Function: Continuation of an utterance or turn
  Prosodic practice: continuing intonation
  Gestural practice: stop of movement at the high point/climax or in the retraction phase

Function: Continuation of unit with an expansion
  Prosodic practices: continuation of prior speech rhythm; continuation of prior tempo;
    continuation of prior loudness; continuation of prior intonation contour
  Gestural practices: continuation of rhythm of movement; maintaining prior locus of
    movement; maintaining prior hand configuration; continuation of prior gesture phrase

Function: Turn continuation with new unit
  Prosodic practices: change of speech rhythm; change of tempo; change of loudness;
    new intonation contour
  Gestural practices: change of rhythm of movement; change of locus of movement;
    change of hand configuration; new gesture phrase

Function: Projection of turn continuation
  Prosodic practices: semantic projection; discourse-pragmatic projection;
    activity-type specific projection
  Gestural practices: pre-positioning of referential gesture; pre-positioning of
    pragmatic gesture; construction of cumulative gesture unit

In Bohle’s view, gesture and speech are co-expressive, yet the construction of ges-
tures is independent from speech. In particular, the timing of gestures in relation
to speech may be deployed for the suggestion of interactional meanings (see Bohle
2007: 274).
A few studies by C. and M. H. Goodwin have investigated the concurrence of prosodic
and visual resources for the construction of action sequences in special contexts and
settings. For example, M. H. Goodwin (1996) describes how, within the ecology of work
situations in an airport, informings and announcements issued from the Operations room
rely on different prosodic patterns in order to be tailored to their target audience and
the space that they inhabit (see M. H. Goodwin 1996: 436). Girls playing hopscotch are
shown to build actions that require the integrated use of both particular verbal and
prosodic formats within the semiotic field provided by the hopscotch grid (see M. H.
Goodwin and C. Goodwin 2000). A man suffering from severe aphasia is shown to
successfully communicate with his family by relying on prosodic and visual resources
(see M. H. Goodwin and C. Goodwin 2000; C. Goodwin 2010).
Most studies, however, in investigating the multimodality of the organization of
interaction with respect to, e.g., turn taking or storytelling, have concentrated on the
deployment of visual resources. In many studies, the use of these devices for the projec-
tion of imminent actions is emphasized, projection being conceived of as a prerequisite
for human interaction and cooperation.

Building on and continuing work by, in particular, C. Goodwin and M. H. Goodwin,
Hayashi (2005) describes multimodal interaction as finely tuned collaborative work
consisting of “public practices” which rely on projectability as their prerequisite
principle:

At the heart of language and bodily conduct as public resources for the achievement of
socially coordinated participation in situated activities is the projectability of human con-
duct. Projectability allows participants to anticipate the future course of action being pro-
duced by another participant and produce a specific form of action that fits into the
unfolding structure of that other participant’s ongoing action (Hayashi 2005: 45f; see
also Streeck’s 2009 analysis of “forward-gesturing”).

According to Hayashi (2005: 47), turns at talk “are multimodal packages for the pro-
duction of action (and collaborative action) that make use of a range of different mod-
alities, e.g., grammatical structure, sequential organization, organization of gaze and
gesture, spatial-orientational frameworks, etc., in conjunction with each other”. In con-
clusion, Hayashi encourages a way to conceptualize a turn at talk as “a temporally un-
folding, interactively sustained domain of embodied action through which both the
speaker and recipients build in concert with one another relevant actions that contribute
to the further progression of the activity in progress” (Hayashi 2005: 47–48).
In a series of studies, Mondada (e.g., 2006, 2007) investigates the role of multimodal
resources in the organization of turn taking in interaction. With reference to a fragment
from a work meeting interaction in an architect’s office in Paris, in which three partici-
pants talk about and manipulate a map showing the castle which they want to transform
into a luxury hotel, Mondada (2006) shows that and how “emergent dynamics – such as
projections at the level of turn, sequence and action – are displayed and oriented to by
participants in a detailed and timed way” (Mondada 2006: 127). For a particular con-
text, professional work meetings of experts sitting at a table and discussing and devel-
oping a cartographic language for modelling agricultural land, using various drawings
and artefacts on the table, Mondada (2007) describes a multimodal practice of self-
selection: “the use of pointing gestures predicting turn completions and projecting the
emergence of possible next speakers” (Mondada 2007: 194).
Müller (2003) shows in a case study how gestures are related to storytelling: “Ges-
tures are intertwined with the verbal part of the utterance with regard to rhematic infor-
mation and communicative dynamism. They represent and/or highlight the rhematic
information verbally provided which receives a high communicative dynamism. And
in doing this they construct a visible account of the story line and of the narrative
peaks” (Müller 2003: 263). “The gestures (…) are part of the narrative structure of a
verbally and bodily described event. They create a visual display of the narrative struc-
ture and thus turn the story telling into a multi-modal event: something to listen to but
also something to watch (…) they are natural elements of an everyday rhetoric of telling
a story in conversation” (Müller 2003: 263).
With respect to recipients’ responses to storytelling, Stivers (2008) shows that vocal
and visual responses during storytelling convey different information: “whereas vocal
continuers simply align with the activity in progress, nods also claim access to the tell-
er’s stance toward the events (whether directly or indirectly)” (Stivers 2008: 31). That
means that whereas in mid-storytelling verbal recipiency tokens (acknowledgements
and continuers) are deployed to achieve more formal alignment, nodding may be

deployed to signal affiliation, i.e. “that the hearer displays support of and endorses the
teller’s conveyed stance” (Stivers 2008: 35).
In conclusion, although work like that discussed above has provided important
insights into the relation and concurrence of verbal and visual resources in interaction,
there is still a need to integrate the results of this research with the results of research
on prosody in interaction.

4. Goals and methodology


The above conclusion calls for an approach integrating interactional linguistic and mul-
timodal analyses of interaction sequences, thus laying the foundation for and moving
towards a comprehensive interaction analysis, or, as Schegloff (2005) might call it, a
comprehensive analysis of “talk-and-other-conduct-in-interaction”.
In the following, I will demonstrate that and how participants in interaction use ver-
bal, vocal and visual resources in co-occurrence and concurrence in the organization of
interaction, in particular in the signalling of affectivity in climaxes of storytelling. I will
show how the following kinds of resources are used in their sequential context:

(i) the verbal display: rhetorical, lexico-semantic, syntactic, and segmental phonetic-
phonological resources;
(ii) the vocal display: resources from the domains of prosody and voice quality;
(iii) visual resources from the domains of body posture and its changes, head move-
ments, gaze, hand movements and gestures, and the manipulation of objects.

I will present an extract from a face-to-face conversation. The data come from a corpus
of 8 everyday face-to-face conversations with 2 or 3 participants in their home environ-
ments in the area of Berlin-Potsdam, recorded by the project Emotive Involvement in
Conversational Storytelling within the Cluster of Excellence Languages of Emotion in
2008–2009 at Freie Universität Berlin. For these data, the project devised a recording
technique adopted from Peräkylä and Ruusuvuori (2006): we used three cameras
plus an extra audio flash recorder. The three cameras were positioned to capture the
total situation as well as, separately, the faces and bodies of the participants facing
each other. For data analysis, all four recordings were synchronized and combined
into one film, allowing analysts to view the same sequences from three different
perspectives and to have access to a high-quality audio recording.
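The synchronization step can be grounded in the audio tracks: a common way to align independently started recordings is to find the time offset that maximizes the cross-correlation of their audio signals. The following sketch is a hypothetical illustration of this idea only; the function name and the brute-force search are my assumptions, not the project's actual procedure.

```python
import random

def best_offset(a, b, max_lag):
    """Estimate the offset (in samples) of track b relative to track a by
    brute-force maximization of their cross-correlation. A result of k
    means sample 0 of b lines up with sample k of a."""
    best_r, best_lag = float("-inf"), 0
    for lag in range(-max_lag, max_lag + 1):
        # Overlapping index range of a[i] and b[i - lag]
        lo, hi = max(0, lag), min(len(a), len(b) + lag)
        r = sum(a[i] * b[i - lag] for i in range(lo, hi))
        if r > best_r:
            best_r, best_lag = r, lag
    return best_lag

# Toy check: a pseudo-random "audio" track and a copy that starts 7 samples later
random.seed(1)
a = [random.uniform(-1.0, 1.0) for _ in range(500)]
b = a[7:]
offset = best_offset(a, b, max_lag=20)  # expected: 7
```

In practice one would normalize by the overlap length and typically correlate downsampled amplitude envelopes rather than raw samples, but the principle is the same.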
The data have been transcribed according to a transcription system developed by a
group of German interactional linguists in 1998, revised in 2009 (Selting et al. 1998,
2009). This system is similar to the transcription system used in Conversation Analysis,
but it attempts a more linguistically systematic notation, especially with respect to prosody
in talk-in-interaction. The notation conventions can be found in the appendix.
For the demonstration of the relationship between verbal, vocal, and visual practices
in interaction, I will present a sequence in which a storyteller and a recipient
enact a climax of storytelling. Here, as elsewhere in face-to-face interaction, all
kinds of resources are deployed to co-occur and work together in order to make the
respective activity recognizable for the recipients, who are then expected to respond
affiliatively.

In comparison to other activities in conversation in general and in storytelling in
particular, the climax of storytelling shown here is one in which affectivity is made
recognizable. That means that neither the climax of the story nor the responses are
delivered in a neutral manner, tone or voice. Rather, they are keyed as emotively
involved. In the signaling of affectivity, i.e. more-than-neutral involvement that may be interpretable as
suggesting a particular affect, it is in particular the usage of marked or salient cues
that is relevant, i.e. cues that deviate from the forms for the signaling of behavior of
the same speaker in surrounding segments of talk. So, the words marked and marking
are deployed as technical terms here: the marked realization of a cue is always a more
noticeable or more conspicuous one in comparison to its unmarked or non-salient
counterpart. Clearly marked and unmarked realizations of cues are often poles of a
continuum with possibly more or less marked realizations in between them.
For the analysis of verbal, vocal, and visual practices in conversational interaction,
sequences with affect displays are thus more marked cases. The use of marked cues sug-
gests interpretations of emotive involvement and affectivity which overlay the interpre-
tation of the other co-occurring cues. Because of their markedness, they attract our
attention. Nevertheless, these marked cases function along principles of construction
similar to more unmarked and “neutral” cases and sequences.

5. Sample analysis
The following extract (from: LoE_VG_03_Parkausweis Gehbehinderte) shows the cli-
max of a complaint story told by Carina, and recipient Hajo’s responses. Carina has just
been telling how she came to be parking in a parking place for disabled people for a
very brief time and on coming back found a ticket on the windscreen of her car. In
response to this, Hajo provides a recipiency token in segment 10. With the following
segments Carina makes the climax of her complaint story recognizable to Hajo, who
then responds more strongly.
Carina’s story climax
In overlap with Hajo’s recipiency token, Carina in 11 produces the swear word
FUCK and then in 12 gives the sum she had to pay as a fine.

{0:19} 10 Haj: hm_[hm,]


11 Car: [|<FUCK.>]
<whispery, l>
|((nodding, gazing at Haj))

12 |<<whispery>SIEBzig euro.>
seventy euros
|((with raised eyebrows))

13 (-)

{0:21} 14 Haj: |<<pressed, h>^ˀ!OAH!;>


|((with wide opened eyes and mouth,
| and with raised eyebrows))

Rhetorically and lexico-semantically, the swear word is of course remarkable. In
addition, it realizes a code-switch into English. This unit presents the ticket as a very
negatively evaluated nuisance. For the young people in this conversation, the size of the
sum given in 12, seventy euros, makes it an extreme-case formulation (Pomerantz 1986).
The two units can be described as a response cry (Goffman 1978, 1981), FUCK, and an
elaboration of it (see C. Goodwin 1996: 393ff.). Syntactically, the climax is realized with
maximally short constructions, with one word and two words respectively constituting
the syntactic
units. Prosodically, the units show falling accents with final falling pitch. With her voice
quality, however, Carina creates a contrast to her prior units: FUCK is delivered in a
lower pitch register, both FUCK and SIEBzig euro are realized in a whispery voice.
Visually, Carina produces a head nod with FUCK and raises her eyebrows when uttering
SIEBzig euro.
With all these cues together, Carina suggests segments 11 and 12 as the climax of
her story. All these cues clearly construct these units as conspicuous and thus signal
heightened emotive involvement. The specific affect that she displays is more diffi-
cult to interpret, though. Her strong negative assessment suggests the interpretation
of anger or indignation because of her being treated unfairly (for indignation see
Günthner 2000). Yet, this anger and indignation is not displayed as in-situ, but as re-
ported thought, that is, as a reconstructed affect belonging to the story world (see
Günthner 2000). In contrast to other complaint stories in which reconstructed
anger and indignation is displayed with more animated cues, Carina’s anger and
indignation is displayed with more subdued cues here: whispery voice and low
pitch register. Through this, her affect seems to be displayed as a past experience,
one that she is now resigned to.
The interpretation of the displayed affectivity as anger or indignation can be war-
ranted by taking Hajo’s response into account: After a brief lapse, he responds with
the sound object (Reber 2008) ˀ!OAH!;, in a high pitch register and with a high pitch
peak, with rising-falling pitch, and in a tense, pressed voice. Fig. 38.4 (see below)
shows that there is a burst of high intensity at the beginning of the item. After Carina’s
brief formulation of her climax, Hajo responds with a maximally short response cry
which consists of one single syllable. Concomitantly, he gazes at Carina with his eyes
suddenly wide open, his mouth open, and raised eyebrows. All these features together
constitute a conventional response cry (Goffman 1981) to display astonishment at and
affiliative agreement with the prior speaker’s negative assessment of some event pre-
sented in the prior turn. Hajo shows himself in agreement with Carina’s assessment
of the events as egregious. His visual enaction of raised eyebrows at 14 and 16 even
converges with Carina’s enaction of raised eyebrows at 12, thus aligning with her enaction of
facial expression. The pause in segment 13 and Hajo’s slightly late response can in this
case be analyzed as an additional signal of his astonishment (on delays in the signaling
of surprise see also Wilkinson and Kitzinger 2006: 164ff.). In addition, Hajo’s response is
quite brief and he does not project elaboration on it. This leads to Carina’s expansion of
her climax. Figs. 38.1 and 38.2 show Carina’s and Hajo’s facial expression in segments
12 and 14.

Fig. 38.1: Carina’s facial expression in segment 12 (still frame)

Fig. 38.2: Hajo’s facial expression in segment 14 (still frame)

Carina’s evaluation of the complainable in the here-and-now


In segment 15, Carina produces an in-situ evaluation of the complainable of her story:

{0:22} 15 Car: |↑SIE:Bzig Euro f Ür (.) im behInderten (-)


seventy euros for in a disabled
|((nodding in synchrony with accented syllables,
|gazing at Haj))

|<<dim>pArkplatz [(stEhn).>
parking place
|((nodding in synchrony with accented syllables,
|then gaze away from Haj))
seventy euros for using a parking place for the disabled

{0:25} 16 Haj: |<<len>^HOLla.>


|((with still raised eyebrows))

Rhetorically and lexico-semantically, she does not add anything new, but only formu-
lates the egregious fine in a more elaborate form again. Syntactically, this is a non-finite
construction, mentioning only the bare fact, with the mentioning of the extreme sum of
the fine in a topicalized position, but it is longer than the first rendering. Prosodically,
the topicalized extreme sum is presented with an accented syllable rising to an extra-
high pitch peak and carrying some lengthening, thus signaling the focus of the unit
right from the beginning. The words in the rest of the unit carry a high number of addi-
tional secondary accents, namely five; these are not rhythmically organized but sepa-
rated by two brief pauses. Nevertheless, the accentuation is dense (see Selting 1994),
with only a few unaccented syllables between the accented ones, even though most of
the accents are not very strong. The unit ends in soft voice. Visually, Carina nods her
head in synchrony with the accented syllables; at first she continues to gaze at Hajo
and then directs her gaze away from him.
In this case it is not only the verbal, vocal and visual marking that displays the emo-
tive involvement, but also the fact that Carina repeats the egregious fact again, in more
or less the same words as before. She thus draws attention to the egregious fact again.
But in contrast to the first rendering, as the climax, which seemed to re-enact her affect
in the storyworld, she now seems to comment on and evaluate the egregious fine for
Hajo in the here-and-now and thus creates another opportunity for Hajo to respond.
Carina’s in-situ evaluation of the complainable seems to be weaker and calmer than
her prior reconstructed rendering of it.
Again, this analysis can be warranted with reference to Hajo’s response at segment
16. Hajo provides ^HOLla. with slow speech rate and with marked rising-falling pitch.
This can be seen in the acoustic analysis shown in Fig. 38.5, carried out with the software
programme PRAAT.
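The pitch and intensity tracks discussed here rest on two standard acoustic measurements: a fundamental-frequency (F0) estimate and a frame intensity in dB. The following is a minimal sketch of both on a synthetic tone; it only illustrates the kind of measurement involved, not Praat's actual, far more refined algorithm, and the voicing threshold is an assumption.

```python
import math

def estimate_f0(samples, sr, fmin=75.0, fmax=400.0):
    """Crude F0 estimate by picking the autocorrelation peak within the
    lag range corresponding to [fmin, fmax] Hz."""
    lag_min, lag_max = int(sr / fmax), int(sr / fmin)
    energy = sum(s * s for s in samples)
    best_r, best_lag = 0.0, None
    for lag in range(lag_min, lag_max + 1):
        r = sum(samples[i] * samples[i - lag] for i in range(lag, len(samples)))
        if r > best_r:
            best_r, best_lag = r, lag
    if best_lag is None or best_r < 0.5 * energy:  # assumed voicing threshold
        return None  # treat frame as unvoiced
    return sr / best_lag

def intensity_db(samples, ref=1.0):
    """Root-mean-square amplitude expressed in dB relative to ref."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(rms / ref)

# Synthetic check: a 200 Hz tone sampled at 16 kHz for 0.1 s
sr = 16000
tone = [0.5 * math.sin(2 * math.pi * 200 * n / sr) for n in range(1600)]
f0 = estimate_f0(tone, sr)   # expected: 200.0 Hz
level = intensity_db(tone)   # about -9 dB relative to full scale
```

Praat additionally windows the signal, normalizes the autocorrelation, and tracks pitch candidates over time; the pitch and intensity contours in the figures are sequences of such per-frame measurements.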
Just as Carina’s second formulation of her climax was longer than her first, so Hajo’s
second response cry is longer: it now has two syllables. And in comparison to his prior
response at 14, this second response cry is prosodically and visually less marked. As
Fig. 38.5, in comparison to Fig. 38.4, shows, the F0 peak is lower and the intensity is
lower and gradually rising and falling throughout the item. There is no pressed
articulation any longer, but slow tempo. Nevertheless, it is still much more prominent
with respect to both pitch movement as well as intensity than his recipiency token
hm_hm from line 10 (shown in Fig. 38.3).

Fig. 38.3: Pitch (Hz) and intensity (dB) traces of Hajo’s recipiency token hm_hm
(duration ca. 0.72 s)

Fig. 38.4: Pitch (Hz) and intensity (dB) traces of Hajo’s response cry ˀ!OAH!;
(duration ca. 0.72 s)

Fig. 38.5: Pitch (Hz) and intensity (dB) traces of Hajo’s response cry ^HOLla.
(duration ca. 0.72 s)

Hajo continues the visual marking of his first response: he is gazing with his eyes wide
open and with raised eyebrows, but does not add new signals. This means: just as
Carina’s in-situ evaluation of the complainable was weaker than her first re-enaction of
it, so now Hajo’s second response is weaker than his first. Nevertheless, it is a fully
affiliative response to Carina’s complaint story.
Analyzing the sequence, the form and succession of the two adjacency pairs by
Carina and the interaction between Carina and Hajo here suggest the following
interpretations:

– The first formulation is presented as if it were a reproduction of Carina’s first
response upon seeing the ticket, i.e. as reported thought, from the perspective of
the character in the storyworld; the second formulation is displayed more like a
second or later thought or reflection about the event, from the perspective of the
storyteller in the here-and-now.
– In both his responses, Hajo builds on Carina’s just prior formulations of her climax.
Each of Hajo’s responses matches Carina’s prior formulation in structure and
prosody.
– Hajo’s slightly late and brief first response appears to lead to Carina’s expansion of
her climax.
– Carina and Hajo gaze at each other all the time and thus maintain a close interaction
throughout this sequence.

Carina and Hajo thus display what M. H. Goodwin (1980) has called “mutual monitor-
ing” (see also C. Goodwin and M. H. Goodwin 1987). (On a different “epistemic
ecology” created through the two sequences in succession, see also C. Goodwin 2010.
For more detail on the sequential structure here, see Selting 2010a.)
The results of this analysis can be summarized as follows: In their presentation of
the climax of a complaint story – in which the storyteller uses general rhetorical re-
sources like the presentation of the offender or offending as acting or being unfair,
irrational, offensive and the presentation of the self as acting in a fair, rational, justi-
fied manner – participants use verbal, vocal and visual cues to signal their emotive in-
volvement in telling and responding to the story. In particular, we saw the following
cues being deployed:
(a) Verbal cues:
Rhetorically and lexico-semantically: Reformulation of an action, extreme-case
formulations, swear words or expletives; sound objects that function as response cries.
Syntactically: short, dense “elliptical” constructions and clauses.
(b) Vocal cues:
Prosodically: prosodic marking cues such as extra-strong accents, extra-high pitch
peaks, lengthenings, dense accentuation, tempo changes, changes of pitch register.
Voice quality: whispery voice; pressed, tense voice.

(c) Visual cues:
Head movements: head nods and head shakes.
Gaze: gazing at recipient.
Facial expression: mouth wide open, eyes wide open, raised eyebrows.
It is not single cues that suggest particular interpretations. It is rather their co-
occurrence and density that speakers deploy in order to suggest the interpretation of
their talk as emotively involved. The use of these cues in co-occurrence within their par-
ticular sequential context, here after the depiction of dramatization in storytelling, suggests
the interpretation of the respective units as both the (or a) climax of the story and as dis-
playing affectivity. These interpretations were warranted with reference to the recipient’s
affiliative responses, namely response cries that demonstrate the recipient’s agreement and
affiliation with the storyteller’s assessment of the events presented as egregious.

6. Conclusion
The extract has shown that participants in interaction deploy verbal, vocal and visual
cues and practices from several different domains in co-occurrence and concurrence
in order to make their actions recognizable and interpretable for the recipient, thus
enabling her or him to provide the appropriate recipient responses.
There is no one-to-one relationship between cues and meanings. Rather, bundles of
co-occurring cues suggest the interpretation both of actions and of concomitant
aspects such as affectivity or emotive involvement in general, while the specific
action or affect being displayed has to be interpreted within the sequential context.
In the case shown here, the relation between the verbal, vocal and visual cues was a sim-
ple one: all cues and practices were enacted as cues that support and enhance the actions
expressed via the other cues. In other words: all cues were deployed to comply with and
support each other. I have not dealt with any cases in which cues are deployed in a non-
congruent fashion, e.g., in order to suggest the interpretation of the message as exagger-
ated, fake, mock, ironic or the like. Neither have I dealt with any cases in which visual prac-
tices were used in place of speech or even as actions altogether independent from speech.

Acknowledgements
I am grateful to Elizabeth Couper-Kuhlen for comments on a previous version of this
paper. I also thank Maxi Kupetz for making the still frames for me and Yuko Sugita
for making the PRAAT analyses presented in Figs. 38.3–38.5.

Appendix: Transcription conventions


(for details see Selting et al. 1998 and 2009)
Sequential structure

[] overlap and simultaneous talk


[]
= latching

Pauses

(.) micropause
(-), (--), (---) brief, mid, longer pauses of ca. 0.25–0.75 secs.; until ca. 1 sec.
(2.0) estimated pause, more than ca. 1 sec. duration
(2.85) measured pause (notation with two digits after the dot)

Other segmental conventions

und_äh assimilations within units


:, ::, ::: segmental lengthening, according to duration
äh, öh, etc. hesitation signals, so-called “filled pauses”
ˀ cut-off with glottal closure
Accentuation

akZENT strong, primary accent


ak!ZENT! extra strong accent
akzEnt weaker, secondary accents

Pitch at the end of units

? rising to high
, rising to mid
- level
; falling to mid
. falling to low

Notation of pitch movement in and after accented syllable

`SO falling
´SO rising
¯SO level
ˆSO rising-falling
ˇSO falling-rising

↑ pitch jump up to peak of accented syllable


↓´ pitch jump down to valley of accented syllable

Rhythm

/xxx /xx x/xx rhythmically integrated talk: “/” is placed before a rhythmic beat

Conspicuous pitch jumps

↑ to higher pitch
↓ to lower pitch
606 IV. Contemporary approaches

Changed register, end indicated by final “>”

<<l> > low register


<<h> > high register

Laughter

haha hehe hihi laugh syllables


((laughter)) description of laughter
<<laughingly> > notation of voice quality, end indicated by final “>”

Changes in loudness and speech rate, end indicated by final “>”

<<f> > =forte, loud


<<ff> > =fortissimo, very loud
<<p> > =piano, soft
<<pp> > =pianissimo, very soft
<<all> > =allegro, fast
<<len> > =lento, slow
<<cresc> > =crescendo, continuously louder
<<dim> > =diminuendo, continuously softer
<<acc> > =accelerando, continuously faster
<<rall> > =rallentando, continuously slower

Breathing

.h, .hh, .hhh inbreath, according to duration


h, hh, hhh outbreath, according to duration

Other conventions

((nods)) non-verbal/visual and extralinguistic activities and events


<<noddingly> > concomitant para- and extralinguistic activities and events,
with notation of scope
<<whispery> > description of voice quality
() unintelligible according to duration
(solche) uncertain transcription
(solche/welche) possible alternatives
((…)) omissions in the transcript
| talk talk talk | parallel verbal and visual actions
| ((nods)) |
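The pause notations above form a small closed inventory plus a numeric pattern, so they can be captured programmatically. The following Python sketch is purely illustrative and not part of GAT itself; the mapping, the approximate durations, and the helper name `classify_pause` are my own assumptions:

```python
# Illustrative sketch (not part of the GAT specification): mapping the GAT
# pause notations from the appendix to labels and approximate durations.
import re

# Durations in seconds are rough midpoints, chosen here for illustration only.
PAUSE_NOTATIONS = {
    "(.)":   ("micropause", 0.2),
    "(-)":   ("short pause", 0.25),
    "(--)":  ("intermediate pause", 0.5),
    "(---)": ("longer pause", 0.75),
}

def classify_pause(token: str):
    """Return (label, duration_in_seconds) for a GAT pause token."""
    if token in PAUSE_NOTATIONS:
        return PAUSE_NOTATIONS[token]
    # Numeric pauses: "(2.0)" is estimated; "(2.85)" (two digits after the
    # dot) is a measured pause, following the conventions listed above.
    m = re.fullmatch(r"\((\d+\.\d+)\)", token)
    if m:
        decimals = len(m.group(1).split(".")[1])
        label = "measured pause" if decimals == 2 else "estimated pause"
        return (label, float(m.group(1)))
    raise ValueError(f"not a GAT pause notation: {token}")

print(classify_pause("(--)"))   # ('intermediate pause', 0.5)
print(classify_pause("(2.85)")) # ('measured pause', 2.85)
```

Such a mapping could serve, for example, as the first step of a transcript tokenizer that separates pause tokens from talk.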

7. References
Auer, Peter 1991. Vom Ende deutscher Sätze. Zeitschrift für Germanistische Linguistik 19/2: 139–157.
Auer, Peter 1996. On the prosody and syntax of turn-continuations. In: Elizabeth Couper-Kuhlen
and Margret Selting (eds.), Prosody in Conversation. Interactional Studies, 57–100. Cambridge:
Cambridge University Press.
Bohle, Ulrike 2007. Das Wort ergreifen – das Wort übergeben. Explorative Studie zur Rolle
redebegleitender Gesten in der Organisation des Sprecherwechsels. Berlin: Weidler.
Couper-Kuhlen, Elizabeth 2000. Prosody. In: Jef Verschueren, Jan-Ola Östman, Jan Blommaert
and Chris Bulcaen (eds.), Handbook of Pragmatics, 1–19. Amsterdam: John Benjamins.
Couper-Kuhlen, Elizabeth 2007. Prosodische Prospektion und Retrospektion im Gespräch. In:
Heiko Hausendorf (ed.), Gespräch als Prozess. Linguistische Aspekte der Zeitlichkeit verbaler
Interaktion, 69–94. Tübingen: Narr.
Couper-Kuhlen, Elizabeth and Cecilia Ford (eds.) 2004. Sound Patterns in Interaction. Amsterdam:
John Benjamins.
Couper-Kuhlen, Elizabeth and Margret Selting (eds.) 1996. Prosody in Conversation. Interactional
Studies. Cambridge: Cambridge University Press.
Couper-Kuhlen, Elizabeth and Margret Selting (eds.) 2001. Studies in Interactional Linguistics. Am-
sterdam: John Benjamins.
Enfield, N. J. 2005. The body as a cognitive artifact in kinship representations: Hand gesture dia-
grams by speakers of Lao. Current Anthropology 46(1): 1–26.
Firth, John Rupert 1957. Papers in Linguistics, 1934–1951. Oxford: Oxford University Press.
French, Peter and John Local 1983. Turn-competitive incomings. Journal of Pragmatics 7: 701–715.
Goffman, Erving 1978. Response cries. Language 54: 787–815.
Goffman, Erving 1981. Forms of Talk. Oxford: Blackwell.
Goodwin, Charles 1996. Transparent vision. In: Elinor Ochs, Emanuel A. Schegloff and Sandra A.
Thompson (eds.), Interaction and Grammar, 370–404. Cambridge: Cambridge University Press.
Goodwin, Charles 2000. Action and embodiment within human interaction. Journal of Pragmatics
32: 1489–1522.
Goodwin, Charles 2007. Environmentally coupled gestures. In: Susan Duncan, Justine Cassell and
Elena Levy (eds.), Gesture and the Dynamic Dimensions of Language, 195–212. Amsterdam:
John Benjamins.
Goodwin, Charles 2010. Constructing meaning through prosody in aphasia. In: Dagmar Barth-
Weingarten, Elisabeth Reber and Margret Selting (eds.), Prosody in Interaction, 373–394.
Amsterdam: John Benjamins.
Goodwin, Charles and Marjorie Harness Goodwin 1987. Concurrent operations on talk: Notes on
the interactive organization of assessments. Papers in Pragmatics 1: 1–54.
Goodwin, Marjorie Harness 1980. Processes of mutual monitoring implicated in the production of
descriptive sequences. Sociological Inquiry 50: 303–317.
Goodwin, Marjorie Harness 1996. Informings and announcements in their environment: prosody
within a multi-activity work setting. In: Elizabeth Couper-Kuhlen and Margret Selting (eds.),
Prosody in Conversation. Interactional Studies, 436–461. Cambridge: Cambridge University
Press.
Goodwin, Marjorie Harness and Charles Goodwin 1986. Gesture and coparticipation in the activ-
ity of searching for a word. Semiotica 62: 51–75.
Goodwin, Marjorie Harness and Charles Goodwin 2000. Emotion within situated activity. In:
Alessandro Duranti (ed.), Linguistic Anthropology: A Reader, 239–257. Oxford: Blackwell.
Günthner, Susanne 2000. Vorwurfsaktivitäten in der Alltagsinteraktion. Tübingen: Niemeyer.
Hakulinen, Auli and Margret Selting (eds.) 2005. Syntax and Lexis in Conversation. Amsterdam:
John Benjamins.
Have, Paul ten 1999. Doing Conversation Analysis. London: Sage.
Hayashi, Makoto 2005. Joint turn construction through language and the body: Notes on embodi-
ment in coordinated participation in situated activities. Semiotica 156(1/4): 21–53.
Hutchby, Ian and Robin Wooffitt 1998. Conversation Analysis. Cambridge: Polity Press.
Kelly, John and John Local 1989. Doing Phonology. Observing, Recording, Interpreting.
Manchester: Manchester University Press.
Lerner, Gene 2003. Selecting next speaker: The context-sensitive operation of a context-free orga-
nization. Language in Society 32: 177–201.
Levinson, Stephen C. 2006. On the human “interactional engine”. In: N. J. Enfield and Stephen C.
Levinson (eds.), Roots of Human Sociality: Culture, Cognition and Interaction, 39–69. Oxford:
Berg.
Mondada, Lorenza 2006. Participants’ online analysis and multimodal practices: Projecting the
end of the turn and the closing of the sequence. Discourse Studies 8(1): 117–129.
Mondada, Lorenza 2007. Multimodal resources for turn-taking: Pointing and the emergence of
possible next speakers. Discourse Studies 9(2): 194–225.
Müller, Cornelia 2003. On the gestural creation of narrative structure: A case study of a story told
in a conversation. In: Monica Rector, Isabella Poggi and Nadine Trigo (eds.), Gestures. Mean-
ing and Use, 259–265. Porto: Universidade Fernando Pessoa Press.
Müller, Cornelia 2009. Gesture and language. In: Kirsten Malmkjaer (ed.), The Routledge Linguis-
tics Encyclopedia, 510–518. London: Routledge.
Peräkylä, Anssi and Johanna Ruusuvuori 2006. Facial expression in an assessment. In: Hubert
Knoblauch, Jürgen Raab, Hans-Georg Soeffner and Bernt Schnettler (eds.), Video Analysis:
Methodology and Methods, 127–142. Frankfurt am Main: Peter Lang.
Pomerantz, Anita 1986. Extreme case formulations: A way of legitimizing claims. Human Studies
9: 219–229.
Reber, Elisabeth 2008. Affectivity in talk-in-interaction: Sound objects in English. Ph.D. disserta-
tion. University of Potsdam.
Sacks, Harvey, Emanuel A. Schegloff and Gail Jefferson 1974. A simplest systematics for the orga-
nization of turn-taking in conversation. Language 50: 696–735.
Schegloff, Emanuel A. 1984. On some gestures’ relation to talk. In: J. Maxwell Atkinson and
John Heritage (eds.), Structures of Social Action, 266–296. Cambridge: Cambridge University
Press.
Schegloff, Emanuel A. 1996. Turn organisation: One intersection of grammar and interaction. In:
Elinor Ochs, Emanuel A. Schegloff and Sandra A. Thompson (eds.), Interaction and Grammar,
52–133. Cambridge: Cambridge University Press.
Schegloff, Emanuel A. 1997. Whose text? Whose context? Discourse and Society 8: 165–187.
Schegloff, Emanuel A. 2005. On integrity in inquiry … of the investigated, not the investigator.
Discourse Studies 7(4–5): 455–480.
Schegloff, Emanuel A. 2007. Sequence Organization in Interaction. A Primer in Conversation Ana-
lysis. Cambridge: Cambridge University Press.
Schmitt, Reinhold 2007. Von der Konversationsanalyse zur Analyse multimodaler Interaktion. In:
Heidrun Kämper and Ludwig M. Eichinger (eds.), Sprach-Perspektiven. Germanistische Lin-
guistik und das Institut für Deutsche Sprache, 395–417. Tübingen: Narr.
Schönherr, Beatrix 1993. Prosodische und nonverbale Signale für Parenthesen. “Parasyntax” in
Fernsehdiskussionen. Deutsche Sprache 21: 223–243.
Schönherr, Beatrix 1997. Syntax – Prosodie – Nonverbale Kommunikation. Tübingen: Niemeyer.
Selting, Margret 1994. Emphatic speech style – with special focus on the prosodic signalling of
heightened emotive involvement in conversation. Journal of Pragmatics 22: 375–408.
Selting, Margret 1995a. Der “mögliche Satz” als interaktiv relevante syntaktische Kategorie. Lin-
guistische Berichte 158: 298–325.
Selting, Margret 1995b. Prosodie im Gespräch. Aspekte einer Interaktionalen Phonologie der
Konversation. Tübingen: Niemeyer.
Selting, Margret 1996. On the interplay of syntax and prosody in the constitution of turn-construc-
tional units and turns in conversation. Pragmatics 6(3): 357–388.
Selting, Margret 2000. The construction of units in conversational talk. Language in Society 29:
477–517.
Selting, Margret 2001. Fragments of units as deviant cases of unit-production in conversational
talk. In: Margret Selting and Elizabeth Couper-Kuhlen (eds.), Studies in Interactional Linguis-
tics, 229–258. Amsterdam: John Benjamins.
Selting, Margret 2008. Linguistic resources for the management of interaction. In: Gerd Antos,
Eija Ventola and Tilo Weber (eds.), Handbook of Interpersonal Communication, 217–253.
Volume 2. Berlin: De Gruyter.
Selting, Margret 2010a. Affectivity in conversational storytelling: An analysis of displays of anger
or indignation in complaint stories. Pragmatics 20(2): 229–277.
Selting, Margret 2010b. Prosody in interaction: State of the art. In: Dagmar Barth-Weingarten,
Elisabeth Reber and Margret Selting (eds.), Prosody in Interaction, 3–40. Amsterdam: John
Benjamins.
Selting, Margret, Peter Auer, Birgit Barden, Jörg Bergmann, Elizabeth Couper-Kuhlen,
Susanne Günthner, Uta Quasthoff, Christoph Meier, Peter Schlobinski and Susanne Uhmann
1998. Gesprächsanalytisches Transkriptionssystem (GAT). Linguistische Berichte 173: 91–122.
Selting, Margret, Peter Auer, Dagmar Barth-Weingarten, et al. 2009. Gesprächsanalytisches Trans-
kriptionssystem 2 (GAT 2). Gesprächsforschung – Online-Zeitschrift zur verbalen Interaktion
10(2009): 292–341 (www.gespraechsforschung-ozs.de).
Sidnell, Jack 2006. Coordinating gesture, talk, and gaze in reenactments. Research on Language
and Social Interaction 39(4): 377–409.
Stivers, Tanya 2008. Stance, alignment, and affiliation during storytelling: When nodding is a token
of affiliation. Research on Language and Social Interaction 41(1): 31–57.
Stivers, Tanya and Jack Sidnell 2005. Introduction: Multimodal interaction. Semiotica 156(1/4):
1–20.
Streeck, Jürgen 2009. Forward-Gesturing. Discourse Processes 46: 161–179.
Wilkinson, Sue and Celia Kitzinger 2006. Surprise as an interactional achievement: Reaction to-
kens in conversation. Social Psychology Quarterly 69(2): 150–182.

Margret Selting, Potsdam (Germany)

39. The codes and functions of nonverbal communication
1. Features of a communication perspective
2. The multiple coding systems within nonverbal communication
3. Relationships among verbal and nonverbal communication
4. Neurophysiological processing
5. The functions of nonverbal communication
6. Summary
7. References

Abstract
The article considers whether verbal communication is fundamentally different from
nonverbal communication. By tracing the origins, message features, and neurophysiological
processing of verbal and nonverbal communication, the article identifies the nonverbal
codes that form the nonverbal communication system and then examines more broadly the
communicative functions of nonverbal communication on a range of levels (e.g., structural,
personal, interactional).

1. Features of a communication perspective


1.1. Verbal and nonverbal communication defined
Communication as a term has been subject to a wide range of denotations and connota-
tions. From the perspective of the discipline of communication, it refers to a dynamic
message exchange process in which sender and receiver create meaning through the
use of signs and symbols. Messages – the central organizing feature of this disciplinary
perspective – are comprised of both verbal and nonverbal components. The former are
typically viewed as synonymous with language. The latter – the nonverbal component
and the subject of this chapter – include all aspects of messages other than the words
themselves.

1.2. Message, sender and receiver orientations


This does not mean, however, that all behaviors other than words qualify as nonverbal
communication. Only those behaviors that constitute messages count. Burgoon, Guerrero,
and Floyd (2010) define nonverbal communication as those behaviors that form a socially
shared coding system. More specifically, it includes those behaviors that

(i) are typically sent with intent,


(ii) are used with regularity among members of a given social community, society, or
culture,
(iii) are typically interpreted as intentional, and
(iv) have consensually recognized meanings.

This definition is based on what the discipline describes as a message orientation. A
message orientation focuses on the behavioral constituents and constellations that
would be recognized routinely as meaningful signals among members of the same
social community (Burgoon and Hoobler 2002). This is in contrast to the sender ori-
entation, which restricts nonverbal communication to those behaviors that senders for-
mulate and display with intent, and the receiver orientation, which embraces a much
broader view in which any behavior that a receiver interprets as a message qualifies
as communication. According to a message orientation, passive or involuntary display
of cues that an observer chooses to interpret should be regarded only as information,
as something “given off ” rather than “given.” To be communication, the behavior
must be volitional and other-directed (targeted to a receiver or receivers). This distin-
guishes it from what others have labeled informative, expressive, indicative, or inciden-
tal behavior (see Andersen 1991; Bavelas 1990; Ekman, Sorenson, and Friesen 1969;
Hall and Knapp 2010; Motley 1990, 1991). Burgoon and Newton’s (1991) social mean-
ing model reflects a message orientation in that it focuses on those nonverbal behaviors
that have regular and identifiable meanings in the context of defining interpersonal
relationships.
The reference to constellations of behavior presages the prevailing view that individ-
ual behaviors, whether verbal or nonverbal, should not be viewed in isolation. They are
highly interdependent in their expression and interpretation, in their formation as
signals and in the communicative functions they perform (see Bavelas and Chovil 2000,
for an integrated message model and McNeill 2005, for growth point theory, both of
which detail the interdependence among verbal and nonverbal signals in forming mes-
sages). In what follows, we first disaggregate the parts to identify the nonverbal codes
that form the nonverbal communication system before reassembling them into the com-
municative functions that represent what nonverbal communication is meant to do.
Although our major goal in this chapter is to address what constitutes nonverbal com-
munication, we will allude periodically in this brief overview to the ways in which
nonverbal codes articulate with language to form a total meaningful utterance or
expression.
As a last prefatory clarification, the term gesture itself is open to a wide range of in-
terpretations, ranging from the full gamut of nonverbal communication codes to be dis-
cussed in this chapter to the more narrow and familiar interpretation referring to
displays performed by the limbs (usually hands or head). Although we will tend to
use the term gesture to refer to the latter, we recognize that gestures can be construed
as commensurate with all forms of nonverbal communication, including actions of the
head, face, eyes, hands, feet, trunk and voice; use of touch and distancing; physical
appearance; and manipulation of time, environments and artifacts. Thus, our usage is
one of semantic convenience rather than reflecting a philosophical distinction.

2. The multiple coding systems within nonverbal communication


Like verbal communication, nonverbal communication is composed of codes. “A code
is a set of signals that is usually transmitted via one particular medium or channel”
(Burgoon, Guerrero, and Floyd 2010: 19). Seven or eight codes are commonly recog-
nized: kinesics, vocalics, olfactics, physical appearance, haptics, proxemics, chronemics
and artifacts.
The most widely recognized of the nonverbal codes related to the body is kinesics –
what in the popular vernacular is known as “body language” – which encompasses mes-
sages based on body motion, such as facial expressions, eye behaviors, posture, gestures,
and other physical movements. A second visually based code that is enacted through
the body is physical appearance, which includes displays drawing upon physical attri-
butes (e.g., height, eye color, facial neoteny, attractiveness level), cranial and bodily
hair (e.g., hair style, color, beardedness), grooming and hygiene, attire (e.g., color, fit,
stylishness, conventionalism), and other adornments (e.g., tattoos, jewelry). Many of
the physical appearance attributes fall into a communicative gray area because they
are not encoded intentionally as messages but may be treated as such by onlookers.
Closely related to the grooming aspects of physical appearance is olfactics, which refers
to nonverbal signals related to smell, including natural body odors, deodorant, and
applied fragrances. Here again, those naturally arising and unintentional signals such
as natural body odor do not fit strict definitions of communication but are regarded
in some quarters as a part of the nonverbal communication repertoire.
A fourth nonverbal code is variously referred to as vocalics, prosody or paralinguis-
tics. It includes all vocalized signals, other than the words themselves, through which
messages are expressed and interpreted. Among the relevant features are pitch,
tempo, loudness, intonation, silences, pauses, resonance, dialects, and nonfluencies.

Two additional codes expressed through the body are the approach-avoidance codes.
At the approach end is haptics, which refers to communication through touch. At the
avoidance end is proxemics, which encompasses interpersonal distances, spatial
arrangements, and use of territory as forms of communication.
The two remaining codes, like proxemics, are part of what Hall called “the silent
language” (1959) or “the hidden dimension” (1966) because they refer to implicit messages
that are deeply imprinted in culture and yet operate most of the time outside conscious
awareness. Chronemics refers to the use and perception of time as a communication sys-
tem. Features such as lead time, wait time, punctuality and pacing are among the wide
range of chronemic messages that are possible. Finally, environment and artifacts con-
stitutes another nonverbal code related to place. It includes elements such as architec-
tural features, furniture arrangement, temperature, noise, and lighting. Many of these
codes also incorporate verbal cues. For example, “keep out” signs, like other territorial
barriers, regulate space and territory; t-shirts with slogans are appearance cues that send
a verbal message (Guerrero and Farinelli 2009).
These eight nonverbal codes rely on visual, olfactory, auditory, tactile, temporal, and
spatial channels for the generation, transmission, receipt and interpretation of mes-
sages. Although the codes are in principle separable, each with its own distinctive struc-
tural properties, the various codes are better understood as part of a highly
interdependent system of communication, complete with redundant and substitutable
forms of expression, all in service of specific communication functions. Although they
are also highly interrelated with the verbal stream, they are not mere handmaidens
to verbal communication. Instead, they can stand alone, performing important commu-
nication functions independent of any words being uttered. Thus, it becomes useful
to distinguish the ways in which verbal and nonverbal communication are alike or
different.

3. Relationships among verbal and nonverbal communication


Among the most noteworthy ways in which nonverbal and verbal forms of communica-
tion can be distinguished from one another are on the basis of their origins, message
features, and neurophysiological processing.

3.1. Origins of verbal and nonverbal communication


As regards the perennial nature versus nurture issue, the origins of nonverbal commu-
nication vary in at least three ways from that of verbal communication. First, virtually
all of verbal communication is thought to be learned. Even though people are predis-
posed to learn language starting in infancy, the specific language they learn is deter-
mined by their environment and is only understood within the culture or social
group to which it is endemic. Some nonverbal behaviors are also learned. For example,
certain gestures, like putting one’s thumb up, mean different things in different cultures,
and some expressions, like the Indian head waggle, may have no meaning in another
culture. Many nonverbal behaviors, however, have universal meaning and are presumed
to be innate. Facial expressions of emotion, for instance, show a similar pattern of devel-
opment among sighted and blind individuals, suggesting that people are hardwired to
display emotion in particular ways even if they cannot see the emotional expressions of
others (Eibl-Eibesfeldt 1973; Galati, Scherer, and Ricci-Bitti 1997).
Second, nonverbal communication has phylogenetic primacy over verbal communi-
cation (Burgoon, Guerrero, and Floyd 2010). Nonverbal communication predated ver-
bal communication in the evolutionary history of the human species; before people
learned to speak and use language, they communicated through nonverbal means.
Third, nonverbal communication also has ontogenetic primacy over verbal commu-
nication, which means that children learn how to communicate nonverbally before
they learn how to communicate using language. These combined characteristics of non-
verbal communication result in it often being trusted over verbal communication when
the two conflict (Burgoon, Guerrero, and Floyd 2010).

3.2. Message features of verbal and nonverbal communication


Several message features help distinguish nonverbal messages from verbal messages
(see Burgoon 1985). One of these is the degree to which a message is symbolic versus
iconic or intrinsic. When messages are symbolic, the relationship between the referent
(the word or behavior) and what it stands for is arbitrary. This is the case with most
words. The words big (English), 大きい (Japanese), and groß (German) are different
symbolic representations of the same attribute and are only understood by people
who have learned these languages. However, if a person uses her or his hands to
show how big something is, people from the U.S., Japan, and Germany may all under-
stand this gesture’s meaning. This is because gestures such as these are iconic – they
resemble (at least to some extent) what they stand for. Even gestures that have become
symbolic within cultures often have iconic roots. For example, the A-OK gesture re-
sembles the letter O. This gesture also resembles the number zero and a coin, which
helps explain why it means “nothing” in France and “money” in Japan. Other nonver-
bal forms of communication are intrinsic, which means that “they show a person’s inter-
nal state or constitute a behavior in and of themselves” (Guerrero and Farinelli 2009).
Most intrinsic behaviors, such as smiling, hitting, and kissing, are understood across
cultures.
Buck (1988) makes a useful distinction between biological signals and social signals.
Biologically-based displays are involuntary, spontaneous, arise naturally and are intrin-
sic to what they signify. The threat stare and loud yelling, for example, are universally
recognized warning or fear signals among not only the human species but most verte-
brates. Social signals, by contrast, are human-designed, symbolic, volitional actions. The
head thrust to signify “no” in Greek is an example of a behavior with arbitrarily de-
vised, culture-specific meaning. Some behaviors with innate origins can also become
symbolic through deliberate usage beyond their original eliciting contexts or purposes.
Smiling, for example, can be feigned rather than arising as an authentic expression of
joy or pleasure. This distinction between biological and social signals is a major source
of controversy in regard to nonverbal emotional expressions because of its implications
for how direct the correspondence is between internal experience and external displays.
If facial expressions, for instance, are strictly responsive to social stimuli and not read-
outs of one’s true emotional states, then views of nonverbal behavior as highly context-
and culture-dependent gain greater credence, and discourse-analytic methods for
examining gesture and language together gain a greater warrant.

Other features that help parse out the differences between nonverbal and verbal
communication include the extent to which a message is multimodal, spontaneous,
reflexive, and occurs in the here and now (Burgoon 1985; Guerrero and Farinelli
2009). Nonverbal messages tend to be more multimodal or multichanneled, which
means that people can display various nonverbal behaviors – such as smiling while lean-
ing forward and tossing one’s head back – all at the same time. In contrast, verbal mes-
sages tend to be unimodal, inasmuch as people can only say one word at a time.
Exceptions would be communicating dual verbal messages through means such as hold-
ing up a protest sign while chanting the same (or similar) words that appear on the sign.
Nonverbal messages also tend to be more spontaneous than verbal messages. Some non-
verbal behaviors, such as speaking with a nervous voice when giving a speech, are par-
ticularly hard to control. Verbal communication, on the other hand, requires at least a
minimal level of forethought for encoding to take place. Thus, although nonverbal com-
munication is often intentional and strategic, it generally tends to be more spontaneous
than verbal communication.
Verbal communication, however, tends to be characterized by more reflexivity and
displacement (Burgoon 1985). Reflexivity is the degree to which a code can reflect
upon itself. People can make statements such as “I’m sorry I said that,” or “Let me
re-word that” but it is difficult to use nonverbal behaviors to modify or direct others
to re-interpret one’s previous nonverbal behaviors. Displacement involves being able to
refer to things that are removed in time and space. Again, people can accomplish this
with words (e.g., “I was not myself yesterday”) but it is difficult, if not impossible, to
communicate the same sentiment with nonverbal communication.

4. Neurophysiological processing
The ways in which the human brain processes nonverbal versus verbal information also
differ to some extent. Scholars have made an important distinction between analogic
and digital messages. An analogic signal contains a continuous range of values, whereas
a digital signal contains a finite set of values. Analogic signals also tend to be processed
holistically, whereas digital signals involve processing discrete units of information. For
example, children often first learn the alphabet by singing it (analogic encoding and de-
coding) but it takes a while for them to understand that the alphabet is composed of
discrete units called letters (digital encoding). Whereas verbal communication tends
to be processed digitally, nonverbal signals may be processed analogically or digitally.
Recognition of words, symbols, and emblematic gestures takes place in the left side of
the brain, which is primarily responsible for tasks that involve using logic and analytical
reasoning and other forms of digital processing. On the other hand, voice recognition,
depth perception, and tasks related to emotions, space, pictures, and music tend to
create more activity in the right side of the brain, which is primarily responsible for ana-
logic processing (e.g., Hopkins 2007). Andersen (2008) argued that verbal communica-
tion tends to be digital and nonverbal communication, analogic: People understand
language by recognizing individual words, whereas nonverbal expressions are con-
tinuous and holistic. For example, people look at the whole face rather than dissecting
smaller movements of the eyes and mouth.
However, neurophysiological and discourse-oriented research both suggest that this
digital-analogic distinction is more complex than originally thought (Bavelas and
Chovil 2000; Efron 1990; MacLean 1990; McNeill 1992; Ploog 2003). First, findings sup-
porting this distinction are largely limited to initial perceptions that occur during the
decoding process. So while people may initially perceive many nonverbal behaviors
using right hemispheric processing, they may make sense of those perceptions using
both sides of their brain. Similarly, people may decode many nonverbal messages ana-
logically, but the encoding of those same messages may be a different story, especially
when a nonverbal message is sent with intent. Researchers have also proposed that the
brain is better understood as containing three parts: a primitive brain (located near the
brain stem) that controls instinctive behaviors such as screaming, defending one’s ter-
ritory, and protecting one’s child; a paleomammalian brain (composed of the limbic sys-
tem surrounding the primitive brain), which controls emotional expression and bodily
functions; and a neomammalian brain (part of the cerebral cortex), which controls
higher-order cognitive activity such as language processing. The tripartite brain approach
similarly implies more interdependence in verbal and nonverbal encoding and decoding
than early perspectives assumed, although clear neurophysiological differences in the
processing of verbal and nonverbal signals remain.

5. The functions of nonverbal communication


Functions are the purposes, motives, or goals of communication. An analysis of func-
tions answers the question, what does nonverbal communication do? Verbal and nonver-
bal communication work alone and in concert to fulfill many different communicative
functions. A functional approach subscribes to principles of

– polysemy – the same action can have multiple meanings,


– equipotentiality – the same behaviors or constellations of behaviors can serve differ-
ent ends,
– equifinality – different behaviors can achieve the same ends, and
– multifunctionality – functions are co-extensive, that is, several functions can be oper-
ative in parallel.

Other assumptions are that nonverbal communication is strategic, dynamic and itera-
tive – nonverbal expressions are created intentionally to achieve various ends, they
do not remain static but instead are changeable and evolving as a function of feedback
between interlocutors (see Burgoon and Saine 1978, and Patterson 1983, for fuller
articulation of the assumptions of a functional perspective).
Nonverbal communication, in particular, has been connected to the following func-
tions: message production and processing, expression of emotions, identification and
self-presentation, interaction management, relational communication, social influence,
and deception.

5.1. Message production and processing


Humans’ ability to produce and understand messages is, in many ways, remarkable.
Communicators are able to make sense of what others say and do in interaction
while anticipating the messages they will contribute to the interaction. At first glance,
it might appear that message processing and production are primarily related to verbal
616 IV. Contemporary approaches

communication. However, nonverbal behaviors facilitate (or sometimes interfere with)
speakers’ production of messages and recipients’ information-processing (Krauss, Chen,
and Chawla 1996; McNeill 1992; Morsella and Krauss 2004; Rimé and Schiaratura 1991;
Swerts and Krahmer 2005). As theorized by Morsella and Krauss’ (2001, 2005) gesture
feedback model and interference theory, kinesic behaviors such as nodding and iconic
gestures affect retrieval of words and ideas that become part of an utterance, vocal
cues that pace conversation help to maintain the rhythm and flow of speech, and visual
gestures reinforce comprehension. Moreover, nonverbal behaviors like hand gestures
and facial displays are important symbolic acts that are used intentionally by speakers
and contain unique information. For instance, Bavelas and Chovil (2006) found that
gestures and pointing were used in descriptive explanations to provide information
that could not readily be conveyed verbally. Özyürek (2002) found that speakers
used different forms of gestures depending on the location of the receiver of the mes-
sage. Nonverbal cues help draw attention to the speaker, and bodily cues such as
emblematic or illustrator gestures readily reinforce the meaning of verbal content.
Thus, nonverbal cues are an integral part of message production and processing.
A key issue in message processing is the congruence between verbal and nonverbal
cues and the relative importance of each in constituting the overall meaning of a mes-
sage. On the basis of a meta-analysis by Philpott (1983) and over 100 studies summar-
ized in Burgoon et al. (1996), the evidence is compelling that reliance on nonverbal cues
is greatest for adults when verbal and nonverbal messages conflict. In these cases, non-
verbal communication takes on particular importance in helping message recipients
determine meaning. However, incongruence does not necessarily impair comprehen-
sion. LaPlante and Ambady (2000) found that receivers are better at recognizing com-
pound emotional expressions in the face (such as expressions of anger and sadness)
than singular expressions of emotion (e.g., sadness only). Message incongruence may
make messages more salient and can lead receivers to process messages more fully,
which may lead to deeper understanding and better recall of messages.

5.2. Expression of emotions


Many studies have demonstrated that nonverbal communication plays a more powerful
role in the encoding and decoding of emotion than does verbal communication (e.g.,
Guerrero and Floyd 2006; Planalp, DeFrancisco, and Rutherford 1996). Therefore, it
is not surprising that considerable research has focused on the relationship between
nonverbal communication and emotion. In nonverbal communication, the emphasis
is on the expression rather than the experience of emotion, although the linkage
between the two is of paramount interest. Some researchers take a universalistic per-
spective, proposing that a limited number of adaptive expressions of basic emotions,
such as happiness, sadness, anger, fear, disgust, and surprise, evolved until they became
innate within the human species (e.g., Izard 1977; Tomkins 1963). This perspective is
consonant with the class of biological signals introduced by Buck (1997). Proponents
of a neurocultural perspective contend that innate expressions of basic emotions can
be curbed or modified by display rules that govern how emotion should be communi-
cated within a given culture (Ekman 1971; Ekman, Sorenson, and Friesen 1969). As
such, this perspective treats emotional expressions as a combination of biological and
social signals, with innate expressions modified by social rules. For example, Japanese
39. The codes and functions of nonverbal communication 617

people may be more likely to hide feelings of anger than people from the U.S. because
of different cultural rules related to politeness and the importance of group harmony.
Other scholars support a behavioral ecology perspective (Fridlund and Duchaine
1996), which suggests that rather than altering one’s innate emotional expression by ap-
plying display rules, people display emotions that are consistent with their social mo-
tives. So, people might inhibit expressions of anger because they want to maintain a
positive relationship with someone. According to the behavioral ecology perspective,
the resulting expression would not reflect that someone is hiding anger, but rather
that someone wishes to be perceived positively by a relational partner. This perspective
aligns with Buck’s (1997) concept of social signals that reflect intentionally formulated
displays rather than expressive read-outs of internal experiences.
Relatedly, a significant body of scholarship has examined the extent to which culture
moderates emotional encoding and decoding. Evidence from meta-analyses confirms
that people from different cultures decode facial expressions of basic emotions simi-
larly, although there is an in-group advantage, with people most likely to decode facial
expressions accurately when they are viewing someone from their own culture (Elfen-
bein and Ambady 2002, 2003). There is also cross-cultural similarity in how people
encode basic emotions, especially when emotional expressions are spontaneous. How-
ever, people from different cultures also exhibit nonverbal accents, which help people
identify their country of origin. For example, people can differentiate between Australians
and U.S. citizens based on their smiles (Marsh, Elfenbein, and Ambady 2003).

5.3. Identification and self-presentation


Who individuals are within an interaction – role, position or even personality – is typ-
ically communicated nonverbally. Although specific roles may be defined for partici-
pants in an interaction (for instance, the role of teacher and student may be clear
within a classroom setting), the enactment of the role or its meaning is conveyed
nonverbally. When communication researchers consider identification and bodily com-
munication, the emphasis is typically on group identities and the ways in which indivi-
duals signal their affiliation with (or distance from) particular social groups. For
instance, gender differences in body movements, positions and facial expressions com-
municate expectations for interaction. Men and women use different body positions
and movements, with men adopting more expansive and relaxed body positions and
engaging in more restless movements than women (Hall 2006). Additionally, gender
differences appear to be centrally tied to social interaction. Men and women do not dif-
fer in smiling when alone, but women smile more than men during interaction. More-
over, this communication difference is heightened in same-sex pairs and when people
are aware they are being observed (LaFrance, Hecht, and Levy Paluck 2003).
The use of nonverbal behavior as a way to convey group identification has been most
clearly explained in communication accommodation theory (CAT, Giles and Ogay
2006). Communication accommodation theory proposes that individuals manage their
vocalic behavior to match in-group members by converging toward their style and
rate of speech; individuals tend to diverge from out-group members to communicate
their distinctiveness and desire for autonomy. Thus, nonverbal behavior is central to
communicating group membership. Nonverbal behavior is also an important aspect
of individual identity enactment; it is used to control information about the self
(self-presentation) and to manage impressions (Goffman 1959; Tedeschi and Norman
1985). Attractiveness is one impression that is beneficial in many situations. Although
physical appearance (clothing, facial features) is an aspect of attractiveness, bodily com-
munication such as nonverbal expressivity, nonverbal immediacy and involvement, and
vocalic warmth have all been shown to increase attractiveness (Burgoon and Bacue
2003; DePaulo 1992). Expectancy violations theory (Burgoon 1983; Burgoon and Hub-
bard 2005) posits that violations of spatial expectations can be beneficial to rewarding
communicators (e.g., attractive communicators or communicators with high status).
This occurs because the meaning of such violations is ambiguous, and as interaction
partners attribute meaning to the violation, they utilize information about the reward-
ing nature of the partner. Moreover, body cues can be used to convey impressions not
only to interaction partners but to other people. Tie signs, such as holding hands, are
often used to communicate information about the relationship to others in the context
(Afifi and Johnson 1999).

5.4. Interaction management


It is our ability to coordinate our actions with others that makes interaction possible
(Cappella 1987). This is done largely through nonverbal behavior. Processes such as in-
dicating that one is ready to interact or signaling recognition of another from a distance
are achieved largely through body orientation, gaze and gesture (Bavelas 1990; Kendon
1990; Robinson 1998). This is true both in contexts where individuals are approaching
from a distance for greeting and in situations where individuals in the same room are
waiting to begin a conversation. Additionally, once interaction begins, turn-taking is sig-
naled through a variety of bodily cues including vocalic rate and pitch, gaze and gesture.
Speakers use a variety of nonverbal behaviors, such as diverting gaze from the listener,
vocalized pauses, and continuation of a gesture, to maintain possession of the floor in
interaction (Cappella 1994; Kendon 1990). Likewise, listeners often request a turn via
nonverbal cues such as speaker-directed gaze, head nods, and forward leans. Managing
turn-taking nonverbally is important because it allows partners to sustain the verbal
continuity of the conversation without having to stop and talk about whose turn it is
to speak.
Coordination of interaction, however, extends beyond turn-taking. Interaction en-
tails coordinating a range of behaviors. A large body of research on interaction adap-
tation has demonstrated that partners influence one another’s behavior; it is not
unusual for partners to mirror one another’s posture or to adopt similar rhythms and
cadence of speech. This coordination seems to emerge naturally, although its emergence
does not mean that it is unimportant as a communication process. There is evidence that
coordination can serve as an indicator of the quality of the relationship between part-
ners and an indicator of rapport (Cappella 2006). However, Manusov (1992) found that
intentional attempts to mirror others’ behaviors or mimic their speech rate were seen as
disingenuous.
Numerous nonverbal theories have conceptualized the process whereby senders and
receivers adapt their communication to one another. Specifically, these theories of inter-
action adaptation are designed to explain compensation and reciprocity in nonverbal
behavior during interaction. The importance of labeling arousal in interaction was
the focus of early theoretical approaches such as arousal labeling theory (Patterson
1976). Early research demonstrated that increased immediacy/involvement by an inter-
action partner led to arousal, but interpretation of nonverbal cues was key to predicting
behavioral responses. Cognitive valence theory (Andersen 1985) extended this view by
noting that labeling is most important in interactions where changes in behavior lead to
moderate arousal, which is then interpreted based on a variety of contextual cues, in-
cluding personality characteristics of the individual, social norms, and environmental
cues. Patterson’s sequential functional model (2001) highlights the fact that nonverbal
behavior serves many different functions within interaction and that individuals
initiate – as well as react to – patterns of interaction. In a functional view, patterns of
behavior are as likely to reflect the goals of a communicator as they are to reflect inter-
pretation of arousal. Bavelas (Bavelas et al. 1986; Bavelas and Chovil 2000) has noted
that the motor mimicry evident in many interactions is not simply a signal of underlying
affective responses; rather, it is a communicative act meant to convey meaning to inter-
action partners and most evident in interactions when a partner can perceive it. The
most comprehensive communication perspective on adaptation is found in interaction
adaptation theory (IAT) (Burgoon, Stern, and Dillman 1995). This theory posits that in-
dividuals bring to interaction a set of dispositions and expectations for interaction that
influence whether they converge or diverge their behavior from that of a communica-
tion partner. The theory predicts that convergence and divergence are communicative,
and they are used as a way to influence the behavior of others. For instance, an individ-
ual who desires moderate levels of involvement would likely diverge from a partner
who displayed low involvement. Divergence provides a way to communicate desired
levels of involvement and can influence the behavior of a partner. Additionally, IAT
holds that processes of adaptation not only influence how interactions transpire but
also impact how interactants are perceived and the relational dimensions of their
interaction.

5.5. Relational communication


Processes of adaptation are also essential to understanding the exchange of nonverbal
messages that reflect the type of relationship interactants share. Burgoon and Hale
(1984) identified seven fundamental relational categories that characterize messages:
level of intimacy, dominance/submission, degree of social composure, degree of similar-
ity, formality/informality, task versus social orientation, and level of emotional arousal.
Although all these themes are important, researchers have concentrated most of their
research efforts on intimacy and dominance/submission.
Intimacy is communicated through a set of nonverbal behaviors that have variously
been called immediacy (Mehrabian 1969) or positive involvement cues (Prager 2000;
Prager and Roberts 2004). These behaviors simultaneously communicate positive affect
and active engagement in an interaction. Coker and Burgoon (1987) noted that positive
affect and involvement are two separate dimensions underlying nonverbal messages.
Messages of intimacy and affection lie at the intersection of these dimensions. Several
behaviors have been identified as immediacy or positive involvement cues, including
touch, close conversational distancing, forward lean, smiling, direct body orientation,
and vocal warmth (Andersen 1985; Patterson 1983). Scholars have developed theories
of nonverbal adaptation to better understand how people respond to increases or de-
creases in nonverbal intimacy (Burgoon, Floyd, and Guerrero 2010). Each of these
theories advances somewhat different explanatory mechanisms for predicting patterns
of reciprocity or compensation following a change in nonverbal intimacy. In expectancy
violations theory, the positive or negative valence of the unexpected behavior combines
with the perceived reward value of the partner to predict patterns of reciprocity and
compensation (Burgoon and Hale 1988). In cognitive valence theory (Andersen
1998) and discrepancy-arousal theory (Cappella and Greene 1982), arousal change is
a key variable. Cognitive valence theory also includes culture, personality, temporary
states, the relationship, the situation, and perceptions of the partner (which are similar
to reward value in expectancy violations theory) as predictors of reciprocity. Interaction
adaptation theory (Burgoon, Stern, and Dillman 1995) and Patterson’s (1982, 2001)
functional and parallel-processing models add even more complexity by specifying
that people have different communicative needs and preferences based on the various
functions that intimacy (or any other) relational message may serve in a given interac-
tion. As suggested by Patterson’s 1983 work, nonverbal messages of dominance can
fulfill various functions within interactions, including allowing people to express
themselves, delegate work to others, or facilitate discussion. From a communication
perspective, dominance has been defined as “actual interactional behaviors by which
power and influence can be accomplished” (Burgoon and Hoobler 2002: 268). Rather
than associating dominance with threatening or intimidating communication, this per-
spective casts dominant communication as a socially skilled activity that allows people
to accomplish goals.
Dominance cues have been organized based on several principles of power (Bur-
goon and Dunbar 2006). The principle of elevation explains why nonverbal cues such
as height, living in a penthouse suite, and standing over someone communicate domi-
nance. The principle of centrality suggests that dominant individuals are often the center
of attention during group interaction, and thus are seated more centrally and looked at
more often, among other cues. The principle of space and access derives from research
showing that more powerful individuals have larger, more luxurious territories that are
private and often guarded by gatekeepers (such as administrative assistants) who reg-
ulate their space. The principle of interactional control suggests that dominant indivi-
duals manage the interaction around them through behaviors such as initiating and
ending interaction, as well as determining who can and cannot speak. Finally, the prin-
ciple of prerogative explains that powerful people can violate norms, such as showing up
late for a meeting, without as much penalty as less powerful individuals. Research on
expectancy violations theory has supported this idea by showing that people with
high status are sometimes evaluated more positively when they violate behavioral
norms instead of staying within the norms (Burgoon and Dunbar 2006). Communica-
tion accommodation theory (Gallois, Ogay, and Giles 2005; Shepard, Giles, and Le
Poire 2001) also helps explain patterns of dominance and submission by suggesting
that people are more likely to converge to the nonverbal style of high-status others,
whereas they are more likely to diverge from the nonverbal style of low-status others.

5.6. Social influence


Dominance is not only a relational message, but also one means by which people
achieve social influence. Social influence, which has also been termed interpersonal
influence or persuasion, refers to efforts to “preserve or change the behavior of another
individual” or “to maintain or modify aspects of another individual that are proximal
to behavior, such as cognitions, emotions, and identities” (Dillard, Anderson, and
Knobloch 2002: 426). A long history of research links nonverbal behavior to social influ-
ence. In the area of compliance-gaining, a meta-analysis demonstrated that nonverbal
communication is just as important as verbal communication for getting people to com-
ply with requests, with acquiescence more likely if requestors engage in positive and
socially acceptable forms of touch, use eye contact, present a professional and well-
groomed appearance, and stand moderately close to the person whom they are trying
to persuade (Segrin 1993). The associations between these behaviors and compliance-
gaining, however, are mediated by at least two variables. First, the credibility of the re-
questor is vitally important. Individuals who are considered to be trustworthy, likeable,
composed, competent, and charismatic are more likely to be perceived as credible, and
therefore to be more persuasive (Burgoon, Birk, and Pfau 1990). Second, people are
more likely to comply with requests that they perceive to be legitimate compared to
those that are perceived to be unreasonable and unnecessary (e.g., Baron 1978).
The way that nonverbal cues are processed in persuasive contexts may also modify
the relationship between these cues and enduring social influence. Petty and Cacioppo’s
(1986) classic work on the elaboration likelihood model of persuasion suggests that non-
verbal cues can be processed one of two ways – through a central or a peripheral route.
Central route processing involves carefully considering the relevance and merit of the
information or behavior that is presented, whereas peripheral route processing involves
assigning meaning to a simple cue, such as appearance, without much (if any) appraisal
of that cue. Nonverbal cues can be processed either way. For example, a voter might
evaluate a political candidate’s facial expressions carefully to try to determine if the
candidate really cares about the issues she is discussing (central processing), or the
voter may simply note that the candidate smiled and infer that she is a friendly person
(peripheral processing). Long-term social influence is more likely when information is
processed centrally. Nonetheless, as Burgoon, Birk, and Pfau’s (1990) research suggests,
cues that are processed peripherally can lead to perceptions of credibility, which can,
in turn, affect the persuasion process. In the political realm, nonverbal cues such as
facial expressions of happiness (Masters et al. 1987), physical appearance cues (Lau
and Redlawsk 2001), and vocal pitch (Gregory and Gallagher 2002) act as heuristic
cues that impact credibility judgments and voter preferences.

5.7. Deception
Deception is included as a unique communication function because it reflects situations
in which communicators knowingly manage a message to “foster a false belief or con-
clusion by a receiver” (Buller and Burgoon 1996: 205). That is, they covertly violate the
Gricean maxims of cooperative discourse (Grice 1989; McCornack 1992). Although
considerable research has been conducted on nonverbal deception cues, virtually all
deception scholars note that no single cue serves as a reliable indicator of deception
(e.g., Vrij 2006). Cues related to anxiety and tension that often accompany deception
(cues such as heightened pitch and nonfluencies) have been identified across studies,
but the complexity of interaction makes detecting deception from such cues very diffi-
cult. A key element of the communication perspective on deception is the recognition
that deceivers manage their behavior and respond to the communicative actions of
others. Thus, rather than seeking to identify behaviors that inadvertently “leak” infor-
mation about deception, communication researchers have explored how deception oc-
curs in interactive contexts. Interpersonal deception theory (IDT; Buller and Burgoon
1996) provides a conceptual framework for explaining how the behavior of both deceiv-
ers and receivers contribute to the success (or failure) of deceivers and the impressions
formed by their interaction partners. IDT emphasizes the numerous communication
tasks liars must accomplish simultaneously during interaction, such as managing anxiety
and responding appropriately. IDT also notes that in face-to-face encounters, decei-
vers and receivers both influence the interaction. When people communicate with one
another, a number of interaction processes, such as synchrony and matching, come into
play (Burgoon et al. 1999). For instance, White and Burgoon (2001) found that although
deceivers were less nonverbally involved initially in interactions, they increased their
involvement over time. This adjustment was most notable when deceivers were inter-
acting with a partner who displayed increased nonverbal involvement. Thus, examina-
tion of deception as a communication function requires examination of the behavior of
all participants in the interaction.

6. Summary
A communication perspective views verbal and nonverbal communication as intimately
intertwined components of the human signaling system that has both biological and
social origins. Together, language and nonverbal codes form a goal-oriented system
for creating and sharing meaning among members of a social community. The nonver-
bal codes of kinesics, vocalics, physical appearance, proxemics, haptics, chronemics and
artifacts, separately or in combination with one another and with linguistic components,
accomplish such communication functions as producing and comprehending messages;
expressing and interpreting emotions; creating personal and social identities and
managing self-presentations; managing interactions; defining interpersonal relations;
influencing others; and perpetrating and detecting deception.

7. References
Afifi, Walid A. and Michelle L. Johnson 1999. The use and interpretation of tie signs in a public
setting: Relationship and sex differences. Journal of Social and Personal Relationships 16: 9–38.
Andersen, Peter A. 1985. Nonverbal immediacy in interpersonal communication. In: Aron Wolfe
Siegman and Stanley Feldstein (eds.), Multichannel Integrations of Nonverbal Behavior, 1–36.
Hillsdale, NJ: Lawrence Erlbaum.
Andersen, Peter A. 1991. When one cannot not communicate: A challenge to Motley’s traditional
communication patterns. Communication Studies 42(4): 309–325.
Andersen, Peter A. 1998. The cognitive valence theory of intimate communication. In: Mark T.
Palmer and George A. Barnett (eds.), Progress in Communication Sciences, Volume 14: Mutual
Influence in Interpersonal Communication Theory and Research in Cognition, Affect, and
Behavior, 39–72. Norwood, NJ: Ablex.
Andersen, Peter A. 2008. Nonverbal Communication: Forms and Functions, second edition. Long
Grove, IL: Waveland Press.
Baron, Robert A. 1978. Invasions of personal space and helping: Mediating effects of the invader’s
apparent need. Journal of Experimental Social Psychology 14: 304–312.
Bavelas, Janet Beavin 1990. Behaving and communicating: A reply to Motley. Western Journal of
Speech Communication 54: 593–602.
Bavelas, Janet Beavin, Alex Black, Charles R. Lemery and Jennifer Mullett 1986. “I show you how
I feel.” Motor mimicry as a communicative act. Journal of Personality and Social Psychology
50: 322–329.
Bavelas, Janet Beavin and Nicole Chovil 2000. Visible acts of meaning: An integrated message
model of language in face-to-face dialogue. Journal of Language and Social Psychology 19:
163–194.
Bavelas, Janet Beavin and Nicole Chovil 2006. Nonverbal and verbal communication: Hand ges-
tures and facial displays as part of language use in face-to-face dialogue. In: Valerie Lynn Man-
usov and Miles L. Patterson (eds.), The Sage Handbook of Nonverbal Communication, 97–115.
Thousand Oaks, CA: Sage.
Buck, Ross 1988. Nonverbal communication: Spontaneous and symbolic aspects. American Behav-
ioral Scientist 31: 341–354.
Buck, Ross 1997. From DNA to MTV: The spontaneous communication of emotional messages.
In: John O. Greene (ed.), Message Production: Advances in Communication Theory, 313–339.
Mahwah, NJ: Lawrence Erlbaum.
Buller, David B. and Judee K. Burgoon 1996. Interpersonal deception theory. Communication
Theory 6: 203–242.
Burgoon, Judee K. 1983. Nonverbal violations of expectations. In: John M. Wiemann and Randall
P. Harrison (eds.), Nonverbal Interaction: Volume 11. Sage Annual Reviews of Communication,
11–77. Beverly Hills, CA: Sage.
Burgoon, Judee K. 1985. The relationship of verbal and nonverbal codes. In: Brenda Dervin and
Melvin J. Voight (eds.), Progress in Communication Sciences, Volume 6, 263–298. Norwood, NJ:
Ablex.
Burgoon, Judee K. and Aaron E. Bacue 2003. Nonverbal communication skills. In: Brant Raney
Burleson and John O. Greene (eds.), Handbook of Communication and Social Interaction
Skills, 179–219. Mahwah, NJ: Lawrence Erlbaum.
Burgoon, Judee K., Thomas Birk and Michael Pfau 1990. Nonverbal behaviors, persuasion, and
credibility. Human Communication Research 17: 140–169.
Burgoon, Judee K., David B. Buller, Amy S. Ebesu, Patricia A. Rockwell and Cindy White 1996.
Testing interpersonal deception theory: Effects of suspicion on nonverbal behavior and rela-
tional messages. Communication Theory 6: 243–267.
Burgoon, Judee K., David B. Buller, Cindy H. White, Walid A. Afifi and Aileen L. S. Buslig 1999.
The role of conversation involvement in deceptive interactions. Personality and Social Psychol-
ogy Bulletin 25: 669–685.
Burgoon, Judee K. and Leesa Dillman 1995. Gender, immediacy and nonverbal communication.
In: Pamela J. Kalbfleisch and Michael J. Cody (eds.), Gender, Power, and Communication in
Human Relationships, 63–81. Hillsdale, NJ: Lawrence Erlbaum.
Burgoon, Judee K. and Norah E. Dunbar 2006. Dominance, power and influence. In: Valerie Manusov
and Miles Patterson (eds.), The SAGE Handbook of Nonverbal Communication, 279–298.
Thousand Oaks, CA: Sage.
Burgoon, Judee K., Kory Floyd and Laura K. Guerrero 2010. Nonverbal communication theories
of adaptation. In: Charles Berger, Michael E. Roloff and David R. Roskos-Ewoldsen (eds.),
The New Sage Handbook of Communication Science, 93–108. Thousand Oaks, CA: Sage.
Burgoon, Judee K., Laura K. Guerrero and Kory Floyd 2010. Nonverbal Communication. Boston:
Allyn and Bacon.
Burgoon, Judee K. and Jerold L. Hale 1984. The fundamental topoi of relational communication.
Communication Monographs 51: 193–214.
Burgoon, Judee K. and Jerold L. Hale 1988. Nonverbal expectancy violations: Model elaboration
and application to immediacy behaviors. Communication Monographs 55: 58–79.
Burgoon, Judee K. and Gregory D. Hoobler 2002. Nonverbal signals. In: Mark L. Knapp and John
Augustine Daly (eds.), Handbook of Interpersonal Communication, 240–299. Thousand Oaks,
CA: Sage.
Burgoon, Judee K. and Deborah A. Newton 1991. Applying a social meaning model to relational
messages of conversational involvement: Comparing participant and observer perspectives.
Southern Communication Journal 56: 96–113.
Burgoon, Judee K. and Thomas J. Saine 1978. The Unspoken Dialogue. Boston: Houghton-Mifflin.
Burgoon, Judee K., Lesa A. Stern and Leesa Dillman 1995. Interpersonal Adaptation: Dyadic
Interaction Patterns. New York: Cambridge University Press.
Cappella, Joseph N. 1987. Interpersonal communication: Definition and fundamental questions.
In: Charles R. Berger and Steven H. Chaffee (eds.), Handbook of Communication Science,
184–238. Newbury Park, CA: Sage.
Cappella, Joseph N. 1994. The management of conversational interaction in adults and infants. In:
Mark L. Knapp and Gerald R. Miller (eds.), Handbook of Interpersonal Communication, sec-
ond edition, 380–419. Thousand Oaks, CA: Sage.
Cappella, Joseph N. 2006. The interaction management function of nonverbal cues. In: Valerie
Lynn Manusov and Miles L. Patterson (eds.), The Sage Handbook of Nonverbal Communica-
tion, 361–379. Thousand Oaks, CA: Sage.
Cappella, Joseph N. and John O. Greene 1982. A discrepancy-arousal explanation of mutual influ-
ence in expressive behavior for adult-adult and infant-adult dyadic interaction. Communica-
tion Monographs 49: 89–114.
Coker, Deborah A. and Judee K. Burgoon 1987. The nature of conversational involvement and
nonverbal encoding patterns. Human Communication Research 13: 463–494.
DePaulo, Bella M. 1992. Nonverbal behavior and self-presentation. Psychological Bulletin 111: 203–240.
Dillard, James Price, Jason W. Anderson and Leanne K. Knobloch 2002. Interpersonal influence.
In: Mark L. Knapp and John A. Daly (eds.), Handbook of Interpersonal Communication, third
edition, 423–474. Thousand Oaks, CA: Sage.
Efron, Robert 1990. The Decline and Fall of Hemispheric Specialization. Hillsdale, NJ: Lawrence
Erlbaum.
Eibl-Eibesfeldt, Irenäus 1973. Expressive behaviour of the deaf and blind born. In: Mario von
Cranach and Ian Vine (eds.), Social Communication and Movement, 163–194. New York: Aca-
demic Press.
Ekman, Paul 1971. Universal and cultural differences in facial expressions of emotion. In: James K.
Cole (ed.), Nebraska Symposium on Motivation, 207–283. Lincoln: University of Nebraska Press.
Ekman, Paul, E. Richard Sorenson and Wallace V. Friesen 1969. Pan-cultural elements in facial
displays of emotion. Science 164: 86–88.
Elfenbein, Hillary Anger and Nalini Ambady 2002. On the universality and cultural specificity of
emotion recognition: A meta-analysis. Psychological Bulletin 128: 203–235.
Elfenbein, Hillary Anger and Nalini Ambady 2003. When familiarity breeds accuracy: Cultural expo-
sure and facial emotion recognition. Journal of Personality and Social Psychology 85: 276–290.
Fridlund, Alan J. and Bradley Duchaine 1996. Facial expressions of emotion and the delusion of
the hermetic self. In: Rom Harré and W. Gerrod Parrott (eds.), The Emotions: Social, Cultural,
and Biological Dimensions, 259–284. Thousand Oaks, CA: Sage.
Galati, Dario, Klaus R. Scherer and Pio E. Ricci-Bitti 1997. Voluntary facial expression of emo-
tion: Comparing congenitally blind with normally sighted encoders. Journal of Personality and
Social Psychology 73(6): 1363–1379.
Gallois, Cindy, Tania Ogay and Howard Giles 2005. Communication accommodation theory: A
look back and a look ahead. In: William B. Gudykunst (ed.), Theorizing about Intercultural
Communication, 121–148. Thousand Oaks, CA: Sage.
Giles, Howard and Tania Ogay 2006. Communication accommodation theory. In: Bryan B. Whaley
and Wendy Samter (eds.), Explaining Communication: Contemporary Theories and Exemplars,
293–310. Mahwah, NJ: Lawrence Erlbaum.
Grice, Paul 1989. Studies in the Way of Words. Cambridge, MA: Harvard University Press.
Goffman, Erving 1959. The Presentation of Self in Everyday Life. Garden City, NY: Anchor/
Doubleday.
39. The codes and functions of nonverbal communication 625

Gregory, Stanford W. and Timothy J. Gallagher 2002. Spectral analysis of candidates’ nonverbal
vocal communication: Predicting U.S. presidential election outcomes. Social Psychology Quar-
terly 65: 298–308.
Guerrero, Laura K. and Lisa Farinelli 2009. Key characteristics of messages: The interplay of ver-
bal and nonverbal codes. In: William F. Eadie (ed.), 21st Century Communication: A Reference
Handbook, 239–248. Thousand Oaks, CA: Sage.
Guerrero, Laura K. and Kory Floyd 2006. Nonverbal Communication in Close Relationships. Mah-
wah, NJ: Lawrence Erlbaum.
Hall, Edward T. 1959. The Silent Language. Garden City, NY: Anchor/Doubleday.
Hall, Judith A. 2006. Women’s and men’s nonverbal communication: Similarities, differences,
stereotypes, and origins. In: Valerie Lynn Manusov and Miles L. Patterson (eds.), The Sage
Handbook of Nonverbal Communication, 201–218. Thousand Oaks, CA: Sage.
Hall, Judith A. and Mark L. Knapp 2010. Nonverbal Communication in Human Interaction, sixth
edition. Boston: Wadsworth.
Hopkins, William D. (ed.) 2007. The Evolution of Hemispheric Specialization in Primates. New
York: Academic Press.
Izard, Carroll Ellis 1977. Human Emotions. New York: Plenum.
Kendon, Adam 1990. Conducting Interaction: Patterns of Behavior in Focused Encounters. Cam-
bridge: Cambridge University Press.
Krauss, Robert M., Yihsiu Chen and Purnima Chawla 1996. Nonverbal behavior and nonver-
bal communication: What do conversational hand gestures tell us? In: Mark P. Zanna (ed.),
Advances in Experimental Social Psychology, Volume 28, 389–450. New York: Academic
Press.
LaFrance, Marianne, Marvin A. Hecht and Elizabeth Levy Paluck 2003. The contingent smile: A
meta-analysis of sex differences in smiling. Psychological Bulletin 129: 305–334.
LaPlante, Debi and Nalini Ambady 2000. Multiple messages: Facial recognition advantage for
compound expressions. Journal of Nonverbal Behavior 24: 211–224.
Lau, Richard R. and David P. Redlawsk 2001. Advantages and disadvantages of cognitive heuris-
tics in political decision-making. American Journal of Political Science 45: 951–971.
MacLean, Paul D. 1990. The Triune Brain in Evolution: Role in Paleocerebral Functions. New
York: Plenum.
Manusov, Valerie Lynn 1992. Mimicry or synchrony: The effects of intentionality attributions for
nonverbal mirroring behavior. Communication Quarterly 40: 69–83.
Marsh, Abigail A., Hillary Anger Elfenbein and Nalini Ambady 2003. Nonverbal “accents”: Cul-
tural difference in facial expressions of emotion. Psychological Science 14: 373–376.
Masters, Roger D., Denis G. Sullivan, Alice Feola and Gregory J. McHugo 1987. Television cov-
erage of candidates’ display behavior during the 1984 Democratic primaries in the United
States. International Political Science Review 8: 121–130.
McCornack, Steven A. 1992. Information manipulation theory. Communication Monographs 59:
1–16.
McNeill, David 1992. Hand and Mind: What Gestures Reveal about Thought. Chicago: University
of Chicago Press.
McNeill, David 2005. Gesture and Thought. Chicago: University of Chicago Press.
Mehrabian, Albert 1969. Significance of posture and position in the communication of attitude
and status relationships. Psychological Bulletin 71: 359–372.
Morsella, Ezequiel and Robert M. Krauss 2001. Movement facilitates speech production: A ges-
ture feedback model. Unpublished manuscript.
Morsella, Ezequiel and Robert M. Krauss 2004. The role of gestures in spatial working memory
and speech. American Journal of Psychology 117: 417–424.
Morsella, Ezequiel and Robert M. Krauss 2005. Can motor states influence semantic processing?
Evidence from an interference paradigm. In: Alexandra M. Columbus (ed.), Advances in Psy-
chology Research, Volume 36, 163–182. Hauppauge, NY: Nova.
Motley, Michael T. 1990. On whether one can(not) not communicate: An examination via tradi-
tional communication postulates. Western Journal of Speech Communication 54: 1–20.
Motley, Michael T. 1991. One may not communicate: A reply to Andersen. Communication
Studies 42: 326–339.
Özyürek, Asli 2002. Do speakers design their co-speech gestures for their addressees? The effects
of addressee location on representational gestures. Journal of Memory and Language 46: 665–687.
Patterson, Miles L. 1976. An arousal model of interpersonal intimacy. Psychological Review 83:
235–245.
Patterson, Miles L. 1983. Nonverbal Behavior: A Functional Perspective. New York: Springer.
Patterson, Miles L. 2001. Toward a comprehensive model of nonverbal communication. In: Wil-
liam Peter Robinson and Howard Giles (eds.), The New Handbook of Language and Social
Psychology, 159–176. Chichester, UK: Wiley and Sons.
Petty, Richard E. and John T. Cacioppo 1986. The elaboration likelihood model of persuasion. In:
Leonard Berkowitz (ed.), Advances in Experimental Social Psychology, Volume 19, 123–205.
New York: Academic Press.
Philpott, Jeffrey S. 1983. The relative contribution to meaning of verbal and nonverbal channels of
communication: A meta-analysis. Unpublished master’s thesis, University of Nebraska.
Planalp, Sally, Victoria Leto DeFrancisco and Diane Rutherford 1996. Varieties of cues to emo-
tion in naturally occurring situations. Cognition and Emotion 10: 137–153.
Ploog, Detlev W. 2003. The place of the triune brain in psychiatry. Physiology and Behavior 70:
487–493.
Prager, Karen J. 2000. Intimacy in personal relationships. In: Clyde Hendrick and Susan S. Hen-
drick (eds.), Close Relationships: A Sourcebook, 229–242. Thousand Oaks, CA: Sage.
Prager, Karen J. and Linda J. Roberts 2004. Deep intimate connection: Self and intimacy in couple
relationships. In: Debra J. Mashek and Arthur P. Aron (eds.), Handbook of Closeness and Inti-
macy, 43–60. Mahwah, NJ: Lawrence Erlbaum.
Rimé, Bernard and Loris Schiaratura 1991. Gesture and speech. In: Robert S. Feldman and Ber-
nard Rimé (eds.), Fundamentals of Nonverbal Behavior, 239–281. Cambridge: Cambridge Uni-
versity Press.
Robinson, Jeffrey D. 1998. Getting down to business: Talk, gaze, and body orientation during
openings of doctor-patient consultations. Human Communication Research 25: 97–123.
Segrin, Chris 1993. The effects of nonverbal behavior on outcomes of compliance-gaining at-
tempts. Communication Studies 44: 169–187.
Shepard, Chris, Howard Giles and Beth A. Le Poire 2001. Communication accommodation
theory. In: William Peter Robinson and Howard Giles (eds.), The New Handbook of Language
and Social Psychology, 33–56. Chichester, UK: Wiley.
Swerts, Mark and Emiel Krahmer 2005. Audiovisual prosody and feeling of knowing. Journal of
Memory and Language 53: 81–94.
Tedeschi, James T. and Nancy M. Norman 1985. Social power, self-presentation, and the self. In:
Barry R. Schlenker (ed.), The Self and Social Life, 293–322. New York: McGraw-Hill.
Tomkins, Silvan S. 1963. Affect, Imagery, Consciousness: Volume 2. The Negative Affects. New
York: Springer.
Vrij, Aldert 2006. Nonverbal communication and deception. In: Valerie Lynn Manusov and Miles
L. Patterson (eds.), The Sage Handbook of Nonverbal Communication, 341–359. Thousand
Oaks, CA: Sage.
White, Cindy H. and Judee K. Burgoon 2001. Adaptation and communicative design: Patterns of
interaction in truthful and deceptive conversations. Human Communication Research 27: 9–37.

Judee K. Burgoon, Tucson, AZ (USA)
Laura K. Guerrero, Tempe, AZ (USA)
Cindy H. White, Boulder, CO (USA)
40. Mind, hands, face and body 627

40. Mind, hands, face, and body: A sketch of a goal
and belief view of multimodal communication
1. A model of mind, social action and communication
2. Communication
3. “Phonologies” and lexicons of the body
4. Multimodality
5. Multimodality in school, music and politics
6. References

Abstract
The chapter presents a definition and analysis of multimodal communication according
to a model in terms of goals and beliefs. Communication is a process in which a Sender
has a goal (a conscious intention; an unconscious, tacit, social goal; or a biological function)
to have some Addressee come to know some belief and, to achieve this goal, produces,
in some modality (words, prosody, gesture, touch, gaze, face), a signal
connected, in the Sender’s and the Addressee’s minds, to some meaning (the belief to convey),
according to a communication system, i.e., a set of rules that put signals and meanings
in correspondence. The model argues that not only words and symbolic gestures,
but also gaze, touch and other communication systems form a lexicon, i.e., a sys-
tematic list of signal-meaning pairs, in which each signal can be analyzed in terms of a
small set of parameters, just as words are in phonology. The chapter provides analyses
from the lexicons of symbolic gestures, gaze and touch, and presents the “score” of multi-
modal communication, a scheme to annotate the meanings conveyed simultaneously in
various modalities, while showing its potential for the analysis of sophisticated aspects
of communication in comic films, political discourse and piano performance.

1. A model of mind, social action and communication


This chapter presents a view of multimodal communication according to a model of
mind, social action and communication in terms of goals and beliefs, put forth since
1973 by a group of psycholinguists and cognitive scientists at the Institute of Cognitive
Science and Technology of the National Research Council and at the University Roma
Tre in Rome (Parisi and Antinucci 1973; Parisi and Castelfranchi 1975; Castelfranchi
and Parisi 1980; Conte and Castelfranchi 1995; Miceli and Castelfranchi 1992, 1995,
1998; Castelfranchi and Poggi 1998; Poggi and Magno Caldognetto 1997; Pelachaud
and Poggi 2001; Rector, Poggi, and Trigo 2003; Merola and Poggi 2004; Merola 2009;
Poggi 2006b, 2007, 2008; Poggi and Vincze 2008; Poggi and D’Errico 2009).

1.1. Goals
The life of any natural or artificial, individual or collective system (a human, an octopus,
a robot, a machine, a group, an institution) is ruled by goals. A goal is a state, repre-
sented or not in a system, that regulates its corresponding action: when the perceived
state is different from the regulating state, the system performs some action to cancel
the difference (Miller, Galanter, and Pribram 1960). To achieve a goal, a system devises
and performs a plan by making use of internal resources (the system’s action capacities
and beliefs) and external resources (material resources in the environment and social
exchange with other systems). A plan is a set of actions aimed at hierarchically arranged
goals, where a goal (e.g., creating the necessary world conditions) may aim at a super-
ordinate goal, a supergoal, and all goals and supergoals aim at a final goal. For example,
if to eat I decide to cook spaghetti, I have to heat water, boil spaghetti and make sauce.
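As a rough illustration, the regulatory notion of a goal (action is triggered while the perceived state differs from the regulating state) and the hierarchical structure of a plan can be sketched in a few lines of Python. This is only a sketch under our own simplifying assumptions: the class names, the string-valued world state and the depth-first execution are illustrative stand-ins, not part of the model's formal apparatus.

```python
from dataclasses import dataclass, field

@dataclass
class Goal:
    """A regulating state with subgoals that aim at it; compare the
    test-operate cycle of Miller, Galanter, and Pribram (1960)."""
    state: str
    subgoals: list["Goal"] = field(default_factory=list)

    def achieved(self, world: set) -> bool:
        return self.state in world and all(g.achieved(world) for g in self.subgoals)

def pursue(goal: Goal, world: set) -> None:
    """Satisfy the subgoals (the necessary world conditions) before the
    superordinate goal they aim at; adding a state to `world` stands in
    for performing the corresponding action."""
    for sub in goal.subgoals:
        pursue(sub, world)
    if goal.state not in world:
        world.add(goal.state)

# The spaghetti example: all goals and supergoals aim at the final goal "eat".
eat = Goal("eat", [Goal("cook spaghetti",
                        [Goal("heat water"), Goal("boil spaghetti"), Goal("make sauce")])])
world: set = set()
pursue(eat, world)
assert eat.achieved(world)
```

The recursion mirrors the text's point that a sub-plan fulfils world conditions before the supergoal can be pursued.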
Given the definition of goal as simply a regulating state, several psychological no-
tions can be considered as goals: a person’s wishes and intentions, needs, drives, biolog-
ical functions of animals and functions of artefacts. Further, a goal does not necessarily
imply conscious volition, nor is it necessarily represented in the system it regulates:
internal vs. external goals are distinguished. An internal goal is one represented, either
consciously or not, in an individual, while an external goal is not represented in the sys-
tem but determines its features or actions, as do the functions of artefacts, social roles
and biological functions. The function of a chair of letting me sit down is not repre-
sented in the chair but in my mind: an internal goal for me but an external one for
it. The social role of a newspaper director is determined by the goal of that organization
that someone decides and is responsible for what is written in the newspaper. The goal
of flying away triggered in a bird by the outline of a predator, though not explicitly
represented in the bird, has the biological function of preventing predation.
The power of a system to achieve one’s goals depends on world conditions, the pres-
ence of resources and the system’s capacity to perform necessary actions. If world con-
ditions are not fulfilled, system A can fulfil them through a sub-plan, but if A cannot do
the right actions or does not possess the necessary resources, it may depend on another
system B and have the goal that B adopt A’s goal. B adopts A’s goal when B gets regu-
lated by A’s goal and helps A to achieve it. Different kinds of adoption exist: exploit-
ative (I host you in a hut in a cotton field so you work for me as a slave), social exchange
(I lend you my car so you lend me your pied-à-terre), cooperative (I help you to do your
home assignment so we can go to the movie together), altruistic (I dive into the sea to
save you), normative (I let you cross the street because the light is red). Through adoption,
systems provide resources to each other, enhancing the likelihood of adaptation.
Adoption may entail social influence. Social influence takes place when A causes B
to have a goal B did not have before, or to give up a previous goal. If A wants B to
adopt A’s goal, she can try to influence B, i.e., induce B to place his actions or resources
at A’s disposal: if A is out of salt, she can ask neighbour B to lend her some. We try to
influence others for both selfish and altruistic goals: I may influence you to do things
that are in your interest, like advising you to take an umbrella when it’s raining or telling
you of a job opening.

1.2. Beliefs
More than other animals, humans need to acquire, process and use beliefs in goal pursuit,
in order to check preconditions for action, likelihood of success and the respective
worth of alternative goals. Beliefs may be represented in a system in either a sensori-
motor or a propositional format, or both. Sensorimotor representations are mental
images or body movements: the visual image of a chair (its shape, back, number of
legs) or the muscular actions we perform to sit down. In the propositional format, the
chair is represented as an object, a piece of furniture, with the function of sitting down:
a set of propositions, each formed by a predicate and its arguments, with a predicate
being a property of a single argument or a relation among two or more arguments
(Parisi and Antinucci 1973; Castelfranchi and Parisi 1980). Beliefs may be assumed
with a different status depending on their context of assumption (real world, fantasy,
dreams…) and their degree of certainty (Miceli and Castelfranchi 1995; Castelfranchi
and Poggi 1998). But how are they acquired, organized and managed? Perception, sig-
nification and communication are three acquisition devices, while through memory and
inference, respectively, previous beliefs are stored and new ones are generated. In sen-
sation and perception, our first ways to acquire beliefs from the world, the steps from
perceiving to knowing are relatively few: the beliefs we come to assume strictly corre-
spond to the stimuli met by our senses. After selection and processing, perceived beliefs
get stored in long term memory, where they connect to each other through links of time,
space, class-example, set-subset, condition, cause-effect and means-goal, forming com-
plex networks ruled by the law of reciprocal compatibility: two contradictory beliefs
with the same degree of certainty, in a context of assumption of reality (not, for
instance, dreams or fantasy) cannot be accepted at the same time, and one or the
other must be rejected (Miceli and Castelfranchi 1995). Through inference, by connect-
ing two or more beliefs, we generate new ones. An inference is a new belief I draw from
old beliefs by applying some rule of inference. If I see that John tastes an orange and
leaves it (belief 1, acquired now through perception), and I know he loves oranges
(belief 2, a previous belief retrieved from memory), I may infer that this particular
orange is very sour (belief 3, drawn by inference). Signification can be viewed as the
crystallization of inference. If from a particular belief acquired from a perceivable stim-
ulus, whatever its context, I invariably draw the same inference, the stimulus becomes a
signal, and the inference becomes its meaning. If I see smoke, and I have often verified that
where there is smoke there is fire, I can say that “smoke means fire.” With time, the link
between the two beliefs comes to be stored in memory, making the inference no longer
necessary, until the first belief finally recalls the second; it stands for the second (Saussure
1916; Peirce 1935; Eco 1975).
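The "crystallization of inference" into signification described above can be sketched as follows. The counting threshold and all names are our own illustrative assumptions, standing in for whatever process stabilizes a repeatedly confirmed inference into a stored signal-meaning link.

```python
from collections import Counter

class Mind:
    """Once a stimulus-belief pairing has been verified often enough,
    the link is stored in memory: the inference is no longer necessary
    and the stimulus becomes a signal standing for the belief."""
    def __init__(self, threshold: int = 3):
        self.evidence = Counter()          # (stimulus, belief) -> confirmations
        self.lexicon = {}                  # signal -> meaning, once crystallized
        self.threshold = threshold         # illustrative stand-in

    def observe(self, stimulus: str, co_occurring_belief: str) -> None:
        self.evidence[(stimulus, co_occurring_belief)] += 1
        if self.evidence[(stimulus, co_occurring_belief)] >= self.threshold:
            self.lexicon[stimulus] = co_occurring_belief   # inference crystallized

    def interpret(self, stimulus: str):
        # The first belief directly recalls the second; it stands for it.
        return self.lexicon.get(stimulus)

m = Mind()
for _ in range(3):
    m.observe("smoke", "fire")
assert m.interpret("smoke") == "fire"      # "smoke means fire"
```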
Communication is a way to get information from other people, by taking the stimuli
people produce as signals, and finding meaning in them. We often get information from
others simply thanks to our capacity of inferring beliefs from what we perceive; but in
communication we acquire information from another person just because that person
has the goal of sending us a signal to convey information. The difference between infer-
ring beliefs from other people’s being or behaving and receiving beliefs because they
communicate them to us is like the difference between theft and gift: beliefs we infer
from people are robbed from them, while beliefs people communicate are a gift they
bestow upon us. In this, communication follows the law of “reciprocal altruism of
knowledge” (Castelfranchi and Poggi 1998) – a version in terms of goals and beliefs
of Grice’s cooperative principle – according to which, when one needs a belief, the
other is bound to provide it, and the other way around.
At the same time, though, communication is an act of influence, since with any com-
municative act we provide beliefs about a goal of ours, to request another to adopt it.
By a command we ask the other to do something for us, by a question, to let us know
something, by informing, to believe what we say.
Actually, this also opens a chance for deception. The function of communicating is to
induce goals in others; but humans decide which goals to pursue on the basis of the be-
liefs they have. So to influence others, one may need to provide information apt to trig-
ger the goals one wants, whether one believes that information to be true or not.
Castelfranchi and Poggi (1998) define as deceptive any act, morphological feature or
even any non-act (omission) of some system S whose goal is that some system A
does not come to have some belief B that S believes to be true, and that is relevant
to A’s goals. S can deceive by linguistic actions (as in lies) but also by non-verbal
communicative actions (a firm handshake and a smile to a person she hates), by actions
which are not communicative per se (hiding her lover in her wardrobe) or by doing
nothing at all (simply omitting to tell her partner she has AIDS).

2. Communication
In terms of the notions above, a communicative process takes place when a sender S has
the goal G of causing some addressee A to come to have some beliefs through signifi-
cation (i.e., to come to believe some meaning M), and in order to achieve this goal, S produces a
signal s that S supposes is linked to the meaning M both in S’s and in A’s minds. The
signal s is produced in a productive modality PM, it is perceivable in a receptive
modality RM, and it is linked to the meaning M through a communication system CS.
The elements of this process are defined as follows:

(i) sender: a system that has the goal of conveying some belief to an addressee through
signification, and in order to achieve this goal produces a signal linked to some meaning;
(ii) goal of communicating: the sender’s goal of having an addressee believe some
belief;
(iii) addressee: the system in whom the sender has the goal of inducing beliefs;
(iv) signal: a perceivable stimulus (an action, object, part or aspect of an object, a
morphological feature or even a non-action like silence) that is linked to some
meaning. The sender believes that the addressee can perceive it and that in the
addressee’s mind that stimulus is linked to the same meaning as in one’s own.
(v) meaning: a belief or set of beliefs that corresponds to a signal;
(vi) productive modality: the body organ used by the sender to produce the signal;
(vii) receptive modality: the sensory organ through which the addressee may receive
the signal;
(viii) communication system: the system of rules to put signals and meanings in
correspondence.
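A minimal data model of elements (i)–(viii) might look like the following sketch. The class and field names are ours, chosen for illustration only, and the example instance uses the Italian "come here" gesture; none of this notation belongs to the model's own terminology.

```python
from dataclasses import dataclass

@dataclass
class Signal:
    form: str                  # (iv) the perceivable stimulus
    productive_modality: str   # (vi) body organ used to produce it
    receptive_modality: str    # (vii) sense through which it is received

@dataclass
class CommunicativeProcess:
    sender: str       # (i)   system with the goal of conveying a belief
    addressee: str    # (iii) system the sender wants to make believe it
    signal: Signal    # (iv)
    meaning: str      # (v)   the belief(s) to convey -- the goal (ii)
    system: str       # (viii) the rules pairing signals and meanings

act = CommunicativeProcess(
    sender="S",
    addressee="A",
    signal=Signal("hand palm down, bent down twice", "hand", "sight"),
    meaning="I ask you to come here",
    system="Italian symbolic gestures",
)
assert act.signal.receptive_modality == "sight"
```

Note that, on this view, the sender and the goal of communicating are the necessary elements; the addressee's uptake is not represented as a condition of the process.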

2.1. The goal of communicating


While other definitions of communication (for instance, the semiotic view; see Peirce
1935; Eco 1975) are oriented more toward the receiver of the communicative signal, the def-
inition provided by a model in terms of goals is oriented more toward the sender. A neces-
sary condition of a communicative process is the presence of a sender and of a goal of
communicating to some addressee (represented in the mind of the sender), while the
uptake of the beliefs communicated, and even the presence of an addressee, are
not necessary: if from a desert island I send a message in a bottle, and the bottle is
gulped down by a shark, a communicative process has taken place all the same, even if
communication was not successful.
Two features characterize this definition. On the one hand, positing the necessity of a goal
of communicating allows one to distinguish belief acquisition that is simply due to the
receiver’s inference from belief acquisition that is due to a sender’s goal: on this view,
communication does not include mere acquisition of information, not even information
leaking from the emitter against his will. On the other hand, thanks to the notion of goal as a regula-
tory state, not necessarily a conscious intention, animal, unconscious and so-called
nonverbal communication may also be counted as communication in their own right.
The goal of communicating may be internal or external, i.e., represented or not in the
individual’s mind. A conscious internal goal of communicating is represented and meta-
represented: we have the goal of having someone know, and we also believe we have
this goal. This is generally the case for all verbal communication, and the so-called “em-
blems” of other modalities: “symbolic” gestures, those that in a particular culture have a
codified shared meaning and a clear verbal paraphrase; “facial emblems,” like a grimace
conveying ignorance or a head nod expressing agreement. Unconscious signals, instead,
like a neurotic symptom, a tic or a compulsive behaviour, may have the goal of commu-
nicating some unconscious disturbance, but the Sender is not aware of this goal, nor
possibly of the disturbance itself. Finally, while talking, we make “beat” gestures (we
move our hands down in correspondence with stressed syllables) and raise our eye-
brows to emphasize what we are saying. These movements have the goal of communi-
cating that the part of sentence or discourse we are presently uttering is important, but
this goal is tacit, not represented at a high level of awareness: the signals are performed
in an automatic way, without attentive control.
Cases of communication ruled by an external goal of communicating, not repre-
sented in the Sender’s mind, are the gasoline light, warning when you’re out of fuel
( function of an artefact), a seagull’s flight warning the flock of a predator (biological
communicative function), and uniforms, status symbols and regional accent, ruled by
the social ends of expressing one’s identity or group belonging.

2.2. Signals and modalities


A communicative signal may be:

(i) an action of an object (e.g., the gasoline LED lighting up), organism (a gesture, a cry,
a sentence) or group (the march of a crowd);
(ii) an object produced by an action (a book, a movie, a statue, a monument);
(iii) an object used during an action (black glasses worn at a funeral to hide crying eyes,
a cop’s helmet);
(iv) a part of an object (green light of the traffic light) or organism (a woman’s fleshy lips);
(v) an aspect of an object (the fluorescence of a car’s stop lights), organism (the cute round face of a
baby) or group (the density of a crowd);
(vi) a non-action (e.g., silence), if different from expected.

In brief, we may count as a signal any physical stimulus that in the sender’s and (as
assumed by the sender) in the addressee’s mind is linked to some meaning: a word,
a picture, a kiss, a slap, a strike, a resignation letter, a terrorist action…
Signals are produced and perceived in various modalities. In humans the receptive
modalities exploit their five senses: a perfume you wear is perceived through smell, a
tasty food conveying love or hospitality, through taste. But the majority of signals are
received through visual and auditory modalities, and produced by various body organs.
The hypothesis of this model is that not only head, face, hands, trunk, legs, but even sub-
parts of body organs, like eyes or nose, are repositories of specific communication
systems, whose signals may be listed and described.

2.3. Meaning
The meaning is the set of beliefs conveyed by a signal. The meanings a human may need
to provide to others for his or her adaptive goals, conveyed by signals of one or another
communication system, are of three types (Poggi 2007): information on the world
(about concrete and abstract events, entities, properties, relations, times and places),
on the sender’s identity (sex, age, ethnicity, cultural roots and personality), and on
the sender’s mind (goals, beliefs and emotions concerning ongoing discourse).

2.4. Communication system


Communication systems, the systems of rules to link signals and meaning, can be distin-
guished in terms of the following parameters:

(i) relation of a signal to signals in other modalities. The “beats” we make to scan
speech are non-autonomous: we can use them only while talking. “Symbolic”
gestures, like the signs of sign languages, are autonomous: they can be used
without speaking.
(ii) cognitive construction: whether and how a signal is represented in long-term mem-
ory. Some signals are codified, i.e., the signal-meaning link is stably represented in
long-term memory, like lexical items in a dictionary, and they form a “lexicon”.
This property, generally acknowledged only for words and symbolic gestures, can
be credited also to signals in other modalities, such as gaze, touch, head movements
and postures. As opposed to a codified signal, a creative signal is one invented on
the spot to convey some meaning for which no corresponding signal is stored in
memory: for example, a neologism or an iconic gesture. Suppose you want to convey
the meaning “cello” by a gesture. No ready-made gesture-meaning pair exists, so
you have to create one, and you will produce a hand movement that in some
way resembles or recalls a cello, so that the Addressee can understand it. In this
case, what is at work is not a lexicon but a set of rules for the generation of
creative signals.
(iii) signal-meaning relationship: whether or not the meaning can be inferred from the
signal. A signal is motivated (Saussure 1916) if it is linked to its meaning by a relation
of similarity (e.g., the iconic symbol fork-and-knife to mean “restaurant,” the ono-
matopoeic word “drip”), of compositionality (as in morphological derivatives) or of
mechanical determinism (the nouns for “mama,” which in all cultures
exploit labial consonants; the arm raising of gestures of elation, induced by the
arousal of that emotion). When the signal is not linked to its meaning by these
or other devices that allow one to infer the meaning from the signal without
knowing it in advance, it is an arbitrary signal.
(iv) correspondence between signal and communicative act. The unit of communication
in humans is the communicative act, which includes a performative (the Sender’s
specific communicative goal: asserting, asking, requesting) and a propositional con-
tent (what the Sender asserts, asks or requests). Articulated signals, like most words
and some symbolic gestures, convey only part of a communicative act, while holo-
phrastic signals, like interjections (Poggi 2008), many gaze items and other sym-
bolic gestures, convey a whole communicative act, the meaning of a whole
sentence. The Italian symbolic gesture hand with index and middle fingers up in a
V shape moving back and forth before the mouth means “smoke” or “cigarette”
(Fig. 40.1), while hand palm down bent down twice means “I ask you to come
here” (Fig. 40.2).

Fig. 40.1: smoke, cigarette

Fig. 40.2: come here
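The distinction between articulated and holophrastic signals can be sketched as lexicon entries; the entries below follow the two Italian gestures just described, while the class and field names are our own illustrative choices.

```python
from dataclasses import dataclass

@dataclass
class LexiconEntry:
    """A signal-meaning pair; holophrastic entries convey a whole
    communicative act (performative plus content), articulated ones
    only a word-like part of it."""
    form: str
    meaning: str
    holophrastic: bool

italian_gestures = [
    LexiconEntry("index and middle fingers in V, moving before the mouth",
                 "smoke / cigarette", holophrastic=False),
    LexiconEntry("hand palm down, bent down twice",
                 "I ask you to come here", holophrastic=True),
]

# Only the second entry stands for the meaning of a whole sentence.
whole_acts = [e for e in italian_gestures if e.holophrastic]
assert [e.meaning for e in whole_acts] == ["I ask you to come here"]
```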

3. “Phonologies” and lexicons of the body


Symbolic gestures, but also gestures of touch, the signals of gaze and body signals in
other modalities, can be analyzed by methods used by linguists for verbal and sign
languages, and their “phonology” and lexicon can be outlined.
(i) “Phonology” is a set of parameters, each with a small number of possible values,
such that each item in the system is described by the combination of the specific
values it assumes for all parameters. Just as in the words town and down the param-
eter “voicing” assumes the value “non-voiced” for /t/ and “voiced” for /d/, so in
the Italian symbolic gestures for “mad” (index finger rotating on the temple) and
“good, tasty!” (index finger rotating on the cheek) the parameter “location”
assumes different values, “temple” and “cheek”;
(ii) lexicon is the list of signal-meaning pairs stored in long-term memory.
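The parametric decomposition can be sketched as follows, using the "mad" vs "good, tasty!" minimal pair above. The particular parameter names (handshape, location, movement) are our assumption, borrowed from sign-language phonology; the chapter itself commits only to a small set of parameters with a small number of values each.

```python
# Each lexical item is a combination of values over a small set of
# parameters; two items may differ in a single value, as "mad" and
# "good, tasty!" differ only in location (cf. /t/ vs /d/ in town/down).
PARAMETERS = ("handshape", "location", "movement")

mad   = {"handshape": "extended index", "location": "temple", "movement": "rotation"}
tasty = {"handshape": "extended index", "location": "cheek",  "movement": "rotation"}

def minimal_pair(a: dict, b: dict) -> list:
    """Return the parameters on which two items differ."""
    return [p for p in PARAMETERS if a[p] != b[p]]

assert minimal_pair(mad, tasty) == ["location"]
```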

Investigation aimed at writing down the lexicon of a codified bodily communication
system relies on the following principles:

(i) For any signal considered communicative, it is possible to represent its meaning in
terms of “cognitive units,” i.e. logical propositions, each formed by a predicate with
its arguments. E.g., the performative of imploration, conveyed by head sideward-
forward, gaze to interlocutor, internal eyebrows raised (Fig. 40.3), contains the
following cognitive units:
a. Sender has the goal that addressee do an action a;
b. Sender believes the action a is in the interest of sender;
c. Sender believes that addressee has power over sender;
d. Sender believes that if addressee does not do the action a, then sender will
be sad.

Fig. 40.3: Facial performative: “I implore you”

(ii) Any signal, whether verbal or non-verbal, may be polysemic, that is, have two or
more different meanings, with the relevant one selected by context. But this does not
imply that meaning floats freely, because the different meanings of one signal
all share a common semantic core (one or more cognitive units): e.g., raising eye-
brows may convey surprise, perplexity or emphasis, but all these share a common core
of a request for attention;
(iii) Signals, besides their literal meaning, often have an indirect meaning that can be
inferred from the literal one, but in some cases is codified as another meaning of
40. Mind, hands, face and body 635

that signal, to be written in the lexicon as well. E.g., clapping hands has a literal meaning
of praise and an ironical one of blame.
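The predicate-argument format of cognitive units lends itself to a simple machine-readable encoding. The sketch below (in Python, with invented class and field names, not Poggi's actual formalism) illustrates both the representation of the imploration performative and the extraction of a shared semantic core from a polysemic signal:

```python
from dataclasses import dataclass

# A cognitive unit is a logical proposition: a predicate with its arguments.
@dataclass(frozen=True)
class CognitiveUnit:
    predicate: str
    arguments: tuple

# Illustrative encoding of the performative of imploration (cf. Fig. 40.3):
# head sideward-forward, gaze to interlocutor, internal eyebrows raised.
IMPLORATION = [
    CognitiveUnit("goal", ("sender", "addressee does action a")),
    CognitiveUnit("believe", ("sender", "action a is in the interest of sender")),
    CognitiveUnit("believe", ("sender", "addressee has power over sender")),
    CognitiveUnit("believe", ("sender", "if addressee does not do a, sender will be sad")),
]

def shared_core(meanings):
    """Return the cognitive units common to all meanings of a polysemic signal."""
    core = set(meanings[0])
    for m in meanings[1:]:
        core &= set(m)
    return core

# Eyebrow raising may convey surprise, perplexity or emphasis,
# all sharing a common core: a request for attention.
attention = CognitiveUnit("request", ("sender", "addressee pays attention"))
surprise = [attention, CognitiveUnit("feel", ("sender", "surprise"))]
perplexity = [attention, CognitiveUnit("feel", ("sender", "uncertainty"))]
emphasis = [attention, CognitiveUnit("believe", ("sender", "this is important"))]
print(shared_core([surprise, perplexity, emphasis]))  # the common core
```

The frozen dataclass makes cognitive units hashable, so the common semantic core of a polysemic signal falls out of a plain set intersection.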

To find out the meanings of items in a body lexicon, six methods can be used.

(i) Speaker’s judgements (Chomsky 1965). For each item, ask whether it is acceptable in
a given context, whether it is ambiguous, and whether it has synonyms or paraphrases
in words or other modalities.
(ii) Deductive method. Figure out what types of information one may need to convey
for one’s adaptive goals, and ask whether and how they are conveyed in a given
communication system.
(iii) Ethnosemantics. Collect and analyse words describing non-verbal items. The
semantic differences among gaze, stare, glance, peek, wink, frown, glower,
bloodshot eyes, make sheep’s eyes and look down on someone help to specify
what each communicative act of gaze conveys.
(iv) Observation. Analyze videorecordings of communicative interactions, collect uses
of each signal, find its meaning in each use and the core meaning shared by all uses.
(v) Empirical studies. Test hypotheses about the meaning of a given signal
through questionnaires or interviews.
(vi) Simulation. Simulate the items in embodied agents by representing their “phono-
logical” and semantic analysis, and assess their interpretation by users through
evaluation studies.

3.1. Cherology and lexicon of Italian symbolic gestures


Symbolic gestures are communicative autonomous gestures in which both signal
and meaning are culturally codified: a canonical shape and movement of the hand
conveys a meaning to which a shared verbal paraphrase corresponds in a certain
culture.
The “cherology” (phonology of hands) of Italian symbolic gestures has been de-
scribed (Poggi 2002b) in terms of the parameters proposed by Stokoe (1978), Klima
and Bellugi (1979) and Volterra (1987) for the signs of American Sign Language and
Lingua Italiana dei Segni: hand configuration, location, orientation and movement. It
includes 39 handshapes, 6 orientations and 35 locations. Movement is a complex param-
eter encompassing, besides direction, the so-called “expressivity” parameters (Hartmann,
Mancini, and Pelachaud 2002) of amplitude, velocity, fluidity and repetition.
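Such a parametric description can be rendered as a small data structure. The sketch below is illustrative only: the class names and the parameter values for the two gestures are invented, not drawn from Poggi's (2002b) coding scheme:

```python
from dataclasses import dataclass, field

# Movement is a complex parameter: direction plus the "expressivity"
# sub-parameters of amplitude, velocity, fluidity and repetition.
@dataclass
class Movement:
    direction: str
    amplitude: str = "medium"
    velocity: str = "medium"
    fluidity: str = "smooth"
    repetition: int = 1

# A symbolic gesture is described by the values it assumes for each
# cherological parameter (handshape, location, orientation, movement).
@dataclass
class GestureEntry:
    meaning: str          # the shared verbal paraphrase
    handshape: str        # one of the 39 handshapes
    location: str         # one of the 35 locations
    orientation: str      # one of the 6 orientations
    movement: Movement

# "mad" and "good, tasty!" form a minimal pair: they differ only in the
# value of the location parameter, as town/down differ only in voicing.
mad = GestureEntry("mad", "extended index", "temple", "palm inward",
                   Movement("rotation", repetition=2))
tasty = GestureEntry("good, tasty!", "extended index", "cheek", "palm inward",
                     Movement("rotation", repetition=2))

def minimal_pair(a: GestureEntry, b: GestureEntry) -> list:
    """List the parameters on which two gesture entries differ."""
    return [p for p in ("handshape", "location", "orientation", "movement")
            if getattr(a, p) != getattr(b, p)]

print(minimal_pair(mad, tasty))  # ['location']
```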
Italian symbolic gestures form a very rich lexicon. Like other signals, they can con-
vey information about the world, the sender’s identity and the sender’s mind. Among
gestures providing information about the world, some mention persons (“Indian,”
“communist”), animals (“horse,” “donkey”) or objects (“scissors,” “cigarette”). Ac-
tions are sometimes conveyed by specific gestures (“walk”) but sometimes by the
same gestures that mention objects: there is no clear-cut distinction between nouns and
verbs (“to cut” is like “scissors,” “to smoke” like “cigarette”); other gestures mention
properties (“thin,” “stubborn/stupid”), relations (“link between two”), times (“yester-
day”), and quantifiers (“two”).

Within gestures informing about the sender’s identity, flat hand on one’s heart, often
used by politicians (Serenari 2003), claims the sender’s positive image: it points to one-
self as if saying “I” or “we,” but with a nuance of “I/we, the fair and noble person(s).”
Many Italian symbolic gestures convey information about the sender’s mind. Some
inform on the degree of certainty of the beliefs mentioned: shaking index finger
means “no,” i.e., “I do not assume this information as true”; opening flat hands with
palms up means perplexity, a mental state of uncertainty. Other gestures inform on
the source of the mentioned beliefs: snapping thumb and middle fingers means “I am
retrieving beliefs from long-term memory”; both hands bending index and middle
fingers with palms to hearer mean “I am quoting another’s belief.”
Among gestures informing on the sender’s goals, some communicate performatives
(like “I apologize” or “Attention”). Others mention the relation of something to the
speaker’s goals: sliding hand-back under chin (= “I couldn’t care less,” i.e., “there is
no goal of mine to which this is relevant”: Fig. 40.4); fist rapidly rotating on wrist
with thumb and index finger extended (= “nothing to do,” “no way to have this goal
achieved”). Some gestures express logical links among sentences in a discourse, fulfill-
ing a metadiscursive function: the fist slowly rotating on the wrist with thumb and index
finger extended and curved sets a causal link between facts mentioned in a discourse.
Raising one hand has a conversational function of requesting the turn.

Fig. 40.4: I couldn’t care less

Fig. 40.5: Good, tasty

Among gestures informing on the speaker’s emotions, “Churchill’s gesture,” index and
middle finger extended upward, expresses elation for an achievement; beating the right
fist on the left palm turned up expresses disappointment; while the “cheek screw” (Morris
1977), the tip of the extended index finger rotating like a screw on the cheek (Fig. 40.5),
expresses physical or quasi-physical pleasure, at tasty food, a pretty girl or even an
exciting book.
Various Italian gestures make use of rhetorical figures: metaphor, synecdoche, irony,
hyperbole. The finger bunch, palm down, beaten on breast (Fig. 40.6) literally means “mi
sta qua” (“it’s here, on my stomach, I can’t digest it”), but metaphorically counts as
“I can’t bear him/her”: what cannot be digested is not a food but a person.

Fig. 40.6: It’s on my stomach

Fig. 40.7: Jail

Fig. 40.8: What’s the time?, Hurry up



Clapping hands, besides its literal use to approve or praise, may be used ironically to
express sarcastic praise, hence strong disapproval or criticism. A hyperbolic gesture
is the index finger skimming one’s cheek from the cheek-bone down, iconically represent-
ing a tear: the literal meaning is “I cry,” but crying is a hyperbole; the intended meaning is
simply “I am sad.” Typically hyperbolic are some gestures of description, comment
or threat representing sexual organs or actions. Finally, some gestures use a synecdoche:
they represent some object or action to refer to something linked to it. The hand with
spread fingers covering one’s face (Fig. 40.7) mimics the bars of a jail to mean “jail”
(part-whole synecdoche) or “criminal” (one who is often in jail: container-content).
The right index touching the left wrist (Fig. 40.8) means “what time is it?” or “hurry up,
it’s late.” This is a recursive synecdoche: from place (wrist) to object (watch), to its
function (knowing the time), to the resulting action (hurrying).

3.2. Optology, lexicon and morphology of gaze


To describe the signal side of gaze (optology), the following parameters have been proposed
(Pelachaud and Poggi 1998; Poggi 2006a, 2006b, 2007):

(i) movements of the eyebrows (e.g., frowning means worry or concentration; eyebrow
raising, perplexity or surprise);
(ii) position, tension and movement of the eyelids (in hatred the upper eyelids are
lowered and the lower eyelids raised, with tension; in boredom the upper eyelids are
lowered but relaxed);
(iii) features of the eyes: humidity (the bright eyes of joy or enthusiasm), reddening (the
bloodshot eyes of rage), pupil dilation (sexual arousal), focusing (staring into space when
thoughtful), direction of the iris with respect to the direction of the Speaker’s head and to
the Interlocutor’s location (the deictic use of eyes);
(iv) size of the eye sockets (to express tiredness);
(v) duration of movements (a defiant gaze keeps eye contact longer).

In gaze too, as in symbolic gestures, several codified items can be found. Within infor-
mation on the world, gaze conveys information about entities (its deictic use: gazing at
something or someone to refer to it) or about properties (squinting eyes = little, wide
open eyes = huge: an iconic use of gaze). As to the sender’s identity, eyelid shape reveals
ethnicity, bright eyes reveal aspects of personality. Within information on the sender’s be-
liefs, gaze can tell how certain we are of what we are saying (slight frown = “I am serious,
not kidding”; raised eyebrows without wide open eyes = “I am perplexed, not sure”), and
what is the source of what we are saying (eyes left-downward = “I am retrieving from
memory”). Further, gaze communicates the performative of our sentence (staring at inter-
locutor = request for attention, frowning = question, fixed stare = defiance), topic-comment
distinction (averting vs. directing gaze to interlocutor), turn-taking (gaze at speaker to take
the floor) and backchannel (frown = incomprehension or disagreement).
Recently the hypothesis has been made (Poggi and Roberto 2007; Poggi, D’Errico,
and Spagnolo 2010) that some values in the parameters of gaze are comparable to mor-
phemes, since by themselves they convey specific meanings. For example, wide open
eyelids imply activation and alert, while half open eyelids have a semantic nuance of
de-activation or relaxation, but raised lower eyelids convey effort, even when the general
meaning of the gaze item is one of de-activation.

3.3. Haptology and lexicon of touch


The parameters of communicative touch are:

(i) touching part: the part of the sender’s body actively touching the addressee (hair,
forehead, head, eyelash, nose, cheek, beard, lips, teeth, tongue, shoulder, arm,
back, elbow, hand, fingers, nails, hip, genitals, glutei, thigh, knee, foot);
(ii) touched part: the part of the addressee’s body that is touched (hair, forehead, head,
eyebrows, eyelashes, eye, temple, nose, cheek, ear, beard, lips, tongue, neck, shoul-
der, arm, forearm, breast, trunk, stomach, back, elbow, hand, fingers, hip, genitals,
glutei, thigh, knee, calf, ankle, foot);
(iii) location or space that is touched: point, line or area;
(iv) movement, encompassing sub-parameters like path, pressure, duration, speed and
rhythm.

For touch too one can find a lexicon, and the meanings of its items account for the pos-
itive or negative influence that physical contact has on people’s relationships. Some acts
of touch are communicative, and their meaning can be paraphrased in words. For many
touch items it is possible to find their origin in action (the degree zero of meaning, Pos-
ner and Serenari 2001) and the communicative inferences they elicit, that is, their
indirect meaning, which is sometimes idiomatized, i.e., stored in memory as a further
(sometimes the only) meaning of the item.
The meanings of touch may be analyzed in terms of the following criteria (Poggi
2007):

(i) name or verbal description of the act of touch: e.g., kiss, slap, kick, drying the
other’s tears;
(ii) verbal paraphrase or verbal expression that may accompany the touch: “C’mon,
don’t cry” while drying the other’s tears, or “I love you” while caressing
someone;
(iii) literal meaning: drying the other’s tears conveys “I want to console you”; a caress,
“I want to give you serenity and pleasure”;
(iv) indirect idiomatic meaning: an indirect meaning of a caress may be “I want you to
be calm”;
(v) originary meaning: the primitive goal of the act from which the literal meaning
might have evolved (e.g., through ritualization). So, embracing might derive
from a desire to wrap and incorporate the other.
(vi) social goal: four types of social disposition of the toucher towards the touched per-
son. An item of touch is “aggressive” when aimed at hurting or causing harm (e.g.,
a slap); “protective” when it offers help or affection (to kiss, or to hold out one’s hand to
another); “affiliative” when it asks for help or affection (a wife leaning on her hus-
band’s arm); “friendly” when it offers help or affection without implying a difference
in power (to walk arm-in-arm).

Polysemy also holds for touch items, since some acts of touch have different meanings if
performed by actors with different relationships or statuses.

4. Multimodality
The communicating body may be viewed as an orchestra; its instruments are words,
prosody and intonation, gesture, head, gaze, face and posture, all playing simulta-
neously and making multimodal communicative acts in which the meaning intended
by the Sender is distributed across modalities. This raises two interesting issues:

(i) How are meanings distributed across modalities in the planning and production of
the multimodal message?
(ii) How can one disentangle the different meanings conveyed in different modalities?

4.1. Multimodal planning


By taking the perspective of embodied agents, one may explain how meanings are dis-
tributed across modalities in the planning and production of the multimodal message,
not only considering the interaction between gesture and speech (as do Rimé and
Schiaratura 1991; McNeill 1992, 2000, 2005; Krauss 1998; Kita 2000; Alibali, Kita,
and Young 2000; Bernardis, Salillas, and Caramelli 2008) but taking all modalities
into account, without crediting one with a privilege over others (Mancini and Pelachaud
2009; Poggi et al. 2007).
The process starts when a sender S conceives of some content to communicate (pos-
sibly including propositional, imagistic, motor, emotional information). Sometimes, e.g.
as one is overwhelmed by emotion, all information is expressed immediately in an
impulsive way, randomly distributed across modalities. But if the sender is a reflexive
agent (Poggi, Pelachaud, and De Carolis 2001) s/he wonders whether it is sound to com-
municate that content or not, by considering his or her goal, addressee A’s personality,
the addressee’s relation with the sender, and the context, finally opting among “commu-
nicating as is,” “passing over” or “deceiving.” Once he or she has decided to communi-
cate, the sender searches his or her own communicative systems in various modalities,
words, gestures, prosody, face, gaze, head movements, legs movements, postures, but to
avoid computational overload orders them in a stack called “selection priority.” The
order of communicative systems in the stack is determined by combined criteria:

(i) Meaning. If some types of meanings are more easily or typically conveyed in a par-
ticular modality (e.g., facial expression is generally more apt than verbal items to
convey emotions), that modality is searched first.
(ii) Physical and social context. In a noisy disco the gestural system may be on top of
the stack.
(iii) Addressee’s characteristics. If talking to a child, iconic gestures may be better than
verbal descriptions.

Once the stack is created, the sender searches for a suitable signal to convey the mean-
ing in the communicative system on top of it. If the signal found is not appropriate, the
search goes on in the next communicative system. If conveying a particular meaning is
particularly important, the sender may search signals for that meaning also in other
modalities and produce them simultaneously.
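The selection procedure just described can be sketched in code. Everything below (the mini-lexicons, the system names and the ordering heuristics) is a hypothetical toy illustration of the selection-priority stack search, not the model's actual implementation:

```python
# Hypothetical mini-lexicons per communicative system (meaning -> signal).
SYSTEMS = {
    "face": {"joy": "smile", "surprise": "raised eyebrows"},
    "gesture": {"come here": "hand palm down bent down twice", "two": "V handshape"},
    "verbal": {"come here": "'vieni qui'", "joy": "'sono felice'", "two": "'due'"},
}

def build_stack(meaning_type, context, addressee):
    """Order the communicative systems by the combined criteria:
    meaning type, physical/social context, addressee's characteristics."""
    stack = ["verbal", "gesture", "face"]              # default order
    if meaning_type == "emotion":                      # emotions: face first
        stack.remove("face"); stack.insert(0, "face")
    if context == "noisy":                             # noisy disco: gesture on top
        stack.remove("gesture"); stack.insert(0, "gesture")
    if addressee == "child":                           # child: prefer iconic gesture
        stack.remove("gesture"); stack.insert(0, "gesture")
    return stack

def select_signal(meaning, meaning_type="neutral", context="quiet", addressee="adult"):
    """Search the stack top-down for a signal conveying the meaning."""
    for system in build_stack(meaning_type, context, addressee):
        signal = SYSTEMS[system].get(meaning)
        if signal is not None:         # an appropriate signal was found
            return system, signal
    return None                        # search failed in all systems

print(select_signal("joy", meaning_type="emotion"))  # face is searched first
print(select_signal("come here", context="noisy"))   # gesture is on top of the stack
```

If conveying a meaning is particularly important, one could of course continue the search past the first hit and return signals from several modalities, to be produced simultaneously.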

4.2. The “musical score” of multimodal communication


To disentangle how meanings are distributed across modalities in real communication,
an annotation scheme called the “musical score of multimodal communication”
(Magno Caldognetto et al. 2004; Poggi 2007) was proposed where signals of the
following modalities are reported on parallel lines:

– v. verbal modality (the words and sentences uttered);


– p. prosody-intonation (speech rhythm, pauses, intensity, stress, pitch);
– g. gesture (movements of hands, arms and shoulders);
– f. face (head and eye movements, gaze, smile and other facial expressions);
– b. body (trunk and leg movements, body posture, orientation and movements in
space).

Each communicative item of each modality may be analyzed at various levels:
– SD. signal description: the acoustic or visual features of each signal are described. For
prosody, one reports pitch, length, intensity, fundamental frequency, pauses; for ges-
tures, gaze, face and body movements, the values in their parameters are mentioned.
The description can be discursive (e.g., speaker raises right hand, palm facing hearer),
or use some notation system, like Ekman and Friesen’s (1982) Facial Action Coding
System for facial modality or PRAAT for acoustic annotation.
– MD. meaning description: a verbal paraphrase of each communicative item. E.g.,
right hand, palm to hearer moved forward may be glossed as: “wait, be careful.”
– MT. meaning type: each item is classified as conveying Information on the World, the
Sender’s Identity or the Sender’s Mind.
– F. function: the semantic relation of the item with the simultaneous verbal (or other)
signal. Five possible functions are distinguished:

(i) repetition, if a signal bears the same meaning as one or more words. E.g., Speaker
opens and drops his hands heavily to mean: “very much, intensely” while saying
“very much.”
(ii) addition, if it adds information: saying “a large balcony”, while depicting a trian-
gle with the base facing oneself.
(iii) substitution, if the speaker does not utter a word, due to a slip, amnesia or inten-
tional reticence, but conveys that meaning by a body signal. E.g., a speaker says:
“In that case…”, then breaks off and makes the gesture of cutting the air horizon-
tally, meaning: “I will not do that.”
(iv) contradiction, if the meaning of a signal contrasts with that of the simultaneous
words. While being interviewed about racism, a speaker may sound very tolerant
in words, but keep his distance from the interviewer in posture.
(v) independence, if a signal simultaneous to some word has no relation to it. E.g.,
while talking on the phone, I wave to someone entering my room: my words
and my wave are not part of the same communicative plan.
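One annotated item of such a score can be sketched as a record. Field names follow the levels above; the abbreviation ISI (for Information on the Sender's Identity), the timing fields and all example values are assumptions made for illustration:

```python
from dataclasses import dataclass

# Allowed values for the meaning-type and function levels of the score.
MEANING_TYPES = {"IW", "ISI", "ISM"}   # World / Sender's Identity / Sender's Mind
FUNCTIONS = {"repetition", "addition", "substitution",
             "contradiction", "independence"}

@dataclass
class ScoreItem:
    """One communicative item of one modality line of the 'musical score'."""
    modality: str        # v., p., g., f. or b.
    start: float         # time alignment with the parallel lines (seconds)
    end: float
    sd: str              # signal description (discursive, or FACS/PRAAT notation)
    md: str              # meaning description: a verbal paraphrase
    mt: str              # meaning type
    function: str        # relation to the simultaneous verbal signal

    def __post_init__(self):
        assert self.mt in MEANING_TYPES and self.function in FUNCTIONS

# The "wait, be careful" gesture, annotated at all four levels.
item = ScoreItem(
    modality="g.", start=1.2, end=1.8,
    sd="right hand, palm to hearer, moved forward",
    md="wait, be careful",
    mt="ISM", function="addition",
)
print(item.md)
```

A full score would then be a list of such records per modality line, with the `start`/`end` fields doing the work that graphical column alignment does on paper.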

4.3. The two-layer score


To catch the subtlest devices of multimodal communication, a two-layer score was pro-
posed. Since any signal may have, besides its literal meaning, an indirect meaning that
can be inferred from the literal one, the levels of meaning description, meaning type
and function may include an additional line, so that each signal may be interpreted and
classified in different ways, depending on whether the literal or the indirect layer of
meaning is taken into account.
The “two-layer musical score” is particularly well suited to analyzing sophisticated cases of
multimodal communication, like the following fragment from the movie “Totò a colori”
(“Totò in colours,” director Steno 1952), with Totò (Antonio De Curtis), the most
famous Italian comic actor (see Fig. 40.9).
Totò has just tried to court a maiden, but on her refusal he offers an apology and a
justification. Here is the verbal text:

(1) “Hai ragione, hai ragione, scusa tanto! Mi sono lasciato trasportare dall’impeto,
dall’imbulso della carne! Di questa carnaccia maledetta! Mo’ mi faccio male!
Non so che mi farei! Mo’ me ceco l’occhi!”
(You’re right, you’re right, I apologize! I let myself be carried by the impetus, by
the impulse of flesh! By this bloody bad flesh! Now I’ll hurt myself! I don’t know
what I would like to do to myself! Now I’ll blind my own eyes!).

Fig. 40.9 reproduces the two-layer score of this fragment. The verbal line (v. SD: Hai
ragione, hai ragione, scusa tanto! Mi sono lasciato trasportare dall’impeto, dall’imbulso
della carne!) is aligned with the gesture line (g. SD: opens and raises hands, palms
forward before breast; extends forearms with closed fists in tension; relaxed closed
fists) and the face line (f. SD: looks down, inner eyebrows up; frowns; raises chin),
each annotated for literal meaning (M I), indirect meaning (M II), meaning type (MT)
and function (F).

Legenda: v. = verbal modality; p-i. = prosody-intonation; g. = gesture; f. = face;
b. = body; SD = Signal Description; ST = Signal Type; MD = Meaning Description;
MT = Meaning Type; F = Function; IW = Information on the World; ISM = Information
on the Sender’s Mind; lh. = left hand; rh. = right hand; I = literal layer; II = indirect layer

Fig. 40.9: Totò’s score

While saying Hai ragione, hai ragione, scusa tanto! (‘You are right, you are right, I apol-
ogise a lot!’), Totò raises his open hands, palms forward, before his breast: a gesture that
conveys a literal meaning “I leave you alone” and an indirect idiomatic meaning
“I apologize,” the former providing information on the world with an additive function,
the latter giving information on the sender’s mind with a repetitive function. He also
lowers his gaze to express shame, showing he repents (indirect meaning), and pulls
inner parts of his eyebrows up expressing sadness, hence (indirect meaning) a request
for forgiveness. For both gaze items, the first meaning (“ashamed,” “sad”) is an additive
meaning of emotion and the second (“I repent,” “I apologize”) a performative that repeats
the meaning of the simultaneous words, all four being information on the sender’s mind.
Since showing sorrow is semantically part of the act of apologizing, its indirect meaning
is “I apologize,” a repetition of the words scusa tanto = “I apologize.”
While saying “Mi sono lasciato trasportare dall’imbulso della carne!” (‘I let myself be
carried by the impetus, by the impulse of flesh!’, with the word imbulso uttered with a
regional accent), Totò closes his fists with strong muscular tension, and frowns. Both sig-
nals have a literal meaning of “effort” and “striving hard,” information on a physical
sensation of the sender, with an additive function with respect to speech. But the indi-
rect meaning, drawn through the rhetorical figure of metonymy, is “impetus” (impetus
being something one has to strive hard to oppose): information on the world, having a
repetitive function with respect to the word impeto.
As shown in Fig. 40.9, in the two-layer score the meaning, meaning type and function of
each communicative signal are interpreted and classified in different ways, depending
on whether the literal or the indirect meaning is represented.
The “musical score” and other annotation schemes of multimodal communication
have been devised to analyze several types of interaction (political speech and political
talk shows, judicial debate, teacher-pupil interaction, oral examination, speech therapy,
job interviews, comic and dramatic fiction, pianist performance, orchestra and choir
conducting) and have proved particularly helpful in analyzing sophisticated aspects of
communication, like irony and comic recitation.

4.4. Irony
Irony in conversation typically takes advantage of multimodality. Like every rhetorical
figure, irony is a case of “recitation” (Vincent and Castelfranchi 1981; Castelfranchi
and Poggi 1998), that is, of revealed deception: the intended meaning is different
from the apparent one, but this difference is revealed; the sender communicates something
different from what s/he thinks, but wants the addressee to understand that it is not what
s/he thinks. In irony, the intended meaning is opposite to the apparent one, and to under-
stand it the addressee must first be alerted to the irony, i.e., understand that the real
meaning is different from its appearance, and then understand the specific non-literal
meaning of the communicative act (Attardo et al. 2003). Sometimes the addressee
is alerted to the irony simply because the literal meaning is implausible given the con-
text and previous knowledge, but in other cases the sender may alert the addressee in
one of two ways:

(i) through meta-communication: another communicative act about the ironic one,
performed through a “dedicated” signal that specifically means “I am being ironic”:
a wink, a “blank face” (an ostensibly and unnaturally inexpressive face) or an
asymmetrical smile;
(ii) through para-communication: another communicative act besides the ironic one,
performed either in sequence in the same modality or simultaneously through
other modalities, which utterly contradicts it (for instance, a bored face while uttering
an enthusiastic utterance).

Para-communication of the “irony alert” often occurs in the “Clean Hands” trial, a trial
of very high political importance in which the prosecutor Antonio Di Pietro uses irony with
the accused and witnesses to demonstrate that they are not credible (Poggi, Cavicchio, and
Magno Caldognetto 2008; Poggi 2010a).
Di Pietro is trying to demonstrate that Paolo Cirino Pomicino received 5 billion lire
from Dr. Ferruzzi, an industrialist, for the elections. Cirino Pomicino says that
the day after the elections he received Ferruzzi at his home at 7.30 in the morning, and
that he did so just because seven months before he had promised Sama, Ferruzzi’s
mediator, that he would meet Ferruzzi. Di Pietro ironically remarks that it is quite strange
that Cirino Pomicino received Ferruzzi at his home at that time in the morning, and also
strange that this was only because, seven months before, he had committed himself to
meeting Ferruzzi, and not because he wanted to thank Ferruzzi for granting 5 billion for
the elections! He does so to argue that Pomicino knew he was doing something illicit.

(2) Di Pietro says: “Il vero impegno che aveva preso questo signore era di ringraziare,
di sdebitarsi di un impegno che aveva preso col dottor Sama a giugno di sette mesi
prima”
(‘The true commitment of this gentleman was to thank, to pay off his debt of
something he had been committed to with Dr. Sama in June seven months
before’).

While uttering “a giugno” (in June), Di Pietro depicts with both hands an oblong shape
up in the air, and gazes up as if looking into the sky: an iconic gesture and a deictic gaze
that together refer to something like a cloud, hence a metaphor of “vagueness.” This ges-
ture of vagueness clashes with the definite idea of a commitment (“impegno”), para-
communicating a meaning that contrasts with the words used, and thus signaling irony.

5. Multimodality in school, music and politics


The analysis of multimodality, applied to teachers’ communicative styles, allows tea-
chers to be distinguished according to whether they convey more metadiscursive or more
referential information, or concentrate more on their own identity (Merola and Poggi
2004). It also shows the devices through which the use of iconic gestures helps compre-
hension and memory, especially in small children (Merola 2009). Analysis of face and
gaze communication in pianist performance (Poggi 2006a) made it possible to distin-
guish movements that simply support the technical motor actions from those expressing
the pianist’s emotions (felt due to the performance, or enacted to be conveyed in the
music) or mental states of concentration and caution in playing; the idiosyncratic facial
lexicon of the pianist was written down, showing that the number and type of body move-
ments vary with the musical structure of the piece played (main theme vs. variation) and
with the social situation of the performance (rehearsal vs. concert). Multimodal analysis
of orchestra and choir conducting (Poggi 2002a, 2010b) shows whether hand and face
signals convey emotions to be expressed in the music, or the meanings of the words to be
sung, or information concerning musical parameters (timbre, rhythm, melody, musical
structure), or whether they transmit feedback about the singers’ or players’ performance.
In annotation studies of persuasive political discourse, gesture and gaze items were
defined as persuasive when their meanings are semantically linked to information con-
cerning certainty, importance, evaluation, emotion, and the sender’s competence and
benevolence. These types of information contribute to persuasion in that they implement
the three Aristotelian strategies of logos (logical argumentation), ethos (the sender’s reli-
ability) and pathos (the appeal to the addressee’s emotions); so classifying gesture and
gaze items in terms of this typology, and comparing their frequencies, allows one to dis-
tinguish how persuasive different styles of political discourse are and how much they
respectively employ the pathos, ethos or logos strategy (Poggi 2005; Poggi and Pelachaud
2008; Poggi and Vincze 2008).

6. References
Alibali, Martha, Sotaro Kita and Amanda J. Young 2000. Gesture and the process of speech
production: We think, therefore we gesture. Language and Cognitive Processes 15:
593–613.
Attardo, Salvatore, Jodi Eisterhold, Jennifer Hay and Isabella Poggi 2003. Multimodal markers of
irony and sarcasm. Humor. International Journal of Humor Research 16(2): 243–260.
Bernardis, Paolo, Elena Salillas and Nicoletta Caramelli 2008. Behavioural and neurophysiologi-
cal evidence of semantic interaction between iconic gestures and words. Cognitive Neuropsy-
chology 25(7–8): 1114–1128.
Castelfranchi, Cristiano and Domenico Parisi 1980. Linguaggio, Conoscenze e Scopi. Bologna: Il
Mulino.
Castelfranchi, Cristiano and Isabella Poggi 1998. Bugie, Finzioni, Sotterfugi. Per una Scienza del-
l’inganno. Roma: Carocci.
Chomsky, Noam 1965. Aspects of the Theory of Syntax. Cambridge: Massachusetts Institute of
Technology Press.
Conte, Rosaria and Cristiano Castelfranchi 1995. Cognitive and Social Action. London: University
College London Press.
Eco, Umberto 1975. Trattato di Semiotica Generale. Milan: Bompiani. Eng.Transl. A Theory of
Semiotics. Bloomington: Indiana University Press.
Ekman, Paul and Wallace V. Friesen 1982. Felt, false, and miserable smiles. Journal of Nonverbal
Behavior 6(4): 238–252.
Hartmann, Björn, Maurizio Mancini and Catherine Pelachaud 2002. Formational parameters and
adaptive prototype instantiation for MPEG-4 compliant gesture synthesis. Computer Anima-
tion 2002: 111–119.
Kita, Sotaro 2000. How representational gestures help speaking. In: David McNeill (ed.), Lan-
guage and Gesture, 162–185. Cambridge: Cambridge University Press.
Klima, Edward and Ursula Bellugi 1979. The Signs of Language. Cambridge, MA: Harvard Uni-
versity Press.
Krauss, Robert M. 1998. Why do we gesture when we speak? Current Directions in Psychological
Science 7: 54–60.
Magno Caldognetto, Emanuela, Isabella Poggi, Piero Cosi, Federica Cavicchio and Giorgio Mer-
ola 2004. Multimodal score: An ANVIL based annotation scheme for multimodal audio-video
analysis. Workshop on Multimodal Corpora LREC 2004. Centro Cultural de Belem, Lisboa,
Portugal, 25th May 2004.
Mancini, Maurizio and Catherine Pelachaud 2009. Implementing distinctive behavior for con-
versational agents. In: Miguel Sales Dias, Sylvie Gibet, Marcelo M. Wanderley and Rafael
Bastos (eds.), Gesture-Based Human-Computer Interaction and Simulation, 163–174. Berlin/
Heidelberg: Springer.
McNeill, David 1992. Hand and Mind. Chicago: University of Chicago Press.
McNeill, David (ed.) 2000. Language and Gesture. Cambridge: Cambridge University Press.
McNeill, David 2005. Gesture and Thought. Chicago: University of Chicago Press.
646 IV. Contemporary approaches

Merola, Giorgio 2009. The effects of the gesture viewpoint on the students’ memoy of words and stor-
ies. In: Miguel Sales Dias, Sylvie Gibet, Marcelo M. Wanderley and Rafael Bastos (eds.), Gesture-
Based Human-Computer Interaction and Simulation, 272–281. Berlin/Heidelberg: Springer.
Merola, Giorgio and Isabella Poggi 2004. Multimodality and gestures in the teacher’s communi-
cation. In: Antonio Camurri and Gualtiero Volpe (eds.), Gesture-Based Communication in
Human-Computer Interaction. Proceedings of the 5th Gesture Workshop, GW 2003, Genova,
Italy, April 2003, 101–111. Berlin: Springer.
Miceli, Maria and Cristiano Castelfranchi 1992. La Cognizione del Valore. Milan: Franco Angeli.
Miceli, Maria and Cristiano Castelfranchi 1995. Le Difese della Mente. Rome: Nuova Italia
Scientifica.
Miceli, Maria and Cristiano Castelfranchi 2000. The role of evaluation in cognition and social
interaction. In: Kerstin Dautenhahn (ed.), Human Cognition and Social Agent Technology, 225–
261. Amsterdam: John Benjamins.
Miller, George A., Eugene Galanter and Karl A. Pribram 1960. Plans and the Structure of Behav-
ior. New York: Holt, Rinehart and Winston.
Morris, Desmond 1977. Manwatching. London: Jonathan Cape.
Parisi, Domenico and Francesco Antinucci 1973. Elementi di Grammatica. Turin: Boringhieri.
English Transl. Fundamentals of Grammar. New York: Academic Press.
Parisi, Domenico and Cristiano Castelfranchi 1975. Discourse as a hierarchy of goals. Working
Papers, 54–55. Urbino: Centro Internazionale di Semiotica e Linguistica.
Peirce, Charles Sanders 1935. Collected Papers. Cambridge: Cambridge University Press.
Pelachaud, Catherine and Isabella Poggi 1998. Talking faces that communicate by eyes. In: Serge
Santi, Isabelle Guaitella, Christian Cavé and Gabrielle Konopczynski (eds.), Oralité et Gestua-
lité, Communication Multimodale, Interaction, 211–216. Paris: L’Harmattan.
Pelachaud, Catherine and Isabella Poggi (eds.) 2001. Multimodal Communication and Context in
Embodied Agents. Proceedings of the Workshop W7 at the 5th International Conference on
Autonomous Agents, Montreal, Canada, 29 May 2001.
Poggi, Isabella 2002a. The lexicon of the conductor’s face. In: Paul McKevitt, Seán O’ Nuallàin
and Conn Mulvihill (eds.), Language,Vision, and Music. Selected Papers from the 8th Interna-
tional Workshop on the Cognitive Science of Natural Language Processing, Galway, 1999, 271–
284. Amsterdam: John Benjamins.
Poggi, Isabella 2002b. Symbolic gestures. The case of the Italian gestionary. Gesture 2(1): 71–98.
Poggi, Isabella 2005. The goals of persuasion. Pragmatics and Cognition 13: 298–335.
Poggi, Isabella 2006a. Body and mind in the pianist’s performance. In: Mario Baroni, Anna Rita
Addessi, Roberto Caterina and Marco Costa (eds.), Proceedings of the ICMPC 9 (International
Conference on Music Perception and Cognition), Bologna, August, 22–26, 2006: 1044–1051.
Poggi, Isabella 2006b. Le Parole del Corpo. Introduzione alla Comunicazione Multimodale. Rome:
Carocci.
Poggi, Isabella 2007. Mind, Hands, Face and Body. A Goal and Belief View of Multimodal Com-
munication. Berlin: Weidler.
Poggi, Isabella (ed.) 2008. La Mente del Cuore. Le Emozioni nel Lavoro, nella Scuola, nella Vita.
Rome: Armando.
Poggi, Isabella 2011. Irony, humour ad ridicule. Power, image and judical rhetoric in an Italian
political trial. In: Robert Vion, Robert Giacomi and Claude Vargas (eds.), La corporalité du
langage : Multimodalité, discours et écriture, Hommage à Claire Maury-Rouan. Aix-en Provence:
Publications de L’Université de Provence.
Poggi Isabella 2011. Music and leadership: the Choir Conductor’s multimodal communication.
In Gale Stam and Mika Ishino (eds.), Integrating Gestures. The interdisciplinary nature of ges-
tures, 341–353. Amsterdam: John Benjamins.
Poggi, Isabella, Federica Cavacchio and Emanuela Magno Caldagnetto 2007. Irony in a judi-
cial debate: analyzing the subtleties of irony while testing the subtleties of an annotation
scheme. Language Resources and Evaluation 41(3–4): 215–232.
40. Mind, hands, face and body 647

Poggi, Isabella and Francesca D’Errico 2009. Social signals and the action-cognition loop. The case
of overhelp and evaluation. Proceedings of the 1St IEEE International Workshop on Social
Signal Processing, Amsterdam, September 13, 2009.
Poggi, Isabella, Francesca D’Errico and Alessia Spagnolo 2010. The embodied morphemes of
gaze. In: Stefan Kopp and Ipke Wachsmuth (eds.), Gesture in Embodied Communication
and Human-Computer Interaction, GW 2009, LNAI 5934, 34–46. Berlin: Springer.
Poggi, Isabella and Emanuela Magno Caldognetto 1997. Mani che Parlano. Gesti e Psicologia della
Comunicazione. Padua, Italy: Unipress.
Poggi, Isabella, Emanuela Magno Caldognetto, Federica Cavicchio, Florida Nicolai and Silvia M.
Sottofattori 2007. Planning and generation of multimodal communicative acts. Poster pre-
sented at the III International ISGS Conference, Evanston (Ill.), 15–18 June 2007.
Poggi, Isabella and Catherine Pelachaud 2008. Persuasion and the expressivity of gestures in hu-
mans and machines. In: Ipke Wachsmuth, Manuela Lenzen and Gunther Knoblich (eds.), Em-
bodied Communication in Humans and Machines, 391–424. Oxford: Oxford University Press.
Poggi, Isabella, Catherine Pelachaud and Berardina De Carolis 2001. To display or not to display?
Towards the architecture of a reflexive agent. Proceedings of the 2nd Workshop on Attitude,
Personality and Emotions in User-adapted Interaction. User Modeling 2001, Sonthofen (Ger-
many), 13–17 July 2001.
Poggi, Isabella and Emanuela Roberto 2007. Towards the lexicon of gaze. An empirical study. In:
Andrzej Zuczkowski (ed.), Relations and Structures. Proceedings of the 15th International Sci-
entific Convention of the Society for Gestalt Theory and its Application, Macerata, May 24–27.
Poggi, Isabella and Laura Vincze 2008. The persuasive import of gesture and gaze. Proceeding on
the Workshop on Multimodal Corpora, LREC, Marrakech, 46–51.
Posner, Roland and Massimo Serenari 2001. Il grado zero della gestualità: dalla funzione pratica a
quella simbolica-alcuni esempi dal Dizionario berlinese dei gesti quotidiani. In: Emanuela
Magno Caldognetto and Piero Cosi (eds.), Atti delle 11e Giornate del Gruppo di Fonetica Sper-
imentale: Multimodalità e Multimedialità della Comunicazione. Padova, November 29–Decem-
ber 1, 2000, 81–88. Padua, Italy: Unipress.
Rector, Monica, Isabella Poggi and Nadine Trigo (eds.) 2003. Gestures. Meaning and Use. Porto:
Universidade Fernando Pessoa.
Rimé, Bernard and Loris Schiaratura 1991. Gesture and speech. In: Robert Feldman and Bernard
Rimé (eds.), Fundamentals of Nonverbal Behavior, 239–284. New York: Cambridge University
Press.
Saussure, Ferdinand de 1916. Cours de Linguistique Générale. Paris: Payot.
Serenari, Massimo 2003. Examples from the Berlin dictionary of everyday gestures. In: Monica
Rector, Isabella Poggi and Nadine Trigo (eds.), Gestures. Meaning and Use, 15–32. Porto: Uni-
versidade Fernando Pessoa.
Stokoe, William C. 1978. Sign Language Structure: An Outline of the Communicative Systems of
the American Deaf. Silver Spring, MD: Linstock Press.
Vincent, Jocelyne and Cristiano Castelfranchi 1981. On the art of deception: How to lie while say-
ing the truth. In: Herman Parret, Marina Sbisà and Jef Verschueren (eds.), Possibilities and
Limitations of Pragmatics, 749–778. Amsterdam: John Benjamins.
Volterra, Virginia 1987. LIS. La Lingua Italiana dei Segni. Bologna: Il Mulino.

Isabella Poggi, Rome (Italy)


648 IV. Contemporary approaches

41. Nonverbal communication in a functional pragmatic perspective
1. Functional pragmatics
2. The senses and their communicative use
3. The analysis of nonverbal communication: Conceptualisations and misunderstandings
4. Means of communication of nonverbal communication
5. The systematics of nonverbal communication
6. Concordance versus discordance of verbal and nonverbal communication
7. Basic analytic notions – movement, change, expression
8. Analysing nonverbal communication
9. The position of nonverbal communication in the systematics of linguistic action
10. Transcription of nonverbal communication
11. References

Abstract
This chapter presents an overview of nonverbal communication from a functional pragmatic perspective. Starting from a short discussion of the notion of multimodality within functional pragmatics, the chapter focuses on the means of nonverbal communication, offers a systematics of nonverbal communication, introduces basic analytic notions, and discusses the relation of nonverbal communication to linguistic action as well as its transcription.

1. Functional pragmatics
1.1. Form, function, and linguistic action
Functional pragmatics (FP) is a theory of linguistic action (Ehlich 2007a; Redder 2008;
Rehbein and Kameyama 2003; Thielmann 2013). Functional pragmatics conceives of
linguistic action as a combination of action purposes and linguistic forms that can
be used to achieve these purposes. The functional pragmatic analysis of linguistic action is thus based on a close nexus of function and form. Languages are seen as elaborated
systems for the realisation of the interactors’ purposes. Languages are resources for
the interactors’ activities. Linguistic interaction is the most important type of human
communication.

1.2. Action vs. behavior


Humans differ from other species particularly in their capacity for purposeful action. Consequently, the category of "purpose" stands at the centre of human action. The historicity of human actions manifests itself in this category: purpose gives actions a past, i.e. previous experience, and a future. The anticipation of action results thus allows for a dissociation from the immediacy of mere behavior as seen in animals. This difference is abolished in analyses that conceptualise human communication simply as behavior; such analyses make the specifically human elements of communication disappear. As a subsystem of human action, human communication participates in the general characteristics of action.
1.3. Reflected empiricism


A central aspect in the work of functional pragmatics is the role of empirical linguistic
data. From its beginning in the 1970s, functional pragmatics has collected oral and writ-
ten data and systematically subjected them to linguistic analysis. The data collection
process, in turn, is dependent upon functional pragmatic theory formation. This methodology is characterised as a "reflected empirical approach" and shares some aspects with grounded theory.

1.4. Multimodality and the functional pragmatic analysis of nonverbal communication
In its basic form, oral communication is face-to-face communication. Linguistic data de-
rived from such communication is therefore multimodal data, i.e. data in which verbal,
nonverbal and paralinguistic communicative factors co-occur. From its beginnings,
functional pragmatics has been observant of this fact and developed transcription sys-
tems in accordance with these requirements.
In their 1982 book on eye communication, Ehlich and Rehbein develop, in an exemplary way, the theoretical foundations and a research programme for the analysis of the nonverbal aspects of communication and the methodological desiderata involved. The theory and the methodology outlined in this book have so far been applied in some studies (see Berkemeier 2003, 2006; Bührig 2005; Ehlich and Rehbein 1977; Grasser and Redder 2011; Hanna 2003). However, the research programme presented there deserves further application in order to provide a comprehensive functional pragmatic contribution towards the analysis of nonverbal communication (NVC).

2. The senses and their communicative use


Primarily, languages are phonic forms of communication (the term “phonic” is intended
as an architerm for “phonetic” and “phonological”). Languages make use of the acous-
tic domain out of the five dimensions of reality accessible to the senses. A different line
of language formation is the use of the visual dimension in sign languages (cf. Wrobel
2007). Writing makes use of the visual domain as well, but its dependence on spoken
languages gives it a characteristic altogether different from sign languages.
The human species makes use of the acoustic and the visual dimensions of reality for
its communicative needs. The other dimensions are used in a subsidiary fashion at
most – if they are used at all. In contrast to this, animal communication avails itself
of the olfactory dimension across the species.
The acoustic and the visual dimensions are used by more highly developed and socially organised species. With the emergence of languages, the human species has developed a communication system of a high degree of differentiation and far-reaching usability.
As linguistic communication is primarily phonic, the visual dimension of human
communication is of relatively minor importance. Nevertheless, this dimension contains
its own possibilities for communication. This holds true in the case of early biographical
phases in human communication, i.e. before language has been acquired by a child,
and it holds true in later phases when visual communication occurs in combination
with verbal language. From a systematic point of view these differences make visual
communication a diffuse type of phenomenon – with important consequences for research into nonverbal communication.

3. The analysis of nonverbal communication: Conceptualisations and misunderstandings
The term "nonverbal communication" hides the diffuseness of the communicative structures it denotes. The negative definition by means of the prefix "non-" hints at the difficulties of conceptualisation. The analysis of nonverbal communication requires a systematic distinction of the various forms of appearance that are characteristic of nonverbal communication.
For the analysis of sound-based languages alphabetical writing systems offer pre-
theoretical procedures that allow for good access to a variety of the languages’ charac-
teristics. There is no comparable set of analytical pre-categorisations with regard to
nonverbal communication.
The pre-linguistic aspect of nonverbal communication refers to communicative procedures that are common to the human species, such as laughing and weeping. Laughing and weeping make psycho-physical states and events visible and thus communicable to others. The elementary importance of this preverbal communicative sub-system gave rise to conceptual transfers to nonverbal communication as a whole. This happened in the line of research going back to Charles Darwin (cf. Darwin 1872), who considered nonverbal communication in a purely biologistic manner. Hand in hand with such a view goes the notion of nonverbal communication as a universal type of communication, free from the dilemmas of the language-specific features of verbal communication. A great number of ethnographic research results have falsified this universalist view. Nevertheless, it continues to influence theory formation around nonverbal communication. The means of expression used in nonverbal communication are generic; their use, however, is language- or culture-specific. Likewise, the expressions of psychological states and events by physical means of expression, though generic, are culturally transformed and constitute a subset of nonverbal communicative structures. Their involuntariness is subsumed under the possibilities of voluntary use. In the process of becoming communicative, the biological structures adopt linguistic characteristics.

4. Means of communication of nonverbal communication


The communicative employment of the senses endows them with a specific double character: neither the acoustic sense nor the visual sense is primarily developed for communicative purposes. This double character affects their primary functions in various ways.

4.1. Body parts and their visibility


The body is “body” in the original sense of the term, namely an extended thing
(Descartes: res extensa). As such it is visible in its entirety. The different parts of the
body are equally visible. The latter are usable for communicative purposes in different
ways and to different degrees. For nonverbal communication purposes it is useful to dif-
ferentiate systematically between communication by use of the body in its entirety, by the
use of the face, and by the use of hands and fingers. Other parts of the body, though they may also be used for communicative purposes, seem less apt for this purpose, all the more since human beings hide some parts of the body and, by contrast, disclose others. This causes further restrictions on the usability of the body and its parts in communication.

4.2. Proxemics – mimics – gesture


The reference to the body as a whole, to the face, and to the hands and fingers identifies
three different domains of nonverbal communication, namely proxemics, mimics and ges-
ture. A theory of proxemics analyses the relationship of closeness and distance of the bodies
of communication partners as communicative phenomena. Mimics refers to the intensive
modification potential of facial expression. The extension of hands and fingers in a space
close to the body as a whole allows for their use for nonverbal communication purposes.
Each of these three domains displays its genuine features and its genuine restrictions.
These features and restrictions acquire communicative quality with reference to the
expressive potentials inherent in them.

4.3. Repertoires of expression of the body and its parts and their
aptitude for communicative purposes
The double character of the visual perceptibility of the body in communicative contexts is determined by the specific primary functions of the body and its parts, which constitute both a specific potential and a specific restriction. As compared with the 360-degree perceptibility in the acoustic dimension, the visual dimension of human beings is characterised by a limited visual angle of slightly more than 180 degrees; everything that happens behind the body remains invisible. Turning the body renders invisible everything that had been visible before, in order to make visible what had not been visible before. The basic front-rear dichotomy that characterises the human body (as it characterises many other animals) is
a strong restriction for the use of the body as a means of communication. On the other
hand, the possibility of turning the body towards the other offers a communicative
potential that is largely used for nonverbal communication. Face-to-face communica-
tion genuinely is communication with the front of the body turned to the front of the
body of the other. The management of proximity and distance is of basic importance
in communication. It is part of the elementary cooperation without which communication
is impossible (cf. Ehlich 2007b).
The face, the eyes and the mouth are the essential means of expression that are used
for mimetic communication. Forehead and chin are more restricted with regard to their
usability for communicative purposes. Nose and ears have only a small repertoire of
movement that may be used for communicative purposes.
The movability of arms and hands extends the space that is occupied by the body as
res extensa into a virtual space that is limited by the maximal extension that arms or
hands can reach. Since arms and hands are integrated into the front-rear orientation, the visual domain can easily be transferred into the haptic dimension (touch; cf. Dreischer 2001).
Other parts of the body (chest, belly, hips, lower extremities) offer only a few possibilities of expression. Their communicative use is strongly restricted in different cultures.
5. The systematics of nonverbal communication


Within human communication, nonverbal communication forms a specific sub-system.
Compared to the importance of acoustically-mediated communication and its visual
transformation, i.e. writing, nonverbal communication appears to be less relevant.
However, nonverbal communication can, so to speak, embrace and support verbal
communication and thus significantly influence its reception.
The specifics of nonverbal communication as a sub-system of human communication
make it necessary to systematically differentiate different types of nonverbal com-
munication. All of these types avail themselves of the means of expression described
above, and their categorisation has to be derived from their relationship to verbal
communication.
Probably the most frequent form of nonverbal communication is concomitant, i.e.
nonverbal communication accompanies verbal communication. In contrast to this,
there is independent nonverbal communication, i.e. nonverbal communication that
does not rely on or support verbal communication in order to be understood.
Within concomitant nonverbal communication there is the type of neutral nonverbal
communication, i.e. nonverbal communication that appears to be occurring as a matter
of course and does not require special attention on the part of the listener (typical in-
stances of neutral nonverbal communication are gestures accentuating the prosody of
speech). Only the absence of neutral nonverbal communication calls for attention –
resulting in a type of nonverbal communication that is definitely not neutral.
Because of the multimodality of communication, concomitant nonverbal communi-
cation may also be non-neutral (“eigenlinig”) – a typical example of non-neutral nonverbal
communication is winking at someone while speaking.
Independent nonverbal communication comprises at least two types: presentational
and ostentatious nonverbal communication. An example of presentational nonverbal communication is the pointing gesture as a nonverbal answer to the question where something is located (if the gesture were accompanied by the deictic there, this would be an instance of non-neutral nonverbal communication). The focus of research is very
much on independent nonverbal communication, as the characteristics and specialities
of nonverbal communication are the most obvious here.
The second type of independent nonverbal communication is ostentatious nonverbal
communication. Ostentatious nonverbal communication is hyper-marked, so to speak,
and, by drawing on the interactants’ nonverbal communication-knowledge, fulfils
specific communicative purposes.
                        nonverbal communication
                       /                       \
            concomitant NVC                independent NVC
            /            \                 /              \
      neutral NVC  non-neutral NVC  presentational NVC  ostentatious NVC

Fig. 41.1: Systematics of NVC


From the right to the left, the salience of nonverbal phenomena in Fig. 41.1 decreases,
while, at the same rate, the difficulty of analysis increases. Neutral concomitant nonver-
bal communication thus presents the greatest difficulties to analysis; ostentatious non-
verbal communication is quite easy to deal with, but its analysis has only little to offer in
comparison.

6. Concordance versus discordance of verbal and nonverbal communication
Concomitant nonverbal communication is a part of ordinary oral communication.
Hence, communicative action units are complex in that they consist of verbal commu-
nicative action units, nonverbal communication and paralinguistic phenomena. Vice
versa: The communicative purpose of communicative action units manifests itself in
these dimensions – dimensions that are usually concordant. However, it is possible
that these dimensions of a communicative unit are in disagreement with each other
and thus lead to an almost total inconsistency. Such inconsistencies, i.e. discordant com-
municative action units, can be used consciously to various degrees, for instance in
order to show irony by discrediting an assertive utterance with respect to its illocution-
ary force or its truth value. In the context of communicative pathology, discordance is
frequently responsible for “double bind” situations.
The temporal relationship between nonverbal communication, verbal communica-
tion, and paralinguistic communication is difficult to determine, but quite possibly of
great communicative relevance. Assuming synchronicity as the basic mode, nonverbal communication may nevertheless even be significantly ahead of verbal communication.

7. Basic analytic notions – movement, change, expression


In principle, the body and its parts are moveable. The fixation of movement is a border
case of movement and can be understood as the “position” of the body or the body
part. The body and the various body parts have specific movement potentials. These po-
tentials are derived from the functions of each body part, basically from its physical func-
tions. The use of the body or its parts for communicative purposes subsumes these
movement potentials under a second domain of purposes. Specific forms of expression
are derived from the movement potentials as their physical basis. They are combined
to form a repertoire of expression. Since expression is bound to change, the possibilities
of movement are optimally qualified to fulfil this task. Movements have characteristic
contours. From the repertoires of expression, expressive units are shaped using the movement potentials in a way that is communicatively recognisable. These expressive units can
be described as structures that are realised for the expressive purpose by use of the move-
ment potentials as their material basis. The development of the repertoires of expression
exhibits a differentiated mix of anthropological features on the one hand, and group or
culture specific selections on the other hand. The various nonverbal communication
types that constitute the systematics of nonverbal communication make different use
of the repertoires of expression. The single expressive unit can only be understood by ref-
erence to its relationship to the systematics as a whole – in the same way as the single
sound or phonemic unit in the sound system of a language can only be identified by
reference to this system and by its interrelationship with the other phonemic units.
A set of everyday linguistic terms can be used for the analysis of the repertoires of expression and of their expressive units, e.g. "winking" or "waving". These terms combine to form a semantic field of their own. Such everyday terms can be taken as a starting point of analysis, but they always carry the danger of misleading the analysis and of missing the systematic structure of the communicative units and their inner connexion. In many cases, everyday terms are applicable in everyday communication precisely because of their indeterminateness and lack of precision. Scientific analysis faces the task of aiming at precision with reference to the phenomena under investigation and their systematic interrelationship. A simple use or transposition of everyday terminology will not suffice to achieve adequate results. The history of nonverbal communication research is full of examples of quid pro quos and of ambivalence caused by the uncritical application of everyday terms to the analysis.
Expressive units are understandable and obligatory for the members of a communi-
cative group. Their use follows the communicative purposes of the communicative in-
teractors. Hence, expressive units are more than indications, indexes or telltale signs.
In communicative interaction, it is not a question of inferring the intrapsychological states of affairs of the interlocutor by "decoding" nonverbal communication units that might disclose what would otherwise be hidden. On the contrary, what is at stake are acts of
understanding that are structured in a way similar to those of verbal communication.
However, because of the limitations of the movement potentials nonverbal communica-
tion cannot achieve the same degree of differentiation as verbal communication. Further-
more, the ties to face-to-face communication situations are difficult to cut for nonverbal
communication. This has drastic consequences for the extension of nonverbal communi-
cation as a communicative subsystem. The “double articulation” characteristic of verbal
languages (Martinet 1960) is not applicable to nonverbal communication. The single
expressive units are characterised by an inner complexity that is difficult to dissolve.

8. Analysing nonverbal communication


The analysis of the transformation of movement potentials into repertoires of expres-
sion for nonverbal communication purposes requires a series of analytical steps (cf.
Heilmann 2005):

(i) The movement possibilities of the individual movement potential have to be identified.
(ii) Their functional transposition into expressive units has to be reconstructed.
(iii) The communicative function of the single expressive unit has to be determined by
reference to the respective nonverbal communication subsystem.
(iv) The functional qualification of the repertoire of expression as a whole and the
functional quality of the single expressive units have to be identified and described.
(v) Their possible fields of application in specific communicative constellations have to
be identified.

These analytical tasks necessitate a long-lasting research process that is structured in a hermeneutic way. This process demands extensive empirical work and, at the same time, reflection on this work's theoretical foundations. It is not likely that the research process can be significantly shortened, for instance by recourse to alleged
universals. The research process most probably will yield fragmentary results for quite a
time. These results may be produced with respect to the various analytical steps named
above, such as descriptions of movement potentials or of single repertoires of expression or single expressive units, or they may refer to close investigations into the systematics or into parts of it. Since the achievement of a consistent conceptualisation of nonverbal communication as a subsystem of purposeful human action is of fundamental importance, it will be essential that none of the above-mentioned aspects of analysis is isolated from the others or proclaimed the only one of relevance.
As an example of a comprehensive analysis that demonstrates this methodological procedure in detail, the above-mentioned book by Ehlich and Rehbein (1982) gives an account of eye communication and of one specific expressive unit of this communicative sub-domain, a unit called "deliberative gaze avoiding" (deliberatives Wegblicken).
The functional pragmatics research programme with regard to nonverbal communica-
tion comprises the quantitative extension of similar analyses that deliver precise and de-
tailed descriptions of expressive units of nonverbal communication. One basic task of
these analyses is a critical reconstruction of the results that have been reached so far
in the rich literature on nonverbal communication by re-interpreting these results in
terms of the above mentioned categorical framework.
Research into factual communication, which has increased steadily during the last
40 years, aims at quick and comprehensive results. However, such results cannot be
achieved by short cuts and premature generalisations or universalisations. The func-
tional pragmatics research programme for nonverbal communication is aware of the
necessity of extensive efforts. Anticipations and preliminary categories will be indis-
pensable during the analytical process – as, for instance, descriptions of nonverbal com-
munication have to be entered into transcriptions of spoken language. It will be
necessary, however, to keep in mind the preliminary and restricted character of such
categories. Such preliminary categories can consist of everyday terms or of descriptions
of movement repertoires or their parts at a time when the reconstruction of their pur-
pose has not yet been achieved. This is why this research process requires great meth-
odological care and attention. However, wherever a precise description has been
achieved, its use for further development of analysis will be evident.

9. The position of nonverbal communication in the systematics of linguistic action

The most direct way to determine the position of nonverbal communication in the
overall context of linguistic action and its forms is the analysis of presentationally
independent nonverbal communication units. These units are used to realise a specific
type of linguistic procedure. Linguistic procedures are the linguistic units that underlie
linguistic acts (the illocutionary act, the propositional act, the utterance act). Five types
of procedures can be identified (cf. Ehlich 2007a; Thielmann 2013):

(i) symbolic procedures (e.g., by means of denotative nouns)
(ii) deictic procedures (e.g., by means of demonstratives)
(iii) incitive procedures (e.g., by means of interjections)
(iv) expressive procedures (e.g., by paralinguistic phenomena such as hyper accentuation)
(v) operative procedures (e.g., by means of articles or anaphors)
656 IV. Contemporary approaches

Each of these procedure types is part of a functional nexus, a linguistic field (in the
sense of Bühler 1934). Though most of the procedures are integrated into linguistic
acts, when a linguistic action is performed, a small group of procedures can stand alone
and can sufficiently execute a communicative purpose without further integration into
higher order linguistic acts or actions. The isolated articulation of the deictic “here” is
one example. Its utterance is fully sufficient to fulfil the purpose of a deictic procedure,
namely to orientate the hearer in his/her immediate environment. Such self-sufficient
procedures are primary candidates for realisation by a presentationally independent
nonverbal communicative unit, namely the gesture of pointing. In such a case, the
hearer's (or "viewer's", as it were) orientation can be accomplished completely by means of the nonverbal
communication unit without any accompanying verbal utterance. Self-sufficiency is also
possible in the case of incitive procedures: "waving" can suffice to make another person
turn towards the interactor, in just the same way as a loud "hello" would.
It is much more difficult to identify the position of concomitant neutral nonverbal
communication units in the overall systematics of linguistic action. In the case of
these nonverbal communication units, it is possible that expressive procedural aspects
are specifically combined with operative procedures. Expressive procedures serve the
purpose of emotionally aligning the speakers with their communicative interactors. In
European languages the usual mode of expression is paralinguistic in nature, i.e. expres-
sive procedures occur in procedural combination with other procedures, esp. with
symbolic procedures. Such a function can also be realised by means of a nonverbal
communication unit that co-occurs with the verbal utterance.
Operative procedures are used to process the linguistic activities themselves. Assis-
tance in the structuring of a verbal utterance by means of gesture (cf. Müller 1998)
may serve one of the purposes of operative procedures. The acquisition of these
communicative skills obviously starts very early in childhood (cf. Leimbrink 2010).
The other types of nonverbal communication from Fig. 41.1 need specific analyses to
determine their functions in the framework of the systematics of linguistic actions.

10. Transcription of nonverbal communication


Nonverbal communicative units stand in complex temporal relationships with verbal and
with paralinguistic units. Transcription of face-to-face communication data needs a system
that represents these temporal relationships adequately. Extreme precision is necessary
to identify the microchronic divergences that may occur in interaction when verbal
communication is discordant with concomitant non-neutral nonverbal communication,
a discordance that can have strong communicative consequences.
The transcription system HIAT (Halbinterpretative Arbeitstranskriptionen, half-
interpretative working transcriptions) in its version HIAT 2 that was developed in
the late 1970s offers a systematic format for the representation of nonverbal communi-
cation by the use of a score notation system (Ehlich 1993; Ehlich and Rehbein 1981a,
1981b; Redder 2001; Rehbein et al. 2004). The system offers specific transcription lines
in the score for the various body parts that are involved in nonverbal communication.
The descriptive notation of nonverbal communication can either make use of conso-
lidated terms for specific nonverbal communication units, or it can give precise descrip-
tions of the observable occurrences of the movement and the movement contours of the
body parts that are involved. As expressive units are identified more and more precisely
in terms of linguistic action, transcription work will become increasingly reliable and
usable for linguistic analysis.
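The score notation described above can be pictured in software as parallel, time-aligned tiers, one per articulator, with events sharing a common time axis. The following is only a minimal illustrative sketch of that idea, not an implementation of HIAT; the `Score` class, the tier names, and the example events are invented for the illustration.

```python
# Sketch of a score-like transcription: parallel tiers, one per articulator,
# with events aligned on a shared time axis. Class and tier names are
# illustrative only, not part of HIAT.

class Score:
    def __init__(self):
        self.tiers = {}  # tier name -> list of (start, end, label)

    def add(self, tier, start, end, label):
        self.tiers.setdefault(tier, []).append((start, end, label))

    def at(self, t):
        """Return what every tier displays at time t (in seconds)."""
        return {
            tier: next((lab for s, e, lab in events if s <= t < e), None)
            for tier, events in self.tiers.items()
        }

score = Score()
score.add("verbal", 0.0, 1.2, "she didn't say a word")
score.add("gaze", 0.0, 0.6, "towards hearer")
score.add("right hand", 0.4, 1.0, "palm forwards")

snapshot = score.at(0.5)
print(snapshot["verbal"], "|", snapshot["right hand"])
```

Reading the score vertically at one instant, as `at()` does here, is what makes microchronic divergences between tiers visible.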
As has been said above, nonverbal communication, seen in a functional pragmatic
perspective, is a highly complex field of analysis. Analytic knowledge with regard to
the expressive units that make up the nonverbal communication of a specific
communication community is still very rudimentary. A continuous enrichment and
enlargement of our knowledge of nonverbal communication, of its units and its structures,
is indispensable for achieving progress. For the further development of the theory of
functional pragmatics and for concrete analytical work in its context, this knowledge is
an important desideratum.

11. References
Berkemeier, Anne 2003. Wie Schüler(innen) ihr nonverbales Handeln beim Präsentieren und
Moderieren reflektieren. In: Otto Schober (ed.), Körpersprache im Deutschunterricht, 58–72.
Baltmannsweiler: Schneider Verlag.
Berkemeier, Anne 2006. Präsentieren und Moderieren im Deutschunterricht. Baltmannsweiler:
Schneider Verlag.
Bühler, Karl 1934. Sprachtheorie. Die Darstellungsfunktion der Sprache. Jena: Fischer. [Transla-
tion: Theory of language. The representational function of language. Translated by Donald
Fraser Goodwin, 1990. Amsterdam: John Benjamins].
Bührig, Kristin 2005. Gestik in einem inszenierten Fernsehinterview. In: Kristin Bührig and Frank
Sager (eds.), Nonverbale Kommunikation im Gespräch, 193–215. (Osnabrücker Beiträge zur
Sprachtheorie (OBST) 70.) Oldenburg: Redaktion OBST.
Darwin, Charles 1872. The Expression of the Emotions in Man and Animals. London: Murray.
Dreischer, Anita 2001. “Sie brauchen mich nicht immer zu streicheln…” Eine diskursanalytische
Untersuchung zu den Funktionen von Berührungen in medialen Gesprächen. (Arbeiten zur
Sprachanalyse 39.) Frankfurt am Main: Peter Lang.
Ehlich, Konrad 1993. HIAT – A transcription system for discourse data. In: Jane Edwards and
Martin D. Lampert (eds.), Talking Data: Transcription and Coding in Discourse Research,
123–148. Hillsdale, NJ: Lawrence Erlbaum.
Ehlich, Konrad 2007a. Funktional-pragmatische Kommunikationsanalyse – Ziele und Verfahren.
In: Konrad Ehlich, Sprache und sprachliches Handeln, Volume 1, 9–28. (Pragmatik und
Sprachtheorie.) Berlin: De Gruyter.
Ehlich, Konrad 2007b. Kooperation und sprachliches Handeln. In: Konrad Ehlich, Sprache und
sprachliches Handeln, Volume 1, 125–137. (Pragmatik und Sprachtheorie.) Berlin: De Gruyter.
Ehlich, Konrad and Jochen Rehbein 1977. Wissen, kommunikatives Handeln und die Schule. In:
Herma C. Goeppert (ed.), Sprachverhalten im Unterricht, 36–114. Munich: Fink/UTB.
Ehlich, Konrad and Jochen Rehbein 1981a. Die Wiedergabe intonatorischer, nonverbaler und ak-
tionaler Phänomene im Verfahren HIAT. In: Annemarie Lange-Seidl (ed.), Zeichenkonstitu-
tion, Volume 2, 174–186. Berlin: De Gruyter.
Ehlich, Konrad and Jochen Rehbein 1981b. Zur Notierung nonverbaler Kommunikation für dis-
kursanalytische Zwecke (Erweiterte halbinterpretative Arbeitstranskriptionen (HIAT 2)). In:
Peter Winkler (ed.), Methoden der Analyse von Face-to-Face-Situationen, 302–329. Stuttgart:
Metzler.
Ehlich, Konrad and Jochen Rehbein 1982. Augenkommunikation. Methodenreflexion und Beispiel-
analyse. (Linguistik Aktuell 2.) Amsterdam: John Benjamins.
Grasser, Barbara and Angelika Redder 2011. Schüler auf dem Weg zum Erklären – eine
funktional-pragmatische Fallanalyse. In: Petra Hüttis-Graff and Petra Wieler (eds.), Übergänge
zwischen Mündlichkeit und Schriftlichkeit im Vor- und Grundschulalter, 57–78. Freiburg:
Fillibach.
Hanna, Ortrun 2003. Wissensvermittlung durch Sprache und Bild. Sprachliche Strukturen in der in-
genieurwissenschaftlichen Hochschulkommunikation. (Arbeiten zur Sprachanalyse 42.) Frank-
furt am Main: Peter Lang.
Heilmann, Christa M. 2005. Der gestische Raum. In: Kristin Bührig and Frank Sager (eds.),
Kommunikation im Gespräch, 117–136. Oldenburg: Redaktion OBST.
Leimbrink, Kerstin 2010. Kommunikation von Anfang an. Die Entwicklung von Sprache in den
ersten Lebensmonaten. Tübingen: Stauffenburg.
Martinet, André 1960. Eléments de Linguistique Générale. Paris: Armand Colin.
Müller, Cornelia 1998. Redebegleitende Gesten. Kulturgeschichte – Theorien – Sprachvergleich.
Berlin: Berliner Wissenschaftsverlag.
Redder, Angelika 2001. Aufbau und Gestaltung von Transkriptionssystemen. In: Klaus Brinker,
Gerd Antos and Wolfgang Heinemann (eds.), Text- und Gesprächslinguistik. Ein internatio-
nales Handbuch, 2. Halbband, 1038–1059. (Handbücher zur Sprach- und Kommunikationswis-
senschaft, 16.2.) Berlin: De Gruyter.
Redder, Angelika 2008. Functional Pragmatics. In: Gerd Antos and Eija Ventola (eds.), Interper-
sonal Communication, 133–178. (Handbook of Applied Linguistics 2.) Berlin: De Gruyter.
Rehbein, Jochen and Shinichi Kameyama 2003. Pragmatik. In: Ulrich Ammon, Norbert Dittmar,
Klaus Mattheier and Peter Trudgill (eds.), Sociolinguistics. An International Handbook of the
Science of Language and Society, 556–588. (Handbücher zur Sprach- und Kommunikationswis-
senschaft 3.1.) Berlin: De Gruyter.
Rehbein, Jochen, Thomas Schmidt, Bernd Meyer, Franziska Watzke and Annette Herkenrath
2004. Handbuch für das computergestützte Transkribieren nach HIAT. (Serie B 56.) Hamburg:
Arbeiten zur Mehrsprachigkeit.
Thielmann, Winfried 2013. Konrad Ehlich. In: Carol Chapelle (ed.), Encyclopedia of Applied
Linguistics. Oxford: Blackwell.
Wrobel, Ulrike 2007. Raum als kommunikative Ressource. Eine handlungstheoretische Analyse vi-
sueller Sprachen. (Arbeiten zur Sprachanalyse 47.) Frankfurt am Main: Peter Lang.

Konrad Ehlich, Berlin (Germany)

42. Elements of meaning in gesture: The analogical links

1. Co-speech gestures as analogical signs
2. Corpora
3. Methodology
4. Identifying gestural signs by analyzing relations between gestures and notions
5. Gestural sign, speech, and thought
6. To conclude
7. References

Abstract
My interest essentially lies in the specificity of co-speech gesture and in its mode of sym-
bolic functioning. I argue for viewing gesture as a symbolic system in its own right that
interfaces with thought and speech production.

1. Co-speech gestures as analogical signs


Our body is simultaneously the source and the displayer of sensations, the producer of
movements, postures, behaviours, actions, and signs that, from the simple physical point
of view, are all interactions of the body with its environment. The body, the seat of
symptoms and reflex actions, is a producer of actions and signs. To what extent could
the latter, via a relation of contiguity or resemblance, be derived from the former?
Reactive behaviours that serve as indicators of a person’s affective or psychic state
can be deliberately reproduced in order to signify the state that they naturally indicate.
By way of gesture, the human being re-expresses all the vibrant interactions that issue
from his culturally influenced or determined perceptual-motor experience (Mauss
[1934] 1950), whether they be deeply felt, proprioceptively sensed, or externally ob-
served and cognitively integrated. This is the conclusion I have come to after many
years of empirical research into questions about the semantic contribution of nonverbal
elements during language use.

1.1. Experience of the physical world


Gestural representation draws upon our common experience of the physical world.
Thus, by imitating how an instrument is handled, we can refer to each one of the com-
ponents involved in the action, to each element in the operational chain: the actor, his
action of using the instrument, the instrument itself, the action of the instrument on
an object to achieve a goal, and the object. Hence the sequence “subject-action-
instrument-action-object” forms what I call an action schema. For example, one can
mime holding a fishing rod to refer to: the angler, his handling of the rod, the rod itself,
the act of fishing, or the fish attracted to the bait at the end of the line. Furthermore, this
action schema can be applied to another object and metaphorically represent the way in
which a person is baited in order to trick him. By analogy, the gesture presents the rel-
evant element (handling the fishing rod) referred to in the action schema, and the con-
text enables the interlocutor to identify the segment of the action schema to which the
speaker is referring. This simple example shows us that both the encoding and the
decoding of a representational gesture imply metaphoric and/or metonymic cogni-
tive processes that are inspired by links of resemblance or contiguity between what
we experience in the physical world and the gestures we perform.
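The action schema and the context-driven, metonymic selection of one of its segments can be sketched schematically as follows. The fishing schema's segment labels and the contextual cues are invented for the example; this is only an illustration of the selection mechanism described above, not an analytical tool.

```python
# Sketch: an action schema as an ordered chain of segments, with the
# context picking out the segment a gesture refers to. All data are
# illustrative.

FISHING_SCHEMA = {
    "subject": "the angler",
    "action1": "handling the rod",
    "instrument": "the fishing rod",
    "action2": "the act of fishing",
    "object": "the fish attracted to the bait",
}

# Invented mapping from contextual cues to schema segments.
CONTEXT_CUES = {
    "who": "subject",
    "with what": "instrument",
    "doing what": "action2",
}

def interpret(gesture, schema, cue):
    """A mimed gesture evokes the whole schema; context selects a segment."""
    segment = CONTEXT_CUES[cue]
    return f"{gesture} -> {schema[segment]}"

print(interpret("miming holding a fishing rod", FISHING_SCHEMA, "with what"))
```

The point of the sketch is that one and the same gesture maps to different referents depending solely on which cue the context supplies.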
The interpretation of a gesture requires us to reason on the physical level because
that is where the semiotic process is initiated. Salience given to a different selection
of a gesture’s physical features induces different representations.
For example, the palm facing forwards presenting a flat vertical surface in front of
the body, a configuration which I call Palm Forwards, can move forwards to push for-
wards or to counter an approaching aggressive force. Thanks to our physical experience,
we recognize the flat palm in this orientation and position as being an active or a reac-
tive force. The gesture simply means what it does in reality: obstruct something, give
resistance to an aggressive force coming towards oneself, protect oneself from it, or
even push it away. It draws its meaning from a physical function (self-protection). It
is or it represents an opposing force, and this primary sign of opposition is subject to
semantic derivation (see 4.2.1 Semantic derivation). The representational gesture, in
this case, is based on an analogy of function (the palm moves forwards to protect
oneself) originating from a link of contiguity between the gesture (the palm moves for-
wards) and its functional meaning (to protect oneself). The analogical link of function is
a link of contiguity.
However, as a vertical surface with a rectangular shape turned away from the body,
Palm Forwards at face level can represent a notice, the relevant element in the action of
putting an announcement on a notice board. The virtual object thus represented can
evoke, via metonymy, the whole action schema and its motive: putting a notice on a
board to make its written content known to everyone. And by further semantic deriva-
tion, outside the domain of the written word, the action of making something known to
everyone as if one were displaying it on a notice board. This example shows us that the
semiotic process that produces representational gesture occurs in stages, in this case
using several links: resemblance of shape (rectangular flat palm and notice), temporal
contiguity (displaying information), and resemblance of motive (public announcement).
The analogical link of shape is a link of physical resemblance, and the contextual meaning
(public announcement) is derived from this physical resemblance.
In sum, the analogical link is the initial link of contiguity or of resemblance estab-
lished through analogy between a physical feature of the gesture and our physical expe-
rience of the world. On the basis of the analogical link, further links of contiguity or
resemblance may come into play to create the contextual meaning of a gesture. We
shall come back to the importance of identifying the analogical link of a gesture in
order to discover the meaning of a gesture in a given context.

1.2. Representation of the physical world


One can gesturally represent an action, a state, an object, an animal, or a person char-
acterized by a distinctive dynamic or static feature. Even when giving an account of a
concrete reality, gestural representation implies a mental process of abstraction that will
influence what is to be characterized (what is the visibly distinctive feature?) and how
this is to be done (the choice of body part(s), the most appropriate movements, symbolic
norms to be respected).
Moreover, a mimetic gesture does not necessarily refer to the act that is being imi-
tated, but it can also, as we have already seen, refer to the idea derived from the result
of the imitated act (fishing > tricked person; putting up a notice > public announce-
ment). From the semiotic point of view, one observes that the link established between
a gesture and its meaning is not direct: it supposes a link of resemblance followed
by a link of contiguity. The gesture does not evoke the act that is imitated but its
consequence.
Once we recognize the analogical link, and therefore a gesture’s deep physical mean-
ing, it modifies how we interpret a gesture in a given context. For example, Palm For-
wards* – the asterisk indicates where the gesture begins – in the context of a narration
could accompany the verbal utterance *elle n’a pas dit un mot ‘she didn’t say a word’.
The reflex action (of raising the hand, palm facing forwards, to protect the face) repre-
sented by the physical elements of the co-speech gesture (palm facing forwards and out-
wards) communicates a deep meaning (self-protection) that underlies and clarifies the
gesture’s contextual meaning (self-protective prudence). The transcript of the multimo-
dal message could be: “fearing detrimental consequences, wisely, *she didn’t say a
word.” The gesture shows the reason for the silence. From the semiotic point of view,
a physical similarity between the gesture’s physical elements and the reflex action
(Palm Forwards) creates a physical metaphor (physical self-protection). This serves
to express an abstract metaphor via a transfer from the physical world (reflex of physical
self-protection) to an abstract notion (non-physical self-protective action).
A deep analysis of the kinesic sign highlights its motivated character. Researching
this motivation leads us to home in on the perceptual-motor experience of the body
in physical interaction with its environment. This return to the origins highlights the
non-conscious, physico-symbolic information conveyed by the gestural sign that thus
operates on several levels of consciousness; quite often, it is its deep motivation, its
non-conscious “symbolic action,” that is revealed to be of highest relevance on the
semantic level during speech production. Knowledge of this root meaning enables us
to explain the spontaneous choice of a particular kinesic expression at the expense
of another and, in this way, to gain a deeper understanding of the utterance.

2. Corpora
My research began by studying attitudes that French people express through co-
occurrent intonational and facial expressions, truly audio-visual nonverbal entities
(Calbris and Montredon 1980). It continued by concentrating solely on the visual mod-
ality, first of all on conventional gestural expressions that can be understood without a
context (emblems), and then on spontaneous gestural expressions that occur during
speech production (co-speech gestures).
Initially, 50 French (Calbris 1980: 245–347) and foreign (Calbris 1981: 125–156) sub-
jects were tested using an experimental film designed to study how 34 conventional
French co-occurrent manual and facial expressions are structured to convey meaning.
The results give indications of the relative pertinence of meaningful physical elements
and the cultural character of these expressions. Moreover, the foreign subjects inter-
preted them as signs that necessarily have a “motivated” origin, that is to say, there
seems to be a natural driving force that has led to their appearance as opposed to an
arbitrary pairing of forms and meanings established by convention.
An outcome of this initial research was the need to verify the motivation of the phys-
ical components of the gestures produced during spontaneous uses of language. This
gave rise to Corpus 1, a very varied collection of about a thousand samples of co-speech
French gestures ethnographically noted in 1981 in the field, for example, in trains and
cafes, as well as selected from media such as films, comedy sketch shows, and television
debates. The semiotic analysis of these gestures, classified according to their physical as-
pects by evaluating the hierarchical relevance of their physical components in view of
their corresponding contextual meanings, was the subject of a doctoral thesis (Calbris
1983), later condensed into a book (1990). A comparative analysis of the data showed
that one gesture can evoke several notions (semantic diversity) and that one notion can
be represented by several gestures (physical diversity). This being the case, how is the
presumably motivated character of gesture maintained? I sought to answer this question
by conducting a further comparative analysis of the data. This revealed the phenomenon
of semantic derivation from either one single physico-semantic link (single motivation)
or one selected from several possible physico-semantic links (plural motivation)
(Calbris 1987: 57–96).
To confirm the results obtained from observations noted in the field, in 1990 I estab-
lished Corpus 2, a database of audio-visual samples of French gestures: fragments of se-
quences, varying in length from a few seconds to one minute, selected from filmed
interviews with about 60 people, mostly intellectuals.
Corpus 3 is a series of six interviews with Lionel Jospin, the former French Prime
Minister, that were broadcast on French television between July 1997 and April 1998.
It was established to study how the two types of sign – gestural and verbal – interact
synergetically during utterance production.
A major contribution of my work resides in demonstrating the plural motivation of
gesture. This modifies our perception of the referential function of co-speech gesture
because one gesture can be bi-referential or even multi-referential. Thus, in a given
instance, by establishing several analogical (physico-semantic) links between its
physical aspect and its contextual meaning, one gesture can contain several gestural
signs.

3. Methodology
In order to progress towards a deep understanding of surface phenomena, my method
of analysis operates on several descending levels: from the examination of a co-speech
gesture in its contexts of use, to its motivation, and then to the physical origin of its
motivation. The stages of analysis are summarized as follows:

(i) Establish a large database of representational gestures;


(ii) Classify the representational gestures according to their priority physical
component;
(iii) Compare the relations between gestures and the notions they represent.

By researching physical diversity, on the one hand, and semantic diversity, on the other,
one may discover the range of physico-semantic links contained in the database.

3.1. Identifying representational gestures


To pin down the meaning of a co-speech gesture, we have to take into account the infor-
mation conveyed by the situation in which it occurs, by what is said, by other bodily
information, and by the voice. It is necessary to distinguish body movement with a ref-
erential function from body movement with a demarcative or expressive function in
liaison with the voice. The distinction is relatively easy to make once the vocal scansion
and accentuation have been coded. Furthermore, a manual gesture should be inter-
preted in relation to “co-occurring” movements, i.e. those simultaneously produced
by other body parts: gaze shifts or facial expressions will determine the particular mean-
ing of a head movement, for example, as either an indication of place, benevolent atti-
tude, or a sign of restriction. The meaning is then verified by what is said in the given
situation. What seems to happen is that the context comes to “activate” one of the
meanings that are possible.
In order to arrive at an understanding of the symbolic system of gestures that replace
or accompany speech, it is necessary to have access to a diversified and representative
sample of the whole system.

3.2. Classification of representational gestures according to their priority physical component

We begin by developing a classification of gestures according to physical criteria. This
allows for an objective and relatively exhaustive approach. Insofar as a physical charac-
teristic of a gesture can be shown to be a vehicle of meaning, this classification becomes
physico-semantic as recurrent correlations between physical characteristics of gestures
and contextual meanings become apparent in the data.
Firstly, I make a broad distinction between body-focused gestures, which touch or focus
on a part of the speaker's body, and those that do not (Tab. 42.1). For body-focused
gestures, the body part on which the gesture focuses, and not the position of the
body part making the gesture, is of primary relevance.
For gestures in space, which do not touch or focus on a body part of either the
interlocutor or the speaker, the type of movement is of primary relevance. In these cases,
it is necessary to distinguish those which describe straight lines or flat surfaces from
those which describe curved lines or surfaces because the physical elements which
are relevant for the former are not relevant for the latter.
Finally, I have systematically examined a very large and important category of head
gestures, namely movements performed with the head, excluding facial expressions.
My research shows that particular gestural components have priority in that they
determine the next level of classification into different types of gestures: the body
part to which the gesture directs attention, which I call localization, in the case of
body-focused gestures; the movement in the case of gestures in space; and the moving
body part in the case of head gestures.
Although the semantic contribution of one of the physical elements may have pri-
ority in a given instance, the contribution of the other components must also be
taken into consideration. The meaning of the gesture is derived from the combina-
torial interplay between primarily and secondarily relevant components which convey
meaning.

Tab. 42.1: Classification of gestures by priority physical component

PRIORITY COMPONENT          CLASSES OF GESTURES
Localization                Body-focused gestures
Movement                    Gestures in space
  Form of movement:           Straight pathways                Curved pathways
  Direction of movement:      Directional axis of movement     Clockwise vs. anticlockwise
  Secondary components:       Body part (hand or digit/s)
                              Configuration of body part
                              Orientation of configuration
Body part                   Head gestures

The major class of gestures in space is subdivided according to the form of the move-
ment pathway: straight-line gestures are opposed to curved gestures, whose components
of secondary relevance differ completely from one another. In the case of straight-line
gestures, what matters is the directional axis of the movement performed by a body
part in a particular configuration, and in a specific orientation if the configuration
has a flat shape. Repetition and symmetry are also secondary components. In the
case of curved gestures, what matters is the form created by the movement as well as its
direction: progressive (clockwise) movement is opposed to regressive (counterclockwise)
movement.
Whereas there are essentially only three priority physical components (localiza-
tion, movement, and body part), the secondary components are numerous: repetition
of movement, type of repetition, and movement quality all have their importance, as
does laterality or the use of both hands, as well as the plane, whether it be the plane in
which the flat hand performing a straight-line movement is oriented or the plane in
which a curved movement is performed. Furthermore, certain physical elements of
the flat hand, such as the tip, the edge, or the surface of the flat palm, appear to be
relevant. The sub-components of a configuration, just like the sub-components of
a movement, can unite to constitute the relevant physical feature of the gesture in
question.
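The decision structure implicit in Tab. 42.1 can be sketched as a small classification function over the priority components. The field names and return labels below are invented for the illustration and deliberately simplify the analysis (secondary components are ignored).

```python
# Sketch of the top-level classification by priority physical component
# (Tab. 42.1). Field names and labels are illustrative only.

def classify(gesture):
    if gesture.get("touches_own_body"):
        return "body-focused"                         # priority: localization
    if gesture.get("body_part") == "head":
        return "head gesture"                         # priority: body part
    # Otherwise a gesture in space, subdivided by form of the pathway.
    if gesture.get("pathway") == "straight":
        return "gesture in space (straight pathway)"  # priority: movement
    return "gesture in space (curved pathway)"

print(classify({"body_part": "hand", "pathway": "straight"}))
```

The ordering of the checks mirrors the text: localization takes priority for body-focused gestures, the moving body part for head gestures, and the movement itself for gestures in space.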

3.3. Stages of analysis


The systematic analysis of gestural signs requires several steps:

(i) Code the representational gestures’ components: the coded description indicates,
always in the same order, the hand used (or both hands), its localization (only
for gestures at face level or higher), its configuration, its orientation, and then
its movement. For example, the right hand [R], closed in a Fist [R ¶] turned
inwards towards the speaker [R ¶b], moves forwards [R ¶b.f];
(ii) Create repertoires according to gestural components;
(iii) In each repertoire, determine the common semantic element corresponding to the
common physical element;
(iv) Deduce the potential analogical link(s) between physical and semantic elements;
(v) Validate the analogical link.

This method enables us to discover the diverse analogical links established between the
physical and the semantic levels and subsequently the combination of these links man-
ifested internally in the gestures. It brings to light the complex interplay of the symbolic
relations between gestures and the notions they express.

4. Identifying gestural signs by analyzing relations between gestures and notions

In the database of representational gestures, we investigate the relations between gestures
and notions (Fig. 42.1) and study the diversity of these physico-semantic relations by
adopting the semantic viewpoint (one gesture represents different notions), then the phys-
ical viewpoint (different gestures represent one notion). In view of the diversity of ges-
tural signs observed to refer to one notion or obtained by investigating one gesture, the
objective is always the same: to identify the analogical link explaining each of these signs.

SEMANTIC DIVERSITY                           PHYSICAL DIVERSITY

ONE GESTURE represents different notions     Different gestures represent ONE NOTION
  alternatively:  polysemous gesture           alternatively:  gesture variants
  simultaneously: polysign                     simultaneously: cumulative variant

Fig. 42.1: Relations between gestures and notions

4.1. Physical diversity to express a notion: Gesture variants


A given notion or concept may be expressed gesturally in several different ways. We may
say, then, that a given notion may have gesture variants (Fig. 42.1). Essentially, they con-
sist in a change of movement or a change of body part. For instance, the head, hand,
index finger, and thumb may be used interchangeably to localize spatially or temporally,
to designate concretely or abstractly (localization and designation), or to mime move-
ments in various directions. The addition of a body part performing the same movement,
for example, lateral shaking of the hand and the head to express “negation”, allows one
to obtain stylistic variants; in fact, the addition only reinforces the expression since it pro-
duces a cumulative variant. In contrast, the substitution of one component for another,
for example, substituting body parts that cannot cumulate their movements, such as the
hand, thumb, and index finger, or substituting the shape of the movement pathway, for
example, from a straight to a curved line, allows one to obtain semantic variants. One can
then determine the semantic contribution of the substitute, whose role as a secondary
component allows nuance to be added to the meaning of the priority component. This
possible precision allows one to select the semantic variant adapted to a situation.

4.2. Semantic diversity of a gesture: Polysemy, polysign, polysemy of a polysign

The collected corpus is sorted into repertoires of gestures that have physical compo-
nents in common; the gestures within each repertoire are then compared from the phys-
ical and the semantic points of view. One studies how a given gesture may represent
several different notions (Fig. 42.1). In cases where one gesture may represent more
than one notion, it may be polysemous, in which case these different notions are repre-
sented on different occasions of the gesture’s use. On the other hand, a gesture can
function as a polysign, in which case it may refer to several different notions at once
on a given occasion of its use.

4.2.1. Polysemy of a gesture explained by semantic derivation or plural motivation

For example, Palm Forwards and the transverse movement of the flat hand parallel to
the ground, a configuration which I call Level Hand, are both polysemous gestures, but
their polysemy is explained differently: a semantic derivation from one analogical link
666 IV. Contemporary approaches

in the case of Palm Forwards, whereas the gesture performed with the Level Hand
contains several analogical links.

4.2.1.1. Semantic derivation


Palm Forwards may express the notions of “opposition,” “prudence,” “refusal of respon-
sibility,” “stopping,” “requesting someone to wait,” “agreement,” “refusal-negation,”
“objection,” “restriction,” or “perfection.” Derived from the reflex of self-protection,
the gesture expresses a (self-protective) opposition if one takes into account the notions
of “prudence,” “refusal of responsibility,” and “agreement by capitulation” thus ex-
pressed. As a variant used to signify “perfection,” it “opposes” all possible objections
regarding the quality being spoken about.
One way of testing the value of the analogical link that one imagines is to verify whether a modulation of the sign on the physical level entails or gives rise to an echoing modification on the semantic level, thus justifying the link itself. In reality, the surface area that is physically opposed appears to be proportional to the strength of the opposition or self-protection symbolized (Fig. 42.2).
[Fig. 42.2 pairs a decreasing PHYSICAL surface of opposition — both hands, hand(s), forefinger, thumb — with correspondingly attenuated SEMANTIC values: self-protection, refusal of responsibility, stop, request to stop (‘Time out!’), objection, restriction, refusal, negation, negative insinuation.]

Fig. 42.2: Parallel attenuation, physical and semantic: Opposition to the outside (Calbris 1990: 119)

As a semantic variant, restriction or partial opposition is offered by a partially raised palm oriented in an oblique plane and facing forwards. Instead of adjusting the angle,
another possible way to attenuate is to reduce the opposing surface area further by raising
just one finger, the index finger, or the thumb. The index finger, the indicator finger, is
used to oppose in order to add precision, while the thumb, the strong finger, is raised out-
wards to stop something for a moment. One can say that the phenomenon of parallel
attenuation on the physical and semantic levels confirms the value of the link established
between them.

4.2.1.2. Plural motivation


The gesture contains alternative gestural signs based on different analogical links
(Fig. 42.3). The examples in my corpora show that the transverse movement of the
Level Hand can express the following alternative notions depending on its context of
use: “quantity” and as a value judgement “superlative”; “totality” and as a value judge-
ment “perfection”; “directness” on the temporal level (“immediately afterwards”), on
the logical level (“determinism,” “obligation,” “certainty”), or on the value-judgement
level (“frankness”); “stop-refusal” as in cases of “negation” (“nothing,” “never,” “no
more”), “refusal,” or “end”; “cutting”; and it can literally and figuratively express the
idea of a “flat surface,” a “second surface” that covers the first, the action of “laying
something out flat,” a “level,” or “making something level,” i.e. “equality.” It is a question
of “levelling – flattening – standardizing – equalizing.”
GESTURE: transverse movement of the Level Hand. The movement makes relevant one trait, supporting one analogical link, one meaning, and its semantic derivations:

– the visual field from left to right → Everywhere → Quantity, Totality (derivations: Superlative, Perfection)
– the fingertips that draw → Straight line → Directness (derivations: Determinism, Certainty, Obligation, Frankness)
– the palm that resists → Obstacle → Stop-refusal (derivations: Negation, Refusal, End)
– the edge that cuts → Cut → Cutting
– the flat palm that covers → Horizontal plane → Flat surface (derivations: 2nd surface, Level, Equality, Stability)

Fig. 42.3: Plural motivation of the transverse movement of the Level Hand (Calbris 1990: 140)

The diverse contextual meanings of the same gesture make one or another of the com-
ponents relevant (movement or configuration), or even one of the physical traits of
the configuration (the palm, the fingertips, the edge of the hand, or the orientation
of the palm); most frequently, the analogical link resides in the movement of one of
these elements (the movement of the palm, of the fingertip(s), or of the edge of the
hand).
Let us now consider the relevant physical features upon which the different analogical
links in Fig. 42.3 are based:

(i) We know that other gesture variants expressing “totality” have a component in
common: the “transverse movement” of the hand or of the gaze, sweeping the
horizon, representing the whole visual field, “everything,” and “everywhere.”
(ii) The idea of “directness” that is common to the notions of “determinism,” “obliga-
tion,” “certainty” and “frankness” supposes the representation of a straight line
(linear trace made by a moving point); hence it is the movement of the fingertips
which becomes relevant.
(iii) The idea of “stop-refusal,” previously expressed by Palm Forwards opposing
something approaching from the outside, is signified by a horizontal movement
of the palm facing downwards. Could the palm stop an opposing force? The notion
of “the end” implies stopping a process that has come to an end. Would this be
represented by the palm stopping a progression originating from the ground?
(iv) To express the notion of “cutting,” it is the edge of the hand, or more exactly the
movement of the edge of the hand, which becomes relevant.
(v) Lastly, the analogy between the flat shape of the hand and a flat surface that it is
representing is obvious.

The analogical link applies to every domain and may be at the origin of semantic
derivation. For example, a direct link expresses “immediacy” on the temporal
level, and on the logical level, “immediate consequence” (“determinism,” “obliga-
tion,” “certainty”), whereas on the level of moral judgement, directness expresses
“frankness.”

4.2.2. Polysign
A gesture that simultaneously represents several notions is called a polysign. For example, a raised fist that, as a co-speech gesture, signifies how well a secret has been guarded simultaneously represents “enclosure” by virtue of the fist configuration and “increasing exclamation” by the upward movement.
ment) supporting an analogical link is a gestural sign. Hence the gesture (movement
of the configuration) is a polysign.
The complex gesture is a particular type of polysign. Here is an example from my
data: in order to simultaneously depict “mixing” and “approximation,” usually signified
by the two hands turning around each other in the first case, and by a rotational oscil-
lation of the concave palm facing downwards in the second, a screen writer expresses a
“kind of confusion” and a philosopher expresses an “approximate mixture” by perform-
ing the same synthesis, i.e. an alternating oscillation of the two concave palms, one
behind the other, as if they were interlocked (Fig. 42.4).
GESTURE 1: two hands turning around each other → MEANING: mixing
GESTURE 2: rotational oscillation of the concave palm → MEANING: approximation
SYNTHESIS of 1 and 2: alternating oscillation of the two concave palms, one behind the other → MEANINGS: kind of confusion; approximate mixture

Fig. 42.4: A complex polysign gesture (Calbris 1990: 149)

The multifaceted depiction requires one of the components to be modified so that the
analogical links necessary for synthetic representation can be cumulated. In other
words, an addition on the symbolic level requires a slight transformation on the physical
level.
We know that a gesture is a composite unit. Each of these components can itself be
decomposed. We have seen that a hand configuration, a flat hand, for example, is able to
contain several relevant physical elements: the fingertips, the palm, and the edge of
the hand. Similarly, a movement, for example, a curved line, is characterized by both
shape and direction. There are thus many elements which may convey meaning.
Since it is the analogical link, established by a gestural component or even a gestural
sub-component, that determines a meaning, several types of polysign may occur
(Tab. 42.2). Two analogical links established by two gestural components give rise to
a bi-referential gesture. Likewise, two analogical links within one component, for exam-
ple, a movement composed of two relevant sub-components, give rise to a bi-referential
movement. A polysign gesture may be multi-referential by having several analogical
links on the component and/or sub-component level. Tab. 42.2 shows just some of
the possible combinations of analogical links that, in sum, generate the referential
potential of polysign gestures.

Tab. 42.2: Types of polysign

Gestural components carrying a relevant analogical link → number of links → type of polysign:
– two components (e.g. configuration and movement) → 2 links → bi-referential gesture
– two relevant sub-components within one component (a movement) → 2 links → bi-referential movement
– relevant elements in orientation, configuration, and movement → 3 links → multi-referential gesture

4.2.3. Polysemy of a polysign


A polysign can be polysemous (Fig. 42.5). It is sufficient that more than one gestural
component supports several analogical links, for example, a polysemous fist configura-
tion and a polysemous movement forwards, for their polysign product to become
polysemous:
GESTURAL COMPONENTS: the Fist configuration combined with movement forwards (Fist moves forwards)

ANALOGICAL LINKS: the Fist — physical or psychological strength; the movement forwards — towards or against, spatial or temporal progression

MEANINGS:
– Will + Forwards → Will to go forwards
– Effort + Advancing towards → Effort towards a goal
– Strength + Temporal progression → Strength and modernism
– Strength + Advancing against → Strength to attack

Fig. 42.5: A polysemous polysign

By comparing repertoires of contextual meanings of co-speech gestures according to their priority component or relevant physical feature, we know for instance that the
fist can represent “strength,” physical or psychological. As for movement forwards, it
is eminently ambiguous: one advances towards something or against an opposing
force. The progression represented along this axis is both temporal and spatial.

Consequently, the different combinations of notions that these gestural components can
evoke are themselves multiple. If we consider Fig. 42.5,

(i) the “will,” on one hand, and the “progression,” on the other, represent together
the “will to advance”.
(ii) The “effort” and the “progression towards a goal” become allied to represent the
“effort towards a goal”.
(iii) The combination of the notion of “strength” and that of “temporal progression”
results in “strength and modernism”.
(iv) Finally, the representation of “strength” allied to a “progression” amounts to the
“strength to attack”.

Many combined and diverse meanings for one and the same gesture are possible.
In fact numerous contextual meanings can depend on a few physical elements that
support more than one analogical link, each of which may be subject to semantic
derivation.
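The combinatorics just described — each component contributing alternative notions, with context selecting one pairing — can be sketched as a cross-product. The notion inventories below loosely paraphrase the Fist-forwards example and are illustrative assumptions only.

```python
from itertools import product

# Illustrative notion inventories per gestural component,
# paraphrasing the Fist-forwards example; not authoritative lists.
FIST = ["will", "effort", "strength"]                    # configuration
FORWARDS = ["advancing towards", "temporal progression",  # movement
            "advancing against"]

def polysign_readings(config_notions, movement_notions):
    """Each pairing of one notion per component is a candidate
    reading of the polysign; context selects among them."""
    return list(product(config_notions, movement_notions))

readings = polysign_readings(FIST, FORWARDS)
```

A few physical elements thus generate many candidate readings (here 3 × 3 = 9), of which context activates one, such as strength combined with advancing against an opposing force.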
When the analogical link is not obvious, one finds it by exploiting the interaction
between the phenomena of polysemy (one gesture represents different notions), on
the one hand, and variation (different gestures represent a notion), on the other. For
example, in France, the head shake expresses “negation”; but furthermore, as a co-
speech gesture, it can also express “totality” and/or “approximation.” How can we dis-
cover the analogical link inherent to each of the three contextual meanings of the head
shake? The answer lies in comparing the gesture variants of each notion. All those that
express “totality” are characterized by a transverse movement of the head, or of the
hand. What is the analogical relation between this common characteristic (transverse movement in each gesture referring to “totality”) and the notion of “totality” if not reference to the horizon, “everywhere,” concretely represented by the gaze sweeping across the horizon (in a single or a repeated head movement), or by the palm covering it from one side to the other (using one hand or both hands in a symmetrical movement)?
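A minimal sketch of this comparison of variants, assuming a simple dict-based coding (component name → coded value, names illustrative): intersecting the coded components of all variants expressing one notion yields the candidate locus of the analogical link.

```python
def shared_components(variants):
    """Given several coded variants that all express one notion,
    return the component-value pairs common to every variant --
    the candidate locus of the analogical link."""
    common = set(variants[0].items())
    for v in variants[1:]:
        common &= set(v.items())
    return dict(common)

# Hypothetical codings of three variants expressing "totality".
totality_variants = [
    {"articulator": "head", "movement": "transverse"},
    {"articulator": "hand", "movement": "transverse"},
    {"articulator": "gaze", "movement": "transverse"},
]
link_candidate = shared_components(totality_variants)
```

Here the articulator varies while the transverse movement is constant, so the movement is what carries the analogical link to “everywhere”.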

4.2.4. Nuance contributed by the polysemous gesture to each notion it expresses

The polysemous gesture of the transverse movement of the Level Hand (Fig. 42.6),
for instance, can express alternatively: 1. totality, 2. negation, 3. cut, or 4. stop.
But it sometimes occurs that each of the gesture variants (a, b, c, d) used to ex-
press a notion (1 or 2 or 3 or 4) conveys a semantic nuance. This means that the
nuance conveyed is a second sign underpinned by an analogical link. The variant
thus contains two links: the principal analogical link representing the common
notion (4.) and a secondary link (a) representing a second notion that comes to
add a nuance (4.a). One thus observes combinations of analogical links, and the
variant functions like a polysign. This possibility of expressive modulation enables
one, for example, to select the gesture variant appropriate to the situation (4a or 4b
or 4c or 4d).

NOTIONS:      1. Totality   2. Negation   3. Cut             4. Stop

VARIANTS:
a             finished      absolute      total              definitive
b             finished      defensive     at ground level    defensive
c             finished      of refusal    obstacle           time limit
d             unified       corrective    in two             time limit

Fig. 42.6: Polysemy and gesture variants: Different notions nuanced by the polysemy of the gesture

Once the various elements of the symbolic Meccano system constituted by analogical
links have been identified, one can leisurely observe the constructions obtained by
the association of various elements. A polysemous gesture (meanings: 1, 2, 3, 4) can
in a given situation become a polysign: the totality is finished (1.a) and not unified;
the negation is absolute (2.a); the cut is total (3.a); and the stop is definitive (4.a).
For each notion, the polysemous gesture cumulates two analogical links, namely, the
link specific to the notion, and the link that, depicting “totality” by the transverse
movement, comes to nuance it.
Playing with the symbolic Meccano system produces curious results, because a kinesic ensemble (facial expression and hand gesture), a kinesic unit (hand gesture), and a kinesic sub-unit (gestural component) can each contain one or more analogical links!

5. Gestural sign, speech, and thought


When gesture and speech alternate, whether the order is (Speech > Gesture) or (Ges-
ture > Speech), the gesture functions much as if it were a word or another unit of linguis-
tic expression, taking its place in the construction of the utterance as a component of its
syntax. On the other hand, when gesture and speech occur together, the gestural expres-
sion and the spoken expression may be related semantically in two different ways:

5.1. A coverbal sign


Sometimes a gesture that co-occurs with speech functions like a “commentary,” simul-
taneously added on to what is being expressed in words. Either it expresses the speak-
er’s attitude in relation to the object of the utterance, or in relation to the interlocutor,
or it expresses a commentary on the object of the utterance. The complementary infor-
mation it provides enables one to avoid the verbal repetition of synonymous informa-
tion. Insofar as it concretely clarifies what is said and sometimes disambiguates the
verbal content, gesture plays an involuntary pedagogical role.

5.2. A preverbal sign


Let us imagine the audio-visual chain segmented in rhythmic-semantic groups as a num-
ber of verbal-gestural pairs that succeed one another. It may happen that the informa-
tion conveyed by the gesture in one pair announces the information conveyed by the
speech in the subsequent pair. The synchrony of the dual units may be perfect but with-
out there being any synchrony in the respective pieces of information supplied: the ges-
tural information often precedes the verbal formulation of what (idea, notion) is to be
uttered. The co-verbal gestural sign is often pre-verbal (Fig. 42.7):
[face-to-face, the two concave palms sculpt a globe] *Et c’est dans ce sens, me semble-
t-il, [sculpt a second, smaller globe], *que quand même derrière tout cela, [the speaker
leans back again, hands in the rest position] il y a malgré tout une unité
[face-to-face, the two concave palms sculpt a globe] *“And it’s in this sense, it seems
to me,” [sculpt a second, smaller globe] *“that even so behind all that,” [the speaker
leans back again, hands in the rest position] “there is nevertheless a unity”
[Fig. 42.7 aligns the idea of “globality,” spanning the gesture from its beginning to its end, with the concurrent utterance Et c’est dans ce sens, me semble-t-il (‘And it’s in this sense, it seems to me’).]

Fig. 42.7: Gestural formulation anticipating verbal formulation

The multichannel nature of the communication is highly evident in this example since
both the acoustic and the visual channels are used in a complementary way during
three-fifths of the utterance: one is assigned the task of diplomacy, and the other is
entrusted with the essential content. In other words, the idea of “unity,” present right
from the start, is gesturalized in the form of a globe. The gesture is repeated in order
to segment the rhythmic-semantic groups corresponding to the oratorical precautions
(three-fifths of the utterance). The multifunctionality of the gesture is equally evident
since the significance of the globe, sculpted by the referential gesture that segments and
complements the simultaneous utterance, is only confirmed at the end.
One sees the gesture giving the metaphorical image of the abstract notion often well
in advance of this being put into words. More than the imagination, it expresses the
metaphorical imagery that underlies the putting of thoughts into words. Gestural refer-
entiality is an indicator of ideation, of the spontaneity of the thought to be put into
words (see Calbris 2011: 295–312).

Now let us consider the relation between the gestural sign and thought. The study of
gestures of cutting, for example, extracted from fragments of conversation selected
from different corpora, demonstrates how gesture expresses the percept underlying
the concept (Calbris 2003: 19–46). Gesture appears as the product of a perceptual
abstraction from reality. It represents a preconceptual schema, an intermediary between
the concrete and the abstract, which allows it to evoke the one or the other equally well.
For example, the schema of cutting implicitly appears in numerous and varied notions:
“separation,” “cutting into elements,” “division into two halves,” “blockage,” “refusal,”
“elimination,” “negation,” “end,” “stoppage,” “decision,” “determination,” “measure-
ment,” “categorization,” “categorical character,” and “interruption.” An ideal, abstract,
and adaptable prototype is constructed on the basis of concrete acts. The gesture of cut-
ting represents the visual and proprioceptive operational schema and, through it, the
two extremes of the semantic continuum going from the concrete to the abstract:
from cutting a real object into pieces to the cognitive dissecting task of analysis.
Despite its motivated character and the physical concreteness of its form of expression,
the gestural sign operates at a certain level of abstraction.

6. To conclude
The methodological approach adopted here, that consists in studying the analogical link
by comparing the gesture variants of a notion, has enabled us to throw light upon the
aforementioned cases: it explains a gesture’s potential to change analogical links in
order to evoke different notions according to the context (polysemous gesture) or to
cumulate two analogical links in order to evoke two notions at the same time (polysign
gesture). The context activates one or another of the analogical links that the gesture can propose; it does not merely develop a semantic derivation on the basis of a single analogical link. In short, a gesture appears to be a composite unit of physical elements that are not only relevant but also potential conveyors of meaning, which the context activates in a selective manner.

7. References
Calbris, Geneviève 1980. Etude des expressions mimiques conventionnelles françaises dans le
cadre d’une communication non verbale. Semiotica 29(3/4): 245–347.
Calbris, Geneviève 1981. Etude des expressions mimiques conventionnelles françaises dans le
cadre d’une communication non verbale testées sur des Hongrois. Semiotica 35(1/2): 125–156.
Calbris, Geneviève 1983. Contribution à une analyse sémiologique de la mimique faciale et ges-
tuelle française dans ses rapports avec la communication verbale, Volume 4. Thèse de doctorat
ès lettres, Paris III.
Calbris, Geneviève 1987. Geste et motivation. Semiotica 65(1/2): 57–96.

Calbris, Geneviève 1990. The Semiotics of French Gestures. Bloomington: Indiana University Press.
Calbris, Geneviève 2003. From cutting an object to a clear cut analysis: Gesture as the representa-
tion of a preconceptual schema linking concrete actions to abstract notions. Gesture 3(1): 19–46.
Calbris, Geneviève 2011. Elements of Meaning in Gesture. Amsterdam/Philadelphia: John
Benjamins.
Calbris, Geneviève and Jacques Montredon 1980. Oh là là! Expression intonative et mimique.
Paris: CLE International.
Mauss, Marcel 1950. Les techniques du corps. In: Marcel Mauss, Sociologie et Anthropologie,
36–85. Paris: Presses Universitaires de France. First published [1934].

Geneviève Calbris, Paris (France)

43. Praxeology of gesture


1. Introduction
2. Praxeology
3. Praxeology of gesture
4. Gesture as praxis
5. Ecologies of gesture
6. Methodology
7. Praxeologies of gesture: Some examples
8. Conclusion
9. References

Abstract
This chapter presents the outline of a praxeological approach to the study of gesture. It
argues that gestures should be analyzed for the cultural practices or methods by which
they are shaped and performed, rather than as a semiotic code or a mode of expression
of mental content. The praxeological approach can be traced to Mauss’ concept “techni-
ques du corps”, which emphasizes the cultural nature of bodily comportment. The chap-
ter discusses the “practice turn” in social research, shows the importance assigned to the
body in practice theory, and defines gesture practices as methods to achieve practical
understanding in complex, multi-modal activity contexts. It is argued that gesture prac-
tices can be investigated in terms of their impact on the ecology of the communicative sit-
uation and their relations to different components of these ecologies. It is suggested that
an adequate methodology for the rigorous study of communicative practices must be
micro-ethnographic, being attentive to both generic practices and organizations, as well
as to the ethnographic particulars of the community and the situation. The chapter
concludes with a brief summary of some recent practice-oriented studies of gesture.

1. Introduction
To investigate gesture in praxeological fashion means to conceive of it, in the first place,
as skilled physical praxis, as embodied activity performed according to methods that are
shared within some community. Gestures are physical actions by which we “do things”
(Austin 1962) – although the things that gestures do include not only illocutions but also a large, presumably unknown, number of other types of social action, such as directing or attracting attention and showing how something ought to be done. By practices we
mean established, common things that get done by gestures, as well as the habitual, rou-
tinized methods by which gestures are made. Thus, we say, roughly, that there exists in
many societies an established, commonly understood practice – we may call it “point-
ing” – the point of which is to direct an other’s visual attention to some target, and that
there are distinct practices by which pointing gets done in, say, Lao society (Enfield
2009), some involving eyes and fingers, others eyes and lips, each with a distinct social
effect. Each of these practices, when enacted, may require the concurrent enactment of
other practices, some involving the eyes, some speech – pointing is in fact a “multi-
modal” practice (or an integrated “bundle of practices” – issues of hierarchy or logical
type – practices and meta-practices, etc. – will not concern us here). More often than
not, gesture practices are enacted along with speech, and gestures may provide imagery
that overlaps or complements the imagery provided in speech, but the two modalities
nevertheless constitute separate and rather incomparable resources for communicative
action. The tasks for praxeological research on gesture, then, are

(i) to identify the things that get done with gestures,
(ii) to identify the practices by which gestures get made, and
(iii) to investigate how gesturing supports other forms of human praxis, including
practices of speaking and practices of work.

A praxeological view of gesture, thus, does not assume a pre-ordained, “psychological” or “systemic” relationship between gesture and language; instead, it works from the
assumption that gesture, language, and the world are brought together in a number
of distinctly different ways, and that each successful gestural act is the product of
successful, situated coordination with other, concurrent communicative acts.
Gesturing by hand can, to an extent, be conceived as a craft (Streeck 2009a): a set of
physical skills acquired by individuals over a lifetime, involving routines, devices, meth-
ods, and standards that are shared with others at the same time as they are inalienable
properties of the individual bodies that have acquired them. Like many other crafts,
gesture involves the production of visible entities by hand. The parallel between gesture
and craft may end here: gestures are not usually appreciated for the level of mastery
that they display and are treated more like other everyday practices that we take for
granted, such as walking and perceiving the environment. Gesture is, by and large,
background praxis. On the other hand, the connection between gesture and manual
craft may be deeper, in so far as the human ability to make meaningful symbols such
as gestures may be predicated upon the ability to manufacture meaningful things.
The following gives a very broad and general outline of a praxeological view of ges-
ture, i.e. an account of hand gestures in light of recent and older practice theories: in the
first section, the important place given to the body in practice theories is described and
a praxis-based conception of the body is presented; the praxeological approach to ges-
ture follows Marcel Mauss’ ([1935] 1973) early observations about techniques of the
body, i.e. cultural styles or practices for embodied action. Some basic ways in which
hand gestures operate within situated ecologies of communication are then explained,
and methodological questions are discussed. The chapter concludes with a description
of some recent research on gesture that converges with the praxeological perspective.

2. Praxeology
The practice turn in social theory (Schatzki, Knorr-Cetina, and von Savigny 2001) is
a relatively recent and yet pervasive movement that comprises both classical sources
(historical materialism, phenomenology, Vygotski, Wittgenstein) and contemporary
ones (ethnomethodology, embodied cognition, discourse and conversation analysis,
Bourdieu). Associated with a variety of labels including activity theory, communities
of practice, and distributed cognition, the praxeology of social life and social interaction
does not constitute a single methodology or school, but a shared conviction that “the
social is a field of embodied, materially interwoven practices centrally organized around
shared practical understandings” (Schatzki 2001: 12). Although taken from Wittgenstein
(1953), who was concerned with linguistic practices, the notion of practical understand-
ing is given a distinctly physical interpretation in contemporary praxeology. It refers,
following Wittgenstein, not only to the practical nature of intersubjectivity – that crite-
ria for understanding can never be specified in so many words, but only reveal them-
selves in continued, successfully shared practice, but also to each human body’s tacit,
enactive understanding of the world.

Practical understanding […] exists only as embodied in the individual. An individual pos-
sesses practical understanding, however, only as a participant in social practices. Practical
understanding is […] a battery of bodily abilities that results from, and also makes pos-
sible, participation in practices. […] Because social orders rest upon practices that are
founded on embodied understanding, they are rooted directly in the human body.
(Schatzki 2001: 8)

That the human mind and its products, notably natural languages, reflect their ground-
ing in embodied experience and action has recently become commonplace in fields such
as cognitive linguistics (Johnson 1987) and cognitive neuroscience (Jeannerod 1997).
But while the human body and its fundamental experiences are often posited as univer-
sals in those fields, practice theorists insist on the culturally constituted character of the
body and its experiences in the world: it is the real body with its culturally specific sen-
sibilities and skills that our conceptual systems draw upon. The experiential schemata
that undergird these systems are not the products of our anatomical bodies, but of
our concrete dwelling in the world, that is, the habitual experiences and skills that living
bodies acquire in their daily immersion in a specific, culturally shaped world. “To speak
is to occupy the world, not only to represent it”, writes Hanks (1996: 236).
This emphasis on embodiment and practical understanding makes practice theory
attractive to gesture researchers, because relevant to gestural communication are not
only the skills that bodies acquire in interpersonal relations – where they might pick
up a communicative “code” – but also those acquired in the thing-world, as hands
reach for, grasp, hold, handle, move, hand over, and explore physical objects. The crit-
ical feature of the human hands, however, in so far as our manual gesturing is con-
cerned, is that they are skilled makers of things, organs of an organism which, with
unparalleled versatility, makes meaningful objects out of the raw materials of the
world. The transformation of nature into artifacts is, despite numerous but invariably
43. Praxeology of gesture 677

limited exceptions, an exclusively human province, and it is primarily an achievement of the hands, the most distinctive human body-part. The making of things is also a paradigm and model for the making of meaningful signs. As Holzkamp (1978) has
argued, the making of useful objects – and thus the making of meanings-in-objects
(“Gegenstandsbedeutungen”) – is constitutive for the way humans produce and under-
stand signs: it is the prototype of all human abilities of bringing objects with shared
meanings into the world. The gesturing human is a homo faber (Kendon 2004). Gesture,
of course, comprises heterogeneous practices and involves the hands in various sym-
bolic roles. Among these different genres of gesture, the virtual, depictive building of
a virtual model world in gesture space, which is taken as a replica of a real scene, per-
haps demonstrates the analogies between these two ways of world-making (Goodman
1978) most vividly, i.e. between the making of a real world of artifacts and that of an
ephemeral, “small” world that represents the real one. Praxeology thus situates the
human body in action, in praxis, in productive, socially organized labor – in Marxist
terms: exchange with nature. The means by which and the circumstances under which
we make the world and make meaning through physical action are themselves products
of histories of human labor, and in every moment of interaction, therefore, the present
interacts with the past (cf. Hutchins 1995).
Praxeology, however, is not only concerned with physical and symbolic action, but
also constitutes a methodology for explicating perceptual and cognitive aspects of social
life. In fact, praxeology follows Ryle’s dictum that “overt intelligent performances are
not clues to the workings of minds; they are those workings” (Ryle 1949: 58). “The
statement ‘the mind is its own place’ [by Milton in Paradise Lost, JS], as theorists
might construe it, is not true, for the mind is not even a metaphorical place. On the con-
trary, the chessboard, the platform, the scholar’s desk, the judge’s bench, the lorry-
driver’s seat, the studio and the football field are among its places” (Ryle 1949: 51).
For example, C. Goodwin has shown how visual practices – professional ways of seeing –
are articulated and acquired in moments of social interaction. These moments involve
arrays of physical constraints and meaning-making resources, and it is within these
highly specified, concrete contexts that not only physical, but also perceptual practices
do their work. Referring to professionals generally, Goodwin writes:

Talk between co-workers, the lines they are drawing, measurement tools, and the ability to
see relevant events […] all mutually inform each other within a single coherent activity
[…]. [And] the ability of human beings to modify the world around them, to structure set-
tings for the activities that habitually occur within them, and to build tools, maps, slide
rules, and other representational artifacts is as central to human cognition as processes hid-
den inside the brain. The ability to build structures in the world that organize knowledge,
shape perception, and structure future action is one way that human cognition is shaped
through ongoing historical practices. (Goodwin 1994: 626–628)

3. Praxeology of gesture
The praxeology of gesture takes its cue from Marcel Mauss’ notion of techniques of the
body, by which the French social anthropologist meant “the ways in which from society
to society men know how to use their bodies” (Mauss [1935] 1973: 70). Mauss envi-
sioned a study of culture grounded in observations like the one that “walking and swim-
ming, […] are specific to determinate societies; […] the Polynesians do not swim as we
678 IV. Contemporary approaches

do, […] my generation did not swim as the present generation” (Mauss [1935] 1973: 70).
Techniques of the body travel by imitation and mediated representation: “American
walking fashions […] arrive[d] [in France] thanks to the cinema” (Mauss [1935] 1973: 72).
Mauss discerned the social in the seeming individuality of bodily comportment, claiming
to be able to “recognize a girl that has been raised in the convent” (Mauss [1935]
1973: 72): “The positions of the arms and hands while walking form a social idiosyncrasy,
they are not simply a product of some purely individual, almost completely psychical
arrangements and mechanisms” (Mauss [1935] 1973: 72, emphasis added).
Mauss (not Bourdieu!) coined the term habitus to refer to these “social idiosyncra-
sies” of bodily action, to translate the Aristotelian notion of hexis, “ ‘acquired ability’
and ‘faculty’ ” (Mauss [1935] 1973: 73). In contrast to Bourdieu, who focused his theoretical
imagination more or less exclusively on society’s inscription on the body, Mauss saw in
bodily habits “the techniques and work of collective and individual reason” (Mauss
[1935] 1973: 73, emphasis added), emphasizing the specific intelligence and sensibility
that is embodied and enacted in someone’s habitus. Techniques of the body are cultur-
ally specific, traditional solutions to recurrent practical, communicative and interaction
tasks. But their development in the individual, and as a consequence their manner of
use and coordination with other practices, are always also intensely personal affairs
(Polanyi 1958): as Thelen’s research has demonstrated, for example, there is no set pro-
gram for learning to walk; each child’s solution to the task of getting on the feet, keep-
ing balance, and beginning to walk, emerges through an idiosyncratic “soft assembly” of
component skills (Thelen and Smith 1994).
A conception of culture as acquired, distinctive techniques du corps also inspired the
classic photographic study of Balinese Character by Gregory Bateson and Margaret
Mead (1942). They set out to study “living persons moving, standing, sleeping, dancing,
and going into trance, [who] embody that abstraction which (after we have abstracted
it) we technically call culture” (Bateson and Mead 1942: xii).
Bateson and Mead observed, for example, how a Balinese left hand holds the paper
on which a Balinese right hand is writing and concluded from these and other observa-
tions that the Balinese specialize the left hand for “light surface touch”, while the right
hand is specialized for the more active tasks such as wielding instruments. These specia-
lizations, in turn, reveal “cultural idiosyncrasies” such as a taboo placed on the left hand
that prevents it from being used to hand over objects and to engage in many other ac-
tions. Thus, Bateson and Mead also studied how the right-hand preference is inscribed
upon infant bodies – as well as how the Balinese practice bending their fingers backward, and how ceremonially unclipped nails shape the ways in which fingers can hold objects and people and, as a consequence, the ways in which people experience these objects.
Human hands are enculturated hands. But the practice of gesturing also enculturates
them. As Noland writes in Agency and Embodiment, “the human body becomes a social
fact through the act of gesturing” (Noland 2009: 19), and “gesturing is the visible per-
formance of a sensorimotor body that renders that body at once culturally legible and
interoceptively available to itself” (21).
Gesture is a double-faced performance, making meaning visible to others, while
being experienced through kinesthesia – “muscle sense” or proprioception (Gibson
1966) – by the self. Performing culturally legible gestures, we couple patterns of enac-
tive self-perception with typified motives, stances, and interactional moves – whatever is
“conveyed” by the gesture. Kinesthesia is essential to gesture, as it is to all modes of
skilled motor action: we perceive the things that we do with our hands – and compre-
hend (“gather together”) the things with which we do them – through “internal”, kines-
thetic perception. J.J. Gibson refers to this apprehending system as the haptic system
(Gibson 1966). It is as haptic systems that we explore, handle, and make sense of the
world at hand, as well as our own physical actions within it. But kinesthesia is important
also with respect to the cognitive roles of gestures: gestures provide their makers with
“felt” motor patterns in terms of which to structure ideational content.
The praxeological approach to gesture has no prior commitment concerning the re-
lations among gesture, speech/language, and cognitive processes – whether, for exam-
ple, gesture and speech are parts of or controlled by the same psychological system.
Ethnographic research suggests that these relations are manifold, shifting, and complex:
gesture and speech can anchor, elaborate, and contextualize each other in a variety of
ways, depending in part on timing and the use of foregrounding devices in and for
speech and gesture. What remains identical across gesture situations is the fundamental
materiality – or phenomenal unity – of gesture: that meaning is made by physical
actions of the arms and hands. Leroi-Gourhan ([1964] 1993) sees no fundamental differ-
ence between instrumental and expressive gestures, i.e. between practical and commu-
nicative motor acts: “both […] produce kinesthetic experience as part of the recursive
loop of correction and refinement over time” (Noland 2009: 15).
The praxeology of gesture is interested in the ways in which gestures participate in the
bringing about of shared practical understandings; it conceives gesture as a heteroge-
neous, open-ended family of practices geared towards achieving practical understandings,
i.e. understandings that, while they may support what is being said, cannot be formulated
in so many words. Many of the patterns of enactment and sensation that gesture offers
have counterparts in practical action and experience, and if this is the case, the gesture
projects the experiential pattern from everyday life onto the discursive material that is
in need of conceptualization. Understanding, thus, does not come from shared rules of
grammar or a shared lexicon, but rather from sufficiently shared practices (Hanks 1996).

4. Gesture as praxis
Gesture is a praxis, and gestures are practiced actions. Gestures belong to our “equip-
ment for dwelling” (Dreyfus 1991), for collaboratively inhabiting, sustaining, making
sense of, and re-making a common world. “Gestures originate in the tactile contact
that mindful human bodies have with the physical world. […] [They] ascend from ordi-
nary […] manipulations in the world of matter and things, and […] the knowledge that
the human hands acquire […] in these manipulations is […] brought to bear upon the
symbolic tasks of [gesture]” (LeBaron and Streeck 2000: 119).
Gestures are implicated in our activities and organize situations in a number of dif-
ferent ways. These ways can be systematized if we conceive of situations or contexts,
following Bateson (1972), as “micro-ecological systems”. The micro-ecology of commu-
nicative situations – and the conditions for practical understanding – are impacted and
reconfigured by hand gestures in a number of distinctly different ways. Streeck (2009b)
has proposed that we can minimally distinguish six modes of “ecological involvement”
of hand gestures, i.e. ecological components of the situation that are addressed by ges-
tures of the hands: the setting at hand, which a gesture can elaborate; the visible scene,
which gestures can structure; talked-about worlds beyond the here and now, which can
be depicted or evoked by motions of the hands; abstract experience – ideation – to which gesture can give concrete form; the actions of others, which the gesturer’s actions can
request, suppress, respond to, or model; and the communicative actions of self, which
are given tangible, often metaphorical, form by hand gestures. A single gesture can simultaneously project meaning onto multiple domains: a pointing gesture structures the addressee’s visual field by directing the perceptual actions of the other; a gesture with which the turn at talk is being “handed over” – and which therefore figures the gesturer’s own communicative act – can simultaneously solicit a response from the other.
And yet, despite the potential for multiple contextual effects and some overlap between
them, it appears that observable, routinized gesture practices indeed cluster according
to these six task domains (which do not include gestures in the context of ritual, nor
those gestures that are sometimes parts of institutional actions such as the oath and
the pledge of allegiance); in other words, we can identify gestural practices in relation
to these tasks. Gestural productions do not always consist of prefabricated parts akin to
the entries in a lexicon; often, they are made up on the spot. This is the case, for exam-
ple, when the hands depict a specific object and, to do so, configure themselves and
move in ways that are adapted to and therefore evoke this object’s specific shape.
But incidental, improvised representations are nevertheless made according to meth-
ods, and the gestures would not be intelligible if these methods were not shared with
others with whom one wishes to achieve practical understanding. Hanks, who has re-
peatedly articulated the tenets of a practice-based linguistics (Hanks 1996, 2005), adopt-
ing Bourdieu’s concept of practice (Bourdieu 1977, 1990) as an alternative to what he
regards as an obsolete concept of rules, makes this very point: “In order for two or more
people to communicate, at whatever level of effectiveness, it is neither sufficient nor
necessary that they ‘share’ the same grammar. What they must share, to a variable
degree, is the ability to orient themselves verbally, perceptually, and physically to
each other and to their social world” (Hanks 1996: 229).

5. Ecologies of gesture
We can only give the most cursory outline here of some of the ways in which hand ges-
tures relate to and bear upon situated ecologies. First, relating to the world within reach
of the hands, we find indexical gesture actions by which objects are selected from their
context – as figures from grounds – including placing (Clark 2003), raising, tapping, and
tracing (Goodwin 1994). A seemingly unlimited, infinitely renewable set of methods of tactility, hapticity, and manipulation serves to transform perceived reality into counterfactual or potential versions: the scene at hand is annotated by schematic acts to adumbrate what could be, what could be done, what should be – to transform the perceived
scene into a field of action. Hand gestures also emanate from “active touch” (Gibson
1962), i.e. information-gathering actions of the hands whose findings – an object’s tex-
ture or internal properties – can be broadcast by motions of the fingers and hands that
are amplified versions of those qualities of motion that correspond to the qualities of
the surface or object that is being touched (softness, coarseness, and so on). In this fashion,
tactile experience is made visible, attesting to the “interstitial”, modality-transcending
affordances of gestures of the hands. We can also include gestures of the hands that
are simply repetitions or augmentations of practical acts and thereby select the action
itself as cognitive object and highlight some of its features, often for instructional
purposes (Streeck 2002a); these include repeating, intensifying, and enlarging, or re-enacting the schema of a practical act, not with, but in the vicinity of an object.
The visible world that is beyond the reach of our hands can serve as both target and
ground of gestural figurations. Pointing gestures structure the viewer’s visual field – or,
rather, direct the viewer to do so – but what is seen can also be made out to be a per-
ceptual object of a particular kind, or an object that ought to be apprehended under a
particular aspect; this is usually done by variations of hand-shape and motion pattern.
Thus, Kendon (Kendon 2004; Kendon and Versante 2003) has shown that pointing ges-
tures in Southern Italy that are made with a flat hand present the target as something to
be inspected, as an exemplar of a kind, or to be seen with a critical attitude, depending
on the orientation of the palm. The Arrernte in Australia distinguish between finger-,
lip-, and eye-pointing (Wilkins 2003). To point with the eye, Arrernte must first attract
an other’s gaze and then direct it at a target, all without moving the head or using lan-
guage, for eye-pointing is a practice for secretly directing attention at someone. Lip-
pointing can accompany pointing by eyes, but it also involves head-movements: being
conspicuous rather than concealed, it contrasts with standard finger-pointing by presup-
posing familiarity among the parties and an absence of formality in the communica-
tive situation.
Indexical genres of gesture reconfigure the perceptual and cognitive ecology of the
situation by disclosing meaning in the environment that is relevant to the project at
hand. When they make hand gestures to describe the world beyond the here and
now, speakers briefly but methodically glance at their gesturing hands: they inspect
the model-world that they are creating at arm’s length and relate to it from without,
like the other, “in the third person”. Or they inhabit an other (or a prior self), miming
experience and action “from within”, “in the first person”, making their entire body
part of the representation. For both modes of depiction, there are batteries of routinized practices, which may or may not vary much between cultures: we cannot know
because there has been hardly any comparative research on methods of gestural
depiction.
The advantages of a praxeological approach to gesture are especially evident in the
depictive domain. Depictive gestures, perhaps with a small set of exceptions in different
communities, are not sedimented and transmitted in the form of a communal lexicon;
no sizable repertoire of conventionalized depictive gestures has been documented
that would parallel the descriptive lexicon of a natural language. More often than
not, when a specific gestural depiction is needed, it is improvised. The depiction may
or may not include established depictive forms. What we commonly find are recurrent
depiction methods, distinct and habitualized practices by which depictions are made and
a world is rendered. While the categories and terminology may vary somewhat, there is
consensus in the literature about the basic stock of depiction methods (Kendon 2004;
Müller 1998; Sowa 2006). Streeck’s (2008a) list includes modeling (cf. Kendon 2004),
in which a body part is given the role of object-token; bounding practices by which
sizes are shown (Sowa 2006); practices of simulated making by which objects are
acted into being (e.g., by schematic acts of “shaping” and “putting together”); scaping,
i.e. ways of using the hands to evoke not objects, but “scapes” (i.e. landscapes, terrains);
and handling, i.e. characteristic hand-motions that are schematic versions of manual
acts by which the object in question is ordinarily handled. Handling includes schemata
of greater and smaller specificity, characterizing objects with greater and smaller
specificity and granularity: simple acts of schematic holding or putting (setting down)
characterize classes of objects in the broadest terms, while schematic versions of tech-
nical acts visualize narrow object categories (for example, both hands in grip position,
moved a few inches up and down in alternation, appears to unequivocally evoke a steer-
ing wheel). When little constructional effort is expended because the depiction is not
salient or meant to be precise, very generic acts of transportation are often employed:
picking up, holding, setting down. Schematically depictive handlings build upon our
everyday experience with objects, abstracting schemata from them and using them in
representational fashion.
Commonly, depictive and conceptual gestures are co-classified as iconic gestures, and
there is no overt difference in the forms of these gestures; they use the same methods to
construct imagery that represents some phenomenon in some world. However, it makes
a fundamental difference whether an “iconic” gesture is used to depict a phenomenon
or the phenomenon that it evokes serves as a concrete form to structure abstract con-
tent. Moreover, speakers behave quite differently when they depict than when they
conceptualize by gesture. Only in the former case do they turn their attention to
their hands and thereby also direct the addressee’s attention to the gesture. This is
never done when the gesture has a conceptual function. Like linguistic metaphors –
which picture something concrete while referring to something abstract and involve com-
parison and analogy between domains – conceptual gestures provide concrete Gestalten
for ideations: they give form to notions (see Streeck 2009a: Chapter 7). Conceptual
gestures usually remain in the background, receiving no focused visual attention.
Conceptual gestures bear a relationship to manual acts by which humans seek to
make sense of tangible things: reaching for, grasping, taking, holding, manipulating,
and exploring. As our hands find the right aperture, posture, and constellation of forces
when they take hold of an object, they understand practically what kind of an object it
is, what its affordances are, what can be done with it. Our hands acquire a schema, a
haptic concept, of a (class of) thing. (Note that the very term schema originally denotes
a wrestler’s hold.) Practical understandings are embodied as habitualized prehensile
postures and acts. These habitualized schemata then become available to be projected
onto other experiential domains, in other words, for metaphorical usage. The metaphor-
ical nature of conceptual gesture practices is central to Calbris’ account of meaning in
gesture (Calbris 2011). Calbris assumes, with Leroi-Gourhan, that gestures are intrinsi-
cally related to practical action. By evoking practical actions, gestures simultaneously
evoke all the other components (or valences) associated with them: the actor, instru-
ment, object, manner of action, etc. Moreover, “not only does gesture intervene in
our physical interaction with the world but also in our pictorial […] and linguistic
[…] expressions […]. One finds it constantly present in this interactive loop of progressive
feedback between action and representation” (Calbris 2011: 1).
Gesture embodies the body’s real-world experience: “the living world [reverberates]
in the living body. Semiosis, conceptualization, and thought originate in the body, the
receptacle of real-world interactions” (Calbris 2011: 2). Gestures give form to concepts
not by providing a material substrate upon which meaning can be inscribed, but by
retaining and providing lived experiential structure to organize novel content.
Thus, gestural practices for depicting and making sense of the world are derived from
ways in which embodied human beings engage with the physical world; we understand
the gestures – and see what they are meant to evoke – because we are familiar with the
bodily engagements that they are predicated on. Gestural imagery evokes and presup-
poses the background of embodied everyday experience. It derives from our common
hold on things. In a praxeological account, conceptual gestures are conceptual actions,
i.e. physical actions that bring their own inherent experiential structures and meanings
to bear on the situation at hand. It is in this fashion, by providing lived, schematic expe-
rience for meaning-making, that gesture not so much supports and expresses thought,
but constitutes a form of thinking.

6. Methodology
Different strands of praxeological research have developed, and the granularity of
description and analysis of social practices varies between them. Research into “communities of practice” aims for comprehensive accounts of those practices that define the community, or presents more detailed investigations of key moments and the
practices deployed in them. Praxeological research has been conducted under monikers
such as “activity theory”, “complexity theory”, “actor-network theory”, “studies of
work/workplace studies”, “distributed cognition”, and others, but all praxeological
research includes some ethnographic component.
In France, anthropological film-makers in the tradition of Mauss and trained by
Jean Rouch have developed a method of praxeological research that uses cinemato-
graphic techniques (editing and montage) to unveil the structures of human activities
and the ‘knowledge-producing gestures’ (gestes de savoir; Comolli 1991) by which
techniques of the body and material practices are communicated to novices. Mauss
insisted that techniques of the body are acquired through training, in co-operative,
gesturally mediated, and sometimes transitory apprenticeship interactions (de France
1983). Praxeology, as these researchers understand it, reveals the chaîne opératoire
of an activity, the series of consecutive gestures that are made in the making of an
artifact, the playing of an instrument, the setting of a table, or in the course of a social
encounter. Praxeological analysis proceeds “from a micro-analysis of the articulations between the different moments of a process, examined step by step, and from a macro-analysis attending to the articulations between vast ensembles of sequences of activities” (de France 1983: 148). The ultimate goal appears to be
a kind of encyclopedia or grammar of everyday activities and skills, arrived at by the
micro-analysis of individual cases and the comparative analysis of collections of
phenomena.
Practices must be studied in situ, within the moment-by-moment progression of in-
teractions within which they are enacted. Micro-ethnography (Erickson and Schultz
1977; Streeck and Mehus 2004) is a research methodology that is geared both towards the identification of generic methods and devices (practices habitualized in a community or by an individual to deal with kinds of circumstance) and towards the Gestalt and local significance of the single moment of activity and interaction. It is particularly
well-suited for praxeological research because it focuses on the minute details of the
embodied enactment of practices in which alone the skillfulness of social actors fully
reveals itself.
Gesture practices address issues of intersubjectivity; they are practices geared to
achieve practical understanding in the context of collaboration and talk. Intersubjectiv-
ity in interaction is predicated on sequentiality, i.e. the opportunity for next parties to
display their understanding of a current action in their response to it, and for the orig-
inator of the original action to monitor in next party’s response whether the original
action has indeed been understood. Gestures of the hand do their work in relation to
and as components of conversational turns, actions, and action sequences. They address
distinctly different tasks at different positions within turns and sequences, and the pro-
jection that a gesture makes – what it conveys about the next moment – is a matter of
the exact moment when it is made (Streeck 2009b): a hand gesture made just before the
onset of a turn can specify the communicative act that is to be performed in that turn; a
gesture that occupies a gap between article and noun will be perceived as a (literal or
metaphorical) depiction; during turn-completion, a shrug can serve as an invitation to
the other to share or respond to the stance towards the just-completed utterance (its
content, its pragmatic upshot) that the speaker expresses by the shrug. There has
been a considerable amount of research on the step-by-step production of gestures,
i.e. the stages in which simple and complex gestural acts unfold, from the moment
the hands depart from their “rest positions” to the stroke, perhaps a post-stroke freeze,
and back to “home position” (Sacks and Schegloff 2002). Gesturers make use of the possibility of manipulating the temporal characteristics of speech and gesture to achieve
distinct communicative effects. For example, a post-stroke hold during a negation ges-
ture marks the scope of the negation (Harrison 2010); or the freezing of a turn “hand-
over” gesture may indicate a waiting stance and thereby convey a request for response.
The movement arc of the single gesture – and more so the combined motions and holds
of extended gesture phrases – offer rich and varied opportunities for coupling gestures
and words in multimodal “packages”, and the importance of multimodal utterances
has been shown for various practice communities (Heath and Luff 2006) and instruc-
tional situations (Müller and Bohle 2007). Goodwin has called gesture an “intersti-
tial” medium, i.e. an embodied praxis that is particularly apt at stitching other
representations – for example, words and pictures – together (Goodwin 2007). The
fine-grained analysis of gesture practices, therefore, demands close attention not only to the sequential unfolding of the gesture and the responses it receives, but also to the
ways in which other resources are recruited at the moment. Gesture is a visual modality,
and to perform salient communicative tasks, it needs to be seen. A speaker who seeks to
implicate a gesture in the moment’s understanding is thus tasked with getting the ad-
dressee’s visual attention. Conversely, listeners who do not wish to affiliate with the
turn’s project may do so by actively disattending the gesture that is a salient part of
it (Streeck 2008b).
In sum, the enactment and communicative effects of different gesture practices can
only reliably be investigated by combining the close, moment-by-moment analysis that
characterizes micro-analytic approaches such as context analysis and conversation
analysis with attention to the ethnographic particulars that make gesture relevant
and practical within a particular community and situation.

7. Praxeologies of gesture: Some examples
Gesture practices have been studied with diverse investigative aims. While studies of
conversational gestures are typically focused on what goes on in the single turn or
sequence – when is a gesture inserted into a turn, how is it prepared? – ethnographic
studies may aim to take stock of the entire register of practices that are enacted by
the members of a practice community in their day-to-day interactions. As the “practice turn in contemporary theory” (Schatzki, Knorr-Cetina, and von Savigny 2001) is a relatively recent phenomenon, only a few studies of gesture so far have been articulated in
explicitly praxeological terms. Goodwin has analyzed “pointing as situated practice”,
describing it as a configuring of multimodal resources that include not only a body
that is visibly performing a pointing act and concurrent talk “that elaborates and is ela-
borated by the act of pointing” (Goodwin 2003: 219), but also spatial features of the
target domain, the parties’ orientation to one another, and the activity in which the
act is performed. Other studies of pointing can be re-interpreted as praxeologies, for
example Enfield’s (2009) account of the systematic coupling of Lao deictic terms
with index-finger and lip pointing, or Kendon’s studies of discourse-structuring gestures
in Southern Italy (Kendon 1995).
A distinct body of studies has been devoted to coordination practices, i.e. methods
and devices by which hand-gestures and other embodied acts, including speech, are
brought into alignment and made to contextualize and elaborate one another. Goodwin
and Goodwin (1986) have identified methods by which gestures are implicated and
made relevant during word-searches, i.e. when understanding through talk is temporarily
at risk; Streeck (2002b) has described linguistic devices (so, like) by which gestures
and other bodily displays are inserted into the grammatical frames of sentences. Atten-
tion has also been given to the body’s ability to synthesize and translate phenomena
across perceptual modalities, an ability that is important, for example, in musical con-
texts where notions of sound and timing are given gestural form in the coordination of
actions between ensemble members and in the conductor’s embodied performance vis-
a-vis the orchestra (Wakin 2012). Here, gesture is at its most practical: it patterns the
bodily praxis of others. It does so by creating dynamic bodily forms with which other
bodies can resonate and from which they receive guides to action. Hand motions
make musical notions concrete, for example by giving a motor display of the concept
‘joyful’ (allegro), or by figuring the swelling of a note. Rahaim (2008), writing about
hand gestures accompanying Indian raga singing, suggests that parallels between speech
gestures and song gestures attest to an underlying unity of musical and linguistic im-
provisation, a shared creative methodology that finds proper expression only in the
interstitial medium of gesture.
Hutchins, whose work Cognition in the Wild (1995) has done much to define a new
method, cognitive ethnography, and a new research field, distributed cognition, has
shown in studies of cockpits and science labs how gestures of the hand become “mate-
rial anchors” for conceptualizations (Hutchins 2005). Gestures offer dynamic represen-
tational possibilities that are unparalleled in other modalities, enabling depictions of
static phenomena by movement, but also the temporary fixing of information that
might otherwise be susceptible to rapid decay. Hutchins and Nomura (2011) highlight
the ability of gestures to interpret immobile graphic representations (see also Ochs,
Gonzales, and Jacoby 1996), and the ease with which gestures anchor conceptual blends,
i.e. the projection of schemata across cognitive domains. And they give examples of key
actions that have been abstracted in the community of pilots as conventional mimetic
gestures. Studies of professional communities often present gestures as adaptive rou-
tines, i.e. traditional or habitualized solutions to recurrent task configurations, but
they also emphasize the artful improvisations that characterize creative communication
in the workplace.
686 IV. Contemporary approaches

Analysis of the full gesture repertoire in a community of practice can also reveal a
community’s lived epistemology, as Sauer (2003) has demonstrated in her analysis of
the ways in which miners re-embody accidents or dangerous situations. They alternate
between “first-person” and “third-person” depictions of mining accidents, shifting back
and forth between rendering subjective experience and analyzing the situation from a
detached, “God’s eye” point of view. It is this duality of perspectives upon which
their survival often depends – the ability to subjectively sense danger and simulta-
neously analyze a situation in detached, objective terms (“at arm’s length”). Miners
refer to this as “pit sense”.

8. Conclusion
Because of the relative novelty of the practice approach to the analysis of social life and
communication, and because many researchers are committed to longer-standing paradigms such as cognitive psychology and conversation analysis, it is unlikely that praxeology will visibly dominate gesture studies and interaction research any time soon.
Its influence may remain more indirect, coming from the practice-orientation that is
already inherent in some of these methodologies, especially those that trace part of
their theoretical lineage either to the late Wittgenstein (1953) – the Wittgenstein of
practical understandings – or to phenomenologists such as Polanyi (1958, 1966) and
Merleau-Ponty (1962) and their interest in the enactive nature of embodiment and in
the embodied nature of human cognition. It is also possible, however, as Meyer
(2010) has suggested, that the approach outlined here will provide praxeological studies
of other domains (e.g., science-and-technology studies, studies of work) and the micro-
analysis of social interaction in general with a new model for naturalistic research,
because of its empirical precision, ecological validity, and because its practitioners
recognize the enculturated nature of the living, interacting, and gesturing body.

9. References
Austin, John 1962. How to Do Things with Words. Oxford: Oxford University Press.
Bateson, Gregory 1972. Steps to an Ecology of Mind. New York: Ballantine.
Bateson, Gregory and Margaret Mead 1942. Balinese Character. A Photographic Analysis. New
York: New York Academy of Sciences.
Bourdieu, Pierre 1977. Outline of a Theory of Practice. Cambridge: Cambridge University Press.
Bourdieu, Pierre 1990. The Logic of Practice. Stanford, CA: Stanford University Press.
Calbris, Geneviève 2011. Elements of Meaning in Gesture. Amsterdam: John Benjamins.
Clark, Herbert 2003. Pointing and placing. In: Sotaro Kita (ed.), Pointing: Where Language, Cul-
ture, and Cognition Meet, 243–268. Mahwah, NJ: Lawrence Erlbaum.
Comolli, Annie 1991. Les Gestes du Savoir, second edition. Paris: Éditions Publidix.
De France, Claudine 1983. L’Analyse Praxéologique. Composition, Ordre et Articulation d’un
Procès. Paris: Éditions de la Maison des Sciences de l’Homme.
Dreyfus, Hubert L. 1991. Being-in-the-World. A Commentary on Heidegger’s “Being and Time”.
Cambridge: Massachusetts Institute of Technology Press.
Enfield, N. J. 2009. The Anatomy of Meaning. Cambridge: Cambridge University Press.
Erickson, Frederick and Jeffrey Schultz 1977. When is a context? Some issues in the analysis of
social competence. Quarterly Newsletter of the Institute for Comparative Human Development
1(2): 5–10.
Gibson, James J. 1962. Observations on active touch. Psychological Review 69: 477–491.
Gibson, James J. 1966. The Senses Considered as Perceptual Systems. Boston: Houghton Mifflin.
Goodman, Nelson 1978. Ways of Worldmaking. Indianapolis: Hackett.
Goodwin, Charles 1994. Professional vision. American Anthropologist 96(3): 606–633.
Goodwin, Charles 2003. Pointing as situated practice. In: Sotaro Kita (ed.), Pointing: Where Language, Culture, and Cognition Meet, 217–242. Mahwah, NJ: Lawrence Erlbaum.
Goodwin, Charles 2007. Environmentally coupled gestures. In: Susan D. Duncan, Justine Cassell
and Elena Tevy Levy (eds.), Gesture and the Dynamic Dimension of Language: Essays in
Honor of David McNeill, 195–212. Philadelphia: John Benjamins.
Goodwin, Charles and Marjorie Harness Goodwin 1986. Gesture and coparticipation in the activ-
ity of searching for a word. Semiotica 62(1/2): 51–75.
Hanks, William F. 1996. Language and Communicative Practices. Boulder, CO: Westview Press.
Hanks, William F. 2005. Pierre Bourdieu and the practices of language. Annual Review of Anthro-
pology 34: 67–83.
Harrison, Simon 2010. Evidence for node and scope of negation in coverbal gestures. Gesture
10(1): 29–51.
Heath, Christian and Paul Luff 2006. Video analysis and organizational practice. In: Hubert Kno-
blauch, Bernt Schnettler, Jürgen Raab and Hans-Georg Soeffner (eds.), Video Analysis: Meth-
odology and Methods. Qualitative Audiovisual Analysis in Sociology, 35–50. Frankfurt am
Main: Peter Lang.
Holzkamp, Klaus 1978. Sinnliche Erkenntnis: Historischer Ursprung und Gesellschaftliche Funk-
tion der Wahrnehmung, 4th revised edition. Königstein: Athenäum.
Hutchins, Edwin 1991. The social organization of distributed cognition. In: Lauren B. Resnick,
John M. Levine and Stephanie D. Teasley (eds.), Perspectives on Socially Shared Cognition,
283–307. Washington, DC: American Psychological Association.
Hutchins, Edwin 1995. Cognition in the Wild. Cambridge: Massachusetts Institute of Technology
Press.
Hutchins, Edwin 2005. Material anchors for conceptual blends. Journal of Pragmatics 37: 1555–
1577.
Hutchins, Edwin and Saeko Nomura 2011. Collaborative construction of multimodal utterances.
In: Jürgen Streeck, Charles Goodwin and Curtis D. LeBaron (eds.), Embodied Interaction.
Language and Body in the Material World, 289–304. New York: Cambridge University Press.
Jeannerod, Marc 1997. The Cognitive Neuroscience of Action. Oxford: Blackwell.
Johnson, Mark 1987. The Body in the Mind. Chicago: University of Chicago Press.
Kendon, Adam 1995. Gestures as illocutionary and discourse structure markers in Southern Ital-
ian conversation. Journal of Pragmatics 23(3): 247–279.
Kendon, Adam 2004. Gesture: Visible Action as Utterance. Cambridge: Cambridge University
Press.
Kendon, Adam and Laura Versante 2003. Pointing by hand in “Neapolitan”. In: Sotaro Kita (ed.),
Pointing: Where Language, Culture, and Cognition Meet, 109–138. Mahwah, NJ: Lawrence
Erlbaum.
LeBaron, Curtis D. and Jürgen Streeck 2000. Gestures, knowledge, and the world. In: David
McNeill (ed.), Language and Gesture, 118–138. Cambridge: Cambridge University Press.
Leroi-Gourhan, André 1993. Gesture and Speech. Cambridge: Massachusetts Institute of Technol-
ogy Press. First published [1964].
Mauss, Marcel 1973. The techniques of the body. Economy and Society 2(1): 70–88. First published
[1935].
Merleau-Ponty, Maurice 1962. Phenomenology of Perception. London: Routledge.
Meyer, Christian 2010. Gestenforschung als Praxeologie: Zu Jürgen Streecks mikroethnogra-
phischer Theorie der Gestik. Gesprächsforschung – Online-Zeitschrift zur Verbalen Interaktion
11: 208–230.
Müller, Cornelia 1998. Redebegleitende Gesten. Kulturgeschichte – Theorie – Sprachvergleich.
Berlin: Arno Spitz.
Müller, Cornelia and Ulrike Bohle 2007. Das Fundament fokussierter Interaktion. Zur Vorberei-
tung und Herstellung von Interaktionsräumen durch körperliche Koordination. In: Reinhold
Schmitt (ed.), Koordination. Studien zur Multimodalen Interaktion, 129–166. Tübingen: Gunter
Narr.
Noland, Carrie 2009. Agency and Embodiment. Performing Gestures/Producing Culture. Cam-
bridge, MA: Harvard University Press.
Ochs, Elinor, Patrick Gonzales and Sally Jacoby 1996. “When I come down I’m in the domain state”: Grammar and graphic representation in the interpretive activity of physicists. In: Elinor Ochs, Emanuel A. Schegloff and Sandra A. Thompson (eds.), Interaction and Grammar,
328–369. Cambridge: Cambridge University Press.
Polanyi, Michael 1958. Personal Knowledge. Towards a Post-Critical Philosophy. New York:
Harper Torchbooks.
Polanyi, Michael 1966. The Tacit Dimension. Garden City, NY: Doubleday.
Rahaim, Matt 2008. Gesture and melody in Indian vocal music. Gesture 8(3): 325–347.
Ryle, Gilbert 1949. The Concept of Mind. London: Hutchinson’s University Library.
Sacks, Harvey and Emanuel A. Schegloff 2002. Home position. Gesture 2(2): 133–146.
Sauer, Beverly J. 2003. The Rhetoric of Risk. Technical Documentation in Hazardous Environ-
ments. Mahwah, NJ: Lawrence Erlbaum.
Schatzki, Theodore R. 2001. Introduction: Practice theory. In: Theodore R. Schatzki, Karin Knorr-
Cetina and Eike von Savigny (eds.), The Practice Turn in Contemporary Theory, 1–14. London:
Routledge.
Schatzki, Theodore R., Karin Knorr-Cetina and Eike von Savigny (eds.) 2001. The Practice Turn in
Contemporary Theory. London: Routledge.
Sowa, Timo 2006. Understanding Coverbal Iconic Gestures in Shape Descriptions. Berlin: Akade-
mische Verlagsgesellschaft.
Streeck, Jürgen 2002a. A body and its gestures. Gesture 2(1): 19–44.
Streeck, Jürgen 2002b. Grammars, words, and embodied meanings. On the evolution and uses of
so and like. Journal of Communication 52(3): 581–596.
Streeck, Jürgen 2008a. Depicting by gestures. Gesture 8(3): 285–301.
Streeck, Jürgen 2008b. Laborious intersubjectivity: Attentional struggle and embodied communi-
cation in an auto-shop. In: Ipke Wachsmuth, Manuela Lenzen and Günther Knoblich (eds.),
Embodied Communication in Humans and Machines, 201–228. Oxford: Oxford University
Press.
Streeck, Jürgen 2008c. Metaphor and gesture: A view from the microanalysis of interaction. In:
Alan Cienki and Cornelia Müller (eds.), Metaphor and Gesture, 259–264. Amsterdam: John
Benjamins.
Streeck, Jürgen 2009a. Forward-gesturing. Discourse Processes 45(3/4): 161–179.
Streeck, Jürgen 2009b. Gesturecraft. The Manu-Facture of Meaning. Amsterdam: John Benjamins.
Streeck, Jürgen and Siri Mehus 2004. Microethnography: The study of practices. In: Kristine L.
Fitch and Robert E. Sanders (eds.), Handbook of Language and Social Interaction, 381–405.
Mahwah, NJ: Lawrence Erlbaum.
Thelen, Esther and Linda Smith 1994. A Dynamic Systems Approach to the Development of Cog-
nition and Action. Cambridge: Massachusetts Institute of Technology Press.
Wakin, Daniel J. 2012. The maestro’s mojo. New York Times, April 6.
Wilkins, David 2003. Why pointing with the index finger is not a universal (in sociocultural and
semiotic terms). In: Sotaro Kita (ed.), Pointing: Where Language, Culture, and Cognition
Meet, 171–216. Mahwah, NJ: Lawrence Erlbaum.
Wittgenstein, Ludwig 1953. Philosophical Investigations. Oxford: Blackwell.

Jürgen Streeck, Austin, TX (USA)


44. A “Composite Utterances” approach to meaning


1. Introduction
2. Composite utterances
3. Sign filtration: Triggers and heuristics
4. Semiotic analysis of gestures
5. Conclusion and prospects
6. References

Abstract
This chapter argues for a composite utterances approach to research on body, language,
and communication. It argues that to understand meaning we need to begin with the utter-
ance or speech act as the unit of analysis. From this perspective, the primary task in inter-
preting others’ behaviour in communication is to infer what a person wants to say. In
order to solve this task, an interpreter is free to consult any and all available information,
regardless of the sensory modality in which that information is gathered (e.g., vision ver-
sus hearing), and regardless of the semiotic function of that information (e.g., iconic/
indexical, symbolic/conventional, or some combination of these). Having recognized
that another person has an intention to communicate, an interpreter takes the available
relevant information (e.g., vocalizations, facial expressions, hand movements, all in the
context of synchronic knowledge of linguistic and cultural systems, and other aspects
of common ground) and looks for a way in which those co-occurring signs may simultaneously point to a single overall message of the move that a person is making. This is
helped by the binding power of social cognition in an enchronic context (that is, the
sequential context of turn-by-turn conversation), in particular the assumption that people
are not merely saying things but making moves. The chapter focuses on co-speech hand
gestures, and also discusses implications of the composite utterances approach to research
on syntax, and on sign language.

1. Introduction
In human social behavior, people build communicative sequences move by move.
These moves are never semiotically simple. Their composite nature is widely varied
in kind: they may consist of a word combined with other words, a string of words com-
bined with an intonation contour, a diagram combined with a caption, an icon com-
bined with another icon, a spoken utterance combined with a hand gesture. By what
means does an interpreter take multiple signs and draw them together into unified,
meaningful packages? This chapter explores the question with special reference to
one of our most familiar types of move, the speech-with-gesture composite, a classical
locus of research on body, language, and communication (see other chapters of this
handbook relating to gesture, and many references therein). The central question is
this: How do gestures contribute to the meaning of an utterance? To answer this,
we need to situate research on gesture within broader questions of research on
meaning.

1.1. Meaning does not begin with language


In a person’s vast array of communicative tools, language is surely unrivalled in its
expressive richness, speed, productivity, and ease. But the interpretation of linguistic
signs is ultimately driven by broader principles, principles of rational cognition in social
life, principles which underlie other processes of human judgment, from house-buying
to gambling to passing people on a crowded street. So, to understand meaning in human
utterances, we ought not begin with language (Enfield and Levinson 2006: 28). There is
meaning in language for the same reason there is meaning elsewhere in our social lives:
because we take signs to be public elements of cognitive processes (Peirce 1955), evi-
dence of others’ communicative intentions (Grice 1957, 1975). Our clues for figuring
out those intentions are found not only in conventional symbols like words, but in
the rich iconic-indexical relations which weave threads between just about everything
in sight (Kockelman 2005; Levinson 1983; Peirce 1955, Silverstein 1976). Language is
just a subset of the full resources necessary for recognizing others’ communicative
and informative intentions.

1.2. Meaning is dynamic, motivated, and concrete


Among fashions of thinking about language over the last century, a dominant neo-
Saussurean view says that meaning is a representational relation of phonological
form to conceptual content: A sign has meaning because it specifies a standing-for rela-
tion between a signifier and a signified. Semanticists of many different kinds agree on
this (see Cruse 1986; Jackendoff 1983; Langacker 1987; Wierzbicka 1996, among
many others). But there is reason to question whether a view of signs as static, arbitrary,
and abstract is an adequate depiction of the facts, or even optimal as an analytic frame-
work of convenience. There is reason to stay closer to the source, to see signs as they
are, first and foremost: dynamic, motivated, and concrete (Hanks 1990). To explicate
this point: Standard statements about meaning such as “the word X means Y” really
mean “people who utter the word X are normatively taken by others to intend Y across
a range of contexts”. We should not, then, understand dichotomies like static versus
dynamic, arbitrary versus motivated, or abstract versus concrete as merely two sides
of a single coin. The relation is asymmetrical, since we are always anchored in the
dynamic-motivated-concrete realm of contextualized communicative signs.
Some traditions doubt whether a Saussurean “form-meaning mapping” account of
meaning is appropriate. In research on co-speech hand gesture, McNeill (2005) has
forcefully questioned the adequacy of a coding-for-decoding model of communica-
tion. The same point has long been made for more general reasons, in more encom-
passing theories of semiosis, and in theories of how types of linguistic structure
mean what they mean when used as tokens in context (Grice 1975). Thus, alternatives
to a static view of meaning are available for dealing with the specific problems
of co-speech gesture. These come from two sources: (neo-)Peircean semiotics (e.g.,
Colapietro 1989; Kockelman 2005; Parmentier 1994; Peirce 1955) and (neo-)Gricean
pragmatics (e.g., Atlas 2005; Grice 1975; Horn 1989; Levinson 1983, 2000; Sperber and
Wilson 1995). Subsequent sections explore the relevant analytic tools offered by these
traditions.

1.3. Meaning is a composite notion


When people say things they typically do so by combining words with images. A rela-
tively simple example of a composite sign is the image-with-caption format typified by
photographs and artwork. What makes this kind of thing a composite sign is that the
visual image and the string of words are taken together as part of the artist’s single over-
all intention (Preissler and Bloom 2008; see Richert and Lillard 2002). The image and
the words are different types of signs, but they are presented together, and taken
together, in a composite. Interpreting such composites is done by means of a general
heuristic of semiotic unity: when encountering multiple signs which are presented
together, take them as one. This example illustrates essentially the same thing we
find in the co-occurrence of expressive hand movements with speech: context-situated
composites of multiple signs, part conventional, part non-conventional. Consider
Fig. 44.1, an image from a video-recording showing three Lao men sitting in a village
temple, one of them thrusting his arm forward and down, with his gaze fixed on it.
(Note: This example and the following one are from a corpus of video-recorded talk
collected in Laos since 2000; as should be obvious, the point I am making here is not
specific to the Lao data, and could be illustrated with comparable data from any
other culture.)

Fig. 44.1: Man (left of image) speaking of preferred angle of a drainage pipe under construction:
“Make it steep like this.”

The discussion in the context of Fig. 44.1 is about construction works under way in the
temple. The man on the left is reporting on a problem in the installation of drainage
pipes from a bathroom block. He says that the drainage pipes have been fixed at too
low an angle, and they should, instead, drop more sharply, to ensure good run-off. As
he says haj5 man2 san2 cang1 sii4 ‘Make it steep like this’, he thrusts his arm forward
and down, fixing his gaze on it, as shown in Fig. 44.1. The meanings of his words and his
gesture are tightly linked, through at least three devices:

(i) their tight co-occurrence in place and time (both produced by the same source),
(ii) the use of the explicit deictic expression “like this” (sending us on a search: “Like
what?”, and leading us to consult the gesture for an answer),
(iii) the use of eye gaze for directing attention.

A similar case is presented in Fig. 44.2, from a description of a type of traditional Lao
fish trap called the sòòn5 (see Enfield 2009: Chapter 5).

Fig. 44.2: Man describing the sòòn5, a traditional Lao fish trap: “As for the sòòn5, they make it
fluted at the mouth.”

Again we see a speaker’s overall utterance meaning as a unified product of multiple sources of information:

(i) a string of words (itself a composite sign consisting of words and grammatical
constructions),
(ii) a two-handed gesture,
(iii) tight spatiotemporal co-occurrence of the words and gestures (from a single
source), and
(iv) eye gaze directed toward the hands, also helping to connect the composite utter-
ance’s multiple parts.

This is subtly different from Fig. 44.1 in that it does not involve an explicit deictic
element in the speech. Like the picture-with-caption examples mentioned above,
spatiotemporal co-placement in Fig. 44.2 is sufficient to signal semiotic unity. The
gesture, gaze and speech components of the utterance are taken together as a uni-
fied whole. As interpreters, we effortlessly integrate them as relating to one overall
idea.
A general theory of composite meaning takes Figs. 44.1 and 44.2, along with road
signs, paintings on gallery walls, and captioned photographs to be instances of a single
phenomenon: signs co-occurring with other signs, acquiring unified meaning through
being interpreted as co-relevant parts of a single whole. A general account for how
the meanings of multiple signs are unified in any one of these cases should apply to
them all, along with many other species of composite sign, including co-occurring
icons in street signs, grammatical unification of lexical items and constructions, and
speech-with-gesture composites.
In studying speech-with-gesture, there are two important desiderata for an account
of composite meaning. A first requirement is to provide a modality-independent
account of gesture (Okrent 2002). While we want to capture the intuition that co-speech
hand gesture (manual-visual) conveys meaning somehow differently from speech (vocal-aural), this has to be articulated without reference to modality. We need to be able
to say what makes speech-accompanying hand movements “gestural” in such a way
that we can sensibly ask what the functional equivalent of co-speech gesture would be in other
kinds of composite utterances; for example, in sign language of the Deaf (all visual,
but not all “gesture”), or in speech heard over the phone (all vocal-aural, but not all
“language”).
A second desideratum for an account of meaning in speech-with-gesture composites
is to capture the notion of “holistic” meaning in hand gestures, the idea that a hand
gesture has the meaning it has only because of the role it plays in the meaning of an
utterance as a whole (Engle 1998; McNeill 1992, 2005). If we want to achieve analytic
generality, then a notion of holistic meaning is required not only for analyzing the
meaning of co-speech hand gesture, but more generally for analyzing linguistic and
other types of signs as well. This results from acknowledging that an interpreter’s
task begins with the recognition of a signer’s communicative intention (i.e., recognizing
that the signer has an informative intention). The subsequent quest to lock onto a target
informative intention can drive the understanding of the composite utterance’s parts,
and not necessarily the other way around.

2. Composite utterances
2.1. Contexts of hand gesture
One view of speech-with-gesture composites is that the relation between co-expressive
hand and word is a reciprocal one: “the gestural component and the spoken component
interact with one another to create a precise and vivid understanding” (Kendon 2004:
174, original emphasis; see Özyürek et al. 2007). By what mechanism does this reci-
procal interaction between hand and word unfold? Different approaches to analyzing
meanings of co-speech gestures find evidence of a gesture’s meaning in a range of
sources, including

(i) speech which co-occurs with the hand movement,
(ii) a prior stimulus or cause of the utterance in which the gesture occurs,
(iii) a subsequent response to, or effect of, the utterance, or
(iv) purely formal characteristics of the gesture.
These four sources, often combined, draw on different components of a single underly-
ing model of the communicative move and its sequential context, where the hand-
movement component of the composite utterance is contextualized from three angles:
A. what just happened, B. what else is happening now, C. what happens next. This
three-part sequential structure underlies a basic trajectory model recognized by many
students of human social behaviour. Schütz (1970), for example, speaks of actions
(at B) having “because motives” (at A) and “in-order-to motives” (at C; e.g., I’m picking
berries [B] because I’m hungry [A], in order to eat them [C]; see Sacks 1992; Schegloff
2007 among many others).

2.2. Enchrony: The context of composite utterances


Any utterance is a situated unit of social behaviour with causes (or conditions) and ef-
fects (Goffman 1964; Schegloff 1968). An intentional cause and interpretive effect are
as definitive of the process of meaning as the pivotal signifying behaviour itself. Any
communicative move may be seen as arising more or less appropriately from certain
commitments and entitlements, and in turn bringing about new commitments and enti-
tlements (Austin 1962; Searle 1969), for which interlocutors are subsequently account-
able. As an analytical framework, this remedies the static, decontextualized nature of
Saussure’s version of meaning (Kockelman 2005). But this is not merely because it recognizes that meaning arises through a process (McNeill 2005); it is because it recognizes the causal/conditional and normative anatomy of sequences of communicative
interaction, where each step brings about a new horizon, with consequences for the peo-
ple involved (Atkinson and Heritage 1984; Sacks, Schegloff, and Jefferson 1974; Schegl-
off 1968; Goffman 1981). Accordingly, we need a term for a causal, dynamic perspective
on language whose granularity matches the pace of our most experience-near, moment-
by-moment deployment of utterances in interaction, not historical time (for which the
term diachronic is standard) but conversational time. For this I use the word enchronic.
While diachronic analysis is concerned with relations between data from different years
(with no specified type or directness of causal/conditional relations), enchronic analysis
is concerned with relations between data from neighbouring moments, adjacent units of
behaviour in locally coherent communicative sequences (typically, conversations). The
real-time birth and development of a composite utterance from a producer’s point of
view (for which we might use the term microgenesis) is distinct from the intended
meaning of enchronic here, namely the intersection of

(i) a social causal/conditionality of related signs in sequences of social interaction and
(ii) a particular level of temporal granularity in a conditionally sequential view of
language: conversational time.

An enchronic perspective adopts the sequential analytic approach whose application in empirical work was pioneered by Schegloff (1968) and Sacks (1992), following earlier work in sociology. To call it enchronic rather than merely sequential (in the technical
sense of Schegloff 2007) draws attention to the broader set of alternative viewpoints
on systems and processes of meaning which we often need to switch between (including
phylogenetic, diachronic, ontogenetic, and synchronic).

2.3. The move: A basic-level unit for social interaction


A primitive unit of an enchronic perspective is the communicative move (Goffman
1981). A move may be defined as a recognizable unit contribution of communicative
behaviour constituting a single, complete pushing forward of an interactional sequence
by means of making some relevant social action recognizable (e.g., requesting the salt,
passing it, saying Thanks). In communication, a richly multimodal flux of impressions is
brought to order by these joint-attentional pulses of addressed behaviour (e.g., bursts of
talk) marked off in the flow of time and space, yielding sequences of co-contingent social
action (Goodwin 2000; Schegloff 2007). The linguistic utterance is a well-studied (if idea-
lized) type of instantiation of the move (see Austin 1962; Searle 1969). With this basic-
level status, the linguistic move will be homologous with usage-based analytic units of lan-
guage such as the clause (Foley and Van Valin 1984), the intonation unit (Chafe 1994;
Pawley and Syder 2000), the turn-constructional unit (Sacks, Schegloff, and Jefferson
1974), the growth point (McNeill 1992), the composite signal (Engle 1998; see Clark
1996), and the utterance as multimodal ensemble (Goodwin 2000; Kendon 2004).
Whatever its physical form, the move is a single-serve vehicle for effecting action socially.
An important argument in favour of the move’s primitive or basic-level status is its
role in the acquisition of communicative skills in children. Before learning their first
words, children master the move, beginning with its prototype, the pointing gesture
(Kita 2003). A line of research in developmental psychology has identified the onset
of the pointing gesture as a watershed moment in the development of human social cog-
nitive and communicative capacities, both ontogenetically and phylogenetically (Bates,
Camaioni, and Volterra 1975; Bates, O’Connell, and Shore 1987; Liszkowski et al. 2004;
Tomasello 2006). The pointing gesture is mastered by prelinguistic infants (by around
12 months of age) and it is the first type of move to unequivocally display the sort of
shared intentionality unique to human communication and social cognition (Frith
and Frith 2007; Liszkowski 2006; Tomasello et al. 2005).
The move is therefore a starting point, a seed, a template for the deployment of signs
in interaction. On the one hand, the move is a brick for larger structures, building up
and out, into conversational sequences and other kinds of coherent discourse structure
(Halliday and Hasan 1976; Schegloff 2007). On the other hand, it is a frame or exoske-
leton within which internal semiotic complexity may appear, building down and in,
yielding phrase distinctions, morphosyntax, information structure, and logical seman-
tics. Much of the existing research on gesture, such as found in this handbook, examines
the kinds of structure that arise when moves are built from word and hand together.

2.4. Conventional and non-conventional components of composite utterances

Three types of sign are important in interpreting composite utterances: conventional
signs, non-conventional signs, and symbolic indexicals. For convenience, I simplify the
analysis of sign types employed here. A full anatomy of sign types would lay out the
logical possibilities first mapped by Peirce (1955), and most accessibly interpreted by
Parmentier (1994) and Kockelman (2005). The notion of conventional sign here corre-
sponds to Peirce’s symbol, non-conventional sign includes his icon and index. The Peir-
cean type/token distinction (Hutton 1990) cuts across these (see below). A conventional
696 IV. Contemporary approaches

sign is found when people take a certain signifier to stand for a certain signified because
that is what members of their community normatively do (Saussure [1916] 1959; on
norms, see Brandom 1979; Kockelman 2006). This kind of sign allows for arbitrary relations like /kʰæt/ referring to ‘cat’, by which the cause of my taking [kʰæt] to mean ‘cat’ is my experience with previous occasions of use of tokens of the signifier /kʰæt/.
Examples of conventional signs include words and grammatical constructions, idioms,
and “emblem” hand gestures such as the OK sign, V for Victory, or The Finger
(Brookes 2004; Ekman and Friesen 1969). Non-conventional signs, by contrast, are
found when people take certain signifiers to stand for certain signifieds not because
of previous experience with that particular form-meaning pair or from social conven-
tion, but where the standing-for relation between form and meaning comes about by
virtue of just that singular event of interpretation. Examples include representational
hand gestures (in the sense of Kita 2000), that is, where the gesture component of an
utterance is a token, analogue representation of its object.
The symbolic indexical is a hybrid of the two types of sign just described, having
properties of both. These include anything that comes under the rubric of deixis (Fill-
more 1997; Levinson 1983), that is, form-meaning mappings whose proper interpre-
tation depends partly on convention and partly on context (Bühler [1934] 1982;
Jakobson 1971; Silverstein 1976). Take for example him in Take a photo of him. Your
understanding of him will depend partly on your recognition of a conventional, context-
independent meaning of the English form him (third person, singular, male, accusative)
and partly on non-conventional facts unique to the speech event (e.g., whichever male
referent is most salient given our current joint attention and common ground). Sym-
bolic indexicals play a critical role in many types of composite utterance, since their
job is to glue things together, including words, gestures, and (imagined) things in the
world (see Part I of Enfield 2009, and studies of pointing in this handbook).
In the context of these three kinds of sign, it is important to be mindful of the dis-
tinction between type and token (Hutton 1990; Peirce 1955). All of the signs discussed
above occur as tokens, that is, as perceptible, contextualized, unique instances. But only
conventional signs (including conventional components of symbolic indexicals) neces-
sarily have both type and token identities. That is, when they occur as tokens, they
are tokens of types, or what Peirce called replicas. It is because of their abstract type
identity that conventional signs can be regarded as meaningful independent of context,
as having “sense” (Frege [1892] 1960), “timeless meaning” (Grice 1989) or “semantic
invariance” (Wierzbicka 1985, 1996). Conventional signs are pre-fabricated signs,
already signs by their very nature. By contrast, non-conventional signs (including
non-conventional components of symbolic indexicals) are tokens but not tokens of
types. They are singularities (Kockelman 2005). They become signs only when taken
as signs in context. This is the key to understanding the asymmetries we observe in com-
posite utterances such as speech-with-gesture ensembles. A hand gesture may be a con-
ventional sign (e.g., as “emblem”). Or it may be non-conventional, only becoming a sign
because of how it is used in that context (e.g., as “iconic” or “metaphoric”). Or it may
be a symbolic indexical (e.g., as pointing gesture, with conventionally recognizable
form, but dependent on token context for referential resolution). Hand gestures are
not at all unique in this regard: the linguistic component of an utterance may, similarly,
be conventional (e.g., words, grammar), non-conventional (e.g., voice quality, sound
stretches), or symbolic indexical (e.g., demonstratives like yay or this). Ditto for
sign components of graphs, diagrams, and other illustrations. Sensory or articulatory modality is no obstacle to semiotic flexibility.
Before concluding this section, it is worthwhile registering a common inconsistency
in discussions of the meaning of hand movements in composite utterances. The problem
is an inconsistent treatment of the way meaning is attributed to words, on the one hand,
and gestures, on the other. Linguistic items like words are often described merely in
terms of what they conventionally encode (as standing for lexical types), while gestures
are typically described in terms of what they non-conventionally convey (as standing for
utterance-level tokens of informative intention). In other words, the interpreter’s prob-
lem of comprehending word meaning is taken to be one of recognition (from token
form to type lexical entry), while the problem of comprehending gesture meaning is
taken to be one of interpretation (from token form to token informative intention).
The inconsistency here is that it overlooks the fact that comprehension of the linguistic
component also involves interpretation yielding token informative intentions. In inter-
preting the meanings of words, we do not stop with mere recognition of type lexical en-
tries, but, just like with gestures, we also use them for recognizing a speaker’s token
informative intention. To illustrate, take an example cited by McNeill (2005: 26), in
which a speaker says and he came out the pipe while doing an “up-and-down away”
hand gesture (the hand is moving away from the body as it is moved repeatedly up
and down). Hearing came out, an interpreter recognizes these sounds to be tokens of
types (i.e., with the meaning “came out”). He or she may also enrich this meaning
“came out” in using it as a clue for figuring out the speaker’s informative intention
in producing this composite utterance. They may of course exploit the accompanying
gesture in this process of enrichment. In the experiment described by McNeill, a
subject who heard the first speaker’s description of the scene as and he came out the
pipe[GESTURE-up-and-down-away] later re-describes it as the cat bounces out the pipe. Note
that the re-teller not only enriches came out[GESTURE-up-and-down-away] as “bounces
out”, he also enriches he as “the cat”; concerning the pronoun he in the original utterance, the subject must have both recognized he as a token of the type “he” and taken it to stand, in this case, for a token informative intention “the cat”. This shows that both the gesture
and the words are enriched by their co-occurrence in that context, being taken to be co-
occurring signs of a single informative intention. Came out and [GESTURE-up-and-down-away]
together point to a single idea “bounces out”. While word recognition has no analogue
in the interpretation of the iconic gesture here (since the gesture is a token but not a
token of a type), the attribution of an overall utterance intention to words does have an analogue in the interpretation of the gesture.
When examining gesture, as when examining any other component of composite
utterances, we must carefully distinguish between token meaning (enriched, context-
situated), type meaning (raw, context-independent, pre-packaged), and sheer form
(no necessary meaning at all outside of a particular context in which it is taken to
have meaning). These distinctions may apply to signs in any modality.

2.5. Elements of composite utterances


Based on the discussion so far, we may define the composite utterance as a communi-
cative move that incorporates multiple signs of multiple types. Sources of these types of
sign are given in Fig. 44.3 (see Hanks 1990: 51ff; Levinson 1983: 14, 131).
I. Encoded
I.1. Lexical (open class, symbolic)
I.2. Grammatical (closed class, symbolic-indexical)
II. Enriched
II.1. Indexical resolution
II.1.1. Explicit (via symbolic indexicals, e.g., pointing or demonstratives)
II.1.2. Implicit (e.g., from physical situation)
II.2. Implicature
II.2.1. From code
II.2.2. From context

Fig. 44.3: Sources of composite meaning for interpretation of communicative moves. “Encoded” =
conventional sign components. “Enriched” = non-conventional token meanings drawing on
context.

In Fig. 44.3, “encoded meaning” encompasses both lexical and grammatical meaning.
Grammatical signs show greater indexicality because they signify context-specific ties
between two or more elements of a composite utterance (e.g., grammatical agreement,
case-marking, etc.) or between the speech event and a narrated event (Jakobson 1971;
e.g., through tense-marking, spatial deixis, etc.). “Indexical enrichment” refers to the
resolution of reference left open either explicitly (e.g., through symbolic indexicals
like this) or implicitly (e.g., by simple co-placement in space or time; thus, a “no smok-
ing” sign need not specify “no smoking here”). “Enrichment through implicature” re-
fers to Gricean token understandings, arising either through rational interpretation
based on knowledge of a restricted system of code (i.e., informativeness scales and
other mechanisms for Generalized Conversational Implicature; Levinson 2000), or
through rational interpretation based on cultural or personal common ground (e.g., Par-
ticularized Conversational Implicatures such as those based on a maxim of relevance;
Sperber and Wilson 1995).
Thus, composite utterances are interpreted through the recognition and bringing
together of these multiple signs under a pragmatic unity heuristic or co-relevance prin-
ciple, i.e., an interpreter’s steadfast presumption of pragmatic unity despite semiotic
complexity.

3. Sign filtration: Triggers and heuristics


The taxonomy of elements of composite signs in Fig. 44.3 presupposes that an inter-
preter can solve the problem of sign filtration, i.e., that they can parse out from a
flux of impressions those things that are to be taken as signs in the first place. This fil-
tration is assisted by triggers which direct us to lock on to certain signs, constraining the
search space. An important trigger is that a perceptible impression must be recogniz-
able as addressed, that is, being produced by a person for the sake of its interpretation
by another. Conventional signs like words have this addressed-ness by their very nature.
But other perceptibles are only potential signs, and their addressed-ness needs to be
specially marked. This can be achieved by means of attention-drawing indexicals
(hand pointing, saying like this, etc.), by sheer spatiotemporal co-occurrence, or by
special diacritic marking (see Figs. 44.1 and 44.2, above). An example of the latter is
discussed in Enfield (2009, Chapter 3) where movements of the face and head can
serve as triggers for eye gaze to be interpreted as pointing, not merely as looking. In
yet other cases, interpreters can employ abductive, rational interpretation to detect
that an action is done with a communicative intention (Grice 1957; Peirce 1955). For
instance, if you open a jar, I am unlikely to take this to be communicative, but if
you carry out the same physical action without an actual jar in your hands, the lack
of conceivable practical aim is likely to act as a trigger for implicature (Gergely,
Bekkering, and Király 2002; Levinson 1983: 157).
Data of the kind presented throughout this handbook do not usually present special
difficulties for interpreters in detecting communicative intention or identifying which
signs to include when interpreting a composite utterance. Mostly, the mere fact of lan-
guage being used triggers a process of interpretation, and the gestures which accom-
pany speech are straightforwardly taken to be associated with what a speaker is
saying (Kendon 2004). Hand gestures are therefore available for inclusion in a unified
utterance interpretation, whether or not we take them to have been intended to
communicate.
Note the kinds of heuristics that are likely being used in solving the problem of sign
filtration. (On heuristics and bounded rationality in general see Gigerenzer, Hertwig,
and Pachur 2011 and references therein.) By a convention heuristic, if a form is recog-
nizable as a socially conventionalized type of sign, assume that it stands for its socially
conventional meaning. Symbols like words may thus be considered as pre-fabricated
semiotic processes: their very existence is due to their role in communication (unlike
iconic-indexical relations which may exist in the absence of interpretants). By an orien-
tation heuristic, if a signer is bodily oriented toward you, most obviously by body posi-
tion and eye gaze, assume they are addressing you. By a contextual association
heuristic, if two signs are contextually associated, assume they are part of one signifying
action. Triggers for contextual association are timing and other types of indexical prox-
imity (e.g., placing caption and picture together, placing word and gesture together). By
a unified utterance-meaning heuristic, assume that contextually associated signs point
to a unified, single, addressed utterance-meaning. And by an agency heuristic, if a signer
has greater control over a behaviour, assume (all things being equal) that this sign is
more likely to have been communicatively intended. Language scores higher than ges-
ture on a range of measures of agency (Kockelman 2007). For further elaboration on
the application of a heuristic model to the interpretation of speech-gesture composites,
see Enfield (2009: 223–227).

4. Semiotic analysis of gestures


Like any signs, hand movements can stand for things in three essential ways (often in
combination), referred to by Peirce (1955) as types of ground: iconic, indexical, sym-
bolic. These crucial yet widely mishandled distinctions are defined as follows. A relation
of a sign standing for an object is iconic when the sign is taken to stand for the object
because it has perceptible qualities in common with it. The sign is indexical when it is
taken to stand for an object because it has a relation of actual contiguity (spatial, tem-
poral, or causal) with that object. The relation is symbolic when the sign is taken to
stand for an object because of a norm in the community that this sign shall be taken
to stand for this object. These three types of ground are not exclusive, but co-occur. In
the example of a fingerprint on the murder weapon, the print is iconic and indexical. It
is iconic in that the print has qualities in common with the pattern on the killer’s actual
fingertip and in this way it is a sign that can be taken to stand for the fingertip. It is
indexical in that

(i) it was directly caused by the fingertip making an impression on the weapon (thus a
sign standing for an event of handling it), and
(ii) the fingertip of the killer is in contiguity with the whole killer (thus a sign standing
for the killer himself).

Standard taxonomies of gesture types (Kendon 2004; McNeill 1992; inter alia) are fully
explicable in terms of these types of semiotic ground, as shown in Fig. 44.4.
Deictic:
• semiotic function: indexical (in that the directional orientation of the gesture is determined by the
conceived location of a referent), and symbolic (in that the form of pointing can be locally conven-
tionalized); the hands are used to bring the referent and the attention of the addressee together;
- in concrete deixis, the referent is a physical entity in the speech situation, while in abstract deixis
the referent is a reference-assigned chunk of space with stable coordinates
- in pointing, the attention of the addressee is directed to the referent by some vector-projecting
articulator (such as the index finger or gaze).
- in placing, the referent is positioned for the attention of the addressee
(Nb.: Gaze plays an important role in deictic gestures; it projects its own attention-directing vector
which may (a) reinforce a deictic hand gesture by providing a second vector oriented towards
the same referent, and (b) assist in the management of attention-direction during production of
other gestures.)
Interacting:
• semiotic function: iconic (in that the hands imitate an action) and indexical (in that the shape of the
hands is not the shape of the referent, but is determined by the shape of the referent); the hands are
meant to look as if they were interacting with the referent;
- in mimetic enactment, the hands are moving as if they are doing something to or with the
referent
- in holding, the hands are shaped to look as if they are holding the referent
Modeling:
• semiotic function: iconic; the hands are meant to look as if they are the referent
- in analogic enactment, the hand’s movement imitates the movement of the referent
- in static modeling, the hand’s shape imitates the shape of the referent
Tracing:
• semiotic function: iconic (in that the gesture imitates drawing) and indexical (in that only part of the
referent is depicted, but the whole is referred to); the hands (more specifically, the fingers) are meant
to look as if they were tracing the shape of some salient feature of the referent, such as its outline.

Fig. 44.4: Sketch of some semiotic devices used in illustrative co-speech gestures (see Kendon
1988; Mandel 1977; Müller 1998).

An exhaustive analysis of the semiotics of hand gestures will need to systematically explore their values on the many parameters along which signs differ: formal segmentability, stability across populations, evanescence or persistence in time from production,
symmetry of perceptual access for producer and interpreter, relative immediacy of the
processes of production and interpretation, portability, combinatorics, information
structure (see Kockelman 2005: 240–241). This will entail teasing apart the large set of
distinct semiotic dimensions which hand movements incorporate (Talmy 2006; see de
Ruiter et al. 2003). For example, upon uttering a word, the human voice can simulta-
neously vary many distinct features of a speaker’s identity (sex, age, origin, state of
arousal, individual identity, etc.), along with pitch and loudness, among other things.
What makes pitch and loudness distinct semiotic dimensions is that pitch and loudness
can be varied independently of each other. But loudness is a single dimension, because
it is impossible to produce a word simultaneously at two different volumes. Hand move-
ments are well suited to iconic-indexical meaning thanks to their rich potential for shar-
ing perceptible qualities in common with physical objects and events. But they are not
at all confined to these types of meaning. As Wilkins writes, “[the] analog and supraseg-
mental or synthetic nature [of gestures] does not make them any less subject to conven-
tion, and does not deny them combinatorial constraints or rules of structural form”
(Wilkins 2006: 132). For example, in some communities, “the demonstration of the
length of something with two outstretched hands may require a flat hand for the length
of objects with volume (like a beam of wood) and the extended index fingers for the
length of essentially linear objects lacking significant volume (e.g., string or wire)”
(Wilkins 2006: 132). A similar example is a Lao speaker’s conventional way of talking
about sizes of fish, by using the hand or hands to encircle a cross-section of a tapering
tubular body part such as the forearm, calf, or thigh. This is taken as standing for the
actual size of a cross-section of the fish.
Another kind of conventionality in gestures concerns types of communicative prac-
tice like, say, tracing in mid air as a way of illustrating or diagramming (Enfield 2009:
Chapter 6; Kendon 1988; Mandel 1977). It may be argued that there are conventions
which allow interpreters to recognize that a person is doing an illustrative tracing ges-
ture, based presumably on formal distinctions in types of hand movement in combina-
tion with attention-directing eye gaze toward the gesture space. While the exact form
of a tracing gesture cannot be pre-specified, its general manner of execution may be
sufficient to signal that it is a tracing gesture.
Most important is the collaborative, public, socially strategic nature of the process of
constructing composite utterances (Goodwin 2000; Streeck 2009). These communica-
tive moves are not merely designed but designed for, and with, anticipated interpreters.
They are not merely indices of cognitive processes, they constitute cognitive processes.
They are distributed, publicized, and intersubjectively grounded. Each type of compos-
ite utterance discussed in this book is regulated by its producer’s aim not just to convey
some meaning but to bring about a desired understanding in a social other. So, like all
instruments of meaning, these composites are not bipolar form-meaning mappings, or
mere word-to-world glue, they are premised on a triadic, cooperative activity consisting
of a speaker, an addressee, and what the speaker is trying to say.

5. Conclusion and prospects


In solving the ever-present puzzle of figuring out what others are trying to say, our evi-
dence comes in chunks: composite utterances built from multiple signs of multiple
types. These composites are produced by people in trajectories of collaborative social
activity. As communicative behaviours, they are strategic, context-embedded efforts
to make social goals recognizable. If we are to understand how people interpret such
efforts, our primary unit of analysis must be the utterance or move, the single increment
in a sequence of social interaction. Component signs will only make sense in terms of
how they contribute to the function of the move as a whole.
This chapter has focused on moves built from speech-with-gesture as a sample
domain for exploring the anatomy of meaning. But the analytic requirement to think
in terms of composite utterances is not unique to speech-with-gesture. Because all utter-
ances are composite in kind, our findings on speech-with-gesture should help us to
understand meaning more generally. This is because research on the comprehension
of speech-with-gesture is a sub-field of a more general pursuit: to learn how it is that
interpreters understand token contributions to situated sequences of social interaction
(see Goffman 1981; Goodwin 2000; Schegloff 1968; Streeck 2009).
How are multiple signs brought together in unified interpretations? The issue was
framed above in terms of semiotic function of a composite’s distinct components (see
Fig. 44.4). A broad distinction was made between conventional meaning and non-
conventional meaning, where these two may be joined by indexical mechanisms of
various kinds. Think of a painting hanging in a gallery: a title (words, conventional)
is taken to belong with an image (an arrangement of paint, non-conventional) via in-
dexical links (spatial co-placement on a gallery wall, putative source in a single cre-
ator and single act of creation). Speech-with-gesture composites can be analyzed in
the same way. When a man says Make it steep like this with eye gaze fixed on his arm
held at an angle (see Fig. 44.1), the conventional signs of his speech are joined to the
non-conventional sign of his arm gesture by means of indexical devices including tempo-
ral co-placement, source in a single producer, eye gaze, and the symbolic indexical
expression like this. In these “illustrative gesture” cases, hand movements constitute
the non-conventional “image” component of the utterance. By contrast, in cases of
“deictic gesture” or pointing, hand movement is what provides the indexical link
between words and an image or thing in the world, such as a person walking by, or
diagrams in ink or mid-air.
This semiotic framework permits systematic comparison of speech-with-gesture
moves to other species of composite utterance. An important case is sign language of
the Deaf. There is considerable controversy as to how, if at all, gesture and sign lan-
guage are to be compared (see Emmorey and Reilly 1995). The present account
makes it clear that the visible components of a sign language utterance cannot be com-
pared directly to the visible hand movements that accompany speech, nor to mere
speech alone (i.e., with visible hand movements subtracted), but may only be properly
compared to the entire speech-with-gesture composite (see Liddell 2003; Okrent 2002).
The unit of comparison in both cases must be the move. By the analysis advanced here,
different components of a move in sign language will have different semiotic functions,
in the sense just discussed: conventional signs with non-conventional signs, linked in-
dexically. Take the example of sign language “classifier constructions” or “depicting
verbs” (Liddell 2003: 261ff). In a typical construction of this kind, a single articulator
(the hand) will be the vehicle for both a conventional sign component (a conventiona-
lized hand shape such as the American Sign Language “vehicle classifier”) and a non-
conventional sign component (some path of movement, often relative to a contextually
established set of token spatial referents), where linking indexical mechanisms such as spatio-temporal co-placement and source in a single creator are maximized through instantiation in a single sign vehicle, i.e., one and the same hand.
Another domain in which a general composite utterance analysis should fit is in lin-
guistic research on syntax. Syntactic constructions, too, are made up of multiple signs,
where these are mostly the conventional signs of morphemes and constructions (though
note of course that many grammatical morphemes are symbolic indexicals). An increas-
ingly popular view of syntax takes lexical items (words, morphemes) and grammatical
configurations (constructions) to be instances of the same thing: linguistic signs (Croft
2001; Goldberg 1995; Langacker 1987). From this “construction grammar” viewpoint,
interpretation of speech-only utterances should proceed just as for speech-with-gesture. It
means dealing with multiple, simultaneously occurring signs (e.g., That guy may be
both noun phrase and sentential subject), and looking to determine an overall target
meaning for the communicative move that these signs are converging to signify. A dif-
ference is that while semantic relations within grammatical structures are often nar-
rowly determined by conventions like word order, speech-with-gesture composites
appear to involve simple co-occurrence of signs, with no special formal instruction
for interpreters as to how their meanings are to be unified. Because of this extreme
under-determination of semiotic relation between, say, a gesture and its accompanying
speech, many researchers conclude that there are no systematic combinatorics in
speech-with-gesture. But speech-with-gesture composites are merely a limiting case in
the range of ways that signs combine: all an interpreter knows is that these signs are
to be taken together, but there may be no conventionally coded constraints on how.
Such under-determination is not unique to gesture. In language, too, we find minimal
interpretive constraints on syntactic combinations within the clause, as documented
for example by Gil (2005) for the extreme forms of isolating grammar found in some
spoken languages. And beyond the clause level, such under-determined relations are
the standard fabric of textual cohesion (Halliday and Hasan 1976).
In sum, to understand the process of interpreting any type of composite utterance,
we should not begin with components like “noun”, “rising intonation”, or “pointing ges-
ture”. We begin instead with the notion of a whole utterance, a complete unit of social
action which always has multiple components, which is always embedded in a sequential
context (simultaneously an effect of something prior and a cause of something next),
and whose interpretation always draws on both conventional and non-conventional
signs, joined indexically as wholes.
Research on speech-with-gesture yields ample motivation to question the standard
focus in mainstream linguistics on competence and static representations of meaning
(as opposed to performance and dynamic processes of meaning; see McNeill 2005:
64ff, Wilkins 2006: 140–141). There is a need for due attention to meaning at a con-
text-situated token level, a stance preferred by many functionalist linguists, linguistic
anthropologists, conversational analysts, and some gesture researchers. Speech-with-
gesture composites quickly make this need apparent, because they force us to examine
singularities, i.e., semiotic structures that are tokens but not tokens-of-types. These sin-
gularities include non-conventional gestures as utterance components, as well as the
overall utterances themselves, each a unique combination of signs. This is why, for
instance, Kendon writes of speech-with-gesture composites that “it is only by studying
them as they appear within situations of interaction that we can understand how they
serve in communication” (Kendon 2004: 47–48; see also Hanks 1990, 1996, among
many others). Here is the key point: What Kendon writes is already true of speech
whether it is accompanied by gesture or not. Speech-with-gesture teaches us to treat
utterances as dynamic, motivated, concrete, and context-bound, which is the stance we
need for the proper treatment of communicative moves more generally. By studying
gesture in the right way, we study meaning better.

Acknowledgements
The text of this chapter is drawn from chapters 1 and 8 of The Anatomy of Meaning:
Speech, Gesture, and Composite Utterances (Cambridge University Press, 2009), with
revisions. I gratefully acknowledge Cambridge University Press for permission to
reproduce those sections here. I would also like to thank Cornelia Müller for her
encouragement, guidance, and patience.

6. References
Atkinson, J. Maxwell and John Heritage 1984. Structures of Social Action: Studies in Conversation
Analysis. Cambridge: Cambridge University Press.
Atlas, Jay D. 2005. Logic, Meaning, and Conversation: Semantical Underdeterminacy, Implicature,
and Their Interface. Oxford: Oxford University Press.
Austin, John L. 1962. How to Do Things with Words. Cambridge, MA: Harvard University Press.
Bates, Elizabeth, Luigia Camaioni and Virginia Volterra 1975. The acquisition of performatives
prior to speech. Merrill-Palmer Quarterly 21: 205–224.
Bates, Elizabeth, Barbara O’Connell and Cecilia M. Shore 1987. Language and communication in
infancy. In: Joy D. Osofsky (ed.), Handbook of Infant Development (2nd edition), 149–203. New
York: Wiley and Sons.
Brandom, Robert B. 1979. Freedom and constraint by norms. American Philosophical Quarterly
16(3): 187–196.
Brookes, Heather 2004. A repertoire of South African quotable gestures. Journal of Linguistic
Anthropology 14(2): 186–224.
Bühler, Karl 1982. The deictic field of language and deictic words. In: Robert J. Jarvella and Wolf-
gang Klein (eds.), Speech, Place, and Action, 9–30. Chichester: John Wiley and Sons. First
published [1934].
Chafe, Wallace 1994. Discourse, Consciousness, and Time: The Flow and Displacement of Con-
scious Experience in Speaking and Writing. Chicago: University of Chicago Press.
Clark, Herbert H. 1996. Using Language. Cambridge: Cambridge University Press.
Colapietro, Vincent M. 1989. Peirce’s Approach to the Self: A Semiotic Perspective on Human Sub-
jectivity. Albany: State University of New York Press.
Croft, William 2001. Radical Construction Grammar: Syntactic Theory in Typological Perspective.
Oxford: Oxford University Press.
Cruse, D. Alan. 1986. Lexical Semantics. Cambridge: Cambridge University Press.
Ekman, Paul and Wallace V. Friesen 1969. The repertoire of nonverbal behavior: Origins, usage,
and coding. Semiotica 1: 49–98.
Emmorey, Karen and Judy S. Reilly 1995. Language, Gesture, and Space. Hillsdale, NJ: Lawrence
Erlbaum.
Enfield, N. J. 2009. The Anatomy of Meaning: Speech, Gesture, and Composite Utterances. Cam-
bridge: Cambridge University Press.
Enfield, N. J. and Stephen C. Levinson 2006. Introduction: Human sociality as a new interdisciplin-
ary field. In: N. J. Enfield and Stephen C. Levinson (eds.), Roots of Human Sociality: Culture,
Cognition, and Interaction, 1–35. Oxford: Berg.
Engle, Randi A. 1998. Not channels but composite signals: Speech, gesture, diagrams and object
demonstrations are integrated in multimodal explanations. In: Morton A. Gernsbacher and
Sharon J. Derry (eds.), Proceedings of the Twentieth Annual Conference of the Cognitive
Science Society, 321–326. Mahwah, NJ: Lawrence Erlbaum.
Fillmore, Charles J. 1997. Lectures on Deixis. Stanford, CA: Center for the Study of Language and
Information Publications.
Foley, William A. and Robert D. Van Valin Jr. 1984. Functional Syntax and Universal Grammar.
Cambridge: Cambridge University Press.
Frege, Gottlob 1960. On sense and reference. In: Peter Geach and Max Black (eds.), Translations
from the Philosophical Writings of Gottlob Frege, 56–78. Oxford: Blackwell. First published
[1892].
Frith, Chris D. and Uta Frith 2007. Social cognition in humans. Current Biology 17(16): R724–
R732.
Gergely, György, Harold Bekkering and Ildikó Király 2002. Developmental psychology: Rational
imitation in preverbal infants. Nature 415: 755.
Gigerenzer, Gerd, Ralph Hertwig and Thorsten Pachur (eds.) 2011. Heuristics: The Foundations of
Adaptive Behavior. New York: Oxford University Press.
Gil, David 2005. Isolating-monocategorial-associational language. In: Henri Cohen and Claire
Lefebvre (eds.), Categorization in Cognitive Science, 348–380. Amsterdam: Elsevier.
Goffman, Erving 1964. The neglected situation. American Anthropologist 66(6): 133–136.
Goffman, Erving 1981. Forms of Talk. Philadelphia: University of Pennsylvania Press.
Goldberg, Adele E. 1995. Constructions: A Construction Grammar Approach to Argument Struc-
ture. Chicago: University of Chicago Press.
Goodwin, Charles 2000. Action and embodiment within situated human interaction. Journal of
Pragmatics 32: 1489–1522.
Grice, H. Paul 1957. Meaning. Philosophical Review 67: 377–388.
Grice, H. Paul 1975. Logic and conversation. In: Peter Cole and Jerry L. Morgan (eds.), Speech
Acts, 41–58. New York: Academic Press.
Grice, H. Paul 1989. Studies in the Way of Words. Cambridge, MA: Harvard University Press.
Halliday, Michael A. K. and Ruqaiya Hasan 1976. Cohesion in English. London: Longman.
Hanks, William F. 1990. Referential Practice: Language and Lived Space among the Maya. Chi-
cago: University of Chicago Press.
Hanks, William F. 1996. Language and Communicative Practices. Boulder, CO: Westview Press.
Horn, Laurence R. 1989. A Natural History of Negation. Chicago: University of Chicago Press.
Hutton, Christopher M. 1990. Abstraction and Instance: The Type-Token Relation in Linguistic
Theory. Oxford: Pergamon Press.
Jackendoff, Ray 1983. Semantics and Cognition. Cambridge: Massachusetts Institute of Technol-
ogy Press.
Jakobson, Roman 1971. Shifters, verbal categories, and the Russian verb. In: Roman Jakobson
(ed.), Selected Writings II: Word and Language, 130–147. The Hague: Mouton.
Kendon, Adam 1988. Sign Languages of Aboriginal Australia: Cultural, Semiotic and Communi-
cative Perspectives. Cambridge: Cambridge University Press.
Kendon, Adam 2004. Gesture: Visible Action as Utterance. Cambridge: Cambridge University Press.
Kita, Sotaro 2000. How representational gestures help speaking. In: David McNeill (ed.), Lan-
guage and Gesture, 162–185. Cambridge: Cambridge University Press.
Kita, Sotaro 2003. Pointing: Where Language, Cognition, and Culture Meet. Mahwah, NJ: Lawrence
Erlbaum.
Kockelman, Paul 2005. The semiotic stance. Semiotica 157(1/4): 233–304.
Kockelman, Paul 2006. Residence in the world: Affordances, instruments, actions, roles, and iden-
tities. Semiotica 162(1/4): 19–71.
Kockelman, Paul 2007. Agency: The relation between meaning, power, and knowledge. Current
Anthropology 48(3): 375–401.
Langacker, Ronald W. 1987. Foundations of Cognitive Grammar: Volume I, Theoretical Prerequi-
sites. Stanford, CA: Stanford University Press.
Levinson, Stephen C. 1983. Pragmatics. Cambridge: Cambridge University Press.
Levinson, Stephen C. 2000. Presumptive Meanings: The Theory of Generalized Conversational Impli-
cature. Cambridge: Massachusetts Institute of Technology Press.
Liddell, Scott K. 2003. Grammar, Gesture, and Meaning in American Sign Language. Cambridge:
Cambridge University Press.
Liszkowski, Ulf 2006. Infant pointing at twelve months: Communicative goals, motives, and social-
cognitive abilities. In: N. J. Enfield and Stephen C. Levinson (eds.), Roots of Human Sociality:
Culture, Cognition, and Interaction, 153–178. London: Berg.
Liszkowski, Ulf, Malinda Carpenter, Anne Henning, Tricia Striano and Michael Tomasello 2004.
Twelve-month-olds point to share attention and interest. Developmental Science 7(3): 297–307.
Mandel, Mark 1977. Iconic devices in American Sign Language. In: Lynn A. Friedman (ed.), On the
Other Hand: New Perspectives on American Sign Language, 57–107. New York: Academic Press.
McNeill, David 1992. Hand and Mind: What Gestures Reveal about Thought. Chicago: University
of Chicago Press.
McNeill, David 2005. Gesture and Thought. Chicago: University of Chicago Press.
Müller, Cornelia 1998. Iconicity and gesture. In: Serge Santi, Isabelle Guaı̈tella, Christian Cavé
and Gabrielle Konopczynski (eds.), Oralité et Gestualité: Communication Multimodale, Interac-
tion, 321–328. Paris: L’Harmattan.
Okrent, Arika 2002. A modality-free notion of gesture and how it can help us with the morpheme
vs. gesture question in Sign Language linguistics (or at least give us some criteria to work with).
In: Richard P. Meier, Kearsy Cormier and David Quinto-Pozos (eds.), Modality and Structure in
Signed and Spoken Languages, 175–198. Cambridge: Cambridge University Press.
Özyürek, Asli, Roel M. Willems, Sotaro Kita and Peter Hagoort 2007. On-line integration of
semantic information from speech and gesture: Insights from event-related brain potentials.
Journal of Cognitive Neuroscience 19(4): 605–616.
Parmentier, Richard J. 1994. Signs in Society: Studies in Semiotic Anthropology. Bloomington:
Indiana University Press.
Pawley, Andrew and Frances Syder 2000. The one clause at a time hypothesis. In: Heidi Riggen-
bach (ed.), Perspectives on Fluency, 163–191. Ann Arbor, MI: University of Michigan Press.
Peirce, Charles S. 1955. Philosophical Writings of Peirce. New York: Dover.
Preissler, Melissa Allen and Paul Bloom 2008. Two-year-olds use artist intention to understand
drawings. Cognition 106: 512–518.
Richert, Rebekah A. and Angeline S. Lillard 2002. Children’s understanding of the knowledge
prerequisites of drawing and pretending. Developmental Psychology 38(6): 1004–1015.
Ruiter, Jan Peter de, Stephane Rossignol, Louis Vuurpijl, Douglas W. Cunningham and Willem J.
M. Levelt 2003. SLOT: A research platform for investigating multimodal communication.
Behavior Research Methods, Instruments, and Computers 35(3): 408–419.
Sacks, Harvey 1992. Lectures on Conversation. London: Blackwell.
Sacks, Harvey, Emanuel A. Schegloff and Gail Jefferson 1974. A simplest systematics for the orga-
nization of turn-taking for conversation. Language 50(4): 696–735.
Saussure, Ferdinand de 1959. Course in General Linguistics. New York: McGraw-Hill. First pub-
lished [1916].
Schegloff, Emanuel A. 1968. Sequencing in conversational openings. American Anthropologist
70(6): 1075–1095.
Schegloff, Emanuel A. 2007. Sequence Organization in Interaction: A Primer in Conversation Ana-
lysis, Volume 1. Cambridge: Cambridge University Press.
Schütz, Alfred 1970. On Phenomenology and Social Relations. Chicago: University of Chicago Press.
Searle, John R. 1969. Speech Acts: An Essay in the Philosophy of Language. Cambridge: Cam-
bridge University Press.
Silverstein, Michael 1976. Shifters, linguistic categories, and cultural description. In: Keith Basso
and Henry Selby (eds.), Meaning in Anthropology, 11–55. Albuquerque: University of New
Mexico Press.
Sperber, Dan and Deirdre Wilson 1995. Relevance: Communication and Cognition (2nd edition).
Oxford: Blackwell.
Streeck, Jürgen 2009. Gesturecraft: The Manu-Facture of Meaning. Amsterdam: John Benjamins.
Talmy, Leonard 2006. The representation of spatial structure in spoken and signed language. In:
Maya Hickmann and Stéphane Robert (eds.), Space in Languages: Linguistic Systems and
Cognitive Categories, 207–238. Amsterdam: John Benjamins.
Tomasello, Michael 2006. Why don’t apes point? In: N. J. Enfield and Stephen C. Levinson (eds.),
Roots of Human Sociality: Culture, Cognition, and Interaction, 506–524. London: Berg.
Tomasello, Michael, Malinda Carpenter, Josep Call, Tanya Behne and Henrike Moll 2005. Under-
standing and sharing intentions: The origins of cultural cognition. Behavioral and Brain
Sciences 28(5): 664–670.
Wierzbicka, Anna 1985. Lexicography and Conceptual Analysis. Ann Arbor, MI: Karoma.
Wierzbicka, Anna 1996. Semantics: Primes and Universals. Oxford: Oxford University Press.
Wilkins, David P. 2006. Review of Adam Kendon (2004). Gesture: Visible action as utterance.
Gesture 6(1): 119–144.

N. J. Enfield, Nijmegen (Netherlands)

45. Towards a grammar of gestures: A form-based view
1. Introduction
2. Principles of meaning creation: From form to meaning
3. Simultaneous structures of gesture forms: Variation of formal features
4. Linear structures of gesture forms: Combining gestures
5. Conclusion
6. References

Abstract
Departing from Kendon’s (2004) notion of “features of manifest deliberate expressiveness” and their
particular movement characteristics: “Deliberate expressive movement was found to be
movement that had a sharp boundary of onset and offset and that was an excursion,
rather than resulting in any sustained change of position” (Kendon 2004: 12), the chapter
presents a form-based approach to gesture analysis, which regards gestures as motivated
signs and considers a close analysis of their form as the point of departure for reconstruct-
ing their meaning. Furthermore, by considering gestural meaning not only as visual
action but also as a form of dynamic embodied conceptualization, the approach takes
a cognitive and interactive perspective on the process of ad hoc meaning construction
in the flow of a discourse. By discussing principles of meaning creation (sign motivation
via semiotic and cognitive processes) and the simultaneous (variation of formational fea-
tures and gesture families) and linear structures (combinations within gesture units) of
gesture forms, the chapter explicates individual aspects of a “grammar” of gestures. It
is concluded that in gestures we can find the seeds of language or the embodied potential
of hand-movements for developing linguistic structures.
1. Introduction
Our form-based approach to gesture analysis departs from the premise that the articu-
lation of shapes, movements, positions and the orientations of hands, fingers and arms is
meaningful. Moving the hands so that they stand out as a figure against a ground, form
a particular shape in a particular location, and move with a distinct motion
pattern requires effort, articulatory effort which uses muscular energy to work against
gravity (Müller 1998b). The approach assumes that what we see in gestures is an articulatory
effort, one that can be recognized as communicative (Müller and Tag 2010). As Kendon
(2004) has shown in a small experiment, people observing a speaking person without
hearing what is being talked about can reliably distinguish communicative movements
from non-communicative ones. Based solely on their particular characteristics of form,
those movements are attributed “intentionality” and “having something to do
with what is being said”: “These movements are seen as deliberate, conscious, governed
by an intention to say something or to communicate. These were the movements that
were considered to be, in one observer’s words, ‘part of what the man was trying to
say’” (Kendon 2004: 11). Kendon reports the following form characteristics of those
communicative hand movements: “Deliberate expressive movement was found to be
movement that had a sharp boundary of onset and offset and that was an excursion,
rather than resulting in any sustained change of position. For limb movements, deliber-
ately expressive movements were those in which the limb was lifted away from the body
and later returned to the same or a similar position from which it started” (Kendon
2004: 12). Kendon argues that observers are highly consistent and sure in differentiating
the body movements that show what he terms “features of manifest deliberate expres-
siveness” (Kendon 2004: 13–14, highlighting in the original) from those that are to be
disattended in Goffman’s terms (Goffman 1974, chapter 7; Kendon 2004: 12–13) and
that hearers, in fact, treat those gestures very much like speech: “Just as a hearer per-
ceives speech, whether comprehended or not, as ‘figure’ no matter what the ‘ground’
may be, and just as speech is always regarded as fully intentional and intentionally com-
municative, so it is suggested that if movements are made so that they have certain
dynamic characteristics they will be perceived as ‘figure’ against the ‘ground’ of other
movement, and such movements will be regarded as fully intentional and intentionally
communicative” (Kendon 2004: 13). Kendon thus proposes features of form that are re-
cognized as deliberate to be the core characteristics that distinguishes gesture from
other body movements and actions: “If an action is an excursion, if it has well defined
boundaries of onset and offset, and if it has features which show that the movement is
not made solely under the influence of gravity, then it is likely to be perceived as ges-
tural” (Kendon 2004: 14). In short, Kendon bases his definition of gesture on those form
characteristics of actions “that have the features of manifest deliberate expressiveness”
(Kendon 2004: 15).
Our form-based approach takes on Kendon’s primacy of form in gesture analysis, but
it differs from Kendon in that it regards gestural meaning not only as visual action but
also as a form of embodied conceptualization. In doing so, it follows Langacker’s cog-
nitive linguistic tenet “that meaning resides in conceptualization (in the broadest sense
of this term)” (Langacker 1991: 1). When for instance a German speaker describes a
political turn metaphorically as a “setback” (politischer Rückschlag, literally: “political
back hit”) and while doing so he performs a hitting gesture, his gesture reveals that
he perceives the second part of the metaphoric compound Schlag (‘stroke’) as a manual
action. The gesture embodies the source domain of a verbal metaphor and activates the
embodied sensory experience of the lexicalized meaning. We suggest that in such multi-
modal (or verbo-gestural) metaphoric expressions we see that metaphoricity is cogni-
tively and interactively activated (Müller 2003, 2008a, 2008b; Müller and Cienki 2009;
Müller and Tag 2010). In other words, this embodied conceptualization is an ad hoc cre-
ation of a speaker that documents his (dynamically changing) focus of attention, which
in turn is a product of the interactive needs of a given communicative situation. The
form-based approach thus advocates a cognitive take on the process of ad hoc meaning
construction in the flow of a discourse: “Meaning construction is an on-line mental
activity whereby speech participants create meanings in every communicative act on
the basis of underspecified linguistic units” (Radden, Köpcke, Berg, et al. 2007: 3, highlighting
in the original; for a cognitive linguistic view on gesture and language see also
Cienki this volume, 2012). We consider gestures to be a core partner in this interactive
process of meaning construction, but we do not regard co-verbal gestures as linguistic
units in the full-fledged sense. However, we do take the position that gestures may
take over functions of linguistic units, either in collaboration or in exchange with vocal
linguistic units. While most gesture research addresses gestures accompanying
speech that specify, extend, or complement what is said (Calbris 1990, 2011; Enfield
2009; Kendon 2004; McNeill 1992, 2000; Mittelberg 2006; Streeck 2009), only scarce
attention has been paid to gestures filling grammatical slots (e.g., Bohle 2007; Clark
and Gerrig 1990; Clark 1996; McNeill 2005, 2007; Müller and Tag 2010; Slama-Cazacu
1976; Wilcox 2002).
Ladewig (2012, volume 2) has shown that co-verbal gestures may replace spoken
units and become part of the syntactic structure of a given utterance. Investigating in-
terrupted spoken utterances that are completed by gestures, she found that gestures
replace nouns and verbs in most cases, taking over the functions of objects and predi-
cates. When a speaker, for instance, describes a person and says He had a […] and
molds an arched object in front of his stomach, this gesture is inserted in the syntactic
gap of a noun and describes what verbally could be expressed as a big belly. In another
case, when a speaker, talking about a baseball game, says It’s like a baseball player who
digs his feet in the sand before he […], the gesture occupies the syntactic position of a
verb and enacts the action of hitting a ball with a baseball bat. Studies on gestures accompanying
speech have shown that gestures may adopt the syntactic functions of adjectives
or adverbs, modifying the semantic information expressed in speech (Bressem
2012; Fricke 2012). An arced gesture accompanying the utterance there is such a small
bridge is cataphorically integrated into the verbal noun phrase and adds the informa-
tion regarding the shape of the bridge to the meaning expressed in speech. This phenom-
enon has been described as “multimodal attribution” (Fricke 2012). A flat hand being
moved downwards twice with an accented movement while accompanying the utterance
“is being stamped” specifies the manner of the action of stamping and, as such, fulfills
the function of an adverb (Bressem 2012). Such observations furnish evidence for the
structural and functional integration of gestures into spoken language and lay the ground
for developing the framework of a “multimodal grammar” (Fricke 2012, this volume).
These empirical findings substantiate Kendon’s assumption of gestures and speech as
being “two sides of one process of utterance” (Kendon 1980). By complementing this
view with a perspective on gestural meaning construction as a cognitive process we
are indebted to David McNeill’s theory of gestures as visible forms of imagistic thought
(McNeill 1992):

Gesture provides a new perspective on the processes of language. Language is a broader
concept than we ordinarily suppose. […] The effect is like viewing the world through
two eyes rather than one. Just as binocular vision brings out a new dimension of seeing,
gesture reveals a new dimension of the mind. This dimension is the imagery of language
which has laid hidden. We discover that language is not just a linear progression of seg-
ments, sounds, and words, but is also instantaneous, nonlinear, holistic and imagistic.
(McNeill 1992: 1)

McNeill thus suggests that gestures reveal thinking, and this is in line with our assumption
of gestures as embodied conceptualization. This assumption also accords with
Slobin’s concept of thinking for speaking (Slobin 1987), although it is somewhat broader
in scope.
a specific language with its individual grammatical and semantic/pragmatic structures
and a body which is apt for communication. So we must talk about “thinking and ges-
turing for speaking”, because the cognitive processes that are activated during speaking
must be adjusted to both the linguistic properties of a given language and the expressive
resources of the body (Cienki and Müller 2008). Taking a form-based approach to ges-
ture analysis means then that those expressive resources are described and documented
as resources.
How is it that hand movements can be meaningful? What are the techniques of the
body (to quote Marcel Mauss (1973)) that people employ to create gestures? If gestures
can represent something other than themselves, if they can fulfill, in Bühler’s terms, a representational
function (see Müller this volume, 2009), what are the modes of gestural represen-
tation? How are gestures used as instruments for depiction? And what are the
principles governing gestures’ potential to depict concrete as well as abstract actions,
entities, properties, to refer to space as well as to time? Is it just their contextual place-
ment that motivates their local meaning, so that in gesture analysis we need not care
much about a careful account of form properties of a given movement? Or are there
differences of form that are pertinent for meaning variations, for instance, when we
look at gestures relating to literal and metaphoric meaning? We know that form features
of gestures do play a role with regard to the question of systematic relations between
gestures, Kendon’s gesture families (Kendon 2004: 15). In fact, Kendon’s concept of
gesture family is based on the assumption that gesture forms are meaningful and a ges-
ture family emerges around a shared semantic core or semantic theme incorporated in a
particular aspect of form, as for instance: the open hand (e.g., a particular hand shape;
see Kendon 2004, chapter 13; Müller 2004). We believe that these aspects of gestural
form are the basis of gestural meaning – and we believe that these are precisely
Kendon’s “features of manifest deliberate expressiveness.” Providing an encompassing
documentation of those properties of form that characterize the hand(s) as a medium of
expression was the goal of two research enterprises carried out between 2000 and 2011:
“The Berlin Gesture Project” and the project “Towards a Grammar of Gesture: Evolu-
tion, Brain and Linguistic Structures” (ToGoG) (for further information see www.togog.org
and http://www.berlingesturecenter.de/berlingestureproject/fugestureproject.html).
The interdisciplinary project “Towards a grammar of gesture” was funded within the
program “Key Topics of the Humanities” of the VolkswagenStiftung. It addressed three
key topics of the humanities: the multimodal nature of language, the evolution of gesture,
and neuropsychological foundations of gestures. It investigated these issues from the per-
spective of linguistics (Cornelia Müller), semiotics (Ellen Fricke), evolutionary psychol-
ogy (Katja Liebal), and neuropsychology (Hedda Lausberg). Notably, the term
“grammar of gesture” refers to the basic form properties of gestures and to their struc-
tures. It does not imply, however, that co-verbal gestures have anything like a grammat-
ical structure. The formulation “Towards a grammar of gestures” underlines two aspects:
first, co-verbal gestures show properties of form and meaning which are prerequisites of
language and which – in case the oral mode of expression is not available – may evolve
into a more or less full-fledged linguistic system such as a sign language or an alternate
sign language (see Bressem’s work (2012) on reduplication in gesture, speech, and
sign). Second, when used in conjunction with speech, co-verbal gestures may take over
grammatical functions, such as that of verbs, nouns, or attributes pointing towards a multi-
modal nature of grammar (Bressem 2012; Fricke 2012; Ladewig 2012). In the following,
we will substantiate in particular the first assumption with an overview of our findings
regarding the principles of meaning creation, the simultaneous and linear structures of
gesture forms, and introduce our distinction between singular and recurrent gestures.
For meticulous substantiations of the second claim, see Bressem (2012) on repetitions
in gestures, Cienki (2012) on gestures as part of usage events
and a variably multimodal concept of spoken language grammar, Fricke (2012, this volume)
on the basic principles of a multimodal grammar, Ladewig (2012) on gesture’s integration
into the syntactic and sequential structure of verbal utterances, and Müller
(2010b, submitted) for an overview.

2. Principles of meaning creation: From form to meaning


We depart from the assumption that the meaning of gestures is motivated (see also
Calbris 1990, 2011; Mittelberg 2006, this volume), that their forms embody meaning
in a dynamic and mostly ad hoc manner, and that manual actions are a core basis of
gestural meaning creation (see also Streeck 2009, this volume). Gestures coming
along with speech do not constitute a separate and closed sign system, as they are
often created spontaneously and they are always a matter of language use. In this
sense, they are part of the dynamic flow of attention in communication. Many gestures
that we observe in everyday conversations are spontaneously created iconic and index-
ical signs; if they are conventionalized, their motivation is often still transparent. We
follow Enfield’s (this volume) perspective in considering those “signs as they are, first
and foremost: dynamic, motivated and concrete.”

2.1. Techniques of the hands: Gestural modes of representation as mimetic devices
Moving hands are visible natural objects – their habitat is space, and their modes of
being are movements and actions. When used as material for the creation of gestures,
they particularly lend themselves to depicting or miming actions, tangible and visible
objects, and movements in space. It is their material character or their specific mediality
which makes them particularly suitable to mime and depict actions, objects, movements
and spatial relations of all kinds. In the process of the formation of gestural signs, the
hands undergo transformational processes that rely on a basic set of techniques. In ear-
lier work, we distinguished four basic modes of representation – the hand acts, the hand
molds, the hand draws (or traces), and the hand represents. Note that the term representation
is used here in Bühler’s sense of Darstellung, which might be better
translated as depiction (for more detail, see Müller 2009, this volume). Those four
modes of representation (or of gestural depiction) were a first attempt to provide an
answer to the question: what are the hands actually “doing” when they are transformed
into gesture? (see Müller 1998a, 1998b) The answer was: they are used as techniques for
the depiction of “the world” much like artists use a paint brush, a pencil or clay to shape
a sculpture, a vase, a cup or a bowl. This means that in the acting mode the hands are
used to mime or reenact actual manual activities. Examples are: grabbing, holding, giv-
ing, receiving, opening a window, turning off a radiator, or holding a steering-wheel. In
the molding mode, the hands create a transient sculpture, such as a frame or a bowl; in
the drawing (or tracing) mode, the hands outline the contour of objects or they trace the
path of movements in space. In all these cases gestural meaning is motivated through
the re-enacted action. This motivation differs in the representing mode. Here, the
hand embodies an object as a whole and becomes itself a kind of “manual sculpture”.
Examples are: the extended index finger represents a pen, or the open hand becomes
a piece of paper on which something is written down (Fig. 45.1). While in the
first three modes of representation the tactile and sensory-motor experience of actions
of the hands provides the derivational base for gestural meaning, in the fourth mode the
derivational base of the gesture may just as well be a visual perception only.

[Fig. 45.1 panels: Hand Acts | Hands Mold | Hands Outline and Trace | Hands Represent]

Fig. 45.1: The four basic techniques of gesture creation or the gestural modes of representation:
acting, molding, drawing (tracing), representing. Hand(s) reenact an everyday action (pulling a
gear-shift), they mold or outline the shape of objects (a picture frame), they trace paths, or they repre-
sent objects (in motion) (extended index represents pen, palm open represents paper).

Notably, research conducted on processes of gesture and sign creation and on the
nature of iconicity in gestures and signs has made similar distinctions: for gesture stu-
dies (Kendon 2004; McNeill 1992; Sowa 2005; Streeck 2008, 2009, this volume;
Wundt 1921), for sign linguistics (Battison 1974; Cohen, Namir, and Schlesinger 1977;
Kendon 1980, 1981, 1986; Mandel 1977; Taub 2001), and for semiotics (Andrén 2010).
(For a detailed discussion of those accounts, see Müller, Fricke, Ladewig, et al. in
preparation.)
We assume, however, that the derivation from a particular action or a specific handling of objects is not given a priori. On the contrary, speakers often use different gestural techniques to create gestures which refer to the same object. So you may observe somebody describing a picture hanging on the wall by molding its shape, by outlining it, or by representing it – and all this in the same sequence of talk. Yet by using different techniques of depiction, different properties of the respective object are foregrounded: molding highlights its corporeality, tracing reduces it to lines, representing brings forward the qualities of objects as located and moved in space (for a detailed account of such a gestural sequence, see Müller 2010a, and for its relation to classifiers in signed languages see Müller 2009). Speakers use these devices for gesture
creation in a very fine-grained manner and in accordance with their communicative
goals. These techniques of representation (or mimetic devices) imply an orientation
of speakers towards specific facets of perceived and conceived “things and actions”
in the world. It is in this sense that gestures are subtle and variable conceptualizations
of a perceived reality – designed for and triggered by the purposes of communication at
a given moment in the flow of interaction.
Gestures can be regarded as mundane forms of creating illusions of reality, illusions
which are manifestations of “visual thinking” in Arnheim’s (1974) terms (Arnheim himself includes gestures) and as a form of “thinking for speaking and gesturing” in Cienki and
Müller’s terms (Cienki and Müller 2008). According to Gombrich (1961) and Arnheim
(1974) the artist conceptualizes the perceived reality in terms of the available mimetic
devices or “modes of representation”. The mimetic technique has a direct relation to
the process of conceptualization of a perceived world. The psychology of art teaches
us that images are conceptualizations of perceived objects and that they are shaped
by different modes of representation. Images in that sense are artfully created illusions
of reality, creative abstractions shaped by their respective mode of representation. So
it makes a great difference whether a landscape is perceived for drawing or for painting
or for black and white photography (see Gombrich 1961). Gestures in this view are
“natural” and “artful” illusions of reality, created by speakers in the flow of discourse
and interaction, and they are probably the first mimetic devices appearing on the
stage of human evolution (Müller 1998b).
For Aristotle, mimesis is a fundamental anthropological trait (Aristotle Poetics
1965). Only humans possess a capacity for it, and it is considered the basis for the development of all arts. Aristotle distinguishes three aspects of mimesis: medium, objects, and modes of mimesis, and we suggest that those three aspects characterize gestural mimesis as well (see also Müller 2010a).
The medium of gestural mimesis is the body with its range of articulators: hands, head, face, eyes, mouth, shoulders, trunk, and even leg(s) and feet. Up until now, gesture studies have focused mainly on the hands, but the hands are merely the most articulate of these articulators (along with the face) and often interact with the others. We characterize the objects of gestural mimesis through their functions. Following Bühler, we distinguish three basic functions of gestures: representation, expression, and appeal (see Müller 1998b, 2009, this volume). The modes of mimesis concern precisely the modes of gestural representation that we have introduced above: acting, molding, drawing and tracing, representing.
714 IV. Contemporary approaches

When reconsidering those four different modes of representation from the point of view of a more strictly praxeological perspective (as advocated by Streeck 2009, this volume), one can argue, however, that there are actually only two fundamental techniques: acting and representing. In the acting mode, the hands act as if performing an actual action; they mime different kinds of actions. In the representing mode, the hand acts as if it were an object itself; it mimes an object by becoming a “manual sculpture” of that object. In the acting mode of mimesis, gestures are based on actions; put in terms of Peirceian semiotics, this means that here the representamen-object relation is one in which the gestural form relates to an action. In the representing mode, on the contrary, gestures are based on objects; the representamen-object relation is one in which the gestural form relates to a “material object in the world” (Peirce 1960; for an extended analysis in Peirceian terms see Müller, Fricke, Ladewig, et al. in preparation).
Neuro-cognitive research conducted in the ToGoG group underlines the reduction to two modes of representation as the two basic mimetic modes (see Lausberg, Cruz, Kita et al. 2003; Lausberg, Heekeren, Kazzer, et al. submitted). In split-brain as well as in fMRI studies, Lausberg and colleagues found that the two modes (acting and representing) are processed in different parts of the brain and must therefore be considered different neuro-cognitive entities. Primatological studies carried out in our group furthermore indicate that the “acting” mode of representation is clearly the most widespread technique for creating gestures that non-human primates have at their disposal. Both neuro-cognitive and primatological work indicate that the difference between object use and gestured object use (i.e., acting as if the hands were holding or showing an object versus acting with a real object) is not trivial at all. Rather, the two appear to be fundamentally distinct behavioral and cognitive processes. Acting as if holding, showing, or moving an object presupposes some kind of mental concept of that very object. What we face here is no less than the difference between instrumental and symbolic behavior, or the transition from action to gesture.
Notably, both ways of gesture formation – acting and representing – involve two cognitive-semiotic processes: iconicity and indexicality. In the acting mode, indexicality (contiguity) is the primary cognitive-semiotic process, which mediates between the gestural form and the perceived object in the world, while in the representing mode iconicity (similarity) is primary. This would also point towards indexicality as a cognitive process that is different from, and evolutionarily earlier than, iconicity. But why is indexicality the basis of a type of gesture that has very widely been described as iconic (see McNeill 1992)? Because the semiotic principle governing the motivation of the gestural form is primarily a metonymic one – or more particularly a synecdochical one: a part of the action stands for the action. This principle characterizes the reenactment of mundane actions, the reenactment of touching surfaces (for the molding mode), and the reenactment of drawing traces in soft surfaces with an extended index (for the drawing mode). In the case of the representing mode, iconicity is primary, because the motivation of the form is primarily based on the similarity of a particular hand shape with an object.
Müller (1998b, 2009), Müller, Fricke, Ladewig, et al. (in preparation), Mittelberg (2006), and Mittelberg and Waugh (2009) have argued that one pertinent relationship between a gesture’s form and its object or ground (in a Peirceian sense) is metonymy, and they consider metonymy to be based on contiguity relations (in a Jakobsonian sense), i.e. based on indexicality (for more detail see Mittelberg’s cognitive-semiotic

approach to gesture, Mittelberg 2006, this volume). Müller (1998b) has described the
process of gesture creation in terms of different forms of metonymy, all of which are
characterized by a process of creative abstraction from a perceived or conceived object
or ground in the world. She has related these to processes of abstraction in modern
art in particular to Kandinsky’s concept of “abstraction” and to the sculpting art of
Brancusi underlining that it is here where the creative processes of conceptualization
described above come in. Therefore we would like to suggest that the processes of ab-
stractive conceptualization (see also Langer’s concept of “abstractive seeing” 1957) are
mediated by the specific technique the artist as well as the gesturer applies, meaning
that what is metonymically abstracted, depends on the techniques of depiction (paint,
pencil, photography or acting, molding, drawing, representing), and those techniques
will guide “the thinking or conceiving for depiction” and they will give each artistic
and every gestural depiction its particular taste.
We therefore suggest that there is no such thing as an “automatic” or “natural” relation between a perceived or remembered object or event in the world and its gestural depiction. Speakers choose between different gestural modes: they may trace, mold, or represent an object, or they may act with it, and each time they will highlight a different dimension of this object. At the moment of their creation, the modes of representation or the mimetic devices structure the embodied abstractive conceptualizations that we see in the gesturing hands.
Reconstructing the techniques of gesture creation is one core aspect of a form-based view of gesture analysis. It offers a key to a differentiated understanding of gestural meaning construction and is one element of a method for a linguistic analysis of gestural meaning (see Bressem, Ladewig, and Müller this volume; Ladewig and Bressem forthcoming; Müller, Ladewig, and Bressem volume 2). This is crucial, since gestures are iconic and indexical signs, and considering the modes of representation helps to determine the particular motivation of a given gestural movement. It provides a form-based, descriptive, and rational ground for reconstructing the meaning of gestures in a particular context, and it is an important contribution to intersubjectively accountable descriptions of Kendon’s features of manifest deliberate expressiveness.

2.2. Differences in form: Gestures depicting literal or metaphoric meaning
A form-based approach to the analysis of gestures should reveal differences between different types of gestures. This implication of a form-based view on gestural meaning was the subject of an experimental study conducted in the context of the ToGoG project (Müller, Bressem, Ladewig, et al. 2009). We tested the hypothesis that gestures of the concrete and gestures of the abstract can be differentiated on the basis of their formational features. In order to answer this question, experiments were conducted in which 20 target phrases, each having a concrete and an abstract sense (or a literal and a metaphoric meaning), were included in a set of 40 stories. Several experimental conditions were tested to determine which design works best for the elicitation of gestures. We chose a setting in which subjects were asked to listen to the stories, repeat the target sentences (always the last sentence of a story), and perform a gesture. Altogether thirty-six subjects were tested; 286 gestures accompanying a concrete target item and 268 gestures accompanying an abstract item were elicited. The description of the

gestures’ forms was based on the four parameters of sign language, i.e. hand shape, orientation of the hand, movement, and position in gesture space (Battison 1974; Stokoe 1960; for a specification of those parameters for gestures see Bressem this volume). In addition, we took into consideration facial expressions as well as the involvement of other articulators such as posture, arm and head movements, and gaze.
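As a rough illustration, the coding scheme just described (four form parameters plus additional articulators) can be thought of as a simple annotation record. The following is a minimal sketch only: all field names and example values are our own hypothetical choices for illustration, not part of the study’s actual coding manual.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class GestureAnnotation:
    """One coded gesture: the four form parameters (after Stokoe 1960;
    Battison 1974) plus additional articulators and facial expression."""
    hand_shape: str
    orientation: str
    movement: str
    position: str                                           # position in gesture space
    articulators: List[str] = field(default_factory=list)   # e.g. gaze, head, torso
    facial_expression: Optional[str] = None
    target: str = "concrete"                                # "concrete" or "abstract" item

def meaning_bearing_features(g: GestureAnnotation) -> int:
    """Toy proxy for semantic richness: count the filled form parameters
    plus the additional articulators involved in the performance."""
    params = [g.hand_shape, g.orientation, g.movement, g.position]
    return sum(1 for p in params if p) + len(g.articulators)

# A hypothetical pantomimic re-enactment vs. a hands-only depiction
pantomimic = GestureAnnotation("flat hand", "palm down", "pressing down",
                               "center-center", ["gaze", "head", "torso"])
hands_only = GestureAnnotation("lax flat hand", "", "wrist rotation", "")

print(meaning_bearing_features(pantomimic))  # 7
print(meaning_bearing_features(hands_only))  # 2
```

On this toy measure, a pantomimic re-enactment scores higher than a hands-only depiction, mirroring the observation below that pantomimic enactments involve more meaning-bearing form features and more body parts.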
First of all, the analysis of the elicited gestures did not yield any kind of systematic variation of form features between gestures depicting literal versus metaphoric meaning. We would have expected gestures referring to abstract notions and actions to be more “abstract” in the literal sense of the word, that is, we expected them to display fewer form features that contribute to the meaning of the gesture. This was, however, not what we found.
What we observed were not differences in the gestural forms with regard to the abstractness or concreteness of the action or object gesturally depicted, but different ways of gestural depiction, irrespective of whether the depicted object was a concrete or an abstract one. The subjects either produced a pantomimic re-enactment of the target event described in the story (i.e., gestures involving gaze, body movement, facial expression, and specified hand shapes) or they used a non-pantomimic (hands-only) depiction, a kind of prototypical, more de-contextualized depiction of the scene. Put differently, in the case of pantomimic re-enactments of a scenario, more form features of the hands contributed to the meaning of the gestures and more body parts were involved. In non-pantomimic depictions, the deployed gestures were semantically reduced, i.e., fewer form parameters of the hands contributed to the meaning of the gesture, and they mostly involved only the hands.
These results can be explained by drawing upon Müller’s theory of a dynamic focus of attention, which proposes that particular aspects of meaning are foregrounded by the participants of a conversation depending upon the flow of attention within a given interaction (Müller 2008a, b; Müller and Tag 2010). The results of our experiment indicate that participants differ in their ways of conceptualizing and experiencing the scenes depicted in the story, irrespective of whether the gesture depicts aspects of literal or metaphoric expressions. So no matter whether concrete or abstract objects and events in the world are depicted gesturally, there will be people who provide rich pantomimic enactments and others who produce fairly loose, floppy hand gestures, which are semantically poorer.
We would like to suggest that when speakers use highly specified hand shapes, posture, and the face, it is likely that they fall back on full-body experiences of rich scenarios. When, on the contrary, semantically reduced gestures are used and no further body part is involved in the gestural performance, the gestures depict prototypical situations and their associated embodied experiences. For instance, some speakers will depict a piece of chewing gum sticking precisely where it got stuck (at the bottom of something else), while others will just perform a movement of the index finger and thumb illustrating only the nature and quality of stickiness as a property. The same is true for metaphoric usages of “sticky”: some subjects performed pantomimic enactments, others fairly abstract hands-only depictions of stickiness.
These findings can be interpreted along the lines of Fricke’s (2007, 2012) Peirceian distinction of object-related versus interpretant-related gestures. Accordingly, we suggest that our analysis of the formational features of the gestures and the types and number of articulators involved in gesturing uncovers whether subjects conceptualize a specific event as an

individually perceived one (object-related gestures) or as one that is prototypically connected with a particular word meaning (interpretant-related gestures). This variation can be explained by a differing focus of attention in the different speakers: either they focus on a full embodied scenario or on prototypical aspects of word meaning.
In conclusion, we would like to underline that these results could only be obtained through an analysis that closely describes the formational features of hand gestures and the range of additional articulators involved in the creation of the meaning of a gesture. ToGoG’s form-based approach may be used as a microscope for revealing such subtle variations in the meaning of gestures.

3. Simultaneous structures of gesture forms: Variation of formal features
Taking the form properties of gestures as a point of departure for gesture analysis implies that the properties of gestures as movements that take place in space and time are put center stage. Following Kendon (2004: 14), we consider “features which show that the movement is not made solely under the influence of gravity” as the formational features of hand-movements: hand shape, orientation of the hand, movement, and position in gesture space. Gestures may systematically vary with regard to how and which formational features participate in meaning construction. This is what we consider the simultaneous structures of gestures.

3.1. Gesture families


As earlier work of Kendon (2004) and Müller (2004) has shown, some kinds of gestures
tend to cluster around a shared and “distinct set of kinesic features” that goes along
with a “common semantic theme”. These form-meaning clusters are termed gesture
families:

When we refer to families of gestures we refer to groupings of gestural expressions that have in common one or more kinesic or formational characteristics. […] [E]ach family not only shares in a distinct set of kinesic features but each is also distinct in its semantic themes. The forms within these families, distinguished as they are kinesically, also tend to differ semantically although, within a given family, all forms share in a common semantic theme. (Kendon 2004: 227; highlighting in the original)

Examples of gesture families are Kendon’s two families of the open hand: the family of the Open Hand Prone (“palm down”) and the family of the Open Hand Supine (“palm up”) (Kendon 2004: chapter 13). Based on a context-of-use study of the Open Hand Prone, Kendon suggests that gestures with a palm-downward orientation “share the semantic theme of stopping or interrupting a line of action that is in progress” (Kendon 2004: 248–249). For the Open Hand Supine or “palm up” family of gestures, Kendon and Müller assume that the shared formational features, e.g. the hand shape (palm open) and the orientation (upwards), come with the shared semantic themes of “offering” and “receiving” (Kendon 2004: 264; Müller 2004). Both base their grouping of gestures as a family on the semantization of gestural forms, meaning that the semantic core or semantic theme is connected with one or several of the four

formational features: hand shapes, orientations, movements, and positions in gesture space. Ladewig’s work on the family of the “cyclic gesture” shows a case in which the variations within a family take place in the formational feature “position”. The formational core is characterized by a “continuous circular movement of the hand, performed away from the body” (i.e., the central formational feature being movement), expressing “cyclic continuity” (see Ladewig 2010, 2011).
Within the ToGoG group, Fricke (2012) has proposed a semasiological notion for groupings that are based on gestures with stable form-meaning pairings: the concept of gesture fields. This concept is inspired by the notion of the lexical field or word field (Trier 1973). Here the shared semantics defines the gesture field; the common underlying features are not based on gestural forms but on the shared semantics of all gestures.
Taking a strictly onomasiological approach, we found yet a third type of gesture family: here the shared semantic theme is based on an underlying action scheme. More specifically, what is being semanticized in all the gestures that belong to this family is the result of the underlying action: the fact that an object that was close to the body is being removed, or one that wants to come close is being held away (Bressem, Müller, and Fricke in preparation; Teßendorf 2005).

3.2. The family of the Away Gestures: Embodied roots of negation


Gesture families constitute an important facet of “a grammar of gestures”. In this section we will illustrate how those structural islands emerge. In doing so, we will discuss a new type of gesture family that we have identified: the family of the Away Gestures. As we have mentioned above, gestures may be created by way of different techniques or modes of representation: they may re-enact actual actions or represent objects. When we consider the mundane actions of the hands as vivid sources of derivation for gestural meaning, it turns out that we might even locate the main source of a gesture’s meaning in particular aspects of a practical action itself. We found that all gestures of this family share an underlying effect of action that can be characterized as moving or keeping things away from the body by sweeping, throwing, brushing, and holding them away with the hand(s). This shared moment of moving or keeping things away from the body is why we term them the family of the Away Gestures.
Similarities in form and meaning within the gestures investigated are motivated by a similar effect of the action base, and oppositions between the gestures result from differences in everyday actions. We concluded that these actions provide the basis for shared as well as distinct characteristics and thus explain form and meaning variation within this group of gestures.
It seems, moreover, that the underlying effect of the gestures’ action constitutes the embodied basis for the emergence of gestural negation. When objects are swept, thrown, brushed, or held away, something that was there is no longer present or something that wants to be there is rejected. This practical, embodied, mundane experience is metaphorically extended from the world of actions upon real objects into the realm of communication (Teßendorf 2008). In this process of transformation, practical actions become metaphorical actions performed upon speech (Streeck 1994; Müller 2004); that is, the gesture acts upon parts of the utterance or the utterance as a whole, or constitutes an utterance or speech act in itself (Kendon 2004; Müller 1998b). In the

family of Away Gestures, the process of removing objects is exploited to form gestures of exclusion and negation, showing that negation rests upon bodily concepts, is structured by tactile experiences, and thus has a bodily dimension (Bressem, Müller, and Fricke in preparation; Harrison 2009; Lapaire 2006).
To sum up, a form-based view also includes a close analysis of the motivation of gestures and in this case it resulted in the discovery of a new type of gesture family.

3.3. Recurrent and singular gestures


Applying ToGoG’s form-based method of gesture analysis, we identified a repertoire of those gestures that are potential candidates for forming structural islands like the three different types of gesture families just presented. We termed them recurrent gestures, because they are “used repeatedly in different contexts and […] their formational and semantic core remains stable across different contexts and speakers” (Ladewig 2011: 2). Recurrent gestures are thus to be distinguished from singular gestures, which are created on the spot (McNeill’s spontaneous gestures; see Müller submitted). The fixed form-meaning relation that holds stable across a wide range of communicative contexts makes it plausible to assume that recurrent gestures are undergoing a process of conventionalization. This assumption gains further support from the fact that in a large context-of-use study, in which we considered gesture uses in many different types of discourse, a fairly limited set of gestures with a recurring form and meaning was found. Based on these findings, we would like to suggest that recurrent gestures build up a repertoire of co-verbal gestures and are candidates to form gesture families, and that in this sense they are elements of “a grammar of gesture”. The following table provides an overview of this repertoire, giving a brief description of the form and meaning of every gesture (see Bressem and Müller volume 2).

3.4. A form-meaning based repertoire of recurrent gestures used in German discourses

Tab. 45.1: Recurrent gestures and a description of their form and meaning

Brushing away: The hand, palm oriented towards the speaker’s body, is moved outwards by a (rapid) twist of the wrist. Topics of talk are rejected by (rapidly) brushing them away from the speaker’s body.

Holding away: The (lax) flat hand(s), palm turned away from the speaker’s body, is/are moved upwards and held and/or moved. Topics of talk are rejected by holding and moving the palm away.

Throwing away: The hand is oriented vertically, the palm faces away from the speaker’s body, and the hand is flapped downwards by a movement of the wrist. An imaginary topic of talk, sitting in the palm of the hand, is dismissed by throwing it away.

Sweeping away: The (lax) flat hand(s), palm facing downwards, is/are moved horizontally outwards. Topics of talk are rejected by (rapidly) moving the palm away from the center to the periphery.

Back and forth: The hands are moved alternately away from and towards the speaker’s body. The gesture is used to mark several arguments and points of view on the same topic. Moreover, it is frequently used when speakers talk about changing situations and events.

Cyclic gesture: The hand is moved in a continuous rotating motion performed away from the body or “clockwise” while remaining in situ. The gesture is used in the context of word/concept searches, descriptions, and requests, and it indexes processes, duration, continuity, or the procedural structure of conversations (Ladewig 2010, 2011).

Vague: The lax flat hand, palm downwards, is repeatedly turned by rotating the wrist clockwise. The gesture is used to mark events, states, and ideas as uncertain and indeterminate.

Change: Index finger and thumb are bent, the palm is held laterally, and the fingers are turned back and forth by rotating the wrist. The gesture is used to exemplify changing events and processes as well as to mark the opposition of arguments, events, etc.

PUOH: “The Palm Up Open Hand is characterized by a specific hand shape and orientation: palm open, fingers are extended more or less loosely, palm turned upwards. It is often used with a downward movement or turn of the wrist and a hold in the end” (Müller 2004). “The Palm Up Open Hand presents an abstract discursive object as an inspectable one – an object, which is concrete, manipulable, and visible – and it invites participants to take on a shared perspective on this object” (Müller 2004).

Weighing up: The (lax) flat hands, palms facing upwards, are moved up and down in an alternating fashion. The gesture is used to mark several arguments and points of view on the same topic, highlighting their individual pros and cons.

Stretched index finger – held: The stretched index finger is raised in the air and held. The gesture has a cataphoric function, drawing the attention of other participants to new and particularly important topics of talk as well as signaling thematic shifts, such as when dismissing the statements of others.

Stretched index finger – moved horizontally: The stretched index finger, palm facing away from the speaker, is moved upwards and rapidly moved from one side to the other by turning the wrist. The gesture is used to negate and express denial, often going along with verbal negation.

Dropping of hand: The lax flat hand is moved upwards and then dropped onto the lap, the table, etc. The dropping usually results in an acoustic signal of the hand “hitting” the speaker’s body or the table. The gesture dismisses topics of talk by marking parts of the utterance as less important and interesting.

Fist: The fist(s) is/are moved (rapidly) downwards. The gesture is used to put emphasis on parts of the utterance by directing the listener’s attention, and it signals emotional involvement and insistence.

Ring: The index finger(s) and thumb(s) form a circle. The hand(s) is/are held or moved up and down repeatedly. It is used for specification, clarification, and emphasis of the speaker’s utterance.

Shaking off: The lax flat hand, oriented towards the speaker’s body, is shaken by rapid movements of the wrist. Often the gesture is accompanied by facial expressions. The gesture has a metacommunicative function and is used to mark an object or situation as potentially dangerous, delicate, or appalling.

3.5. Variation of form and meaning: Recurring gestures in non-human primates
Since the form-based approach advocated here grounds the meaning of gestures in their forms and in the contexts of use of those forms, it can also be applied to describe the gestures of non-speaking species (for more detail see Müller 2005). Such an analysis can contribute to the discussion of language evolution and shed light on the question of precursors of human gestures in the bodily communication of non-human primates (see Arbib this volume; Corballis this volume; Kendon 2008; McNeill this volume).
Notably, despite a fairly long tradition of attempting to document the bodily communication of non-human primates, and even in the era of teaching chimpanzees to use sign language, there has been no attempt to describe the formational features of ape gestures in detail, let alone potentially systematic variations across different contexts of use. On the contrary, most of the research regarding gestures of non-human primates centers on their functions in interactions with other conspecifics or humans. However, little is known about the structural properties of their gestures (Liebal and Call 2012;
Müller 2005). A recent study examining visual and tactile gestures in Orang-Utans offers further insights into the simultaneous structures of manual gestures in great apes (Bressem, Liebal, Müller, et al. in preparation). Based on ToGoG’s form-based method developed for the description of human speech-accompanying gestures (Müller 2010a; Müller, Ladewig, and Bressem volume 2; for a form-based notation scheme see Bressem this volume), the study revealed that particular form parameters of visual and tactile gestures, such as hand shape and orientation, differ depending on the context in which they are used. For instance, in aggressive contexts the tactile gesture “slap” (Liebal, Pika, and Tomasello 2006) is executed with the flat hand, palm oriented downwards, moved onto the body of another primate by lowering the whole arm. In playful contexts, however, the shape of the hand, its orientation, and the

movement change: the “slap” is executed with a lax flat hand, palm oriented upwards, and a movement anchored in the lower arm. Another systematic form-function distinction was found in the group of visual gestures. It turned out that visual gestures with an object are not about the object itself, but are used to regulate social relationships. Visual gestures without an object, on the other hand, are used for a range of different functions (Bressem, Liebal, Müller, et al. in preparation).
Even if these variations in form may seem very rudimentary at first sight, they nevertheless indicate that Orang-Utans are able to systematically vary the form and meaning of a recurrent gestural form. The study thus not only provides insights into the structure of ape gestures, but also suggests essential similarities between human and non-human primates in the use of gestures, regarding both their internal structures and the development of gestures from instrumental actions.
To conclude: a close analysis of the formational features of gestures in non-human primates revealed that apes seem to modify at least some of their gestures depending on the context they are used in – a finding that might shed new light on discussions of language evolution (see Arbib this volume; Corballis this volume; Kendon 2008; McNeill this volume).

4. Linear structures of gesture forms: Combining gestures


Taking a form-based view on gesture analysis as a point of departure necessarily involves considering the linear patterns and structures of gestural movement. Gestures are movements that are structured and organized in temporal sequences, and this was indeed one of the first important observations made early on in the field of gesture studies. Pioneering work by Kendon (1972, 1980) documented that gestures are structured linearly. Kendon distinguishes units of varying sizes, ranging from gesture phrases to gesture units to posture shifts. He finds, moreover, that this hierarchical structure of units of body movement goes along with a similar hierarchy in the speech units they accompany. In a nutshell, the relation between gesture and speech units can be described as follows: the more body parts involved in the movement changes, the larger the conversational unit they go along with (Kendon 1972, 1980).
The fact that gestures are describable as units of movement is of central importance
to Kendon’s understanding and definition of gestures. To re-quote him: “If an action is
an excursion, if it has well defined boundaries of onset and offset, […], then it is likely
to be perceived as gestural.” (Kendon 2004: 14). The boundaries of onset and offset for
hand-movements are rest-positions of the hands that frame the excursion of the hands
in the gesture space in front of the body – they delimit gesture units (Kendon 1980, 2004:
chapter 7). According to Kendon, the basic linear structure of a hand-movement is characterized by a preparatory movement, the execution of a stroke, and a retraction phase, whereby preparation and stroke are considered to form the gesture phrase, and the stroke in conjunction with a post-stroke hold is held to form the nucleus of the gestural movement. In a recent study, Bressem and Ladewig (2011) suggested that gesture phases (preparation, stroke, hold, retraction, rest position) differ in their particular articulatory features, such as tension of the hand or movement, and can hence be distinguished on this basis.
When it comes to the linear structures of gestural movement, gesture research has almost exclusively focused on the structure of single gestures.

45. Towards a grammar of gestures: A form-based view 723

The structure of a gestural movement as a succession of preparation, stroke, and retraction, which might be extended by pre- or post-stroke holds, has been the subject of quite a number of debates and refinements. Originally proposed by Kendon (1980), it was further developed by
Kita, van Gijn, and van der Hulst (1998), Seyfeddinipur (2006), and Bressem and Ladewig (2011). However, there is no systematic account of larger linear structures. The largest unit proposed so far is Kendon's gesture unit, which can comprise more than one gesture phrase and is delimited by two rest-positions (Kendon 1980, 2004: chapter 7).
To sum up: a form-based approach to gesture analysis also systematically considers linear and temporal structures and relations. In the following two sections, we will show how it reveals different forms of linear structures within a gesture unit and a principle for creating coherence between successive gestures (e.g., gesture strokes within one gesture unit).

4.1. Different kinds of gesture combinations: Variation in the form of gesture units
When we consider the field of gesture studies, it may seem as if the world of gesture usage consisted of nicely delimited and carefully integrated single gestures (e.g., Kendon's gesture phrases). When we look at gestures "in the wild", however, it turns out that they very often appear in sequences. Looking just at different types of gestures, we find a systematic range of four possible combinations:

(i) One gesture might be repeated several times, resulting in the repetition of the
same gestural meaning (iteration) or in the creation of a new gestural meaning
(reduplication) (Bressem 2012, volume 2);
(ii) Several gestures depicting objects, actions, or events in a literal (McNeill's iconics) or metaphoric manner (McNeill's metaphorics) may combine to describe an entire scenario. A verbo-gestural account of an afternoon visit to the baroque garden of the Versailles castle would be an example;
(iii) Several pragmatic and performative gestures (the Ring Gesture, the Palm Up Open Hand (Kendon's Palm Presenting), the Away Gestures) may combine. They are typically found in argumentative discourse (political speeches or discussions are a good example of this type of gesture combination);
(iv) These three types might combine, with pragmatic and performative gestures often located at the beginning or end of speaking turns (very often with metapragmatic functions) and depictive gestures often placed in the middle of gesture sequences.

We will provide one characteristic example of the second type of combination. A student of art history uses a succession of six depictive gestures to describe his impressions and memories of an afternoon visit to the baroque garden of the palace at Versailles, where the changing weather and light created spectacular effects (Fig. 45.2).
This sequence of depictive gestures begins with a departure of both hands from rest-
position into the upper part of the gesture space. In this location the speaker molds
what is supposed to indicate the blue sky (G1), then leaving his left hand up in gesture
G1: und dann hats so blauen himmel (.) ('and then there was such a blue sky'); G2: und dann kam wind ('and then came the wind'); G3: dann hats wieder weiße wolken ('and there were white clouds'); G4: dann kam n kurzer regenschauer ('then came a short rain shower'); G5: dann war alles naß und glitzerte ('then everything was wet and sparkling'); G6: ...

Fig. 45.2: A sequence of gestures depicting impressions of an afternoon visit to the baroque garden of Versailles, where the changing weather created spectacular effects.

space (the left "end" of the sky), he uses his right hand to "wave in" wind blowing up in the sky (G2); then, using both hands again, still in the "sky" location, he molds the round shapes of clouds (G3), which come in as a result of the blowing wind. In the same position, loose downward movements of the relaxed hands are used to depict falling rain (G4). Now, at the lower end of the raining movements, the effect of this rain is shown: "everything was wet". His hands are now in a palm-downward orientation and perform a lateral sweep (Kendon's PL) (G5) to show that "everything", the entire ground, had been turned wet (for an analysis of the meaning of the PL, see Kendon 2004: chapter 13). With the last gesture of this sequence the speaker comes to the point of his little story, namely the effect that this changing weather had on the beauty of the famous baroque garden: with the returning sun after the rain shower, everything was sparkling. Delicate finger movements of both hands (located between the ground and the sky position) illustrate the shimmering and sparkling light reflections on the wet landscape (G6). The series of depictive gestures comes to an end with a return to full rest-position.
What is remarkable about this long gesture unit is that all the gestures are performed
within the same gesture space. With his first gesture the speaker sets up a spatial
frame in the upper part of the gesture space (“the blue sky”) – and creates a kind of
stage for all the facets of the afternoon scenario that he will illustrate with the following gestures: the wind, the clouds, the rain, all the wet and shimmering reflections in the garden. This is what we term a mimetic relation between different gestures (e.g., gesture phrases), and we consider it one fundamental principle of gesture combination
(see Müller, Ladewig, and Bressem volume 2; Bressem, Ladewig, Müller, et al. in
preparation).
4.2. How gestures combine: Creating mimetic and non-mimetic relations
Focusing on the forms and principles of gesture combinations, the question arises: what is it that turns the "baroque garden" gesture sequence (one large gesture unit) described above into a meaningful unit? What is it that makes it perceivable as a unit in which each gesture adds to the description of a framed scenario, each gesture contributing to the full "picture" of the described afternoon? Formally, there are three reasons: first, obviously, there is no return to a full rest position; second, all gestures are performed within one area in the gesture space; third, they are all performed in close temporal proximity.
What we see are series of preparation and stroke phases and, in one case, a partial retraction of one hand (when G2 is performed, the left hand loosens but stays in the "sky" position); this accounts for identifying them as a gesture unit in Kendon's sense.
However, something more is going on here. By placing all gestures within the same spa-
tial frame, a mimetic relation between them is created in which successive gestures are
used to mime actual spatial relations. In this way a formal coherence between them is
established, and a larger unit of meaning is created (maybe Kendon's "idea unit", Kendon 1980). The successive gestures relate to one another, depicting different facets of one event: first the sky, then the wind blowing, then the clouds, then the rain, and below it the sparkling ground. With the first gesture, the frame is set up, and for all the subsequent gestures the position in space is used to mime the perceived spatial position of each event within the initially set-up frame.
What we encounter here is a basic principle of creating linear structures in gestures:
namely a mimetic use of gesture space. However, we found that there are also non-mimetic uses of gesture space. In those cases the space is used to mark an abstract relation between successive gestures. Spatial and temporal proximity are employed here to indicate that the meaning of the succeeding gestures is meant to bear on the meaning of the preceding gestures. To put it very simply: in non-mimetic uses of the gesture space, spatial proximity indicates that the series of gestures functions as one larger unit of meaning.
Take as an example the case in which a young man describes two parallel paths in a
public garden. In doing so, he uses his two hands (open hands, palms lateral) in a parallel motion forward to sketch a path on the left-hand side and another one on the right-hand side. While the first two gestures mime the actual spatial location (a path on the
left and one on the right), a third gesture follows, which is used to depict a third path
located on the leftmost part of the garden. This gesture is performed further left (i.e., it
mimes the actual spatial location), but it is also performed significantly higher up in the
gesture space. It is synchronized with the intensifier “leftmost” (“and then to the leftmost
there is another path”, und dann noch ganz links diesen einen Weg). We would like to
suggest that here the higher position in gesture space is not used to indicate that the leftmost path actually runs on a higher level than the other two. We believe, rather, that this upward move functions as a kind of intensification of the gestural meaning (a gestural rendition of "leftmost"). Moving a gesture up in the gesture space increases its visibility and, as a consequence, its communicative prominence. By moving
it towards the center of the interlocutor’s visual attention the gesture is more likely
to be seen. Notably, we found this kind of upward move within gesture sequences
also with other types of gestures and in different contexts, indicating that this non-
mimetic use of gesture space is a systematic way of combining gestures and of creating
abstract structural relations between them (Tag and Müller 2010; see also Bressem
2012).

5. Conclusion
We believe that the form-based view proposed here substantiates Kendon's claim of gestures as "movement excursions" and as showing "features of manifest deliberate expressiveness". To offer a systematic, form-based, and linguistic documentation of them is what we term "a grammar of gestures". We emphatically avoid projecting linguistic categories onto co-verbal gestures. Our approach is thus in accordance with Enfield's characterization of analyzing gestures as part of composite utterances:

When examining gesture, as when examining any other component of composite utter-
ances, we must carefully distinguish between token meaning (enriched, context-situated),
type meaning (raw, context-independent, pre-packaged), and sheer form (no necessary
meaning at all outside of a particular context in which it is taken to have meaning).
These distinctions may apply to signs in any modality. (Enfield this volume)

It is also in accordance with Cienki's account of "language […] as variably multimodal" (Cienki 2012: 149):

The basic idea being proposed here is that the category of language (in general, and in
terms of any individual language) is not clearly bounded, but is a dynamic category with
a fuzzy, flexible boundary which can incorporate other forms of symbolic expression to
varying degrees, and differently in different contexts of use. (Cienki 2012: 155)

Our specific twist is that we take the form of the gestural movement as the point of departure for our analysis of the meaning of gestures. We regard gestures as motivated signs, and as such we consider a close analysis of their form the basis for reconstructing their meaning.
In order to arrive at the local, indexical, or more shared and maybe even conventional forms of gestural meaning, we examine how a particular gesture is deployed within its context-of-use (Kendon 1990, 2004), or more specifically within a particular linguistic and interactional context. In doing this, we arrive at intersubjectively accountable descriptions of gestural meaning (see Bressem, Ladewig, and Müller this volume; Müller 2010, submitted; Müller, Ladewig, and Bressem volume 2). This holds for ad hoc as well as recurrent forms of meaning in gestures. It furthermore allows us to determine if – and if so, how – gestures may be used to take over the functions of grammatical categories such as verb, noun, or adverb (Fricke 2012; Ladewig 2012, volume 2).
In a form-based view, meaning creation in gestures is considered to be fundamentally rooted in human experience. In this respect, we are sympathetic to Streeck's praxeological approach to gesture (Streeck 2008, this volume). Gestures display embodied facets of language as well as their embodied roots. In singular gestures (McNeill's spontaneous gestures), sensory(-motor) experiences are used to create ad hoc gestural meaning. In recurrent gestures we observe processes of schematization, abstraction, and segmentation of the gestural movement and its meaning. Here we can see how – based on those
processes within recurrent gestures – structural islands of “gesture families” emerge.


Finally, with regard to the linear structures of gestures, we identified basic principles
of linear combinatorics: in gesture repetitions (Bressem 2012), in the combination of gestures within gesture units, and even in recursive patterns at the level of gesture-phase combinations (see Fricke 2012, this volume).
This systematic documentation of the nature of gesture forms, their motivation, and their simultaneous and linear structures is what we term a "grammar" of gestures. This is not to say that gestures possess a linguistic system. What we do want to suggest, however, is that the study of those co-verbal gestures reveals different facets and stages of emerging linguistic structures. In singular gestures, we see how meaningful units for
communication are created based on bodily experience (involving Modes of Represen-
tation, image schemas, motor patterns, actions in general). In recurrent gestures, we see
how a holistic gestalt begins to break down into formational features, some of which contribute to the thematic core of the gesture while others do not. In gesture families we can
observe how structural islands based on two different types of principles emerge (ges-
ture families based on a formational and semantic core and gesture families based on a
shared and basic action underlying all members of the family). Analyzing both the
simultaneous and the linear forms of gestures reveals embodied forms and principles
of emerging structures that may shed light on processes of grammaticalization in signed
languages (Wilcox 2007). Notably, in working on those facets of a "grammar" of gestures, we have deliberately decided not to take a sign-linguistic point of view as our starting point, precisely because we did not want to fall into the trap of applying linguistic categories identified for signed languages to gestures. The same holds for the categories of spoken language grammar.
Conducting research on a grammar of gesture does not imply using linguistic categories to describe gestures, or assuming that gestures possess a linguistic system. On the contrary: when used in conjunction with speech, gestures as an articulatory modality are under no pressure to develop a parallel semiotic system for communication, such as a parallel or alternate sign language. Gestures used with speech create composite utterances; they are visible actions that serve as utterances, and they are embodied conceptualizations. We suggest, rather, that gestures used along with speech reveal that spoken language is inherently multimodal. In this sense we speak of a multimodal grammar (for more grammatical rigor concerning this proposal, see Fricke 2012, this volume).
Finally, we would like to take up a thought expressed by the Roman teacher of rhetoric Quintilian. In the actio part of his Institutio Oratoria he says that gestures are the natural language of mankind. We believe that gestures have a potential for language and that, if we look closely enough, we can indeed detect some of the embodied seeds of language.

Acknowledgments
The Berlin Gesture Project was located at the Freie Universität Berlin (2000–2006)
and headed by Cornelia Müller. Members of the group were: Karin Becker, Janina
Beins, Ulrike Bohle, Gudrun Borgschulte, Jana Bressem, Ellen Fricke, André Hatting,
Alexander Kästner, Silva Ladewig, Nadine Schmidt, Daniel Steckbauer, Susanne Tag,
Sedinha Teßendorf, and Juliane Wutta. The ToGoG project (2006–2011) was located
at the European University Viadrina, the Freie Universität Berlin, and the German Sport University Köln and was headed by Cornelia Müller (Linguistics), Ellen Fricke (Semiotics), Hedda Lausberg (Neurology), and Katja Liebal (Primatology). Members of the group were: Jana Bressem, Mary Copple, Firdaous Fatfouta-Hanka, Marlen Griessing, Julius Hassemer, Ingo Helmich, Katharina Hogrefe, Henning Holle, Benjamin Marienfeld, Robert Rein, Philipp Kazzer, Silva Ladewig, Christel Schneider, Nicole Stein, Susanne
Tag, Sedinha Teßendorf, Sebastian Tempelmann, and Ulrike Wrobel. Irene Mittelberg
was an associated member of the project.
We are grateful to the Volkswagen Foundation for supporting this research with a
grant for the interdisciplinary project “Towards a grammar of gesture: Evolution,
brain and linguistic structures” (www.togog.org).
We thank Karin Becker for the drawings and all the group members for their inspir-
ing contributions to this intellectual enterprise.

6. References
Andrén, Mats 2010. Children’s gestures from 18 to 30 months. Ph.D. dissertation, Centre for Lan-
guages and Literature, Lund University.
Arbib, Michael this volume. Mirror systems and the neurocognitive substrates of bodily commu-
nication and language. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David
McNeill and Sedinha Teßendorf (eds.), Body – Language – Communication: An International
Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics and Communi-
cation Science 38.1.) Berlin: De Gruyter Mouton.
Aristotle 1965. Poetics. Oxford: Oxford University Press.
Arnheim, Rudolf 1974. Art and Visual Perception. A Psychology of the Creative Eye. Berkeley:
University of California Press.
Battison, Robin 1974. Phonological deletion in American Sign Language. Sign Language Studies 5:
1–19.
Bohle, Ulrike 2007. Das Wort ergreifen–das Wort übergeben: Explorative Studie zur Rolle redebe-
gleitender Gesten in der Organisation des Sprecherwechsels. Berlin: Weidler.
Bressem, Jana 2012. Repetitions in gesture: Structures, functions, and cognitive aspects. Ph.D. disser-
tation, European University Viadrina, Frankfurt (Oder).
Bressem, Jana this volume. A linguistic perspective on the notation of form features in gestures. In:
Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Sedinha
Teßendorf (eds.), Body – Language – Communication: An International Handbook on Multi-
modality in Human Interaction. (Handbooks of Linguistics and Communication Science 38.1.)
Berlin: De Gruyter Mouton.
Bressem, Jana volume 2. Repetitions in gesture. In: Cornelia Müller, Alan Cienki, Ellen Fricke,
Silva H. Ladewig, David McNeill, and Jana Bressem (eds.), Body – Language – Communication:
An International Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics
and Communication Science 38.2.) Berlin: De Gruyter Mouton.
Bressem, Jana, Cornelia Müller and Ellen Fricke in preparation. "No, not, none of that" – cases of exclusion and negation in gesture.
Bressem, Jana and Cornelia Müller volume 2. A repertoire of recurrent gestures of German. In:
Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and
Jana Bressem (eds.), Body – Language – Communication: An International Handbook on Mul-
timodality in Human Interaction. (Handbooks of Linguistics and Communication Science
38.2.) Berlin, New York: De Gruyter Mouton.
Bressem, Jana and Silva H. Ladewig 2011. Rethinking gesture phases – articulatory features of
gestural movement? Semiotica 184(1/4): 53–91.
Bressem, Jana, Silva H. Ladewig and Cornelia Müller this volume. Linguistic Annotation System
for Gestures. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill
and Sedinha Teßendorf (eds.), Body – Language – Communication: An International Hand-
book on Multimodality in Human Interaction. (Handbooks of Linguistics and Communication
Science 38.1.) Berlin, New York: De Gruyter Mouton.
Bressem, Jana, Silva H. Ladewig, Cornelia Müller, Melissa Arnecke, Franziska Boll, Dorothea Böhme,
Lena Hotze, Benjamin Marienfeld and Nicole Stein in preparation. Linear structures in gestures.
Bressem, Jana, Katja Liebal, Cornelia Müller and Nicole Stein in preparation. Recurrent forms
and contexts: Families of gestures in non-human primates.
Calbris, Geneviève 1990. The Semiotics of French Gestures. Bloomington: Indiana University Press.
Calbris, Geneviève 2011. Elements of Meaning in Gesture. Amsterdam: John Benjamins.
Cienki, Alan 2012. Usage events of spoken language and the symbolic units (may) abstract from
them. In: Krzysztof Kosecki and Janusz Badio (eds.), Cognitive Processes in Language,
149–158. Frankfurt am Main: Peter Lang.
Cienki, Alan this volume. Cognitive Linguistics: Spoken language and gesture as expressions of
conceptualization. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David
McNeill and Sedinha Teßendorf (eds.), Body – Language – Communication: An International
Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics and Communi-
cation Science 38.1.) Berlin: De Gruyter Mouton.
Cienki, Alan and Cornelia Müller 2008. Metaphor, gesture and thought. In: Raymond W. Gibbs
(ed.), Cambridge Handbook of Metaphor and Thought, 483–501. Cambridge: Cambridge Uni-
versity Press.
Clark, Herbert H. 1996. Using Language, Volume 4. Cambridge: Cambridge University Press.
Clark, Herbert H. and Richard J. Gerrig 1990. Quotations as demonstrations. Language 66(4):
764–805.
Cohen, Einya, Lila Namir and I. M. Schlesinger 1977. A New Dictionary of Sign Language: Em-
ploying the Eshkol-Wachmann Movement Notation System. The Hague: Mouton.
Corballis, Michael this volume. Gesture as precursor to speech in evolution. In: Cornelia Müller,
Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Sedinha Teßendorf (eds.),
Body – Language – Communication: An International Handbook on Multimodality in
Human Interaction. (Handbooks of Linguistics and Communication Science 38.1.) Berlin,
New York: De Gruyter Mouton.
Enfield, N. J. 2009. The Anatomy of Meaning: Speech, Gesture, and Composite Utterances. Cam-
bridge: Cambridge University Press.
Enfield, N. J. this volume. A ‘Composite Utterances’ approach to meaning. In: Cornelia Müller,
Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Sedinha Teßendorf (eds.),
Body – Language – Communication: An International Handbook on Multimodality in
Human Interaction. (Handbooks of Linguistics and Communication Science 38.1.) Berlin:
De Gruyter Mouton.
Fricke, Ellen 2012. Grammatik multimodal: Wie Wörter und Gesten zusammenwirken. Berlin: De
Gruyter Mouton.
Fricke, Ellen this volume. Towards a unified grammar of gesture and speech: A multimodal
approach. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill
and Sedinha Teßendorf (eds.), Body – Language – Communication: An International Hand-
book on Multimodality in Human Interaction. (Handbooks of Linguistics and Communication
Science 38.1.) Berlin: De Gruyter Mouton.
Goffman, Erving 1974. Frame Analysis. Cambridge, MA: Harvard University Press.
Gombrich, Ernst H. 1961. Art and Illusion: A Study in the Psychology of Pictorial Representation.
London: Phaidon.
Harrison, Simon 2009. Grammar, gesture, and cognition: The case of negation in English. Ph.D.
dissertation, Université Michel de Montaigne, Bordeaux 3.
Kendon, Adam 1972. Some relationships between body motion and speech: An analysis of an
example. In: Aron Wolfe Siegman and Benjamin Pope (eds.), Studies in Dyadic Communica-
tion, 177–210. New York: Elsevier.
Kendon, Adam 1980. Gesticulation and speech: Two aspects of the process of utterance. In: Mary
R. Key (ed.), Nonverbal Communication and Language, 207–227. The Hague: Mouton.
Kendon, Adam 1981. Introduction: Current issues in the Study of ‘Nonverbal Communication’. In:
Adam Kendon (ed.), Nonverbal communication, interaction, and gesture: Selections from
Semiotica (Approaches to Semiotics 41), 1–56. The Hague: Mouton.
Kendon, Adam 1986. Some reasons for studying gesture. Semiotica 62(1/2): 3–28.
Kendon, Adam 1990. Conducting Interaction: Patterns of Behavior in Focused Encounters. Cam-
bridge: Cambridge University Press.
Kendon, Adam 2004. Gesture. Visible Action as Utterance. Cambridge: Cambridge University
Press.
Kendon, Adam 2008. Some reflections on the relationship between “gesture” and “sign”. Gesture
8(3): 348–366.
Kita, Sotaro, Ingeborg van Gijn and Harry van der Hulst 1998. Movement phases in signs and co-
speech gestures and their transcription by human encoders. In: Ipke Wachsmuth and Martin
Fröhlich (eds.), Gesture, and Sign Language in Human-Computer Interaction, 23–35. Berlin:
Springer.
Ladewig, Silva H. 2010. Beschreiben, suchen und auffordern – Varianten einer rekurrenten Geste. Sprache und Literatur 41(1): 89–111.
Ladewig, Silva H. 2011. Putting the cyclic gesture on a cognitive basis. CogniTextes 6. http://
cognitextes.revues.org/406.
Ladewig, Silva H. 2012. Syntactic and semantic integration of gestures into speech: Structural, cog-
nitive, and conceptual aspects. Ph.D. dissertation, European University Viadrina, Frankfurt
(Oder).
Ladewig, Silva H. volume 2. Linear integration of gestures into speech. In: Cornelia Müller,
Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill, and Jana Bressem (eds.), Body –
Language – Communication: An International Handbook on Multimodality in Human Inter-
action. (Handbooks of Linguistics and Communication Science 38.2.) Berlin: De Gruyter
Mouton.
Ladewig, Silva H. and Jana Bressem forthcoming. New insights into the medium hand – Discovering structures in gestures based on the four parameters of sign language. Semiotica.
Langacker, Ronald W. 1991. Concept, Image, and Symbol: The Cognitive Basis of Grammar. Ber-
lin: De Gruyter Mouton.
Langer, Susanne K. 1957. Philosophy in a New Key: A Study in the Symbolism of Reason, Rite, and
Art. Vol. 17. Cambridge, MA: Harvard University Press.
Lapaire, Jean-Rémi 2006. Negation, reification and manipulation in a cognitive grammar of sub-
stance. In: Stephanie Bonnefille and Sebastian Salbayre (eds.), La Négation: formes, figures,
conceptualisation, 333–349. Tours: Presses universitaires François Rabelais.
Lausberg, Hedda, Robyn F. Cruz, Sotaro Kita, Eran Zaidel and Alan Ptito 2003. Pantomime to
visual presentation of objects: left hand dyspraxia in patients with complete callosotomy.
Brain 126: 343–360.
Lausberg, Hedda, Hauke Heekeren, Phillip Kazzer and Isabell Wartenburger submitted. Differen-
tial cortical mechanisms underlying pantomimed tool use and demonstrations with tool in
hand.
Liebal, Katja and Josep Call 2012. The origins of non-human primates’ manual gestures. Philoso-
phical Transactions of the Royal Society B: Biological Sciences 367(1585): 118–128.
Liebal, Katja, Simone Pika and Michael Tomasello 2006. Gestural communication in orang-utans
(Pongo pygmaeus). Gesture 6(1): 1–38.
Mandel, Mark 1977. Iconic devices in American Sign Language. In: Lynn A. Friedman (ed.), On the
Other Hand. New Perspectives on American Sign Language, 57–108. New York: Academic Press.
Mauss, Marcel 1973. The techniques of the body. Economy and Society 2(1): 70–88.
McNeill, David 1992. Hand and Mind. What Gestures Reveal about Thought. Chicago: University
of Chicago Press.
McNeill, David (ed.) 2000. Language and Gesture. Cambridge: Cambridge University Press.
McNeill, David 2005. Gesture and Thought. Chicago: University of Chicago Press.
McNeill, David 2007. Gesture and thought. In: Anna Esposito, Maja Bratanić, Eric Keller and
Maria Marinaro (eds.), Fundamentals of Verbal and Nonverbal Communication and the Bio-
metric Issue, 20–33. Amsterdam: IOS Press.
McNeill, David this volume. The co-evolution of gesture and speech, and downstream conse-
quences. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill
and Sedinha Teßendorf (eds.), Body – Language – Communication: An International Hand-
book on Multimodality in Human Interaction. (Handbooks of Linguistics and Communication
Science 38.1.) Berlin: De Gruyter Mouton.
Mittelberg, Irene 2006. Metaphor and metonymy in language and gesture: Discourse evidence
for multimodal models of grammar. Ph.D. dissertation, Cornell University. Ann Arbor, MI: UMI.
Mittelberg, Irene this volume. The exbodied mind: Cognitive-semiotic principles as motivating
forces in gesture. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David
McNeill and Sedinha Teßendorf (eds.), Body – Language – Communication: An International
Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics and Communi-
cation Science 38.1.) Berlin: De Gruyter Mouton.
Mittelberg, Irene and Linda Waugh 2009. Multimodal figures of thought: A cognitive-semiotic
approach to metaphor and metonymy in co-speech gesture. In: Charles Forceville and Eduardo
Urios-Aparisi (eds.), Multimodal Metaphor, 329–356. Berlin: De Gruyter Mouton.
Müller, Cornelia 1998a. Beredte Hände. Theorie und Sprachvergleich redebegleitender Gesten.
In: Thomas Noll and Caroline Schmauser (eds.), Körperbewegungen und ihre Bedeutung,
21–44. Berlin: Arno Spitz.
Müller, Cornelia 1998b. Redebegleitende Gesten: Kulturgeschichte, Theorie, Sprachvergleich. Ber-
lin: Arno Spitz.
Müller, Cornelia 2003. On the gestural creation of narrative structure: A case study of a story told
in conversation. In: Monica Rector, Isabella Poggi and Nadine Trigo (eds.), Gestures: Meaning
and Use, 259–265. Porto: Universidade Fernando Pessoa.
Müller, Cornelia 2004. Forms and uses of the palm up open hand. A case of a gesture family? In:
Cornelia Müller and Roland Posner (eds.), Semantics and Pragmatics of Everyday Gestures,
234–256. Berlin: Weidler.
Müller, Cornelia 2005. Gestures in human and nonhuman primates: Why we need a comparative
view. Gesture 5(1–2): 259–283.
Müller, Cornelia 2008a. Metaphors Dead and Alive, Sleeping and Waking: A Dynamic View. Chicago: University of Chicago Press.
Müller, Cornelia 2008b. What gestures reveal about the nature of metaphor. In: Alan Cienki and
Cornelia Müller (eds.), Metaphor and Gesture, 249–275. Amsterdam: John Benjamins.
Müller, Cornelia 2009. Gesture and language. In: Kirsten Malmkjaer (ed.), Routledge’s Linguis-
tics Encyclopedia, 214–217. Abingdon: Routledge.
Müller, Cornelia 2010a. Mimesis und Gestik. In: Gertrud Koch, Martin Vöhler and Christiane
Voss (eds.), Die Mimesis und ihre Künste, 149–187. Paderborn: Fink.
Müller, Cornelia 2010b. Wie Gesten bedeuten. Eine kognitiv-linguistische und sequenzanalytische
Perspektive. Sprache und Literatur 41(1): 37–68.
Müller, Cornelia submitted. How gestures mean – The construction of meaning in gestures with speech.
Müller, Cornelia this volume. Gestures as a medium of expression: The linguistic potential of ges-
tures. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and
Sedinha Teßendorf (eds.), Body – Language – Communication: An International Handbook
on Multimodality in Human Interaction. (Handbooks of Linguistics and Communication
Science 38.1.) Berlin: De Gruyter Mouton.
732 IV. Contemporary approaches

Müller, Cornelia, Jana Bressem, Silva H. Ladewig and Susanne Tag 2009. Introduction to special
session “Towards a grammar of gesture”. Paper presented at Gesture and Speech in Interaction (GESPIN), Adam Mickiewicz University, Poznań, Poland.
Müller, Cornelia and Alan Cienki 2009. Words, gestures, and beyond: Forms of multimodal met-
aphor in the use of spoken language. In: Charles Forceville and Eduardo Urios-Aparisi (eds.),
Multimodal Metaphor, 297–328. Berlin: De Gruyter Mouton.
Müller, Cornelia, Ellen Fricke, Silva H. Ladewig, Irene Mittelberg and Sedinha Teßendorf in
preparation. Gestural Modes of Representation – Revisited.
Müller, Cornelia, Silva H. Ladewig and Jana Bressem volume 2. Methods of linguistic gesture ana-
lysis. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David
McNeill and Jana Bressem (eds.), Body – Language – Communication: An International
Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics and Communi-
cation Science 38.2.) Berlin: De Gruyter Mouton.
Müller, Cornelia and Susanne Tag 2010. The dynamics of metaphor: Foregrounding and activating
metaphoricity in conversational interaction. Cognitive Semiotics 6: 85–120.
Peirce, Charles S. 1960. Collected Papers of Charles Sanders Peirce (1931–1958). Vol. I: Principles
of Philosophy, Vol. II: Elements of Logic. Cambridge, MA: Belknap Press of Harvard Univer-
sity Press.
Radden, Günter, Klaus-Michael Köpcke, Thomas Berg and Peter Siemund 2007. Introduction: The
construction of meaning in language. In: Günter Radden, Klaus-Michael Köpcke, Thomas Berg
and Peter Siemund (eds.), Aspects of Meaning Construction, 1–15. Amsterdam: John Benjamins.
Seyfeddinipur, Mandana 2006. Disfluency: Interrupting speech and gesture. Ph.D. dissertation, Uni-
versity of Nijmegen, Nijmegen, the Netherlands.
Slama-Cazacu, Tatiana 1976. Nonverbal components in message sequence: “Mixed syntax”. In:
William Charles McCormack and Stephen A. Wurm (eds.), Language and Man: Anthropolog-
ical Issues, 217–227. The Hague: Mouton.
Slobin, Dan 1996. From thought and language to thinking for speaking. In: John J. Gumperz and
Stephen C. Levinson (eds.), Rethinking Linguistic Relativity, 70–96. Cambridge: Cambridge
University Press.
Sowa, Timo 2005. Understanding Coverbal Iconic Gestures in Object Shape Descriptions. Ph.D.
dissertation. Berlin: Akademische Verlagsgesellschaft.
Stokoe, William C. 1960. Sign Language Structure. Buffalo, NY: Buffalo University Press.
Streeck, Jürgen 1994. ‘Speech handling’: The metaphorical representation of speech in gesture. A
cross-cultural study. Unpublished manuscript.
Streeck, Jürgen 2008. Depicting by gesture. Gesture 8(3): 285–301.
Streeck, Jürgen 2009. Gesturecraft: Manu-facturing Understanding. Amsterdam: John Benjamins.
Streeck, Jürgen this volume. Praxeology of gesture. In: Cornelia Müller, Alan Cienki, Ellen
Fricke, Silva H. Ladewig, David McNeill and Sedinha Teßendorf (eds.), Body – Language – Com-
munication: An International Handbook on Multimodality in Human Interaction.
(Handbooks of Linguistics and Communication Science 38.1.) Berlin: De Gruyter Mouton.
Tag, Susanne and Cornelia Müller 2010. Combining gestures: Mimetic and non-mimetic use of
gesture space. Paper presented at the 4th conference of the International Society for Gesture
Studies on 29.07.2010, Frankfurt (Oder).
Taub, Sarah F. 2001. Language from the Body: Iconicity and Metaphor in American Sign Lan-
guage. Cambridge: Cambridge University Press.
Teßendorf, Sedinha 2005. Pragmatische Funktionen spanischer Gesten am Beispiel des “Gestos de
Barrer”. Unpublished MA thesis, Freie Universität Berlin.
Teßendorf, Sedinha 2008. Pragmatic and metaphoric gestures – combining functional with cogni-
tive approaches. Unpublished manuscript.
Trier, Jost 1973. Aufsätze und Vorträge zur Wortfeldtheorie. The Hague: Mouton.
Wilcox, Sherman 2002. The iconic mapping of space and time in signed languages. In: Liliana
Albertazzi (ed.), Unfolding Perceptual Continua, 255–281. Amsterdam: John Benjamins.
Wilcox, Sherman 2007. Routes from gesture to language. In: Elena Pizzuto, Paola Pietrandrea
and Raffaele Simone (eds.), Verbal and Signed Languages: Comparing Structures, Constructs
and Methodologies, 107–131. Berlin: Walter de Gruyter.
Wundt, Wilhelm 1921. Völkerpsychologie: Eine Untersuchung der Entwicklungsgesetze von
Sprache, Mythus und Sitte. Erster Band. Die Sprache. Leipzig, Germany: Engelmann.

Cornelia Müller, Frankfurt (Oder), Germany
Jana Bressem, Chemnitz, Germany
Silva H. Ladewig, Frankfurt (Oder), Germany

46. Towards a unified grammar of gesture and speech: A multimodal approach
1. Introduction
2. Prerequisites for a multimodal approach to grammar
3. Grammar and the concept of multimodality
4. Kinesthemes as syntactic units: Processes of typification and semantization
5. Syntactic structures as code manifestation of the language faculty: Constituency and
recursion in co-speech gestures
6. Syntactic functions as a device for code integration within single languages: Multimodal
attribution in German noun phrases
7. Conclusion: Why we need a multimodal approach to grammar
8. References

Abstract
This chapter argues for a multimodal approach to grammar (Bressem 2012; Fricke 2008,
2012; Harrison 2008, 2009; Ladewig 2012) and offers a sketch of the theoretical founda-
tions according to Fricke (2012). Two main research traditions in linguistics are consid-
ered: generative grammar and linguistic structuralism and functionalism. The enterprise
of a multimodal grammar is substantiated by the analyses of typification and semantiza-
tion of gestures as potential syntactic constituents, by giving the rules of a generative
phrase structure grammar of co-speech gestures which displays recursion and self-
embedding, and by the grammatical analysis of multimodal attribution in German
noun phrases. If we conceive of multimodality as a global dimension of linguistic and
semiotic analysis which is generally applicable to language and other systems of signs,
then we have to broaden our perspective by also including grammars of single languages
and the human language faculty.

1. Introduction
Until recently, the idea that a multimodal approach to grammar is necessary was by no
means evident. Most grammarians have so far focused their grammatical analyses on written and spoken language without considering co-speech gestures. Yet the progress in
gesture studies offers a new perspective on the grammatical capacity of gestures accom-
panying speech (Bressem 2012; Fricke 2008, 2012; Harrison 2008, 2009; Ladewig 2012).
Human speech is not only composed of articulations of the mouth, primarily perceived
by ear, but also of visible articulations of other body parts affecting the eye (e.g., Ken-
don 2004; McNeill 1992, 2005). In this regard, the movements of the hands play a special
role: the sign languages of the deaf show that movements of the hands alone can func-
tion as articulators of fully established languages (e.g., Wundt [1900] 1904, 1973). If it is
the case that movements of the hand inherently have the potential for establishing a
grammar, what are the grammatical implications of all those hand movements that
accompany the speech of hearing people? Are single languages like German or English
partially multimodal? How far is the faculty of language (Hauser, Chomsky, and Fitch
2002) bound to a particular mode of manifestation?
These are basic questions that link the enterprise of a multimodal approach to gram-
mar to a long but intermittent linguistic tradition of research affiliated above all with the
names Wilhelm Wundt, Karl Bühler, Louis Hjelmslev, and Kenneth Pike. Karl Bühler
([1934] 2011) justifies a necessary combination of gestural pointing and verbal deictics
with the argument that only in this way are speakers able to successfully refer to entities
in certain utterance situations. Wundt ([1900] 1904, 1973), using the example of sign lan-
guages, demonstrates for the first time that the faculty of language is manifest in the
visual-gestural mode. Hjelmslev ([1943] 1969), as well as Pike (1967), argues for a per-
spective of mode neutrality with respect to utterances as a whole because gestures can
potentially instantiate structures and functions of language and speech. Therefore, ac-
cording to this view, defining linguistic categories has to occur independently of any lin-
guistic “substance” (Hjelmslev 1969). In this research tradition, two basic presumptions
coexist in an unconnected way: first, the presumption that speech can manifest itself in
various media and second, the presumption that subcodes, which may differ medially
from the vocal language, can be integrated into a vocal matrix code. At first glance,
both principles, manifestation and integration, seem to be incompatible. However, a
deeper, more systematic approach facilitates a productive conceptual perspective:
The phenomenon of the multimodality of language is to be tackled on different levels
and from different linguistic perspectives. With respect to generative as well as struc-
tural-functional conceptions of grammar and language, the basic argumentation in
the following sections pursues the goal of substantiating that co-speech gestures belong
at least partially to the subject area of grammar.

2. Prerequisites for a multimodal approach to grammar


How do gestures and words interact on the level of grammar? In order to prove that the
project of a multimodal grammar is at all feasible, it is necessary to show that co-speech
gestures can be structurally and/or functionally integrated into grammars of single vocal
languages or, alternatively, that co-speech gestures can be manifestations of the “faculty
of language in the narrow sense” according to Hauser, Chomsky, and Fitch (2002)
(Fricke 2008, 2012). This can only be proven if it is possible to show that gestures are
capable of being typified and semanticized independently of the simultaneously accompanying vocal utterance. An important argument commonly held against
co-speech gestures as potential units of the language system is their lack of conventiona-
lization. Without conventionalization and segmentation there are no well-defined
linguistic units with stable form-meaning relations which might be capable of entering
into syntactic constituent structures as constituents. How can we address this objec-
tion? The claim that movements of the hand are categorically not capable of instantiat-
ing linguistic structure and functions and, in a narrower sense, not capable of building
morphemes is easily invalidated. Consider the sign languages of the deaf, which are
fully developed linguistic systems possessing both a manual syntax and a manual lex-
icon. Gestures of the hearing may also be meaningful, comparable to morphemes in
vocal utterances. These gestures with stable form-meaning relations are the so-called
“emblematic gestures” (Efron [1941] 1972; Ekman and Friesen 1969). In section 4
we introduce the concept of kinesthemes as submorphemic units, which allows for
modeling semiotic processes of typification and semantization and thereby provides
terminal constituents for gestural constituent structures in section 5 (see also vol-
ume 2). This concept supports the assumption of a “rudimentary morphology” (Müller
2004: 3) as well as substantiating the category of “recurrent gestures” located between
idiosyncratic and emblematic gestures in Kendon’s continuum (e.g., Ladewig 2010,
2011, 2012; Müller 2008, 2010, submitted; Fricke, Bressem, and Müller volume 2).
With regard to co-speech gestures, gesture scholars have so far neglected grammar
and the syntactic dimension of analysis and linguists have largely considered gesture
as “non-verbal” and as a phenomenon of language use only, excluding it from their
subject. The crucial questions of the following sections are:

– How can linguistic multimodality be defined and distinguished from multimediality? (Section 3)
– To what extent can the faculty of language and the grammars of single vocal languages
be considered as multimodal? (Sections 4 to 6)
– Are co-speech gestures capable of being typified and semanticized independently of
verbal utterances? (Section 4)
– Is it possible to analyze co-speech gestures independently of speech in terms of con-
stituency? Do gestural constituent structures display recursion? If so, what would be
the implications of gestural recursion for language theory? (Section 5)
– To what extent can gestures be integrated into verbal syntax, for example using syn-
tactic functions? (Section 6)

Proving the possibility of typification for gestures is the prerequisite for the assumption
of syntactic constituents that enter syntactic constituent structures. Proving the possibil-
ity of their semantization is the prerequisite for assigning the syntactic relation of mod-
ification in multimodal attribution in verbal noun phrases. On the basis of Eisenberg’s
grammar (1999), we show that co-speech gestures can fulfill syntactic as well as semantic
attributive functions in German as a single language. This implies that they must be seen
as part of the subject area of German grammar. With regard to the faculty of language,
co-speech gestures can be assigned syntactic constituent structures that are recursive. Ac-
cording to the hypothesis that recursion is the defining criterion for the language faculty
in the narrow sense (Hauser, Chomsky, and Fitch 2002), recursive co-speech gestures then
have to be considered as an integral part of human language (see volume 2). Conse-
quently, analyzing the grammar of single vocal languages as well as modeling the
human language faculty require a multimodal approach (section 7).
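What it means for a constituent structure to "display recursion" can be made concrete with a toy formal grammar. The following sketch is purely illustrative: the rule set (a gesture unit GU, gesture phrases GP, and the phase labels prep/stroke/retr) is a hypothetical stand-in loosely inspired by the unit/phrase/phase hierarchy of gesture analysis, not the actual phrase structure grammar of co-speech gestures presented in section 5.

```python
# Hypothetical sketch only: these rules are illustrative stand-ins, not the
# actual grammar of co-speech gestures discussed in this chapter.

GRAMMAR = {
    "GU": [["GP"], ["GP", "GU"]],              # a unit spans one or more phrases
    "GP": [["prep", "stroke", "retr"],
           ["prep", "stroke", "GP", "retr"]],  # self-embedding: a phrase inside a phrase
}

def derivable(symbol, tokens):
    """Naive recursive recognizer: can `symbol` derive exactly `tokens`?"""
    if symbol not in GRAMMAR:          # terminal symbol (a gesture phase)
        return tokens == [symbol]
    return any(matches(rhs, tokens) for rhs in GRAMMAR[symbol])

def matches(rhs, tokens):
    """Can the right-hand side `rhs` be split over `tokens`?"""
    if not rhs:
        return not tokens
    return any(derivable(rhs[0], tokens[:i]) and matches(rhs[1:], tokens[i:])
               for i in range(len(tokens) + 1))

# A center-embedded phase sequence: one gesture phrase nested inside another.
print(derivable("GU", ["prep", "stroke", "prep", "stroke", "retr", "retr"]))  # True
```

The second GP rule is the crucial one: because GP appears on its own right-hand side, the grammar licenses arbitrarily deep self-embedding, which is exactly the property at issue in the Hauser, Chomsky, and Fitch (2002) debate.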
3. Grammar and the concept of multimodality


3.1. Medium and modality
Every articulation necessarily requires a medium. However, media are not neutral with
respect to the mediated subject but leave traces within it (Krämer 1998; Stetter 2005).
Thus, the definition of the concept of “medium” is part of an indispensable basis of an
epistemically reflected linguistics (Stetter 2005: 266). It is therefore advisable to begin
with several conceptual distinctions with respect to the following basic questions: What
is to be understood by the terms “multimediality” and “multimodality” in the field of
language? To what extent can vocal language be considered as multimodal? What is
the difference between multimediality and multimodality? How can these terms be
defined sufficiently with respect to the objectives pursued in this article?
The basic idea can be characterized as follows: The concept of linguistic multimo-
dality is not primarily based on the criterion of a simultaneous occurrence of different
sensory impressions. Rather, with respect to a single language such as German, multi-
modality only exists if different media can adopt the same linguistic structures and/or
functions. Examples can be found in the spoken and the gestural instantiation of the
syntactic function of an attribute in a noun phrase (see section 6), and also in the sub-
stitution of a spoken constituent with a gestural constituent in the same syntactic position, which means that two different media can realize common linguistic structures.
Furthermore, with respect to language in general, multimodality exists when the
same structural principles manifest themselves simultaneously in different media. Ex-
amples of structural principles may be constituency and recursion (see section 5) or
basic semiotic principles of sign constitution, for example, in processes of typification
and semantization (see the parallels between phonesthemes and kinesthemes in
section 4).
This means that multimodality, in contrast to multimediality, not only requires the
simultaneity of at least two media but also either their structural and/or functional
integration into a matrix code (code integration) or the manifestation of one and
the same code in different media (code manifestation). It is the main thesis of this arti-
cle that multimodal code integration and multimodal code manifestation occur not
only on the level of language use but also on the level of the language system. In
the next section we explicate the concepts of multimodality and multimediality
from a linguistic and semiotic point of view. For this we will need a clear definition
of “medium”.
Within his semiotic approach to culture, Posner offers a definition that distinguishes
between physical, technological, sociological, functional, and code-related media con-
cepts (Posner 1986, 2004). Particularly important for our further discussion are the
biological and the code-related concepts. According to Posner, the biological media con-
cept relies on the sensory apparatus and characterizes sign processes according to the
bodily organs (e.g., the ear or the eye) which are involved in the production and recep-
tion of signs. We then speak of auditory or visual media. The code-related media con-
cept on the other hand “characterizes sign systems according to the types of rules
by means of which the sign users manage to assign messages to the signs” (Posner
2004: 61). Single languages like English, French, Spanish or German are examples of
code-related media.
3.2. Multimodality in gesture studies and language-image studies


The starting point of modern gesture studies is the basic idea that one and the same code
(e.g., English) manifests itself in vocal utterances as well as co-speech gestures, with each
belonging to a different sensory mode (biological media concept). This idea competes
with the concept of nonverbal communication, which considers co-speech gestures as
non-linguistic or “nonverbal” (e.g., Ruesch and Kees [1956] 1972; Scherer and Wallbott
1984; Watzlawick, Beavin Bavelas, and Jackson 1967). According to Kendon (1980:
208), “speech and movement appear together, as manifestations of the same process of
utterance”, and McNeill (1985), with his article “So you think gestures are nonverbal?”,
initiated a debate in Psychological Review. He adopts Kendon’s viewpoint, transfers it to
psychology, and argues: “[…] gestures and speech are parts of the same psychological
structure and share a computational stage” (McNeill 1985: 350).
As opposed to his earlier concept of multimodality, which he defined as code manifes-
tation through two different sensory modes, Kendon (2004) later argues for a concept in-
tegrating two different codes or semiotic resources with reference to a communicative
goal: “In creating an utterance that uses both modes of expression, the speaker creates
an ensemble in which gesture and speech are employed together as partners in a single
rhetoric enterprise” (Kendon 2004: 127). Obviously, Kendon recognizes this shift of
focus because he asks: “Is this because they are expressions of two different forms of
thought that originate jointly in a single, ‘deeper’ process? Or are they integrated as a
consequence of how a person, engaged in producing an utterance, adapts two separate
modes of expression and conjoins them in a single rhetorical aim?” (Kendon 2004: 3).
Kendon’s later concept of multimodality parallels that of Kress and van Leeuwen ([1996] 2006), who conceive of multimodality in the context of a general social-semiotic approach:
“Mode is a socially shaped and culturally given resource for making meaning. Image,
writing, layout, music, gesture, speech, moving image, soundtrack are examples of
modes used in representation and communication. […] Modes offer different potentials
for making meaning; these have a fundamental effect on choices of mode in specific
instances of communication” (Kress [2009] 2011: 54).
Like Kress and van Leeuwen, Stöckl conceives of “mode” as “code” (code-based
media concept) (Stöckl 2004: 11), modeling multimodality as a networked system of
modes and sub-modes: “Firstly, modes cut across sensory channels, so the nature of a
sign is not sufficiently characterized by looking at its path of perception. Secondly, one
mode can be realized in different media thus creating medial variants of one mode
(e.g., speech and writing as variants of the linguistic mode)” (Stöckl 2004: 11). Stöckl,
however, primarily addresses the language-image link in printed media. Accordingly, in
his model gestures have the status of a marginal sub-mode. He points out that simple
monomodal texts or monomodal vocal utterances have always been the exception, and
states that we “seem to know more about the functioning of individual modes than
about how they interact and are organized in text and discourse” (Stöckl 2004: 10).
If we do not want to constrain the notion of linguistic multimodality to specific types
such as gesture-speech relations or language-image relations, then we have to broaden
our perspective: towards single languages, towards language types, towards the faculty
of language and towards language in general. Such an extension of the perspective on
multimodality leads to the observation that we are dealing with a phenomenon resem-
bling an optical illusion. On the one hand, the code of the language faculty can manifest
itself partially in two different “modalities”, for example, in single vocal and single sign
languages. On the other hand, gestures can be integrated structurally and/or function-
ally into the more dominant vocal matrix code of a single language. Therefore, we deal
first with processes of code manifestation and second with processes of code integra-
tion. At first glance, analyses of language-image relations deal primarily with the
principle of code integration while gesture-speech relations may be subject to both
principles.

3.3. The basic principles of linguistic multimodality: Code manifestation and code integration
If we start from a code-based media concept (see Posner’s definition in section 3.1.) in
Fig. 46.1, then language in general and therefore every single language, German for
example, is a medium. According to Merten (1999: 134), language itself is the first com-
municative medium, and its features mold the basic criteria of all further media (in the
sense of a technological media concept). These criteria include “quantization” into the
smallest syntactic entities (high resolution capacity), “non-consumability”, “relational-
ity” (reference to something other than itself), “perception of distance”, “fungibility of
the processed contents”, “linking of psychic systems”, and “multiplier functioning” if
language can be adopted by various recipients at the same time (Merten 1999:
134ff.). For language to evolve as a basic code-related medium, however, it is a requirement that the high-resolution capacity of a physical medium is present (Merten 1999: 141), as in the acoustic channel with spoken languages or the optical channel with sign languages. Physical media interconnect things and thereby affect our perception. They
are media for our perception (Merten 1999: 144).

[Figure: a diagram combining the code-related and the biological media concepts. The language faculty (code-related) manifests itself in sign language (visual), written language (visual), and spoken language (primarily auditory). Within a single language, the verbal level (auditory) and the gestural level (visual) occur simultaneously and are related by structural and/or functional code manifestation or code integration.]

Fig. 46.1: Mediality and linguistic multimodality


When, in a further step, the code-related media concept and the biological media con-
cept are combined, different media within the linguistic sphere can be differentiated:
sign language, written language, and spoken language. A single sign language like
American Sign Language (ASL) or German Sign Language (DGS) on this level is
merely monomedial and monomodal because only one sensory modality is affected:
the visual. Barring hypertextual applications on the internet, English or German as a
written language is primarily visual and therefore primarily monomedial and mono-
modal as well. Visual and auditory material integrated into a text presupposes the written language as code, but the reverse is not true. Written communication functions independently of other potentially integrable codes, which also seems to be the case with spoken language. On the telephone, for example, people are restricted to the auditory modality and can nevertheless communicate without the visual modality. In such communication situations, spoken language is monomedial and monomodal as well. Visible gestures accompanying speech presuppose an audible, vocal language, not the other
way around. Is it therefore sufficient to likewise classify the vocal language as monome-
dial and monomodal? There are strong arguments for not doing so. When comparing
communication by phone with face-to-face communication, it becomes clear that the
latter is the ontogenetically and phylogenetically primary form of communication.
The telephone is a rather young technological innovation, and its handling is learned
relatively late in a child’s development. With respect to a biological media concept,
face-to-face communication is primarily audiovisual and, according to Lyons (1977:
637–638), the “canonical situation of utterance”, serving as the point of origin for all other communication situations with their specific contextual constraints.
Although the term “spoken language” suggests that only the auditory sensory mo-
dality is affected, the visual sensory modality plays a role via gestures and other body
movements in face-to-face communication. Consequently, two media (biological media
concept) are involved, which fulfill the necessary condition both of multimodality
and of multimediality. Is spoken language then to be classified as multimedial or as mul-
timodal? What could be a reasonable differentiation between the two notions with
respect to our research intentions of developing a multimodal approach to grammar?
When considering characterizations of co-speech gestures (e.g., Kendon 2004; McNeill
1992, 2005), they have to be seen as body movements observed when someone is
speaking. Concerning their timing, they are closely related to the uttered speech with
which they also share semantic and pragmatic functions.
This points to a workable criterion for differentiating multimodality and multimedi-
ality, that is to say, the structural and functional integration into one and the same
matrix code or, alternatively, the manifestation of one code in two different media. If
two linguistic media are structurally and/or functionally integrated into the same
code at the same time, or if, conversely, one code manifests itself simultaneously in
two different media, then we can speak of multimodality. If two or more media are nei-
ther structurally nor functionally integrated into one code and if there is no manifesta-
tion of the same code in different media, then the phenomenon is defined as
multimedial. This differentiation underlies our definitions of linguistic multimediality
as well as linguistic multimodality in the broad and narrow sense. The following table
summarizes the individual terms and their defining features:
Tab. 46.1: The differentiation of multimediality and multimodality

Defining feature                         | Multimodality      | Multimodality     | Multimediality
                                         | (narrow sense)     | (broad sense)     |
More than one medium                     | +                  | +                 | +
More than one code                       | +/–                | +                 | +
More than one sense modality             | +                  | –                 | +/–
Use of a technological medium            | –                  | –                 | +
Code integration or code manifestation   | +                  | +                 | –

What all of the terms listed in Tab. 46.1 share is that multimodality and multimediality
are only present when more than one medium is involved. This is the defining charac-
teristic as opposed to monomodality or monomediality. Multimediality can be distin-
guished from multimodality when there is no structural and/or functional integration
of the same primary code or the same code is not manifested in different media.
Within multimodality we distinguish between the broad and narrow case. Multimodal-
ity in the narrow sense occurs when the media involved in an expression belong to dif-
ferent sense modalities in terms of a biological media concept and are structurally and/
or functionally integrated in the same code or, alternatively, manifest the same code in
terms of a code-based media concept. Multimodality in the broad sense differs from
the narrow sense in that the media involved belong to different codes in terms of a
code-based media concept and the same sense modality in terms of a biological
media concept. For the concepts of code integration and code manifestation, there
are precedents in the speech theories of Pike (code integration) and Hjelmslev
(code manifestation). In our current analysis, we are fundamentally open to both
forms of multimodality.
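Read as a decision procedure, the defining features of Tab. 46.1 can be operationalized in a few lines of code. The sketch below only illustrates the feature logic of the table; the function name and the numeric/boolean encoding of the features are assumptions made here for illustration, not part of the chapter's framework.

```python
# Illustrative only: feature names follow Tab. 46.1, but the function and its
# encoding are assumptions made for this sketch.

def classify(n_media, n_codes, n_sense_modalities,
             technological_medium, code_integration_or_manifestation):
    """Apply the defining features of Tab. 46.1 to a communicative situation."""
    if n_media < 2:
        return "monomodal/monomedial"
    if code_integration_or_manifestation and not technological_medium:
        if n_sense_modalities > 1:
            return "multimodality (narrow sense)"
        if n_codes > 1:
            return "multimodality (broad sense)"
    if (not code_integration_or_manifestation and technological_medium
            and n_codes > 1):
        return "multimediality"
    return "unclassified"

# Face-to-face speech with co-speech gestures: two media, one matrix code,
# two sense modalities, no technological medium, code integration present.
print(classify(2, 1, 2, False, True))  # multimodality (narrow sense)
```

On this encoding, a telephone call pairing speech with, say, transmitted images (two codes, a technological medium, no shared matrix code) comes out as multimediality, matching the rightmost column of the table.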

4. Kinesthemes as syntactic units: Processes of typification and semantization
The widespread assumption that gestures are primarily expressive neglects the observa-
tion that even co-speech gestures, which are considered to be idiosyncratic (e.g.,
McNeill 1992), are based on particular gestural codes. The crucial question from a lin-
guistic point of view is: Are co-speech gestures only part of the concrete utterance
(parole) or can they also be considered to be part of the abstract language system (lan-
gue)? Scholars who argue against abstract linguistic properties of co-speech gestures
emphasize that co-speech gestures lack conventionalization, which is the basis for the
segmentation of linguistic units with a stable form-meaning-relationship. According
to this assumption, only conventionalized types called “morphemes” and “words” can
be combined into higher complex units in syntax.
Some linguists question such rigid concepts of morphemes and assume systematic
processes of typification and semantic loading on a sub-morphemic level of the spoken
46. Towards a unified grammar of gesture and speech: A multimodal approach 741

language. The delimitable, meaningful segments resulting from such processes are
called “phonesthemes” or “sub-morphemic” units (Bolinger [1968] 1975; Firth [1935]
1957; Zelinsky-Wibbelt 1983). They are defined as intersubjective sound/meaning cor-
relations based on diagrammatic iconicity according to Charles S. Peirce (1931–58).
Bolinger (1975) characterizes them as words clustering in groups, for example, the
words ending in -ump: bump, chump, clump, crump, flump, glump, grump, hump. Se-
mantically, most of them suggest “heaviness”. Bolinger’s crucial observation is that of
an “underlying iconic drive to make sound conform to sense” (Bolinger 1975: 218).
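The underlying idea of typification — similar forms clustering with similar meanings — can be sketched computationally. The following Python fragment is an illustration, not Bolinger's procedure: it surfaces candidate phonestheme clusters simply by grouping a word list on a shared final segment and keeping groups above a frequency threshold.

```python
from collections import defaultdict

def phonestheme_candidates(words, suffix_len=3, min_cluster=4):
    """Group words by their final segment; return clusters large enough
    to count as candidate sound/meaning correlations."""
    clusters = defaultdict(list)
    for w in words:
        if len(w) > suffix_len:
            clusters[w[-suffix_len:]].append(w)
    return {seg: ws for seg, ws in clusters.items() if len(ws) >= min_cluster}

# Bolinger's -ump group plus some distractor words.
lexicon = ["bump", "chump", "clump", "crump", "flump", "glump", "grump",
           "hump", "lump", "lamp", "bird", "table"]
```

Run on this lexicon, only the -ump cluster survives the threshold; whether its members in fact share the meaning component “heaviness” is, of course, a semantic question the form-based grouping cannot decide.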
The integration of such concepts like that of the phonestheme into grammars of the
spoken language is hindered in particular by the sharp separation between language use
and language system. This separation is valid for structural linguistics in the tradition of
Saussure as well as for generative linguistics in the Chomskyan tradition. In his book
System und Performanz (Stetter 2005), Christian Stetter makes an interesting proposal
to bridge the gap between language use and language system. In his approach, based on
Nelson Goodman’s work, linguistic types (language system) are understood as sets of
tokens (language use) which – rather than being identical – are only similar to each
other and do not share a common original as basis. This concept allows for intermediate
stages of conventionalization like phonesthemes – and kinesthemes.
Fricke defines kinesthemes analogously to phonesthemes as gestural tokens with in-
tersubjective semantic loading based on diagrammatic iconicity (Fricke 2008, 2010,
2012). Similarities of form correlate with similarities of meaning. Analyses of empirical
examples from route descriptions at Potsdam Square in Berlin show that kinesthemes
can be simple or complex. Complex kinesthemes can be compared to processes of mor-
phological contamination or blending in word formation of spoken languages (e.g.,
smog is a blend of smoke and fog, cf. Zelinsky-Wibbelt 1983) (Fricke 2008, 2012). The
following pointing gestures in figure 46.2 and 46.3 illustrate an analogous process of ges-
ture formation:

Fig. 46.2: Two types of pointing gestures in German: G-Form and PLOH (Fricke 2012: 110)

In German, we can observe two typified forms of pointing gestures: Firstly, the so-called
G-Form with an extended index finger and the palm oriented downwards, secondly,
the palm-lateral-open-hand gesture (PLOH) (Fricke 2007, 2008; see Kendon and Ver-
sante 2003 for Italian gestures). The G-form is semantically loaded with a meaning
which can be paraphrased as “pointing to an object”, whereas the meaning of the
palm-lateral-open-hand gesture is directive (“pointing in a direction”). Fig. 46.3 shows
an example of a gestural contamination that blends both types. It can be paraphrased
as “pointing to an object in a particular direction”.
742 IV. Contemporary approaches

Fig. 46.3: Blending of G-Form and PLOH (Fricke 2012: 113)

This example shows that processes of formal typification and semantic loading on the
verbal and gestural level are both guided by the same principles. Phonesthemes and kin-
esthemes manifest the same general semiotic code of sign formation and complement
other types of meaning construction, for example, metonymies and metaphors (Cienki
2008; Cienki and Müller 2008; Mittelberg 2006, 2008; Müller 2008, 2010).

5. Syntactic structures as code manifestation of the language faculty: Constituency and recursion in co-speech gestures
Some linguists claim that recursion is a fundamental characteristic of human language.
They argue that recursion is shared neither by animals nor by cognitive capacities other
than the language faculty (Hauser, Chomsky, and Fitch 2002). In contrast to this view,
Everett (2005) claims that the Amazonian Pirahã lacks evidence of recursion in its syn-
tax. Other researchers consider recursion as the defining feature not only of human
language, but of human cognition in general (Corballis 2007).
New data offer a chance to break vicious circles of theoretical argumentation. Fricke
(2008, 2012) presents empirical examples from route descriptions in German that give
evidence of recursion in co-speech gestures.
The fundamental questions for a multimodal approach to grammar are: Is it possible
to analyze co-speech gestures independently of speech in terms of constituency? Do
gestural constituent structures display recursion? What would be the implications of
gestural recursion for language theory? The concept of recursion entered linguistics
from mathematics and computer science. However, as Lobina and García-Albea
(2009) pointed out, the adaptation of this notion in linguistics and cognitive science is
not very clear. If recursion – in the context of linguistics – applies to “a constituent
that contains a constituent of the same kind” in vocal language (Pinker and Jackendoff
2005: 203) as well as to the possibility of producing an infinite number of expressions
with finite means, then some gestural structures can be called recursive, for example,
gesture units (GU) in German.
In this section we will differentiate recursion from iteration, make a distinction
between recursion as a process and structure and characterize the relation between re-
cursion and self-embedding. If we look at recursion as separate from iteration then we
see that they are different forms of sequentializing units: with the help of recursion,
structures take on an increasing depth of embedding whereas iteration produces flat
structures with the same level of embedding (Karlsson 2010: 45). While the units
sequentialized through iteration are completely independent of one another, the same
is not true for recursive sequentialization: “Iteration involves repetition of an action or
object, where each repetition is entirely independent of those that come before and
after. Recursion involves the embedding of an action or object inside another of the
same type, each embedding being dependent in some way on the one it is embedded
inside” (Kinsella 2010: 180).
Let us look at the following examples of iteration (1) and recursion (2) (Kinsella
2010: 181):

(1) Iteration: Jack ate [NP1 the sandwiches NP1] and [NP2 the doughnut NP2] and
[NP3 the apple NP3].
(2) Recursion: [NP1 [NP2 [NP3 John’s NP3] mother’s NP2] neighbor NP1] bought the car.

At first glance, both examples appear on the surface to be chains of noun phrases. Look-
ing closer at the structure of each sentence, however, it becomes apparent that example
(1) has a flat structure in which the noun phrases are independent of each other. In
example (2), on the other hand, there exists a dependence between the noun phrases
that determines the relation of modification (Kinsella 2010: 181). This is also the reason
why in example (1) the order of the noun phrases could, in general, be changed whereas
in example (2) this is not the case (Kinsella 2010).
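The structural difference between (1) and (2) can be made explicit with nested data structures. In the following Python sketch (an illustration; the list encoding is an assumption), the iterative chain is a flat list, the recursive possessive a nested one, and a depth function distinguishes them:

```python
# (1) Iteration: three coordinated NPs, each independent of the others.
iteration = ["the sandwiches", "the doughnut", "the apple"]

# (2) Recursion: each possessive NP is embedded inside the next one up.
recursion = ["neighbor", ["mother's", ["John's"]]]

def embedding_depth(node):
    """Maximum nesting depth: 1 for a flat structure, >1 for self-embedding."""
    if not isinstance(node, list):
        return 0
    return 1 + max((embedding_depth(child) for child in node), default=0)
```

The flat structure of (1) yields depth 1 regardless of how many NPs are chained on, whereas each further possessive in (2) adds a level of embedding — exactly Karlsson's (2010) contrast between flat iteration and increasing depth.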
Lobina (2011: 155) and Fitch (2010: 78) advise that it is necessary to maintain a strict
separation between recursive structures and underlying recursive algorithms, by means
of which iterative structures such as that in example (1) above can also be created:

[…] many studies focus on the so-called self-embedded sentences (sentences inside other
sentences, such as I know that I know etc.) as a way to demonstrate the non-finiteness of
language, and given that self-embedding is sometimes used as a synonym for recursive struc-
tures (see infra), too close a connection is usually drawn between the presence of these
syntactic facts and the underlying algorithm of the language faculty. (Lobina 2011: 154–155)

Lobina’s distinction between structure and fundamental process also allows a different
perspective on Everett’s objections to the article by Hauser, Chomsky, and Fitch (2002)
in that the absence of self-embedding as a structure cannot be an argument against ac-
cepting recursion as a fundamental algorithm that, as a key element of the faculty of
language, might be present in all natural languages. He argues:

However, even if there were a language that did not exhibit self-embedding but allowed
for conjunction, you could run the same sort of argument and the non-finiteness conclusion
would still be licensed. These two aspects must be kept separate; one focuses on the sort of
expressions that languages manifest (or not), while the other is a point about the algorithm
that generates all natural language structures. (Lobina 2011: 156)

With regard to the analysis of speech-accompanying gestures, Fricke (2012) shows that
they alone – without reference to vocal utterances – can essentially form arbitrarily long
“flat” chains: On the one hand, gestures share the structural characteristics of iteration
that can be created through a recursive algorithm. On the other hand, however, ges-
tures also share the structural properties of a “deeper” self-embedding in that gestural
constituents can contain other gestural constituents of the same type. Based on
empirical analyses, Fricke proposes (2012: 176) the following phrase structure rules for
co-speech gestures (for more details see volume 2):

GU → { GP Retr
       GU (GU1 … GUn) Retr
       GP (GP1 … GPn) Retr
       (GU1 … GUn) GP (GP1 … GPn) (GUn+1 … GUz) Retr
       (GP1 … GPn) GU (GU1 … GUn) (GPn+1 … GPz) Retr }

GP → (Prep) SP

SP → S (S1 … Sn)

S → (Hold) s (Hold)

The starting point for this system of rules is the gesture unit. A primary gesture unit is
the highest unit of the constituent hierarchy. This fact is reflected by using the category
GU as the starting symbol, comparable to the category sentence (S) in generative gram-
mars of vocal languages. The property of self-embedding is indicated when to the left
and right of the arrow the same category symbol is present. There is, to date, no empir-
ical evidence for levels of embedding deeper than one (primary and secondary gesture
units). The braces show that the vertical listing of alternative symbol chains could each
serve as a “replacement” for gesture units. The individual symbols and the constituents
they represent can either be obligatory or optional. If they are optional, this is shown by
using parentheses.
According to Kendon (2004, 1972) gesture units (GU) are limited by positions of
relaxation and – in contrast to gesture phrases (GP) – obligatorily contain a phase of
retraction (Fricke 2012). A primary gesture unit is the highest constituent in the ges-
tural constituent structure, whereas secondary gesture units are dominated by a primary
gesture unit (Fricke 2012). Gesture units can be simple (GP + Retr) or complex. In prin-
ciple, complex gesture units consist of an arbitrary number of gesture units and/or ges-
ture phrases. Analyses of selected video sequences show that the embedding of
secondary gesture units is indicated by the degree of relaxation and the location of
the respective rest position. Primary gesture units show complete relaxation, whereas
in secondary gesture units the relaxation is only partial. The hierarchy level is indicated
by “gestural cohesion” (McNeill 2005): All coordinated gesture units show the same
degree of relaxation in their retraction and the same location of the rest position (Fricke
2012). Stroke phrases (SP), too, can be either simple or complex. They expand to one or
more strokes (S) that are ordered next to each other. Strokes (S) expand then to an
obligatory stroke nucleus (Kendon 2004: 112) which can be preceded or followed by
an optional hold. Whether it is a so-called pre- or post-stroke hold is not categorically
determined, but rather by its position in the constituent structure. The terminal constit-
uents are the gesture phases (e.g., Bressem and Ladewig 2011; Kendon 1980, 2004;
Kita, Van Gijn, and Van der Hulst 1998) stroke nucleus (s), hold (Hold), preparation
(Prep), and retraction (Retr).
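Because the rules are context-free, they lend themselves to a direct computational rendering. The following Python sketch is an illustration, not Fricke's formalization: the function names and the particular expansion choices are assumptions. It expands the start symbol GU into a string of gesture phases, with the empirically attested limit of one level of self-embedding built in.

```python
# Terminals are the gesture phases: Prep, s (stroke nucleus), Hold, Retr.

def expand_GU(depth=0, complex_unit=False):
    # GU -> GU GU1 Retr: self-embedding (attested only to depth one,
    # i.e. primary and secondary gesture units).
    if complex_unit and depth < 1:
        return expand_GU(depth + 1) + expand_GU(depth + 1) + ["Retr"]
    # GU -> GP Retr: a simple gesture unit.
    return expand_GP() + ["Retr"]

def expand_GP(prep=True):
    # GP -> (Prep) SP: the preparation phase is optional.
    return (["Prep"] if prep else []) + expand_SP()

def expand_SP(n_strokes=1):
    # SP -> S (S1 ... Sn): one or more strokes side by side.
    return [phase for _ in range(n_strokes) for phase in expand_S()]

def expand_S(pre_hold=False, post_hold=False):
    # S -> (Hold) s (Hold): an obligatory stroke nucleus with optional holds.
    return (["Hold"] if pre_hold else []) + ["s"] + (["Hold"] if post_hold else [])
```

A simple gesture unit expands to the phase string Prep–s–Retr; a complex one embeds two secondary gesture units, each with its own (partial) retraction, before the final complete retraction of the primary unit.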
What conclusions can we draw from this for language theory? If we consider the cur-
rent debate about recursion and language complexity started by Hauser, Chomsky, and
Fitch (2002) then the fact that co-speech gestures are recursive carries with it the
following consequences:
If recursion is specific to the language faculty in the narrow sense (FLN), then the
recursion of co-speech gestures forces them to be considered as an integral element of
language. That Hauser, Chomsky, and Fitch do not view the human language faculty as
fundamentally modality-specific, but on the contrary regard the possibility of a change
in modality as one of its determining features, can be seen in the following quote: “[…]
only humans can lose one modality (e.g.,
hearing) and make up for this deficit by communicating with complete competence in a
different modality (e.g., signing)” (Hauser, Chomsky, and Fitch 2002: 1575). With this,
the authors find themselves not far from Hjelmslev’s postulate that the substances do
not in and of themselves define language and that one and the same form can be mani-
fest in different substances (Hjelmslev 1969). From the acceptance of a compensatory
function of gestures by Hauser, Chomsky, and Fitch it is just a small step to accepting
a fundamentally multimodal constitution of language. Should the multimodality of
language be denied, it follows from the recursivity of co-speech gestures that recursion
cannot be unique to the faculty of language in the narrow sense.

6. Syntactic functions as a device for code integration within single languages: Multimodal attribution in German noun phrases

6.1. Code integration: Structural and functional integration in grammar and language use
The assumption that co-speech gestures are integrated on the level of language use is
widely accepted. For example, co-speech gestures are coordinated with prosodic aspects
(e.g., Loehr 2004; McClave 1991) and are assumed to be semantically and pragmatically
co-expressive with the verbal utterance (e.g., Kendon 2004; McNeill 1992, 2005). More-
over, the idea that certain syntactic functions or structures can be instantiated by enti-
ties of different modalities, e.g., visual or auditory, is not new and can be traced back to
linguists and semioticians such as Karl Bühler (2011), Louis Hjelmslev (1969), and
Kenneth Pike (1967). Nevertheless, most theoretical and descriptive studies in linguis-
tics so far are based on vocal speech and its auditory channel alone, whereas gesture
scholars have tended to neglect the dimension of grammar. Therefore, we aim at the
core of grammar and focus on the syntactic relationship between spoken language
and accompanying gestures. Our basic questions are: Can gestures take over grammat-
ical functions in spoken language? Are there points of structural integration into vocal
syntax? And if so, of what kind? The first goal of section 6 is to demonstrate that co-speech
gestures can be structurally integrated as constituents of noun phrases in
German spoken language. The second goal is to show that these syntactically integrated
gestures can function as attributes to the verbal nucleus of noun phrases. The current
debate on the German deictics so ‘like this’ and son ‘such a’ shows that the adverb so
and the article son constitute a point of multimodal integration (e.g., Ehlich 1987; Fricke
2007, 2012; Streeck 2002, 2009; Stukenbrock 2010). According to Hole and Klumpp
(2000), analyses of German noun phrases show that son has to be considered an article
which is governed by the nuclear noun of the noun phrase. As a qualitative deictic,
denoting a quality, son requires a qualitative description, which can be instantiated either
verbally or gesturally. If the qualitative description takes place through an iconic
gesture, it results in a categorial selection of the gestural modes of representation
according to Müller (Fricke 2012).

6.2. Attributes in noun phrases


As the term is commonly defined, an attribute is a word or group of words that qualifies
another word. Attributes in the narrow sense are conceived as syntactically constrained
to expansions of the nuclear noun in noun phrases (Eisenberg 1999). They constitute
the core area of attribution and are covered by all definitions of attribute. The following
example is a typical case of an adjectival attribute:

(3) the circular table

On the syntactic level, the adjectival attribute circular is an expansion of the nuclear
noun and a constituent of the respective noun phrase the circular table. On the semantic
level, attributes are modifications of the noun, which is the nucleus of the noun phrase
(Eisenberg 1999). In this case modification can most easily and simply be understood as
the intersection of sets between the semantic extension of the adjective circular (all cir-
cular entities) and the semantic extension of the noun table (all tables). The resulting set
is a set of tables with the characteristic “being circular”.
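The intersective analysis of modification can be stated directly in terms of sets. The following Python fragment is a minimal sketch (the example extensions are invented for illustration):

```python
# Extensions as sets: [[circular table]] = [[circular]] & [[table]].
circular_things = {"coin", "round_table", "clock_face"}   # [[circular]]
tables = {"round_table", "desk", "kitchen_table"}         # [[table]]

# Modification of the nuclear noun by the adjectival attribute
# amounts to set intersection.
circular_tables = circular_things & tables
```

The resulting set contains exactly the tables with the characteristic “being circular”.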
Now consider the following examples, which are designed on the basis of an empirical
route description. They address the quality of shape of a given office tower, the so-called
“Sony Center” in Berlin, which is built in the shape of a semicircle. The speaker localizes
this tower, which resembles a bisected cylinder, on the right side of her gesture space.
Her mode of modeling evokes the impression of a vertical image with a sense of depth.
In each of the following four examples in figure 46.4 we deal with noun phrases
initiated by a definite article and with the noun tower as its nucleus.
phrase on its verbal level is expanded with the attribute semicircular, which modifies
the nuclear noun semantically, whereas in (4) and (7) there are no attributive expan-
sions on the verbal level. The verbal utterances in the examples (6) and (7) are also ac-
companied by a speaker’s gesture of modeling a semicircular shape.

(4) the tower
(5) the semicircular tower
(6) the semicircular tower (+ gesture)
(7) the tower (+ gesture)

Fig. 46.4: The gesture modeling a semicircular shape in the examples (4) to (7)

What difference occurs between the examples (5) and (7)? Both utterances inform the
addressee about the shape of the object referred to. The difference merely consists of
the fact that in example (5) the speaker refers to the shape of the object exclusively
verbally, while in example (7) this happens solely gesturally. This shows that the
attributive function of modifying the nuclear noun in a noun phrase can also be instan-
tiated solely by gesture. The resulting intersection of semantic extensions is the same
in both cases: A set of towers with the characteristic “being semicircular”. So certain
occurrences of co-speech gestures fall within the scope of Eisenberg’s concept of an
attribute mentioned above.

6.3. Types of multimodal code integration: Substitution, temporal overlap, and cataphoric integration
With regard to noun phrases we can distinguish between two main types of multimodal
code integration: First, positional integration (substitution of a syntactic gap and tem-
poral overlap) and second, cataphoric integration by way of using the German article
son (Fricke 2008, 2012). A further subordinated type is the categorial selection of iconic
gestures with respect to specific modes of representation according to Müller (1998)
(see section 6.4.). Ladewig (2012) has shown that in linearly constructed multimodal
utterances it is primarily co-speech gestures that substitute for syntactic constituents
(positional integration), and not emblematic gestures, as was widely assumed until recently.
The following example (fig. 46.5 and 46.6) focuses on temporal overlap and cataphoric
integration (Fricke 2012: 251).

Fig. 46.5: Modeling a right angle (Stroke 1)

Fig. 46.6: Modeling a right angle (Stroke 2)

The German speaker on the left describes the façade of the Berlin State Library: She
uses the noun phrase sone gelb-goldenen Tafeln ‘such yellow golden tiles’ accompanied
by a gesture modeling a rectangular shape. On the verbal level, the adjective gelb-
golden expands the nuclear noun, modifying it at the same time by reducing its
extension to tiles with a specific characteristic of color. On the gestural level, the rect-
angular shape performed by the hands of the speaker fulfills an analogous function of
modifying the nuclear noun. The resulting intersection of both extensions is a set of tiles
with a specific characteristic of color (yellow golden) and a specific characteristic of
shape (rectangular). This division of labor is a very frequent pattern in multimodal
noun phrases: due to the particular medial capacity of both modes, speakers tend to
use gestures for referring to aspects of shape, whereas the use of verbal adjectives pro-
vides information with respect to color (Fricke 2008, 2012). The noun phrase of this
example shows a temporal overlap between the verbal adjective and the modifying
co-speech gesture.
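This multimodal division of labor can be sketched with the same set-theoretic machinery as before. In the following illustrative Python fragment (the token names and extensions are invented), the verbal and the gestural attribute restrict the noun's extension in exactly the same way, regardless of modality:

```python
def multimodal_np(noun_extension, modifiers):
    """Intersect the noun's extension with each modifier's extension,
    regardless of whether the modifier is realized verbally or gesturally."""
    extension = set(noun_extension)
    for _modality, modifier_extension in modifiers:
        extension &= modifier_extension
    return extension

tiles = {"t1", "t2", "t3", "t4"}                 # [[Tafeln]] 'tiles'
referents = multimodal_np(tiles, [
    ("verbal", {"t1", "t2"}),                    # gelb-golden 'yellow golden'
    ("gestural", {"t2", "t3"}),                  # rectangular shape modeled by the hands
])
```

The result is the set of tiles that are both yellow golden and rectangular — the intersection described above, computed identically for the verbal adjective and the co-speech gesture.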
The crucial question is whether or not co-speech gestures are capable of instantiating
independent syntactic constituents detached from the nuclear noun in a noun phrase.
Co-speech gestures can only be expected to adopt an attributive function within verbal
noun phrases on the syntactic level if this requirement is met. Research into this question
has so far not gone beyond the assumption that co-speech gestures can fill syntactic
gaps in linear verbal constituent structures. Considering temporal overlaps as given in
this example, the following alternative explanation with respect to the relation between
the rectangular gesture and the nuclear noun tiles could be offered: The rectangular
shape metonymically stands for the respective concept TILE which is associated with
the word form tiles (e.g., Lakoff and Johnson 1980; Mittelberg 2006; Mittelberg and
Waugh 2009). This explanation would also be in line with the assumption of a so-called
“lexical affiliate” according to Schegloff (1984).
It is worth observing at this point that in colloquial German the article son ‘such a’
provides a syntactic integration of modifying gestures within verbal noun phrases as re-
quired above. According to Hole and Klumpp (2000), the qualitative deictic son is a
fully grammaticalized article inflecting for case, gender and number, which is simulta-
neously used for definite type reference and indefinite token reference. They give con-
vincing evidence that in German the article son is not just an optional contraction of so
‘such’ and ein ‘a’. They emphasize that “son does not just narrow down the meaning of
the indefinite article, it introduces a whole new dimension, namely, that of a necessary
two-dimensional reference classification” (Hole and Klumpp 2000: 240). Exactly these
two dimensions of reference apply also to multimodal noun phrases with son as article.
In the following examples the speaker informs the addressee about the shape of the
table he wants to buy within the next few days. The underlying pattern is “I want to
buy such a [quality] table”. In contrast to noun phrases with definite articles, the speaker
refers in this case to an indefinite token of a definite type (Hole and Klumpp 2000: 234).
With respect to example (10) this means that the speaker wants to buy a specific kind of
table (definite type) that only looks like the table he is pointing at (indefinite token).

(8) Ich will sonen [runden] Tisch kaufen. (verbal description)
‘I want to buy such a [circular] table.’

(9) Ich will sonen Tisch kaufen. (+ iconic gesture of a circle)
‘I want to buy such a table.’

(10) Ich will sonen Tisch kaufen. (+ pointing to a circular table)
‘I want to buy such a table.’
As we have seen, the German article son shows a very high degree of multimodal
integrability. As a fully grammaticalized article, it is governed by the nuclear noun (first
step); as a qualitative deictic, son cataphorically requires the description of a quality,
which can be instantiated either verbally or gesturally (second step). Both steps will
be explicated and complemented by a third step in the following section with respect
to Seiler’s continuum of determination within complex noun phrases.

6.4. Seiler’s continuum of determination: The German article son as the turning point between specification and characterization
Son as an article with two dimensions of reference (indefinite token of a definite type)
instantiates exactly the turning point in Seiler’s continuum of determination from
“determination of reference to determination of concept” (Seiler 1978: 310) by epito-
mizing both principles. Seiler assumes the following two main rules for the serialization
of determiners in the broad sense (including adjectives, numerals, quantifiers, etc.)
within noun phrases:

(i) Specification (determination of reference): “The range of head nouns for which a
determiner D is potentially applicable increases with the potential distance of
that determiner from the head noun N”. (Seiler 1978: 308)
(ii) Characterization (determination of concept): “Determiners indicate properties im-
plied in the concept represented by the head noun. The degree of naturalness of
such an implication of Dni vs. Dnj decreases proportionally to the distance of
Dni vs. Dnj with regard to the head noun”. (Seiler 1978: 310)

In the following illustration (Fig. 46.7), Seiler’s continuum with both its domains is high-
lighted by two grey-shaded rectangles with a bilaterally oriented arrow between them.
Moving from the left side towards the right, the determination of reference declines and
the determination of concept increases, while moving from right to left determination of
concept declines and determination of reference increases. On the gestural level, the
deictic gesture is attached to the domain of specification and the iconic gesture to the
domain of characterization.
Because son as an article is, according to Eisenberg, governed by the nuclear noun
with respect to its gender (Fig. 46.7), and because in specific contexts the existence of a
gesture, either deictic or iconic, is a precondition for the possibility of using son, son within
a noun phrase instantiates an additional turning point, namely, between linguistic
monomodality and linguistic multimodality. Son is the syntactic integration point on
the level of the linguistic system for gestures accompanying speech in noun phrases.
Gestures structurally integrated to such an extent can also be integrated functionally
as attributes in verbal noun phrases. Thus, because son in the noun phrase requires a
qualitative description, which can be gesturally instantiated as well, it is shown that ico-
nic gestures in noun phrases constitute autonomous syntactic units detached from the
nuclear noun. Furthermore, they can establish syntactic relations with the nuclear noun.
If the gestural qualitative determination takes place through an iconic gesture then
there follows a categorial selection of the gestural modes of representation (Müller
1998) by the article son (Fricke 2012): in noun phrases with the article son, iconic
[Fig. 46.7 diagrams Seiler’s continuum from specification (extension, class/individual, reference) to characterization (intension, properties, concept), with the article son as the turning point of multimodal integration (qualitative determination): son is governed by the nuclear noun (NN) with respect to gender and cataphorically/catadeictically integrates either a deictic gesture, whose demonstratum is an indefinite token of a definite type, or, via categorial selection, an iconic gesture in the modes “the hand models”, “the hand draws”, and (rarely) “the hand acts”, but not “*the hand represents”.]

Fig. 46.7: The article son as turning point and syntactic integration point for co-speech gestures in
noun phrases (Fricke 2012: 228)

gestures primarily occur in the modes of representation “the hand models” and “the
hand draws”, with rare occurrences in the mode “the hand acts”. The mode “the
hand represents”, by contrast, does not appear once in a corpus of instructions for
reaching a destination around Potsdam Square. The nuanced manual depiction of par-
ticular characteristics of objects might be hindered because the whole hand represents
an object in this mode.
The fact that there seems to occur a categorial selection of iconic gestures through son
basically distinguishes a qualitative determination through iconic gestures generated dur-
ing speaking from a qualitative determination through extralinguistic objects as demon-
strata of deictic gestures. These objects are given in a concrete situation and are
interpreted by the speaker and the addressee according to a specific quality by means
of deictic guidance of attention. Taken together, we therefore deal with a syntactical inte-
gration of gestures into noun phrases emerging in three consecutive steps (Fricke 2012:
230): The first step is constituted by the government of son by the nuclear noun of the
noun phrase (dotted arrow), the second step consists of the cataphoric integration of a
gestural qualitative determination required by son (solid arrow). In the case of an iconic
gesture providing the qualitative determination, the third step accomplishes a categorial
selection with respect to the four gestural modes of representation “the hand models”,
“the hand draws”, “the hand acts”, and “the hand represents” (dashed arrow).
7. Conclusion: Why we need a multimodal approach to grammar


“Verbal and nonverbal activity is a unified whole, and theory and methodology should
be organized or created to treat it as such” (Pike 1967: 26). We therefore need a multimodal
approach to grammar that contributes to a description of language in all its structural,
functional, medial, and cognitive particularities. As we have seen, such an approach is
not only feasible in principle but also necessary to do justice to language in general and
to its function as a medium of individual communication. This argument has been
substantiated by the analysis of typification and semantization of gestures as potential
syntactic constituents, by stating the rules of a generative context-free phrase structure
grammar of co-speech gestures that displays recursion and self-embedding, and by the
grammatical analysis of multimodal attribution in German noun phrases. So far, the
notion of multimodality has been constrained to specific types of utterances such as
gesture-speech relations or language-image relations. If we conceive of multimodality
as a global dimension of linguistic and semiotic analysis that is generally applicable
to language and other sign systems, then we have to broaden our perspective by also
including grammars of single languages and the human faculty of language. With respect
to linguistics, this extension of perspective reveals two basic principles: multimodal code
manifestation of the language faculty, and multimodal processes of code integration
within the grammars of single languages on the level of the language system. With regard
to the objectives of a multimodal approach to grammar, we have distinguished between
linguistic multimodality in the narrow and in the broad sense. Multimodality in the
narrow sense occurs when the media involved in an expression belong to different sense
modalities and are structurally or functionally integrated in the same code or, alternatively,
manifest the same code, as in “gesture-speech ensembles” (Kendon 2004). In the broad
sense, the media involved belong to the same sense modality, as in “language-image
ensembles”. Both kinds of multimodal ensembles differ with respect to their specific
potential for establishing and instantiating grammatical structures and functions: in the
terms of Goodman’s Languages of Art (1976), non-linear images are essentially
syntactically and semantically dense, whereas linear co-speech gestures are not. These
findings open up new perspectives for comparative studies that combine research on
multimodality and grammaticalization.
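The kind of recursion and self-embedding licensed by a context-free phrase structure grammar can be sketched in miniature. The following Python illustration uses generic, hypothetical textbook rules (NP → Det N (PP), PP → P NP), not the co-speech gesture grammar developed in this chapter; each pass through the recursive rule embeds a further NP inside the PP.

```python
# Toy context-free phrase structure grammar with a self-embedding NP rule.
# Illustrative only: these rules are a generic example, not the gesture
# grammar discussed above.
RULES = {
    "NP": [["Det", "N"], ["Det", "N", "PP"]],  # second expansion is recursive via PP
    "PP": [["P", "NP"]],
}
LEXICON = {"Det": "the", "N": "box", "P": "on"}

def derive_np(embeddings):
    """Expand NP; each level above 0 chooses the recursive rule NP -> Det N PP."""
    rule = RULES["NP"][1] if embeddings > 0 else RULES["NP"][0]
    words = []
    for symbol in rule:
        if symbol == "PP":
            words.append(LEXICON["P"])           # PP -> P NP re-enters derive_np
            words.extend(derive_np(embeddings - 1))
        else:
            words.append(LEXICON[symbol])
    return words

print(" ".join(derive_np(2)))  # the box on the box on the box
```

Because the category NP recurs inside its own expansion, a finite rule set generates structures of unbounded depth, which is the formal property of recursion and self-embedding appealed to in the conclusion above.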

8. References
Bolinger, Dwight L. 1975. Aspects of Language. New York: Harcourt Brace Jovanovich. First
published [1968].
Bressem, Jana 2012. Repetitions in gesture: Structures, functions, and cognitive aspects. Ph.D. disser-
tation, European University Viadrina, Frankfurt (Oder).
Bressem, Jana and Silva H. Ladewig 2011. Rethinking gesture phases: Articulatory features of ges-
tural movement? Semiotica 184: 53–91.
Bühler, Karl 2011. Theory of Language. The Representational Function of Language. Amsterdam:
Benjamins. First published [1934].
Cienki, Alan 2008. Why study metaphor and gesture? In: Alan Cienki and Cornelia Müller (eds.),
Metaphor and Gesture, 5–24. Amsterdam: Benjamins.
Cienki, Alan and Cornelia Müller (eds.) 2008. Metaphor and Gesture. Amsterdam: Benjamins.
752 IV. Contemporary approaches

Corballis, Michael C. 2007. The uniqueness of human recursive thinking. American Scientist 95:
240–248.
Efron, David 1972. Gesture, Race and Culture. The Hague: Mouton. First published [1941].
Ehlich, Konrad 1987. so – Überlegungen zum Verhältnis sprachlicher Formen und sprachlichen
Handelns, allgemein und an einem widerspenstigen Beispiel. In: Inger Rosengren (ed.),
Sprache und Pragmatik. Lunder Symposium 1986, 279–313. Stockholm: Almqvist and Wiksell.
Eisenberg, Peter 1999. Grundriß der deutschen Grammatik. Volume 2: Der Satz. Stuttgart:
Metzler.
Ekman, Paul and Wallace V. Friesen 1969. The repertoire of nonverbal behavior. Categories, ori-
gins, usage, and coding. Semiotica 1: 49–98.
Everett, Daniel L. 2005. Cultural constraints on grammar and cognition in Pirahã. Another look at
the design features of human language. Current Anthropology 46: 621–646.
Firth, John Rupert 1957. The use and distribution of certain English sounds. In: John Rupert Firth,
Papers in Linguistics 1934–1951, 34–46. London: Oxford University Press. First published
[1935].
Fitch, W. Tecumseh 2010. Three meanings of “recursion”: Key distinctions for biolinguistics. In:
Richard K. Larson, Viviane Déprez and Hiroko Yamakido (eds.), The Evolution of Human
Language, 73–90. Cambridge: Cambridge University Press.
Fricke, Ellen 2007. Origo, Geste und Raum: Lokaldeixis im Deutschen. Berlin: De Gruyter.
Fricke, Ellen 2008. Grundlagen einer multimodalen Grammatik: syntaktische Strukturen und
Funktionen. Habilitation thesis, European University Viadrina, Frankfurt (Oder).
Fricke, Ellen 2010. Phonaestheme, Kinaestheme und multimodale Grammatik. Sprache und
Literatur 41: 69–88.
Fricke, Ellen 2012. Grammatik multimodal: Wie Wörter und Gesten zusammenwirken. Berlin: De
Gruyter.
Fricke, Ellen, Jana Bressem and Cornelia Müller in preparation. Gesture families and gestural
fields. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva Ladewig, David McNeill and
Sedinha Teßendorf (eds.), Body – Language – Communication. An International Handbook
on Multimodality in Human Interaction (HSK 38.2). Berlin and Boston: De Gruyter Mouton.
Goodman, Nelson 1976. Languages of Art. An Approach to a Theory of Symbols. London: Oxford
University Press. First published [1968].
Harrison, Simon 2008. The expression of negation through grammar and gesture. In: Jordan Zla-
tev, Mats Andrén, Marlene Johansson Falck and Carita Lundmark (eds.), Studies in Language
and Cognition, 405–409. Newcastle upon Tyne: Cambridge Scholars Publishing.
Harrison, Simon 2009. Grammar, gesture, and cognition: The case of negation in English. Ph.D.
dissertation, Université Bordeaux 3.
Hauser, Marc D., Noam Chomsky and W. Tecumseh Fitch 2002. The faculty of language: What is
it, who has it, and how did it evolve? Science 298(4): 1569–1579.
Hjelmslev, Louis 1969. Prolegomena to a Theory of Language. Madison: University of Wisconsin
Press. First published [1943].
Hole, Daniel and Gerson Klumpp 2000. Definite type and indefinite token: The article son in col-
loquial German. Linguistische Berichte 182: 231–244.
Karlsson, Fred 2010. Syntactic recursion and iteration. In: Harry van der Hulst (ed.), Recursion
and Human Language, 43–67. Berlin: De Gruyter Mouton.
Kendon, Adam 1972. Some relationships between body motion and speech. An analysis of an
example. In: Aron W. Siegman and Benjamin Pope (eds.), Studies in Dyadic Communication,
177–210. New York: Pergamon Press.
Kendon, Adam 1980. Gesticulation and speech: Two aspects of the process of utterance. In: Mary R.
Key (ed.), The Relationship of Verbal and Nonverbal Communication, 207–227. The Hague: Mouton.
Kendon, Adam 2004. Gesture: Visible Action as Utterance. Cambridge: Cambridge University Press.
Kendon, Adam and Laura Versante 2003. Pointing by hand in “Neapolitan”. In: Sotaro Kita (ed.),
Pointing: Where Language, Culture, and Cognition Meet, 109–137. Mahwah, NJ: Erlbaum.
Kinsella, Anna 2010. Was recursion the key step in the evolution of the human language faculty? In:
Harry van der Hulst (ed.), Recursion and Human Language, 179–191. Berlin: De Gruyter Mouton.
Kita, Sotaro, Ingeborg van Gijn and Harry van der Hulst 1998. Movement phases in signs and co-
speech gestures, and their transcription by human encoders. In: Ipke Wachsmuth and Martin
Fröhlich (eds.), Gesture and Sign Language in Human-Computer Interaction, 23–35. Berlin:
Springer.
Krämer, Sybille 1998. Das Medium als Spur und Apparat. In: Sybille Krämer (ed.), Medien, Com-
puter, Realität: Wirklichkeitsvorstellungen und Neue Medien, 73–94. Frankfurt am Main:
Suhrkamp.
Kress, Gunther 2011. What is mode? In: Carey Jewitt (ed.), The Routledge Handbook of Multimodal
Analysis, 54–67. London: Routledge. First published [2009].
Kress, Gunther and Theo van Leeuwen 2006. Reading Images. The Grammar of Visual Design.
London: Routledge. First published [1996].
Ladewig, Silva H. 2010. Beschreiben, suchen und auffordern. Varianten einer rekurrenten Geste.
Sprache und Literatur 41(1): 89–111.
Ladewig, Silva H. 2011. Putting the cyclic gesture on a cognitive basis. CogniTextes 6. http://
cognitextes.revues.org/406.
Ladewig, Silva H. 2012. Syntactic and semantic integration of gestures into speech: structural, cog-
nitive, and conceptual aspects. Ph.D. dissertation, European University Viadrina, Frankfurt
(Oder).
Lakoff, George and Mark Johnson 1980. Metaphors We Live By. Chicago: University of Chicago Press.
Lobina, David J. 2011. “A running back” and forth: A review of recursion and human language.
Biolinguistics 5(1–2): 151–169. http://www.biolinguistics.eu/index.php/biolinguistics/article/
view/198.
Lobina, David J. and José Eugenio García-Albea 2009. Recursion and cognitive science: Data
structures and mechanisms. In: Niels Taatgen and Hedderik van Rijn (eds.), Proceedings of the
31st Annual Conference of the Cognitive Science Society, 1347–1352. Austin, Texas: Cognitive
Science Society.
Loehr, Daniel P. 2004. Gesture and intonation. Ph.D. thesis, Georgetown University, Washing-
ton, DC.
Lyons, John 1977. Semantics. Volume 2. Cambridge: Cambridge University Press.
McClave, Evelyn Z. 1991. Intonation and gestures. Ph.D. thesis, Georgetown University, Washing-
ton, DC.
McNeill, David 1985. So you think gestures are nonverbal? Psychological Review 92: 350–371.
McNeill, David 1992. Hand and Mind: What Gestures Reveal about Thought. Chicago: University
of Chicago Press.
McNeill, David 2005. Gesture and Thought. Chicago: University of Chicago Press.
Merten, Klaus 1999. Einführung in die Kommunikationswissenschaft. Volume 1/1: Grundlagen der
Kommunikationswissenschaft. Münster, Germany: LIT.
Mittelberg, Irene 2006. Metaphor and metonymy in language and gesture: Discourse evidence for
multimodal models of grammar. Ph.D. dissertation, Cornell University, Ithaca, NY.
Mittelberg, Irene 2008. Peircean semiotics meets conceptual metaphor: Iconic modes in gestural
representations of grammar. In: Alan Cienki and Cornelia Müller (eds.), Metaphor and Ges-
ture, 115–154. Amsterdam: Benjamins.
Mittelberg, Irene and Linda R. Waugh 2009. Metonymy first, metaphor second: A cognitive-
semiotic approach to multimodal figures of speech in co-speech gesture. In: Charles Forceville
and Eduardo Urios-Aparisi (eds.), Multimodal Metaphor, 329–356. Berlin: De Gruyter Mouton.
Müller, Cornelia 1998. Redebegleitende Gesten: Kulturgeschichte – Theorie – Sprachvergleich. Ber-
lin: Arno Spitz.
Müller, Cornelia 2004. Forms and uses of the palm up open hand: A case of a gesture family? In:
Cornelia Müller and Roland Posner (eds.), The Semantics and Pragmatics of Everyday Ges-
tures. Proceedings of the Berlin Conference April 1998, 233–256. Berlin: Weidler.
Müller, Cornelia 2008. Metaphors Dead and Alive, Sleeping and Waking: A Dynamic View.
Chicago: University of Chicago Press.
Müller, Cornelia 2010. Wie Gesten bedeuten. Eine kognitiv-linguistische und sequenzanalytische
Perspektive. Sprache und Literatur 41(1): 37–68.
Müller, Cornelia submitted. How gestures mean. The construal of meaning in gestures with speech.
Peirce, Charles Sanders 1931–58. Collected Papers. Charles Hartshorne and Paul Weiss (eds.)
Volumes 1–6; Arthur W. Burks (ed.) Volumes 7–8. Cambridge, MA: Harvard University Press.
Pike, Kenneth L. 1967. Language in Relation to a Unified Theory of the Structure of Human
Behavior. The Hague: Mouton.
Pinker, Steven and Ray Jackendoff 2005. The faculty of language: What’s special about it? Cog-
nition 95(2): 201–236.
Posner, Roland 1986. Zur Systematik der Beschreibung verbaler und non-verbaler Kommunika-
tion. In: Hans-Georg Bosshardt (ed.), Perspektiven auf Sprache: Interdisziplinäre Beiträge
zum Gedenken an Hans Hörmann, 267–313. Berlin: De Gruyter.
Posner, Roland 2004. Basic tasks of cultural semiotics. In: Gloria Withalm and Josef Wallmanns-
berger (eds.), Signs of Power – Power of Signs. Essays in Honor of Jeff Bernard, 56–89. Vienna:
INST.
Ruesch, Jurgen and Weldon Kees 1972. Nonverbal Communication: Notes on the Visual Perception
of Human Relations. Berkeley: University of California Press. First published [1956].
Sauerland, Uli and Andreas Trotzke 2011. Biolinguistic perspectives on recursion: Introduction to
the special issue. Biolinguistics 5(1–2): 1–9. http://www.biolinguistics.eu/index.php/biolinguistics/
article/view/201/210.
Schegloff, Emanuel A. 1984. On some gestures’ relation to talk. In: J. Maxwell Atkinson and John
Heritage (eds.), Structures of Social Action: Studies in Conversation Analysis, 266–296. Cambridge:
Cambridge University Press.
Scherer, Klaus R. and Harald G. Wallbott 1984. Nonverbale Kommunikation. Forschungsberichte
zum Interaktionsverhalten. Weinheim: Beltz.
Seiler, Hansjakob 1978. Determination: A functional dimension for interlanguage comparison. In:
Hansjakob Seiler (ed.), Language Universals, 301–328. Tübingen: Narr.
Stetter, Christian 2005. System und Performanz. Symboltheoretische Grundlagen von Medienthe-
orie und Sprachwissenschaft. Weilerswist: Velbrück Wissenschaft.
Stöckl, Hartmut 2004. In between modes: Language and image in printed media. In: Eija Ventola,
Cassily Charles and Martin Kaltenbacher (eds.), Perspectives on Multimodality, 9–30. Amster-
dam: Benjamins.
Streeck, Jürgen 2002. Grammars, words, and embodied meanings: On the uses and evolution of so
and like. Journal of Communication 52(3): 581–596.
Streeck, Jürgen 2009. Gesturecraft. The Manu-facture of Meaning. Amsterdam: Benjamins.
Stukenbrock, Anja 2010. Überlegungen zu einem multimodalen Verständnis der gesprochenen
Sprache am Beispiel deiktischer Verwendungsweisen des Ausdrucks so. InLiSt – Interaction
and Linguistic Structures 47. http://www.inlist.uni-bayreuth.de/issues/47/InLiSt47.pdf.
Watzlawick, Paul, Janet Beavin Bavelas and Don D. Jackson 1967. Pragmatics of Human Commu-
nication: A Study of Interactional Patterns, Pathologies, and Paradoxes. New York: Norton.
Wundt, Wilhelm 1904. Völkerpsychologie. Eine Untersuchung der Entwicklungsgesetze von
Sprache, Mythus und Sitte. Volume 1: Die Sprache. Leipzig: Engelmann. First published [1900].
Wundt, Wilhelm 1973. The Language of Gestures. The Hague: Mouton. First published [1900].
Zelinsky-Wibbelt, Cornelia 1983. Die semantische Belastung von submorphematischen Einheiten
im Englischen: Eine empirisch-strukturelle Untersuchung. Frankfurt am Main: Peter Lang.

Ellen Fricke, Freiburg i. Br. and Chemnitz (Germany)


47. The exbodied mind: Cognitive-semiotic principles as motivating forces in gesture
1. Gesture and the exbodied mind
2. Theoretical points of departure: Peirce, Jakobson, and cognitive semantics
3. Motivation in gesture
4. Abstraction, metonymy, and semiotic grounding
5. Jakobsonian contiguity relations and metonymic modes in gesture
6. On the semiotic reality of image schemata and force gestalts in gesture
7. Concluding remarks
8. References

1. Gesture and the exbodied mind


Emerging from the human body, gestures have been shown to provide valuable insights
into the physical grounding and socio-cultural situatedness of cognition and language
use. Since they tend to be produced rather unconsciously, gestures may particularly
reveal less-monitored aspects of cognitive and emotional processes during communica-
tion (e.g., Cienki 1998a; Müller 1998; Sweetser 1998). Given the central place of the em-
bodied mind in cognitive linguistics and other experientialist theories that account for
multimodal and sensorimotor dimensions of concepts, mental imagery, language, and
meaningful experience more broadly, exploring the bodily basis of perceptive, imagina-
tive, and communicative processes seems to be a pertinent endeavor (e.g., Gibbs 2006;
Johnson 1987, 2005; Taub 2001; Turner 2006). The assumption that our higher cognitive
and linguistic capabilities are shaped by the architecture of our bodies and the way we
interact with the world around us has challenged more traditional theories, especially
Cartesian approaches, which view intelligence as predominantly abstract and disembo-
died. According to the theory of the embodied mind (e.g., Gibbs 1994; Johnson 1987,
2007; Lakoff and Johnson 1999; Sweetser 1990; Varela, Thompson and Rosch 1992),
“meaning and value are grounded in the nature of our bodies and brains, and in our
physical, social and cultural environments” (Johnson 1992: 346). Particularly relevant
for gesture is the insight that mental “imagery is accompanied by sensorimotor sensations,
or whole ‘body loops’ (Damasio, 1994), which give imagistic experience its rich
phenomenological quality” (Gibbs 2006: 138). Complementing cognitivist theories,
Bourdieu’s (1990) notion of the habitus unites historically formed behaviors with a
deep enculturation of the body, reflecting and engendering schemata of perception,
thought and symbolic action.
Building on the premises of the embodied mind, and looking at it from the inside
out, the present approach to gesture centers on how cognitive-semiotic principles –
such as iconicity, indexicality, metaphor, metonymy, and image schemata – interact in
motivating and structuring multimodal messages and performances. While acknowled-
ging how internalized patterns of action and cognition arise, the focus is on how they
might drive processes of meaningful expression, and visibly manifest themselves in
communicative bodily movements integrated with spoken discourse. Another concern
is how the specific material properties of each medium may determine the cross-modal
distribution of semantic features and pragmatic functions (Mittelberg 2006). Studying
bodily semiotics from the angle proposed here aims to shed light on the “ex-bodiment”
(Mittelberg 2008: 148; 2010b: 376) of mental imagery, internalized conceptual struc-
tures, action patterns, and felt qualities of experience, whereby the human body func-
tions, as in processes of embodiment, as the living medium through which such
bidirectional mechanisms of abstraction and concretization are shaped.
Exbodiment entails the motivated semiotic structure inherent to communicative ges-
tures made with the hands and arms, as well as to expressive postures and full body
movements. Action routines and embodied image schemata are assumed to drive the
body’s intuitive expressions, as well as more consciously produced descriptions and
(re-)enactments of observed, lived or imagined experience, including physical and
social forces. This view also accounts for gestural signs exhibiting dimensions that
point beyond perceivable bodily semiotics by metonymically alluding to imaginary
physical objects, virtual movement traces or spatial projections that appear to be con-
tiguous to the gesturing hand(s). Gestural manifestations of basic geometric patterns,
image-schematic structures and metaphorical understandings of abstract ideas or pro-
cesses may be observed to emerge when speakers seem to outline or manipulate virtual
physical objects, relations or structures, while talking about emotions, inner mental
states or abstract knowledge domains (e.g., Cienki and Müller 2008; Mittelberg 2010b;
Müller 2008; Núñez 2008). Gestures are thus a means to express, reify and show to inter-
locutors both imagined and sensed dimensions of mental imagery. They may lend a per-
ceptible gestalt to concepts, ideas and memories, if only for a moment and if only in the
form of furtively drawn, invisible lines or demarcated chunks of space. Primary meta-
phors (Grady 1997), in particular, have been shown to emerge in the gestural modality,
even if the accompanying speech is non-figurative (Mittelberg 2006, 2008). Hence, one
of the guiding questions is how cognitively entrenched patterns of experience – arising
from visual perception, navigation through space, tactile exploration, and other practices
of bodily interaction with the sensorial and social world – may motivate gestural sign
formation and interpretation, as well as structure the (interactive) use of gesture space.
The semiotic perspective taken here further places a focus on gesture interpretation,
in which one’s own habitual movements and actions as well as one’s personal semiotic
history may guide the understanding of multimodal semiotic acts performed by others.
As we may express “felt qualities of our experience, understanding, and thought”
(Johnson 2005: 31) through our own gestures and body postures, observing others
doing so enables us to see and feel what they are trying to convey (Mittelberg
2010a). Johnson (2007: 162) further points out that understanding other people’s actions
involves mental simulation of physical actions: “This deep and pre-reflective level of
engagement with others reveals our most profound bodily understanding of other peo-
ple, and it shows our intercorporeal social connectedness”. The recent discovery of mirror
neurons indicates that the sensorimotor areas that are active in the brain when a
person performs a goal-directed action are also triggered when that person observes
someone else perform the same action (e.g., Rizzolatti and Craighero 2004). These
observations are also crucial with regard to gestures and their role in language
understanding (Skipper et al. 2009), as well as with respect to the human capacity to
assume and multimodally express multiple viewpoints on a given experience (Sweetser 2012: 12–16).
A gesture is – at least in many cases – a gesture, because the hands do not manipulate
a physical object or structure, but rather pretend to do so. This letting go of the material
world in imitative gestures turns a transitive manual action with an object or a tool in
hand into a more abstract communicative hand movement, from which objects or tools
may still be inferred. Indeed, gestures may reflect the speakers’ embodied knowledge of
the material world and its affordances in various ways. Gesturing hands may also seek
and establish contact with the physical environment and with what Hutchins (1995)
calls “material anchors” in cross-modally orchestrated processes of manufacturing
meaning (Streeck 2009; see also Enfield 2009; Goodwin 2007; LeBaron and Streeck
2000; Williams 2008). Such grounded activities of multimodal cognition and communi-
cation also fall into the scope of the exbodied mind, understood as always being indexi-
cally anchored in the concreteness of the human body, its physical habitat, and the
intersubjective dynamics of communication (e.g., Zlatev et al. 2008).
In the present framework, these phenomena are described with the help of the fundamental
semiotic modes of similarity and contiguity, as well as their subtypes, which provide
means to capture both fine distinctions and transient cases regarding the gestural forms
and functions of interest here. It will be argued that while the perceived similarity
between the bodily actions and gestures we observe in others and our own perceptual
experiences and physical routines may determine how we cognitively and physically
align with our interlocutors, similarity is only one pathway to understanding the inten-
tions and semantics of communicative behavior (for mimicry in gesture see, e.g., Holler
and Wilkin 2011). As will be discussed in detail below, contiguity relations between
the gesturing body and the material and social world also play a central role in sensing
and interpreting the meaning of coverbal gestures. In addition, contact, adjacency and
impact are different kinds of contiguity relations between entities in the physical world
or in semiotic structure that may be established, highlighted, or deleted in gestural sign
formation and interpretation.
For a brief preview of the kinds of distinct semiotic processes that will be discussed
in detail throughout this article, let us look at two gestural examples (taken from
Mittelberg 2010b). Both persons shown below are US-American linguistics professors
lecturing about grammatical categories. In the sequence where the gesture shown in
Fig. 47.1 occurs, the speaker introduces the notion of semantic roles: To account for
this… we use names of semantic roles that bounce around in linguistics… agent, patient,
recipient, goal, experiencer… those are semantic roles. On the mention of recipient she
produces a palm-up open hand with slightly bent fingers held near her body at hip
level. In this multimodal performance unit, the uttered word recipient refers to a specific
semantic role, i.e. an abstract grammatical function, which the teacher literally personi-
fies with her entire body: in that very moment, she is a recipient. Given this particular
combination of body posture, arm configuration and hand shape, we can see a similarity
relation between this unified corporeal image and a person holding something. It is left
unspecified whether her open hand is holding an imaginary object already received, or
whether it merely signals readiness to receive something. However, in her speech she
does not refer to any possible object, but solely to the role she is assuming. This bimodal
portrayal of an abstract function is afforded through a) iconicity between the semiotic
structure inherent to her body posture plus gesture and the mundane action of holding/
receiving something; b) a latent contiguity relation between the open hand and a potential
object; and c) metaphor (i.e. personification).
Fig. 47.1: Personification of the semantic role recipient

Fig. 47.2: Emergent noun on open palm

A closer look at the pragmatic functions of the two seemingly similar gestures above
reveals that their similarity mainly resides in the form features of the gestural articula-
tors: they are variants of the palm-up open hand (Müller 2004). Without considering the
speech content it would be impossible to establish which of the semiotic modes mixing
in each of these multimodal explanations predominantly contribute to their meaning.
Explaining the framework of emergent grammar, the speaker shown in Fig. 47.2 points
out that a priori … you cannot define a noun from a verb. When saying the word noun,
this palm-up open hand gesture, slightly extended toward the student audience, consti-
tutes a perceivable surface, i.e. a material support structure, on which the speaker
seems to present the abstract category noun reified as an imaginary tangible object.
In cognitive semantic terms, this gesture seems to manifest the image schemata SUP-
PORT, SURFACE, and OBJECT (Johnson 1987; Mandler 1996), as well as the primary met-
aphor IDEAS ARE OBJECTS or CATEGORIES ARE CONTAINERS (Lakoff and Johnson 1980).
The point here is that iconicity and metaphor do not suffice to account for the partic-
ular form and function of this hand configuration; there is no iconic relationship
between the shape of the manual articulator and a grammatical category. Rather,
an imputed contiguity relation between the open palm and the imaginary noun be-
comes significant: the simultaneously uttered word noun draws attention away from
the action to the imaginary entity which needs to be metonymically inferred from the
open hand.
In the sections below, the different theoretical strands that build the conceptual
foundation of the present approach will first be sketched. After discussing processes of
motivation and abstraction in gestural sign formation, a set of cognitive-semiotic principles
will be defined and illustrated with gestural examples. Special attention will be paid to
metonymic modes and image-schematic structures that seem to feed into both literal
and metaphoric meaning construction in multimodal discourse. The overall goal of
this article is to show some of the ways in which gestures might attest to the semiotic
reality of embodied conceptual schemata and action patterns, by invoking aspects of
their visuo-spatial, material and multisensory origins.

2. Theoretical points of departure: Peirce, Jakobson, and cognitive semantics
The cognitive-semiotic approach to gestural sign motivation and interpretation laid out
in this article combines contemporary embodied views of language and cognition (see
above) with traditional semiotic theories. The framework was originally developed to
account for multilayered processes of gestural sign formation and the emergence of
meaning in multimodal academic discourse about grammar and linguistic theory (Mit-
telberg 2006, 2008, 2010a; Mittelberg and Waugh 2009). Taking Peirce’s (1955, 1960)
triad of similarity (iconicity), contiguity (indexicality) and conventionality/habit (symbo-
licity), as well as the three subtypes of iconicity (image/diagram/metaphor) as a starting
point, this work further builds on Jakobson’s (1956, 1960) balanced theory of metaphor
and metonymy as two major modes of association and signification. Jakobson’s work
here functions as a juncture between Peirce’s typology and more recent work on cog-
nitive iconicity (S. Wilcox 2004) and contiguity (Dirven and Pörings 2002; Peirsman
and Geeraerts 2006), frame metonymy (Dancygier and Sweetser 2005; Fillmore
1982), as well as reference-point phenomena (Langacker 1993), and pragmatic inferen-
cing (Panther and Thornburg 2004). Embodied image schemata and force dynamics
also fulfill a crucial role in that they have a mediating function between physical expe-
rience and abstract thought, as well as between literal and metaphorical aspects of
meaning (e.g., Hampe 2005; Johnson 1987; Talmy 1988).
One of the underlying assumptions is that cognitive semantics and the older but still
relevant semiotic frameworks share central tenets regarding patterns of experience,
expression and interpretation (Danaher 1998; Hiraga 2005; Mittelberg 2008). There
seems to be agreement that cognition is inherently multimodal, comprising perceptual,
sensorimotor, tactile, image-schematic, and interactional dimensions (e.g., Gallese and
Lakoff 2005; Johnson 2007; Krois et al. 2007). Moreover, meaning is not assumed to
reside in the material form a sign takes (e.g., a word or a gesture), but to arise in the
dynamic gestalt of a mental representation or some other kind of cognitive and/or phys-
ical response to a perceived sound, word, image, or human behavior.
Adopting a wider semiotic perspective allows us to account for both a highly sym-
bolic sign system, such as language, and visuo-spatial modalities, such as coverbal ges-
ture and body posture. According to Peirce’s pragmatist doctrine of signs, cognition and
semiosis go hand in hand: “we think only in signs” (Peirce 1960: 169). Peirce’s dynamic
triadic model of the sign process has informed gesture research done from several per-
spectives, including psycholinguistics (e.g., McNeill 1992), linguistics (e.g., Andrén 2010;
Fricke 2007) and anthropology (e.g., Enfield 2009; Haviland 2000). Peirce’s widely cited
definition of the sign is also provided here, particularly to recall the terms Representa-
men (the material form the sign takes) and the interpretant, i.e., the response/association
the Representamen evokes in the mind of the sign receiver. Hence, without an
interpreting mind there is no sign, that is, no semiosis and no meaning.

A sign [in the form of a representamen] is something which stands to somebody for some-
thing in some respect or capacity. It addresses somebody, that is, creates in the mind of that
person an equivalent sign, or perhaps a more developed sign. That sign which it creates I
call the interpretant of the first sign. The sign stands for something, its object. (Peirce
1960: 135, § 2.228; italics in the original)

A strong point of this model is that it includes the receiver as a participant actively in-
volved in making meaning of the signs she or he perceives. So the notion of interpretant
is central for several reasons: it marks the moment when meaning emerges in interpretative
processes (some of which might be propelled by metaphoric associations, for
example, and others by metonymic ones); it accounts for different minds with different
semiotic experiences and habits; and it exhibits a potential for augmentation regarding
different degrees of semiotic density and ways to link up the intended object with
semantic structure in the conceptual system (Mittelberg 2006: 43).
While responding to the need to categorize gestures for the purpose of analysis,
many scholars have come to realize that working with categories, even if seen as not
absolute, hardly does justice to the polysemous and multifunctional nature of gestural
forms (cf. Müller 1998). McNeill, for instance, moved away from his original taxonomy
(i.e. iconics, deictics, metaphorics, beats, and cohesives; McNeill 1992) in favor of
dimensions such as iconicity and metaphoricity (McNeill 2005). In light of the noted
multifunctionality of gestural signs, the present approach advocates, in alignment
with Peirce (1960) and Jakobson (1987), a hierarchical view, asserting that among the
different semiotic modes that may mix and interact in a given gestural sign, one needs
to establish, in conjunction with the concurrent speech and other contextual factors,
which one(s) actually determine(s) its specific form and local function.
Before laying out in detail the workings of the cognitive-semiotic principles of cen-
tral interest in this article, a few words need to be said about the corpus from which the
examples discussed below are taken, as well as about the empirical methods employed.
The corpus consists of naturalistic academic discourse and coverbal gestures produced
by four US-American linguists while teaching introductory linguistics courses. On the
basis of twenty-four hours of multimodal discourse, those segments were selected in
which referential gestures (cf. Müller 1998: 110–113) portray linguistic units of different
degrees of complexity, grammatical categories, and syntactic structures and operations.
Transcriptions included the speech of each segment, the course of each gestural move-
ment excursion according to its phases (Kendon 2004: 111), and the exact speech-
gesture synchrony. To record the kinesic features of the gestures, the most widely
used coding parameters were applied: hand presence, hand dominance, hand shape,
palm orientation, movement manner and trajectory, and the location in gesture
space. Opting for a data-driven typology of manual signs, the corpus was searched
for prominent hand shapes and movement patterns recurring across speakers and con-
texts. A set of schematic images of objects, actions and relations emerging from the data
47. The exbodied mind: Cognitive-semiotic principles as motivating forces in gesture 761

then provided the basis for the analysis of cross-modal processes of meaning construc-
tion. For each gesture unit, the information conveyed in the concurrent speech seg-
ments was considered to determine the interaction of the different iconic, metonymic
and metaphoric modes (for more details regarding the methods, see Mittelberg 2007,
2008, 2010a).
Since the relationship between iconicity and metaphor in gesture has already re-
ceived ample attention (e.g., Cienki and Müller 2008), this article focuses on the inter-
action between iconicity and indexicality on the one hand and between metonymic
and metaphoric modes on the other. Assigning considerable weight to metonymic
principles and the role they play in processes of perception, abstraction and inferencing,
the approach presented here aims to offer insights into the motivated parthood of
communicative movements of the human body.

3. Motivation in gesture
From the perspective on gesture taken here the issue of motivation is central. Taking
the gestural material as a starting point, the intention is to establish how the different
modalities share the semiotic work of creating form and meaning. The task is to identify
the forces that might have motivated the form features and pragmatic functions of a
given gesture or sequence of gestures. One complicating factor in gesture analysis is
the fact that the semiotic material we are looking at consists not only of observable
physical components – such as body posture, bodily motion as well as configurations,
movements and locations of hands and arms – but also of immaterial dimensions
such as virtual movement traces left in the air or imagined surfaces, objects or points
in space. Compared to static visuo-spatial modalities such as drawings or sculptures,
gestures typically evoke persons, objects, actions, places or relations in a rather fluid
and ephemeral way. As the articulators in the speaker’s mouth constantly form new
configurations to produce speech sounds, hands may also constantly change their articu-
latory shape as well as the manner and trajectory of movement when in gestural motion
(Bouvet 2001; Bressem this volume). What a potentially polysemous gestural form
stands for can only be determined by considering the simultaneously produced utter-
ances. Speech and its accompanying gestures have been shown to assume specific semi-
otic roles in processes of utterance formation (Kendon 2000: 53). Being “motor signs”
(Jakobson 1987: 474), gestures are prone to depict, or actually constitute, spatial and
dynamic dimensions of what the speaker is talking about, thus grounding information
(partly) conveyed through speech in a visuo-spatial context sharable by interlocutors
(e.g., Müller 2008; Sweetser 1998, 2007). Gestures may further regulate social interac-
tion, either explicitly or in the form of a sort of subtext unfolding in parallel to the
ongoing conversation (e.g., Bavelas et al. 1992; Müller 1998).
Within the field of linguistics, the arbitrary versus motivated nature of human lan-
guage has been a matter of great debate (Jakobson and Waugh 1979; Saussure 1986).
Drawing on Peirce’s notions of image iconicity and diagrammatic iconicity, Jakobson
(1966) not only demonstrated that iconicity is a constitutive factor at all levels of lin-
guistic structure (phonology, morphology, syntax and the lexicon; Waugh 1976, 1994),
he also devised different kinds of contiguity relations in language and other sign sys-
tems. As we will see below, Jakobson’s distinction between inner and outer contiguity
takes center stage in the present framework and serves as the basis for different
762 IV. Contemporary approaches

types of metonymy (Jakobson and Pomorska 1983). Iconicity and metonymy have also
been ascribed a constitutive role in the formation of signs in American Sign Language
(ASL) (Mandel 1977; P. Wilcox 2004; S. Wilcox 2004). Investigating the relationship of
iconicity and metaphor in American Sign Language, Taub (2001) suggested a set of
principles including image selection (based on similarity or contiguity), schematization
(through abstraction), and encoding (through conventionalization). Compared to highly
symbolic sign systems such as spoken and signed languages, spontaneous gestures do
not show the same degree of formalized conventionalization and grammaticalization.
Hence, some of the most interesting questions that arise here concern the ways in
which gestures exploit and create similarity and contiguity relations differently than
language and how these modes feed into processes of conventionalization.
In his observations on “the body as expression”, Merleau-Ponty (1962: 216)
succinctly states:

It is through my body that I understand other people, just as it is through my body that
I perceive “things”. The meaning of a gesture thus “understood” is not behind it, it is
intermingled with the structure of the world outlined by the gesture.

This kind of “structure of the world” (Merleau-Ponty 1962: 216) as profiled in a gesture
can be assumed to comprise different kinds of structure: physical, semiotic, and/or con-
ceptual. It may reflect the spatial structures and physical entities humans routinely per-
ceive and interact with in their daily lives and professional practices (Goodwin 2007;
Streeck 2009). When asking someone for a small bowl, for instance, we can iconically
illustrate the desired object by evoking the shape of a round container through holding
two cupped open hands closely together with palms facing upward. Such a hand config-
uration not only expresses the idea of a bowl (as the word “bowl” does), but it actually
constitutes for a moment a container of that sort. In linguistics courses, gestures are one
of many visual semiotic resources used to explain abstract categories and functions. For
example, teachers have been found to trace triangle-shaped figures in the air, thus imi-
tating conventional tree diagrams used in textbooks and on blackboards to visualize
hierarchic sentence structure (Mittelberg 2008). In view of gestures that lend a tangible
form to abstract ideas and structures, the question of motivation becomes more com-
plex and leads us into the realm of figurative thought and expression. We then need
to ask in what ways gestures portraying abstracta, beliefs, mental or emotional states
might be shaped by construal operations such as metaphors, metonymies, and framing
(e.g., Cienki 1998a; Cienki and Müller 2008; Evola 2010; Gibbs 1994, 2006; Harrison
2009; Mittelberg 2010a; Müller 1998, 2008; Sweetser 1998, 2012).
Indeed, as Merleau-Ponty pointed out, getting at the meaning of a gesture does not
seem to be simply a matter of reference. Gestures may in the very moment of expres-
sion actually “be” what they are taken to “be about”, and create new semiotic material
from scratch which then can take on a life of its own in the ongoing discourse and
may as such serve as a reference structure for subsequent multimodal explanations
(e.g., Fricke 2007; McNeill 2005). Gesture researchers have come to differentiate ges-
tures that carefully depict an existing and experienced place, object, or event, such as
a certain tool one has used or an animated cartoon one has watched, from those ges-
tures that seem to reflect a concept, a conceptual image/schema, or a vague idea (e.g.,
Andrén 2010; Cienki and Mittelberg 2013; McNeill 1992; Mittelberg 2006; Müller
1998). In his elaborations of the “thinking hand”, Streeck (2009: 151) distinguishes
between two gestural modes: depicting (e.g., via an iconic gesture portraying a physical
object) and ceiving (i.e. via a gesture conceptualizing a thematic object). He attributes
the latter mode to a more self-absorbed way of finding a gestural image for an emerging
idea. Fricke (2007) speaks of interpretant gestures which, in multimodal instruction giving,
may reflect a general concept of an architectural structure, such as a gate, instead of de-
scribing the idiosyncratic shape of a particular passageway that does not represent a pro-
totypical member of the category.
Additional subjective factors influencing which elements of a given object or sce-
nario are attended to, and how the locally salient features are encoded cross-modally,
pertain to the viewpoint the sign producer adopts, e.g. character or observer viewpoint
(McNeill 1992). As a substantial body of recent research has shown, viewpoint is a uni-
versal and extremely flexible construal operation shaping expressions across modalities
in spoken and signed languages (e.g., Dancygier and Sweetser 2012). As Sweetser (2012:
1) points out, “cognitive perspective starts with bodily viewpoint within a real physical
Ground of experience”. Being existentially tied to the speaker’s body, as well as to its
material and socio-cultural context, gestures, whether they be predominantly iconic or
deictic, are inherently indexical (Mittelberg 2008). While gestural and corporeal signs
tend to reflect aspects of the gesturer’s own bodily disposition and stance, they may,
at the same time, reflect multiple, shifting viewpoints on a given scene, thus also
embodying the perspective of others (Sweetser 2012).
What we can draw from these observations for the concept of the exbodied mind is
that motivated gestural sign constitution reflects – besides the primordial role of move-
ment, space and material culture – the workings of the speakers’ cognitive filter, result-
ing in, for instance, viewpointed conceptual images and structures. In this way, they may
attest to the psychological and semiotic reality of dynamic multimodal processes of
conceptualization (e.g., Cienki 1998b; Ladewig 2011; Mittelberg 2008; Müller 2008;
Núñez 2008).

4. Abstraction, metonymy, and semiotic grounding


Like any kind of sign, gestures are always partial representations of something else;
hence, they tend to be metonymic in one way or another. And, like most processes of
perception and expression, gestural sign formation implies abstraction (e.g., Bouvet 2001;
Mittelberg 2006; Müller 1998):

Actually, the portrayal of an object by gesture rarely involves more than some one isolated
quality or dimension, the large or small size of the thing, the hourglass shape of a woman,
the sharpness or indefiniteness of an outline. By the very nature of the medium of gesture,
the representation is highly abstract. What matters for our purpose is how common, how
satisfying and useful this sort of visual description is nevertheless. In fact, it is useful not in
spite of its spareness but because of it. (Arnheim 1969: 117)

Before teasing apart the distinct ways in which this useful metonymic “spareness” (Arn-
heim 1969: 117) of gestures may be brought about, let us first look more closely at what
they might be metonymic of by considering the notion of the Object in Peirce’s triadic
sign model (henceforth, the elements of the Peircean sign model will be capitalized).
We will then narrow in on his concept of the Ground, as principles of abstraction and
partial representation are implemented already at this very basic level of the semiotic
process.
Peirce’s understanding of what a semiotic Object can be is extremely wide and
ranges from existing to non-existing things: it encompasses both concrete and abstract
entities, including possibilities, goals, qualities, feelings, relations, concepts, mental
states, and ideas (e.g., Kockelman 2005). Anything can be an Object, as long as it is re-
presented by a sign (Shapiro 1983: 25). From a cognitive semantics perspective, the non-
physical Objects listed above remind us of common target domains of conceptual
metaphors. As a large body of research on multimodal metaphor has shown, metaphor-
ical understandings can be expressed in either speech or gesture, or simultaneously in
both modalities (e.g., Müller and Cienki 2009). Moreover, the nature and properties
of the Object determine, according to Peirce, the sign. So in the case of a gesturally ex-
pressed metaphor, the Object of the gesture can be said to be the source domain of
the underlying mapping. In this case, the “structure of the world”, to come back to
Merleau-Ponty’s (1962: 216) observations quoted above, that might determine the
form of a metaphoric gesture, would be conceptual structure underlying a metaphoric
projection.
We are now in a position to bring Peirce’s concept of the Ground of a sign carrier, i.e.
of the Representamen, into the picture, which accounts for the fact that sign vehicles do
not represent Objects with respect to all of their properties, but only with regard to
some salient or locally relevant qualities. These foregrounded features function as the
Ground of the Representamen. In Peirce’s own words (1960: 135, § 2.228; italics in
the original):

The sign stands for something, its object. It stands for that object, not in all respects, but in
reference to some sort of idea, which I sometimes called the ground of the representamen.
“Idea” is here to be understood in a sort of Platonic sense, very familiar in everyday talk;
I mean in that sense in which we say that one man catches another man’s idea.

The Ground can thus be understood as a metonymically profiled quality of an Object
that is portrayed by a Representamen; as such, the Ground puts the Representamen
(e.g., two cupped open hands held together) into relation with an Object (e.g., a small
bowl) for an interpreting mind. In light of the supposed partiality of perception
and depiction, viewpoint appears to be a decisive factor here as well. As has been pointed
out above, whether a sign producer adopts, for instance, character or observer view-
point will influence which aspects of the Object may be abstracted and feed into the
Ground of the Representamen. The multimodal and multidimensional sign processes
of interest here can be devised based on basic semiotic grounding mechanisms (Sones-
son 2007: 40), which are introduced next and will inform the gesture analyses provided
in the ensuing sections.
A sign with a strongly iconic Ground, for instance, is a partial, i.e. metonymic, ren-
dition of what it represents based on a perceived or construed similarity. Similarity, as is
well known, may be perceived in images per se, but also in two other subtypes of what
Peirce called hypoicons: diagrams (i.e. icons of relations) and metaphors (implying a
parallelism; Peirce 1960: 135). In Peirce’s own words (1960: 157; § 2.276), “[i]cons
have qualities which “resemble” those of the objects they represent, and they excite
analogous sensations in the mind”. While the term icon might suggest a visual bias, it
encompasses a multimodal understanding of iconicity and hence also includes those
“sensations” that cause something to feel, taste, look, smell, move, or sound like some-
thing else. This view again corresponds well with the multisensory basis of embodied
image schemas and metaphors assumed in cognitive semantics (cf. section 6 for details).
It also encompasses representational gestures that are motivated by mental imagery
and conceptual structures, that is, by a mental Object: the cupped hands mentioned
above evoking the kind of bowl one is looking for (image icon); hands tracing the rela-
tions between concepts making up a theoretical framework (diagram); or an open
cupped hand that represents an abstract category in the form of a small container (met-
aphor; cf. Fig. 47.5 below and Mittelberg 2008). In addition, iconic gestures may take
shape in different ways. Inspired by the tools, media and mimetic techniques visual ar-
tists deploy, Müller (1998) introduced four modes of representation in gesture: drawing
(e.g., tracing the outlines of a picture frame), molding (e.g., sculpting the form of a
crown); acting (e.g., pretending to open a window), and representing (e.g., a flat open
hand stands for a piece of paper). If one applied all four modes to the same Object,
each resulting gestural Representamen would establish a different kind of iconic
Ground and hence highlight different features of the Object. This means that each
portrayal would convey a different “idea” of the Object (Peirce 1960: 135; § 2.228).
Pointing gestures are signs with a highly indexical Ground. In indexical signs, the
relation between sign and Object is based on contiguity, that is, on a factual (physical
or causal) connection between the two. According to Peirce (1960: 143; § 2.248), “[a]n
Index is a sign which refers to the Object that it denotes by virtue of being really
affected by that object”. Indeed, the spatial orientation of highly context-sensitive
pointing gestures depends on the location of the Object they are directed toward,
and through the act of pointing the Object is established via a visual vector (cf. Fricke
2007; Haviland 2000; Kita 2003). Another, though not as strongly indexical, example is
the palm-up open hand gesture discussed above (cf. Fig. 47.2), since there is an indexical
relation between the open palm and the category noun it seems to be presenting to the
audience. Here we can see an interesting interaction between the speech content and
the gestural grounding mechanism: not the act of holding per se (which would presup-
pose an iconic Ground) is pertinent, but the entity (or the space) to be imagined as
being in contact with the open palm becomes the lieu of attention and thus the lieu
of meaning (cf. section 5.3. for details and more examples). In view of these observa-
tions, iconic and indexical grounding mechanisms may be regarded as two routes of
abstraction competing in corporeal/gestural sign creation and being tightly integrated
with the information provided in speech. As will be discussed next, Jakobson proposed
distinct types of contiguity relations and metonymic modes that seem to correlate with
these basic mechanisms of semiotic grounding.

5. Jakobsonian contiguity relations and metonymic modes in gesture
Roman Jakobson introduced Peirce’s semiotic theory to a larger audience of linguists in
Europe and the United States (Waugh and Monville-Burston 1990). The concepts of
similarity and contiguity, as the two essential structural relations between signs, also
provide the basis for Jakobson’s (1956) theory of metaphor and metonymy seen as two
opposite modes of association and signification that structure both linguistic and non-
linguistic signs. Like all semiotic modes, they are not mutually exclusive: signs tend
to exhibit varying degrees and different hierarchies of both (Jakobson 1966: 411). De-
riving his understanding of metonymy from Peirce’s notion of contiguity (and indexical-
ity), Jakobson emphasized the difference between synecdoche and other types of
metonymy:

One must – and this is most important – delimit and carefully consider the essential differ-
ence between the two aspects of contiguity: the exterior aspect (metonymy proper), and
the interior aspect (synecdoche, which is close to metonymy yet essentially different).
To show the hands of a shepherd in poetry or the cinema is not at all the same as showing
his hut or his herd, a fact that is often insufficiently taken into account. The operation of
synecdoche, with the part for the whole or the whole for the part, should be clearly distin-
guished from metonymic proximity. […] the difference between inner and outer contiguity
[…] marks the boundary between synecdoche and metonymy proper. (Jakobson and
Pomorska 1983: 134)

In the following subsections, we will first see how, when pragmatically operationalized
in a given sign process, these two contiguity relations may feed into corresponding me-
tonymic processes (i.e. internal and external metonymy). The analytical framework,
comprising different kinds of gestural icons and indices brought about through different
kinds of metonymic processes, will then be presented and exemplified with examples
from the multimodal corpus (Mittelberg 2006). Modes of interaction between metony-
mic and metaphoric processes will also be addressed (Mittelberg and Waugh 2009).
(Tab. 47.1 in section 5.4 provides a synopsis of the taxonomy of cognitive-semiotic
principles developed in sections 5.1–5.3 below.)

5.1. Internal and external metonymy


Jakobson’s understanding of contiguity and metonymy allows the gesture analyst to
account for distinct processes that seem to be central to the formation and interpreta-
tion of manual as well as corporeal signs implying the entire body. Before looking at
gestural examples, definitions and examples will be provided here.
Internal metonymy rests on inner contiguity relations, e.g. as exploited by the princi-
ple of partiality: a part stands for another part that is connected to the first; a part stands
for the whole; or a whole stands for a part. For example, in the expression there are
many new faces in the group, faces refers to a prominent part of persons. The face is
part of the head which is a part of the physical structure of a human body. Everyone
lives under one roof would be another example, in which roof stands for the entire
house of which it is a physical fragment. As these cases show, Jakobson integrated
what is generally known as synecdoche into his notion of internal metonymy (cf. Peirs-
man and Geeraerts 2006; Radden and Kövecses 1999). Internal suggests that the inter-
nal structure of a reference object is broken down into fragments, and one of these
fragments (e.g., face or roof ) is taken to refer to the entire structure (e.g., body or
house). In visual signs, internal metonymy also drives the abstraction of essential prop-
erties inherent to a given Object, resulting in the portrayal of aspects of its contours,
dimensions, or internal structure.
External metonymy involves different kinds of outer contiguity relations, particularly
those pertaining to contact, adjacency, and impact; they also include instrument, source,
as well as cause and effect. In the utterance The White House remained silent, the White
House refers to the U.S. President or his spokesperson. The contiguity relations holding
between the place, or building, and its inhabitants are of a spatial and pragmatic nature;
however, the people living and working inside the building are obviously not part of its
architectural structure (as the roof is in the internal metonymy example given above).
As the house and the persons belong to the same experiential domain, this expression
can also count as an example of frame metonymy (Dancygier and Sweetser 2005; Fill-
more 1982). Another example of external metonymy is the question Would you like
another cup? If this utterance is meant to ask the addressee if she cares for more tea,
for instance, the cup stands for its adjacent content, i.e. the beverage, which is not
part of the material structure of the container cup, but external to it.
In view of the different types of contiguity relations and metonymic modes put forth
by Jakobson, the assumption guiding the ensuing gestural analyses is that internal
metonymy correlates with signs exhibiting a predominantly iconic Ground, and external
metonymy correlates with signs exhibiting a predominantly indexical Ground (cf.
Tab. 47.1). Although it is understood that conventionality/habit, the third type of
sign-Object relations in Peirce’s triadic typology, can be observed in these dynamic
sign processes to varying degrees, processes of conventionalization cannot be treated
in detail here for lack of space (cf. Mittelberg 2006).

5.2. Internal metonymy in gestures with predominantly iconic Ground


Metonymy has been shown to assume an important role in manual sign formation (e.g.,
Bouvet 2001; Gibbs 1994; Mandel 1977; Müller 1998; P. Wilcox 2004; Taub 2001). In this
body of work, the main focus has been on synecdoche, which corresponds, as we have just
seen, to Jakobson’s idea of inner contiguity (i.e., internal metonymy). Highlighting
some of the ways in which iconicity and metonymy may jointly drive form and meaning
construction in dynamic (metaphoric) visuo-spatial signs, the analyses presented in this
subsection will proceed according to Peirce’s (1960: 135; cf. section 4 for definitions)
subtypes of iconicity: image, diagram, and metaphor.
IMAGE ICON (POSTURE; ACTION; OBJECT; EVENT; ETC.). Through internal metonymy,
salient qualities of a given action, object, or idea may be profiled in a gestural/corporeal
sign. There is a difference between sign processes where the body itself imitates a par-
ticular posture or action and those where the hands iconically represent or delineate
some of the essential features of an object or event. For example, the person personify-
ing the semantic role recipient shown in Fig. 47.1 renders the essential features inherent
to a common physical action/posture such as holding something in one’s hand. It qua-
lifies as a visual sign with a highly iconic Ground. In light of the different iconic modes
proposed by Peirce, this gesture is an example of an image icon resulting from a partic-
ular interaction of iconicity and internal metonymy. Moreover, since the speaker liter-
ally becomes a recipient, this gestural portrayal reflects character viewpoint (e.g.,
McNeill 1992; Sweetser 2012). As pointed out earlier, in the case of this particular ges-
ture, the mention of the Object, the role recipient, in the concurrent speech centers
our attention on the person. According to conceptual metaphor theory (e.g., Lakoff
and Johnson 1980, 1999), personifying an abstract grammatical function entails a
metaphoric mapping. In the present framework this corporeal sign is first and foremost
analyzed as an image icon, that is, a literal portrayal of the idea of a recipient simulta-
neously expressed in speech.
Another example of a gestural form with a strongly iconic Ground is given in
Fig. 47.3 below. This tracing gesture starts out with both hands joined in the center
of gesture space. Then the hands move laterally outward until both arms are fully ex-
tended, as if they were tracing a horizontal line or chain. Here, too, the speaker’s verbal
utterance – we think of a sentence as a string of words – disambiguates a potentially
polysemous schematic gestural image icon. So the focus is not on the body itself or
the action of tracing, but on the virtual line drawn in the air that results from the action
of tracing. Cross-modal processes of meaning construction also play a crucial function
here in that the trace is an image icon of the idea of a string that is simultaneously ex-
pressed in the speech modality. Via internal metonymy the imaginary line stands for an
entire sentence. Whether this schematic image reifies abstract conceptual structure or
whether it is a minimal icon of a graphical representation of a sequence of written
words, it is likely to result from cognitive processes of visual perception and analysis.
In contrast to the previously discussed gesture (Fig. 47.1), this gesture exhibits observer
viewpoint.

Fig. 47.3: A string of words (image icon)

Fig. 47.4: Noun teach-er (diagrammatic icon)


DIAGRAMMATIC ICON (INTERNAL RELATIONS; STRUCTURE). While the focus of the sentence
string of Fig. 47.3 was on its linear gestalt as a whole, the following gesture puts into
relief the inner structure of a word (cf. Fig. 47.4). In this well-orchestrated multimodal
performance unit, two hands produce two individual gestural signs whose functional
relation turns out to be of particular interest. While explaining the basics of noun mor-
phology, the teacher complements the verbal part of his utterance as speakers of
English you know that … teacher consists of teach– and –er by making use of both
of his hands with the palms turned upwards and the fingers curled in. On the men-
tion of teach– he brings up his left hand, and immediately thereafter, on the mention
of –er, the right hand. He then keeps holding the two hands apart as depicted
in Fig. 47.4. This gesture can be interpreted in several ways. If we assume the left
hand to represent the morpheme teach– and the right hand the morpheme –er, we
can say that each sign itself entails a reification in that an abstract linguistic unit or a
speech sound is construed as a physical object through a metaphorical projection
(e.g., IDEAS ARE OBJECTS). If we assume the hands to be enclosing small imaginary
items, we can suppose an outer contiguity relation between the perceivable gestural ar-
ticulators and the metaphorically construed objects inside of them (cf. section 5.3). The
visually inaccessible contents would then be metonymically inferred from the percepti-
ble closed containers. In both readings, this composite gesture puts into relief the
boundary between the two elements, while accentuating the fact that the linguistic
units referred to in speech are connected on a conceptual level. As such, it constitutes
a gestural diagram: icons, “which represent the relations, mainly dyadic, (…) of the
parts of one thing by analogous relations in their own parts, are diagrams” (Peirce
1960: 157; § 2.277). The diagrammatic character of this cognitive-semiotic structure al-
lows us to identify contiguity relations between its constitutive parts: there is thus exter-
nal metonymy holding between individual signs building a structured whole (cf.
Tab. 47.1: to account for the hybrid status of the diagram, it is positioned closer to
the middle of the iconicity-indexicality continuum than image icon and metaphor
icon; cf. Mittelberg [2006: 117–132; 2008: 134–139] for diagrammatic iconicity in gesture;
cf. Waugh [1994] for iconicity in language).
METAPHOR ICON (PERSONIFICATION; REIFICATION; ETC.). Due to the metaphoricity char-
acterizing the meta-grammatical gestures analyzed here, the previously discussed exam-
ples of image icons could, in principle, also be analyzed as metaphor icons: the semantic
role recipient personified by the speaker’s bodily posture (cf. Fig. 47.1) and the sentence
conveyed as a string of words reified in the form of an imaginary line (cf. Fig. 47.3).
However, the present framework differentiates such gestural image icons of metaphoric
linguistic expressions from cases of metaphor iconicity in gesture that imply additional
semantic leaps in establishing similarity (Coulson 2001), that is, leaps not cued by met-
aphorical expressions in the speech modality. In the sequence of interest here, the
speaker explains the difference between main verbs and auxiliaries (Fig. 47.5). While
saying there is … what’s called the main verb, he directs his right hand toward the black-
board behind him, thus disambiguating and contextualizing the deictic existential
expression there is (cf. section 5.3). Immediately thereafter, while holding the deictic
gesture, the speaker makes a gesture with his left hand on the mention of the main
verb: the cupped palm-up open hand imitates the form of a small round container.
Showing a strongly iconic Ground, the formal features of the cupped hand are
motivated by internal metonymy in that they portray some of the essential structural
characteristics of a generic, small round container. This iconic form, however, does
not directly represent the idea of a main verb mentioned in speech.

Fig. 47.5: There is (index away from body) … the main verb (metaphor icon)

Since it constitutes a concrete, perceivable image of an abstract grammatical category,
this gesture can be regarded as instantiating not only the image schema OBJECT or CON-
TAINER, but also the metaphor CATEGORIES ARE CONTAINERS or CONCEPTUAL STRUCTURE IS
PHYSICAL STRUCTURE (cf. Sweetser 1998). Put in Peircean terms, the cupped hand repre-
sents “a parallelism” (1960: 157; §2.277) between a category and a cup-like container
(i.e. here the hand is the category container). In contrast to the personified recipient
and the string-like gesture, the linguistic expression the main verb is non-figurative,
and its meaning would be difficult to represent iconically. Again, the point here is
that the container-like gesture adds another metaphoric dimension to this multimodally
performed explanation, thus evidencing the speaker’s implicit metaphorical under-
standing of the main verb as a physical entity which can take the form of a container.
Cross-modally achieved processes of meaning construction like this can also be identi-
fied inside the gestural diagram of the word teach-er discussed above (cf. Fig. 47.4).
Although the source domain of the primary metaphor IDEAS ARE OBJECTS is not ex-
pressed linguistically, the closed hands function as physical entities that stand in for
teach– and –er. In these metaphor icons, metaphorical understandings of basic linguistic
units and categories are expressed monomodally (see Müller and Cienki 2009 on multi-
modal metaphor). Because the source domains of the underlying metaphors only man-
ifest themselves in the gestural modality, these gestures are particularly good examples
of the spontaneous exbodiment of conceptual structure.

5.3. External metonymy in gestures with predominantly indexical Ground
Generally speaking, there are outer contiguity relations between gesticulating hands
and the virtual objects they seem to manipulate, as well as between hands and the adja-
cent space they demarcate. Based on outer contiguity relations such as contact, impact
or cause/effect, external metonymy may account for finger/hand movements and visible
traces or other kinds of imprints left on paper, blackboards, canvas, as well as other
surfaces such as sand or other types of grounds (e.g., the famous example of animal foot-
prints). In a similar fashion, external metonymy accounts for the relation between ges-
tural movements and the emerging virtual traces they create in the air. Taking the
human body as the starting point, the following discussion of different types of outer con-
tiguity relations in gesture proceeds through different degrees of “metonymic proximity” (Jakobson
and Pomorska 1983: 134) and an increasingly noticeable interaction with iconic modes.
INDEX AWAY FROM BODY (POINTING). Pointing is a highly coordinated and culturally-
shaped activity (e.g., Fricke 2007; Haviland 2000; Kendon 2004; Kita 2003; McNeill
1992). While they are not treated in depth here, pointing gestures are included in the
taxonomy as examples of signs with a highly indexical Ground. For example, the deictic
gesture shown above in Fig. 47.5 creates an invisible vector pointing away from the
speaker’s body and directing the audience’s attention to the word taught written on
the blackboard behind him. There is an outer contiguity relation between the tip of
the pointing finger and the target of the pointing action. Instances of deictic gestures
pointing to more distant Objects also belong to this group of indices.
BODY PART INDEX (LOCATIONS ON BODY). This kind of external metonymy is repre-
sented by gestures whose meaning derives partly from their contact with, or proximity
to, a particular body part of the speaker. The gesture depicted in Fig. 47.6 below is a
both-handed body part index co-occurring with the word knowledge in the verbal utter-
ance Grammar emerges from language use, not from knowledge becoming automatized.
It can be described as a hybrid of a) two simultaneously produced pointing gestures, tar-
geting each of the speaker’s temples, and b) a bimanual gesture consisting of two
cupped hands jointly constituting a metaphor icon of a container held next to the
head. In order to get to the site of knowledge, it takes two steps along an inferential
pathway, both of them afforded through external metonymy. First, there is an outer con-
tiguity relation between the hands and the head; then, there is another outer contiguity
relation between the outside of the head and its inside. In cognitive semantic terms: the
head is metaphorically understood as a CONTAINER which stands metonymically for its
content, i.e. knowledge (see Panther and Thornburg 2004 on metonymy and pragmatic
inferencing in language, and Dudis 2004 on body partitioning in American Sign
Language).

Fig. 47.6: Knowledge (body part index)


Fig. 47.7: Noun (hand/object index; support)

HAND/OBJECT INDEX (SUPPORT; CONTAINER). In gestures involving open or closed hands,
outer contiguity relations may hold between the hands and the (imagined) objects
they seem to be supporting, holding, placing, or otherwise manipulating. Hence,
these gestural signs imply “immediate contiguity” between the gestural articulators
and the implied elements of the sign process (Jakobson and Pomorska 1983: 134).
One of the palm-up open hand gestures presented in the opening section of this article
is reproduced here (cf. Fig. 47.7) to highlight its particular form and function in relation
to the other gestures with indexical Ground discussed in this section. The principle of
external metonymy is instantiated through an outer contiguity relation (contact/
adjacency) between the surface of the open hand and the noun the speaker is referring
to in speech (i.e., the speech does not hint at an action of supporting something or the
idea of a surface as such). Depending on the concurrent speech, an open cupped hand like
the one shown in Fig. 47.5 above (the main verb) can also be employed to draw attention
to its (imagined) contents. Profiling the inside of an open hand would require
some deictic element in the discourse-pragmatic context leading to the things inside,
e.g. a demonstrative pronoun or adjective, the speaker’s gaze directed at it, his other
hand pointing at the inside of the hand, or any combination of the above. In any
event, external metonymy can draw on such outer contiguity relations and create an
inferential pathway between cupped hands and possible contents (CONTAINER FOR CON-
TENT). Via outer contiguity, closed hands, too, may metonymically stand for what they
seem to be enclosing (cf. Fig. 47.4). Now, especially in the case of open hand gestures
(Müller 2004), virtual objects generally do not receive much geometric specification.
One reason might be that the multimodal discourse is about abstract concepts and ca-
tegories. Since that which the imaginary objects stand for is revealed in the concurrent
speech, it might be sufficient to simply provide a surface for them, point to their exis-
tence or, if relevant, to their position in relation to other signs in gesture space (cf.
Mittelberg 2010b: 376–378). It is the indexical Ground of these manual signs that pro-
pels, together with other discourse-pragmatic factors, a sort of muted function of point-
ing or indicating (see Liddell 2003 on locations and surrogates in American Sign
Language).
HAND/OBJECT INDEX 2-SIDED (BOUNDED SPACE). We now turn to gestures with indexical
Ground that provide more iconic cues regarding the geometry of the imaginary object
they seem to be holding or of the chunk of space they demarcate. These gestural signs
also involve “immediate contiguity” between the gestural articulators and the invisible
elements of the sign process (Jakobson and Pomorska 1983: 134). Compared to palm-up
open hand gestures, the gestures depicted below (cf. Figs. 47.8 and 47.9) employ two ar-
ticulators, e.g. two fingers or two hands, which help specify to a higher degree the size
and shape of the objects involved in the imitative actions. The person shown in Fig. 47.8
below is lecturing about sentence structure. When explaining the short sentence
Diana fell, his right hand shows this hand configuration held relatively high up in ges-
ture space on the mention of the verb form fell. Between his thumb and index finger,
he seems to be holding the verb fell, conceptualized as a tangible object or space
extending between the articulators. If we only considered the visible gestural articu-
lators as the semiotic material of this gesture, it would seem impossible to establish
a similarity relationship between this particular hand configuration and a verb form
or speech sound fell (no falling event is iconically depicted, either). However, through
the pragmatic context and the simultaneously uttered word fell attention is drawn
to the imaginary entity between the two fingers. So it is not the iconic relationship
between the imitated action and the real action of holding something up in the air
that gets profiled here; rather, the relevant cognitive-semiotic principle is the
outer contiguity relation (contact/adjacency) between the observable gestural articulators
and the imagined word form, which, cued by the linguistic item fell, becomes
operationalized through external metonymy (see Hassemer et al. 2011 for a detailed account
of the profiling of gesture form features).

Fig. 47.8: Verb form fell (hand/object index 2-sided)

Fig. 47.9: Subcategory (hand/object index 2-sided)


The last example is a two-handed gesture combining indexical and iconic modes in a
relatively balanced fashion. In the sequence from which the image in Fig. 47.9 is
taken, the speaker talks about the functional difference between main verbs and aux-
iliaries. He explains that auxiliaries such as have, will, being, and been (…) must all
belong to some subcategory. Upon some subcategory he makes the gesture shown
above, consisting of two hands that seem to be holding an imaginary three-dimensional
volume whose geometry is comparatively well designated. While there is an iconic rela-
tionship (via internal metonymy) between the physical action of holding or placing an
object as such and this gestural action of pretending to do so, it is again the outer con-
tiguity relation between the hands and the adjacent imagined object that is profiled here
via external metonymy in conjunction with the linguistic cue some subcategory. This
association works effortlessly because the action of “holding” and the object being
“held” are part of the same experiential domain (cf. Dancygier and Sweetser 2005 on
frame metonymy). Moreover, the meaning of the term subcategory is reinforced by
the gesture’s comparatively low location in gesture space. Since the subcategory is liter-
ally placed underneath the location where the superordinate category it relates to was
produced only a few utterances earlier (i.e. the main verb; cf. Fig. 47.5), this gesture also
is an instance of metonymy of place.
HAND/TRACE INDEX (PATH; LINE FIGURE; ETC.). Communicative hand movements leave
invisible traces in gesture space. These traces become meaningful when attention is
drawn to, for instance, their execution in terms of the trajectory they project, the
shape contours they delineate, or the specific manner of movement they exhibit (e.g.,
through straight or wavy lines; see Bressem this volume). In such sign processes,
hand/trace indices highlight the outer contiguity relation between perceivable gestural
articulators, e.g. the index finger or the entire hand, and the imaginary lines or figures
they leave in the air or the visible traces they imprint on surfaces. Such figurations
constitute, however sketchy or minimal they may be, signs in their own right and
may as such be iconic, diagrammatic, and/or metaphoric icons of something else
(e.g., the line representing a string of words in Fig. 47.3; see Mittelberg 2010a for
additional movement patterns). Given their strong interaction with iconic principles,
this type of index is placed towards the middle of the continuum in Tab. 47.1. Outer
(tactile) contiguity relations between flat hands and the surfaces they pretend to be
exploring, as well as between bent hands and the volumes they seem to be touching
or creating (external to the hands), represent another indexical relation that may
engender iconic figurations: instantiations of HAND/PLANE INDEX are not exemplified
here, but listed in the table below (see Hassemer et al. 2011 for dimensions of gesture
form).

5.4. Icon and index in concert: Cross-modal grounding mechanisms in (metaphoric) cospeech gestures
Throughout this article we have seen how iconic and indexical principles jointly create
form and meaning in multimodal sign processes. Functioning as a synthesis of the
approach to gesture presented here, Tab. 47.1 below has two objectives: a) to give an
overview of how the theoretical notions constituting the present framework are related
to one another (upper half of table); and b) to present the analytical taxonomy of cog-
nitive-semiotic principles that have been laid out in sections 5.1–5.3 and additional ones
stemming from previous work (lower part of table; see Mittelberg 2006, 2010a, 2010b).
Horizontally, the table is structured via a continuum spanning from an iconic pole on
the left to an indexical pole on the right. Along this continuum different kinds of ges-
tural and corporeal icons and indices are positioned, depending on whether they exhibit
a predominantly iconic or indexical Ground. Gestural signs combining increased de-
grees of both iconicity and indexicality are placed towards the center of the continuum.
Neither the taxonomy nor the placement of the gestures is to be seen as static or abso-
lute. In a given sign process, the particular combination of pragmatic forces might
require a reordering of certain principles along the continuum. There also is room
for variation regarding additional gesture forms and functions not considered here
(e.g., pragmatic gestures, beats and other primarily indexical gestures that fulfill various
functions regarding affect, attention, interaction, and information management; e.g.,
Bavelas et al. 1992; Müller 1998).
Table 47.1 reads as follows. While IMAGE ICON and METAPHOR ICON are positioned clo-
ser to the left pole of the continuum, it is understood that gestural imagery may exhibit
varying degrees of abstraction and schematicity along the line. As pointed out earlier,
image icons tend to literally depict what is described in speech, while metaphor icons
imply additional cognitive leaps. Since the DIAGRAMMATIC ICON (INTERNAL RELATIONS;
STRUCTURE) entails both inner contiguity, regarding the sign-object relation, and outer
contiguity relations among its constitutive parts, it is placed closer toward the indexical
side of the continuum. The different gestural indices proposed here point to locations
where meaning is multimodally constructed, either directly on the speaker’s body or
in “metonymic proximity” to it (Jakobson and Pomorska 1983: 134).
Deictic gestures with a highly indexical Ground constitute the far right of the contin-
uum: namely, pointing gestures (INDEX AWAY FROM BODY) and pointers to specific body
parts or locations on the speaker’s body (BODY PART INDEX). Gestures with a slightly
muted indexical Ground may indicate the existence or location of a mentally construed
entity in the form of a virtual object, by providing a support structure in or on which it
can be presented, e.g. via a HAND/OBJECT INDEX (SUPPORT; CONTAINER). In addition, ges-
tures employing more than one articulator may demarcate chunks or extensions of
space between them that may get semantically charged in acts of multimodal mean-
ing-making, e.g. via a HAND/OBJECT INDEX 2-SIDED (BOUNDED SPACE). To differentiate
between physical objects and tools used to perform a certain transitive action on some-
thing else, e.g. pantomiming cutting a fruit (object) with a knife (tool), another outer
contiguity relation is added here. A HAND/TOOL INDEX (tool involved in action) incorpo-
rates iconic dimensions, derived from the particular hand shape or grip and the motor
routine typical for the performed action (Grandhi, Joue and Mittelberg 2011, 2012).
Finally, HAND/TRACE INDEX (PATH; LINE FIGURE) and HAND/PLANE INDEX (FLAT SURFACE;
VOLUME) combine indexicality and iconicity in the creation of, for instance, invisible
paths, line drawings, planes or volumes in gesture space (Hassemer et al. 2011 for a
detailed account of the geometry and dimensions of gesture form). There is outer con-
tiguity between fingers/hands that move through gesture space and the paths and fig-
ures they delineate, or the surfaces and volumes they seem to explore or actually
create in the process. Since such invisible figurations may be iconic of something
else, e.g. of cognitive or physical structures or actions, these signs are positioned closer
to the iconic pole than the other indices. Regardless of the predominant Ground of
the gestural signs, the metonymically inferred objects may come in different shapes
and sizes and show different degrees of iconic and geometric specification. In the ges-
tures representing grammatical categories and linguistic structure analyzed here, smal-
ler units such as morphemes fit into a closed hand; single words and categories were
held between index and thumb or rest on a palm-up open hand; and more complex
units such as sentences were represented as linear structures unfolding horizontally
in front of the speaker’s body (see Mittelberg 2008 and 2010a for additional examples).

Tab. 47.1: Cognitive-semiotic principles interacting in gestural and corporeal sign processes with speech

THEORETICAL BASES
                          similarity                            contiguity
  C.S. Peirce             iconicity                             indexicality
                          (image; diagram; metaphor)
  R. Jakobson             internal metonymy                     external metonymy
                          inner contiguity                      outer contiguity
                          (part-whole; part-part)               (contact; adjacency; impact; place; etc.)
                          metaphor                              metaphor

COGNITIVE-SEMIOTIC PRINCIPLES
  Gestural and            IMAGE ICON                            INDEX AWAY FROM BODY (pointing)
  corporeal sign          (posture/action; object =             BODY PART INDEX (locations on body)
  processes               trace, figure, hand)                  HAND/OBJECT INDEX (surface; container)
  (with speech)           METAPHOR ICON                         HAND/OBJECT INDEX 2-SIDED (bounded space)
                          (personification; reification         HAND/TOOL INDEX (tool involved in action)
                          (trace, hand))                        HAND/TRACE INDEX (path; line figure)
                          DIAGRAMMATIC ICON                     HAND/PLANE INDEX (flat surface; volume)
                          (internal relations; structure)

  Semiotic grounding      iconic Ground <------ iconicity–indexicality continuum ------> indexical Ground
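For readers who encode such analyses in gesture-annotation tools, the taxonomy of Tab. 47.1 can be rendered as a small data structure. The following Python sketch is purely illustrative: the class name Principle, the numeric continuum positions, and the 0.5 cut-off for the predominant Ground are assumptions of this sketch, not part of the framework, which explicitly treats the placements as neither static nor absolute.

```python
# Illustrative sketch: the iconicity-indexicality continuum of Tab. 47.1
# as a data structure. Names and numeric placements are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class Principle:
    name: str       # cognitive-semiotic principle
    gloss: str      # parenthetical subtypes from Tab. 47.1
    position: float # 0.0 = iconic pole, 1.0 = indexical pole (rough placement)


CONTINUUM = [
    Principle("IMAGE ICON", "posture/action; object = trace, figure, hand", 0.1),
    Principle("METAPHOR ICON", "personification; reification", 0.2),
    Principle("DIAGRAMMATIC ICON", "internal relations; structure", 0.45),
    Principle("HAND/TRACE INDEX", "path; line figure", 0.55),
    Principle("HAND/PLANE INDEX", "flat surface; volume", 0.6),
    Principle("HAND/TOOL INDEX", "tool involved in action", 0.65),
    Principle("HAND/OBJECT INDEX 2-SIDED", "bounded space", 0.7),
    Principle("HAND/OBJECT INDEX", "surface; container", 0.8),
    Principle("BODY PART INDEX", "locations on body", 0.9),
    Principle("INDEX AWAY FROM BODY", "pointing", 1.0),
]


def predominant_ground(p: Principle) -> str:
    """Classify a principle by its predominant semiotic Ground."""
    return "iconic" if p.position < 0.5 else "indexical"
```

In annotation practice, the positions could of course be re-estimated per sign process, in keeping with the caveat that pragmatic forces might require a reordering of principles along the continuum.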

As metaphor is assumed to interact with all metonymic modes to varying degrees (Ja-
kobson 1956), it is placed on both sides of the continuum. In their investigations of how
indexical and iconic principles interact in the interpretation of metaphoric gestures,
Mittelberg and Waugh, in their (2009) article “Metonymy first, metaphor second”,
have suggested two distinct but intertwined mappings. From the perspective of the per-
son listening to and looking at such a multimodal performance, metonymy can be said
to lead into metaphor. First, gestural sign vehicles, i.e. hand shapes and movements,
may serve as visual reference points (Langacker 1993) triggering cognitive access to
concepts represented as chunks of demarcated space or invisible objects (e.g., P. Wilcox
2004). Via a metaphorical mapping, these reified entities stand for the abstract cate-
gories the person talks about (see Taub 2001 for metaphorical mappings in American
Sign Language). In some of the instances examined here, the concurrent speech is not
metaphorical in nature (e.g., teach-er; main verb). Yet, the body portrays, i.e. exbodies,
how the person conceptualizes and understands the abstracta.
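The two intertwined mappings described above can be sketched, purely illustratively, as a two-stage pipeline. This is not an implementation of Mittelberg and Waugh's (2009) model; the function names, the lookup of hand shapes, and the string-valued output are hypothetical, chosen only to make the order of operations (metonymic access first, metaphorical mapping second) explicit.

```python
# Illustrative sketch of the "metonymy first, metaphor second" interpretation
# order. All names and the hand-shape lookup are hypothetical.

def metonymy_first(hand_shape: str) -> str:
    """Stage 1: the gestural sign vehicle gives metonymic access to a
    reified virtual entity adjacent to (or enclosed by) the hands."""
    access = {
        "cupped palm-up open hand": "small round container",
        "closed hand": "enclosed object",
        "two-finger grip": "object held between the fingers",
    }
    return access.get(hand_shape, "demarcated chunk of space")


def metaphor_second(virtual_entity: str, speech_cue: str) -> str:
    """Stage 2: via a metaphorical mapping, the reified entity stands for
    the abstract category named in the concurrent speech."""
    return f"{speech_cue} understood as a {virtual_entity}"


reading = metaphor_second(metonymy_first("cupped palm-up open hand"),
                          "the main verb")
# cf. Fig. 47.5: the main verb understood as a small round container
```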
Such instances of gestural manifestations of both mental imagery and physical ac-
tions seem to be cross-modally grounded in several interlaced ways: a) via the human
body indexically anchored in its momentary temporal, spatial and social context;
b) via the concurrent linguistic utterance carrying the information that determines
the local meaning of a potentially polysemous gestural/corporeal sign, which c) is
further contextualized via previous and subsequent gestures (see Müller 2010 for anal-
yses of gesture sequences). In light of these processes, we witness a tendency to create –
out of nothing – material anchors for abstract ideas and structures in the immediate,
visible and evolving environment sharable by the interlocutors (e.g., Hutchins 1995;
Williams 2008).

6. On the semiotic reality of image schemata and force gestalts in gesture
Image schemata and force dynamics occupy a central place in embodied approaches to
meaning (cf. Johnson 1987, 2007; Talmy 1988, 2000). When applying Johnson’s (1987:
xiv) original definition of image schemata as “recurring, dynamic patterns of our per-
ceptual interactions and motor programs that give coherence and structure to our expe-
rience” to gesture, it becomes evident that gestures are prone to offer additional
support for the “semiotic reality” of such experiential gestalts (Danaher 1998: 190).
Recent work on image schemata in language comprises a variety of understandings
and definitions (e.g., Hampe 2005), but generally the idea of embodiment seems to be
understood more and more literally (see Zlatev 2005 on mimetic schemas in children’s
gestures). As the gesture research reviewed in this article suggests, studying the human
body’s intuitive expressions and culturally-shaped practices may provide valuable in-
sights into how higher cognitive activities may be grounded in dynamic patterns of
visual perception, bodily motion, and social behavior. According to Cienki (2005:
435), “image schemas are readily available, indeed ‘on hand,’ for recruitment as ges-
tural forms”. An example of a pervasive image schema is the PATH schema, consisting
of a beginning (SOURCE), an end point (GOAL), and a vector constituting a path between
source and goal. The PATH schema may also structure gestural representations of a sen-
tence as an imaginary line traced horizontally from left to right in front of the speaker
(cf. Mittelberg 2008). Image schemata have further been argued to be a means to
project conceptual structure into gesture space, thus underpinning the flexible spatiali-
zation of abstract concepts as well as systematic uses of certain regions of gesture space
(e.g., UP/DOWN; LEFT/RIGHT; FRONT/BACK; CENTER/PERIPHERY; e.g., Calbris 2003; Cienki
1998b, 2005; Evola 2010; Mittelberg 2010a, 2010b; Williams 2008). Moreover, image
schemata may build the basis for variants of a recurrent gesture: for example, the
CYCLE schema has been shown to underlie variants of the cyclic gesture which may
evoke processes of searching, describing, or calling on addressees (e.g., Ladewig 2010,
2011; cf. also Müller 2010 on recurrent gestures).
Independently of the medium in which they materialize, image schemata, assumed to
be dynamic and malleable conceptual gestalts, are rarely fully instantiated. In gesture,
too, salient structural properties of a posture, a motion pattern, a hand configuration, or
an ephemeral figuration in space may provide only minimal information that may feed
into the evocation of the full schemata or gestalts. It is left to the eye of the addressees
and to their own experience with physical actions and thought processes to complement
kinesthetic and cognitive entailments. Among the prominent hand configurations and
motion patterns that emerged from the corpus of linguistics lectures, a large number
were found to be reminiscent of image-schematic structures and basic geometric shapes.
The following schemata, some of which also resulted from the analyses presented in this
article, have been identified: OBJECT; CONTAINMENT; SUPPORT; CONTACT; SOURCE-PATH-GOAL;
LINK; SCALE; EXTENSION; BALANCE; CENTER-PERIPHERY; CYCLE; ITERATION; FRONT-BACK; ADJACENCY;
PART-WHOLE; FORCE (Mittelberg 2010b: 374). In addition, the analysis revealed a
set of geometric patterns (e.g., circle, semicircle, triangle, rectangle, and square), as well
as lines traced along the horizontal, vertical, diagonal, or sagittal axis, thereby showing
various qualities such as straightness and curviness (see Cienki 1998b on the image
schema STRAIGHT). Several schemata may further interact in a given multimodal expres-
sion, as evidenced by the gestural diagram representing the morphological structure
of the noun teach-er (Fig. 47.4): OBJECT, CONTAINMENT, LINK, PART-WHOLE, SPLITTING, and
BALANCE jointly engender a complex cross-modally achieved explanation of abstract re-
lations. Given the visuo-spatial mediality of gestures, the list above comprises, perhaps
not too surprisingly, various spatial and spatial relations image schemata which are as-
sumed to structure systems of spatial relations cross-linguistically (Lakoff and Johnson
1999: 35).
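The inventory of schemata listed above lends itself to a simple controlled vocabulary for annotation. The sketch below is hypothetical (the names IMAGE_SCHEMATA and annotate_gesture are not from the chapter); it merely illustrates how a gesture such as the teach-er diagram of Fig. 47.4 could be tagged with the several schemata it jointly instantiates. SPLITTING, which figures in that analysis, is added to the inventory here.

```python
# Illustrative sketch: the image-schema inventory as a controlled vocabulary
# for gesture annotation. Function and variable names are hypothetical.
IMAGE_SCHEMATA = {
    "OBJECT", "CONTAINMENT", "SUPPORT", "CONTACT", "SOURCE-PATH-GOAL",
    "LINK", "SCALE", "EXTENSION", "BALANCE", "CENTER-PERIPHERY", "CYCLE",
    "ITERATION", "FRONT-BACK", "ADJACENCY", "PART-WHOLE", "FORCE",
    "SPLITTING",  # added here: appears in the teach-er analysis (Fig. 47.4)
}


def annotate_gesture(label: str, schemata: set) -> dict:
    """Record the schemata a gesture instantiates, rejecting unknown tags."""
    unknown = schemata - IMAGE_SCHEMATA
    if unknown:
        raise ValueError(f"not in the inventory: {sorted(unknown)}")
    return {"gesture": label, "schemata": sorted(schemata)}


teacher_diagram = annotate_gesture(
    "teach-er diagram (Fig. 47.4)",
    {"OBJECT", "CONTAINMENT", "LINK", "PART-WHOLE", "SPLITTING", "BALANCE"},
)
```

Validating tags against a fixed inventory is one way to keep multi-annotator coding consistent; the inventory itself would naturally grow with the corpus.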
As highlighted throughout the article, the present approach to gesture puts into
relief the importance of contiguity relations, not only among elements within an object,
event or domain, but also regarding the mechanisms that ground individual sign pro-
cesses in their physical, conceptual, and semiotic contexts. What is crucial here is that
conceptual structures with an internal structure such as diagrams, frames (Fillmore
1982), and image schemata (e.g., the path schema and other relational schemas) also
exhibit contiguity relations between the parts they consist of. As Johnson (1987: 126;
italics in the original) emphasized, “image schematic structures whose relations make
up the fabric of our experience [are] pervasive, well-defined and full of sufficient internal
structure to constrain our understanding and reasoning.” It is this internal structure that
allows the language user/gesturer to profile one part of a structured whole, e.g. the
beginning or the end of an itinerary (or a love relationship; Lakoff and Johnson
1980), while other parts remain backgrounded but can still be inferred (cf. the diagram
of morphological structure discussed in section 5.3; Fig. 47.4). At this point, we can
make a link back to metonymy, for the schemata PART-WHOLE, LINK, CONTACT, and AD-
JACENCY correspond to the central inner and outer contiguity relations identified in the
gestures discussed throughout this article. Via verbally triggered processes of metony-
mic inferencing, these contiguity relations may become operationalized, resulting, for
instance, in shifts of attention from a visible hand performing a gestural action to the
inferred object, tool, or figuration that is to be imagined as existing or evolving adjacent
to the gesturing hands. This is in part afforded by the principle of “immediate contigu-
ity” (Jakobson and Pomorska 1983: 134) between the gesturing hands and the implied
elements. Despite their immaterial nature, these imaginary dimensions of gestural signs
are meaningful elements in the multimodal sign process of interest here.
Force-dynamic experiential gestalts (e.g., Johnson 1987; Talmy 1988) are also perti-
nent to the study of gesture and bodily communication more broadly; yet, they are
only starting to receive attention in gesture research. Johnson (1987: 42; italics in the
original) reminds us that “our bodies are clusters of forces and that every event of
which we are a part consists, minimally, of forces in interaction”. Bodily routines of
interaction with the environment that may motivate multimodal expression may
imply, inter alia, exerting force on objects or people, such as pulling or pushing some-
thing or somebody, as well as forces felt on one’s own body such as GRAVITY, BLOCKAGE,
COMPULSION or REPULSION. Physical activities such as walking against the wind, carrying
heavy items, keeping balanced while riding a bike, or being pushed through a narrow
hallway by a crowd of people, can be easily reenacted through imitative actions per-
formed with the full body or parts of it, such as the head, torso and/or hand gestures.
In the meta-grammatical gesture corpus, certain gestures portrayed the behavior of
grammatical categories and the dynamic nature of cognitive and syntactic operations
via movements exhibiting an increased level of energy (Mittelberg 2010a: 370). For
instance, the idea that the theory of emergent grammar views grammar and language
as merging domains (Johnson 1987: 126) was illustrated by a comparably forceful move-
ment. The bimanual gesture in question starts out with two hands held apart, palms fac-
ing each other. On the mention of it blurs the boundary between learning and doing, the
palms are suddenly pushed towards each other. In this vivid portrayal, a forcefully per-
formed bodily action seems to erase both physical and conceptual boundaries at the
same time. Generally speaking, gesture research promises to augment our understand-
ing of the bodily logic of force schemata, especially regarding the multimodal expres-
sion, i.e. exbodiment, of less tangible, yet crucial dimensions of meaning such as social
forces and attitudes, but also affective and intersubjective dimensions of human
communicative behavior (Johnson 2005, 2007).

7. Concluding remarks
In gesture studies, the “bodily basis of meaning, imagination and reason” (the subtitle
of Johnson’s 1987 book) indeed is the starting point for examining the physical and cul-
turally shaped forces that lend a certain degree of systematicity to less-consciously pro-
duced bodily signs. In the same vein, the intent of this article has been to demonstrate
how certain cognitive-semiotic principles – such as iconicity, indexicality, metaphor and
metonymy – interact in motivating physically grounded processes of gestural form cre-
ation and interpretation. Inspired by Peircean and Jakobsonian notions, an iconicity-
indexicality continuum was suggested, making it possible to relate gestural signs with
predominantly iconic or indexical Grounds, as well as more transient cases, to one another
(cf. the taxonomy of icons and indices presented in Tab. 47.1). The idea of the exbodied
mind was put forth to focus on how structures of embodied multisensory experience,
such as image schemata and force gestalts, may visibly manifest themselves, at least
to certain degrees, in the form of dynamic ephemeral gestural and corporeal signs
produced with speech.
The analyses presented above have shown once again that gesture analyses need to
carefully consider a host of contextual factors, particularly speech and neighboring ges-
tures (Müller 2010), not only to disambiguate potentially polysemous gestural forms,
but also to determine which parts and movements of the bodily articulators become
profiled and thus meaningful in a given moment (Hassemer et al. 2011). In these
multimodal performances, the speech content further proved to be instrumental in es-
tablishing whether the corporeal actions or hand configurations themselves are focused
upon, or whether the imaginary entities, spaces or lines immediately contiguous to, or
created by, the gesturing hands become the salient elements in such processes of cross-
modal meaning construction. In this and many other respects, gesture studies can no
doubt greatly benefit from systematic comparisons with the morphology and discourse
pragmatics of signed languages (see, e.g., Dancygier and Sweetser 2012; Dudis 2004;
Liddell 2003; Taub 2001; P. Wilcox 2004; S. Wilcox 2004).
780 IV. Contemporary approaches

In light of Johnson’s (2005: 31) suggestion to “analyze various additional strata of


meaning, such as the social and affective dimensions, to flesh out the full story of mean-
ing and thought”, bodily semiotics seems to bear a particularly high potential to con-
tinue to contribute to a fuller understanding of the imaginative, intersubjective and
felt qualities of meaningful experience and expression.

Acknowledgements
The research presented in this article was supported by the Excellence Initiative of the
German Federal and State Governments. The author wishes to thank the editors, as
well as Vito Evola, Julius Hassemer and Gina Joue for valuable feedback and Yoriko
Dixon for the gesture drawings.

8. References
Andrén, Mats 2010. Children’s gestures from 18 to 30 months. Ph.D. dissertation, Centre for Lan-
guages and Literatures, Lund University.
Arnheim, Rudolf 1969. Visual Thinking. Berkeley: University of California Press.
Bavelas, Janet, Nicole Chovil, Douglas A. Lawrie and Allan Wade 1992. Interactive gestures. Dis-
course Processes 15: 469–489.
Bourdieu, Pierre 1990. The Logic of Practice. Stanford, CA: Stanford University Press.
Bouvet, Danielle 2001. La Dimension Corporelle de la Parole. Les Marques Posturo-Mimo-Ges-
tuelles de la Parole, leurs Aspects Métonymiques et Métaphoriques, et leur Rôle au Cours
d’un Récit. Paris: Peeters.
Bressem, Jana volume 1. A linguistic perspective on the notation of form features in gestures.
In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Sedinha
Teßendorf (eds.), Body – Language – Communication: An International Handbook on Multi-
modality in Human Interaction. (Handbooks of Linguistics and Communication Science
38.1.) Berlin: De Gruyter Mouton.
Calbris, Geneviève 2003. From cutting an object to a clear-cut analysis: Gesture as the repre-
sentation of a preconceptual schema linking concrete actions to abstract notions. Gesture
3: 19–46.
Cienki, Alan 1998a. Metaphoric gestures and some of their relations to verbal metaphoric expres-
sions. In: Jan-Peter Koenig (ed.), Discourse and Cognition: Bridging the Gap, 189–204. Stan-
ford, CA: CSLI Publications.
Cienki, Alan 1998b. Straight: An image schema and its metaphorical extensions. Cognitive Lin-
guistics 9: 109–147.
Cienki, Alan 2005. Image schemas and gesture. In: Beate Hampe (ed.), From Perception to Mean-
ing: Image Schemas in Cognitive Linguistics, 421–442. Berlin: De Gruyter Mouton.
Cienki, Alan and Irene Mittelberg 2013. Creativity in the forms and functions of gestures with
speech. In: Tony Veale, Kurt Feyaerts and Charles Forceville (eds.), The Agile Mind: Creativity
in Discourse and Art, 231–252. Berlin: De Gruyter Mouton.
Cienki, Alan and Cornelia Müller (eds.) 2008. Metaphor and Gesture. Amsterdam: John
Benjamins.
Coulson, Seana 2001. Semantic Leaps: Frame-Shifting and Conceptual Blending in Meaning Con-
struction. Cambridge: Cambridge University Press.
Damasio, Antonio R. 1994. Descartes’ Error: Emotion, Reason, and the Human Brain. New York:
Putnam and Sons.
Danaher, David S. 1998. Peirce’s semiotic and cognitive metaphor theory. Semiotica 119 (1/2):
171–207.
47. The exbodied mind: Cognitive-semiotic principles as motivating forces in gesture 781

Dancygier, Barbara and Eve E. Sweetser 2005. Mental Spaces in Grammar: Conditional Construc-
tions. Cambridge: Cambridge University Press.
Dancygier, Barbara and Eve E. Sweetser 2012. Viewpoint in Language: A Multimodal Perspective.
Cambridge: Cambridge University Press.
Dirven, Rene and Ralf Pörings (eds.) 2002. Metaphor and Metonymy in Comparison and Contrast.
Berlin: De Gruyter Mouton.
Dudis, Paul 2004. Body partitioning and real-space blends. Cognitive Linguistics 15(2): 223–238.
Enfield, N. J. 2009. The Anatomy of Meaning: Speech, Gesture, and Composite Utterances. Cambridge: Cambridge University Press.
Evola, Vito 2010. Multimodal cognitive semiotics of spiritual experiences: Beliefs and metaphors
in words, gestures, and drawings. In: Fey Parrill, Vera Tobin and Mark Turner (eds.), Form,
Meaning, and Body, 41–60. Stanford, CA: CSLI Publications.
Fillmore, Charles J. 1982. Frame semantics. In: Linguistic Society of Korea (ed.), Linguistics in the
Morning Calm, 111–137. Seoul: Hanshin.
Fricke, Ellen 2007. Origo, Geste und Raum – Lokaldeixis im Deutschen. Berlin: De Gruyter Mouton.
Gallese, Vittorio and George Lakoff 2005. The Brain's concepts: The role of the sensory-motor system in conceptual knowledge. Cognitive Neuropsychology 22(3/4): 455–479.
Gibbs, Raymond W., Jr. 1994. The Poetics of Mind: Figurative Thought, Language, and Under-
standing. Cambridge: Cambridge University Press.
Gibbs, Raymond W., Jr. 2006. Embodiment and Cognitive Science. New York: Cambridge Univer-
sity Press.
Goodwin, Charles 2007. Environmentally coupled gestures. In: Susan Duncan, Justine Cassell and
Elena T. Levy (eds.), Gesture and the Dynamic Dimensions of Language, 195–212. Amsterdam:
John Benjamins.
Grady, Joseph 1997. Foundations of meaning: Primary metaphors and primary scenes. Ph.D. dis-
sertation, University of California at Berkeley.
Grandhi, Sukeshini A., Gina Joue and Irene Mittelberg 2011. Understanding naturalness and in-
tuitiveness in gesture production: Insights for touchless gestural interfaces. Proceedings of the
ACM 2011 Conference on Human Factors in Computing Systems (CHI), Vancouver, BC.
Grandhi, Sukeshini A., Gina Joue and Irene Mittelberg 2012. To move or to remove? A human-
centric approach to understanding of gesture interpretation. Proceedings of the 10th ACM con-
ference on Designing Interactive Systems. Newcastle: ACM Press.
Hampe, Beate (ed.) 2005. From Perception to Meaning: Image Schemas in Cognitive Linguistics.
Berlin: De Gruyter Mouton.
Harrison, Simon 2009. Grammar, gesture, and cognition: The case of negation in English. Unpublished Ph.D. dissertation, University of Bordeaux, France.
Hassemer, Julius, Gina Joue, Klaus Willmes and Irene Mittelberg 2011. Dimensions and mechan-
isms of form constitution: Towards a formal description of gestures. Proceedings of the GE-
SPIN 2011 Gesture in Interaction Conference. Bielefeld: Zentrum für interdisziplinäre
Forschung.
Haviland, John 2000. Pointing, gesture spaces, and mental maps. In: David McNeill (ed.), Lan-
guage and Gesture, 13–46. Cambridge: Cambridge University Press.
Hiraga, Masako K. 2005. Metaphor and Iconicity: A Cognitive Approach to Analysing Texts. Ba-
singstoke: Palgrave-MacMillan.
Holler, Judith and Katie Wilkin 2011. Co-speech gesture mimicry in the process of collaborative
referring during face-to-face dialogue. Journal of Nonverbal Behavior 35: 133–153.
Hutchins, Edwin 1995. Cognition in the Wild. Cambridge: Massachusetts Institute of Technology
Press.
Jakobson, Roman 1956. Two aspects of language and two types of aphasic disturbances. In: Linda
R. Waugh and Monique Monville-Burston (eds.), Roman Jakobson – On Language, 115–133. Cambridge, MA: Belknap Press of Harvard University Press.
Jakobson, Roman 1960. Linguistics and poetics. In: Krystyna Pomorska and Stephen Rudy (eds.),
Roman Jakobson – Language in Literature, 62–94. Cambridge, MA: Belknap Press of Harvard
University Press.
Jakobson, Roman 1966. Quest for the essence of language. In: Linda R. Waugh and Monique
Monville-Burston (eds.), Roman Jakobson – On Language, 407–421. Cambridge, MA: Belknap
Press of Harvard University Press.
Jakobson, Roman 1987. On the relation of auditory and visual signs. In: Krystyna Pomorska and
Stephen Rudy (eds.), Language in Literature, 467–473. Cambridge, MA: Belknap Press of Har-
vard University Press.
Jakobson, Roman and Krystyna Pomorska 1983. Dialogues. Cambridge: Massachusetts Institute of
Technology Press.
Jakobson, Roman and Linda R. Waugh 2002. The Sound Shape of Language. Berlin: De Gruyter
Mouton. First published [1979].
Johnson, Mark 1987. The Body in the Mind. The Bodily Basis of Meaning, Imagination, and Rea-
son. Chicago: University of Chicago Press.
Johnson, Mark 1992. Philosophical implications of cognitive semantics. Cognitive Linguistics 3:
345–366.
Johnson, Mark 2005. The philosophical significance of image schemas. In: Beate Hampe (ed.),
From Perception to Meaning: Image Schemas in Cognitive Linguistics, 15–33. Berlin: De Gruy-
ter Mouton.
Johnson, Mark 2007. The Meaning of the Body: Aesthetics of Human Understanding. Chicago: Chi-
cago University Press.
Kendon, Adam 2000. Language and gesture: Unity or duality. In: David McNeill (ed.), Language
and Gesture: Window into Thought and Action, 47–63. Cambridge: Cambridge University
Press.
Kendon, Adam 2004. Gesture: Visible Action as Utterance. Cambridge: Cambridge University
Press.
Kita, Sotaro 2003. Pointing: Where Language, Culture, and Cognition Meet. Mahwah, NJ: Lawrence
Erlbaum.
Kockelman, Paul 2005. The semiotic stance. Semiotica 157(1/4): 233–304.
Krois, John M., Mats Rosengren, Angela Steidele and Dirk Westerkamp (eds.) 2007. Embodiment
in Cognition and Culture. Amsterdam: John Benjamins.
Ladewig, Silva H. 2010. Beschreiben, suchen und auffordern – Varianten einer rekurrenten Geste. Sprache und Literatur 41(1): 89–111.
Ladewig, Silva H. 2011. Putting the cyclic gesture on a cognitive basis. CogniTextes 6. http://
cognitextes.revues.org/406.
Lakoff, George and Mark Johnson 1980. Metaphors We Live By. Chicago: Chicago University
Press.
Lakoff, George and Mark Johnson 1999. Philosophy in the Flesh. The Embodied Mind and Its
Challenge to Western Thought. New York: Basic Books.
Langacker, Ronald W. 1993. Reference-point constructions. Cognitive Linguistics 4: 1–38.
Lebaron, Curtis and Jürgen Streeck 2000. Gestures, knowledge and the world. In: David McNeill
(ed.), Language and Gesture, 118–138. Cambridge: Cambridge University Press.
Liddell, Scott 2003. Grammar, Gesture and Meaning in American Sign Language. Cambridge:
Cambridge University Press.
Mandel, Mark 1977. Iconic devices in American Sign Language. In: Lynn Friedman (ed.), On the
Other Hand. New Perspectives on American Sign Language, 57–107. New York: Academic
Press.
Mandler, Jean M. 1996. Preverbal representation and language. In: Paul Bloom, Mary A. Peter-
son, Lynn Nadel and Merrill F. Garrett (eds.), Language and Space, 365–384. Cambridge,
MA: Massachusetts Institute of Technology Press.
McNeill, David 1992. Hand and Mind: What Gestures Reveal about Thought. Chicago: Chicago
University Press.
McNeill, David 2005. Gesture and Thought. Chicago: Chicago University Press.
Merleau-Ponty, Maurice 1962. Phenomenology of Perception. New York: Humanities Press.
Mittelberg, Irene 2006. Metaphor and metonymy in language and gesture: Discourse evidence for
multimodal models of grammar. Ph.D. dissertation, Cornell University. Ann Arbor, MI: UMI.
Mittelberg, Irene 2007. Methodology for multimodality: One way of working with speech and ges-
ture data. In: Monica Gonzalez-Marquez, Irene Mittelberg, Seana Coulson and Michael Spivey
(eds.), Methods in Cognitive Linguistics, 225–248. Amsterdam: John Benjamins.
Mittelberg, Irene 2008. Peircean semiotics meets conceptual metaphor: Iconic modes in gestural
representations of grammar. In: Alan Cienki and Cornelia Müller (eds.), Metaphor and Gesture,
115–154. Amsterdam: John Benjamins.
Mittelberg, Irene 2010a. Geometric and image-schematic patterns in gesture space. In: Vyvyan
Evans and Paul Chilton (eds.), Language, Cognition, and Space: The State of the Art and
New Directions, 351–385. London: Equinox.
Mittelberg, Irene 2010b. Interne und externe Metonymie: Jakobsonsche Kontiguitätsbeziehungen
in redebegleitenden Gesten. Sprache und Literatur 41(1): 112–143.
Mittelberg, Irene and Linda R. Waugh 2009. Metonymy first, metaphor second: A cognitive-semi-
otic approach to multimodal figures of thought in co-speech gesture. In: Charles Forceville and
Eduardo Urios-Aparisi (eds.), Multimodal Metaphor, 329–356. Berlin: De Gruyter Mouton.
Müller, Cornelia 1998. Redebegleitende Gesten. Kulturgeschichte – Theorie – Sprachvergleich. Ber-
lin: Berlin Verlag.
Müller, Cornelia 2004. Forms and uses of the palm up open hand: A case of a gesture family? In:
Cornelia Müller and Roland Posner (eds.), The Semantics and Pragmatics of Everyday Gesture:
The Berlin Conference, 233–256. Berlin: Weidler Verlag.
Müller, Cornelia 2008. Metaphors: Dead and Alive, Sleeping and Waking. A Dynamic View. Chi-
cago: Chicago University Press.
Müller, Cornelia 2010. Wie Gesten bedeuten. Eine kognitiv-linguistische und sequenzanalytische
Perspektive. Sprache und Literatur 41: 37–68.
Müller, Cornelia and Alan Cienki 2009. Words, gestures and beyond: Forms of multimodal met-
aphor in the use of spoken language. In: Charles Forceville and Eduardo Urios-Aparisi (eds.),
Multimodal Metaphor, 297–328. Berlin: De Gruyter Mouton.
Núñez, Rafael 2008. A fresh look at the foundations of mathematics: Gesture and the psycholog-
ical reality of conceptual metaphor. In: Alan Cienki and Cornelia Müller (eds.), Metaphor and
Gesture, 93–114. Amsterdam: John Benjamins.
Panther, Klaus-Uwe and Linda L. Thornburg 2004. The role of conceptual metonymy in meaning
construction. Metaphorik.de 06: 91–113.
Peirce, Charles Sanders 1955. Logic as semiotic: The theory of signs. In: Justus Buchler (ed.), Philosophical Writings of Peirce, 98–119. New York: Dover.
Peirce, Charles Sanders 1960. Collected Papers of Charles Sanders Peirce (1931–1958). Vol. I.:
Principles of Philosophy, Vol. II: Elements of Logic, edited by Charles Hartshorne and Paul
Weiss. Cambridge, MA: Belknap Press of Harvard University Press.
Peirsman, Yves and Dirk Geeraerts 2006. Metonymy as a prototypical category. Cognitive Lin-
guistics 17(3): 269–316.
Radden, Günther and Zoltan Kövecses (eds.) 1999. Metonymy in Language and Thought. Amster-
dam: John Benjamins.
Rizzolatti, Giacomo and Laila Craighero 2004. The mirror-neuron system. Annual Review of Neuroscience 27: 169–192.
Saussure, Ferdinand de 1986. Course in General Linguistics. 3rd edition. Translated by Roy Harris.
Chicago: Open Court.
Shapiro, Michael 1983. The Sense of Grammar: Language as Semeiotic. Bloomington: Indiana
University Press.
Skipper, Jeremy I., Susan Goldin-Meadow, Howard C. Nusbaum and Steven Small 2009. Gestures
orchestrate brain networks for language understanding. Current Biology 19: 661–667.
Sonesson, Göran 2007. The extensions of man revisited: From primary to tertiary embodiment. In:
John M. Krois, Mats Rosengren, Angela Steidele and Dirk Westerkamp (eds.), Embodiment in
Cognition and Culture, 27–53. Amsterdam: John Benjamins.
Streeck, Jürgen 2009. Gesturecraft: The Manu-facture of Meaning. Amsterdam: John Benjamins.
Sweetser, Eve E. 1990. From Etymology to Pragmatics: Metaphorical and Cultural Aspects of
Semantic Structure. Cambridge: Cambridge University Press.
Sweetser, Eve E. 1998. Regular metaphoricity in gesture: Bodily-based models of speech interac-
tion. Actes du 16e Congrès International des Linguistes (CD-ROM), Elsevier.
Sweetser, Eve E. 2007. Looking at space to study mental spaces: Co-speech gesture as a crucial
data source in cognitive linguistics. In: Monica Gonzalez-Marquez, Irene Mittelberg, Seana
Coulson and Michael Spivey (eds.), Methods in Cognitive Linguistics, 202–224. Amsterdam:
John Benjamins.
Sweetser, Eve E. 2012. Viewpoint and perspective in language and gesture. Introduction to Bar-
bara Dancygier and Eve Sweetser (eds.), Viewpoint in Language: A Multimodal Perspective,
1–22. Cambridge: Cambridge University Press.
Talmy, Leonard 1988. Force dynamics in language and cognition. Cognitive Science 12: 49–100.
Talmy, Leonard 2000. Toward a Cognitive Semantics. Volume 1 and 2. Cambridge: Massachusetts
Institute of Technology Press.
Taub, Sarah 2001. Language from the Body: Iconicity and Metaphor in American Sign Language.
Cambridge: Cambridge University Press.
Turner, Mark (ed.) 2006. The Artful Mind: Cognitive Science and the Riddle of Human Creativity.
New York: Oxford University Press.
Varela, Francisco, Evan Thompson and Eleanor Rosch 1992. The Embodied Mind: Human Cog-
nition and Experience. Cambridge: Massachusetts Institute of Technology Press.
Waugh, Linda R. 1976. Roman Jakobson’s Science of Language. Lisse: Peter de Ridder.
Waugh, Linda R. 1994. Degrees of iconicity in the lexicon. Journal of Pragmatics 22(1): 55–70.
Waugh, Linda R. and Monique Monville-Burston 1990. Introduction: The life, work and influ-
ence of Roman Jakobson. In: Linda R. Waugh and Monique Monville Burston (eds.),
Roman Jakobson – On Language, 1–45. Cambridge, MA: Belknap Press of Harvard Univer-
sity Press.
Wilcox, Phyllis P. 2004. A cognitive key: Metonymic and metaphorical mappings in ASL. Cogni-
tive Linguistics 15(2): 197–222.
Wilcox, Sherman 2004. Cognitive iconicity: Conceptual spaces, meaning, and gesture in signed lan-
guages. Cognitive Linguistics 15(2): 119–147.
Williams, Robert F. 2008. Gesture as a conceptual mapping tool. In: Alan Cienki and Cornelia
Müller (eds.), Metaphor and Gesture, 55–92. Amsterdam: John Benjamins.
Zlatev, Jordan 2005. What’s in a schema? Bodily mimesis and the grounding of language. In: Beate
Hampe (ed.), From Perception to Meaning: Images Schemas in Cognitive Linguistics, 313–342.
Berlin: De Gruyter Mouton.
Zlatev, Jordan, Timothy P. Racine, Chris Sinha and Esa Itkonen (eds.) 2008. The Shared Mind:
Perspectives on Intersubjectivity. Amsterdam: John Benjamins.

Irene Mittelberg, Aachen (Germany)


48. Articulation as gesture: Gesture and the nature of language
1. Introduction
2. Language as performance
3. Signed languages and gesture
4. From gesture to language
5. Cognitive neuroscience and the evolution of brain and language
6. Gesture and the nature of language
7. References

Abstract
This chapter describes an approach to the study of spoken and signed languages, as well
as communicative gestures, as the production and perception of coordinated movements,
or articulatory gestures. Such an approach derives from usage-based theories of language
as performance, and proposes that dynamic system theory can serve as the framework for
understanding language and cognition as action. One implication is a new understanding
of the relation between signed languages and gesture as systems of articulatory gesture.
The articulation as gesture framework also has implications for neuroscience, supporting
the idea that cognition and perception must be intimately linked in an embodied mind,
and suggesting that thinking is itself the evolutionary internalization of movement.

1. Introduction
In order to communicate, animals must create perceptible signals. For human commu-
nication, the predominant way in which signals are produced is by moving parts of our
bodies. For speech, the means of production is predominantly restricted to the vocal
tract. For signed languages and gestural communication, much more of the body is
used, including the hands, face, and body postures.
As Neisser (1967: 156) observed:

To speak is to make finely controlled movements in certain parts of your body, with the
result that information about these movements is broadcast to the environment. For this
reason, the movements of speech are sometimes called articulatory gestures. A person
who perceives speech, then, is picking up information about a certain class of real, physical,
tangible (as we shall see) events that are occurring in someone’s mouth.

This view of speech has given rise to a radically new way of conceptualizing language,
specifically phonology, most succinctly explained by Browman and Goldstein (1985: 35):

Much linguistic phonetic research has attempted to characterize phonetic units in terms of
measurable physical parameters or features. Basic to these approaches is the view that
phonetic description consists of a linear sequence of static physical measures, either articu-
latory configurations or acoustic parameters. The course of movement from one such con-
figuration to another has been viewed as secondary. We have proposed an alternative
approach, one that characterizes phonetic structure as patterns of articulatory movement,
or gestures, rather than static configurations. While the traditional approaches have viewed
the continuous movement of vocal-tract articulators over time as “noise” that tends to
obscure the segment-like structure of speech, we have argued that setting out to character-
ize articulatory movement directly leads not to noise but to organized spatiotemporal
structures that can be used as the basis for phonological generalizations as well as accurate
physical description. In our view, then, a phonetic representation is a characterization of
how a physical system (e.g., a vocal tract) changes over time.

This view led to the development of a branch of linguistics called articulatory phonology
(Browman and Goldstein 1986, 1990, 1992).

2. Language as performance
Under this view, the basic units of speech are articulatory gestures, where gesture is defined as a functional unit, an equivalence class of coordinated movements that achieve
some end (Studdert-Kennedy 1987: 77). These functionally-defined ends, or tasks, are
modeled in terms of task dynamics (Hawkins 1992). The vocal tract, for example, is
used not only for producing speech: the speech organs serve not just for talking but also for chewing, singing, eating, or biting. Each of these activities is a task. In modeling
speech, the task may be the formation of a constriction such as bilabial closure; this task
involves the coordinated action of several articulators, such as the lower lip, upper lip,
and jaw.
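The modeling step just described can be made concrete with a small numerical sketch. In task-dynamic models, a gesture is standardly treated as a point attractor: a critically damped mass-spring system drives a tract variable (here, lip aperture for a bilabial closure) toward its target. The sketch below illustrates this general idea with made-up parameter values; it is a minimal idealization, not an implementation of any published model.

```python
# Minimal point-attractor sketch of one articulatory gesture (task dynamics).
# A tract variable x (e.g., lip aperture in mm) is driven toward a target x0
# by a critically damped mass-spring system: x'' = -k*(x - x0) - b*x'.
# Parameter values are illustrative, not empirically fitted.

def simulate_gesture(x_start, x_target, k=100.0, dt=0.001, steps=400):
    b = 2 * k ** 0.5          # critical damping: no overshoot past the target
    x, v = x_start, 0.0
    trajectory = [x]
    for _ in range(steps):
        a = -k * (x - x_target) - b * v   # restoring force toward the target
        v += a * dt
        x += v * dt
        trajectory.append(x)
    return trajectory

# Bilabial closure: lip aperture moves from 10 mm toward 0 mm.
traj = simulate_gesture(x_start=10.0, x_target=0.0)
# Aperture shrinks monotonically toward the 0 mm target without overshoot.
print(round(traj[0], 2), round(traj[-1], 2))
```

On this picture, a global slowing of a movement – of the kind invoked in descriptions of phrase-final lengthening – corresponds simply to lowering the stiffness `k` while the task (the target) stays the same.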
The articulatory phonology framework has significance for language and gesture that
goes far beyond describing speech tasks. Other articulators may be modeled in this way
as well. For example, the arm and hand can be used to reach for a cup, scratch your
head, or to produce a sign or gesture. Whether for speech, sign, gesture, or motor activ-
ities unrelated to communication, tasks require the coordinated action of multiple ar-
ticulators moving appropriately in time and space. These coordinated actions, called
coordinative structures, are not hardwired; rather, they are emergent structures in a
dynamically changing system.
Another significant aspect of this framework is that it unifies description over levels
that in other theories are seen as distinct. For example, formalist theories such as
Chomsky’s minimalist program assumes that universal grammar “specifies certain lin-
guistic levels, each a symbolic system” (1995: 167). One such level is a computational
system that generates structural descriptions; these structural descriptions are in turn
seen as instructions that are fed into another level, the articulatory-perceptual
performance system, which specifies how the expression is to be articulated.
Rather than viewing the units of language – whether they are phonemes, syllables,
morphemes, words or formalist structural descriptions – as timeless and non-physical
(i.e., mental) units which must be passed to and implemented in a performance system,
the dynamic view defines language “in a unitary way across both abstract ‘planning’ and
concrete articulatory ‘production’ levels” (Kelso, Saltzman, and Tuller 1986: 31). Thus,
the distinction between competence and performance, which plays such a large theoret-
ical role in generative linguistics, is collapsed into a single system described not in the
machine vocabulary of mental programs and computational systems, but in terms of a
“fluid, organic system with certain thermodynamic properties” (Thelen and Smith 1994:
xix). As Thelen and Smith go on to observe, the distinction between competence and
performance does not make biological sense: “Abstract formal constraints are fine for
disembodied logical systems. But people are biological entities; they are embedded, liv-
ing process. If competence in the Chomskyan sense is part of our biology, then it must
also be embodied in living, real-time process” (Thelen and Smith 1994: 27). The artic-
ulation as gesture framework provides the theoretical basis for describing language as
a dynamic, real-time process.

3. Signed languages and gesture


Originally developed for understanding speech as a dynamic, physical system, the artic-
ulation as gesture framework obviously has relevance for signed languages. One of the
first applications of the framework to signed language was reported by Wilcox (1992),
who relied on the principles of articulatory phonology to describe fingerspelling. Like
speech, fingerspelling may also be characterized as a linear sequence of static physical measures – the 26 handshape configurations (all static, with the exception of J and Z) of the
fingerspelled alphabet. In production, however, fingerspelling is never static. Finger-
spelled words are dynamically changing, spatiotemporal structures. Fluent fingerspell-
ing is often highly coarticulated, meaning that canonical hand configurations may
appear only rarely; rather, configurations are influenced by preceding and following
configurations, rate of fingerspelling, careful vs. casual style, and other factors. The
dynamic organization of fingerspelling is also seen in second language acquisition in
the emergence of successive levels of coordinative structures. At first, fingerspellers
attempt to control each individual articulator – fingers, thumb, hand orientation, arm
position – as they produce a single fingerspelled letter. At a later stage, the letters
become coordinative structures, and the learners develop new coordinative structures
at a higher level, that of combinations of letters such as “fi”, “sh”, or “ing”. Eventually,
a new level of coordinative structure is formed and entire words become functional
units.
Wilcox collected kinematic data using motion tracking markers placed on three ar-
ticulators, the thumb, index finger, and hand as deaf signers fingerspelled various simple
words and letter combinations. The data were then divided into fluent and non-fluent
groups. The results showed that the fluent productions could be characterized by coor-
dinative structures across a group of articulators, such that the group acted as a func-
tional unit. In the non-fluent productions, these coordinative structures broke down,
each articulator acting independently.
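The contrast reported here – a group of articulators acting as a functional unit versus each articulator acting on its own – can be operationalized as the degree to which the articulators' trajectories covary. The sketch below illustrates one such coordination measure, the mean pairwise Pearson correlation, on hypothetical position traces; it is meant only to convey the general idea, not to reproduce the analysis used in the fingerspelling study.

```python
# Illustrative coordination measure for articulator trajectories:
# mean pairwise Pearson correlation. Data are hypothetical; this is a
# sketch of the idea, not the fingerspelling study's actual analysis.
import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def mean_pairwise_correlation(trajectories):
    pairs = [(i, j) for i in range(len(trajectories))
             for j in range(i + 1, len(trajectories))]
    return sum(pearson(trajectories[i], trajectories[j])
               for i, j in pairs) / len(pairs)

# Hypothetical position traces for thumb, index finger, and hand.
t = range(50)
coordinated = [[math.sin(0.2 * i) for i in t],        # articulators moving
               [0.8 * math.sin(0.2 * i) for i in t],  # together in time,
               [1.2 * math.sin(0.2 * i) for i in t]]  # as one functional unit
independent = [[math.sin(0.2 * i) for i in t],
               [math.cos(0.31 * i) for i in t],       # each articulator on
               [math.sin(0.57 * i + 1.0) for i in t]] # its own schedule

print(round(mean_pairwise_correlation(coordinated), 2))  # 1.0: one unit
print(round(mean_pairwise_correlation(independent), 2))  # much closer to 0
```

High covariation would correspond to a coordinative structure (the fluent productions), while low covariation would correspond to articulators acting independently (the non-fluent productions).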
The articulatory phonology framework also has been used to analyze signed lan-
guage production. Tyrone and her colleagues (Tyrone et al. 2010) examined prosodic
lengthening at phrase boundaries in American Sign Language (ASL). Kinematic data
were collected as American Sign Language users produced target signs with movements
toward or away from the body, in phrase-initial, medial, or final position. Their findings
suggested that signs are lengthened at phrase boundaries by a slowing of the sign’s
entire release gesture, analogous to the type of boundary-adjacent slowing that occurs
in speech. This result is consistent with the predictions of a task-dynamic model.
More broadly, the articulation as gesture framework offers a new perspective on
how signed and spoken languages may be unified under one theoretical model. Two
solutions to this unification problem are possible: an abstractionist solution, and an
embodied solution.
The abstractionist solution removes all traces of language as a physical system and
instead views language as a formal system of abstract rules. This solution strips away
the performance of language by means of vocal tracts, hands, faces, and the anatomy
and musculature that controls these articulators. The question of how these non-
physical and timeless mental units are implemented in a physical production system
is left largely unanswered. Likewise, perceptual systems play no part in understanding
language from this perspective because the physical execution of language is removed
from consideration under the abstractionist solution.
The embodied solution claims that language, whether spoken or signed, and indeed
all communication, is made possible because we have physical bodies which we move to
produce signals. In this view, language is the performance of a physical system. Whether
spoken or signed, languages are systems of meaningful movement. While the percep-
tible signals are distinct – acoustic for speech and optical for signs – the distal events
are the same: spoken and signed languages are produced by dynamically organized
articulatory gestures.
Finally, just as the articulation as gesture framework offers a way to unify descrip-
tions across spoken and signed languages, it also extends to the description of the articu-
latory movements that constitute meaningful gestures of the type described by gesture
researchers (Cienki and Müller 2008; Kendon 2004; McNeill 2005). This unification of
three communication systems – speech, sign, and gesture – has significant implications
for the historical development of signed languages, for our understanding of the cog-
nitive and neural underpinnings of language and gesture, and for the evolution of
language.

4. From gesture to language


Historically, signed languages were considered to be “mere gesture” having none of the
features of language. During the infamous 1880 Milan conference, which pitted advo-
cates of signing against oralists who wanted to abolish signed languages and permit
only speech in the education of deaf children, Marius Magnat, a confirmed oralist,
proclaimed:

It is doubtful that sign can engender thought. It is concrete. It is not truly connected with
feeling and thought. (…) It lacks precision. (…) Sign cannot convey number, gender, per-
son, time, nouns, verbs, adverbs, adjectives, he claims. (…) It does not allow [the teacher] to
raise the deaf-mute above his sensations. (…) Since signs strike the senses materially they
cannot elicit reasoning, reflection, generalization, and above all abstraction as powerfully
as can speech. (Lane 1984: 388)

As linguists began to study and understand signed languages as true human language,
the pendulum began to swing in the opposite direction. Signed languages were seen
as categorically distinct from gesture.
The articulation as gesture framework suggests that a third option is possible: non-
linguistic, everyday gestures may become incorporated into the linguistic system of
signed languages through the process known as grammaticalization (Bybee, Perkins,
and Pagliuca 1994; Heine, Claudi, and Hünnemeyer 1991; Heine and Kuteva 2007).
In a series of publications, Wilcox (2004, 2009; Wilcox and Wilcox 2010) has described
two routes by which this takes place.
48. Articulation as gesture: Gesture and the nature of language 789

In one route, a manual gesture enters a signed language as a lexical sign and then devel-
ops into a grammatical morpheme. For example, Janzen and Shaffer (2002) have suggested
that a gesture signaling departure that has been in common use in the Mediterranean
region for centuries was incorporated into French Sign Language as the lexical sign PAR-
TIR “to depart”. This sign appears as a lexical sign meaning “to leave” in American Sign
Language at the turn of the 20th century, but also as a grammatical sign meaning “future”.
Similarly, modal forms in American Sign Language and in Catalan Sign Language can be
traced to gestures that were incorporated into the language and then grammaticalized
from lexical to grammatical forms (Wilcox and Wilcox 1995; Wilcox 2004).
In the second route, the source is not the manual gesture itself; rather, it is the way
that the gesture is produced, its quality or manner of movement, as well as various
facial, mouth, and eye gestures that may accompany a manual gesture or sign. Upon
entering the linguistic system, these manner of movement and facial gestures follow
a developmental path from prosody/intonation to grammatical marker. For example,
gestural manner of movement is incorporated into Italian Sign Language (LIS) in the
expression of modal strength signifying various degrees of impossibility (Wilcox, Ros-
sini, and Antinoro Pizzuto 2010). Pizzuto (1987) observed that verb aspect is expressed
in Italian Sign Language through systematic alterations of the verb’s movement pattern;
for example, “suddenness” is expressed by means of a tense, fast, short movement,
while a verb produced with an elongated, elliptical, large and slow movement marks an action that takes place repeatedly over time. Verb aspect
in American Sign Language is also marked by changes to the temporal dynamics of a
verb’s movement (Klima and Bellugi 1979).
The significance of this work is twofold. First, it suggests that gesture and language
may not be as distinct as previous linguistic theories suggested. A model that views language and gesture as physical systems is able to account for this more unified perspective. Second, it suggests that the distinction between linguistic levels of analysis – phonetics (prosody/intonation) and grammar – is also better seen as a continuum. If
this is true, then the dynamic approach to language as a physical system applies not
just to phonetics and phonology but to grammar as well.

5. Cognitive neuroscience and the evolution of brain and language


The articulation as gesture framework also has profound implications for cognitive neuroscience. Whereas the abstractionist solution trivializes the role of perception in language and grammar, the articulation as gesture framework, by grounding language and gesture in physical systems, suggests that cognition and perception must be intimately
linked. In this view, “what we perceive is determined by what we do,” and perception is
seen as a type of skillful bodily activity (Noë 2004: 1). The same dynamic models that
account for the emergence of coordinative structures in skilled movement, such as fluent
fingerspelling, speech, sign, or gesture, can be recruited to explain cognition. Under this
framework, then, “cognition – mental life – and action – the life of the limbs – are like the
emergent structure of other natural phenomena” (Thelen and Smith 1994: xix).
This view is consistent with several current theories of brain phylogenetic and
ontogenetic development and function. Berthoz (2000: 9), for example, observes that
“perception is more than just the interpretation of sensory messages. Perception is
790 IV. Contemporary approaches

constrained by action; it is an internal simulation of action.” In his view, the highest cognitive functions are the evolutionary result of the brain’s ability to skillfully plan movements to meet the needs of future events.
The theory of neuronal group selection developed by Gerald Edelman is also consis-
tent with the general principles of articulation as gesture. Edelman (1987: 227) defines
gesture as a “degenerate set of all those coordinated motions that can produce a partic-
ular pattern that is adaptive in a phenotype.” This view of gesture plays a significant role
in Edelman’s theory in two ways. First, it ties motor activity to perception. Second,
Edelman (1989) claims that the brain bases of gestural ordering played a significant
role in the evolutionary emergence of language.
The coordination and planning of movement also plays a key role in the theory of
the evolution of the brain and language advanced by Rodolfo Llinás. According to
Llinás (2001: 17), “the evolutionary development of a nervous system is an exclusive
property of actively moving creatures.” One of the problems that the brain had to
solve, according to Llinás, was the dimensionality problem – the need to limit the de-
grees of freedom for any given movement. One solution to this problem is the develop-
ment of a time series of muscle collectives that are successful in achieving some
purpose. These muscle collectives are “activated by a single command and controlled
as a single, functional entity” (Llinás 2001: 35). In other words, muscle collectives or
synergies are, in the articulation as gesture vocabulary, coordinative structures.
Again, the significance is twofold. First, Llinás links the control of movement to the
development of higher cognitive functioning and the mind: “that which we call thinking
is the evolutionary internalization of movement” (Llinás 2001: 35). Second, Llinás also
ties the significance of movement to the emergence of language. He does this in two
ways. First, he claims that the coordinated movements, or gestures, required for speaking are no different from other motor actions, noting that “the premotor events leading
to expression of language are in every way the same as those premotor events that pre-
cede any movement that is executed for a purpose” (Llinás 2001: 228). Second, Llinás
ties the evolutionary emergence of language even more closely to articulation as ges-
ture through a rather broad conception of prosody, which he defines as “a more general-
ized form of motor behavior, an outward gesturing of an internal state, an outward
expression of a centrally generated abstraction that means something to another animal”
(Llinás 2001: 229, italics in original).

6. Gesture and the nature of language


The articulation as gesture framework, specifically articulatory phonology, was developed as a way to characterize speech as a physical system of articulatory movement. Its theoretical underpinnings in dynamic systems theory, however, give the framework much broader application. The same dynamic principles that are used to model speech as a physical system that changes over time can also be recruited to characterize skilled activities (Saltzman and Kelso 1987; Swinnen and Wenderoth 2004), cognitive development (Smith 2005), neural development and function (Kelso 1995), primate communication (King 2004), and the evolution of language (Armstrong, Stokoe, and Wilcox 1995; Armstrong and Wilcox 2007). Indeed, it seems that the production and perception of movement have much more to teach us about the nature of language, communication, and the mind than has been previously appreciated.

7. References
Armstrong, David F., William C. Stokoe and Sherman Wilcox 1995. Gesture and the Nature of
Language. Cambridge: Cambridge University Press.
Armstrong, David F. and Sherman Wilcox 2007. The Gestural Origin of Language. Oxford: Oxford
University Press.
Berthoz, Alain 2000. The Brain’s Sense of Movement. Cambridge, MA: Harvard University Press.
Browman, Catherine P. and Louis M. Goldstein 1985. Dynamic modeling of phonetic structure. In:
Victoria A. Fromkin (ed.), Phonetic Linguistics, 35–53. New York: Academic Press.
Browman, Catherine P. and Louis M. Goldstein 1986. Towards an articulatory phonology. Phonol-
ogy Yearbook 3: 219–252.
Browman, Catherine P. and Louis M. Goldstein 1990. Representation and reality: Physical systems
and phonological structure. Journal of Phonetics 18(3): 411–424.
Browman, Catherine P. and Louis M. Goldstein 1992. Articulatory phonology. Phonetica 49:
155–180.
Bybee, Joan, Revere Perkins and William Pagliuca 1994. The Evolution of Grammar: Tense,
Aspect, and Modality in the Languages of the World. Chicago: University of Chicago Press.
Chomsky, Noam 1995. The Minimalist Program. Cambridge: Massachusetts Institute of Technol-
ogy Press.
Cienki, Alan J. and Cornelia Müller (eds.) 2008. Metaphor and Gesture. Amsterdam: John
Benjamins.
Edelman, Gerald M. 1987. Neural Darwinism: The Theory of Neuronal Group Selection. New
York: Basic Books.
Edelman, Gerald M. 1989. The Remembered Present: A Biological Theory of Consciousness. New
York: Basic Books.
Hawkins, Sarah 1992. An introduction to task dynamics. In: Gerard J. Docherty and Robert D.
Ladd (eds.), Papers in Laboratory Phonology II: Gesture, Segment, Prosody, 9–25. Cambridge:
Cambridge University Press.
Heine, Bernd, Ulrike Claudi and Friederike Hünnemeyer 1991. Grammaticalization: A Concep-
tual Framework. Chicago: University of Chicago Press.
Heine, Bernd and Tania Kuteva 2007. The Genesis of Grammar: A Reconstruction. Oxford:
Oxford University Press.
Janzen, Terry and Barbara Shaffer 2002. Gesture as the substrate in the process of ASL gram-
maticization. In: Richard Meier, David Quinto and Kearsy Cormier (eds.), Modality and
Structure in Signed and Spoken Languages, 199–223. Cambridge: Cambridge University
Press.
Kelso, J. A. Scott 1995. Dynamic Patterns: The Self-Organization of Brain and Behavior. Cam-
bridge: Massachusetts Institute of Technology Press.
Kelso, J. A. Scott, Elliot L. Saltzman and Betty Tuller 1986. The dynamical perspective on speech
production: Data and theory. Journal of Phonetics 14: 29–59.
Kendon, Adam 2004. Gesture: Visible Action as Utterance. Cambridge: Cambridge University
Press.
King, Barbara J. 2004. The Dynamic Dance: Nonvocal Communication in African Great Apes.
Cambridge, MA: Harvard University Press.
Klima, Edward and Ursula Bellugi 1979. The Signs of Language. Cambridge, MA: Harvard Uni-
versity Press.
Lane, Harlan 1984. When the Mind Hears: A History of the Deaf. New York: Random House.
Llinás, Rodolfo 2001. I of the Vortex: From Neurons to Self. Cambridge: Massachusetts Institute of
Technology Press.
McNeill, David 2005. Gesture and Thought. Chicago: University of Chicago Press.
Neisser, Ulrich 1967. Cognitive Psychology. New York: Appleton-Century-Crofts.
Noë, Alva 2004. Action in Perception. Cambridge: Massachusetts Institute of Technology Press.
Pizzuto, Elena 1987. Aspetti morfosintattici. In: Virginia Volterra (ed.), La Lingua Italiana dei
Segni – La Comunicazione Visivo-Gestuale dei Sordi, 179–209. Bologna: Il Mulino.
Saltzman, Elliot and J. A. Scott Kelso 1987. Skilled actions: A task-dynamic approach. Psychological Review 94(1): 84–106.
Smith, Linda B. 2005. Cognition as a dynamic system: Principles from embodiment. Developmen-
tal Review 25(3–4): 278–298.
Studdert-Kennedy, Michael 1987. The phoneme as a perceptuomotor structure. In: Alan Allport,
Donald G. Mackay, Wolfgang Prinz and Eckart Scheerer (eds.), Language Perception and Pro-
duction: Relationships between Listening, Speaking, Reading, and Writing, 67–84. London:
Academic Press.
Swinnen, Stephan P. and Nicole Wenderoth 2004. Two hands, one brain: Cognitive neuroscience of
bimanual skill. Trends in Cognitive Sciences 8(1): 18–25.
Thelen, Esther and Linda B. Smith 1994. A Dynamic Systems Approach to the Development of
Cognition and Action. Cambridge: Massachusetts Institute of Technology Press.
Tyrone, Martha E., Hosung Nam, Elliot Saltzman, Gaurav Mathur and Louis Goldstein 2010. Prosody and movement in American Sign Language: A task-dynamics approach. Proceedings of the 5th Speech Prosody International Conference, Chicago.
Wilcox, Sherman 1992. The Phonetics of Fingerspelling. Amsterdam: John Benjamins.
Wilcox, Sherman 2004. Gesture and language: Cross-linguistic and historical data from signed lan-
guages. Gesture 4(1): 43–75.
Wilcox, Sherman 2009. Symbol and symptom: Routes from gesture to signed language. Annual
Review of Cognitive Linguistics 7: 89–110.
Wilcox, Sherman, Paolo Rossini and Elena Antinoro Pizzuto 2010. Grammaticalization in sign
languages. In: Diane Brentari (ed.), Sign Languages, 332–354. Cambridge: Cambridge Univer-
sity Press.
Wilcox, Sherman and Phyllis Wilcox 1995. The gestural expression of modality in American Sign
Language. In: Joan Bybee and Suzanne Fleischman (eds.), Modality in Grammar and Dis-
course, 135–162. Amsterdam/Philadelphia: John Benjamins.
Wilcox, Sherman and Phyllis Wilcox 2010. The analysis of signed languages. In: Bernd Heine and
Heiko Narrog (eds.), The Oxford Handbook of Linguistic Analysis, 739–760. Oxford: Oxford
University Press.

Sherman Wilcox, Albuquerque, NM (USA)

49. How our gestures help us learn


1. Introduction
2. Gesture is associated with learning and reflects knowledge
3. Gesture can play a causal role in changing knowledge
4. How does gesture change knowledge?
5. References

Abstract
The gestures we produce when we talk reflect our thoughts. But the hand movements that
accompany speech can do more – they can change the way we think. Gesture can bring
about cognitive change through its effects on the learning environment – learners’
gestures signal to their communication partners that they are ready to learn a particular
task and their partners alter their input accordingly. Gesture can also bring about change
directly through its effects on learners themselves – learners’ gestures can bring new implicit knowledge into their repertoires, bring action information into their mental
representations, and lighten their cognitive load. Whatever the mechanism, it is clear that
the way we move our hands when we speak is not mere handwaving – our hands can
change how we think and, as such, have the potential to be harnessed in teaching and
learning situations.

1. Introduction
People move their hands when they talk – they gesture. Even congenitally blind speak-
ers who have never seen anyone gesture move their hands when they speak (Iverson
and Goldin-Meadow 1998). Although the gestures that accompany speech might, at
times, appear to be meaningless movements, they are not mere handwaving. Gestures
are synchronized, both semantically and temporally, with the words they accompany
(Kendon 1980; McNeill 1992) and, in this sense, form an integrated system with speech.
The goal of this chapter is to demonstrate that the relation gesture holds to speech
has cognitive implications. Learners who are on the verge of making progress on a task
gesture differently from learners who are not as far along in their thinking – they pro-
duce gestures that convey information that is different from the information they
convey in their speech (Goldin-Meadow 2003). These so-called “gesture-speech mis-
matches” are a signal that the learner is ready to learn that particular task and, in
fact, learners who produce gesture-speech mismatches show greater gains from instruc-
tion on the task than children who produce only matches (Church and Goldin-Meadow
1986; Perry, Church, and Goldin-Meadow 1988; Pine, Lufkin, and Messer 2004). Ges-
tures, when evaluated in relation to the speech they accompany, thus reflect a learner’s
cognitive state.
There is, in addition, recent work suggesting that gesture can do more than reflect
the state of a learner’s knowledge – it can act as a catalyst for change and, as such,
play a causal role in learning. We begin with a look at the evidence suggesting that ges-
ture is associated with learning and reflects what learners know. We then turn to new
findings demonstrating that gesture can play an active role in learning, and we explore
the mechanisms by which gesture has its effects.

2. Gesture is associated with learning and reflects knowledge


We know that adults’ gestures can provide insight into their problem-solving strategies.
For example, when adults produce a strategy in gesture that reinforces the strategy
mentioned in speech, they are particularly likely to include that strategy in their
plans for solving the problem – more likely than if their gestures do not reinforce the
strategy mentioned in speech (Alibali et al. 1999). Gesture thus reflects ideas that the
adult has about a problem but does not say, ideas that will have an impact on how
the problem is solved. But does gesture predict the acquisition of new knowledge,
ideas that are not yet in the learner’s repertoire? There is evidence that it does and
in a variety of learning situations. We consider two in the next sections, learning
language and learning math.

2.1. Gesture predicts changes in language learning


Children often enter language hands first, producing gestures before they produce their
first words (Bates 1976; Bates et al. 1979; Petitto 1988). Gesture thus extends the range
of ideas a young child is able to express. This fact opens up the possibility that gesture
serves a facilitating function for language learning. If it does, changes in gesture should
not only predate changes in language, they should also predict them. And they do –
both for words and for sentences. In terms of words, the more a child gestures early
on, the larger the child’s vocabulary later in development (Acredolo and Goodwyn
1988; Rowe and Goldin-Meadow 2009a). We can even predict which lexical items
will enter a child’s verbal vocabulary by looking at the objects that child indicated in
gesture several months earlier (Iverson and Goldin-Meadow 2005). In terms of sen-
tences, we can predict when a child will enter the two-word stage by looking at the
child’s gesture-speech combinations. The age at which children first produce combina-
tions in which gesture conveys one idea and speech another (point at cup + “mommy”)
reliably predicts the age at which they first produce two-word utterances (“mommy’s
cup”; Goldin-Meadow and Butcher 2003; Iverson et al. 2008; Iverson and Goldin-
Meadow 2005; see also Özçaliskan and Goldin-Meadow 2005).
Gesture thus forecasts the earliest stages of language learning. It might do so
because gesture use is an early index of global communicative skill. If so, children
who convey a large number of different meanings in their early gestures might be gen-
erally verbally facile and therefore not only have large vocabularies later in develop-
ment but also produce relatively complex sentences. Alternatively, particular types of
early gesture use could be specifically related to particular aspects of later language
use. In fact, we find that gesture selectively predicts later language learning. The number
of different meanings children convey in gesture at 18 months predicts their spoken
vocabulary at 42 months, but the number of gesture-speech combinations they produce
at 18 months does not. In contrast, the number of gesture-speech combinations, partic-
ularly those conveying sentence-like ideas, children produce at 18 months predicts sen-
tence complexity at 42 months, but the number of meanings they convey in gesture at
18 months does not (Rowe and Goldin-Meadow 2009b). We can thus predict particular
language milestones by watching the particular ways in which children move their
hands two years earlier.

2.2. Gesture predicts changes in math learning


Once the fundamentals of language have been mastered, gesture could lose its cognitive
potency. But, in fact, gesture seems to maintain its ability to predict learning, but now
with respect to other (non-language) domains. We find, for example, that the way chil-
dren explain their solutions to a math problem predicts whether they will profit from
instruction in that problem. Consider a 9-to-10-year old child who solves problems
such as 5+3+6=__+6 incorrectly but justifies her incorrect solution by producing ges-
tures that convey a different problem-solving strategy from her speech (e.g., saying,
“I added the 5, the 3, and the 6, and put 14 in the blank,” an add-to-equal-sign strategy,
while pointing at the 5, the 3, the 6 on the left side of the equation, and the 6 on the
right side of the equation, an add-all-numbers strategy). This child will be more likely
to profit from instruction in mathematical equivalence than a child who justifies his
incorrect answer by producing gestures that convey the same information as his speech
(e.g., saying, “I added the 5, the 3, and the 6, and put 14 in the blank,” while pointing at
the 5, the 3, and the 6 on the left side of the equation, i.e., producing an add-to-equal-
sign strategy in both speech and gesture; Perry, Church, and Goldin-Meadow 1988;
Alibali and Goldin-Meadow 1993).
This phenomenon turns out to be a general one, found not only in 9- and 10-year-olds learning math problems and in toddlers learning to produce two-word sentences, but also in 5- to 8-year-olds learning to solve conservation problems (Church and Goldin-Meadow 1986); in 5- to 6-year-olds learning to mentally rotate objects (Ehrlich, Levine, and Goldin-Meadow 2006); in 5-year-olds learning to balance blocks on a beam (Pine, Lufkin, and Messer 2004); and in adults learning how gears work (Perry and Elder 1997).

3. Gesture can play a causal role in changing knowledge


Gesture can thus mark learners as being ready to change their knowledge. But evidence
is mounting that gesture not only reflects readiness for knowledge change but also plays
a role in bringing that change about. Gesture can play a role in causing change in two
non-mutually exclusive ways: (1) The gestures learners produce contain information
about whether they are ready to learn a task. If their communication partner is able
to “read” those gestures, the partner could then provide input that facilitates learning.
In other words, learners’ gestures could change the learning environment. (2) The ges-
tures learners produce could also have a direct effect on what they are learning. Learn-
ers’ gestures could change the learners themselves. We review evidence for both types
of effects in the next sections.

3.1. A learner’s gestures can change the learning environment


Consider a child who does not yet know the word “dog” and refers to the animal by
pointing at it. If mother and child are engaged in an episode of joint attention, mother
is likely to respond, “yes, that’s a dog,” thus supplying the child with just the word he is
looking for. Or consider a child who points at her father’s scarf while saying the word
“dada.” Her father may reply, “that’s dada’s scarf,” thus translating the child’s gesture-
speech combination into a simple sentence. Because they are immediate responses to
the child’s gestures and therefore finely-tuned to the child’s current state, parental re-
sponses of this sort could be particularly effective in teaching children how an idea is
expressed in the language they are learning.
If child gesture is playing this type of role in language learning, mothers’ translations
ought to be related to later word and sentence learning in their children. There is evi-
dence that they are. In terms of word learning, when mothers translate the gestures that
their children produce into words, those words are more likely to quickly become part
of the child’s vocabulary than words for gestures that mothers do not translate (Goldin-Meadow et al. 2007). In terms of sentence learning, children whose mothers frequently translate their gestures into speech tend to be among the first to produce two-word utterances. The age at which children produce their first two-word utterance is correlated
with the proportion of times mothers translate their child’s gestures into speech
(Goldin-Meadow et al. 2007), suggesting that mothers’ targeted responses to their chil-
dren’s gestures might be playing a role in helping the children take their first steps into
multi-word combinations.

There is evidence that children’s gestures can elicit targeted input in tasks other than
language learning. Adults interacting with children solving math problems change the
input they give children as a function of the gestures that the children produce.
Goldin-Meadow and Singer (2003) asked teachers to interact individually with children
who could not yet solve the mathematical equivalence problems. They found that the
teachers gave different kinds of instruction to children who produced gesture-speech
mismatches than they gave to children who produced only gesture-speech matches.
In particular, the teachers offered a greater variety of problem-solving strategies in speech to children who produced mismatches than to children who produced matches.
Teachers also produced more mismatches of their own – typically containing two cor-
rect strategies, one in speech and the other in gesture – when teaching children who pro-
duced mismatches than when teaching children who produced matches. Thus, teachers
do notice the gestures learners produce and they change their instruction accordingly.
Does the tailored input teachers give math learners promote learning? To find out,
Singer and Goldin-Meadow (2005) designed a math lesson based on the instruction that
teachers spontaneously gave children who produced mismatches. In particular, the les-
son included either one correct strategy (equalizer) or two correct strategies (equalizer
and add-subtract) in speech; in addition, the instruction either contained no gestures at
all, matching gestures, or mismatching gestures. There were thus six different training
groups. Interestingly, including more than one strategy in speech in the lesson turned
out to be ineffective – children improved significantly more after
the lesson if they had been given one strategy in speech than if they had been given
two. But including mismatches in the lesson was very effective – children improved sig-
nificantly more after the lesson if their lesson included mismatching gestures than if it
included matching gestures or no gestures at all. The lesson that was most effective con-
tained the equalizer strategy in speech, combined with the add-subtract strategy in ges-
ture (e.g., “to solve this problem, you need to make one side equal to the other side,”
said while pointing at the three numbers on the left side of the equation and then pro-
ducing a “take away” gesture under the number on the right side). In other words, a
lesson containing two strategies was particularly effective, but only if the two strategies
were produced in different modalities. Including gesture in instruction has, in general,
been found to promote learning in a variety of tasks (Church, Ayman-Nolley, and
Mahootian 2004; Perry, Berch, and Singleton 1995; Ping and Goldin-Meadow 2008;
Valenzeno, Alibali, and Klatzky 2003).
Taken together, the findings suggest that the gestures learners produce convey mean-
ing that is accessible to their communication partners. The partners, in turn, alter the way
they respond to a learner as a function of that learner’s gestures. Learners then profit
from those responses, which they elicited through their gestures. Gesture can thus play
a causal role in learning indirectly through the effect it has on the learning environment.

3.2. A learner’s gestures can change the learner


Gesture also has the potential to contribute to learning by having a direct effect on the
learner. Although this hypothesis has not been explored in the domain of language
learning, it has been tested with respect to math learning. One reason that including
gesture in a lesson may be good for learning is because seeing a teacher gesture en-
courages learners to produce gestures of their own. Indeed, Cook and Goldin-Meadow
(2006) found that children were more likely to gesture during a lesson when their
teacher gestured. Importantly, those children who gestured during the lesson were
more likely to profit from the lesson than those who did not gesture. Gesturing can
help children get the most out of a lesson.
The children in the Cook and Goldin-Meadow (2006) study were not forced to ges-
ture – they chose to. Thus, the children who chose to gesture may have been more ready
to learn than the children who chose not to gesture. If so, the fact that they reproduced
the experimenter’s gestures may have been a reflection of that readiness to learn, rather
than a causal factor in the learning itself. To address this concern, gesture needs to be
manipulated more directly – all of the children in the gesture group must reproduce the
experimenter’s hand movements during the lesson.
Cook, Mitchell and Goldin-Meadow (2008) solved this problem by teaching children
words and hand movements prior to the math lesson, and then asking the children to
reproduce those words and/or gestures during the lesson itself. All of the children
were then given the same lesson in mathematical equivalence. The only difference
among the groups during the lesson was the children’s own behavior – the children
repeated the words and/or hand movements they were taught before and after each
problem they were given to solve. These self-produced behaviors turned out to make
a big difference, not in how well the children did at post-test (children in all three
groups made equal progress right after the lesson), but in how long they retained the
knowledge they had been taught – children who were told to produce gestures (with
or without speech) during the lesson performed significantly better than children who
were told to produce only speech on a follow-up test four weeks later. Thus, the chil-
dren’s own hand movements worked to cement what they had learned, suggesting
that gesture can play a role in knowledge change by making learning last.
The information that the children produced in gesture in the Cook, Mitchell and
Goldin-Meadow (2008) study (the equalizer strategy – the way to solve the problem
is to make one side of the equation equal to the other) was reinforced by the equalizer
information they heard in both speech and gesture during the lesson. Thus, their ges-
tures did not provide new information. To determine whether gesture can create new
ideas, Goldin-Meadow, Cook and Mitchell (2009) again taught children words and
hand movements to produce before the lesson began. But this time, the hand move-
ments instantiated a different strategy from the one conveyed in the words they were
taught. All of the children were taught to say the equalizer strategy in speech, “to
solve this problem, I need to make one side equal to the other side,” but some were
also taught to produce the grouping strategy in gesture (a V-hand under the 6+3 in
the problem 6+3+5=__+5, followed by a point at the blank; the correct answer can be obtained by grouping and summing 6 and 3 in this problem). The children were required to produce the words or words+gestures they had been taught before and
after each problem they solved during the lesson, which did not include the grouping
strategy. Children who produced grouping in gesture learned more from the lesson
than children who did not. Since the teacher did not use the grouping strategy in either
gesture or speech, and the children only produced the grouping strategy in gesture and
not in speech, the strategy had to have come from the children’s own hands. Gesture
can thus introduce new knowledge into a child’s repertoire.
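The two taught strategies can be stated as one-line rules. The sketch below is illustrative only (the function names are hypothetical, not taken from the studies), applied to problems of the form a + b + c = __ + c:

```python
# Illustrative sketch (function names are hypothetical, not from the studies):
# the two strategies for problems of the form a + b + c = __ + c,
# e.g. 6 + 3 + 5 = __ + 5.

def equalizer(a, b, c):
    # "Make one side equal to the other": total the left side,
    # then remove the addend that already appears on the right.
    return (a + b + c) - c

def grouping(a, b, c):
    # Group and sum the addends that do not appear on the right side
    # (the strategy the V-hand gesture under "6+3" instantiates).
    return a + b

print(equalizer(6, 3, 5))  # 9
print(grouping(6, 3, 5))   # 9
```

Both routes fill the blank with 9; the grouping strategy simply bypasses the addend shared by the two sides.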
Gesture can bring new knowledge into a child’s system if the child is told to produce particular hand movements. But learners are rarely told how to move their hands.
798 IV. Contemporary approaches

What would happen if children were told to move their hands but without instruction
about which movements to make? Broaders et al. (2007) addressed this question by
first asking children to solve six mathematical equivalence problems without any in-
structions about what to do with their hands. The children were then asked to solve a
second set of comparable problems but, this time, some children were told to move
their hands as they explained their solutions to the second set of problems; others
were told not to move their hands or given no instructions at all about what to do
with their hands. Children who were told to gesture on the second set of problems
added significantly more new strategies to their repertoires than children who were told not to gesture or children given no instructions at all. Most of those strategies were produced uniquely in gesture, not in speech, and, surprisingly, most were correct. The children who were told to gesture had been turned into mismatchers – they
produced information in gesture that was different from the information they produced
in speech. Were these created mismatchers also ready to learn? To find out, Broaders
et al. (2007) gave another group of children the same instructions to gesture or not
to gesture while solving a second set of mathematical equivalence problems, and
then gave all of the children a lesson in mathematical equivalence. Children told to ges-
ture again added more strategies to their repertoires after the second set of problems
than children told not to gesture. Moreover, children told to gesture showed more
improvement on the post-test than children told not to gesture, particularly if the chil-
dren had added strategies to their repertoires after being told to gesture. Being told to
gesture thus encouraged children to express new ideas that they had previously not
expressed, which, in turn, led to learning.

4. How does gesture change knowledge?


Gesture can play a causal role in learning, but what is the mechanism that underlies these
effects? The next sections describe mechanisms through which gesture has been shown to
affect cognition, but there are undoubtedly others that have not yet been explored.

4.1. Gesture creates implicit knowledge


When a speaker produces a gesture-speech mismatch, the information conveyed in ges-
ture is, by definition, different from the information conveyed in the accompanying
speech. Recall the child who produced an add-all-numbers strategy in gesture while
producing an add-to-equal-sign strategy in speech. The information conveyed in gesture
in a mismatch is unique to gesture in that response. However, it is possible that this
child is able to articulate the add-all-numbers strategy in speech, and does so in
other responses. Alternatively, the information conveyed in gesture in a mismatch
may be accessible only to gesture. If so, this child should not be able to articulate
add-all-numbers in speech in any of his responses. Goldin-Meadow, Alibali and Church
(1993) explored these alternatives, and found that the strategies that a child expressed
in gesture in a mismatch were almost never found in that child’s speech on any of his
responses. Thus, the information conveyed in gesture in a mismatch does not appear
to be verbalizable and, in this sense, constitutes implicit knowledge.
What happens when learners either see others gesture or produce gestures of their
own? One possibility is that gesture activates implicit knowledge that the learner
49. How our gestures help us learn 799

already had. But gesture might also be able to create new implicit knowledge, and thus
serve as a vehicle by which new information can be brought into the system. If gesture
serves only to activate implicit knowledge (as opposed to creating it), then asking lear-
ners to gesture (or having them observe gesture) should improve learning only for chil-
dren who already have implicit knowledge. However, if gesture can create new
knowledge, then gesture should also be effective for children who do not yet have
implicit knowledge.
To determine whether gesture affects learning by creating implicit knowledge or acti-
vating it, Cook and Goldin-Meadow (2009) reanalyzed data from previous studies, divid-
ing children into those who had implicit knowledge prior to the experimental
manipulation and those who did not. They used the gestures that children produced
prior to instruction, evaluated in relation to the accompanying speech, as a marker for
implicit knowledge. Children who produced at least some gestures that conveyed different
information from their speech (i.e., children who produced gesture-speech mismatches) on
a particular task were classified as “having implicit knowledge” with respect to that task.
Children whose gestures always conveyed the same information as their speech on a task
were classified as “not having implicit knowledge” with respect to that task.
Cook and Goldin-Meadow (2009) found that gesture did, in fact,
lead to learning not only in children who had implicit knowledge, but also in children
who did not have implicit knowledge, suggesting that the gesture manipulations were
not merely activating implicit knowledge but were creating it.
In addition to pinning down the mechanism by which gesture affects learning, Cook
and Goldin-Meadow (2009) were able to explore whether having implicit knowledge
prepares children to profit from instruction. They found that instruction of any sort,
whether it contained gesture or not, led to improved learning on a task if children already
had implicit knowledge on that task. In contrast, for children who did not have implicit
knowledge prior to instruction, including gesture in instruction (either seeing other peo-
ples’ gestures or producing one’s own gestures) was necessary in order for the children to
show improvement. In general, the analyses showed that gesture manipulations promote
learning in children who do not yet have implicit knowledge, suggesting that gesture can
indeed create implicit knowledge rather than merely activate it.

4.2. Gesture grounds thought in action


Gesturing has been shown to change speakers’ thoughts by introducing action informa-
tion into their mental representations of a problem, which then impacts how they solve
the problem. Beilock and Goldin-Meadow (2009) asked adults to solve a well-known
problem, the Tower of Hanoi, in which a stack of four disks, arranged from the largest
on bottom to the smallest on top, must be moved from the leftmost of three pegs to the
rightmost; only one disk can be moved at a time and larger disks cannot be placed on
top of smaller disks (Newell and Simon 1972). In their task, the weights of the disks
were correlated with their size: The smallest disk weighed the least (0.8 kg), the largest disk the most (2.9 kg). After solving the problem once (Tower of Hanoi 1), adults were
asked to explain how they solved the problem to a confederate. In the final step, adults
were asked to solve the Tower of Hanoi problem a second time (Tower of Hanoi 2), but
this time half of the adults used disks whose weights were switched so that the smallest
disk weighed the most and the largest the least (Switch condition); the other half used
disks whose weights remained correlated with their size (No-Switch condition).
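The disk-moving rules described above admit a compact recursive solution. The following sketch is illustrative only (not part of the study’s materials) and makes explicit why the four-disk tower requires a minimum of 15 moves:

```python
# Illustrative sketch of the classic recursive Tower of Hanoi solution
# (Newell and Simon 1972); not part of the Beilock and Goldin-Meadow study.

def hanoi(n, source, target, spare):
    """List the moves (disk, from_peg, to_peg) that transfer n disks
    from source to target, one disk at a time, never placing a larger
    disk on a smaller one."""
    if n == 0:
        return []
    return (hanoi(n - 1, source, spare, target)    # clear the n-1 smaller disks
            + [(n, source, target)]                # move the largest disk
            + hanoi(n - 1, spare, target, source)) # restack the smaller disks

moves = hanoi(4, "left", "right", "middle")
print(len(moves))  # 15, i.e. 2**4 - 1, the minimum for four disks
```

The recursion mirrors the verbal statement of the rules: before the largest disk can move, the smaller tower must first be parked on the spare peg.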
Adults gestured when explaining how they solved Tower of Hanoi 1, often producing
action gestures; for example, one-handed or two-handed motions mimicking actions
used to move the disks (see Cook and Tanenhaus 2009; Garber and Goldin-Meadow
2002). Some of these gestures – in particular, one-handed gestures produced to describe
moving the smallest disk – were incompatible with the actions needed to solve Tower of
Hanoi 2 in the Switch condition (where the smallest disk was now the heaviest and re-
quired two hands to move), but were not incompatible with actions needed to solve
Tower of Hanoi 2 in the No-Switch condition (where the smallest disk continued to
be the lightest and could easily be moved one-handed).
The more incompatible gestures that adults in the Switch condition produced when
explaining how they solved Tower of Hanoi 1, the worse they performed on Tower of
Hanoi 2. No such relation between gesture and solution performance was found in the
No-Switch condition. Gesturing thus seemed to change adults’ mental representation of
the Tower of Hanoi task. After gesturing about the smallest disk with one hand, the
adults mentally represented this disk as a light object. For adults in the Switch condi-
tion, this representation was incompatible with the disk that the subjects eventually en-
countered when solving Tower of Hanoi 2 (the smallest disk was now too heavy to lift
with one hand). The relatively poor performance of adults in the Switch condition on
Tower of Hanoi 2 suggests that the mental representation created by gesture interfered
with subsequent performance on Tower of Hanoi 2.
There is, however, another possibility. The adults’ gestures could be reflecting their
representation of the smallest disk as a light object rather than creating it. But if gesture
changes thought by adding action information – rather than merely reflecting action
information already inherent in one’s mental representation of a problem – then perfor-
mance of adults in the Switch condition should not be impaired if those adults do not
gesture. Beilock and Goldin-Meadow (2009) asked a second group of adults to solve
Tower of Hanoi 1 and Tower of Hanoi 2, but they were not asked to do the explanation
task in between and, as a result, did not gesture. These adults performed equally well on
Tower of Hanoi 2 in both the Switch and No-Switch conditions. Switching the weights of
the disks interfered with performance only when subjects had previously produced
action gestures relevant to the task.
Gesturing thus adds action information to speakers’ mental representations – when
incompatible with subsequent actions, this information interferes with problem-solving.
When the information gesture adds to a speaker’s mental representations is compatible
with future actions, those actions will presumably be facilitated. Gesturing introduces
action into thought and, in this way, may be another instance of how the way in which
we move our bodies influences how we think (Barsalou 1999; Beilock and Holt 2007;
Glenberg and Robertson 1999).

4.3. Gesture lightens cognitive load


Gesturing can also have an impact on thinking by lightening the load on working mem-
ory. Gesturing while speaking seems likely to require motor planning, execution, and
coordination of two separate cognitive and motor systems. If so, gesturing might increase
speakers’ cognitive load. Alternatively, gesture and speech might form a single, integrated
system in which the two modalities work together to convey meaning. Under this view,
gesturing while speaking would reduce demands on the speaker’s cognitive resources (rel-
ative to speaking without gesture), and free up cognitive capacity to perform other tasks.
To distinguish these alternatives and to determine the impact of gesturing on a
speaker’s cognitive load, Goldin-Meadow et al. (2001; see also Wagner, Nusbaum
and Goldin-Meadow 2004) explored how gesturing on one task (explaining a math
problem) affected performance on a second task (remembering a list of words or let-
ters) carried out at the same time. If gesturing increases cognitive load, gesturing
while explaining the math problems should take away from the resources available
for remembering. Memory should then be worse when speakers gesture than when
they do not gesture. Alternatively, if gesturing reduces cognitive load, gesturing while
explaining the math problems should free up resources available for remembering.
Memory should then be better when speakers gesture than when they do not. Both
adults and children remembered significantly more items when they gestured during
their math explanations than when they did not gesture. Gesturing appeared to save
the speakers cognitive resources on the explanation task, permitting the speakers to
allocate more resources to the memory task.
Why does gesturing lighten cognitive load? Perhaps it is the motor aspects of gesture
that are responsible for the cognitive benefits associated with producing gesture. If so,
the meaning of the gesture should not affect its ability to lighten cognitive load.
Wagner, Nusbaum and Goldin-Meadow (2004) replicated the cognitive load effect on
adults asked to remember lists of letters or locations on a grid while explaining how
they solved a factoring problem. The adults remembered more letters or locations
when they gestured than when they did not gesture. But the types of gestures they pro-
duced mattered. In particular, gestures that conveyed different information from the ac-
companying speech (mismatching gesture) lightened load less than gestures that
conveyed the same information as the accompanying speech (matching gesture).
Thus, the effect gesture has on working memory cannot be a pure motor phenomenon –
it must stem instead from the coordination of motor activity and higher order concep-
tual processes. If the motor aspects of gesture were solely responsible for the cognitive
benefits associated with gesture production, mismatching gestures should be as effective
in promoting recall as matching gestures – after all, mismatching gestures are motor
behaviors that are physically comparable to matching gestures.
Gesturing on a task thus allows speakers to conserve cognitive resources. Learners
might then have more resources available to learn a new task if they gesture while
tackling the task than if they do not gesture.
To summarize, the gestures we produce when we talk not only reflect our thoughts
but also play a role in changing those thoughts. Gesture can bring about cognitive
change in at least two ways – by signalling to others that a learner is in a particular cog-
nitive state and, as a result, eliciting input that could lead to learning, and by changing
the learner’s cognitive state itself. There are a number of mechanisms through which
gesture could bring about cognitive change, including bringing new implicit knowledge
into the learner’s repertoire, bringing action information into the learner’s mental re-
presentations, and lightening the learner’s cognitive load. Whatever the mechanism,
it is clear that the way we move our hands when we speak is not mere handwaving –
our hands can change how we think and, as such, have the potential to be harnessed in
teaching and learning situations.

Acknowledgement
Supported by grants R01 HD47450 and P01 HD40605 from NIDCD.

5. References
Acredolo, Linda and Susan Goodwyn 1988. Symbolic gesturing in normal infants. Child Develop-
ment 59: 450–456.
Alibali, Martha Wagner, Miriam Bassok, Karen Olseth, Sharon Syc, and Susan Goldin-Meadow
1999. Illuminating mental representations through speech and gesture. Psychological Sciences
10: 327–333.
Alibali, Martha Wagner and Susan Goldin-Meadow 1993. Transitions in learning: What the hands
reveal about a child’s state of mind. Cognitive Psychology 25: 468–523.
Barsalou, Lawrence 1999. Perceptual symbols systems. Behavioral and Brain Sciences 22: 577–660.
Bates, Elizabeth 1976. Language and Context. New York: Academic Press.
Bates, Elizabeth, Laura Benigni, Inge Bretherton, Luigia Camaioni and Virginia Volterra 1979. The
Emergence of Symbols: Cognition and Communication in Infancy. New York: Academic Press.
Beilock, Sian L. and Susan Goldin-Meadow 2009. Gesture grounds thought in action, under review.
Beilock, Sian L. and Susan Goldin-Meadow 2010. Gesture changes thought by grounding it in
action. Psychological Science 21: 1605–1610.
Beilock, Sian L. and Lauren E. Holt 2007. Embodied preference judgments: Can likeability be
driven by the motor system? Psychological Science 18: 51–57.
Broaders, Sara C., Susan Wagner Cook, Zachary Mitchell and Susan Goldin-Meadow 2007. Mak-
ing children gesture brings out implicit knowledge and leads to learning. Journal of Experimen-
tal Psychology: General 136: 539–550.
Church, Ruth B., Saba Ayman-Nolley and Shahrzad Mahootian 2004. The effects of gestural
instruction on bilingual children. International Journal of Bilingual Education and Bilingualism
7(4): 303–319.
Church, Ruth B. and Susan Goldin-Meadow 1986. The mismatch between gesture and speech as
an index of transitional knowledge. Cognition 23(1): 43–71.
Cook, Susan Wagner and Susan Goldin-Meadow 2006. The role of gesture in learning: Do children use their hands to change their minds? Journal of Cognition and Development 7(2): 211–232.
Cook, Susan Wagner, Melissa Duff and Susan Goldin-Meadow 2013. Co-speech gesture is a vehicle
for non-declarative knowledge that can change a learner’s mind, under review.
Cook, Susan Wagner, Zachary Mitchell and Susan Goldin-Meadow 2008. Gesturing makes learn-
ing last. Cognition 106: 1047–1058.
Cook, Susan Wagner and Michael K. Tanenhaus 2009. Embodied communication: Speakers’ ges-
tures affect listeners’ actions. Cognition 113(1): 98–104.
Ehrlich, Stacy B., Susan C. Levine and Susan Goldin-Meadow 2006. The importance of gesture in
children’s spatial reasoning. Developmental Psychology 42: 1259–1268.
Garber, Philip and Susan Goldin-Meadow 2002. Gesture offers insight into problem-solving in
adults and children. Cognitive Science 26: 817–831.
Glenberg, Arthur M. and David A Robertson 1999. Indexical understanding of instructions. Dis-
course Processes 28: 1–26.
Goldin-Meadow, Susan 2003. Hearing Gesture: How Our Hands Help Us Think. Cambridge, MA:
Harvard University Press.
Goldin-Meadow, Susan, Martha W. Alibali and Ruth B. Church 1993. Transitions in concept acqui-
sition: Using the hand to read the mind. Psychological Review 100(2): 279–297.
Goldin-Meadow, Susan and Cindy Butcher 2003. Pointing toward two-word speech in young children. In: Sotaro Kita (ed.), Pointing: Where Language, Culture, and Cognition Meet, 85–107.
Mahwah, NJ: Lawrence Erlbaum.
Goldin-Meadow, Susan, Susan Wagner Cook and Zachary Mitchell 2009. Gesturing gives children
new ideas about math. Psychological Science 20(3): 267–272.
Goldin-Meadow, Susan, Whitney Goodrich, Eve Sauer and Jana M. Iverson 2007. Young children
use their hands to tell their mothers what to say. Developmental Science 10: 778–785.
Goldin-Meadow, Susan, Howard Nusbaum, Spencer D. Kelly and Susan Wagner 2001. Explaining
math: Gesturing lightens the load. Psychological Science 12: 516–522.
Goldin-Meadow, Susan and Melissa A. Singer 2003. From children’s hands to adults’ ears: Ges-
ture’s role in the learning process. Developmental Psychology 39: 509–520.
Iverson, Jana M., Olga Capirci, Virginia Volterra, and Susan Goldin-Meadow 2008. Learning to
talk in a gesture-rich world: Early communication of Italian vs. American children. First Lan-
guage 28: 164–181.
Iverson, Jana M. and Susan Goldin-Meadow 1998. Why people gesture as they speak. Nature 396: 228.
Iverson, Jana M. and Susan Goldin-Meadow 2005. Gesture paves the way for language develop-
ment. Psychological Science 16: 368–371.
Kendon, Adam 1980. Gesticulation and speech: Two aspects of the process of utterance. In: Mary
Ritchie Key (ed.), Relationship of Verbal and Nonverbal Communication, 207–228. The Hague:
Mouton.
McNeill, David 1992. Hand and Mind: What Gestures Reveal about Thought. Chicago: University
of Chicago Press.
Newell, Allen and Herbert A. Simon 1972. Human Problem Solving. Englewood Cliffs, NJ: Prentice-Hall.
Özçalışkan, Şeyda and Susan Goldin-Meadow 2005. Gesture is at the cutting edge of early language development. Cognition 96(3): B101–B113.
Perry, Michelle, Denise B. Berch and Jenny L. Singleton 1995. Constructing shared understanding:
The role of nonverbal input in learning contexts. Journal of Contemporary Legal Issues Spring
(6): 213–236.
Perry, Michelle, Ruth B. Church and Susan Goldin-Meadow 1988. Transitional knowledge in the
acquisition of concepts. Cognitive Development 3(4): 359–400.
Perry, Michelle and Anastasia D. Elder 1997. Knowledge in transition: Adults’ developing under-
standing of a principle of physical causality. Cognitive Development 12: 131–157.
Petitto, Laura Ann 1988. “Language” in the pre-linguistic child. In: Frank Kessel (ed.), The Development of Language and Language Researchers: Essays in Honor of Roger Brown, 187–221.
Hillsdale, NJ: Lawrence Erlbaum.
Pine, Karen J., Nicola Lufkin and David Messer 2004. More gestures than answers: Children learn-
ing about balance. Developmental Psychology 40: 1059–1106.
Ping, Raedy and Susan Goldin-Meadow 2008. Hands in the air: Using ungrounded iconic gestures
to teach children conservation of quantity. Developmental Psychology 44: 1277–1287.
Rowe, Meredith and Susan Goldin-Meadow 2009a. Differences in early gesture explain SES dis-
parities in child vocabulary size at school entry. Science 323: 951–953.
Rowe, Meredith and Susan Goldin-Meadow 2009b. Early gesture selectively predicts later lan-
guage learning. Developmental Science 12: 182–187.
Singer, Melissa A. and Susan Goldin-Meadow 2005. Children learn when their teacher’s gestures
and speech differ. Psychological Science 16(2): 85–89.
Valenzeno, Laura, Martha W. Alibali and Roberta Klatzky 2003. Teachers’ gestures facilitate stu-
dents’ learning: A lesson in symmetry. Contemporary Educational Psychology 28: 187–204.
Wagner, Susan, Howard Nusbaum and Susan Goldin-Meadow 2004. Probing the mental represen-
tation of gesture: Is handwaving spatial? Journal of Memory and Language 50: 395–407.

Susan Goldin-Meadow, Chicago, IL (USA)


50. Coverbal gestures: Between communication and speech production
1. Introduction
2. The pragmatics of word-gesture matching
3. The physical features of gesture
4. The affiliated speech unit
5. Temporal unfolding
6. Processing accounts – preliminaries
7. Evidence from speech dysfluency
8. Gesture and the facilitation of speech production: A model
9. References

Abstract
Some of the gestures that normally accompany continuous speech (coverbal gestures) seem to be related to the content of speech and to have a form that reflects this content (iconic gestures). They have a low semantic specificity, they are physically complex, they
have systematic timing relations with the parts of speech to which they relate and they
tend to occur in the neighborhood of speech dysfluencies in both normal and pathological
speech. The present chapter reviews the evidence concerning iconic gestures and suggests
that they reflect the facilitation of lexical processing by recourse to secondary, imagistic
information.
Other coverbal gestures may be more variable in their form and timing in relation to
speech. These include pointing in a particular direction (deictics), pantomiming an action (pantomimes), indicating measures (quantifiers) or performing a gesture with a specific, well-known meaning (emblems). These gestures have primarily communicative functions: they act like words or specify the meanings of the accompanying speech.
I present a model (after Hadar and Butterworth 1997) in which I suggest that the
function of most iconic gestures is to facilitate word retrieval. To enhance retrieval, the
cognitive system elicits imagistic information from two distinct stages in the speech pro-
duction system: pre-verbally, from the processes of conceptualization, and post-semanti-
cally, initiated in the lexicon. Imagistic information assists or facilitates lexical retrieval in
three ways: first, by defining the conceptual input to the semantic lexicon; second, by
maintaining a set of core features while reselecting a lexical entry and, third, by directly
activating phonological word-forms.
The model offers an account of a range of detailed gestural phenomena, including the
semantic specificity of gestures, their timing in relation to speech and aspects of their
selective breakdown in aphasia.

1. Introduction
People engaged in speaking normally perform a variety of movements which are not
strictly a part of the speaking act: they nod their heads, change their postures, gesture
with their arms and hands, etc. The question why speakers should do this has occupied
the attention of curious observers from the Greek philosophers to Wundt ([1900] 1973)
and Freud (1938b). More systematic and rigorous recent studies resulted in viewing
these speech-related body movements as representatives of diverse phenomena, both
behaviorally and functionally. The functions ascribed to gesture included motor (redu-
cing the great number of degrees of freedom with which the articulatory system has to
manage (Hadar 1989)); communicative (helping the speakers communicate their
intended messages more fully (McNeill 2005) or even communicate unintended mes-
sages about the cognitive states of the speaker (Scheflen 1973)); cognitive (helping com-
municators think out their intended messages (Goldin-Meadow 2002)) or ancillary to
speech production (facilitating word retrieval during continuous speech) (Butterworth
and Hadar 1989; Krauss and Hadar 1999). The present chapter elaborates on the latter
function.
In descriptive terms, movements involving the whole trunk (postural shifts) were dif-
ferentiated from those of the head only, and both were separated from arm and hand
gesture (Bull 1983). Within the latter class (termed gesture henceforth), movements
having a definite and accepted meaning independently of the accompanying speech
(such as the V for victory), called emblems, were differentiated from those that do
not have a conventional meaning. The latter, in turn, were subdivided into short, fast
movements, used primarily for emphasis and related to speech rhythm (beats), as
against wider and more complex movements which were said to have some ideational
content, though not usually interpretable without the accompanying speech. The latter,
in turn, could be subdivided into deictic, quantifying, pantomimic and iconic gestures
(Feyereisen and de Lannoy 1991; Rose 2006). The elaboration of the speech-production functions centers on iconic gestures, or gestures that I consider iconic in principle, albeit iconic of a concept that is more remotely related to the words appearing in speech. This causes the gesture to appear metaphoric rather than iconic (Hadar and Butterworth 1997; McNeill 2005; Müller 2008).
Iconic gestures were said to depict in their form or dynamics a meaning related to
that of the accompanying verbal utterance, as in the following examples (the word
to which the gesture is presumed to be related – its lexical affiliate (Schegloff 1984) –
appears underlined):

(1) Phrase: “Kind of long, cylindrical shape”;
Accompanying gesture: Hands come together, each shaping a cylinder, touched
at chest level and gradually moved apart in a horizontal plane until each arm is
outstretched (Riseborough 1981).

(2) Phrase: “The network of wires that hooks up the cable car”;
Accompanying gesture: Both hands rise together, fingers interlocking momen-
tarily at chest level (McNeill 1986).

(3) Phrase: “With a big cake on it”;
Accompanying gesture: A series of circular motions with the forearm pointing
downwards with index finger extended (Kendon 1980).

(4) Phrase: “He got the first down”;
Accompanying gesture: Forward motion of forearm with extension of forefinger
(Schegloff 1984).
Iconic gestures have been at the center of much attention in the study of coverbal
behavior, probably because they are thought to reflect some natural processes of sym-
bolic representation, where the term “natural” refers to representations that do not sub-
sume and do not require social conventions to acquire meaning (Kristeva 1978).
Instead, the form of the gesture suggests the meaning through ideational equivalence,
a homeomorphy, so that no process of literal decoding is required to derive the meaning
from the gesture. The iconic gesture is, in that sense, an imagistic proposition: it “shows”
the meaning rather than “denotes” it, to borrow Frege’s terminology (1952). Müller
(e.g., 2008) has consistently taken a much wider view of signification in iconic represen-
tation, where a whole range of meanings – both abstract and concrete – can be gener-
ated by gesture in imagistic modes. While I do not object to this wider use of the term,
I feel that a more constrained usage may be more suitable for a detailed processing
account of iconic phenomena.
The meaning of an iconic gesture is typically vague in itself. Whilst iconic gestures
often have recognizable physical features (see below), their meaning can seldom be de-
rived from their form with any degree of certainty (Feyereisen, Van de Wiele, and
Dubois 1988; Krauss, Morrel-Samuels, and Colasante 1991; Hadar and Pinchas-Zamir
The shape and dynamics of an iconic gesture are not sufficient to derive its meaning; doing so also requires identifying the part of the verbal message to which the gesture relates. Given that understanding the accompanying speech appears necessary to interpret the gesture, the outstanding question is why speakers produce iconic gestures at all: the gestures seem redundant to the communicative purposes of the speech exchange.

2. The pragmatics of word-gesture matching


When gesture accompanies speech, it can represent readings that are equivalent to
those of speech, partial to them, or additional to the verbal meanings. Equivalence
can be seen in example 2 above, where the hooking of fingers accompanies the word
hook. A partial reading may be seen in example 3 above, where only one feature of the semantics of “cake” is expressed in the gesture, namely, its shape (round).
Another example of partial reading is presented in Kendon (1985), where an English-
man asks a newcomer whether he has already seen the New York Times on Sunday. Just
prior to saying “The New York Times” he made a gesture indicating a measure of thick-
ness, in allusion to the bulkiness of this paper on Sunday. Here gesture represented a
non-obligatory part of the semantics of the related word. Additional readings may be
seen in example 1, where not only the shape (cylindrical) of the object is coded in
the gesture, but also its orientation (horizontal) and possibly its length (the length of
the gesture). Other cases where gesture represented features that were additional to
those of the lexical affiliate have been reported by McNeill (1985), for example,
where the phrase “he found a knife” was accompanied by a gesture pantomiming the
grasping of the knife. The semantics of grasp clearly contains the semantics of find.
Cases have also been reported where all possible readings of gesture were claimed to
be incongruous with or contradictory to all possible readings of the lexical affiliate. Un-
fortunately, these cases have never been well documented from the behavioral point of
view, and their hermeneutic procedures are also problematic in allowing a particularly
wide range of readings of the gesture. I bring those cases up here mainly to depict the
50. Coverbal gestures: Between communication and speech production 807

kind of phenomena with which a comprehensive theory of gesture may eventually have
to grapple, but it would be premature to offer their analysis before some descriptive
rigor is achieved. Watzlawick, Beavin, and Jackson (1968) describe contradictory mes-
sages of affect and attitude. A typical example is one in which a mother says to her
daughter ‘come give mummy a hug’, whilst turning her upper trunk away from the
girl. Freud (1938b) gives other examples in which persons convey in their acts those
meanings which they try to conceal in their verbal message. Studies of ‘leakage’
(Ekman and Friesen 1969) present gestural phenomena that betray the falseness of
the verbal statement, but these clearly do not refer to iconic gestures and do not rely
on content analysis. Generally speaking, the ability to convey contradictory messages
appears to belong in the repertoire of communicative behavior: it occurs intentionally
in ironic and sarcastic speech or in stylistic paradoxical formulations.

3. The physical features of gesture


A processing account of gesture requires not only the definition of pragmatic relations,
but also the description of those physical features of gesture that may constrain proces-
sing or, methodologically, may help define the domain of relevant behaviors. The need
for a formal characterization in terms of physical features is emphasized by the low
semantic specificity of gestures: in many, perhaps most, cases, a given gesture may be
judged as content-bearing (iconic) without giving a clue as to what this content might
be. Feyereisen, Van de Wiele, and Dubois (1988) have shown that naive observers
can identify iconic gestures from videotapes of naturalistic conversations with the
sound turned off. Indeed, the subjects’ judgments were highly correlated with those
of experienced researchers who performed the task whilst listening to the accompany-
ing speech. However, when subjects were asked to indicate the possible meanings of the
identified gestures, they performed very poorly: missing, in most cases, the meaning in-
dicated in the concurrent speech. In another series of experiments, Krauss, Morrel-
Samuels, and Colasante (1991) asked subjects to identify information-bearing gestures
in silent videotapes, and then to coordinate them with one of the content words of the
accompanying utterance. Again, the first task was performed reliably, whilst the latter
was not. Performance became even less determinate when the correct lexical affiliate
was competing with other words of similar meaning in a forced choice design. The
above results suggest that whilst iconic gestures can be identified on the basis of config-
urational and dynamic properties alone, their interpretation may not be. This, of course,
does not exclude the possibility – now verified with sophisticated methodologies that
assess brain activations during the observation of speech-gesture clips – that the listen-
er’s cognitive system always attempts to use gestural information in order to resolve dif-
ficulties in the understanding of speech (Kircher et al. 2009; Obermeier, Holle, and
Gunter 2011).
Hadar (1989) suggested the following features as characteristic of iconic gestures
generally. Firstly, they are usually relatively complex, where complexity may be defined
by the number of vectorial components required for the geometrical description of the
movement. Iconic gestures typically comprise two or more orthogonal vectorial compo-
nents. This contrasts with the more simple movements (beats), which usually comprise a
single vectorial component (or two of the same magnitude and opposite directions).
Secondly, unlike beats, iconic gestures typically have wide amplitudes. Hadar (1991a)
808 IV. Contemporary approaches

suggested that because of their relative width, iconic gestures usually involve upper arm
movement of greater amplitude than 1 cm. Moreover, over large samples of gesture,
collected in similar test conditions, greater average amplitude of movement indicated
a greater proportion of iconic gesture. For example, in a single conversation, periods
of greater average amplitude of arm movement imply the occurrence of a greater pro-
portion of iconic gestures (Hadar 1991a). Thirdly, because of their relatively wide
amplitude, iconic gestures also tend to be of relatively long duration. Hadar (1991a)
suggested that most iconic gestures were of greater duration than 0.5 sec. Also, differ-
ences in relative proportion of iconic gestures in the total of recorded body movement
were reflected in comparable differences in average duration of movement.
Combined, the above characteristics provide a powerful heuristic for inferences
about the nature of coverbal movement. This becomes most useful when inferences
have to be made without the analysis of pragmatic and semantic relations as, for exam-
ple, in analyzing gestures of aphasics whose speech may be impossible to interpret. It
is then necessary to derive the category of a movement from its physical features
alone, according to the above characterization. Whilst various non-iconic movements
may have some of the above characteristics, analysis of all four can generate valid
categorization of coverbal movements (Hadar and Yadlin-Gedassy 1994).
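The physical characterization above lends itself to a simple decision procedure. The following sketch encodes the thresholds cited in the text (two or more orthogonal vectorial components, amplitude above 1 cm, duration above 0.5 sec) as a heuristic labeller. The function and parameter names are hypothetical, and the sketch is an illustration of the heuristic only, not an implementation from the literature.

```python
# A minimal sketch of the physical-features heuristic described above.
# Thresholds follow the values cited in the text (Hadar 1989, 1991a);
# the data structure and function names are invented for illustration.

def classify_movement(n_vector_components, amplitude_cm, duration_sec):
    """Heuristically label a coverbal movement from its physical features alone."""
    iconic_evidence = 0
    if n_vector_components >= 2:   # two or more orthogonal vectorial components
        iconic_evidence += 1
    if amplitude_cm > 1.0:         # wide amplitude (upper-arm involvement)
        iconic_evidence += 1
    if duration_sec > 0.5:         # relatively long duration
        iconic_evidence += 1
    return "iconic-like" if iconic_evidence >= 2 else "beat-like"
```

A movement meeting most of the criteria is labelled iconic-like; actual coding schemes (e.g., Hadar and Yadlin-Gedassy 1994) are of course considerably richer.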

4. The affiliated speech unit


An important issue in the literature on coverbal behavior is the definition of the unit
with which iconic gestures are coordinated. The relevant information in this respect
concerns both the meaning relations and the timing relations between gestures and
speech. By general consent, some temporal proximity is required to determine ver-
bal-gestural coordination: words occurring a few sentences away from gesture would
not be considered as lexically affiliated with it. The underlying assumption here, ac-
cepted by most researchers in the field, is that if there is cognitive coordination between
the verbal and gestural channels, then the related processes must temporally overlap. In
examples 1–4 above, there is a clear meaning link between the gesture and the single
underlined word in the accompanying speech. Thus, a gesture depicting a cylindrical
shape is related to the word cylindrical, a circular gesture to the word cake etc. On
the other hand, Kendon (1985) gives examples of gestures related to whole ideas, as
when a daughter said to her mother “You don’t know anything about it”, accompanying
the phrase with a lateral movement in the direction of, and the palm facing, the mother,
as if pushing her (1985: 232). Here, the message of the gesture probably complemented
that of the utterance, conveying something like “Don’t interfere with this business”
(because you don’t know anything about it).
In modular views of informational processing (e.g., Levelt 1989; Levelt, Roelofs, and
Meyer 1999) the size of the speech unit with which gesture is coordinated is relevant to
models of gesture production, because it determines the nature of the linguistic, or
indeed conceptual, input to the process. If the gesture is coordinated with idea-sized
units it must originate in pre-verbal processes of message construction (because all
later, verbal processes address only parts of the uttered idea). If, however, it is coordi-
nated with a single word, then it may originate in later stages of processing, where
words are being selected, which is the case made by Butterworth and Hadar (1989).
Smaller speech units have not been claimed to coordinate with iconic gestures, although
beats were said to coordinate with the suprasegmental features of stress and terminal
juncture (Hadar 1989). Here, an even lower locus of processing has been proposed:
articulatory programming.
McNeill (1992, 2005) holds a view of speech production which is very different from
the above. In his view, linguistic processing evolves from generic units, “growth points”,
containing the meaning of the whole idea-to-be-expressed in an embryonic form. In this
view, the eventual size of the verbal unit is irrelevant to understanding the gesture; only
the analysis of temporal, pragmatic, and semantic relations matters. Since the present view
differs markedly in its understanding of speech production, it is also different in its view
of iconic gesture, despite the similarity of our understanding of the shared processing of
coverbal gesture and speech.

5. Temporal unfolding
An underlying assumption, accepted by most researchers in the field, is that if there is
cognitive coordination between the verbal and gestural channels, then the related pro-
cesses must temporally overlap. Whilst the precise temporal unfolding of this overlap is
not clear, the current practices and the related conceptual assumptions coordinate ges-
ture with elements appearing in the co-occurring idea (Butterworth and Beattie 1978;
McNeill 1985). In this context the term “idea” refers to a small number of sentences
having a strong thematic link, which appear to be planned as a single unit (Butterworth
1975).
Searching for a temporal overlap is an accepted practice, but the supporting argu-
ments are circumstantial. Firstly, in those cases where pragmatic and semantic analysis
is sufficient for the coordination of gesture and speech, the gesture concurred with the
utterance of the idea within which the related verbal unit occurred. In examples 1–4
above, there was an actual overlap between the duration of the gesture and the produc-
tion of the related word. Secondly, if the information conveyed in the gestural and the
verbal channel is pragmatically and semantically related, as all available evidence
seems to indicate, it is hard to imagine a cognitive procedure that will separate them
in time. Such a procedure would necessarily require repeated processing of the same
material; it would be highly uneconomical in terms of its computational demands and would
offer no specifiable advantages in either production or perception. These considera-
tions are plausible, but leave open the issue of the precise timing of gesture in relation
to speech.
In the present restrictive definition of iconic gestures, most of them have a prepara-
tory phase during which the arm moves to a starting position at relatively low speeds.
Following this comes the iconic part of the gesture (its stroke) and it is this part of ges-
ture which will be referred to as iconic gesture henceforth. Iconic gestures usually start
before the related speech event (Butterworth and Beattie 1978; Kendon 1980; McNeill
1985; Morrel-Samuels and Krauss 1992) with a mean time lag of about 1.00 sec and with
a range of lags from 0 to 2.5 secs (Butterworth and Beattie 1978: Table 4), or to 3.8 secs
(Morrel-Samuels and Krauss 1992: Figure 1). However, Morrel-Samuels and Krauss
(1992: Figure 4) show that duration of gestures, ranging from 0.6 to 7.9 secs, is greater
than their lag in the overwhelming proportion of cases, implying a temporal overlap.
On average, gestures terminated about 1.5 secs after the initiation of their lexical
affiliates.
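The timing figures above reduce to a simple relation: a gesture whose onset precedes its lexical affiliate by some lag overlaps the word exactly when its duration exceeds that lag. A minimal sketch, with hypothetical function names and illustrative values chosen from within the reported ranges:

```python
# A toy rendering of the overlap logic described above: a gesture that
# starts `lag_sec` seconds before its lexical affiliate overlaps the word
# whenever its duration exceeds that lag (cf. Butterworth and Beattie 1978;
# Morrel-Samuels and Krauss 1992).

def gesture_overlaps_word(lag_sec, duration_sec):
    """True if the gesture is still unfolding when the affiliate begins."""
    return duration_sec > lag_sec

def termination_after_word_onset(lag_sec, duration_sec):
    """Seconds between word onset and gesture offset (negative: no overlap)."""
    return duration_sec - lag_sec

# A gesture with the mean lag of ~1.0 sec and a 2.5 sec duration:
assert gesture_overlaps_word(1.0, 2.5)
assert termination_after_word_onset(1.0, 2.5) == 1.5   # cf. the ~1.5 sec mean above
```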
6. Processing accounts – preliminaries


As I wrote in the Introduction, gestures could have various origins in the mental processes
involved in face-to-face communication, and these origins largely constrain their
functional interpretation. Early researchers inclined towards ethnographic interpreta-
tions, according to which movement originated in interpersonal and emotional pro-
cesses, serving specific communicative functions different from those of speech, and
shaped by cultural factors (Argyle 1975; Efron 1972; LaBarre 1947). Evidence support-
ing this came from the diversity of gestural phenomena between cultures and from
illustrations of semantic and pragmatic mis-matches between speech and the accompa-
nying gesture. While it is clear that some differences exist in the coverbal behavior of
members of different cultures, such differences do not entail differences in the under-
lying processes. For example, some gestures may occur more frequently in one cultural
group than in another, but there is no reason to ascribe a processing consequence to
such a difference. The strength of the thesis for cultural differences is also undermined
by the ability of subjects to interpret the communication of people of a different culture
on the basis of visual information only (Kendon 1978).
As for semantic/pragmatic mis-matches, there may be cases where the information
conveyed in gesture is missing from or contradictory to that of speech. However,
because gesture lacks semantic and pragmatic specificity (Feyereisen, Van de Wiele,
and Dubois 1988; Krauss, Morrel-Samuels, and Colasante 1991), many of the illustra-
tions presented in support of this argument are given to multiple interpretations.
More importantly, as Butterworth and Hadar (1989) suggest, mis-matches between dif-
ferent parts of a communicative message may appear in speech itself, as analyses of slips
of the tongue demonstrate (see, especially, Freud 1938b and Harley 1984). Therefore,
semantic/pragmatic mis-matches do not suffice to demonstrate the separateness of ges-
ture and speech. In addition, if gesture is involved in communication independently of
speech, then some changes in the conditions of communication must be reflected in
changes in gestural behavior. In particular, differences in gesture between face-to-face
interaction and mediated interaction (e.g., via intercom) should be marked.
Here, again, beyond some quantitative variations, no qualitative differences were
observed in coverbal behavior (Rime 1983; Williams 1977).
These considerations led many contemporary students of coverbal behavior to the
belief that gesture and speech are closely related, being perhaps different aspects of
a single process (Butterworth and Hadar 1989; De Ruiter 2000; Kendon 1986; Krauss,
Morrel-Samuels, and Colasante 1991; McNeill 1985). However, within this broad frame-
work, explanations differed with regard to the precise linkages between speech and ges-
ture. Three general kinds of linkage have been suggested, relating to 1) pre-verbal
message construction, 2) linguistic processing and 3) phonetic execution of speech.
The last was suggested by Dittman (1972), who argued that gesture was generated in
order to dissipate energies that were developed for articulation, but were redundant
for some reason. As Hadar (1989) and Butterworth and Hadar (1989) argued, such link-
age could, and probably does, hold for a subset of coverbal movements, small hand/arm
“beats”, though Hadar, Steiner, and Clifford-Rose (1984) showed that movements of
the much heavier head would be a more effective dissipator of energy. Wider, symbolic
movements – and certainly iconic movements – would not be effective energy dissipators.
Therefore, this linkage will not be discussed here further.
Conceptualization, in various terminological guises, was favored as an origin of gestures
by many researchers on both theoretical and empirical grounds. Theoretically, the
computations involved in language production are different from those involved in ges-
ture, rendering it difficult, if not impossible, to describe linkages in specific stages of
processing. For example, consider the previously mentioned case where the utterance
“… and finds a big knife…” was accompanied by a gesture in which “hand in grip shoots
out to rear and grasps ‘knife’ ” (McNeill 1985: 368). The most parsimonious explanation
here is that gesture originated prior to linguistic processing, where related, yet different,
concepts were considered for articulation. Speech then articulated one concept, and
gesture the other. McNeill (1985, 1992, 2005), who is probably the most influential re-
searcher in the field, has a somewhat different story, whereby both gesture and speech
began processing the communicative intention related to the cartoon picture. At this
point, “grasp” and “find” joined in a single unit of meaning (the “growth point”) having
both linguistic and imagistic components. Later in processing, speech articulated a
lesser development of the growth point, expressing it in “find”, whereas gesture processed
the more specific idea of “grasp”. This explanation needs to address a
number of issues.
Firstly, in the majority of known cases the onset of iconic gesture preceded the onset
of the related speech unit (Butterworth and Beattie 1978; Morrel-Samuels and Krauss
1992). Moreover, in some cases, gesture precedence over speech was such that it started
prior to the beginning of the phrase containing the related speech unit (Kendon 1985;
McNeill 1986). This appeared to suggest that some linguistic computations continued to
take place at the time when the computation of gesture had already been completed
and passed on to an execution stage. This may be accounted for in at least two ways:
either gesture originated, again, in pre-verbal processes and prior to the computation
of the related speech, or the computation of gesture is less demanding than that of
speech, and therefore takes less time (Butterworth and Hadar 1989). The two possibi-
lities, of course, are not mutually exclusive. It may be difficult to decide between them,
but the situation is even more complicated, because a gesture often starts immediately
prior to the related word (Butterworth and Beattie 1978; McNeill 1986; Morrel-Samuels
and Krauss 1992). The question now arises as to whether or not different mechanisms
need to be summoned to explain these different timing relations.
If large scale precedence of gesture is indicative of a pre-verbal origin, and if pre-
verbal origin is assumed also for immediate adjacency between gesture and the word,
then why should the production of gesture be withheld until the occurrence of the
word? This is particularly problematic when the lexical affiliate occurs far from
phrase-initial positions. In a similar vein, if gesture precedence reflects the relative
amount of computational demands, then something must happen which distinguishes
between these demands in large scale and small scale precedence. Either one discards
the idea that temporal relations reflect computational origin and process, or one accepts
that the two cases represent real differences.

7. Evidence from speech dysfluency


The tendency of gesture to occur in the neighborhood of speech dysfluencies was ob-
served almost five decades ago, but its implications are still poorly appreciated. In a
series of related studies, Boomer (1963), Boomer and Dittman (1964) and Dittman
and Llewelyn (1969) have shown that, although some body movement occurs during flu-
ent speech, the probability of its occurrence increases markedly with the occurrence of
a pause, especially a hesitation pause, indicative of some breakdown in the process of
speech production. According to them, such pauses typically violate the correspon-
dence between speech flow and syntactic structure. Because no configurational descrip-
tion of movements was presented, the extent to which iconic gestures were implicated is
not clear. Hadar, Steiner, and Clifford-Rose (1984) showed that at least two kinds of
movement are involved, one which is relatively wide and probably iconic, which starts
during the pause, prior to the renewal of speech, and one which occurs after the pause
and usually coincides with the first primary stress of the following prosodic phrase. Con-
fusing the two kinds of movement resulted in ascribing to iconic gestures, as well as
beats, the motor function of dissipating redundant energy (Dittman 1972). Other
research shows that specifically iconic gestures tend to occur in the neighborhood of
hesitation pauses (Butterworth and Beattie 1978; Ragsdale and Silvia 1982).
Spontaneous speech of more than about 1 minute shows cycles of fluency comprising
a relatively hesitant “planning phase”, during which the conceptual content of speech is
determined, followed by a fluent “execution phase”, during which planned units are
produced and most lexical items are selected. Pauses in the fluent phase tend to be
brief and to serve lexical selection (Butterworth 1975, 1980; Henderson, Goldman-Eisler,
and Skarbek 1966). Iconic gestures tended to appear during the second phase when
most of the current pre-verbal processing has already been completed (Butterworth
and Beattie 1978: Table 1). During these fluent phases, gesture tended to start in a
pause just prior to a noun, a verb, an adverb or an adjective (Butterworth and Beattie
1978: Table 3), classes of words which tend to have lower frequencies in the language,
and also lower estimated transitional probabilities (Beattie and Butterworth 1979).
Now it is this latter measure, and not word frequency per se, that has been found to pre-
dict the occurrence of pauses in speech (Beattie and Butterworth 1979), and pre-
sumably the accessibility of the word to the speaker at that precise moment. It is,
therefore, a reasonable inference, though not one guaranteed by the available data,
that lexical retrieval difficulties are the source of both these dysfluencies and the
associated gestures.
On this evidence, those iconic gestures that are adjacent to their lexical affiliates are
generated following the conceptual planning of speech, but why should these gestures
be associated with hesitation? To investigate this, it is necessary to consider the possible
function of these gestures. There are two possibilities. First, gesture may serve a com-
municative function such as holding the floor while searching the lexicon for the target
word or conveying a part of the intended meaning in case lexical searches fail. If this
function is under voluntary control, then these gestures should vanish in the absence
of a visual contact between interlocutors, as in intercom or telephone conversations.
This is clearly not the case (Moscovici 1967; Rime 1983; Williams 1977). So, this is
unlikely to be the whole story.
Alternatively, gesture may be the exteriorized manifestation of a speech productive
function. The semantic relatedness of gesture to its lexical affiliate implies that some
aspect of the semantics of the current utterance has been determined prior to both lex-
ical selection and gesture generation. According to most models of speech production
(Butterworth 1980; Garrett 1984; Kempen and Huijbers 1983; Levelt 1989; Levelt,
Roelofs, and Meyer 1999), lexical retrieval proceeds in two distinct stages. In the first
stage, an abstract lexical item is retrieved on the basis of a conceptual specification. This
stage is called the “semantic lexicon” by Butterworth (1980) and Howard and Franklin
(1988: Chapter 12) or “lemma retrieval” by others (Kempen and Huijbers 1983; Levelt
1989; Levelt, Roelofs, and Meyer 1999). The second stage uses information from the
first stage to retrieve the phonological word form or “lexeme” (Kempen and Huijbers
1983) from the “phonological lexicon”. In terms of this model, the semantic relatedness
of the gesture could depend on the semantic specification used to access the semantic
lexicon and a likely functional candidate is that of facilitating lexical retrieval. In addi-
tion to the association between gestures, pauses and words of low probability in context,
evidence in support may be obtained by looking at aphasics whose lexical retrieval is
impaired by focal brain damage. These aphasics, the lexical hypothesis predicts, should
show an increase in the incidence of those iconic gestures which are related to lexical
failure.
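The two-stage retrieval architecture summarized above can be made concrete, if only schematically. In the toy code below, all entries and feature labels are invented for illustration; retrieval can fail at either stage — at the semantic lexicon, when no lemma matches the conceptual specification, or at the phonological lexicon, when a lemma is found but its word form is not accessed.

```python
# A schematic sketch of the two-stage lexical retrieval model described
# above (semantic lexicon / lemma retrieval, then phonological lexicon).
# All entries and feature labels here are invented for illustration.

SEMANTIC_LEXICON = {
    frozenset({"OBJECT", "FLAT_TOP", "LEGS"}): "table",  # stage 1: lemma
}
PHONOLOGICAL_LEXICON = {
    "table": "'teibl",  # stage 2: word form ("lexeme")
}

def retrieve(conceptual_features):
    """Return a word form, or None if retrieval fails at either stage."""
    lemma = SEMANTIC_LEXICON.get(frozenset(conceptual_features))
    if lemma is None:
        return None  # stage-1 (semantic) failure
    return PHONOLOGICAL_LEXICON.get(lemma)  # None here: stage-2 failure
```

On this picture, a gesture tied to the conceptual specification can still be produced even when the second stage fails, which is what the lexical hypothesis exploits.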
The published evidence here is problematic. Firstly, in most published studies, not all
the relevant information is available. Thus, in some much-quoted studies such as by
Cicone et al. (1979) and Behrmann and Penn (1984) neither the timing of iconic ges-
tures in relation to lexical affiliates, nor the association with hesitation phenomena
were presented, nor indeed the exact nature of the language deficits of individual pa-
tients. Secondly, in those cases showing semantic difficulties, the determination of prag-
matic/semantic relations between gesture and speech is problematic because the
relevant word may not appear in the utterance. Thus, if a patient accompanies the
phrase “the building was tall” with a circular arm gesture, it is possible that the coordi-
nation between movement and speech is lost due to a gestural impairment (making a
circular gesture for the concept “tall”), or due to a linguistic impairment (choosing
the word “tall” for the concept “round”). There is no simple way of distinguishing
between the two. Therefore, pragmatic/semantic mis-matches between gesture and
speech in aphasic patients may not be interpreted in a specific way, yet precisely this
is the basis for claims made by Cicone et al. (1979), McNeill (1985) and others. Of
course, the semantics of a gesture may still be ventured on the basis of its configurational
properties, and the rate of incidence of semantic/pragmatic mis-matches may also be
determined (as in Butterworth, Swallow, and Grimston 1981), but beyond that, inves-
tigation of gesture in aphasics showing semantic impairments must be limited to the
analysis of lexical composition and timing relations.
The hypothesis of two links between gesture and speech, one pre-verbal and one
lexical, predicts a number of phenomena in the behavior of aphasic patients. Firstly,
pragmatic/semantic mis-matches between gesture and speech will
occur irrespective of whether gesture originated early or late in the speech production
process. Deficits in word retrieval are sufficient to create these mis-matches. If gesture
does not occur in compensation for retrieval deficits, but only as a part of ideational
processing, there will be no increase in the incidence of gesture relative to speech rate.
This seems to be the case in groups of patients with fluent aphasia (Feyereisen 1983;
Hadar 1991b). This is clear in the case of KC, a jargon aphasic, who had an apparently
normal rate and ordinary timing of gestures (Butterworth, Swallow, and Grimston
1981). The authors noted that gestures probably occurred where neologistic jargon
words were produced as substitutes for words that KC was unable to retrieve. Sec-
ondly, word retrieval deficits originating in post-semantic processing will show
compensatory changes in gestural behavior which, again, on a functional account,
should be reflected in higher gesture rates (relative to speech). This could best be
tested in patients who show no semantic deficits (or just a low level of them), and
there is increasing indirect evidence to that effect (Carlomagno et al. 2005; Rose
2006). In one case, AP, showing mild anomic aphasia and no semantic deficit, the
incidence of gestures with configurational and dynamic properties compatible with
iconic gestures was just over 1 gesture per word, whilst the highest rating subject of
a group of normal controls had an incidence of only 0.7 gestures per word and the
mean for a control group of 6 subjects was 0.51 gestures per word (Hadar 1991b).
AP (renamed AU) was compared to another patient (PA) who had marked deficits
of conceptual processing, but relatively preserved lexical processing. Both patients
produced more gestures than controls, but AU produced more content-bearing
gestures (77% compared to PA’s 48%) and primarily in lexical positions and in prox-
imity to a hesitation pause (49% compared to PA’s 11%); PA produced his content-
bearing gestures either at phrase initial positions or during running speech (Hadar and
Yadlin-Gedassy 1994).
To sum up, iconic gestures appear to link to speech at two processing stages, one con-
ceptual and one lexical. The association of both kinds of gesture with speech deficits
generally, and word retrieval problems in particular, suggests that they reflect a process
of facilitation of speech production.

8. Gesture and the facilitation of speech production: A model


Following Hadar and Butterworth (1997), Krauss and Hadar (1999) and De Ruiter
(2000), I suggest a model in which gesture production links up with speech production
at two distinct stages of speech processing, one early (conceptual) and one late (lexical).
The model makes a number of assumptions that are only circumstantially supported by
the available data, but its consequences give a good fit to what we know about iconic
gestures during continuous speech. My first assumption is that conceptual processing ac-
tivates imagistic representations, either in visual form or in sensorimotor form. This
activation is, presumably, automatic and involves conceptual features that are image-
able. Some support for this can be found in evidence showing iconic gesture and pan-
tomime as early forms of communication, both ontogenetically (Capirci and Volterra
2008) and phylogenetically (Gentilucci and Corballis 2006; Kimura 1979). Gesture
may occur in the course of pre-verbal computations just as lip movement may occur
in the course of reading: it is not the result of a fully intentional process, although it
is given, within limits, to voluntary suppression.
The second assumption is that a visual or a sensory-motor image mediates between
conceptual processing and the generation of iconic gestures. The model proposes that
the visual image facilitates word-finding in three distinct ways: by focusing conceptual
processing, by holding core features during semantic re-selection and by directly acti-
vating word forms in the phonological lexicon. Word-finding failures themselves will
tend to elicit imagery and its associated gestures. I believe that this assumption accounts
for the fact that gesture production is not necessarily controlled by the hemisphere that
controls language, even if the latter is necessary for adequate coordination with speech
(Kita and Lausberg 2008; Lausberg et al. 2007).
The general architecture of the model is presented in Fig. 50.1, where a two-stage
model of word retrieval is assumed. A number of authors have made broadly equivalent
proposals regarding speech production and I follow a version that is derived from Levelt
(1989) although, currently, the model is not specific enough to differentiate among the
various two-stage models of lexical retrieval. Conceptual (“message level”) processing
constructs or selects a set of semantic features {Fw, Fx, Fy, …} to be realized linguistically.
These features are envisaged to be conceptual and perceptual primitives of the sort pro-
posed by Miller and Johnson-Laird (1976), or by Bierwisch and Schreuder (1992). Lexical
entries are defined in terms of feature structures, and access to the entries entails matching
a subset of features, {Fx, Fy, …}, to one or more entries in the semantic lexicon.
In Fig. 50.1, the subset {Fx, Fy, …} may activate an image via two routes, the concep-
tual and the lexical. An image may, in turn, feed into the formulator process, and hence
into subsequent processes of word finding. The idea here is that the image will be trans-
lated back into semantic features that can then engage in conceptual processing. Now,
on occasion, the translation will not be identical to the original subset that evoked the
image: some features may be lost, or some accidental feature of the image may be in-
cluded, some previously unstressed feature may become salient, etc. For example, the
features designed to elicit the word table, will evoke the visual image of perhaps
some specific table, known to the speaker. Context or intention may determine which
aspect of the image of the table is salient and activates a gesture: its squareness will
elicit a different gesture from its flatness, for example.

Imagistic facilitation: the conceptual route


Consider, for example, a speaker intending to describe a moment in the flight of a glider
in order to explain the way it is steered. The speaker wants to describe a moment of
change in altitude and is deciding which moment he is going to describe. He has
selected the features

(1) (THROUGH(TRAVEL))(x, AIR)


(see Miller and Johnson-Laird 1976: 534)

As he formulates the phrase “in order to make the glider [verb]”, he generates the image
of an upwards motion and gestures an uprising curve with his arm. The feature of upward
motion is fed back into the conceptualization and added to (1), to give (2):

(2) λx (UPWARD (THROUGH(TRAVEL)))(x, AIR)

This set of features accesses the entry for soar and he says “in order to make the glider
soar”. The gesture will start before the beginning of the phrase, during conceptualization,
and will continue through the word “soar”. Alternatively, the speaker could preserve the
notion of a change in altitude, while evoking the image of a downward motion. He would
then produce a descending arm gesture and say “in order to make the glider descend”.
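The feedback loop in this glider example can be sketched as follows. The mapping from image to feature is, of course, a gross simplification introduced only for illustration; the lexicon and helper function are hypothetical.

```python
# Hypothetical sketch of the conceptual route in the glider example: the evoked
# image feeds one salient feature back into the message, turning feature set (1)
# into (2) and thereby selecting a lexical entry ("soar" or "descend").

LEXICON = {
    "soar":    {"TRAVEL", "THROUGH_AIR", "UPWARD"},
    "descend": {"TRAVEL", "THROUGH_AIR", "DOWNWARD"},
}

def feature_from_image(image):
    # Stand-in for imagistic processing: the gesture/image makes one aspect salient.
    return {"upward curve": "UPWARD", "downward curve": "DOWNWARD"}[image]

message = {"TRAVEL", "THROUGH_AIR"}                    # feature set (1)
message |= {feature_from_image("upward curve")}        # feedback yields set (2)

print([w for w, f in LEXICON.items() if f <= message])  # prints: ['soar']
```

Had the image been of a downward curve, the same loop would have enriched (1) with DOWNWARD instead and selected "descend".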
The image may also be used to refocus conceptual processing, so that, in effect, a dif-
ferent lexical entry is sought. Consider a case where a speaker is describing the way in
which her handbag was stolen. She wishes to emphasize that the bag, y, was taken force-
fully and rapidly by x, and has selected the set of features (3).
816 IV. Contemporary approaches

[Fig. 50.1, a box diagram, appears here. Its speech-production column follows Levelt (1989): long-term memory (spatial representation, propositional knowledge, visual representation) feeds the conceptualizer; the conceptualizer passes a preverbal message to the formulator, whose grammatical and phonological encoders access lemmas and word forms in the lexicon; the resulting phonetic plan drives the articulator and overt speech. A parallel gesture-production column runs from spatial/dynamic imagistic shaping through conceptual and lexical gesture specifications, motor planning, the motor lexicon, and motor execution to overt gesture.]

Fig. 50.1: A model of the relation between iconic gesture generation and speech production.

(3) λx λy (RAPIDLY, FORCEFULLY(TAKE))(x,y) & HOLD (x,y)


Let us suppose that this set of features evokes an image of the event which in turn elicits
the gesture of an open hand reaching out and then closing into a fist. Now, according to
Miller and Johnson-Laird (1976: 574) the following features are entailed by the entries
for grab and snatch:
(4a) grab (FORCEFULLY(TAKE))(x,y) & HOLD (x,y)
(4b) snatch (RAPIDLY(TAKE))(x,y) & HOLD (x,y)

If this exhausts the speaker’s vocabulary for this particular semantic domain, there will
be no entry that satisfies all the features in (3). The image evoked by (3) may allow the
critical or core sense, (5)

(5) TAKE(x,y) & HOLD (x,y),

to be maintained while the speaker searches in conjunction with either FORCEFULLY or RAPIDLY to produce either "the bag was snatched" or "the bag was grabbed".
To sum up, conceptual gesture can facilitate word finding in two ways: by continually focusing the conceptual process on a set of features for which a lexical entry exists, and by refocusing it when semantic selection fails.
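The refocusing step in the handbag example can be sketched as follows; the lexicon and the one-feature-at-a-time search strategy are hypothetical simplifications for illustration only.

```python
# Hypothetical sketch of refocusing: the full feature set (3) matches no lexical
# entry, so the core sense (5) is held constant while the remaining features are
# tried one at a time, until an entry that exists in the lexicon is found.

LEXICON = {
    "grab":   {"TAKE", "HOLD", "FORCEFULLY"},
    "snatch": {"TAKE", "HOLD", "RAPIDLY"},
}

def refocus(core, extras):
    """Conjoin the core sense with one extra feature at a time until a match."""
    for extra in extras:
        candidate = core | {extra}
        for word, feats in LEXICON.items():
            if feats == candidate:
                return word
    return None

core = {"TAKE", "HOLD"}                    # the core sense (5)
print(refocus(core, ["FORCEFULLY"]))       # prints: grab
print(refocus(core, ["RAPIDLY"]))          # prints: snatch
```

Which of the two entries is produced depends on which non-core feature the refocused conceptual process conjoins with the core sense.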

Imagistic facilitation: the lexical route


A feature set, like (4b), accesses or activates a lemma representation, LE1, which yields
an “address” for a phonological word form in the lexicon. Suppose the word form of
“snatch” cannot, for some reason, be retrieved within a socially acceptable period of
time; then the image may be used to maintain the core features in (5) while an attempt
is made to retrieve some other lexical entry, LE2, that fits at least the core features of
the input specification. For this reason, retrieval failure will lead to a new or renewed
activation of the imagistic set of features, resulting in the production of gesture.
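A minimal sketch of this lexical route follows, with an artificially "blocked" word form standing in for a tip-of-the-tongue state; the lexicon, lemma names (LE1/LE2 stand-ins), and fallback procedure are assumptions of the sketch, not claims about the model's implementation.

```python
# Hypothetical sketch of the lexical route: when the word form for the selected
# lemma (LE1) cannot be retrieved, the image keeps the core features active while
# another entry (LE2) covering at least those features is sought instead.

LEXICON = {                                    # lemma -> (features, word form)
    "SNATCH": ({"TAKE", "HOLD", "RAPIDLY"},    None),    # form blocked (TOT state)
    "GRAB":   ({"TAKE", "HOLD", "FORCEFULLY"}, "grab"),
}

def produce(target_lemma, core_features):
    feats, form = LEXICON[target_lemma]
    if form is not None:                       # normal word-form retrieval succeeds
        return form
    for lemma, (f, alt_form) in LEXICON.items():   # re-selection via core features
        if core_features <= f and alt_form is not None:
            return alt_form                    # LE2: fits the core, form available
    return None

print(produce("SNATCH", {"TAKE", "HOLD"}))     # prints: grab
```

The gesture in this sketch corresponds to the renewed imagistic activation that keeps the core features available during re-selection.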
In addition to re-selection of a lexical entry LE2, whose retrieval is easier for some
reason, it is possible that there is activation of the word form by gesture which is
not mediated by semantic features. Such direct activations may be the source of image-
ability effects in a variety of lexical tasks (Howard and Franklin 1988; Luzzatti et al.
2002). Halpern (1963) described a patient who improved his phonological retrieval by
recourse to active visual imagination. Following a verbal, dictionary-style definition,
the patient closed his eyes and produced an image of the object, prior to being able
to name it. Freud (1938a: 234) described dreams in which visual images generated pho-
nological material whose meaning diverged from that of the image. For example, a phy-
sician dreamt of seeing “on the last phalange of my left forefinger a primary syphilitic
affection” which, in the context of the dream, represented the expression “prima affec-
tio” (“first love”), referring to a recent infatuation of the dreamer. Finally, there is
suggestive evidence that tip-of-the-tongue situations tend to elicit iconic gesture pro-
duction, although this did not seem to improve word retrieval relative to non-gesture
tip-of-the-tongue (Beattie and Coughlan 1999). This evidence is important because
tip-of-the-tongue phenomena are usually considered a failure in the retrieval of the
word form, when lemma retrieval had been successful. It might be that gesture appear-
ing here directly facilitates the retrieval of the word form, without lemma re-selection.
There is also evidence to this effect from aphasia (Hanlon, Brown, and Gerstman 1990;
Rose 2006). However, Hostetter and Alibali (2008) make a convincing case in favor of
a re-run of lexical selection from the earlier point of conceptual re-activation even in
tip-of-the-tongue phenomena or in cases of failure to retrieve word forms in aphasia.
To sum up, I suggest two possible mechanisms in which gesture may facilitate speech
production, one through enhancing message construction (conceptual gesture), which
runs automatically in parallel to speech, and one that facilitates lexical retrieval, espe-
cially in cases where retrieval fails (lexical gesture). Retrieval failure may or may not
manifest in speech dysfluency, depending on the time course of the re-selection process,
in relation to speech rate. In both the conceptual and the lexical cases, facilitation is
mediated through imagistic representation (De Ruiter 2000; Kita and Lausberg
2008), gesture being the motor manifestation of imagistic activation, just as lip move-
ment may be a motor accompaniment of silent reading. Of course, even in these
cases, when all evidence points to a lexical retrieval function, gesture may still have
communicative utility as a visible element of a cognitive process. In fact, there is evidence that iconic gestures often help disambiguate the verbal message itself, over and above conveying a message of their own (Obermeier, Holle, and Gunter 2011). This, to my mind, renders the
lip-reading metaphor even more instructive.

9. References
Argyle, Michael 1975. Bodily Communication. London: Methuen.
Beattie, Geoffrey W. and Brian Butterworth 1979. Contextual probability and word frequency as
determinants of pauses and errors in spontaneous speech. Language and Speech 22: 201–211.
Beattie, Geoffrey and Jane Coughlan 1999. An experimental investigation of the role of iconic ges-
tures in lexical access using the tip-of-the-tongue phenomenon. British Journal of Psychology
90: 35–56.
Behrmann, Marlene and Claire Penn 1984. Nonverbal communication of aphasic patients. British
Journal of Disorders of Communication 19: 155–168.
Bierwisch, Manfred and Robert Schreuder 1992. From concepts to lexical items. Cognition 42: 23–60.
Boomer, Donald S. 1963. Speech disturbance and body movement in interviews. Journal of Ner-
vous and Mental Disease 136: 263–266.
Boomer, Donald S. and Allen P. Dittman 1964. Speech rate, filled pause and body movement in
interviews. Journal of Nervous and Mental Disease 139: 324–327.
Bull, Peter 1983. Body Movement and Interpersonal Communication. London: Wiley.
Butterworth, Brian 1975. Hesitation and semantic planning in speech. Journal of Psycholinguistic
Research 4: 75–87.
Butterworth, Brian 1980. Evidence from pauses. In: Brian Butterworth (ed.), Language Produc-
tion, Volume 1, 423–459. London: Academic Press.
Butterworth, Brian and Geoffrey Beattie 1978. Gesture and silence as indicators of planning in
speech. In: Robin Campbell and Philip T. Smith (eds.), Recent Advances in the Psychology
of Language: Formal and Experimental Approaches, 347–360. London: Plenum.
Butterworth, Brian and Uri Hadar 1989. Gesture, speech and computational stages. Psychological
Review 96: 168–174.
Butterworth, Brian, J. Swallow and M. Grimston 1981. Gestures and lexical processes in jargon
aphasia. In: Jason W. Brown (ed.), Jargonaphasia, 113–124. New York: Academic Press.
Capirci, Olga and Virginia Volterra 2008. Gesture and speech: The emergence and development
of a strong and changing partnership. Gesture 8: 22–44.
Carlomagno, Sergio, Maria Pandolfi, Andrea Martini, Gabriella Di Iasi and Carla Cristilli 2005.
Coverbal gestures in Alzheimer’s type dementia. Cortex 41: 535–546.
Cicone, Michael, Wendy Wapner, Nancy Foldi, Edgar Zurif and Howard Gardner 1979. The rela-
tion between gesture and language in aphasic communication. Brain and Language 8: 324–349.
de Ruiter, Jan Peter 2000. The production of gesture and speech. In: David McNeill (ed.), Language and Gesture, 284–311. Cambridge: Cambridge University Press.
Dittman, Allen T. 1972. The body movement – speech rhythm relationship as a cue to speech en-
coding. In: Aron Wolfe Siegman and Benjamin Pope (eds.), Studies in Dyadic Communication,
135–152. New York: Pergamon Press.
Dittman, Allen T. and Lynn G. Llewelyn 1969. Body movement and speech rhythm in social con-
versation. Journal of Personality and Social Psychology 11: 98–106.
Efron, David 1972. Gesture, Race and Culture. The Hague: Mouton.
Ekman, Paul and Wallace V. Friesen 1969. Nonverbal leakage and clues to deception. Psychiatry 3:
88–105.
Feyereisen, Pierre 1983. Manual activity during speaking in aphasic subjects. International Journal
of Psychology 18: 545–556.
Feyereisen, Pierre and Jacques-Dominique de Lannoy 1991. Gestures and Speech: Psychological Investigations. Cambridge: Cambridge University Press.
Feyereisen, Pierre, Michele Van de Wiele and Fabienne Dubois 1988. The meaning of gestures:
What can be understood without speech? European Journal of Cognitive Psychology 8: 3–25.
Frege, Gottlob 1952. On sense and reference. In: Peter Geach and Max Black (eds.), Translations
from the Philosophical Writings of Gottlob Frege. Oxford: Basil Blackwell.
Freud, Sigmund 1938a. The interpretation of dreams. In: Abraham Arden Brill (ed.), The Basic
Writings of Sigmund Freud, 181–468. New York: Modern Library.
Freud, Sigmund 1938b. Psychopathology of everyday life. In: Abraham Arden Brill (ed.), The
Basic Writings of Sigmund Freud, 35–178. New York: Modern Library.
Garrett, Merrill F. 1984. The organization of processing structure for language production: Applica-
tions to aphasic speech. In: David Caplan, André Roche Lecours and Allan Smith (eds.), Biolog-
ical Perspectives on Language, 172–193. Cambridge: Massachusetts Institute of Technology Press.
Gentilucci, Maurizio and Michael C. Corballis 2006. From manual gesture to speech: A gradual
transition. Neuroscience and Behavioral Reviews 30: 949–960.
Goldin-Meadow, Susan 2002. Hearing Gestures: How Our Hands Help Us Think. Cambridge, MA:
Harvard University Press.
Hadar, Uri 1989. Two types of gesture and their role in speech production. Journal of Language
and Social Psychology 8: 221–228.
Hadar, Uri 1991a. Body movement during speech: Period analysis of upper arm and head move-
ment. Human Movement Science 10: 419–446.
Hadar, Uri 1991b. Speech-related body movement in aphasia: Period analysis of upper arm and
head movement. Brain and Language 41: 339–366.
Hadar, Uri and Brian Butterworth 1997. Iconic gestures, imagery and word retrieval in speech.
Semiotica 115: 147–172.
Hadar, Uri and Lian Pinchas-Zamir 2004. The semantic specificity of gesture: Implications for ges-
ture classification and function. Journal of Language and Social Psychology 23: 204–214.
Hadar, Uri, Timothy J. Steiner and Frank Clifford Rose 1984. The relationship between head
movements and speech dysfluencies. Language and Speech 27: 333–342.
Hadar, Uri and S. Yadlin-Gedassy 1994. Conceptual and lexical aspects of gesture: Evidence from
aphasia. Journal of Neurolinguistics 8: 57–65.
Halpern, Lipa 1963. Observations on sensory aphasia and its restitution in a Hebrew polyglot. In:
Lipa Halpern (ed.), Problems in Dynamic Neurology, 156–173. Jerusalem: Hadassah.
Hanlon, Robert E., Jason W. Brown and Louis J. Gerstman 1990. Enhancement of naming in non-
fluent aphasia through gesture. Brain and Language 38: 298–314.
Harley, Trevor A. 1984. A critique of top-down serial models of speech production: Evidence from
non-plan-internal speech errors. Cognitive Science 8: 191–219.
Henderson, Alan, Frieda Goldman-Eisler and Andrew Skarbek 1966. Sequential temporal pat-
terns in spontaneous speech. Language and Speech 9: 207–216.
Hostetter, Autumn B. and Martha W. Alibali 2008. Visible embodiment: Gestures as simulated
action. Psychonomic Bulletin & Review 15: 495–514.
Howard, David and Sue Franklin 1988. Missing the Meaning. Cambridge: Massachusetts Institute
of Technology Press.
Kempen, Gerard and Pieter Huijbers 1983. The lexicalization process in sentence production and
naming: Indirect selection of words. Cognition 14: 185–209.
Kendon, Adam 1978. Differential perception and attentional frame in face-to-face interaction:
Two problems for investigation. Semiotica 24: 305–315.
Kendon, Adam 1980. Gesticulation and speech: Two aspects of the process of utterance. In: Mary
Ritchie Key (ed.), The Relationship of Verbal and Nonverbal Communication, 207–227. The
Hague: Mouton.
Kendon, Adam 1985. Some uses of gesture. In: Deborah Tannen and Murielle Saville-Troike
(eds.), Perspectives on Silence, 215–234. Norwood (NJ): Ablex.
Kendon, Adam 1986. Some reasons for studying gesture. Semiotica 62: 3–28.
Kimura, Doreen 1979. Neuromotor mechanisms in the evolution of human communication. In:
Horst D. Steklis and Michael J. Raleigh (eds.), Neurobiology of Nonverbal Communication
in Primates: An Evolutionary Perspective, 197–220. New York: Academic Press.
Kircher, Tilo, Benjamin Straube, Dirk Leube, Susanne Weis, Olga Sachs, Klaus Willmes, Kerstin
Konrad and Antonia Green 2009. Neural interaction of speech and gesture: Differential acti-
vations of metaphoric co-verbal gestures. Neuropsychologia 47: 169–179.
Kita, Sotaro and Hedda Lausberg 2008. Generation of co-speech gestures based on spatial imag-
ery from the right-hemisphere: Evidence from split-brain patients. Cortex 44: 131–139.
Krauss, Robert M. and Uri Hadar 1999. The role of speech-related arm/hand gestures in word
retrieval. In: Lynn S. Messing and Ruth Campbell (eds.), Gesture, Speech and Sign, 93–116.
Oxford: Oxford University Press.
Krauss, Robert M., Palmer Morrel-Samuels and Christina Colasante 1991. Do conversational
hand gestures communicate? Journal of Personality and Social Psychology 61: 743–754.
Kristeva, Julia 1978. Gesture: Practice or communication. In: Ted Polhemus (ed.), Social Aspects
of the Human Body. Harmondsworth, UK: Penguin.
LaBarre, Weston 1947. The cultural basis of emotions and gestures. Journal of Personality 16:
49–68.
Lausberg, Hedda, Eran Zaidel, Robyn F. Cruz and Alain Ptito 2007. Speech-independent produc-
tion of communicative gestures: Evidence from patients with complete callosal disconnection.
Neuropsychologia 45: 3092–3104.
Levelt, Willem J. M. 1989. Speaking: From Intention to Articulation. Cambridge: Massachusetts
Institute of Technology Press.
Levelt, Willem J. M., Ardi Roelofs and Antje S. Meyer 1999. A theory of lexical access in speech
production. Behavioral and Brain Sciences 22: 1–38.
Luzzatti, Claudio, Rossella Raggi, Giusy Zonca, Caterina Pistarini, Antonella Contardi and Gian
Domenico Pinna 2002. Verb–noun double dissociation in phasic lexical impairments: The role
of word frequency and imageability. Brain and Language 81: 432–444.
McNeill, David 1985. So you think gestures are nonverbal? Psychological Review 92: 350–371.
McNeill, David 1986. Iconic gestures of children and adults. Semiotica 62: 107–128.
McNeill, David 1992. Hand and Mind. Chicago: University of Chicago Press.
McNeill, David 2005. Gesture and Thought. Chicago: University of Chicago Press.
Miller, George A. and Philip N. Johnson-Laird 1976. Language and Perception. Cambridge, MA:
Belknap Press of Harvard University Press.
Morrel-Samuels, Palmer and Robert M. Krauss 1992. Word familiarity predicts the temporal asyn-
chrony of hand gestures and speech. Journal of Experimental Psychology: Learning Memory
and Cognition 18: 615–622.
Moscovici, Serge 1967. Communication processes and the properties of language. In: Leonard
Berkowitz (ed.), Advances in Experimental Social Psychology, Volume 3, 225–270. New
York: Academic Press.
Müller, Cornelia 2008. Metaphors Dead and Alive, Sleeping and Waking: A Dynamic View.
Chicago: University of Chicago Press.
Obermeier, Christina, Henning Holle and Thomas C. Gunter 2011. What iconic gesture fragments
reveal about gesture–speech integration: When synchrony is lost, memory can help. Journal of
Cognitive Neuroscience 23: 1648–1663.
Ragsdale, J. Donald and Catherine Fry Silvia 1982. Distribution of kinesic hesitation phenomena
in spontaneous speech. Language and Speech 25: 185–190.
Rimé, Bernard 1983. Nonverbal communication or nonverbal behavior? Towards a cognitive-motor theory of nonverbal behavior. In: Willem Doise and Serge Moscovici (eds.), Current Issues in European Social Psychology, 85–141. Cambridge: Cambridge University Press.
Riseborough, Margaret Gwendolin 1981. Physiographic gestures as decoding facilitators: Three
experiments exploring a neglected facet of communication. Journal of Nonverbal Behavior
5: 172–183.
Rose, Miranda L. 2006. The utility of arm and hand gestures in the treatment of aphasia. Advances
in Speech–Language Pathology 8: 92–109.
Scheflen, Albert E. 1973. How Behavior Means. New York: Gordon and Breach Science.
Schegloff, Emanuel A. 1984. On some gestures’ relation to talk. In: John Maxwell Atkinson and
John Heritage (eds.), Structures of Social Action: Studies in Conversation Analysis, 266–296.
Cambridge: Cambridge University Press.
Watzlawick, Paul, Janet Helmick Beavin and Don D. Jackson 1968. The Pragmatics of Human
Communication. London: Faber.
Williams, Ederyn 1977. Experimental comparison of face to face and mediated communication: A
review. Psychological Bulletin 5: 563–576.
Wundt, Wilhelm 1973. The Language of Gesture. The Hague: Mouton. First published [1900].

Uri Hadar, Tel Aviv (Israel)

51. The social interactive nature of gestures: Theory, assumptions, methods, and findings
1. Introduction
2. What is dialogue?
3. Experimental investigations of social gestures
4. Conclusion
5. References

Abstract
There is a rapidly increasing number of social experiments on gesture use, that is, experi-
ments with at least one condition in which both participants can interact freely. This experimental evidence already shows that conversational hand gestures serve a variety of social interactive functions in face-to-face dialogues. First, speakers in a dialogue gesture at a higher rate than in a monologue, even when the speaker and addressee cannot see
each other (e.g., on the telephone). Moreover, the form and function of speakers’ gestures
change to fit specific social conditions such as dialogue versus monologue, the presence or
absence of mutual visibility, a shared or different visual perspective, and the presence or
absence of common ground. Gestures also serve several functions other than conveying
information about the topic of the dialogue: They contribute to maintaining the inter-
action process, and they provide the speaker with information about the gesturer’s state
of understanding. Altogether, these findings show how gestures communicate in social
interaction. They also demonstrate that well-designed and controlled experiments need
not reduce dialogue to the study of individuals but can study dialogue itself as an
indivisible unit.

1. Introduction
People use gestures primarily in social interaction, seldom when alone. What we mean
here by social interaction is a face-to-face dialogue, and gestures are the conversational
hand movements that people integrate with their words to convey meaning to each
other in a dialogue. This chapter addresses both how to study the social nature of ges-
tures experimentally and what such studies reveal about how the participants in a dia-
logue influence each other’s gestures. Why do speakers gesture when talking on the
phone? Why do interlocutors describe something to one person with clear, well-formed
gestures, then use sketchy, poorly formed gestures when describing the same thing to
another person? Why do speakers gesture at some times and not others? It turns out
that the answer to each of these questions is social, as revealed by experimental studies
of gesturing in face-to-face dialogues.
Limiting this review to lab experiments requires some explanation. All gesture re-
searchers find inspiration in the myriad details of everyday conversations. To pursue
these compelling observations, it is necessary to videotape similar phenomena for care-
ful study, which leads to a methodological choice: The researcher could either find con-
versations that are occurring naturally (e.g., at a party, a family dinner, or a playground)
or elicit controlled experimental dialogues in the lab. Either context advances gesture
research in its own way, and each method has its problematic aspects. Experimental re-
searchers must achieve a balance between control and spontaneity. Some experimental-
ists believe that spontaneous dialogue inherently precludes experimental control, so
they replace one of the participants with a confederate or themselves. We consider
this “dialogicide” to be unnecessary, and our review will show that an increasing num-
ber of studies are achieving the desired experimental control without depriving the dia-
logue of its essential features. Indeed, one of our objectives here is to promote the use
of face-to-face dialogue when collecting experimental data for investigations of social
gesture use. After outlining the defining features of dialogue, we review the gestural
findings from studies that used videotapes of real dialogues in the lab.

2. What is dialogue?
Many scholars have proposed that face-to-face dialogue is the primary site of language
use (e.g., Bavelas 1990; Bavelas et al. 1997; Chafe 1994; Clark 1996; Fillmore 1981;
Goodwin 1981; Levinson 1983; Linell 1982). Clark’s (1996) outline of 10 essential fea-
tures of face-to-face dialogue (see Tab. 51.1) provides a practical checklist for research-
ers who want to ensure that the gestural phenomena they elicit in the lab arise in real
dialogues. Perhaps most germane is what is not a dialogue. Obviously, a speaker who is
alone in the lab, describing something to a camera, is not in a dialogue; there is no
addressee. But simply placing an addressee in front of the speaker is not sufficient.
Both participants must be able to formulate their own actions spontaneously, be self-
determined, and act as themselves.

Tab. 51.1: Ten unique features of spontaneous face-to-face dialogues

1. Co-presence: Both participants are in the same physical environment.


2. Visibility: They can see each other.

3. Audibility: They can hear each other.
4. Instantaneity: They see and hear each other with no perceptible delay.
5. Evanescence: The medium does not preserve their signals, which fade rapidly.
6. Recordlessness: Their actions leave no record or artifact.
7. Simultaneity: Both participants can produce and receive at once and simultaneously.
8. Extemporaneity: They formulate and carry out their actions spontaneously, in real time.
9. Self-determination: Each participant determines his or her own actions (vs. scripted).
10. Self-expression: The participants engage in actions as themselves (vs. roles).

Note. Adapted from Clark 1996, pp. 9–10, Table 3.

The following 12-second excerpt of interaction (from Bavelas et al. 2008) illus-
trates a controlled experimental task in the laboratory that also fulfills all of Clark’s
criteria. The two participants, sitting across from each other, interacted spontaneously
within their assigned task. A female speaker was describing a drawing of an unusual
18th century dress (shown in Fig. 51.1) to a male addressee who would later have to
select a picture of this dress from an array of similar dresses. In this excerpt, the speaker
was describing the large design on the front of the very wide skirt. The transcript below
includes subscripts and underlining that indicate where each gesture occurred in rela-
tion to words, with the description of gestures and other actions in italics and square
brackets below. Fig. 51.1 shows still photos of most of the gestures.


Fig. 51.1: Screen shots of the gestures made during the speaker’s description of the large design on
the center of the skirt in the picture above. The video is a three-camera split, with a side view of
the speaker and addressee in the lower screen, a front view of the speaker in the upper screen, and
a head shot of the addressee superimposed on the upper screen (except when the speaker stands
and blocks that camera).

Speaker: 1’Kay, so it goes out like that?


[Starts to trace a symmetrical W with both hands; holds end position of
gesture in the air]

Addressee: 2starting where?


[spreads hands in front of himself like hers and holds in position until
gesture 3, below]

Speaker: Starting like- - -


[places hands in two new, different starting positions, but abandons both
and looks perplexed]

’Kay, am I allowed to stand up? Probably. ’Kay,


[……………….………..stands up……….…………………….] [shifts posi-
tion and pushes her hair back]

3starting like here


[Places hands at waist with fingers pointing inwards; holds]

Addressee: 4on her waist?


[moves hands to his waist and holds in same position as hers]

Speaker: 5_______________________________on her dress, 6like a bit under her waist


[keeps hands at waist, but rotates [moves hands down about 2 inches]
so thumbs inwards, holds]

Addressee: Okay

The addressee asked two questions about the location of the design on the dress (“start-
ing where?” and “on her waist?”), each time suspending his own hands as if waiting for
the speaker to specify her description. The speaker worked hard to answer his ques-
tions, ultimately standing up to demonstrate the location of the design on her own
body. Meanwhile, he was simultaneously using his own body as a reference point for
his questions. At the end of the excerpt, the addressee’s feedback (“Okay”) signaled
to the speaker that they had established common ground about where the design was
located. At every moment, his questions and actions were influencing her gestures.
The fluidity, spontaneity, and responsiveness to the addressee of her gestures were striking and, we propose, similar to gestures she might use in typical, everyday social
interactions, such as describing something with unusual spatial characteristics to an
addressee.
Any addressee (such as a confederate or the experimenter) who must respond only
minimally violates the essential characteristics of dialogue and creates a different, unfa-
miliar kind of interaction. Without the precise reciprocity and collaboration inherent in
a real dialogue, we cannot be sure that the gestures produced in the laboratory have
anything in common with the gestures people use in everyday dialogues. There is evi-
dence to suggest that natural behaviors by an addressee are not limited to back chan-
nels. They are closely linked, in both timing and meaning, to what the speaker is saying
(Bavelas, Coates, and Johnson 2000), and a confederate or experimenter who is trying
to respond in a “neutral” or “standard” manner could have unintended effects (Beattie
and Aboudan 1994). When experimental researchers are knowledgeable about dialogue
and take steps to ensure its essential features in the laboratory, they can be more
confident that the conversation is a true dialogue.

3. Experimental investigations of social gestures


The rest of this article reviews experiments with dialogic data that fulfill Clark’s check-
list of 10 essential features. These studies provide not only a variety of models for social
gesture experiments but also evidence that spontaneous interaction in the lab does not
eradicate experimental control. Each study had sufficient control of variability to reveal
informative gestural differences between conditions.
The review begins with evidence that participants in a dialogue actually gesture more
than speakers who are alone or with a constrained addressee. Beyond that, the available
experimental research demonstrates that

(i) participants in face-to-face dialogues use gestures to manage the interpersonal,


interactive aspects of their conversations;
(ii) they collaborate with gestures, using each other’s gestures to complete a shared
task successfully and efficiently; and
(iii) speakers adapt their gestures to a number of social variables.

3.1. Speakers in dialogues gesture more


A perplexing pattern of results in the gesture literature is that even when their addres-
sees are out of sight (e.g., sitting behind a partition), speakers still gesture, either at
a lower rate than when the addressee was visible (Alibali, Heath, and Myers 2001;
Cohen 1977; Cohen and Harrison 1973; Emmorey and Casey 2001; Krauss et al. 1995)
or at the same rate (Bavelas et al. 1992, Experiment 2; Rimé 1982). Why would speak-
ers gesture to an addressee who cannot see them? Bavelas et al. (2008) pointed out a
confounding variable in this group of studies: These experiments manipulated whether
the participants could see each other, but not whether they were participating in a
dialogue.
Bavelas et al. (2008) hypothesized that participating in a dialogue might indepen-
dently elicit gesturing, regardless of whether the gestures were visible. These authors
tested their hypothesis by creating three conditions that disentangled visibility and
dialogue. Speakers described the 18th century dress in the example above to an
addressee in face-to-face dialogue (visibility plus dialogue), to an addressee on the tele-
phone (dialogue only), or alone in the room to a tape recorder (neither visibility nor
dialogue). The authors used linear regression to separate the effects of visibility and
dialogue on the rate of gesturing, checking first for an effect of visibility, then for an
additional, independent effect of dialogue. They found that restricting visibility did sup-
press gesturing, but participating in a dialogue significantly increased it. For example,
speakers gestured at a significantly higher rate in the telephone condition than in the
tape recorder condition, which differed only in whether the speakers were participating
in a dialogue. Interestingly, although speakers appeared to gesture at a higher rate in
the face-to-face condition compared to the telephone condition, this difference was
not significant, a finding that replicated two of the earlier visibility experiments. Bavelas
et al. (2008) noted that these three studies shared a methodological feature:

In Rimé (1982), Bavelas et al. (1992, Experiment 2), and the present experiment, the
speaker and addressee were both participants who could interact freely and sponta-
neously, when and as they wished. In contrast, the five experiments that found a signif-
icant effect of visibility were also the ones that constrained the addressee (who was
usually the experimenter or a confederate) to a limited repertoire of responses. (Bavelas
et al. 2008: 512)

Comparing the three dialogic studies to the other five (cited above) provided additional
evidence that being in a real dialogue increases gesturing, even if the participants can-
not see each other.
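The logic of that two-step regression (test for an effect of visibility first, then for an additional, independent effect of dialogue) can be sketched in a small simulation. This is an illustration only: the per-speaker gesture rates below are invented to mirror the reported pattern (a large dialogue effect, a small visibility effect), not Bavelas et al.'s (2008) actual data, and the analysis is reduced to dummy-coded ordinary least squares.

```python
# Hypothetical gesture rates (e.g., gestures per 100 words) for three
# speakers per condition; the numbers are invented for illustration.
tape  = [1.5, 2.0, 2.5]   # neither visibility nor dialogue
phone = [4.5, 5.0, 5.5]   # dialogue only
f2f   = [5.0, 5.5, 6.0]   # visibility plus dialogue

# Design matrix rows: [intercept, visibility dummy, dialogue dummy].
X, y = [], []
for rate in tape:
    X.append([1.0, 0.0, 0.0])
    y.append(rate)
for rate in phone:
    X.append([1.0, 0.0, 1.0])
    y.append(rate)
for rate in f2f:
    X.append([1.0, 1.0, 1.0])
    y.append(rate)

def ols(X, y):
    """Ordinary least squares via the normal equations (X'X)b = X'y,
    solved by Gaussian elimination with partial pivoting."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    A = [row[:] + [b] for row, b in zip(xtx, xty)]  # augmented matrix
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k + 1):
                A[r][c] -= f * A[col][c]
    b = [0.0] * k
    for r in range(k - 1, -1, -1):
        b[r] = (A[r][k] - sum(A[r][c] * b[c] for c in range(r + 1, k))) / A[r][r]
    return b

intercept, b_visibility, b_dialogue = ols(X, y)
```

With this dummy coding, the dialogue coefficient equals the telephone minus tape recorder difference and the visibility coefficient equals the face-to-face minus telephone difference, which is why the regression can attribute the suppressed gesturing alone to the absence of dialogue rather than the absence of visibility.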
Beattie and Aboudan (1994) focused specifically on how sensitive gesturing was to
how closely the speaker’s context resembled a real dialogue. They asked speakers to
describe a cartoon narrative three times, once alone in a room (nonsocial/monologue),
once to a confederate addressee who was present but unresponsive (social/monologue),
and once to a confederate addressee who interacted freely (social/dialogue). There was
a stepwise pattern of results: Participants gestured least in the nonsocial setting, slightly
more in the social/monologue setting, and most in the social/dialogue setting. Strikingly,
the difference between having a nonresponsive addressee and no addressee at all was
not significant: Speakers talking to an unresponsive addressee did not gesture much
more than when there was no addressee at all. However, the difference between the
social/monologue and social/dialogue conditions was significant. Participants gestured
almost three times more when talking to a freely responding addressee than to an un-
responsive addressee. This effect of what Clark (1996) called extemporaneity has obvi-
ous implications for investigations of social gesture use. The authors had even broader
conclusions, asserting that “in future, those theorists who wish to use gesture as an
828 IV. Contemporary approaches

important window on the computational stages of the human mind might find it neces-
sary to pay more attention to the social contexts from which their data is extracted”
(Beattie and Aboudan 1994: 260).

3.2. Some gestures are specialized for dialogue


The vast majority of gestures illustrate the topic of discussion, but some appear to have
a different function. Bavelas and colleagues (Bavelas et al. 1995; Bavelas et al. 1992)
investigated these non-topical gestures using a variety of videotaped, task-oriented dia-
logues, including retelling a cartoon narrative, explaining how to get a book out of the
library, and telling a close-call story. They began by locating all of the gestures in these
face-to-face dialogues, then excluding gestures that depicted any aspect of the topic of
the dialogue. Approximately 15% of the gestures remained, all of which shared two
common features. First, the gesturer’s hand was oriented toward the addressee (e.g.,
quickly pointing one or more fingers at the addressee or displaying an open palm to
the addressee). Second, the gesture referred directly to the addressee; a paraphrase
of the gesture would include the word “you.” For example, while discussing how to
use the library’s card catalogue, one student had made suggestions about how to look
up books. Later, the other student referred to this suggestion, saying “then look it up
under the appropriate thing.” As he said “appropriate,” the speaker flicked his finger
toward the addressee. This gesture did not refer to using a card catalogue; it cited the
other person as the original source of the information – the gestural equivalent of “as
you said” (Bavelas et al. 1992: 475–476). The authors proposed that participants used
these gestures to serve a variety of interactive purposes, such as requesting help with a
word search (e.g., by holding the palm out as if to receive something from the addressee)
or referring to something that the participants had discussed earlier and that was now
common ground (e.g., by flicking the hand toward the addressee, which in this context indi-
cates “as you know”). These gestures appeared to be related to the social interaction
independently of topic, so Bavelas et al. (1992) proposed that they had interactive rather
than topical functions.
Bavelas et al. (1992; 1995) embarked on a series of experiments to confirm that these
gestures with interactive functions were indeed linked to dialogue, while the other ges-
tures were linked to topic. In the first study (Bavelas et al. 1992: Experiment 1), parti-
cipants described the same material under two different conditions: the speaker was
talking to an addressee or was alone in the room. The latter group talked to the camera,
although they knew the experimenters were watching from the control room. As pre-
dicted, there were more interactive gestures when the speakers were in dialogue than
when alone; the topic gestures showed the opposite trend. In the next study, Bavelas
et al. (1992: Experiment 2) tested the hypothesis that, if speakers made interactive ges-
tures solely for interactive purposes, these should be less likely to occur if the addressee
could not see them. As predicted, the rate of interactive gestures was significantly
higher for participants who were speaking face to face compared to those speaking
through a partition. The same variable did not significantly affect the rate of topic ges-
tures. To test their hypothesis that gestures with interactive functions were an efficient
way for the speaker to include and refer to the addressee without interrupting the topic
of the dialogue, the authors developed a reliable redundancy analysis for the gestures in
both of these experiments. The analysis revealed that interactive gestures were signifi-
cantly less likely to be redundant with words than topic gestures were. Whereas the
modal topic gesture added no information to the phonemic clause it accompanied,
the modal interactive gesture was completely non-redundant with the words.
Together, these results confirmed that gestures with interactive functions responded
to the availability of a visible addressee, but what if the visible addressee was not
engaged in interaction with the speaker? The next experiment (Bavelas et al. 1995:
Study 1) addressed this question with two conditions that were both face-to-face
dyads. In one condition, the dyads retold a cartoon together in a full dialogue. In the
other condition, one participant retold the first half of the cartoon, then the other par-
ticipant retold the second half; they could not help each other, so they were in sequen-
tial monologues. The results showed that, even though there was a visible addressee in
both conditions, the dyads in the full dialogue condition made interactive gestures at
a significantly higher rate than those in the sequential monologues. Finally, they tested
whether addressees understood and responded to the various functions of the interac-
tive gestures. For example, when the speaker made a word-searching interactive ges-
ture, would the addressee provide a word, even though the speaker had not asked
for assistance verbally? One set of analysts identified the specific function of a large ran-
dom sample of the interactive gestures in the data, and another set of analysts classified
the addressee’s response. The predicted effect of interactive gestures on the addressees’
immediately subsequent behavior was statistically significant. Altogether, the series of
studies showed that this relatively small group of previously unnoticed gestures seems to
be an efficient way for interlocutors to manage the social requirement of including and
coordinating with each other, moment by moment, in their dialogue.

3.3. Gestures can be collaborative


Furuyama’s (2000) study showed how dyads built on and elaborated each other’s ges-
tures. In each dyad, Furuyama taught one participant (the Instructor) how to fold an
origami figure. Then, he videotaped the Instructor teaching the other participant (the
Learner) how to do it. The dyads did not have any origami paper for this task, so
they could only use words and gestures. The data revealed what Furuyama called col-
laborative gestures, “which interact with the gestures of the communicative partner (…).
The meaning of this type of gesture crucially depends on the interlocutor’s gesture,
since the interlocutor’s gesture is a part of the collaborative gesture as a whole” (Fur-
uyama 2000: 105). For example, an Instructor was gesturally depicting the origami
paper as having been folded into an imaginary triangle, and was starting to depict
the next fold. The Learner interrupted and took over, saying “and you take this corner”
while gesturally picking up and moving the corner of the instructor’s triangle – which
was, in fact, empty space (Furuyama 2000: 106). The Instructor’s gestures had created
a virtual origami paper, and both participants’ gestures could maintain, manipulate,
and even refer to it deictically (“this corner”). Almost 18% of the 400 Learners’
gestures analyzed were collaborative gestures (calculated from Furuyama 2000: 108,
Table 5.2). The Learners also contradicted the generalization that individuals only
gesture with their own speech, because they often timed their collaborative gestures
with the Instructor’s speech rather than their own. Moreover, whether the Learners
made collaborative gestures depended on the form of the Instructor’s gestures. In
sum, participants used collaborative gestures in intricate coordination to complete a
potentially difficult spatial task with ease.

3.4. Gestures can monitor understanding


Clark and Krych (2004) showed that, when collaborating to accomplish a joint task, the
participants can use gestures as a means of providing moment-by-moment feedback
about their mutual understanding of instructions. Dyads in this study consisted of Di-
rectors, who had models constructed from Lego blocks, and Builders, who could not
see the models but who had to build them according to the Director’s instructions.
Clark and Krych (2004) manipulated whether participants could see each other or
not and whether they could interact in dialogue or not. They found that the dyads
who completed this task significantly more quickly were the ones in which the Director
could see the Builder’s workspace and in which the two participants could interact
freely. In order to discover the behaviors and processes that led to this advantage,
they reliably analyzed the details of these particular interactions. They found that the
Builders’ gestures were central to a dyad’s efficiency. Builders often responded to the
Director’s instructions with provisional actions, such as pointing to a particular block, picking it
up to exhibit it to the Director, or poising a block over a possible position. These actions,
all of which the authors considered to be communicative gestures, provided overt displays
of the Builders’ current state of understanding, and they had an immediate influence
on the Director’s utterances. For example, when the Builder’s gesture indicated a correct
understanding, the Director often broke off further instruction about that step – even
in midsentence – and moved on to the next one. In contrast, when the Builder’s actions
proposed a potentially incorrect step, the Director would insert a precisely timed correc-
tion to redirect the Builder. Thus, Clark and Krych (2004) demonstrated that one of the
advantages of a face-to-face dialogue is the availability of gestures, which the participants
can use to monitor (and to correct) their mutual understanding.

3.5. Social variables shape gestures and their relationship to words


3.5.1. Visibility
This literature review opened with a description of a study by Bavelas et al. (2008)
showing that the apparent effect of visibility on the rate of gesturing has been con-
founded by a strong effect of dialogue. However, the same data set (gestures used
when describing the 18th century dress) showed that visibility does have a strong effect
on gesture features other than their rate. Gestures in the face-to-face condition were
more communicative than gestures in the telephone and tape recorder conditions, in
four ways: First, in face-to-face dialogue, speakers made life-sized gestures that de-
picted the dress in proportion to the size of a human body. These speakers often
placed these gestures around their own body, as in the example at the beginning of
this chapter. In contrast, speakers in the telephone and tape recorder conditions
made small gestures that matched the size of the picture of the dress. Second, in
face-to-face dialogue, speakers made interactive gestures at a higher rate than in the con-
ditions where no one would see them. This replicated the results of Bavelas et al. (1992:
Experiment 2). Third, in face-to-face dialogue, speakers’ gestures were significantly
less likely to be redundant with the concurrent words. That is, the gestures were more
likely to contribute unique information, which was not being conveyed verbally. In tele-
phone dialogues and tape recorder monologues, the gestures added little information
over and above what the immediately accompanying words conveyed.
The fourth, related finding was that speakers in the visibility condition accompanied sig-
nificantly more of their gestures with verbal deictic expressions (such as “here” or
“there”). These deictics drew attention to the gesture, which carried information
that was not in the words. Speakers in telephone dialogues and tape recorder monolo-
gues rarely marked their gestures with deictic expressions. All four of these effects sug-
gest that the speakers whose addressees could see them drew on their gestures as a
communicative resource, while the other speakers did not.
Kimbara (2006, 2008) demonstrated another effect of visibility on gestures, namely,
that interlocutors who can see each other tend to use similar gestures for the same
events. Gestures depicting a particular referent can do so in a variety of ways. For exam-
ple, speakers can demonstrate someone running by moving their own arms as though
running, by wiggling two fingers to represent little running legs, or by tracing a path
in the air. In the 2006 study, Kimbara showed that interlocutors tended to encode ges-
tures about the same referent in the same way (e.g., they might both wiggle their fingers
to show a man running). However, this effect might have nothing to do with seeing each
other’s gestures. It could emerge solely from participants’ shared linguistic context and
subsequent convergence on linguistic encoding (i.e., they use similar words in a similar
context). Kimbara (2008) tested this possible alternative explanation by varying visibil-
ity. Ten dyads watched 10 short excerpts from cartoons. After each excerpt, the dyad
retold the excerpt together “in as much detail as possible so that a person who had
not seen the clips could understand what was being described” (Kimbara 2008: 126).
To test the effect of visibility, Kimbara alternated the retellings between two conditions:
the participants could see each other or they had a blind pulled down between them
so that they could hear but not see each other. Kimbara located all instances of co-
referential gesture pairs and analyzed them for convergence of the gestures’ form.
The results showed that when participants could see each other, their gesture forms
converged significantly more often than when they could not (66% vs. 30%). Thus
the similarity in form could not be attributed to shared linguistic context and verbal
encoding; it was an effect of visibility, of being able to see each other’s gestures.

3.5.2. Addressee location


Özyürek (2000, 2002) examined the effects of participants’ spatial relationship to each
other on the form of speakers’ gestures. The speakers in these experiments had watched
a cartoon, which they then narrated to addressees who had not seen it. In the 2000
study, speakers narrated the story twice, once to a pair of addressees who were seated
at either side of the speaker in a triangular formation and once to a single addressee
who was sitting on one side of the speaker. In the 2002 study, the speakers narrated
the story to only one addressee, who sat either directly across from the speaker or
off to one side. The cartoon included several situations where characters or objects
moved from one place to another (e.g., running into a building or climbing up a drain-
pipe). In both studies, Özyürek analyzed the gestures speakers used to describe these
movements. The speakers’ gestures that depicted “into” and “out of” differed
according to the location of the addressee or addressees: Speakers’ gestures represented
the direction as “into” or “out of” the space that the participants shared, which differed
in each experimental condition. That is, speakers accommodated to whether the
addressee was sitting to the side rather than facing them, presumably so that the mean-
ing of their gestures would be clear to their addressees. Özyürek also provided evidence
that it was specifically the shared physical space between participants that led to the dif-
ferences in gesture trajectory. First, speakers did not change their verbal descriptions
when the shared space changed, so the adjustments in gesture direction and orientation
were not due to changes in their speech. Second, there was no change in gestures for the
movements that would not be affected by these seating conditions. For example, a ges-
ture indicating “up” looked the same whether participants were sitting face to face or
side by side. It was the relationship between the meaning that the speakers were con-
veying and the configuration of their shared space that determined differences in the
speakers’ gestural representations of the same movements in the cartoon.

3.5.3. Shared visual perspective


Bangerter (2004) investigated how participants can elect to contribute information
either in words or in gestures in order to minimize the collaborative effort required
to establish mutual understanding. Pairs of participants did a referential communication
task. The Director in each pair had an array of photos of faces that were arranged in a
particular order. The Matcher in each pair had to arrange his or her own set of photos in
the same order, according to the Director’s instructions. Although they could not see
each other’s set of photos, the participants could refer to a larger array of photos
that was on a board that both participants could see. Participants were free to use what-
ever means of communication they chose to refer to the photos, including words (i.e.,
descriptions of the photos) or gestures (i.e., pointing at the display board). Bangerter
manipulated two variables. The first was whether Directors and Matchers could see
each other and thus use pointing to refer to the photos. When the participants could
not see each other, they used significantly more words to do the task. Presumably,
this difference was because they could not use pointing, only words. Second, he manipu-
lated the distance between the board and the participants, which changed their shared
visual perspective and therefore the relative utility of words and pointing. At close dis-
tances, pointing was an efficient and unambiguous strategy for referring to particular
photos, and both participants used pointing to indicate a particular photo rather than
verbally describing the location and features of the photo. They often combined point-
ing with verbal deixis (this, that, here, there), which suppressed or replaced full verbal
descriptions. At farther distances, where pointing at the closely grouped photos
would be more ambiguous, participants used verbal descriptions. These findings
strongly suggest that the participants were systematic, flexible, and opportunistic in
how they used their words or gestures to establish reference as quickly and accurately
as possible, given their shared perspective.

3.5.4. Addressee knowledge


Common ground is another social factor that influences speakers’ gestures. Gerwing
and Bavelas (2004) manipulated common ground experimentally using several groups
of three participants. One participant was randomly assigned to be the target partici-
pant. The experimenter seated all three out of each other’s sight and gave each of
them some toys (e.g., a “finger trap” or “whirligig”). The target participant had the
same toys as one of the participants (creating common ground) but a different set
from the other (and therefore no common ground). When they had all finished trying
out their set of toys, the experimenter told them which two had had the same set
and which one had a different set. The experimenter then asked the three participants
to briefly discuss what they had done with their toys, but to do so in assigned pairs
with the extra participant waiting out of earshot. In the first two dialogues, the target
participant talked to each of the other two participants in counterbalanced order,
which created a within-subjects comparison. Because the objects did not have well-
standardized names, the participants depicted what they had done using gestures.
When their addressee did not share common ground, the target participants’ initial
gestures were reliably judged to be more complex, precise, or informative than
when the addressee did share common ground. Common ground led to sketchier,
“sloppier” gestures, presumably because these were all that this addressee needed.
In addition, a qualitative analysis of the pairs without common ground showed that
their gradual accumulation of common ground over the course of the dialogue simi-
larly influenced the form of their gestures. Successive gestures referring to the same
object showed a given-new effect, that is, gestures for new information about a refer-
ent were sharp and clear, and later gestures for the same referent became more
schematic.
Holler and Stevens (2007) followed up these findings by including both speech and
gesture in their analysis so they could investigate how common ground influenced the
interplay between the two modalities. They focused on how participants’ expression of
size in speech and gesture was influenced by whether the participants shared common
ground. The authors used a referential communication task, specifically, several
“Where’s Wally” pictures that prominently featured large objects such as a gigantic
knot in a pipe or an unusually large, dome-shaped roof on a house. Each speaker de-
scribed these pictures and the locations of these large objects either to an addressee
with whom the speaker had previously looked at the pictures, that is, who had seen
the same pictures (common ground condition) or to an addressee who had not seen
them (no common ground condition). Holler and Stevens (2007) found that how speak-
ers used words and gestures to refer to the large objects differed significantly according
to whether they shared common ground or not. Although speakers in both conditions
used verbal size markers (e.g., “big,” “huge,” “enormous”) equally, those in the no com-
mon ground condition often accompanied these words with gestures that were large en-
ough to depict the size accurately. Speakers in the common ground condition did not
accompany their verbal size markers with gestures as often; when they did, the ges-
tures were significantly smaller than the gestures produced in the no common ground
condition. In summary, speakers expressed the size of the objects with their words in
the common ground condition but with their gestures in the no common ground condi-
tion. Holler and Stevens (2007) pointed out that researchers should consider both
the linguistic and imagistic sides of utterances when deciding whether speakers
have become more elliptical. In other words, analyzing only speech or only gesture
is not sufficient.
4. Conclusion
The rich group of experiments reviewed here documents the important role that hand
gestures play in language as a social process. Moreover, these experiments illustrate the
variety of tasks and variables that can be used to elucidate gestures’ functions, as well as
an equal variety of dependent measures to assess the effects of these variables. The findings
include evidence that participants in a dialogue gesture at a higher rate than speakers
who are in a monologue or with a constrained addressee. In addition, participants in
face-to-face dialogues use interactive gestures that function specifically to manage inter-
active aspects of their conversations. Furthermore, participants can monitor each
other’s gestures to display and check their mutual understanding in order to complete
a task together. Finally, speakers adapt their gestures to a wide variety of social vari-
ables, including whether the participants can see each other, where they are sitting in
relation to each other, what they can see, and whether the addressee is familiar with
what the speaker is describing.
As noted at the outset, our primary theoretical assumption is that the fundamental
site of social interaction is face-to-face dialogue – unmediated, spontaneous, and recip-
rocal conversations between at least two interlocutors. Because face-to-face dialogue is
also the primary site of conversational gestures, the understanding of these gestures
must draw as much on social interactive processes as on factors attributed to individuals
(e.g., cognition, culture, personality, etc.). These two assumptions have strong method-
ological implications. None of the above findings could have arisen in experiments that
used an individual alone, with an experimenter, or with a confederate; they required a
real addressee. If, as Lockridge and Brennan (2002) have shown in another area, there
are different results for the same experimental procedure when using real versus confed-
erate addressees, then methods that do not include real social interaction in face-to-face
dialogue have questionable generalizability to the natural use of conversational gestures.
Fortunately, the number of truly social experiments on gestures has increased rapidly
in the past decade, showing that it is possible to do experimental investigations of ges-
tures in full face-to-face dialogues. The variety of published experiments reviewed here
has already contributed both substantive results and methodological exemplars. It is
therefore possible, fruitful, and necessary to leave the narrow constraints of reduction-
ism in order to continue to advance our knowledge of the social interactive nature of
gestures. The neuropsychologist Alexander Luria pointed out that the principle of
reduction to the smallest possible element is not a scientific necessity. Indeed,

there are grounds to suppose that it may be false. To study a phenomenon, or an event, and
to explain it, one has to preserve all of its basic features. (…) It can easily be seen that re-
ductionism may very soon conflict with this goal. One can reduce water (H2O) into H and
O, but – as is well known – H (hydrogen) burns and O (oxygen) is necessary for burning;
whereas water (H2O) has neither the first or second quality (…). In order not to lose the
basic features of water, one must split it into units (H2O) and not into elements (H and O).
(Luria 1987: 675, emphasis in original)

We propose that, in order to learn about the social interactive nature of gestures, the
indivisible unit of study must be a true dialogue. Attempting to learn about dialogue
from the study of individuals will lose the basic features of social interaction, just as
studying hydrogen and oxygen separately loses the basic features of water.
5. References
Alibali, Martha W., Dana C. Heath and Heather J. Myers 2001. Effects of visibility between
speaker and listener on gesture production: Some gestures are meant to be seen. Journal of
Memory and Language 44: 169–188.
Bangerter, Adrian 2004. Using pointing and describing to achieve joint focus of attention in dia-
logue. Psychological Science 15: 415–419.
Bavelas, Janet Beavin 1990. Nonverbal and social aspects of discourse in face-to-face interaction.
Text 10: 5–8.
Bavelas, Janet Beavin, Linda Coates and Trudy Johnson 2000. Listeners as co-narrators. Journal of
Personality and Social Psychology 79(6): 941–952.
Bavelas, Janet Beavin, Nicole Chovil, Linda Coates and Lori Roe 1995. Gestures specialized for
dialogue. Personality and Social Psychology Bulletin 21(4): 394–405.
Bavelas, Janet Beavin, Nicole Chovil, Douglas A. Lawrie and Allan Wade 1992. Interactive ges-
tures. Discourse Processes 15: 469–489.
Bavelas, Janet Beavin, Jennifer Gerwing, Chantelle Sutton and Danielle Prevost 2008. Gesturing
on the telephone: Independent effects of dialogue and visibility. Journal of Memory and Lan-
guage 58: 495–520.
Bavelas, Janet Beavin, Sarah Hutchinson, Christine Kenwood and Deborah Hunt Matheson 1997.
Using face-to-face dialogue as a standard for other communication systems. Canadian Journal
of Communication 22: 5–24.
Beattie, Geoffrey and Rima Aboudan 1994. Gestures, pauses and speech: An experimental inves-
tigation of the effects of changing social context on their precise temporal relationships. Semi-
otica 99: 239–272.
Chafe, Wallace L. 1994. Discourse, Consciousness, and Time: The Flow and Displacement of Con-
scious Experience in Speaking and Writing. Chicago: University of Chicago Press.
Clark, Herbert H. 1996. Using Language. Cambridge: Cambridge University Press.
Clark, Herbert H. and Meredyth A. Krych 2004. Speaking while monitoring addressees for under-
standing. Journal of Memory and Language 50: 62–81.
Cohen, Akiba A. 1977. The communicative functions of hand illustrators. Journal of Communica-
tion 27: 54–63.
Cohen, Akiba A. and Randall P. Harrison 1973. Intentionality in the use of hand illustrators
in face-to-face communication situations. Journal of Personality and Social Psychology 28:
276–279.
Emmorey, Karen and Shannon Casey 2001. Gesture, thought and spatial language? Gesture 1(1):
35–50.
Fillmore, Charles J. 1981. Pragmatics and the description of discourse. In: Peter Cole (ed.), Radical
Pragmatics, 143–166. New York: Academic Press.
Furuyama, Nobuhiro 2000. Gestural interaction between the instructor and the learner in origami
instruction. In: David McNeill (ed.), Language and Gesture, 99–117. Cambridge: Cambridge
University Press.
Gerwing, Jennifer and Janet Beavin Bavelas 2004. Linguistic influences on gesture’s form. Gesture
4(2): 157–195.
Goodwin, Charles 1981. Conversational Organization: Interaction between Speakers and Hearers.
New York: Academic Press.
Holler, Judith and Rachel Stevens 2007. The effect of common ground on how speakers use ges-
ture and speech to represent size information. Journal of Language and Social Psychology
26(1): 4–27.
Kimbara, Irene 2006. On gestural mimicry. Gesture 6(1): 39–61.
Kimbara, Irene 2008. Gesture form convergence in joint description. Journal of Nonverbal Behav-
ior 32: 123–131.
Krauss, Robert M., Robert A. Dushay, Yishiu Chen and Frances Rauscher 1995. The communica-
tive value of conversational hand gestures. Journal of Experimental Social Psychology 31(6):
533–552.
Levinson, Stephen C. 1983. Pragmatics. Cambridge: Cambridge University Press.
Linell, Per 1982. The Written Language Bias in Linguistics. University of Linkoping, Sweden:
Department of Communication Studies.
Lockridge, Calion B. and Susan E. Brennan 2002. Addressees’ needs influence speakers’ early syn-
tactic choices. Psychonomic Bulletin and Review 3: 550–557.
Luria, Alexander R. 1987. Reductionism in psychology. In: Richard Langton Gregory (ed.), The
Oxford Companion to the Mind, 675–676. Oxford: Oxford University Press.
Özyürek, Asli 2000. The influence of addressee location on spatial language and representational
gestures of direction. In: David McNeill (ed.), Language and Gesture, 64–83. Cambridge: Cam-
bridge University Press.
Özyürek, Asli 2002. Do speakers design their cospeech gestures for their addressees? The effects
of addressee location on representational gestures. Journal of Memory and Language 46:
688–704.
Rimé, Bernard 1982. The elimination of visible behavior from social interactions: Effects on ver-
bal, nonverbal and interpersonal variables. European Journal of Social Psychology 12: 113–129.

Jennifer Gerwing, Victoria (Canada)
Janet Bavelas, Victoria (Canada)
V. Methods

52. Experimental methods in co-speech gesture research
1. Introduction
2. Can we capture co-speech gestural behaviour in experimental settings?
3. Some core questions in the field of co-speech gesture research
4. Experimental methods and paradigms
5. Some methodological shortcomings and limitations
6. Conclusion
7. References

Abstract
This chapter provides an introductory overview of some of the basic experimental
paradigms traditionally employed in the field of gesture studies to investigate both com-
prehension and production in adult and child populations. With respect to gesture pro-
duction, the chapter taps into paradigms used for exploring both intra-psychological
and inter-psychological functions of co-speech gestures. At the same time, the present
chapter aims to shed light on some of the core questions researchers have been addressing
when using the described paradigms, concluding with a reflection on some of the methodological shortcomings and limitations of the respective paradigms and methods used.

1. Introduction
Co-speech gestures occur in all cultures (Kita 2009) and in a wide variety of conversa-
tional contexts. This includes more formal settings, such as doctor-patient/therapist-
client interaction (Duncan and Niederehe 1974; Heath 1989, 2002), teacher-pupil
interaction (Roth 2001), work contexts (Mondada 2007) and official gatherings (Streeck
1994), as well as more informal conversational contexts, for example in interactions
with acquaintances, friends and family (Efron 1941; Goodwin 1986; Kendon 1980,
1985; Müller 2003; Seyfeddinipur 2004; Streeck 1994). The kinds of gestures used in
these contexts and the functions they fulfil are manifold. Explorations of co-speech ges-
tures occurring in natural contexts have been the origin of gesture studies and they
remain a prime focus in the field of gesture studies.

2. Can we capture co-speech gestural behaviour in experimental settings?
While analyses of gestures “in the real world” yield important insights in their own
right, they are also an important source of inspiration for experimental research on
gesture. This chapter focuses on the latter, in particular the experimental techniques
and paradigms that have been employed to find answers to some of the central ques-
tions in the field of co-speech gesture (mainly of a psychological nature). Before dis-
cussing these in detail, it is important to emphasise that one fundamental assumption
underlying this research is that the behaviour we elicit in the laboratory is represen-
tative of what we observe outside of it. Of course, one possibility is that, in any given
experiment, the chosen stimulus material influences the number and nature of ges-
tures used; therefore results have to be considered in their particular context. Largely,
however, the things participants are asked to talk about in gesture experiments tend
to also feature frequently in everyday talk, including spatial relations, actions, objects
and persons. One potentially critical issue which remains, though, is the common use
of cartoon pictures or videos. The semantics of a cartoon world are radically different
to the world we live in – literally anything can happen, even physical impossibilities. It
is therefore possible that speakers use gestures differently in talk about more mun-
dane events, especially if we consider that one use of gesture may be to channel
and influence addressees’ inferences; these, of course, could be crucially different
when trying to process talk about rather unpredictable cartoon worlds (cf. Holler
2003; Holler and Beattie 2003a). However, because, to the best of my knowledge,
no study to date has systematically investigated to what extent gestural behaviour inside and outside the lab is the same or different, we currently have no reason to discount experimental research on these grounds (but it is certainly an issue requiring
future research).
Further, participants in experimental settings are providing us with insight into
spontaneously produced gestural behaviour (sometimes, bodily behaviour may be
slightly inhibited initially since research ethics require us to inform participants
when they are video-recorded, but warm-up conversations tend to get around
this problem). Moreover, we know that we observe at least some of the same phe-
nomena in experimental and non-experimental gesture data. For example, imagistic
gestural representations are common in everyday conversation (e.g., Kendon 1985),
and they occur frequently in laboratory-based communication, too (e.g., McNeill
1992); similar parallels can be claimed for interactive gestures (these involve the
addressee in the interaction and are often associated with handing over a turn or
keeping the floor) which have been observed both in the lab (Bavelas et al. 1995;
Bavelas et al. 1992) as well as in everyday talk (Duncan and Niederehe 1974; Kendon
2004; Streeck and Hartge 1992). Further examples include the so-called “return gesture” (de Fornel 1992), where one participant in a conversation repeats another’s gesture; this has also been observed in experimental contexts (Holler 2003; Holler
and Wilkin 2011; Kimbara 2006, 2008; Parrill and Kimbara 2006). The parallels
mentioned here are but a few and, although they constitute no hard-and-fast evidence, they serve
to illustrate the point that gestural behaviour can be observed in experimental
settings which, at least in some important aspects, is like that occurring outside the
laboratory.
Of course, all this is altogether less of an issue if we assume that co-speech gestures are largely independent of the interactive processes happening between the people talking. In that case, the basic requirement is that the experimental tasks participants engage in appropriately model the cognitive demands participants encounter in communication. This brings us to the questions gesture researchers have been trying to answer.
3. Some core questions in the field of co-speech gesture research


Of course, the number of questions researchers interested in co-speech gestures have
tackled is vast. One of the major debates which has dominated the field in recent
years focuses on whether co-speech gestures are indeed an integral part of language,
a notion put forward by Bavelas and Chovil (2000), Clark (1996), Kendon (1980,
2000), and McNeill (1985, 1992), amongst others. One aspect of this debate focuses
on the function(s) of co-speech gestures and the idea that they may not necessarily
be communicatively intended, but, rather, benefit the speaker him or herself (such as
through the facilitation of lexical access (e.g., Krauss, Chen, and Gottesman 2000;
Rauscher, Krauss, and Chen 1996) or conceptual planning (e.g., Hostetter, Alibali,
and Kita 2007; Kita and Davies 2009)). A more overarching question, then, is why
we gesture when we speak – which, in addition to the discussion about inter- and intra-
personal functions of gesture, also addresses the evolutionary roots and development
of co-speech gestures and language (Corballis 2003; Kelly et al. 2002; Rizzolatti and
Arbib 1998; Tomasello 2008).

4. Experimental methods and paradigms


The paradigms that have been employed to answer these questions experimentally are
based on a wide range of methods and techniques. The following sections will provide a
general overview of these (rather than in-depth discussions of individual paradigms) –
however, due to limitations on space, this overview cannot be completely exhaustive
in scope.

4.1. Co-speech gesture comprehension


How we comprehend and process co-speech gestures has been explored experimentally
to a large extent using “play-back paradigms”. Here, the participant takes on the role of
an observer, decoding the information they are presented with in the form of a video
stimulus. Researchers have used this kind of paradigm primarily to test whether co-
speech gestures communicate or not. To do so, they have often combined this basic par-
adigm with many variations regarding the conditions under which the video clips are
generated and presented.
One common method is to recruit a first set of participants who describe a range of
stimuli (for example landscapes, buildings and people (Krauss, Morrel-Samuels, and
Colasante 1991) or cartoon stories (Beattie and Shovelton 1999a, 1999b, 2001)) to a
confederate (usually the experimenter). These spontaneous narrations or descriptions
have been shown to elicit a great amount of co-speech gestures. Video clips showing
isolated gestures from this footage are then played to a new set of participants in a sec-
ond stage of the study who view gesture and speech together, just the gesture (in
absence of speech) or who hear the speech (in absence of gesture); this method allows
researchers to evaluate the individual and combined contributions of the two modalities
to the decoders’ message comprehension and information take-up. The measures these
studies have used to identify whether the gestures have communicated information to
the decoder-participants (both in the absence as well as over and above speech) vary.
Krauss, Morrel-Samuels, and Colasante’s (1991) techniques required decoders, amongst
other things, to identify the lexical affiliates of gestures, to interpret and categorise gestures semantically, and to recall individual gestures. Beattie and Shovelton (1999a, 1999b, 2001) used what they called a “semantic feature approach”, which
involved quizzing participants about the kinds of information they had received regard-
ing a range of detailed semantic categories. This was done in various forms, using either
open-ended or forced-choice questionnaires which participants completed for each clip.
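The scoring logic of such a semantic feature approach can be sketched as follows. This is a minimal illustration only, not the authors' actual procedure: the feature set, condition labels and decoder answers below are all invented.

```python
# Minimal sketch of semantic-feature scoring: each decoder answer is coded for
# the target features it mentions, and mean feature coverage is compared across
# presentation conditions. All names and data are invented for illustration.
from statistics import mean

FEATURES = {"size", "shape", "direction", "speed", "relative_position"}

def score_answer(features_reported):
    """Proportion of the target semantic features a decoder's answer covers."""
    return len(features_reported & FEATURES) / len(FEATURES)

# Invented coded responses, two decoders per condition.
conditions = {
    "speech_only": [{"shape"}, {"shape", "size"}],
    "gesture_only": [{"direction"}, {"speed", "direction"}],
    "speech_plus_gesture": [{"shape", "size", "direction"},
                            {"shape", "direction", "speed"}],
}

for name, answers in conditions.items():
    print(name, round(mean(score_answer(a) for a in answers), 2))
```

Comparing the condition means in this way mirrors how such studies estimate the individual and combined contributions of speech and gesture to decoders' information uptake.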
Feyereisen, van de Wiele, and Dubois (1988) used a similar method; however, they
filmed people delivering lectures rather than describing imagistic stimuli and then
played video clips of the gestures to decoders. Rather than measuring the amount
the gestures communicated, their focus was on how well the decoders could differenti-
ate gesture types (iconic and batonic). These judgements were made either with or
without speech to see whether access to the verbal message content would modulate
the perception (and communicativeness) of the gestures.
Krauss et al. (1995) also used video footage of speakers communicating, but they in-
cluded a condition in which participants exchanged information via an intercom (i.e.,
where addressees could not see the speakers’ gestures). They played these videos
back to a set of decoders to compare the communicativeness of gestures which ap-
peared to be produced for addressees and those which did not (measured in terms of
the accuracy of decoders’ stimulus selection based on the speakers’ descriptions).
Also, their study introduced a slightly different set of stimuli bearing more abstract
features, such as synthesized sounds, tea flavours and abstract shapes.
Studies by Rogers (1978) and Riseborough (1981), too, used the basic play-back
paradigm to test the communicativeness of gestures, but by introducing conditions in
which just the speaker’s face was visible or the face was blanked out they managed
to filter out the contribution of facial information accompanying gestural representa-
tions (thus contrasting with the studies above). Another variation is the presentation
of noise at different levels of intensity to determine the importance of gestures when
speech is more or less intelligible. A further important difference to the studies above
is that the video footage played back to decoders stemmed from spontaneous interactions between two “naïve” interlocutors, rather than from interactions involving a
confederate (with the exception of Riseborough 1981, experiments 2 and 3). This is a
crucial point, as speakers in these experiments may have produced more natural ges-
tures than when talking to a confederate – I will come back to this issue in section 5.
The measure employed by Rogers (1978) bears similarity to the semantic feature
approach used by Beattie and Shovelton (1999a, 1999b, 2001), as individual questions
(with multiple choice answers) tapped different semantic aspects of the actions and
objects described by the participants in the stimulus videos (an approach based on
Fillmore 1971). Riseborough’s (1981) measure, in contrast, was based on participants’
guesses about the objects the gestures represented, their recall of gestures, and the
information they inserted into blank fields in a transcript of the original narrative.
Apart from adult to adult communication, it has also been tested whether adults
can glean information from children’s gestures, motivated by the idea that co-speech
gestures can reveal something about children’s cognitive development (Alibali
and Goldin-Meadow 1993; Church and Goldin-Meadow 1986; Broaders et al. 2007;
Goldin-Meadow 2000, 2003). In particular, gestures can reveal whether children are
at a so-called transitional stage (a period of time just before their implicit knowledge
is about to advance by a significant step). Because children may benefit a great
deal from instruction and input from the environment during such periods, it is an
important question whether adults (e.g., in the role of parents and teachers) are sensitive to this information in the child’s gestural communication. A large number of experimental studies have explored this issue, using a paradigm in which adults decode
information from children’s gestures, extracted from videos of spontaneous interactions
between the child and an experimenter (e.g., Alibali, Flevares, and Goldin-Meadow
1997; Goldin-Meadow and Sandhofer 1999; Goldin-Meadow, Wein, and Chang 1992).
During these interactions, children were asked to explain mathematical equations or
traditional Piagetian conservation problems, which children tend to grasp only at cer-
tain developmental stages. The adult decoders were presented with clips of either just
the speech, the speech accompanied by a “matching” gesture (the gesture represents
the same information as the speech) or a “mismatching” gesture (the gesture represents
different, supplementary information to that contained in speech). They were then
asked to check questionnaire answers relating to the video vignettes tapping the infor-
mation the children had provided, or to talk about the children’s explanations, with
their own speech and gestures subsequently being analysed for content to see what
information the adults had picked up.
Another question is whether children are also able to glean information from co-
speech gestures. Kelly and Church (1997) employed a paradigm very similar to that
used by Goldin-Meadow and colleagues (e.g., Goldin-Meadow, Wein, and Chang
1992), adapted to test the gesture comprehension of 7-year-olds. To do so, they employed
three measures, a recall task (children describing, in their own words, the responses given
by the children in the video vignettes), a questionnaire testing for the information the
children thought had been given in the videos, and a task requiring children to assess
whether they thought the children in the videos were just about ready to understand
the concepts they were explaining. Other studies have directly compared the decoding
abilities of children and adults using the same basic paradigm, combined with compre-
hension and memory measures appropriate for the different age groups (Church,
Kelly, and Lynch 2000; Kelly and Church 1998; Thompson and Massaro 1986). Also, some studies have started to investigate the comprehension of gestures in very young children (around 1 year of age); these have mainly focused on the understanding of intentionality associated with gestures and have used quite different paradigms. For example, Gliga
and Csibra (2009) measured children’s looking times in response to objects appearing at
locations indicated by pointing gestures or at the opposite side to that indicated by the
gesture.
Paradigms testing the communicativeness of co-speech gestures have been widely
applied to children and healthy young adults; some studies have adapted these para-
digms also to other populations, such as older adults and aphasics (for examples, see
Cocks et al. 2009; Feyereisen, Seron, and de Macar 1981; Feyereisen and van der Linden
1997; Thompson 1995).
Apart from researching whether co-speech gestures communicate semantic infor-
mation, studies have also focused on the comprehension of the pragmatic aspects of
messages, in particular indirect requests. For example, Kelly et al. (1999) used a play-
back paradigm to present observers with clips of an actor expressing indirect
requests accompanied by gesture (or not), where the gesture provided additional infor-
mation relevant to interpreting the speaker’s communicative intent. They asked parti-
cipants to predict the response of the person who acted as addressee in the stimulus
video, thus measuring information uptake from the gestures and whether this infor-
mation was integrated with the interpretation of the speaker’s intended meaning.
Kelly (2001, experiment 1) investigated the role of gesture for pragmatic understanding in children (3- to 5-year-olds) using a similar paradigm. Children watched video-recorded interactions in which one person uttered an indirect request accompanied
by gesture or not, with children being asked what the speaker in the video had
referred to.
In addition to experiments presenting video clips to participants acting as observers/
decoders, some studies have tested the communicativeness of gestures in live interac-
tions. Graham and Argyle (1975) asked individuals to describe abstract shapes (of
high and low verbal encodability) to a group of addressees present in the same room.
In one condition, describers were allowed to gesture freely, in the other they were
asked to fold their arms. Addressees then drew the shapes, followed by an evaluation
of the accuracy of their drawings in the two conditions to measure gestural communi-
cation. Holler, Shovelton, and Beattie (2009) asked an actor to provide a scripted car-
toon narrative (based on spontaneously produced narratives) to addressees, including
the production of gestures which accompanied the original narratives. After the narra-
tions, addressees answered questions about the stories which were then scored by the
experimenters for the information they contained according to individual semantic fea-
tures (some of which were only represented in the gestures). The communicativeness of
the gestures in the face-to-face condition was compared to video (gesture + speech and
gesture only), as well as to an audio only condition (speech without gesture). With
regard to children’s gestures, Goldin-Meadow and Sandhofer (1999) have shown that
adults can glean significant amounts of information from them when observing the chil-
dren communicate live with an experimenter, using the same paradigm as with their
video-based play-back conditions. Kelly (2001, experiment 2) tested the communicative
role of gesture in children’s pragmatic understanding live by engaging them in interac-
tion with the experimenter who uttered indirect requests (using just speech, gesture and
speech, or just gestures to make the request, such as by pointing at an object). The chil-
dren’s success at understanding was reflected in their response to the indirect requests.
Behne, Carpenter, and Tomasello (2005) and Gräfenhain et al. (2009) showed that gestures are communicative in a live context even to very young children (14 months of
age). Their studies tested children’s interpretation of the communicative intent asso-
ciated with gestures produced by an adult. For example, in the task used by Gräfenhain
et al. (2009), one adult pointed towards one of two locations combined with either
averted gaze or gaze directed at another adult looking for a toy. Children who observed
this scene were then allowed to look for the toy themselves, with their choice of location
providing insight into their comprehension of gesture and gaze cues.
Studies testing the communicativeness of gestures in a live, face-to-face context
advance our knowledge of gesture considerably, as they eradicate some of the potential
limitations of studies using video play-back techniques. For example, in case of the
latter, video clips of individual gestures are presented to decoders often without the
natural context in which they occur, thus isolating them from any other contextual
cues, and in some studies the clips were even played repeatedly. However, video
play-back paradigms do offer the advantage that gestures from spontaneous interac-
tions can be used as the stimuli whereas in most of the studies using a face-to-face con-
text reviewed here (with the exception of the studies by Graham and Argyle as well as
Goldin-Meadow and Sandhofer described above), confederates/experimenters produced the gestures; this issue will be addressed in more detail in section 5.
Yet another alternative is to base the decision about whether the information from
a gesture has been received and understood on the addressee’s behaviour – such as in
response to gestures with an interactive function which are known to elicit certain
addressee responses (cf. Bavelas et al. 1995), as well as in cases where participants
mirror their interactants’ gestures (Holler 2003; Holler and Wilkin 2011; Kimbara
2006, 2008; Parrill and Kimbara 2006). Whereas such paradigms limit the amount of detailed insight we can glean (e.g., exactly how the gesture was perceived/interpreted by the addressee, or exactly how much information was received), this sort of paradigm preserves most of the natural interaction in which co-speech gestures are used.
Other paradigms measuring the communicativeness of co-speech gesture do not rely
on the use of questionnaires or other pen and paper recordings of participants’ answers.
Eye-tracking studies, for example, have investigated recipients’ overt attention to ges-
tures by measuring the amount and duration of recipients’ eye fixations on speakers’
gestural movements presented on video or in live conditions (Gullberg 2003; Gullberg
and Holmqvist 1999, 2006). Although these eye-tracking data can provide useful in-
sights into when and for how long participants overtly attend to gesture, the tool is
not sensitive enough to capture covert attention processes and is not suitable for mea-
suring information uptake from gestures (neither amount nor type) as there appears to
be no clear association with direct fixations (Gullberg and Kita 2009). Reaction time
measurements, however, are a suitable method for tapping into more covert processes
of gesture comprehension (e.g., Kelly, Özyürek, and Maris 2010).
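The overt-attention measures described here boil down to summarising fixations on predefined areas of interest. A hedged sketch of that summary step might look like this (the fixation records are invented, not real eye-tracker output):

```python
# Sketch of an eye-tracking summary measure: number, total duration and share
# of fixations on a gesture "area of interest" (AOI). Records are invented.
fixations = [
    {"aoi": "face", "duration_ms": 420},
    {"aoi": "gesture", "duration_ms": 180},
    {"aoi": "face", "duration_ms": 650},
    {"aoi": "gesture", "duration_ms": 90},
]

gesture_fix = [f["duration_ms"] for f in fixations if f["aoi"] == "gesture"]
total_ms = sum(gesture_fix)
share = total_ms / sum(f["duration_ms"] for f in fixations)
print(f"{len(gesture_fix)} gesture fixations, {total_ms} ms, {share:.1%} of viewing time")
```

Measures of this kind quantify when and for how long recipients overtly attend to gesture, but, as noted above, they say nothing about covert attention or information uptake.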
In addition to the behavioural measures reviewed above, at least two types of tech-
niques from cognitive neuroscience have been used to measure the brain’s response to
gestures. They are suitable for providing insight into information uptake from gestures,
the relationship between gesture and speech and the way in which the brain processes
information from the two modalities. More precisely, studies using ERP (Event Related
Potentials; the measurement of the brain’s electrophysiological response as an indica-
tion of its activity following stimulus presentation) are suitable to answer questions
about the time course of the processing and integration of different signals (including
semantic integration, typically captured by the N400 component). The first ERP studies
explored the semantic integration of speech and gesture using matching and mismatch-
ing gestures either as primes to subsequently presented words (Kelly, Kravitz, and
Hopkins 2004), within a sentence context presented simultaneously with speech
(Özyürek et al. 2007), or in association with imagistic information (cartoon images)
and matching or mismatching words (Wu and Coulson 2007). ERP studies have also
been used to investigate if and when the brain picks up information from co-speech ges-
tures representing information that is not contained in the speech at all but semantically
relevant for the interpretation of the verbal message (e.g., in the context of ambiguous
speech, Holle and Gunter 2007). And, recently, a study by Kelly, Creigh, and Bartolotti
(2010) has used ERPs to gain insight into how voluntary or automatic gesture-speech
integration is.
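The N400 measure referred to above is typically operationalised as the mean amplitude in a post-stimulus time window. The sketch below illustrates that computation with invented amplitude values; real studies average over many trials, electrodes and participants.

```python
# Illustrative N400 computation: mean ERP amplitude (in microvolts) in a
# 300-500 ms window, compared between matching and mismatching gesture-speech
# trials. A more negative mismatch value is the classic N400 signature of
# harder semantic integration. All amplitude values are invented.
from statistics import mean

def mean_amplitude(samples, times, window=(300, 500)):
    return mean(v for v, t in zip(samples, times) if window[0] <= t <= window[1])

times = list(range(0, 800, 100))  # ms after stimulus onset, one sample per 100 ms
match_erp    = [0.0, 0.5, 0.2, -1.0, -1.5, -0.5, 0.1, 0.3]
mismatch_erp = [0.0, 0.4, 0.1, -3.0, -4.2, -1.8, 0.0, 0.2]

n400_match = mean_amplitude(match_erp, times)
n400_mismatch = mean_amplitude(mismatch_erp, times)
print(round(n400_mismatch - n400_match, 2))  # prints -2.0: the N400 effect
```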
fMRI techniques (Functional Magnetic Resonance Imaging; a technique used to
measure the brain’s neural activity based on changes in blood flow, providing insight
into brain area-specific activity in response to a stimulus or task) have been used to
find out where in the brain gesture and speech are integrated and which neural net-
works are involved in their processing. For example, Willems, Özyürek, and Hagoort
(2007) used a paradigm in which they varied the difficulty of gesture-speech integration
(with matching versus mismatching gestures) to explore this issue. In another study,
they compared co-speech gestures with those that are less strongly tied to speech (pan-
tomimes) to see whether they activate different or overlapping brain areas (Willems,
Özyürek, and Hagoort 2009). Other researchers have focused on gesture-speech inte-
gration when gesture provides supplementary information which disambiguates speech
(Holle et al. 2008), and on the involvement of the human mirror system in co-speech
gesture processing (Skipper et al. 2007). Further, paradigms have manipulated the
degree of perceived communicative intent associated with gestures to investigate prag-
matic aspects of gesture-speech integration (e.g., by creating gesture-speech mismatches
produced by the same versus different persons (Kelly et al. 2007), or by varying the
speaker’s gaze direction (Holler, Kelly, Hagoort, and Özyürek 2012), or their body ori-
entation as being oriented either towards the participant or towards a third person
(Straube et al. 2010)).
In both ERP and fMRI studies exploring co-speech gesture processing, it is often
necessary to work with video stimuli of a highly controlled nature, as, otherwise, it is
difficult to attribute observed effects to the intended experimental manipulation. The
stimuli used in these studies tend to be video clips of individual gestures presented
on their own, accompanied by speech (words or sentences), or preceded by it. Due
to the strong need for careful control, the stimuli usually involve a trained actor carry-
ing out scripted hand movements. (Also, ERP and fMRI studies often incorporate addi-
tional tasks requiring participants to answer questions or make some other kind of
decision (e.g., using a push-button device). This results in additional datasets of reaction
times (RTs) and response accuracy, for example, which provide further insight into the
comprehension of co-speech gestures.)

4.2. Co-speech gesture production


A paradigm that has become widely established is that first used by David McNeill
(1985, 1992), involving the use of cartoon videos (famously, Sylvester and Tweety cartoons) watched by one participant who then narrates the story to another while being video-recorded.
This paradigm was originally used without any further experimental manipulations
and McNeill (1985, 1992) used the footage, rich in spontaneously produced co-speech
gestures, to analyse the semantic relationship between gesture and speech. These
analyses provided the basis for him to make the important argument that thought is
externalised by, and that language consists of, both speech and co-speech gestures, and
to model the kind of mental representations underlying speakers’ gesture-speech
utterances.
Some later studies used the same basic paradigm. Holler and Beattie (2002, 2003a) used it to investigate and quantify the semantic interplay of gesture and speech using a fine-grained semantic feature analysis (albeit with static rather than moving cartoon images), as did a host of studies exploring cross-linguistic differences, specifically, how speakers of different languages package information in gesture and speech when describing the same stimuli (e.g., Allen et al. 2007; Kita and Özyürek 2003; McNeill 2001; McNeill and Duncan 2000; Özyürek et al. 2008; Özyürek et al. 2005).
The idea that co-speech gestures can provide us with a greater insight into speakers’
underlying mental representations has also become of great relevance in developmental
psychology. Here, particularly the work by Susan Goldin-Meadow and colleagues (e.g.,
Alibali and Goldin-Meadow 1993; Church and Goldin-Meadow 1986; Broaders et al.
2007; Goldin-Meadow, Alibali, and Church 1993; Goldin-Meadow 2003; Perry, Church,
and Goldin-Meadow 1988) (see also Pine, Lufkin, and Messer 2004) has shown that
children often externalise knowledge in co-speech gesture before they are ready to
communicate verbally about the same concepts, such as in their explanations of conser-
vation, maths and balance problems. In these kinds of studies, children are given prob-
lems of the aforementioned kind and are simply asked to provide their explanation of
it. Both gesture and speech can then be analysed for the semantic information they
represent.
Because co-speech gestures bear a very close relationship to speech, researchers
have been intrigued by the nature of this relationship, the role of gesture in the process
of speaking and communicating and the exact functions they fulfil in talk. Different
experimental paradigms have been used to test different hypotheses; these can be
broadly classed into those postulating cognitive functions (thus benefiting mainly the
speaker) and those postulating communicative functions (thus benefiting primarily
the addressee). However, although contrasted here and discussed separately, these
approaches are not necessarily mutually exclusive.
Goldin-Meadow et al. (2001) have argued that co-speech gestures may reduce a
speaker’s cognitive load and thus free up cognitive capacities. In their paradigm, parti-
cipants explained their solutions to a series of maths tasks, which they frequently accom-
panied with gestures, while trying to remember a sequence of letters. This was combined
with a standard memory test (tapping the letter sequences) to see whether those who
gestured more would perform better, assuming that gesturing enabled participants to
allocate more resources to the memory task.
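The dual-task logic of this paradigm reduces to a simple between-condition comparison. The following sketch uses invented per-participant recall scores purely to illustrate the analysis; these are not Goldin-Meadow et al.'s data.

```python
# Invented illustration of the dual-task comparison: letter-recall performance
# while explaining maths problems with gesture allowed versus restricted.
from statistics import mean

recall_gesturing = [5, 6, 4, 6, 5]   # letters recalled when free to gesture
recall_restricted = [3, 4, 4, 2, 3]  # letters recalled with gesture restricted

diff = mean(recall_gesturing) - mean(recall_restricted)
print(f"mean recall advantage when gesturing: {diff:.1f} letters")
```

A positive difference is what the cognitive-load account predicts: gesturing frees resources for the concurrent memory task.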
Other researchers have claimed that gestures maintain representations in spatial
working memory, thus indirectly influencing speech production (Morsella and Krauss
2004; Wesp et al. 2001). Both Morsella and Krauss (2004) and Wesp et al.’s (2001) para-
digms required participants to describe stimulus objects either from memory (stimulus
absent) or while looking at them (stimulus present). A similar procedure was used by
de Ruiter (1998, experiment 3) to test the lexical retrieval theory against the theory
that gestures facilitate the encoding of imagery in speech.
Co-speech gestures have also been postulated to facilitate conceptual planning
during the speech production process. To investigate this hypothesis, researchers
have used paradigms which compare conditions under which conceptual planning is
easy versus difficult. For example, Alibali, Kita, and Young (2000) used traditional
Piagetian conservation tasks and asked children to either explain why they thought
two vessels held the same or different amounts of liquid, or describe how the two
vessels looked different. Other studies have asked participants to describe to another
person a range of shapes made up from lines connecting a number of dots; to create
a conceptually more difficult condition, they removed the lines, leaving just dot pat-
terns less suggestive of a particular shape (Hostetter, Alibali, and Kita 2007). Another
study created conditions where participants had to describe geometrical shapes with
or without distracting lines creating competing conceptualisations (Kita and Davies
2009). Melinger and Kita (2007) increased conceptual planning load by asking
participants to complete a secondary, competing task (either similar or different to the
primary task).

846 V. Methods

Lexical access is another core component of the speech production process, happen-
ing at a later stage than the conceptual planning of messages. Some researchers have
argued that it is at this point that co-speech gestures fulfil a facilitating function (e.g.,
Krauss, Chen, and Gottesman 2000; Rauscher, Krauss, and Chen 1996). Two main para-
digms have been used to test this particular theory. One involves preventing speakers
from gesturing followed by a subsequent analysis of the effects on verbal encoding; re-
searchers have used various methods to restrict speakers’ gestures, such as asking them
to press desk-mounted hand switches (Lickiss and Wellens 1978), hold objects in their
hands (Frick-Horbury and Guttentag 1998, experiment 1), or to keep their arms folded
(Beattie and Coughlan 1999; Graham and Argyle 1975); however, this procedure may
require the speaker to divide their attention by concentrating not only on the experi-
mental task but also on the fact that they should not move their hands. Other studies
have therefore used methods which meant that speakers were not physically able to
move their hands, for example by fastening their forearms to arm rests (Rime et al.
1984), by immobilising their hands in special apron pockets (Frick-Horbury and Gut-
tentag 1998, experiment 2) or by placing electrodes on their palms while pretending
that the experiment focused on the psychophysiological recordings made during the
task (Rauscher, Krauss, and Chen 1996).
A second paradigm used to investigate the function of gesture and test the theory of
lexical access involves the elicitation of tip-of-the-tongue (ToT) states. For example,
researchers have tried to induce such states by providing participants with the
dictionary definitions (and the first letter) of a range of words (Beattie
and Coughlan 1999; Frick-Horbury and Guttentag 1998) or with pictures of objects
when studying tip-of-the-tongue states in children (Pine, Bird, and Kirk 2007; Yan
and Nicoladis 2009). The aim was to then analyse the frequency and type of gestures
accompanying tip-of-the-tongue states, and to compare the number of tip-of-the-tongue
states resolved with and without gesture. Some researchers have combined this para-
digm with that of gesture prevention described above (e.g., Beattie and Coughlan
1999; Frick-Horbury and Guttentag 1998).
In addition to linking gesture use to cognitive, intra-personal functions, researchers
have also developed paradigms to investigate gestures' communicative, inter-personal
functions. One well-established paradigm involves the manipulation of visibility between
the speaker and the addressee, usually by separating both participants, present in the
same room, with an opaque screen (Alibali, Heath, and Myers 2001; Gullberg 2006).
Cohen and Harrison (1973), who were amongst the first to experimentally investigate
the communicative functions of gestures, used a combined manipulation of both visibil-
ity and co-presence; instead of being separated by an opaque screen, participants were
located in different rooms and communicated via an intercom (they compared this to a
face-to-face condition). In 1977, Cohen published a follow-up study and introduced a
third condition in which participants talked into a tape-recorder, thus removing the
addressee completely (speakers were told they were just practicing the task). This al-
lowed him to compare the influence of visibility and co-presence on gestures with the
influence of a completely absent addressee. Mol et al. (2009) compared four different
conditions: participants communicated face-to-face, separated by an opaque screen,
and via a web cam (here the participant did not see the addressee but was told that
the addressee could see them via a video link); the fourth condition was set up exactly
like the latter but participants were told that their communication (both audio and
visual) would be fed into a computer. This manipulation allowed the authors to compare
the previous three communicative contexts as well as human-human
and human-machine communication.

52. Experimental methods in co-speech gesture research 847

Bavelas et al. (2008) introduced another important manipulation to tease apart
the influence of visibility and dialogue on gesture use. In addition to visibility and co-
presence, they varied dialogic interaction. This was done by comparing a face-to-face
and a screen condition, in which interactants were free to engage in dialogue, with a
tape recorder condition, in which participants believed they were recording their mes-
sage for another person who would listen to the recording later – thus, neither did these
speakers see their addressees, nor did they engage in dialogue with them. This work
builds on a series of other studies manipulating monologue and dialogue and investigat-
ing the role of the addressee’s involvement in gesture and language use (Bavelas et al.
1995; Bavelas, Coates, and Johnson 2000; Bavelas et al. 1988; Bavelas et al. 1986). In
addition to research on gesture use in dyadic interaction, manipulating visibility, co-
presence and dialogic interaction, researchers have investigated the influence of
another contextual factor on co-speech gestures – that of addressee location (Özyürek
2002). Here, speakers talked to either one or two addressees who were located directly
opposite or towards the side. Speakers’ use of gesture when representing spatial infor-
mation was compared between these conditions to provide insight into recipient design
in gesture use. Further, because this research involved multi-party interactions, it ex-
pands our knowledge from gesture use in dyadic interactions to that in triads.
Apart from the influence of the degree of interactivity and physical contextual fac-
tors (such as co-presence, visibility, number of addressees and their location), re-
searchers have investigated the influence of more cognitive, covert processes of
conversation. One variable that has been manipulated in this context is the common
ground between speakers and their addressees (i.e., the knowledge, beliefs and as-
sumptions mutually shared by participants in an interaction (Clark 1996)). Common
ground has been experimentally induced in a variety of ways. Gerwing and Bavelas
(2004) asked participants to play with either the same (common ground) or a differ-
ent (no common ground) set of toys and then asked one person to tell another about
their experiences with the toys. The gestures used to refer to the toys in the two
groups were compared for differences in their form (precision). Apart from creating
common ground based on shared action-based experiences, researchers have also
used paradigms to induce it visually by presenting stimuli to the speaker and the
addressee or just to the speaker (who then talks to an unknowing addressee). Holler
and Stevens (2007) used images showing particularly large entities amongst smaller
ones and focused their analysis on the effect of common ground on the encoding
of size information in both gesture and speech. Holler and Wilkin (2009) used a
similar method, but instead of pictures used a short video, allowing their analysis
to focus on a wider range of semantic features (relating to actions, objects and
persons, as well as their attributes). Parrill (2010) also used video stimuli to experi-
mentally manipulate common ground, but instead of a longer video (telling a
whole story) she used a short clip showing a single event and, similar to Holler and
Stevens (2007), focused her analysis on one semantic aspect of it (here, the ground ele-
ment). In addition, she combined this with the manipulation of “information salience”,
i.e., whether the ground element had been mentioned by an experimenter previously to
the participant referring to it or not. Holler (2003) and Jacobs and Garnham (2007) ma-
nipulated common ground by asking participants to relay the same description of events
represented in cartoon pictures to the same addressee repeatedly (thus accumulating
common ground) in order to then compare the speakers’ gesture rate across the trials.
Jacobs and Garnham (2007) also used joint visual availability of the stimulus to induce
common ground, by providing both speaker and addressee with the view of the stimulus
while it was being described.
Other studies investigating the link between communicative intent and co-speech
gestures have manipulated verbal ambiguity to find out whether speakers would draw
on the gestural modality to clarify their speech for the interlocutor (Holler and Beattie
2003b). Further, Melinger and Levelt (2004) investigated whether co-speech gestures
encode what they defined as “necessary” information, and whether in such cases speakers
were less likely to also represent this information in speech.
Although not manipulating communicative intent directly, studies focusing on ges-
tural mimicry (Holler and Wilkin 2011; Kimbara 2006, 2008; Parrill and Kimbara
2006) provide insight into the collaborative use of co-speech gestures and add further
to our knowledge of the communicative uses of co-speech gestures.
Finally, many of the paradigms reviewed above have been adapted (and in some
cases special paradigms have been newly created) to investigate gesture production in
populations other than children and the “healthy student adult”, such as in older adults
(Feyereisen and Havard 1999), split-brain patients (Kita and Lausberg 2008; Lausberg
and Kita 2002; Lausberg et al. 2003), aphasic patients (Cocks, Hird, and Kirsner 2007;
Hadar et al. 1998) and Alzheimer’s patients (Carlomagno et al. 2005; Glosser, Wiley,
and Barnoski 1998).

5. Some methodological shortcomings and limitations


One very debatable issue is the use of confederates in production (but also comprehen-
sion) studies of gesture. The obvious reason is that the verbal and nonverbal behaviour
of confederates may seem unnatural as it is non-spontaneous. Of course, confederates
may be able to control some behaviours quite well, such as whether to ask a question at
a certain point or not. But this may not be the case for others, especially quick, fleeting
micro-behaviours which are under much less voluntary control (movements of the facial
muscles, for example) and behaviours which it is difficult to produce consistently across
experimental trials (e.g., the exact intonation with which we utter something). What en-
hances the problem further is that in most cases the same confederate is “used” in more
than one experimental trial. This means that when listening to descriptions or narratives
of certain stimuli, by the second trial at the latest the confederate has pre-existing
knowledge of what the speaker is saying (i.e., it is given information, without the speaker
knowing this) and hence may respond to it differently than when being provided with new
information. This, in turn, may of course affect the participant’s behaviour. An addi-
tional problem is that in many cases the experimenter takes on the role of the confed-
erate. This means that they are familiar with the experimental manipulations and in
most cases probably also with the exact hypotheses. Potentially, this can have a huge
impact on the confederate’s behaviour which may be influenced by their particular ex-
pectations. The participant’s verbal and/or gestural behaviour may, as a result, be biased
in a certain direction (e.g., the experimenter may, unconsciously, respond more
enthusiastically or encouragingly (verbally or nonverbally) in cases where the participant
has displayed gestural behaviour in line with the experimental predictions (or sanction
behaviour going against them, such as with a lack of positive feedback)).
Of course, there may sometimes be good reasons as to why researchers want to use
confederates in their studies. In comprehension studies, it is important to isolate a single
manipulation or difference to test a particular hypothesis and obtain clear results. Espe-
cially in ERP and MRI studies, a tightly controlled, carefully constructed stimulus (op-
timising the signal/noise ratio) is necessary in order to pick up any meaningful and
unequivocally interpretable responses from the brain. Other reasons are that confeder-
ates producing scripted behaviour allow researchers to examine recipients’ responses to
these behaviours – which may be useful when the natural occurrence of such behaviours
is rather rare (meaning that an unmanageably huge number of hours of recordings and
participants would be needed to obtain a large enough dataset), or when the social
context in which it occurs creates too much noise for a clear analysis. Yet another reason
for using confederates, at least in production studies, is the availability of resources,
including the size of the participant pool, financial means for compensating participants,
and the greater effort and difficulty associated with recruiting unacquainted participants
as pairs. Although, regarding this latter reason, scientific rigour and validity should
certainly weigh more heavily, much of the research using confederates in production
studies was carried out quite a few years ago, when the strong influences social-interactional
contexts can have on gesture use were not yet well known. Researchers nowadays
benefit from this awareness and where future studies need to employ confederates as
addressees or stimuli-actors for the above named (or other) reasons, one way of redu-
cing methodological limitations is to complement the analyses with a second, smaller
dataset using spontaneous interactions between “naïve” participants in the same respec-
tive context. This helps to demonstrate that similar behaviour occurs in a more natural
context. Another (or better, additional) option is to have the consistency and natural-
ness of the confederate’s behaviour established by a separate set of independent
observers.
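Agreement between such independent observers is commonly quantified with a chance-corrected coefficient such as Cohen's kappa. The sketch below is illustrative only: the ratings are invented, and the function is written from the textbook formula rather than taken from any study cited here.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two observers' categorical ratings."""
    assert len(ratings_a) == len(ratings_b)
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Expected chance agreement, from each observer's marginal frequencies
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    categories = set(ratings_a) | set(ratings_b)
    expected = sum((freq_a[c] / n) * (freq_b[c] / n) for c in categories)
    return (observed - expected) / (1 - expected)

# Hypothetical: two observers rate a confederate's behaviour on 8 trials
obs1 = ["natural", "natural", "unnatural", "natural", "natural", "natural", "unnatural", "natural"]
obs2 = ["natural", "natural", "unnatural", "natural", "unnatural", "natural", "unnatural", "natural"]
print(round(cohens_kappa(obs1, obs2), 2))  # → 0.71
```

Values near 1 indicate agreement well beyond chance; values near 0 would suggest that the observers' judgements, and hence the claimed consistency of the confederate's behaviour, are not reliable.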
Another controversial issue is the manipulation of the interaction between speaker
and addressee. In many of the production studies cited in this chapter, the participant
taking on the role of the addressee was asked not to interrupt the speaker with ques-
tions (while still delivering back-channel responses). It appears that one of
the reasons researchers choose to limit the amount of dialogic exchange is the possi-
bility of experimental confounds. Studies often measure the influence of various cog-
nitive and social variables on gesture by focusing on gesture frequency or gesture
rate. However, we also know that verbal interaction itself (as compared to mono-
logue) influences gesture rate, independent of any additional manipulation (Bavelas
et al. 1995; Beattie and Aboudan 1994). Thus, when manipulating, for instance, common
ground or conceptual load, participants may interact more with their addressee in one of
the experimental conditions than in another (e.g., participants may feel more rapport
with the other participant when mutually sharing certain knowledge, or they may seek
more help or feedback from their addressee when finding communication conceptually
more difficult). In such a case, a higher gesture rate in one of the conditions could be
due to a difference in dialogic interaction per se as well as due to the experimental
manipulation.
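Because raw gesture counts are confounded with how much a participant talks, the dependent measure in such designs is usually a rate, most often gestures per 100 words. A minimal sketch of the computation (the condition names and all counts are invented for illustration, not data from any of the studies cited):

```python
def gesture_rate(n_gestures, n_words, per=100):
    """Gestures per `per` words, controlling for how much a participant talks."""
    return n_gestures / n_words * per

# Hypothetical (gestures, words) counts per participant, by condition
conditions = {
    "face-to-face": [(24, 310), (18, 280), (30, 400)],
    "screen":       [(15, 320), (11, 290), (19, 410)],
}
for name, data in conditions.items():
    rates = [gesture_rate(g, w) for g, w in data]
    print(f"{name}: mean rate = {sum(rates) / len(rates):.2f} gestures per 100 words")
# → face-to-face: mean rate = 7.22 gestures per 100 words
# → screen: mean rate = 4.37 gestures per 100 words
```

Whether a per-word or per-minute rate is the more appropriate normalisation depends on the design; the point is simply that conditions are compared on rates, not raw counts.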
However, considering that other studies have shown that dialogical involvement of
the addressee impacts crucially on gestural behaviour (Bavelas et al. 1995; Beattie
and Aboudan 1994), studies restricting interaction may be fundamentally limited in
the extent to which their findings can be generalised to dialogue. Because dialogue is
one of the most common forms of everyday conversation, this is a serious potential lim-
itation. While researchers certainly need to be aware of this limitation (and take it into
account when drawing their conclusions), those studies based on restricted interactions
are certainly not without value. This is because everyday talk constitutes a continuum,
ranging from monologue to dialogue (Pickering and Garrod 2004). People talk in
monologue when delivering lectures, conference talks or other oral presentations,
and in conversation, individual speakers often take extended turns to tell stories and
anecdotes, jokes, describe how someone gets from A to B, what procedure to follow
to achieve a certain goal, or other complex matters whose explanation stretches
over several sentences. During such extended turns it is not rare that addressees provide
mainly backchannel responses rather than take the floor. In other cases, some interlo-
cutors may simply be more dominant, vocal or extrovert and therefore talk consider-
ably more than others, possibly leaving no opportunity at all for turn contributions
from other participants for much of the conversation. Moreover, in almost all of the
gesture production studies reviewed here, one participant is assigned the role of the
speaker who has all the information (i.e., who has seen the stimuli) and who conveys it
to their addressee. This sort of situation leads, for obvious reasons, mainly to conversa-
tions dominated by one individual with a limited number of turns between speakers,
even when these are completely free to interact. Considering the wide range of different
forms of talk, it is important that our research reflects this spectrum, thus capturing
human communication as the multi-faceted phenomenon that it is. At the same time,
though, it is vital that researchers recognise the particular facet that individual datasets
and analyses are representative of and be wary of over-generalisation.
Alternatively, researchers may choose to explore free interaction as a default, and in
contexts where differences in dialogic interaction could confound results, undertake
steps to tackle these unwanted influences. For example, experimental groups could be
compared for the number of turns used/the number of questions asked, and so on. If
differences on these dimensions are found, statistical procedures that partial out the
respective influences could be employed. This would allow researchers to carry out
experimental studies in order to exert some degree of control over aspects such as con-
tent of talk (narration/description of set stimuli) but without compromising spontane-
ous social interaction and running the risk of unnecessary reductionism; only
approaches using a social unit of analysis offer the opportunity to capture those pro-
cesses that cannot be captured by simply “summing the parts” (cf. Bavelas 2005). Con-
sidering we still know relatively little about gesture as a social behaviour and its use in
dialogic interaction, experimental paradigms based on spontaneous, free interaction
between non-confederates are certainly one main avenue researchers in this field need
to pursue.
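One simple way of partialling out such an unwanted influence is to residualise the dependent measure on the confound and then compare conditions on the residuals; a published analysis would more likely use ANCOVA or a mixed model, and all numbers below are invented for illustration.

```python
def ols_slope_intercept(x, y):
    """Least-squares fit y = a + b*x, returned as (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

# Hypothetical per-participant data: addressee turns, gesture rate, condition
turns = [2, 3, 5, 6, 8, 9]
rates = [4.0, 4.5, 6.0, 5.5, 7.5, 8.0]
group = ["no-CG", "no-CG", "no-CG", "CG", "CG", "CG"]  # common-ground manipulation

# Regress rate on turns; any group difference left in the residuals is
# not attributable to the amount of dialogic interaction alone
a, b = ols_slope_intercept(turns, rates)
residuals = [r - (a + b * t) for t, r in zip(turns, rates)]
for g in ("no-CG", "CG"):
    vals = [res for res, gg in zip(residuals, group) if gg == g]
    print(f"{g}: mean residual = {sum(vals) / len(vals):+.2f}")
# → no-CG: mean residual = +0.14
# → CG: mean residual = -0.14
```

Here the residual difference between groups is what remains of the manipulation's effect once the linear influence of turn count has been removed.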

6. Conclusion
In this chapter, I have tried to provide an overview of the range of basic paradigms (and
their variations) employed in experimental co-speech gesture research, combined with
some degree of critical reflection on aspects of these procedures. Due to the scope of
this article, this overview remains selective and limited in many ways, but I hope to
have been able to provide a starting point here, especially for scholars new to the
field of co-speech gesture research.
Ultimately, when choosing between different experimental methods and weighing up
their pros and cons, what is gained and lost by opting for a particular paradigm
depends on the researcher's exact aim. With respect to the interpretation of
research findings, it is important to recognise that differences in research results may
in fact be rooted in differences between experimental paradigms used, even if these
may seem small (such as regarding the degree of dialogic interaction). Further, it is
important to be careful with the generalisation of findings and with distinguishing,
for example, whether results tell us something about gesture use in dialogue or in
more monologue-type contexts; or whether they tell us nothing about gesture use in
interaction at all but useful things about the gesture-speech relationship nevertheless
(i.e., something that Bavelas 2005 has referred to as studying the mind, or individuals’
thinking, as opposed to social interaction).
In trying to make a choice between different paradigms in the light of their method-
ological advantages and limitations, the most fruitful approach may still be one that
combines less experimentally controlled methods with more tightly controlled ones
(given there are good reasons for employing the latter). This way, we might throw light on the phenomena we
aim to investigate from a range of different angles, capturing partly different aspects,
and obtaining the most comprehensive answers. In my view, different experimental
methods and techniques complement each other, just as laboratory-based research on co-speech
gesture complements observations of gesture in non-experimental contexts – both in
terms of the methods used and the questions answered.

7. References
Alibali, Martha W., Lucia M. Flevares and Susan Goldin-Meadow 1997. Assessing knowledge con-
veyed in gesture: Do teachers have the upper hand? Journal of Educational Psychology 89:
183–193.
Alibali, Martha W. and Susan Goldin-Meadow 1993. Gesture speech mismatch and mechanisms of
learning: What the hands reveal about a child’s state of mind. Cognitive Psychology 25: 468–523.
Alibali, Martha W., Dana C. Heath and Heather J. Myers 2001. Effects of visibility between
speaker and listener on gesture production: Some gestures are meant to be seen. Journal of
Memory and Language 44: 169–188.
Alibali, Martha W., Sotaro Kita and Amanda J. Young 2000. Gesture and the process of speech
production: We think, therefore we gesture. Language & Cognitive Processes 15: 593–613.
Allen, Shanley, Aslı Özyürek, Sotaro Kita, Amanda Brown, Reyhan Furman, Tomoko Ishizuka and
Mihoko Fujii 2007. Language-specific and universal influences in children’s syntactic packaging
of manner and path: A comparison of English, Japanese, and Turkish. Cognition 102: 16–48.
Bavelas, Janet B. 2005. The two solitudes: Reconciling social psychology and language and social
interaction. In: Kristine L. Fitch and Robert E. Sanders (eds), Handbook of Language and
Social Interaction, 179–200. Mahwah, NJ: Lawrence Erlbaum.
Bavelas, Janet B., Alex Black, Nicole Chovil, Charles R. Lemery and Jennifer Mullett 1988. Form
and function in motor mimicry topographic evidence that the primary function is communica-
tive. Human Communication Research 14: 275–299.
Bavelas, Janet B., Alex Black, Charles R. Lemery and Jennifer Mullett 1986. “I show how you feel”:
Motor mimicry as a communicative act. Journal of Personality and Social Psychology 50: 322–329.
Bavelas, Janet B. and Nicole Chovil 2000. Visible acts of meaning: An integrated message model
of language in face-to-face dialogue. Journal of Language and Social Psychology 19: 163–194.
Bavelas, Janet B., Nicole Chovil, Linda Coates and Lori Roe 1995. Gestures specialized for dia-
logue. Personality and Social Psychology Bulletin 21: 394–405.
Bavelas, Janet B., Nicole Chovil, Douglas A. Lawrie and Allan Wade 1992. Interactive gestures.
Discourse Processes 15: 469–489.
Bavelas, Janet B., Linda Coates and Trudy Johnson 2000. Listeners as co-narrators. Journal of Per-
sonality and Social Psychology 79: 941–952.
Bavelas, Janet B., Jennifer Gerwing, Chantelle Sutton and Danielle Prevost 2008. Gesturing on the
telephone: Independent effects of dialogue and visibility. Journal of Memory and Language 58:
495–520.
Beattie, Geoffrey and Rima Aboudan 1994. Gestures, pauses and speech: An experimental inves-
tigation of the effects of changing social context on their precise temporal relationship. Semi-
otica 99: 239–272.
Beattie, Geoffrey and Jane Coughlan 1999. An experimental investigation of the role of iconic gestures
in lexical access using the tip-of-the-tongue phenomenon. British Journal of Psychology 90: 35–56.
Beattie, Geoffrey and Heather Shovelton 1999a. Do iconic hand gestures really contribute any-
thing to the semantic information conveyed by speech? Semiotica 123: 1–30.
Beattie, Geoffrey and Heather Shovelton 1999b. Mapping the range of information contained in
the iconic hand gestures that accompany spontaneous speech. Journal of Language and Social
Psychology 18: 438–462.
Beattie, Geoffrey and Heather Shovelton 2001. An experimental investigation of the role of dif-
ferent types of iconic gesture in communication. Gesture 1(2): 129–149.
Behne, Tanya, Malinda Carpenter and Michael Tomasello 2005. One-year-olds comprehend the
communicative intentions behind gestures in a hiding game. Developmental Science 8: 492–499.
Broaders, Sara C., Susan Wagner Cook, Zachary Mitchell and Susan Goldin-Meadow 2007. Mak-
ing children gesture brings out implicit knowledge and leads to learning. Journal of Experimen-
tal Psychology: General 136: 539–550.
Carlomagno, Sergio, Maria Pandolfi, Andrea Marini, Gabriella Di Iasi and Carla Cristilli 2005.
Coverbal gestures in Alzheimer’s type dementia. Cortex 41: 535–546.
Church, Ruth Breckinridge and Susan Goldin-Meadow 1986. The mismatch between gesture and
speech as an index of transitional knowledge. Cognition 23: 43–71.
Church, Ruth Breckinridge, Spencer D. Kelly and Katherine Lynch 2000. Multi-modal processing
over development: The case of speech and gesture detection. Journal of Nonverbal Behavior
24: 151–174.
Clark, Herbert H. 1996. Using Language. Cambridge: Cambridge University Press.
Cocks, Naomi, Kathryn Hird and Kim Kirsner 2007. The relationship between right hemisphere
damage and gesture in spontaneous discourse. Aphasiology 21: 299–319.
Cocks, Naomi, Laetitia Sautin, Sotaro Kita, Gary Morgan and Sally Zlotowitz 2009. Gesture and
speech integration: An exploratory study of a man with aphasia. International Journal of Lan-
guage and Communication Disorders 44: 795–804.
Cohen, Akiba A. 1977. The communicative functions of hand illustrators. Journal of Communica-
tion 27: 54–63.
Cohen, Akiba A. and Randall P. Harrison 1973. Intentionality in the use of hand illustrators in face-
to-face communication situations. Journal of Personality and Social Psychology 28: 276–279.
Corballis, Michael 2003. From Hand to Mouth: The Origins of Language. Princeton, NJ: Princeton
University Press.
de Fornel, Michel 1992. The return gesture: Some remarks on context, inference, and iconic ges-
ture. In: Peter Auer and Aldo Di Luzio (eds.), The Contextualisation of Language, 159–176.
Amsterdam: John Benjamins.
de Ruiter, Jan-Peter 1998. Gesture and speech production. Ph.D. dissertation, MPI Series in Psy-
cholinguistics, Catholic University of Nijmegen, the Netherlands.
Duncan, Starkey and George Niederehe 1974. On signalling that it’s your turn to speak. Journal of
Experimental Social Psychology 10: 234–247.
Efron, David 1941. Gesture and Environment. New York: King’s Crown Press.
Feyereisen, Pierre and Isabelle Havard 1999. Mental imagery and production of hand gestures
while speaking in younger and older adults. Journal of Nonverbal Behavior 23: 153–171.
Feyereisen, Pierre, Xavier Seron and M. de Macar 1981. L’interpretation de differentes categories
de gestes chez des sujets aphasiques. Neuropsychologia 19: 515–521.
Feyereisen, Pierre and Martial van der Linden 1997. Immediate memory for different kinds of ges-
tures in younger and older adults. Cahiers de Psychologie Cognitive/Current Psychology of
Cognition 16: 519–533.
Feyereisen, Pierre, Michèle van de Wiele and Fabienne Dubois 1988. The meaning of gestures:
What can be understood without speech? Cahiers de Psychologie Cognitive 8: 3–25.
Fillmore, Charles 1971. Types of lexical information. In: Danny D. Steinberg and Leon A. Jako-
bovits (eds.), Semantics. An Interdisciplinary Reader in Philosophy, Linguistics and Psychol-
ogy, 370–392. New York: Cambridge University Press.
Frick-Horbury, Donna and Robert E. Guttentag 1998. The effects of restricting hand gesture pro-
duction on lexical retrieval and free recall. American Journal of Psychology 111: 43–62.
Gerwing, Jennifer and Janet B. Bavelas 2004. Linguistic influences on gesture’s form. Gesture 4(2):
157–195.
Gliga, Teodora and Gergely Csibra 2009. One-year-old infants appreciate the referential nature of
deictic gestures and words. Psychological Science 20: 347–353.
Judith Holler, Manchester (UK)

53. Documentation of gestures with motion capture


1. Introduction
2. Tracking technologies
3. Representation formats and tools for analysis
4. Examples of motion capture studies
5. Best practices for study design
6. Conclusion
7. References

Abstract
For the scientific observation of non-verbal communication behavior, video recordings
are the state of the art. However, anyone who has conducted at least one video-based
study has probably experienced how difficult it is to get the setup right with respect
to image resolution, illumination, perspective, occlusions, etc. Even more effort is
needed for the annotation of the data: even short interaction sequences may consume
weeks or months of rigorous full-time annotation.
One way to overcome some of these issues is the use of motion capture for assessing
(not only) communicative body movements. There are several competing tracking tech-
nologies available, each with its own benefits and drawbacks. The article provides an
overview of the basic types of tracking systems, presents representation formats and
tools for the analysis of motion data, provides pointers to some studies using motion cap-
ture and discusses best practices for study design. However, the article also stresses that
motion capture still requires some expertise and is only beginning to become mobile
and reasonably priced – arguments not to be neglected.

1. Introduction
Modern tracking technology for full-body tracking, also referred to as motion capture,
can be used as an alternative or a complementary method to video recordings (see also
Pfeiffer, this volume, on documentation with data gloves, for a discussion). It offers
high-precision position and orientation information for specific points of reference,
which can be chosen by the experimenter. Motion tracking can be used to collect move-
ment trajectories, posture data, speed profiles and other performance indices. The dis-
advantages are: the obtrusive technology that has to be applied to the body of the
target, such as the reflective markers used by optical tracking systems; data that describe
the postures and movements of a stick-like figure rather than those of a body with
certain masses (for this reason, motion capture should almost always be combined
with video recordings); and, last but not least, the costs of the installation.
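As a toy illustration of the performance indices mentioned above, a speed profile can be computed directly from a marker's position samples. The following sketch is purely illustrative: the function name and the wrist-marker sample data are invented, not taken from any particular tracking system.

```python
import math

def speed_profile(positions, dt):
    """Approximate the instantaneous speed between successive
    3D position samples (in units per second)."""
    speeds = []
    for p0, p1 in zip(positions, positions[1:]):
        speeds.append(math.dist(p0, p1) / dt)
    return speeds

# Hypothetical wrist-marker samples at 100 Hz (dt = 0.01 s), in metres:
samples = [(0.0, 1.0, 0.5), (0.0, 1.0, 0.52), (0.0, 1.0, 0.56), (0.0, 1.0, 0.62)]
print(speed_profile(samples, dt=0.01))  # an accelerating upward stroke
```

Real systems deliver such trajectories at high sampling rates, so smoothing is usually applied before derivatives are taken.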
The following text provides a short introduction to the state of the art in tracking
technology available for use in laboratories. While there are some companies
that offer high-quality motion-capture services for the film and gaming industries,
making use of their services is probably beyond the means of more modest research
projects. The following presentation therefore focuses on the technology that can
be found in research laboratories at present.

2. Tracking technologies
There are two general types of tracking technology available: marker-based systems
and marker-less systems. Both types have their advantages, especially in the context
of authentic empirical research on natural human communication.

2.1. Marker-based tracking systems


Marker-based tracking systems are the most common type. When speaking of motion
capture, the image that may come to mind is a person in a black motion-capture suit
systematically sprinkled with bright markers. Such systems are commonly used in film
and game development to control computer-graphics animations; a famous example is
Andy Serkis's animation of Gollum in The Lord of the Rings (Serkis actually wore a
blue suit with small black patches and bright markers).

As the name suggests, marker-based tracking systems rely on the detection of spe-
cific markers. The systems differ in the types of markers they use: some are based on
passive reflective markers or colored patches, while others use active infrared LEDs.
Combinations of different marker types are also used. The markers can be single objects,
most frequently spheres, which support tracking of their position. As a position is spe-
cified by three coordinates (X/Y/Z), these markers provide three degrees of freedom
(3 DoF). Other markers, such as the ones presented in Fig. 53.1, have a more complex
3D structure. These markers also support tracking of their orientation and thus
have six DoF (6 DoF: 3 DoF for the position and 3 DoF for the orientation). Markers
with 6 DoF can also be uniquely identified, which is not easily possible with sphere-like
3-DoF markers.
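The difference between 3 DoF and 6 DoF can be made concrete in code: a 3-DoF marker delivers only a position, whereas a 6-DoF target additionally carries an orientation, which lets us locate points given in the target's local frame (e.g. a fingertip offset) in room coordinates. A minimal sketch, restricted to rotation about the vertical axis and with invented numbers; the function is purely illustrative, not a vendor API.

```python
import math

def pose_6dof_transform(position, yaw_deg, local_point):
    """Map a point from a 6-DoF target's local frame into room
    coordinates: rotate about the vertical (Y) axis, then translate."""
    a = math.radians(yaw_deg)
    x, y, z = local_point
    rx = math.cos(a) * x + math.sin(a) * z   # rotated local x/z
    rz = -math.sin(a) * x + math.cos(a) * z
    px, py, pz = position
    return (px + rx, py + y, pz + rz)

# A 3-DoF marker only yields this position:
marker_position = (1.0, 1.5, 0.0)
# A 6-DoF target also yields an orientation, so a point 10 cm along the
# target's local x-axis can be located in the room:
print(pose_6dof_transform(marker_position, yaw_deg=90.0, local_point=(0.1, 0.0, 0.0)))
```

With the yaw of 90 degrees, the 10 cm offset ends up along the room's negative z-axis rather than its x-axis; with a plain 3-DoF marker this offset would be unrecoverable.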

Fig. 53.1: Left: The optical tracking system from Advanced Realtime Tracking GmbH uses
infrared cameras (upper left and upper right corner) to measure the position and orientation of
specifically designed targets. The targets use reflective spherical markers in unique 3D configurations,
which enables the system to uniquely identify each marker. The person at the center wears markers
at distinct positions relevant for a study on body movements related to verb productions. Right: The
picture on the right shows a screenshot of the software visualizing the tracking data recorded using
the setup on the left.

The systems with motion-capture suits operate in the so-called outside-in mode: the
markers are attached to the object of interest, and the motion of the markers is observed
by appropriate devices, such as high-speed infrared cameras. The markers have to be
carefully arranged on the target so that all relevant movements are captured and the
structure of the body can be reconstructed from the data. Fig. 53.1 shows a participant
of one of our studies with a small set of markers attached to knees, shoulders, elbows
and head. The hands are tracked with the special finger-tracking system offered by
Advanced Realtime Tracking GmbH (ART 1999). Often, the body is represented
by a skeletal stick figure. The movements of the markers are then translated into
movements of the skeletal representation of the body (see Fig. 53.1, right). As the
markers are attached to the soft skin of the body and not to the bones themselves, a
mapping from the markers to the skeletal figure has to be defined, e.g. in an initial
calibration procedure. This type of tracking is the standard for human body movements
today. Prominent examples of such tracking systems are those developed by Advanced
Realtime Tracking GmbH (ART 1999), Vicon Motion Systems (VMS 1984) and Motion
Analysis (MA 1982).
The advantages of outside-in tracking are that the markers attached to the tracked
target are very small, lightweight and cheap. If the full body is to be tracked, many
markers need to be attached to get a good approximation of the body's posture, so the
size, weight and cost of the markers are important factors.
A major disadvantage is that the devices used to detect the markers and their move-
ments need to be set up with a good view of the target object. Outside-in tracking thus
faces problems similar to those of standard video recordings. The resolution and, e.g.,
the focal distances of the cameras are further restrictive factors constraining the opera-
tional area in which movement can be tracked, the interaction space of the tracking system.
A typical setup for tracking dyadic communication has an interaction space of 3 m x
3 m. Such a setup already requires eight or more cameras, which have to be placed
carefully around the interaction space and thoroughly calibrated to construct a common
coordinate system as a frame of reference. An advantage of motion tracking over normal
video recordings is that whereas the video from each camera has to be annotated
separately, the motion-capture system provides one integrated data point for each marker.
Increasing the number of cameras in an outside-in tracking system thus only increases
the accuracy and range of the system and has no effect on the annotation effort to be
invested in the analysis of the data.
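How several calibrated cameras yield one integrated data point per marker can be sketched geometrically: each camera that sees the marker defines a ray in the common coordinate system, and the marker position is estimated where the rays (nearly) intersect. Below is a toy two-camera version that takes the midpoint of the closest points between two rays; the camera positions and directions are invented, and this is not a vendor's reconstruction algorithm.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def triangulate(p1, d1, p2, d2):
    """Estimate a marker position from two camera rays (origin p,
    direction d) as the midpoint of the closest points on the rays."""
    w = tuple(a - b for a, b in zip(p1, p2))
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b          # zero only for parallel rays
    t = (b * e - c * d) / denom
    s = (a * e - b * d) / denom
    q1 = tuple(p + t * x for p, x in zip(p1, d1))
    q2 = tuple(p + s * x for p, x in zip(p2, d2))
    return tuple((u + v) / 2 for u, v in zip(q1, q2))

# Two hypothetical cameras both sighting a marker at (1, 1, 1):
print(triangulate((0, 0, 0), (1, 1, 1), (2, 0, 0), (-1, 1, 1)))  # → (1.0, 1.0, 1.0)
```

Adding further cameras refines this estimate and covers occlusions, but, as noted above, does not multiply the annotation effort the way extra video views do.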
Inside-out tracking reverses the arrangement of tracking devices and markers: the
tracking devices are attached to the target and the markers are distributed over the
environment, e.g. on the ceiling. This kind of setup can reduce costs if only a few positions
need to be tracked. It also enables tracking in interaction spaces too large to be
covered by an outside-in tracking system. The tracking devices, however, are larger
and heavier than the corresponding markers, and they are more expensive and more
fragile. This increases the obtrusiveness of the tracking technology as experienced by
the tracked participant and thus might negatively influence the performance to be
observed. Well-known examples of inside-out tracking are the GPS system used for
navigation and the Wii Remote (Wii 2006) developed by Nintendo for their Wii gaming
console. In the case of the Wii Remote, however, the rather large sensor has to be held
in the hand during the recordings.
Magnetic tracking systems, such as the Ascension Flock of Birds (ATC 1986, see
Fig. 53.2), are a special case of marker-based inside-out tracking. These systems have
sensor devices attached to the target, but they do not use discrete markers in the
environment. Instead, an outside unit produces a magnetic field that covers the
interaction space (3 m x 4.5 m) and in which the sensors can measure their position
and orientation. The advantage of these systems is that they need no clear line of
sight to markers and provide high update rates of more than 100 Hz. Current systems,
such as the Ascension MotionStar, can operate wired or wireless and provide up to
20 sensor positions per target. The accuracy of these systems depends on the structure
of the magnetic field and is typically better the closer the sensor is to the field-
generating unit.

Fig. 53.2: Ascension's Flock of Birds is the most prominent magnetic tracking solution. The black
box in the back generates a magnetic field in which the sensor presented in the front can
determine its position and orientation.

2.2. Marker-less tracking systems


Marker-less tracking systems require only the sensor devices and no artificial enhance-
ments of the environment or the target. They also come as inside-out or outside-in sys-
tems. Examples of inside-out systems are inertial trackers, which measure relative
accelerations from which changes in translation and orientation can be derived (or, to
be more precise, integrated). The disadvantage of these systems is that they only mea-
sure relative movements and are subject to drift errors, which need to be accounted for.
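The double integration behind inertial tracking, and the drift it produces, can be sketched with a few lines of code: a tiny constant bias in the measured acceleration grows quadratically in the estimated position. The numbers below are synthetic and purely illustrative (one dimension, a hypothetical 100 Hz sensor).

```python
def integrate_position(accels, dt):
    """Dead-reckon 1-D position from acceleration samples by
    integrating twice (acceleration -> velocity -> position)."""
    velocity, position = 0.0, 0.0
    trajectory = []
    for a in accels:
        velocity += a * dt
        position += velocity * dt
        trajectory.append(position)
    return trajectory

# A sensor at rest that reports a tiny constant bias of 0.01 m/s^2:
dt = 0.01                   # 100 Hz sampling
biased = [0.01] * 1000      # ten seconds of "rest"
print(integrate_position(biased, dt)[-1])  # ~0.5 m of apparent drift after 10 s
```

This is why inertial systems are typically combined with an absolute reference (e.g. magnetic or optical) that periodically corrects the drift.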

Fig. 53.3: Example of a segmentation of four persons based on the depth image provided by a
Microsoft Kinect using marker-less tracking. The basic depth image data is visualized as a grey-scale
image and brighter colors represent areas closer to the Kinect. The segmentation and detection of
persons, here overlaid using colored regions, was computed by OpenNI (ONI 2011). The person to
the left also had OpenNI's skeleton tracking activated.

Marker-less outside-in tracking systems observe the movement of the body from a
distance. Most of the available systems operate in the visual domain. For an overview of
purely vision-based techniques, see Wang and Singh (2003), Moeslund, Hilton, and Krüger
(2006), Poppe (2007) or Poppe (2010).
A recent prominent example of an outside-in tracking system is the Microsoft Kinect
(MSDN 2010), an interaction device based on a depth camera produced by PrimeSense
(PS 2005), which actually uses a kind of marker: a pattern of structured light that is
projected onto the target and whose distortions are measured to extract depth informa-
tion. The Kinect does not require any attachments to the target. As a first result, the
Kinect provides a depth image, represented as a greyscale image in which the indi-
vidual intensities encode the depth of the first object hit by the light (see Fig. 53.3). The
provided software frameworks, the Microsoft Kinect SDK (MSDN 2010) for Windows or
OpenNI (ONI 2011) for Windows and other platforms, analyze the depth image and
extract skeletal information in a second step (see Fig. 53.3, person to the left). This skel-
eton model is still rather coarse, as can be seen in Fig. 53.3, and does not contain hands,
fingers or the orientation of the head. The technology is rather new, so more precise
versions of Kinect-like systems and better software frameworks for skeleton extraction
are to be expected.
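The depth-image representation can be illustrated with a toy example: each pixel stores a distance, and even a simple threshold separates a person in the foreground from the background. The 4x4 depth map below is synthetic; real segmentation, as performed by OpenNI, is of course far more sophisticated.

```python
def segment_foreground(depth, threshold):
    """Mark every pixel closer than `threshold` (here: metres) as
    foreground (1) and everything else as background (0)."""
    return [[1 if d < threshold else 0 for d in row] for row in depth]

# Synthetic 4x4 depth map in metres: a "person" about 1.2 m away
# standing in front of a wall about 3.0 m away.
depth = [
    [3.0, 3.0, 3.0, 3.0],
    [3.0, 1.2, 1.3, 3.0],
    [3.0, 1.2, 1.2, 3.0],
    [3.0, 1.3, 1.2, 3.0],
]
print(segment_foreground(depth, threshold=2.0))
```

Skeleton extraction then fits a body model to the segmented foreground region, which is where the hard part of the problem lies.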

3. Representation formats and tools for analysis


Besides device-specific raw-data formats, a small collection of standard formats exists.
One of these is the Biovision Hierarchy (BVH) format. It was originally developed
by the motion-tracking experts at Biovision for their data exchange and has been
widely adopted. The file format is split into two sections; a short example for the left
arm is given in Listing 1 below.
The first section (HIERARCHY) specifies a skeleton as a hierarchy of joints
(JOINT, see the example below), starting from a common root (ROOT). Each
joint has a fixed position (OFFSET), which is specified as an offset relative to
the parent joint. Besides the fixed offset, channels (CHANNELS) with dynamic data
can be defined as well. The end of such a chain of joints is marked by an end position
(End Site), which can have a final offset.
After the specification of the skeleton, the captured data is given as a table in the
second section (MOTION). First, however, the number of frames and the frame duration
are specified. Within each row, the data is arranged in columns with a layout corresponding
to the channels specified in the hierarchy. In the example, the correspondences are
indicated by subscript-like suffixes (e.g. _hip); these are not part of the plain ASCII
file format but are used here for educational purposes only.

(1) Listing 1: Example of a Biovision Hierarchy file

HIERARCHY
ROOT Hips
{
    OFFSET X_hip Y_hip Z_hip
    CHANNELS 6 Xposition_hip Yposition_hip Zposition_hip Zrotation_hip Xrotation_hip Yrotation_hip
    JOINT Chest
    {
        OFFSET X_chest Y_chest Z_chest
        CHANNELS 3 Zrotation_chest Xrotation_chest Yrotation_chest
        JOINT LeftCollar
        {
            OFFSET X_LeftCollar Y_LeftCollar Z_LeftCollar
            CHANNELS 3 Zrotation_LeftCollar Xrotation_LeftCollar Yrotation_LeftCollar
            JOINT LeftUpArm
            {
                OFFSET X_LeftUpArm Y_LeftUpArm Z_LeftUpArm
                CHANNELS 3 Zrotation_LeftUpArm Xrotation_LeftUpArm Yrotation_LeftUpArm
                JOINT LeftLowArm
                {
                    OFFSET X_LeftLowArm Y_LeftLowArm Z_LeftLowArm
                    CHANNELS 3 Zrotation_LeftLowArm Xrotation_LeftLowArm Yrotation_LeftLowArm
                    JOINT LeftHand
                    {
                        OFFSET X_LeftHand Y_LeftHand Z_LeftHand
                        CHANNELS 3 Zrotation_LeftHand Xrotation_LeftHand Yrotation_LeftHand
                        End Site
                        {
                            OFFSET X_End Y_End Z_End
                        }
                    }
                }
            }
        }
    }
}
MOTION
Frames: 30
Frame Time: 0.033333
Xposition_hip Yposition_hip Zposition_hip Zrotation_hip Xrotation_hip Yrotation_hip
Zrotation_chest Xrotation_chest Yrotation_chest Zrotation_LeftCollar Xrotation_LeftCollar
Yrotation_LeftCollar Zrotation_LeftUpArm Xrotation_LeftUpArm Yrotation_LeftUpArm
Zrotation_LeftLowArm Xrotation_LeftLowArm Yrotation_LeftLowArm Zrotation_LeftHand
Xrotation_LeftHand Yrotation_LeftHand
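A file in this layout can be read with a small amount of code. The following is a minimal sketch (no error handling, assuming a well-formed file like Listing 1) that recovers the total channel count and the recording duration; the function name and sample string are invented for illustration.

```python
def bvh_summary(text):
    """Count the channels declared in the HIERARCHY section and
    compute the total duration of the MOTION section in seconds."""
    channels, frames, frame_time = 0, 0, 0.0
    for line in text.splitlines():
        parts = line.strip().split()
        if not parts:
            continue
        if parts[0] == "CHANNELS":
            channels += int(parts[1])          # e.g. "CHANNELS 6 ..."
        elif parts[0] == "Frames:":
            frames = int(parts[1])
        elif parts[:2] == ["Frame", "Time:"]:
            frame_time = float(parts[2])
    return channels, frames * frame_time

# A toy single-joint BVH file:
sample = """HIERARCHY
ROOT Hips
{
  OFFSET 0 0 0
  CHANNELS 6 Xposition Yposition Zposition Zrotation Xrotation Yrotation
}
MOTION
Frames: 30
Frame Time: 0.033333
"""
print(bvh_summary(sample))  # 6 channels, roughly 1 s of motion at 30 Hz
```

The same traversal, extended to record each joint's parent and offset, is all that is needed to rebuild the skeletal stick figure described in Section 2.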

There are several alternatives to the Biovision Hierarchy file format, such as the
Hierarchical Translation Rotation (HTR) and the hierarchy-less Global Translation
Rotation (GTR) formats used by the company Motion Analysis (MA 1982). A more
recent format is GMS (Luciani et al. 2006), which is more compact, as it uses a binary
representation, and also more flexible than Biovision Hierarchy. It is, however, less
widespread and has thus not yet gained support comparable to that of Biovision Hierarchy.
The National Institutes of Health defined a format mainly targeted at biomechanical
research, called Coordinate 3D (C3D 1987). It is a binary format and supports a
large variety of data well beyond the pure 3D position and orientation data described in
the Biovision Hierarchy format. A Coordinate 3D file can include data from electro-
myography, force plates, patient information and analysis results, such as gait timing,
and is extensible to support new kinds of data.
Also quite popular are the commercial Autodesk FBX format (FBX 2012), a binary
format that can be accessed and manipulated using the FBX SDK, and the Collada
format (COLLADA 2012), an open format based on the Extensible Markup Language
(XML) (Bray et al. 2008). The gaming company Acclaim
developed their own motion-capture system and defined two file formats, the Acclaim
Skeleton File (ASF) and Acclaim Motion Capture data (AMC), to store the recorded
data (Schafer 1994). These file formats have been adopted by Oxford Metrics (OMG
1984) for their Vicon system (VMS 1984). An advantage of the ASF/AMC files is
that they are text-based American Standard Code for Information Interchange (ASCII)
files and thus human-readable.
Gestures are typically analyzed by annotating a visualization of the data recordings.
See section 4 of Pfeiffer (this volume) on documentation of gestures with data gloves
for a description of software tools that support the annotation of gesture recordings
made with motion capture.

4. Examples of motion capture studies


One relatively well-documented example of the use of motion capture in linguistic
research is the project “CAREER: Learning to Generate American Sign Language
Animation through Motion-Capture and Participation of Native American Sign Lan-
guage Signers” by Matt Huenerfauth (for more examples on machine learning on
American Sign Language see Loeding et al. 2004). Huenerfauth and Lu (2010)
collected a corpus on American Sign Language (ASL) for improving software for the
generation of American Sign Language. They were especially interested in spatial
reference points created by the speakers in American Sign Language. Spatial refer-
ence points are locations in 3D signing space that are associated with entities under
discussion and that serve as reference points later.
For their investigations, they combined motion capture recordings with other tech-
nologies. They used two Immersion CyberGloves for recording the hand and finger
movements, an Applied Science Laboratories H6 eye-tracking system, an Intersense
IS-900 (ultrasonic inside-out tracking) for absolute head tracking and an Animazoo
IGS-190 bodysuit for relative motion tracking of the body (inertial/magnetic inside-
out tracking). In previous work Huenerfauth (2006) recorded motion capture for
American Sign Language in a study and achieved only poor quality, so that American
Sign Language readers could barely understand the gestures. Apparently, reasons for
this were dropped connections, noise and poor calibration. In their most recent work,
they thus emphasize the need for a proper selection, setup and calibration of the equip-
ment. One way to achieve this is by elaborating and standardizing the procedures used
in this phase of a study as thoroughly as possible. For example, Lu and Huenerfauth
(2009) report on a special protocol they developed for calibrating the data-gloves
(Immersion CyberGloves) used in their studies.
In our own study on manual pointing (Kranstedt et al. 2006) we used an optical
tracking system from Advanced Realtime Tracking GmbH (ART 1999) to record the
pointing movements of speakers when referring to objects in a naming game task.
These references to the environment open up another kind of research that can be
done with motion capture: by creating an abstract 3D model of the experimental setup,
we were able to relate the pointing gestures to the target objects. This way, we could
measure the precision of a pointing gesture automatically and evaluate different models
in data-driven simulations to derive a model of the intended pointing direction from the
recorded gesture trajectory and hand shape (Pfeiffer 2011).
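One plausible operationalization of such a precision measure is the angular deviation between the recorded pointing ray and the direction from the hand to the target object. The sketch below is an assumption about how such a measure can be computed, not the exact procedure used in the study:

```python
import math

def angular_error_deg(hand_pos, point_dir, target_pos):
    """Angle (in degrees) between the pointing ray and the hand-to-target direction.

    hand_pos, target_pos: 3D positions; point_dir: pointing direction vector."""
    to_target = [t - h for t, h in zip(target_pos, hand_pos)]
    dot = sum(a * b for a, b in zip(point_dir, to_target))
    norm = math.dist(point_dir, (0, 0, 0)) * math.dist(to_target, (0, 0, 0))
    cos = max(-1.0, min(1.0, dot / norm))   # clamp against rounding noise
    return math.degrees(math.acos(cos))

# A ray along +x from the origin, with the target 45 degrees off-axis:
print(round(angular_error_deg((0, 0, 0), (1, 0, 0), (1, 1, 0)), 1))  # 45.0
```

Averaging this error over many trials, or comparing it across candidate pointing models (e.g. index-finger direction vs. shoulder-to-hand line), allows the kind of data-driven model evaluation described above.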
53. Documentation of gestures with motion capture 865

Other well-documented studies using motion capture have been conducted by the
Spontal project (Beskow et al. 2009) and by the POETICON project (Pastra et al. 2010).

5. Best practices for study design


Common configurations of tracking systems include an outside-in marker-based track-
ing for full-body movements, plus data gloves if the fine-grained movements of the fingers
are to be observed (see Pfeiffer this volume). Only a few systems support both body
and finger movements with the same tracking technology; thus the body is often tracked
visually and the fingers with data gloves that measure changes in flexible sensors.
If only a few data points are required, such as the position and orientation of the
hands, 6 DoF markers are of great use.
Every tracking system has a coordinate system as reference. If several tracking systems
are to be combined, it is essential to have a reliable mapping from one coordinate system
to the other. Also, the order, orientation, and scaling of the axes might be different from
one system to the other. The position and orientation of the origin of the coordinate
system might be chosen depending on the research question. If, e.g., participants are
interacting with the environment, the coordinate system might best be located at an absolute
position within the interaction space (with reference to the room). In other situations, e.g.
when co-verbal gestures of a speaker are of interest, the origin of the coordinate system
might best be located at a specific point of the participant’s body, such as close to the pel-
vis bone or the center of mass on the level of the chair if the participant is seated (with
reference to the body). With the latter choice, issues of laterality can be analyzed more
easily (e.g. data from left-handed participants could be flipped by negating one axis) and
data from different participants can be compared more easily. It could also be interesting
to normalize gestures, by taking the length of the arms and the height of the body into
account and scaling the recorded data accordingly.
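A minimal sketch of such a normalization, assuming per-frame marker positions given as 3D coordinates in room space; the marker name, the pelvis-centered origin, and the mirroring convention are illustrative choices, not any particular system's conventions:

```python
def normalize_frame(markers, origin, arm_length, mirror=False):
    """Re-express 3D marker positions relative to a body-centered origin
    (e.g. near the pelvis), optionally mirror left-handed participants by
    negating the lateral axis, and scale by arm length so that data from
    differently sized participants become comparable."""
    scale = 1.0 / arm_length
    normalized = {}
    for name, (x, y, z) in markers.items():
        x, y, z = x - origin[0], y - origin[1], z - origin[2]
        if mirror:
            x = -x  # flip laterality: left-handed data becomes comparable to right-handed
        normalized[name] = (x * scale, y * scale, z * scale)
    return normalized

frame = {"hand_r": (0.6, 1.2, 0.1)}  # meters, in room coordinates
print(normalize_frame(frame, origin=(0.0, 1.0, 0.0), arm_length=0.6))
```

After this step, hand positions are expressed in arm lengths from the body center, so gestures from participants of different stature occupy comparable regions of gesture space.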
At the beginning of a recording session, it is helpful to generate some artifacts that are
recorded by all employed recording technologies to provide grounds for a synchroniza-
tion of the multimedia material. One way of achieving this is by asking the participants
to make certain exaggerated movements (Lu and Huenerfauth 2010). Alternatively, a
special device, such as the clapperboard used in film-making, can be used, to which mar-
kers from the tracking system are attached. In some conditions, it is also extremely impor-
tant to get a precise measurement of the different body parts of the participants. This is
necessary whenever skeletons have to be fitted onto the recorded point clouds.
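The alignment step can be sketched as follows, under the assumption that each recording is reduced to a one-dimensional signal (e.g. the speed of one marker, or the audio amplitude): the sharp synchronization event, such as a clap, appears as the largest frame-to-frame jump in both streams, and the difference of its positions gives the offset:

```python
def sync_offset(signal_a, signal_b):
    """Estimate the offset (in frames) between two recordings that both captured
    the same sharp synchronization event, e.g. a clap: locate the largest
    frame-to-frame jump in each stream and return the index difference."""
    def spike(sig):
        diffs = [abs(b - a) for a, b in zip(sig, sig[1:])]
        return diffs.index(max(diffs)) + 1   # frame where the jump lands
    return spike(signal_b) - spike(signal_a)

# Stream B lags stream A by two frames around the clap:
a = [0, 0, 9, 0, 0, 0, 0]
b = [0, 0, 0, 0, 9, 0, 0]
print(sync_offset(a, b))  # 2
```

This assumes identical frame rates; with differing rates, the offset has to be converted to seconds and resampling applied before the streams can be merged.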
Most motion capture recordings will include some noise, i.e. missing marker positions
or orientations, or small jumps in the marker data. This noise is the result of partial occlusions or
of targets leaving the tracking range of the sensors. Depending on the later usage of the
data, it might be relevant to add a post-processing data clean-up procedure to identify and
correct the data, if possible. A thorough preparation and conduction of a motion captur-
ing study should thus be combined with a control of the quality of the actually recorded
data. While problems in audio or video recordings are relatively easy to identify, the eval-
uation of motion capture data is more difficult. One straight-forward way is to generate
statistics about dropped markers and general noise (e.g. jumps). For most applications,
a visual review of the recorded data based, e.g., on a data-driven animation of a stick
figure or an animated human-like virtual avatar will provide the best results. The animation
of a virtual avatar, however, will be rather costly to realize in terms of effort and time. In
Pfeiffer, Kranstedt, and Lücking (2006) we describe an approach where we visualized the
recorded data in an immersive Virtual Reality environment that allowed us to evaluate a
life-sized 3D multimodal visualization of all data recorded in our study on pointing ges-
tures simultaneously in an integrated view. This helped us to assess the quality of the dif-
ferent data channels (motion data, video, audio, speech transcription, and gesture
annotation) in one place and led to an improved quality of the created corpus.
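A hedged sketch of such a clean-up step, assuming missing samples are marked as None in a one-dimensional marker track: short gaps are filled by linear interpolation, longer gaps are left untouched for manual inspection, and a simple drop statistic is reported:

```python
def clean_track(samples, max_gap=3):
    """Linearly interpolate short runs of missing samples (None) in a 1D marker
    track; longer gaps are left untouched for manual inspection.
    Returns the cleaned track and the fraction of originally missing samples."""
    dropped = sum(s is None for s in samples)
    out = list(samples)
    i = 0
    while i < len(out):
        if out[i] is None and i > 0 and out[i - 1] is not None:
            j = i
            while j < len(out) and out[j] is None:
                j += 1
            gap = j - i
            if j < len(out) and gap <= max_gap:      # bounded gap: interpolate
                start, end = out[i - 1], out[j]
                for k in range(gap):
                    out[i + k] = start + (end - start) * (k + 1) / (gap + 1)
            i = j
        else:
            i += 1
    return out, dropped / len(samples)

track = [1.0, None, None, 4.0, 5.0]
print(clean_track(track))  # ([1.0, 2.0, 3.0, 4.0, 5.0], 0.4)
```

In a real pipeline the same logic would run per coordinate and per marker, and the drop fraction per marker is exactly the kind of statistic suggested above for judging recording quality.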

6. Conclusion
Motion capturing technology paves the way for large corpora of quantitative data on
gestures at high spatial and temporal resolutions. The temporal and spatial precision
of today’s motion capturing systems is higher than anything that has been achieved
based on the annotation of video recordings, and the systems are still improving.
These advantages, however, come at a cost: tracking equipment is still expensive and
it requires some expertise to be set up and used. Also, pure tracking data is not as easily
evaluated as video recordings. With some experience, these investments are compen-
sated by a greatly reduced time required for the analysis of the data, as the manual
annotation of the recordings is reduced to a minimum, e.g. to identify and label the
time intervals of interest.
The increased interest in game production and consumer electronics has also led to further
developments in the field. Basic tracking systems are already available at a four-digit price tag, and
further advances can be expected. Consumer tracking systems, such as the Microsoft
Kinect, allow the tracking of simple skeleton models in a restricted interaction volume.
Solutions combining several such systems to extend precision and interaction volume are
also available. In the near future, we can expect high-quality tracking systems to be found
in nearly every household. As consumers are also not too enthusiastic about attaching markers,
the most successful systems will be based on unobtrusive marker-less tracking technology.
And, maybe even more importantly, we will become attuned to being tracked by such systems
through everyday exposure – maybe even more than we are used to being videotaped – and
thus we will be less affected by the use of motion capturing in experimental settings.
Motion capturing, however, is not a panacea for linguistic research, as the follow-
ing last example underlines. Martell and Kroll (2007) used a machine-learning approach
to identify gesture phases in a pre-annotated video-based corpus. Their annotation of
the gestures is based on FORM. In a first study, they pre-annotated the positions of
the end-effector (the hand) within a 5x5x5 grid manually and were able to train a Hid-
den Markov Model to detect gesture phases – but only with moderate results. In a sec-
ond study, they compared the use of motion-capture data with the manual annotations
for the same problem. They expected that the fidelity of the motion capture data would
lead to more successful classifications. However, the opposite was the case, and the manual
annotations performed better than motion capture as the basis for the machine learning
process. They explained this by a smoothing of the data performed by the human annotators
that did not happen with the raw motion capture data. In addition, the annotation
scheme based on the discrete 5x5x5 grid abstracted away from detailed trajectories,
so that the annotations of many different gesture paths looked the same. This coarser
representation, a result of the pre-processing by the human annotators, may have been
easier for the machine learning algorithm to learn. The punch line is that one
has to be careful when optimizing human involvement out of an analytical process,
as we might not yet have penetrated the topic deeply enough to mold our knowledge into
an algorithm and leave the machine alone to handle the data.
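The effect of such coarse spatial coding can be illustrated with a small sketch, assuming a normalized gesture space of [-1, 1] along each axis (the bounds and the grid size are illustrative, not those of the FORM scheme itself):

```python
def grid_cell(pos, lo=-1.0, hi=1.0, n=5):
    """Map a continuous 3D hand position onto a discrete n x n x n grid cell,
    the kind of coarse coding a 5x5x5 annotation scheme provides."""
    def axis(v):
        t = (v - lo) / (hi - lo)               # normalize to [0, 1]
        return min(n - 1, max(0, int(t * n)))  # clamp to valid cell indices
    return tuple(axis(v) for v in pos)

# Many distinct continuous trajectories collapse onto the same cell sequence:
print(grid_cell((0.0, 0.0, 0.0)))     # (2, 2, 2)
print(grid_cell((0.05, -0.03, 0.0)))  # (2, 2, 2)
```

The collapse of nearby positions into one cell is precisely the smoothing that, in Martell and Kroll's comparison, made the human-coded data easier to learn from than the raw trajectories.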

7. References
ART 1999. Advanced Realtime Tracking GmbH, online: http://www.ar-tracking.de (last access
January 2012).
ATC 1986. Ascension Technology Corporation, online: http://www.ascension-tech.com/ (last access
February 2012).
Beskow, Jonas, Jens Edlund, Kjell Elenius, Kahl Hellmer, David House and Sofia Strömbergsson
2009. Project presentation: Spontal – multimodal database of spontaneous dialog. In: Peter
Branderud and Hartmut Traunmüller (eds.), Proceedings of FONETIK 2009, 190–193. Stock-
holm: Stockholm University.
Bray, Tim, Jean Paoli, C. M. Sperberg-McQueen, Eve Maler and François Yergeau 2008. Extensible
Markup Language (XML) 1.0 (Fifth Edition), W3C Recommendation, 26 November 2008. W3C.
C3D 1987. The 3D Biomechanics Data Standard. Online: http://www.c3d.org/ (last access January 2012).
COLLADA 2012. Khronos Group, COLLAborative Design Activity, online: https://collada.org
(last access January 2012).
FBX 2012. Autodesk FBX, online: http://www.autodesk.com/fbx (last access January 2012).
Huenerfauth, Matt 2006. Generating American sign language classifier predicates for English-to-
ASL machine translation. Ph.D. dissertation, Department of Computer and Information
Sciences, University of Pennsylvania.
Huenerfauth, Matt and Pengfei Lu 2010. Eliciting spatial reference for a motion-capture corpus of
American Sign Language discourse. In: Philippe Dreuw, Eleni Efthimiou, Thomas Hanke,
Trevor Johnston, Gregorio Martı́nez Ruiz and Adam Schembri (eds.), 4th Workshop on the
Representation and Processing of Signed Languages (LREC 2010): http://www.sign-lang.uni-
hamburg.de/lrec2010/lrec_cslt_01.pdf. CD Content (Copyright by the European Language Re-
sources Association - ISBN 2-9517408-6-7). 121–124.
Kranstedt, Alfred, Andy Lücking, Thies Pfeiffer, Hannes Rieser and Ipke Wachsmuth 2006. Deic-
tic object reference in task-oriented dialogue. In: Gert Rickheit and Ipke Wachsmuth (eds.),
Situated Communication, 155–207. Berlin: De Gruyter Mouton.
Loeding, Barbara L., Sudeep Sarkar, Ayush Parashar and Arthur I. Karshmer 2004. Progress in
automated computer recognition of sign language. Computers Helping People with Special
Needs 3118: 1079–1087.
Lu, Pengfei and Matt Huenerfauth 2009. Accessible motion-capture glove calibration protocol for
recording sign language data from deaf subjects. In: Shari Trewin and Kathleen F. McCoy
(eds.), Proceedings of the 11th International ACM SIGACCESS Conference on Computers
and Accessibility, 83–90. ACM New York, NY: USA.
Lu, Pengfei and Matt Huenerfauth 2010. Collecting a motion-capture corpus of American Sign
Language for data-driven generation research. In: Melanie Fried-Oken, Kathleen F. McCoy
and Brian Roark (eds.), Proceedings of the NAACL HLT 2010 Workshop on Speech and Lan-
guage Processing for Assistive Technologies, 89–97, Association for Computational Linguistics
Stroudsburg, PA: USA.
Luciani, Annie, Matthieu Evrard, Damien Couroussé, Nicolas Castagné, Claude Cadoz and Jean-
Loup Florens 2006. A basic gesture and motion format for virtual reality multisensory applica-
tions. In: Proceedings of the 1st International Conference on Computer Graphics Theory and
Applications. Setubal. arXiv:1005.4564 [cs.HC].
MA 1982. Motion Analysis, online: http://www.motionanalysis.com/ (last access January 2012).
Martell, Craig and Joshua Kroll 2007. Corpus-based gesture analysis: An extension of the form
dataset for the automatic detection of phases in a gesture. International Journal of Semantic
Computing 1: 521.
Moeslund, Thomas B., Adrian Hilton and Volker Krüger 2006. A survey of advances in vision-based
human motion capture and analysis. Computer Vision and Image Understanding 104: 90–126.
MSDN 2010. Microsoft Kinect SDK, online: http://www.microsoft.com/en-us/kinectforwindows/
(last access January 2012).
OMG 1984. Oxford Metrics Group, online: http://www.omg3d.com (last access January 2012),
Oxford, UK.
ONI 2011. OpenNI, online: http://www.openni.org/ (last access January 2012).
Pastra, Katerina, Christian Wallraven, Michael Schultze, Argyro Vataki and Kathrin Kaulard 2010.
The POETICON corpus: Capturing language use and sensorimotor experience in everyday inter-
action. In: Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph
Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner and Daniel Tapias (eds.), Proceedings of the
Seventh International Conference on Language Resources and Evaluation (LREC’10), European
Language Resources Association (ELRA). online: http://poeticoncorpus.kyb.mpg.de/ (last access
January 2012). European Language Resources Association (ELRA): Valletta, Malta. 3031–3036.
Pfeiffer, Thies 2011. Understanding Multimodal Deixis with Gaze and Gesture in Conversational
Interfaces. Aachen, Germany: Shaker.
Pfeiffer, Thies this volume. Documentation of gestures with data gloves. In: Cornelia Müller, Alan
Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Sedinha Teßendorf (eds.), Body –
Language – Communication: An International Handbook on Multimodality in Human Interac-
tion. (Handbooks of Linguistics and Communication Science 38.1.) Berlin: De Gruyter Mouton.
Pfeiffer, Thies, Alfred Kranstedt and Andy Lücking 2006. Sprach-Gestik Experimente mit IADE,
dem Interactive Augmented Data Explorer. In: Stefan Müller and Gabriel Zachmann (eds.),
Dritter Workshop Virtuelle und Erweiterte Realität der GI-Fachgruppe VR/AR, 61–72. Aachen,
Germany: Shaker.
Poppe, Ronald 2007. Vision-based human motion analysis: An overview. Computer Vision and
Image Understanding 108: 4–18.
Poppe, Ronald 2010. A survey on vision-based human action recognition. Image and Vision Com-
puting 28: 976–990.
PS 2005. PrimeSense Ltd., online: http://www.primesense.com/ (last access January 2012).
Schafer, M. 1994. ASF Acclaim Skeleton File Format. online: http://mocap.co.nz/downloads/ASF_
spec_v1.html (last access January 2012).
VMS 1984. Vicon Motion Systems, online: http://www.vicon.com/ (last access January 2012), Oxford, UK.
Wang, Jessica JunLin and Sameer Singh 2003. Video analysis of human dynamics – a survey. Real-
Time Imaging 9: 321–346.
Wii 2006. Nintendo Wii Remote Gaming Controller, online: http://wii.com/ or http://en.wikipedia.
org/wiki/Wii_Remote (last access February 2012).

Thies Pfeiffer, Bielefeld (Germany)

54. Documentation of gestures with data gloves


1. Introduction
2. Tracking technologies for the hand
3. Representation formats
4. Analyzing gestural data
5. Best practices for study design
6. Conclusion
7. References
Abstract
Human hand gestures are very swift and difficult to observe from the (often) distant
perspective of a scientific observer. Not uncommonly, fingers are occluded by other body
parts or context objects, and the true hand posture is often revealed only to the addressee.
In addition to that, as the hand has many degrees of freedom and the annotation has
to cover positions and orientations in a 3D world – which is less accessible from the typ-
ical computer-desktop workplace of an annotator than, let’s say, spoken language – the
annotation of hand postures is quite expensive and complex.
Fortunately, research on virtual reality technology has brought about data gloves, which
were originally meant as an interaction device allowing humans to manipulate
entities in a virtual world. Since their release, however, many other applications have
been found. Data gloves are devices that track most of the joints of the human hand and
generate data-sets describing the posture of the hand several times a second. The article
reviews different types of data gloves, discusses representation formats and ways to
support annotation, and presents best practices for study design using data gloves as
recording devices.

1. Introduction
The human hands are uniquely versatile parts of the body. They explore the
environment via tactile and haptic feedback, they manipulate the environment and
they communicate with interlocutors. The movements of the hands thus provide the
receptive observer with information about the current sensory information state, the
current actions or the communicative intent of a subject. Scientific observations of
the communicative function of manual gestures are what we are interested in within
this chapter.
If such observations are to be done systematically, e.g. as part of a scientific study, it
is essential that the movements and the postures taken by the hands are captured and
archived for later analysis. In this process, it is important to have ways of capturing
these data in an objective way. The live manual annotation of a gesturing interlocutor
or the depiction of a posture or movement in a sketch, for example, may be able to cap-
ture a qualitative impression of a certain behavior. The movement itself, however, is
then no longer accessible for more thorough investigation and for validation by a
third party. While such approaches to investigating, e.g., the gestural capabilities of
humans are essential for gathering an overview of the field, a rigorous validation of such
findings will require different methods for capturing the data.
As Bouissac (this volume) shows, depictions of static and dynamic gestures are found
even in prehistoric art. However, for a long time science lacked the appropriate meth-
ods for capturing dynamic manual gestures in the moment of their production, which
without doubt is one reason for the difficulties scientists experienced in studying the
communicative function of gestures. Written texts can be approached more easily
and written language provides a means to transcribe at least the verbal part of spoken
dialogues. Interactive dialogue is less accessible, because it is fast paced and typically
multimodal.
Technical progress first brought us film recordings: celluloid film was developed by
Hannibal Goodwin in 1887 and the first film camera by Louis Le Prince in 1888; the
first amateur film camera was the Birtac by Birt Acres in 1898. Video recordings fol-
lowed shortly afterwards, starting in 1935 with AEG-Telefunken. The first home-
video recorders were marketed in 1969 by Philips and Grundig; later came Sony’s
Videomovie in 1980 and the Video 8 systems in 1982. Today, the required technology
is ubiquitously available, both for spontaneous recordings in the field, which can be
taken by any smartphone, and for high-quality and high-speed recordings in the
laboratory.
At first glance, high-definition video camcorders seem to be ideal for capturing fine-
grained hand movements during gestures. There are, however, still several problems.
First of all, a video camera provides a single perspective on the scene. It may therefore
happen that parts of the captured movements are occluded. This is typically the case
for movements of the fingers, which may easily be occluded by the back of the same
hand or the other arm and hand. This problem can be solved to some extent by
using multiple video camcorders to capture the movements from several perspectives
simultaneously. However, this significantly increases the amount of captured data, in-
troduces challenges to synchronize the different devices and will finally have a severe
impact on the processing time needed for the analysis of the recorded data. It is also
difficult to handle multiple perspectives within the same annotation tool (Hanke,
Storz, and Wagner 2010).
This leads us to the second major problem that comes with video recordings
(although it is not restricted to them): the requirement of manual annotation. Direct
or mediated observations of gestures via video recordings can only provide qualitative
data. If the results need to be quantified – and in most cases they will – the data has to
be annotated. The annotation of videos is a challenge in its own right, and a rigorous
approach is required to ensure a high quality of the annotations. Typically, a codebook
and annotation guidelines are defined, and the agreement of all participating annotators
is controlled. This basically means hours of manual work, often stepping through the
recorded videos frame by frame. When working on explorative studies, I have known
it to happen that the codebook was updated iteratively during the annotation process
to reflect the latest findings. This, however, always meant that the annotations already
made had to be updated as well. This is a tedious process, but going over such iterations
can only improve the quality of the annotations.
The third problem is that not all features that might be relevant for targeting a cer-
tain research question can be accessed with the same precision by annotating videos.
Features that are easily accessible concern timing, such as onset and offset of a gesture,
the well-known gesture phases preparation, stroke and retraction (Kita, van Gijn, and
van der Hulst 1998; see also Pfeiffer this volume), and handedness. However, a posture
description of the shape of the hand at a certain point in time can only be coarse-
grained if made from a fixed perspective. Under such conditions, certain features,
such as the position of the hand or the tips of individual fingers, as well as the exact pos-
ture of the hand are less accessible. I have seen cases where the position of the hand was
measured with a ruler on the screen displaying the video recordings. This approach
might be the most practicable way if no other measurement is available – or if the
need for positional data only came up after the experiment was conducted. The overall
quality of the data gathered using such measurements, however, will in most cases be
low. The distance of the recorded subjects from the camera and therefore their per-
ceived size may vary; the gesture might be subject to perspective distortions (unless
it is done completely in a plane orthogonal to the viewing direction); the spatial reso-
lution of the video recordings may be lower than required or the technical characteris-
tics of the display devices may not have been considered (e.g. different displays have
different dot densities).
An alternative to video recordings for documenting the hand movements of a subject
is offered by so-called data gloves (see Section 2 for an overview), which the subjects can
slip on to track changes in their hand postures. Once worn, the data gloves stream descriptions
of the current hand posture (see Section 3 for representation formats) to a recording
device in near real-time. Such devices offer very rich data, up to 27 degrees of freedom
(including the translation) per hand (ElKoura and Singh 2003). Data gloves typically
provide the configuration of the fingers only. If the position and orientation of the
hand is needed, data gloves can be combined with another tracking technology (see
Pfeiffer this volume, on Motion Capturing or the discussion in Heloir, Neff, and Kipp
2010). Data gloves offer high spatial precision, which is in most cases independent
of perspective, and provide sufficient temporal precision. The rich data-set provided by
data gloves is machine-readable and as such ready for further feature extraction and
quantitative analysis (see Section 4), which can be done automatically or semi-automatically.
The time needed for analysis and evaluation will thus be significantly reduced. None-
theless, there are also some disadvantages of this technique, such as the obtrusiveness
of the technology, the increased complexity of the set-up, the required additional exper-
tise for analyzing the data and plainly the additional costs of such devices (see Section 5
for some best practices).
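What "machine-readable and ready for feature extraction" can mean in practice is illustrated by the following sketch. The joint names, angle values, and the fist threshold are invented for illustration and do not correspond to any particular glove's output format:

```python
# A hypothetical per-frame glove sample: flexion angles in degrees for a few
# joints (real data gloves report up to ~22 sensor values per hand).
frame = {
    "thumb_mcp": 15.0, "index_mcp": 85.0, "middle_mcp": 88.0,
    "ring_mcp": 90.0, "pinky_mcp": 92.0,
}

def mean_finger_flexion(sample):
    """Average flexion over the four finger base joints; a simple scalar
    feature usable, e.g., to detect a closed fist automatically."""
    fingers = ["index_mcp", "middle_mcp", "ring_mcp", "pinky_mcp"]
    return sum(sample[j] for j in fingers) / len(fingers)

flex = mean_finger_flexion(frame)
print(flex, "closed fist" if flex > 70 else "open hand")
```

Running such a feature detector over an entire recording yields candidate time intervals automatically, so that manual annotation can be limited to verifying and labeling them.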

2. Tracking technologies for the hand


The first data glove on the market was the name-giving DataGlove (see Fig. 54.1) de-
veloped by Jaron Lanier’s company VPL Research for NASA in 1985 (Fisher 1990;
Lanier and Zimmermann 1986). Lanier is also known for coining the phrase “Virtual
Reality” in 1989, and the DataGlove was primarily used as an input device for Virtual
Reality applications. The first DataGlove was equipped with an absolute positioning
sensor at the back of the hand. The configuration of the fingers was measured relative
to the back of the hand using fibre-optic cables running along the fingers. At one end of
the cable a light source emitted light into the fibre, which was measured using a
photosensor at the other end. In between, at each knuckle some light was released from the
fibre relative to the flexion at that particular knuckle by a precisely calibrated
mechanism. In this way, the software was able to calculate the flexion of the finger based on
the remaining light detected by the sensor. A comparison of these early systems has
been compiled by Eglowstein (1990).

Fig. 54.1: Sketch of the VPL DataGlove by VPL Research. The fibre-optic cables for measuring
the flexion of the fingers are woven into the surface of the gloves.
The resolution of the DataGlove is about 5˚, which is not sufficient if a high precision
is required (Shaw et al. 1992), but the body of the glove is soft and comfortable to wear.
This allows the user to make fine-grained hand movements without much hindrance.
There are, however, some drawbacks: the DataGloves need to be calibrated for
every user, as the position of the fibres may change. A new calibration may also be nec-
essary when the DataGlove is used for a longer period, due to warming of the material
or sweat. In addition, the fibres are quite sensitive.
A different technology is used by the CyberGlove developed by Virtex, which replaced
the optical fibres with metal strips that alter their resistance when bent.
This device proved more robust and is more easily calibrated than the DataGlove.
The Dextrous Hand Master developed by Elizabeth Marcus at Exos offers much higher
precision and is able to detect the splay of the thumb. However, it is based on an
exoskeleton which is quite cumbersome to set up. The flexion is calculated by measuring the
Hall effect induced by magnets on the skeleton.
The success of the DataGlove also led to the development of the PowerGlove by
AGE, Mattel and VPL, a data glove for the Nintendo NES gaming console. Overall,
about 1 million units were sold by the end of 1991 (Rheingold 1992). The configuration
of the fingers is measured by a strap of polyester on which a conducting ink is
applied. The resolution of the system, however, only allows for the detection of the
bending of a whole finger, not of individual phalanges. The position of the glove is
detected based on ultrasonic technology.
The CyberGlove data glove series by CyberGlove Systems (see Fig. 54.2) covers
modern data gloves. Some of these devices come with a wireless option, force feedback
or tactile stimulation.

Fig. 54.2: The wired CyberGlove I data glove by CyberGlove Systems. The bimetal stripes for
measuring the flexion of the fingers have been worked into the garment (see picture on the right;
example areas are highlighted in red). The black crossbar with the grey spheres is not part of the
data glove but of a complementary optical tracking system (photography by Martin Brockhoff).
principle not required for most documentation needs, they could be relevant for
study design if, e.g., social contact or interaction with objects is of interest. The latest
CyberGlove III system is a mobile data glove, which can record data on hand postures
to a storage card without the need for an operator personal computer. When recording
hand gestures in the field, this could be an interesting option. The system, however,
does not provide absolute positions and orientations of the hand, though it can be
combined with an additional tracking system for that purpose.
Modern data gloves that operate using optical outside-in tracking are also available,
such as the Fingertracking system by the Advanced Realtime Tracking GmbH (ART
2011, see Fig. 54.3). The Advanced Realtime Tracking system consists of an active
hand target for tracking the position and orientation of the back of the hand. Wired
to the target are 3 or 5 thimbles, small cups for the fingers with attached IR-LEDs.
The main unit is synchronized with the external optical tracking system through infra-
red flashes. Different light patterns are triggered in this way, so that the individual fin-
gers can be identified by the tracking system. The system provides a high precision
measurement of the end position of each finger, as this is where the LED is attached.
The flexion of the fingers, however, has to be computed through a model of the hand,
which is initialized in a calibration procedure. Because the Fingertracking system relies
on optical tracking, it has problems with occlusions, similar to the video recording
approach. Typical systems therefore include several tracking cameras, which observe
the tracking volume from different perspectives. This reduces the likelihood of occlu-
sion, but especially grasp-like gestures will still be a problem. The advantage of the sys-
tem is the exact data regarding the absolute position of the finger tip. This makes the
device especially attractive as an interaction device in virtual reality, where a precise
measurement of the point of contact with a virtual object is critical.

Fig. 54.3: The optical Fingertracking device by the Advanced Realtime Tracking GmbH. This
device uses active outside-in optical tracking to track markers on the hand and on the finger
tips; options for three or five fingers are available (photography by Martin Brockhoff).

Last but not least I would like to briefly mention current approaches to low-cost hand
tracking. Wang and Popović (2009) presented ColorGlove, a tracking system which
should – in principle – work with a standard webcam or with the recordings of a
video camcorder. Their key idea is to use a glove colored in a specific pattern which
allows their tracking algorithms to identify postures stored in a database. While the
ColorGlove system shows a surprisingly good performance in interactive tasks, it cannot
874 V. Methods

match the precision and stability of the professional solutions. In interactive tasks, the
user receives real-time feedback of the recognition results and can adjust his or her
behavior appropriately. This adaptive behavior improves the performance of the system,
but adaptation is not desirable when documenting gesture use. Nevertheless, if, e.g., specific gestures are to be counted, the ColorGlove approach could be a viable low-cost solution. Other approaches try to deal with hand gestures recorded without modifications of
the hand. A comparison of purely vision-based implementations of finger tracking based
on the famous EyesWeb system has been compiled by Burns and Mazzarino (2006).
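To illustrate the basic principle behind such color-based approaches, the following sketch isolates glove-colored pixels in a frame by a simple color threshold and reduces them to a centroid. This is not the ColorGlove algorithm itself, which matches full color patterns against a posture database; the frame and the threshold values are invented for illustration.

```python
# Illustrative sketch only: isolate "glove" pixels by a color threshold and
# take their centroid as a rough hand position. The frame is an invented
# 4x4 RGB image; real color-glove systems match whole patterns, not centroids.
import numpy as np

frame = np.zeros((4, 4, 3), dtype=np.uint8)
frame[1:3, 1:3] = [255, 0, 0]  # a red "glove" patch in the middle

# Boolean mask of pixels that are strongly red.
mask = (frame[..., 0] > 200) & (frame[..., 1] < 50) & (frame[..., 2] < 50)
ys, xs = np.nonzero(mask)
print(xs.mean(), ys.mean())  # centroid of the glove pixels: 1.5 1.5
```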

3. Representation formats
Once the question of which tracking device to use has been decided, it is important to
identify an appropriate representation format for storing the recorded data. Based on
my own experiences, I would recommend recording the data at a level which is as
close to the device as possible, with few if any initial abstractions. Device-specific cali-
bration procedures should however be followed. Only in a second step should the rel-
evant parameters be identified and a pre-processed version of the data suitable for
further analysis created. This ensures that the initial assumptions underlying the pre-
processing step can still be the target of discussion and refinement later on in the
analysis process. Nothing is more annoying than data that has gone to waste.
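The two-step workflow recommended here might, as a minimal sketch, look as follows; the file name, the 18-sensor vector size, and the scaling factor are assumptions for illustration, not part of any glove SDK.

```python
# Hypothetical sketch of the two-step workflow described above: first store
# raw, device-level joint-angle vectors, then derive analysis data in a
# separate pass whose assumptions can be revised later.
import csv

def record_raw(samples, path="raw_glove.csv"):
    """Step 1: dump raw 18-dimensional joint-angle vectors, one per row."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for vector in samples:  # one vector per glove measurement
            writer.writerow(vector)

def preprocess(path="raw_glove.csv", scale=0.5):
    """Step 2: apply a (revisable) scaling assumption to the stored raw data."""
    with open(path, newline="") as f:
        return [[float(v) * scale for v in row] for row in csv.reader(f)]

record_raw([[1.0] * 18, [2.0] * 18])
print(preprocess()[0][0])  # raw value 1.0 scaled to 0.5
```

Because the raw file is kept, the scaling assumption in the second step can be changed and re-run at any later point of the analysis.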
The raw data comes in different formats for the individual data glove solutions. The
CyberGlove systems, for example, are available in versions with 18 or 22 joint-angle
measurements (see Fig. 54.4). The raw data thus consists of an 18- or 22-dimensional
vector of joint angles per measurement (at roughly 90 measurements per second). The raw
data provided by the Advanced Realtime Tracking Fingertracking system provides
only 10 joint angles directly – those between the phalanges – but it is more detailed
in other aspects. It includes the absolute position and orientation of the back of the
hand, as well as the positions of the fingertips relative to the back of the hand. In addition to the joint angles, the radius of the fingertip and the lengths of the phalanges are

Fig. 54.4: This image highlights the sensor positions on the CyberGlove system (22 joint-angle
measurements). The raw data format provides a single value for each sensor.
54. Documentation of gestures with data gloves 875

returned. The sample rate of the Advanced Realtime Tracking Fingertracking system is 15 Hz for the three-finger version and 10 Hz for the five-finger version, which is much lower than that of the CyberGlove system.
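When streams with such different sample rates are combined, a common pre-processing step is to bring the slower stream onto the timebase of the faster one, e.g., by linear interpolation. The sketch below uses assumed, idealized timestamps; real recordings would use the devices' own clocks.

```python
# Illustrative sketch: resampling a 10 Hz tracking stream onto the timebase
# of a 90 Hz glove stream by linear interpolation. All values are invented.
import numpy as np

t_slow = np.arange(10) * 0.1                 # 10 Hz timestamps (seconds)
x_slow = np.linspace(0.0, 1.0, t_slow.size)  # one tracked coordinate

t_fast = np.arange(81) / 90.0                # 90 Hz glove timestamps
x_resampled = np.interp(t_fast, t_slow, x_slow)

print(x_resampled.shape)  # one interpolated value per glove sample
```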
In most cases, the representation formats for hand postures will be the same as for
motion capturing. An overview of the most common formats is presented in Pfeiffer (this volume). The representation formats described there try to capture the postures
and the movement trajectories of gestures over time. They do not provide support for
more high-level descriptions.
High-level descriptions of hand shapes are supported by sign-language formats such
as SignWriting (Sutton 1981) or HamNoSys (Hanke 2004), which can be viable repre-
sentation formats to describe the recorded gestures at a more abstract level. There are
even approaches for recognizing such symbolic descriptions automatically from motion
capture data, as the next section will show.

4. Analyzing gestural data


The most common approach to analyzing gestural data is manual annotation. The gen-
eral procedure of annotation is described in the articles in Chapter V in this volume, so
in the following I will concentrate on two specific aspects: the tools currently available
for annotating manual gestures and the approaches to semi-automatic and automatic
annotation of gestures.

4.1. Annotating manual gestures


Two annotation tools have proved useful in my own research on gestures: Anvil (Kipp
2001) and ELAN (Wittenburg et al. 2006). Both are primarily targeted at annotating
video data. They are time-line oriented and allow for a multi-level annotation of
speech, gestures and other events. The Anvil tool has recently been extended by a
framework for annotating skeleton movements manually on top of video recordings
(Nguyen and Kipp 2010). The same tool also has support for playing motion capture
data from BVH files (Kipp 2010). Duncan (this volume) provides more information
about multimodal annotation tools.

4.2. Semi-automatic or automatic annotation of gestures


Data gloves have been used successfully in innovative human-computer interfaces (HCI), especially in research on Virtual Reality. These application areas are dominated by command interfaces, in which a certain operation has to be selected from a set using hand configurations that can be detected with very high robustness. It can, for example, be sufficient to differentiate between extended, half-extended and closed finger configurations (Bryson 1992). Only a few interactions actually require exact knowledge of the continuously changing postures (e.g., when rotating something with a single hand, driving a screw, or closing a bottle).
If one is interested in the detection of gestures which can be labeled with a symbol,
the approaches developed in this area of human-computer interfaces could be very
interesting and time-saving (Barbič et al. 2004), especially learning mechanisms to
classify gestures based on recorded data.
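A minimal example of such a learning mechanism is nearest-neighbor classification of raw joint-angle vectors against a set of labeled posture templates; the 18-dimensional vectors and posture labels below are invented for illustration.

```python
# Hypothetical sketch: labeling glove postures by nearest-neighbor matching
# against recorded templates. Joint-angle values and labels are invented.
import numpy as np

templates = np.array([
    [0.0] * 18,               # "flat": all joints extended
    [1.2] * 18,               # "fist": all joints flexed
    [0.0, 0.0] + [1.2] * 16,  # "point": index extended, rest flexed
])
labels = ["flat", "fist", "point"]

def classify(sample, templates, labels):
    """Return the label of the closest template (Euclidean distance)."""
    dists = np.linalg.norm(templates - sample, axis=1)
    return labels[int(np.argmin(dists))]

measurement = np.array([0.1, 0.05] + [1.1] * 16)  # one raw glove frame
print(classify(measurement, templates, labels))   # -> point
```

In practice, the templates would themselves be learned from labeled recordings rather than written by hand.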
Once individual gestures or features of gestures have been described by symbols, it is
also possible to describe and/or recognize more complex phrases of gestures based on
grammars. This approach is followed in the area of automatic sign-writing (Lenseigne, Gianni, and Dalle 2004; Lu et al. 2010; Ong and Ranganath 2005).

5. Best practices for study design


A common set-up for tracking hand movements consists of a data glove for tracking the
hand postures and a motion tracking system for tracking the absolute position and orien-
tation of the hands. The motion tracking system may also be used to track other parts of
the human body, such as the head, the shoulders, the elbows, etc. Although the researcher
may only be interested in certain movements of the hands, some knowledge about the
posture of the body can be very informative, especially if the gestures will be visualized
later on. It is often difficult to evaluate gestures if their relation to the body is not known.
For example, without knowledge of the position of the upper body and the head, it is dif-
ficult to tell whether a gesture is targeted at the speaker or the hearer, whether it is
stressed or not, and how much the speaker attends to the gesture him/herself.
In my own studies on pointing gestures, it turned out that the direction of eye-gaze of
the speaker was a determining factor for the direction of the pointing gesture (Pfeiffer
2011). This meant that it would have made sense to include an eye-tracking system in
the experimental set-up, which unfortunately I had not considered beforehand. Using
the position and orientation of the head, which had been recorded, I was at least
able to use heuristics to estimate the current position of the eyes and a general viewing
direction. This is another example where recording more data points than previously
considered necessary turned out to be helpful later.
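The kind of heuristic described here can be sketched as transforming an assumed eye offset, given in the head's local coordinate system, into world coordinates; the offset and the example pose are invented values, not the ones used in the study.

```python
# Hypothetical sketch of the heuristic described above: estimating a rough
# eye position from the recorded head position and orientation. The local
# eye offset is an assumed value.
import numpy as np

def estimate_eye_position(head_pos, head_rot, eye_offset=(0.0, 0.035, 0.09)):
    """head_pos: (3,) world position; head_rot: (3, 3) rotation matrix.
    Returns the eye offset transformed into world coordinates."""
    return np.asarray(head_pos) + np.asarray(head_rot) @ np.asarray(eye_offset)

head_pos = np.array([0.2, 1.6, 0.5])  # metres, world coordinates
head_rot = np.eye(3)                  # head facing straight ahead
print(estimate_eye_position(head_pos, head_rot))
```

The estimated eye position, together with the head's orientation, can then serve as the origin of a general gaze ray.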
If exact knowledge of the relative hand postures is relevant or if very fast movements
are to be recorded, data gloves using technologies to measure the bending of joints, sim-
ilar to the CyberGlove, should be selected. Such systems have high sampling rates and
can identify movements regardless of visibility issues. It is also possible to reconstruct
the positions of the fingertips by integrating the information about the angles of the
individual joints of a finger with the position and orientation of the hand, as provided
by the motion tracking system. Then, however, small errors in the measurements will
add up and the final result will suffer from these little inaccuracies. If an exact knowl-
edge of the position and orientation of certain fingers is required, systems that provide
this information directly, such as the Fingertracking system by Advanced Realtime
Tracking GmbH, will be the better option.
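The reconstruction by integration described above can be sketched, under a strongly simplified planar finger model, as accumulating the joint angles along the phalanx chain and then applying the hand pose from the motion tracking system. All lengths and angles are invented values; as noted, small errors in each measured angle propagate into the final position.

```python
# Minimal sketch (assumed planar finger model): reconstruct a fingertip from
# joint flexion angles and phalanx lengths, then transform by the hand pose.
import numpy as np

def fingertip_local(angles, lengths):
    """Accumulate joint angles along the finger and sum up the phalanxes."""
    x = y = total = 0.0
    for angle, length in zip(angles, lengths):
        total += angle
        x += length * np.cos(total)
        y += length * np.sin(total)
    return np.array([x, y, 0.0])

def fingertip_world(angles, lengths, hand_pos, hand_rot):
    """Combine the local fingertip position with the hand's absolute pose."""
    return np.asarray(hand_pos) + np.asarray(hand_rot) @ fingertip_local(angles, lengths)

angles = [0.3, 0.4, 0.2]         # flexion per joint, radians (invented)
lengths = [0.045, 0.025, 0.020]  # phalanx lengths, metres (invented)
# With the hand at the origin, the fingertip lands at roughly (0.0745, 0.0451, 0).
print(fingertip_world(angles, lengths, np.zeros(3), np.eye(3)))
```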
All data glove systems will at first feel a little awkward for the subjects of a study. The
design of the experiment should thus include enough time for the participants to get used
to the devices. This could, for example, be realized in the form of a small game, where the
participant has to do some manual tasks, such as catching a ball or rolling it on a table.
The goal is that the participants use their hands as naturally as possible and no longer
think about the glove. Sometimes it is also useful to reconsider alternatives. In a study
on manual pointing (Kranstedt et al. 2006: 158), we started in our pretests with a full-
blown data glove as a recording device. During the tests, however, it turned out that
some participants were afraid of moving their fingers when wearing the gloves. They
moved into a pointing-hand posture with extended index finger in the very beginning
of the study and held that posture throughout the study. We coined this behavior “tool-
like” use of the hand – but it was not the natural pointing behavior we were interested
in observing. After reconsidering the requirements, we developed a new home-made
tracking glove by combining soft golf gloves with markers for the optical tracking system
to determine only the position and orientation of the index finger. As these custom gloves
looked like ordinary gloves and had no strings attached, the participants felt more
comfortable wearing them and we did not observe any further oddities during the study.
Other aspects that need to be considered are, for example, the different hand sizes of
the participants. Most data gloves are configurable in this respect, but none support the
large variations that can be found in the population. This can especially be an issue when
tracking children and adults or Asians and Europeans in the same study. In most cases
you would also try to avoid mixing data gloves from different companies in the same
study, so a device or a device series has to be found that supports all relevant hand sizes.
Special care also has to be given to the length of the interaction periods to be re-
corded with the devices. As with all tracking systems attached to the human body,
the devices’ relative position to the body may drift slightly over time. This is particularly
true for the hands, which naturally move very often and fast and – at the same time – frequently touch or hit other objects or body
parts. Also, if a data glove is used together with a motion tracking system, the relative
position of the two tracking systems can change over time, for example when the
marker of an optical tracking system is not fixed to the glove itself. If the glove is closed,
sweat might also be an issue if the tasks are very long or require much effort. Depend-
ing on the technology used, some data gloves might be more affected by this than
others. It is thus important to include short phases for recalibration or error estimation
in the design, to ensure maximum accuracy.

6. Conclusion
Data gloves provide a great opportunity to capture dynamic hand gestures. They do not
suffer from problems with perspective or occlusion, and offer machine-readable data.
The provided data also has a higher precision than what can typically be obtained by
annotating hand postures based on video recordings.
A few drawbacks and caveats should, however, be borne in mind. There is little support on the software side for analyzing the recorded data. Exploiting the full potential of data gloves for gesture analysis thus requires some expertise. Finally, wearing data gloves may make participants of experiments feel uncomfortable and lead them to behave differently than they would without the gloves.

7. References
ART GmbH 2011. Homepage of the Advanced Realtime Tracking GmbH, http://www.ar-tracking.
de, last access November 2011.
Barbič, Jernej, Alla Safonova, Jia-Yu Pan, Christos Faloutsos, Jessica K. Hodgins and Nancy S.
Pollard 2004. Segmenting motion capture data into distinct behaviors. In: Wolfgang Heidrich
and Ravin Balakrishnan (eds.), Proceedings of Graphics Interface 2004, 185–194. Waterloo, Ontario: Canadian Human-Computer Communications Society.
Bryson, Steve 1992. Virtual environments in scientific visualization. In: Compcon Spring’92. Thirty-
Seventh IEEE Computer Society International Conference, Digest of Papers, 460–461. IEEE.
Bouissac, Paul this volume. Prehistoric art: Hands in cave paintings and rock art. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Sedinha Teßendorf (eds.), Body – Language – Communication: An International Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics and Communication Science 38.1.) Berlin: De Gruyter Mouton.
Burns, Anne-Marie and Barbara Mazzarino 2006. Finger tracking methods using EyesWeb. In: Sylvie Gibet, Nicolas Courty and Jean-Francois Kamp (eds.), Gesture in Human-Computer Interaction and Simulation: 6th International Gesture Workshop, GW 2005, 156–167. (Lecture Notes in Artificial Intelligence 3881.) Berlin/Heidelberg: Springer.
Duncan, Susan this volume. Multimodal annotation tools. In: Cornelia Müller, Alan Cienki, Ellen
Fricke, Silva H. Ladewig, David McNeill and Sedinha Teßendorf (eds.), Body – Language –
Communication: An International Handbook on Multimodality in Human Interaction. (Hand-
books of Linguistics and Communication Science 38.1.) Berlin: De Gruyter Mouton.
Eglowstein, Howard 1990. Reach out and touch your data: Three input devices, ranging from US
$100 to US$15,000, let you “hand it to computers”. BYTE 15(7): 283–290.
ElKoura, George and Karan Singh 2003. Handrix: Animating the human hand. In: Rick Parent,
Karan Singh, David Breen and Ming C. Lin (eds.), Proceedings of the 2003 ACM SIG-
GRAPH/Eurographics Symposium on Computer Animation, 110–119. Aire-la-Ville, Switzer-
land: Eurographics Association.
Fisher, Scott S. 1990. Virtual interface environments. In: Brenda Laurel (ed.), The Art of Human-
Computer Interface Design. Reading MA: Addison-Wesley.
Hanke, Thomas 2004. HamNoSys – Representing sign language data in language resources and language processing contexts. In: Oliver Streiter and Chiara Vettori (eds.), Proceedings of the International Conference on Language Resources and Evaluation (LREC) 2004, 1–6. Paris: ELRA.
Hanke, Thomas, Jakob Storz and Sven Wagner 2010. iLex: Handling multi-camera recordings. In:
Philippe Dreuw (ed.), 7th International Conference on Language Resources and Evaluation.
Workshop Proceedings. W13. 4th Workshop on Representation and Processing of Sign Lan-
guages: Corpora and Sign Language Technologies, 110–111. Paris: ELRA.
Heloir, Alexis, Michael Neff and Michael Kipp 2010. Exploiting motion capture for virtual human
animation: Data collection and annotation visualization. In: Martin Kipp, Patrizia Paggio and
Dirk Heylen (eds.), Proceedings of the Workshop on Multimodal Corpora: Advances in Cap-
turing, Coding and Analyzing Multimodality. Online Ressource: http://embots.dfki.de/doc/
MMC2010-Proceedings.pdf
Kipp, Michael 2001. Anvil – a generic annotation tool for multimodal dialogue. In: Paul Dalsgaard, Børge Lindberg, Henrik Benner and Zheng-Hua Tan (eds.), Seventh European Conference on Speech Communication and Technology, 1367–1370. ISCA.
Kipp, Michael 2012. Multimedia annotation, querying and analysis in ANVIL. In: Mark T. Maybury (ed.), Multimedia Information, 351–367. Hoboken, NJ: John Wiley & Sons.
Kita, Sotaro, Ingeborg van Gijn and Harry van der Hulst 1998. Movement phases in signs and co-
speech gestures, and their transcription by human coders. In: Ipke Wachsmuth and Martin
Fröhlich (eds.), Proceedings of Bielefeld Gesture Workshop. 23–36. Springer Verlag.
Kranstedt, Alfred, Andy Lücking, Thies Pfeiffer, Hannes Rieser and Ipke Wachsmuth 2006. Deic-
tic object reference in task-oriented dialogue. In: Gert Rickheit and Ipke Wachsmuth (eds.),
Situated Communication, 155–207. Berlin: De Gruyter Mouton.
Lenseigne, Boris, Frédérick Gianni and Patrice Dalle 2004. A new gesture representation for
sign language analysis. In: Oliver Streiter and Chiara Vettori (eds.), Workshop on Representa-
tion and Processing of Sign Language, 4th International Conference on Language Resources
and Evaluation (LREC 2004), 85–90. Paris: ELRA.
Lu, Gan, Lik-Kwan Shark, Geoff Hall and Ulrike Zeshan 2010. Hand motion recognition and vi-
sualisation for direct sign writing. In: Ebad Banissi, Stefan Bertschi, Remo Burkhard, John
Counsell, Mohammad Dastbaz, Martin Eppler, Camilla Forsell, Georges Grinstein, Jimmy Jo-
hansson, Mikael Jern, Farzad Khosrowshahi, Francis T. Marchese, Carsten Maple, Richard
Laing, Urska Cvek, Marjan Trutschl, Muhammad Sarfraz, Liz Stuart, Anna Ursyn, and Theo-
dor G Wyeld (eds.), 14th International Conference for Information Visualization (IV), 2010:
467–472. IEEE Computer Society, Los Alamitos, CA.
Nguyen, Quan and Michael Kipp 2010. Annotation of human gesture using 3D skeleton controls.
In: Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani,
Jan Odijk, Stelios Piperidis, Mike Rosner and Daniel Tapias (eds.), Proceedings of 7th Interna-
tional Conference on Language Resources and Evaluation (LREC), 3037–3041. ELDA. Euro-
pean Language Resources Association (ELRA).
Ong, Sylvie C.W. and Surendra Ranganath 2005. Automatic sign language analysis: A survey and
the future beyond lexical meaning. IEEE Transactions on Pattern Analysis and Machine Intel-
ligence 27(6): 873–891.
Pfeiffer, Thies 2011. Understanding Multimodal Deixis with Gaze and Gesture in Conversational
Interfaces. Aachen, Germany: Shaker.
Pfeiffer, Thies this volume. Documentation of gestures with motion capture. In: Cornelia Müller, Alan
Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Sedinha Teßendorf (eds.), Body –
Language – Communication: An International Handbook on Multimodality in Human Interac-
tion. (Handbooks of Linguistics and Communication Science 38.1.) Berlin: De Gruyter Mouton.
Rheingold, Howard 1992. Virtuelle Welten – Reisen im Cyberspace. Hamburg: Rowohlt.
Shaw, Chris, Jiandong Liang, Mark Green and Yunqi Sun 1992. The decoupled simulation model
for virtual reality systems. In: Penny Bauersfeld, John Bennet and Gene Lynch (eds.), Proceed-
ings of the SIGCHI Conference on Human Factors in Computing Systems, 321–328. New York:
Association for Computing Machinery.
Sutton, Valerie 1981. SignWriting for Everyday Use. Newport Beach, CA: The Sutton Movement Writing Press.
Zimmermann, Thomas G., Jaron Lanier, Chuck Blanchard, Steve Bryson and Young Harvill 1986. A hand gesture interface device. ACM SIGCHI Bulletin (May 1986): 189–192.
Wang, Robert Y. and Jovan Popović 2009. Real-time hand-tracking with a color glove. ACM Transactions on Graphics 28(3): 63:1–63:8.
Wittenburg, Peter, Hennie Brugman, Albert Russel, Alex Klassmann and Han Sloetjes 2006. ELAN: A professional framework for multimodality research. In: Proceedings of LREC 2006, Fifth International Conference on Language Resources and Evaluation, 1556–1559. ELRA. Online: http://www.lrec-conf.org/proceedings/lrec2006/pdf/153_pdf.pdf.

Thies Pfeiffer, Bielefeld (Germany)

55. Reliability and validity of coding systems for bodily forms of communication
1. Introduction
2. Reliability, validity and their basic concepts: Lexical clarification
3. Reliability of coding systems
4. Brief hints on validity of coding systems
5. References

Abstract
This contribution aims to provide an up-to-date picture of the state of the art on reliability
and validity in the field of quantitative observation of verbal and bodily processes of com-
munication, where the main source of error is the human observer who codes the
interactants’ behaviour by means of coding schemes. After a lexical clarification that allows readers to orient themselves more confidently in this field, the contribution a) introduces the basic concepts of intra-, inter- and observer reliability; b) reviews the summary and point-by-point agreement coefficients, among which the most widely used is Cohen’s K; c) critically reviews the K in the light of more recent advances; and d) provides brief hints on the validity of coding systems. Research examples, applications, and indications for the correct use of a given coefficient accompany the contribution.

1. Introduction
Coding systems are typically used to classify. In social sciences, classification is particu-
larly used in behavioral observation (Bakeman 2000; Bakeman and Gottman 1997) or
content analysis (Bartholomew, Henderson, and Marcia 2000; Berelson 1954; Mehl
2005; Smith 2000). Given the emphasis of this volume on interaction, communication,
language and speech, focus is on behavioral observation. Behavioral coding systems
may apply to gestures, language and other communication constructs, especially when
they regard the development of the interaction in time, i.e., its sequential organization.
Initially, the ability of observational coding systems to grasp non-individual interactional processes and nonverbal aspects of behavior was recognized in the study of animals (e.g., Altmann 1965) or of infants, “nonverbal” humans (Tronick et al. 1978). Systematic observation is rooted in Stevens’ (1946) theory of measurement. It is a public and replicable activity whereby one or more trained observers associate predefined behavioral categories with interactional events (sequences of live or recorded behavior, sometimes transcribed) according to rules fixed in the coding manual (Bakeman 2000).
This definition constrains the type of categories involved in coding systems: they can only be nominal or ordinal. If a measurement is done with quantitative variables, it is called rating, which by definition is opposed to coding. Coding systems are therefore sets of predefined behavioral categories representing conceptually meaningful, almost always theory-based distinctions that the investigator uses to answer research questions (Bakeman and Gnisci 2005; Gnisci, Bakeman, and Maricchiolo this volume). Developing and implementing a new category system is a time- and resource-consuming challenge. Alternatively, a system can be adopted from the literature or adapted to one’s own needs. A coding system is informed by a combination of theory, previous research findings, analysis of pilot data, reflections on the preliminary efforts of coding, and observation of the material to be analyzed, as well as individual insight (Bakeman and Gottman 1997; Bartholomew, Henderson, and Marcia 2000). In this phase categories are not a priori but empirical, that is, they emerge from the material to be analyzed (Smith 2000). Often, they consist of a single set of mutually exclusive and exhaustive codes.
In Fig. 55.1 a coding system for hand gestures, hierarchically organized, is shown.
Usually two or more independent coders are compared to identify possible problems with the category system or with the coders themselves. Idiosyncrasies among coders are not the subject of the research; thus, to escape individual views of the observed data, a comparison among observers is required. This is done after each coder has been trained, conceptually and practically, in the categories used and in the studied phenomenon and context. Agreement among coders is then checked empirically. To make a long story short, the training phase is crucial and requires many precautions (see Bakeman and Gottman 1986, 1997).
Fig. 55.1: Example of coding system for hand gestures (from Maricchiolo, Gnisci, and Bonaiuto 2012). The hierarchically organized scheme first divides hand gestures into Speech Linked Gestures (SLG) and Speech Non-linked Gestures (SNG); SLG are subdivided into Conversational (Cohesive, Rhythmic) and Ideational (Emblem, Illustrator: iconic, metaphoric, deictic) gestures, SNG into Self-adaptors and Hetero-adaptors (Person- and Object-adaptors).

2. Reliability, validity and their basic concepts: Lexical clarification


Behavioral observation was once considered bias-free and inherently valid (Bryington, Palmer, and Watkins 2004; Watkins and Pacheco 2000), but this view has not lasted. Observers may in good faith believe in the correctness of their coding, but when they are compared, differences among them appear: coders are regarded as the main source of error in observational research.
When talking about data quality of coding systems, researchers are confused by the
terms used: agreement, fidelity, reliability, calibration, precision, accuracy, validity,
random and systematic error, consistency, stability; and by the different statistical in-
dexes (percentage of agreement, Cohen’s K, etc.). In 1987, Cone argued for a clarifica-
tion of the lexicon in behavioral assessment and Suen (1988) clarified the matter.
Researchers now ask how to calculate agreement for their data, which index is best, and so on. This contribution aims at answering these questions, locating its perspective in the traditional psychometric approach (rather than the so-called representational one, for which see, e.g., John and Benet-Martínez 2000).
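As a concrete illustration of two of the indexes just listed, the observed percentage of agreement and Cohen's K can be computed from two coders' category sequences as follows; the ten paired codes are invented examples.

```python
# Illustrative sketch: percentage of agreement and Cohen's kappa for two
# coders classifying the same ten events; the labels are invented.
from collections import Counter

coder_1 = ["A", "A", "B", "B", "A", "B", "A", "A", "B", "A"]
coder_2 = ["A", "B", "B", "B", "A", "B", "A", "A", "A", "A"]

n = len(coder_1)
observed = sum(a == b for a, b in zip(coder_1, coder_2)) / n

# Chance agreement from the two coders' marginal code frequencies.
f1, f2 = Counter(coder_1), Counter(coder_2)
expected = sum(f1[c] * f2[c] for c in f1) / n ** 2

kappa = (observed - expected) / (1 - expected)
print(round(observed, 2), round(kappa, 2))  # -> 0.8 0.58
```

Kappa corrects the raw 80% agreement for the agreement the two coders would reach by chance alone, which is why it comes out lower than the observed percentage.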
More than in other ways of collecting data, coding data are the product of an
observer, a judge, a coder, etc. Therefore, even if there are many other causes of errors,
such as situational differences, temporal instabilities, sampling errors (Suen 1988), the
main source of error is the observer, that is the human being used for coding.
According to the psychometric tradition, two kinds of error can occur in coding: random and systematic. The first arise for many practical reasons (fatigue, readiness, etc.), as when a tired observer tends to note only the more frequently used behavioral categories, or an alert one tends to note the less frequently used codes more often: if the coding were repeated an infinite number of times, these kinds of error would compensate for each other. The second occurs when, for whatever reason, an observer systematically chooses a determined code rather than another.
Applying psychometrics to this specific field, the validity of a coding system is defined as the degree to which the coding system is able to measure what it purports to measure, and its reliability as the degree of agreement among independent codings of the same
interaction. While reliability refers to the internal consistency of a coding system (that is, whether different measures of the same construct go together), validity refers to the fact that our coding system really reflects a process that is being realized “there,” in the “world.” If two witnesses give the same evidence, they agree (reliability), but this does not necessarily mean that they tell the truth with respect to what really happened (validity).
Usually, reliability and validity are understood by referring to some basic concepts,
such as precision, stability, and accuracy. Although precision is not a consensual criterion (see Suen 1988: note 1), it is used here as the degree of coherence with which the observer associates events or objects with determined categories. If, within the same coding session, a coder systematically assigns a behavior,
such as a smile, to a determined category, say A (A represents the positive behaviors),
then this coding activity is precise. Note that a coder may be precise but not accurate, when, for example, s/he always assigns a behavior such as a smile to a category such as B, where B represents negative behaviors: such an observer is systematically biased. Stability over time (or retest reliability) refers to the degree of correlation among different codings of the same interaction in different periods of time. It applies across sessions
of coding occurring at different times (e.g., after 6 months) and guarantees that an ob-
server’s coding does not decay (Bakeman and Gottman 1997). Precision refers to the
coherence in the coding within the same session, stability to the coherence between ses-
sions. Both precision and stability refer to reliability because they allow, although dif-
ferently, for the checking of the internal consistency of the coding activity: Are
different collections of the same thing, as grasped by two coders observing the same
interaction, consistent? Thus, reliability is a multidimensional concept.
Finally, accuracy refers to the degree of correspondence between the categories used
and the “reality,” that is, how much what the observer coded corresponds to or reflects
the behaviors or the processes that occurred in the interaction. It has much to do with
validity, even if it may also be involved in reliability. If, reliably coding a marital interaction, a coding system detects strong aggression by a husband toward his wife, when actually no aggressive behavior or process is happening between husband and wife, then the category system is simply wrong: it is not accurate.
Another term is often used: calibration among observers, i.e., the degree of agreement reached by two (or more) coders with each other. This matters because the data collected should not vary as a function of the observer (Bakeman and Gottman 1997). One reason to calibrate observers is essentially practical: if calibrated, two observers provide the same or at least similar coding when observing the same interaction, and thus may be used interchangeably. The psychometric literature offers little clarity about the role of calibration in relation to reliability and/or validity. However, when two coders are calibrated, their coding is of course reliable, but not necessarily valid: two observers whose coding agrees could share a deviant worldview and, for whatever reason (e.g., an error in training), both may be convinced that certain observed behaviors fall into certain categories, yet both are wrong.

3. Reliability of coding systems


There is no definitive answer about the difference between agreement and reliability.
Following Bakeman and Gottman (1997), agreement describes the extent to which
two observers agree with each other. Thus it coincides simply with an observed agree-
ment that may or may not be substantial, or “true,” and, in general, it does not prevent
many potential sources of error. Instead, reliability, invoking a rich and complex psychometric tradition, eventually addresses all the possible sources of error, and it is not constrained to the case of two or more observers, thus appearing to be a more general term. However, many scholars agree on the fact that agreement can be regarded as
a necessary but not sufficient condition for reliability (Bakeman and Gottman 1997).
When lacking overt agreement among observers, no reliability can be established;
yet, if agreement is found, this does not necessarily translate into reliability.
Reliability is commonly defined as data consistency and refers to the degree to which
data are free from measurement error: the less error, the more consistent the data
(Suen 1988). With reference to the distinction among precision, stability and accuracy,
and considering the observer as a source of data, there are at least three types of
reliability (Berk 1979; Martin and Bateson 1986; Bakeman and Gottman 1997): an
observer can be reliable with respect to him/herself (intra-observer reliability), to
another observer (inter-observer reliability), or to an ideal observer, i.e., a standard
protocol or master that is assumed to have coded perfectly (observer reliability). Each
type of reliability is defined below, together with appropriate reliability coefficients
for coding schemes in different situations.

3.1. Intra-observer reliability


Intra-observer reliability is the extent to which the same observer, observing the
same stretch of interaction under the same conditions, produces similar coding results,
realizing within-observer consistency. Because this reliability implies that the same
observer repeatedly views the same video recording, its evaluation is problematic
due to fatigue, boredom, etc., or facilitation (Suen 1988: note 6). For this reliability,
two or more observers are sometimes involved as if they were parallel forms of a single observer.

3.2. Inter-observer reliability


Inter-observer reliability is the extent to which two independent observers, who observe
the same interaction, produce similar results, realizing between-observer consistency
(Watkins and Pacheco 2000). It can be regarded as the extent to which those particular
observers are interchangeable, and it indicates the extent to which the data are free
from random and systematic observer error.

3.3. Observer reliability


Let us assume that investigators prepare and preserve a presumed true coding flow,
labeled a standard protocol, which represents the coding product of an ideal, infallible
observer (Bakeman and Gottman 1997) or of a “master” coder (Suen 1988), and that
they contrast the observer with this standard protocol. This serves three aims: (a) to
check whether the coder is actually performing well; (b) to calibrate different observers;
and (c) to obtain coding that reflects the intended content (bringing this form of
reliability close to construct- or criterion-oriented validity). Therefore, observer
reliability is the extent to which an observer agrees with an assumed true standard
protocol. This should eliminate errors of any kind (provided, of course, that the
standard protocol is correct).
884 V. Methods

A simple way to calculate observer reliability is to place the standard protocol, for
example, in the columns of the agreement matrix (see below) in place of observer O2.
If a good index of reliability is obtained, it tells us not that the observers are calibrated
but that the observer’s coding is reasonably accurate.

3.4. Coefficients for observer reliability


For brevity, when discussing the coefficients for reliability we will refer only to the
case of two observers (O1 and O2) – inter-observer reliability – because it is the most
frequent case. However, if instead of the second observer we posit a second coding by
the same observer at a different moment (O1 at t2), or a standard protocol, what we
attribute to inter-observer agreement can easily be applied to intra-observer
reliability or to observer accuracy.
The first basic distinction is between summary or global agreement coefficients and
point-by-point coefficients.

3.4.1. Summary or global agreement coefficients


In general, summary or global coefficients grasp the common overall trend of the
two observers, that is, how many times they identify given behaviors, with no regard
to whether a behavior was identified at the same moment of interaction (that is the
scope of the point-by-point coefficients). Three kinds of coefficients can be applied:
Pearson’s correlation (r), the relative intraclass correlation (ICC_REL) and the
absolute intraclass correlation coefficient (ICC_ABS).
The starting point is a matrix with the different sessions of observation in rows, the
two observers in columns, and the frequencies of a given observed behavior (e.g., A)
in the cells. Pearson’s r between observers ranges from −1 to 1, with 0 indicating no
agreement. A high positive index means that in the sessions in which the events
observed by one observer increase, the events observed by the other observer increase
as well. However, this index is relative: it does not tell us whether the observers
identify the same number of events, only whether they order the sessions in the same
way with respect to the occurrence of the events. Moreover, it is not expressed as the
classical ratio between true and total variance. The intraclass correlation coefficients
(ICC), by contrast, are expressed precisely as the ratio of true to total variance. The
relative ones (ICC_REL) simply order the sessions, just as r does, and therefore express
the internal consistency between the two observers. The absolute ones (ICC_ABS)
indicate whether the observers record exactly the same number of behaviors in the
same sessions. Thus, absolute agreement implies relative agreement.
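The distinction between relative and absolute intraclass correlation can be made concrete. The following minimal sketch (the function name and data are hypothetical, not from this chapter) computes both single-measure coefficients from a sessions-by-observers frequency matrix, using the standard two-way ANOVA mean squares:

```python
# Minimal sketch (hypothetical function name and data): relative (consistency)
# and absolute single-measure intraclass correlations for a matrix with one
# row per session and one column of frequencies per observer.

def summary_coefficients(x):
    """x: list of rows, each row holding the counts recorded by each observer."""
    n, k = len(x), len(x[0])
    grand = sum(sum(row) for row in x) / (n * k)
    row_means = [sum(row) / k for row in x]
    col_means = [sum(x[i][j] for i in range(n)) / n for j in range(k)]
    # Two-way ANOVA mean squares
    msr = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)   # sessions (true variance)
    msc = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)   # observers (bias)
    sse = sum((x[i][j] - row_means[i] - col_means[j] + grand) ** 2
              for i in range(n) for j in range(k))
    mse = sse / ((n - 1) * (k - 1))                                # random error
    icc_rel = (msr - mse) / (msr + (k - 1) * mse)                  # consistency
    icc_abs = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    return icc_rel, icc_abs
```

For example, with the sessions [[1, 2], [2, 3], [3, 4]], where the second observer always records one event more than the first, the relative ICC is 1 while the absolute ICC is only about .67: the observers order the sessions identically but do not count the same number of events.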
The intraclass correlation coefficients have another advantage. They are the indexes
of generalizability theory, an integrated approach that allows one (through an
analysis-of-variance procedure) to disentangle different sources of error (Cronbach
et al. 1972): variance due to the observed subjects, which represents true variance;
variance due to the observers, which indicates systematic error or bias; and random
error variance, which represents intra-observer random error (Suen 1988). The various
intraclass correlation coefficients that can be derived from these sources allow one to
conclude whether the adopted coding system does “the work it is meant to do,” that is,
whether it discriminates among relevant aspects of the study (e.g., between the observed
subjects) and not among irrelevant facets (e.g., the coders). This approach
seems to enlarge the concept of reliability so as to soften the boundaries between
reliability and validity (for specific applications and calculations, see Bakeman and
Gottman 1986; Berk 1979; Brennan 1983; Hartmann 1982; Suen 1988; for a procedure
applicable to sequential data see Bakeman and Gottman 1997: 78).
In any case, given that all intraclass correlation coefficients are computed from a
matrix like the one just described, they can never grasp the agreement of the observers
point by point, that is, for each moment, event or interval into which the interaction is segmented.

3.4.2. Point-by-point agreement coefficients: Cohen’s K


When researchers want to establish a stricter reliability among observers, they focus
on the agreement or disagreement of the coding of each sequential moment, event or
interval (Gnisci, Bakeman, and Maricchiolo this volume). In this case, they start
from a very particular matrix – the K table – and are able to calculate the so-called
point-by-point agreement coefficients.
The percentage of agreement (or the proportion), though far from perfect, is the most
widely used index of inter-observer reliability: the number of agreements in coding
between the observers divided by the sum of agreements and disagreements, multiplied
by 100 (for calculations see Fig. 55.2). Occurrence/nonoccurrence percent agreement is
similar but calculated only on the occurrence or nonoccurrence of a single category.
The percentage of agreement is inflated because it does not correct for so-called
chance agreement (the observers may sometimes agree even if they assign categories
at random rather than actually coding) and by a high frequency of the observed
behavior (Bryington, Palmer, and Watkins 2004; Towstopiat 1984); the occurrence/
nonoccurrence agreement index limits but does not eliminate errors due to chance agreement.
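As a minimal illustration (function names and data are hypothetical), the two indexes just described can be computed from two observers’ interval-by-interval code sequences:

```python
# Sketch with hypothetical function names and data: percentage of agreement
# and occurrence/nonoccurrence percent agreement for a single category.

def percent_agreement(codes1, codes2):
    """Agreements divided by (agreements + disagreements), times 100."""
    agreements = sum(a == b for a, b in zip(codes1, codes2))
    return 100 * agreements / len(codes1)

def occ_nonocc_agreement(codes1, codes2, category):
    """Percent agreement on the occurrence/nonoccurrence of one category."""
    return percent_agreement([c == category for c in codes1],
                             [c == category for c in codes2])

o1 = ["adaptor", "illustrator", "adaptor", "emblem", "illustrator"]
o2 = ["adaptor", "illustrator", "conversational", "emblem", "illustrator"]
overall = percent_agreement(o1, o2)                    # 80.0
single = occ_nonocc_agreement(o1, o2, "illustrator")   # 100.0
```

Note that neither index corrects for chance agreement, which is exactly the limitation discussed above.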
Let us assume that gestures are events occurring during interaction and that the
observers who code them (O1 and O2) use a basic coding scheme consisting of four
categories: adaptor, conversational, emblematic and illustrator gestures (Ekman and
Friesen 1969; McNeill 1992). A cross-table can then be constructed with the four
gesture types in the rows (O1) and columns (O2). Fig. 55.2 reports an example of such
a table, called a confusion or agreement matrix (from Pedon and Gnisci 2004) or
K matrix (Bakeman and Quera 2011). The upper-left cell shows that 42 times the two
coders observed the same gesture and coded it as an adaptor (agreement); the cell
below it shows that 3 times O1 coded a gesture as conversational whereas O2 coded it
as an adaptor (disagreement). Consequently, the agreement frequencies lie on the main
diagonal and the disagreement frequencies off it. This matrix is the basis for calculating
agreement indexes, and also for identifying possible sources of miscalibration between the two observers.
Cohen’s K (1960) has often been considered the best choice as a reliability index
because it corrects for chance agreement (Bakeman and Gottman 1997; Watkins and
Pacheco 2000; Bryington, Palmer, and Watkins 2004; for an easy program for calculating
kappa, Robinson and Bakeman 1998; for a sequential program, Bakeman and
Quera 1995). For calculations refer to Fig. 55.2.
Its advantages are many: it can be tested for significance and adequately interpreted;
it can be extended to situations with more than two observers; it can be calculated for
the coding unit (see below), for the complete coding system and for each single category
in order to identify specific sources of unreliability; and, finally, a version for ordinal
variables exists (Bakeman and Gottman 1997).
886 V. Methods

                                          O2
                    Adaptors  Conversational  Emblematic  Illustrators  Tot.

      Adaptors            42               0           0             1    43
      Conversational       3              21           2             0    26
O1
      Emblematic           0               4          18             1    23
      Illustrators         0               0           0            31    31

      Tot.                45              25          20            33   123

Observed agreement (A_O) = Σ(i=j) x_ij / N = 112/123 = .91*

Observed disagreement (D) = Σ(i≠j) x_ij / N = 11/123 = .09

Chance agreement (A_C) = Σ_i (x_+i · x_i+) / N² = (45×43 + 26×25 + 23×20 + 31×33) / 123² = 4068/15129 = .27

True agreement (A_T) = A_O − A_C = .91 − .27 = .64

K = (A_O − A_C) / (A_O − A_C + D) = A_T / (1 − A_C) = .64 / (1 − .27) = .88

* If multiplied by 100, the observed agreement is the percentage of agreement.

Fig. 55.2: A hypothetical confusion matrix with four codes (for hand gestures) and two observers
(O1 and O2), and the calculations for the percentage of agreement and for Cohen’s K.
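The computations of Fig. 55.2 can be reproduced in a few lines. The sketch below (the function name is hypothetical) derives observed agreement and Cohen’s K from any square confusion matrix with O1 in the rows and O2 in the columns:

```python
# Sketch (hypothetical function name): observed agreement and Cohen's kappa
# from a square confusion matrix, checked against the values of Fig. 55.2.

def kappa(matrix):
    """matrix[i][j]: number of events coded i by O1 and j by O2."""
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    a_obs = sum(matrix[i][i] for i in range(k)) / n                    # A_O
    row_tot = [sum(row) for row in matrix]                             # O1 marginals
    col_tot = [sum(matrix[i][j] for i in range(k)) for j in range(k)]  # O2 marginals
    a_chance = sum(r * c for r, c in zip(row_tot, col_tot)) / n ** 2   # A_C
    return a_obs, (a_obs - a_chance) / (1 - a_chance)

fig_55_2 = [[42, 0, 0, 1],
            [3, 21, 2, 0],
            [0, 4, 18, 1],
            [0, 0, 0, 31]]
a_obs, k_val = kappa(fig_55_2)   # a_obs ≈ .91, k_val ≈ .88
```

Placing a standard protocol in the columns instead of O2, as suggested in section 3.3, turns the same computation into an index of observer reliability.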

3.4.3. Evaluating the value of reliability coefficients


Reliability coefficients have to be regarded as proportions (of true to total variance)
and therefore range between 0 and 1 (if K turns out negative, it should still be treated
as 0; Bakeman and Gottman 1997; Suen 1988). Although in the past it was regarded as
varying from 0 to 1, it is now recognized that K ranges between −1 and +1, with positive,
negative and null values indicating, respectively, that the observers agree more than,
less than, or as much as expected by chance alone: if the value is zero, they agree as two
persons who assign codes at random; when it is negative, they systematically disagree;
only when it is positive do they agree beyond the agreement due to chance. In this case,
it does not suffice that K is significant or positive: as with Cronbach’s alpha, we need
a threshold to know when K is adequate. Although less stringent criteria were previously
proposed (Fleiss 1981), Bakeman and Gottman (1986) argue that values less than .70
should be regarded as problematic. In general, the reliability values accepted by
journals for an observational study are less stringent

than those for a self-report scale, probably because agreement among observers is
considered harder evidence to obtain. However, we will see that this (and other) rules
of thumb for establishing good values of Cohen’s K have to be qualified.

3.4.4. The K critically reviewed


Although K has been proposed as the best index of observer agreement for many
years, it has recently been critically reviewed (Bakeman and Quera 2011; Uebersax
1987), in particular for two reasons.

First, many scholars correctly observe that the model of chance underlying K is
simply unrealistic: it does not reflect the real state of affairs when the observers make
decisions during the actual activity of coding (e.g., Uebersax 1987). Thus, K’s correction
for chance is inadequate. What we need is a model of how observers make decisions,
and corrections of the index based on that more realistic model.

Second, we now know that K depends not only on real agreement but also on
several other elements (the number of categories, the marginal distributions of the
codes assigned by the two observers, etc.). This makes its evaluation with respect to the
(positive 0–1) scale relative: a K of .95 in one study can express the same reliability as
a K of .58 in another. Moreover, two studies conducted under different conditions in
which the Ks are both .70 do not necessarily convey the same level of reliability. But
what does actually affect K? Sim and Wright (2005) demonstrated that K depends,
apart from agreement among observers, on:

(i) the systematic bias between observers, that is, the similarity or difference between
the marginal frequencies of the two observers – for example, the last row and the last
column of the table in Fig. 55.2 (when the two differ, even if the two observers
always agree, K cannot reach 1);
(ii) the number of codes in the category system (the more codes, the higher the K);
(iii) the prevalence of the categories, that is, how they are distributed in the universe of
reference (the more equally they are distributed, the higher the value of K).

However, a simulation study shows that when a coding system holds more than 5–7
codes, the increase of K is small and negligible and K no longer depends on prevalence;
moreover, particularly when there are fewer than 5 codes, even low values of K predict
high values of accuracy (Bakeman et al. 1997). This seems to demonstrate what has
often been maintained: that K is a conservative index (Strijbos et al. 2006).
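Point (iii) above is easy to verify numerically. In the hypothetical sketch below, two 2×2 confusion tables share the same observed agreement (.90), but the table in which one code prevails yields a much lower K:

```python
# Hypothetical illustration of the prevalence effect: two 2x2 confusion
# tables with identical observed agreement (.90) but different prevalence.

def kappa_2x2(a, b, c, d):
    """Counts of a 2x2 confusion table: O1 in rows, O2 in columns."""
    n = a + b + c + d
    p_obs = (a + d) / n
    p_chance = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (p_obs - p_chance) / (1 - p_chance)

balanced = kappa_2x2(45, 5, 5, 45)   # codes evenly distributed: K = .80
skewed = kappa_2x2(85, 5, 5, 5)      # one code prevalent: K drops to about .44
```

This is why identical K values obtained under different prevalence conditions cannot be compared at face value.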

3.5. Agreement about codes and agreement about units


In many research fields, the above sections on the reliability of codes would be
sufficient. In research on interaction, communication, and language, however, each
single coding act of the observer actually requires two distinct coding acts, even if the
observers themselves may sometimes not be aware of it. If we are coding gestures,
before asking which category is applicable to an event (is this gesture an illustrator?),
coders have to ask something more basic: Is this event a gesture at all? The question
is not about the nature of the gesture itself, but about its occurrence. Therefore,
observational studies often report two indexes of agreement. The extra index concerns
agreement on gesture occurrence, and this agreement is about the coding unit
(Bakeman and Gottman 1997).
When a study uses time or is based on transcripts of speech, a reliability study on the
unit is simpler, because time allows the stream of interaction to be segmented precisely
and transcripts allow codes to be anchored to them. In other situations, however, and
for some particular coding systems, comparing the ways the observers unitized the
interaction is not easy because no common reference is available. Consider, for
example, turns of talk. It is easy to ascertain that both coders identified a so-called
turn of talk, or that one coder identified a turn and the other did not (and vice versa),
but it is impossible to know when both failed to identify a turn.
Researchers have proposed practical but not perfectly satisfying solutions, such as
providing a percentage of agreement for the unit, considering only the onset and
offset times of events, and using gaps between adjacent words as potential boundaries
(Bakeman and Gottman 1997). Recently, advanced alignment methods based on
simulation-tested algorithms have been successfully proposed (Bakeman,
Quera, and Gnisci 2009; Gnisci, Bakeman, and Quera 2008; Quera, Bakeman, and
Gnisci 2007; Quera 2008).
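As a toy version of the onset-based solution just mentioned (a hypothetical greedy pairing, far simpler than the alignment algorithms cited above), the units identified by two observers can be matched when their onsets fall within a tolerance:

```python
# Hypothetical sketch: greedily pair the event onsets unitized by two
# observers when they fall within a tolerance, then report percent
# agreement on the coding unit.

def unit_agreement(onsets1, onsets2, tolerance=1.0):
    """onsets1, onsets2: sorted lists of event onset times (in seconds)."""
    matched, j = 0, 0
    for t in onsets1:
        while j < len(onsets2) and onsets2[j] < t - tolerance:
            j += 1                     # skip O2 onsets too early to match t
        if j < len(onsets2) and abs(onsets2[j] - t) <= tolerance:
            matched += 1               # an agreed coding unit
            j += 1
    total = len(onsets1) + len(onsets2) - matched   # agreed + disagreed units
    return 100 * matched / total
```

For instance, with O1 onsets [1.0, 5.0, 9.0], O2 onsets [1.2, 9.5] and a one-second tolerance, two units are matched and unit agreement is about 67%; note, as the text observes, that units missed by both observers remain invisible to any such index.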

4. Brief hints on validity of coding systems


Reliability of a coding system is a necessary but not sufficient condition for its validity.
Only once good reliability coefficients are obtained can a validity study be done. As
exemplified in Fig. 55.3, when reliability is high (.95), there is wide room for validity:
ideally, 95% of the measurement variance can be valid (although in the example it is
only 80%); when it is low (.20), the valid variance can be at most 20%, a modest value.
[Two pie charts. Left, reliability = .95: reliable and valid 80%, reliable and not valid
15%, error 5%. Right, reliability = .20: reliable and valid 5%, reliable and not valid
15%, error 80%.]
Fig. 55.3: Two graphic examples of the dependence of the validity of a coding scheme on its
reliability: the first with high reliability, the second with low reliability.

Why waste time checking whether the coding system reflects something real if it is not
even internally coherent, that is, if the observers saw two quite different things?
Therefore, only once reliability is established is validity considered.
It is impossible to report the whole discussion of the validity of coding systems in a
few words. Only its traditional aspects are reported here, with a few additional notes
on aspects basic to coding systems. Content validity is established by demonstrating
that the behaviors represented by the scheme’s categories are a representative sample
of the universe of behavior to be observed. Traditional approaches to validity require
demonstrating that the coding system correlates in sensible ways with different measures
allegedly associated with it in the present (concurrent validity) or in the future
(predictive validity) and with other measures assumed to measure the same construct
(convergent validity), and does not correlate with other measures assumed to measure
different constructs (divergent validity).
Many advances in the field of language and communication could be achieved if
researchers’ efforts were directed toward concurrent and predictive criterion (or
external) validity. This would render coding schemes much more effective and useful.
Bonaiuto, Gnisci, and Maricchiolo (2002), for example, found that conversational
and ideational gestures (SLG in Fig. 55.1) made by people involved in conversation
correlate with sent out turns (.76, p = .001) while adaptor gestures (SNG) do not (.12,
p > .05): this seems to confirm the speech-related nature of conversational and
ideational gestures. Furthermore, particular behaviors may be seen in a different light
(although caution is always welcome) when it is known that, in marital interaction
for example, a particular facial display of contempt or disgust by the husband or the wife
correlates significantly with variables linked to the future duration of the couple
(Gottman 1994). Such evidence seems to strengthen the solidity of coding schemes
and to connect the behavioral categories to evidence collected by other means.
Researchers do not use coding schemes as mere lists of behaviors but because they tap
dimensions of interest, or constructs. A construct is a concept that cannot be observed
directly but only through several behavioral indicators. In research on language
and communication, these constructs often concern the interaction rather than the
individual (e.g., processes such as reciprocity, accommodation, dominance). The
rationale of construct validity is that the construct the coding system is meant to
measure will correlate with the same construct measured by other methods (e.g., a
self-report scale) and will not correlate with unrelated constructs (Campbell and Fiske
1959; Cronbach and Meehl 1955). This comprehensive approach has been formalized
in the multitrait-multimethod approach (MTMM; e.g., Eid 2005; John and
Benet-Martínez 2000; Schmitt 2005).
From the latter half of the last century, the concept of construct validity, originally
intended as one part of validity in general, developed toward an integrated general
conception that includes reliability and other forms of validity, such as criterion
validity. Validity becomes the degree to which the adequacy and appropriateness of
inferences and actions based on the results of applying a coding system are supported by
empirical evidence and theoretical rationale (Messick 1995; Wiggins 1973). This new
conception, sometimes called nomological validity, seeks to check whether the
interpretations of the data collected with the coding schemes are consistent with a
nomological network of other measurements – for example, the same construct measured
with different methods, compared with different constructs and correlated with
appropriate criteria. In other words, this validation comes very close to testing a
complex, theoretically based causal model with many variables related to the construct
itself. Originally this was done through qualitative and tabular summaries of previous
research, then through meta-analyses, and recently through structural equation model
testing (John and Benet-Martínez 2000; for the statistical technique, Corral-Verdugo
2002; Eid, Lischetzke, and Nussbeck 2005; for a critical voice see Borsboom,
Mellenbergh, and van Heerden 2004).

5. References
Altmann, Stuart A. 1965. Sociobiology of rhesus monkeys. II. Stochastics of social communication.
Journal of Theoretical Biology 8: 490–522.
Bakeman, Roger 2000. Behavioral observation and coding. In: Harry T. Reis and Charles M. Judd
(eds.), Handbook of Research Methods in Social and Personality Psychology, 138–159. Cam-
bridge: Cambridge University Press.
Bakeman, Roger and Augusto Gnisci 2005. Sequential observational methods. In: Michael Eid
and Ed Diener (eds.), Handbook of Multimethod Measurement in Psychology, 127–140. Wash-
ington, DC: American Psychological Association.
Bakeman, Roger and John M. Gottman 1986. Observing Interaction. An Introduction to Sequential
Analysis. New York: Cambridge University Press.
Bakeman, Roger and John M. Gottman 1997. Observing Interaction. An Introduction to Sequential
Analysis. 2nd Edition. New York: Cambridge University Press.
Bakeman, Roger and Vicenç Quera 1995. Analyzing Interaction. Sequential Analysis with SDIS
and GSEQ. New York: Cambridge University Press.
Bakeman, Roger and Vicenç Quera 2011. Sequential Analysis and Observational Methods for the
Behavioral Sciences. New York: Cambridge University Press.
Bakeman, Roger, Vicenç Quera and Augusto Gnisci 2009. Observer agreement for timed-event
sequential data: A comparison of time-based and event-based algorithms. Behavior Research
Methods 41: 137–147.
Bakeman, Roger, Vicenç Quera, Duncan McArthur and Byron Robinson 1997. Detecting sequen-
tial patterns and determining their reliability with fallible observers. Psychological Methods 2:
357–370.
Bartholomew, Kim, Antonia J. Z. Henderson and James E. Marcia 2000. Coding semistructured
interviews in social psychological research. In: Harry T. Reis and Charles M. Judd (eds.), Hand-
book of Research Methods in Social and Personality Psychology, 286–312. Cambridge: Cam-
bridge University Press.
Berelson, Bernard 1954. Content analysis. In: Gardner Lindzey (ed.), Handbook of Social Psy-
chology, Volume 1, 488–522. Oxford: Addison-Wesley.
Berk, Ronald A. 1979. Generalizability of Behavioral Observation: A Clarification of Interobserver
Agreement and Interobserver Reliability. Cambridge: Cambridge University Press.
Bonaiuto, Marino, Augusto Gnisci and Fridanna Maricchiolo 2002. Proposta e verifica empirica di
una tassonomia dei gesti nell’interazione di piccolo gruppo. Giornale Italiano di Psicologia 29:
777–807.
Borsboom, Denny, Gideon J. Mellenbergh and Jaap van Heerden 2004. The concept of validity.
Psychological Review 111: 1061–1071.
Brennan, Robert L. 1983. Elements of Generalizability Theory. Iowa City: American College Test-
ing Program.
Bryington, April A., Darcy J. Palmer and Marley W. Watkins 2004. The estimation of interobser-
ver agreement in behavioral assessment. Journal of Early and Intensive Behavior Intervention
1: 115–119.
Campbell, Donald T. and Donald W. Fiske 1959. Convergent and discriminant validation by the
multitrait-multimethod matrix. Psychological Bulletin 56: 81–105.
Cohen, Jacob A. 1960. A coefficient of agreement for nominal scales. Educational and Psycholog-
ical Measurement 20: 37–46.
Cone, John D. 1987. Behavioral assessment: Some things old, some things new, some things bor-
rowed? Behavioral Assessment 9: 1–4.
Corral-Verdugo, Victor 2002. Structural Equation Modeling. In: Robert B. Bechtel and Arza
Churchman (eds.), Handbook of Environmental Psychology, 256–270. New York: Wiley.

Cronbach, Lee J., Goldine C. Gleser, Harinder Nanda and Nageswari Rajaratnan 1972. The
Dependability of Behavioral Measurement: Theory of Generalizability for Scores and Profiles.
New York: Wiley.
Cronbach, Lee J. and Paul E. Meehl 1955. Construct validity in psychological tests. Psychological
Bulletin 52: 281–302.
Eid, Michael 2005. Methodological approaches for analyzing multimethod data. In: Michael Eid
and Ed Diener (eds.), Handbook of Multimethod Measurement in Psychology, 223–230. Wash-
ington, DC: American Psychological Association.
Eid, Michael, Tanja Lischetzke and Fridtjof W. Nussbeck 2005. Structural equation models for
multitrait-multimethod data. In: Michael Eid and Ed Diener (eds.), Handbook of Multimethod
Measurement in Psychology, 283–299. Washington, DC: American Psychological Association.
Ekman, Paul and Wallace V. Friesen 1969. The repertoire of nonverbal behavior. Semiotica 1: 49–98.
Fleiss, Joseph L. 1981. Statistical Methods for Rates and Proportions. New York: Wiley.
Gnisci, Augusto, Roger Bakeman and Fridanna Maricchiolo this volume. Sequential notation and
analysis. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and
Sedinha Teßendorf (eds.), Body – Language – Communication: An International Handbook on
Multimodality in Human Interaction. (Handbooks of Linguistics and Communication Science
38.1.) Berlin: De Gruyter Mouton.
Gnisci, Augusto, Roger Bakeman and Vicenç Quera 2008. Blending qualitative and quantitative anal-
yses in observing interaction. International Journal of Multiple Research Approaches 2: 15–30.
Gottman, John M. 1994. What Predicts Divorce? The Relationship between Marital Processes and
Marital Outcomes. Hillsdale, NJ: Lawrence Erlbaum.
Hartmann, Donald P. 1982. Assessing the dependability of observational data. In: Donald P. Hart-
mann (ed.), New Directions for the Methodology of Behavioral Sciences: Using Observers to
Study Behavior, 51–65. San Francisco: Jossey-Bass.
John, Oliver P. and Veronica Benet-Martínez 2000. Measurement: Reliability, construct validation,
and scale construction. In: Harry T. Reis and Charles M. Judd (eds.), Handbook of Research
Methods in Social and Personality Psychology, 339–369. Cambridge: Cambridge University Press.
Martin, Paul and Patrick Bateson 1986. Measuring Behavior: An Introductory Guide. Cambridge:
Cambridge University Press.
Maricchiolo, Fridanna, Augusto Gnisci and Marino Bonaiuto 2012. Coding hand gestures: A reli-
able taxonomy and a multi-media support. In: Anna Esposito, Antonietta M. Esposito, Ales-
sandro Vinciarelli, Rüdiger Hoffman and Vincent C. Müller (eds.), Cognitive Behavioural
Systems, 405–416. Berlin: Springer.
McNeill, David 1992. Hand and Mind. Chicago: University of Chicago Press.
Mehl, Matthias R. 2005. Quantitative text analysis. In: Michael Eid and Ed Diener (eds.), Hand-
book of Multimethod Measurement in Psychology, 141–156. Washington, DC: American Psy-
chological Association.
Messick, Samuel 1995. Validity of psychological assessment. American Psychologist 50: 741–749.
Pedon, Arrigo and Augusto Gnisci 2004. Metodologia della Ricerca Psicologica. Bologna: il Mulino.
Quera, Vicenç 2008. RAP: A computer program for exploring similarities in behavior sequences
using random projections. Behavior Research Methods 40: 21–32.
Quera, Vicenç, Roger Bakeman and Augusto Gnisci 2007. Observer agreement for event se-
quences: Methods and software for sequence alignment and reliability estimates. Behaviour
Research Methods 39: 39–49.
Robinson, Byron F. and Roger Bakeman 1998. ComKappa: A Windows 95 program for calculating
kappa and related statistics. Behavior Research Methods, Instruments, and Computers 30: 731–732.
Schmitt, Manfred 2005. Conceptual, theoretical, and historical foundations of multimethod assess-
ment. In: Michael Eid and Ed Diener (eds.), Handbook of Multimethod Measurement in Psy-
chology, 9–25. Washington, DC: American Psychological Association.
Sim, Julius and Chris Wright 2005. The kappa statistic in reliability studies: Use, interpretation,
and sample size requirements. Physical Therapy 85: 257–268.

Smith, Charles P. 2000. Content analysis and narrative analysis. In: Harry T. Reis and Charles M.
Judd (eds.), Handbook of Research Methods in Social and Personality Psychology, 313–335.
Cambridge: Cambridge University Press.
Stevens, Stanley S. 1946. On the theory of scales of measurement. Science 103: 677–680.
Strijbos, Jan-Willem, Rob Martens, Frans Prins and Wim Jochems 2006. Content analysis: What
are they talking about? Computers and Education 46: 29–48.
Suen, Hoi K. 1988. Agreement, reliability, accuracy and validity: Toward a clarification. Behav-
ioral Assessment 10: 343–366.
Towstopiat, Olga 1984. A review of reliability procedures for measuring observer agreement. Con-
temporary Educational Psychology 9: 333–352.
Tronick, Ed, Heidelise Als, Lauren Adamson, Susan Wise and T. Barry Brazelton 1978. The in-
fant’s response to entrapment between contradictory messages in face-to-face communication.
Journal of American Academy of Child Psychiatry 17: 1–13.
Uebersax, John 1987. Diversity of decision-making models and the measurement of interrater
agreement. Psychological Bulletin 101: 140–146.
Watkins, Marley W. and Miriam E. Pacheco 2000. Interobserver agreement in behavioral research:
Importance and calculation. Journal of Behavioral Education 10: 205–212.
Wiggins, Jerry S. 1973. Personality and Prediction: Principles of Personality Assessment. Reading,
MA: Addison Wesley.

Augusto Gnisci, Naples (Italy)


Fridanna Maricchiolo, Rome (Italy)
Marino Bonaiuto, Rome (Italy)

56. Sequential notation and analysis for bodily forms of communication
1. Introduction
2. Notation for sequential analysis
3. Brief hints on sequential analysis
4. Brief conclusions
5. References

Abstract
Human interaction is a dynamic process that unfolds in time between two or more people
and that is typically characterized by mutual influence; participants in interaction coordi-
nate in time various aspects of their own behavior (e.g., speech and gestures) with other
people’s behavior (e.g., different turns of talk). Sequential analysis is a quantitative
approach that emphasizes measurement and utilizes traditional research instruments
for data collection and analysis in the attempt to grasp interactive processes without
losing their dynamic character. The Sequential Data Interchange Standard (SDIS) allows
users to code data based on time (state and timed-event data) and on events (interval,
simple-event, and multi-event data), using a common notation that maps onto a universal
data grid. This notation allows users to analyze basic interactive processes (e.g., reciprocity,
coordination, divergence) in a way that reflects their dynamic and sequential aspects. The
contribution provides both worked research examples and brief hints on which kinds of
sequential analysis can fruitfully be applied to which kinds of data.

1. Introduction
Sequentiality has been recognized as a basic attribute of interactive processes by many
scholars since at least the 1950s (Argyle 1967; Heyns and Lippitt 1954). The term
“sequential” is often used to characterize the dynamic aspect of interaction. It is no
mystery that human interaction is essentially a dynamic process that unfolds in
time between two or more people and that is typically characterized by mutual influ-
ence (Bakeman and Gottman 1997). Participants in such interaction have to coordinate
in time various aspects of their own behavior (e.g., speech and gestures) with other
people’s behaviour (e.g., different turns of talk).
Some qualitative approaches to interaction (e.g., Conversation Analysis or CA) put
sequence at the heart of their approach (Sacks 1972; Sacks, Schegloff, and Jefferson
1974). However, the approach to sequential analysis that we will follow here is quanti-
tative, that is, an approach that emphasizes measurement and utilizes traditional
research instruments for data collection and analysis in our attempts to understand in-
teractive processes (Bakeman 2000; Bakeman and Gnisci 2005; Bakeman and Gottman
1997; Bakeman and Quera 1995a; Bakeman and Robinson 1994; for criticism, see
Slugoski and Hilton 2001; for some answers to qualitative criticism, see Gnisci, Bakeman,
and Quera 2008). It consists of observational techniques based on systematic observa-
tion and statistical techniques of analysis that take sequencing into account (sequences,
co-occurrence, durations, intervals, parallel streams of behaviours, etc.). Its scope in-
cludes attempts to operationalize basic interactive processes (e.g., reciprocity, coordina-
tion, divergence, etc.) in a way that reflects their dynamic and sequential aspect (Gnisci
2005). To put the matter metaphorically, we are interested in an approach that lets us
reconstruct a film clip and not just a single picture.
Such detailed sequential analysis has been made feasible by the exceptional devel-
opment of video and audio recording technology during the last several decades, allow-
ing us not just to view but to review repeatedly behaviour of interest (Bull 2002).
Recording technology has evolved from film, to analogue tape, to digital computer
files, and from expensive and cumbersome to inexpensive and portable recording de-
vices (Bakeman and Gottman 1997). It is not an exaggeration to say that computer
and video technology is to the study of behaviour what the introduction of the micro-
scope was to the biological sciences (Bull 2002). It has permitted the fine grained analysis
of interaction, called microanalysis (Bull 2002), and has led us to consider anew how we
should represent and analyse interaction (Müller this volume). This chapter presents a
notational scheme for behavioural sequences, one that, in particular, facilitates their
sequential analysis.

2. Notation for sequential analysis


The sequential notation that we will describe is the Sequential Data Interchange Stan-
dard. It follows conventions developed by Bakeman and Quera (1995a). Sequential
Data Interchange Standard has several desirable features; it is comprehensive and
flexible, and accommodates several different ways of recording observational data,
transforming and exporting data to statistical programs, and so on.
The notation can be applied to many different research problems and questions of
interest for communication, from gestures to body movements, from language to con-
text, and so on. The notation makes some epistemic assumptions (e.g., it is possible
to quantify human behaviour by means of measurement as per Stevens 1946), but oth-
erwise it is theoretically uncommitted. Therefore, scholars with different theoretical ap-
proaches may find it useful to check their hypotheses, once they have developed a
category (i.e., measurement) system. This relatively content-free aspect of the notation
makes it very flexible and practical.
The category systems, or coding schemes, are sets of categories that map basic
dimensions of behaviour, are guided by research questions, and allow systematic obser-
vation. Categories, or codes, are often grouped into sets and, within the set, are mutually
exclusive and exhaustive, that is to say, only one code applies to the thing coded (mutu-
ally exclusive) and some code applies to everything coded (exhaustive) (Bakeman,
Deckner, and Quera 2004).
Sometimes it is thought that observation is inherently correlational. In fact, the nota-
tion described here can be applied both in unmanipulated, natural settings and in
laboratory settings. Thus it can be used in the context of experimental studies to test
causal hypotheses.
The fact that it is possible to code parallel streams of behaviours and attribute some
of them to aspects of language and others to aspects of the body, that is, to visible body
movements (Müller this volume), allows the study of embodiment and of multimodal
aspects of language and communication in a precise and accurate way (Gnisci, Maric-
chiolo, and Bonaiuto this volume). Last but not least, the analysis may greatly benefit
from contextual aspects if some parallel streams are attributed to them (see Gnisci and
Bonaiuto 2003).
Fig. 56.1 summarizes the data types whose notation we describe in the next several
paragraphs. For four of these data types, observers are asked to detect and code events
in the stream of behaviour, but for one they are asked to code pre-defined fixed
intervals (e.g., successive 15-second intervals). Three different types of units may be
used analytically: events, fixed intervals, or time-units. When time-units serve as the ana-
lytic unit, observers are asked to record onset times for events and offset times as well
(unless offset times are implied by the onset of another event belonging to the same
mutually exclusive and exhaustive set); in such cases, events are represented as a
sequence of time-units (e.g., a second) coded as “on” for the particular code (see
Section 2.6 below).
To make the description of the sequential data format more concrete, we will use
as an example a part of an interchange between a lawyer and a witness, coded by
two independent and trained observers who used particular coding systems for ques-
tions, answers, turn taking (as in Gnisci 2005, and Gnisci and Bakeman 2007), and ges-
tures. Assume that the observers used six mutually exclusive and exhaustive category
systems, one for lawyers’ questions (Q1=declaration, Q2=tag-, Q3=yes/no-, Q4=
choice-, Q5=narrow wh-, Q6=broad wh-question), one for witnesses’ answers (A1=per-
tinent, A2=elaborate, A3=implicit reply, NR=no-reply), two for turn taking (for
witnesses T3=long pause, T2=short pause, T1=latching, T4=conflict interruption,
T5=simple interruption; for lawyers S3, S2, S1, S4 and S5 have similar meaning), and
Data type                             | What is coded?                   | What is recorded?                              | What is tallied?
Simple-event                          | Events                           | Event codes                                    | Events
Multi-event (multi-dimensioned event) | Events (more than one dimension) | Event codes (>1 per event)                     | Events
Interval (fixed)                      | Intervals                        | Event codes (0–n per interval)                 | Intervals
State (single and multiple streams)   | Events                           | Event codes, onset times                       | Time-units
Timed-event                           | Events                           | Event code, onset and (optional) offset times  | Time-units

Fig. 56.1: Different kinds of observational sequential data.

two for type of gestures (for witnesses WP=propositional and WNP=non propositional,
and for lawyers LP and LNP have similar meaning).
Fig. 56.2 shows in graphical form the outcome of coding, after observers have viewed
the video record. It is based on the beginning of an interchange reported and analyzed
extensively in Gnisci and Pontecorvo (2004), although here it has been simplified; thus
it should be considered simply as a hypothetical example. The examples of sequences
reported in subsequent paragraphs refer to the interaction portrayed in Fig. 56.2.

2.1. Simple-event sequences


The simplest way to represent sequential behaviour consists of a single stream of coded
events (without any time information); only one code is assigned to each successive
event, thus necessarily the codes must form one or more mutually exclusive and exhaus-
tive sets.
This way of representing the lawyer-witness interchange is somewhat limited because
we have to choose a single flow of behaviour; that is, this form does not allow for
co-occurrence. If our research questions focused simply on connections among subsequent
questions by the lawyer, we could simply code all the lawyer’s questions this way:

Q5 Q5 Q1 Q4 Q1 Q1 …

However, if question and answer exchanges were our research focus, we could repre-
sent the flow of behavior between lawyer and witness as:

Q5 A2 Q5 A2 Q1 A3 Q4 NR Q1 NR Q1 …

We can even include four coding systems (lawyer questions, witness answers, turn tak-
ing by L and by W) because they occur in sequence:

Q5 T3 A2 S2 Q5 T3 A2 S5 Q1 T1 A3 S1 Q4 T1 NR S5 Q1 T2 NR S2 Q1 …

However, with simple-event sequences we cannot represent co-occurring behaviors, for


example between answers and gestures. There is no direct way to represent them as a
sequence of simple events. One possibility would be to create new codes that represent
[Fig. 56.2 is a timeline grid (seconds 1–99 in columns) with one row per category system:
turn by the lawyer (L), turn by the witness (W), kind of question (Q5, Q5, Q1, Q4, Q1,
Q1, …), kind of answer (A2, A2, A3, NR, NR), turn taking by L (S2, S5, S1, S5, S2), turn
taking by W (T3, T3, T1, T1, T2), gestures by L (P), and gestures by W (NP); each code
spans the seconds during which the behavior occurred.]

Fig. 56.2: General representation of an examination between a lawyer (L) and a witness (W) in a
criminal trial with annotation of the durations of turns of question and answer, the duration of the
gestures, and the occurrence of turn taking (approximation 1 second).

combinations of old codes (e.g., a new code X for the answer A1 co-occurring with ges-
ture P, a new code Y for A2-P, etc.), but this can produce a proliferation of new codes
that rapidly become awkward and unmanageable for subsequent data analysis.
In sum, simple-event data are useful under limited circumstances, for example, when
only a simple flow of behaviour is of interest. With these data it is possible to understand
how often events occur and how events are sequenced but not how long individual events
last or the proportion of time devoted by each participant to each kind of event. Their use
is suggested in preliminary studies, when resources are limited, or when research
questions suit them. In such cases they can provide easy and effective answers.
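What can be learned from simple-event data can be sketched in a few lines of Python (a hypothetical illustration, not part of SDIS or GSEQ; the event stream is the four-system sequence shown above):

```python
from collections import Counter

# The simple-event sequence combining the four coding systems
# (questions, answers, turn taking by L and by W) from the example.
events = "Q5 T3 A2 S2 Q5 T3 A2 S5 Q1 T1 A3 S1 Q4 T1 NR S5 Q1 T2 NR S2 Q1".split()

# How often events occur ...
frequencies = Counter(events)

# ... and how they are sequenced: tally lag-1 transitions (adjacent pairs),
# the raw material for transitional probabilities.
transitions = Counter(zip(events, events[1:]))

print(frequencies["Q1"], transitions[("Q5", "T3")])  # 3 2
```

Note that, as the text observes, nothing here records how long any event lasted; only occurrence and order are available.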

2.2. Multi-event sequences


As noted earlier, assigning one code, and one code only, to successive events may be
limiting. We might take a more multi-dimensional view, applying more than one set
of mutually exclusive and exhaustive codes to a single event, in effect, cross-classifying
each event. Or, we might see conversation as a sequence of turns of talk (Gottman 1979,
1983) or of adjacent pairs (Sacks, Schegloff, and Jefferson 1974). Again, we would apply
more than one set of mutually exclusive and exhaustive codes to this more complex
“event.” Because more than one code is applied to each event, it makes sense to call
such data Multi-Event (Bakeman and Quera 1995a).
For example, using periods or dots to separate successive multi-events, if our
research question focused on the question-answer package as the event of interest,
our representation for the Fig. 56.2 data would be:

Q5 T3 A2 S2. Q5 T3 A2 S5. Q1 T1 A3 S1. Q4 T1 NR S5. Q1 T2 NR S2. Q1 …

Here each event or interactional unit (i.e., the question-answer exchange) represents
an event of variable duration (for example, the 1st exchange lasts 8 sec, the 2nd 37,
the 3rd 6). But if the actual duration is not of interest, representing data as multi-events
can be useful. Note, however, that this representation did not include gestures, which
occurred at varying points in time and were not necessarily a dimension of the
question-answer unit.
Although limited, multi-event data have advantages. For example, since multi-
dimensional contingency tables result, they lend themselves to log-linear analysis
(see below).
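As an illustrative sketch (not GSEQ itself), the cross-classified multi-events can be collapsed into a two-way margin of the multidimensional contingency table that log-linear analysis would use:

```python
from collections import Counter

# Each question-answer exchange as one multi-event, cross-classified on four
# dimensions: question type, witness turn taking, answer type, lawyer turn taking.
exchanges = [
    ("Q5", "T3", "A2", "S2"),
    ("Q5", "T3", "A2", "S5"),
    ("Q1", "T1", "A3", "S1"),
    ("Q4", "T1", "NR", "S5"),
    ("Q1", "T2", "NR", "S2"),
]

# A two-way margin of the four-way contingency table:
# question type by answer type.
q_by_a = Counter((q, a) for q, t, a, s in exchanges)
print(q_by_a[("Q5", "A2")], q_by_a[("Q1", "NR")])  # 2 1
```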

2.3. Interval sequences


For simple-event and multi-event sequential data, observers are asked to detect and
code events. In contrast, for interval sequential data, the intervals are fixed to a speci-
fied duration (e.g., 15 seconds) and observers are asked to code each successive interval.
Zero, one, or more codes may be assigned to each interval. Interval sequential
data have been very important in the past because they enabled researchers to work
in sometimes remote settings using only simple equipment such as pencil and paper
(e.g., see Bakeman et al. 1990). A limitation is that interval sequential data provide
only approximate estimates of frequency and duration (Bakeman and Gottman 1997).
The researcher establishes a fixed interval of time, for example 5 seconds; a timing
device can signal to the observer when the interval is over and the observer can then
proceed to code whatever happened within that interval. Therefore, the flow of behavior
is conceptualized as a sequence of intervals. If we assume intervals equal to 5 seconds,
and commas distinguish different intervals, our representation becomes:

Q5 T3 A2 NP, NP A2 S2, Q5, Q5, T3 NP A2, A2, A2, A2, A2 P, A2 P S5 Q1, …

If, instead, the coder assigns only the behaviors co-occurring at the end of the interval,
in effect point sampling, and starting from the 5th second, the representation is:

A2 NP, S2, Q5, Q5, NP A2, A2, A2, A2, A2 P, …

Of course, which strategy is the best depends on the aim of the researcher and the
research questions. In many cases, the best strategy may be to use a different data
type because of the inherent limitations of interval data. Still, interval sequential data
may be useful when the investigator is interested in only approximate estimates of
how often events occur and what proportion of time is devoted to each event. When
used as point sampling, intervals provide a random sample of behaviors that cannot
be regarded as truly sequential.
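The two interval strategies just described, coding everything that occurs within an interval versus point sampling at the interval's end, can be sketched as follows (a hypothetical Python illustration using a few events from the example, with inclusive onset and offset times in seconds):

```python
# A few events from the start of the example: (code, onset, offset).
timed = [("Q5", 1, 2), ("T3", 3, 4), ("WNP", 5, 6), ("A2", 5, 9), ("S2", 10, 10)]

def interval_coding(timed, width):
    """Assign to each fixed interval every code that occurs within it."""
    end = max(off for _, _, off in timed)
    return [{c for c, on, off in timed if on <= start + width - 1 and off >= start}
            for start in range(1, end + 1, width)]

def point_sampling(timed, width):
    """Assign to each interval only the codes active at its last second."""
    end = max(off for _, _, off in timed)
    return [{c for c, on, off in timed if on <= t <= off}
            for t in range(width, end + 1, width)]

print(interval_coding(timed, 5))  # corresponds to "Q5 T3 A2 NP, NP A2 S2"
print(point_sampling(timed, 5))   # corresponds to "A2 NP, S2"
```

The two outputs reproduce the first two intervals of the two representations given above (with the witness gesture written out as WNP rather than NP).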

2.4. State sequences


State sequences consist of one (or more) streams of events, where each stream consists
of codes from a single mutually exclusive and exhaustive set and contains information
about each code’s duration. Therefore, a single stream of states is like an event
sequence to which time information has been added. For questions and answers, the
representation could be (where numbers represent seconds):

Q5=2 A2=5 Q5=10 A2=24 Q1=4 A3=2 Q4=9 NR=6 Q1=10 NR=15 Q1=8 …

In this case the representation only approximates the interaction. It eliminates the
length of the inter-turn pauses and the occurrence of overlap between turns. However,
when applied to mutually exclusive and exhaustive codes it is very effective because it
preserves the sequence and the duration of a flow of events, providing more information
than simple-event sequences provide.
More than one stream can be represented, which allows us to examine co-occurrences
between codes in two or more streams. The multiple stream state format allows for con-
siderable flexibility. Here is a four-stream example. The first stream represents the law-
yer’s questions, the second the witness’s answers, the third the witness’s gestures, and
the fourth the lawyer’s gestures. To make each set exhaustive, Q0=no lawyer question,
A0=no witness answer, WG0=no witness gesture, LG0=no lawyer gesture. The time
unit is one second:

Q5=2 Q0=8 Q5=10 Q0=26 Q1=4 Q0=2 Q4=9 Q0=3 Q1=10 Q0=17 Q1=8 … &
A0=4 A2=5 A0=14 A2=24 A0=3 A3=2 A0=9 NR=6 A0=8 NR=15 … &
WG0=4 WNP=2 WG0=17 WNP=2 WG0=25 WNP=2 WG0=9 WNP=2 WG0=12 WNP=2 …. &
LG0=43 LP=4 LG0=15 LP=5 LG0=…

The symbol “&” indicates the end of a stream, thus the next stream overlaps any pre-
vious streams. In sum, multiple-stream state sequences allow for the representation of
many parallel streams of interaction and are therefore very useful, but, note, they
require that all codes within a stream be members of mutually exclusive and exhaustive
sets; momentary or frequency-only codes are not included.
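A state stream of the form code=duration can be expanded into time units, after which co-occurrences between parallel streams are simple to count. The following sketch is hypothetical (GSEQ performs this internally); the two streams are simplified versions of the answer and witness-gesture streams above:

```python
def expand(stream):
    """Expand a state stream like 'A0=4 A2=5' into one code per time unit."""
    units = []
    for token in stream.split():
        code, duration = token.split("=")
        units.extend([code] * int(duration))
    return units

# Two simplified parallel streams covering the first 10 seconds:
# the witness's answers and the witness's gestures.
answers = expand("A0=4 A2=5 A0=1")
gestures = expand("WG0=4 WNP=2 WG0=4")

# Co-occurrence: the number of seconds in which A2 and WNP overlap.
overlap = sum(1 for a, g in zip(answers, gestures) if a == "A2" and g == "WNP")
print(overlap)  # 2
```

This matches Fig. 56.2, where the witness's first non-propositional gesture (seconds 5–6) co-occurs with the first elaborate answer (seconds 5–9).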

2.5. Timed-event sequences


Timed-event sequences are the most general and flexible type of sequential data, and
are strongly recommended. As with state sequences, the occurrence, the sequence,
and the duration of the events are preserved, but the codes do not all need to be
members of mutually exclusive and exhaustive sets. Thus momentary or frequency behaviors
(behavior whose frequency of occurrence and not duration is of interest, such as baby
burps, dog yelps, child points, etc.) can be recorded.
Here is the timed-event representation for our example:

Q5,1–2) T3,3–4) WNP,5–6) A2,5–9) S2,10 Q5,11–20) T3,21–23) WNP,24–25) A2,24–47)


LP,44–47) S5,47 Q1,47–50) T1,51 A3,51–52) WNP,51–52) S1,53 Q4,53–61) T1,62 NR,62–67)
WNP,62–63) S5,65–67) LP,65–67) Q1,65–74) T2,75 WNP,76–77) NR,76–90) S2,91
Q1,92–99) …

The general format is: code,onset-time – offset time, or code,onset-time for momentary
behaviours. Here, times are in seconds. A right parenthesis after the offset time indi-
cates an inclusive offset, for example, Q5,11–20) means the code Q5 occurred from sec-
ond 11 through second 20, for a duration of 10 seconds. (In SDIS, no right parenthesis
after the offset time indicates an exclusive offset, for example, Q5,11–21 means the code
Q5 occurred from second 11 up to but not through second 21, for a duration of
10 seconds.)
Information on event frequency, duration, proportion of time, and sequence is most
completely preserved with the timed-event data type, making this data notation the
most flexible and complete.
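A minimal sketch of how timed-event records with inclusive offsets map onto time units, in effect a toy version of the universal data grid discussed in Section 2.6 (the helper to_grid is hypothetical, not part of SDIS or GSEQ):

```python
def to_grid(records, total_seconds):
    """Map timed-event records onto a universal data grid: one row of 0/1
    per code, one column per second. Offsets are inclusive (the ')'
    convention); momentary codes have no offset and occupy one second."""
    grid = {}
    for record in records:
        code, onset = record[0], record[1]
        offset = record[2] if len(record) == 3 else onset  # momentary code
        row = grid.setdefault(code, [0] * total_seconds)
        for t in range(onset, offset + 1):
            row[t - 1] = 1
    return grid

# A few records from the example: (code, onset[, inclusive offset]).
grid = to_grid([("Q5", 11, 20), ("A2", 24, 47), ("T1", 51)], 60)
print(sum(grid["Q5"]), sum(grid["A2"]), grid["T1"][50])  # 10 24 1
```

Row sums of the grid give event durations directly (Q5 lasted 10 seconds, A2 lasted 24), which is exactly the information that simple-event data cannot provide.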

2.6. The basic grid for representing different data


The data types we have described here are those defined in Sequential Data Inter-
change Standard to provide flexibility. In fact, all can be represented with a universal
data grid, which facilitates data analysis. The rows of this grid represent the codes de-
fined by the user and the columns represent the unit tallied (i.e., what is counted and
will be later elaborated with statistical analysis; see Fig. 56.1). For simple- and multi-
event sequential data, the unit is the event. For interval sequential data, the unit is of
course an interval. For state and timed-event sequential data, the representational
unit is time (i.e., the time-unit, as defined by the precision with which time is recorded).
For example, if the time-unit is a second, each column of the data grid represents a second.
In Fig. 56.3 we have provided a schematic representation of the data grid. The exam-
ple is adapted from the beginning of the lawyer-witness interchange shown in Fig. 56.2.
When we represent sequential data, as done in the preceding sections, we use one of the
data types defined by Sequential Data Interchange Standard. When we want to analyze
such data we use the Generalised Sequential Querier (GSEQ; Bakeman and Quera
1995a). Both Sequential Data Interchange Standard and Generalised Sequential
Querier are available for free download at www.gsu.edu/~psyrab/gseq or www.ub.es/
comporta/gseq (select English and Download or Español and Descargar).
[Fig. 56.3 presents four representational grids for the same opening exchanges: Simple
Events (codes by events 1–8), Multi-Events (codes by events 1–2), Fixed Intervals (codes
by intervals 1–11), and Timed/State Data (codes by seconds 1–30; for brevity that grid
stops at the 30th second). Each grid has one row per code (Q1, Q5, A2, T3, S2, S5, WNP,
LP, …) and cells marked where a code applies to a unit.]

Fig. 56.3: Representational grid for Simple-Event, Multi-Event, Fixed Interval, and Timed-Event/
State sequential data for our example from the beginning of the examination until the beginning
of the third question-answer exchange (approximately 47 sec.).

3. Brief hints on sequential analysis


When we write a sequential data file following Sequential Data Interchange Standard
conventions, whatever the data used, we usually use Generalised Sequential Querier
for obtaining the grid and computing particular statistics, guided by our research
questions.
The possibilities for analysis are many (see Bakeman and Gottman 1997; Bakeman
and Quera 1995a). What we highlight here are the ones we regard as the most useful
and fruitful. Needless to say, most depend on the sequential data type chosen to repre-
sent interaction and on the research questions of the researcher.
In general, the first step when analyzing interactional data is obtaining descriptive
statistics. These allow the investigator to grasp a general, although static, picture of sin-
gle variables in the interaction at stake (frequencies and probabilities of events, dura-
tion and probabilities of durations, average duration of events, rates, etc.) or a more
dynamic one involving two or more variables (transitional probabilities, odds, odds
ratio, Yule’s Q, etc.). These and other statistics, such as the proximity index (Taylor and
Donald 2007; Taylor and Thomas 2008), may be illustrated by means of graphs, transi-
tional diagrams, or lagged probability profiles (Bakeman and Gottman 1997), time
series graphs (Gottman 1979, 1983), similarity maps (Gnisci, Bakeman, and Quera
2008; Quera 2008), and so on.
The second step is using standard packages to compute descriptive and inferential
statistics to provide answers to single research questions. For example, investigators
wanting to check specific sequential chains among behaviours may choose a chi-square
strategy: the chi-square test, applied to sequentially organized cross-tables, serves as an
omnibus test for significant associations among different behaviors, for which they
might compute transitional probabilities and build transitional matrices or diagrams.
They could then compute adjusted residuals for the cells of the table to identify partic-
ular significant sequences (for details of this strategy, see Bakeman and Gottman 1997).
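The final step of this strategy, adjusted residuals for the cells of a transition table, can be sketched as follows (the table values are hypothetical; the formula is the standard adjusted residual, observed minus expected divided by its standard error):

```python
import math

def adjusted_residuals(table):
    """Adjusted residuals for a two-way transition table:
    (observed - expected) / sqrt(expected * (1 - p_row) * (1 - p_col))."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return [[(obs - rows[i] * cols[j] / n)
             / math.sqrt(rows[i] * cols[j] / n
                         * (1 - rows[i] / n) * (1 - cols[j] / n))
             for j, obs in enumerate(row)]
            for i, row in enumerate(table)]

# Hypothetical transition table: rows = preceding lawyer question (Q1, Q5),
# columns = following witness answer (A2, NR).
table = [[2, 8],
         [7, 3]]
res = adjusted_residuals(table)
print(round(res[0][0], 2))  # -2.25: the Q1 -> A2 transition is rarer than chance
```

Residuals beyond roughly ±1.96 flag cells, and thus particular sequences, that depart reliably from the omnibus model of no association.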
To check particular hypotheses, investigators might want, instead, to obtain a partic-
ular sequential statistic for each subject (for example an associative index such as
Yules’s Q between behaviors of the mother and subsequent responses by the child for
each interacting mother-child dyad) and export all of them in a participants-by-variables
matrix, such as the one that SPSS expects. Classical inferential analyses can then be per-
formed, such as ANOVA to identify differences between groups (e.g., between an autis-
tic group and controls), or regression to check, for example, if such individual variables as
the intelligence of the mother or of the child are associated with communicative competence
as indicated by sequential indexes (see Bakeman and Gottman 1997; Gnisci and
Bakeman 2000). In sum, a dyadic score derived from sequential data can be analyzed
with whatever statistical procedures are appropriate
(for an application, see Gilstrap and Ceci 2005).
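Yule's Q for one dyad can be computed directly from the 2 × 2 table of behavior-by-following-response counts (the counts below are hypothetical):

```python
def yules_q(a, b, c, d):
    """Yule's Q for a 2x2 table [[a, b], [c, d]]: (ad - bc) / (ad + bc).
    It ranges from -1 to +1, with 0 indicating no association."""
    return (a * d - b * c) / (a * d + b * c)

# Hypothetical counts for one mother-child dyad: rows = mother behavior
# present/absent, columns = child response in the next unit present/absent.
print(yules_q(12, 4, 3, 9))  # 0.8
```

Computed once per dyad, such scores form the participants-by-variables matrix that SPSS-style analyses (ANOVA, regression) expect.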
Finally, investigators may want to identify general patterns of transition involving
many behaviors and therefore use a lag-sequential log-linear analysis (Bakeman and
Quera 1995b). This strategy allows the integration of sequential analysis into an
established and well-supported statistical tradition, that of log-linear models (Bakeman
and Robinson 1994), and resembles the Markov chain analyses (Castellan 1979;
Chatfield 1973) of earlier literature. The resulting multidimensional sequential tables
that report data on sequences of many behaviors can be used with log-linear analysis
to find the simplest yet most informative sequential model, the so-called best-fitting
model. In this manner, we can identify stable patterns of interaction (for a worked
example involving mother-child interaction, see Cohn and Tronick 1987). For example,
using this strategy, Gnisci (2005; see also Gnisci and Bakeman 2007) demonstrated that
in examinations during criminal trials the answers provided by witnesses and the way
they managed turn taking depended not only on the questions posed by lawyers but
also on the way lawyers took the floor when asking questions (see Fig. 56.4). What is

ΔQ 2 = 5%**, ΔE = .46%

ΔQ 2 = 6%***, ΔE = .87%

Lawyer taking Lawyer asking Witness taking Witness providing


the floor (S) the question (Q) the floor (T) the answer (A)
E = 5.82% E = 3.14%

ΔQ 2 = 36%***, ΔE = 4.74%

ΔQ 2 = 22%*, ΔE = 2.73%

Fig. 56.4: Sequential log-linear analysis of examinations between lawyers and witnesses (From
Gnisci, 2005).
interesting is that content or transitional aspects of the turn of the witnesses depended
on both content and transitional aspect of the lawyer’s preceding turn (multimodality).
Here we have indicated only some ways to proceed in analyzing sequential data. In
any case, any of a variety of statistics can be fruitfully applied to answer research ques-
tions on interaction.

4. Brief conclusions
Researchers of language and communication who take a quantitative approach may
strongly benefit from the notation of sequential analysis. First, it provides a standard
way to represent different types of sequential data (as defined by Sequential Data Inter-
change Standard) that, second, can be fruitfully analyzed to represent dynamic aspects
of the interaction. The ability to operationalize these often elusive aspects of interaction
is a major advantage of the sequential approach: many different streams of behaviour
or overlapping events can be taken into consideration simultaneously, the effects of dif-
ferent behavior modalities can be detected, different notions of context can be operation-
alized, and many relationships between them, reflecting different processes (e.g.,
reciprocity, accommodation, synchronicity, suggestibility, etc.), can be fruitfully pictured.

5. References
Argyle, Michael 1967. The Psychology of Interpersonal Behavior. Harmondsworth: Penguin.
Bakeman, Roger 2000. Behavioral observation and coding. In: Harry T. Reis and Charles M. Judd
(eds.), Handbook of Research Methods in Social and Personality Psychology, 138–159. Cam-
bridge: Cambridge University Press.
Bakeman, Roger, Lauren B. Adamson, Melvin Konner and Ronald G. Barr 1990. !Kung infancy:
The social context of object exploration. Child Development 61: 794–809.
Bakeman, Roger, Deborah F. Deckner and Vicenç Quera 2004. Analysis of behavioral streams. In:
Douglas M. Teti (ed.), Handbook of Research Methods in Developmental Science, 394–420.
Oxford: Blackwell.
Bakeman, Roger and Augusto Gnisci 2005. Sequential observational methods. In: Michael Eid
and Ed Diener (eds.), Handbook of Multimethod Measurement in Psychology, 127–140. Wash-
ington, DC: APA.
Bakeman, Roger and John M. Gottman 1997. Observing Interaction. An Introduction to Sequential
Analysis, 2nd Edition. New York: Cambridge University Press.
Bakeman, Roger and Vicenç Quera 1995a. Analyzing Interaction. Sequential Analysis with SDIS
and GSEQ. New York: Cambridge University Press.
Bakeman, Roger and Vicenç Quera 1995b. Log-linear approaches to lag-sequential analysis when
consecutive codes may and cannot repeat. Psychological Bulletin 118: 272–284.
Bakeman, Roger and Byron F. Robinson 1994. Understanding Log-Linear Analysis with ILOG.
An Interactive Approach. Hillsdale, NJ: Lawrence Erlbaum.
Bull, Peter 2002. Communication under the Microscope. The Theory and the Practice of Microa-
nalysis. London: Routledge.
Castellan, N. John 1979. The analysis of behavior sequences. In: Robert B. Cairns (ed.), The Ana-
lysis of Social Interactions: Method, Issues, and Illustrations, 81–116. Hillsdale, NJ: Lawrence
Erlbaum.
Chatfield, Christopher 1973. Statistical inference regarding Markov chain models. Applied Statis-
tics 22: 7–20.
Cohn, Jeffrey F. and Edward Z. Tronick 1987. Mother-infant face-to-face interaction: The
sequence of dyad states at 3, 6, and 9 months. Developmental Psychology 23: 68–77.
Gilstrap, Livia L. and Stephen J. Ceci 2005. Reconceptualizing children’s suggestibility: Bidirec-
tional and temporal properties. Child Development 76(1): 40–53.
Gnisci, Augusto 2005. Sequential strategies of accommodation: A new method in courtroom. Brit-
ish Journal of Social Psychology 44(4): 621–643.
Gnisci, Augusto and Roger Bakeman 2000. L’osservazione e l’analisi sequenziale dell’interazione
[Sequential observation and analysis of interaction]. Rome: LED.
Gnisci, Augusto and Roger Bakeman 2007. Sequential accommodation of turn taking and turn length:
A study of courtroom interaction. Journal of Language and Social Psychology 9(26): 234–259.
Gnisci, Augusto, Roger Bakeman and Vicenç Quera 2008. Blending qualitative and quantitative anal-
yses in observing interaction. International Journal of Multiple Research Approaches 2(1): 15–30.
Gnisci, Augusto and Marino Bonaiuto 2003. Grilling politicians. A study on politicians’ answers to
questions comparing televised political interviews and legal examinations. Journal of Language
and Social Psychology 22(4): 384–413.
Gnisci, Augusto, Fridanna Maricchiolo and Marino Bonaiuto this volume. Reliability and validity
of coding systems. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David
McNeill and Sedinha Teßendorf (eds.), Body – Language – Communication: An International
Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics and Communi-
cation Science 38.1.) Berlin: De Gruyter Mouton.
Gnisci, Augusto and Clotilde Pontecorvo 2004. The organization of questions and answers in the
thematic phases of hostile examination: Turn-by-turn manipulation of meaning. Journal of
Pragmatics 36(5): 965–995.
Gottman, John M. 1979. Marital Interaction. Experimental Investigations. New York: Academic Press.
Gottman, John M. 1983. How children become friends. Monographs of the Society for Research in
Child Development 48(3): 1–86.
Heyns, Roger W. and Ronald Lippitt 1954. Systematic observational techniques. In: Gardner
Lindzey (ed.), Handbook of Social Psychology. I. Theory and Method. II. Special Fields and
Applications, Vol. 1, Chapter 10. Oxford: Addison-Wesley.
Müller, Cornelia this volume. Introduction. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva
H. Ladewig, David McNeill and Sedinha Teßendorf (eds.), Body – Language – Communica-
tion: An International Handbook on Multimodality in Human Interaction. (Handbooks of Lin-
guistics and Communication Science 38.1.) Berlin: De Gruyter Mouton.
Quera, Vicenç 2008. RAP: A computer program for exploring similarities in behavior sequences
using random projections. Behavior Research Methods 40: 21–32.
Reis, Harry T. and Charles M. Judd (eds.) 2000. Handbook of Research Methods in Social and Per-
sonality Psychology. Cambridge: Cambridge University Press.
Sacks, Harvey 1972. An initial investigation of the usability of conversational data for doing soci-
ology. In: D. Sudnow (ed.), Studies in Social Interaction, 31–74. New York: Free Press.
Sacks, Harvey, Emanuel Schegloff and Gail Jefferson 1974. A simplest systematics for the organi-
zation of turn-taking for conversation. Language 4: 696–735.
Slugoski, Ben R. and Denis J. Hilton 2001. Conversation. In: W. Peter Robinson and Howard Giles
(eds.), The New Handbook of Language and Social Psychology, 193–220. New York: Wiley.
Stevens, Stanley S. 1946. On the theory of scales of measurement. Science 103: 677–680.
Taylor, Paul J. and Ian Donald 2007 Testing the relationship between local cue-response patterns
and the global structure of communication behaviour. British Journal of Social Psychology
46(2): 273–298.
Taylor, Paul J. and Sally Thomas 2008. Linguistic style matching and negotiation outcome. Nego-
tiation and Conflict Management Research 1(3): 263–281.

Augusto Gnisci, Naples (Italy)


Roger Bakeman, Atlanta, GA (USA)
Fridanna Maricchiolo, Rome (Italy)
904 V. Methods

57. Decoding bodily forms of communication


1. Introduction
2. Research methods on decoding
3. Data collection: Measurement procedures
4. Conclusions
5. References

Abstract
This contribution provides brief descriptions of different methods for analysing how
people read, process and interpret bodily forms of communication. Research on decoding
can be carried out with different methods, chosen according to the research questions and
hypotheses as well as to the background theoretical approach. The main methodological
features are the experimental design (laboratory or field experiment) and the stimulus
materials (e.g. photos, videos). The first section of the contribution provides a brief
description of the nonverbal stimuli that can be used in experimental designs aimed at
measuring receivers' reactions (decoding). Such data can be recorded through direct and
indirect measures, depending on whether the underlying processes involved in the
perception of bodily communication are mainly deliberate/reflective or automatic/impulsive.
In the second section, both types of procedure are defined, described and exemplified, with
attention to their reliability, validity and applicability to body language. The advantages
and disadvantages of explicit and implicit measures, and the relations between the two
kinds of measures, are compared in the final section of the contribution.

1. Introduction
In social interaction, people use the body on the one hand to communicate their
states or traits to other people, and on the other hand to perceive and identify their inter-
locutors' personality, intentions, attitudes, emotions, etc., whether in a "transparent"
or in a tactical way.
Any form of communication requires a sender and a recipient, who have the
task of producing and of interpreting the message, respectively. Thus, the sender is called
upon to correctly encode the message and the recipient is called upon to decode it.
This is true also for bodily communication and irrespective of “honest” or “deceitful”
intentions. Whenever considering bodily movements as a form of communication, it is
necessary that the message conveyed by the sender's body is interpreted by the recipient.
This process involves many other psychological concepts and abilities: the specific
bodily action must be perceived, and the recipient has to attribute to the sender the
intention to communicate something via that bodily action. Afterwards, bodily
actions are evaluated and elicit a reaction from the recipient, who is finally able to per-
form a behaviour in reaction or as an answer to the perceived communication. For
example, a "sender" feels happy and shows expressions of joy (smile, laughter, high-
pitched voice, etc.), which, in turn, can be perceived and interpreted by the "receiver"
as showing that the sender is happy. These processes can happen in a way that may

vary in awareness, accuracy, truthfulness, and reliability, on the part of both the sender
and the recipient/perceiver (DePaulo and Friedman 1998).
Thus, research questions on bodily communication mainly concern two principal
processes: encoding (by the sender) and decoding (by the receiver).
A study by Gifford (1994) on bodily indexes of personality and on their perception
serves as an example to illustrate these methodological aspects. Gifford (1994) found
significant relations between:

(i) interpersonal dispositions of the participants, measured through an inventory
(Interpersonal Adjective Scales; Wiggins 1979), and their bodily behaviours during
spontaneous videotaped conversations, coded through a nonverbal scoring system
(Seated Kinesic Activity Notation System; Gifford 1986);
(ii) bodily cues of videotaped participants and observers' perceptions of their
personality traits, measured through the same inventory applied to them;
(iii) these perceptions and the interpersonal characteristics of the observed subjects.

In this study, Gifford (1994) thus analysed both the encoding and the decoding process
of bodily communication, as well as the relationship between their outcomes.
In his study, it is possible to recognize different methodologies and instruments for
the analysis of these two processes. For the study of the encoding process,
the adopted techniques were: video-recording of participants in conversation; coding,
notation and analysis of their nonverbal signals through a nonverbal scoring system;
and the completion of a self-report questionnaire about their interpersonal traits. For
analysing the decoding process, the stimuli were the videotaped conversations,
presented without the audio track, observed by small groups of unacquainted peers,
and the measurement tool was a questionnaire for the evaluation of the observed
subjects' interpersonal traits.
This chapter presents an introduction to research methods on decoding, that is, pro-
cedures and techniques of analyzing how people read, process and interpret bodily
forms of communication. It will introduce the reader to basic elements for understand-
ing a large section of bodily communication research. After reading this chapter, the
reader will also have a clearer understanding of the methods used in the studies on
bodily communication decoding processes included in this handbook.
Within the decoding process, the “decoder”, the receiver of a bodily communica-
tion – through specific cognitive and attentive abilities, nonverbal sensitivity, motiva-
tions, past experiences, and stereotypes – gets, reads, decodes, interprets and
understands bodily signals of his/her interlocutors (the senders). The research questions
are: How does the receiver perceive bodily signals made by the sender? How does s/he
use and interpret such signals in order to know the sender better, to understand his/her
intentions, to behave and to interact with him/her? For example, are smiling people
perceived as more likeable, friendly or trustworthy? How is a speaker who uses many
hand gestures judged in comparison with a speaker who uses few or none?
Is s/he evaluated as more effective and/or nice when s/he uses a certain type
of gesture in contrast to when s/he uses another type? Are the recipients conscious
of their evaluations? Which of the receivers’ characteristics affect decoding processes?
What are bodily decoding skills? Which of the receivers’ behaviours are affected by

such evaluations? Are these processes automatic or controlled? When studying decoding
processes, researchers try to answer these or similar questions.

2. Research methods on decoding


The general theory behind this literature can be termed the general psychometric model
or approach, in which a statistically significant relation is sought between bodily signs
and the resulting psychological perceptions/evaluations of the beholder. Thus, decoding
as discussed in this chapter refers to perceptions of the receivers that are measurable
with psychometric methods and techniques. According to classical psychometric
theories, it is assumed that what is measured by the techniques described below corre-
sponds to the perceptions and evaluations of the recipients in reaction to the bodily
signals of the senders, provided that the measures' reliability and validity have been
properly checked (see Borsboom, Mellenbergh, and Van Heerden 2004; Cronbach and
Meehl 1955).
Research on decoding can be carried out using different methods according to
the research questions and the hypotheses, as well as the background theoretical
approaches. The most common methodological procedures are: experimental designs
(laboratory, field experiments), stimulus materials (e.g., photos, videos), and measure-
ment procedures. This section touches on the first two topics: a brief description of
the stimuli which can be used in experimental designs on bodily decoding. The next
section deals in more detail with the third topic, i.e., measurement procedures.
To understand how bodily signals affect interpersonal perception, an experimental
design is the appropriate method: the researcher manipulates the bodily stimulus to verify
how the receivers’ perception varies as a function of the stimulus variation. Stimuli are
shown to the participants in order to measure their reaction to them. These reactions
can be measured in different ways, as described below, depending on the research ques-
tions. Stimuli often consist of photos, videos or audio-tracks, which include different
bodily signals.
For example, Ekman (1972), in his classical studies on emotional expressions,
showed slides of differently posed facial expressions to people from different cultures,
with the aim to understand whether different cultures link a given expression to the
same emotion. In fact, people from different cultures identified the same basic emotion
in any given expression.
In her study on the body in social interactions, Burgoon (1991) showed participants
pictures of work colleagues engaged in different types of contact and posture: partici-
pants evaluated touch as conveying affection, similarity and intimacy when it was
accompanied by an open/relaxed posture, whereas touch combined with a closed/tense
posture was evaluated as signalling a dominance relationship.
Guerriero and Miller (1998) used film excerpts from distance-learning training
courses, finding that more bodily expressive teachers were evaluated more positively
than less expressive teachers.
In an experimental study on decoding of persuasive discourse, Maricchiolo et al.
(2009) showed participants five ad hoc video-messages. In each video, the speaker per-
formed only one kind of hand movement, according to one of five conditions:
only ideational gestures, only conversational gestures, only object-adaptors, only self-
adaptors, no gestures. The verbal content of the discourse was the same in all five

conditions. The aim was to investigate the effects of different types of hand gesture on
receivers’ evaluations about the effectiveness and persuasiveness of the speaker and
the message. When gesturing with ideational and conversational gestures, the speaker
was evaluated as more competent and the message as more persuasive than with
self-adaptors.
In an experimental study on gesture processing, Kelly and Goldsmith (2004) showed
participants lecture videos with or without gestures in different conditions of cognitive
load (left or right hemisphere), with the aim of evaluating the effect of gestures on
lecture comprehension and evaluation, and of understanding the role of each hemisphere
in gesture processing. In this case, the researchers, besides manipulating the body
stimulus (presence/absence of gesture), also manipulated an individual variable
(cognitive load on the right/left hemisphere) to verify the involvement of the right
hemisphere in the gesture processing underlying the lecture evaluation.
Some stimuli can also be constructed by computer from real stimuli. For example,
Kramer and Ward (2010) used composite face images, combining facial photos from
a pool of 63 Caucasian women on the basis of their self-reported personality traits
and health conditions. The authors composed 14 computerised faces on the basis of
highest and lowest scores on seven measures: The Big Five traits (openness, conscien-
tiousness, extraversion, agreeableness, neuroticism) and physical and mental health.
The composite images were shown to the participants with the aim of verifying whether
particular facial features signal personality and health. Outcomes showed that four of
the Big Five traits were accurately differentiated on the basis of the facial features
(conscientiousness was the exception), as was physical health.
Again, to identify which socially salient characteristics are affected by politi-
cians' bodily movements and whether these can be used to predict voting behaviour,
Kramer, Arend, and Ward (2010) used film segments of a presidential debate
(Obama vs. McCain) converted into stick-figure displays as stimuli. In these displays,
10 landmarks (eyes, shoulders, elbows, wrists, tie knot, tie point) were manually
identified for each frame and an animation was produced. The study found that only
physical health, perceived only from body motion, predicted the participants’ voting
behaviour.
Besides pictures, films and other recorded stimuli, ad hoc simulated situations can
often be used as stimuli for decoding research, where the bodily signals of a confederate
serve as the stimulus for the participants. Some people (the experimenter's confederates)
are trained to perform particular bodily behaviours towards participants who are
unaware of this.
The aim is to measure behavioural and/or evaluative reactions to the confederate’s
bodily performances. In a study on the role of interpersonal behaviour in perpetuating
stereotypes, Castelli et al. (2009), for example, trained a confederate to perform two
specific bodily behaviours, namely touching her own face and crossing her legs, while
responding (consistently or inconsistently with the stereotype) to experimental
participants' questions about the elderly. The experimenters expected these bodily
actions to be mimicked by the participants when the answers of the confederate were
consistent with the stereotype of the elderly. The findings indicate that people voicing
stereotypes receive subtle bodily cues from the audience that can retroactively reinforce
their behaviours and thus make the dismissal of the stereotype difficult to achieve.
Maddux, Mullen,
and Galinsky (2008) manipulated bodily behaviours of experimental participants to
examine their effect on negotiation outcomes. The authors instructed some participants

to mimic the mannerisms of their negotiation partner in order to get a better outcome.
Negotiators who mimicked the mannerisms of their opponents secured better individual
outcomes, and their dyads as a whole also performed better when mimicking occurred
than when it did not.
Other bodily features can be manipulated in experimental studies with confederates.
In another study, for example, confederates interviewing naïve participants were
instructed to provide four different kinds of interruption (change-subject, same-subject,
disagreement, and supportive), at two different rates, more or less frequently (Gnisci
et al. 2012). At the end of the interview, partici-
pants were asked to answer questions on the interview they had just gone through.
Results showed that the negative effects of change- and same-subject interruptions
were amplified when they were more frequent, as were the positive effects of supportive
interruptions. Contrary to expectations, disagreement interruptions were regarded as
positive. The results as a whole provide support for the amplification hypothesis,
whereby the frequency of interruptions during an interaction amplifies the positive or
negative effect of the type of interruption on the interruptee.

3. Data collection: Measurement procedures


Regardless of whether the study uses a lab or field experiment or whatever the stimulus
materials may be, the researcher will have to record people’s reactions to bodily beha-
viour. The distinction between direct and indirect measures, introduced in many fields
of psychological research, proves to be pertinent and useful also for the study of bodily
gesture perception. Measurement procedures that provide an index of the to-
be-measured construct without asking the participant directly for the information,
but inferring it on the basis of another behaviour, are called indirect (De Houwer
2006). This means that the construct, which is supposed to determine the measurement
outcome at least partially, can be inferred from the outcome itself.
The distinction between direct and indirect measures was first introduced out of the
need to obtain data not influenced by individual characteristics, such as social
desirability, self-presentation or lack of awareness and introspection (Dovidio and Fazio
1992). Furthermore, in the last decades, the concept of the mind’s dual functioning also
increased the theoretical and empirical relevance of considering both direct and indi-
rect measures. In social psychology, dual models are currently the model of choice to
explain interpersonal perception, impression formation, attitudes, etc., all fitting for
bodily decoding. These models distinguish between explicit and implicit, controlled
and uncontrolled, deliberative and spontaneous processes, that is, between processes
based on associative mechanisms and those based on semantic and noetic ones (Evans
2008; Smith and DeCoster 2000; Strack and Deutsch 2004). Direct and indirect
measures are conceptualised as tapping into different kinds of individual responses
(namely, direct measures elicit explicit, controlled and deliberative responses, while
indirect ones capture implicit, spontaneous and associative-based reactions); considering
both thus makes it possible to collect more information on how bodily gestures are
perceived, through both reflective and associative processes.
Applied to studies on bodily communication, information on the participants'
interpretation and evaluation of the presented body stimuli can be obtained through
different types of measure:

(i) direct measures: asking participants directly about their perception and evaluation
of bodily signals;
(ii) indirect measures: reactions and evaluations are assessed indirectly, without
explicitly asking the participants.

3.1. Direct measures


Direct measures consist of interviews, self-reported questionnaires, inventories, scales,
etc., in which participants are asked about their perception of the performers in the
stimulus material (or about the message, etc.). These measures are based on answers
with a substantial component of deliberation, reflection, elaboration, description, etc.
The aim is to understand how participants process (with or without awareness) others’
bodily signals and then deliberately decode these signals. These tools of data collection
can include oral or written questions of two different types, open and closed, depending
on the type of answer they call for. Examples of open questions are: "Describe
the kind of relationship between the two persons in the picture"; "How would you define
the person in the photo?"
Closed questions propose specific alternative answers to the respondent; these
answers can differ from each other either qualitatively or quantitatively. In
the first case, an example of qualitatively different answers in a question is: “Did the
person in the movie lie?” (alternative answers are “Yes” or “No”); or “For which of
the two persons depicted in the photo would you vote if they were candidates in the
upcoming political elections?” (alternative answers are “Candidate A” or “Candidate
B”); in the second case, an example of quantitatively different answers to a question
is given by response scales measuring the strength of the evaluation. For example:

(1) “How do you evaluate the person in the video?”


Hostile 1 2 3 4 5 6 7 Friendly
Incapable 1 2 3 4 5 6 7 Competent

In this case, the respondent marks the number representing the intensity of the evalu-
ation s/he gives to the stimulus (1–7 point scale: 1 is nearer to the left adjective, 7 to the
right adjective; 4 indicates a neutral evaluation). This format tries to extract numeric
values that quantitatively reflect the psychological process to be measured (the
evaluation of the other person), in order to use these "numbers" for subsequent
quantitative data analyses (Henerson, Morris, and Fitz-Gibbon 1987) and finally to
derive statistical conclusions.
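As an illustration of how such scale responses yield quantitative data, the following minimal sketch averages one respondent's 1–7 bipolar ratings into a single evaluation score; the function name, adjective labels and ratings are invented for illustration and are not taken from the studies cited:

```python
# Minimal sketch of scoring 7-point bipolar (semantic differential) ratings.
# The function name, adjective pairs and ratings are hypothetical.

def score_semantic_differential(ratings):
    """Average a respondent's 1-7 ratings into one evaluation score."""
    for r in ratings:
        if not 1 <= r <= 7:
            raise ValueError("ratings must lie on the 1-7 scale")
    return sum(ratings) / len(ratings)

# One respondent's ratings of the person in the video:
# hostile-friendly = 6, incapable-competent = 5
print(score_semantic_differential([6, 5]))  # 5.5, above the neutral midpoint of 4
```

Scores above the midpoint of 4 would then indicate an overall positive evaluation of the stimulus person.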
It is also possible to use an ordinal response scale, for example for measuring the
perception of social influence among the members of a group during a work discussion:
group participants are asked to rank the other group members from first to last
position, i.e., from greatest to least impact on the final group decision (Maricchiolo
et al. 2011).
Compared to closed questions, open questions provide more information about the
participants’ evaluations, but they are harder to interpret and need coding work to yield
quantitative data. They are particularly useful in exploratory or preliminary studies,
given the richness of detail they provide.
Standardized tools for measuring the decoding of bodily signs exist. The Interper-
sonal Perception Task (IPT, Costanzo and Archer 1989), for example, is a videotape
about bodily communication and social perception. Viewers see 30 brief scenes, each

30 to 60 seconds long. After each scene, viewers are asked to guess or "decode" some-
thing about it. The 30 Interpersonal
Perception Task scenes contain a full range of spontaneous body behaviours in context
and different modes of interaction (e.g., telephone, face-to-face, face-to-camera). For
each scene, there is one objectively correct answer. The answers concern five common
types of social judgment: intimacy, competition, deception, kinship, and status. The
viewer can try to determine the correct answer by "reading"
bodily behaviour: e.g., facial expression, tone of voice, gesture, touch, glance, hesitation,
etc. Such a tool can be used to measure the ability and sensitivity to identify body
cues of interpersonal perception, comparing perceived with actual cues. Studies on the relia-
bility of the instrument (Archer and Costanzo 1988; Costanzo and Archer 1989) found
that the internal consistency rating (Kuder-Richardson Formula 20, KR-20) of the
instrument is .52 (The Kuder-Richardson Formula 20 is a measure of internal consis-
tency reliability for measures with dichotomous choices; Kuder and Richardson
1937). However, in spite of the low internal consistency, the Interpersonal Perception
Task seems to have fairly good validity as indicated by peer reports and measures of
convergent validity (Costanzo and Archer suggested that the internal consistency was
relatively low because the Interpersonal Perception Task sampled a diverse range of
scenes). Measures of the viewers' accuracy and confidence (Smith, Archer, and Cost-
anzo 1991) showed no significant correlation between the two: both depend on the
modality of stimulus presentation (more confidence in the audiovisual condition, but
more accuracy in the visual-only condition), on gender (women are more accurate but
less confident than men), and on culture.
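The Kuder-Richardson Formula 20 mentioned above can be sketched as follows; the formula itself follows Kuder and Richardson (1937), while the function name and the toy matrix of dichotomous item scores are invented for illustration:

```python
# Sketch of the Kuder-Richardson Formula 20 (KR-20) for dichotomous (0/1)
# items: KR20 = (k/(k-1)) * (1 - sum(p_j * q_j) / var(total scores)).
# The respondent-by-item matrix below is invented.

def kr20(item_matrix):
    """item_matrix: one list of 0/1 item scores per respondent."""
    n = len(item_matrix)     # number of respondents
    k = len(item_matrix[0])  # number of items
    totals = [sum(person) for person in item_matrix]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n  # population variance
    pq = 0.0
    for j in range(k):
        p = sum(person[j] for person in item_matrix) / n  # proportion scoring 1
        pq += p * (1 - p)
    return (k / (k - 1)) * (1 - pq / var_total)

data = [[1, 1, 0, 1], [1, 0, 0, 1], [0, 0, 0, 1], [1, 1, 1, 1]]
print(round(kr20(data), 3))  # 0.667
```

A value near 1 indicates that the items measure the construct consistently; the IPT's value of .52 is low by this standard, which Costanzo and Archer attribute to the deliberate diversity of its scenes.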
Another example of a standardized tool for measuring sensitivity to bodily commu-
nication is the Profile of Nonverbal Sensitivity (PONS; Rosenthal et al. 1979). This
instrument is a standardized measure of accuracy in comprehending bodily communi-
cation and captures the individual difference in sensitivity at decoding posed emotional
states from bodily communication. The Profile of Nonverbal Sensitivity is a 45-min
16 mm sound film (or videotape) that comprises 220 two-second auditory or visual seg-
ments showing a single individual portraying various emotional states. The viewer has
to decide which of two behavioural alternative answers best describes the segment. The
220 segments represent scenarios from four affective quadrants (positive-dominant,
positive-submissive, negative-dominant, negative-submissive) crossed by 11 nonverbal
channels (e.g., face, body, tone of voice). The internal consistency of the Profile of Non-
verbal Sensitivity ranges from .86 to .92, and its median test-retest reliability
is .69. Like the Interpersonal Perception Task, the Profile of Nonverbal Sensitivity discri-
minates individual sensitivity to bodily communication, with respect to age (adults are
more sensitive than children), to gender (women are more sensitive than men), to mental
health (psychopathic patients are less sensitive than healthy people), and to culture.
Direct methods asking for opinions, preferences and evaluations are easy to
administer and, when adequately structured, they have high reliability and validity.
Nevertheless, in order to provide adequate measures of the to-be-assessed construct,
they require respondents to be able to accurately identify and report their opinions
and perceptions, and to be willing to express their actual thoughts, even in the case
of socially undesirable answers.
Researchers know that these conditions are often not met, so that, above all
when a socially relevant and sensitive issue is at stake, people may provide inaccurate
answers. To clear this hurdle, several ways of indirectly assessing individuals'
evaluations have been proposed. The next section describes some of the most
widespread tools for the indirect measurement of attitudes and evaluations in response
to bodily forms of communication and interaction.

3.2. Indirect measures


The peculiarity of indirect measures lies in the opportunity to collect information on
individuals' perceptions and evaluations without asking people what they think
about the element of interest. Stated in other words, they allow access to information
without asking for it directly, inferring it instead from other signals (for example,
through psycho-physiological indexes or neuro-imaging data) or from the performance
in a different task, apparently not linked to the construct to be measured (such as
reaction times in categorisation tasks).
The psycho-physiological measures include galvanic skin conductance, heart rate,
pupil dilation and variation in vascular activity as indexes of positive reactions and cere-
bral activation, whereas the activation of different facial muscles can be interpreted as
an indicator of positive or negative reactions (labial and corrugator muscles, respec-
tively). Eye movements, too, are considered indirect indicators of psychological
constructs, for example, eye blinking is generically related to the amount of attention,
while eye tracking represents an index of the quantity and of the direction of individual
attention. Gullberg and Kita (2009), for example, used eye fixation as a measure of
attention paid by perceivers to the speaker’s hand gestures. These and other psycho-
physiological indexes are listed in Tab. 57.1, together with the associated psychological
responses.
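As a sketch of how such a fixation-based attention index can be computed, assuming eye-tracking records already labelled with areas of interest (the function name, labels and durations below are hypothetical, not data from Gullberg and Kita 2009):

```python
# Sketch: proportion of fixation time spent on a speaker's gestures, an
# indirect index of the attention perceivers pay to them. The area-of-
# interest labels and durations (ms) are invented.

def gesture_fixation_share(fixations):
    """fixations: list of (area_of_interest, duration_ms) tuples."""
    total = sum(d for _, d in fixations)
    on_gesture = sum(d for aoi, d in fixations if aoi == "gesture")
    return on_gesture / total if total else 0.0

fixations = [("face", 3200), ("gesture", 400), ("face", 2100), ("gesture", 300)]
print(round(gesture_fixation_share(fixations), 3))  # 0.117
```

Such a proportion can then be compared across stimulus conditions, for instance to test whether certain gesture types attract more overt attention than others.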
Such measures, if properly applied, show how people react to different types of
bodily communication. To give an example, the effect produced by the perception of
different kinds of hand gestures on the evaluation of a message has been assessed
through an electromyographic (EMG) study of zygomatic major and corrugator super-
cilii muscles activity, indicating respectively pleasant and unpleasant feelings (Maric-
chiolo et al. 2008). Furthermore, heart rate and spontaneous eye blinking were also
measured as expressions of physiological arousal and vigilance/attention during mes-
sage transmission.
Since the decoding process is associated with other cognitive abilities and efforts,
neuro-imaging techniques provide relevant information on the activation of specific
cerebral areas, indicating the engagement of specific cognitive processes (memory,
reasoning, etc.) as well as the activation of areas indicating a positive or negative
reaction to a stimulus.
These techniques can be applied to the analysis of bodily communication in order to
elucidate the contribution of the different brain areas involved in the perception,
decoding and evaluation of bodily communication.
Besides these techniques, providing indexes of neurologic and physiological corre-
lates of psychological processes, several tools and procedures have been recently
proposed and validated as indirect measures tapping into implicit (associative or impul-
sive) processes. They are generally based on reaction time measurement. The most used
is the Implicit Association Test (IAT, Greenwald, McGhee, and Schwartz 1998). If, for
example, the interest is in measuring the implicit evaluation of different types of

Tab. 57.1: Summary of the physiological measures employed as measurement tools.

Measure | Physiological index | Psychological correlates
Galvanic skin resistance | Skin electrical conductance | Arousal, activation
Heart pulsation | Variation of heart beating (electrocardiogram) | Pleasure, activation, attention, recognition
Vascular activity | Variation in blood volume and blood pressure | Pleasure, arousal, activation, recognition
Facial muscle activity | Electrical muscle contraction (electromyogram) | Reaction:
– corrugator supercilii muscles | | negative
– labial muscles | | positive
Pupil dilation | Temporary variation in the pupil dimension | Positive reactions, pleasure, arousal, attention, elaboration
Eye blinking | Eyelid beating | Attention
Eye tracking | Point of gaze and eye movement | Attention level and direction
Event-related evoked potentials (electrical modification of the nervous system) | Latency, magnitude, and electrical wave position (electroencephalogram) | Attention, elaboration
Brain imaging | Modification of the chemical composition or of the flow of the brain fluids (functional magnetic resonance imaging) | Pleasure, arousal, memory, elaboration

Source: elaborated from Blascovich (2000); Fogarty and Stern (1989); Fridlund and Cacioppo
(1986); Gullberg and Holmqvist (1999); Kelly, Kravitz, and Hopkins (2004); Wang and Minor
(2008).

gestures (i.e., ideational vs. self-adaptors, according to the results by Maricchiolo
et al. 2009) on the positive-negative dimension, the Implicit Association Test procedure
can be employed as described below.
Resting on the idea that concepts sharing the same valence are more strongly asso-
ciated, the Implicit Association Test requires the respondent to categorise two opposite
classes of targets (i.e., ideational gestures vs. self-adaptors, according to Maricchiolo
et al. 2009) and two classes of attributes with opposite valence (i.e., positive and negative)
using only two categorisation keys. In different steps (see Tab. 57.2), each target shares
the response key with a positive attribute or with a negative attribute. Basically, the
idea is that if both the attribute and the target are positively connoted, the respondent
will be faster in executing this task, and slower when the target is evaluated positively
but associated with a negative attribute. On the contrary, if the target is evaluated nega-
tively, when asked to categorise the negative attribute and the target using the same
response key, the respondent will find this task easier and will be faster in executing it.
Reaction time in the categorisation task, thus, represents an indirect measure of the asso-
ciation between the target and the attribute. Because of the presence of two targets and
two attributes with opposite valence, the obtained index is a “relative” index, namely an
index of preference for one of two objects considered in opposition (that is, a more
positive evaluation of one of the two categories of gestures).
57. Decoding bodily forms of communication 913

Of course, there is not always a specific or unambiguous counterpart for a target,
so the Implicit Association Test is not always suited to studying the automatic or
implicit evaluation of a given target.

Tab. 57.2: Sequential phases of the Implicit Association Test procedure. Subtracting the response
latencies of phase 3 from those of phase 5 yields an index of preference, where negative values
indicate a more positive evaluation of self-adaptors while positive values indicate a more positive
evaluation of ideational gestures (the computational procedure is described in Greenwald, Nosek,
and Banaji 2003).

Phase   Task                                        Example
1       Single categorisation of attributes         Positive (on the right of the screen) versus
                                                    Negative (on the left)
2       Single categorisation of concepts           Ideational (right) versus Self-adaptors (left)
3       Combined categorisation of attributes       Positive and Ideational (right) versus
        and concepts                                Negative and Self-adaptors (left)
4       Single categorisation of concepts           Self-adaptors (right) versus Ideational (left)
        (reversed on the screen)
5       Combined categorisation of attributes       Positive and Self-adaptors versus Negative
        and concepts (reversed)                     and Ideational
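The phase-5 minus phase-3 latency subtraction can be sketched in a few lines of code. This is a minimal illustration, not the full improved scoring algorithm of Greenwald, Nosek, and Banaji (2003): it simply standardises the difference in mean latencies by the pooled standard deviation of the combined-phase trials. The function name and the sample latencies are hypothetical.

```python
from statistics import mean, stdev

def iat_preference_index(phase3_ms, phase5_ms):
    """Simplified IAT index: mean latency of phase 5 minus phase 3,
    divided by the standard deviation of all combined-phase trials.
    Positive values: the respondent was slower when 'positive' shared
    a key with self-adaptors (phase 5), i.e. a more positive implicit
    evaluation of ideational gestures; negative values: the reverse."""
    difference = mean(phase5_ms) - mean(phase3_ms)
    pooled_sd = stdev(phase3_ms + phase5_ms)
    return difference / pooled_sd

# Hypothetical response latencies (milliseconds) for the two combined phases:
phase3 = [620, 655, 640, 610, 675]   # Positive + Ideational
phase5 = [720, 760, 705, 745, 730]   # Positive + Self-adaptors
index = iat_preference_index(phase3, phase5)  # positive: ideational gestures preferred
```

A full implementation would additionally apply the trial-level filtering and block-wise standardisation described by Greenwald, Nosek, and Banaji (2003).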

Other implicit measures have been proposed in order to provide an absolute index of
the measured construct, such as the Single Category Implicit Association Test (SCIAT;
Karpinski and Steinman 2006) or a tool flexible in establishing contextual characteris-
tics for the evaluative situation (Go-No go Association Task; GNAT; Nosek and Banaji
2001). These instruments and several others, together with their peculiarities,
differences and similarities, are described by De Houwer and Moors (2010; see also
De Houwer et al. 2009).
Similarly based on reaction time, but relying on a different task, is the Affective
Priming Task (Fazio et al. 1995). It rests on the idea that exposure to an evaluated
object (the prime) facilitates the judgment of a target attribute if the subsequent
attribute is connoted with the same valence as the prime. In other words, if the prime
is evaluated as positively as the subsequent target, the judgment of the target is faster;
on the contrary, if a positive prime is followed by a negative attribute, the evaluation
of the target (the negative attribute) is slower. For example, if a negative facial
expression (evaluated negatively by an individual) is used as a prime, it will
automatically activate negative evaluations. If the presentation of this prime is
followed by a negative target attribute (e.g. terrible), the respondent will indicate
the connotation of the word faster than when the following word is a positive attribute
(e.g. wonderful ). Thus, the respondent's performance, in terms of reaction time, in the
task of indicating the valence of the attributes
shows how the respondent evaluates the prime (positively or negatively).
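The congruence logic just described can be made concrete with a small sketch. The data structure, function name, and latencies below are hypothetical, not taken from Fazio et al. (1995): the priming effect is computed as the mean latency on valence-incongruent trials minus that on congruent trials, so a positive value reflects the facilitation pattern described above.

```python
from statistics import mean

def priming_effect_ms(trials):
    """trials: (prime_valence, target_valence, rt_ms) triples, with
    valences coded as 'pos' or 'neg'. Returns mean incongruent RT minus
    mean congruent RT; a positive value means that same-valence primes
    speeded up the valence judgement of the target."""
    congruent = [rt for prime, target, rt in trials if prime == target]
    incongruent = [rt for prime, target, rt in trials if prime != target]
    return mean(incongruent) - mean(congruent)

# Hypothetical trials: e.g. a 'neg' prime (an angry face) followed by
# 'terrible' (neg target) or by 'wonderful' (pos target).
trials = [
    ('neg', 'neg', 540), ('pos', 'pos', 560),   # congruent
    ('neg', 'pos', 640), ('pos', 'neg', 660),   # incongruent
]
effect = priming_effect_ms(trials)  # positive: congruent trials were faster
```

In a real analysis the effect would of course be computed separately per prime, since it is the prime-specific facilitation that reveals how that prime is evaluated.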
Shariff and Tracy (2009), using three different tools for assessing implicit associa-
tions, tested whether the bodily expression of pride is automatically detected and eval-
uated as a signal of the status of a group member. Their findings indicate that bodily
cues of pride are automatically processed as indicating high status, suggesting an inno-
vative application of the indirect measures to the analysis of bodily communication.
The use of indirect measures and the analysis of automatic (implicit) associations
have several important implications and considerable potential. When used together with explicit
914 V. Methods

measures, they can provide information on different kinds of cognitive processing of bodily
cues and on the automatic perception of the sender by the recipient. These considerations
become even more relevant given that automatically activated evaluations influence
the deliberate and rational cognitive processes leading to explicit evaluations and overt
behaviour (Strack and Deutsch 2004).

3.3. Comparing direct and indirect measures


As stated above, direct and indirect measures are assumed to tap into different cognitive
systems and processes, so the information they provide is, at least in part, different. For this
reason, it cannot be concluded that one class of measures is more useful than the other.
They both have advantages and disadvantages. Direct measures are easier to administer
and provide outcomes that are easier to interpret. Indeed, it is much easier to be sure that
the respondent understands what s/he is asked to evaluate and that s/he can execute her/
his task correctly (answering the questions). However, this advantage of direct mea-
sures may sometimes turn into a limitation, for example, when the respondent has
good reasons to hide her/his actual attitude or is unaware of it. Among the factors known
to reduce the validity of direct measures and to influence the truthfulness of answers is
“social desirability” (Ganster, Hennessey, and Luthans 1983), an individual's tendency to
present him or herself in a way that positively matches culturally derived norms and
standards; another is the so-called “self-presentational bias”, namely the attempt to
provide a desirable image of oneself on relevant features (Schlenker 1981). Since they
operate partly outside the individual's control, indirect measures help to bypass the
influences exerted by these factors. On the other hand, however, they present some
disadvantages of their own: they generally require a longer administration time, are often
implemented on a computer, their reliability is sometimes unsatisfactory, and different
indirect measures of the same construct often do not correlate.
Some authors (Hofmann et al. 2005; Nosek 2005) have analysed this relationship and
highlighted a wide variation in the correlation values identified, ranging from very
low (close to 0) to very high. As a further step, they also
identified several classes of potential moderators of this relation, such as the above-
cited social desirability and self-presentation bias, but also the lack of introspective
access to implicit evaluations, the peculiarities of the object to be evaluated and the
characteristics of the assessed construct (evaluative strength, distinctiveness and
dimensionality). In light of these findings, it becomes all the more relevant to test
direct and indirect measures, and their relationship, in each specific research field.
There are two main aims: the first consists in identifying a class of instruments and
procedures specific to a given research area; the second is to theorise and empirically
verify the interplay between automatic and deliberative cognition, concerning objects
and processes which are specific to different human psychological and behavioural domains.

4. Conclusions
This chapter has aimed to introduce the main research methods and techniques for
analysing the decoding of bodily communication. Several tools, falling into the two
macro-categories of direct and indirect measures, have been described together with their
scope, their advantages and their disadvantages. Obviously, direct and indirect measures
have some overlapping characteristics and some specificities. These features make them
more or less adequate and useful for different research fields and objects of study. In gen-
eral, however, they should not be considered as excluding each other. On the contrary, they
can be used jointly, in order to gain information on different cognitive processes (automatic
and reflective) involved in the decoding and evaluation of bodily signals. Indeed, even if
studies on the automatic evaluation of bodily communication are currently not very frequent,
it would be interesting to identify possible differences in the outcomes of automatic
(associative) and deliberative (reflective) processing of bodily communicative signals.
The presence of advantages and disadvantages in both of these classes of measurement
tools strengthens the idea that, whenever possible and useful, direct and indirect measures
should be used together, both to tap into different cognitive processes, as stated
above, and to analyse the relationship and the reciprocal influences between
the assessed constructs (Gawronski and Bodenhausen 2006; Strack and Deutsch 2004).

5. References
Archer, Dane and Mark Costanzo 1988. The Interpersonal Perception Task (IPT) for Instructors
and Researchers. Berkeley: University of California Extension Media Center.
Blascovich, James J. 2000. Psychophysiological methods. In: Harry T. Reis and Charles M. Judd
(eds.), Handbook of Research Methods in Social Psychology, 117–137. Cambridge: Cambridge
University Press.
Borsboom, Denny, Gideon J. Mellenbergh and Jaap Van Heerden 2004. The concept of validity.
Psychological Review 111: 1061–1071.
Burgoon, Judee K. 1991. Relational message interpretations of touch, conversational distance, and
posture. Journal of Nonverbal Behavior 15: 233–259.
Castelli, Luigi, Giulia Pavan, Elisabetta Ferrari and Yoshihisa Kashima 2009. The stereotyper and
the chameleon: The effects of stereotype use on perceivers’ mimicry. Journal of Experimental
Social Psychology 45: 835–839.
Costanzo, Mark and Dane Archer 1989. Interpreting the expressive behavior of others: The Inter-
personal Perception Task. Journal of Nonverbal Behavior 13: 225–245.
Cronbach, Lee J. and Paul E. Meehl 1955. Construct validity in psychological tests. Psychological
Bulletin 52: 281–302.
De Houwer, Jan 2006. What are implicit measures and why are we using them? In: Reinout W.
Wiers and Alan W. Stacy (eds.), The Handbook of Implicit Cognition and Addiction, 11–28.
Thousand Oaks, CA: Sage.
De Houwer, Jan and Agnes Moors 2010. Implicit measures: Similarities and differences. In: Ber-
tram Gawronski and B. Keith Payne (eds.), Handbook of Implicit Social Cognition: Measure-
ment, Theory, and Applications, 176–193. New York: Guilford Press.
De Houwer, Jan, Sarah Teige-Mocigemba, Adriaan Spruyt and Agnes Moors 2009. Implicit mea-
sures: A normative analysis and review. Psychological Bulletin 135: 347–368.
DePaulo, Bella M. and Howard S. Friedman 1998. Nonverbal communication. In: Daniel T. Gil-
bert, Susan T. Fiske and Gardner Lindzey (eds.), The Handbook of Social Psychology, 4th edi-
tion, 3–39. New York: McGraw-Hill.
Dovidio, Jack F. and Russel H. Fazio 1992. New technologies for the direct and indirect assessment
of attitudes. In: Judith Tanur (ed.), Questions about Questions: Meaning, Memory, Expression,
and Social Interactions in Surveys, 204–237. New York: Russell Sage Foundation.
Ekman, Paul 1972. Universals and cultural differences in facial expressions of emotion. In: James K.
Cole (ed.), Nebraska Symposium on Motivation, 207–282. Lincoln: University of Nebraska Press.
Evans, Jonathan 2008. Dual-processing accounts of reasoning, judgment, and social cognition.
Annual Review of Psychology 59: 255–278.
Fazio, Russel H., Joni R. Jackson, Bridget C. Dunton and Carol J. Williams 1995. Variability in
automatic activation as an unobstrusive measure of racial attitudes: A bona fide pipeline? Jour-
nal of Personality and Social Psychology 69: 1013–1027.
Fogarty, Christine and John A. Stern 1989. Eye movements and blinks: Their relationship to
higher cognitive processes. International Journal of Psychophysiology 8: 35–42.
Fridlund, Albert J. and John T. Cacioppo 1986. Publication guidelines for human electromyo-
graphic research. Psychophysiology 23: 567–589.
Ganster, Dan C., Harry W. Hennessey and Fred Luthans 1983. Social desirability response effects:
Three alternative models. The Academy of Management Journal 26: 321–331.
Gawronski, Bertram and Galen V. Bodenhausen 2006. Associative and propositional processes in
evaluation: An integrative review of implicit and explicit change. Psychological Bulletin 132:
692–731.
Gifford, Robert 1986. SKANSIV: Seated Kinesic Activity Notation System, Version 4.1. (Available
from Robert Gifford, Department of Psychology, University of Victoria, Victoria, British
Columbia, Canada V8W 2Y2).
Gifford, Robert 1994. A lens framework for understanding the encoding and decoding of interpersonal
dispositions in nonverbal behaviour. Journal of Personality and Social Psychology 66: 398–412.
Gnisci, Augusto, Ida Sergi, Elvira De Luca and Vanessa Errico 2012. Does frequency of interrup-
tions amplify the effect of various types of interruptions? Experimental evidence. Journal of
Nonverbal Behavior 36(1): 39–57.
Greenwald, Antony G., Debbie E. McGhee and Jordan L. K. Schwartz 1998. Measuring individual
differences in implicit cognition: The implicit association test. Journal of Personality and Social
Psychology 74: 1469–1480.
Greenwald, Antony G., Brian A. Nosek and Mahzarin R. Banaji 2003. Understanding and using
the Implicit Association Test: I. An improved scoring algorithm. Journal of Personality and
Social Psychology 85: 197–216.
Guerrero, Laura K. and Tammy A. Miller 1998. Associations between nonverbal behaviors and
initial impressions of instructor competence and course context in videotaped distance educa-
tion courses. Communication Education 47: 30–42.
Gullberg, Marianne and Karl Holmqvist 1999. Keeping an eye on gestures: Visual perception of
gestures in face-to-face communication. Pragmatics and Cognition 7: 35–63.
Gullberg, Marianne and Sotaro Kita 2009. Attention to speech-accompanying gestures: Eye
movements and information uptake. Journal of Nonverbal Behavior 33: 251–277.
Henerson, Marlene E., Lynn L. Morris and Carol T. Fitz-Gibbon 1987. How to Measure Attitudes.
Newbury Park, CA: Sage.
Hofmann, Wilhelm, Bertram Gawronski, Tobias Gschwendner, Huy Le and Manfred Schmitt
2005. A meta-analysis on the correlation between the Implicit Association Test and explicit
self-report measures. Personality and Social Psychology Bulletin 31: 1369–1385.
Karpinski, Andrew and Ross B. Steinman 2006. The single category implicit association test as a
measure of implicit social cognition. Journal of Personality and Social Psychology 91: 16–32.
Kelly, Spencer D. and Linda J. Goldsmith 2004. Gesture and right hemisphere involvement in
evaluating lecture material. Gesture 4(1): 25–42.
Kelly, Spencer D., Corinne Kravitz and Michael Hopkins 2004. Neural correlates of bimodal
speech and gesture comprehension. Brain and Language 89: 253–260.
Kramer, Robin S. S., Isabel Arend and Robert Ward 2010. Perceived health from biological
motion predicts voting behaviour. Quarterly Journal of Experimental Psychology 63: 625–632.
Kramer, Robin S. S. and Robert Ward 2010. Internal facial features are signals of personality and
health. Quarterly Journal of Experimental Psychology 63: 2273–2287.
Kuder, G. Frederic and Marion W. Richardson 1937. The theory of the estimation of test reliabil-
ity. Psychometrika 2: 151–160.
Maddux, William W., Elizabeth Mullen and Adam D. Galinsky 2008. Chameleons bake bigger
pies and take bigger pieces: Strategic behavioral mimicry facilitates negotiation outcomes.
Journal of Experimental Social Psychology 44: 461–468.
Maricchiolo, Fridanna, Marino Bonaiuto, Augusto Gnisci and Gianluca Ficca 2008. Effects of
hand gestures on persuasion process. Presented at the 15th General Meeting of the European
Association of Experimental Social Psychology, Opatija, Croatia, 10–14 June 2008.
Maricchiolo, Fridanna, Augusto Gnisci, Marino Bonaiuto and Gianluca Ficca 2009. Effects of dif-
ference types of hand gestures in persuasive speech on receivers’ evaluations. Language and
Cognitive Processes 24: 239–266.
Maricchiolo, Fridanna, Stefano Livi, Marino Bonaiuto and Augusto Gnisci 2011. Hand gesture and
perceived influence in small group interaction. Spanish Journal of Psychology 14(2): 755–764.
Nosek, Brian A. 2005. Moderators of the relationship between implicit and explicit evaluation.
Journal of Experimental Psychology: General 134: 565–584.
Nosek, Brian A. and Mahzarin R. Banaji 2001. The go/no-go association task. Social Cognition 19:
161–176.
Rosenthal, Robert, Judit A. Hall, Robin M. DiMatteo, Peter L. Rogers and Dane Archer 1979.
Sensitivity to Nonverbal Communication: The PONS Test. Baltimore: Johns Hopkins Univer-
sity Press.
Schlenker, Barry R. 1981. Self-Presentation: A conceptualization and model. Paper presented at
the 89th Annual Convention of the American Psychological Association, Los Angeles, CA,
August 24–26, 1981.
Shariff, Azim and Jessica Tracy 2009. Knowing who’s boss: implicit perceptions of status from the
nonverbal expression of pride. Emotion 9: 631–639.
Smith, Eliot R. and Jamie DeCoster 2000. Dual process models in social and cognitive psychology:
Conceptual integration and links to underlying memory systems. Personality and Social Psy-
chology Review 4: 108–131.
Smith, Heather J., Dane Archer and Mark Costanzo 1991. “Just a hunch”: Accuracy and awareness
in person perception. Journal of Nonverbal Behavior 15: 3–18.
Strack, Fritz and Roland Deutsch 2004. Reflective and impulsive determinants of social behavior.
Personality and Social Psychology Review 8: 220–247.
Wang, Yong J. and Michael S. Minor 2008. Validity, reliability, and applicability of psychophysio-
logical techniques in marketing research. Psychology and Marketing 25: 197–232.
Wiggins, Jerry S. 1979. A psychological taxonomy of trait-descriptive terms: The interpersonal
domain. Journal of Personality and Social Psychology 37: 395–412.

Fridanna Maricchiolo, Rome (Italy)
Angiola Di Conza, Naples (Italy)
Augusto Gnisci, Naples (Italy)
Marino Bonaiuto, Rome (Italy)

58. Analysing facial expression using the facial action coding system (FACS)
1. Why do we need a standardised facial expression coding system?
2. The Facial Action Coding System (FACS).
3. Applications of the Facial Action Coding System
4. Modifications of Facial Action Coding System
5. Conclusion
6. References

Abstract
The Facial Action Coding System (FACS) was developed in 1978 to provide a common
language with which to translate facial expression descriptions, observations and findings
between studies, research groups, populations and species. As the object of study is visual
and subject to perceptual specialization (faces), an objective, common language is essen-
tial. FACS is anatomically based, and aims to identify the minimal units of facial move-
ment related to the underlying muscle movement. Since development of FACS, scientists
have modified the system for use with babies, and also other species. These recent exten-
sions have injected much needed standardisation across research areas, and mean that
larger scale comparative and evolutionary analyses can now be conducted.

1. Why do we need a standardised facial expression coding system?
Facial expression is sometimes considered one of the only truly universal human beha-
viours (Brown 1991). Six basic facial expressions (happy, sad, angry, disgusted, fearful
and surprised) are recognised similarly in different cultures (Ekman, Sorenson, and
Friesen 1969). These basic facial expressions do not appear to require significant learn-
ing or experience, but rather can be produced even in the absence of a model during
development, as evidenced by research showing that blind athletes produce similar
facial expressions to sighted athletes (Matsumoto and Willingham 2009). Further, the
muscles which are essential for the production of the six basic expressions were
found to be invariant in dissection studies (Waller, Cray, and Burrows 2008). In con-
trast, other facial muscles can vary in size, shape and occurrence from person to person.
These additional muscles may be used in more variable facial expressions, which are
subject to greater cultural and individual influences (Waller et al. 2008). Facial expres-
sions have clear counterparts in related species (see Parr, Waller, and Vick 2007), sug-
gesting that they are not recent evolutionary developments, but instead are rooted in a
distant phylogenetic past. There is considerable evidence, therefore, that these basic
facial expressions are biologically based and innate.
Unsurprisingly, elements of the human mind seem to be specifically designed to han-
dle the information available in facial expressions, ensuring that responses are accurate
and efficient. We do not need to be aware of the details of a facial expression in order
to respond appropriately, suggesting that facial expression recognition relies on
hard-wired perceptual shortcuts. Such perceptual automaticity is useful in order
to navigate the social world as a human being, but has serious implications for scien-
tific observation. Automatic, streamlined perceptual processing may be a hindrance
when scientists wish to record the morphological detail of facial expressions with rig-
orous objectivity. To achieve this goal, scientific observation requires standardised
methods.
Universal facial expressions are processed as whole, discrete units and so identifica-
tion of individual components can be affected by the position and shape of other com-
ponents (the configural whole). Calder and colleagues (2000) demonstrated this effect
by forming composite facial expressions using components of one expression paired
with components of another (for example, a “sad” upper face paired with a “disgusted”
lower face). Participants were less able to identify the individual parts in the composite
condition, than when they viewed the same expression misaligned (for example, a “sad”
upper face misaligned with a “sad” lower face). These findings suggest that expressions
are processed in terms of configural content, so that the shape and position of the
mouth are coded with respect to the shape and position of the eyes. This composite
effect is similar to that found in facial identity processing (Young, Hellawell, and
Hay 1987) and eye gaze processing ( Jenkins and Langton 2003). When composite
facial expressions are presented, therefore, a new configuration is formed that interferes
with the processing of constituent parts. For example, Seyama and Nagayama (2002)
found that eyes are perceived as larger in happy faces than in surprised faces, based
on composite photographs created with eye size constant.
It is possible that ascribing emotion to facial expressions – as people routinely do –
further compounds difficulties in identifying, describing or comparing individual fea-
tures. Eastwood, Smilek and Merikle (2003) found that negative facial expressions
disrupt performance in experiments where participants are asked to count the features
of the face, suggesting that attention to faces can depend on perceived valence (Ohman,
Lundqvist, and Esteves 2001). For example, Waller et al. (2007) found that mor-
phological comparisons between human and chimpanzee facial expressions were
affected by the perceived emotion reflected in the chimpanzee expression. If the chim-
panzee expression (bared-teeth face) was considered to reflect angry emotion, similar-
ity to the human smile on specific physical parameters was underestimated. This finding
is consistent with Reisberg and Chambers’ (1991) argument that images are perceptu-
ally organised in terms of how the object is “understood”, and that initial perceptual
organisation restrains how shapes can be manipulated in imagery. Thus, if a face is
processed and retained as a prototypical emotional expression schema, features highly
characteristic of that expression (e.g. upturned mouth corners in a smile) may be retained
in preference to other features. Interpreting a face in emotional terms may also affect
perceived comparisons of images held in memory (mental images), which could affect
scientific discussions when scientists are not using direct observations.
In short, faces appear to be special in terms of how they are cognitively processed.
Therefore, scientific description of facial expressions may require extra measures to
ensure accuracy.

2. The Facial Action Coding System (FACS)


2.1. Development of the Facial Action Coding System
Hjortsjö (1970) was the first to attempt a systematic description of the component
muscle movements of facial expression. Prior to this, researchers who were studying
facial expression tended to use only judges’ ratings of expressions and were largely un-
interested in the movements of the face per se. Hjortsjö documented the facial appear-
ance of muscle contractions with painstaking detail and with clear reference to the
underlying physiology, using numbers to identify each individual movement. Soon
after, Paul Ekman, Wallace Friesen and Silvan Tomkins developed the Facial Action
Scoring Technique (FAST: Ekman, Friesen, and Tomkins 1971), which eventually led
to the Facial Action Coding System’s first publication in 1978 (FACS: Ekman and Frie-
sen 1978). The Maximally Discriminative Facial Movement Coding System (MAX:
Izard 1979) was developed about the same time, but with less focus on the anatomical
correspondence between facial movements and facial muscles. The Facial Action Cod-
ing System has been the most commonly used facial expression research tool in human
studies for over thirty years and has continued to be updated and refined to reflect new
research and technologies (Ekman, Friesen, and Hager 2002).
The Facial Action Coding System is an anatomically based system and aims to identify
the most basic components of facial expression – the minimal observable movements of
the face. As such, Facial Action Coding System is built largely on the pioneering work of
Duchenne de Boulogne. In The Mechanism of Human Facial Expression, Duchenne
(1872, reprinted in 1990) identified the anatomical basis of facial movements through sur-
face electrical stimulation of facial muscles. These physiological experiments were the
first to be published accompanied by photography, which is fitting for such visual subject
matter. Duchenne produced a set of images documenting the appearance of each individ-
ual muscle contraction, and discussed how each might be related to specific emotions.
More recently, the movement of each individual muscle has been documented using
intramuscular electrical stimulation (Waller et al. 2006). Intramuscular (as opposed to
surface) stimulation allows the muscle to be accessed directly, thus minimising activity
from surrounding muscles and displaying the contribution of that muscle alone. An
understanding of the contribution of each individual muscle is a vital and fundamental
basis of Facial Action Coding System.

2.2. Description of the Facial Action Coding System (FACS)


The Facial Action Coding System, although ultimately used by many researchers to
study facial expressions in relation to emotion, is a facial movement coding system.
The components of facial expressions (as opposed to whole expressions composed of
multiple movements) are identified and recorded, and can be used to build expressions
using a bottom-up approach. In total, 58 component movements have been identified in
the human face, using Facial Action Coding System. This includes 33 Action Units
(hereafter AUs), where the muscular basis is specified, and an additional 25 action de-
scriptors (hereafter ADs), where the movement is more general (e.g., head movements,
eye movements). Each AU has a number and a name (e.g., AU 4, brow lowerer), and
the premise is that although some appearance changes associated with that Action
Unit might differ from person to person, there are some basic, identifiable appearance
changes which do not. These basic changes are used as minimal criteria for identifying
and coding the Action Unit. Thus, Facial Action Coding System provides a tool that can
be used to describe facial movements in individuals whose faces vary widely with regard
to bone structure, fat deposits, skin texture, and other characteristics, because these
differences are buffered by the system.
The strength of Facial Action Coding System lies in its standardisation. To use the
system, researchers must learn to identify these base units of facial movements using
a detailed manual (Ekman, Friesen, and Hager 2002) and take a final test for certifica-
tion. Although “in-house” inter-coder reliability may be desirable for specific studies in
addition to certification, it is not recommended as a substitute. Training takes an esti-
mated 100 hours, but can take more or less time depending on the context. Learning
in groups can make the process easier as it can be helpful to “examine the similarities
and differences shown by other people who are learning this facial measurement
procedure with you” (Ekman and Friesen 1978).
The manual presents each Action Unit in turn. First, a schematic of the target muscle
superimposed on a face photograph is used to show how the muscles are anatomically
located (see Fig. 58.1A). A second figure shows the location and direction of muscle
action (see Fig. 58.1B), and the appearance changes (and minimal criteria for coding)
are then listed. The coder is taught how to distinguish between similar Action Units
(such as AU 14, dimpler; and AU 20, lip stretch) by identifying subtle differences,
and is also taught specific co-occurrence rules for movements which change when active
in combination with another. The Facial Action Coding System can also be used to
identify movements that occur only on one side of the face (unilateral), or on the
upper or lower lip. Intensity of an Action Unit can be recorded using additional
codes (a = trace evidence, e = maximum), but use of intensity codes is not essential.
The coder learns to produce the Action Unit by first attempting the movement while
looking in a mirror, and then feeling the underlying muscle movement. The coder
then practices coding still photographs and video clips of posed movements. Subtle ac-
tions can be difficult to detect in still photographs, and so the emphasis (and final test) is
on recognising Action Units through recordings of continuous movement.
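As a small illustration of how such scores can be handled computationally, the sketch below parses score strings in one common written form, '+'-joined Action Unit codes with intensity letters (e.g. '4B+12C': AU 4 at intensity B together with AU 12 at intensity C). The function is hypothetical, not part of any official FACS tooling; it treats the five intensity levels from trace to maximum as the letters A to E (upper- or lowercase) and, for simplicity, ignores refinements such as unilaterality markers.

```python
import re

def parse_facs_score(score):
    """Split a FACS score string such as '1B+4C+12D' into
    (AU number, intensity) pairs. Intensities run from A (trace)
    to E (maximum); an AU written without an intensity letter is
    returned with intensity None."""
    units = []
    for token in score.split('+'):
        match = re.fullmatch(r'(\d+)([A-Ea-e]?)', token.strip())
        if not match:
            raise ValueError(f'unrecognised FACS token: {token!r}')
        number = int(match.group(1))
        intensity = match.group(2).upper() or None
        units.append((number, intensity))
    return units
```

For example, parse_facs_score('4B+12C') yields [(4, 'B'), (12, 'C')]; such tuples make it straightforward to compare codings across coders, events, or studies.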

3. Applications of the Facial Action Coding System


3.1. How can the Facial Action Coding System be used?
Given its origins, it is not surprising that the most common uses for the Facial Action
Coding System have been to describe and label spontaneous facial expressions of emo-
tion, and to guide the production of posed facial expressions to serve as stimuli for the
identification of emotions. Researchers who share Paul Ekman’s view of the universal-
ity of discrete facial expressions of emotion are more prominently represented in the
Facial Action Coding System research literature than facial expression researchers
with different (e.g., dimensional) theoretical orientations. Nevertheless, as an anatomi-
cally-based descriptive research tool, the Facial Action Coding System requires no as-
sumptions about the nature of emotions, or indeed the assumption that the observed
facial movements or combinations of movements are associated with internal states
of any kind. The Facial Action Coding System is, quite simply, a tool. Thus, researchers
interested in conversational signals, regulation of social interactions, pain, or cross-
cultural studies – as well as artists (Faigin 1990) and animators (see Kanfer 1997) –
are among those who have used Facial Action Coding System to describe or generate
human facial movements.
Alongside explicit changes in Facial Action Coding System scoring and use made by
Ekman and colleagues since its original publication in 1978, a body of information
regarding “best practice” in the use of the Facial Action Coding System has evolved
through the efforts of scientists in many independent research centers. Common issues
have been assessing inter-coder reliability, whether to code all Action Units or a subset,
and whether to include intensity ratings and dynamic aspects of the movements.

3.1.1. Inter-coder reliability


When reporting Facial Action Coding System data, a critical issue is inter-coder relia-
bility. Ekman and Friesen suggested the following “Agreement Index” for calculating
922 V. Methods

reliability of coding: multiplying by two the number of Action Unit scores on which two
persons agreed, and dividing by the total number of Action Units scored by the two persons
(Ekman and Friesen 1978). The Agreement Index deals only with the presence or absence of specific
Action Units, and does not address the reliability of intensity ratings. Although this for-
mula is easy to apply and has been used extensively, it does not take the probability of
chance agreements into account. Therefore, some researchers have encouraged the use
of other statistics, such as Cohen’s Kappa coefficient. In addition, by treating all Action
Units under one umbrella, the Agreement Index ignores possible differences in relia-
bility coding for individual Action Units. This issue was addressed by Sayette and col-
leagues (Sayette et al. 2001), who assessed inter-rater reliability of individual Facial
Action Coding System Action Units from three separate studies designed to elicit
spontaneous facial expressions of emotion (rating the pleasantness of odors, present-
ing nicotine cues to smokers, and requiring participants to present a speech about
their physical appearance). Using coefficient kappa as their measure of interobserver
agreement, these researchers found that the reliability of Facial Action Coding Sys-
tem was good to excellent for 90% of the Action Units that were assessed. Intensity
ratings were considered to be good when a 3-point (rather than 5-point) rating scale
was used.
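The two reliability statistics discussed above can be sketched in a few lines of code. The sketch below is illustrative only, with invented coder data: the Agreement Index follows the Ekman and Friesen ratio (twice the number of agreed Action Units, divided by the total scored by both coders), and Cohen's kappa is computed over the presence/absence decision for each Action Unit in each facial event.

```python
# Illustrative sketch (not part of any FACS tooling): inter-coder reliability
# for Action Unit scoring. The event/AU data below are invented.

def agreement_index(coder1, coder2):
    """Ekman-Friesen Agreement Index: twice the number of AUs on which both
    coders agreed, divided by the total number of AUs scored by the two coders."""
    agreed = sum(len(a & b) for a, b in zip(coder1, coder2))
    total = sum(len(a) + len(b) for a, b in zip(coder1, coder2))
    return 2 * agreed / total if total else 1.0

def cohens_kappa(coder1, coder2, all_aus):
    """Chance-corrected agreement on presence/absence of each AU per event."""
    decisions = [(au in a, au in b)
                 for a, b in zip(coder1, coder2) for au in all_aus]
    n = len(decisions)
    p_obs = sum(x == y for x, y in decisions) / n          # observed agreement
    p1 = sum(x for x, _ in decisions) / n                  # coder 1 "present" rate
    p2 = sum(y for _, y in decisions) / n                  # coder 2 "present" rate
    p_chance = p1 * p2 + (1 - p1) * (1 - p2)               # expected by chance
    return (p_obs - p_chance) / (1 - p_chance)

# Two coders' AU sets for three hypothetical facial events:
coder1 = [{1, 2, 5}, {6, 12}, {4}]
coder2 = [{1, 2}, {6, 12, 25}, {4}]
aus = {1, 2, 4, 5, 6, 12, 25}

print(round(agreement_index(coder1, coder2), 3))
print(round(cohens_kappa(coder1, coder2, aus), 3))
```

Because the Agreement Index ignores the (usually many) Action Units that neither coder scored, it typically runs higher than kappa on the same data, which is exactly the chance-agreement concern raised above.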

3.1.2. Choosing what to code


Ekman and Friesen recommended coding all observable facial movements when re-
searchers first begin coding with Facial Action Coding System, and when they are cod-
ing facial expressions in populations new to them (Ekman and Friesen 1978). A
problem with selectively coding only the movements already believed to be meaningful
or relevant is that it is possible to leave out movements that are later found to be impor-
tant. For example, although a lip corner raise (AU 12) has long been considered rele-
vant for the study of happiness, the importance of the cheek raise (AU 6) as a possible
correlate of “enjoyment” (Duchenne 1990) smiles was overlooked by most researchers
before Ekman (see Ekman, Davidson, and Friesen 1990).
Nevertheless, manually coding for all possible movements can be so labor-intensive
as to be impractical. A rule of thumb is that 100 minutes are needed to comprehensively
code one minute of videotape with the Facial Action Coding System. In the majority of
published studies, therefore, researchers have limited the amount of coding by coding
only certain target Action Units of interest, normally those identified by previous
research or pilot studies. Another strategy used to reduce the coding time required is
to identify in advance segments that are most likely to contain the movements of great-
est interest (e.g., by coding only the 3 seconds following the highest dramatic point in an
emotionally-evocative film clip).
As an alternative, the Emotion Facial Action Coding System (EMFACS) was devel-
oped specifically to identify and score only clearly obvious movements that had been
shown to be associated with emotion (Ekman, Levenson, and Friesen 1983). Because
Emotion Facial Action Coding System was designed to be used only by fully-competent
Facial Action Coding System coders, the expected training time (100 hours) is the same.
However, because only twelve key Action Units are scored (as “events”) and coding is
conducted primarily in real time, rather than through slow-motion analysis, coding nor-
mally takes only one-tenth as long as for Facial Action Coding System. The Emotion
58. Analysing facial expression using the facial action coding system (FACS) 923

Facial Action Coding System can therefore also be used to code live events that cannot
be recorded, assuming the face can be clearly seen. Given the focus on emotion, the
Emotion Facial Action Coding System cannot be used for studies in which Action
Units are not associated with emotion. Importantly, given the a priori assumptions
about facial movements and emotion, it cannot be used to test whether additional
Action Units are associated with emotion.

3.1.3. Temporal and dynamic aspects of facial movement


In addition to coding morphology, Facial Action Coding System users can assess
temporal and dynamic aspects of facial movements, including overall duration of a
movement; onset and duration of onset, apex, and offset; ballistic trajectory (i.e.,
smoothness of the movement over its course); consistency of durations for repeated
appearances of the same movement; and coordination of apexes of different movements.
According to Ekman and Rosenberg (2005), these characteristics may be especially
important for distinguishing between facial expressions that are spontaneous and
accompanied by emotional feelings, versus expressions that are deliberate and non-
emotional. Coding temporal aspects of the data may be facilitated through the use
of event-logging software (commercial software such as Noldus Observer or Mangold
Interact, or shareware such as the Continuous Measurement System; Messinger et al.
2009). Even with such software, however, manually coding for temporal and dynamic
features is exponentially more difficult and time-consuming than coding mere pres-
ence/absence of movements, particularly when coding Action Units that co-occur or
overlap in time.
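The temporal parameters listed above can be pictured as a small event record. The sketch below is illustrative only: the `AUEvent` class and the timestamps are invented (they are not an export format of Observer, Interact, or the Continuous Measurement System), but they show how duration, onset phase, and apex coordination follow directly from onset/apex/offset timestamps.

```python
# Illustrative sketch of logging temporal parameters of Action Unit events
# (onset, apex, offset) and deriving the dynamic measures mentioned above.
from dataclasses import dataclass

@dataclass
class AUEvent:
    au: int          # Action Unit number, e.g. 12 = lip corner puller
    onset: float     # seconds: movement begins
    apex: float      # seconds: peak intensity reached
    offset: float    # seconds: face returns to neutral

    @property
    def duration(self):
        return self.offset - self.onset

    @property
    def onset_phase(self):       # time from start of movement to its apex
        return self.apex - self.onset

# A hypothetical smile: lip corner pull (AU 12) with cheek raise (AU 6)
events = [AUEvent(12, onset=1.0, apex=1.4, offset=3.0),
          AUEvent(6,  onset=1.1, apex=1.5, offset=2.8)]

# Coordination of apexes: how far apart do the two movements peak?
apex_lag = abs(events[0].apex - events[1].apex)
print([e.duration for e in events])   # overall durations
print(apex_lag)
```

The coding burden described in the text comes from producing such timestamps by hand for every co-occurring Action Unit; the derived measures themselves are trivial once the timestamps exist.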

3.1.4. Automating Facial Action Coding System coding


The future of Facial Action Coding System may lie in the development of computerized
automatic facial measurement. Many research groups are working on automated facial
analysis systems, several of which explicitly target facial movements as defined by Facial
Action Coding System. Although relatively simple automated systems have been avail-
able for several years, early systems were able to code only single, posed movements.
With more recent advances, systems have improved in their ability to recognize Action
Units in combination as well as expressions made spontaneously (not posed), which typ-
ically are associated with greater head motion and rotation. One example is the CMU/
Pitt Automated Facial Image Analysis (AFA) System (Cohn and Kanade 2007). The
Automated Facial Image Analysis System requires initial reference point marking in
one frame of each image sequence. Using automatic feature point tracking, this system
has been shown to have high concurrent validity with manual coding for Action Units.
When errors occurred, they tended to be the same types of errors made by manual co-
ders. A fully automatic facial expression detection system, the Computer Expression
Recognition Toolbox (CERT) was created by Bartlett and colleagues (Bartlett et al.
2008). By applying machine learning methods, this system can reportedly detect thirty
Action Units in real time. In application studies, the Computer Expression Recognition
Toolbox was able to identify facial expressions associated with faked versus real pain, as
well as expressions that occurred when drivers become drowsy. Bartlett and colleagues
(2008) noted that although machines are able to learn to classify movements from a
database without Facial Action Coding System, the use of Facial Action Coding System
allows computer scientists to benefit from the large research base already available,
eliminating the need to collect a new data base and train detectors for each new
application.
Automated analysis is particularly useful for quantifying temporal dynamics of facial
movements, as well as changes in intensity of expression, both of which are especially
difficult for human coders. For example, Ambadar, Cohn, and Reed (2009) showed
that smiles that occurred when participants reported amusement differed from “embar-
rassed” and “polite” smiles with regard to variables such as velocity and offset. Thus,
automated processing has extended the types of research that are possible by greatly
expanding the ability to conduct temporal analyses with Facial Action Coding System
data, and to analyze Facial Action Coding System data together with gestures and
speech. The original goal of Facial Action Coding System was to identify all movements
that humans could reliably distinguish. With the refinement of automated systems, it is
possible that this repertoire will eventually expand to include movements that are not
differentially perceived at a conscious level.
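One way to picture what such automated systems add is the step from framewise detector scores to FACS-style events. The sketch below is illustrative only: the scores, threshold, and frame rate are invented, and this is not the interface of the Automated Facial Image Analysis system or the Computer Expression Recognition Toolbox; it simply shows how a per-frame Action Unit score stream can be reduced to onset/offset spans.

```python
# Illustrative post-processing sketch: turning a hypothetical framewise
# AU-detector output (one score per video frame) into discrete events.

def scores_to_events(scores, threshold=0.5, fps=30.0):
    """Return (onset_s, offset_s) spans where the AU score exceeds threshold."""
    events, start = [], None
    for i, s in enumerate(scores):
        if s >= threshold and start is None:
            start = i                          # event begins at this frame
        elif s < threshold and start is not None:
            events.append((start / fps, i / fps))
            start = None
    if start is not None:                      # event still open at clip end
        events.append((start / fps, len(scores) / fps))
    return events

# Hypothetical AU 12 scores over twelve frames:
scores = [0.1, 0.2, 0.7, 0.9, 0.8, 0.4, 0.1, 0.6, 0.7, 0.3, 0.1, 0.0]
print(scores_to_events(scores))
```

Because the detector emits a score on every frame, the same stream also supports the velocity and intensity-change analyses mentioned above, which manual presence/absence coding cannot provide.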

3.2. Interpreting the meaning of Action Units


As noted, the Facial Action Coding System is itself merely descriptive; individual
Action Units and configurations of Action Units by themselves have no meaning inher-
ently attached. Nevertheless, ever since the publication of Facial Action Coding System,
many investigators have been interested in converting Action Units into emotions.
Ekman and Friesen (1978) provided some guidance in the form of a table in Part
Two of the original Investigator’s Guide, but even then urged caution in using the
table. They also briefly described how Action Units can be used as “emblems” (when
a person refers to an emotion by using an Action Unit without necessarily feeling it)
as well as conversational signals. In 1984, Friesen and Ekman published an Emotion
Facial Action Coding System Dictionary, to categorize scores by emotion (Friesen
and Ekman 1984). Through the work of Joseph Hager, the dictionary was expanded
to become the Facial Action Coding System Affect Interpretation Dictionary (FAC-
SAID), which can be used with regular Facial Action Coding System or Emotion Facial
Action Coding System. The stated goal of Facial Action Coding System Affect Interpre-
tation Dictionary was to “link facial expressions with their psychological interpreta-
tions”, through the use of a relational database. Through this database, emotions are
associated with a set of related but not identical facial configurations.
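The dictionary idea behind the Emotion Facial Action Coding System Dictionary and FACSAID can be pictured as a lookup from Action Unit configurations to interpretations. The sketch below is purely illustrative: its two entries (a smile with and without the cheek raise, echoing the AU 6 + AU 12 discussion earlier in this chapter) are examples, not the contents of the actual dictionaries.

```python
# Illustrative sketch of a relational AU-configuration dictionary.
# The entries are examples, not the actual EMFACS/FACSAID contents.

DICTIONARY = {
    frozenset({6, 12}): "enjoyment smile (Duchenne)",
    frozenset({12}): "smile (non-Duchenne)",
}

def interpret(observed_aus):
    """Return the interpretation of the largest AU configuration contained
    in the observed Action Units, or None if nothing matches."""
    matches = [cfg for cfg in DICTIONARY if cfg <= set(observed_aus)]
    if not matches:
        return None
    return DICTIONARY[max(matches, key=len)]

print(interpret({6, 12, 25}))   # cheek raise + lip corner pull (+ lips part)
print(interpret({12}))
```

Matching the largest contained configuration mirrors the point made in the text: an emotion is associated with a set of related but not identical facial configurations, so extra Action Units (here AU 25) should not block the interpretation.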
Hundreds of studies have been published linking Facial Action Coding System
Action Units and Action Unit combinations with particular emotions (as well as cogni-
tive states, intentions, situations, and symbolic meanings). It is beyond the scope of this
chapter to review all of the studies; however, an excellent compendium, with commen-
tary, of representative studies of spontaneous facial expressions published between 1982
and 2005 can be found in the second edition of Ekman and Rosenberg’s (2005) edited
volume entitled What the Face Reveals. Investigations using Facial Action Coding Sys-
tem have encompassed a wide range of topics, from basic studies on the nature of emo-
tion, to studies looking at the role of facial expression with respect to communication,
pain, personality and individual differences, psychopathology and response to treatment,
neuroscience, and culture.

One of the most active areas of research, due in large part to the continuing efforts of
Ekman and his colleagues, is the investigation of lying and deception. According to
Ekman, lying with the face can refer to showing an emotion you do not feel, or mask-
ing/blunting an emotion you do feel, either by showing a different emotion or a neutral
face. Through Facial Expression/Awareness/Compassion/Emotion (F.A.C.E.) training,
he teaches people ranging from security agents to business executives to read “micro-
expressions of emotion” that may reveal subtle emotions or deception. Ekman’s
research has spawned a popular American television show, “Lie To Me” (for which
he is a consultant), where the viewing audience can see mock-up photos of faces with
real Facial Action Coding System Action Units in a fictionalized drama.

4. Modifications of Facial Action Coding System


One of the advantages of having an anatomically based facial coding system is that it
can be modified for use with subjects where the facial anatomy is similar enough to
allow translation. Facial Action Coding System is premised on the principle that, if
movements result from common facial muscles, movements can be compared between
individuals even where differences in facial architecture exist.

4.1. BabyFACS and the Neonatal Facial Coding System


The morphology of the face of human infants and children is not identical to that of
adults, producing differences in the appearance changes produced by facial muscles.
Taking the unique characteristics of infants into account, Harriet Oster developed
the Facial Action Coding System for Infants and Young Children (BabyFACS),
which is a standardized modification of Facial Action Coding System for use with
infants and neonates, with its own coding manual. The Baby Facial Action Coding
System was revised to be consistent with the 2002 revision of the Facial Action Cod-
ing System (Oster 2006). Before training to use Baby Facial Action Coding System,
potential coders are expected to have passed the regular Facial Action Coding Sys-
tem Final Test. The Baby Facial Action Coding System has been important for
identifying developmental changes in facial expressions in normal as well as atypical
infants.
An alternative coding system for neonates, also based on Facial Action Coding
System, is the Neonatal Facial Coding System (NFCS; Grunau et al. 1998). The Neonatal
Facial Coding System was developed specifically to measure immediate facial
responses to pain in neonates, although it has been used in infants up to 18 months of
age. Ten different facial actions are coded, making it more efficient than the Baby Facial
Action Coding System when investigators are interested only in expressions related
to pain.

4.2. ChimpFACS and MaqFACS


Facial Action Coding System has also been modified for use with related primate
species where facial muscles and the basic facial configuration are similar (and there-
fore potential movements are similar), but important differences in facial morphology
must be taken into account.

Fig. 58.1: Training material from the Facial Action Coding System manual (FACS: Ekman,
Friesen, and Hager 2002), showing the anatomical arrangement of the upper face muscles (A)
and direction of muscle action (B).

There are a number of reasons why a Facial Action Coding System can benefit nonhuman
primate facial expression research, similar to those which called for the human Facial
Action Coding System in the first place:

(i) facial expression descriptions can be compared between studies;
(ii) facial expression descriptions can be compared between species; and
(iii) facial expressions can be compared between species in terms of the musculature,
allowing homology of expressions to be assessed at a morphological level.

The first modification of Facial Action Coding System for use with another species was
ChimpFACS (Vick et al. 2007) for use with chimpanzees (Pan troglodytes, www.
chimfacs.com). This was followed by MaqFACS (Parr et al. 2010) for use with rhesus
macaques (Macaca mulatta, http://userwww.service.emory.edu//˜lparr/MaqFACS.html),
and most recently, GibbonFACS for use with gibbons and siamangs.
Modified Facial Action Coding Systems have been developed to be explicitly
comparable with the human Facial Action Coding System. Thus, development has
tended to follow a clear step-by-step process. The facial muscles of each species were
initially investigated through dissections to document the size, structure and attach-
ments of muscles in comparison to humans (for chimpanzees, Burrows et al. 2006;
for rhesus macaques, Burrows, Waller, and Parr 2009; for gibbons, Burrows et al.
2011). Intramuscular electrical stimulation experiments were then conducted (where
possible) to document the surface changes of facial landmarks during independent
contraction of each muscle (for chimpanzees, Waller et al. 2006; for rhesus macaques,
Waller et al. 2008). These studies showed broad parallels between all four species,
despite several early accounts of dissimilarity between the facial muscles of humans
and other primate species (Huber 1931). A similar muscular arrangement can be
seen in the four species (Fig. 58.2), despite marked differences in size and shape of
bony structures (e.g. prognathism – the jaw protruding from the skull). The background
information about the facial musculature was used as a basis to identify Action Units in
each species, through detailed observations of video footage and still images. Similar
to the human Facial Action Coding System, descriptions and coding instructions
for each Action Unit are published in scientific coding manuals (ChimpFACS:
Vick et al. 2007; MaqFACS: Parr et al. 2010), and users are required to pass a final
test to become certified. Many Action Units exhibit great similarity between species,
and so appearance changes and minimal criteria for identification are comparable
(e.g., AU 10, see Fig. 58.3), whereas for some species, wholly new Action Units
have been identified and/or existing Action Units have been modified. For example,
AU 18 (lip pucker) is present in rhesus macaques but has two distinct variants
(Fig. 58.4).
The use of these primate Facial Action Coding Systems is still in its infancy, but so far
ChimpFACS analysis has been fruitful. Vick and Paukner (2010) identified two distinct
forms of yawn using ChimpFACS that previous studies had been unable to discrimi-
nate. The two yawns could be identified based on the specific combination of Action

Fig. 58.2: The facial muscles of the four study species used for FACS development: humans (from
Waller, Cray, and Burrows 2008), chimpanzees (from Burrows et al. 2006), gibbons (from Burrows
et al. 2011) and rhesus macaques (from Waller et al. 2008, adapted from Huber 1931). All images
by Tim Smith.

Fig. 58.3: AU 10 in humans and chimpanzees at moderate intensity (10c) and extreme intensity
(10e) compared to neutral (Ekman et al. 2002; Vick et al. 2007). Human sequence also includes
mouth opening (AU 25) and chimpanzee sequence includes mouth opening (AU 25) and relaxed
lip (AU 16).

Units, and were associated with subtly different contexts, behavioural patterns and sex
differences. The study indicates a level of complexity and subtlety in chimpanzee
facial expression previously unknown. In addition to this specific study, similarity
between basic human and chimpanzee facial expressions has been documented
using Facial Action Coding System (Parr and Waller 2007), and ChimpFACS has
been used to identify categories of chimpanzee facial expression through a bottom-up
approach using discriminant function analysis (Parr, Waller, and Heintz 2008). Finally,
Seth Dobson has used a Facial Action Coding System style approach to conduct com-
parative analyses on the relationship between facial movement/expression, and socio-
ecological variables across the primate order (Dobson 2009a, 2009b), which promises
to be a very useful approach to understand the evolution of facial expression in
primates.

Fig. 58.4: Two variants of the lip pucker in rhesus macaques: AU 18i (true pucker) and
AU 18ii (outer pucker) (adapted from Parr et al. 2010)

5. Conclusion
Facial Action Coding System provides a common language with which to translate
facial expression descriptions, observations and findings between studies, research
groups, populations and species. When the object of study is visual and subject to
perceptual specialisation, such a common language is essential. Facial Action Coding
System has allowed scientists to study facial expression with a level of detail and objec-
tivity previously unavailable, and the recent extension to primate species has injected
much needed standardisation within these related fields.

6. References
Ambadar, Zara, Jeffrey F. Cohn and Lawrence I. Reed 2009. All smiles are not created equal:
Morphology and timing of smiles perceived as amused, polite, and embarrassed/nervous. Jour-
nal of Nonverbal Behavior 33(1): 17–34.
Bartlett, Marian, Gwen Littlewort, Esra Vural, Kang Lee, Mujdat Cetin, Aytul Ercil and Javier
Movellan 2008. Data mining spontaneous facial behavior with automatic expression coding.
Verbal and nonverbal features of human-human and human-machine interaction. Lecture
Notes in Computer Science 5042: 1–20.
Brown, Donald E. 1991. Human Universals. Philadelphia: Temple University Press.
Burrows, Anne M., Rui Diogo, Bridget M. Waller, Christopher J. Bonar and Katja Liebal 2011. Evo-
lution of the muscles of facial expression in a monogamous ape: Evaluating the relative influ-
ences of ecological and phylogenetic factors in hylobatids. Anatomical Record 294: 645–663.
Burrows, Anne M., Bridget M. Waller and Lisa A. Parr 2009. Facial musculature in the rhesus
macaque (Macaca mulatta): Evolutionary and functional contexts with comparisons to chim-
panzees and humans. Journal of Anatomy 215: 320–334.
Burrows, Anne M., Bridget M. Waller, Lisa A. Parr and Christopher J. Bonar 2006. Muscles of
facial expression in the chimpanzee (Pan troglodytes): Descriptive, comparative, and phyloge-
netic contexts. Journal of Anatomy 208(2): 153–168.

Calder, Andrew J., Andrew W. Young, Jill Keane and Michael Dean 2000. Configural information
in facial expression perception. Journal of Experimental Psychology: Human Perception and
Performance 26(2): 527–551.
Cohn, Jeffrey F. and Takeo Kanade 2007. Use of automated facial image analysis for measurement
of emotion expression. In: James A. Coan and John B. Allen (eds.), The Handbook of Emotion
Elicitation and Assessment, 222–238. New York: Oxford University Press.
Dobson, Seth D. 2009a. Allometry of facial mobility in anthropoid primates: Implications for the
evolution of facial expression. American Journal of Physical Anthropology 138: 70–81.
Dobson, Seth D. 2009b. Socioecological correlates of facial mobility in nonhuman anthropoids.
American Journal of Physical Anthropology 139: 413–420.
Duchenne de Boulogne, Guillaume B. 1990. The Mechanism of Human Facial Expression. New
York: Cambridge University Press. First published [1862].
Eastwood, John D., Daniel Smilek and Philip M. Merikle 2003. Negative facial expression cap-
tures attention and disrupts performance. Perception and Psychophysics 65: 352–358.
Ekman, Paul, Richard J. Davidson and Wallace V. Friesen 1990. Duchenne’s smile: Emotional
expression and brain physiology II. Journal of Personality and Social Psychology 58: 342–353.
Ekman, Paul and Wallace V. Friesen 1978. Facial Action Coding System: A Technique for the Mea-
surement of Facial Movement. Palo Alto, CA: Consulting Psychologists Press.
Ekman, Paul, Wallace V. Friesen and Joseph C. Hager 2002. Facial Action Coding System. Salt
Lake City, UT: Research Nexus.
Ekman, Paul, Wallace V. Friesen and Silvan S. Tomkins 1971. Facial affect scoring technique: A
first validity study. Semiotica 3: 37–58.
Ekman, Paul, Robert W. Levenson and Wallace V. Friesen 1983. Autonomic nervous system activ-
ity distinguishes among emotions. Science 221: 1208–1210.
Ekman, Paul and Erika L. Rosenberg 2005. What the Face Reveals: Basic and Applied Studies of
Spontaneous Expression Using the Facial Action Coding System (FACS), 2nd edition. Oxford:
Oxford University Press.
Ekman, Paul, Richard Sorenson and Wallace V. Friesen 1969. Pan-cultural elements in facial dis-
plays of emotions. Science 164(3875): 86–88.
Faigin, Gary 1990. The Artist’s Complete Guide to Facial Expression. New York: Watson-Guptill.
Friesen, Wallace V. and Paul Ekman 1984. EMFACS-7; Emotional Facial Action Coding System.
Unpublished Manual.
Grunau, Ruth E., Tim Oberlander, Lisa Holsti and Michael F. Whitfield 1998. Bedside application
of the neonatal facial coding system in pain assessment of premature neonates. Pain 76: 277–286.
Hjortsjö, Carl-Herman 1970. Man’s Face and Mimic Language. Lund: Nordens Boktryckeri.
Huber, Ernst 1931. Evolution of Facial Musculature and Expression. Baltimore: The Johns
Hopkins University Press.
Izard, Carroll E. 1979. The maximally discriminative facial movement coding system (MAX).
Unpublished manuscript, available from Instructional Resource Center, University of Delaware.
Jenkins, Jenny and Stephen R.H. Langton 2003. Configural processing in the perception of
eye-gaze direction. Perception 32: 1181–1188.
Kanfer, Stefan 1997. Serious Business: The Art and Commerce of Animation in America from
Betty Boop to Toy Story. New York: Scribner.
Matsumoto, David and Bob Willingham 2009. Spontaneous facial expressions of emotion of con-
genitally and non-congenitally blind individuals. Journal of Personality and Social Psychology
96(1): 1–10.
Messinger, Daniel S., Mohammad H. Mahoor, Sy-Miin Chow and Jeffrey F. Cohn 2009. Auto-
mated measurement of smile dynamics in mother-infant interaction: A pilot study. Infancy
14(3): 285–305.
Öhman, Arne, Daniel Lundqvist and Francisco Esteves 2001. The face in the crowd revisited: A
threat advantage with schematic stimuli. Journal of Personality and Social Psychology 80: 381–
396.

Oster, Harriet 2006. Baby FACS: Facial action coding system for infants and young children. Un-
published monograph and coding manual, New York University.
Parr, Lisa A. and Bridget M. Waller 2007. The evolution of human emotion. In: Todd M. Preuss
and John H. Kaas (eds.), The Evolution of Primate Nervous Systems. Oxford: Academic
Press.
Parr, Lisa A., Bridget M. Waller, Anne M. Burrows, Katalin M. Gothard and Sarah-Jane Vick
2010. MaqFACS: A muscle-based facial movement coding system for the macaque monkey.
American Journal of Physical Anthropology 143: 625–630.
Parr, Lisa A., Bridget M. Waller and Matthew Heintz 2008. Facial expression categorization by
chimpanzees using standardized stimuli. Emotion 8(2): 216–231.
Parr, Lisa A., Bridget M. Waller and Sarah-Jane Vick 2007. New developments in understanding
emotional facial signals in chimpanzees. Current Directions in Psychological Science 16(3):
117–122.
Reisberg, Daniel and Deborah Chambers 1991. Neither pictures nor propositions: What can we
learn from a mental image? Canadian Journal of Psychology 45: 336–352.
Sayette, Michael A., Jeffrey F. Cohn, Joan M. Wertz, Michael A. Perrott and Dominic J. Parrott
2001. A psychometric evaluation of the facial action coding system for assessing spontaneous
expression. Journal of Nonverbal Behavior 25: 167–186.
Seyama, Junichiro and Ruth Nagayama 2002. Perceived eye size is larger in happy faces than in
surprised faces. Perception 31: 1153–1155.
Vick, Sarah-Jane and Annika Paukner 2010. Variation and context of yawns in captive chimpan-
zees (Pan troglodytes). American Journal of Primatology 72: 262–269.
Vick, Sarah-Jane, Bridget M. Waller, Lisa A. Parr, Marcia C. Smith Pasqualini and Kim A. Bard
2007. A cross-species comparison of facial morphology and movement in humans and
chimpanzees using the facial action coding system (FACS). Journal of Nonverbal Behavior 31:
1–20.
Waller, Bridget M., Kim A. Bard, Sarah-Jane Vick and Marcia C. Smith Pasqualini 2007. Per-
ceived differences between chimpanzee and human facial expressions are related to emotional
interpretation. Journal of Comparative Psychology 121(4): 398–404.
Waller, Bridget M., James J. Cray and Anne M. Burrows 2008. Selection for universal facial emo-
tion. Emotion 8(3): 435–439.
Waller, Bridget M., Lisa A. Parr, Katalin M. Gothard, Anne M. Burrows and Andrew J. Fugle-
vand 2008. Mapping the contribution of single muscles to facial movements in the Rhesus
Macaque. Physiology & Behavior 95(1–2): 93–100.
Waller, Bridget M., Sarah-Jane Vick, Lisa A. Parr, Kim A. Bard, Marcia C. Smith Pasqualini, Ka-
talin Gothard and Andrew J. Fuglevand 2006. Intramuscular stimulation of facial muscles in
humans and chimpanzees: Duchenne revisited. Emotion 6(3): 367–382.
Young, Andrew W., Deborah Hellawell and Dennis C. Hay 1987. Configurational information in
face perception. Perception 16: 747–759.

Bridget M. Waller, Portsmouth (UK)


Marcia Smith Pasqualini, Avila, MO (USA)

59. Coding psychopathology in movement behavior:
The movement psychodiagnostic inventory
1. Introduction
2. Psychiatric research on motor psychopathology
3. Microanalysis of movement
4. The movement psychodiagnostic inventory
5. Research with the movement psychodiagnostic inventory
6. Movement psychodiagnostic inventory and communication analysis
7. References

Abstract
Following a review of psychiatric research on motor psychopathology, the Movement
Psychodiagnostic Inventory (MPI) is described as a coding system for the microanalysis
of nonverbal behavior. The value of a highly refined movement coding method is illu-
strated with case examples, and research is discussed that indicates the potential of the
MPI for differentiating schizophrenia spectrum disorder, borderline, and narcissistic per-
sonality disorder through a multidimensional scaling of the coded data. Finally, the clin-
ical potential of the analysis is illustrated with MPI coding of a patient during a therapy
session that identified a distinctive A B C A sequence of behaviors intricately related to
the therapist’s intervention.

1. Introduction
Anna, a woman in her mid-40s diagnosed with chronic undifferentiated schizophrenia,
had spent many years in a psychiatric hospital. One day, she was selected for an inten-
sive therapy unit staffed by psychiatry residents. A few weeks into her new therapy
regimen, Anna explained to her young psychiatrist why he could not help her. “It’s
my movement, doctor. One part goes this way and the other goes that way by itself.”
What Anna reported was visible. A foot would tap nervously in one rhythm while
she rubbed her hands in another or she might shift her position suddenly in a chaotic
sequence that was far more disorganized than what might be called “ungraceful” or
“impulsive.” Picking up on her implicit request for help with this, her psychiatrist re-
commended dance/movement therapy. She responded well to the sessions, trying
hard to move rhythmically with the music and improvise various steps and motions
in synchrony with the dance therapist in their concerted effort to help her organize
her movements. At one moment, she spontaneously jumped into the air and landed
in one perfectly coordinated motion, then laughed with delight as if relieved to be so
connected within herself.

2. Psychiatric research on motor psychopathology


Motor symptoms of severe mental illness have been recognized for centuries. Darwin
([1872] 1965) conferred with a colleague, Dr. Maudsley, about how the facial and bodily
expressions of inmates in a London insane asylum differed from the expression of
59. Coding psychopathology in movement behavior 933

emotions in normal people. The 19th century classic texts on dementia praecox or
schizophrenia contain descriptions of pathological behaviors such as fixed postures,
highly exaggerated facial expressions, and contorted movements (Kraepelin [1919] 1971).
Today, thanks to advances in treatment, a person diagnosed as schizophrenic may not
display the obvious exaggerations and bizarre mannerisms seen in Darwin’s time,
but motor symptoms and signs of severe psychopathology are still very important for
accurate diagnosis and treatment.
Most methods devoted exclusively to coding motor pathology in psychiatric patients,
such as the Abnormal Involuntary Movement Scale (National Institute of Mental
Health 1974) or the Neurological Rating Scale for Extrapyramidal Effects (Simpson
and Angus 1970), are designed for the assessment of the effects of neuroleptic medica-
tions. If the assessment instrument is devised for psychiatric research and diagnosis per
se, the coding of movement pathology tends to be just one part of many, and embedded
within a range of symptoms. For example, in the Positive and Negative Syndrome Scale
(PANSS) (Kay, Fiszbein, and Opler 1987: 276), “blunted affect” is rated on a 7-point
severity scale and is defined as “diminished emotional responsiveness as characterized
by a reduction in facial expression, modulation of feelings, and communicative ges-
tures.” Blunted affect is just one of 30 items in the Positive and Negative Syndrome
Scale. Other examples are severity ratings of “conceptual disorganization” and “delu-
sions.” In the Signs and Symptoms of Psychotic Illness (SSPI) rating scale (Liddle
et al. 2002), 4 of the 20 items are related to motor pathology (“overactivity,” “underac-
tivity,” “flat affect” and “inappropriate affect”).
Such instruments are major improvements over clinical impression in that they dem-
onstrate good observer reliability and great value for research on differential diagnosis,
change over time, and the discrimination of medication effects from symptoms of the psychiatric
illness. However, there are ways to operationalize the movement coding that are far
more detailed and unambiguous. The Movement Psychodiagnostic Inventory (MPI)
(Davis 1970, 1997) focuses only on movement pathology in very refined movement
terms and is based on the premise that this will generate new insights and discoveries
about the nature of severe mental illness.
The Movement Psychodiagnostic Inventory, first developed by the author in the
1960s, has unusual roots. It was influenced by the dance/movement notation and analy-
sis methods of Rudolf Laban as applied to conversational behavior by his student, Irm-
gard Bartenieff. Observation methods originally developed for dance analysis are based
on extremely fine-grained descriptions of movement in its own terms. As such they are
complex, patterned, accurate, operationalized, and comprehensive. Laban and Barte-
nieff understood their value for the study of behavior, and Bartenieff pioneered the
application of what came to be called Laban Movement Analysis to the study of the
movement patterns of psychiatric patients (Bartenieff and Davis [1965] 1972). Drawing
on the Laban tradition is especially valuable because without immersion in the com-
plexity and richness of movement in its own terms, the observer will tend to look at
movement in limited ways.

3. Microanalysis of movement
As film and video made very fine levels of analysis feasible, the microanalysis of move-
ment behavior using slow motion and repeat viewing helped researchers to identify
934 V. Methods

nuances of initiation, coordination, and spatial changes easily missed in real time view.
Researchers such as linguists or anthropologists who study the nonverbal dimensions of
communication tend to code movement in terms of what body parts are moving in what
direction, and they often use common action terms like “leg cross”, “hand gesture”,
“head nod”, etc. Simple actions in vernacular terms are the way we commonly
view nonverbal behavior. But determining what to code and how in a study of psycho-
pathology is, in large part, a matter of supplementing measures of “what” people do
(e.g. number of head nods) with an examination of exactly “how” they do it (e.g. the
tempo, accent pattern, intensity of the nodding). In the 1960s, there was considerable
support for film and video studies of therapy sessions in the United States, and sup-
port for research on the communication patterns of schizophrenic patients drew re-
searchers from diverse disciplines: psychiatry, anthropology, ethology and linguistics.
Working as a research team, anthropologist Ray L. Birdwhistell (1970) and psychia-
trist Albert E. Scheflen (1973) showed through film microanalysis that psychiatric pa-
tients do not behave differently because they “do” different things than normal adults.
They are different because of how they do the fundamental business of sustaining a
conversation.
Decisions about what aspects of nonverbal behavior to attend to shape the results of
the microanalysis. For example, Scheflen (1973), interested in the regularity and orga-
nization of face-to-face interactions, analyzed distances between people, the relation-
ships of their sitting positions, the synchrony and echoing of actions between them,
and so on. In contrast, Ekman and Friesen (1978), interested in the study of transient
emotions, developed the Facial Action Coding System (FACS) to microcode very subtle
changes in facial expression and conflicts between what one says and the affect that the
face may fleetingly express. Ellgring (1989), applying a simpler version of the Facial
Action Coding System to the study of mood change and subjective well-being, found
a general decrease in facial activity in endogenous depressed individuals compared
with normal controls, but not in neurotic depressed patients.
Precision and accuracy become critical in a study of movement disorder. Consider
how differently a “sudden action” can be coded. Ekman (1985) describes “micromen-
tary expressions” or MMEs, changes in the face so sudden and brief they last barely
1/24th of a second and are reliably coded only by those trained to see them or the
few with a natural talent for perceiving them. Micromentary facial expressions
(MMEs) are sudden actions of the face. So are facial tics of people with Tourette’s syn-
drome. In the Movement Psychodiagnostic Inventory (Davis 1997), one form of disor-
ganization is described as a lightning-quick movement “out of the blue” that disrupts
the flow of the action, and this can occur in any part of the body, including the face. In
all three examples, “sudden” appears to mean less than 1/12 of a second. But each is a very
different action. The micromentary facial expressions are flashes of perceptible facial
expressions that trained observers can reliably identify as traces of specific affects
such as disgust, fear, sadness, surprise, etc. Ekman tracks micromentary facial expres-
sions as “leakage” of feelings that contradict what is being said or implied, i.e., related
to a truth that must not be expressed. He does not associate them with psychopathology
but with a context in which the person is in conflict. Presumably, anyone can display
micromentary facial expressions.
Tourette facial tics are, on close examination, sudden spasms of parts of the facial
musculature that do not necessarily constitute a facial expression with a categorical
name such as angry, sad, disgusted, and the like. Like micromentary facial expressions,
they disrupt the normal flow of the action and “break up” the face for a very brief
moment, but their form is different. They can be larger and slightly longer in duration
than micromentary facial expressions, and, although the tics may increase with stress, the
form of the tic itself does not seem particular to the context. Also, unlike the other
examples, the person with Tourette’s may work to transform the tic into conventional
behaviors or appear to consciously control the tic.
A “sudden, out of the blue” facial action as coded in the Movement Psychodiagnos-
tic Inventory appears to disrupt the continuity of the person’s movement, and is not
likely to be limited to the face or a particular set of facial muscles, but can occur in
other parts of the body and during very different types of actions. In other words, “sud-
den” actions as coded on the Movement Psychodiagnostic Inventory are more diffuse
and disorganizing in general than micromentary facial expressions or Tourette tics.
All three actions are in some way beyond one’s control, but there are perceptible dif-
ferences in their form and how they occur with other behaviors. These tiny but visible
distinctions are so crucial that accurately coding “sudden facial action” can become an
exercise in differential diagnosis, distinguishing the normal person (who may be lying)
from the person with a neurological disorder from the person with severe psychopathol-
ogy. This comparison is presented to illustrate how critical it is to precisely define the
movement pathology in movement terms. Considering many aspects of nonverbal
behavior and honing the observations and coding to a high level of detail can make all
the difference.
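The durations compared in this section sit at the limit of frame-level resolution. As a rough illustration (the 1/12-second threshold is taken from the discussion above; the function name and frame counts are hypothetical), one can flag frame-coded actions as "sudden" in video microanalysis:

```python
def is_sudden(frame_span, fps):
    """Flag an action as 'sudden' if it lasts less than 1/12 of a second,
    the rough threshold common to the three examples discussed above."""
    return frame_span / fps < 1 / 12

# On 24 fps film, a one-frame flash (1/24 s) qualifies as sudden,
# while a three-frame action (1/8 s) does not:
print(is_sudden(1, 24), is_sudden(3, 24))  # True False
```

Such a threshold check only captures duration, of course; as the text stresses, distinguishing a micromentary expression from a tic or an "out of the blue" disruption also requires attention to form and context.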

4. The movement psychodiagnostic inventory


The Movement Psychodiagnostic Inventory is designed to be used for observations of
therapy sessions or psychiatric interviews that last at least 30 minutes. Although trained
observers have completed it from live observation, and there is a “short form” that clin-
icians can use for systematic note taking after therapy sessions, the optimal conditions
for complete and reliable Movement Psychodiagnostic Inventory coding are a 50 min-
ute session videotaped with the camera held in a continuous medium shot of the full body
that allows the observer to see gaze patterns and facial expression as well as leg move-
ments. Videotapes that restrict the viewing with cuts and close-ups are limited in value
for this coding. Some patterns may be observed in them, but the observations are re-
garded as incomplete and tentative.
The Movement Psychodiagnostic Inventory is divided into two complementary parts,
the first being “what” is displayed and the second “how.” Part 1, called the Action
Inventory, involves coding the frequency of common conversational behaviors: hand/
arm/shoulder gestures accompanying speech, head movement related to speech, self-
related actions such as fidgeting or grooming behaviors, instrumental actions, patterns
of gaze, head and torso orienting to the other, and the repertoire of positions and
total body movements, called “postural” movements.
The Action Inventory of the Movement Psychodiagnostic Inventory (see Tab. 59.1) is
for coding absence or notable restriction of behaviors that are commonly seen in
conversation and interaction (e.g. never looks at the listener through an entire speaking
turn or never moves one’s head when speaking). In other words, the focus is on the
frequency of what the person does during the interview (e.g. no fidgeting, 3 different
body positions, 10 phrases of gesticulation, no head movements while speaking, and so
on). Actions can be infrequent or restricted for many reasons that have little or nothing
to do with psychopathology. For example, averted gaze might be related to age or to
cultural conventions for addressing an authority figure.

Tab. 59.1: Movement Psychodiagnostic Inventory Part 1: Actions

ACTION INVENTORY                                            SUB-SYSTEM

Gesticulation
 1. Gesticulations, gestures accompanying speech: # _____
 2. Emblems, gestures without speech, e.g. shrug: # _____

Self-related
 3. Repetitive actions, e.g. rocking: 0 1 2 (Describe)
 4. Self-touching, e.g. scratching: 0 1 2 (Describe)

Instrumental
 5. Instrumental actions, object handling, e.g. smoking, drinking: 0 1 2
    (2 = 50% of session) (Describe)

Orienting
 6. Speaks entire turn without looking at listener: 0 1 2
    (0 = no turns, 1 = 1–3 turns, 2 = 4+ turns)
 7. Holds gaze away from speaker when addressed: 0 1 2
    (0 = no turns, 1 = 1–3 turns, 2 = 4+ turns)
 8. Head orienting in conversation: 0 1 2
    (0 = at least sometimes, 1 = rarely, 2 = never toward)
 9. Trunk orienting in conversation: 0 1 2
    (0 = at least some active orienting, however slight;
    1 = stays with chair positions, i.e. no active orienting;
    2 = stays markedly away)

Head Moves
10. Head movements with speech: 0 1 2
    (0 = clearly accompany, 1 = nods or shakes only or very rare accenting moves, 2 = none)
11. Listens with head nods “yes” or “no”: 0 1 2
    (0 = at least sometimes, 1 = very rarely, 2 = never) (Describe)

Facial Expression
12. Facial expression held longer than 15 seconds: 0 1 2
    (0 = none, 1 = once or twice, 2 = often) (Describe)

Position/Posture
13. Different resting or “homebase” positions: # _____
14. Phrases of postural shifts: # _____

Copyright 1991 Martha Davis. For permission to quote or use contact author at
marthadavisr@mac.com

However, not looking at the other during a
conversation may be related to a more severe diagnosis, especially when combined with
disordered patterns or other forms of restriction.
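The frequency codes just described follow simple cut-offs. A minimal sketch of how raw counts for Action Inventory items 6 and 7 map onto the 0/1/2 codes (the cut-offs come from Tab. 59.1; the function name is a hypothetical illustration):

```python
def code_turns(turn_count):
    """Map a raw count of problem speaking turns to the 0/1/2 code of
    Action Inventory items 6-7 (0 = no turns, 1 = 1-3 turns, 2 = 4+ turns)."""
    if turn_count == 0:
        return 0
    return 1 if turn_count <= 3 else 2

# A session with five entire turns spoken without looking at the listener
# earns the maximum code of 2:
print(code_turns(0), code_turns(2), code_turns(5))  # 0 1 2
```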
The Movement Psychodiagnostic Inventory Part 2 is called Primary Categories and
deals with how the person moves, the qualitative aspects of nonverbal behavior. It is
composed of 10 categories, each with from 3 to 12 items. While the focus is on serious
patterns of disturbance, these patterns may be very subtle or infrequently displayed.

Tab. 59.2: Movement Psychodiagnostic Inventory Part 2: Primary Categories Illustrated*

 1. Disorganization (12 items) Item 6: Different movements performed simultaneously in
    different parts of the body, not synchronized.
 2. Immobility (12 items) Item 3: Fixed shape or position held up in the air and against gravity
    for 30 or more seconds.
 3. Low Intensity (3 items) Item 2: No space, force, or time variations visible in any movement
    during the session.
 4. Low Spatial Complexity (3 items) Item 2: Any spatial complexity, i.e. clear directions, curved
    transitions, projection into space, restricted to hand or forearm movements.
 5. Perseveration, Fixed-Invariant (4 items) Item 4: Moves strictly in one plane or spatial axis per
    phrase.
 6. Flaccidity or Retardation (4 items) Item 3: Complete limpness and giving into gravity at end
    of most gestures or upper limb actions.
 7. Diffusion (4 items) Item 1: Movement spatially diffuse and unclear through entire phrase
    (i.e. absence of straight, round or 3-D paths or transitions, clear projection into space).
 8. Exaggeration (3 items) Item 2: Large, extensive movements throughout phrase, i.e. no
    modulation in large size within the phrase.
 9. Hyperkinesis (3 items) Item 1: Three or more phrases of large limb and/or trunk shifts within
    15 seconds or less (excluding instrumental activity and periods of gesticulating).
10. Even Control/Suspension (3 items) High degree of muscle tension maintained throughout
    the entire movement phrase and for most of the phrases; absence of release, giving into
    gravity.

*Adapted from Davis (1997) Movement Psychodiagnostic Inventory. For a copy of the full
inventory and guide, contact the author at marthadavisr@mac.com

As discussed earlier, with advances in treatment, motor symptoms of mental illness are
not necessarily obvious or exaggerated in the ways that they were before the 1950s.
Tab. 59.2 lists the Movement Psychodiagnostic Inventory Primary Categories and a
sample item from each. This is an inventory of patterns that have been identified by
the author and Irmgard Bartenieff in an initial project and later in the author’s research
on hospitalized psychiatric patients. It is an inventory based on film, video and live ob-
servations of over 100 patients (22 and 62 in formal studies, the rest in clinical studies).
Although there have been few additions since the 1980s, the list of patterns is likely to
expand with future research.
The first two categories, disorganization and immobility, have the most items, and
their prominence in the inventory may reflect the fact that the Movement Psychodiag-
nostic Inventory was developed primarily from observation of people diagnosed within
the schizophrenia spectrum. Disorganization in movement can occur in many ways,
some of which appear to be more serious than others. The patterns considered the
most serious are listed first and identified as such. (The criteria for “serious” – like
the assignment of patterns to a given category – are heuristic and based on examination
of their formal properties, not on factor analysis of large samples.) Two forms of disor-
ganization have already been described in the movement of the patient Anna discussed
above. Anna occasionally displayed very quick changes that broke up the continuity of
the movement and appeared to come “from nowhere”, as well as body fragmentation,
one part moving, then another, then a third in a chaotic sequence. A third example
would be movements of one part, such as repetitive shaking of the foot, that are com-
pletely out of synch with movements of another, such as repetitive hand gestures up and
down for emphasis. Note that these patterns are not simply awkward or ungraceful or
impulsive, but disorganize the person’s interactions and conversational behavior in an
extreme way that is very difficult, if not impossible, for someone who does not move
that way to duplicate. The “less serious” forms of disorganization in the Movement
Psychodiagnostic Inventory, such as perceptible pauses between each change of direc-
tion in a series of gestures, can be performed with practice and concentration.
The immobility category covers a range from absolutely no movement during the
entire session (which compares with diagnostic impressions of the person as catatonic)
to very subtle but notable forms of immobility such as no activation of the total body
even when changing position. (It is possible to make large position shifts by moving
just one part at a time, e.g. the upper torso with the lower body totally still, then a leg
isolated in a cross, then a lower-body pivot with no movement in the upper body, and
finishing with the torso alone.) In some ways this category compares with items on the
Action Inventory of Part 1 (e.g. no full body or postural shifts), but the items listed under Immobility
constitute a more refined distinction. The Primary Categories of the Movement Psycho-
diagnostic Inventory include exaggerations or deficiencies in movement intensity, spa-
tial clarity and complexity, muscle tension, mobility and organization (see Tab. 59.2
for the full list with examples).
Ultimately, the coding is entered into a large database for research analysis or con-
solidated into individual profiles that show which categories of movement disorder are
observed, how serious and extensive the disturbance is, and the degree to which the
conventional communicative behaviors are limited. For example, the Movement Psy-
chodiagnostic Inventory profile of Anna shows a high degree of disorganization,
extreme tension, and perseveration, but relatively little disturbance in mobility, inten-
sity, and spatial clarity/complexity and little restriction in communicative behavior
such as orienting and gesturing. Starting with this baseline profile, the dance/movement
therapist could observe and record subtle improvements or deterioration from week to
week that could be of clinical value.
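Consolidating item codes into such a profile can be sketched as follows. This is a hypothetical illustration: the tuples and summary fields are invented, with category names taken from Tab. 59.2; the actual MPI database and profile format are not described in that detail here.

```python
from collections import defaultdict

# Hypothetical MPI Part 2 codes for one session:
# (category, item number, whether the item is a 'serious' form)
codes = [
    ("Disorganization", 6, True),
    ("Disorganization", 1, True),
    ("Even Control/Suspension", 1, False),
    ("Perseveration, Fixed-Invariant", 4, False),
]

def consolidate(codes):
    """Summarize, per category, how many patterns were observed
    and whether any of them was a 'serious' form."""
    profile = defaultdict(lambda: {"patterns": 0, "any_serious": False})
    for category, _item, serious in codes:
        entry = profile[category]
        entry["patterns"] += 1
        entry["any_serious"] = entry["any_serious"] or serious
    return dict(profile)

for category, summary in consolidate(codes).items():
    print(category, summary)
```

A profile like this can then serve as the kind of baseline against which week-to-week improvement or deterioration is recorded.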

5. Research with the movement psychodiagnostic inventory


While it takes training and judgment to accurately code the Movement Psychodiagnos-
tic Inventory, subjective criteria are kept to a minimum, the items are strictly operatio-
nalized, and the coder is advised not to code a pattern if unsure. However, the
researcher or clinician has some latitude on how detailed to be. For example, for certain
projects, it might be sufficient to record only that the person displayed signs of serious
disorganization, without specifying the exact type. There is also the option to record the
“subsystem” or type of nonverbal behavior in which the pattern is seen: gesticulation,
posture/position, self-related, instrumental, head movements or facial actions. For
example, disorganization may occur only in the gestures accompanying speech, but not
in grooming behaviors, fidgeting (self-related subsystem) or handling objects (instru-
mental subsystem).
The decision about how much information to record depends to a degree on one’s
aims and resources. Although the complete coding is time-consuming and labor inten-
sive, it has definite advantages. For example, disagreements are easier to identify and
resolve through consensus when the coding is detailed.
In most of our research studies applying the Movement Psychodiagnostic Inventory,
we have used three observers who independently code the individual’s movements
without sound or specific diagnostic information. Observer reliability is usually assessed
with Cohen’s kappa which corrects for chance agreements on qualitative judgments.
Because videotapes of psychiatric sessions are so difficult to secure and the coding is
so labor intensive, rather than throw out data points on which there is disagreement
or average the codes (which makes little sense for qualitative judgments), observers are
asked to review the points of disagreement and to make a decision based on consensus.
Although there are studies in which time-sampling of segments may be warranted, this
is not a good idea with the Movement Psychodiagnostic Inventory. Often, important
patterns are very rarely displayed, and limiting the sample increases the chance that
they will be missed.
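Cohen's kappa, mentioned above, corrects raw agreement for the agreement expected by chance given each observer's marginal frequencies. A minimal sketch for two observers' categorical codes (the observer codes are hypothetical):

```python
from collections import Counter

def cohens_kappa(codes_a, codes_b):
    """Chance-corrected agreement between two observers' categorical codes."""
    n = len(codes_a)
    # Observed agreement: proportion of items coded identically.
    p_o = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Chance agreement expected from each observer's marginal frequencies.
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    p_e = sum((freq_a[c] / n) * (freq_b[c] / n) for c in freq_a.keys() | freq_b.keys())
    return (p_o - p_e) / (1 - p_e)

# Hypothetical presence(1)/absence(0) codes for eight items:
obs1 = [1, 1, 0, 0, 1, 0, 1, 1]
obs2 = [1, 0, 0, 0, 1, 0, 1, 1]
print(round(cohens_kappa(obs1, obs2), 2))  # 0.75
```

Here the raw agreement of 0.875 shrinks to kappa = 0.75 once chance agreement (0.5, given these marginals) is removed, which is why kappa is preferred over percent agreement for qualitative judgments.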
Originally, we predicted that serious forms of disorganization, perseveration and
immobility would be associated with schizophrenia. The reality appears more compli-
cated, as we found in a study of 19 psychiatric patients with schizophrenia spectrum dis-
orders and 33 patients diagnosed with borderline and personality disorders (Davis,
Cruz, and Berger 1995). There were univariate differences in the social behavior con-
ventions operationalized in the Action Inventory, with schizophrenia spectrum patients
displaying greater disturbance in behaviors that serve orienting to the other, gesticula-
tion and head movements underlining speech, and so on. However, the presence of the
formal patterns defined in Part 2 is not pathognomonic of schizophrenia in the way a
simple symptom used for diagnosis would be. The distribution of formal motor signs was
equal between the two groups, but multidimensional scaling of the data showed that the
way the symptoms were configured was different. Naive impressions that a person is
disturbed appear to be based in part on visible disorders in conventional social behaviors,
while the diagnostic significance of the more subtle qualitative signs of motor disorder
depends on how they co-occur.
The Movement Psychodiagnostic Inventory distinguished between borderline per-
sonality disorder and narcissistic personality disorder groups as well (Davis, Cruz,
and Berger 1995). A small literature on movement characteristics of patients with per-
sonality disorder suggested that for this study six Movement Psychodiagnostic Inven-
tory primary categories should be used: disorganization, immobility, diffusion, low
spatial complexity, flaccidity and hyperkinesis. The borderline patients showed mark-
edly higher mean scores on disorganization and low spatial complexity and higher
scores on hyperkinesis and flaccidity than the narcissistic group, with somewhat lower
mean scores on immobility. To reiterate, while these initial studies support the validity
of the Movement Psychodiagnostic Inventory as an instrument for studying psycho-
pathology, simple univariate relationships – presence of x means diagnosis y – are
not supported. Analysis of the formal qualitative distinctions of Movement Psychodiag-
nostic Inventory Part 2 appears to reveal “deep structure” differences and supports a
more complicated model of the nature of severe psychopathology than one based on
univariate differences.

6. Movement psychodiagnostic inventory and


communication analysis
While the results of the initial group comparison research suggest that one has to be
very careful interpreting specific patterns displayed by an individual, it is valuable to
study how they occur within context, in interaction and with speech. For example, in
a film study of a therapy session (Davis 1985), the Movement Psychodiagnostic Inven-
tory movement patterns observed in the patient occurred in a distinct A B C A
sequence, each phase lasting a few minutes. In A, the patient’s speech gestures were
highly perseverative, each wag of the finger exactly the same as the next in amplitude
and intensity. In B, the gesticulations escalated into large sweeps of the arm that stayed
fully extended, earning a code not only in the exaggeration category but, remarkably, in
the immobility category because, despite the rigor and large amplitude of the arm mo-
tions, the torso remained inert and immobile. In C, the gesticulating stopped, and the
patient was completely still except for one very slow, listless shift of his position.
Such microanalyses should be completed without sound, so that the observations are
not skewed or biased by hearing what is said. In this case, when the sound was finally
turned on, it was immediately clear that each movement phase corresponded to a dis-
tinct change in mental status: A with incoherent speech, B with grandiose accusations
and assertions, and C with coherent depressed statements about how discouraged the
patient was with his condition, that devolved back into A, the incoherent rambling
with perseverative gestures. This A B C A sequence appeared intricately related to
interaction. After listening quietly for a long time, the therapist interrupted the pa-
tient’s ranting in phase B with a poignant interpretation that was confirmed by the
patient in the way that he stopped and responded in phase C, and then seemed undone
by the patient as he returned to the withdrawn, incoherent A phase.
This has been a sketch of a complex coding instrument for studying psychopathology
based on very refined observation of body movement. While the Movement Psycho-
diagnostic Inventory was primarily designed for group comparisons and the diagnostic
potential of movement microanalysis, the Movement Psychodiagnostic Inventory con-
cepts and terms can be applied to research on face to face interaction, and the role
of moments of disordered behavior in communication. How these patterns occur in
context is critical for clinical applications of the research, but the role of moments of
movement disorder or restriction in conversation has potential for a wide range of
communication studies.

7. References
Bartenieff, Irmgard and Martha Davis 1972. Effort-shape analysis of movement. In: Martha
Davis (ed.), Research Approaches to Movement and Personality. New York: Arno Press.
First published [1965].
Birdwhistell, Ray L. 1970. Kinesics and Context. Philadelphia: University of Pennsylvania Press.
Berger, Miriam Roskin and Robyn Flaum Cruz 1998. Movement characteristics of borderline and
narcissistic personality disorder patients. Poster presentation at the American Dance Therapy
Association Annual Conference, Albuquerque, New Mexico.
Darwin, Charles 1965. The Expression of Emotions in Man and Animals. Chicago: University of
Chicago Press. First published [1872].
Davis, Martha 1970. Movement characteristics of hospitalized psychiatric patients. Proceedings of
Fifth Annual Conference, American Dance Therapy Association, 25–45.
Davis, Martha 1985. Nonverbal behavior research and psychotherapy. In: George Stricker and
Robert H. Keisner (eds.), From Research to Clinical Practice, 89–112. New York: Plenum.
Davis, Martha 1997. Guide to movement analysis methods. Part 2: Movement psychodiagnostic
inventory. Unpublished guidebook available from author at marthadavisr@mac.com.
Davis, Martha, Robyn Flaum Cruz and Miriam Roskin Berger 1995. Movement and psychodiag-
nosis: Schizophrenia spectrum and dramatic personality disorders. Presented at the Annual
Conference of the American Psychological Association.
Ekman, Paul 1985. Telling Lies. New York: W.W. Norton.
Ekman, Paul and Wallace V. Friesen 1978. The Facial Action Coding System. Palo Alto, CA: Con-
sulting Psychologists Press.
Ellgring, Heiner 1989. Nonverbal Communication in Depression. Cambridge: Cambridge Univer-
sity Press.
Kay, Stanley R., Abraham Fiszbein and Lewis A. Opler 1987. The positive and negative syndrome
scale (PANSS) for schizophrenia. Schizophrenia Bulletin 13: 261–276.
Kraepelin, Emil 1971. Dementia Praecox and Paraphrenia. Translated by R. M. Barclay. New
York: Krieger. First published [1919].
Liddle, Peter F., Elton T.C. Ngan, Gary Duffield, King Kho and Anthony J. Warren 2002. Signs and
symptoms of psychotic illness (SSPI): A rating scale. British Journal of Psychiatry 180: 45–50.
National Institute of Mental Health 1974. Abnormal involuntary movement scale (AIMS). Wash-
ington, DC: Alcohol, Drug Abuse, and Mental Health Administration, U.S. Department of
Health, Education and Welfare.
Scheflen, Albert E. 1973. Communicational Structure: Analysis of a Therapy Transaction. Bloo-
mington: Indiana University Press.
Simpson, G. M. and J. W. S. Angus 1970. A rating scale for extrapyramidal side effects. Acta Psy-
chiatrica Scandinavica 45: 11–19.

Martha Davis, New York, NY (USA)

60. Laban based analysis and notation of


body movement
1. Introduction
2. Overview of the theoretical framework
3. Methodological procedures to gather data
4. Meaning making/interpretation
5. Conclusions
6. References

Abstract
Laban-based analysis, originally developed for dance, can also serve as an excellent tool
for observing body movement in human communication. With the six categories and
approximately 60 parameters of Laban/Bartenieff Movement Studies, one is able to
observe movement in its complexity. The observer may select all or only the most
salient parameters, choose between a quantitative and a qualitative analysis, move from
the macro level to the micro level, and choose among different methods of notation.
The notation has the advantage that it is quicker to write and can show movements
which are happening simultaneously in close proximity and in a pictorial, non-linear way.
For those who are conversant with the notation, it also enables communication about
movement without language as a mediator, so it is truly intercultural, like music notation.
Over the 90 years in which Laban-based analysis has been developed by many people
besides Laban himself, it has come to match the intricacy of human body movement
through its complexity and systematic approach.

1. Introduction
Historically, Rudolf Laban (1879–1958) first used his analysis for dance, but he soon
developed it further for use in industry and in the theatre. In his book The
Mastery of Movement he not only describes his concepts for movement on stage, but
also for the “theatre of life”, i.e. communication through movement. Laban comments:
“So movement evidently reveals many different things. It is the result of the striving
after an object deemed valuable, or of a state of mind. […] It can characterize momen-
tary mood and reactions as well as constant features of the mover” (Laban 1975: 2).
Since a movement can have different meanings, it is important to have well differentiated
concepts to be able firstly to describe movement on a more objective level, i.e. with the
least amount of interpretation possible, then secondly to look at it in relationship to the
context to find meaning.
Movement is in a “continuous dynamic flux which is interrupted only by short
pauses” (Laban 1975: 8). This makes movement especially difficult to analyze in face-to-face
situations, when the observer is also a participant in the communication. Laban devel-
oped a way to analyze movement, on a “macro” level, without the use of film, by way of
shorthand through his symbols and notation system. Today, analysts can also go onto a
“micro” level, with the help of video documentation. Through repeated observations
the many layers (i.e. categories) of movement can be found, while one brief observation
can only reveal the aspect (i.e. category) which is in the foreground.
Many students of Laban have developed his ideas further, so that today there are
several different Laban-based analytical approaches. In the US, Laban Movement Analysis
(LMA) was developed by Irmgard Bartenieff (1900–1981) and her students. Since
1988 the European Association of Laban/Bartenieff Movement Studies (EUROLAB),
based in Germany, has been actively translating and developing this material. In the following
overview, I will focus on the approach which we use and teach in Europe, called
Laban/Bartenieff Movement Studies (LMS). In a few instances I will briefly describe
other Laban-based approaches which are important with respect to communication,
meaning making, or notation.

2. Overview of the theoretical framework


Today, Laban/Bartenieff Movement Studies (LMS) separates movement into six categories,
all of which operate in every movement. These movement categories give
answers to the following questions:
60. Laban based analysis and notation of body movement 943

– Body: What is moving? Which parts are moving? Which movements are done?
– Space: Where does the movement go?
– Effort: With which energetic quality?
– Shape: With which plastic modification?
– Relationship: How does the moving person set up a relationship to something or
somebody through movement?
– Phrasing: In which temporal order does the movement progress?

The peculiarity of each individual movement originates not only from the addition of
the different elements, but from their versatile combinations. Furthermore, each movement
is colored by which of the categories stands more in the foreground. There are a
number of aspects within each of these six categories (altogether over sixty) which can
be observed. Therefore, it is crucial that the observation parameters are selected
according to how one wants to structure the observation process in the context
of the core question (see section 3).
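To illustrate the selection step (a sketch under my own naming conventions, not standard LMS tooling), the six categories with a sample of their parameters can be held as plain data, from which a subset is chosen for the core question:

```python
# Illustrative sketch only: the six LMS movement categories with a few
# example parameters each; names follow the chapter, the data structure
# and helper are my own assumptions.
CATEGORIES = {
    "body":         ["body actions", "body parts", "body attitude"],
    "space":        ["general space", "kinesphere", "reference points"],
    "effort":       ["flow", "space", "time", "weight"],
    "shape":        ["still shapes", "modes of shape change", "shape qualities"],
    "relationship": ["motion towards/away", "degrees of relating", "facing"],
    "phrasing":     ["phrase length", "emphasis", "temporal arrangement"],
}

def select_parameters(categories):
    """Restrict observation to the categories chosen for the core question."""
    return {c: CATEGORIES[c] for c in categories}

# e.g. a study of gesture dynamics might focus on effort and phrasing:
selection = select_parameters(["effort", "phrasing"])
print(sorted(selection))  # → ['effort', 'phrasing']
```

The point of the sketch is simply that an observer commits in advance to a manageable subset of the sixty-odd parameters rather than attempting all of them at once.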

2.1. Body
Laban (1975) not only differentiated the body parts, but also distinguished different
body actions. Today we also look at body attitude, i.e. the posture which is held in movement,
as well as body connections and organizational patterns. Since the first two are
most relevant for communication, these will be the focus here.

2.1.1. Body actions


This aspect answers the question: what does the body do? On the most general level
there is either stillness or movement, i.e. action. The body actions are usually seen as:
gesture; change of support, or an action ending in a weight-shift; jump, or air moment;
rotation, or turn; locomotion, or traveling.
It is important to note that Laban’s entire notation is from the perspective of the
mover. The body action symbols are all based on motif writing (see below); therefore
they are written from bottom to top. The arrow next to the symbols reminds the
reader that this is the direction in which to read the symbol or motif.

Tab. 60.1: Body actions – Basic symbols

Stillness | Action | Gesture | Change of support | Jump | Rotation/Turn | Locomotion/Traveling

In everyday communication, supplementary to speech, people tend to use gestures and
postural weight shifts. Gestures are usually confined to a part or parts of the body, i.e.
some part of the body is still. They can be observed in any part of the body. In postural
shifts, a part of the body or the whole body shifts its weight, so that there is a change of
support onto another body part. In everyday life, we can observe people doing other
body actions – such as locomotion through space or turning around. Seldom do we see
people jump. In all these body actions, the meaning is mainly expressed through the use of
effort, shape, space, relationship and phrasing. Still, it may be important to note with
which part of the body this is being done.
In his many observations of people in communication, Warren Lamb (1965), a student
of Laban’s, discovered that there are sometimes fleeting moments where
the gesture and the postural shift join together – he called these “posture-gesture-mergers”
or “integrated movements” (Moore 2005: 39). With extra training, these
can be observed and analyzed in relation to their effort or shape content.

2.1.2. Body parts

Fig. 60.1: Separation of body parts through joints

Tab. 60.2: Body part symbols for the head and fingers

Head | Face (front surface of Head) | Back of Head | Head & Neck | Mouth
Thumb (1st Finger) | 2nd Finger | 3rd Finger | 4th Finger | 5th Finger

In Laban’s notation, the body parts are separated through the joints, going from the
center out (Fig. 60.1). “C” (for caput, Latin for head) can be further differentiated with
certain additions to represent the face, mouth and eyes. Again, with other additions, the
limbs or each finger can be notated separately (Tab. 60.2). These symbols can be
added to the body action symbols in motif writing to clarify which body part is doing a
gesture.

2.2. Space
Laban differentiated between the general space around us and the personal movement
area, the kinesphere.

2.2.1. General space/front and facing


The general space is the area which the mover can reach through locomotion. It can be
an open space (outdoors) or a defined space (indoors). A mover can orient himself in
space in relationship to a front – either it is set (e.g. in a theatre) or he defines it (e.g.
towards the mirror). If there is a front there will also be a back, as well as a left and a
right side of the space. Then, the mover can decide where to orient the front of his body
(facing) in respect to this front in the space, because movement can sometimes be best
seen from a certain angle.
In everyday communication there is usually no defined front in the space. Usually, a
mover will orient the front of his body (facing) in relation to a person or object in the
space (see relationship). When a camera is used to film everyday communication,
its placement sets up a front, since this will be the perspective
from which the film is made. Possibly, the mover will be aware of this and use it as
his front in space. If not, the observer of the video may not see certain
aspects of the movement.

2.2.2. Kinesphere
The kinesphere is the area around the body which can be reached with the limbs when
standing (or sitting) in one place. Laban defined three approaches to the kinesphere:
central, peripheral or transverse. Central movements go to, away from or pass near
the body center; peripheral movements stay on the edge of the kinesphere; transverse
movements occur in between the center and the edge. Each person defines where the
edge of his kinesphere is through his movements. So it is possible to see peripheral
movements in near reach of the body.

Tab. 60.3: Approach to the Kinesphere

Central Peripheral Transverse

2.2.3. Spatial reference points


Laban developed his space concepts for the kinesphere in three-dimensional models:
the Platonic solids. He drew on the corners of these models for his spatial reference
points. He defined movement as the shortest pathway between two reference points.

Fig. 60.2: Octahedron

The vertical, horizontal and sagittal dimensions are in the model of the octahedron
(Fig. 60.2). The diameters of the vertical, horizontal and sagittal planes are in the icosahedron.
The four diagonals are in the cube. With these three models (plus the center) we
have 27 spatial reference points in total, which give us an orientation in the kinesphere.
These reference points can be connected: centrally – through dimensions, diameters
and diagonals; peripherally – along the edges of the models (Fig. 60.2); transversely;
or in a mixture – e.g. central and peripheral (Fig. 60.3).
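The count of 27 reference points can be checked with a small illustrative sketch (my own modelling, not an established Laban tool): treating the vertical, horizontal and sagittal dimensions as three axes, the reference points are the grid {-1, 0, 1}³, and grouping them by the number of non-zero coordinates recovers the center, the dimensional, diametral and diagonal endpoints:

```python
from collections import Counter
from itertools import product

# Illustrative sketch: the 27 kinespheric reference points modelled as
# the grid {-1, 0, 1}^3 (an assumption of this example, not Laban's own
# formulation).
points = list(product((-1, 0, 1), repeat=3))

def classify(p):
    """Group a point by how many of its coordinates are non-zero."""
    n = sum(c != 0 for c in p)
    return {0: "center", 1: "dimensional", 2: "diametral", 3: "diagonal"}[n]

counts = Counter(classify(p) for p in points)
print(counts["center"], counts["dimensional"], counts["diametral"],
      counts["diagonal"])  # → 1 6 12 8, i.e. 1 + 6 + 12 + 8 = 27
```

The six dimensional points correspond to the corners of the octahedron, the twelve diametral points to those of the icosahedron, and the eight diagonal points to those of the cube.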

Fig. 60.3: Defense scale (Kennedy 2010, graphics: Elisabeth Howey)

2.2.4. Spatial scales


Laban developed movement scales within these models, comparable to musical scales,
which follow exactly described pathways and harmonic laws (such as symmetry and mirroring).
An example is the defense scale in the octahedron (Fig. 60.3). Laban (1966) named
15 scales in the three above-mentioned models. His student Valerie Preston-Dunlop
(1984) delineates 27, which in turn can be varied. These scales give the observer prototypes
for spatial movement to which he can refer when he observes. A trained
observer can notate the spatial pathways between the reference points and distinguish
whether they are part of a scale.
In everyday communication, the observer will see central, peripheral and transverse spatial
pathways in the near to middle reach of the kinesphere. These may be part of a scale, but
usually everyday movements do not follow exact harmonic laws. Still, the idea that the body
likes to balance its movements spatially holds true in small everyday movements.
For example, if someone moves forward, he will at some point want to move backward.

2.3. Shape
Laban’s first profession was visual art, so it is no surprise that he initially
saw movement as a development from one still shape to the next and described it as “a
series of shape transformations” (Laban 1920: 214).

2.3.1. Still shapes


Peggy Hackney, a student of Bartenieff, objectifies the metaphors used by Laban for
still shapes: pin as an elongated shape, ball as any round shape, wall as a flat shape,
and screw as any twisted shape. Hackney (1998: 221) adds the tetrahedron as any angular
spherical shape. For these she developed symbols, so that they can be used in
motif writing (Tab. 60.4).

Tab. 60.4: Still shape symbols

Pin | Ball | Wall | Screw | Tetrahedron

In everyday activities, the still shapes can be observed in the whole body, but also in
body regions, such as a pin-like upper body (while sitting upright). It is possible
to transfer these shape descriptions to shapes made by the hands or arms in gestures,
e.g. a hand making the rounded shape of a ball. If the hands start making pantomimic
movements – where the observer has to imagine the rest of the shape – then this becomes
closer to what Hutchinson Guest (1983: 173) calls “design drawing”.

2.3.2. Modes of shape change


Laban’s first attempts at defining the shapes used while moving (1926: 94) were
revised by Judith Kestenberg (Lewis and Loman 1982: 57–58) and are now called
modes of shape change by Hackney (1998: 222). These modes are differentiated into three
aspects: shape-flow, which is self-oriented shape change; directional, which is goal-oriented
shape change, either spoke-like or arc-like; and carving, which is three-dimensional
shaping. Hackney again designed symbols for these concepts (Tab. 60.5).

Tab. 60.5: Modes of shape change

Shape flow | Directional (Spoke-like / Arc-like) | Carving

In active everyday communication with speech, one can observe many directional and
carving gestures, some of which may be supported by postural weight shifts. Shape-flow
will mostly happen in “off” moments, when people self-reference, self-touch or
look like they are “dreaming”.

2.3.3. Shape qualities


Furthermore, Hackney (1998) sees the shape qualities as a separate aspect of shape,
which can be affined to the dimensions. The six shape qualities are rising and sinking,
spreading and enclosing, advancing and retreating (Fig. 60.4). When they
are observed in complex three-dimensional shaping, words or symbols need to be
combined; for example, when someone yawns he might be spreading, retreating and
rising all at the same time.

Fig. 60.4: Shape qualities

In communication, the shape qualities are very important in action as well as in reaction,
since they can carry conscious or unconscious meaning. There are no one-to-one correlations,
so they have to be described correctly and then interpreted in context. For example, a
sinking arm movement from high to low can be away from something which is high or
towards something which is low. This movement will be different when the whole body
is sinking, or when the arm is sinking while the torso is rising. There will also be a difference
if the movement is purely sinking, sinking with spreading, or sinking with spreading and
advancing. Each differentiation can give the movement a slightly different meaning.

2.4. Effort
In order to describe the dynamic quality of a movement, one usually resorts to very imagistic,
subjective and interpretive modes of expression, for example “a person thrashed wildly
around himself”. Laban created a more objective and clear structure for the characterization
of energetic qualities in movement.

2.4.1. Effort factors and elements


Laban (1988) defined four effort factors of movement: weight, flow, space and time
(Fig. 60.5). Each factor consists of two elements, which are two extremes on a continuum
between a fighting and an indulging pole. Flow effort means an active attitude towards
the continuity of movement. In bound flow, one controls the movement so that it
could be halted at any time; a movement with free flow is hard to stop instantaneously.
Space effort is the inner attitude towards space, which is observed through the
type of attention given to space: does the mover use direct space effort, focusing
the movement in space, or does the mover have a flexible attitude towards the space?
Time effort is not about the pulse or speed of the movement, but the inner attitude
towards time. This could be one of fighting against time: this unrest can evoke quick and
accelerating movements. Or one could enjoy the time, want to extend or sustain it, and
decelerate the movement. Weight effort can be an active attitude towards the use of
one’s weight: either using as little weight as possible, therefore decreasing pressure
or becoming light, or using as much weight as possible, therefore increasing pressure
or becoming strong. The weight effort can also be passive, giving in to gravity and
becoming either heavy or limp.

Fig. 60.5: Effort factors and elements

2.4.2. Effort combinations


With only eight effort elements, their numerous combinations, and their temporal
ordering possibilities (see phrasing), we can distinguish the versatile nuances of
dynamic qualities in movement. Each element can be executed with different intensities,
so that there are many shades between the two poles of the continuum. Usually, we
do not observe the factors alone, but their “loading” into combinations of two, three,
or four (Tab. 60.6 and Tab. 60.7). In order to characterize the combinations, Laban
gave each of them a name. These names already give an indication of possible meaning
making.

Tab. 60.6: Combinations of two effort factors

Weight & Flow: Dream State
Time & Space: Awake State
Weight & Space: Stable State
Time & Flow: Mobile State
Weight & Time: Rhythm State (Near)
Flow & Space: Remote State (Far)

Tab. 60.7: Combinations of three effort factors

Weight, Space & Time: Action Drive
Weight, Flow & Time: Passion Drive
Weight, Space & Flow: Spell Drive
Flow, Space & Time: Vision Drive

In communication, it can be vital to make very fine distinctions in effort through
observation, since the meaning could change depending on only one factor. For example,
a strong-quick movement is something different from one that is strong-quick-direct
or strong-quick-free. Laban called the first combination, of weight and time, a
rhythm state; the second combination, of the three factors weight, time and space, an
action drive, also called punch; and the third combination, of weight, time and flow, a
passion drive. All three combinations could possibly be observed in the above-mentioned
“thrashing” person, but each would have a slightly different connotation in
meaning. Only trained observers can make this subtle distinction.
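The combinatorics of effort “loading” can likewise be sketched in code (an illustration under my own naming, not a standard Laban tool): four factors with two elements each yield six two-factor states and four three-factor drives as factor sets, or 24 and 32 concrete element combinations once each loaded factor is given one of its two elements:

```python
from itertools import combinations, product

# Illustrative sketch: the four effort factors and their element pairs as
# plain data (element names follow the chapter; the code is my own).
EFFORT = {
    "flow":   ("bound", "free"),
    "space":  ("direct", "flexible"),
    "time":   ("quick", "sustained"),
    "weight": ("strong", "light"),
}

def element_combinations(k):
    """All ways to load k of the four factors, one element per factor."""
    result = []
    for factors in combinations(EFFORT, k):
        for elements in product(*(EFFORT[f] for f in factors)):
            result.append(dict(zip(factors, elements)))
    return result

states = element_combinations(2)   # two-factor loadings ("states")
drives = element_combinations(3)   # three-factor loadings ("drives")
print(len(states), len(drives))    # → 24 32
```

So the strong-quick movement of the example is one of the 24 state loadings, while strong-quick-direct and strong-quick-free are among the 32 drive loadings.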

2.5. Relationship
Ann Hutchinson Guest, a student of Laban’s, has developed the relationship category.
Sometimes the kind of relationship established can be more important than the body,
spatial, effort or shape component of the movement which produced it.

2.5.1. Motion towards and away


A basic relationship concept is moving towards or away from one’s own body parts,
someone or something. It could also be a motion towards or away from a concrete destination
point (e.g. a table), a spatial reference point (e.g. high) or a point of focus (e.g.
an imaginary center) (Fig. 60.6).

Fig. 60.6: Motion away from and towards a focus point

2.5.2. Degrees of relating


Hutchinson Guest (1983) has defined different degrees of relating: to be aware, to
address, to be near, to touch and to support (Tab. 60.8). These can be active from only
one person, while the other is “passive” or receptive, or both people can be actively
relating. The kind of relationship can be maintained for a while, or it can last only a brief
moment and then be released.

Tab. 60.8: Degrees of relating

Aware | Address | Near | Touch | Support

2.5.3. Relationship in all categories


All previously mentioned categories can have to do with relationship. Shape is nearly
always present in relationship, since the modes of shape change are defined through
the relationship to the environment. In the category of effort, one can observe
how dynamic energies can influence rapport. In space, Hutchinson Guest (1983: 240)
differentiates the “facing” or the angle through which two (or more) people set up
their relationship (Tab. 60.9).

Tab. 60.9: Facing in relationship

Facing one another | Facing in same direction, behind each other | Facing opposite directions, behind each other | Facing same direction, next to each other | Facing opposite directions, next to each other | Facing opposite directions, diagonally next to each other | Facing same direction, diagonally next to each other | Facing one another, diagonally next to each other

In everyday communication, it is obvious that the relationship category plays an
important role. We are sometimes in multiple degrees of relating while communicating,
and at the same time we are using all the other categories. Here is an example: Person A is
supported by the chair while she is near a table; her hands are touching a cup (which
is supported by the table); she is addressing person B with her face and is aware of the
movement of a dog in the distance. Since person B is standing in her back space, she
carves her torso into a screw-like twisted shape with direct space effort and sustained
time to address him!

2.6. Phrasing
Movement usually happens in phrases. A phrase has a beginning, middle and end. Generally,
Laban distinguished an exertion and a recuperation phase within a movement
phrase. Bartenieff (Hackney 1998: 239–241) emphasized the moment of initiation
and also what we do as preparation for the movement, since it determines how the
movement is followed through.

2.6.1. Phrasing in all categories


We can observe phrasing lengths, their temporal arrangement, and the phrasing of each
of the above-mentioned categories of movement. We can look at whether phrases are consecutive
(one always after the other), congruent (beginning and ending at the same time), or
overlapping (different beginning and/or ending), within one category or across two or more
categories (Tab. 60.10). Phrases can increase or decrease in how many body parts or
shape qualities are used, in the complexity in space, in the degrees of relating, or
in effort loading or intensity.

Tab. 60.10: Phrasing general

Consecutive | Congruent | Overlapping
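The three temporal relations can be sketched as a simple interval comparison (assuming, purely for illustration, that phrases have been annotated as (start, end) time pairs, e.g. from video time-codes):

```python
def phrase_relation(a, b):
    """Classify two phrases, given as (start, end) tuples.

    An illustrative sketch of the chapter's three relations; the interval
    representation is my own assumption, not a Laban convention.
    """
    a_start, a_end = a
    b_start, b_end = b
    if a == b:
        return "congruent"       # beginning and ending at the same time
    if a_end <= b_start or b_end <= a_start:
        return "consecutive"     # one phrase entirely after the other
    return "overlapping"         # different beginning and/or ending

print(phrase_relation((0, 2), (2, 4)))  # → consecutive
print(phrase_relation((0, 4), (0, 4)))  # → congruent
print(phrase_relation((0, 3), (1, 5)))  # → overlapping
```

Applied pairwise across two categories (say, an effort phrase and a shape phrase), the same comparison shows whether the categories phrase together or independently.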

2.6.2. Effort phrasing


In effort phrasing we distinguish different types of phrases: whether there is an emphasis
somewhere in the phrase (beginning, middle or end emphasis) or no emphasis
(even phrasing or oscillating phrases). The phrase could increase or decrease, or develop
from increase to decrease or from decrease to increase. For an example with notation,
see “Phrase writing” below (Fig. 60.8).

3. Methodological procedures to gather data


This model of the structuring of the observation process is based on one developed by
Carol-Lynne Moore, a student of Bartenieff. She identified six aspects to be considered

Fig. 60.7: Structuring the observation process (based on Moore and Yamamoto 1988: 224). The six aspects of the star are: Observer Role/Point of View, Duration of Observation, Core Question, Making Sense/Context, Documentation/Notation, and Movement Parameters.

in relation to each other (Fig. 60.7). A Laban-based analysis will of course use the above-mentioned
parameters and usually the below-described forms of documentation and
interpretation.
One can start at any point on the star, but changing one aspect might mean that
others have to be changed. In a face-to-face situation, for example, it could unfold in
the following way: the duration of the participatory observation is very short, the number
of movement parameters observed is relatively small, and making sense of the
observation is perfectly clear, since the observer is in the same situation. While
making the documentation afterwards, the observer comes up with the core question –
ah, that is why this was so interesting to me! In a research situation, by contrast, one would
have to structure the process very differently: the observer role would be a distant one
(possibly with video), and the core question and the movement parameters would have
to be carefully defined with respect to anticipating how to make sense of the data
(possibly a correlation to an established framework). The duration should be set, but
will probably depend on when the observer feels he has enough data to answer his question.
The form of documentation probably depends on the training of the observer and
on the core question to be answered – which could call for qualitative or quantitative
analysis.

3.1. Methodologies for Laban based analysis


There are five standing methodologies of Laban-based analysis. Three use notation,
i.e. symbol-based documentation: Phrase Writing, Motif Writing and Labanotation. In the other
two, the concepts are documented on coding sheets in words or notation: Check
Marks and Style Analysis. The table below shows how each is viewed in relation to
its level of description (micro to macro), whether it is a quantitative or qualitative
analysis, how many categories it encompasses, and its relation to the order and duration
of events. These are only the five methodologies which have been used frequently;
others can be designed in accordance with the application area. Since the methodologies
are correlated with certain forms of documentation, they are explained in the
next section.

Tab. 60.11: Standing methodologies for Laban-based analysis

Notation-based methodologies:
– Phrase Writing: micro level; qualitative; reduction to one important category; temporal ordering, independent of duration
– Motif Writing: macro or micro level; qualitative; analyzing the essence, reduction to one or two important categories; temporal ordering, duration documented
– Labanotation: micro level; qualitative; spatial analysis of all body parts & relationships; temporal ordering, duration documented

Coding-sheet-based methodologies:
– Check Marks: micro level; quantitative (counted); reduction to one parameter within a category; independent of order and duration
– Style Analysis: macro level; quantitative (estimated); complete analysis; independent of order and duration

3.2. Documentation
There are many ways to document Laban-based analysis, e.g. through movement
itself, words, or coding sheets. Laban and his students (Albrecht Knust and Ann Hutchinson
Guest in particular) developed his notation as a basic tool for documentation, with the
hope that its use would become as normal for movement and dance as music notation is
for music.

3.2.1. Forms of notation


The symbols developed by Laban make an important contribution to the analysis and
documentation of movement. Individual elements can sometimes be combined to
show combinations of efforts or shape qualities. With the same symbols, but different
syntax and grammar rules, different forms of notation result: Phrase Writing (Bartenieff
and Lewis 1981), Motif Writing (Hutchinson Guest 1983) and Labanotation
(Laban 1956).

3.2.1.1. Phrase writing


Phrase writing notates one single phrase of movement, reduced to one of the
categories, on a micro level. This qualitative analysis says something
about the order of appearance of the elements of this category, but nothing about
their duration. Phrase writing is usually written from left to right
(Fig. 60.8).
The right arm starts with free-quick, then accents the phrase with a strong-quick
effort. Both arms do a bound-sustained movement. The right arm then does a movement
which increases the effort loading to a strong-direct-quick accent (punch).

Fig. 60.8: Example of effort phrase writing

3.2.1.2. Motif writing


In motif writing, the category which is most important at a given moment is represented.
This is comparable to music: one hears many sounds all at once (for example
with an orchestra) and is nevertheless able to distinguish the “leitmotif”. Within a
movement sequence, the category which is perceived to be in the foreground
can change – this changes the category which is notated in the motif. Motif
analysis can be on a micro level, like phrase analysis, or on a macro level,
when the observation extends over a longer period of time or takes place in a live situation.
The qualitative analyses of the parameters of the different categories in the sequence are
notated with their duration; the relative duration is shown through the length of the symbols.
Motif writing is written from bottom to top, and then from left to right (Fig. 60.9).

Fig. 60.9: Example of Motif Writing. The annotations, read from bottom to top, are: right arm supported by right leg; release (the support); rising-spreading; rising-enclosing; moving towards place high; stillness; to the right.

Fig. 60.10: Example of Labanotation. The annotations are: right side of body; head forward; hand left-low; arm gesture right-forward-high; torso forward; leg gesture side-middle; step backward.



3.2.1.3. Labanotation
Labanotation incorporates the exact positions of all the moving body parts in space and
their relationships to objects and other people (Fig. 60.10). It is a description on a micro
level, documenting duration and written in the same direction as motif writing. One
difference is that a staff represents the body, with a center line dividing right and left sides.
Another difference is that gestures have to be referenced in their spatial directions from
the “point of attachment” of that limb to the body (Hutchinson Guest 2005: 199–217).
This means that an arm gesture forward in Labanotation is on the level of the shoulders,
whereas in motif writing it will be on the level of the pelvis (LMS uses the center of weight
as the reference; see space). Labanotation is used mostly for the exact reconstruction of
movement events. It is possible to add other categories (effort and shape) for a yet fuller
score, but this is rarely done, since the two Laban-based analyses are separate trainings.

3.2.2. Coding sheets


A coding sheet is a kind of checklist outlining what the observer wants to observe.
There are different types of coding sheets for different purposes. For a style analysis
aiming to be as complete as possible, the coding sheet needs to contain all the parameters
in each category, so that no parameter is forgotten. For other purposes, different
coding sheets can be designed with only some of the salient parameters to be observed.
Using check marks, the observer documents the frequency of one parameter; this is the
simplest form of coding sheet.
Depending on the observer, the parameters on the coding sheet can be written in
words or symbols. Usually, the sheet is designed in such a way that it is also used for
documenting the results. Coding sheets are normally used to attain a rough quantification
of the frequency and/or intensity of a parameter on the macro level. It is
also possible to rate the performance of the movement. A statement about the temporal
order or duration of a parameter is not possible with a coding sheet.
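A check-mark tally can be sketched as a simple frequency count (the parameter group and the observations below are invented for illustration):

```python
from collections import Counter

# Hypothetical check-mark tally for one chosen parameter group (here:
# shape qualities); the observation sequence is invented for the example.
observations = ["rising", "sinking", "rising", "spreading", "rising"]
check_marks = Counter(observations)
print(check_marks["rising"])  # → 3 check marks for "rising"
```

As the chapter notes, such a tally keeps only frequencies: the order and duration of the observed events are deliberately not recorded.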

For the analysis of everyday movement, any of the above-mentioned forms of documentation
can be used. Coding sheets are the easiest for training observers with basic knowledge of
movement observation, since they can be used independently of the notation. Of the
three forms of notation, phrase writing is the easiest to learn and observe, since there
is only one category to focus on. If one is notating movement which accompanies
a conversation, where the categories in the foreground can change within
fractions of a second, then motif writing is the best choice. When there are only gestures
of the arms and hands, it is easier to write a motif of the movement than to use the
complete body staff of Labanotation.

4. Meaning making/interpretation
The words which Laban used to describe the concepts are (unfortunately) already
loaded with meaning. In the Laban community, we debate which words would most accurately
describe the phenomena. In fact, there are many words which could work; it all depends
on the synthesis of the different elements within a particular context. The symbols
are more neutral in their meaning, which helps when the goal is to document the
movements on a descriptive level.

When the observer wants to make meaning and starts to interpret, then he will do this
either with his own body knowledge and all his body prejudices, or he will have to find or
develop an interpretative framework (Moore and Yamamoto 1988). In either case, the
aim is to find interpretive statements which are supported by the descriptive data.
Our body prejudices become very clear when we observe movements of other
cultures. We might interpret them with our own knowledge and then possibly find out
that this was not the intended meaning. But even within our own culture, we sometimes
find out that our initial interpretation was not correct. Still, it is an important step in the
observation process to accept that we have first impressions loaded with our prejudices
and interpretations. We can then go through systematic and focused observation with
the Laban-based concepts. In the end, we can judge whether our first impressions were correct.
An example of an interpretive framework is “Movement Pattern Analysis”, which
Warren Lamb developed for decision-making processes in order to interpret the posture-gesture-merger
patterns of effort and shape he observed in conversation (Moore 2005: 43). He
correlates effort with assertion, and shape with gaining a perspective, in the three phases
of decision making: attending, intending and committing.

5. Conclusions
Laban-based movement observation presents certain difficulties. Some concepts do not
become clear just from association with the word. Movement experience and observation
practice with a teacher are the prerequisites for reliable observation of the parameters.
Training is also required to attain the necessary knowledge of the symbols
and syntax of the notation. There are different trainings for different parts of the Laban-based
analysis and notation systems. Laban’s notation takes the point of view of the
mover, which is slightly awkward for the observer, since right and left are not the
way they are observed. But it forces the observer to try to see the world through the mover’s
eyes!
Laban based movement observation also has various advantages. With the six cate-
gories of LMS and approximately 60 parameters, one is able to observe movement in its
complexity. The observer can choose to try to be as complete as possible or only
observe the salient parameters. He may choose a quantitative or a qualitative analysis –
combinations are also workable. It is possible to observe from a macro level to a micro
level, depending on the context and the possibility of repetition. Depending on what is
important to document, different methods of notation can be chosen. The symbols have
the advantage that they are quicker to write and can show combinations in one symbol.
Compared to written speech, the notation has the advantage that it can show move-
ments which are happening simultaneously in close proximity and in a pictorial, non-
linear way. The notation gives the scholar a basis not only to document movement,
but also to reflect on movement in a different way. For those who are conversant
with the notation, it also enables communication about movement without spoken or
written language as a mediator, so it is truly intercultural, like music notation.
Over the 90 years in which Laban based analysis has been developed by many people
besides Laban, it has come to meet the intricacy of human body movement through its
complexity and systematic approach. I hope to have shown here that Laban based analysis,
originally developed for dance, can also serve as an excellent tool for observing body
movement in human communication.
958 V. Methods

6. References
Bartenieff, Irmgard and Dori Lewis 1981. Body Movement: Coping with the Environment. New
York: Gordon and Breach. First published [1980].
Hackney, Peggy 1998. Making Connections – Total Body Integration through Bartenieff Funda-
mentals. London: Gordon and Breach.
Hutchinson Guest, Ann 1983. Your Move: A New Approach to the Study of Movement and Dance.
London: Gordon and Breach.
Hutchinson Guest, Ann 2005. Labanotation – the System of Analyzing and Recording Movement.
4th edition. London: Routledge.
Kennedy, Antja 2010. Bewegtes Wissen – Laban/Bartenieff-Bewegungsstudien verstehen und erle-
ben. Berlin: Logos.
Laban, Rudolf 1920. Welt des Tänzers. Stuttgart, Germany: Walter Seifert.
Laban, Rudolf 1926. Choreographie. Jena, Germany: Eugen Diederichs.
Laban, Rudolf 1975. The Mastery of Movement. 4th edition. Boston: Plays Publishers. First pub-
lished [1950].
Laban, Rudolf 1956. Laban’s Principles of Dance and Movement Notation. London: MacDonald
and Evans.
Laban, Rudolf 1966. The Language of Movement – A Guidebook to Choreutics. Boston: Plays
Publishers.
Lamb, Warren 1965. Posture and Gesture. London: Trinity.
Lewis, Penny and Susan Loman 1982. The Kestenberg Movement Profile – Its Past Applications
and Future Directions. Keene: Antioch New England Graduate School.
Moore, Carol-Lynne 2005. Movement and Making Decisions – the Body-Mind Connection in the
Workplace. New York: Dance and Movement Press.
Moore, Carol-Lynne and Kaoru Yamamoto 1988. Beyond Words – Movement Observation and
Analysis. New York: Gordon and Breach.
Preston-Dunlop, Valerie 1984. Points of Departure: The Dancer’s Space. London: Lime Tree
Studios.

Antja Kennedy, Bremen (Germany)

61. Kestenberg movement analysis


1. History of the Kestenberg Movement Profile (KMP)
2. The Kestenberg Movement Profile System – a comprehensive overview
3. Psychometric qualities of the Kestenberg Movement Profile and instruments derived from
the Kestenberg Movement Profile
4. Clinical and research applications of the Kestenberg Movement Profile
5. Summary
6. References

Abstract
The present chapter provides an overview of the Kestenberg Movement Profile (KMP), a
full body assessment instrument of dynamic movement and meaning. The Kestenberg
Movement Profile is an observational movement analysis tool that describes body move-
ment patterns across nine different categories. It is employed in clinical fields such as
dance/movement and creative-arts therapies, in developmental, clinical, social, and health
psychology, psychiatry, as well as in embodied cognition research. The theory links these
movement patterns to psychological needs, affect, temperament, learning styles, defense
mechanisms, self- and other-related feelings, simple and complex relations, interlacing
movement, developmental, psychobiological, and clinical perspectives. Starting with a
history of the Kestenberg Movement Profile, the chapter provides summaries of the
method, clinical use and empirical research applications of the Kestenberg Movement
Profile. Psychometric qualities, related applications, and links to the cognitive sciences
and embodiment research are described. Kestenberg Movement Analysis has developed
clear propositions regarding how body movement patterns are related to meaning thereby
contributing to understanding the essence, function and psychological correlates of
dynamic movement.

1. History of the Kestenberg Movement Profile (KMP)


The Kestenberg Movement Profile was developed by Judith Kestenberg (1910–1999),
members of the Sands Point Movement Study Group (Arnhilt Buelte, Hershey Marcus,
Esther Robbins, Jay Berlowe, and Martha Soodak), and other associates in the 1960s
and 1970s (Kestenberg et al. 1971; Kestenberg 1965a, 1965b, 1967, 1975, 1985a; Kesten-
berg and Sossin 1979; Lewis and Loman 1990). Their systematic approach to differen-
tiating, and considering the interrelationships among, human movement characteristics
was framed upon the movement analysis works of Laban (1879–1958; Laban and Lawr-
ence 1947; Laban 1960) and Lamb (1965), especially those pertaining to Effort and
Shape, and was integrated with Schilder’s (1935) neurological and psychoanalytic
considerations regarding the body (Kestenberg 1985a), and with Anna Freud’s (1965)
Metapsychological Diagnostic Profile.
Kestenberg was born in Poland and trained in neurology, psychiatry and psychoanalysis
in Vienna, completing her psychoanalytic education in New York, where she moved
in 1938 and settled. Her interests in nonverbal facets of mother-child interaction
were evident early on (Kestenberg 1946), and her increasing attention to infancy and
early development underscored the need for improved ways of identifying relevant
nonverbal behaviors elemental in early relating. The study of Laban analytic ap-
proaches to movement behavior with Warren Lamb (1965; Lamb and Watson [1979]
1987) and Irmgard Bartenieff (Bartenieff and Lewis 1980) contributed to her knowledge
of movement processes. Over time, Kestenberg elucidated a comprehensive movement-
informed developmental model and an expanded metapsychology (Kestenberg 1975,
1985a). The clinical application in a primary prevention setting, The Center for Parents
and Children on Long Island, New York, contextualized many years of further study.
Kestenberg was guided by a unifying concern for prevention of mental illness. The
Kestenberg Movement Profile incorporates significant propositions, including em-
bedded developmental and affect theories, and an approach to personality assessment.
Kestenberg and Buelte (1977) introduced conceptual and developmental perspectives
related to ongoing reciprocity and mutuality in the caregiver-child dyad. They articu-
lated the confluence between movement and psychic structural development, under-
scoring the roles of empathy, trust and mutual holding. The Kestenberg Movement
Profile incorporates a classificatory approach to movement patterns that finds intrinsic
developmental and psychological significance in specific movement patterns and their
combinations. Kinesthetic empathy, embodied knowledge and experiential learning
have been central elements in movement analysis from the beginning.
In the most thorough description of the Kestenberg Movement Profile to date
(Kestenberg Amighi et al. 1999), the Kestenberg Movement Profile is considered to
be a descriptive research and clinically informing tool, relevant to everyday behaviors
and not fastened to a particularized psychoanalytic framework. Hence, the Kestenberg
Movement Profile, as it has evolved, is several things at once:

(i) a Laban-derived ethnogrammatic tool for human movement-process description;
(ii) an encompassing theoretical model offering psychological understandings of the
interface between dynamics and structure;
(iii) a basis for developmental assessment;
(iv) a means of assessing personality;
(v) a research tool for describing and capturing individual and dyadic interaction
patterns;
(vi) a basis upon which to frame and implement clinical interventions across disciplines,
and
(vii) in distillation for nonprofessionals, a framework for enhancing awareness of non-
verbal behavior of self and others, offering substantive ideas regarding child devel-
opment and parenting.

In addition to the Kestenberg Movement Profile, Laban’s school of thought has other
derivatives, including Laban Movement Analysis (LMA), Movement Pattern Analysis
(MPA), the Action Profile® (AP), and the Movement Psychodiagnostic Inventory
(MPI). Some of the many important pioneers contributing in the Laban tradition are
Irmgard Bartenieff (1900–1981), Warren Lamb, Pamela Ramsden, Marion North, and
Martha Davis (Bartenieff and Lewis 1980; Davis 1997, 1981; Davis et al. 2007; Kesten-
berg 1975; Kestenberg and Sossin 1979; Kestenberg Amighi et al. 1999; Lamb 1965;
Lamb and Watson 1987; North 1972).

2. The Kestenberg Movement Profile System – a comprehensive overview
As described above, the Kestenberg Movement Profile involves both a systematic
method involving elaborations of Effort/Shape as developed by Laban and Lamb, espe-
cially amplifying considerations of flow factors and qualitative differentiations among
patterns with developmental relevance, and an accompanying theoretical model for
understanding meaning in movement behavior. The tension-flow-effort system (System I)
of movement qualities is differentiated from the shape-flow-shaping system (System II).
In the graphic presentation of the Kestenberg Movement Profile, these are denoted as
two parallel columns, separate but interrelated. System II incorporates movement patterns
that give “structure” to the “dynamics” of System I patterns.
The Kestenberg Movement Profile is a multi-layered instrument for describing and
quantifying nonverbal behavior, specifically movement behavior. Originally based upon
intensive case analyses (Kestenberg and Sossin 1979), the Kestenberg Movement Profile
has evolved during more than 40 years of observation (Kestenberg 1975; Kestenberg
et al. 1971; Kestenberg and Sossin 1979; Kestenberg Amighi et al. 1999; Loman and
Foley 1996; Loman and Sossin 2009; Sossin 2002, 2007). Kestenberg linked the domi-
nance of specific movement patterns with particular developmental phases and psycho-
logical functions. Movement observations complement Kestenberg’s (1975, 1976, 1980)
investigations of multiple aspects of development, including gender, pregnancy and
maternal feelings, trauma, and obsessive-compulsive disorder, with a distinct focus on
the primary prevention of emotional disorders.
Movement patterns in the womb have been considered from a Kestenberg Move-
ment Profile perspective (Kestenberg 1980, 1987; Loman 1994, 2007), describing prena-
tal attunement and continuities in rhythmicities from prenatal to postnatal stages.
Historically, profiles of infants and parents were compared with each other to yield
information about areas of interpersonal conflict and harmony.

2.1. Summary of the Kestenberg Movement Profile


The Kestenberg Movement Profile contains nine categories of movement patterns
representing two corresponding but distinct lines of development, each starting with sys-
tems of movement available to the fetus and newborn. System I, or diagrams on the left
side of the graphically depicted Kestenberg Movement Profile, start with tension-flow
rhythms, pertaining to inner needs and corresponding to developmental phase organization.
Tension-flow attributes describe movement characteristics most readily associated
with affect and temperament (and pleasure/displeasure feelings). System I traces an evo-
lution to pre-effort and effort diagrams, reflecting more advanced patterns in response to
learning modes and environmental challenges. Pre-efforts bear correspondence to what
have traditionally been deemed defenses (and approaches to learning), while effort pat-
terns are linked to adaptations and masteries. Kestenberg linked effort patterns to ego-
functions. System II, reflected by the diagrams on the right side of the Kestenberg Move-
ment Profile, documents a line of development related to relationships to people and
things. The top diagrams, bipolar and unipolar shape-flow, represent patterns involving
expansion and contraction of body contours. Bipolar shape-flow patterns (e.g., symmetric
widening or narrowing) are linked to general experiences of comfort and discomfort,
while unipolar patterns (e.g., asymmetric widening or narrowing) bear upon reactions
of approach and withdrawal toward distinct stimuli. Shape-flow design represents move-
ment pathways towards and away from the body. Shaping in directions represents pat-
terns that form linear vectors, and finally shaping in planes represents elliptical designs
within one or more spatial planes related to the expression of complex relations.
The Kestenberg Movement Profile graphically depicts up to 124 distinct movement
factors (across 29 polar dimensions) and includes a body attitude description and qua-
lifying numerical data. With regard to pre-effort, effort, shaping in directions and shap-
ing in planes, distinctions are made between patterns that are gestural and those that
are postural (following Lamb 1965), and separate frequency distributions are made
for gestures and postures. Each of the nine Kestenberg Movement Profile diagrams
(Fig. 61.1) refers to a specific movement pattern. Observational, developmental and
interpretive characteristics of the Kestenberg Movement Profile’s movement patterns
are summarized below. Fuller descriptions of the Kestenberg Movement Profile can
be found in the literature (e.g., Kestenberg Amighi et al. 1999; Loman and Sossin 2009).
[Figure: the eight diagrams of the Kestenberg Movement Profile, arranged in two columns
with their polar movement elements. System I (left): Tension Flow Rhythms, Tension Flow
Attributes, Pre-Efforts, Efforts. System II (right): Bipolar Shape Flow, Unipolar Shape
Flow, Shaping in Directions, Shaping in Planes.]

Fig. 61.1: Overview of the Kestenberg Movement Profile. One of the nine diagrams,
Shape-Flow Design, is not included, because so far it did not lead to sufficient reliability
between observers. The developmental sequence from top to bottom of all profiles indicates
a proceeding from early to more mature movement patterns; within the single profiles, the
single rows proceed from the first year to the third year.

2.1.1. Tension-Flow Rhythms


Tension-Flow Rhythms are repetitive alternations of tension and relaxation in the
muscle tone (Fig. 61.2). These alterations between free and bound flow are rhythmic,
although irregular, in their intervals. They are basic movement qualities that are expressive
of needs. For example, the sucking rhythm expresses the need to self-soothe, the
swaying rhythm the need to soothe others, and the jumping rhythm the expression of joy.
Ten rhythmic patterns are identified, corresponding in pairs to five major developmental
phases (oral, anal, urethral, inner-genital and outer-genital; Kestenberg 1975), each with
an “indulging” or “libidinal” pole characterized by smooth reversals, and each with a
“fighting” or “sadistic” pole characterized by sharp reversals.

Tension-Flow Rhythms of the KMP

    Indulgent, libidinal                 Fighting, separating
    1a. Sucking (oral)                   1b. Biting (oral)
    2a. Twisting (anal)                  2b. Straining-releasing (anal)
    3a. Running/Drifting (urethral)      3b. Starting/Stopping (urethral)
    4a. Swaying (innergenital)           4b. Birthing/Surging (innergenital)
    5a. Jumping (outergenital)           5b. Spurting/Ramming (outergenital)

[The figure also shows an example notation tracing oscillating between Free Flow and
Bound Flow around a Baseline.]

Fig. 61.2: Tension flow rhythms overview (adapted from Kestenberg Amighi et al. 1999)

Whereas all other movement categories/clusters can be readily “unwed” from psychoanalytic
theory, this applies less to rhythms. The ten basic rhythms, and their corresponding
abbreviations (Kestenberg Amighi et al. 1999), are sucking (o), snapping/biting (os),
twisting (a), strain/release (as), running/drifting (u), starting/stopping (us), swaying
(ig), surging/birthing (igs), jumping (og), and spurting/ramming (ogs). Tension-flow
notation and scoring are exemplified in Fig. 61.3. At the height of each developmental
phase, we expect to see a notable increase in the proportion of rhythms typical for that
phase. All body parts can show all rhythms, and all rhythmic patterns are evident (to
greater or lesser extents) at all phases. Frequency distributions appear to reflect consis-
tent individual differences. In addition to the ten basic rhythms, there is great variety of
“mixed rhythms,” combinations of two or more rhythms. Combinations of mixed fight-
ing rhythms signal a potential for immediate aggression. Individual preferences for
specific rhythms indicate preferred methods of drive discharge.

2.1.2. Tension-Flow attributes


Tension Flow is a manifestation of animate muscle elasticity. Bound flow is a restraining
movement pattern that occurs when agonist and antagonist muscles contract
simultaneously. Free flow is a releasing movement that occurs when a contraction of
the agonist muscles is not counteracted by the antagonists. Neutral flow refers to a
limited range of flow observed in limpness or de-animation.

Fig. 61.3: Tension flow rhythms notation: an example of a 2½-month-old boy.
Attributes of tension-flow intensity factors categorize tension changes along three
dimensions: even or adjusting; high or low intensity; abrupt or gradual. Tension-flow at-
tributes pertain to fighting or indulging patterns of arousal and calmness. Interpretively,
tension-flow is linked to affect regulation: bound flow and fighting attributes are asso-
ciated with cautious feelings, while free flow and indulging attributes are associated
with lighthearted feelings. More subtle or complex affects, especially as they pertain
to safety/danger or pleasure/displeasure, are related to combinations of tension-flow
attributes.

2.1.3. Precursors of Effort


Precursors of Effort precede efforts in describing movement changes (including ten-
sion-flow) in relationship to space, weight and time. Precursors of effort become
motor indices of defense mechanisms and styles of learning. The Kestenberg Movement
Profile denotes six precursors of effort, in three polar pairs: channeling vs. flexible,
vehemence/straining vs. gentle, and sudden vs. hesitating. Channeling keeps tension levels
even to follow precise pathways in space; this has a fighting character. Its opposite,
the flexible precursor of effort, changes tension levels to meander around in space
and is thus more indulgent. Like defenses themselves, precursors of effort can be ma-
ladaptive or adaptive; isolation, reflected in notable use of channeling, can be indicative
of affective disassociation (possibly amplified by neutral flow) or of objective thinking.
Precursors of effort are both body-oriented, in terms of bound and free tension-flow
alternations, and reality-oriented, in terms of space, weight and time; hence, they are
intermediary patterns, between tension-flow and effort.

2.1.4. Effort
Effort patterns are motor components of coping with external reality in terms of space,
weight and time (Laban and Lawrence 1947). In space, direct and indirect are distin-
guished; in weight, strength and lightness; and in time, acceleration and deceleration.
Direct, strength and acceleration are fighting effort elements, while indirect, light and
deceleration are more accommodating ways of dealing with space, weight and time.
Effort elements are developmentally linked (as per consonance) to specific precursors
of effort and, even further, to specific tension-flow attribute patterns. The individual’s
mature constellation of effort elements shows, in relation to the polarities identified
above, preferences in terms of attention, intention and decision-making.

2.1.5. Bipolar Shape-Flow


Body contour changes in shape-flow express shifts in affective relations of the self with
objects in the environment. Bipolar shape-flow is the dimensionally proportioned grow-
ing and shrinking of the body in response to internal or general environmental stimuli.
In terms of breathing, for example, we grow with inhalation and shrink with exhalation.
Growing and shrinking occur in three dimensions: horizontal (width), vertical (length)
and sagittal (depth). Bipolar shape-flow is especially expressive of affects of comfort
and discomfort (e.g., with our real or imagined surroundings). Self-in-the-world feelings
are conveyed through bipolar shape flow.

2.1.6. Unipolar Shape-Flow


In unipolar shape-flow, the body grows and shrinks in a dimensionally disproportion-
ate manner, expressing attraction to, or withdrawal from, discrete stimuli. Unipolar
shape-flow thus is directional approach and avoidance movement and occurs in
three dimensions or axes: horizontal (lateral vs. medial), vertical (cephalad vs. caudal)
and sagittal (anterior vs. posterior). An individual’s body grows toward or away from
others with unipolar shape flow, allowing inferences about the valence of the corresponding
stimulus.

2.1.7. Shape-Flow design


Along with changes in body shape, body movement creates designs in personal space.
These movements are away from the body (centrifugal) or toward the body (centripe-
tal). They are classified in terms of their degree of linearity (linear or looping), ampli-
tude (high or low amplitude) and angularity (angular or rounded reversals). In notating
shape-flow design, the coder traces a two-dimensional line with pen and paper, utilizing
spatial/directional parameters. Shape-flow design patterns reflect an individual’s style
of relating and feelings of relatedness. They are influenced by cultural conditioning,
congenital preferences, developmental stages and situational factors.
2.1.8. Shaping of space in directions


Linear projections of the body into dimensional space create directional movements,
which bridge distant objects with the self, as in the simple case of pointing. Directions
in space include moving across the body and moving sideways (horizontal), moving
downward and moving upward (vertical), and moving backward and moving forward
(sagittal). Directional patterns give relational structure to precursors of effort. They
reflect ways in which individuals intersect space, and are indicative of defenses against
external stimuli and environmental learning responses. Closed-shape directions create
new boundaries by delimiting bodily access.

2.1.9. Shaping in space in planes


Movements employing shaping in planes configure space by creating concave or convex
shapes. The Kestenberg Movement Profile principally draws upon Lamb’s (1965) defin-
ing work on Shape. Horizontal shaping encloses or spreads; vertical shaping descends or
ascends; sagittal shaping retreats or advances. Each spatial plane includes a principal
and an accessory dimension. In the horizontal plane, the accessory dimension is sagittal;
spreading and enclosing are used in exploration. In the vertical plane, the accessory
dimension is horizontal; ascending and descending are used in confrontation. In the sa-
gittal plane, the accessory dimension is vertical; advancing and retreating are used in
anticipatory actions. Shaping of space in planes expresses multi-dimensional relation-
ships with people and is linked to representational experience. The relative complexity
of patterns classified as shaping in planes corresponds to the complexity of relationships
that are internalized, held in mind, and experienced by the mover.

2.2. Relations between the two systems


2.2.1. Affinities
The tension-flow/effort system (System I), shown by the diagrams on the left side of the
Kestenberg Movement Profile (Fig. 61.1), depicts developmentally changing patterns of
dealing with internal and external reality. The shape-flow/shaping system (System II),
shown by the diagrams on the right side, depicts developmentally changing patterns
of spatial movement expressing growing complexity of object relations. The two systems
are interrelated: fighting tension-flow/effort patterns are affined (fit well) with shrinking
shape-flow and closed shaping; pleasant and indulging tension-flow/effort patterns
are affined with growing shape-flow and open shaping. Specific affinities between
Systems I and II are considered “matching” patterns and specific clashes between systems
are considered “mismatching.” When corresponding (same) patterns are used by two
individuals, they are designated as “concordant,” and when they show a specific
clash they are “discordant.”

2.2.2. Attunement
Tension-flow and shape-flow are fundamental in the experience and expression of
affect. Bound flow corresponds to inhibition and discontinuity whereas free-flow corre-
sponds to facilitation of impulses and continuity. Attunement in tension-flow (sharing of
feelings) appears to be a key manifestation of empathy between individuals, such as
caregiver and child (Kestenberg 1985a). Higher attunement is deduced from higher
concordance of the tension-flow attribute diagrams between two individuals (Loman
and Sossin 2009). This can be seen more directly in temporal coding bearing
on interpersonal contingencies. Dyadic up-regulation and down-regulation are related
to such contingencies. Partial attunement rather than complete attunement is seen as
being helpful to the parent-infant relation, serving individuation.

2.2.3. Kinesthetic empathy


Tension-flow attunement and shape-flow adjustment are the foundations of empathy
and trust (Kestenberg, 1985b). Kinesthetic empathy refers to an individual’s capacity
and proclivity to know another’s actions, intentions, and emotions through attentive
(though often implicit) perception of his or her movement and bodily experience.
Such a capacity has been related to the mirror-neuron system (Rizzolatti and Sinigaglia
2008). Synchronic KMP rhythms notations can directly and reliably operationalize
kinesthetic empathy (e.g., Sossin 1987; Koch 1999, 2006a). The use of kinesthetic
empathy is central in descriptive and diagnostic processes in movement analysis and
movement therapy.

2.3. Additional Features of the Kestenberg Movement Profile


The Kestenberg Movement Profile is statistically constructed
in terms of frequency distributions, summarizing complex movement processes, and
casting an individual’s “profile” in terms of preferences, proclivities, and repertoire.
Raw notational data supplements the profile, especially pertaining to “phrasing,” cap-
turing patterns involved in introduction, main theme, and ending/transition segments
within sequentially unfolding movement. Only recently have researchers using the
Kestenberg Movement Profile begun to systematically explore sequential patterns
more directly (e.g. Koch 2007b; Reale 2011; Shaw, Sossin, and Salbod 2010).
Movement can occur in gestures (e.g., in shaping in directions), using just one part of
the body, or in postures (e.g., in shape flow), involving the entire body (Lamb 1965; Lamb
and Watson 1987). Movement phrases can show the same patterns, first in a gesture and
then in a posture, or vice versa. These sequences are called posture-gesture merging
(PGM; Lamb 1965). Integrated merging of gestures and postures, only occurring from
adolescence on, is central in the works of Lamb and Movement Pattern Analysis
(Moore and Yamamoto 2011) and of The Action Profile® system (Ramsden 2007).
Empirical research has lent support to the link between PGMs and authenticity (Winter
et al. 1989). In the Kestenberg Movement Profile, postures indicate a more whole-hearted
involvement than gestures, since they require greater bodily participation with core
(torso) and periphery (extremities) unanimously involved. Actions influenced by con-
science and aspirations are likely to be evidenced in postural movements.
The load factor (LF) is a statistic that applies to all categories of movement except
Tension Flow Rhythms, reflecting the complexity of movements in each subsystem by
indicating how many elements are, on average, included in an action. The range of
the load factor is between one (33% load factor) and three (99% load factor) elements
per action. Gestures and postures of the same cluster can have very distinct load factors.
The gain-expense ratio (GE) compares the number of movement elements (gain) per
subsystem to the number of movement flow factors (expense). The gain-expense ratio is
interpreted in relation to other subsystems, and indicates the relative degree of affective
control (non-flow movement patterns) vs. affective spontaneity (flow patterns) in each
domain. This affective component is further broken down into a ratio of free flow (ease)
to bound flow (restraint) or a ratio of growing (comfort) and shrinking (discomfort) in
System I, or II respectively.

3. Psychometric qualities of the Kestenberg Movement Profile and instruments derived from the Kestenberg Movement Profile
There are three major lines of research on the KMP:

(i) theory and method development (Kestenberg 1965a, 1965b, 1967, 1975, 1995;
Kestenberg and Borowitz 1990; Kestenberg and Sossin 1979),
(ii) applications to fields of practice (e.g., Birklein 2005; Hastie 2006; Kestenberg
Amighi 2007; La Barre 2001; Lewis 1990, 1999; Loman 1998; Loman and Foley
1996; Lotan and Yirmiya 2002; Loman and Sossin 2009; Sossin 1999), and
(iii) establishment of psychometric quality (e.g., Koch, Cruz, and Goodill 2002; Koch
2006a; Koch 2007a; Sossin 1987).

Kestenberg (1975) reported success in initial efforts to validate the Kestenberg Move-
ment Profile system using the external criterion of the diagnoses by Anna Freud in the
1960s. Kestenberg – blind to Anna Freud’s assessment – diagnosed the same children as
Freud on the basis of the movement profile and then compared her diagnosis to psychoanalytic
diagnoses employing the Diagnostic Profile. Such anecdotal reports laid a basis
for further validational work (e.g., Koch 2007a).
Whereas each study employing the Kestenberg Movement Profile offers further steps
toward validation (e.g., Birklein 2005; Birklein and Sossin 2006; Bridges 1989; Loman
1995, 2005), a more systematic approach to investigating the validity of the KMP has
been conducted by Koch (2007a, 2011) across a series of experimental studies. In this
work, single parameters from the Kestenberg Movement Profile have been selected in an
attempt to crystallize basic dimensions of movement and to validate them step by step.
However, information resulting from these experiments mainly concerns the validity of sin-
gle Kestenberg Movement Profile components, not combinations, sequences or complex in-
teractions of movement parameters. To start with a simple yet important set of movement
parameters, Koch first tested the basic dimensions of System I (tension-flow-effort system;
indulgent vs. fighting movement), and of System II (shape-flow-shaping system; open vs.
closed movement), and then the combination of both (interactions) experimentally.

61. Kestenberg movement analysis 969

In terms of economy, Bräuninger and Züger (2007) have suggested an abbreviated
observational version, and Koch has created a questionnaire format with 113 items
(Koch 1999) from the interpretive categories in Kestenberg Amighi et al. (1999). A
German version of the questionnaire was highly internally consistent (N=80; all
alphas > .80), and its items had high discriminative power; two exceptions were
removed. On the basis of this questionnaire, Koch and Müller (2007) developed the
brief Kestenberg Movement Profile-based affect scale (Fig. 61.4). This scale is suited
for experimental measures of movement-related affect as well as for evaluation designs.
It includes System I movement patterns (items in standard type) and System II
movement patterns (items in italics).

relaxed 1 2 3 4 5 6 7 tense
loaded, fighting 1 2 3 4 5 6 7 joyful, excited
(aimless) drifting 1 2 3 4 5 6 7 impatient, driven
comfortable 1 2 3 4 5 6 7 uncomfortable
indulging 1 2 3 4 5 6 7 distancing
holding on, retentive 1 2 3 4 5 6 7 playful, coy
yielding 1 2 3 4 5 6 7 fighting
letting go 1 2 3 4 5 6 7 nervous
open 1 2 3 4 5 6 7 closed
resenting 1 2 3 4 5 6 7 taking in
approaching, curious 1 2 3 4 5 6 7 avoiding, refrain from
inclined toward 1 2 3 4 5 6 7 disinclined
peaceful 1 2 3 4 5 6 7 aggressive

Fig. 61.4: English version of the brief Kestenberg Movement Profile-based affect scale (13 Items,
Koch and Müller 2007)

4. Clinical and research applications of the Kestenberg Movement Profile
Kestenberg pursued an enduring inquiry into the nature and significance of psyche-
soma and neuro-kinetic relations, as well as the manners and expressions of the body.
Her work gave rise to research by Bender (2007), Eberhard-Kaechele (2007), Koch
(2007a, 2007b), Lotan and Yirmiya (2002), Sossin and Birklein (2006), and others. La
Barre demonstrated the Kestenberg Movement Profile’s relevance to psychoanalytic
psychotherapeutic process (La Barre 2001, 2005) and, together with Frank (Frank
and La Barre 2011), linked earliest developmental processes and adult psychotherapeu-
tic process. Recent studies have employed the Kestenberg Movement Profile in embo-
died parental mentalizing (Shai and Belsky 2011), in autism (Loman 1995; Loman and
Sossin 2009), in depression (Koch, Morlinghaus, and Fuchs 2007) and in research of in-
terrelated patterns highlighting manners of stress transmission during parent-child
interaction (Birklein and Sossin 2006; Sossin and Birklein 2006).
970 V. Methods

Embodiment approaches (Koch 2006b; Koch and Fuchs 2011; Niedenthal et al. 2005),
based on phenomenology (Merleau-Ponty 1962) and the neurosciences, have helped
greatly to tie the Kestenberg Movement Profile to cognitive science research.
Experimental studies of the Kestenberg Movement Profile conducted by Koch (2007a,
2011), highlighting dynamic kinesthetic feedback on the individual level and haptic
feedback from handshakes on the interpersonal level, support the influence of movement
qualities (System I) as well as movement shapes (System II) on affect, attitudes, and
cognition. Kinesthetic feedback from movement qualities (rhythms and
strong versus light efforts) has been shown to operate online, i.e., in the situation (Suit-
ner et al. 2011) as well as offline, i.e., from memory (Koch, Hentz, and Kasper 2011).
Directional movement and meaning have been shown to include Kestenberg Movement
Profile dimensions in the context of spatial bias research (Koch, Glawe, and Holt
2011). The Kestenberg Movement Profile has been employed as a theory to derive hy-
potheses on gaze behavior among men and women in work teams (Koch et al. 2010),
resulting in the finding that women on average distribute their gaze in a more egalitar-
ian way across all team members (indirect gaze), whereas men on average gazed more
dyadically (direct gaze), paying attention to fewer team members.
Kestenberg Movement Profile research has also examined sequential movement pro-
cesses looking at verbal-nonverbal parallel processes, indications of defensive employ-
ments (Koch 2007b), and manners of maternal communications of depression in
mother-child interaction (Reale 2011). Such investigations align well with systems
models of change processes (Fogel et al. 2009), e.g., employing split-screen methods
of microanalysis (Beebe et al. 2010) and applying the Kestenberg Movement Profile as a tool in
time-series analyses. Systems-framed studies of transmission and sequential process suggest
that combinatory movement patterns may be more robust factors than singular patterns.

5. Summary
This chapter has provided an overview of the Kestenberg Movement Profile as an
observational movement analysis system and as an accompanying theoretical system
pertaining to movement patterns and their meaning. It introduced the history, theory,
and method of the profile, and highlighted its development, validation, and present use
in clinical, developmental, and cognitive science contexts. Further information can be
obtained on the Kestenberg
Movement Profile website (www.kestenbergmovementprofile.org) and from the authors.

Acknowledgements: We thank Susan Loman and Teresa Kunz, and gratefully acknowledge
research grant 01UB0390A of the German Federal Ministry of Education and Research
(BMBF), for support in the preparation of this contribution.

6. References
Bartenieff, Irmgard and Dori Lewis 1980. Body Movement: Coping with the Environment. New
York: Gordon and Breach.
Beebe, Beatrice, Joseph Jaffe, Sara Markese, Karen Buck, Henian Chen, Patricia Cohen, Lorraine
Bahrick, Howard Andrews and Stanley Feldstein 2010. The origins of 12-month attachment: A
microanalysis of 4-month mother-infant interaction. Attachment and Human Development 12(1):
3–141.
Bender, Susanne 2007. Einführung in das Kestenberg Movement Profile (KMP). In: Sabine C.
Koch and Susanne Bender (eds.), Movement Analysis – Bewegungsanalyse: The Legacy of
Laban, Bartenieff, Lamb and Kestenberg, 53–64. Berlin: Logos.
Birklein, Silvia B. 2005. Nonverbal indices of stress in parent-child interaction. Ph.D. dissertation,
Department of Psychology, New School for Social Research. New York, NY.
Birklein, Silvia B. and K. Mark Sossin 2006. Nonverbal indices of stress in parent-child dyads: Impli-
cations for individual and interpersonal affect regulation and intergenerational transmission.
In: Sabine C. Koch and Iris Bräuninger (eds.), Advances in Dance/Movement Therapy, 128–141.
Berlin: Logos.
Bräuninger, Iris and Brigitte Züger 2007. Filmbasierte Bewegungsanalyse zur Behandlungsevalua-
tion von Tanz- und Bewegungstherapie. In: Sabine C. Koch and Susanne Bender (eds.), Move-
ment Analysis. The Legacy of Laban, Bartenieff, Lamb and Kestenberg, 213–223. Berlin: Logos.
Bridges, Laurel 1989. Measuring the effect of dance/movement therapy on the body image of in-
stitutionalized elderly using the Kestenberg Movement Profile and projective drawings. Un-
published master’s thesis, Antioch Graduate School, Keene, NH.
Davis, Martha 1981. Movement characteristics of hospitalized psychiatric patients. American
Journal of Dance Therapy 4: 52–84.
Davis, Martha 1997. Guide to movement analysis methods, part 2: Movement Psychodiagnostic
Inventory. Available from the author at madavis95@aol.com.
Davis, Martha, Hedda Lausberg, Robyn Flaum Cruz, Miriam Roskin Berger and Dianne Dulicai 2007.
The Movement Psychodiagnostic Inventory (MPI). In: Sabine Koch and Susanne Bender (eds.),
Movement Analysis. The Legacy of Laban, Bartenieff, Lamb and Kestenberg, 119–130. Berlin: Logos.
Eberhard-Kaechele, Marianne 2007. The regulation of interpersonal relationships by means of
shape flow: A psychoeducational intervention for traumatised individuals. In: Sabine Koch
and Susanne Bender (eds.), Movement Analysis. The Legacy of Laban, Bartenieff, Lamb and
Kestenberg, 65–86. Berlin: Logos.
Fogel, Alan, Andrea Garvey, Hui-Chin Hsu and Delisa West-Stroming 2009. Change Processes in
Relationships: A Relational-Historical Research Approach. 2nd edition. Cambridge: Cambridge
University Press.
Frank, Ruella and Francis La Barre 2011. Movement, Development, and Psychotherapeutic
Change. New York: Routledge.
Freud, Anna 1965. Normality and Pathology in Childhood: Assessment of Developments. London:
Karnac.
Hastie, Suzanne 2006. The Kestenberg Movement Profile. In: Stephanie L. Brooke (ed.), Creative
Arts Therapies Manual, 121–132. Springfield, IL: Charles C. Thomas.
Kestenberg, Judith S. 1946. Early fears and early defences: Selected problems. Nervous Child 5:
56–70.
Kestenberg, Judith S. 1954. The history of an “autistic child”: Clinical data and interpretation.
Journal of Child Psychiatry 2: 5–52.
Kestenberg, Judith S. 1965a. The role of movement patterns in development: 1. Rhythms of move-
ment. Psychoanalytic Quarterly 34: 1–36.
Kestenberg, Judith S. 1965b. The role of movement patterns in development: 2. Flow of tension
and effort. Psychoanalytic Quarterly 34: 517–563.
Kestenberg, Judith S. 1967. The role of movement patterns in development: 3. The control of
shape. Psychoanalytic Quarterly 36: 356–409.
Kestenberg, Judith S. 1975. Children and Parents: Psychoanalytic Studies in Development. New
York: Jason Aronson.
Kestenberg, Judith S. 1995. Sexuality, Body Movement and Rhythms of Development. Northvale,
NJ: Jason Aronson. First published [1975].
Kestenberg, Judith S. 1976. Regression and reintegration in pregnancy. Journal of the American
Psychoanalytic Association 24: 213–250.
Kestenberg, Judith S. 1977a. Prevention, infant therapy and the treatment of adults, 1: Toward
understanding mutuality. International Journal of Psychoanalytic Psychotherapy 6: 338–367.
Kestenberg, Judith S. 1977b. Prevention, infant therapy and the treatment of adults, 2: Mutual hold-
ing and holding oneself up. International Journal of Psychoanalytic Psychotherapy 6: 369–396.
Kestenberg, Judith S. 1980. The three faces of femininity. Psychoanalytic Review 67: 313–335.
Kestenberg, Judith S. 1985a. The role of movement patterns in diagnosis and prevention. In:
Donald A. Shaskan, William L. Roller and Paul Schilder (eds.), Mind Explorer, 97–160. New
York: Human Sciences Press.
Kestenberg, Judith 1985b. The flow of empathy and trust between mother and child. In: Elwyn J.
Anthony and George H. Pollock (eds.), Parental influences in health and disease, 137–163. Boston:
Little, Brown.
Kestenberg, Judith S. 1987. Imagining and remembering. Israeli Journal of Psychiatry and Related
Sciences 24: 229–241.
Kestenberg, Judith S. and Esther Borowitz 1990. On narcissism and masochism in the fetus and
the neonate. Pre- and Perinatal Psychology Journal 5: 87–94.
Kestenberg, Judith S. and Arnhilt Buelte 1977. Prevention, infant therapy and the treatment of
adults 1. Towards understanding mutuality, 2. Mutual holding and holding oneself up. Interna-
tional Journal of Psychoanalytic Psychotherapy 6: 39–396.
Kestenberg, Judith S., Marcus Hershey, Esther Robbins, Jay Berlowe and Arnhilt Buelte 1971.
Development of the young child as expressed through bodily movement. Journal of the Amer-
ican Psychoanalytic Association 19: 746–764.
Kestenberg, Judith S. and K. Mark Sossin 1979. The Role of Movement Patterns in Development,
Vol. 2. New York: Dance Notation Bureau Press.
Kestenberg Amighi, Janet 2007. Kestenberg Movement Profile perspectives on posited Native
American learning style preferences. In: Sabine Koch and Susanne Bender (eds.), Movement
Analysis. The Legacy of Laban, Bartenieff, Lamb and Kestenberg, 175–186. Berlin: Logos.
Kestenberg Amighi, Janet, Susan Loman, Penny Lewis and K. Mark Sossin 1999. The Meaning of
Movement: Development and Clinical Perspectives of the Kestenberg Movement Profile. New
York: Brunner-Routledge.
Koch, Sabine C. 1999. The Kestenberg Movement Profile. Reliability of Novice Raters. Stuttgart,
Germany: Ibidem.
Koch, Sabine C. 2006a. Gender at work: Differences in use of rhythms, efforts, and pre-efforts. In:
Sabine C. Koch and Iris Bräuninger (eds.), Advances in Dance/Movement Therapy. Theoretical
Perspectives and Empirical Findings, 116–127. Berlin: Logos.
Koch, Sabine C. 2006b. Interdisciplinary embodiment approaches. Implications for creative arts
therapies. In: Sabine C. Koch and Iris Bräuninger (eds.), Advances in Dance/Movement Ther-
apy. Theoretical Perspectives and Empirical Findings, 17–28. Berlin: Logos.
Koch, Sabine C. 2007a. Basic principles of movement analysis. Steps toward validation of the
KMP. In: Sabine C. Koch and Susanne Bender (eds.), Movement Analysis. The Legacy of
Laban, Bartenieff, Lamb and Kestenberg, 235–248. Berlin: Logos.
Koch, Sabine C. 2007b. Defences in movement. Video analysis of group communication patterns.
Body, Movement and Dance in Psychotherapy 2: 29–45.
Koch, Sabine C. 2011. Basic body rhythms and embodied intercorporality: From individual to inter-
personal movement feedback. In: Wolfgang Tschacher and Claudia Bergomi (eds.), The Implica-
tions of Embodiment: Cognition and Communication, 151–171. Exeter, UK: Imprint Academic.
Koch, Sabine C., Christina Baehne, Friederike Zimmermann, Lenelis Kruse and Joerg Zumbach
2010. Visual dominance and visual egalitarianism. Individual and group-level influences of sex
and status in group interactions. Journal of Nonverbal Behavior 34(3): 137–153.
Koch, Sabine C., Robyn Cruz and Sharon W. Goodill 2002. The Kestenberg Movement Profile
(KMP): Reliability of novice raters. American Journal of Dance Therapy 23(2): 71–88.
Koch, Sabine C. and Thomas Fuchs 2011. Embodied arts therapies. The Arts in Psychotherapy 38:
276–280.
Koch, Sabine C., Stefanie Glawe and Daniel Holt 2011. Up and down, front and back: Movement
and meaning in the vertical and sagittal axis. Social Psychology 42(3): 159–164.
Koch, Sabine C., Eva Hentz and Detlef Kasper 2011. The influence of movement qualities on
affect and memory. Unpublished manuscript.
Koch, Sabine C., Katharina Morlinghaus and Thomas Fuchs 2007. The joy dance. Effects of a sin-
gle dance intervention on patients with depression. The Arts in Psychotherapy 34: 340–349.
Koch, Sabine C. and Stephanie M. Müller 2007. The KMP-questionnaire and the brief KMP-based
affect scale. In: Sabine C. Koch and Susanne Bender (eds.), Movement Analysis – Bewegungsana-
lyse. The Legacy of Laban, Bartenieff, Lamb and Kestenberg, 195–202. Berlin: Logos.
Koppe, Simo, Susanne Harder and Mette Vaever 2008. Vitality affects. International Forum of Psy-
choanalysis 17(3): 160–179.
Laban, Rudolf von 1960. The Mastery of Movement. London: MacDonald and Evans.
Laban, Rudolf von and F. Lawrence 1974. Effort: Economy in Body Movement. Boston, MA:
Plays. First published [1947].
La Barre, Francis 2001. On Moving and Being Moved. Hillsdale, NJ: Analytic Press.
La Barre, Francis 2005. The kinetic transference and countertransference. Contemporary Psycho-
analysis 41: 249–279.
Lamb, Warren 1965. Posture and Gesture: An Introduction to the Study of Physical Behavior.
London: Duckworth.
Lamb, Warren and Elizabeth M. Watson 1987. Body Code: The Meaning in Movement. London:
Routledge and Kegan Paul. First published [1979].
Lewis, Penny 1990. The KMP in the psychotherapeutic process with borderline disorders. In:
Penny Lewis and Susan Loman (eds.), The Kestenberg Movement Profile: Its Past, Present Ap-
plications and Future Directions, 65–84. Keene, NH: Antioch New England Graduate School.
Lewis, Penny 1999. Healing early child abuse. The application of the KMP and its concepts. In:
Judith Kestenberg Amighi, Susan Loman, Penny Lewis and K. Mark Sossin (eds.), The Mean-
ing of Movement: Development and Clinical Perspectives of the Kestenberg Movement Profile,
235–248. New York: Brunner-Routledge.
Lewis, Penny and Susan Loman (eds.) 1990. The Kestenberg Movement Profile: Its Past, Present
Applications and Future Directions. Keene, NH: Antioch New England Graduate School.
Loman, Susan 1994. Attuning to the fetus and the young child: Approaches from dance/movement
therapy. Zero to Three: Bulletin of National Center for Clinical Infant Programs 15(1): 20–26.
Loman, Susan 1995. The case of Warren: A KMP approach to autism. In: Fran J. Levy (ed.), Dance
and Other Expressive Art Therapies, 213–224. New York: Routledge.
Loman, Susan 1998. Employing a developmental model of movement patterns in Dance/movement
therapy with young children and their families. American Journal of Dance Therapy 20: 101–115.
Loman, Susan 2005. Dance/Movement Therapy. In: Cathy Malchiodi (ed.), Expressive Therapies,
68–89. New York: Guilford Press.
Loman, Susan 2007. The KMP and pregnancy: Developing early empathy through notating fetal
movement. In: Sabine Koch and Susanne Bender (eds.), Movement Analysis. The Legacy of
Laban, Bartenieff, Lamb and Kestenberg, 187–194. Berlin: Logos.
Loman, Susan and F. Foley 1996. Models for understanding the nonverbal process in relationships.
The Arts in Psychotherapy 23: 341–350.
Loman, Susan and K. Mark Sossin 2009. Current clinical applications of the Kestenberg Move-
ment Profile. In: Sharon Chaiklin and Hilda Wengrower (eds.), Life Is Dance: The Art and
Science of DMT, 237–264. New York: Routledge.
Lotan, Nava and Nurit Yirmiya 2002. Body movement, presence of parents and the process of fall-
ing asleep in toddlers. International Journal of Behavioral Development 26: 81–88.
Merleau-Ponty, Maurice 1962. The Phenomenology of Perception. London: Routledge.
Moore, Carol-Lynne and Kaoru Yamamoto 2011. Beyond Words: Movement Observation and
Analysis. 2nd edition. New York: Routledge.
Niedenthal, Paula, Laurence W. Barsalou, Piotr Winkielman, Silvia Kraut-Gruber and Francois
Ric 2005. Embodiment in attitudes, social perception, and emotion. Personality and Social Psy-
chology Review 9: 184–211.
North, Marion 1972. Personality Assessment through Movement. London: Macdonald and Evans.
Ramsden, Pamela 2007. Moments of wholeness: How awareness of action profile® integrated
movement and related modes of thinking can enhance action. In: Sabine C. Koch and Susanne
Bender (eds.), Movement Analysis – Bewegungsanalyse. The Legacy of Laban, Bartenieff, Lamb
and Kestenberg 29–40. Berlin: Logos.
Reale, Amy E. 2011. Maternal facial shape flow patterns in mother-infant interaction correspon-
dent to maternal self-criticism and dependency: Application and utilization of the Kestenberg
Movement Profile (KMP) in a microanalysis of mother-infant interactions. Psy.D. Dissertation,
Department of Psychology, Pace University, New York.
Rizzolatti, Giacomo and Corrado Sinigaglia 2008. Mirrors in the Brain: How Our Minds Share Ac-
tions and Emotions. New York: Oxford University Press.
Schilder, Paul 1935. The Image and Appearance of the Human Body: Studies in the Constructive
Energies of the Psyche. London: Kegan Paul, French and Trubner.
Shai, Dana and Jay Belsky 2011. When words just won’t do: Introducing parental embodied men-
talizing. Child Development Perspectives 5(3): 173–180.
Shaw, Jocelyn, K. Mark Sossin and Stephen Salbod 2010. Shape-Flow in Embodied Parent-Child
Affect Regulation: Heightened Emotional Expression in Dyadic Interaction. World Association
for Infant Mental Health, 12th World Congress: Infancy in Times of Transition. July 1, 2010.
Leipzig, Germany.
Sossin, K. Mark 1987. Reliability of the Kestenberg Movement Profile. Movement Studies: Observer
Agreement 2: 23–28. New York: Laban/Bartenieff Institute of Movement Studies.
Sossin, K. Mark 1990. Metapsychological considerations of the psychologies incorporated in the
KMP System. In: Penny Lewis and Susan Loman (eds.), The Kestenberg Movement Profile: Its
Past, Present Applications and Future Directions, 101–113. Keene, NH: Antioch New England.
Sossin, K. Mark 1999. Interpretation of an adult profile: Observations in a parent-child setting. In:
Janet Kestenberg Amighi, Susan Loman, Penny Lewis and K. Mark. Sossin (eds.), The Mean-
ing of Movement: Developmental and Clinical Perspectives of the Kestenberg Movement Profile,
265–290. Amsterdam: Gordon and Breach.
Sossin, K. Mark 2002. Interactive movement patterns as ports of entry in infant-parent psychother-
apy. Journal of Infant, Child and Adolescent Psychotherapy 2: 97–131.
Sossin, K. Mark 2007. History and future of the Kestenberg Movement Profile. In: Sabine C. Koch
and Susanne Bender (eds.), Movement Analysis: Bewegungsanalyse, 103–118. Berlin: Logos.
Sossin, K. Mark and Silvia Birklein 2006. Nonverbal transmission of stress between parent and
young child: Considerations and psychotherapeutic implications of a study of affective move-
ment patterns. Journal of Infant, Child, and Adolescent Psychotherapy 5: 46–69.
Suitner, Caterina, Sabine C. Koch, Katharina Bachleitner and Anne Maass 2011. Dynamic
embodiment and its functional role: A body feedback perspective. In: Sabine C. Koch, Thomas
Fuchs, Michela Summa and Cornelia Müller (eds.), Body Memory, Metaphor and Movement,
155–170. Amsterdam: John Benjamins.
Winter, Deborah Du Nann, Carla Widell, Gail Truitt and Jane George-Falvy 1989. Empirical stu-
dies of posture-gesture mergers. Journal of Nonverbal Behavior 13(4): 207–223.

Sabine C. Koch, SRH Hochschule Heidelberg (Germany)
K. Mark Sossin, Pace University, New York, NY (USA)

62. Doing fieldwork on the body, language, and communication
1. Available literature on fieldwork
2. Fieldwork defined
3. Equipment
4. Ethics of recording human behaviour
5. Getting the data you want
6. Basic approach to video recording
7. Closing remark
8. References
Abstract
This chapter gives a brief and selective overview of how to approach research on bodily
and linguistic aspects of communication by doing fieldwork. Here I do not address tech-
nological issues such as specific choices of video recording equipment or computer
programs for processing and analysis. Any such discussion would either be too pur-
pose-specific or would quickly become obsolete due to the fast-moving nature of the
technology. Our interest here is the general approach.

1. Available literature on fieldwork


Extensive and up-to-date resources are available in recent literature on fieldwork in lin-
guistics and language documentation (see Bowern 2008; Crowley 2007; Dixon 2010;
Gippert, Himmelmann, and Mosel 2006; Newman and Ratliff 2001; Sakel and Everett
2012; Thieberger 2012; Vaux, Cooper, and Tucker 2007). These are all useful for general
principles of field research, and to varying extents they show evidence of a current trend
in linguistics toward capturing visual aspects of communication. Of special relevance to
visual and bodily behaviour are Duranti (1997; Chapters 4, 5, 8, and 9, and the appen-
dix) and Seyfeddinipur (2012). See also the field manuals developed in the Language
and Cognition Group at the Max Planck Institute for Psycholinguistics in Nijmegen
(http://fieldmanuals.mpi.nl/; see Majid 2012).

2. Fieldwork defined
What is fieldwork? One meaning of the term is “distant travel for research”. This simply
refers to the collecting of data somewhere away from the researcher’s usual place of
work, especially when this requires the researcher to travel and be absent from their
own home for some time. This is in line with a recent definition given by Majid: “Field-
work is the collection of primary data outside of the controlled environments of the lab-
oratory or library” (Majid 2012: 54). This notion of fieldwork does not distinguish
between qualitatively different modes of data collection. It may, for example, involve
a researcher from Germany traveling to a Namibian village to carry out a brief program
of controlled experiments on memory for body movements in relation to spatial cogni-
tion (see Haun and Rapold 2009). Or it may involve a researcher from the Netherlands
traveling to central Australia for an extended period of observation and video recording
of everyday interaction in Aboriginal communities, to understand how speech and
bodily behaviour are integrated in referring to space (see Wilkins 2003). If this “travel
for research” sense of fieldwork is taken to simply mean that the work is done outside of
the research laboratory, it may also refer to work that is done closer to home. We might
say, for instance, that we are doing fieldwork when we go downtown to take notes on
how people gesticulate in public places (see Efron 1941), or when we take a video
camera to a local mechanic shop to record workplace interaction in a rich artifactual
environment (see Goodwin 2000; Streeck 2009).
A second notion of fieldwork is something like “the recording of research-
independent events”. This refers to the gathering of data by recording events which
are taking place independently of the fact that the recordings are being made. This is
not to say that the act of recording an event has no impact on the nature of the
event. Of course people’s behaviour can be affected as a result of them knowing that
they are being observed. But the idea of fieldwork in this sense is that the events
that are being recorded would take place anyway, in roughly the same form, and in
the same places and times, even if no research were being conducted on them at all. It
might involve recording people cooking, or having dinner, working in their fields or
workshops, staging performances or ritual activities. This is distinct from the collection
of data using experimental methods (whether one is in the field or not), where the to-
be-recorded events would not have happened at all were they not instigated by re-
searchers, motivated by research questions and methods.
There are at least two important reasons to do fieldwork in this second sense. First, it
can give the researcher access to phenomena that would otherwise be inaccessible, for
example because they are difficult or impossible to elicit experimentally. Second, and
relatedly, this kind of fieldwork provides a way to maximize ecological validity (though,
of course, at possible cost to experimental control). The second definition of fieldwork
almost entails the first, i.e., that the research be done “outside the lab” (unless of course
one is studying people’s communicative behaviour in research laboratories). But this is
not always the case. On the “recording research-independent events” definition of field-
work, we would have to include, for example, the recording of telephone conversations
or the gathering of data from web page commentaries.
In this chapter I want to discuss fieldwork in the sense of the overlap between the
two notions of the term discussed so far. Thus, we shall focus here on “travel away
from the researcher’s home environment to record research-independent events”.
Many researchers of bodily behaviour have carried out fieldwork in more or less this
sense (see Enfield 2003; Haviland 2003; Kendon and Versante 2003; Kita and Essegbey
2001; Wilkins 2003, among many others). Seyfeddinipur (2012) provides a linguist-
oriented overview and guide of fieldwork on co-speech gesture. There are many useful
references there, as well as detailed advice about the recording and annotation of data.
In what follows, I restrict the discussion to this definition, which is narrower than
what many people will want to include under fieldwork. In the rest of the chapter I want
to discuss some practical points of relevance to carrying out fieldwork on bodily aspects
of communication.

3. Equipment
For fieldwork on body, language, and communication, by far the most effective method
is to use video and sound recording technology. As Hanks (2009) points out, while it is
possible to collect data “on the fly” in the form of handwritten notes, this can only be
done if one has already built a sufficient background knowledge of the local cultural
and physical setting, and even then the data “is already a selective interpretation of
what the researcher perceives” (Hanks 2009: 19). Video recording of interaction
gives you the possibility of repeated inspection of the data. Also, with video-recorded
data you are able to provide others with direct evidence for your findings and analysis.
Use the highest quality equipment you possibly can. This will make a huge difference
to the quality and longevity of your data. If you invest now in equipment that delivers
the best quality possible, it will pay off in the long term. Any recording you make today
will potentially be a source for your research for many years to come. Especially rele-
vant in research on gesture is your choice of equipment for video recording and digital
photography. When selecting equipment, you have to make many choices – what degree
of video resolution, what type of lens, which media format, etc. – and each of these
choices represents a potential weak link that may compromise any high-quality choice
you have made elsewhere. Even the best sound recording device cannot compensate for
the sound delivered through a poor quality microphone. Even the best camera cannot
compensate for the poor image delivered through a low quality lens. If the quality of
your recordings is compromised in any of these ways, you will be required to live
with the limitations of an inferior recording forever.
Quality is priority number one. Beyond this simple rule, it is not possible to give gen-
eral equipment recommendations. This is partly because your choice of equipment de-
pends on what you are trying to do, and partly because technology is changing so
quickly that a good choice today may be inferior or even obsolete tomorrow. In figuring
out what equipment to use, first you should specify exactly what your goals are – includ-
ing what kind of data you want to collect, and what you intend to do with the data after-
wards – then study the options, and consult as many colleagues as possible about their
experiences. Talk to them and pay attention to what they say. Our more experienced
colleagues have paid dearly for their lessons, and we do well to benefit from their
costly experience. Do not repeat others’ mistakes.
Even after the best preparation you should expect that things will not go the way you
have planned. In fieldwork, you must always remain flexible, and be willing to allow
your plans to change in an instant. You might be getting ready to make data recordings
with a certain goal in mind, yet somehow the circumstances change, and people start
doing something different from what they had led you to expect, or they become dis-
tracted from what you had hoped they would be doing. In such a case do not get fru-
strated or try to redirect things. Go with the flow. Like in all empirical science,
serendipity is a source of new, unexpected insights. In the field, you are in a world
that belongs to other people. The dynamics of daily life can be like the changing cur-
rents of the surf: If a rip tide takes you, don’t fight it. There’s no point. It would only
weaken you and tire you out. Instead relax and see where the flow leads you. You
will soon be able to make your way back to safety.

4. Ethics of recording human behaviour


The ethics of fieldwork are complex with respect to recording people’s behaviour, main-
taining their privacy, and protecting their identities. There are both legal and moral is-
sues. With regard to the legalities of going into communities and recording people’s
activities, the fieldworker is subject to specified and binding requirements and restric-
tions. These are different in different countries. You need to find out what the legalities
are, and you must abide by them, otherwise you may be subject to punishment by law,
confiscation of data, and damage to your reputation both in your field site and in your
professional life. No less important than the legalities are the moral principles at hand.
Ethical rules and regulations are there to protect people from harm. Even where you
are under no legal constraints, you should always follow the golden rule: don’t do things
to others that you wouldn’t want them to do to you. Imagine yourself being studied in
your own home by a foreign guest, and ask yourself how you would want to be treated.
Treat others like that. And of course, this rule needs to be calibrated in light of your
knowledge of the local cultural setting. The people in your field site may have different
978 V. Methods

sensibilities from you. For instance, while they may not care about being photographed
with their shirts off, they might be mortified if you were to publish a picture of them
exposing the soles of their feet. Be sensitive.

5. Getting the data you want


Before collecting your data, you should be able to state explicitly what you are trying to
capture, and why. Do you want everyday conversation? A specific type of religious rit-
ual? Joint activities such as cooking together? Teaching? Are you going to study point-
ing gestures? Facial expression? Conversational repair? Be explicit about what you are
after, and make that your target. There is little point collecting “general purpose” data.
Ask yourself: What am I going to do with this data? What phenomena am I studying?
What publications will I produce based on this data?
When video-recording human behaviour in the field, you must ensure that you are
getting data of the right kind, and of the best possible quality. These goals are more dif-
ficult to meet than one might imagine. Things are easier using the types of fieldwork
protocols that are traditional in functional linguistics. These approaches prefer data
that are spontaneously produced rather than constructed, and it is normal to elicit
such data using non-linguistic stimuli such as pictures, film clips, or concrete objects.
With a stimulus-based elicitation protocol, the researcher can bring the participant to
a relatively quiet place, and can fairly easily control the production of communicative
behaviour – the when, where, what, who, and how (see for instance, the approaches de-
scribed in Majid 2012; Payne 1997). Nine out of ten times, such a recording will be suc-
cessful. But when you want to record unelicited, unscripted, independently occurring
behaviour in everyday life, it is the opposite. So many things can go wrong.
I recommend following a 1/10 Rule. For every ten recordings you make, expect that
one of them will work out well, while in nine of them something will go wrong. Expect
one out of ten recordings to be good enough. Of course, with practice one can improve
the odds, but if you expect a low success rate you can ensure that the quality stays high,
and if you happen to do better, you will be pleasantly surprised.
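Read as a planning heuristic, the 1/10 Rule can be made concrete with a little arithmetic. A minimal sketch (treating each recording session as an independent trial is, of course, a simplifying assumption, and the 10% success rate is the rule’s own rough figure, not a measured value):

```python
def expected_good(recordings, success_rate=0.1):
    """Expected number of usable recordings under the 1/10 Rule."""
    return recordings * success_rate

def chance_of_at_least_one(recordings, success_rate=0.1):
    """Probability that at least one recording works out,
    treating each session as an independent trial."""
    return 1 - (1 - success_rate) ** recordings

print(expected_good(10))                      # 1.0
print(round(chance_of_at_least_one(10), 2))   # 0.65
```

In other words, even under the pessimistic 1/10 assumption, ten sessions give you a reasonable chance of coming home with at least one good recording.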
Traditional linguistic fieldwork is like farming. One cultivates one’s data. By making
appointments with consultants, and determining a line of questioning, you are taking a
direct hand in managing the acquisition of data. But if you are going into the wild, you
do not cultivate the things you want, instead you arrange to be where and when they
can be captured. There are two ways to go about this: hunting and trapping. If you
try to chase the action like a hunter you will usually be frustrated. Instead, take the
approach of a trapper. Study the routines of your quarry. Learn when and where
your target type of data tends to recur, and then set up your equipment at those places
and times. For example, you might observe that between 4 and 5 o’clock most after-
noons, during a certain time of year, a certain group of men will sit and chat on the
verandah of a certain house after they have arrived back in the village from their
day’s work in the fields. You can set up your camera here before they arrive. If you
are chasing the data instead of lying in wait, you will often miss where a sequence be-
gins, as you’ll only recognize it once it’s already started. But if your equipment is
already set up where human interaction is most likely to occur, then you maximize
your chances of capturing an interaction from its inception. Like a trapper, be patient.
62. Doing fieldwork on the body, language, and communication 979

Let the camera roll even when nothing special is happening, and walk away. Come back
later and find out what you have captured.
Be prepared to catch something other than what you were after, or maybe even to
catch nothing at all. In one case, I recall, I set up a camera in a village household, and
soon after I left the scene, so did all the people who I had hoped to film. The result was
an hour of footage of an empty room. But the costs of these kinds of failures are neg-
ligible: a bit of time is lost, and a videocassette or small section of hard drive is filled up.
With digital media, now the norm, your only constraint is hard-disk space. You can eas-
ily ensure that you have ample space for many more recordings than you will actually
end up using.
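How much space counts as “ample” follows directly from recording time and the camera’s bitrate. A minimal sketch of the calculation (the 25 Mbit/s figure is an illustrative assumption for HD video, not a recommendation for any particular camera):

```python
def storage_gb(hours, mbit_per_s):
    """Approximate disk space, in gigabytes, for a recording session."""
    seconds = hours * 3600
    megabits = seconds * mbit_per_s
    return megabits / 8 / 1000  # megabits -> megabytes -> gigabytes

# e.g. ten one-hour sessions at an assumed HD bitrate of 25 Mbit/s
print(storage_gb(10, 25))  # 112.5
```

Running the numbers like this before a field trip makes it easy to budget for more disk space than you expect to need.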

6. Basic approach to video recording


Quality should be your highest priority. Collecting recordings is an investment, and if
the quality is good, you will long be grateful for your own efforts. Conversely, if the
quality is bad you will be frustrated by your data forever. Remember that you are col-
lecting data for scientific research. Do not confuse your data recordings with the kind of
informal documentary recordings of your field site in which you might be guided by
more aesthetic considerations. By all means collect general recordings of everyday
life in the village – indeed community members typically love to see these; or perhaps
you want to make a documentary film about your field site. But keep this quite separate
from your data collection, where the priority is getting the clearest possible image of the
data you are aiming to collect. If you are busy trying to get a close-up of a speaker’s face
for posterity you will later curse these efforts when you are unable to see what the
speaker was doing with his hands.
One fundamental problem to avoid is bad lighting. Often a recording is too dark.
This can happen if the recording is made in a room with small windows and no lights.
It often happens that the lighting of a recording is poor because the camera is pointing
toward the main source of light. You can avoid this “back-lighting” problem by setting
up the camera with the main light source behind it. For example if you are filming
inside a room where daylight is coming in through a window, set up the camera at
the window, pointing into the room, so that the light is coming in from behind the cam-
era. Never set up the camera to be pointing toward the window, unless you are very
sure you have made the right kind of technical adjustments for back-lighting on the
camera. The same principle applies in many other situations, e.g.,
when filming outside under a tree or verandah. The rule is, as far as possible, try to
have the main source of light, whether it be daylight or artificial light, coming from
behind the camera.
Another problem is camera movement. Even small movements of the camera will
create problems for your analysis, blurring the image. Try to avoid moving the camera
by panning, tilting, or zooming in and out. Make sure that all the action is included in
the frame. Leave just enough space free where people might put their hands if, for
example, they were to make a large pointing gesture. Don’t cut people’s body parts
from the frame, at least where these would be relevant to communication and nor-
mally visible to people in the situation. Avoid hand-held recordings where possible
by using a tripod. However, sometimes making a hand-held recording has advantages,
in which case you should be careful to brace the camera and avoid camera shake as
far as possible.
Some final points concern the preparation for, and management of, your recordings.
When in the field, make sure you have your recording equipment fully ready at all
times. This means that whenever you have time, for example before you go to bed at
night, make all the necessary preparations in advance for your next set of recordings:
for example, fully charge all the necessary devices and batteries, pack your work bag
with all the things you need, such as extra blank videotapes, or formatted memory
cards, depending on the kind of equipment you are using. Your work bag is then
ready for you to grab at any moment. Check and re-check that everything is in working
order. And when you are actually making your recordings, you should not only check
and re-check, but re-re-check as well. Are your batteries charged? Are your lighting
and focus settings correct? Is your framing good? And particularly important is the
input of sound to your video recorder. If you use an external microphone, you can
have a better quality microphone than the one in your camera, and you will be able
to place it closer to the action. However, it is easy to make errors with cable connections
and sound settings, and so it is crucial to check, re-check, and re-re-check that your
sound input during recording is working well. Use headphones to monitor the sound
input on the camera once you have begun any recording.
Whenever you make a recording, you should immediately note down the relevant
metadata: the time and place of the recording, the activities being recorded,
who the people are, and any other possibly relevant information. You can easily and
quickly note these things at the time of recording, and if you don’t, you will find it dif-
ficult if not impossible to remember all the relevant details later. Lastly, back up your
data as soon as you can. And keep your backups in a different place from your original
data, especially when you are traveling.
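The habit of noting metadata at recording time and backing up immediately can be supported with a simple script. A minimal sketch, assuming recordings are copied from a memory card to two separate destinations and that a plain-text sidecar file is an acceptable metadata format (all file names and fields here are illustrative):

```python
import hashlib
import shutil
from pathlib import Path

def sha256(path):
    """Checksum used to verify that each copy is identical to the original."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def archive_recording(source, primary_dir, backup_dir, metadata):
    """Copy a recording to two destinations, verify both copies,
    and write a plain-text metadata sidecar next to the primary copy."""
    source = Path(source)
    original = sha256(source)
    for dest_dir in (primary_dir, backup_dir):
        dest = Path(dest_dir) / source.name
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, dest)
        if sha256(dest) != original:
            raise IOError("corrupt copy: %s" % dest)
    sidecar = Path(primary_dir) / (source.stem + ".txt")
    sidecar.write_text(
        "\n".join("%s: %s" % (k, v) for k, v in metadata.items()) + "\n"
    )

# usage: record the who/what/when/where while it is still fresh
# archive_recording("CARD/clip001.mp4", "archive/", "backup/", {
#     "date": "2012-07-14", "place": "village verandah",
#     "activity": "evening conversation", "participants": "see notebook p. 12",
# })
```

Keeping the primary and backup destinations on physically separate drives implements the advice above about storing backups in a different place from the original data.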

7. Closing remark
This chapter has offered some remarks concerning fieldwork on the body, language, and
communication. There are many significant and complex issues that I have not men-
tioned. If you are intending to do fieldwork, it is worth reading widely beforehand,
and drawing on others’ experiences as far as you can. But nothing is as valuable as
first-hand experience. If you want to do fieldwork, just do it. No matter how well-
prepared you are, you will make mistakes. Just make sure you learn from these mis-
takes. Don’t try to do everything in a single field trip. Take time in between, in order
to assess your experience and adjust your way of working. And above all, remain flex-
ible and good-humored. Rigidity and stress are both unhealthy and contagious: they
should be avoided at all costs.

Acknowledgements
Thank you to Nick Williams for helpful comments on an earlier draft, and to colleagues
in the Language and Cognition Department at the Max Planck Institute for Psycholin-
guistics in Nijmegen for their input during many discussions on the topics discussed here.
And thanks to Julija Baranova for expert assistance. This work is supported by the
European Research Council (Grant “Human Sociality and Systems of Language Use”).

8. References
Bowern, Claire 2008. Linguistic Fieldwork: A Practical Guide. New York: Palgrave Macmillan.
Crowley, Terry 2007. Field Linguistics: A Beginner’s Guide. Oxford: Oxford University Press.
Dixon, Robert Malcolm Ward 2010. Basic Linguistic Theory. Oxford: Oxford University Press.
Duranti, Alessandro (ed.) 1997. Linguistic Anthropology. Cambridge: Cambridge University Press.
Efron, David 1941. Gesture, Race, and Culture: A Tentative Study of Some of the Spatio-Temporal
and “Linguistic” Aspects of the Gestural Behavior of Eastern Jews and Southern Italians in New
York City, Living under Similar as Well as Different Environmental Conditions. The Hague:
Mouton.
Enfield, N. J. 2003. Demonstratives in space and interaction: Data from Lao speakers and implica-
tions for semantic analysis. Language 79(1): 82–117.
Gippert, Jost, Nikolaus P. Himmelmann and Ulrike Mosel (eds.) 2006. Essentials of Language
Documentation. Berlin: De Gruyter.
Goodwin, Charles 2000. Action and embodiment within situated human interaction. Journal of
Pragmatics 32: 1489–1522.
Hanks, William F. 2009. Fieldwork on deixis. Journal of Pragmatics 41: 10–24.
Haun, Daniel B. M. and Christian J. Rapold 2009. Variation in memory for body movements
across cultures. Current Biology 19(23): R1068–R1069.
Haviland, John 2003. How to point in Zinacantán. In: Sotaro Kita (ed.), Pointing: Where Lan-
guage, Culture, and Cognition Meet, 139–170. Mahwah, NJ: Lawrence Erlbaum.
Kendon, Adam and Laura Versante 2003. Pointing by hand in “Neapolitan”. In: Sotaro Kita (ed.),
Pointing: Where Language, Culture, and Cognition Meet, 109–138. Mahwah, NJ: Lawrence
Erlbaum.
Kita, Sotaro and James Essegbey 2001. Pointing left in Ghana: How a taboo on the use of the left
hand influences gestural practice. Gesture 1(1): 73–94.
Majid, Asifa 2012. A guide to stimulus-based elicitation for semantic categories. In: Nicholas Thie-
berger (ed.), The Oxford Handbook of Linguistic Fieldwork, 54–71. Oxford: Oxford University
Press.
Newman, Paul and Martha Ratliff (eds.) 2001. Linguistic Fieldwork. Cambridge: Cambridge Uni-
versity Press.
Payne, Thomas E. 1997. Describing Morphosyntax: A Guide for Field Linguists. Cambridge: Cam-
bridge University Press.
Sakel, Jeanette and Daniel L. Everett 2012. Linguistic Fieldwork: A Student Guide. Cambridge:
Cambridge University Press.
Seyfeddinipur, Mandana 2012. Reasons for documenting gestures and suggestions for how to go
about it. In: Nicholas Thieberger (ed.), The Oxford Handbook of Linguistic Fieldwork, 147–
165. Oxford: Oxford University Press.
Streeck, Jürgen 2009. Gesturecraft: The Manu-Facture of Meaning. Amsterdam: John Benjamins.
Thieberger, Nicholas (ed.) 2012. The Oxford Handbook of Linguistic Fieldwork. Oxford: Oxford
University Press.
Vaux, Bert, Justin Cooper and Emily Tucker (eds.) 2007. Linguistic Field Methods. Eugene OR:
Wipf and Stock.
Wilkins, David P. 2003. Why pointing with the index finger is not a universal (in socio-cultural and
semiotic terms). In: Sotaro Kita (ed.), Pointing: Where Language, Culture, and Cognition Meet,
171–216. Mahwah, NJ: Lawrence Erlbaum.

N. J. Enfield, Nijmegen (The Netherlands)


63. Video as a tool in the social sciences


1. Introduction
2. The pioneers: Early uses of film
3. Critiques and challenges
4. Contemporary video practices in the social sciences
5. Conclusion
6. References

Abstract
Film has been available since the end of the 19th century and indeed has been used by
socio-anthropological research since then. Nonetheless, this interest in film has not ex-
panded in a linear and cumulative way, and has not spread in the same way in all disci-
plines. The chapter focuses on the use of film for scientific research and retraces some
early examples of film research practices, beginning with Haddon and Regnault. It also
mentions the difficulties raised by this field, the controversies about the interpretation of
moving images and the way in which their strong realistic dimension has been both
exploited and criticized. For contemporary uses of video in the study of social interaction,
the Natural History of an Interview, a project initiated by Bateson in 1955, has been crucial,
using film to record an entire interaction in a continuous way; the recording was then
transcribed in detail and exploited for interdisciplinary analyses. This project inaugurated
a long series of studies analyzing “naturally occurring social interactions”, with particular
attention to everyday conversations, school settings, and workplace activities, within visual
anthropology, visual sociology, micro-ethnography, ethnomethodology, conversation
analysis, linguistics, and education studies.

1. Introduction
The potentialities of film were exploited by social and anthropological research as soon as the
first technological devices for creating moving images were available. Nonetheless, this
interest in film has not expanded in a linear way: even if visual anthropology and visual
sociology are now established fields, film and video have not yet been widely adopted
within the various disciplines of the social and human sciences. This paradoxical situa-
tion shows that the availability of technology does not in itself guarantee its scientific
use. Its potentialities are used only when they converge with the conceptual, theoretical
and methodological aims of the disciplines.
Even if the uses of film and video in the social sciences are not following a linear
growth curve, they have always been fueled by an interest in human, cultural, and social
embodied (not only linguistic) conduct, grasped in its dynamic, processual, and tempo-
ral dimensions, as well as in its contextuality and situatedness. Interest in human actions,
relations, and interactions as they take place in diverse, everyday (or more specialized)
settings has prompted the use of film and video for documenting them. Likewise, an
interest in capturing the details of language, gesture, body, and movement, and in study-
ing them in rituals, in everyday life, and in professional contexts, has built a strong
motivation to use cinematic technologies.
In this short text, we give a range of examples of the uses of film and video as data
(and not as documentaries; that is, as empirical materials for analysis, and not as a way
of transmitting, circulating, and popularizing results, which would deserve another
study) from across the early period in the history of the social sciences, as well as within
contemporary research, emphasizing the new observable objects that these visual
technologies allow us to study.

2. The pioneers: Early uses of film


Etienne-Jules Marey’s creation of chronophotography in 1882 was motivated by a
strong interest in bodies in movement, such as birds flying, horses running, and men
walking. The “photographic gun”, light and mobile, allowed him to capture natural
movements by decomposing them into several poses (Marey 1896).
In the same period, Eadweard Muybridge was photographing the movement of ani-
mals and persons, using banks of cameras to catch their movement. As early as 1878, he
was able to photograph a horse in fast motion (Muybridge 1887).
These experiments were inspired by a scientific interest in movement as the primor-
dial manifestation of life, both in man and in animals. This interest was very widespread
in academic circles at that time, particularly in medicine and the social sciences, both in
the USA and in Europe. For instance, in Germany, Braune, an anatomist, and Fischer, a
physicist, used the same kind of techniques in order to study the human gait in the early
1890s (see Braune and Fischer 1895). In France, Felix-Louis Regnault, visiting the
Exposition Ethnographique de l’Afrique Occidentale in Paris in 1895 (six months before
the Lumière brothers made their first public projection of cinematograph films), used
Marey’s device to capture the movements of a Wolof woman (Regnault and Lajard 1895).
Regnault too was interested in movement:

The film of a movement is better to research than the simple viewing of a movement; it is
superior, even if the movement is slow. Film decomposes movement in a series of images
that one can examine at leisure while slowing the movement at will, while stopping it as
necessary. Thus, it eliminates the personal factor, whereas a movement, once it is finished,
cannot be recalled except by memory. (Regnault, as cited in Rony 1996: 47)

Regnault captures here some of the key features of film technology. Even if his account
of the posture of the Wolof woman is impregnated with social Darwinism and imperi-
alist determinism, he participated in the early recognition of the potentialities of this
new technology.
The first steps in visual anthropology are generally attributed to Alfred Cort Had-
don, a zoologist who became fascinated by cultural practices and who, during the Cam-
bridge Anthropological Expedition to Torres Straits in 1898, shot various films of
natives dancing, performing ceremonies, and living their everyday life. Again, dance
is a favored topic for film, which allows for the capture of its dynamic movements.
Haddon advised another zoologist-turned-ethnographer, W. Baldwin Spencer, to
take a Kinematograph and an Edison phonograph with him during a year-long expedi-
tion to Central Australia in 1901. Together with Francis James Gillen, they filmed Ar-
unta ceremonies and Kurnara rain ceremonies. They were very clear about the
limitations of the technology (“it is not a very easy matter to use [the cinematograph]
amongst savages. As they move about, you never know exactly where they will be, and
you are liable to go on grinding away at the handle, turning the film through at the rate
of perhaps fifty feet per minute, and securing nothing” (Spencer and Gillen 1912: 218,
quoted by Griffiths 1996: 30)). In the face of the difficulty of anticipating the use of
the surrounding space during the ceremony, they tried to pan the camera, and they
finally adopted multiple viewpoints on the event by changing camera set-ups for
every shot: This can be considered as an early attempt to edit film and produce a
multi-scope record.
Some years later, the Austrian doctor and anthropologist Rudolf Pöch organized two
big expeditions to Papua New Guinea (1904–1906) and the Kalahari desert (1907–
1909), bringing a camera and a phonograph with him. As with Regnault’s work,
Pöch’s work provides an example of the early use of film in anthropology, although it
is impregnated with racist ideology (Jordan 1992: 42).
These early ethnographers integrated film technology into their fieldwork practices.
Having a positivistic impetus, they saw film technology as a means to produce “objective”
evidence and knowledge.
But after them, the use of cameras and films dropped radically. Griffiths (1996) ex-
plains this change in terms of the changing requirements of the anthropological
community:

The shift from the evolutionism of nineteenth century anthropology to the cultural relati-
vism and structural functionalism of the twentieth century American and British anthro-
pology inspired a questioning of the objectivity of “scientific” recording techniques such
as anthropometry, a questioning that may explain why so few anthropologists expressed
an interest in moving pictures in the wake of Haddon’s and Spencer’s turn-of-the-century
fieldwork. (Griffiths 1996: 23; see also Banks and Morphy 1997)

It was in the 1930s that film was used again in two significant anthropological studies. The
first study was done by Franz Boas who, as late as 1930, when he was already seventy
years of age, filmed a number of Kwakiutl dances in his last trip to a community he had
studied for forty years (Ruby 1980). The filming was motivated by a strong interest in
rhythm and in the body – which he intended to transcribe using the Laban notation
(Laban 1926) – as well as by the aim of preserving the traditional and endangered
Kwakiutl culture. Again, the film supported a vision of cultural practices invol-
ving not only language but also the body, and including “motor habits” as Boas
described them (Ruby 1980) – that is, mobility and gesture.
Later on, Boas’ student, David Efron, was even more interested in gesture: he
engaged in a comparative study of the gestural repertoires in different neighboring
communities (Sicilian immigrants and Lithuanian Jewish immigrants on the Lower East
Side of New York City), as well as of the effect of assimilation on the range of gestures used
by their first generation descendants (Efron 1941).
Efron is an important pioneer in gesture studies (Kendon 2004); he acknowledged
that his use of film techniques was heavily inspired by Boas (“the idea of using film
as a research device in the field of ‘motor habits’ originated entirely with ‘Papa
Franz’ himself who discussed with us at great length his ideas about photographs,
motion pictures and sketches as research tools,” Efron, quoted by Ruby 1980: 11).
The second crucial filming enterprise of the 1930s was undertaken by Mead and Bateson
in Bali, where they stayed for three years (1936–1939) and shot about 22,000 feet of film.
Both were in contact with eminent precursors of visual anthropology, since Bateson
had been a student of Haddon and Mead of Boas. In their fieldwork, they collected
notes, transcripts, photographs, and films, all related through timed notes (Jacknis
1988: 163–164), already addressing the problem of how to synchronize these comple-
mentary materials. Moreover, anticipating later reflexive experiments, they showed
their films to the informants and gathered their comments about the recorded scenes.
Their way of treating recordings considers them as “data” and distances them from
the tradition of documentary films:

We tried to use the still and the moving picture cameras to get a record of Balinese beha-
viour, and this is a very different matter from the preparation of a “documentary” film or
photographs. We tried to shoot what happened normally and spontaneously, rather than to
decide upon the norms and then get the Balinese to go through these behaviours in suit-
able lighting. (Bateson and Mead 1942: 49)

Ten years later, Bateson played another important role in the use of film for social
research, when he built an interdisciplinary team at the Veterans Administration Hos-
pital of Palo Alto, studying communication in psychotherapy and in families with a
member affected by schizophrenia. Film recordings were used to observe how family
members interact, taking into consideration “non-verbal communication” (see Bateson
et al. 1956 on “double bind”). In 1955, the group worked together on the film record-
ing of a psychiatric interview between Bateson and one of his patients, Doris, then tran-
scribed by Hockett, Birdwhistell and McQuown (McQuown 1971). The title of the
project, Natural History of an Interview, significantly refers to an analysis of human
behavior that recognizes the importance of “spontaneous conversational materials”
in “a variety of contexts” (McQuown 1971: 9 and 11).
On the basis of this film, Birdwhistell eventually developed his famous analysis of the
cigarette scene and the discipline of kinesics.
With the Natural History of an Interview begins the contemporary phase of the use
of video in the social sciences: Influenced by Scheflen, the founder of context analysis
(see Scheflen 1972) who was part of the Palo Alto group, Kendon (1967, 1970) gave a
new vigor to the study of gesture. Some years afterwards, conversation analysis and
other praxeologically and pragmatically oriented approaches developed more and
more sophisticated ways of documenting social practices, and the use of video began
to spread in all the social and human sciences, not only in anthropology, where it
has been used since its invention, but also in linguistics, sociology, studies of work,
technology studies, and education.

3. Critiques and challenges


On the basis of his Balinese experience, Mead explicitly launched the idea of visual
anthropology, advocating photography and film as crucial tools for collecting data
(see Mead 1995).
Nevertheless, as happened after Haddon, the use of film did not spread widely, and it
remained restricted for decades. Likewise, in sociology, there was a strong interest in
film at the beginning of the twentieth century – Stasz (1979) observes that between
1896 and 1916, thirty-one articles using visual images were published in the American
Journal of Sociology. But these types of data disappeared afterwards, losing out in the
competition with other means of illustration, evidence, or proof, such as statistical reports
(Stasz 1979). Additionally, the fact that, from the beginning of cinematography, com-
mercial film widely exploited ethnographic subjects as a source of curiosity and exoti-
cism might have convinced the scientific community that the medium was too
popular for serious science (Griffiths 1996: 19). The relationship between films as
data for research, commercial cinema and documentary film has never been simple
(see Bateson and Mead 1942: 49).
This leads to a contemporary paradox: although film and then video technologies
have been available for more than a century, and although there are early examples
of research not only using but also advocating the use of film, first in anthropology,
and then also in sociology, the field appears today to have just started to burgeon.
Another paradox is that despite the massive use of images in contemporary society
and the spread of technological media like film, television, video, and computers,
there is still an absence of consolidated and standardized practices in video making
and video analysis in the social sciences.
Among the more recent reasons for this late development, some elements can be
pointed out. A general focus on language as the main manifestation of culture and soci-
ety and the privileged medium for accessing them, as well as the emphasis on writing as
a central practice in fieldwork, have competed with alternative approaches based on
visual instead of audible resources, contributing to their marginalization. In comparison
to the innumerable theoretical and methodological models available for analyzing ver-
bal language, methods for interpreting images have been less developed – giving the
impression that images are more superficial and less telling than language. In this con-
text, the use of film has been confronted by the skepticism affecting image as a means of
knowledge (Jay 1994), as well as by doubts about the objectivity, realism, and positivism
of images. These latter aspects have been criticized within discussions about the “crisis
of representation” (see Marcus and Fischer 1986), leading to an increased awareness of
the constructedness of any data social scientists produce and analyze, as well as of
the fact that images, too, are theoretically and ideologically loaded, and pervaded by
relationships of power, gender asymmetries, and ethnic discrimination.
In response to the crisis of representation and to the challenges constituted by the
use of video data in the social sciences, the latest developments in videographic
methods have insisted on the importance of doing fieldwork in order to prepare what
is to be shot by the camera, and of ethical concerns in securing informed consent, get-
ting authorizations, and protecting the privacy of the informants (Mohn 2002). More-
over, instead of considering that informants gazing at the camera reveal “bias”
undermining the “objectivity” of video recordings, analyses dealing with the orientation
towards the camera have turned it into a phenomenon to be studied, which documents
the situated conditions in which the video records are produced, within a reflexive
approach (Heath 1986: 176; Laurier and Philo 2006; Lomax and Casey 1998; Speer
and Hutchby 2003).
Video has been increasingly used as data and not just as an illustration, and is con-
sidered essential for overcoming the limitations of participant observation and of field
notes, and for making available details of embodied conduct that cannot be imagined by
introspection but can only be discovered and observed with adequate records. In turn,
63. Video as a tool in the social sciences 987

the observation of these details fueled the development of analytical perspectives
recognizing the importance of visual features for the study of communication, language,
and practice. MacDougall speaks in this respect of “a shift from word-and-sentence-
based anthropological thought to image-and-sequence-based anthropological thought”
(MacDougall 1997: 292). Related to this new perspective, and fostered by gesture stu-
dies and multimodal analysis, videos as data have been increasingly enriched with an-
notations and transcriptions, as well as being stored and eventually made available in
archives and searchable data banks. In this context, sharing video records has been con-
sidered as a way to ensure intersubjective interpretations and assessments of analyses,
and to make data available for the public examination of the scientific community.

4. Contemporary video practices in the social sciences


Today, video is more and more used in various disciplines and scientific communities in
the social sciences, within a range of methodological perspectives. This is partly thanks
to the fact that equipment is getting cheaper, more sophisticated, and less intrusive.
Although video is prized in the social sciences, especially in naturalistic studies, it is
also used in various sorts of “ecological experiments” and reflexive practices.
On the one hand, the impetus to record “naturally occurring social interactions” has
a long-standing tradition in ethnography, and has been strongly advocated by ethno-
methodology and conversation analysis. Naturalistic data arise from an interest in social
interaction as organized collectively by the participants in a locally situated way, in the
very setting in which activities are ordinarily accomplished. This approach insists on the
indexicality of social action, and therefore on the documentation of this action in its
context, which is observed without being orchestrated or disturbed by the researchers
and with the aim of preserving the participation framework, the continuous temporality
of the ongoing action, and the multimodal resources exploited by the participants
(Goodwin 1993; Heath, Hindmarsh, and Luff 2010; Mondada 2006). In practice, these
objectives are achieved by videos recorded as data, shot in the presence or in the
absence of the researchers, but also by using videos produced by the participants as
part of their ordinary activities (such as surgeons operating with endoscopic cameras;
see Mondada 2003) which are turned into data by the researcher.
On the other hand, other methodological set-ups have been explored, using various
types of elicitation: instead of recording naturally occurring interactions, some research-
ers organize different forms of non-directive interviews and focus groups (Lee 2004).
Others ask participants to perform experimental tasks within the context of their
everyday activity. Video diaries are used to get self-representations of informants
(Holliday 2004), and film elicitation techniques have also been explored (Harper
2002: 14). Moreover, reflexive uses of the camera have involved researchers training
ethnic communities in participatory videographing as a form of ethnography and
also of empowerment (Worth and Adair 1972 are famous for instructing Navajo Indians
to use an 8mm camera; Ruby 2000: 225 mentions the case of Michaels in the 1980s training remote Aboriginal communities in Australia to shoot their own films; Barnes,
Taylor-Brown, and Wiener 1997 report on a project involving mothers with HIV recording
messages for their children to view after their deaths).
Among the areas mainly investigated in the social sciences, we can mention every-
day settings, school settings, and work settings, which have been explored in visual
anthropology (Banks and Ruby 2011), visual sociology (Knoblauch et al. 2006; Knoblauch
et al. 2008; Pink 2001), micro-ethnography, ethnomethodology, conversation analysis
(Sidnell and Stivers 2005; Schmitt 2007; Mondada and Schmitt 2010; Haddington,
Mondada, and Nevile 2013; Streeck, Goodwin, and LeBaron 2011), workplace studies
(Luff, Hindmarsh, and Heath 2000; Middleton and Engeström 1996), linguistics (De
Stefani 2007), gesture studies, the science of education, etc.
Everyday settings have been explored by interaction analysis, conversation analysis,
gesture studies, and linguistics in order to uncover the use of multimodal resources for
the situated organization of social, cultural, and linguistic practices. Collaborating with,
and being inspired by, the work of Birdwhistell in kinesics (see Birdwhistell 1970) and
Scheflen in context analysis, Kendon (1979, 1990, 2004) showed very early on the advan-
tages of using film and video for interaction analysis, with a multiple focus on gaze, ges-
ture, and spatial ecology. Emphasizing the role of these resources in the organization of
social interaction, Goodwin (1981, 1993, 2000) shows how video allows the capture of
the systematics of gaze and turn organization, the coupling between gesture and the
environment, and the intertwining of various semiotic fields in complex social actions,
from everyday conversation to highly specialized professional settings. This allows
the development of a vision of language in action which is less logocentric than in
traditional, grammatical, rather abstract, and disembodied accounts.
Educational settings were investigated with cameras very early on, with a tradition of
studies on classrooms, both in ethnomethodology (Mehan 1993; Spier 1973) and in
micro-ethnography. More particularly, the work of F. Erickson (see Erickson 1982,
2004, and 2006 on videography) has been influential on a large range of ethnographic
classroom studies. Classrooms have also been investigated by science of education pro-
jects interested in capturing the embodied, ecologically situated, and dynamic processes
of learning. The particularities of this setting – which can involve a large audience in
front of the teacher, as well as multiple working groups spread about in a room, working
on whiteboards and tables, with various material and textual artifacts – as well as the
features of the learning processes (involving long-term observation across the curriculum
and thus micro- as well as long-term longitudinal documentation), offer various chal-
lenges to video recordings as well as to data archiving and to the analysis of large
amounts of data (Aufschnaiter and Welzel 2001; Derry 2007; Goldman et al. 2007).
Another complex setting which has been extensively studied in the last decade is the
workplace. Characterized by complex, fragmented, and heterogeneous ecologies, by
collaborative work both face-to-face and at a distance, and by the use of artifacts and
technologies, the workplace presents a multiplicity of embodied, visual, material, and
spatial features which has prompted challenging reflections on the best way to video-
graph them. Complex video devices, often coordinating several cameras, dynamic screen
capture, audio recordings of telephone conversations, and the collection of other docu-
ments, have been used, but also the need for a strong ethnographic approach prior to and
during the recordings has been advocated (Borseix 1997; Heath, Hindmarsh, and
Luff 2010; Mondada 2008). Some scholars have also added extra video recordings, doc-
umenting the confrontation between the participants and their (videorecorded) work
(Theureau 2010).
Of course, these topics in no way exhaust the richness of the uses of video as a tool in
the social sciences, but they give a picture of the diversity of settings and disciplines that
have been explored, as well as the multiplicity of issues that researchers have been able
to discuss.

5. Conclusion
The use of film and video in the social sciences is as old as the first cinematographic in-
ventions. From the beginning, film has been seen as an indispensable tool for the obser-
vation of dynamic action, movement, and embodied practices in their ordinary settings.
Nevertheless, the use of film and video has not developed linearly in the history of the
social sciences: This shows the importance not only of the availability of technologies,
but also of the compatibility of technologies with topical interests, methodologies, and
scientific ways of presenting results and evidence. From the 1960s onwards, there has
been an increasing focus on socio-cultural practices as they are observable in their nat-
ural context, without being orchestrated by the researchers. This praxeological turn
in the social sciences has meant an increasing use of video as a tool for analyzing
embodied details that are not imaginable but are only discoverable by fine-grained
observation.

6. References
Aufschnaiter, Stefan and Michaela Welzel 2001. Nutzung von Videodaten zur Untersuchung von
Lehr-Lern-Prozessen. Münster, Germany: Waxmann.
Banks, Marcus and Howard Morphy 1997. Rethinking Visual Anthropology. New Haven, CT: Yale
University Press.
Banks, Marcus and Jay Ruby 2011. Made to Be Seen: Historical Perspectives on Visual Anthropol-
ogy. Chicago: University of Chicago Press.
Barnes, Donna B., Susan Taylor-Brown and Lori Wiener 1997. “I didn’t leave y’all on purpose”:
HIV-infected mothers’ videotaped legacies for their children. Qualitative Sociology 20: 7–32.
Bateson, Gregory, Don D. Jackson, Jay Haley and John Weakland 1956. Toward a theory of
schizophrenia. Behavioral Science 1: 251–264.
Bateson, Gregory and Margaret Mead 1942. Balinese Character: A Photographic Analysis.
New York: New York Academy of Sciences.
Birdwhistell, Ray L. 1970. Kinesics and Context. Philadelphia: University of Pennsylvania Press.
Borseix, Annie (ed.) 1997. Filmer le Travail. Champs Visuels, 6. Paris: L’Harmattan.
Braune, Christian Wilhelm and Otto Fischer 1895. Der Gang des Menschen. Teil 1. Leipzig:
Sächsische Gesellschaft der Wissenschaften, XXI.
Derry, Sharon J. (ed.) 2007. Guidelines for Video Research in Education: Recommendations from
an Expert Panel. Prepared for the National Science Foundation, Interagency Education
Research Initiative, and the Data Research and Development Center. Available at: http://
drdc.uchicago.edu/what/video-research.html.
De Stefani, Elwys (ed.) 2007. Regarder la Langue. Les Données Vidéo dans la Recherche Linguis-
tique. Numéro Spécial du Bulletin VALS–ASLA, 85. Neuchâtel: Bulletin suisse de linguistique
appliquée.
Efron, David 1941. Gesture and Environment. New York: King’s Crown Press.
Erickson, Frederick 1982. Audiovisual records as a primary data source. Sociological Methods and
Research 11: 213–232.
Erickson, Frederick 2004. Origins: A brief intellectual and technical history of the emergence of
multimodal discourse analysis. In: Philip Levine and Ron Scollon (eds.), Discourse and Tech-
nology: Multimodal Discourse Analysis, 196–207. Washington DC: Georgetown University
Press.
Erickson, Frederick 2006. Definition and analysis of data from videotape: Some research proce-
dures and their rationales. In: Judith L. Green, Gregory Camilli and Patricia B. Elmore
(eds.), Handbook of Complementary Methods in Educational Research, 3rd edition, 177–192.
Mahwah, NJ: Lawrence Erlbaum.
Goldman, Ricki, Roy Pea, Brigid Barron and Sharon J. Derry (eds.) 2007. Video Research in the
Learning Sciences. Mahwah, NJ: Lawrence Erlbaum.
Goodwin, Charles 1981. Conversational Organization: Interaction between Speakers and Hearers.
New York: Academic Press.
Goodwin, Charles 1993. Recording interaction in natural settings. Pragmatics 3(2): 181–209.
Goodwin, Charles 2000. Action and embodiment within situated human interaction. Journal of
Pragmatics 32: 1489–1522.
Griffiths, Alison 1996. Knowledge and visuality in turn of the century anthropology: The early eth-
nographic cinema of Alfred Cort Haddon and Walter Baldwin Spencer. Visual Anthropology
Review 12(2): 18–43.
Haddington, Pentti, Lorenza Mondada and Maurice Nevile (eds.) 2013. Interaction and Mobility.
Language and the Body in Motion. Berlin: De Gruyter.
Harper, Douglas 2002. Talking about pictures: A case for photo elicitation. Visual Studies 17(1):
13–26.
Heath, Christian 1986. Body Movement and Speech in Medical Interaction. Cambridge: Cambridge
University Press.
Heath, Christian, Jon Hindmarsh and Paul Luff 2010. Video in Qualitative Research. London: Sage.
Holliday, Ruth 2004. Filming the closet: The role of video diaries in researching sexualities. Amer-
ican Behavioral Scientist 47(12): 1597–1616.
Jacknis, Ira 1988. Margaret Mead and Gregory Bateson in Bali: Their use of photography and film.
Cultural Anthropology 3(2): 160–177.
Jay, Martin 1994. Downcast Eyes. The Denigration of Vision in Twentieth-Century French Thought.
Berkeley: University of California Press.
Jordan, Pierre-L. 1992. Ein Blick auf die Geschichte – Geschichte eines Blickes. Cinema-Cinema-
Kino, 23–74. Marseille, France: Musées de Marseille.
Kendon, Adam 1967. Some functions of gaze-direction in social interaction. Acta Psychologica 26:
22–63.
Kendon, Adam 1970. Movement coordination in social interaction. Acta Psychologica 29: 100–125.
Kendon, Adam 1979. Some methodological and theoretical aspects of the use of film in the study
of social interaction. In: Gerald P. Ginsburg (ed.), Emerging Strategies in Social Psychological
Research, 67–91. New York: Wiley.
Kendon, Adam 1990. Conducting Interaction: Patterns of Behavior in Focused Encounters. Cam-
bridge: Cambridge University Press.
Kendon, Adam 2004. Gesture: Visible Action as Utterance. Cambridge: Cambridge University Press.
Knoblauch, Hubert, Alejandro Baer, Eric Laurier, Sabina Petschke and Bernt Schnettler (eds.)
2008. Visual Methods. Forum: Qualitative Social Research, Vol 9, No 3, http://www.qualitative-
research.net/index.php/fqs/issue/view/11/showToc.
Knoblauch, Hubert, Bernt Schnettler, Jürgen Raab and Hans-Georg Soeffner (eds.) 2006. Video
Analysis: Methodology and Methods. Qualitative Audiovisual Data Analysis in Sociology.
Frankfurt am Main: Lang.
Laban, Rudolf 1926. Choreographie. Jena, Germany: Eugen Diederichs.
Laurier, Eric and Chris Philo 2006. Natural problems of naturalistic video data. In: Hubert Kno-
blauch, Jürgen Raab, Hans-Georg Soeffner and Bernt Schnettler (eds.), Video-Analysis: Method-
ology and Methods. Qualitative Audiovisual Data Analysis in Sociology, 183–192. Bern: Lang.
Lee, Raymond M. 2004. Recording technologies and the interview in sociology, 1920–2000. Soci-
ology 38(5): 869–889.
Lomax, Helen and Neil Casey 1998. Recording social life: Reflexivity and video methodology.
Sociological Research Online 3(2), http://www.socresonline.org.uk/3/2/1.html.
Luff, Paul, Jon Hindmarsh and Christian Heath (eds.) 2000. Workplace Studies. Recovering Work
Practice and Informing System Design. Cambridge: Cambridge University Press.
MacDougall, David 1997. The visual in anthropology. In: Marcus Banks and Howard
Morphy (eds.), Rethinking Visual Anthropology, 276–295. New Haven, CT: Yale University
Press.
Marcus, George E. and Michael M. J. Fischer 1986. Anthropology as Cultural Critique. Chicago:
University of Chicago Press.
Marey, Etienne-Jules 1896. L’étude des mouvements au moyen de la chronophotographie. Revue
générale internationale 1: 200–218.
McQuown, Norman A. (ed.) 1971. The Natural History of an Interview. Chicago: Microfilm Collection, Manuscripts on Cultural Anthropology, Joseph Regenstein Library, Department of
Photoduplication, University of Chicago.
Mead, Margaret 1995. Visual anthropology in a discipline of words. In: Paul Hockings (ed.), Prin-
ciples of Visual Anthropology, 3–10. New York: De Gruyter.
Mehan, Hugh 1993. Why I like to look: On the use of videotape as an instrument in educational
research. In: Michael Schratz (ed), Issues in Qualitative Research, 93–105. London: Falmer
Press.
Middleton, David and Yrjö Engeström (eds.) 1996. Cognition and Communication at Work. Cambridge: Cambridge University Press.
Mohn, Elisabeth 2002. Filming Culture: Spielarten des Dokumentierens nach der Repräsentationskrise. Stuttgart, Germany: Lucius and Lucius.
Mondada, Lorenza 2003. Working with video: How surgeons produce video records of their ac-
tions. Visual Studies 18(1): 58–73.
Mondada, Lorenza 2006. Video recording as the reflexive preservation-configuration of phenom-
enal features for analysis. In: Hubert Knoblauch, Jürgen Raab, Hans-Georg Soeffner and
Bernt Schnettler (eds.), Video-Analysis: Methodology and Methods. Qualitative Audiovisual
Data Analysis in Sociology, 51–68. Bern: Lang.
Mondada, Lorenza 2008. Using video for a sequential and multimodal analysis of social interac-
tion: Videotaping institutional telephone calls. FQS (Forum: Qualitative Sozialforschung /
Forum: Qualitative Social Research) 39: 1–35.
Mondada, Lorenza and Reinhold Schmitt (eds.) 2010. Situationseröffnungen: Zur Multimodalen
Herstellung Fokussierter Interaktion. Tübingen, Germany: Narr.
Muybridge, Eadweard 1887. Animal Locomotion. Philadelphia: University of Pennsylvania.
Pink, Sarah 2001. More visualising, more methodologies: On video, reflexivity and qualitative
research. Sociological Review 49: 586–599.
Regnault, Felix and M. M. Lajard 1895. Poterie crue et origine du tour. Bulletin de la Société
d’anthropologie de Paris 4(6): 734–739.
Rony, Fatimah Tobing 1996. The Third Eye: Race, Cinema and Ethnographic Spectacle. Durham,
NC: Duke University Press.
Ruby, Jay 1980. Franz Boas and early camera study of behavior. Kinesics Report 3(1): 6–11.
Ruby, Jay 2000. Picturing Culture: Explorations on Film and Anthropology. Chicago: University of
Chicago Press.
Scheflen, Albert E. 1972. Body Language and Social Order: Communication as Behavioral Con-
trol. Englewood Cliffs, NJ: Prentice Hall.
Schmitt, Reinhold (ed.) 2007. Koordination. Analysen zur Multimodalen Interaktion. Tübingen,
Germany: Narr.
Sidnell, Jack and Tanya Stivers (eds.) 2005. Multimodal Interaction. Special Issue of Semiotica 156.
Speer, Susan A. and Ian Hutchby 2003. From ethics to analytics: Aspects of participants’ orientations to the presence and relevance of recording devices. Sociology 37(2): 315–337.
Spencer, Walter Baldwin and Frances James Gillen 1912. Across Australia. London: Macmillan.
Spier, Matthew 1973. How to Observe Face-to-Face Communication. A Sociological Introduction.
Pacific Palisades, CA: Goodyear.
Stasz, Clarice 1979. The early history of visual sociology. In: Jon Wagner (ed.), Images of Informa-
tion: Still Photography in the Social Sciences, 119–136. London: Sage.
Streeck, Jürgen, Charles Goodwin and Curtis LeBaron (eds.) 2011. Embodied Interaction: Language and Body in the Material World. Cambridge: Cambridge University Press.
Theureau, Jacques 2010. Les entretiens d’autoconfrontation et de remise en situation par les traces
matérielles et le programme de recherche ‘cours d’action’. Revue d’Anthropologie des Con-
naissances 4(2): 287–322.
Worth, Sol and John Adair 1972. Through Navajo Eyes: An Exploration in Film Communication
and Anthropology. Bloomington: Indiana University Press.

Lorenza Mondada, Basel (Switzerland)

64. Approaching notation, coding, and analysis from a conversational analysis point of view
1. Introduction
2. Transcribing verbal interaction
3. Notation of bodily communication
4. Conclusion
5. References

Abstract
The process of transcribing requires some fundamental content-related as well as layout-
related decisions. As has been discussed in a series of papers (e.g., Ochs 1979; Cook 1990;
Edwards and Lampert 1993), each of these decisions is influenced by transcribers’ pre-
conceptions and theoretical assumptions, by the research question and by methodological
considerations. Yet it has not been explored in all its implications that transcripts them-
selves lay the foundation for conceptualizations of the object of study and thus contribute
to the construction of theory.
Four arguments for this reflexive relation between transcription and theory will be pre-
sented: 1. Given their inherent selectivity, transcripts reduce possible analyses and inter-
pretations of the data presented, and they reduce the array of questions that could
possibly be asked. 2. Transcripts play a pivotal function in the hermeneutic research pro-
cess, serving as the basis for and documenting the outcome of analysis and interpretation
of the data. 3. Transcripts are texts – and as such, they become subject to writing and read-
ing conventions and to rules for interpreting texts rather than multimodal interaction.
4. The use of writing as a medium of representation raises issues of linguistic norms
and standards and it reinforces “monologistic” (Linell 1998) conceptualizations of
discourse.

1. Introduction
Transcripts are graphical representations of some communicative event that serve
to preserve the genuinely evanescent data collected as a basis for further analysis.
They are used in a wide variety of disciplines, such as linguistic anthropology (Duranti
1997), sociology (Atkinson and Heritage 1984), psychology (MacWhinney 1995) and
linguistics (Ehlich 1993). In all of these disciplines, new approaches have arisen that
use neither experiments nor introspection, but rather base their analysis on empirical
data, analyzing discourse – that is, situated, context-bound, or at least partially
context-sensitive language use – as a communicative event. Consequently, in all of
these approaches, context comes to play a crucial role in the explication of both the
form and the meaning of discourse (see Cook 1990: 1). To validate analysis, the represen-
tation of contextual information becomes crucial. In recent years, several transcription
systems have been developed to meet the need for representing language use in context.
Bergmann (1981) compared the impact of the introduction of audio-recording tech-
nology to discourse analysis to that of the microscope to biology. The same can be said
for the implementation and proliferation of video-technology that has enabled re-
searchers to record visible behaviour and to analyse visible bodily action as utterances
(as programmatically formulated by Kendon 2004). In recent years, the focus has
shifted from the analysis of discourse to the study of multimodal communication,
that is, communication with linguistic as well as bodily means. With the growing need
for the representational inclusion of bodily acts into transcripts, conventional transcrip-
tional systems have been extended and new notational systems have been developed for
the representation of bodily means of communication in coordination with speech. Yet,
until now there exists no single standard transcriptional system, as researchers have
created their own systems that best suit their special interests and research questions.
During the process of transcribing, some fundamental content-related as well as
layout-related decisions have to be made. These concern

(i) the selection of what to transcribe, that is, which context information to include
and which to neglect,
(ii) the segmentation of the flow of observable behaviour into meaningful units,
(iii) the placement of text, that is, speakers’ turns, and contextual information, and
their relation to each other, and
(iv) the notational symbols (see Edwards 1993, 2001).
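Purely as an illustration of how these four decisions interlock in practice, the following sketch models a transcript as a set of time-aligned tiers. All names, fields, and the rendering layout are invented for this example and follow no published transcription system (such as GAT or Jeffersonian conventions):

```python
from dataclasses import dataclass, field

@dataclass
class Unit:
    start: float   # seconds; decision (ii): how the behavioural stream is segmented
    end: float
    text: str      # decision (iv): which notational symbols the text may contain

@dataclass
class Tier:
    name: str                                   # e.g. "speech:A", "gaze:A"
    units: list = field(default_factory=list)

def render(tiers, include=("speech",)):
    """Decision (i): only tiers whose kind is listed in `include` appear.
    Decision (iii): layout is a simple time-ordered interlinear listing."""
    selected = [t for t in tiers if t.name.split(":")[0] in include]
    lines = []
    for tier in selected:
        for u in tier.units:
            lines.append(f"{u.start:5.2f}-{u.end:5.2f}  {tier.name:10s} {u.text}")
    return "\n".join(sorted(lines))

speech = Tier("speech:A", [Unit(0.0, 1.2, "so you went there-"),
                           Unit(1.4, 2.0, "and then?")])
gaze = Tier("gaze:A", [Unit(0.3, 1.1, "at B")])

# The same event yields different transcripts depending on decision (i):
print(render([speech, gaze]))                           # speech-only transcript
print(render([speech, gaze], include=("speech", "gaze")))  # multimodal transcript
```

The point of the sketch is merely that each of the four decisions is encoded somewhere in such a representation, so no rendering is theory-neutral: changing the `include` selection, the segmentation into `Unit`s, or the layout produces a different transcript of the same recording.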

As has been discussed in a series of papers in recent years (e.g., Ochs 1979; Cook 1990;
the collection in Edwards and Lampert 1993), each of these decisions is influenced by
transcribers’ preconceptions and theoretical assumptions, by the research question and
by methodological considerations. The growing awareness of the impact of theory on
transcripts has led to the development of criteria for the quality of transcription systems
(see among others Ehlich 1993: 125; Selting et al. 1998; Deppermann 2001: 46; Edwards
2001; Selting et al. 2009). What has not yet been explored in all its implications is that
transcripts are not only influenced by theoretical assumptions, but that they themselves
lay the foundation for conceptualizations of the object of study and thus contribute to
the construction of theory.

Four arguments for this reflexive relation between transcription and theory will be
presented:

(i) Given their inherent selectivity, transcripts reduce possible analyses and
interpretations of the data presented, and they reduce the array of questions that
could possibly be asked.
(ii) Transcripts play a pivotal function in the hermeneutic research process, serving as
the basis for and documenting the outcome of analysis and the interpretation of
the data.
(iii) Transcripts are texts – and, as such, they become subject to writing and reading
conventions and to rules for interpreting texts rather than to multimodal
interaction.
(iv) The use of writing as a medium of representation raises issues of linguistic norms
and standards, and it reinforces “monologistic” (Linell 1998) conceptualizations of
discourse. This will be shown for the transcription of verbal interaction (part I) and
for the notation of bodily communication (part II).

2. Transcribing verbal interaction


2.1. Selectivity
Transcripts are tertiary data sources (Kowal and O’Connell 2000b: 440), based on audio
or video records of interactions. Despite the undisputable selectivity of audio- and even
video-recordings, these new recording technologies have enabled higher standards of
accuracy and have led to an “interest in interactional details that would have been over-
looked in the past” (Duranti 1997: 123). The aim of transcripts is to preserve this
gain in quantity and accuracy of information that recordings afford, as opposed to
participant observation and fieldnotes (see Bergmann 2000: 531f.). This
clearly highlights the issue of selectivity, as Cook states:

The problems posed for transcription by the introduction of context are of identity, quality
and quantity. The first problem is to find a means of distinguishing relevant features, the
second of devising a transcription system that is capable of expressing them, the third
that even if such a system could be devised it would make the presentation of data
(which in its actual production had occupied a short space of time) take pages of transcript
[…]. (Cook 1990: 4)

Cook (1990) differentiates the following layers of context:

(i) the text, together with
(ii) its physical characteristics,
(iii) paralinguistic features, that is, bodily and facial movements and postures,
(iv) the physical situation, including properties and relations of objects and persons,
(v) the co-text that is considered as belonging to the same discourse,
(vi) the intertext, that is, texts participants associate with the text under investigation,
(vii) thought, that is, intentions, beliefs, and knowledge, as well as interpersonal affilia-
tions, attitudes and feelings, and last, but not least,
(viii) the observer.

Each of these layers of context has its own limitations and thus criteria in themselves for
selection. While transcription systems correspond in that they capture the text (more or
less completely), they widely differ in which of the other contextual factors they
include, in what degree of detail they include notation of contextual factors, in the
categories they build and in the reasons they give for including some context parameters
and for excluding others.
Only a few researchers claim attainment of theoretically neutral transcriptions with-
out any pre-selection. Among them, Lévi-Strauss (discussing the different methodolo-
gies of observation versus experimentation in his famous Structural Anthropology)
states: “On the observational level, the main – one could almost say the only – rule
is that all the facts should be carefully observed and described, without allowing any
theoretical preconception to decide whether some are more important than others”
(Lévi-Strauss 1963: 280). Comparably, in the early days of the naturalistic approach
to interaction, Scheflen proposed that “[w]e do not decide beforehand what is trivial,
what is redundant, or what alters the system. This is a result of the research” (Scheflen
1966: 270, original emphasis).
Although a complete account of all contextual features seems desirable in order to
build data collections and corpora and to share data and transcripts among scholars, this
is not feasible. Transcripts are inherently and unavoidably selective. Selection is guided
by intuition, theory, research interest and methodological assumptions. Most research-
ers acknowledge the inherently selective nature of transcripts, some, like Cook (1990:
15) and Ochs (1979: 44), explicitly encourage it, stating that the transcript should reflect
what is known about communication as well as the particular research interest, that is,
the hypotheses to be examined (see Ochs 1979: 44).
In fact, transcribing all that “is there” is impossible for logical as well as for practical
reasons: “The amount of co-text and the amount of context exist on different dimen-
sions and thus in directly inverse proportion (i.e., the more co-text that is presented
in a given space, the less of other types of context)” (Cook 1990: 15). Transcribing
“all” would exceed transcribers’ as well as readers’ working memory. Moreover, as
Cook states:

[…] if we present everything (assuming for a moment that we could) this would be a repro-
duction of the speech event itself (or more exactly the speech event as its representation in
the mind of the participants is represented in the mind of the observer), and thus exclude
the validity of the analysis, rendering it no different in kind – if considerably more difficult
in apprehension – from witnessing (as a participant) the speech event itself. (Cook 1990: 4,
original emphasis)

One way to cope with the problem of selection is to theoretically and/or methodolog-
ically exclude some contextual factor as being irrelevant for analysis. E.g., in ethno-
methodology, intentionality has been explicitly rejected as a base for analysis: “[…]
meaningful events are entirely and exclusively events in a person’s behavioral environment […]. Hence, there is no reason to look under the skull since nothing of interest is
there but brains” (Garfinkel 1963: 190). Another, rather practical, way to reduce the
range of potentially relevant contextual features is to use data that lack some of
these contextual layers, such as telephone conversations where bodily communication
and the situation are not visible and thus are not available for participants in the
construction of meaning. Yet another solution is to make methodological use of the
context as it is done in Conversation Analysis and in Gumperz’ (1982) approach to con-
textualization. In these approaches, context is not seen as a given external fact, but as
established by the very interaction itself. Consequently, the transcriptional system
developed by Gumperz “attempts to set down on paper all those perceptual cues that
past research and ongoing analyses show participants rely on for their online processing
of conversational management signs” (Gumperz and Berenz 1993: 92).
Still, most researchers design their transcription systems more sensitively around the
topic under investigation, such as Bloom (1993) and Chafe (1993), who explicitly
include those and only those variables needed for the current research interest. Others,
such as Selting et al. (1998, 2009) propose a minimal transcript as working transcript
that can be complemented and refined according to the actual research question. Arguably, the question is not to represent each and every contextual feature, but to select
relevant contextual features and to account for the selection. Therefore we need
explicit criteria for selection that are guided by theory and by our general knowledge
in the field. It is on the basis of empirical studies that our knowledge about relevant
context features grows, which in turn enables researchers to design transcription sys-
tems that increasingly make use of what previous studies have shown to be relevant
(see, e.g., DuBois et al. 1993).

2.2. Description and interpretation


Transcripts play a pivotal role in the research process, between the representation,
the reconstruction and the conceptualization of data. Transcribing is not just some
mechanical reproduction of the communicative event by graphical means, but involves
a hermeneutic process, in which increasing understanding of the event leads to refine-
ment of the transcript, which in its turn substantiates further analysis and interpretation
(see Ehlich 1993: 124).
Transcribing is impossible without any preconception of the relevant units. First of
all, the linguistic analysis of the stream of sounds is a prerequisite for transcribing. Sev-
eral experiments suggested that “familiarity with the writing system might be crucial for
developing the ability to segment speech into separate sounds (phonemes) or larger
units (morphemes)” (Duranti 1997: 126). More generally, one indispensable first step
in the transcription process is segmentation of the ongoing stream of behaviour (see
DuBois et al. 1993: 46). At the beginning, these divisions may well be based on the re-
searcher’s intuitions about and partial understanding of relevant units. “As we proceed,
comparing our initial chart with the specimen, the transcription comes to be progres-
sively refined and the organization of the whole event is gradually formulated” (Kendon
1982: 461). Thus, we clearly face a circular process of transcription and conceptualization
(Kendon 1982: 460f).
When it comes to publication, transcripts may be transformed again, now rendering
the results as visible as possible, taking into account the addressed readers’ knowledge
and familiarity with certain transcription conventions (see Duranti 1997: 142). Thus,
readers are not provided with some neutral representation, but with the researcher’s
analysis and interpretation of the data.
The categories built may be more descriptive or more interpretive. The choice of dif-
ferent transcription systems for descriptive versus functional categories reflects theoret-
ical considerations as well as research interests. E.g. HIAT (Halbinterpretative
Arbeitstranskriptionen, Ehlich 1993) has been developed for the analysis of action pat-
terns in institutional settings, and therefore applies functional categories using punctu-
ation marks in order to classify utterances as “question” or “statement” without further
64. Approaching notation, coding, and analysis from a conversational analysis 997

formal description. In contrast, GAT (Gesprächsanalytisches Transkriptionssystem, Selting et al. 1998) has been developed on the premise that social order, institutional practice, conversational genres and activities are negotiated locally and interactively. Therefore, GAT opts for formal categories
that allow for the analytical reconstruction of these local practices (see Selting 2001:
1061).
Arguably, a purely interpretive transcript forecloses different interpretations and the
validation of interpretation: “[…] if one transcribes a piece of behaviour only according
to its meaning, its function, or its effect, then the information about the event that might
contribute to an additional or an alternative interpretation is lost” (Bloom 1993: 154–
155). Yet, there is no purely descriptive transcript. Rather, the inherently interpretive
character of any transcript has to be acknowledged. Consequently, transcripts can
and should make the results of the interpretation as visible as possible while providing
readers with a representation of the data that allows for alternative interpretations (see
Bloom 1993: 154).

2.3. Transcripts as texts


The pivotal status of transcripts as representation of the phenomenon and documentation of its analysis not only changes the heuristic function of the transcript; moreover, transcripts transform the interaction into a text. As such, they are subject to reading and writing conventions that superimpose themselves on routines and conventions of interpreting interactions.
The interaction itself is evanescent, unique and multisensory. Its dynamics are direc-
tional in the sense that participants are interpreting what happens in light of what
has happened so far and what they expect to happen. Multimodal communication is
processed online, and it is perceived holistically. Transcripts, in contrast, are fixed,
and they are written line by line. Reading routines and conventions – for example, reading from left to right and from top to bottom of a page – superimpose themselves on the routines and conventions for the perception and interpretation of interactions and of their temporal relationships.
Transcripts exploit two-dimensional paper in order to represent temporal relation-
ships, that is, the simultaneity and succession of events. For the arrangement of speak-
ers’ turns, mainly three formats can be found in the literature: line-by-line, columns and
score (for the latter, see section 3.3).
The line format uses only the vertical dimension for the representation of temporal
relationships and focuses on succession in time. It thus highlights speakership transition
and the sequential organization of turns, because, due to our reading conventions, “we
interpret each utterance in light of the verbal and nonverbal behavior that has been
previously displayed” (Ochs 1979: 46).
Under the premise that the transcript should display characteristics of the communi-
cative event and the results of the analysis, line format is especially well suited for the
representation of rather symmetric interactions between adults, given that

[i]n examining adult-adult-communication, overwhelmingly we treat utterances as contingent on the behavioural history of episode. For example, unless marked by a topic shifter
(Sacks and Schegloff 1973), the contents of a speaker’s turn are usually treated as in some
way relevant to the immediate prior turn. The expectation of the reader matches the expec-
tation of adult speakers (Grice, 1975), and by and large inferences based on contingency are
correct. (Ochs 1979: 46)

On the other hand, the column format exploits the horizontal dimension to represent
simultaneity. In this format, the amount of talk of each participant is highly conspicu-
ous. Thus, it is particularly well suited for the representation of interactions between
participants with asymmetric speaking rights such as is often the case in institutional
settings. It also allows for the representation of unconnected utterances by several
speakers, as is often observed in children’s interactions (see Ochs 1979). Furthermore,
reading conventions and routines lead to the attribution of initiative and activity to the
participant whose utterances are placed in the leftmost column (see Ochs 1979: 50f). To
counterbalance this, one may place the more dominant participant in the right column
(see Ochs 1979: 51).
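The contrast between the two formats can be sketched with a constructed exchange (invented for illustration, not drawn from any corpus); the notation is simplified, with (0.5) marking a half-second pause:

```
Line format:
01  A:  so what did you do then?
02  B:  well (0.5) I just left
03  A:  and then?

Column format:
A                            B
so what did you do then?
                             well (0.5) I just left
and then?
```

In the line format, the vertical axis foregrounds turn succession; in the column format, the separate columns make each participant’s share of the talk immediately visible.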
In sum, transcribers rely on reading conventions in order to make their analysis
clearly visible, and, regardless of transcribers’ intentions and conventions, readers’
routines contribute to their interpretation of the data represented in one or the other
format, and thus reinforce (or perhaps undermine) interpretations made by the
transcriber.

2.4. Notation of verbal utterances


Transcripts use conventionalized writing systems as a means of representation. Writing
systems select one variety as the norm that becomes the basis for standard orthography,
turning other varieties into deviations. Standard orthography constitutes an idealization, displaying how people should be speaking rather than how they actually speak (see Duranti 1997: 123–125). This raises questions of linguistic norms and readers’
expectations concerning the well-formedness of utterances.
Discourse analysis aims at representing and analyzing the particulars of oral lan-
guage use in interaction. Therefore, transcripts give a rough impression of features of
oral language such as corrections of formulations, false starts, truncations of words and of syntactic and/or prosodic constructions, interjections, delays, pauses and overlaps as well as
dialectal pronunciations. Since these details may turn out to be interactively relevant, dis-
course analysis refrains from correcting utterances into well-formed standard sentences
(Bergmann 2000: 531–532; Duranti 1997: 135–136; Jefferson 2004: 19–21). Yet, while re-
searchers in discourse analysis and sociolinguistics widely agree that dialect and spoken
language are variants in their own right, not deviations of some (written) standard, there
is controversy about how precisely to capture the particulars of pronunciation.
The most frequently applied “literal transcription” relies on readers’ familiarity with the rather conventionalized rendering of the phonetic features of oral language found in the representation of dialogues in novels. A rather extreme transformation of standard
orthography is widely used in conversation analytical studies that use the transcription
system developed by Jefferson (see Jefferson 2004). This system is based on the assumption that not only do speakers of a given dialect vary in the pronunciation used within the dialect, but even the speech produced by speakers of a standard pronunciation contains variation. Under the methodological guideline to detect the orderliness of ordinary
conversation, many of the pronunciational particulars are represented, because “[s]ome
of them have led to the discovery of ranges of orderliness; most of them are yet to be
explored” (Jefferson 2004: 23). Other researchers, such as Gumperz, object that this so-
called eye dialect “tends to trivialize participant’s utterances by conjuring up pejorative
stereotypes, while neither representing the phonetic level more precisely nor capturing
detail relevant to the analysis” (Gumperz and Berenz 1993: 96). Therefore, Gumperz
and Berenz (1993: 97) propose, whenever the use of a variety or varieties of colloquial
pronunciation is relevant to the research interest, to include both the standard ortho-
graphy and the popular spelling of the colloquialism in a regularized way and to
indicate the regularizations made in prefatory comments to the transcript.
Given that neither orthography nor literal transcription represents pronunciation
one-to-one, especially when the use (or non-use) of dialects or vernacular varieties
plays a central role in a given interaction, a phonetically more precise transcription,
e.g. using the International Phonetic Alphabet (IPA), would be recommended. Yet,
this gain in accuracy causes a loss in readability, since using a phonetic transcription
assumes as a prerequisite the professional training and competence of transcribers as
well as readers (Edwards 1993: 20; Edwards 2001: 330).
In any case, transcribers have to weigh the benefits of detailed transcription against
readability. Whereas stops, anacolutha, repetitions and so on can be accounted for by
interactive exigencies, their representation thwarts readers’ expectations of written
texts as containing correct and complete syntactic structures. What goes unnoticed
when listening to a spoken conversation is glaring when reading its written reproduction
in a transcript. Furthermore, even if discourse analysis and sociolinguistics do not con-
sider dialects as an impoverished standard (Duranti 1997: 139), their representation (or
in contrast, their regularization into standard orthography) may turn out to be quite
consequential, e.g. in police interrogations and in courtroom proceedings. The use of
a vernacular and its representation in transcripts is a social and political issue even beyond legal and media settings (see Bucholtz 2000).

3. Notation of bodily communication


3.1. Inclusion of bodily communication
The inherent selectivity of transcripts has an impact on the conceptualization of the
object of investigation. One of the most basic issues in communication studies is
whether to focus on verbal interaction or to take bodily means of communication
into account. Until recently, bodily communication was excluded from discourse
analysis for theoretical, methodological and practical reasons. Especially in linguistics,
bodily means of communication that elude analysis by linguistic methods have been
defined ex negativo as “nonverbal” communication and as such have been excluded
from analysis (see e.g. Kendon 1982). In recent years, gesture studies have come to analyse speech-accompanying gestures as part and parcel of the utterance itself (e.g., Kendon 1980;
McNeill 1992). Gestures may be seen as “part of a speaker’s willing expression” (see
Kendon 2000: 49) and might be distinguished from body postures, posture shifts, gaze
directions and facial expressions that serve to establish and maintain orientation
towards one another and towards the unfolding interaction, and which are the focus of
“nonverbal communication” studies and of the newly developing multimodality
approach to communication (see e.g., the contributions in Schmitt 2007).
Given the longstanding tradition of theoretical, methodological and analytic separation of verbal versus bodily communication research, we now have two sets of notational systems at our disposal: on the one hand, transcriptional systems that have
been developed in the framework of discourse analysis with its focus on verbal interac-
tion and that are complemented by signs for the representation of bodily means of com-
munication, and, on the other hand, notation systems that stem from the analysis of
kinesics (Birdwhistell 1970) and nonverbal communication research. Transcriptional
systems developed in gesture studies, as well as in the multimodal approach to commu-
nication, tend to overcome this dichotomy by including verbal and bodily utterances
equally. In order to deal with the multitude of bodily movements, coding systems either classify the variety of different movement patterns into a small number of
more or less global categories (“global coding”) or restrict coding to some selected, con-
spicuous movements (“restrictive coding”), or refrain from describing the movement
patterns at all and directly evaluate body movements in psychological terms (“direct
evaluation”), as in rating and functional coding (Frey et al. 1981: 213). E.g., most
research on facial expression focuses on inferences observers are able to draw from
facial expressions, whereas only a few try to measure the face itself. Among them are Ekman and Friesen (1982), whose “primary goal in developing the Facial Action Coding
System was to develop a comprehensive system that could distinguish all possible visu-
ally distinguishable facial movement” (Ekman and Friesen 1982: 179, original empha-
sis). On the other hand, transcription systems developed in the framework of
discourse analysis provide options for the notation of bodily behaviour which allow for either including any visible behaviour or deciding, on the basis of communicative relevance and/or the actual research interest, which movements to include and which to exclude (see e.g., Ehlich 1993).
The inclusion of bodily means of communication in discourse analysis gave rise to
new research topics such as simultaneity and coordination between modes of commu-
nication as well as between participants (see Deppermann and Schmitt 2007). Findings
of this multimodal approach to communication suggest a re-conceptualization of well-
established categories based on audio-analysis only, such as participant state, recipient,
hearer and speaker roles (see Schmitt 2005; Heidtmann and Föh 2007; Bohle 2007).

3.2. Coding and description


Just as in the transcription of verbal interaction, the choice of descriptive versus functional categories in the notation of bodily communication depends on theoretical and methodological assumptions as well as on the actual research question. This applies to notation systems that were especially developed for bodily communication as well as to transcription systems that stem from discourse analysis.
In the “Bernese system for the study of nonverbal interaction”, Frey et al. (1981)
emphasize that bodily movements do not have clear-cut meanings that could be trans-
lated straightforwardly and serve as a basis for a functional coding. Therefore, Frey
et al. (1981) analyse speech-accompanying body movements into their spatio-temporal
components. They propose a chronological notation system (“Zeitreihen-Protokoll”)
where the spatial positions of several body areas (head, trunk, shoulders/arms, hands,
legs and feet) are notated with reference to the time code in the video. In contrast to
this rather positivist notational system, the Facial Action Coding System (FACS)
analyzes “any facial movement into anatomically based minimal action units” (Ekman
and Friesen 1982: 180). Yet, in agreement with Frey et al. (1981), Ekman and Friesen
explicitly “wanted to build the system free of any theoretical bias about the possible
meaning of facial behaviors” (Ekman and Friesen 1982: 180). The authors argue that
“the measurement must be made in noninferential terms that describe the facial behav-
ior, so that the inferences can be tested by evidence” (Ekman and Friesen 1982: 182).
Recently, Sager (2005) applied this approach to the coding of gesture.
In contrast, in one of the most intriguing attempts to apply structural linguistic
methods to bodily communication, Birdwhistell (1970) comes up with functional
units called “kines”, which are defined as “the least perceptible units of body motion”
(Birdwhistell 1970: 166). Kendon (2004 and others) and McNeill (1992 and others) de-
veloped notational systems from their studies of cognitive and communicative functions
of speech-accompanying gestures, and in so doing classified them on the basis of semi-
otic, semantic and pragmatic analyses. In these transcription systems, bodily communi-
cation is represented by pictures (be it drawings or stills), with the bodily movements
being described in ordinary terms and formal parameters (such as the onset, apex,
stroke, hold, retraction/transition) being indicated by special symbols.
Coding systems share a constraint with the phonetic transcription of verbal utterances in that they can be produced and read by professionally trained people only. Furthermore, rather positivist notations of the spatio-temporal parameters do not contribute to
readers’ understanding of the ongoing interaction. On the other hand, functional codes,
as well as descriptions of body movements in ordinary terms, run into the danger of
turning out rather interpretive, confounding description with inference (Ekman and
Friesen 1982: 182) and thus foreclosing any alternative analysis and interpretation.

3.3. Representing the coordination of multimodal communication


The transformation of the multimodal event into a written text is quite consequential
for the holistic versus analytical perception of multimodal communication, and it rein-
forces perceptions of verbal and bodily communication developed in the longstanding
“tradition of research that identifies the basic structure of communicative acts with
grammatical [and prosodic; U.B.] units” (Duranti 1997: 149f).
For practical needs and for reasons of clarity of presentation, the modes of com-
munication (posture, gesture, facial expression, gaze) are analytically distinguished
and their description is spatially separated, that is, presented in separate lines (see
e.g. Ehlich 1993). Readers of transcripts have to read them in nearly the same manner as they do play scripts, imagining the scene from the text and from the contextual information given in the transcript. Since they read line by line, readers then have to synthesize information about the different modes in order
to reconstruct what is perceived as an integrated multimodal event in the original
interaction.
As soon as bodily communication is continuously included in the transcript, simultaneity becomes relevant. Therefore, a layout format is needed that makes use of both
dimensions of the paper for the representation of temporal relationships. One such
format is the so-called score format based on notation conventions in music (see Ehlich
1993): “Whereas the left to right direction preserves the unfolding of events in time, the
vertical dimension captures how they overlap at each particular point in time” (Ehlich
1993: 131). The score-format has been developed specifically for the transcription of
multiparty multimodal communication. Yet, as the description of bodily movements
often requires much more space than the transcription of speech, it disrupts the repre-
sentation of the text. Therefore, for longer descriptions of non-phonological phenom-
ena, Ehlich proposes to “mark the relevant point within the score area and add a full
description in the left margin, enclosed in other brackets” (Ehlich 1993: 135). What
is chosen out of practical considerations for readability sets the bodily means of
communication apart, and thus literally marginalizes it.
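A constructed fragment (invented for illustration; the tier labels and conventions are not taken from any particular system) may make the score format concrete. Each participant and each mode occupies its own line, and the left-to-right axis represents time:

```
A [verbal]    and then he jumped over the fence
A [gesture]             |--right hand traces an arc--|
A [gaze]      ....at B..........................
B [verbal]                                  ((laughs))
```

Reading vertically at any point yields the simultaneous behaviour of all participants; reading horizontally yields each mode’s unfolding in time.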
When bodily communication is investigated, the coordination of and temporal rela-
tionship between the several modes of communication have to be indicated. Regardless
of the specific theoretical background of the researchers, most transcription systems
using the line format segment the stream of speech into prosodic units, which are
seen as the fundamental units of speech production (see Gumperz and Berenz 1993:
95) and which serve as the basis for formatting the transcript. Yet, despite being co-
expressive with speech (Kendon 1980; McNeill 1992), gesture phrases are not exactly
co-extensive with intonation phrases (Bohle 2007: 194–242). Nevertheless, even in
studies on speech accompanying gesture, taking gesture phrases as the basic unit is
the exception (but see Kita 2000: 172).
Likewise, in transcripts using the score format, the verbal utterance serves as refer-
ence for the indication of temporal relations between several modes of communication
(e.g., Ehlich 1993: 136). Yet, (not only) during periods of time when no-one speaks,
some mode of bodily behaviour may be used as a reference to which the other elements
are related (see Heath 1986: 20). Verbal utterances, as well as bodily movements, may instead all be indicated with reference to some external unit of measurement, be it the
frame number or the time code, without prioritizing some mode of communication as
the organizing point of reference (e.g., Frey et al. 1981).
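Such a notation, indexed to an external time code rather than to the verbal line, might look as follows (a constructed example; the time values and tier labels are invented for illustration):

```
00:12.1   A [gesture]   right hand begins to rise
00:12.4   A [verbal]    and then he jumped
00:13.0   B [gaze]      turns towards A
00:13.2   A [gesture]   hand traces an arc
```

Here no mode serves as the organizing point of reference; the time code alone orders the events.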
Another consequence of the verbal utterance serving as reference for the notation of
bodily communication is that the verbal utterance is most often placed in the first line,
which may reverse the online perception of verbal-gestural utterances. At least iconic
gestures tend to precede their “lexical affiliate” and thus open a projection space
(Schegloff 1984), with the word entering a prepared scene (Streeck 1993). In a given
interaction, gestures are typically seen before their lexical affiliates are heard. However, due to our reading conventions, when placed beneath or to the right of the text, information about gesture is read after the text.

3.4. Notation of bodily communication


The use of alphabetic writing for the representation of interaction reinforces the con-
ceptualization of verbal and “nonverbal” communication as categorically distinct –
quite in contrast to the authors’ explicit conceptualization of bodily movements as an integral, functionally equivalent part of the communication process. In fact, “[w]e tend
to consider linguistic, what we can write down, and nonlinguistic, anything else;
but this division is a cultural artifact, an arbitrary limitation derived from a particular
historical evolution” (McNeill 1985: 350; original emphasis).
Frey (1977, quoted in Frey et al. 1981: 209f) has compared the situation of research-
ers in the field of bodily communication to that of illiterates transcribing verbal commu-
nication. In fact, given that bodily communication lacks phonetic articulation, “[w]riting
(especially alphabetic writing), […], is more adequate for the structural analysis of seg-
mentable sound sequences […] than for other forms of communication, especially ges-
tures” (Duranti 1997: 150). Bodily movements cannot be transcribed directly; they have to be described. Descriptions, in their turn, are to be read like stage directions that do
not have to be pronounced but enacted. Consequently, due to their different represen-
tational modes – transcription versus description – information about body movements
is either graphically or spatially set apart in the transcripts in order not to be mistaken
for text.
Furthermore, by transforming non-talk into talk, verbal descriptions reproduce the dominance of speech over other forms of human expression before giving us a chance to assess how non-linguistic elements of the context participate, in their own unique ways, in the constitution of the activity under examination (Duranti 1997: 144).
Readers should be aware that graphical distinction and/or spatial separation of
modes of communication are due to the writing system used for the representation
of discourse, not to ontological or functional differences of the modes themselves –
one may speculate that a transcript of multimodal communication using a logographic
writing system would minimize at least some of the issues discussed here.

4. Conclusion
Recordings and transcripts are used in order to objectify conversation – in the double
sense of transforming a genuinely evanescent event into an object that remains stable
and identical over time and in the sense of eliminating the inherent subjectivity of par-
ticipants’ as well as observers’ interpretations thereof. Yet, even rendered as a text, the
conversation is not the same over the course of time or to every observer. For observers no less than for participants, the interpretation of what is going on depends on individual interests and expectations.
expectations. As has been shown, any analyst may focus on different aspects of the com-
municative event; furthermore, each analysis is influenced and informed by previous
analyses by the same observer. Thus, as Franck (1989: 165) puts it, neither participants
nor observers may enter the same river twice. In consequence, Franck argues that this
specific kind of object of investigation calls for new methods of transmission. Challen-
ging the demand to write down processes and products of analysis, Franck (1989) pro-
poses that instead of attempting to meet the criteria set for scientific objects in
traditional approaches to language, e.g. in structural linguistics, one should conceive
of conversation analysis as a training to hear (Franck 1989: 166).
In a similar perspective, Nothdurft (2006) points out that transcribers attribute what
they hear to speakers, without reflecting that their own perception of a recorded con-
versation is in no way identical to what speakers actually did say. Furthermore, exactly
those procedures of treating the data are based upon an object of investigation that in
essential aspects fundamentally differs from the object of ordinary conversation. The
data undergo a process of transformation that ultimately leads, according to Nothdurft, analysts to look for phenomena in the wrong place when locating them in the transcript. Analysts systematically misconceive the status of their data. Records and transcripts do not represent communicative reality. What enables a detailed microanalysis
is just what eliminates the genuine evanescent character of conversation. Thus, the
phenomena under investigation evaporate and turn into what Nothdurft calls conversa-
tional phantoms. Consequently, he pleads for taking the specific instance of conversation
not as object of investigation, but rather as an occasion prompting the analysts to reflect
upon their own experiences with and expectations towards communication.
From a psycholinguistic perspective, Kowal and O’Connell investigated the percep-
tual faculties that are necessary for transcribing: Transcribers have to refrain from those
perceptual routines that enable them to participate competently in a conversation, as
instead of concentrating on the meaning of utterances, they have to focus, for example,
on self-correcting practices that often interfere with understanding, and they have to
analyse changes in speed and loudness, intonation contours, simultaneity and succession
of turns that participants mostly notice intuitively and that often lie beyond conscious
perception (Kowal and O’Connell 2000a: 355). A corollary could be the consideration
and investigation of how reading routines and reading conventions influence lay readers’
as well as professional discourse analysts’ interpretation of transcripts.
Yet, only few researchers will go as far as to refrain from recording and transcribing
at all. The aim of this paper has been to elucidate the inherent selectivity of transcripts
and how they reduce what could be investigated. Essentially, what has not been re-
corded is not there to be transcribed, and what has not been transcribed is not there
to be analysed and interpreted. As tertiary sources, transcripts become the basis for fur-
ther investigation and as such constitute the object of investigation. Thus, only a limited
set of questions can be answered on the basis of a given transcript. Since each new question calls for a new set of data, records and transcripts, the sharing of transcripts among
researchers is a challenge. Meanwhile, discussion about practicality, standards and uni-
fication of symbols has begun and will continue with the aim of developing transcrip-
tional systems that reflect the current knowledge in the field and at the same time
allow for new research questions and for reanalysis and reinterpretation of transcripts
in publications.
Whereas selection and segmentation are regularly and often explicitly made on the basis of theoretical assumptions, the impacts of layout decisions mostly go unnoticed.
There is no clear correlation between theoretical stances and a given layout format;
rather, practical considerations and the wish to display the focus and results of the ana-
lysis as clearly as possible determine the layout. Yet, as a base for and documentation of
analysis and interpretation, transcripts build a strong case for the researchers’ assump-
tions about (this specific instance of) communication and thus contribute to readers’
conceptualization of the phenomenon as well. Without access either to the original data or to the records, readers can only form their interpretations from the transcriber’s
account. New publication media enable researchers to provide readers at least with
audio- and video-files of the data.
For qualitative studies, the rendition of transcripts serves as evidence for the analysis.
In order to ensure intersubjective verifiability, the transcription rules are documented,
so that readers might check which information has (or has not) been transcribed, and
whether transcription is consistent and corresponds to the rules (see Steinke 2000:
325). While discussions of selectivity and its solutions can be found in general accounts
of transcription systems (such as Selting et al. 1998; Selting et al. 2009; and those col-
lected in Edwards 1993), these issues have so far rarely been discussed in individual empirical
studies. Since there is no way out of the inherently reflexive relation between transcripts
and theory, criteria for selection as well as for layout decisions should be explicated in
any empirical study. In recent years, criteria for the quality of transcripts such as prac-
ticality, readability and exchangeability of data, as well as relevance in relation to the
64. Approaching notation, coding, and analysis from a conversational analysis 1005

research question and avoidance of mere positivistic description on the one hand, and
evaluative interpretation on the other hand, have been proposed by Ehlich (1993: 125),
Selting et al. (1998), Deppermann (2001: 46) and Edwards (2001) among others. These
criteria have been developed bearing theoretical considerations as well as practical ex-
igencies for transcription decisions in mind. Discussion should be continued, focusing
on the contribution of transcripts to the conceptualization of communication.

5. References
Atkinson, Maxwell J. and John Heritage 1984. Transcript notation. In: Maxwell J. Atkinson and
John Heritage (eds.), Structures of Social Action. Studies in Conversation Analysis, ix–xvi. Cam-
bridge: Cambridge University Press.
Bergmann, Jörg 1981. Ethnomethodologische Konversationsanalyse. In: Peter Schröder and Hugo
Steger (eds.), Dialogforschung. Jahrbuch 1980 des Instituts für Deutsche Sprache, 9–51. Düssel-
dorf, Germany: Cornelsen.
Bergmann, Jörg 2000. Konversationsanalyse. In: Uwe Flick, Ernst von Kardoff and Ines Steinke
(eds.), Qualitative Forschung: ein Handbuch, 524–537. Reinbek bei Hamburg, Germany:
Rowohlt.
Birdwhistell, Ray 1970. Kinesics and Context. Essays on Body Motion Communication. Philadel-
phia: University of Pennsylvania Press.
Bloom, Lois 1993. Transcription and coding for child language research: The parts are more than
the whole. In: Jane A. Edwards and Martin Lampert (eds.), Talking Data. Transcription and
Coding in Discourse Research, 149–166. Hillsdale, NJ: Lawrence Erlbaum.
Bohle, Ulrike 2007. Das Wort Ergreifen – das Wort Übergeben. Explorative Studie zur Rolle Re-
debegleitender Gesten in der Organisation des Sprecherwechsels. Berlin: Weidler.
Bucholtz, Mary 2000. The politics of transcription. Journal of Pragmatics 32: 1439–1465.
Chafe, Wallace L. 1993. Prosodic and functional units of language. In: Jane A. Edwards and Mar-
tin Lampert (eds.), Talking Data. Transcription and Coding in Discourse Research, 33–43. Hills-
dale, NJ: Lawrence Erlbaum.
Cook, Guy 1990. Transcribing infinity. Problems of context presentation. Journal of Pragmatics 14:
1–24.
Deppermann, Arnulf 2001. Transkription. In: Arnulf Deppermann (ed.), Gespräche Analysieren,
2nd edition, 39–48. Opladen, Germany: VS Verlag für Sozialwissenschaften.
Deppermann, Arnulf and Reinhold Schmitt 2007. Koordination. Zur Begründung eines neuen
Forschungsgegenstandes. In: Reinhold Schmitt (ed.), Koordination. Analysen zur Multimoda-
len Interaktion, 15–54. Tübingen, Germany: Narr.
DuBois, John W., Stephan Schuetze-Coburn, Susanna Cumming and Danae Paolino 1993. Outline
of discourse transcription. In: Jane A. Edwards and Martin Lampert (eds.), Talking Data. Tran-
scription and Coding in Discourse Research, 45–89. Hillsdale, NJ: Lawrence Erlbaum.
Duranti, Alessandro 1997. Transcription: From writing to digitized images. In: Alessandro Duranti
(ed.), Linguistic Anthropology, 122–161. Cambridge: Cambridge University Press.
Edwards, Jane A. 1993. Principles and contrasting systems of discourse transcription. In: Jane A.
Edwards and Martin Lampert (eds.), Talking Data. Transcription and Coding in Discourse
Research, 3–31. Hillsdale, NJ: Lawrence Erlbaum.
Edwards, Jane A. 2001. The transcription of discourse. In: Deborah Schiffrin, Deborah Tannen and
Heidi Hamilton (eds.), The Handbook of Discourse Analysis, 321–348. Malden, MA: Blackwell.
Edwards, Jane A. and Martin Lampert (eds.) 1993. Talking Data. Transcription and Coding in Dis-
course Research. Hillsdale, NJ: Lawrence Erlbaum.
Ehlich, Konrad 1993. HIAT: A transcription system for discourse data. In: Jane A. Edwards and
Martin Lampert (eds.), Talking Data. Transcription and Coding in Discourse Research, 123–
148. Hillsdale, NJ: Lawrence Erlbaum.
1006 V. Methods

Ekman, Paul and Wallace Friesen 1982. Measuring facial movement with the facial action coding
system. In: Paul Ekman (ed.), Emotion in the Human Face, 2nd edition, 178–211. Cambridge:
Cambridge University Press.
Franck, Dorothea 1989. Zweimal in den gleichen Fluß steigen? Überlegungen zu einer reflexiven,
prozeßorientierten Gesprächsanalyse. Zeitschrift für Phonetik, Sprachwissenschaft und Kom-
munikationsforschung 42(2): 160–167.
Frey, Siegfried 1977. Zeitreihenanalyse sichtbaren Verhaltens. Bericht des 30. Kongreß Deutsche
Gesellschaft für Psychologie, 328–329. Göttingen.
Frey, Siegfried, Hans-Peter Hirsbrunner, Jonathan Pool and William Daw 1981. Das Berner System zur
Untersuchung nonverbaler Interaktion I: Die Erhebung des Rohdatenprotokolls. In: Peter Winkler
(ed.), Methoden der Analyse von Face-to-Face-Situationen, 203–236. Stuttgart, Germany: Metzler.
Garfinkel, Harold 1963. A conception of, and experiments with, ‘trust’ as a condition of stable,
concerted action. In: O.J. Harvey (ed.), Motivation and Social Interaction. Cognitive Determi-
nants, 187–238. New York: Ronald Press.
Grice, H. Paul 1975. Logic and conversation. In: Peter Cole and Jerry L. Morgan (eds.), Syntax and
Semantics, Volume 3: Speech Acts, 41–58. New York: Academic Press.
Gumperz, John 1982. Contextualization conventions. In: John Gumperz (ed.) Discourse Strategies.
Studies in Interactional Sociolinguistics, 130–152. Cambridge: Cambridge University Press.
Gumperz, John and Norine Berenz 1993. Transcribing conversational exchanges. In: Jane A. Ed-
wards and Martin Lampert (eds.), Talking Data. Transcription and Coding in Discourse
Research, 91–121. Hillsdale, NJ: Lawrence Erlbaum.
Heath, Christian 1986. Body Movement and Speech in Medical Interaction. Cambridge: Cambridge
University Press.
Heidtmann, Daniela and Marie-Joan Föh 2007. Verbale Abstinenz als Form interaktiver Beteili-
gung. In: Reinhold Schmitt (ed.), Koordination. Analysen zur Multimodalen Interaktion, 263–
292. Tübingen, Germany: Narr.
Jefferson, Gail 2004. Glossary of transcript symbols with an introduction. In: Gene Lerner (ed.),
Conversation Analysis. Studies from the First Generation, 13–31. Amsterdam: John Benjamins.
Kendon, Adam 1980. Gesticulation and speech: Two aspects of the process of utterance. In: Mary
Ritchie Key (ed.), Relationship of Verbal and Nonverbal Communication, 207–227. The Hague:
Mouton.
Kendon, Adam 1982. Organization of behavior in face-to-face interaction. In: Klaus Scherer and
Paul Ekman (eds.), Handbook of Methods in Nonverbal Behavior Research, 440–505. Cam-
bridge: Cambridge University Press.
Kendon, Adam 2000. Language and gesture: Unity or duality? In: David McNeill (ed.), Language
and Gesture, 47–63. Cambridge: Cambridge University Press.
Kendon, Adam 2004. Gesture. Visible Action as Utterance. Cambridge: Cambridge University Press.
Kita, Sotaro 2000. How representational gestures help speaking. In: David McNeill (ed.), Lan-
guage and Gesture, 162–185. Cambridge: Cambridge University Press.
Kowal, Sabine and Daniel O’Connell 2000a. Psycholinguistische Aspekte der Transkription: Zur
Notation von Pausen in Gesprächstranskripten. Linguistische Berichte 183: 353–378.
Kowal, Sabine and Daniel O’Connell 2000b. Zur Transkription von Gesprächen. In: Uwe Flick,
Ernst von Kardoff and Ines Steinke (eds.), Qualitative Forschung. Ein Handbuch, 437–447. Re-
inbek bei Hamburg, Germany: Rowohlt Taschenbuch.
Lévi-Strauss, Claude 1963. Structural Anthropology. New York: Basic Books.
Linell, Per 1998. Approaching Dialogue. Talk, Interaction, and Contexts in Dialogical Perspectives.
Amsterdam: John Benjamins.
MacWhinney, Brian 1995. The CHILDES-Project – Tools for Analyzing Talk, 2nd edition. Hills-
dale, NJ: Lawrence Erlbaum.
McNeill, David 1985. So you think gestures are nonverbal? Psychological Review 92(3): 350–371.
McNeill, David 1992. Hand and Mind. What Gestures Reveal about Thought. Chicago: University
of Chicago Press.
Nothdurft, Werner 2006. Gesprächsphantome. Deutsche Sprache 34: 32–43.
65. Transcribing gesture with speech 1007

Ochs, Elinor 1979. Transcription as theory. In: Elinor Ochs and Bambi Schieffelin (eds.), Develop-
mental Pragmatics, 43–72. New York: Academic Press.
Sacks, Harvey and Emanuel Schegloff 1974. Two preferences in the organization of reference of
persons in conversation and their interaction. In: N.H. Avison and J.R. Wilson (eds.), Ethno-
methodology: Labelling Theory and Deviant Behavior. London: Routledge and Kegan Paul.
Sager, Sven F. 2005. Ein System zur Beschreibung von Gestik. Osnabrücker Beiträge zur
Sprachtheorie (OBST) 70: 19–47.
Scheflen, Albert 1966. Natural history method in psychotherapy: Communicational research. In:
Louis Gottschalk and Arthur Auerbach (eds.), Methods in Research in Psychotherapy, 263–
289. New York: Appleton-Century-Crofts.
Schegloff, Emanuel 1984. On some gestures’ relation to talk. In: Maxwell Atkinson and John Her-
itage (eds.), Structures of Social Action. Studies in Conversation Analysis, 266–296. Cambridge:
Cambridge University Press.
Schmitt, Reinhold 2005. Zur multimodalen Struktur von turn-taking. Gesprächsforschung –
Online Zeitschrift zur verbalen Interaktion 6: 17–61.
Schmitt, Reinhold (ed.) 2007. Koordination. Analysen zur Multimodalen Interaktion. Tübingen,
Germany: Narr.
Selting, Margret 2001. Probleme der Transkription verbalen und paraverbalen/prosodischen Ver-
haltens. In: Klaus Brinker, Gerd Antos, Wolfgang Heinemann and Sven F. Sager (eds.), Text-
und Gesprächslinguistik./Linguistics of Text and Conversation. Ein Internationales Handbuch
Zeitgenössischer Forschung. An International Handbook of Contemporary Research, 2. Halb-
band/Volume 2, 1059–1068. Berlin: Walter de Gruyter.
Selting, Margret, Peter Auer, Birgit Barden, Jörg Bergmann, Elizabeth Couper-Kuhlen, Susanne
Günthner, Christoph Meier, Uta Quasthoff, Peter Schlobinski and Susanne Uhmann 1998. Ge-
sprächsanalytisches Transkriptionssystem (GAT). Linguistische Berichte 173: 91–122.
Selting, Margret, Peter Auer, Dagmar Barth-Weingarten, Jörg Bergmann, Pia Bergmann, Karin
Birkner, Elizabeth Couper-Kuhlen, Arnulf Deppermann, Peter Gilles, Susanne Günthner,
Martin Hartung, Friederike Kern, Christine Mertzlufft, Christian Meyer, Miriam Morek,
Frank Oberzaucher, Jörg Peters, Uta Quasthoff, Wilfried Schütte, Anja Stukenbrock and
Susanne Uhmann 2009. Gesprächsanalytisches Transkriptionssystem 2 (GAT 2). Gesprächs-
forschung. Online-Zeitschrift zur Verbalen Interaktion 10: 353–402.
Steinke, Ines 2000. Gütekriterien qualitativer Forschung. In: Uwe Flick, Ernst von Kardoff and
Ines Steinke (eds.), Qualitative Forschung. Ein Handbuch, 319–331. Reinbek bei Hamburg,
Germany: Rowohlt Taschenbuch.
Streeck, Jürgen 1993. Gesture as communication I: Its coordination with gaze and speech. Com-
munication Monographs 60(4): 275–299.

Ulrike Bohle, Hildesheim (Germany)

65. Transcribing gesture with speech


1. Introduction
2. Notes on tools and data
3. Transcription, observation and annotation
4. Analysis of gesture meanings and functions in discourse
5. References

Abstract
The chapter sketches a set of analytic heuristics, developed at the University of Chicago,
for observing and recording observations of how gesture and speech co-occur in natural
interactive human discourse. After discussing theoretical as well as methodological
issues concerning tools and data, the chapter presents the set of methods for the
transcription, observation, and annotation of speech and gestures. Finally, the chapter
outlines the analysis of gesture meanings and functions in discourse, discussing the
meaning-driven essence and the reciprocal nature of transcribing gestures with speech.

1. Introduction
This is a sketch of some essentials of a method for observing and recording observations
of how gesture and speech co-occur in natural interactive human discourse. The method
is not a “coding system”. It is a set of analytic heuristics emerging from decades of work
by many scholars of GESTURE who have collaborated in research led by David McNeill in
the Psychology Department at the University of Chicago. (Note that “GESTURE” in spe-
cial font signals the problematic nature of this term. In this treatment of the bodily di-
mensions of natural language use, the everyday senses of this term are at once too
narrow and too broad. The notes as to what constitutes gesture in the observation and
analysis framework sketched here do not constitute a full or adequate definition of the
term.) Application of these heuristics to multimodal discourse data has the goal of build-
ing a base of descriptive facts that contribute to a theory of human language, rather than
to a theory of “nonverbal behavior”. “Growth Point” theory (McNeill 2005) derives from
these facts. This theory holds that intervals of language use issue from “idea units”
possessing both gradient-imagistic and discrete-categorial semiotic properties.
The notion of GESTURE relevant to this sketch is at the level of modes of semiosis:
gradient-imagistic versus discrete-categorial. Gradient semiosis in the visuo-spatio-
motoric modality, as manifested in gestures and other embodiments, is a focus of obser-
vation, annotation, and analysis. Gradient semiosis is also a focus in the verbal-vocal
modality where it manifests, for example, in speech prosodic contouring. Likewise,
the visuo-spatio-motoric modality is understood to support categorial semiosis as well
as gradient, for example, conventionalized gestures that have standards of form. The
analytic goal supported by the transcription approach sketched here is a systematic
study of how gradient and categorial modes of semiosis, irrespective of the modality
in which they are realized, mesh in real-time language production. In practice, for
the most part, this amounts to observing and annotating phases of gesture production
and speech prosodic contouring and how these synchronize with lexical, grammatical,
and discourse constituents in co-occurring speech. Examination of these temporal rela-
tionships promotes insights into the nature of combined gradient and categorial mean-
ing generation from moment-to-moment and across extended intervals of discourse
time.
An adequate description of the “McNeill method” for transcription and annotation
of multimodal discourse faces at least three challenges. First, there is no single method.
Each scholar whose work has helped shape the research paradigm has taken a distinc-
tive approach to transcribing gesture with speech, framed by a specific research agenda.
This sketch attempts to highlight certain commonalities among these approaches, in the
belief that these are largely driven by the observable facts of natural discourse, rather
than by affiliation with the research group. A second challenge has to do with defining
the range of empirically observable phenomena in natural language use to which the
method may be said to apply. Best practice continues to evolve in tandem with our
understanding of the target domain. This has meant that the number of phenomena
meeting criteria for the designation gesture has tended to increase.
The third challenge is misconceptions about this method, widespread in the research
community and resistant to change. A significant misconception is that the core activity
consists of parsing discrete gestural entities out of a stream of multimodal discourse and
labeling each as belonging to a single one of a set of mutually-exclusive categories or
“gesture types” (e.g., iconic, metaphoric, beat, deictic, etc.). Something like this typo-
logical approach, which unfortunately confounds semiotic, morphological, production,
and social-interactional characteristics of gesturing in unsystematic ways, did character-
ize the research group’s early (1980s) efforts to gain traction in describing complex ges-
tural phenomena in narrative discourses. The Procedures Appendix of Hand and Mind
(McNeill 1992) reflected the early efforts. By the time of publication of that influential
volume, the group was coming to see gesturing as comprising phenomena with varying
temporal extents and consistencies of production, and multiple simultaneous and over-
lapping dimensions of semiosis and function in the discourse. Restrictive coding
schemes can have utility in narrow, targeted analyses. However, a tenet of the method
sketched here is that such schemes tend to be “non-neutral technologies” and, in apply-
ing them to natural discourses, the analyst must not become blind to the fluid multidi-
mensionality of every interval of gesture production, nor to the great variability of
gesturing across individuals. Any scheme for observation and annotation should be
employed in such a way that the annotator retains the ability to detect instances of ca-
tegorial blur that may have implications for descriptive adequacy and also remains open
to discovering the influence of additional, unlooked-for dimensions of patterning in a
dynamically-evolving natural discourse.
There are further misconceptions, including that the method does not treat socially-
constituted, conventionalized (“emblem” or “quotable”) gestures – it does; also, that it
concerns only what an individual speaker does with his or her hands – it does not. No
adequate empirical justification has been put forward for excluding semiotically valued
activity or attitudes of body parts other than the hands (head, feet, eyes, shoulders,
torso, whole body), or of the vocal apparatus (prosody) in discourse, nor of such
embodiments when they are dyadically constituted. The method sketched here aims to meet the
challenge of encompassing all of these.

2. Notes on tools and data


2.1. Granularity
A core goal of the method sketched here is to observe how meaningful bodily move-
ments synchronize with speech constituents at multiple levels of grain, from the
syllable-by-syllable level of speech to the “discourse paragraph” (Kendon 1972). The
micro level of annotation supports assessment of very local bundles of meaning mani-
fested in gesture and speech. Observation at more macro discourse granularities is
concerned with identifying the contextual antecedents of micro-timed gesture-speech
pairings and is essential for inferring local meanings and functions. Annotation of
macro-level patterns also highlights persistent and long-distance recurring features of
bodily movement and speech that contribute to building discourse sense and cohesion.

2.2. “Enabling technologies”


Aspects of the approach to transcribing gesture with speech sketched here are best
understood by keeping in mind that it developed during the technological era when
tape media recorded at 30 or more frames per second would be analyzed using profes-
sional-grade media editing decks coupled to high-resolution monitors engineered spe-
cifically for video display. Thus enabled, observations of gesture-speech synchrony at
the frame-by-frame level in comparison with coarser granularities revealed patterning
that became the basis for many theoretical claims. Today, digitized media viewed in
software interfaces have largely replaced videocassette recorders and videotapes, with
attendant shifts in functionality that have forced some accommodations in the method.
Details of gesture-speech co-occurrence observed on tape media were once recorded
using typographic conventions applied to speech transcription text in document file for-
mats. Such annotations, together with interlinear descriptive and explanatory notes on
gesture forms and meanings, then constituted the “metadata” associated with the raw
data of video-recorded discourse. Text file-based approaches to this exercise are still
in use today and largely support the requirements identified in this sketch of method.
However, most researchers now use software interfaces such as Praat (Boersma and
Weenink 2012) for speech transcription and speech acoustic analysis and ELAN (Brug-
man and Russel 2004), Anvil (Kipp 2012), or the like for video annotation. These
“music score” interfaces are built of annotation tiers temporally synchronized with
the digitized audio-video. They permit the user to mark onsets and offsets of behaviors
on tiers and then label these intervals with anything from simple codes indicating
behavioral categories to paragraphs of descriptive or explanatory text (Schmidt, Duncan,
Ehmer, et al. 2008).
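The tier-and-interval organization these interfaces share can be modeled in a few lines. The sketch below is illustrative only — ELAN and Praat each have their own file formats (EAF, TextGrid), and the tier names, labels, and times here are invented — but it shows how time-aligned tiers support looking up which annotations co-occur at any moment:

```python
from dataclasses import dataclass

@dataclass
class Interval:
    onset: float   # seconds from the start of the recording
    offset: float
    label: str

# A toy "music score": named tiers, each a list of time-stamped intervals.
# Tier names, labels, and times are invented for illustration.
score = {
    "speech":  [Interval(1.20, 2.60, "tries going up the inside of the drainpipe")],
    "g-phase": [Interval(1.20, 1.45, "preparation"),
                Interval(1.45, 1.90, "stroke"),
                Interval(1.90, 2.10, "post-stroke hold")],
}

def cooccurring(score, tier, t):
    """Return the label on `tier` whose interval spans time point t, if any."""
    for iv in score[tier]:
        if iv.onset <= t < iv.offset:
            return iv.label
    return None
```

Querying `cooccurring(score, "g-phase", 1.5)` returns `"stroke"` — the kind of cross-tier temporal lookup that underlies synchrony assessment.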

2.3. Elicited discourse data


The core aspects of the language theory that this method of analysis informs have arisen
from analyses of a variety of extended, interactive discourses involving two or more
people. This sketch focuses on analyses of a corpus of quasi-monologic narrative dis-
courses, elicited as stories told from memory after seeing a movie or cartoon. The eli-
citations are videotaped with one or more camcorders positioned to capture both
speaker (the story teller) and listener participants in frame, often including facial
close-ups. If instrumental assessment of speech prosody is planned, audio is captured
with professional-grade microphones. The listening partner in the elicitations is actively
engaged, invited to comment or ask questions as necessary to fully understand the story.
The resulting interactions are typically lively and positive-to-neutral in affect. The cor-
pus comprises more than 1000 samples, two to twenty minutes in duration, in more than
three dozen languages, by speakers including adults, children from age 2–6, second lan-
guage learners, bilinguals, deaf signers, people with neurogenic language disorders, and
other groups. In this analytic framework, holding the discourse elicitation method
constant across all groups augments the certainty with which gesture meanings are
inferred. It also increases the power of a variety of cross-speaker-group comparisons.
Thus, elicitation method, annotative practice, and theoretical framework interpenetrate.

3. Transcription, observation and annotation


3.1. Speech transcription
A detailed speech transcription is the foundation on which multimodal analysis of a dis-
course is built. The attempt is to transcribe everything audible from each participant:

(i) words,
(ii) word partials,
(iii) speech repairs,
(iv) breaths,
(v) speech pauses,
(vi) laughs,
(vii) and various other non-speech sounds.

General practice has been to avoid punctuation (comma, period, etc.) so as not to force
sentential notions on the variously constituted utterances of natural discourse. The
speech of different participants is transcribed on separate lines in a text file or on
separate tiers in an annotation interface.

3.2. Multiple annotation passes


It has been found useful to annotate speech-co-occurring gestures and other embodi-
ments (e.g., gaze direction, posture shifts, facial affect, etc.) in multiple passes over
the same discourse data. One reason is that the meanings and functions of coverbal em-
bodiments are highly discourse sensitive and complexly multi-dimensional. Interpreta-
tion of any one interval of speech and gesture hinges on detailed knowledge of how it
relates to others in the immediate and extended discourse context.
The gesture-plus-speech utterances below are excerpts from one speaker’s 4.5-
minute recounting of a cartoon story about a cat that tries in several different ways
to reach and capture a canary that he wants to eat. The symbols “/” and “#” represent
unfilled pauses and breaths, respectively. Other typographic annotation conventions are
explained below.

(1) … he tries / clIMBing up the drAINpipe / uh / and he gets ALL the way Up there …
(2) … he / tr[[ies going up][the inSI][Ide of the drAINpipe #]] and twEEty bird runs …
(3) … the next thin][[g you know he comes swINGin][g thrOUgh on a rOpe]] and then
he jUst misses the wINdow [and he smASHes into the brick wAll beSIde it # …

Some annotators make an initial pass across an entire discourse annotating speech pro-
sodic features such as intervals of peak emphasis. In excerpts (2)–(3), small capitals
mark intervals of audible prosodic emphasis (higher pitch, longer syllable duration, in-
creased loudness). Where speech is transcribed in Praat (Boersma and Weenink 2012)
“TextGrids”, instrumental values for such features may be generated from the acoustic
signal and written to media- and transcript-synchronized tiers. In either case, the exer-
cise of identifying intervals of peak speech prosodic emphasis across a discourse has the
value of highlighting intonational structure as it pertains to contrastive discourse focus.
This can facilitate subsequent gesture annotation passes.
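As a rough illustration of how such instrumental values can be turned into candidate emphasis intervals, the sketch below flags runs of pitch-tracked frames sitting well above the speaker’s mean F0. It is a simplification under stated assumptions (pitch alone, a z-score threshold, invented frame data), not the procedure actually used with Praat TextGrids:

```python
import statistics

def emphasis_intervals(frames, f0_z=1.0, min_frames=3):
    """Flag runs of frames whose F0 sits well above the speaker's mean.

    `frames` is a list of (time_sec, f0_hz) pairs, with f0_hz == 0 for
    unvoiced frames. Returns (start, end) time intervals -- a crude
    stand-in for perceptual judgments of prosodic emphasis.
    """
    voiced = [f for _, f in frames if f > 0]
    threshold = statistics.mean(voiced) + f0_z * statistics.pstdev(voiced)
    runs, current = [], []
    for t, f in frames:
        if f > threshold:
            current.append(t)        # frame belongs to an emphatic run
        else:
            if len(current) >= min_frames:
                runs.append((current[0], current[-1]))
            current = []
    if len(current) >= min_frames:   # close a run at the end of the sample
        runs.append((current[0], current[-1]))
    return runs
```

On invented frames with a brief pitch excursion, the function returns the excursion’s time span, which an annotator could then compare against intervals of perceived emphasis.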
As concerns production properties of coverbal gestures in their temporal relation to
speech, some analysts choose to organize their efforts around the contrast between
movement phases and hold phases of bodily activity (Park-Doob 2010). A more widely
used heuristic centers on the notions of gesture “phrases” and “phases” (Kendon 1972,
1980) and the notion of gesture-speech production “pulse” (Tuite 1993). In either case,
the core activity is one of assessing the temporal and semantic associations between the
phases of bodily activity or stasis and intervals of spoken discourse.
A gesture phrase is idealized as the entire interval a hand or other body part is in
motion, starting from an unmarked position of rest, through performance of some
movement or position that the observer perceives as having semiotic value in the dis-
course, and then returning to rest. In (2) and (3), above, square brackets enclose inter-
vals of gestural movement that are observably meaningfully related both to what the
speaker is saying at the moment and to what she witnessed in a cartoon. In the first
bracketed phrase of, “tr[[ies going up][the insi][ide of the drainpipe #]]”, the speaker
lifts her hand from her lap while saying “tries”. Her hand shape is an extended index
finger pointing up. She moves it up in front of her torso. In each of the next two inter-
vals, the hand drops back down (without losing the pointing shape or returning to rest)
and then repeats the movement up. The outer brackets enclosing all three phrases
reflect the annotator’s judgment that they share features making them some kind of unit.
Intervals with no accompanying gesture phrase, as in (1), are not annotated. Where
“music score” annotation interfaces are used, intervals of gesture phrase are not
marked with typographic conventions on the speech text, but rather on a tier synchronized
to the raw audio-video and the speech text.
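For text-based transcripts, the bracket convention above can be unpacked mechanically. The following sketch — a hypothetical helper, not part of the method itself — strips the brackets from an annotated line and recovers each phrase’s character span and nesting depth in the plain speech text:

```python
def parse_phrases(annotated):
    """Extract gesture-phrase spans from a bracket-annotated speech line.

    Returns (plain_text, spans), where each span is a tuple
    (start, end, depth) of character offsets into the bracket-free text;
    depth > 1 marks a phrase nested inside an outer bracketing.
    Assumes the brackets in the input line are balanced.
    """
    plain, stack, spans = [], [], []
    for ch in annotated:
        if ch == "[":
            stack.append((len(plain), len(stack) + 1))
        elif ch == "]":
            start, depth = stack.pop()
            spans.append((start, len(plain), depth))
        else:
            plain.append(ch)
    return "".join(plain), sorted(spans)
```

On an invented line such as `"he [[runs up ][the pipe]] fast"`, the outer span `(3, 19, 1)` covers `"runs up the pipe"`, with the two inner phrases nested at depth 2 — mirroring the annotator’s judgment that sub-phrases form a larger unit.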
The internal phase structure of gesture phrases is annotated next. A gesture stroke
phase is generally taken to be the interval within a phrase when an interpretable mean-
ing manifests and/or the interval of apparent greatest gestural effort. The annotator at-
tempts to infer a gesture stroke’s meaning through a “sense-making” process that takes
into account

(viii) the local and extended spoken discourse,
(ix) the image content and story structure of the eliciting stimulus,
(x) and features of gesture movement, hand shape, and location.

In (2) and (3), above, bold font indicates stroke phases within the gesture phrases. In (2)
the annotator judged the intervals of upward movement synchronized with “up”,
“inside”, and “drainpipe” to convey meaning in relation to cartoon imagery, the current
point in the story line, and the accompanying speech. Note that strokes are not always
movement phases. “Hold strokes” are frequently seen. A held configuration of the
hands depicting a static object, for example, may be judged a stroke phase, in a given
discourse context.
In the idealized gesture phrase, a preparation phase is movement from rest to stroke
onset. A retraction is movement from stroke offset back to rest. In text-based
transcripts, with the annotation of gesture phrases and the stroke phases within them,
preparation and retraction phases are de facto also annotated. In (2), “insi][ide of the
drainpipe #]]”, the preparation phase of the second phrase synchronizes with “ide of
th” in speech; the retraction with “pipe #”.
Several phenomena complicate observation and annotation of gesture phases.
Among them, first, gesture phrases often do not begin from rest positions nor end
with retractions to rest positions. Second, gesture phrases sometimes “abort” before
their stroke phases are fully realized, a common occurrence in conjunction with self-
interrupted, disfluent speech. Third, “hold” phases often occur within the gesture
phrase. A pre-stroke hold is when a gesturing hand momentarily halts between prepa-
ration and stroke. A post-stroke hold is when the hand halts for a moment after the
stroke phase, before retracting or beginning the next gesture phrase. In (2) and (3),
above, underlining indicates gestural holds. In the stroke phase synchronized with
“smashes” in (3) the speaker thrusts her tensed, flat palm away from her body and
then stops it suddenly, holding for the duration of “into”. Post-stroke holds are analyzed
as perseveration of a gestured image to encompass further speech constituents belonging
to the core idea unit from which the gesture-speech utterance is generated. Pre-stroke
holds have a different analytic significance, being seen as evidence that stroke phases tar-
get just those speech constituents that are focal in the idea unit. That fully formed ges-
tures may halt in their progression so as to time with particular speech constituents is
evidence that the synchrony of gesture and speech is not random, making synchrony
assessment seem a plausible approach to exploring multimodal semiosis in language
use. In order to achieve the required degree of accuracy in these assessments, most an-
notators find it necessary to examine gestural movements and holds repeatedly at different
playback speeds. Observation of phase micro-timing is important not only if the planned
analyses are to draw on speech-gesture synchrony findings, but also because close scru-
tiny of phases of gestural movement often reveals new dimensions of patterning and
spurs reassessment of gesture phrase judgments made in the previous pass.
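The idealized phrase-internal structure just described — preparation, optional pre-stroke hold, obligatory stroke, optional post-stroke hold, retraction — can also serve as a consistency check over phase annotations. The sketch below uses a hypothetical data format (a list of (phase-name, onset, offset) triples, not any actual annotation-tool export) and flags phrases that deviate from the template — exactly the aborted or discontinuous cases worth re-examining:

```python
PHASE_ORDER = ["preparation", "pre-stroke hold", "stroke",
               "post-stroke hold", "retraction"]

def fits_template(phrase):
    """True if a phrase's phases follow the idealized template:
    canonical phase order, an obligatory stroke, and temporal contiguity.
    `phrase` is a list of (phase_name, onset_sec, offset_sec) triples.
    """
    names = [name for name, _, _ in phrase]
    if "stroke" not in names:
        return False                      # e.g., an aborted phrase
    ranks = [PHASE_ORDER.index(n) for n in names]
    if ranks != sorted(ranks):
        return False                      # phases out of canonical order
    return all(phrase[i][2] == phrase[i + 1][1]
               for i in range(len(phrase) - 1))   # no temporal gaps
```

A phrase failing this check is not an annotation error per se; as the complications above make clear, deviations from the ideal are common and analytically informative, so the check is best read as a prompt for closer inspection.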

4. Analysis of gesture meanings and functions in discourse


A crucial part of the observer-annotator’s task is inferring the discourse functions and
meanings of all gesture-speech pairings across a discourse. The results of that effort are
recorded as systematic annotations and descriptions of the features of gesture and
speech that drive the inference process. The method sketched here is in its essence
meaning-driven. Perceiving an interval as meaningful and inferring aspects of its mean-
ing are acts heavily influenced by considerations outside the individual gesture-speech
production pulse. To be adequate, the process must draw on the larger discourse frame(s)
that the pulse is embedded in, what meanings are emerging sequentially in the speaker’s
utterances, what viewpoint the speaker may be embodying, general characteristics of
this speaker as a gesturer, and so on; also, what stimulus-derived image the speaker
likely has in mind at the moment of speaking. An assessment of gesture meaning
based solely on physical features (i.e., handshape, motion, location) or solely on the
meaning of the word(s) accompanying the gesture will not be adequate.
Further, although the majority of coverbal gestures appear transparently
meaningful to the careful observer, some proportion continue to seem vague or ambiguous
even after careful consideration of all the reference points mentioned
1014 V. Methods

earlier. Such meaning indeterminacy may be at the level of the totality of information
there is to muster, within the universe defined by the collected audio-video sample, in
support of judgments about a phrase parse or the meaning of a stroke phase. It may al-
ternatively derive from the level of speaker thinking, which at moments in a discourse
may be indeterminate or ambiguous itself and thus manifests as gestures whose mean-
ings are difficult to infer.
The method for transcribing gestures with speech sketched here is one of hypothesis
formulation, testing, and revision of analytic judgments at every level (i.e., location of
phrase boundaries, identification of stroke phases, inferences of meaning and semiotic
dimensions in play, etc.). Annotation is a backward-adjusting process as the observer-
annotator works through a discourse sample or multiple related samples, amassing in-
sights that prompt (provisional) acceptance or rejection of hypotheses about particular
gesture-speech productions. As noted, gestures pattern on multiple levels simulta-
neously and are multifunctional. Therefore, multiple hypotheses that annotators may
generate about a given gesture’s meaning or function may all be supportable and should
all be recorded in the process of transcribing gesture with speech. Some hypotheses,
however, will truly be in competition with one another at a particular level of analysis.
For such, the hope is that evidence accumulating from iterative passes over the dis-
course as a whole will aggregate in support of one hypothesis over any other for
every interval of gesture-speech production in a discourse.
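One way such competing hypotheses might be kept on record across iterative passes can be sketched as follows; the class, field, and label names are illustrative inventions, not part of any published annotation scheme.

```python
from dataclasses import dataclass, field

@dataclass
class GestureInterval:
    """One gesture-speech production pulse with competing meaning hypotheses."""
    start_ms: int
    end_ms: int
    hypotheses: dict = field(default_factory=dict)  # hypothesis label -> evidence tally

    def add_evidence(self, label, weight=1):
        """Record support for one hypothesized meaning or function."""
        self.hypotheses[label] = self.hypotheses.get(label, 0) + weight

    def best_supported(self):
        """After iterative passes, return the hypothesis with most accumulated evidence."""
        if not self.hypotheses:
            return None
        return max(self.hypotheses, key=self.hypotheses.get)

g = GestureInterval(5200, 5800)
g.add_evidence("path of motion")       # pass 1: stroke appears to trace a trajectory
g.add_evidence("path of motion")       # pass 2: recurs elsewhere with motion verbs
g.add_evidence("character viewpoint")  # pass 2: a competing reading, also recorded
print(g.best_supported())  # -> path of motion
```

The point of recording rather than discarding the weaker hypothesis is precisely the backward-adjusting character of the method: a later pass over the discourse may shift the balance of evidence.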

5. References
Boersma, Paul and David Weenink 2012. Praat: Doing phonetics by computer [Computer pro-
gram]. Version 5.3.23, http://www.praat.org, Accessed 07.08.2012.
Brugman, Hennie and Albert Russel 2004. Annotating multimedia/multi-modal resources with
ELAN. Max Planck Institute for Psycholinguistics, The Language Archive, Nijmegen, the
Netherlands, http://tla.mpi.nl/tools/tla-tools/elan.
Kendon, Adam 1972. Some relationships between body motion and speech. In: Aron Wolfe Siegman
and Benjamin Pope (eds.), Studies in Dyadic Communication, 177–210. New York: Pergamon Press.
Kendon, Adam 1980. Gesticulation and speech: Two aspects of the process of utterance. In: Mary
Ritchie Key (ed.), The Relation between Verbal and Nonverbal Behavior, 207–227. The Hague:
Mouton.
Kipp, Michael 2012. Anvil: The video annotation research tool [Computer Program]. Version
5.1.3. http://www.anvil-software.de/download/index.html, Accessed 14.08.2012.
McNeill, David 1992. Hand and Mind: What Gestures Reveal about Thought. Chicago: University
of Chicago Press.
McNeill, David 2005. Gesture and Thought. Chicago: University of Chicago Press.
Park-Doob, Mischa A. 2010. Gesturing through time: Holds and intermodal timing in the stream
of speech. Unpublished Ph.D. dissertation, Department of Linguistics, University of California,
Berkeley.
Schmidt, Thomas, Susan Duncan, Oliver Ehmer, Jeffrey Hoyt, Michael Kipp, Dan Loehr, Magnus
Magnusson, Travis Rose and Han Sloetjes 2008. An exchange format for multimodal annotations.
In: Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios
Piperidis and Daniel Tapias (eds.), Proceedings of the 6th Language Resources and Evaluation
Conference, 359–365. Paris & Luxembourg: The European Language Resources Association.
Tuite, Kevin 1993. The production of gesture. Semiotica 93(1–2): 83–105.

Susan Duncan, Chicago, IL (USA)



66. Multimodal annotation tools


1. Introduction
2. Advances in multimodal annotation tools
3. Descriptions of representative tools
4. Concluding thoughts
5. References

Abstract
The chapter presents a concise overview of annotation tools for multimodal data. After
a short introduction to the topic, the chapter traces the history of annotation tool
development before focusing on recent advances and comparing the capabilities and
drawbacks of existing tools. The last section concentrates on short descriptions of some
representative tools (ELAN, ANVIL, Transana, and EXMARaLDA) and finishes with
concluding remarks on annotation tools and their influence on the analytic perspective
on gestures.

1. Introduction
Systematic analyses of multimodal communication are essential to scientific advance-
ment in the study of language, learning, and human interaction. At the core of these
human capabilities are multimodal acts of communication through which interlocutors
establish shared knowledge and bonds of empathy and make their thinking and emo-
tions manifest in multiple modalities (verbal/vocal, gestural, facial, bodily orientation
and movement). Communicative interactions are contextualized in time and place
with respect to relationship history of the interacting parties, evolution of the immedi-
ate discourse, type of discursive interaction (story telling, planning, free conversation,
instruction), and the local physical and interpersonal environment (e.g., arrangement
and number of interlocutors, presence of manipulable objects, texts, videos, photos, dia-
grams, and other artifacts). These multiple dimensions of communicative interactions
are complexly configured, subtle and nuanced, making comprehensive study of the
whole a significant methodological challenge for anthropology, psychology, and linguis-
tics, as well as computer science and engineering efforts concerned with modeling
human communicative interaction behavior. With respect to audio-video recording
and automating capture of biophysiological, motion, and other dimensions of human
communication, we have reached a stage of technological development where research-
ers are able to assemble large amounts of instrumentally captured data on a variety of
behavioral dimensions. Researchers now aspire to combine instrumentally-automated
analyses of dimensions of communicative interactions, such as prosodic features ex-
tracted from the speech acoustic signal, bodily movements detected by motion capture
from video, gaze shifts captured with eye tracking, breath kinematic data (e.g., Gullberg
and Kita 2009; McFarland 2001; McNeill et al. 2001; Wang and Levow 2011,
respectively) with human-annotated features such as discourse topic shifts, intonation
units, the meanings of gestures, syntactic constituents, backchanneling, interactional
synchrony (Kimbara 2006; Knight 2011; Loehr 2004; McNeill and Levy 1982;

McNeill et al. 2001; Parrill 2008, respectively), and many others. Any hope of bringing
such a variety of instrumentally- and human-assessed dimensions of communicative
interaction together so as to make patterning among them accessible to exploration
rests with the continuing development of software interfaces for visualization and
annotation of complex multimodal data, coupled with database structures capable of
supporting useful queries.

2. Advances in multimodal annotation tools


Since the 1960s, researchers have been using a variety of audio-video technologies to
address the challenge of bringing the multiple strands of communicative interactions
together into formats that allow them to explore and make sense of the interrelatedness
of all the strands. Initially, tools designed for applications other than the scientific
analysis of multimodal communication were repurposed and used in combination to
support observation, transcription, and annotation of verbal and bodily dimensions of
communication; e.g., text editors and devices and software for viewing and editing
film or video. Work by Condon and Ogston (1966) and Kendon (1972) harnessed
film editing technology when this became cost-feasible for social science research
budgets. Today, the software analogs of such tools – professional-grade media editing
packages such as Apple Final Cut™ and Adobe® Premiere Pro®, for instance – are
still in use in multimodal communication research, because they visualize video and
audio signals at fine granularities and give the researcher access to perceptually
interpretable audio-video playback speeds as slow as frame-by-frame. The latter
functionality supports accurate observation of fine-grained synchronies among intervals
of behaviors in the visible and audible modalities.
Such interfaces permit navigating in recorded audio-video data to achieve micro- and
macro-level observation of behavioral dimensions of interest. This activity is typically
joined with that of codifying such observations, for instance, as specialized annotations
applied to speech transcribed from the recording into a text document, or as time-
identified field values in a spreadsheet such as Excel™ or relational database file such
as FileMaker Pro™. Some researchers take the approach of augmenting annotated
speech transcriptions with inserted video stills and/or graphics created with drawing
tools (e.g., Goodwin 2007). A perhaps non-obvious utility of augmented text annotation
efforts is that the product serves as an integrative, unrestrictive, and readily interpretable
“visualization” of simultaneous and sequential relationships among behavioral strands
across extents of connected discourse. Some researchers feel that such formats support
basic, first-pass, exploratory and descriptive analyses of multimodal discourse data par-
ticularly well. Some of the most widely cited research on multimodal discourse continues
to be carried out largely using multi-purpose text, media, and graphics editing technol-
ogies. Reasons for this may include ease of use, learnability, reliability, negligible likeli-
hood of software obsolescence, and share-ability and long-term usability of corpora
developed using them. These benefits arise from the fact that these technologies are
commercial products for which the general, non-specialist, market is large.
However, the past fifteen years have seen development of many software tools de-
signed specifically for annotation of corpora incorporating audio, video, and other
types of data. Some of these are commercially marketed to the research community
and can be quite expensive to purchase (e.g., Noldus Observer™). Most, though, are

available to end users either free of charge or for a nominal fee. The proliferation of
such tools is promising in that it signals widespread engagement with the challenge
of systematizing the work of observing and analyzing multimodal communication
data, as well as with facilitating the time-consuming aspects of this work. However, it
also has problematic aspects. With so many tools available, with apparently overlapping
capabilities, it is difficult for researchers to know how to choose among them. In addi-
tion, many multimodal annotation tools have short lives, having been developed on
platforms that become obsolete or by student researchers who move on to other pro-
jects, leaving them unsupported. The likelihood of these eventualities can be difficult
to gauge prospectively.
A significant portion of the proliferative development is due to researchers develop-
ing their own tools “in-house”. Examples of this are TASX (Milde and Gut 2002) and
MacVisSTA (Rose, Quek, and Shi 2004), two full-featured tools for creating and mana-
ging corpora of linguistic and behavioral data based on audio and video sources, as well
as, in the case of MacVisSTA, data from motion capture. Though the developers of
these and other such packages typically have the goal of sharing with other researchers,
in practice, most in-house efforts remain confined to their research groups of origin and
tend to have relatively short life spans. Learnability and usability are often problem
areas with academic researcher-designed software, which typically does not come
with a comprehensive user’s manual. Not being intended for commercial release,
such packages are unlikely to have received extensive testing and so may not be very
robust. Further, researchers seeking an annotation tool appropriate for their work
may not be able to tell, without investing a lot of learning time, whether particular
packages have capabilities they require or will match their research style. Such factors
push researchers in the direction of developing their own tools, thus furthering the cycle
of proliferative development.
Surveys or systematic comparisons of various multimodal annotation tools avail-
able at particular times over the past fifteen years have appeared in print (e.g.,
Allwood et al. 2001; Bigbee, Loehr, and Harper 2001; Bird and Liberman 2000; Silver
and Patashnick 2011) and at several conference workshops, including those held at
International Society of Gesture Studies meetings (Rohlfing et al. 2006; Rohlfing
and Duncan 2007), and recently at a conference hosted by the Netherlands Associa-
tion for Qualitative Research (KWALON). Organizers of the “KWALON Experi-
ment” tasked five developers of tools for qualitative data analysis with analyzing
the same dataset and then writing reflective papers about the experience
that would enable potential users to get a sense of how the tools compare (Evers
et al. 2011). The results are gathered in a special issue of FQS Forum: Qualitative
Social Research (see Evers et al. 2011; also, Woods and Dempster 2011). In addition,
Silver and Lewins (2009) and Fielding and Silver (2012) are helpful concerning
factors researchers may consider in selecting tools to match their research goals
and style.

3. Descriptions of representative tools


As useful as a comprehensive overview of the many multimodal annotation tools
currently available would be, this space is too limited and, in any case, the rapid pace
of development would soon render such a summary out of date. The researcher trying

to decide on a multimodal annotation tool to use is best served by visiting software
packages' official websites, trying out demo versions, and actively monitoring the
research literature for technical reports and articles that survey or compare tools
currently in development. The five software packages briefly described for comparison
purposes below meet criteria specified by Bigbee, Loehr, and Harper (2001), specifi-
cally: active development status, generally recognized strong functionality (“best of
breed”), and unique capabilities. Three more criteria are: comparative maturity
(10 years or more of steady development, testing, and responsive user support) assuring
robustness, evidence of large user bases, and interoperability with other tools (Schmidt
et al. 2008). The packages chosen seem generally representative of loose classes of
multimodal annotation tools, so a comparison can foreground certain useful themes. Sil-
ver and Patashnick (2011) note:

Despite the long and varied use of visual data, there seems to be a lack of cross-fertilization
of methods and tools between disciplines. In parallel to the development of CAQDAS
(Computer Assisted Qualitative Data Analysis) packages, tools derived from educational,
behavioural, linguistic, and psychological perspectives offer some quite distinct analytic pos-
sibilities for audiovisual data. These include ELAN […] and Observer. These packages,
however, tend not to be rooted in qualitative social research traditions, usually taking a
more quantitative approach to the analysis of audiovisual data.

Indeed, ELAN (Wittenburg et al. 2006) is an interface that emphasizes accumulation
of categorial label data identifying discrete features of dimensions of behavior that the
researcher wishes to count and compare. The package ANVIL (Kipp 2012) is in the
same class of tools. Both interfaces are intuitive and easily learned. Together they
seem to be the dominant choices, currently, among multimodal annotation tool
users in the fields that Silver and Patashnick (2011) identify as tending toward quan-
titative analyses. Language archiving was a primary spur to ELAN’s development.
This accounts for certain requirements, for instance, in how controlled vocabularies
for coding are set up. ANVIL was originally spurred by the need to annotate features
of video. In format and operation, both tools provide what Bigbee, Loehr, and Harper
(2001) refer to as interactive “music score” annotation interfaces that permit the user
to easily navigate along a timeline, marking onsets and offsets of intervals on multiple
behavioral dimensions represented as separate “tiers” time-aligned with a sample of
audio-video data or multiple synchronized samples. Categorial labels relevant to any
analytic schema are assigned to the marked intervals. These labels may be the objects
of searches and queries, resulting in count data and assessments of behavioral co-
occurrences. Data may be exported to other tools for statistical analysis. They are
well structured to support computational linguistics-style search and analysis algo-
rithms. ANVIL now includes visualizations of motion capture data in the interface,
as well as built-in capability for statistical analysis of interobserver reliability.
ANVIL is now also highly interoperable with ELAN. Annotation data is readily shar-
able between the two platforms. Some see a weakness in such interfaces when it
comes to exploratory, qualitative, descriptive analyses, both in the requirement of a
priori categorization schemes and the limited means for visualizing label data aggre-
gated in ways useful for getting a sense of regular (and unlooked-for) broad-scale pat-
terns in the data that are more encompassing than the view provided by a small time
window for annotation, in which multiple tiers appear. Both of these tools enable speech

transcription and markup; however, many researchers who use such platforms seem to
prefer to use the software Praat (Boersma and Weenink 2012) or other, similar software
designed specifically to facilitate speech transcription as well as phonetic analysis of
speech, to create initial transcriptions of the speech in audio-video data samples. Praat
TextGrids and file formats supported by other speech transcription software are readily
importable into interfaces like ELAN and ANVIL.
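The "music score" organization common to ELAN and ANVIL can be approximated, for illustration only, as parallel tiers of time-stamped intervals aligned against the same media timeline; a co-occurrence query then reduces to an interval-overlap test. The tier contents and times below are invented, and real ELAN/ANVIL files of course use their own XML formats rather than this stand-in.

```python
# Each tier is a list of (onset_ms, offset_ms, label) annotations,
# time-aligned against the same media file -- a minimal stand-in for
# the tier structures that music-score annotation tools maintain.
words    = [(0, 400, "the"), (400, 900, "ball"), (900, 1500, "rolls")]
gestures = [(350, 1400, "stroke: circular motion")]

def cooccurring(tier_a, tier_b):
    """Yield pairs of annotations from two tiers whose intervals overlap."""
    for a_on, a_off, a_lab in tier_a:
        for b_on, b_off, b_lab in tier_b:
            if a_on < b_off and b_on < a_off:  # standard interval-overlap test
                yield (a_lab, b_lab)

pairs = list(cooccurring(words, gestures))
print(pairs)  # all three words overlap the single stroke interval
```

Searches and queries in the real tools are far richer, but count data and assessments of behavioral co-occurrence ultimately rest on comparisons of exactly this kind between time-aligned tiers.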
Transana (Woods and Dempster 2011) and EXMARaLDA (Schmidt and Wörner
2009) are examples of what Silver and Patashnick (2011) would call a “CAQDAS”
package. Both have their roots in the analytic needs of linguists working within the
Conversation Analysis tradition. Both are easy for non-expert computer users to get
started with. Their user interfaces give an experience of accumulated transcription
and annotation data rather different from that provided by the “music score” interfaces,
although something similar to that framework is also configurable in these interfaces. In
Transana multiple transcripts, each a text describing events on one behavioral level
in an audio-video sample, are built in multiple passes over the data. An example is a
verbatim transcription of spoken utterances. Phrases of coverbal gesture would also
be entered as transcripts, intervals of which synchronize with the corresponding inter-
vals in the audio-video sample and with the corresponding transcribed utterance inter-
vals, permitting examination of temporal co-occurrences. The audio-video sample may
be played with multiple time-aligned transcripts synchronized in a visualization format
reminiscent of a speech transcript with interlinears, each capturing different analytic
layers. Intervals may be accessed for playback either from the audio-video media or
transcript intervals. Notes are easy to incorporate. Intervals with features in common
can be excerpted as clips and aggregated as collections. Transana supports a number
of simple but very useful data visualizations that assign different colors to events in dif-
ferent associated transcripts, permitting the user to visually scan for patterning across
lengths of discourse. Data visualizations useful for the human user are also a strong
point of EXMARaLDA which, in addition to making a variety of visualizations with
layering of annotated behavioral dimensions available in the computer interface, also
permits printing these out in a standard transcript format. Interoperability with other
tools has been a development priority for EXMARaLDA and it is particularly strong
in this regard.

4. Concluding thoughts
Ihde (1979) has argued that analytic tools unavoidably select, amplify and reduce as-
pects of experience in various ways. Mowshowitz (1976: 8) notes that, “tools insist on
being used in particular ways.” In these senses, tools such as those used for analysis
of multimodal communication are not "neutral"; their use must therefore inevitably
contribute to shaping the way we perceive and interpret the communication phenom-
ena we study. Silver and Patashnick (2011) caution researchers to be wary of this and
to document their analytic procedures so as to make them transparent and available
to critique. No multimodal annotation tool is without disadvantages in relation to
some analytic goals. Each enables or impedes different aspects of the observation
and analytic effort. There is no “best” tool for research in multimodal communication.
In fact, anecdotal reports suggest that most researchers employ more than one such tool
to support different aspects of their work.

In the field of research on multimodal communication, Efron's ([1941] 1972) seminal
study of styles of gesticulation during conversation, comparing Italian and Jewish
immigrants to the United States and first-generation members of these same ethnic groups, is
still widely cited. For tools, Efron had paper and pencil and his eyes and ears. The in-
sights he developed are still acknowledged to be sound, 70 years on. In our era, new
multimodal annotation tools appear routinely, yet one does not hear that Efron’s efforts
are being routinely superseded in terms of their quality and depth of insight. It is
useful to reflect on these facts when the awesome variety and potential of new technol-
ogies threatens to overwhelm our thinking or distract us from thoughtful reflection on
the logic of our analytic strategies.

5. References
Allwood, Jens, Leif Grönqvist, Elisabeth Ahlsen and Magnus Gunnarsson 2001. Annotations
and tools for an activity-based spoken language corpus. In: Jan van Kuppevelt and Ronnie W.
Smith (eds.), Proceedings of the 2nd SIGdial Workshop on Discourse and Dialogue 16, 1–10.
Morristown, NJ: Association for Computational Linguistics.
Bigbee, Anthony, Dan Loehr and Lisa Harper 2001. Emerging requirements for multi-modal
annotation and analysis tools. In: Proceedings, Eurospeech 2001 Special Event: Existing and
Future Corpora – Acoustic, Linguistic, and Multi-modal Requirements. Aalborg, Denmark.
Bird, Steven and Mark Liberman 2000. A formal framework for linguistic annotation. Speech
Communication 33: 23–60.
Boersma, Paul and David Weenink 2012. Praat: Doing phonetics by computer [Computer pro-
gram]. Version 5.3.23. Retrieved 07.08.2012 from http://www.praat.org/
Condon, William S. and William D. Ogston 1966. Sound film analysis of normal and pathological
behavior patterns. Journal of Nervous and Mental Disease 143: 338–347.
Efron, David 1972. Gesture, Race, and Culture. Berlin: De Gruyter Mouton. First published [1941].
Evers, Jeanine, Katja Mruck, Christina Silver, Bart Peeters, Silvana di Gregorio and Clare Tagg
2011. The KWALON Experiment: Discussions on Qualitative Data Analysis Software by Devel-
opers and Users. FQS Forum: Qualitative Social Research 12(1): Art. 40.
Fielding, Nigel and Christina Silver 2012. Choosing an appropriate CAQDAS package. Retrieved
15.08.2012, from http://www.surrey.ac.uk/sociology/research/researchcentres/caqdas/support/
choosing/
Goodwin, Charles 2007. Environmentally coupled gestures. In: Susan Duncan, Justine Cassell and
Elena T. Levy (eds.), Gesture and the Dynamic Dimension of Language, 195–212. Amsterdam:
John Benjamins.
Gullberg, Marianne and Sotaro Kita 2009. Attention to speech-accompanying gestures: Eye
movements and information uptake. Journal of Nonverbal Behavior 33(4): 251–277.
Ihde, Don 1979. Technics and Praxis. (Boston Studies in the Philosophy of Science, Volume 24.)
Dordrecht: Reidel.
Kendon, Adam 1972. Some relationships between body motion and speech. In: Aron W. Siegman
and Benjamin Pope (eds.), Studies in Dyadic Communication, 177–210. New York: Pergamon.
Kimbara, Irene 2006. On gestural mimicry. Gesture 6(1): 39–61.
Kipp, Michael 2012. Multimedia annotation, querying and analysis in ANVIL. In: Mark T.
Maybury (ed.), Multimedia Information Extraction, 351–368. Los Alamitos, CA: Wiley-IEEE
Computer Society Press.
Knight, Dawn 2011. Multimodality and Active Listenership: A Corpus Approach. London: Contin-
uum International.
Loehr, Dan 2004. Intonation and gesture. Unpublished doctoral dissertation, Department of Lin-
guistics, Georgetown University, Washington, DC.

Loehr, Dan and Lisa Harper 2003. Commonplace tools for studying commonplace interactions:
Practitioner’s notes on entry-level video-analysis. Visual Communication 2(2): 225–233.
McFarland, David H. 2001. Respiratory markers of conversational interaction. Journal of Speech
and Language Research 44(1): 128–143.
McNeill, David and Elena Levy 1982. Conceptual representation in language activity and gesture.
In: Robert J. Jarvella and Wolfgang Klein (eds.), Speech, Place, and Action, 271–295. New
York: John Wiley and Sons.
McNeill, David, Francis Quek, Karl E. McCullough, Susan Duncan, Nobuhiro Furuyama,
Robert Bryll, Xin-Feng Ma and Rashid Ansari 2001. Catchments, prosody, and discourse. Gesture
1: 9–33.
Milde, Jan-Torsten and Ulrike Gut 2002. The TASX-environment: An XML-based toolset for time
aligned speech corpora. In: Manuel González Rodrı́guez and Carmen Paz Suarez Araujo
(eds.), Proceedings of the 3rd International Conference on Language Resources and Evaluation
(LREC 2002), 1922–1927. Las Palmas de Gran Canaria, Spain: European Language Resources
Association (ELRA).
Mowshowitz, Abbe 1976. The Conquest of Will: Information Processing in Human Affairs. Read-
ing, MA: Addison-Wesley.
Parrill, Fey 2008. Subjects in the hands of speakers: An experimental study of syntactic subject and
speech-gesture integration. Cognitive Linguistics 19(2): 283–299.
Rohlfing, Katharina, Dan Loehr, Susan Duncan, Amanda Brown, Amy Franklin, Irene Kimbara,
Jan-Torsten Milde, Fey Parrill, Travis Rose, Thomas Schmidt, Han Sloetjes, Alexandra Thies,
and Sandra Wellinghoff 2006. Comparison of multimodal annotation tools – workshop report.
Gesprächsforschung 7: 99–123.
Rohlfing, Katharina and Susan Duncan 2007. Workshop: Annotation tools. Held at Integrating
Gestures: The International Society for Gesture Studies 3rd International Conference, North-
western University, Evanston, Illinois.
Rose, Travis, Francis Quek and Yang Shi 2004. MacVisSTA: A system for multimodal analysis. In:
Rajeev Sharma, Trevor Darrell, Mary P. Harper, Gianni Lazzari, and Matthew Turk (eds.),
Proceedings of the 6th International Conference on Multimodal Interfaces, 259–264. New
York: Association for Computing Machinery.
Schmidt, Thomas, Susan Duncan, Oliver Ehmer, Jeffrey Hoyt, Michael Kipp, Dan Loehr, Magnus
Magnusson, Travis Rose and Han Sloetjes 2008. An exchange format for multimodal annota-
tions. In: Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk,
Stelios Piperidis, and Daniel Tapias (eds.), Proceedings, Sixth International Conference on Lan-
guage Resources and Evaluation, LREC 2008, 359–365. Marrakech, Morocco: European Lan-
guage Resources Association (ELRA).
Schmidt, Thomas and Kai Wörner 2009. EXMARaLDA – Creating, analysing and sharing spoken
language corpora for pragmatic research. Pragmatics 19: 565–582.
Silver, Christina and Ann Lewins 2009. Choosing a CAQDAS package. Working Paper #001, 6th
edition. CAQDAS Networking Project and Qualitative Innovations in CAQDAS Project
(QUIC). Retrieved 15.08.2012 from http://eprints.ncrm.ac.uk/791/1/2009ChoosingaCAQDAS
Package.pdf.
Silver, Christina and Jennifer Patashnick 2011. Finding fidelity: Advancing audiovisual analysis
using software. FQS Forum: Qualitative Social Research 12(1): Art. 37.
Wang, Siwei and Gina-Anne Levow 2011. Contrasting multi-lingual prosodic cues to predict
verbal feedback for rapport. In: Yuji Matsumoto and Rada Mihalcea (eds.), Proceedings of
the 49th Annual Meeting of the Association for Computational Linguistics: Short Papers,
614–619. Portland, OR: Association for Computational Linguistics.
Wittenburg, Peter, Hennie Brugman, Albert Russel, Alex Klassmann and Han Sloetjes 2006.
Elan: A professional framework for multimodality research. In: Nicoletta Calzolari, Aldo
Gangemi, Bente Maegaard, Joseph Mariani, Jan Odijk, and Daniel Tapias (eds.), Proceedings

of the 4th Language Resources and Evaluation Conference (LREC 2006), 1556–1559. Genoa,
Italy: European Language Resources Association (ELRA).
Woods, David K. and Paul G. Dempster 2011. Tales from the bleeding edge: The qualitative ana-
lysis of complex video data using Transana. FQS Forum: Qualitative Social Research 12(1),
Art. 17.

Susan Duncan, Chicago, IL (USA)
Katharina Rohlfing, Bielefeld (Germany)
Dan Loehr, Washington DC (USA)

67. NEUROGES – A coding system for the empirical analysis of hand movement
behaviour as a reflection of cognitive, emotional, and interactive processes
1. Aims of the NEUROGES coding system
2. The theoretical and empirical background of the NEUROGES coding system
3. Development of the NEUROGES system
4. Description of the NEUROGES coding system
5. The reliability and the validity of the NEUROGES system
6. References

Abstract
The NEUROGES coding system is a research tool for the empirical analysis of the spon-
taneously displayed hand movement behaviour that accompanies interaction, thinking,
and emotional experience. NEUROGES is designed for diagnostic purposes as well as
for basic research, i.e., to further explore the anatomy of hand movement behaviour
and its relation to cognitive, emotional, and interactive processes. Fields of application
are interaction analysis including psychotherapy, psychodiagnostics, and experimental
and clinical neuropsychology. In a multi-stage evaluation process resulting in more
and more fine-grained units, the behaviour is segmented and classified according to
kinesic criteria. The objectivity, reliability, and validity of the NEUROGES categories
and values have been tested in several research studies, thus far including 263 participants,
healthy adults as well as patients with brain damage or mental disease. The analysis of the
group differences in hand movement behaviour as well as kinetographic and brain ima-
ging studies provide evidence that the NEUROGES categories and values are associated
with specific cognitive, emotional, and interactive processes.

1. Aims of the NEUROGES coding system


The NEUROGES coding system is a research tool for the empirical analysis of the
spontaneously displayed hand movement behaviour that accompanies interaction,

thinking, and emotional experience. However, volitional hand movements such as tool
use can be analysed as well.
According to kinesic criteria the ongoing flow of spontaneous hand movements is
segmented into units and classified with values. For some values the association with
cognitive, emotional, and interactive processes is already empirically established and
thus, these can be used for diagnostic purposes. Fields of application are interaction
analysis including psychotherapy, psychodiagnostics, and experimental and clinical
neuropsychology. Furthermore, NEUROGES is designed for basic research, i.e., to fur-
ther explore the anatomy of hand movement behaviour and its relation to cognitive,
emotional, and interactive processes.

2. The theoretical and empirical background of the NEUROGES coding system

The above outlined aims of the NEUROGES system imply that hand movement behaviour
is linked to cognitive, emotional, and interactive processes. More specifically, hand
movements not only reflect these processes but also seem to affect them (Fig. 67.1).

[Figure: hand movement behaviour linked bi-directionally to cognition processing and emotion processing (within-subject) and to interaction processing (between-subjects).]

Fig. 67.1: Bi-directional link of hand movement behaviour and cognitive, emotional, and interactive processes.

There is, in fact, ample empirical evidence that spontaneous hand movements are asso-
ciated with higher cognitive functions, such as language, spatial cognition, or praxis (e.g.,
Beattie and Shovelton 2006, 2009; Blonder et al. 1995; Butterworth and Hadar 1989;
Cohen and Otterbein 1992; de Ruiter 2000; de’Sperati and Stucchi 2000; Duffy and
Duffy 1989; Ehrlich, Levine and Goldin-Meadow 2006; Emmorey, Tversky and Taylor
2000; Foundas et al. 1995; Fricke 2007; Garber and Goldin-Meadow 2002; Goldin-
Meadow 2006; Haaland and Flaherty 1984; Hermsdörfer et al. 1996; Kita 2000; Kita
and Özyürek 2003; Krauss, Chen and Chawla 1996; Lausberg and Kita 2003; Lausberg
et al. 2007; Lavergne and Kimura 1987; Le May, David and Thomas 1988; Liepmann
1908; McNeill 1992, 2005; Müller 1998; Ochipa, Rothi and Heilman 1994; Parsons
et al. 1998; Poizner et al. 1990; Sassenberg et al. 2011; Seyfeddinipur, Kita and Indefrey
2008; Sirigu et al. 1996; Wartenburger et al. 2010). Likewise, it has been demonstrated
that hand movements are related to emotional processes and that they may reflect
psychopathology (e.g., Berger 2000; Berry and Pennebaker 1993; Cruz 1995; Darwin
1890; Davis 1981, 1997; Ekman and Friesen 1969, 1974; Ellgring 1986; Freedman
1972; Freedman and Bucci 1981; Freedman and Hoffmann 1967; Gaebel 1992; Krout
1024 V. Methods

1935; Lausberg 2011; Lausberg and Kryger 2011; Mahl 1968; Sainsbury 1954; Scheflen
1974; Ulrich 1977; Ulrich and Harms 1985; Wallbott 1989; Willke 1995). Furthermore,
hand movements serve to implicitly and explicitly regulate interactive processes (e.g.,
Birdwhistell 1952; Davis 1997; Dvoretska 2009; Kryger 2010; Lausberg 2011; Scheflen
1973, 1974) and to communicate information (e.g., Cohen and Otterbein 1992; Cook
and Goldin-Meadow 2006; Feyereisen 2006; Holle et al. 2010).
Thus, one rationale for the development of the NEUROGES system has been to
integrate the existing empirical knowledge on movement types for which the link to
specific cognitive, emotional, and interactive functions had already been established,
such as the link between self-touch and stress. These types are suited for diagnostic purposes.
However, as many aspects of the bi-directional link between hand movements and
higher cognitive and emotional functions remain to be explored, the NEUROGES sys-
tem has also been designed to suit exploratory research. First, the anatomy of hand
movement behaviour is examined, such as the duration and the sequencing of certain
types of movement behaviour. And second, the correlations between hand movement
behaviour units and cognitive, emotional, and interactive parameters are investigated
(see below). This procedure implies that hand movements are classified first by kinesic
features alone, i.e., independently of other functions such as speech.

3. Development of the NEUROGES system


First, a critical review of the existing coding systems in gesture research, psychology,
psychosomatics, psychiatry, psychotherapy, neurology, neuropsychology, and anthropol-
ogy was conducted (e.g., M. Davis 1991; Dell 1979; Efron 1941; Ekman and Friesen
1969; Freedman 1972; Kendon 2004; Kimura 1973a,b; Laban 1988; McNeill 1992; Müller
1998). Among these, only a few systems are suited for segmenting the stream of hand
movement behaviour and for classifying all hand movements based on kinesic criteria
alone. With these aims in mind, the most influential coding systems for the development
of the NEUROGES Module I were the coding system by Norbert Freedman (1972), the
Movement Signature Analysis by Davis (1991), and the Laban Movement Analysis
(Laban 1988). For the development of Module III the Efron coding system for gestures
(Efron 1941) was seminal.
Second, as indicated above, hand movement types were listed for which the associ-
ation with cognitive, emotional, and interactive processes had been empirically estab-
lished. Among these were those types that had been explored in the author’s
neuropsychological studies (see Lausberg this volume).
Based on the literature review and the author’s own research, in a process of
repeated testing of the categories and values with the aim to classify all hand move-
ments that occurred in a large sample of subjects, three concerted modules were
conceptualized. Module I (Kinesics) serves to segment and classify the stream of
hand movement behaviour based on kinesic features such as the trajectory and the loca-
tion where the hand acts. Recent studies indicate that the Module I categories reflect
the complexity of cognitive-motor processes and the focus of attention. Module II
(Relation between both hands) codes the relation between the two hands during simul-
taneous moving. The Module II categories reflect interhemispheric coordination and
hemispheric dominance. This module makes it possible to investigate the neurobiology of
certain hand movement types (see Lausberg this volume). Finally, Module III (Cognition/
Emotion) refers to emotional, cognitive, interactive, physical, or practical functions of
hand movements. Thus, the NEUROGES coding system comprises a multi-stage eval-
uation process in which, at each stage, the hand movement behaviour is segmented and
classified, resulting in more and more fine-grained units of behaviour.
To ensure the objectivity of the system, the values were operationalized by kinesic
criteria. Over time, the process of the increasing operationalization including a repeated
testing of the inter-rater agreement resulted in a comprehensive coding manual
(Lausberg forthcoming) with an interactive training CD (Bryjovà, Slöetjes and Laus-
berg forthcoming). In order to structure the evaluation process from module I through
module III, algorithms with several decision steps are provided that lead to the correct
value for the unit (see Figs. 67.2–67.4 below).

4. Description of the NEUROGES coding system


4.1. Module I: Kinesics
4.1.1. The Activation category (Module I step 1)
In the first evaluation step, the stream of hand movement behaviour – right and left
upper limbs, i.e., finger(s), hand, arm, and shoulder – is segmented into Activation
units based on the criteria motion, anti-gravity position, and muscle contraction. Two
Activation values are distinguished:

(i) movement,
(ii) rest position/posture

(Fig. 67.2, step 1). The Activation category provides a general impression of the sub-
ject’s level of arousal.

4.1.2. The Structure category (Module I step 2)


The movement units as identified in step 1 are classified according to the Structure cat-
egory. The Structure of a movement unit is the “kinesic construction” of the movement.
It is defined by the trajectory, by the presence/absence of efforts (Laban 1988), by the
presence/absence of hand shaping, and as meta-criteria by the presence/absence of
phases and by the position of the unit in the segmented behaviour relative to other
units. Five Structure values are distinguished:

(i) irregular,
(ii) repetitive,
(iii) phasic,
(iv) aborted,
(v) shift

(Fig. 67.2, step 2). If the Structure value changes within the given unit, the unit is seg-
mented into subunits (this principle of subunit generation applies to all following cod-
ing steps). The Structure category reflects the level of complexity of the motor
processes. It ranges from continuous non-conceptual (irregular) to discrete conceptual
(phasic).
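The subunit principle described above (splitting a unit wherever the coded value changes) can be sketched in a few lines. The frame-wise label sequence below is a hypothetical illustration, not NEUROGES data:

```python
# Sketch of the subunit principle: if the coded value changes within a unit,
# the unit is split into subunits, one per maximal run of the same value.
from itertools import groupby

def split_into_subunits(values):
    """values: per-frame value labels of one unit; returns (value, n_frames) runs."""
    return [(value, len(list(run))) for value, run in groupby(values)]

# Hypothetical frame-wise Structure labels of one movement unit:
frames = ["irregular"] * 3 + ["phasic"] * 4 + ["irregular"] * 2
print(split_into_subunits(frames))
# [('irregular', 3), ('phasic', 4), ('irregular', 2)]
```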

4.1.3. The Focus category (Module I step 3)


The phasic, repetitive, and irregular units that have been identified in step 2 are further
classified according to the Focus. The Focus is defined as the entity that the hand acts
on. The Focus category is operationalized by the presence/absence of dynamic contact,
the counter-part, and the orientation. Six Focus values are distinguished:

(i) within body,
(ii) on body,
(iii) on attached object,
(iv) on separate object,
(v) on person,
(vi) in space

(Fig. 67.2, step 3). The Focus category reflects the subject’s focus of attention. It ranges
from internal (within body) to external (in space).
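Taken together, the three Module I steps form a decision chain. A minimal sketch follows; the boolean `motion` flag and the pre-assessed `structure` and `focus` fields are simplified, hypothetical stand-ins for the kinesic criteria of the coding manual, not part of NEUROGES itself:

```python
# Simplified sketch of the Module I evaluation chain (Activation -> Structure
# -> Focus); the dict fields are hypothetical stand-ins for kinesic criteria.

def code_module_I(unit):
    # Step 1: Activation (simplified here: motion present -> movement)
    if not unit.get("motion"):
        return ("rest position/posture",)
    # Step 2: Structure, assumed to have been assessed kinesically beforehand
    structure = unit["structure"]
    # Step 3: Focus is coded only for phasic, repetitive, and irregular units
    if structure in {"phasic", "repetitive", "irregular"}:
        return ("movement", structure, unit["focus"])
    return ("movement", structure)      # aborted and shift units get no Focus

print(code_module_I({"motion": False}))
# ('rest position/posture',)
print(code_module_I({"motion": True, "structure": "phasic", "focus": "in space"}))
# ('movement', 'phasic', 'in space')
```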

4.2. Module II: Relation between the hands


4.2.1. The Contact category (Module II step 1)
The temporal overlaps of the right hand and left hand StructureFocus units constitute
the new units for the module II assessment. For these units, in which both hands move
simultaneously, first the Contact category is evaluated. It is operationalized by the pres-
ence/absence of physical contact between the hands and the quality of that contact.
Three Contact values are distinguished:

(i) act as a unit,
(ii) act on each other,
(iii) act apart

(Fig. 67.3, step 1). The Contact category is related to the coordination between the two
hemispheres. It ranges from simple (act as a unit) to complex (act apart).
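Deriving the Module II units as temporal overlaps of the two hands’ StructureFocus units is essentially an interval-intersection step. A minimal sketch, with hypothetical unit boundaries in seconds:

```python
# Sketch: Module II units as the temporal intersections of right-hand and
# left-hand units. Inputs are (start, end) pairs, sorted and non-overlapping
# within each hand, as produced by the Module I segmentation.

def bimanual_units(right, left):
    units = []
    i = j = 0
    while i < len(right) and j < len(left):
        start = max(right[i][0], left[j][0])
        end = min(right[i][1], left[j][1])
        if start < end:                   # both hands move simultaneously
            units.append((start, end))
        if right[i][1] < left[j][1]:      # advance the unit that ends first
            i += 1
        else:
            j += 1
    return units

right_hand = [(0.0, 2.0), (3.0, 5.5)]     # hypothetical unit boundaries (s)
left_hand = [(1.0, 4.0)]
print(bimanual_units(right_hand, left_hand))
# [(1.0, 2.0), (3.0, 4.0)]
```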

4.2.2. The Formal Relation category (Module II step 2)


The phasic and repetitive Contact units are further evaluated concerning the Formal
Relation. This category is operationalized by the criteria dominance and symmetry.
Four Formal Relation values are distinguished:

(i) symmetrical,
(ii) right hand dominance,
(iii) left hand dominance,
(iv) asymmetrical

[Figure: decision tree. Step 1: When does the Activation begin and when does it end? → movement | rest position/posture. Step 2 (only for movement Activation units): What is the Structure? → irregular | repetitive | phasic | aborted | shift. Step 3 (only for phasic, repetitive, and irregular Structure units): What is the Focus? → within body | on body | on attached object | on separate object | on person | in space (except for irregular).]

Fig. 67.2: Coding algorithm for module I

(Fig. 67.3, step 2). The Formal Relation category is related to hemispheric dominance. It
ranges from no suppression (symmetrical) to bilateral suppression (asymmetrical).

4.3. Module III: Cognition/Emotion


4.3.1. The Function category
The phasic and repetitive units and the respective bimanual units are submitted to the
Function analysis. While in Modules I and II the rater is required to focus on specific
kinesic features, in Module III the rater is required to recognize the Function of a hand
movement. The Function is a complex phenomenon that emerges from a cluster of
parameters, which comprises: Structure, Focus, Contact, Formal Relation, body external

[Figure: decision tree. Step 1 (temporal overlaps of right hand and left hand StructureFocus units): How is the Contact between the hands if both move simultaneously? → act as a unit | act on each other | act apart. Step 2 (only for phasic and repetitive Contact units): How is the Formal Relation between the hands? → symmetrical | right hand dominance | left hand dominance | asymmetrical.]

Fig. 67.3: Coding algorithm for module II

space, path during main phase, hand orientation, hand shape, efforts, body involvement,
gaze, cognitive perspective (meta criterion), frequency, and duration.
Eleven Function values are distinguished:

(i) emotion/attitude
(ii) emphasis
(iii) egocentric deictic
(iv) egocentric direction
(v) pantomime
(vi) form presentation
(vii) spatial relation presentation
(viii) motion quality presentation
(ix) emblem

[Figure: decision tree. Step 1 (only for unimanual and bimanual phasic, repetitive, and shift units): What is the Function? → emotion/attitude | emphasis | egocentric deictic | egocentric direction | pantomime | form presentation | spatial relation presentation | motion quality presentation | emblem | object-oriented action | subject-oriented action. Step 2: What is the Type? (specific Types for each Function value; among the Type values are rise, fall, baton, superimposed, clap/beat, shrug, back-toss, palm-out, palming, fist clenching, opening, closing, external target, You, neutral, imperative, self, self-related, body, transitive, intransitive, shape, size, route, position, manner, dynamics).]

Fig. 67.4: Coding algorithm for module III steps 1 and 2

(x) object-oriented action
(xi) subject-oriented action

(Fig. 67.4 step 1, top row). The Function category refers to the emotional, cognitive,
interactive, physical, and practical functions of hand movements. This definition implies
that even those hand movements that are displayed beyond the gesturer’s awareness
have a function. As an example, a seemingly purposeless on body movement may
serve psychodynamically for self-regulation.

4.3.2. The Type category


The Type category directly depends on the Function category (Fig. 67.4, step 2). Thus, each
Function value has its own set of Type values. For emotion/attitude movements, the Type values refer to the
direction of the movement as an embodiment of emotion and attitude. For emphasis
gestures, the Type values refer to the kinesic form used to create emphasis. For
egocentric deictic gestures, the Type values specify the target. For egocentric direction
gestures, the Type values specify the agent who takes the direction. For pantomime ges-
tures, they register transitivity. For form presentation, spatial relation presentation, and
motion presentation, the Type values classify the physical aspects that are presented in
gesture. For emblems, instead of specific Type values, a list of commonly used emblem-
atic gestures is provided.

4.4. NEUROGES in combination with the annotation tool ELAN


It is strongly recommended to use the NEUROGES kinesic coding system in combina-
tion with the annotation tool ELAN (Lausberg and Slöetjes 2009). For this purpose, the
NEUROGES coding system has been transferred into a NEUROGES-ELAN-template.

5. The reliability and the validity of the NEUROGES system


The reliability and the validity of the NEUROGES values were investigated in a meta-
analysis of several recent empirical studies using the NEUROGES system, including
altogether 370 participants (e.g., Dvoretska 2009; Hogrefe et al. under review; Kryger
2010; Lausberg et al. 2007; Sassenberg, Helmich and Lausberg 2010; Sassenberg et al.
2011; Skomroch et al. 2010; Wartenburger et al. 2010). The participants were Germans,
U.S. Americans, francophone Canadians, Koreans and Papua New Guineans. In addi-
tion to healthy participants, participants with brain damage and mental disorders
were examined. Furthermore, neuroimaging and kinematographic studies were con-
ducted to test the NEUROGES values (Helmich et al. 2011; Lausberg et al. 2010;
Rein forthcoming; Wartenburger et al. 2010).

5.1. Reliability
Inter-rater reliability was used to assess the quality of the operationalization of the
NEUROGES values. As the NEUROGES coding procedure comprises segmentation
and classification, the raters’ agreement not only concerns the value that is chosen
for the unit but also the segmentation, i.e., whether the raters agree on when a unit with a
specific value starts and ends, when the next unit starts and ends, etc. Since, thus far, only
statistical measures which refer to the categorial agreement have been available, Holle and
Rein (forthcoming) developed a novel algorithm (a modified Cohen’s kappa) that takes
into account the raters’ agreement concerning behaviour segmentation. This novel
algorithm was applied in the experimental studies that were included in the meta-analysis.
With a few exceptions, for all NEUROGES values the inter-rater agreement was mod-
erate to substantial. Especially with regard to the fact that not only the categorial but also
the temporal agreement was considered, this level of inter-rater agreement indicates an
overall good objectivity of the NEUROGES values.
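The modified kappa of Holle and Rein is more elaborate than standard categorial measures; a minimal frame-wise Cohen’s kappa, which likewise penalizes segmentation disagreement by comparing the two raters’ codings frame by frame, can be sketched as follows (the label sequences are hypothetical):

```python
# Minimal frame-wise Cohen's kappa: both raters' codings are sampled into
# per-frame labels, so disagreement about unit boundaries lowers agreement.
# This is an illustration, not the modified kappa of Holle and Rein.
from collections import Counter

def framewise_kappa(rater_a, rater_b):
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    p_obs = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_exp = sum(freq_a[lab] / n * freq_b[lab] / n
                for lab in set(freq_a) | set(freq_b))
    return (p_obs - p_exp) / (1 - p_exp)

a = ["rest"] * 4 + ["movement"] * 6
b = ["rest"] * 5 + ["movement"] * 5   # rater B sets the boundary one frame later
print(round(framewise_kappa(a, b), 2))
# 0.8
```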
Furthermore, to examine intra-rater retest reliability, the same rater coded the same
videos with a time interval of 2 years. There was substantial agreement, further indicat-
ing a good operationalization of the NEUROGES values.
Kinematography was used to establish parallel-forms reliability. Movement units
that had been assessed by raters with the Module I Structure category criteria were ana-
lysed with the electromagnetic motion capture system Polhemus Liberty© (Colchester,
VT), which records displacement and orientation (Rein forthcoming). The five
Structure values phasic, repetitive, shift, aborted, and irregular (see section 4.1.2) were
reliably distinguished by kinematography and matched the raters’ classifications.

5.2. Exploring the validity of the NEUROGES values


5.2.1. The neurobiology of the NEUROGES values
Several studies have been conducted to examine the brain hemispheres and areas in-
volved in the generation of certain NEUROGES values. The studies on split-brain pa-
tients as well as the studies on healthy subjects, in which hand preference was used as an
indicator of hemispheric specialisation in the generation of certain NEUROGES
values, are reported in detail in Lausberg (this volume). Furthermore, neuroimaging
studies with functional Magnetic Resonance Imaging (fMRI) and Near Infra-Red
Spectroscopy (NIRS) were conducted to investigate the neurobiological correlates
of Module III values. Tool use demonstrations, pantomime–transitive–enclosure
(holding the imaginary object in hand when demonstrating tool use), and pantomime–
transitive–hand-as-object (the hand itself represents the tool) are accompanied by
different cerebral activation patterns (Helmich et al. 2011; Lausberg et al. 2010). While
tool use demonstration with tool in hand was accompanied by bihemispheric activation,
among the pantomime gestures those with the Technique of Presentation enclosure
showed a left hemispheric activation and those with the Technique of Presentation
hand-as-object a right hemisphere activation. In a study in which structural Magnetic
Resonance Imaging was related to hand movement behaviour during a geometric anal-
ogy task, a larger cortical thickness in the left Broca’s area and transverse temporal cortex
was found in participants who produced presentation gestures and, in particular, motion
presentation gestures as compared to those who did not (Wartenburger et al. 2010).
For Module I values, the hemispheric specialisation was investigated in patients with
right hemisphere damage (RBD) and patients with left hemisphere damage (LBD)
(Hogrefe et al. under review; Skomroch et al. 2010). The left hemisphere damage pa-
tients displayed (with their left non-paretic hands) more phasic and repetitive in space
movements than the right hemisphere damage patients (with their right non-paretic
hands). In contrast, the right hemisphere damage group displayed a higher amount of
irregular on body movements than left hemisphere damage patients. The data provide
evidence for different neurobiological correlates of phasic and repetitive in space move-
ments as compared to irregular on body movements.

5.2.2. Cognitive and emotional functions


Individuals scoring high in fluid intelligence showed a higher accuracy in a geometric
analogy task and produced more gestures (as identified in Module III) when relating
to the most relevant aspect of the task (Sassenberg et al. 2011). More specifically, their
gestural behaviour was characterized by a high amount of motion presentation gestures,
the use of which implies a non-egocentric cognitive perspective. Currently, the relation
between intelligence quotient and hand movement behaviour is further investigated in
85 participants.

Indirect evidence for the association between specific NEUROGES values and spe-
cific cognitive and emotional functions is provided by using different experimental set-
tings that challenge the participants differently with regard to these functions. As an
example, an intelligence test may require information retrieval or arithmetic abilities,
whereas the narration of funny animated cartoons without speech may induce joy
and require mental imagery. In the meta-analysis, the pattern of the Structure values
differed between the experiments. Experimental settings that induced a pressure to
perform (prototype: intelligence test) were associated with more irregular and shift units. In
contrast, a high amount of phasic, repetitive, and aborted units and a low frequency of
irregular and shift units were found in experiments that animated the participants (pro-
totype: narration of animated cartoons). The findings indicate a dichotomy between
phasic, repetitive, and aborted units on one hand, and irregular and shift units on the
other with regard to the level of cognitive processing (non-conceptual vs. conceptual).
Likewise, for the Focus category a clear picture emerged of in space dominant and on
body dominant experiments. Experiments that elicited visual imagery were accompanied
by a high rate of in space units. In contrast, stress-inducing experiments were character-
ized by a high frequency of on body units. The finding is in line with the proposition that
the Focus in space offers the most options for expressive demonstrations whereas the
Focus on body serves self-regulation.

5.2.3. Personality traits and psychopathology


Dvoretska (2009) examined the relation between NEUROGES Module I and the per-
sonality inventory NEO-FFI in 40 males and 40 females, who were videotaped during
their dyadic interaction. Neuroticism correlated negatively with the frequency of phasic
and repetitive in space gestures. Furthermore, there was a negative correlation between
the amount of agreeableness and the frequency of on body movements.
Alexithymia, as measured with the Toronto Alexithymia Scale, is the inability to
verbally express emotions. A study on 33 alexithymic subjects (17 female, 16 male) and 33
non-alexithymic ones (17 female, 16 male) evidenced that alexithymia is associated with
gender-specific alterations in hand movement behaviour (Sassenberg, Helmich and
Lausberg 2010). While the alexithymic women showed a reduction of phasic and repet-
itive in space movements as compared to the non-alexithymic women, the reverse was
found for the alexithymic versus non-alexithymic men. Furthermore, more shifts were
displayed by the alexithymic men than by the non-alexithymic men, while the reverse
was found for alexithymic versus non-alexithymic women.
Kryger (2010) examined hand movement behaviour in the course of two successful
psychotherapies in patients with eating disorders. The decrease of irregular on body
movements correlated with clinical improvement. Furthermore, in one patient, the use
of egocentric deictics in the course of the psychotherapy was explored as an indicator of
changes in the relationship to the significant other (Lausberg and Kryger 2011). At the
beginning of therapy the patient had localized her mother in the gesture space close
to the body center by using egocentric deictics; at the end of therapy, she projected her
mother distant from her body in the left gesture space. Psychodynamically, at the begin-
ning of the therapy, the patient hardly differentiated between herself and her mother,
whereas at the end, the patient experienced herself and her mother as separate persons.

5.2.4. NEUROGES and interactive processes


It has long been documented that interaction partners show a temporal attunement
of their verbal utterances (labelled “turn-taking” by Sacks, Schegloff and Jefferson
1974). Accordingly, the kinesic interaction can be investigated with NEUROGES in
terms of turn-taking, i.e., whether the interactive partners’ units are overlapping or
subsequent (Lausberg 2011).
The interactive patterns in three patient-therapist dyads in psychotherapy sessions
were examined with the NEUROGES extended version that includes head, trunk,
and leg/foot movements (Lausberg 2011). In all dyadic interactions, subsequent turns
were more frequent than overlapping turns. If one partner starts to move, she/he most
likely does so when the other partner has finished or is about to finish his/her movement.
However, most importantly, the kinesic turn-taking was not a mere
reflection of verbal turn-taking, since not only gestures that accompany speech were ana-
lyzed but all discrete movements including self-touches, foot movements, positions shifts,
etc. The regularity of the kinesic turn-taking suggests that the interactive partners’
movement units are implicit reactions to each other, i.e., on the kinesic level, there is an
ongoing interaction that is partially independent of the verbal interaction.
The subsequent studies focussed on the kinesic turn-taking patterns for specific
NEUROGES values. The mutual understanding in 40 same-sex dyads (20 male, 20
female) was measured with a self-questionnaire by Denissen (2005) and an observer
rating. In the group with good mutual understanding as compared to the group with
bad mutual understanding, there were more overlapping head movements, fewer
overlapping right hand movements, and fewer overlapping on body movements
(Dvoretska, Denissen and Lausberg submitted). In the above-mentioned study by
Kryger (2010), the kinesic interaction between patient and psychotherapist was also
examined. Psychotherapy sessions with a high proportion of overlapping hand
movements that were of the same Structure (Focus) value, e.g. patient and therapist
simultaneously performed shifts, were associated with a good therapeutic relationship
as assessed by a post-session questionnaire. In contrast, sessions for which the
therapeutic relationship was considered to be less effective showed a higher proportion
of simultaneous hand movements of different Structure (Focus) values, e.g. the patient
performs an irregular on body movement while the therapist performs a phasic in space
movement.

Acknowledgments
The development of the NEUROGES coding system was supported by the German
Research Foundation (DFG) grants LA 1249/1–3.

6. References
Beattie, Geoffrey and Heather Shovelton 2006. When size really matters: How a single semantic
feature is represented in the speech and gesture modalities. Gesture 6(1): 63–84.
Beattie, Geoffrey and Heather Shovelton 2009. An exploration of the other side of semantic com-
munication. How the spontaneous movements of the human hand add crucial meaning to nar-
rative. Semiotica 184(1/4): 33–51.

Berger, Miriam R. 2000. Movement patterns in borderline and narcissistic personality disorders.
Dissertation Abstracts International: Section B: The sciences and engineering 60(9/B): 4875.
Berry, Diane S. and James W. Pennebaker 1993. Nonverbal and verbal emotional expression and
health. Psychotherapy and Psychosomatics 59: 11–19.
Birdwhistell, Ray 1952. Introduction to Kinesics: An Annotation System for the Analysis of Body
Motion and Gesture. Washington, DC: Foreign Service Institute.
Blonder, Lee X., Allan F. Burns, Dawn Bowers, Robert W. Moore and Kenneth M. Heilman 1995.
Spontaneous gestures following right hemisphere infarct. Neuropsychologia 33: 203–213.
Bryjovà, Janka, Han Slöetjes and Hedda Lausberg forthcoming. The interactive NEUROGES
training CD. In: Hedda Lausberg (ed.), NEUROGES – The Neuropsychological Gesture Cod-
ing System. Berlin: Peter Lang.
Butterworth, Brian and Uri Hadar 1989. Gesture, speech, and computational stages: A reply to
McNeill. Psychological Review 96: 168–174.
Cohen, Ronald L. and Nicola Otterbein 1992. The mnemonic effect of speech gestures: Pantomimic
and non-pantomimic gestures compared. European Journal of Cognitive Psychology 4: 113–139.
Cruz, Robin F. 1995. An empirical investigation of the movement psychodiagnostic inventory. Dis-
sertation Abstracts International: Section B: The Sciences and Engineering 57(2/B): 1495.
Darwin, Charles R. 1890. The Expression of the Emotions in Man and Animals. London: Penguin
Group.
Davis, Caroline 1997. Eating disorders and hyperactivity: A psychobiological perspective. Cana-
dian Journal of Psychiatry 42: 168–175.
Davis, Martha 1981. Movement characteristics of hospitalized psychiatric patients. American Jour-
nal of Dance Therapy 4(1): 52–71.
Davis, Martha 1991, rev. 1997. Guide to Movement Analysis Methods. Pittsburgh, PA: Behavioral
Measurement Database Services.
Dell, Cecily 1979. A Primer for Movement Description Using Effort-Shape and Supplementary
Concepts. New York: Dance Notation Bureau.
Denissen, Jacobus J. A. 2005. Understanding and being understood: The impact of intelligence and
dispositional valuations on social relationships. Ph.D. dissertation, Humboldt-University Berlin.
De Ruiter, Jan-Peter 2000. The production of speech and gesture. In: David McNeill (ed.), Lan-
guage and Gesture, 284–311. Cambridge: Cambridge University Press.
De’Sperati, Claudio and Natale Stucchi 2000. Motor imagery and visual event recognition.
Experimental Brain Research 133: 273–278.
Duffy, Robert J. and Joseph R. Duffy 1989. An investigation of body part as object (BPO) re-
sponses in normal and brain-damaged adults. Brain and Cognition 10: 220–236.
Dvoretska, Daniela 2009. Kinetische Interaktion und Koordination. Diploma thesis, Department of
Psychology, Department of Mathematics and Natural Sciences II, Humboldt-University Berlin.
Dvoretska, Daniela, Jaap Denissen and Hedda Lausberg submitted. Intra-dyadic kinesic turn-taking
and mutual understanding.
Efron, David 1941. Gesture and Culture. The Hague: Mouton.
Ehrlich, Stacy B., Susan C. Levine and Susan Goldin-Meadow 2006. The importance of gesture in
children’s spatial reasoning. Developmental Psychology 42(6): 1259–1268.
Ekman, Paul and Wallace V. Friesen 1969. The repertoire of nonverbal behaviour: Categories, ori-
gins, usage, and coding. Semiotica 1: 49–99.
Ekman, Paul and Wallace V. Friesen 1974. Nonverbal behavior and psychopathology. In: Raymond
J. Friedman and Martin M. Katz (eds.), The Psychology of Depression, 203–232. New York:
John Wiley and Sons.
Ellgring, Heiner 1986. Nonverbal expression of psychological states in psychiatric patients. Euro-
pean Archives of Psychiatry and Neurological Sciences 236: 31–34.
Emmorey, Karen, Barbara Tversky and Holly A. Taylor 2000. Using space to describe space: Per-
spective in speech, sign, and gesture. Spatial Cognition and Computation 2: 157–180.

Feyereisen, Pierre 2006. Further investigation on the mnemonic effect of gestures: Their meaning
matters. European Journal of Cognitive Psychology 18(2): 185–205.
Foundas, Anne L., Beth L. Macauley, Anastasia M. Raymer, Lynn M. Mahler, Kenneth M. Heil-
man and Leslie J. G. Rothi 1995. Gesture laterality in aphasic and apraxic stroke patients.
Brain and Cognition 29: 204–213.
Freedman, Norbert 1972. The analysis of movement behavior during the clinical interview. In:
Aron W. Siegman and Benjamin Pope (eds.), Studies in Dyadic Communication, 153–175.
New York: Pergamon.
Freedman, Norbert and Wilma Bucci 1981. On kinetic filtering in associative monologue. Semio-
tica 34(3/4): 225–249.
Freedman, Norbert and Stanley P. Hoffmann 1967. Kinetic behaviour in altered clinical states:
Approach to objective analysis of motor behaviour during clinical interviews. Perceptual and
Motor Skills 24: 527–539.
Fricke, Ellen 2007. Origo, Geste und Raum. Berlin: Walter de Gruyter.
Gaebel, Wolfgang 1992. Non-verbal behavioural dysfunction in schizophrenia. British Journal of
Psychiatry 161(suppl.18): 65–74.
Garber, Philip and Susan Goldin-Meadow 2002. Gesture offers insight into problem-solving in
adults and children. Cognitive Science 26: 817–831.
Goldin-Meadow, Susan 2006. Talking and thinking with our hands. Current Directions in Psycho-
logical Science 15(1): 34–39.
Haaland, Kathleen Y. and David Flaherty 1984. The different types of limb apraxia errors made
by patients with left vs. right hemisphere damage. Brain and Cognition 3: 370–384.
Helmich, Ingo, Robert Rein, Henning Holle, Christoph Schmitz and Hedda Lausberg 2011. Brain
oxygenation in gesture production. Differences between tool use demonstration, tool use panto-
mimes and body-part-as-object. XI International Conference on Cognitive Neuroscience,
ICON XI Mallorca – Spain, 25–29 Sept. 2011.
Hermsdörfer, Joachim, Norbert Mai, Josef Spatt, Christian Marquardt, Roland Veltkamp and Georg
Goldenberg 1996. Kinematic analysis of movement imitation in apraxia. Brain 119: 1575–1586.
Hogrefe, Katharina, Georg Goldenberg, Robert Rein, Harald Skomroch and Hedda Lausberg
under review. The impact of right and left hemisphere brain damage on gesture kinetics and
gesture location.
Holle, Henning, Jonas Obleser, Shirley-Ann Rueschemeyer and Thomas C. Gunter 2010. Integration of iconic gestures and speech in left superior temporal areas boosts speech comprehension under adverse listening conditions. NeuroImage 49: 875–884.
Holle, Henning and Robert Rein forthcoming. Assessing interrater agreement of movement an-
notations. In: Hedda Lausberg (ed.), NEUROGES – The Neuropsychological Gesture Coding
System. Berlin: Peter Lang.
Kendon, Adam 2004. Gesture: Visible Action as Utterance. Cambridge: Cambridge University Press.
Kimura, Doreen 1973a. Manual activity during speaking – I. Right-handers. Neuropsychologia 11:
45–50.
Kimura, Doreen 1973b. Manual activity during speaking – II. Left-handers. Neuropsychologia 11: 51–55.
Kita, Sotaro 2000. How representational gestures help speaking. In: David McNeill (ed.), Lan-
guage and Gesture: Window into Thought and Action, 162–185. Cambridge: Cambridge Univer-
sity Press.
Kita, Sotaro and Asli Özyürek 2003. What does cross-linguistic variation in semantic coordination
of speech and gesture reveal? Evidence for an interface representation of spatial thinking and
speaking. Journal of Memory and Language 48: 16–32.
Krauss, Robert M., Yihsiu Chen and Purnima Chawla 1996. Nonverbal behavior and nonverbal
communication: What do conversational hand gestures tell us? In: Mark P. Zanna (ed.), Ad-
vances in Experimental Social Psychology 28, 389–450. Tampa, FL: Academic Press.
Krout, Maurice H. 1935. Autistic gestures. Psychological Monographs 46(4): i–126.
1036 V. Methods

Kryger, Monika 2010. Bewegungsverhalten von Patient und Therapeut in als gut und schlecht er-
lebten Therapiesitzungen. Diploma thesis, Department of Neurology, Psychosomatic Medicine,
and Psychiatry; Institute of health promotion and clinical movement science, German Sport
University Cologne.
Laban, Rudolf 1988. The Mastery of Movement. Worcester: Billing & Sons.
Lausberg, Hedda 2011. Das Gespräch zwischen Arzt und Patientin: Die bewegungsanalytische
Perspektive. Balint Journal 12: 15–24.
Lausberg, Hedda to appear. NEUROGES – The Neuropsychological Gesture Coding System.
Berlin: Peter Lang.
Lausberg, Hedda, Henning Holle, Philipp Kazzer, Hauke Heekeren and Isabell Wartenburger
2010. Differential cortical mechanisms underlying tool use, pantomime, and body-part-as-
object use. Abstract book, 16th Annual Meeting of the Organization for Human Brain Mapping
Barcelona, June 2010.
Lausberg, Hedda and Sotaro Kita 2003. The content of the message influences the hand choice in
co-speech gestures and in gesturing without speaking. Brain and Language 86: 57–69.
Lausberg, Hedda and Monika Kryger 2011. Gestisches Verhalten als Indikator therapeutischer
Prozesse in der verbalen Psychotherapie: Zur Funktion der Selbstberührungen und zur Reprä-
sentation von Objektbeziehungen in gestischen Darstellungen. Psychotherapie-Wissenschaft
1(1): 41–55.
Lausberg, Hedda and Han Sloetjes 2009. Coding gestural behaviour with the NEUROGES-ELAN system. Behavior Research Methods 41(3): 841–849.
Lausberg, Hedda, Eran Zaidel, Robyn F. Cruz and Alain Ptito 2007. Speech-independent produc-
tion of communicative gestures: Evidence from patients with complete callosal disconnection.
Neuropsychologia 45: 3092–3104.
Lavergne, Joanne and Doreen Kimura 1987. Hand movement asymmetry during speech: No effect
of speaking topic. Neuropsychologia 25: 689–963.
Le May, Amanda, Rachel David and Andrew P. Thomas 1988. The use of spontaneous gesture by
aphasic patients. Aphasiology 2(2): 137–145.
Liepmann, Hugo 1908. Drei Aufsätze aus dem Apraxiegebiet. Berlin: Karger.
Mahl, George F. 1968. Gestures and body movements in interviews. Research in Psychotherapy 3:
295–346.
McNeill, David 1992. Hand and Mind. What Gestures Reveal about Thought. Chicago: University
of Chicago Press.
McNeill, David 2005. Gesture and Thought. Chicago: University of Chicago Press.
Müller, Cornelia 1998. Redebegleitende Gesten. Kulturgeschichte – Theorie – Sprachvergleich. Ber-
lin: Arno Spitz.
Ochipa, Cynthia, Leslie J. G. Rothi and Kenneth M. Heilman 1994. Conduction apraxia. Journal
of Neurology, Neurosurgery, and Psychiatry 57: 1241–1244.
Parsons, Lawrence M., John D. E. Gabrieli, Elizabeth A. Phelps and Michael S. Gazzaniga 1998.
Cerebrally lateralized mental representations of hand shape and movement. Journal of Neu-
roscience 18: 6539–6548.
Poizner, Howard, Linda Mack, Mieka Verfaellie, Leslie J. G. Rothi and Kenneth M. Heilman
1990. Three-dimensional computergraphic analysis of apraxia. Brain 113: 85–101.
Rein, Robert forthcoming. Using 3D kinematics for segmentation of hand movement behavior:
A pilot study and some further suggestions. In: Hedda Lausberg (ed.), NEUROGES – The
Neuropsychological Gesture Coding System. Berlin: Peter Lang.
Sacks, Harvey, Emanuel A. Schegloff and Gail Jefferson 1974. A simplest systematics for the or-
ganisation of turn taking for conversation. Language 50: 696–735.
Sainsbury, Peter 1954. A method of measuring spontaneous movements by time-sampling motion
picture. Journal of Mental Science 100a: 742–748.
Sassenberg, Uta, Manja Foth, Isabell Wartenburger and Elke van der Meer 2011. Show your hands –
are you really clever? Reasoning, gesture production, and intelligence. Linguistics 49(1): 105–134.
Sassenberg, Uta, Ingo Helmich and Hedda Lausberg 2010. Awareness of emotions: Movement behaviour as indicator of implicit emotional processes in participants with and without alexithymia. In: Haack, Wiese, Abraham and Chiarcos (eds.), Proceedings of KogWis: 10th Biannual Meeting of the German Society for Cognitive Science, 169. Potsdam, Germany.
Scheflen, Albert E. 1973. Communicational Structure: Analysis of a Psychotherapy Transaction.
Bloomington: Indiana University Press.
Scheflen, Albert E. 1974. How Behaviour Means. New York: Anchor/Doubleday.
Seyfeddinipur, Mandana, Sotaro Kita and Peter Indefrey 2008. How speakers interrupt them-
selves in managing problems in speaking: Evidence for self-repairs. Cognition 108: 837–842.
Sirigu, Angela, Jean-Rene Duhamel, Laurent Cohen, Bernard Pillon, Bruno Dubois and Yves
Agid 1996. The mental representation of hand movements after parietal cortex damage.
Science 273: 1564–1568.
Skomroch, Harald, Robert Rein, Katharina Hogrefe, Georg Goldenberg and Hedda Lausberg
2010. Gesture production in the right and left hemispheres during narration of short movies.
Conference Proceedings, International Society for Gesture Studies Frankfurt/Oder, Germany,
25–30 July 2010.
Ulrich, Gerald 1977. Videoanalytische Methoden zur Erfassung averbaler Verhaltensparameter
bei depressiven Syndromen. Pharmakopsychiatrie 10: 176–182.
Ulrich, Gerald and K. Harms 1985. Video analysis of the non-verbal behaviour of depressed pa-
tients before and after treatment. Journal of Affective Disorders 9: 63–67.
Wagner Cook, Susan and Susan Goldin-Meadow 2006. The role of gesture in learning: Do children
use their hand to change their minds? Journal of Cognition and Development 7(2): 211–232.
Wallbott, Harald G. 1989. Movement quality changes in psychopathological disorders. In: B. Kirkaldy (ed.), Normalities and Abnormalities in Human Movement. Medicine and Sport Science 29: 128–146.
Wartenburger, Isabell, Esther Kühn, Uta Sassenberg, Manja Foth, Elizabeth A. Franz and Elke
van der Meer 2010. On the relationship between fluid intelligence, gesture production, and
brain structure. Intelligence 38: 193–201.
Willke, S. 1995. Die therapeutische Beziehung in psychoanalytisch orientierten Anamnesen und
Psychotherapien mit neurotischen, psychosomatischen und psychiatrischen Patienten. DFG-
Bericht Wi 1213/1–1.

Hedda Lausberg, Cologne (Germany)

68. Transcription systems for gestures, speech, prosody, postures, and gaze
1. Introduction
2. Transcription systems for gesture
3. Transcription systems for speech
4. Transcription of prosody
5. Transcription of body posture
6. Transcription systems for gaze
7. Conclusion
8. References

Abstract
The chapter presents a concise overview of existing transcription systems for gestures, speech, prosody, postures, and gaze, presenting their theoretical backgrounds and methodological approaches. After a short introduction discussing the understanding of the term “transcription”, the article first focuses on transcription systems in modern gesture research, discussing systems from the fields of gesture research, nonverbal communication, conversation analysis, and artificial agents (e.g., Birdwhistell 1970; Sager 2001; Martell 2002; Bressem this volume; Gut et al. 2002; Kipp, Neff, and Albrecht 2007; Lausberg and Sloetjes 2009; HIAT 1, GAT). Afterwards, the paper presents well-known transcription systems for speech from the fields of linguistics, conversation analysis, and discourse analysis (e.g., IPA, HIAT, CHAT, DT, GAT). Apart from systems for describing speech, the article also addresses systems for the transcription of prosody, discussing prosodic descriptions within the fields of conversation analysis, discourse analysis, and linguistics (HIAT 2, GAT 2, TSM, ToBI, PROLAB, INTSINT, SAMPROSA). The last sections of the paper focus on the transcription of body posture and gaze (e.g., Birdwhistell 1970; Ehlich and Rehbein 1982; Goodwin 1981; Schöps in preparation; Wallbott 1998).

1. Introduction
The term transcription goes back to the Latin word trānscrībere, meaning ‘to overwrite’ or ‘to rewrite’ (Bußmann 1990: 187). In a linguistic understanding, transcription refers to the notation of spoken language in written form. More specifically, it is understood as the reproduction of communicative events using alphabetic resources and specific symbols while capturing the characteristics and specifics of spoken language (Dittmar 2004). Transcription must be understood as a scientific working method that serves the analytical needs of the researcher by freezing oral communication and making it accessible to thorough inspection (Redder 2001: 1038; see also Bohle this volume for a general discussion). Today, however, transcription is no longer restricted to linguistics. Investigating interaction and communication is part of a number of scientific disciplines, such as ethnography, sociology, psychology, neurology, and biology, which face comparable problems and obstacles in making communicative behavior analyzable. Accordingly, the term transcription is no longer restricted to the notation of spoken language, but also includes the notation of bodily behavior, such as gesture, posture, and gaze.

2. Transcription systems for gesture


Research on gestures is characterized by a variety of transcription systems developed within a range of differing disciplines. The systems’ foci vary immensely in their theoretical and methodological perspectives, ranging from a pure description of form, through a combination of form and possible meanings, to a rudimentary description of gestures. The following section discusses transcription systems from the fields of gesture research, nonverbal communication, conversation analysis, and artificial agents along the lines of this threefold distinction.
2.1. Systems focusing on gestures’ form


2.1.1. Birdwhistell: A kinesic notation of bodily motion
Starting from the assumption that the kinesic structure of bodily motion is set up in parallel to the linguistic structure of spoken language, Birdwhistell developed a notation scheme which accounts for the hierarchical set-up of bodily motion by distinguishing a microkinesic and a macrokinesic level of notation (Birdwhistell 1970). While the microkinesic level accounts for all forms of bodily motion, the macrokinesic level focuses on the “meaningful” variants of bodily motion within a particular culture (Birdwhistell 1970: 290). Based on this theoretical assumption, the notational system is divided into eight major sections:

(i) total head,
(ii) face,
(iii) trunk,
(iv) shoulder/arm/wrist,
(v) hand and finger activity,
(vi) hip, leg, ankle,
(vii) foot activity, walking, and
(viii) neck.

For each of these sections, the notational system includes a “basic notational logic” (Birdwhistell 1970: 258), which is combined with indicators to capture the differing variants producible by the eight sections of the body. The description is based on articulatory aspects, such as muscular tension as well as the joints of articulation. Altogether, the system offers 400 signs for the description of bodily motion. It includes descriptions of hand and finger activities and bi-manual gestures, yet only rarely includes the notation of movement.

2.1.2. Bernese coding system: The alphabet of body language


Criticizing the functional orientation in the notation and analysis of nonverbal behavior, Frey et al. (1981) and Frey, Hirsbrunner, and Jorns (1982) developed an Alphabet of Body Language. Using a time-course notation and a frame-by-frame procedure, the coding scheme breaks bodily movement down into temporal and spatial components. Altogether, Frey et al. (1981) distinguish 104 movement dimensions for the following body parts: head, face, shoulders, trunk, upper arms, hands, pelvis, thighs, and feet. The body parts are coded according to various dimensions, e.g., sagittal, rotational, or lateral movement of the head and the degree of twisting. For the coding of hand shapes, the scheme distinguishes the position and orientation of the palm of the hand. The notation of movement is included in the time-course notation, but is not coded separately. The system pursues a taxonomic perspective and thus only includes as many dimensions as can be distinguished by the coders “in an objective and reliable way” (Frey et al. 1981: 219). Articulatory variants are not included.
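The frame-by-frame logic of such a time-course notation can be sketched as follows (a minimal illustration only: the dimension names and codes are invented for this example and are not the official 104 movement dimensions of the Bernese system):

```python
# Illustrative sketch of a frame-by-frame time-course notation.
# Dimension names ("head_sagittal", "hand_position") and codes are hypothetical.

def movement_frames(time_course):
    """Return the frame indices at which any coded dimension changes value."""
    frames = sorted(time_course)
    changed = []
    for prev, cur in zip(frames, frames[1:]):
        if time_course[prev] != time_course[cur]:
            changed.append(cur)
    return changed

# One code tuple per video frame: (head_sagittal, hand_position)
coding = {
    0: ("neutral", "rest"),
    1: ("neutral", "rest"),
    2: ("forward", "rest"),    # head tilts forward
    3: ("forward", "raised"),  # hand is raised
    4: ("forward", "raised"),
}

print(movement_frames(coding))  # → [2, 3]
```

Because movement is recovered from changes between frames rather than coded as an event of its own, this mirrors the scheme’s property that movement is contained in the time course but not notated separately.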
2.1.3. Sager: A notational system for gestures’ form


Similar to the Facial Action Coding System (FACS) (Ekman and Friesen 1978; Hager, Ekman, and Friesen 2002), Sager aims at a standardized, detailed, and objective system for describing gestures (Sager 2001; Sager and Bührig 2005). The notational system includes three major aspects:

(i) the temporal structure of gestures,
(ii) the quality of movement, and
(iii) the description of “Signifikanzpunkte”, that is, semiotically significant points of movement (Sager 2001: 26).

The temporal structure of gestures is described according to the beginnings and endings of movement. Assuming only a restricted number of possible movements for the production of gestures, the direction of movement is described as being horizontal, vertical, or diagonal, for instance. Moreover, the system distinguishes differences in the quality of movements, differentiating movements as slow, fast, or discontinuous (Sager 2001: 28–29). Signifikanzpunkte are described on the basis of two principles. The principle of the center of rotation records the position of arms and hands through the centers of rotation allowing the movement (shoulder, upper arm, elbow, wrist). The principle of body levels registers movements in the various centers of rotation relative to three body levels (vertical axis, sagittal axis, transversal axis), allowing for different degrees of freedom of movement (e.g., pronation or supination of the hand). Apart from the position and movement of hands and arms, the system includes a description of hand shapes. Seven communicatively relevant types of hand configurations, derivable from two types of movements of the hand, are differentiated (e.g., the cupped hand) (Sager 2001: 41–42).

2.1.4. FORM: An automated form-based description of gestures


FORM (Martell 2002; Martell and Kroll n.d.), an annotation scheme designed to describe the kinematic information in gesture, aims at the development of “something like a ‘phonetics’ of gesture that will be useful for both building better HCI [human computer interaction] systems and doing fundamental scientific research into the communicative process” (Martell 2005: v). The description of gestures in FORM is based on anatomical criteria, which are represented as a series of tracks capturing different aspects of the gestural space. All in all, FORM distinguishes five tracks:

(i) excursion duration, for the beginning and end of gestures,
(ii) upper arm and
(iii) lower arm, recording the position, lift, and direction of the arm (location track) as well as the planes, types, and efforts of movements and particular aspects of the temporal structure of gestures (movement track),
(iv) hand and wrist, for its shape, movement, and the differentiation between one- and two-handed gestures, and
(v) torso, for recording its movement and orientation.
FORM has a strictly hierarchical and technical set-up, owing to its design for computational processing and its applicability in research on artificial agents. The system is designed for use with the annotation program ANVIL (Kipp 2004).

2.1.5. Bressem: A linguistic perspective on the notation of gestures’ forms


Bressem’s system is developed within a linguistic approach to gestures, assuming a separation of form and function in the analytical process (Müller 1998, 2010; Fricke 2007, 2012; Bressem and Ladewig 2011; Ladewig and Bressem forthcoming; Müller, Bressem and Ladewig this volume). Against the background of the four-feature scheme (Becker 2004), the system grounds the description of gestures in the four parameters of sign language (Battison 1974; Klima and Bellugi 1979; Stokoe 1960), assuming a potential significance of all four parameters for the creation of gestural meaning. This theoretical focus is reflected in the set-up of the system, as gestures are described in terms of their hand shape, position, movement (Stokoe 1960), and orientation (Battison 1974). The system focuses on a description of the hand, leaving anatomical descriptions of the arms aside. Developed within the context of a corpus-linguistic study investigating forms of gestures (Ladewig and Bressem forthcoming), the system advocates a differentiation between an articulatory and a taxonomic description of gestures’ forms.
The system draws on existing notational conventions from the fields of sign language and gesture studies. The description of hand configurations is based on HamNoSys (Prillwitz et al. 1989), yet with an articulatory rather than a taxonomic focus, as it aims at a description of all possible hand shapes by describing the configurations of the hand and fingers separately. The notation of the hand’s orientation is based on the distinction made by McNeill (1992), with slight adaptations to incorporate form variants. Movement is split into:

(i) type of movement,
(ii) direction of movement, and
(iii) quality of movement.

For the positions of the hand, the notational system draws on the gesture space introduced by McNeill, which divides the gesture space “into sectors using a system of concentric squares” (McNeill 1992: 86) (see Bressem this volume for the notational system).
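The kind of form-only record such a system produces can be sketched as a simple data structure (an illustration only: the field values and the example gesture are hypothetical and do not reproduce Bressem’s actual notation conventions):

```python
# Illustrative sketch of a form-based annotation record built around the four
# sign-language parameters adopted by Bressem's system. All values are hypothetical.
from dataclasses import dataclass

@dataclass
class GestureForm:
    hand_shape: str           # e.g., "flat hand", "fist"
    orientation: str          # e.g., "palm up"
    movement_type: str        # movement is split into type,
    movement_direction: str   # direction,
    movement_quality: str     # and quality
    position: str             # sector of the gesture space

g = GestureForm("flat hand", "palm up", "straight", "away from body",
                "reduced", "center")
print(g.hand_shape, g.position)  # → flat hand center
```

Note that the record contains no field for meaning or function: the separation of form description from functional interpretation is precisely the point of the approach.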

2.2. Systems focusing on gestures’ form and function


2.2.1. McNeill: A psycho-linguistic perspective on the transcription of gestures
McNeill’s coding scheme is designed as a “guide to gesture classification, transcription, and distribution” (McNeill 1992: 75). It was developed in the light of a psycholinguistic perspective on gestures, in which gestures’ forms are perceived as an expression and manifestation of cognitive processes, reproducing underlying “imagery” (McNeill 2005) and giving direct insight into cognitive processes as a “window onto thinking” (McNeill and Duncan 2000: 14). The form of the gesture is viewed as inseparable from its imagery, through which meaning is materialized.
McNeill’s system uses the written speech transcription as the basis for coding gestures. Gestures are annotated in the speech transcription by inserting brackets for the beginning and end of a gesture. The scheme includes the description of hand configuration, orientation, position in gesture space, and movement. The gesture space consists of a system of concentric squares dividing the space in front of the speaker into three basic areas (center, periphery, and extreme periphery) (see McNeill 1992: 89). Hand configurations are based on the labeling of hand shapes in American Sign Language (ASL), using the “ASL shape that the gesture mostly resembles” (McNeill 1992: 86). The orientation of the hand is coded according to the gesture space and palm orientation. Gestural movements, such as shape, direction, and trajectory, are accounted for in a descriptive fashion without providing strict guidelines.
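The bracketing convention can be sketched as follows (a minimal illustration only: the example utterance and the word indices marking the gesture span are invented):

```python
# Illustrative sketch of McNeill-style bracketing: the gesture's beginning and
# end are marked directly in the speech transcript. Utterance and indices invented.

def bracket_gesture(words, start, end):
    """Mark a gesture that begins at word `start` and ends after word `end`."""
    out = list(words)
    out[start] = "[" + out[start]
    out[end] = out[end] + "]"
    return " ".join(out)

words = "and he climbs up the pipe".split()
print(bracket_gesture(words, 2, 5))  # → and he [climbs up the pipe]
```

The hand configuration, orientation, position, and movement description would then be attached to the bracketed stretch rather than coded on an independent timeline.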

2.2.2. CoGest: A linguistic perspective on the transcription of form, meaning, and function of gestures
The Conversational Gesture Transcription system (CoGest) pursues the goal of providing a transcription “system of linguistically motivated categories for gestures and a practical machine and human readable annotation scheme with simple and complex symbols for simple and complex categories” (Gut et al. 2002: 3). The system focuses on “linguistically relevant gestural forms motivated by the functions of gestures within multimodal conversations” (Gut et al. 2002: 3). CoGest is based on the theoretical assumption that the patterning of gestures is organized to a large extent in ways similar to speech. The system thus distinguishes a form-based and a functional description, while assuming morphological and syntactic rules for structural and sequential combinations of gestures (Gut et al. 2002: 4). By distinguishing different levels of transcription (“compulsory basic” and “additional optional categories”), CoGest offers different depths of transcription and coding.
CoGest includes the coding of:

(i) gesture phases (Kendon 1980),
(ii) hand configuration, movement, position,
(iii) the combination of gestures into complex units, and
(iv) a functional classification of gestures.

Configurations of the hand are described with the taxonomic notation systems HamNoSys (Prillwitz et al. 1989) and FORM (Martell 2002). Movements are coded for shape, direction, and modifiers, that is, size, speed, and number of repetitions. In addition, the symmetry of the hands is coded. For the combination of gestures into complex units, CoGest distinguishes sequences of precedence and overlap (Gut et al. 2002: 6). The functional classification of gestures is based on a four-part classification distinguishing various degrees of overlap between gestural and verbal meaning.

2.2.3. Kipp, Neff, and Albrecht (2007): A transcription and coding scheme for the
automatic generation and animation of character-specific hand/arm gestures
Offering a scheme developed “for the specific purpose of automatically generating and animating character-specific hand/arm gestures, but with potential general value” (Kipp, Neff, and Albrecht 2007: 1), the scheme operates on the concept of a gesture lexicon made up of lexemes, that is, “prototypes of recurring gesture patterns where certain formational features remain constant over instances and need not be annotated for every single occurrence” (Kipp, Neff, and Albrecht 2007: 4).
The scheme is implemented and used within the ANVIL annotation tool (Kipp 2004) and consists of adding annotation elements to a track, in which each element is described with a pre-assigned set of attributes capturing the most essential parts of a gesture. The annotation scheme includes the spatial form of gestures, in which gesture phases, phrases, and units (Kendon 1980, 2004) are described along with handedness, path of movement, position, as well as hand shape and distance of the hands. Hand shapes are coded using a taxonomic classification of nine types of configurations. A gesture’s membership in a lexical category is determined by the lexeme, which defines the hand shape, palm orientation, and exact trajectory. Typical lexemes include: raised index finger, cup (open hand), finger ring, or progressive (circular movement) (Kipp, Neff, and Albrecht 2007: 14). In a final step, the relation of speech and gesture is captured.
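The lexicon idea, i.e., that a lexeme fixes the recurring formational features so they need not be re-annotated for every occurrence, can be sketched as follows (an illustration only: the feature names and values are hypothetical simplifications, not the published lexeme definitions):

```python
# Illustrative sketch of a gesture lexicon in the spirit of Kipp, Neff, and
# Albrecht (2007). Lexeme names follow the examples in the text; feature
# names and values are hypothetical.

LEXICON = {
    "raised_index_finger": {"hand_shape": "index stretched", "palm": "away"},
    "cup": {"hand_shape": "open hand", "palm": "up"},
    "progressive": {"trajectory": "circular"},
}

def annotate(lexeme, **occurrence_specific):
    """An occurrence stores its lexeme's constant features plus only the
    occurrence-specific attributes (e.g., handedness, position)."""
    return {"lexeme": lexeme, **LEXICON[lexeme], **occurrence_specific}

token = annotate("cup", handedness="right", position="center")
print(token["hand_shape"], token["handedness"])  # → open hand right
```

The design choice this illustrates is economy of annotation: constant formational features live in the lexicon, while each annotated token records only what varies between instances.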

2.2.4. Zwitserlood, Özyürek, and Perniss (2008): A cross-linguistic annotation scheme for signs and gestures
The system by Zwitserlood, Özyürek, and Perniss (2008) is a cross-modal and cross-linguistic annotation scheme for the coding of sign language and gesture. It is based
on a number of existing coding schemes for sign language and gestures and pursues a
twofold description and analysis: a descriptive level (description of phonetic and phono-
logical form) and an analytic level (interpretation and analysis) (Zwitserlood, Özyürek,
and Perniss 2008: 186–187). At the descriptive level, manual elements, such as position,
action, and shape of each hand are annotated. The description of hand configurations
is based on HamNoSys (Prillwitz et al. 1989) with some additions and combined with
the description of the orientation of the hand. Furthermore, non-manual elements
such as body position, eye gaze, and facial expression are described. At the analytic
level, an interpretation of the sign or gesture and other non-verbal information is
given (Zwitserlood, Özyürek, and Perniss 2008: 187–189).

2.2.5. NEUROGES: A neurological annotation scheme for gestures


NEUROGES (Lausberg and Sloetjes 2009), an annotation scheme developed for use with the annotation software ELAN, pursues a neurological perspective. NEUROGES assumes that “main kinetic and functional gesture categories are differentially associated with specific cognitive (spatial cognition, language, praxis), emotional, and interactive functions” (Lausberg and Sloetjes 2009: 1). The scheme implies that different gesture categories may be generated in different brain areas. NEUROGES is composed of three modules:

(i) kinetic gesture coding,
(ii) bimanual relation coding, and
(iii) functional gesture coding.

Module (i) refers to the kinetic features of a hand movement, i.e., execution of movement vs. no movement, trajectory and dynamics of movement, location of acting, and contact with the body or not. For the characterization of the dynamic aspects of movements, NEUROGES uses Laban notation (1950). Module (ii) allows for the coding of the bimanual relation (for instance, in touch vs. separate, symmetrical vs. complementary, independent vs. dominance). Module (iii) brings in the functional aspects and determines the meaning of gestures based on a specific combination of kinetic features (hand shape, orientation, path of movement, effort, and others), which define the various gesture types.
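The three modules can be sketched as one combined annotation record (an illustration only: the category labels are abbreviated examples, not the full NEUROGES value sets):

```python
# Illustrative sketch of the three NEUROGES modules as a single annotation
# record for one hand movement. Labels are hypothetical examples.

annotation = {
    "kinetic": {                      # Module (i): kinetic features
        "movement": True,             # execution of movement vs. no movement
        "trajectory": "arc",          # trajectory and dynamics
        "contact_with_body": False,   # contact with body or not
    },
    "bimanual": "symmetrical",        # Module (ii): relation between the hands
    "function": "deictic",            # Module (iii): functional gesture type
}

print(sorted(annotation))  # → ['bimanual', 'function', 'kinetic']
```

Keeping the kinetic, bimanual, and functional layers as separate modules reflects the scheme’s premise that these aspects can be coded independently and may dissociate neurologically.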

2.3. Systems including a rudimentary gesture coding


A rather large portion of transcription systems, mainly from the fields of nonverbal communication and conversation analysis, shares a perspective on the transcription and coding of gestures that is characterized by the primacy of the verbal modality (Sager and Bührig 2005). Accordingly, gestures, and even nonverbal behavior in general, are described only rudimentarily, in form as well as function, and only insofar as they are functionally relevant to the meaning expressed in the verbal modality. This section discusses three systems as examples of this notational tradition. For further examples see, for instance, Brinker and Sager (1989), Gumperz and Berenz (1993), Kallmeyer and Schmitt (1996), Schmitt (2007), Schönherr (1997), and Weinrich (1992).

2.3.1. HIAT 1: A discourse analytic perspective on gestures


The discourse-analytic transcription system Halbinterpretative Arbeitstranskriptionen (HIAT, Ehlich and Rehbein 1976, 1979a, 1979b, 1982) approaches the transcription of gestures with a focus on the expressional repertoire (Ehlich and Rehbein 1979a: 313). Gestures are transcribed symbolically through words or predications from everyday speech. Designations can refer

(i) only partly to specific elements of the movement potential (e.g., raising a hand),
(ii) to the expressional quality, including the movement potential, and
(iii) to the summary of complex movements and actions (e.g., waving).

The transcription furthermore includes a rough record of the on- and offsets as well as the length of movements, and only a rudimentary description of form or function. The relevance of the gestural component and its transcription is thereby always dependent on its relevance for the verbal communication (Ehlich and Rehbein 1979a: 315). The verbal modality is the constitutive background for the transcription of gestures, so that gestures are transcribed on commentary lines dependent on the verbal utterance.

2.3.2. Jefferson and Gesprächsanalytisches Transkriptionssystem (GAT): A conversational analytic perspective on gestures
The transcription system proposed by Jefferson (1984) offers a perspective quite similar to HIAT’s. It also uses symbolization for the transcription of gestures and other kinds of kinesic behavior. Bodily behavior is noted in commentary lines dependent on the verbal utterance, including a rudimentary coding of the on- and offsets of movement sequences. Yet bodily behavior is only of interest if it obviously influences the verbal and communicative orientations of the speakers and addressees.
The Gesprächsanalytisches Transkriptionssystem (GAT, Selting, Auer, Barden, et al. 1998; Selting, Auer, Barth-Weingarten, et al. 2009) also only includes behavioral aspects such as proxemics, kinesics, gesture, and gaze in the transcription of face-to-face interaction if they contribute to the “(un)ambiguousness of other predominantly verbal levels of activities” (Selting, Auer, Barth-Weingarten, et al. 2009: 26). Regarding gestures, the Gesprächsanalytisches Transkriptionssystem (Selting, Auer, Barden, et al. 1998) lists deictic gestures, illustrators, and emblems and includes a rough description of the on- and offsets as well as the apex, that is, the peak, of gestural movement sequences. The description is behavior-oriented and tries to be as little interpretative as possible.
The Gesprächsanalytisches Transkriptionssystem offers differing degrees of detail in the transcription, as it sets apart basic vs. fine-grained transcripts. Basic transcripts usually include an interpretive characterization of the gestures within the line containing the verbal transcription. Fine-grained transcripts list gestures in a separate line under the simultaneously occurring verbal activity. For illustrative purposes, and in cases of special importance of the nonverbal activities, the Gesprächsanalytisches Transkriptionssystem also mentions the inclusion of pictures in the transcript (Selting, Auer, Barden, et al. 1998: 28). Its newest revision, the Gesprächsanalytisches Transkriptionssystem 2, mentions that new conventions for the transcription of visual components of communication are being designed (Selting, Auer, Barth-Weingarten, et al. 2009: 356), due to the growing interest in and importance of visual aspects of communication within the field of interaction analysis.
This section has presented notation and transcription systems for gestures, which
range from a focus on:

(i) form, to
(ii) form and function, to
(iii) rudimentary descriptions.

The presented systems differ primarily in whether a) gestures’ form can and should be separated from possible meanings and functions (e.g., Birdwhistell 1970; Martell 2002; Bressem this volume) or b) a separation of form, meaning, and function is not useful for a transcription of gestures (e.g., Gut et al. 2002; McNeill 1992, 2005). These diverging foci go along with differing theoretical assumptions about whether gestures can be broken down into separate components that may combine with other features. Furthermore, the role of speech in the process of notation or transcription differs across the systems presented above. While the verbal utterance is of
particular importance for some of the systems (e.g., Ehlich and Rehbein 1979b; Selting,
Auer, Barden, et al. 1998; Selting, Auer, Barth-Weingarten, et al. 2009), others exclude the verbal modality partly or even completely from the notational process (e.g., Bressem this volume). A further difference among the presented systems is the integration of annotation
software. While especially recent systems use the advantages of annotation software for
the process of notation and transcription (e.g., Kipp, Neff, and Albrecht 2007; Lausberg
and Sloetjes 2009; Martell 2002), others rely on conventional and longstanding methods of transcribing gestures with the use of word documents. Yet, the most important difference is the clarification and integration of the system within a theoretical and methodological framework, along with its implications, which, for most systems, are not presented as articulately as necessary.

1046 V. Methods

3. Transcription systems for speech


Transcription systems for speech, regardless of their theoretical and methodological ori-
entation, aim at a representation of spoken language that is easily readable, clear and
distinctive, easily understandable and learnable, economically expandable, and adapt-
able. In doing so, transcription systems commonly use deviations from standard orthog-
raphy to capture a variety of specific characteristics (dialectal or social characteristics),
morphological aspects, syntactic aspects (e.g., reduplication, syntactic malpositioning), as well as phonological and prosodic aspects (e.g., volume, suprasegmentalia, lengthening, pauses, and tempo).
The following section will present a concise overview of well-known transcription
systems from the field of linguistics, conversation, and discourse analysis. The overview
will focus on the following aspects: basic theoretical assumptions, aim of transcription,
basic units of analysis (turn vs. utterance), segmentation of verbal units, transcription of
prosodic aspects as well as nonverbal aspects. It will not focus on the setup of transcription files, such as transcription heads and other technical aspects. For a more detailed overview of the discussed systems and the transcription of speech in general, see Dittmar (2004).

3.1. IPA: The International Phonetic Alphabet


Developed at the end of the 19th century, the International Phonetic Alphabet (IPA) aims at a one-to-one representation of the phonetic characteristics of a sound in a graphical system (International Phonetic Association 2005). Accordingly, the International Phonetic Alphabet bases its representation of spoken language on the phonetic quality of sounds, providing one letter for each distinctive sound and assuming two major categories, i.e., vowels and consonants (International Phonetic Association 2005). The notation symbols are based on the Latin alphabet, using as few non-Latin forms as possible. If necessary, additional letters are created by using capital or cursive forms, diacritics, and rotation of letters. The symbols are further supplemented
by a) diacritics indicating information about their articulation, co-articulation and pho-
nation, and b) suprasegmentalia for the representation of prosody, tone, length, and
stress. Using these symbols, the International Phonetic Alphabet includes 107 letters
to represent consonants and vowels, 31 diacritics, and 19 additional signs to indicate
suprasegmental qualities of spoken language.
At the end of the 1980s, the Speech Assessment Methods Phonetic Alphabet
(SAMPA), an electronic representation and coding of parts of the International Phonetic
Alphabet notation, was developed. A complete representation of the International
Phonetic Alphabet notation, allowing for a machine-readable phonetic transcription
for every known human language, was put forward in the 1990s with an extended version
of the Speech Assessment Methods Phonetic Alphabet (X-SAMPA). While the Speech
Assessment Methods Phonetic Alphabet was essentially designed for segmental
transcription, particularly of a traditional phonemic or near-phonemic kind, the electronic
representation of prosodic aspects was later included in the Speech Assessment Methods
Prosodic Alphabet (SAMPROSA), a system of prosodic notation (Wells et al. 1992).
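The core idea behind SAMPA and X-SAMPA, encoding IPA characters as plain ASCII so that transcriptions remain machine-readable, can be sketched in a few lines. The mapping below covers only a handful of symbols, and the helper function is our own illustration, not part of any official SAMPA tooling:

```python
# An illustrative excerpt of the X-SAMPA-to-IPA correspondence: each
# ASCII symbol encodes exactly one IPA character. Only a few of the many
# correspondences are listed, and the helper function is a hypothetical
# demonstration, not an official tool.
XSAMPA_TO_IPA = {
    "S": "ʃ",  # voiceless postalveolar fricative
    "Z": "ʒ",  # voiced postalveolar fricative
    "T": "θ",  # voiceless dental fricative
    "D": "ð",  # voiced dental fricative
    "N": "ŋ",  # velar nasal
    "I": "ɪ",  # near-close front vowel
    "@": "ə",  # schwa
    "{": "æ",  # near-open front vowel
}

def xsampa_to_ipa(segments):
    """Convert a list of X-SAMPA segment symbols to an IPA string,
    passing unknown symbols through unchanged."""
    return "".join(XSAMPA_TO_IPA.get(s, s) for s in segments)

print(xsampa_to_ipa(["D", "I", "s"]))  # English "this" → ðɪs
```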

3.2. HIAT: HalbInterpretative ArbeitsTranskriptionen


HalbInterpretative ArbeitsTranskriptionen (HIAT), the “semi-interpretative working transcription” (Ehlich and Rehbein 1976), is a literary transcription system based on the concept of score notation, i.e., an endless line. The HalbInterpretative ArbeitsTranskriptionen segments words by using principles of standard orthography. Boundaries of utterances, overlaps, and fast attachments of utterances, along with listener feedback, are included in the transcription. Apart from conventions for the transcription
of verbal segments and units, the HalbInterpretative ArbeitsTranskriptionen also in-
cludes guidelines for the representation of prosody. Prosody is transcribed in an extra
commentary line and includes changes in the intonation contour (falling, rising intona-
tion), accented words, lengthening, changes in the volume (crescendo, decrescendo),
and pauses (Ehlich and Rehbein 1976). Nonverbal aspects are transcribed either interlinearly or in commentary lines using a stylistic interpretation of the utterances (see section 2.3). Apart from its conventions for the representation of spoken language phenomena, the HalbInterpretative ArbeitsTranskriptionen tries to account for changing
perspectives on the data and thus distinguishes various levels of transcription. A primary
transcription using a minimal set of signs can be changed into an individual analytic per-
spective by subsequently adding aspects to the transcription. Furthermore, the system
includes technical methods of transcribing and analyzing (HIAT-DOS, Schneider 2001).

3.3. CHAT: Codes for Human Analysis of Transcripts


Codes for Human Analysis of Transcripts, developed within the Child Language Data
Exchange System (CHILDES), aims at an international database for first language
acquisition with a uniform transcription system (MacWhinney 2000). Codes for
Human Analysis of Transcripts represents verbal utterances not in turns but as single segmented utterances, rendered in literal transcription. Boundaries of turns are only noted if necessary. Particular functional signs, that is, declarative, question, and request, mark the ends of utterances. In addition, overlaps, quick uptake, interruptions (self and other), and listener feedback are transcribed on the level of word
segmentation. Regarding prosodic aspects, Codes for Human Analysis of Transcripts
marks the tonal structure of words not separately but includes it in a commentary
line, representing aspects such as primary accent, three types of accents (standard, espe-
cially strong, contrastive, deviant), lengthening, pauses, volume, and tempo. If neces-
sary, the phonetic structure of the utterance can be represented by using the
International Phonetic Alphabet system. Nonverbal aspects are included within lines
or more detailed in a commentary line.

3.4. Jefferson: The conversation analytical system


The transcription system by Jefferson (1984) aims at the reproduction of the sequential aspects of natural everyday interaction by using a neutral design of the
transcript as an observational datum. A rich inventory for the reproduction of turns and
their sequential progression is thus characteristic of the system. The system uses “eye dialect,” a standard orthography adapted to the phonetic realization of the expression. The system’s inventory of signs is based on the Latin alphabet.
The format of the transcript is sequentially organized: speakers’ turns are ordered chronologically, following their linear progression. The system represents simultaneous utterances of more than one speaker by using brackets at the point where the overlap occurs. The end of verbal units/turns is marked by standard orthography for
interrogative sentences. The system also includes prosodic aspects of utterances, such as marked changes in pitch, changes in the intonation contour, lengthening, emphasis, changes in tempo, and pauses. Nonverbal events, such as gestures, facial expressions, breathing, and coughing, are represented in commentary lines in double parentheses (e.g., ((coughing))). In its newest revision (Jefferson 2002), the system also
includes guidelines for a computer-aided transcription.
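Some of the conventions just described can be gathered into a short, entirely invented example (the dialogue and its annotations are fabricated for illustration; the symbols follow common Jeffersonian usage):

```python
# An invented two-turn exchange rendered with some common Jeffersonian
# conventions; the dialogue itself is fabricated for illustration only.
#   [ ]    overlapping talk          (0.5)  timed pause in seconds
#   :      sound lengthening         =      latching (no gap between turns)
#   (( ))  transcriber's description of nonverbal events
jefferson_example = """\
A: so you went the:re? (0.5) yesterday=
B: =yeah ((coughs)) [i did
A:                  [oh okay.
"""
print(jefferson_example)
```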

3.5. DT: Discourse Transcription system


The discourse transcription system proposed by Du Bois (1991) and Du Bois et al. (1992) offers an improved version of the conversation analytic approach to the transcription of spoken language. The Discourse Transcription System distinguishes a basic transcription (observational description of spoken language) from a fine-grained one (research-specific coding). Contrary to the segmentation in the HalbInterpretative ArbeitsTranskriptionen and the conversation analytical system by Jefferson, the Discourse Transcription System segments the verbal utterance into intonation units. Intonation units are
functionally classified into three types (final, continuing, appeal) and represented on single lines, each defined by a specific phonetic contour or set of contours (Du Bois 1991: 53) as well as by its terminal pitch direction. If necessary, intonation units can be phonetically transcribed in fine-grained transcripts. Further prosodic aspects captured by the system
include primary word accents, lengthening, pauses, volume, tempo, and voice quality.
The system also captures nonverbal aspects, such as smiling, coughing, yawning, inhala-
tion and exhalation as well as a rough coding of gestures, gaze, body, and co-action (Du
Bois et al. 1992).

3.6. GAT: Gesprächsanalytisches Transkriptionssystem


The Gesprächsanalytisches Transkriptionssystem (GAT, Selting, Auer, Barden, et al.
1998; Selting, Auer, Barth-Weingarten, et al. 2009) is the most widespread transcription
system in the German speaking area within the field of conversation analysis. It pro-
poses a standardization of existing systems and, similar to the one by Jefferson
(1984), aims at an analysis of talk in interaction and a representation of its sequential
aspects. For ease of handling, principles of standard orthography represent spoken lan-
guage. In the Gesprächsanalytisches Transkriptionssystem, turns, that is minimal units
of utterances, are composed of phrasing units that are constituted by a primary accent.
Each new turn is represented in a new line, thus offering a different set up of the tran-
script than the HalbInterpretative ArbeitsTranskriptionen, for instance. Prosodic as-
pects are represented intralinearily by capturing changes in pitch, primary and
secondary accents (falling, rising, rise-fall, fall-rise, level), noticeable changes in tone, lengthening, pauses, and volume. Laughing is represented in a syllabic manner
(haha, hehe, hihi) or interlinearly in double parentheses ((smiles)) (see section 2.3.). Stylistic interpretations of utterances, such as irony, are represented in commentary lines. Similar to the HalbInterpretative ArbeitsTranskriptionen, the Gesprächsanalytisches Transkriptionssystem offers a distinction between a basic and a fine-grained transcription. The basic transcription includes

(i) sequential structure,
(ii) pauses,
(iii) specific segmental conventions,
(iv) laughing,
(v) listener feedback,
(vi) accentuation, and
(vii) changes in pitch.

A fine-grained transcription focuses more closely on the representation of prosody.


In its newest revision (Selting, Auer, Barth-Weingarten, et al. 2009), the Gesprächsanalytisches Transkriptionssystem 2 offers an even easier entry into transcription by introducing the concept of the minimal transcript. Overall, the Gesprächsanalytisches
Transkriptionssystem 2 has a stronger phonological and prosodic bias, such that the con-
cept of intonation unit replaced the “phrase unit” (Selting, Auer, Barth-Weingarten,
et al. 2009: 355). Primary and secondary accents are mostly defined phonologically rather
than phonetically. In addition, a tutorial (GAT-TO) was developed, introducing the prac-
tical process of transcription along with a discussion of some problematic aspects. (For a
more detailed discussion of the Gesprächsanalytisches Transkriptionssystem 2 and the
transcription of prosody see section 4.2.)

While all of the presented systems aim at a representation of spoken language, the
systems differ from each other in a range of aspects. The most obvious difference is the
diverging format of representation for spoken language. Systems may use notation scores,
that is, an endless line (Ehlich and Rehbein 1979b) or single lines for single speakers and
turns (e.g., Selting, Auer, Barden, et al. 1998; Selting, Auer, Barth-Weingarten, et al. 2009).
Furthermore, the systems differ in their basic unit of analysis: turn (e.g., Jefferson 1984;
Selting, Auer, Barden, et al. 1998; Selting, Auer, Barth-Weingarten, et al. 2009) vs. utterance as a whole (MacWhinney 2000). Related to this is a difference in the segmentation of verbal units, varying from sounds (International Phonetic Association 2005) to intonation units (Du Bois 1991). In addition, the systems include
prosodic aspects as well as other forms of bodily behavior to varying degrees.

4. Transcription of prosody
Transcription systems for prosody generally capture two main types of phenomena:
a) the division of utterances into prosodically-marked chunks, units or phrases and
b) the representation of prominence, along with aspects such as pitch movement, reset, or rhythmic change. However, the size and type of prosodic units vary considerably across the different systems, thus resulting in different prosodic transcriptions.
In general, it is common practice for manual prosodic annotation to be carried out
via auditory analysis, accompanied by analyses of waveforms and fundamental frequency (F0). More recently, however, a growing number of systems address
the question of automated prosodic annotation and transcription (see for instance Avanzi, Lacheret-Dujour, and Victorri 2008; Campione and Véronis 2001; Garcia, Gut, and Galves 2002; and Mertens 2004). The following section will present a concise
discussion of prosodic descriptions within the field of conversation analysis, discourse
analysis, and linguistics.

4.1. HIAT 2: Erweiterte HalbInterpretative ArbeitsTranskriptionen


While the HalbInterpretative ArbeitsTranskriptionen 1 (Ehlich and Rehbein 1976) pri-
marily focused on the verbal level, thus allowing for a relatively accurate representation
of “normal” intonation (Ehlich and Rehbein 1976: 59), the HalbInterpretative Arbeit-
sTranskriptionen 2 (Ehlich and Rehbein 1979a) focuses on a fine-grained transcription
of intonational phenomena. To do so, the HalbInterpretative ArbeitsTranskriptionen 2 assumes a system with 4 (up to 6) levels of pitch range. By marking the pitch
with the sign “o” and the use of vertical lines to represent the different levels of
pitch range, the HalbInterpretative ArbeitsTranskriptionen 2 is able to capture changes
in pitch. Accents are only transcribed when they differ markedly from the normal accentuation pattern of the utterance. In such cases, the HalbInterpretative ArbeitsTranskriptionen 2 assumes contrastive accents, underlining the syllable or word to indicate the marked accentuation pattern. Changes in volume and the transcription
of pauses follow the same conventions as in the HalbInterpretative ArbeitsTranskrip-
tionen 1. Finally, the HalbInterpretative ArbeitsTranskriptionen 2 marks tempo in
commentary lines.

4.2. GAT 2: Gesprächsanalytisches Transkriptionssystem


Decisive for the prosodic transcription in the Gesprächsanalytisches Transkriptionssys-
tem 2 is the central concept of the intonation unit with at least one accented syllable, that is, the focus accent. The Gesprächsanalytisches Transkriptionssystem 2 demarcates
intonation units by more or less phonetically strong features, such as creaky voice, final
lengthening, pauses, and jumps in pitch at the beginning and ending of units. All in
all, the system includes a rich inventory for the intralinear prosodic transcription of
utterances, as it offers numerous extensions to the conventions offered in the Gespräch-
sanalytisches Transkriptionssystem (Selting, Auer, Barden, et al. 1998). The revised
conventions for rhythm, following the transcription of isochronous rhythmical units
(Couper-Kuhlen 1993 and Auer, Couper-Kuhlen, and Müller 1999), allow for its more
fine-grained representation. In addition, breathing, pauses, and lengthening can be re-
presented more precisely by offering more detailed conventions. Moreover, by suggest-
ing an autosegmental transcription of prosody using German Tones and Break Indices
(GTobi, Grice, Baumann, and Benzmüller 2005) or Dutch Tones and Break Indices
(ToDi, Gussenhoven, Rietveld, and Terken 2005) (see below) for particularly detailed
analyses going beyond the possibilities of an intralinear transcription, the Gesprächsa-
nalytisches Transkriptionssystem 2 captures the range of prosodic phenomena needed
for a conversational analysis.

4.3. TSM: Tonic stress marks system


The tonic stress marks system (TSM, Roach et al. 1993; Knowles, Williams, and Taylor
1996) is based on the British school of auditory intonation analysis (e.g., Crystal 1969;
O’Connor and Arnold 1973). The tonic stress marks system therefore assumes a tran-
scription of intonation by means of a cohesive contour represented in tone units, each containing one nucleus, with the nuclear syllable being the last accented syllable of the unit. Tone units consist of maximally four components, that is, pre-head,
head, nucleus and tail (Crystal 1969; O’Connor and Arnold 1973), and the main ac-
cented syllable receives particular importance as it determines the tone unit as a
whole. Based on these assumptions, the tonic stress marks system assumes two levels
of intonation phrasing (major tone group and minor tone group). The tonic stress
marks system indicates the presence and tonal characteristics of every accent by
means of a diacritic before the accented syllable. The tonic stress marks system includes
5 different tones ((1) level, (2) fall, (3) rise, (4) fall-rise, (5) rise-fall), which can be
either high or low.

4.4. ToBI: Tones and Break Indices


Tones and Break Indices (ToBI, Beckman and Elam 1997; Silverman et al. 1992) is currently probably the best-known system for the prosodic representation of American English. Although originally designed for the transcription of American English, Tones and Break Indices has been successfully adapted to a number of languages and varieties, e.g., German (GToBI), Dutch (ToDI), varieties of English (IViE), the Glasgow dialect of English (Gla-ToBI), Spanish, Japanese, and Korean.
The basic principles of Tones and Break Indices are taken from the phonological
model of English intonation by Pierrehumbert (1980). The transcription of prosody
using Tones and Break Indices thus follows two steps: 1) transcription of tones and
2) transcription of break indices. For the transcription of tones, Tones and Break Indices
assumes tones to be part of accents or to indicate a boundary. They can be either high
(H) or low (L) and tones signaling the boundaries of prosodically defined phrases may
occur at their right or left edge. Altogether, Tones and Break Indices distinguishes five basic pitch accents, assuming that accents may contain more than one tone. The
transcription of tones consists of a speech signal and a fundamental frequency record
(F0) along with the time-aligned symbolic labels relating to four types of events (Beck-
man and Elam 1997).
The transcription of break indices is based on auditory and visual information and distinguishes five levels of perceived juncture, which are transcribed between words on the orthographic tier. They are numbered from 0 to 4, with 0 indicating the lowest degree (words grouped together into a “clitic group”), 1 marking the default boundary between two words in the absence of another prosodic boundary, and 3 and 4 specifying intermediate and intonation phrase boundaries (Beckman and Elam 1997).
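The two-step scheme just described, a tone tier with pitch accents and boundary tones plus break indices between words, can be sketched as a small data structure. The utterance, its labels, and the rendering function below are invented for illustration and are not taken from any ToBI corpus or tool:

```python
# A minimal, hypothetical sketch of a two-tier ToBI-style annotation:
# pitch accents and boundary tones on a tone tier, and a break index
# (0-4) after each word on the orthographic tier. The utterance and
# all labels are invented.
words = ["Marianna", "made", "the", "marmalade"]

# Tone tier: a high pitch accent (H*) early in the phrase, a low pitch
# accent (L*) plus a low phrase accent and boundary tone (L-L%) at the end.
tones = {"Marianna": "H*", "marmalade": "L* L-L%"}

# Break-index tier: 1 = default boundary between words,
# 4 = full intonation phrase boundary after the final word.
break_indices = [1, 1, 1, 4]

def render_tobi(words, tones, break_indices):
    """Render the annotation as (orthographic tier, tone tier) strings."""
    ortho = " ".join(f"{w} [{bi}]" for w, bi in zip(words, break_indices))
    tone_tier = " ".join(t for w in words for t in tones.get(w, "").split())
    return ortho, tone_tier

ortho, tone_tier = render_tobi(words, tones, break_indices)
print(ortho)      # Marianna [1] made [1] the [1] marmalade [4]
print(tone_tier)  # H* L* L-L%
```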

4.5. PROLAB: Prosodic labeling


PROLAB, a method for prosodic labeling developed in the project “Verbmobil”
(Kohler, Pätzold, and Simpson 1995), is based on the Kiel Intonation Model (KIM)
(Kohler 1991). Accordingly, PROLAB defines all categories perceptually, involving
bundles of acoustic properties including for instance fundamental frequency (F0), dura-
tion, intensity for sentence stresses, and segmental lengthening (Kohler 1995). Contrary
to Tones and Break Indices (ToBI), PROLAB does not represent pitch patterns as linear sequences of elementary tones but rather recognizes whole pitch peak and valley
contours. Furthermore, it marks sentence stress separately from intonation and sepa-
rates phrasing and intonation. Phrasing markers are assigned with reference to all pro-
sodic features on perceptual grounds. Contrary to other systems, PROLAB does not
offer separate labeling tiers for different speech phenomena. Rather PROLAB aims
at an integration of segmental and prosodic labeling (Kohler 1995), such that prosodic
labels can be integrated into a complete segmental label file or an orthographic file.

4.6. INTSINT: International Transcription System for Intonation


The International Transcription System for Intonation (INTSINT) is a system for the cross-linguistic comparison of prosodic systems, allowing for transcription at different levels of detail. It is based on the postulate that “the surface phonological representations of a
pitch curve can be assumed to consist of phonetically interpretable symbols which can
in turn be derived from a more abstract phonological representation.” (Hirst 1991: 307)
Transcriptions are thus closely linked to the phonetic realization of the intonation
contour, but at the same time allow for a phonological symbolization. In the Interna-
tional Transcription System for Intonation, prosodic target points are aligned with
an orthographic or phonetic transcription, defined in relation to earlier pitch, and are
transcribed by means of arrows corresponding to the different pitch levels (higher,
up-stepped, lower, down-stepped or same) (Hirst 1991; Hirst and Di Cristo 1998).
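As a rough illustration of the idea of relative pitch targets, the sketch below assigns INTSINT-style labels (Higher, Upstepped, Lower, Downstepped, Same) to a sequence of pitch values, each compared to the preceding one. The function, the numeric thresholds, and the example values are our own assumptions for demonstration and do not reproduce Hirst's actual coding procedure:

```python
# A simplified, hypothetical illustration of INTSINT-style relative
# labeling: each pitch target (in Hz) is labeled in relation to the
# preceding one. The thresholds are arbitrary assumptions chosen for
# demonstration only.
def relative_labels(targets, big_step=20.0, same_band=5.0):
    """Label each target (after the first) relative to its predecessor."""
    labels = []
    for prev, cur in zip(targets, targets[1:]):
        diff = cur - prev
        if abs(diff) <= same_band:
            labels.append("S")   # Same: within the tolerance band
        elif diff > big_step:
            labels.append("H")   # Higher: large rise
        elif diff > 0:
            labels.append("U")   # Upstepped: small rise
        elif diff < -big_step:
            labels.append("L")   # Lower: large fall
        else:
            labels.append("D")   # Downstepped: small fall
    return labels

print(relative_labels([180, 220, 230, 200, 198]))  # ['H', 'U', 'L', 'S']
```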

4.7. SAMPROSA: SAM Prosodic Transcription


SAM Prosodic Transcription (SAMPROSA) offers a prosodic transcription for linguis-
tic purposes and for prosodic labeling in speech technology and experimental phonetic
research (Wells et al. 1992). In SAM Prosodic Transcription, intonation needs to be transcribed independently of other transcriptions or representations of the signal, on a separate tier. Accordingly, SAM Prosodic Transcription sets up parallel
symbolic representations of utterances using different segmental or prosodic criteria
(Gibbon 1988). The parallel symbolic representations may, thereby, be related either
through a) association, in which phonological rules are defined that relate prosodic and segmental units, or b) synchronization, in which the symbols may be assigned to the signal as tags or annotations. In general, SAM Prosodic Transcription allows for
the transcription of global, local, terminal and nuclear tones as well as length, stress,
pauses, and prosodic boundaries. However, SAM Prosodic Transcription is not a tran-
scription system for prosody in a strict sense but rather “computer-compatible codes
for use in formatting transcriptions for interchange purposes, once a model has been
selected” (Gibbon, Mertins, and Moore 2000: 53).

The preceding overview has shown that the systems not only vary in their theoretical
and methodological tradition, but also in their focus on a transcription of prosody. In
general, the proposed systems can be classified according to common and differing
parameters (Llisterri 1994). Regarding the representation of prosodic events, the sys-
tems can be classified into multi-tiered (the Tones and Break Indices, the International
Transcription System for Intonation) or one-tiered systems (e.g., the International Pho-
netic Alphabet, the Gesprächsanalytisches Transkriptionssystem 2, the HalbInter-
pretative ArbeitsTranskriptionen 2). The systems further differ regarding their aspects
of machine readable symbols (e.g., the Speech Assessment Method Phonetic Alphabet,
SAMSINT or SAM Prosodic Transcription) vs. non-machine readable symbols (e.g., the
Gesprächsanalytisches Transkriptionssystem 2, the Tones and Break Indices, the Ge-
sprächsanalytisches Transkriptionssystem, the HalbInterpretative ArbeitsTranskriptio-
nen, PROLAB). In addition, the systems differ in whether they are a) theory-driven, that is, based on a conception of the phonetics-phonology interface, or b) data-driven, i.e., defined by the needs and practices known to be relevant for explaining the discursive or interactional behavior of the speakers (Llisterri 1994).

5. Transcription of body posture


While the majority of research is interested in bodily behavior and body posture as in-
dicators for personal attitudes, personal criteria or emotions (e.g., Argyle 1975; Scherer
1970, 1979), or changes of body postures for discursive functions (e.g., Bohle 2007; Ken-
don 1972; Scheflen 1965), approaches focusing on a close description and transcription of body posture are rare. Existing transcription systems usually base their transcription on either an anatomical or an environmental reference system. While anatomical systems identify the location of a body part or the body as a whole with respect to
the bodily axes (e.g., Birdwhistell 1970; Frey et al. 1981; Wallbott 1998), environmental
systems define the body in relation to objects in the external world (e.g., Hall 1963).
Birdwhistell’s kinesic notation system for bodily motion (see also section 2.1.1.) in-
cludes the trunk, shoulder/arm/wrist, hip, leg, ankle, and neck for a description of body
posture. The different body parts are thereby described in their different anatomical po-
sitions. The trunk may be “leaning back” or “leaning forward,” the shoulders might be
“hunched” or “drooped lateral”. Seated positions may be described as “close double l
(seated, feet square on the floor)” or “reverse x (lower legs crossed)” (Birdwhistell
1970: 261–281). As with the notation of hand positions, shapes, and finger activities, Birdwhistell distinguishes a micro- and a macrokinesic level for the transcription of body postures, thus allowing for a form-based as well as a functional transcription.
A further anatomical transcription system including conventions for body posture is
the Bernese Coding System for Time-Series Notation of Body Movement (Frey et al.
1981; Hirsbrunner, Frey, and Crawford 1987) (see also section 2.1.2.). Similar to Bird-
whistell, the Bernese Coding Scheme differentiates the body according to the shoulders,
trunk, pelvis, thigh, and feet. The body parts are coded according to their potential to
engage in complex movement variations, which are defined for the most part as “displacements” or “flexions” from a standard “upright” sitting position. The feet are thus described, for instance, as “strongly tilted to the left” or “straight” (Hirsbrunner, Frey, and Crawford 1987: 110).
Based on Birdwhistell (1970) and Frey et al. (1981), but also on functional classifica-
tion systems such as Ekman and Friesen (1969), Wallbott’s system combines a form-
based and functional perspective. By distinguishing 5 categories (upper body, shoulders,
head, arms, hands), which are described according to their movement abilities (up, down,
back for the shoulders for instance), Wallbott’s system allows for a rough anatomical
description of various body postures.
Although primarily developed for the notation of dance, aspects of the Laban nota-
tion (Laban 1950) are nowadays used in a range of transcription systems (e.g., Davis
1979; Greenbaum and Rosenfeld 1980; Kendon 2004; Lausberg and Sloetjes 2009).
This system includes a basic segmentation of the skeletal system, basic kinesiological
terms (e.g., rotations), spatial terms (e.g., straight vs. circular paths) and object relations
(e.g., touch), which allow for a detailed notation of body posture and bodily movement.
Recently, new transcription systems aiming at a combined description of the form and function of body postures have been proposed. Schöps (in preparation), for instance, presents a system including basic postures (standing, lying down, sitting) as well as body parts and movement categories, along with different predicates for the transcription of the
used body configurations (e.g., spread for arms and legs). The Body Action and Posture
Coding system by Dael and Scherer (2012) approaches the transcription and coding of
body posture on an “anatomical level (different articulations of body parts), a form
level (direction and orientation of movement), and a functional level (communicative
and self-regulatory functions)” (Dael and Scherer 2012).

6. Transcription systems for gaze


The role of gaze has so far been mostly investigated in two main areas: 1) the organization of talk in interaction, greetings, and in particular turn-taking (e.g., Eibl-Eibesfeldt 1971; Goodwin 1981; Kendon 1967; Kendon and Ferber 1973), and 2) the regulation of the relationship and intimacy between interactants (e.g., Argyle and Cook
1976). The description and transcription of gaze is usually rough, noting down aspects such as looking at each other, eyes closed, eyes wide open, etc. Birdwhistell
(1970), for instance, includes a few conventions for the transcription of gaze, such as
shut eyes, sideways look, or stare. Ehlich and Rehbein (1982), based on eight different
categories of gaze, distinguish between gaze between a) eyes and b) eyes and face
region exchanged between speakers, sender, and other interactants. The differentiation
is based on the presence and absence of gaze, its movement and duration (Ehlich and
Rehbein 1982: 55–56), allowing for a transcription of gaze as “sender looks at recipi-
ent”, “sender turns gaze towards recipient” or “recipient looks away from sender”.
Goodwin’s (1981) is probably the most explicit transcription system, providing con-
ventions not only for different types of eye gaze but also for the layout of transcrip-
tion files. The speaker’s gaze is marked above the utterance, while the recipient’s
gaze is noted underneath it. Goodwin includes conventions for marking the beginning
and ending of gaze, its preparation and retraction phases, and its directionality
(Goodwin 1981: viii, 52–53).
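The three-line arrangement Goodwin proposes can be emulated in plain text. The following sketch (in Python, with deliberately simplified markings rather than Goodwin's full symbol inventory) merely illustrates how speaker gaze, utterance, and recipient gaze are stacked vertically:

```python
# Schematic three-line transcript layout in the spirit of Goodwin (1981):
# speaker gaze above the utterance, recipient gaze below it. The gaze
# markings used in the example ("." for gaze moving toward the other,
# "X---" for gaze having arrived and being sustained) are simplified
# placeholders, not Goodwin's exact notation.

def goodwin_lines(utterance, speaker_gaze, recipient_gaze):
    """Stack speaker gaze, utterance, and recipient gaze vertically."""
    width = max(len(utterance), len(speaker_gaze), len(recipient_gaze))
    return "\n".join(line.ljust(width) for line in
                     (speaker_gaze, utterance, recipient_gaze))

print(goodwin_lines(
    "Well I was saying that,",
    "     . . . X---------",     # speaker brings gaze to the recipient
    " X-------------------",     # recipient already gazing at the speaker
))
```

Because the gaze lines are aligned character by character with the utterance, the onset and withdrawal of gaze can be read off against the exact syllable at which they occur, which is the analytic point of Goodwin's layout.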

7. Conclusion
This overview of notation and transcription systems for speech and bodily behavior has
shown that a range of proposals exists, all of which try to render verbal and bodily
behavior in written form. It became apparent that the individual
68. Transcription systems for gestures, speech, prosody, postures, and gaze 1055

systems differ immensely in their theoretical background and methodological approach.


On the one hand, the large number of proposed systems, in particular for the transcrip-
tion of gestures, speech, and prosody, results in a range of different forms of transcrip-
tion, thus hindering the comparability of research results. On the other hand, it
provides the necessary grounds for transcribing speech or bodily behavior ac-
cording to particular research perspectives and theoretical foci, thus broadening the
scope of research within the respective fields.

8. References
Argyle, Michael 1975. Bodily Communication. London: Methuen.
Argyle, Michael and Mark Cook 1976. Gaze and Mutual Gaze. Cambridge: Cambridge University
Press.
Auer, Peter, Elizabeth Couper-Kuhlen and Frank Müller 1999. Language in Time: The Rhythm
and Tempo of Spoken Interaction. New York: Oxford University Press.
Avanzi, Mathieu, Anne Lacheret-Dujour and Bernard Victorri 2008. ANALOR. A tool for semi-
automatic annotation of French prosodic structure. Paper presented at the Interspeech 2008,
Campinas, Brazil, May 6–9.
Battison, Robin 1974. Phonological deletion in American Sign Language. Sign Language Studies
5: 1–19.
Becker, Karin 2004. Zur Morphologie redebegleitender Gesten. MA thesis, Department of Philos-
ophy and Humanities, Free University Berlin.
Beckman, Mary E. and Gayle Ayers Elam 1997. Guidelines for ToBI labelling. Retrieved
15.07.2011, from http://www.ling.ohio-state.edu/research/phonetics/E_ToBI/
Birdwhistell, Ray 1970. Kinesics and Context. Essays on Body, Motion, Communication. Philadel-
phia: University of Pennsylvania Press.
Bohle, Ulrike 2007. Das Wort Ergreifen – das Wort Übergeben: Explorative Studie zur Rolle Re-
debegleitender Gesten in der Organisation des Sprecherwechsels. Berlin: Weidler.
Bohle, Ulrike this volume. Approaching notation, coding, and analysis from a conversational ana-
lysis point of view. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David
McNeill and Sedinha Teßendorf (eds.), Body – Language – Communication: An International
Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics and Communi-
cation Science 38.1.) Berlin: De Gruyter Mouton.
Bressem, Jana this volume. A linguistic perspective on the notation of form features in gestures. In:
Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Sedinha
Teßendorf (eds.), Body – Language – Communication: An International Handbook on Multi-
modality in Human Interaction. (Handbooks of Linguistics and Communication Science 38.1.)
Berlin: De Gruyter Mouton.
Bressem, Jana and Silva H. Ladewig 2011. Rethinking gesture phases: Articulatory features of ges-
tural movement? Semiotica 184(1/4): 53–91.
Brinker, Klaus and Sven F. Sager 1989. Linguistische Gesprächsanalyse. Berlin: Erich Schmidt.
Bußmann, Hadumod 1990. Lexikon der Sprachwissenschaft. 2nd revised edition. Stuttgart,
Germany: Alfred Kröner.
Campione, Estelle and Jean Véronis 2001. Semi-automatic tagging of intonation in French spoken
corpora. In: Paul Rayson, Andrew Wilson, Tony McEnery, Andrew Hardie and Shereen Khoja
(eds.), Proceedings of the Corpus Linguistics’ 2001 Conference, 90–99. Lancaster, U.K.: Lancas-
ter University, UCREL.
Couper-Kuhlen, Elizabeth 1993. English Speech Rhythm: Form and Function in Everyday Verbal
Interaction. Amsterdam: John Benjamins.
Crystal, David 1969. Prosodic Systems and Intonation in English. Cambridge: Cambridge Univer-
sity Press.
1056 V. Methods

Dael, Nele and Klaus R. Scherer 2012. The Body Action and Posture coding system (BAP):
Development and reliability. Journal of Nonverbal Behavior 36: 97–121.
Davis, Martha 1979. Laban analysis of nonverbal communication. In: Shirley Weitz (ed.), Nonver-
bal Communication: Readings with Commentary, 182–206. New York: Oxford University Press.
Dittmar, Norbert 2004. Transkription: Ein Leitfaden mit Aufgaben für Studenten, Forscher und
Laien. Heidelberg: VS Verlag für Sozialwissenschaften.
Du Bois, John W. 1991. Transcription design principles for spoken discourse research. Pragmatics
1(1): 71–106.
Du Bois, John W., Susanna Cumming, Stephan Schuetze-Coburn and Danae Paolino 1992. Dis-
course transcription. Santa Barbara Papers in Linguistics 4. University of California, Santa Bar-
bara, Department of Linguistics.
Ehlich, Konrad and Jochen Rehbein 1976. Halbinterpretative Arbeitstranskriptionen (HIAT 1).
Linguistische Berichte 25: 21–41.
Ehlich, Konrad and Jochen Rehbein 1979a. Erweiterte halbinterpretative Arbeitstranskriptionen
(HIAT2). Linguistische Berichte 59: 51–75.
Ehlich, Konrad and Jochen Rehbein 1979b. Zur Notierung nonverbaler Kommunikation für dis-
kursanalytische Zwecke (HIAT2). In: Peter Winkler (ed.), Methoden der Analyse von Face-to-
Face-Situationen, 302–329. Stuttgart: Metzler.
Ehlich, Konrad and Jochen Rehbein 1982. Augenkommunikation. Methodenreflextion und Beis-
pielanalyse. Amsterdam: John Benjamins.
Eibl-Eibesfeldt, Irenäus 1971. Transcultural patterns of ritualized contact behavior. In: Aristide H.
Esser (ed.), Behavior and Environment. The Use of Space by Animals and Men, 238–246. New
York: Plenum.
Ekman, Paul and Wallace V. Friesen 1969. The repertoire of nonverbal behavior: Categories,
origins, usage, and coding. Semiotica 1: 49–98.
Ekman, Paul and Wallace V. Friesen 1978. Facial Action Coding System (FACS): A Technique for
the Measurement of Facial Action. Palo Alto, CA: Consulting Psychologists Press.
Frey, Siegfried, Hans Peter Hirsbrunner, Jonathan Pool and William Daw 1981. Das Berner Sys-
tem zur Untersuchung nonverbaler Interaktion: I. Die Erhebung des Rohdatenprotokolls; II.
Die Auswertung von Zeitreihen visuell-auditiver Information. In: Peter Winkler (ed.), Metho-
den der Analyse von Face-to-Face-Situationen, 203–268. Stuttgart: Metzler.
Frey, Siegfried, Hans Peter Hirsbrunner and Ulrich Jorns 1982. Time-series notation: A coding
principle for the unified assessment of speech and movement in communication research. In:
Ernest W. B. Hess-Lüttich (ed.), Multimodal Communication: Vol. I Semiotic Problems of Its
Notation, 30–58. Tübingen: Narr.
Fricke, Ellen 2007. Origo, Geste und Raum: Lokaldeixis im Deutschen. Berlin: Walter de Gruyter.
Fricke, Ellen 2012. Grammatik multimodal: Wie Wörter und Gesten zusammenwirken. Berlin:
De Gruyter.
Garcia, Jesus, Ulrike Gut and Antonio Galves 2002. Vocale – a semi-automatic annotation tool for
prosodic research. Proceedings of Speech Prosody 2002.
Gibbon, Dafydd 1988. Intonation and discourse. In: Janos S. Petöfi (ed.), Text and Discourse Con-
stitution, 3–25. Berlin: De Gruyter.
Gibbon, Dafydd, Inge Mertins and Roger K. Moore 2000. Handbook of Multimodal and Spoken
Dialogue Systems: Resources, Terminology, and Product Evaluation. Norwell, Massachusetts,
USA: Kluwer Academic.
Goodwin, Charles 1981. Conversational Organization: Interaction between Speakers and Hearers.
New York: Academic Press.
Greenbaum, Paul E. and Howard Rosenfeld 1980. Varieties of touching in greetings: Sequential
structure and sex-related differences. Journal of Nonverbal Behavior 5(1): 13–25.
Grice, Martin, Stefan Baumann and Ralf Benzmüller 2005. German intonation in autosegmental-
metrical phonology. In: Sun-Ah Jun (ed.), Prosodic Typology: The Phonology of Intonation
and Phrasing, 55–83. Oxford: Oxford University Press.

Gumperz, John and Norine Berenz 1993. Transcribing conversational exchanges. In: Jane A. Ed-
wards and Martin D. Lampert (eds.), Talking Data: Transcription and Coding in Discourse
Research, 91–121. Hillsdale, NJ: Lawrence Erlbaum.
Gussenhoven, Carlos, Toni Rietveld and Jaques Terken 2005. Transcription of Dutch intonation.
In: Sun-Ah Jun (ed.), Prosodic Typology: The Phonology of Intonation and Phrasing, 118–145.
Oxford: Oxford University Press.
Gut, Ulrike, Karin Looks, Alexandra Thies and Dafydd Gibbon 2002. Cogest: Conversational ges-
ture transcription system version 1.0. Fakultät für Linguistik und Literaturwissenschaft, Uni-
versität Bielefeld, ModeLex Tech. Rep 1.
Hager, Joseph C., Paul Ekman and Wallace V. Friesen 2002. Facial action coding system. Re-
trieved 15.07.2011, from http://face-and-emotion.com/dataface/facs/guide/InvGuideTOC.html
Hall, Edward T. 1963. A system for the notation of proxemic behavior. American Anthropologist
65(5): 1003–1026.
Hirsbrunner, Hans-Peter, Siegfried Frey and Robert Crawford 1987. Movement in human inter-
action: Description, parameter formation and analysis. In: Aron W. Siegman and Stanley Feld-
stein (eds.), Nonverbal Behavior and Communication, 99–140. Hillsdale, NJ: Lawrence
Erlbaum.
Hirst, Daniel 1991. Intonation models: Towards a third generation. In: Actes du XIIème Congrès
International des Sciences Phonétiques, 305–310. Aix-en-Provence, France: Université de
Provence, Service des Publications.
Hirst, Daniel and Albert Di Cristo 1998. Intonation Systems: A Survey of Twenty Languages. Cam-
bridge: Cambridge University Press.
International Phonetic Association 2005. Handbook of the International Phonetic Association: A
Guide to the Use of the International Phonetic Alphabet. Cambridge: Cambridge University
Press.
Jefferson, Gail 1984. On stepwise transition from talk about a trouble to inappropriately next-
positioned matters. In: Maxwell J. Atkinson and John Heritage (eds.), Structures of Social
Action: Studies in Conversation Analysis, 191–222. Cambridge: Cambridge University Press.
Jefferson, Gail 2002. Is “no” an acknowledgment token? Comparing American and British uses of
(+)/(-) tokens. Journal of Pragmatics 34(10/11): 1345–1383.
Kallmeyer, Werner and Reinhold Schmitt 1996. Forcieren oder: Die verschärfte Gangart. Zur
Analyse von Kooperationsformen im Gespräch. In: Werner Kallmeyer (ed.), Gesprächsrhe-
torik: Rhetorische Verfahren im Gesprächsprozeß, 19–118. Tübingen: Narr.
Kendon, Adam 1967. Some functions of gaze-direction in social interaction. Acta Psychologica 26:
22–63.
Kendon, Adam 1972. Some relationships between body motion and speech. In: Aron W. Siegman
and Benjamin Pope (eds.), Studies in Dyadic Communication, 177–216. Elmsford, NY: Perga-
mon Press.
Kendon, Adam 1980. Gesture and speech: Two aspects of the process of utterance. In: Mary
Ritchie Key (ed.), Nonverbal Communication and Language, 207–288. The Hague: Mouton.
Kendon, Adam 2004. Gesture. Visible Action as Utterance. Cambridge: Cambridge University
Press.
Kendon, Adam and A. Ferber 1973. A description of some human greetings. In: R. P. Michael
and J. H. Crook (eds.), Comparative Ecology and Behavior of Primates, 591–668. New York:
Academic Press.
Kipp, Michael 2004. Gesture Generation by Imitation: From Human Behavior to Computer Char-
acter Animation. Boca Raton, FL: Dissertation.com.
Kipp, Michael, Michael Neff and Irene Albrecht 2007. An annotation scheme for conversational
gestures: How to economically capture timing and form. Journal on Language Resources and
Evaluation – Special Issue on Multimodal Corpora 41(3/4): 325–339.
Klima, Edward and Ursula Bellugi 1979. The Signs of Language. Cambridge, MA: Harvard
University Press.

Knowles, Gerry, Briony Williams and L. Taylor 1996. A Corpus of Formal British English Speech.
London: Longman.
Kohler, Klaus J. 1991. A model of German intonation. Arbeitsberichte des Instituts für Phonetik
und digitale Sprachverarbeitung der Universität Kiel 25: 295–360.
Kohler, Klaus J. 1995. ToBIG and PROLAB: Two prosodic transcription systems for German
compared. Paper presented at the Conference ICPhS Stockholm, 13 August 1995.
Kohler, Klaus J., Matthias Pätzold and Adrian P. Simpson 1995. From Scenario to Segment: The
Controlled Elicitation, Transcription, Segmentation and Labelling of Spontaneous Speech.
Kiel, Germany: Institut für Phonetik und Digitale Sprachverarbeitung, IPDS, Universität Kiel.
Laban, Rudolph von 1950. The Mastery of Movement on the Stage. London: Macdonald and
Evans.
Ladewig, Silva H. and Jana Bressem forthcoming. New insights into the medium ‘hand’: Discover-
ing recurrent structures in gestures. Semiotica.
Lausberg, Hedda and Han Sloetjes 2009. Coding gestural behavior with the NEUROGES –
ELAN system. Behavioral Research Methods 41(3): 841–849.
Llisterri, Joaquim 1994. Prosody encoding survey. MULTEXT-LRE Project 62-050.
MacWhinney, Brian 2000. The CHILDES Project: Tools for Analyzing Talk, Volume 2: The Da-
tabase. Hillsdale, NJ: Lawrence Erlbaum.
Martell, Craig 2002. Form: An extensible, kinematically-based gesture annotation scheme. Paper
presented at International Conference on Language Resources and Evaluation. European
Language Resources Association.
Martell, Craig 2005. FORM: An experiment in the annotation of the kinematics of gesture. Ph.D.
dissertation, Department of Computer and Information Sciences, University of Pennsylvania.
Martell, Craig and Joshua Kroll no date. Corpus-based gesture analysis: An extension of the
FORM dataset for the automatic detection of phases in a gesture. Unpublished manuscript.
McNeill, David 1992. Hand and Mind. What Gestures Reveal about Thought. Chicago: University
of Chicago Press.
McNeill, David 2005. Gesture and Thought. Chicago: University of Chicago Press.
McNeill, David and Susan D. Duncan 2000. Growth points in thinking-for-speaking. In: David
McNeill (ed.), Language and Gesture, 141–161. Cambridge: Cambridge University Press.
Mertens, Pier 2004. The prosogram: Semi-automatic transcription of prosody based on a tonal per-
ception model. Paper presented at Speech Prosody 2004, March 23–26, 2004, Nara, Japan.
Müller, Cornelia 1998. Redebegleitende Gesten: Kulturgeschichte – Theorie – Sprachvergleich. Ber-
lin: Arno Spitz.
Müller, Cornelia 2010. Wie Gesten bedeuten. Eine kognitiv-linguistische und sequenzanalytische
Perspektive. In: Sprache und Literatur 41(1): 37–68.
Müller, Cornelia, Hedda Lausberg, Ellen Fricke and Katja Liebal 2005. Towards a grammar of
gesture: evolution, brain, and linguistic structures. Berlin: Antrag im Rahmen der Förderinitia-
tive “Schlüsselthemen der Geisteswissenschaften Programm zur Förderung fachübergreifender
und internationaler Zusammenarbeit”.
Müller, Cornelia, Jana Bressem and Silva H. Ladewig this volume. Towards a grammar of ges-
tures: A form-based view. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig
and David McNeill (eds.), Body – Language – Communication: An International Handbook
on Multimodality in Human Interaction. Handbooks of Linguistics and Communication
Science (38.1). Berlin and Boston: De Gruyter Mouton.
O’Connor, Joseph Desmond and Gordon Frederick Arnold 1973. Intonation of Colloquial
English. London: Longman.
Pierrehumbert, Janet B. 1980. The Phonology and Phonetics of English Intonation. Boston: Mas-
sachusetts Institute of Technology Press.
Prillwitz, Siegmund, Regina Leven, Heiko Zienert, Thomas Hanke and Jan Henning 1989. Ham-
NoSys Version 2.0 Hamburger Notationssystem für Gebärdensprachen: Eine Einführung. Ham-
burg: Signum Verlag.

Redder, Angelika 2001. Aufbau und Gestaltung von Transkriptionssystemen. In: Klaus Brinker,
Gerd Antos, Wolfgang Heinemann and Sven F. Sager (eds.), Text und Gesprächslinguistik.
Ein Internationales Handbuch Zeitgenössischer Forschung, 1038–1059. (Handbücher zur
Sprach- und Kommunikationswissenschaft 16.2.) Berlin: De Gruyter.
Roach, Peter, Gerry Knowles, Tamas Varadi and Simon Arnfield 1993. Marsec: A machine-read-
able spoken English corpus. Journal of the International Phonetic Association 23(2): 47–54.
Sager, Sven F. 2001. Probleme der Transkription nonverbalen Verhaltens. In: Klaus Brinker, Gerd
Antos, Wolfgang Heinemann and Svend F. Sager (eds.), Text und Gesprächslinguistik. Ein In-
ternationales Handbuch Zeitgenössischer Forschung, 1069–1085. (Handbücher zur Sprach- und
Kommunikationswissenschaft 16.2.) Berlin: De Gruyter.
Sager, Svend F. and Kristin Bührig 2005. Nonverbale Kommunikation im Gespräch–Editorial. In:
Kristin Bührig and Svend F. Sager (eds.), Osnabrücker Beiträge zur Sprachtheorie 70: Nonver-
bale Kommunikation im Gespräch, 5–17. Oldenberg: Redaktion Obst.
Scheflen, Albert 1965. The significance of posture in communication systems. Psychiatry 27: 316–331.
Scherer, Klaus R. 1970. Non-Verbale Kommunikation: Ansätze zur Beobachtung und Analyse der
Aussersprachlichen Aspekte von Interaktionsverhalten. Hamburg: Buske.
Scherer, Klaus R. 1979. Die Funktionen des nonverbalen Verhaltens im Gespräch. In: Klaus R.
Scherer and Harald G. Wallbott (eds.), Nonverbale Kommunikation: Forschungsberichte zum
Interaktionsverhalten, 25–32. Weinheim: Beltz.
Schmitt, Reinhold (ed.) 2007. Koordination: Analysen zur Multimodalen Interaktion. Tübingen:
Narr.
Schneider, Wolfgang 2001. Der Transkriptionseditor HIAT-DOS. Gesprächsforschung-Online
Zeitschrift zur Verbalen Interaktion 2: 29–33.
Schönherr, Beatrix 1997. Syntax – Prosodie – Nonverbale Kommunikation. Empirische Untersu-
chungen zur Interaktion Sprachlicher und Parasprachlicher Ausdrucksmittel im Gespräch. Tü-
bingen, Germany: Niemeyer.
Schöps, Doris in preparation. Körperhaltung als Zeichen am Beispiel des DEFA-Films. Disserta-
tion, Technische Universität Berlin.
Selting, Margret, Peter Auer, Birgit Barden, Jörg R. Bergmann, Elizabeth Couper-Kuhlen, Sus-
anne Günthner, Christoph Meier, Uta Quasthoff, Peter Schlobinski and Susanne Uhmann
1998. Gesprächsanalytisches Transkriptionssystem (GAT). Linguistische Berichte 173: 91–122.
Selting, Margret, Peter Auer, Dagmar Barth-Weingarten, Jörg Bergmann, Pia Bergmann, Karin
Birkner, Elizabeth Couper-Kuhlen, Arnulf Deppermann, Peter Gilles, Susanne Günthner,
et al. 2009. Gesprächsanalytisches Transkriptionssystem 2 (GAT 2). Gesprächsforschung –
Online Zeitschrift zur Verbalen Interaktion 10: 353–402.
Silverman, Kim, Mary Beckman, John Pitrelli, Mori Ostendorf, Colin Wightman, Patti Price, Janet
Pierrehumbert and Julia Hirschberg 1992. ToBI: A standard for labeling English prosody.
Proceedings of ICSLP-1992, 867–870.
Stokoe, William 1960. Sign Language Structure. Buffalo, NY: Buffalo University Press.
Wallbott, Harald 1998. Ausdruck von Emotionen in Körperbewegungen und Körperhaltungen. In:
Caroline Schmauser and Thomas Noll (eds.), Körperbewegungen und ihre Bedeutung, 121–136.
Berlin: Arno Spitz.
Weinrich, Lotte 1992. Verbale und Nonverbale Strategien in Fernsehgesprächen: Eine Explorative
Studie. Tübingen: Niemeyer.
Wells, John, William Barry, Martin Grice, Adrian Fourcin and Dafydd Gibbon 1992. Standard
computer-compatible transcription. Technical Report No. SAM Stage Report Sen.3 SAM
UCL-037. London: University College London.
Zwitserlood, Ingeborg, Asli Özyürek and Pamela Perniss 2008. Annotation of sign and gesture
cross-linguistically. Paper presented at the 3rd Workshop on the Representation and Proces-
sing of Sign Languages, Marrakesh.

Jana Bressem, Chemnitz (Germany)



69. A linguistic perspective on the notation of gesture phases
1. Introduction
2. Major accounts in the study of gesture phases
3. Linguistic description of gesture phases
4. Discussion
5. Some remarks on the annotation of gesture phases
6. References

Abstract
This chapter presents a proposal for the description of gesture phases derived from articu-
latory characteristics observable in their execution. It is grounded in a linguistic approach
to gestures which stresses the separation of gestural forms and functions in the analytic
process. By presenting a context-independent and a context-sensitive description of ges-
ture phases this paper aims at three aspects: 1) to present articulatory features apparent
in the execution of gesture phases, 2) to characterize and define gesture phases according
to these features and independent of their functional aspects, and 3) to allow for a
description of cases in which features are replaced due to the sequential embedding of
phases in particular linear successions. The article concludes with a focus on the practical
implementation of this approach in the annotation process.

1. Introduction
When investigating the tight coordination between speech and gesture, it is useful to
focus on the structure of both modalities. Gestures show simultaneous structures
(along the different form parameters: hand shape, orientation, movement, and position,
e.g., Bressem this volume; Ladewig and Bressem forthcoming; Müller, Bressem, and
Ladewig this volume; see also Battison 1974; Stokoe 1960) but they also exhibit a linear
structure that is hierarchically organized (Müller et al. 2005; Müller, Bressem, and
Ladewig this volume). Segments of gestural movement have been referred to as gesture
phases (see below). They have been investigated with respect to their internal structure
(Fricke 2012; Kendon 1980, 2004), their coordination and correlation with units of speech
(Condon and Ogston 1967; Efron [1941] 1972; Fricke 2012; Karpiński, Jarmołowicz-
Nowikow, and Malisz 2009; Kendon 1972, 1980; Kita, van Gijn, and van der Hulst 1998;
Loehr 2006; McClave 1991; Nobe 2000; Quek et al. 2002; Seyfeddinipur 2006; Yassinik,
Renwick, and Shattuck-Hufnagel 2004), and their characteristics observable in their exe-
cution (e.g., Bressem and Ladewig 2011; Chafai, Pelachaud, and Pelé 2006; Harling and
Edwards 1997; Kahol, Tripathi, and Panchanathan 2006; Latoschik 2000; Martell and
Kroll 2007; Wilson, Bobick, and Cassell 1996).

2. Major accounts in the study of gesture phases


That gestural movements have an internal linear structure was observed as early as the
beginning of the 20th century. Ott, for instance, noticed that "[e]very gesture is divided into three

parts – the preparation, the gesture proper, and the return" (Ott 1902: 21). Similarly, Mosher
remarked that "[t]he great majority of gestures with the hands consist of three parts,
which may be termed the preparation, the stroke, and the recovery" (Mosher 1916:
10). Furthermore, although argued from a prescriptive perspective (see also Kendon
1980), both remark that gestures can be combined linearly into sequences or series of
gestures.
Kendon (1980) gives a first systematic account of the linear structure of gestures.
Based on the observation that “the pattern of movement that co-occurs with the speech
has a hierarchic organization which appears to match that of the speech units” (Kendon
1972: 190), he focused on the close relationship of gestures and speech and addressed
the phrasal structure of gestures. Kendon identified six different gesture phases: the
rest position, a “moment when the limb is in rest”, the preparation, a phase “in which
the limb moves away from its rest position to a position at which the stroke begins”,
the stroke, the meaningful part of a gesture “in which the limb shows a distinct peaking
of effort”, the hold, a moment in which “the hand is held still in the position it reached
at the end of the stroke”, and the retraction, “a phase in which the limb is either moved
back to its rest position or is readied for another stroke” or the partial retraction, a
“phase in which the hand does not return all the way to the position it was in” (Kendon
1980: 212). The linear combination of these phases can form higher-order structures: A
gesture phrase is composed of a preparation and a stroke. The whole excursion of the
hands from one rest position to the next is termed gesture unit (Kendon 1980). Within
this seminal paper, Kendon not only provides a vocabulary for the segmentation of ges-
tural movement, he also demonstrates that gestures show a linear structure and are hierar-
chically organized. Furthermore, his work is fundamental for a fine-grained analysis of
gesture-speech interaction and marks the beginning of the coding of gesture phases in
gesture studies.
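Kendon's hierarchy — phases combining into phrases, and the full excursion of the hands forming a unit — can be mirrored in a small data-structure sketch. The Python rendering and the class names are ours, chosen for illustration only:

```python
# Illustrative model of Kendon's (1980) hierarchy: gesture phases combine
# into gesture phrases (a preparation plus a stroke, optionally followed
# by a hold), and the whole excursion of the hands from one rest position
# to the next forms a gesture unit that may contain several phrases.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Phase:
    kind: str  # "rest", "preparation", "stroke", "hold", or "retraction"

@dataclass
class GesturePhrase:
    phases: List[Phase]

    @property
    def stroke(self):
        """The obligatory, meaningful phase of the phrase."""
        return next(p for p in self.phases if p.kind == "stroke")

@dataclass
class GestureUnit:
    """Excursion of the hands from one rest position to the next."""
    phrases: List[GesturePhrase] = field(default_factory=list)

unit = GestureUnit(phrases=[
    GesturePhrase([Phase("preparation"), Phase("stroke"), Phase("hold")]),
    GesturePhrase([Phase("preparation"), Phase("stroke")]),
])
# Two gesture phrases within a single gesture unit:
print(len(unit.phrases), unit.phrases[0].stroke.kind)
```

The nesting makes Kendon's central observation concrete: the stroke is obligatory at the phrase level, while phrases themselves are optional, repeatable constituents of the unit.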
Following contributions extend Kendon’s model by focusing on particular gesture
phases and by providing further characterizations. Kita, van Gijn, and van der Hulst
(1998) address a particular gesture phase, namely the hold. While Kendon solely
talks about a hold as a phase in which the hand is held still after the execution of a
stroke, Kita, van Gijn and van der Hulst point out that a stroke can

(i) be preceded by a hold, a so-called pre-stroke hold, or
(ii) be followed by a hold, a so-called post-stroke hold.

This differentiation is grounded in the functions these two types of holds fulfill with
respect to speech. Whereas a pre-stroke hold “is a period in which the gesture waits
for speech to establish cohesion so that the stroke co-occurs with the co-expressive por-
tion of speech” (Kita, van Gijn, and van der Hulst 1998: 26), a post-stroke hold is “a way
to temporally extend a single movement stroke so that the stroke and post-stroke-hold
together will synchronize with the co-expressive portion of speech” (Kita, van Gijn, and
van der Hulst 1998: 26). Moreover, they distinguish between independent holds, i.e.,
holds that can stand by themselves and be a “gestural expression” on their own and
dependent holds which adjoin and are “parasitic to the stroke” because “they arise
from the semiotic coordination or modification of the expression in the stroke” (Kita,
van Gijn, and van der Hulst 1998: 28). This functional specification of holds is currently
almost omnipresent in analyses focusing on gestures (see, for example, Gullberg and

Holmqvist 2006; Kendon 2004; Kettebekov and Sharma 2001; McCullough 2005;
McNeill 2005, 2012; Park-Doob 2010; Parrill 2000; Quek et al. 2002; Sowa 2006).
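The positional part of this hold typology can be operationalized straightforwardly: given a sequence of phase labels, a hold adjacent to a stroke is labelled relative to it. The sketch below is a simplification of our own making — Kita, van Gijn, and van der Hulst's independent/dependent distinction rests on semiotic criteria that cannot be read off the phase sequence alone, so the fallback label here is only a placeholder:

```python
# Sketch: labelling holds relative to strokes in a phase sequence,
# following the positional part of the distinction made by Kita,
# van Gijn, and van der Hulst (1998). A hold with no adjacent stroke
# is labelled "independent hold" here purely as a simplification;
# their actual criterion is semiotic, not sequential.

def label_holds(phases):
    labelled = []
    for i, phase in enumerate(phases):
        if phase != "hold":
            labelled.append(phase)
            continue
        nxt = phases[i + 1] if i + 1 < len(phases) else None
        prv = phases[i - 1] if i > 0 else None
        if nxt == "stroke":
            labelled.append("pre-stroke hold")
        elif prv == "stroke":
            labelled.append("post-stroke hold")
        else:
            labelled.append("independent hold")
    return labelled

sequence = ["preparation", "hold", "stroke", "hold", "retraction"]
print(label_holds(sequence))
```

Run on the example sequence, the first hold is labelled pre-stroke and the second post-stroke, matching the synchronization roles the two hold types play with respect to the co-expressive portion of speech.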
By and large, Kita, van Gijn, and van der Hulst (1998) remain within the framework
provided by Kendon (1980). Although they do focus on a few other characteristics such
as particular aspects of the preparation phase, the basic setup of the gesture
phases as introduced by Kendon remains the same. Two aspects of their endeavor, how-
ever, stand out in contrast to the one put forward by Kendon. Their proposal includes
guidelines for the segmentation of gesture phases, i.e. the division of phases into smaller
stretches of movement prior to an identification of particular gesture phases. Further-
more, they aimed at a “method of analyzing continuous production of gestures and
signs, which is based purely on [a] formal basis” (Kita, van Gijn, and van der Hulst
1998: 34) and thus address an important aspect regarding the coding of gesture phases,
namely its orientation and dependence on speech.
The hold is again the focus of attention in the McNeill lab coding manual (Duncan
n.d.). Duncan distinguishes between full holds, i.e. “holds with no detectable move-
ment” (Duncan n.d.: 4), and “ ‘feature’ or ‘virtual holds’ ”, i.e., holds which show
some movement but are characterized by a maintenance of hand shape and position
in gesture space (Duncan n.d.: 4). Regarding the other gesture phases, she includes prep-
aration, stroke, hold, and retraction in her coding manual, but gives no further descrip-
tion. Only the stroke is characterized in more detail, namely as the phase which
“typically (but not always!) is the interval of apparent greatest gestural effort” (Duncan
n.d.: 3). Although McNeill (2005) partially completes the reflection on the phases by
Duncan, insofar as he provides a further specification of the various gesture phases
and includes the phases pre-stroke hold and post-stroke hold, new aspects regarding
the characterization of gesture phases are again only added with respect to holds.
With her frame-by-frame marking procedure Seyfeddinipur (2006) initiates a new
turn in the identification of gesture phases. This segmentation procedure examines ges-
tural movement sequences frame by frame and marks transitions from one gesture
phase to another based on the sharpness of the video image. Accordingly, three types
of transitions in the execution of gestural movement sequences are distinguishable:

(i) “transition from a dynamic to a static phase,”
(ii) “transition from a static to a dynamic phase,” and
(iii) the “transition from a dynamic to a dynamic phase” (Seyfeddinipur 2006: 105).

These types of transitions provide the basis for the assignment of gestural movement
phases to a specific type of gesture phase as each gesture phase is allocated to particular
types of transitions:

(i) dynamic phases, which are characterized by the execution of movement, such as the
preparation, stroke, and retraction, and
(ii) static phases, which do not involve movement, such as holds and rest positions.

Furthermore, Seyfeddinipur introduces the category of an interrupted preparation/
stroke (Seyfeddinipur 2006: 109).
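A minimal operationalization of this segmentation idea could treat frames with movement above some threshold as dynamic and the rest as static, then read the transitions off the resulting runs. This is a sketch under our own simplifying assumptions: Seyfeddinipur's procedure relies on the sharpness of the video image rather than motion magnitudes, and a pure threshold cannot detect her dynamic-to-dynamic transitions (e.g., a preparation flowing directly into a stroke), which require additional cues:

```python
# Sketch of transition marking between static and dynamic phases, loosely
# inspired by Seyfeddinipur's (2006) frame-by-frame procedure. A frame
# counts as "dynamic" if its movement magnitude exceeds a threshold; the
# magnitudes and the threshold value are invented examples.

def segment(magnitudes, threshold=0.1):
    """Collapse per-frame movement magnitudes into (state, length) runs."""
    runs = []
    for m in magnitudes:
        state = "dynamic" if m > threshold else "static"
        if runs and runs[-1][0] == state:
            runs[-1] = (state, runs[-1][1] + 1)   # extend the current run
        else:
            runs.append((state, 1))               # a transition occurred
    return runs

def transitions(runs):
    """List the transition types between successive runs."""
    return [f"{a[0]}->{b[0]}" for a, b in zip(runs, runs[1:])]

# e.g. a preparation (dynamic), a hold (static), a retraction (dynamic):
runs = segment([0.4, 0.5, 0.0, 0.0, 0.0, 0.3, 0.2])
print(runs)
print(transitions(runs))
```

The run boundaries correspond to Seyfeddinipur's transition points, and the alternation of dynamic and static runs is exactly what licenses the assignment of movement stretches to phases such as preparation, hold, or retraction.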
More than 30 years have passed since the first systematic account of the linear struc-
ture of gestures was given. Although changes in the set-up of the categories of gesture

phases have been offered (e.g., Kita, van Gijn, and van der Hulst 1998), the basic model
has remained the same. Modifications were offered with respect to technical aspects of
the coding process. Previous contributions have demonstrated that

(i) gestural movements display a structure of their own, i.e., gestural movements are
characterized by the progression of specific phases,
(ii) these phases are hierarchically organized, and
(iii) they correspond closely to units at the speech level, thereby underlining the close
relationship between gesture and speech.

However, existing schemes have concentrated particularly on a description of gesture
phases and their functional relation to adjacent gesture phases and to speech. Accounts
never focus on the gesture phases all by themselves. Kendon, for instance, defines the
preparation as the “phase in which the limb moves away from its rest position to a posi-
tion at which the stroke begins” (Kendon 1980: 212). The definitions by Duncan and
McNeill are similar: “the limb moves away from the rest position into the gesture
phase where it can begin the stroke” (McNeill 2005: 31). If questions on the features
and appearance of gesture phases are raised at all, they are addressed only in relation to movement
characteristics of the stroke. For example, Kita, van Gijn, and van der Hulst describe the
stroke as “the phase in which more force is exerted than in neighboring ones” (Kita, van
Gijn, and van der Hulst 1998: 30). Usually, however, descriptions of the various gesture
phases do not include a specification of the movement itself but merely describe its path
(e.g., "hands moved back into rest position"; Seyfeddinipur 2006: 107).
In the following section, we attend solely to the phases themselves. Pursuing a lin-
guistic perspective on the subject, we will explicate articulatory features attended to
by an analyst during the coding of gesture phases and when describing their forms
and functions (see also Bressem and Ladewig 2011 for a more detailed account).

3. Linguistic description of gesture phases


In this section we introduce a description and characterization of gesture phases that
is grounded solely on physical characteristics observable in their execution. In particu-
lar, the approach aims at three aspects:

(i) to present articulatory features apparent in the execution of gesture phases,
(ii) to characterize and define gesture phases according to these features and independent of their functional aspects, and
(iii) to allow for a description of cases in which features are replaced due to the sequen-
tial embedding of phases in particular linear successions.

In doing so, we present a context-independent (section 3.1) and a context-sensitive description (section 3.2) of gesture phases.
The interest in a form-based, articulatory description of gesture phases is based on a
particular viewpoint. We approach the study of gestures from a linguistic-semiotic view-
point (see also Bressem 2012, this volume; Fricke 2007, 2012; Ladewig 2010, 2011, 2012;
Müller 1998, 2010, this volume; Müller, Bressem, and Ladewig this volume; Müller,
Ladewig, and Bressem this volume), which stresses the separation of gestural forms
and functions in the analytic process. In this framework, gestures are first described with
respect to their form, i.e. independent of speech. For this purpose, the approach adopts
the four parameters of sign language for the characterization of gestures (see Battison
1974; Becker 2004; Ladewig and Bressem forthcoming; Sparhawk 1978; Stokoe 1960;
Webb 1996) and describes gestures regarding the configuration of the hand, the orientation
of the palm, the movement, and the position in gesture space. Only in a second step are ges-
tural forms evaluated with respect to their function and related to speech. Based on the sep-
arate description of form and function, analyses approaching gestures within this
framework are able to make statements about patterns and structures of gestures on the
level of form alone as well as on the level of form and function.

3.1. Context-independent (articulatory) description


Gesture phases are considered as minimal units of analysis that can be investigated
and described on their own, detached from their adjacent phases and independent of
their relation to speech. Thereby, only the articulatory features observable in their exe-
cution are focused on and provide grounds for their description. By pursuing a context-
independent perspective on gesture phases, two types of articulatory features can be
determined:

(i) distinctive features comprising the categories movement and tension and
(ii) additional features subsuming the categories possible types of movement and flow
of movement.

All features are based on physical characteristics observable in the execution of gesture
phases. However, the two sets of features show a different distribution across phases.
The set of distinctive features subsumes attributes that are visible in all gesture phases.
They make up a paradigmatic set of properties that are mutually exclusive and stand in
opposition to each other. Accordingly, a gesture phase carrying a particular feature
from the category movement such as “presence of movement” cannot carry another
feature from the same category such as “absence of movement”.
The set of additional features comprises properties that are not observable in all gesture phases, as they only apply to phases that exhibit a particular distinctive feature, namely “presence of movement.” Because of this different status in the form-based characterization, they can only be used selectively to identify gesture phases; they serve to specify the different phases further and thus enhance the formal account to be presented. Both types of articulatory features are presented in the following.

3.1.1. Distinctive features


Two categories of distinctive features are identifiable in the execution of gestural
movement sequences. These are movement and tension.

3.1.1.1. Movement
A prominent feature observable in the execution of gesture phases is the feature
movement. As such, gesture phases can be distinguished according to the presence
and absence of movement:

(i) presence of movement [+movement],
(ii) absence of movement [-movement].

3.1.1.2. Tension
During the execution of gesture phases, changes between relaxation and exertion as
well as differences in the strengths of tenseness are observable. Tension may increase,
decrease, or remain stable. In order to account for differences in tension two different
kinds of muscular activities are taken into account: the volar and dorsal flexion of the
fingers.
Different qualities of tension are reflected in the configuration of the hand. As the
hand always shows some kind of a hand configuration due to residual muscle tension
(tonus), a default condition needs to be presumed. Accordingly, tension applies when
the hand’s configuration differs from that assumed in the default condition.
Accordingly, the category tension shows two features, namely

(i) presence of tension [+tension], and
(ii) absence of tension [-tension].

The feature [+tension] applies when the fingers either stretch (dorsal flexion) or bend
towards the palm (volar flexion), for instance. Tension is absent when the hands are
relaxed in a default position. In this case the feature [-tension] applies.
Gesture phases showing the features [+tension] can be sub-classified and marked by
the features

(i) constant tension [+constant], or
(ii) inconstant tension [-constant].

Movement phases exhibiting a stable configuration that differs from the ones taken in a
default condition are marked by the feature [+constant]. In these cases the tenseness
of the hand does not change throughout the execution of the phase. When [-constant]
applies, tenseness either increases or decreases. Accordingly, phases marked by this
feature can be further characterized in terms of an increase and decrease of tension:

(i) [+increase].
(ii) [-increase].

In both of the above-mentioned cases, the beginning and the end of the phases
show different degrees of tenseness reflected in varying configurations. The feature
[+increase] applies to phases in which tenseness is built up. Accordingly, in the begin-
ning of such a phase the hand is in or is close to the default condition and lacks
tension whereas the endpoint of the movement phase shows tension. This results in
the formation of a configuration of the hand, which differs from the ones that can be
assumed in the default condition. In phases identified by a decrease of tension, the
beginning of the gesture phase is characterized by tension of the hand whereas the end-
point of the movement lacks the feature tension, i.e., the hand is in or close to the
default condition (Fig. 69.1). The decrease in hand tension corresponds with a deforma-
tion of the hand’s configuration approximating the default condition (see also Harling
and Edwards 1997; Martell 2005; for special cases see Bressem and Ladewig 2011: 69).
The application of the distinctive features allows for a differentiation of movement
phases in terms of gesture phases (see section 3.1.2 and 5). Additional features pre-
sented in the following section can enhance this form-based account and offer a
more detailed characterization of the phases.

3.1.2. Additional features


Phases marked by the distinctive feature [+movement] can be specified by further char-
acteristics. Two additional categories are distinguished that buttress the context-
independent identification of the gesture phases, namely possible types of movement
and flow of movement.

3.1.2.1. Possible types of movement


Gestures may vary according to the types of movement (see Bressem this volume) they
perform. With respect to dynamic movement phases, i.e. phases characterized by the
execution of movement, two features are distinguishable:

(i) restricted [+restricted], and
(ii) not restricted [-restricted].

Gesture phases that are marked by the feature [+restricted] show rather “straight” and “curved” movements. In cases in which the movement is anchored solely at the wrist, “bending” and “raising” of the hand as well as “rotation” can be observed. In phases in which the feature [-restricted] applies, the following six basic types are observable:

(i) “straight”,
(ii) “arced”,
(iii) “circle”,
(iv) “spiral”,
(v) “zigzag”, and
(vi) “s-line.”

Movements of the wrist exhibit “bending”, “raising”, or “rotation” (Bressem this volume).

3.1.2.2. Flow of movement


Gesture phases characterized by the presence of movement can show variations in the
quality of movement. In these cases the category “flow of movement” applies which
captures varying force and/or a change of velocity in the execution of movement. Two
features are distinguished:

(i) variable [+variable], and
(ii) not variable [-variable].

In cases in which the feature [+variable] applies, the flow of movement can be described
as variable, i.e., it shows some degree of variation within one movement phase: It may
be accentuated, accelerated, or decelerated. If the flow of movement does not show any
variation the phase is marked by the feature [-variable] (see Tab. 69.1).
For the dynamic gesture phases, the additional features have the same status as the
distinctive features. They show a particular distribution across the dynamic phases and,
as such, differentiate these phases from each other. As is the case with the distinctive
features, the additional features are mutually exclusive of one another and stand in
opposition to each other, i.e., if a phase carries a particular feature from one category
it cannot carry another feature of the same category.

Tab. 69.1: Overview of distinctive and additional features

articulatory features
  distinctive features
    movement: [+movement] | [-movement]
    tension:  [+tension] | [-tension]
      [+tension]: [+constant] | [-constant]
        [-constant]: [+increase] | [-increase]
  additional features (apply only to [+movement] phases)
    possible types of movement: [+restricted] | [-restricted]
    flow of movement:           [+variable] | [-variable]

Applying the sets of distinctive features as well as additional features to gesture phases
results in the following characterization of gesture phases.

3.1.3. Description of gesture phases


From a context-independent perspective, only five gesture phases are identifi-
able in the data, namely preparation, retraction, stroke, hold, and rest position. Applying
the articulatory characteristics introduced above, these gesture phases can be described
as follows. A feature matrix for each phase can be found in Tab. 69.2.

3.1.3.1. Preparation
The gesture phase of a preparation is characterized by the execution of movement. The
tenseness of the hand increases during its performance as the hand shape is being
formed and a configuration that differs from the one taken in the default condition is
assumed. Accordingly, the preparation is marked by the features [+movement],
[+tension], [-constant], and [+increase]. The preparation phase can be further character-
ized through additional features. According to these, it is distinguished by a restricted
variation of movement types as the only “straight” and “curved” movements occur. If
the movement is anchored merely at the wrist, the hand is raised (dorsal flexion) or it is
rotated upwards (supination) in the majority of cases. The flow of movement does not
show any variation. Accordingly, the preparation carries the features [+restricted] and
[-variable] (see Tab. 69.2).

3.1.3.2. Retraction
The retraction phase shows a similar feature matrix as the preparation phase. Both
phases are marked by the presence of movement [+movement] and a varying tenseness
([+tension], [-constant]). However, during the execution of the retraction tenseness
decreases ([-increase]). As such, the hand’s configuration is modified insofar as it is
resolved and approximates a default condition.
Again the variation of movement types is restricted to “straight” and “curved.” If the
movement is anchored at the wrist only, the hand is either bent to pulse (volar flexion)
or rotated (pronation), in the majority of cases. The flow of movement does not vary.
According to these observations, the features [+restricted] and [-variable] apply.

3.1.3.3. Stroke
Similar to the preparation and the retraction, the stroke is characterized by execution of
movement. However, this is the only feature that all three gesture phases discussed so
far have in common. Throughout the execution of a stroke, the tenseness of the hand
remains stable, i.e., the configuration of the hand is maintained and differs from the
one taken in the default condition (see Bressem and Ladewig 2011 for further details).
Furthermore, the types of movement are not restricted at all. This means that the six
basic movement types “straight”, “curved,” “circle”, “spiral”, “zigzag”, and “s-line”
may be realized by a stroke. Additionally, the flow of movement may vary during the
execution of the stroke. This means that the stroke is the only phase that may be accentuated and/or performed with a change in velocity. According to these observations,
the stroke is marked by the features [+movement], [+tension], [+constant], [-restricted],
and [+variable].

3.1.3.4. Hold
The hold is characterized by absence of movement. Throughout the execution of a hold,
the tenseness of the hand remains stable (see above) and the configuration differs from
the one taken in the default condition. In some cases, slight drifting movements are
observable in the execution of a hold, which are not meaningful (see also Duncan
n.d.). However, these are not considered gestural movements but are evidence of mus-
cle contraction. Accordingly, the features [-movement], [+tension], and [+constant]
apply.
The gesture phases stroke and hold share one distinctive feature that is not realized
in the remaining phases and is, as such, essential for their identification: This is the
feature [+tension] that is sub-classified by the feature [+constant].

3.1.3.5. Rest position


Like holds, a rest position is characterized by absence of movement. Rest positions are
the only gesture phases characterized by absence of tension. They are marked by the
features [-movement] and [-tension]. If the hands are moved during a rest position, they either touch an external object or the speaker’s body, or they are engaged in fidgeting movements (see also Arendsen, Doorn, and Ridder 2007; Ekman and Friesen 1969; Freedman 1977).

3.1.4. Summary
This context-independent perspective on gesture phases allows an explication of the ar-
ticulatory features that an analyst perceives and relies on when coding gesture phases
and when describing their forms and functions. These features are subsumed under
sets of distinctive and additional features, which show a particular distribution across
gesture phases. In this way, a feature matrix for each gesture phase was set up:

Tab. 69.2: Overview of gesture phases


Feature/        movement      possible types  flow of      tension
Phase                         of movement     movement
Preparation     [+movement]   [+restricted]   [-variable]  [+tension] [-constant] [+increase]
Retraction      [+movement]   [+restricted]   [-variable]  [+tension] [-constant] [-increase]
Stroke          [+movement]   [-restricted]   [+variable]  [+tension] [+constant]
Hold            [-movement]                                [+tension] [+constant]
Rest position   [-movement]                                [-tension]
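The feature matrices above lend themselves to a compact programmatic representation, which can be useful when implementing the coding scheme in an annotation workflow. The following Python sketch is our own illustration (all names are invented for this example); it encodes each phase as the set of feature values listed in the table and identifies a phase from an observed matrix:

```python
# Illustrative sketch (not part of the original coding scheme): each gesture
# phase is encoded as a matrix of the distinctive and additional features
# described above; an observed matrix is identified by exact comparison.

PHASES = {
    "preparation":   {"movement": True,  "restricted": True,  "variable": False,
                      "tension": True,   "constant": False,   "increase": True},
    "retraction":    {"movement": True,  "restricted": True,  "variable": False,
                      "tension": True,   "constant": False,   "increase": False},
    "stroke":        {"movement": True,  "restricted": False, "variable": True,
                      "tension": True,   "constant": True},
    "hold":          {"movement": False, "tension": True,     "constant": True},
    "rest position": {"movement": False, "tension": False},
}

def identify_phase(observed):
    """Return the phase whose feature matrix matches the observation, if any."""
    for phase, features in PHASES.items():
        if features == observed:
            return phase
    return None  # deviating matrices point to context-sensitive cases

print(identify_phase({"movement": False, "tension": True, "constant": True}))
# → hold
```

A matrix that matches no phase exactly is a cue to consult the context-sensitive description in section 3.2.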

However, when observing gesture phases embedded within a series of further phases,
these characteristics may undergo changes. Some cases in which the phases’ character-
istics deviate from their usual feature sets will be presented in the following section
focusing on a context-sensitive description.

3.2. Context-sensitive description


In the simplest succession of gesture phases, a preparation is followed by a stroke,
which is, in turn, succeeded by a retraction. The gestural movement is framed by
two rest positions. Quite often, however, the succession of gesture phases is more com-
plex. In many cases, for instance, several strokes follow each other immediately. Some-
times these successions exhibit a preparatory phase in between and sometimes they
do not.
Due to these sequential embeddings, the characteristics of gesture phases described
above may undergo changes, which resemble co-articulation phenomena in spoken lan-
guage (see e.g., Menzerath and Lacerda 1933; Trubetzkoy 1958; for a discussion with
respect to gestures see Bressem and Ladewig 2011: 76–83). In general, linear succes-
sions can be distinguished based on two different ranges of modification. Accordingly,
it is possible to distinguish linear sequences, which cause changes in either only one ges-
ture phase, or in two or more gesture phases. Whereas in the former, only one gesture
phase undergoes changes in its characteristic features, the latter causes changes in the
features of two adjacent gesture phases. Two cases will be presented: firstly the omission
of a preparation, and secondly a succession of strokes.

3.2.1. Changes in one gesture phase: Omission of a preparatory phase


When the hands are moved out of a rest position to perform a gesture, the stroke is
usually preceded by a preparation. However, cases in which the preparation phase is
dropped are quite common (see Kendon 1980, 2004; Kita, van Gijn, and van der
Hulst 1998; McNeill 1992; Seyfeddinipur 2006), for instance, in specific occurrences
of deictic gestures. The omission of a preparation has an impact on the following
gesture phase, namely the stroke. What can be observed is that the characteristic set
of features exhibited by a stroke undergoes changes. More precisely, particular features
are replaced.
Whereas a stroke is usually characterized by the features [+movement, +tension,
+constant, +variable] (see Tab. 69.2), a stroke in a linear succession that lacks a prepa-
ratory phase, as in many deictic gestures, for instance, displays changes in the category
“tension”. In these sequences, the stroke is characterized by the features [+movement,
+tension, -constant, +increase, +restricted, +variable] (for an example, see Bressem and
Ladewig 2011). Accordingly, the stroke adopts the characteristics of a preparation and
shows an increase of tension (see Tab. 69.2). The distinctive feature [+constant] is sub-
stituted by the feature [+increase]. However, the stroke is still clearly identifiable from
a preparation as it shows a particular additional feature characteristic for a stroke,
namely the feature [+variable] – a feature the stroke phase does not share with any
other gesture phase.
The range of contexts in which the features of a stroke can be replaced appears to be
limited. More specifically, these types of replacements have been observed mostly in
such linear successions, in which the stroke follows a rest position or a retraction.
Another instance in which characteristics of a gesture phase undergo changes is the
transition of a retraction to a rest position on the speaker’s body, on an object, or to an
action (an example is given in Bressem and Ladewig 2011: 80–83).
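The feature replacement described in this section can be expressed as a small rewrite rule over the stroke’s matrix. The sketch below is a hypothetical illustration under our own naming, not the authors’ implementation:

```python
# Hypothetical sketch of the context-sensitive rule described above: a stroke
# preceded by a rest position or a retraction (i.e., with no preparation)
# takes over preparation-like tension features while keeping [+variable].

STROKE = {"movement": True, "restricted": False, "variable": True,
          "tension": True, "constant": True}

def contextualize_stroke(preceding_phase):
    """Return the stroke's feature matrix adjusted for its linear context."""
    features = dict(STROKE)
    if preceding_phase in ("rest position", "retraction"):
        # [+constant] is replaced by [-constant, +increase]; the movement is
        # restricted, as in a preparation.
        features.update({"constant": False, "increase": True, "restricted": True})
    return features

adjusted = contextualize_stroke("rest position")
print(adjusted["increase"], adjusted["variable"])
# → True True
```

The retained feature [+variable] is what still sets such a stroke apart from a preparation.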

3.2.2. Changes in two or more gesture phases: Succession of strokes


The following section will present a case in which the change not only affects one phase
but where two adjacent gesture phases alter their characteristics because of their
sequential context. This phenomenon can often be observed in gestural repetitions (Bressem 2012). Characteristic of this phenomenon is the
succession of several strokes exhibiting the same hand configuration, orientation, move-
ment type, or position in gesture space. In these sequences, the speaker moves his/her
hand in several successive strokes. The subsequent stroke phases are separated from
each other by a preparatory phase, in which the hands move in the opposite direction.
But the preparations in between differ from the strokes only in one aspect, i.e. direction
of movement. The configuration of the hand, its orientation, type of movement, and position in gesture space are the same for both gesture phases. This is problematic, as
only the first preparatory phase of the segment is distinguished by the characteristic fea-
ture set, namely [+movement], [+tension], and [+increase]. The following preparations
show changes in the category “tension”: The feature [+increase] is replaced by the fea-
ture [+constant]. This change in tension is reflected in the hand’s configuration. Whereas
the hand’s configuration is usually being formed during the execution of a preparation, it
is maintained in these specific linear successions. Hence, these preparation phases
display the features assigned to the gesture phase stroke (see Tab. 69.2).
These observations thus raise the question whether the movement segments in
between the strokes can still be regarded as preparatory phases or whether they should
rather be considered as strokes. A closer examination of the strokes in these sequences
supports the first assumption: All strokes in these successions show a further movement
characteristic subsumed under the category “flow of movement”, namely an accentua-
tion. By accentuation of movement we understand that the end of the motion is stressed
such that the movement is carried out with more force. This rise in force leads to an
increase in the intensity at the end of the movement execution (see Bressem 2012).
Accordingly, in addition to the features [+movement, +tension, +constant], the strokes
in such successions exhibit the feature [+variable].
The abovementioned changes can only be observed if at least two strokes sharing the same form parameters are executed. Therefore, preparations alter their articulatory feature “tension” only if more than one stroke with the same hand shape, orientation of the palm, movement, and position in gesture space is carried out.

3.2.3. Summary
By taking a linguistic perspective on the study of gestures, the identification and
description of gesture phases has been reconsidered. A context-independent description of the phases preparation, stroke, hold, retraction, and rest position, based on their articulatory characteristics alone and leaving functional aspects aside, was proposed.
Two sets of articulatory features were introduced. These are

(i) distinctive features subsuming the categories movement and tension, and
(ii) additional features comprising the categories possible types of movement and flow
of movement.

Applying these categories and their sub-categories, the distribution of the features across the different phases was presented, providing a definition of each gesture phase based on form alone.
In the context-sensitive description of gesture phases, it was shown how the
sequential embedding of the phases in specific linear successions results in changes of
the formal characteristics of particular phases. Accordingly, it is possible to distinguish
linear sequences, which cause changes in one gesture phase as in the cases of “omission
of a preparation”, and changes in two or more gesture phases as in the case of “succession
of strokes”.

4. Discussion
Adopting a feature-based approach to the description of gesture phases offers an expli-
cation of gestural characteristics and contributes to the discussion of gesturalness
(Kendon 2004: 15), i.e., the “features that an action must have for it to be treated as a
gesture” (Kendon 2004: 12, see also Kendon 1996). However, analyzing gesture phases
as a bundle of features does not necessarily imply the assumption that the described
gestural units resemble or are akin to units of speech. The present approach to gesture
phases is inspired by phonology and proposes to treat gesture phases as separate units
of analysis that can be perceived and analyzed as such. Therefore, the approach offered
here provides the opportunity to describe patterns and structures on the level of the
phases themselves and offers new insights into the semiotic system “gesture”. It constitutes a step further towards a “grammar of gesture” (Bressem 2012; Fricke 2012; Ladewig 2012; Ladewig and Bressem forthcoming; Müller, Bressem, and Ladewig this
volume) and lays further grounds for an understanding of the intertwining simultaneous
and linear structures of speech and gesture.
The linguistic framework described here serves as a theoretical building block from
which elements are taken and adopted as far as the semiotic structures of the medium
“gesture” allow for it. It aims at developing a consistent terminology for the description
of both modalities speech and gesture and intends to advance reflections upon a multi-
modal grammar (Fricke 2012). Notably, this is done with the utmost care, so as not to
lose sight of the particular properties of gestures.

5. Some remarks on the annotation of gesture phases


The annotation of gestures is often the first step in the analysis of gestures. In the following section, we focus on the practical side of identifying and coding gesture phases.
Some details will be outlined that can be either considered as analytical steps to be fol-
lowed in the annotation of gesture phases or as criteria to be consulted at particular
points in the coding process. These are

(i) the determination of movement phases,
(ii) the determination of gesture phases on the basis of their form, and
(iii) the determination of gesture phases on the basis of their function.

As proposed above (see section 3), the annotation is done independent of speech,
meaning that the sound is turned off during this process. In this way, only the units
of gesture phases are taken into account and neither their relation nor function with
respect to speech.
The annotation is executed in ELAN, a software tool developed for the annotation of audio-visual data (Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands, http://www.lat-mpi.eu/tools/elan/, Wittenburg et al. 2006). ELAN makes it possible to watch videos at varying speeds and analyze them frame by frame, to set up individual tiers for the categories to be annotated, and to group annotations according to a time interval. Furthermore, annotations can be exported into
programs such as Microsoft Word or Microsoft Excel.
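Exported annotations can likewise be post-processed with a script. The sketch below assumes a tab-delimited export with one annotation per row and the columns tier name, begin time, end time, and annotation value; since the column set of an export is configurable, this layout is an assumption to be checked against your own data:

```python
import csv
from io import StringIO

# Sketch: parsing a tab-delimited export of a gesture-phase tier. The
# four-column layout (tier, begin_ms, end_ms, value) is an assumption;
# adapt it to the columns actually selected in the export dialog.

def read_phase_tier(text):
    rows = csv.reader(StringIO(text), delimiter="\t")
    return [{"tier": t, "begin": int(b), "end": int(e), "phase": v}
            for t, b, e, v in rows]

sample = "phase\t120\t360\tpreparation\nphase\t360\t900\tstroke\n"
print([a["phase"] for a in read_phase_tier(sample)])
# → ['preparation', 'stroke']
```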
In order to decide whether a particular movement is a preparation or a stroke, for
instance, the gestural movement excursions need to be segmented. More precisely, on-
sets and offsets of movement phases need to be determined. Seyfeddinipur’s (2006)
frame-by-frame marking procedure aims exactly at this analytical step in the coding
of gesture phases. It introduces a methodological approach to the identification of gesture phases and establishes “unambiguous coding criteria for obtaining consistent and
frame-accurate times of gesture phases” (Seyfeddinipur 2006: 105).
For this purpose, Seyfeddinipur takes advantage of an artifact of videos, namely the
sharpness of the video image in which the execution of movement becomes apparent in
blurry and clear images. This goes along with a differentiation of gesture phases into two types of phases, i.e.,

(i) dynamic phases which are characterized by the execution of movement such as in
preparation, stroke, and retraction, and
(ii) static phases which do not involve movement such as in holds and rest positions.

Applying the frame-by-frame marking procedure, transitions from one movement phase to another are identified on the basis of the sharpness of the video image. More specifically, transitions between dynamic and static movement phases are identified
(see section 2). In practice, one goes through a video sequence frame by frame, exam-
ines the quality of the video image, and marks new movement sequences depending on
the sequencing of clear and blurry images (see Fig. 69.1).

frames 5–16 (image quality): blurry blurry blurry clear clear clear clear blurry blurry clear blurry blurry
phases: dynamic (5–7) – static (8–11) – dynamic (12–13) – dynamic (14–16)

Fig. 69.1: Segmentation of movement applying Seyfeddinipur’s (2006) frame-by-frame marking procedure.

It needs to be stressed, however, that the offset of one movement phase can only be
exactly determined when the onset of the following movement phase is taken into
account. The onset of one movement phase thus retrospectively determines the offset
of the preceding. Accordingly, the precise ending of any gestural movement phase
can only be made out when

(i) the next video image is as clear as the preceding one, thus showing that it belongs to a following static phase, or
(ii) the next video image is blurred, therefore showing that it belongs to a following dynamic phase (see Fig. 69.1).
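The frame-by-frame procedure, including the retrospective determination of offsets, amounts to a run-length segmentation over per-frame sharpness judgments. A minimal sketch (our own, not Seyfeddinipur’s implementation):

```python
# Minimal sketch of segmenting per-frame sharpness judgments ("blurry" vs.
# "clear") into dynamic and static movement phases, as in Fig. 69.1. A
# phase's offset is fixed retrospectively by the onset of the next phase.

def segment(frames):
    """frames: list of "blurry"/"clear"; returns (type, first, last) runs."""
    phases, start = [], 0
    for i in range(1, len(frames) + 1):
        if i == len(frames) or frames[i] != frames[start]:
            kind = "dynamic" if frames[start] == "blurry" else "static"
            phases.append((kind, start, i - 1))
            start = i
    return phases

frames = ["blurry"] * 3 + ["clear"] * 4 + ["blurry"] * 2
print(segment(frames))
# → [('dynamic', 0, 2), ('static', 3, 6), ('dynamic', 7, 8)]
```

Note that a single clear frame between two blurry runs is segmented as a one-frame static phase; in practice such a frame may instead mark the boundary between two successive strokes (see section 5.1).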

Furthermore, the sharpness of a video image should be determined with regard to the
preceding and following video image as this criterion depends

(i) on the quality of the video data, in general, and
(ii) on the velocity of the executed gestural movements observable in the data.

The determined dynamic and static movement phases provide the basis for the assign-
ment of gesture phases since each gesture phase is allocated to particular types of
transitions.

Gesture phases can be allocated to movement phases based on the consideration of form and function. Both are presented in the following sections.

5.1. Determination of gesture phases: Form


In order to allocate gesture phases to the dynamic or static movement phases identified
with the frame-by-frame marking procedure, the articulatory features for the categories
movement, tension, possible types of movement and flow of movement (see section 3.1)
should be determined for each phase. In order to do so, Tab. 69.3 can be used as a tem-
plate in this analytical step. Accordingly, if a feature listed in the chart can be observed,
a “+” (plus) is noted, if it is not present, a “-” (minus) is noted. The evolving feature
matrix can then be compared to the characteristic set of features presented for each
gesture phase in section 3.1.3.

Tab. 69.3: Template for the coding of gesture phases


Feature/        movement        possible types    flow of         tension
Phase                           of movement       movement
gesture phase   [+/- movement]  [+/- restricted]  [+/- variable]  [+/- tension] [+/- constant] [+/- increase]

In order to distinguish strokes from each other, it might in some cases be useful to pay
attention to a change in the realization of one or more parameters, i.e. movement, hand
shape, orientation of the palm, and position in gesture space. This analytical step is
based on the assumption that the meaning of a gesture is reflected in its form, such
that the modification of the form may entail changes in the meaning of a gesture (see
e.g., Ladewig 2010, 2011; Ladewig and Bressem forthcoming; Müller 2004). Accordingly,
the transition from two dynamic movement phases identified as strokes may not only be
visible in a frame showing a clear video image. Two strokes are also differentiated from
each other by a modification of one or more parameter realizations.
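This criterion for separating adjacent strokes lends itself to a simple check over the four parameters. The sketch below is a hypothetical helper, not part of the authors' method; the parameter names follow the text, while the example values are invented.

```python
# Sketch (assumed helper): two adjacent dynamic phases identified as strokes
# count as distinct strokes if at least one of the four parameter
# realizations changes between them.

PARAMETERS = ("movement", "hand_shape", "palm_orientation", "position")

def is_new_stroke(prev: dict, curr: dict) -> bool:
    """True if any of the four parameter realizations differs."""
    return any(prev.get(p) != curr.get(p) for p in PARAMETERS)

s1 = {"movement": "arced", "hand_shape": "flat hand",
      "palm_orientation": "up", "position": "center"}
s2 = dict(s1, palm_orientation="down")  # only the palm orientation changes

print(is_new_stroke(s1, s2))        # -> True
print(is_new_stroke(s1, dict(s1)))  # -> False
```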
If the analyst encounters phases that cannot be determined easily, s/he should con-
sider special cases (see section 3.2 and Bressem and Ladewig 2011) or take the function
of gesture phases into account.

5.2. Determination of gesture phases: Function


Functional approaches to gesture phase can be found in the literature (see section 2).
Often, however, two functional aspects are intermingled, namely

(i) the function of a gesture phase with regard to surrounding gesture phases and
(ii) their communicative and interactive function.

As elucidated above, we follow a linguistic-semiotic approach to the study of gestures,
first conducting the analysis of form independently of the co-expressive speech before
bringing both modalities together in the analytic process. Thus, we focus on the function
a gesture phase fulfills within its sequential embedding among other gesture phases (see Tab. 69.4).

Tab. 69.4: Function of gesture phases


gesture phase function within sequential embedding
preparation – prepare the hands for the execution of the stroke
– move the hands to a particular position in gesture space
– assume a particular configuration with which the stroke begins
retraction – transition from stroke to possible rest position
stroke – forms the center of a gestural unit (“nucleus”, Kendon 2004)
hold – neighbors stroke (either precedes or follows it)
– may selectively stand alone
– belongs to the center of a gestural unit (“nucleus”, Kendon 2004)
rest position – default condition

Preparations and retractions fulfill functions with respect to following gesture phases.
The function of the preparatory phase is to bring the hand to a particular position at
which or from which a stroke can be performed. This function is reflected in the feature
[+restricted] since moving the hands to a position in space does not demand a variety of
movement patterns. Furthermore, during the preparatory phase the hand assumes
the configuration of the following stroke. Therefore, a preparation shows the features
[-constant] and [+increasing] in most cases. So, if an analyst considers a particular move-
ment phase to be performed so that another phase can be executed, the movement
phase can be identified as a preparation.
The retraction constitutes, in most cases, a transition from a stroke to a rest position.
Accordingly, the formation assumed during the preparation and maintained while per-
forming the stroke is resolved and approximates a default condition. In some cases, the
hands are not moved fully back to a rest position but the path of the hands is interrupted.
This phase has been termed partial retraction (Kendon 1980: 212; Seyfeddinipur 2006).
However, speaking in terms of articulatory features and taking the phases themselves
into account, a partial retraction exhibits the same feature matrix as a “full” retraction.
The stroke is the phase that carries the meaning of a gesture. Thus, it forms the cen-
ter of a gestural unit, which has also been referred to as nucleus (Kendon 2004). Hence,
a stroke differs from other gesture phases insofar as it shows the widest range of pos-
sible types of movements and it varies in the flow of movement. As such it can be ac-
centuated, for instance, which is one reason why it has been termed the phase which “is
supposed to be more forceful compared to its neighboring phases” (Kita, van Gijn, and
van der Hulst 1998: 32).
A hold can occur independently, but in most cases it prolongs the scope of the stroke,
as a hold mostly precedes or follows it. Pre- and post-stroke holds together with the
stroke, or a hold alone, belong to the nucleus of a gestural unit (Kendon 2004).
The most striking feature in the identification of a rest position is its form. During
the rest position the hands and arms are relaxed. A rest position serves as a reference
in order to decide whether a hand shows tension or not, which is why we also consider it
a default condition.
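The functional criteria summarized in Tab. 69.4 can be approximated as context rules over a sequence of tentatively labelled phases. The following sketch is an illustrative simplification under assumed labels ("dynamic", "stroke", "hold", "rest"), not the authors' procedure; in particular, it ignores holds neighboring the stroke and partial retractions.

```python
# Sketch of the functional criterion: a dynamic phase directly before a
# stroke serves as preparation; a dynamic phase leading towards a rest
# position (or ending the sequence) serves as retraction.

def label_by_context(sequence: list) -> list:
    """sequence: tentative labels 'dynamic' / 'stroke' / 'hold' / 'rest'."""
    labelled = list(sequence)
    for i, phase in enumerate(sequence):
        if phase != "dynamic":
            continue
        nxt = sequence[i + 1] if i + 1 < len(sequence) else None
        if nxt == "stroke":
            labelled[i] = "preparation"
        elif nxt == "rest" or nxt is None:
            labelled[i] = "retraction"
    return labelled

print(label_by_context(["rest", "dynamic", "stroke", "hold", "dynamic", "rest"]))
# -> ['rest', 'preparation', 'stroke', 'hold', 'retraction', 'rest']
```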
The three analytical steps presented above should not be understood as completely
independent steps in the annotation of gesture phases. Rather, annotation must be
conceived as one process, characterized by a back and forth between those three
aspects (see Fig. 69.2). Particularly in complex linear sequences of gestures, the

[Fig. 69.2: Illustration of the coding procedure conceived as a process: a cycle between segmentation of movement phases, determination of gesture phases (form), and determination of gesture phases (function)]

continuous consideration of all three aspects enhances the coding process. The separa-
tion of this process into the determination of movement phases and their classification
based on form and function mainly serves the purpose of bringing to light the different
analytical steps necessary in the segmentation and coding of gesture phases.
Last but not least, we would like to point out that the method presented above is not
to be understood as an error-proof method but should be regarded as a companion to
other proposals and as a further step towards putting the coding of gesture phases on
objective grounds by taking the characteristics of the medium itself into account.

Acknowledgments
We are grateful to the Volkswagen Foundation for supporting this research with a grant
for the interdisciplinary project “Towards a grammar of gesture: evolution, brain and
linguistic structures” (www.togog.org).

6. References
Arendsen, Jeroen, Andrea J. van Doorn and Huib de Ridder 2007. When and how well do people
see the onset of gestures? Gesture 7(3): 305–342.
Battison, Robin 1974. Phonological deletion in American Sign Language. Sign Language Studies
5: 1–19.
Becker, Karin 2004. Zur Morphologie redebegleitender Gesten. MA thesis, Department of Philos-
ophy and Humanities, Free University Berlin.
Bressem, Jana 2012. Repetitions in gesture: Structures, functions, and cognitive aspects. Ph.D.
dissertation, Faculty of Social and Cultural Sciences, European University Viadrina, Frankfurt
(Oder).
Bressem, Jana this volume. A linguistic perspective on the notation of form features in gestures. In:
Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Sedinha
Teßendorf (eds.), Body – Language – Communication: An International Handbook on Multi-
modality in Human Interaction. (Handbooks of Linguistics and Communication Science 38.1.)
Berlin/Boston: De Gruyter Mouton.
Bressem, Jana and Silva H. Ladewig 2011. Rethinking gesture phases: Articulatory features of ges-
tural movement? Semiotica 184(1/4): 53–91.
Chafai, Nicolas Ech, Catherine Pelachaud and Danielle Pelé 2006. Analysis of gesture expressivity
modulations from cartoons animations. Workshop on “Multimodal Corpora”, International
Conference on Language Resources and Evaluation LREC, May 27th in Genoa, Italy.
Condon, William S. and William D. Ogston 1967. A segmentation of behavior. Journal for Psychi-
atric Research 5: 221–235.
Duncan, Susan D. n.d. Coding Manual. http://mcneilllab.uchicago.edu/pdfs/Coding_Manual.pdf,
accessed May 2006.
Efron, David 1972. Gesture, Race and Culture. Paris: Mouton. First published as Gesture and
Environment. New York: King’s Crown Press [1941].
Ekman, Paul and Wallace V. Friesen 1969. The repertoire of nonverbal behavior: Categories, ori-
gins, usage and coding. Semiotica 1(1): 49–98.
Freedman, Norbert 1977. Hands, words and mind: On the structuralization of body movements
during discourse and the capacity for verbal representation. In: Norbert Freedman and
Stanley Grand (eds.), Communicative Structures and Psychic Structures, 109–132. New York:
Plenum.
Fricke, Ellen 2007. Origo, Geste und Raum: Lokaldeixis im Deutschen. Berlin: Walter de Gruyter.
Fricke, Ellen 2012. Grammatik Multimodal: Wie Wörter und Gesten Zusammenwirken. Berlin: De
Gruyter.
Gullberg, Marianne and Kenneth Holmquist 2006. What speakers do and what listeners look at.
Visual attention to gestures in human interaction live and on video. Pragmatics and Cognition
14(1): 53–82.
Harling, Philip and Alistair Edwards 1997. Hand tension as a gesture segmentation cue. In: Philip
Harling and Alistair Edwards (eds.), Progress in Gestural Interaction. Proceedings of Gesture
Workshop 1996, 75–88. Berlin: Springer.
Kahol, Kanav, Priyamvada Tripathi and Sethuraman Panchanathan 2006. Recognizing whole body
movements and gestures through activities in human anatomy. International Journal on Sys-
temics, Cybernetics and Informatics 3: 25–32.
Karpiński, Maciej, Ewa Jarmołowicz-Nowikow and Zofia Malisz 2009. Aspects of gestural and
prosodic structure of multimodal utterances in Polish task-oriented dialogues. In: Grazyna De-
menko, Krzysztof Jassem and Stanislaw Szpakowicz (eds.), Speech and Language Technology,
volume 11, 113–122. Poznań, Poland: Polish Phonetic Association.
Kendon, Adam 1972. Some relationships between body motion and speech. In: Aron W. Siegman
and Benjamin Pope (eds.), Studies in Dyadic Communication, 177–216. Elmsford, NY: Pergamon
Press.
Kendon, Adam 1980. Gesticulation and speech: Two aspects of the process of utterance. In:
Mary Ritchie Key (ed.), Nonverbal Communication and Language, 207–277. The Hague:
Mouton.
Kendon, Adam 1996. An agenda for gesture studies. The Semiotic Review of Books 7(3): 7–12.
Kendon, Adam 2004. Gesture. Visible Action as Utterance. Cambridge: Cambridge University
Press.
Kettebekov, Sanshzar and Rajeev Sharma 2001. Toward natural gesture/speech control of a large
display. In: Roderick Little and Laurence Nigay (eds.), Engineering for Human-Computer
Interaction, 221–234. Heidelberg, Germany: Springer.
Kita, Sotaro, Ingeborg van Gijn and Harry van der Hulst 1998. Movement phases in signs and
cospeech gestures and their transcription by human encoders. In: Ipke Wachsmuth and Martin
Fröhlich (eds.), Gesture and Sign Language in Human-Computer Interaction, 23–35. Berlin:
Springer.
Ladewig, Silva H. 2010. Beschreiben, suchen und auffordern – Varianten einer rekurrenten Geste.
Sprache und Literatur 41(1): 89–111.
Ladewig, Silva H. 2011. Putting the cyclic gesture on a cognitive basis. CogniTextes 6. http://
cognitextes.revues.org/406.
Ladewig, Silva H. 2012. Syntactic and semantic integration of gestures into speech. Ph.D. disserta-
tion, Faculty of Social and Cultural Sciences, European University Viadrina, Frankfurt (Oder).
Ladewig, Silva H. and Jana Bressem forthcoming. New insights into the medium hand – Discovering
structures in gestures on the basis of the four parameters of sign language. Semiotica.
Latoschik, Marc Erich 2000. Multimodale Interaktion in virtueller Realität am Beispiel der virtuel-
len Konstruktion. Bielefeld: Technische Universität Bielefeld.
Loehr, Dan 2006. Gesture and Intonation. Washington, DC: Georgetown University Press.
Martell, Craig 2005. FORM: An experiment in the annotation of the kinematics of gesture. Ph.D.
dissertation, Department of Computer and Information Sciences, University of Pennsylvania.
Martell, Craig and Joshua Kroll 2007. Corpus-based gesture analysis: An extension of the form
dataset for the automatic detection of phases in a gesture. International Journal of Semantic
Computing 1: 521.
McClave, Evelyn Z. 1991. Intonation and gesture. Ph.D. dissertation, Georgetown University,
Washington, DC.
McCullough, Karl-Erik 2005. Using Gestures during Speaking: Self-Generating Indexical Fields.
Chicago: ProQuest Information and Learning.
McNeill, David 1992. Hand and Mind. What Gestures Reveal about Thought. Chicago: Chicago
University Press.
McNeill, David 2005. Gesture and Thought. Chicago: Chicago University Press.
McNeill, David 2012. How Language Began: Gesture and Speech in Human Evolution. (Ap-
proaches to the Evolution of Language.) Cambridge: Cambridge University Press.
Menzerath, Paul and Antonio de Lacerda 1933. Koartikulation, Steuerung und Lautabgrenzung,
Volume 1. Berlin: Dümmler.
Mosher, Joseph A. 1916. The Essentials of Effective Gesture for Students of Public Speaking. New
York: Macmillan.
Müller, Cornelia 1998. Redebegleitende Gesten: Kulturgeschichte – Theorie – Sprachvergleich. Ber-
lin: Arno Spitz.
Müller, Cornelia 2010. Wie Gesten bedeuten. Eine kognitiv-linguistische und sequenzanalytische
Perspektive. Sprache und Gestik, special issue of Sprache und Literatur 41(1): 37–68.
Müller, Cornelia this volume. Gestures as a medium of expression: The linguistic potential of ges-
tures. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Se-
dinha Teßendorf (eds.), Body – Language – Communication: An International Handbook on
Multimodality in Human Interaction. (Handbooks of Linguistics and Communication Science
38.1.) Berlin/Boston: De Gruyter Mouton.
Müller, Cornelia, Jana Bressem and Silva H. Ladewig this volume. Towards a grammar of gestures:
A form-based view. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David
McNeill and Sedinha Teßendorf (eds.), Body – Language – Communication: An International
Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics and Communica-
tion Science 38.1.) Berlin/Boston: De Gruyter Mouton.
Müller, Cornelia, Silva H. Ladewig and Jana Bressem this volume. Gestures and speech from a lin-
guistic point of view. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David
McNeill and Sedinha Teßendorf (eds.), Body – Language – Communication: An International
Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics and Communica-
tion Science 38.1.) Berlin/Boston: De Gruyter Mouton.
Müller, Cornelia, Hedda Lausberg, Ellen Fricke and Katja Liebal 2005. Towards a Grammar of
Gesture: Evolution, Brain, and Linguistic Structures. Berlin: Grant proposal within the funding
initiative “Schlüsselthemen der Geisteswissenschaften. Programm zur Förderung fachübergreifender
und internationaler Zusammenarbeit”.
Nobe, Shuichi 2000. Where do most spontaneous representational gestures actually occur with
respect to speech? In: David McNeill (ed.), Language and Gesture, 186–198. Cambridge: Cam-
bridge University Press.
Ott, Edward Amherst 1902. How to Gesture. New York: Hinds and Noble.
Park-Doob, Mischa Alan 2010. Gesturing through time: Holds and intermodal timing in the stream
of speech. Ph.D. dissertation, Department of Linguistics, University of California, Berkeley.
Parrill, Fey 2000. Hand to mouth: Linking spontaneous gesture and aspect. BA thesis, Department
of Linguistics, University of California, Berkeley.
Quek, Francis, David McNeill, Robert Bryll, Susan Duncan, Xin-Feng Ma, Cemil Kirbas, Karl E.
McCullough and Rashid Ansari 2002. Multimodal human discourse: Gesture and speech.
Association for Computing Machinery, Transactions on Computer-Human Interaction 9(3):
171–193.
Seyfeddinipur, Mandana 2006. Disfluency: Interrupting Speech and Gesture. (Max Planck Institute
Series in Psycholinguistics, 39.) Nijmegen: Max Planck Institute.
Sowa, Timo 2006. Understanding Coverbal Iconic Gestures in Object Shape Descriptions. Berlin:
Akademische Verlagsgesellschaft.
Sparhawk, Carol 1978. Contrastive-Identificational features of Persian gesture. Semiotica 24(1/2):
49–86.
Stokoe, William C. 1960. Sign Language Structure: An Outline of the Communicative Systems of
the American Deaf. Studies in Linguistics, occasional paper, no. 8. Buffalo, NY: University
of Buffalo Press.
Trubetzkoy, Nikolaj S. 1958. Grundzüge der Phonologie. Göttingen: Vandenhoeck und Ruprecht.
Webb, Rebecca 1996. Linguistic features of metaphoric gestures. Ph.D. dissertation, University of
Rochester, Rochester, New York.
Wilson, Andrew D., Aaron F. Bobick and Justine Cassell 1996. Recovering the temporal structure
of natural gesture. In: Proceedings of the Second International Conference on Automatic Face
and Gesture Recognition, 66–71.
Wittenburg, Peter, Hennie Brugman, Albert Russel, Alex Klassmann and Han Sloetjes 2006.
ELAN: A professional framework for multimodality research. In: Proceedings of LREC
2006, Fifth International Conference on Language Resources and Evaluation.
Yasinnik, Yelena, Margaret Renwick and Stefanie Shattuck-Hufnagel 2004. The timing of speech
accompanying gestures with respect to prosody. In: Proceedings From Sound to Sense, 97–102.
http://velar.phonetics.cornell.edu/peggy/FA-Yasinnik-STS-MAC.pdf (29 November 2010).

Silva H. Ladewig, Frankfurt (Oder) (Germany)


Jana Bressem, Chemnitz (Germany)

70. A linguistic perspective on the notation of form features in gestures
1. Introduction
2. Theoretical and methodological framework: A linguistic (semiotic) perspective on gesture
analysis
3. A form-based notation system for gestures
4. Applying the notation system: Some examples
5. Conclusion
6. References

Abstract
This chapter presents a notation system for gestures, which, by focusing solely on gestures’
physical appearance, directs the attention to the different facets of a gesture’s form and
focuses on its detailed characterization. The system is grounded in a linguistic-semiotic
approach to gestures, assuming a heuristic separation of form, meaning, and function in
the analytical process. Based on the four parameters of sign language (hand shape, orien-
tation, movement, and position in gesture space), the notation system includes guidelines
for the notation of gestures’ forms with regard to their physical appearance.

1. Introduction
Notation, coding or annotation systems for gestures have been proposed by a range of
researchers (e.g., Birdwhistell 1970; Calbris 1990; Duncan n.d.; Gut et al. 2002; Kipp
2004; Lausberg and Sloetjes 2009; Martell 2005; McNeill 1992, 2005; Mittelberg 2006,
2010; Müller 1998; Sager 2001; Sager and Bührig 2005 inter alia; see also Bohle this vol-
ume). However, because these systems stem from various disciplines and theoretical
backgrounds and pursue differing analytical perspectives, a systematic linguistic method
for the notation of gestures is still lacking. Systems differ greatly with respect to what is
being described and in how much detail, as well as in the terminology and methodology
applied. Moreover, they often remain implicit with regard to their respective research
question or analytic perspective (see Bressem this volume for an overview of existing
notation and transcription systems for gestures).
Typically, speech is omnipresent during notation and is made the standard of comparison
that gestural descriptions have to meet. Aspects of gestures’ forms are thereby often
selected according to the accompanying utterance and the information contained
therein. Although the necessity to focus on gestures’ form is gaining more and more
ground in the respective research (e.g., Bergmann, Aksu, and Kopp 2011; Bressem
2012; Bressem and Ladewig 2011; Calbris 1990; Fricke 2007, 2012; Hassemer et al. 2011;
Holler and Beattie 2002; Kendon 2004; Ladewig 2010, 2011, 2012; Ladewig and Bressem
forthcoming; Lücking et al. 2010; Martell 2002; Mittelberg 2007; Müller 2004, 2010, 2011;
Sowa 2006; Teßendorf 2005; Webb 1996, 1998 inter alia), gestural forms alone have only
rarely been the focus of notation or annotation systems (e.g., Birdwhistell 1970; Sager
2001; Sager and Bührig 2005; Martell 2002). Against this background, it appears timely
to propose a framework for the annotation of gestures from a linguistic point of view.
The present chapter presents a notation system for gestures, which, by focusing solely
on gestures’ physical appearance, directs the attention to the different facets of a gesture’s
form, and focuses on its detailed characterization. The system is grounded in a linguistic-
semiotic approach to gestures, assuming a heuristic separation of form, meaning, and
function in the analytical process (see section 2 for further details). Accordingly, the
present system differs from other existing systems in three essential aspects:

(i) It concentrates solely on a form description of gestures.
(ii) It proposes a form description independent of speech.
(iii) It avoids gestural form descriptions including paraphrases of meaning.

The notation system only includes guidelines for the notation of gestures’ forms with
regard to their physical appearance. It addresses hand shapes, movement patterns,
orientations of the hand, and positions in gesture space. It does not include guidelines
for the segmentation and coding of gesture phases (see for example Bressem and
Ladewig 2011), a meaning analysis of gestural forms (e.g., Kendon 2004; Ladewig
2010, 2011; Müller 2004, 2010), a classification of gestures (e.g., McNeill 1992; Müller
1998), or other aspects of gestural coding and analysis. The notation system is consid-
ered as one module of a linguistic description and analysis of gestures, which can be
freely combined with other annotation systems or with other aspects of a linguistic ges-
ture analysis (for an overview of a linguistic method of gesture analysis see Müller,
Bressem, and Ladewig this volume; Müller, Ladewig, and Bressem this volume).
The notation system may be applied within a range of disciplines such as (cognitive)
linguistics and semiotics, anthropology, ethnography, primatology, psychology, and cogni-
tive science. It can be used for descriptive as well as experimental approaches to the ana-
lysis of gestures. Although it was developed within a linguistic context, it is not restricted
to linguistic research questions. On the contrary, it is designed to be a widely applicable
cross-disciplinary notation system for a description of gestures’ physical features.

2. Theoretical and methodological framework: A linguistic (semiotic) perspective on gesture analysis
The notation system is grounded in a linguistic-semiotic approach to gesture, in which
speech and gesture are understood as inseparably connected and language is consid-
ered to be inherently multimodal (e.g. Bressem 2012; Bressem and Ladewig 2011;
Fricke 2007, 2012; Ladewig 2012; Ladewig and Bressem forthcoming; Müller 1998,
2009, 2010; see also Fricke this volume, Müller this volume; Müller, Bressem, and La-
dewig this volume). “Starting with the assumption that speech and gesture are two dis-
tinct systems while sharing common properties, linguistic analyses of gestures aim at
discovering commonalities and overlapping characteristics as well as differences and
specificities of the two modalities.” (Bressem and Ladewig 2011: 86) This linguistic-semi-
otic approach addresses how gestures are structured and how they mean, thus providing
an account of a “grammar” of gesture (e.g., Bressem 2012). Based on a close analysis of how
speech and gesture are integrated, it is proposed that the grammar of language must
actually be considered multimodal (e.g., Fricke 2012; Ladewig 2012). “Studies within this
framework have shown that a) gestures can be segmented and classified, b) gestures
show regularities and structures on the level of form, meaning and syntagmatics, c) ges-
tures have the potential for combinatorics and hierarchical structures, and d) gestures
show paradigmatic as well as syntagmatic relations.” (Bressem 2012: 12) (see also Fricke
this volume; Müller this volume; Müller, Bressem, and Ladewig this volume)
Studies that are conducted within the linguistic (and semiotic) framework of gesture
analysis pursue a particular methodological procedure, which rests upon a separation of ges-
ture and speech during parts of the analytic process. Gestures are first and foremost
investigated independently of speech. Only successively are gesture and speech brought
together, leading to an investigation of the form, meaning, and function of gestures alone
as well as in relation to speech (Ladewig and Bressem forthcoming). Müller (submitted), for instance,
distinguishes four blocks in a linguistic (semiotic) analysis of gestures, i.e.,

(i) form,
(ii) sequential structure,
(iii) context of use (local) and
(iv) distribution, by means of which gestural meaning construction and the interplay of
speech and gesture are analyzed.

The analysis of gestures’ form thereby rests upon the “four feature scheme” (Becker
2004, 2008), which grounds the description of gestures on the four parameters of sign
language (Battison 1974; Klima and Bellugi 1979; Stokoe 1960). Gestures are described
in the four parameters “hand shape”, “orientation”, “movement”, and “position”. Sim-
ilar to sign languages, for which each of the parameters can be distinctive in differen-
tiating one sign from another, a linguistic-semiotic approach to gestures assumes a
potential significance of all four parameters for the creation of gestural meaning. Ex-
cluding one of the parameters from the description might result in missing a possibly
meaningful realization. This is where the notation system presented in this chapter comes in.
It is based on the four form parameters and was developed in the course of an empirical
study investigating recurrent gestural forms of German speakers in naturally occurring
conversations (Bressem 2007; Ladewig and Bressem forthcoming).
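A form-only, speech-independent notation record along these lines might be represented as follows. This is a minimal sketch; the field values shown are invented examples, not the system's actual notation conventions.

```python
# Minimal sketch of a four-parameter, form-only notation record.
# Field values are invented illustrations, not the published inventories.
from dataclasses import dataclass, asdict

@dataclass
class GestureForm:
    hand_shape: str   # e.g. "flat hand", "fist"
    orientation: str  # orientation of the palm, e.g. "palm up"
    movement: str     # movement pattern, e.g. "straight", "arced"
    position: str     # position in gesture space, e.g. "center-center"

g = GestureForm("flat hand", "palm up", "arced", "center-center")
print(asdict(g))
```

Keeping the four parameters as separate fields reflects the assumption that each of them may be significant for gestural meaning, so none is omitted from the description.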

3. A form-based notation system for gestures


The notation system is characterized by the following basic attributes:

(i) A phonetic perspective on the notation of gestural forms
Contrary to other notation or coding systems, which aim at a representation of
gestures’ forms including only the most common or meaningful gestural forms,
the present system pursues a phonetic perspective, and aims at an articulatory rep-
resentation of gestural forms. However, similar to a phonetic transcription of
speech, which does not indicate all phonetic details, the notation system adopts
a moderately narrow description and leaves aside particular details of the
articulation. In line with Ternes, we argue that “a phonetic transcription, which re-
presents all phonetic particularities, is not possible because the number of possible
phones is endless.” (Ternes 1999: 35, translation JB, emphasis in original) Accord-
ingly, a phonetic perspective on the notation of gestural forms already assumes a
particular degree of abstraction, yet the abstraction is not driven so far
as to incorporate the function of gestural forms.
The notational system was developed during an analysis, which aimed at a
form-based description of gestures and particularly focused on the use, distribution,
and co-occurrence of parameter realizations in German speakers (see section 4 for
further details). Accordingly, the notation system is data driven and has been
designed while working with and on the material. Later on it was adjusted to incor-
porate further phenomena, which emerged in other types of material.
(ii) No anatomical notation of the arms or other body parts
The system only provides notation guidelines for the hands, and leaves out articula-
tory as well as anatomical descriptions of arms, other body parts, and body postures
(see for instance Martell 2002; Sager 2001; Sager and Bührig 2005). We assume that
solely the hands are of core importance for a notation of gestural forms and for a
further analysis of gestural meaning. Furthermore, different configurations of
arms, for instance, result inevitably from the notation of other parameters. When,
for instance, a gesture of the flat hand with the palm turned upwards (e.g., a Palm
Up Open Hand gesture) is produced with a stretched arm, the fact that the arm
is stretched is captured by the parameter “position” in gesture space, if 3D-based
notation conventions as proposed by Fricke (2005) are used, as these capture the
distance of the arm from the body. A separate notation of these features is thus not
included in the present system. Furthermore, the system was not designed as a notation
system allowing for a real-life reproduction of gestures in artificial agents
(see for instance Martell 2002). Rather, it is designed to allow for a notation
of gestures, which helps to uncover structures and patterns in gestures’ forms and
functions, which provide a sound basis for gestural meaning analyses.
(iii) A systematic characterization of gestural forms in all four parameters of sign
language
We suggest that it is essential to describe a given gesture with regard to all four
parameters formulated in Sign Linguistics (Battison 1974; Klima and Bellugi
1979; Stokoe 1960): hand shape, orientation, movement, and position. Therefore,
the notation system provides descriptive categories for the notation of hand
shapes, orientations of the palm, movement patterns, and positions in gesture
space. Note that the presentation of the notation conventions for the four parameters
follows a particular logic, arranging them according to their prominence.
The system starts with the guidelines for the notation of hand shapes, as it assumes
hand shapes to be the most prominent form features of gestures. Similar to its use
in sign languages, it is assumed that the “perceptional identification of a relatively
stable hand shape in the flow of movement of single signs is much easier than
the identification and nomination of a movement, place of articulation or an
orientation of the hand.” (Wrobel 2007: 47, translation JB)
Based on the fact that the orientation of the hand is strongly connected to the
hand shape, the notation system places the parameter “orientation” in second
position in the notational logic. Contrary to other proposals of gesture notation
and coding, which argue for an inseparability of hand shape and orientation due
to their close connection and thus do not include separate conventions for these
two form aspects (e.g., Gut et al. 2002; Kendon 2004), the present system, based
on McNeill (1992), provides for a separate notation of orientations of the hand.
This assumption is grounded in a large body of research, showing that changes
in the orientation of the hand may go along with changes in gestural meaning
(e.g., Calbris 1990; Fricke 2012; Harrison 2009, 2010; Kendon 2004; Mittelberg
2006, 2010; Müller 2004; Sowa 2006 inter alia).
In third position, the system places the parameter “movement”, as it assumes
movement to be the other most prominent form feature apart from hand shape.
Sign language research for instance was able to show that the “perception of
sign movement appears to be crucially different from that of the static parameters,
such as hand shape and location (Poizner, Klima, and Bellugi 1987). Thus move-
ment appears to be central to sign production and perception […]” (Schembri
2001: 27). Also, gesture research has shown that the parameter “movement”
can be the core form feature in the establishment and differentiation of gestural
meaning (e.g., Calbris 1990; Harrison 2009, 2010; Ladewig 2010, 2011; Mittelberg
2006, 2010; Müller 2000, 2004; Teßendorf 2005 inter alia).
The notation of the parameter “position” is the last step in notating gestures’
forms. Although it is clear that the position in gesture space may be a central fac-
tor in distinguishing meaning and function of gestures (see Ladewig 2010, 2011a)
or that it can be exploited for the creation of larger gestural units (Bressem 2012;
Müller 1998, 2011), the parameter “position” appears to be a generally less central
factor regarding the perception of gestures. Only in exceptional cases is the position
in gesture space perceptually especially salient, for instance in extremely large
gestures, when position is used to indicate deictic information such as where a given
object was placed, or when it is used to set up spatial relations between objects.
(iv) A characterization of gestural forms independently of speech
Based on the assumption that a description of gestural form features should pre-
cede an analysis of their meaning(s) and function(s), the notation of gestures’
forms is separated from the accompanying verbal utterance. This is done in
order to avoid ascriptions of meaning derived from a gesture’s dependence on speech,
and it is methodologically sound, since an analysis of the possible meanings of
gestural forms, such as a context-of-use analysis of recurrent gestures (Kendon 2004;
Ladewig 2010, 2011; Müller 2004), rests upon a close account of gestural form features.
Accordingly, the system provides terms that facilitate a notation of gestural forms
without sound.
(v) Avoidance of gestural form notation including paraphrases of meaning and
function
In contrast to a large body of existing notation, coding, and annotation systems, the
present system avoids form labels that paraphrase possible meanings and functions.
Labels for hand configurations such as “cupped hand” or “tray”, for example, are not
used. Rather, the notation conventions address the gestures’ forms in the most
“objective” way possible. The system presented here proposes a terminology that
captures the specific physiological and material quality of gestures without ascribing
possible meanings and functions. In
short, the system takes an articulatory stance towards gesture notation.
(vi) Flexibility, expandability, practicability and learnability of the notation conventions
As a result of its articulatory perspective, the conventions are designed to allow
for extension, especially with respect to the incorporation of new gestural forms.
This is most evident for the parameter “hand shape”: its characterization, a
three-step procedure (see section 3.1), easily allows the incorporation of new
types of configurations. Furthermore, and even more importantly, the system
was designed to be practicable and learnable. The terms should thus be
intuitive and easy to adopt for researchers from various disciplines. Accordingly,
no special abbreviations, symbols, or signs were assigned to the terms;
the aim was rather to suggest terms that preserve the gestures’
physical appearance as part of their naming.

3.1. Notation of parameter “hand shape”


For the description of the parameter “hand shape”, the notation system draws on a
differentiation made by HamNoSys, a coding system for German Sign Language
(Prillwitz et al. 1989). Similar to HamNoSys, the notation system assigns the various
hand shapes to four basic categories:

(i) fist,
(ii) flat hand,
(iii) single fingers, and
(iv) combinations of fingers (see Fig. 70.1).
70. A linguistic perspective on the notation of form features in gestures 1085

This distinction is based on the idea that the four categories show different prominent
areas, which determine the hand’s shape. With respect to the “flat hand”, for example,
the palm of the hand dominates the shape of the whole hand configuration. For the cat-
egory “combinations of fingers”, however, single fingers as well as combined fingers in
association with the palm determine the configuration of the hand as a whole.
Fig. 70.1: The four basic categories of hand configuration: “fist”, “flat hand”,
“single fingers”, “combination of fingers”

Accordingly, the description of the hand shape rests upon the evaluation of the most
prominent form feature of the hand and asks whether

(i) the hand is formed to a fist,


(ii) the palm is the most prominent feature of the configuration,
(iii) single fingers determine its configuration or
(iv) combinations of fingers alone or in association with the palm dominate the shape
of the hand.

Deciding on the particular category is therefore the first step in the depiction of the
parameter “hand shape”.
In addition, hand shapes involving both hands have to be distinguished. These are
either differentiated based on a) the four categories, the number of digits involved, and
the shape of the fingers (see below) or b) named individually, as in “hands interlocked”.
For the hand shapes assigned to the categories “single fingers” and “combinations of
fingers”, as well as hand shapes involving both hands, the hand configuration is further
specified by the number and shape of the digits involved. In order to differentiate the
fingers of the hand, they are numbered, starting from 1 (=thumb) to 5 (=little finger).
After identifying and numbering the digits, their particular form has to be specified.
Here, six different shapes are distinguished, i.e., the digit is

(i) stretched,
(ii) bent,
(iii) crooked,
(iv) flapped down,
(v) connected, or
(vi) touching (see Fig. 70.2 below).
Fig. 70.2: Overview of the six shapes of the digits: stretched, bent, crooked,
flapped down, connected, touching



These shapes correspond to differences in flexing the joints of the digits. Whereas in the
shape “stretched” no joint is flexed, the form “bent” shows a slight flexing of the joint
at the fingertip as well as of the middle knuckle joint. If the digits are “crooked”, the
joints at the fingertip, the middle knuckle, and the base of the digit are all flexed, with
the middle knuckle joint flexed the most. If the digit is depicted as “flapped down”, it
shows only a flexion of the base joint and is almost at right angles to the palm.
The shapes “connected” and “touching” specify configurations in which two or
more fingers are in contact. The shape “connected” applies to configurations in which
the digits are “bent”, thus showing a flexion of all three knuckle joints, but are
additionally connected at the very tips of the fingers. For the shape “touching”, however,
the digits are “flapped down” and touch each other along the entire first segment of the
digit (see Fig. 70.3 for examples of hand shapes involving combinations of fingers).
Furthermore, the marker “spread” is assigned if, in the category “combinations of
fingers”, the fingers are separated from each other. In these cases, the fingers are
spread apart, i.e., the space between them is enlarged by maintaining extra muscle
tension in the whole configuration of the hand.

Fig. 70.3: Examples of hand shapes involving the combination of fingers: 1+2 connected,
1+3 connected, 1+2 crooked, 1+2 bent, 1–5 crooked, 1–5 bent, 1–5 spread bent;
2–5 flapped down, 2–5 flapped down 1 stretched, 2–5 bent, 1–5 touching,
1+5 connected, 1+2 touching

The notation of the parameter “hand shape” involves three steps:

(i) Assigning the hand shape to one of the four categories or classifying it as a config-
uration involving both hands;
(ii) Numbering of each finger;
(iii) Specifying the shape of the digit.
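As an illustration, the three steps can be encoded in a small data structure. The following is a sketch of our own, not part of the notation system itself: the category and shape terms follow the chapter, while the class and function names are hypothetical.

```python
from dataclasses import dataclass

# The four basic hand-shape categories of the notation system
CATEGORIES = ("fist", "flat hand", "single fingers", "combinations of fingers")

# The six shapes a digit can take (plus the optional "spread" marker)
DIGIT_SHAPES = ("stretched", "bent", "crooked", "flapped down", "connected", "touching")

@dataclass
class HandShape:
    category: str          # step (i): one of the four categories (or a two-handed shape)
    digits: tuple = ()     # step (ii): fingers involved, 1 = thumb ... 5 = little finger
    shape: str = ""        # step (iii): shape of the digits
    spread: bool = False   # optional marker for separated fingers

    def label(self) -> str:
        """Render a notation label such as '1+2 connected' or '2-5 flapped down'."""
        if self.category in ("fist", "flat hand"):
            return self.category
        # pairs are joined with '+', longer runs abbreviated as a range
        if len(self.digits) <= 2:
            digits = "+".join(str(d) for d in self.digits)
        else:
            digits = f"{self.digits[0]}-{self.digits[-1]}"
        parts = [digits, self.shape] + (["spread"] if self.spread else [])
        return " ".join(parts)

# Thumb and index finger connected at the tips:
ring = HandShape("combinations of fingers", digits=(1, 2), shape="connected")
print(ring.label())  # 1+2 connected
```

The “+” versus range joining of digit numbers mirrors the labels in Fig. 70.3.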

3.2. Notation of parameter “orientation”


The coding of the parameter “orientation” is based on the distinction made by McNeill
(1992). Accordingly, the notation of a hand’s orientation depends on a) the orientation
of the palm and b) the gesture space. Consequently, the description of the orientation
involves a two-part procedure, in which first the orientation of the palm and secondly
the hand’s orientation with respect to the gesture space have to be distinguished.

For the characterization of the palm’s orientation, four different basic angles are
distinguished:

(i) palm up,


(ii) palm down,
(iii) palm lateral, and
(iv) palm vertical (McNeill 1992: 380).

In addition to these four, the marker “diagonal” (Bressem 2006) is used to further
differentiate the four basic angles and to mark an intermediate orientation between
them. While in the case of “palm lateral” the hand is parallel to the sagittal line
through the body’s center, the marker “diagonal” indicates a 45-degree angle to the
body’s center line or to the body of the speaker (see Figs. 70.4 and 70.5).
Fig. 70.4: Diagonal orientations PLdiTC and PLdiAC, shown relative to the sagittal
axis of the speaker’s body (panels: PLdiTC, PLTC, PLdiAC)

Fig. 70.5: Diagonal orientations PVdiAB and PVdiTB (panels: PLdiAB, PVAB, PVdiTB)

The orientations “palm lateral” and “palm vertical”, as well as any orientation
additionally tagged by the marker “diagonal”, are further differentiated with respect
to the gesture space. Here, four types are distinguished:

(i) towards center,


(ii) away center,
(iii) towards body, and
(iv) away body (McNeill 1992: 380).

Additionally, if necessary, the orientation of the fingers, such as “fingers down”, is
noted. The characterization of a hand’s orientation is therefore always a combination
of “orientation 1” and “orientation 2”, as in “palm vertical (1) away body (2)” for
example.
To sum up: the notation of the parameter “orientation” involves four steps:

(i) Depict the basic orientation of the hand.


(ii) If necessary, specify the orientation by the marker “diagonal”.
(iii) Characterize the hand’s orientation in relation to the gesture space.
(iv) And if necessary, specify the fingers’ orientation.
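The two-part orientation labels that appear in the figure captions above (e.g., PLTC, PVdiAB) can be assembled mechanically from these four steps. The following sketch is our own illustration: the abbreviation scheme follows the figure labels, while the function name and signature are hypothetical.

```python
# Abbreviations for the four basic palm orientations (step i)
PALM = {"palm up": "PU", "palm down": "PD",
        "palm lateral": "PL", "palm vertical": "PV"}

# Abbreviations for the orientation relative to the gesture space (step iii);
# only "palm lateral", "palm vertical", and diagonal orientations take one.
SPACE = {"towards center": "TC", "away center": "AC",
         "towards body": "TB", "away body": "AB"}

def orientation(palm, space="", diagonal=False, fingers=""):
    """Combine 'orientation 1' and 'orientation 2' into one label.

    diagonal -- step (ii): mark an intermediate, 45-degree orientation
    fingers  -- step (iv): optional finger orientation, e.g. 'fingers down'
    """
    code = PALM[palm] + ("di" if diagonal else "")
    if space:
        code += SPACE[space]
    return code + (f" ({fingers})" if fingers else "")

print(orientation("palm lateral", "towards center"))             # PLTC
print(orientation("palm vertical", "away body", diagonal=True))  # PVdiAB
```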

3.3. Notation of parameter “movement”


As the parameter “movement” is the most complex of all the parameters, three aspects
of movement are considered separately: type of movement, direction of movement, and
quality of movement.

3.3.1. Type of movement


The shape of motion patterns is accounted for by the type of movement. Six basic
movement types are distinguished:

(i) straight movement,


(ii) arced movement,
(iii) circle,
(iv) spiral,
(v) zigzag, and
(vi) s-line (see Fig. 70.6).

For movements executed by the wrist of the hand, the notation system distinguishes
three possible types, i.e., “bending”, “raising”, and “rotation” (see for instance Prillwitz
et al. 1989) (see Fig. 70.7).

Fig. 70.6: Basic types of movements: straight, arced, circle, zigzag, s-line, spiral



Fig. 70.7: Types of movement for the wrist: bending to pulse, raising, bending to 1,
bending to 5, rotation (figure taken from Prillwitz et al. 1989)

The third group of motion patterns, namely movements of single fingers, is depicted
according to the basic movement types “straight”, “arced”, and “circle”. Additionally,
for the depiction of movements executed by all fingers of a hand, “beating of fingers”,
“flapping down”, “grabbing movement”, and “closing of fingers” are differentiated.

3.3.2. Direction of movement


After describing the type of movement, the motion patterns need to be further specified
according to their direction. With respect to the movements of the arm and shoulder as
well as single fingers, three main directions are distinguished:

(i) movements along the horizontal axis (right and left, regarded from the perspective
of the gesturer),
(ii) movements along the vertical axis (up and down), and
(iii) movements along the sagittal axis (away from body and towards body).

Additionally, these directions can be further differentiated by the supplement
“diagonal”, for example “diagonal up right”. In these cases, the supplement “diagonal”
cuts across the separation of the movement directions mentioned above and forms a
separate class, i.e., movements along the diagonal axis (see Figs. 70.8 and 70.9).
In addition to these three, possibly four, classes of movement direction, a further
aspect is considered, namely circular and spiral motions. These movements are
distinguished based on whether their direction is a) clockwise or b) counterclockwise.
Whereas circular motions are characterized only according to these two directions,
spiral motions are further depicted regarding their direction on one of the four axes. Thus, the

Fig. 70.8: Directions of movements along the vertical and horizontal axes (up, down,
left, right)

Fig. 70.9: Directions of movements along the sagittal axis (towards body, away body)

characterization of a spiral motion can for example be “clockwise right”. The directions
mentioned above are also used for the depiction of movements of single fingers.
For the characterization of the “bending” type of wrist movement, three directions
are distinguished, i.e., “to pulse”, “to 1”, and “to 5”. The type “raising” needs no further
specification. The type “rotation” is depicted in the same fashion as circular and
spiral motions, i.e., according to “clockwise” or “counterclockwise” direction.

3.3.3. Quality of movement


The aspect “quality of movement” captures further aspects of the movement patterns:

(i) size (reduced or enlarged)


(ii) speed (decelerated, accelerated), and
(iii) flow of movement (accentuated).

The terms introduced for the depiction of movement quality can, and often need to,
be combined with one another. It is therefore possible to characterize a movement
as both “enlarged” and “accentuated”.
The aspect “quality of movement” specifically addresses the markedness of
movements. A movement is marked if it stands out in relation to other movements
because of a particular saliency regarding one of these qualitative features. In an
“accentuated” movement, for instance, the endpoint of the motion is stressed because
the movement is carried out with more force. This rise in force leads to an increase in
intensity at the end of the movement’s execution. Similar to accent in spoken
language, which is used to stress particular segments of speech such as syllables (see
for example Pompino-Marschall 1995), accentuation in gestures may be used to stress
a particular segment of the motion pattern (Bressem and Ladewig 2011; see also
Bressem 2012 for a more detailed account).
The notation of the parameter “movement” involves four steps:

(i) Depict the basic type of movement, i.e., whether it is executed by the arm or shoul-
der, the wrist or the fingers.
(ii) Characterize the shape of the movement accordingly.

(iii) Note the direction depending on the type of movement.


(iv) Specify the movement according to its quality.
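The four steps can again be sketched as a simple record. This is an illustrative encoding of our own; the field values follow the chapter’s terms, while the class and method names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Movement:
    articulator: str     # step (i): 'arm/shoulder', 'wrist', or 'fingers'
    shape: str           # step (ii): e.g. 'straight', 'arced', 'circle', 'rotation'
    direction: str = ""  # step (iii): e.g. 'down', 'diagonal up right', 'clockwise right'
    qualities: list = field(default_factory=list)  # step (iv): e.g. ['enlarged']

    def label(self) -> str:
        """Join shape, direction, and qualities into one notation label."""
        parts = [self.shape, self.direction] + self.qualities
        return " ".join(p for p in parts if p)

# A straight downward movement of the arm, enlarged and accentuated
# (the qualities combine, as described above):
m = Movement("arm/shoulder", "straight", "down", ["enlarged", "accentuated"])
print(m.label())  # straight down enlarged accentuated
```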

3.4. Notation of parameter “position”


Regarding the parameter “position”, the notation system draws on the concept of the
gesture space introduced by McNeill, which divides the gesture space “into sectors
using a system of concentric squares” (McNeill 1992: 86). Four basic sectors are
distinguished, i.e., “center center”, “center”, “periphery”, and “extreme periphery”,
which are further differentiated according to the features “upper” and “lower” as well
as “right” and “left” (see Fig. 70.10). The position of the hand above the right
shoulder, for example, would thus be coded as “periphery upper right” (right and left
are assigned according to the speaker’s orientation).

Fig. 70.10: Gesture space by McNeill (1992: 89): the sectors center-center, center,
periphery, and extreme periphery, differentiated into upper, lower, right, and left

This depiction of the gesture space is sufficient for a basic characterization of the
hand’s position and can be used for a first description in recording the gestural forms.
However, if one is interested in a more detailed account of movements and positions
in space, then Fricke’s (2005, 2007) three-dimensional model of the gesture space
offers an appropriate extension. Starting from McNeill’s gesture space, Fricke assigns four
dimensions to the gesture space, i.e.

(i) 0 = speaker’s own body,


(ii) 1 = close distance to the body,
(iii) 2 = middle distance from the body, and
(iv) 3 = far distance from the body.

These dimensions can be assigned to capture either the forward or the backward
distance from the speaker’s body. (If the hand’s backward distance from the body
needs to be described, the numbers −1, −2 and −3 are used.) Fricke’s three-dimensional
model of gesture space can account not only for the hand’s distance from the
speaker’s body but also for the use of interactive gesture space areas, and even for
the reconstruction of movement trajectories in space.
The notation of the parameter “position” involves two steps:

(i) Depict the basic sector and define its further characteristics.
(ii) If necessary, use Fricke’s three-dimensional model for further differentiation.
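As a sketch of our own (the function name and the rendering of Fricke’s distance value are hypothetical, not part of the system), the two steps can be combined as follows:

```python
# McNeill's four basic sectors and their differentiating features
SECTORS = ("center center", "center", "periphery", "extreme periphery")
FEATURES = ("upper", "lower", "right", "left")

def position(sector, features=(), distance=None):
    """Step (i): name the sector and its further characteristics.
    Step (ii): optionally add Fricke's distance value (0 = speaker's own
    body, 1-3 = close to far distance; -1 to -3 behind the body)."""
    assert sector in SECTORS and all(f in FEATURES for f in features)
    label = " ".join((sector,) + tuple(features))
    if distance is not None:
        label += f" (distance {distance})"
    return label

# The hand above the right shoulder, at middle distance from the body:
print(position("periphery", ("upper", "right"), distance=2))
# periphery upper right (distance 2)
```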

See Fig. 70.11 for an example annotation using the notation scheme. A complete doc-
ument containing graphical representations of all notation conventions can be found at
www.janabressem.de/publications.

Fig. 70.11: Example showing notation of gestures using the notation scheme (taken from Bressem
2012)

4. Applying the notation system: Some examples


The notation system sketched in this chapter was developed in the course of an
analysis aiming at a form-based description of gestures, with a particular focus on the
use, distribution, and co-occurrence of parameter realizations in German speakers
(Bressem 2006, 2007; Ladewig and Bressem forthcoming). It focused on a systematic
account of the different types of hand shapes, orientations of the hand, movements,
and positions in gesture space that are observable when watching people gesture.
Possible meanings associated with these form features were not of interest. Based on
a thorough and detailed analysis of gestural forms used by German speakers, the
study tackled the question of whether the use of gestural forms is solely based on
idiosyncratic preferences of individual speakers or whether it is possible to identify a
repertoire of gestural forms that is used recurrently by different speakers. Starting
from this research question, the study addressed the following three questions:

(i) Which hand shapes, orientations, positions, and movements do German speakers
use in naturally occurring conversations?
(ii) How are the realizations of the four parameters distributed?
(iii) Is it possible to detect frequent co-occurrences of parameter realizations?

Based on the notation system introduced above, the study documented altogether six
recurrent hand shapes, of which the “flat hand” and the “lax flat hand” were used most
frequently. Moreover, it was shown that both hand shapes frequently occurred with
particular orientations, movements, and positions in gesture space. The “flat hand”
and the “lax flat hand” were documented to be used most often in a palm lateral
orientation (PLTC) with a straight downward movement, positioned in the center of
the gesture space (cc), showing that clusters, i.e., simultaneous occurrences of the
four form parameters, frequently recur. Accordingly, the study was able to show that

(i) German speakers have at their disposal standardized gestural forms, which they
use recurrently,
(ii) the co-occurrence of hand shapes with other gestural forms such as orientation
or movement is not random, and
(iii) speakers seem to command clusters, which depend on particular hand shapes
and their co-occurrence with other specific gestural forms (for a more detailed
account of the study see Bressem 2007; Ladewig and Bressem forthcoming).

More recently, the notation system was applied in two studies examining simultaneous
structures of gestures in human and non-human primates, which aimed at

(i) an account of the use of recurrent forms of gestures in non-human primates,
(ii) a documentation of the forms and meanings of recurrent gestures in humans, and
(iii) a comparison of the structural properties of gestures in non-human and human
primates.

Based on a linguistic, form-based method for systematically reconstructing the
meaning of gestures, consisting of four core building blocks, i.e., form, sequential
structure, local context of use, and distribution over contexts (Müller submitted;
Bressem, Ladewig and Müller this volume), the study investigating gestures in humans
documented a repertoire of 16 recurrent gestures for German speakers. In particular,
it examined in more detail two specific gesture families, i.e., groups of gestures “that
have in common one or more kinesic or formational characteristics” (Kendon 2004:
227). For one of the families, the family of AWAY gestures, consisting of five different
recurrent gestures, it was shown that, by operating on the effect of underlying
everyday actions, namely an empty space around the body, the gestures are capable
of expressing negation (Bressem and Müller volume 2; see also Harrison 2009, 2010
for a further application of the notation system in investigating negation in speech
and gestures).
The notation system has also been used in a study investigating the use of gestures in
non-human primates (Bressem et al. in preparation). The study aimed at documenting
the simultaneous structures, and specifically the degree of structural complexity,
present in the gestures of non-human primates. Starting from a set of already identified
and analyzed visual and tactile gestures of orang-utans (Liebal, Pika, and Tomasello
2006), these two groups of gestures were reanalyzed based on the notation scheme
presented in this paper (see Fig. 70.12). The study was able to document that
differences in form features correlate with changes in the contexts of use (Kendon
2004; Ladewig 2012; Müller 2004). Accordingly, the study not only revealed that apes
modify their gestures depending on the goal they want to achieve, clearly replicating
a structural pattern that we find in the variation of gestures in humans, but also that
form variants of gestures may be grouped into gesture families, showing striking
similarities to gestures in humans.

Fig. 70.12: Notation of the tactile gesture “slap” using the notation scheme (taken
from Bressem et al. in preparation)

The application of the notation scheme in these different studies has thus shown that
the form-based perspective of the notation is a suitable framework for the detection of
forms, structures, and patterns in gestures. The systematicity in the description of
gestures’ forms, along with the comparability of the terms, allows for a (descriptive)
statistical analysis. Combined with annotation software such as ELAN (Wittenburg
et al. 2006), the notation system is also suitable for further processing of the data, for
example export to Excel or HTML.
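Because the notations of the four parameters are plain strings, such further processing is straightforward. The following minimal sketch (the rows and column names are hypothetical, modeled on the examples in this chapter, not an export format prescribed by the system) writes annotations to CSV for import into a spreadsheet:

```python
import csv
import io

# Hypothetical annotation rows: one gesture per row, one column per
# parameter of the notation system.
rows = [
    {"hand shape": "flat hand", "orientation": "PLTC",
     "movement": "straight down", "position": "center center"},
    {"hand shape": "lax flat hand", "orientation": "PLTC",
     "movement": "straight down", "position": "center center"},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
csv_text = buf.getvalue()
print(csv_text)
```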

5. Conclusion
The system presented in this paper constitutes a widely applicable notation system,
usable in a range of disciplines. As the terms introduced in the system are based
solely on form characteristics of gestures, the notation system is open to all kinds of
research foci from various approaches. Given its flexibility and expandability, the
system may be adjusted to a broad range of specific research questions. Furthermore,
the notation system may be used as one module in a transcription, coding, or
annotation system for gestures (and speech) (see Bressem, Ladewig and Müller this
volume). It can be employed with annotation software, but can also be used for
analyses not relying on particular annotation software.
Due to these characteristics, the notation system may be used in a range of disci-
plines interested in an analysis of gesture use. More importantly, however, it provides
the ground for a sound description of gesture forms, which is central to any account
of gesture irrespective of whether it focuses on the cognitive, semantic, interactive, or
other aspects of gestures.

6. References
Battison, Robin 1974. Phonological deletion in American Sign Language. Sign Language Studies
5: 1–19.
Becker, Karin 2004. Zur Morphologie redebegleitender Gesten. MA thesis, Department of Philos-
ophy and Humanities, Free University Berlin.
Becker, Karin 2008. Four-feature-scheme of gesture: Form as the basis of description. Unpub-
lished manuscript.
Bergmann, Kirsten, Volkan Aksu and Stefan Kopp 2011. The relation of speech and gestures:
Temporal synchrony follows semantic synchrony. Proceedings of the 2nd Gesture and Speech
in Interaction Conference (GeSpIn 2011). Bielefeld, Germany.
Birdwhistell, Ray 1970. Kinesics and Context. Philadelphia: University of Pennsylvania Press.
Bohle, Ulrike this volume. Approaching notation, coding, and analysis from a conversational ana-
lysis point of view. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David
McNeill and Sedinha Teßendorf (eds.), Body – Language – Communication: An International
Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics and Communi-
cation Science 38.1.) Berlin/Boston: De Gruyter Mouton.
Bressem, Jana 2006. Formen redebegleitender Gesten. Verteilung und Kombinatorik formbezoge-
ner Parameter. MA thesis, Department of Philosophy and Humanities, Free University Berlin.
Bressem, Jana 2007. Recurrent form features in coverbal gestures. http://www.janabressem.de/
Downloads/Bressem-recurrent form features.pdf (accessed 11 August 2010).
Bressem, Jana 2012. Repetitions in gesture: Structures, functions, and cognitive aspects. Ph.D. disser-
tation, Faculty of Social and Cultural Sciences, European University Viadrina, Frankfurt
(Oder).
Bressem, Jana this volume. Transcription systems for gestures, speech, prosody, postures, gaze. In:
Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Sedinha
Teßendorf (eds.), Body – Language – Communication: An International Handbook on Multi-
modality in Human Interaction. (Handbooks of Linguistics and Communication Science
38.1.) Berlin/Boston: De Gruyter Mouton.
Bressem, Jana and Silva H. Ladewig 2011. Rethinking gesture phases. Semiotica 184(1/4): 53–91.
Bressem, Jana, Silva H. Ladewig and Cornelia Müller this volume. Linguistic Annotation System
for Gestures. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill,
and Sedinha Teßendorf (eds.), Body – Language – Communication: An International Hand-
book on Multimodality in Human Interaction. (Handbooks of Linguistics and Communication
Science 38.1.) Berlin/Boston: De Gruyter Mouton.
Bressem, Jana and Cornelia Müller volume 2. The family of AWAY gestures. In: Cornelia Müller,
Alan Cienki, Ellen Fricke, Silva H. Ladewig and David McNeill (eds.), Body – Language –
Communication: An International Handbook on Multimodality in Human Interaction. (Hand-
books of Linguistics and Communication Science 38.2.) Berlin/Boston: De Gruyter Mouton.
Bressem, Jana, Katja Liebal, Cornelia Müller and Nicole Stein in preparation. Recurrent forms
and contexts: Families of gestures in non-human primates.
Calbris, Geneviève 1990. The Semiotics of French Gestures. Bloomington: Indiana University Press.

Duncan, Susan D. n.d. Coding manual. http://mcneilllab.uchicago.edu/pdfs/Coding_Manual.pdf
(accessed 26 August 2012).
Fricke, Ellen 2005. Geste und Raum – Probleme der Analyse und Notation. Talk held at the
lecture series “Analyse und Notation von Körperbewegungen”. Technical University Berlin.
Fricke, Ellen 2007. Origo, Geste und Raum: Lokaldeixis im Deutschen. Berlin: Walter de Gruyter.
Fricke, Ellen 2012. Grammatik multimodal: Wie Sprache und Gesten zusammenwirken. Berlin: De
Gruyter.
Fricke, Ellen this volume. Towards a unified grammar of gesture and speech: A multimodal
approach. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill
and Sedinha Teßendorf (eds.), Body – Language – Communication: An International Hand-
book on Multimodality in Human Interaction. (Handbooks of Linguistics and Communication
Science 38.1.) Berlin/Boston: De Gruyter Mouton.
Gut, Ulrike, Karin Looks, Alexandra Thies and Dafydd Gibbon 2002. Cogest: Conversational ges-
ture transcription system version 1.0. Fakultät für Linguistik und Literaturwissenschaft, Uni-
versität Bielefeld, ModeLex Tech. Rep 1.
Harrison, Simon 2009. The expression of negation through grammar and gesture. In: Jordan Zla-
tev, Mats Andrén, Marlene Johansson Falck and Carita Lundmark (eds.), Studies in Language
and Cognition, 405–409. Cambridge: Cambridge Scholars Publishing.
Harrison, Simon 2010. Evidence for node and scope of negation in coverbal gesture. Gesture 10(1):
29–51.
Hassemer, Julius, Gina Joue, Klaus Willmes and Irene Mittelberg 2011. Dimensions and mechan-
isms of form constitution: Towards a formal description of gestures. Proceedings of the
2nd Gesture and Speech in Interaction Conference (GeSpIn 2011). Bielefeld, Germany.
Holler, Judith and Geoffrey Beattie 2002. A micro-analytic investigation of how iconic gestures
and speech represent core semantic features in talk. Semiotica 142: 31–69.
Kendon, Adam 2004. Gesture: Visible Action as Utterance. Cambridge: Cambridge University Press.
Kipp, Michael 2004. Gesture Generation by Imitation: From Human Behavior to Computer Char-
acter Animation. Boca Raton, FL: Dissertation.com.
Klima, Edward and Ursula Bellugi 1979. The Signs of Language. Cambridge, MA: Harvard Uni-
versity Press.
Ladewig, Silva H. 2010. Beschreiben, suchen und auffordern – Varianten einer rekurrenten Geste.
In: Sprache und Literatur 41(105): 89–111.
Ladewig, Silva H. 2011. Putting the cyclic gesture on a cognitive basis. CogniTextes 6. http://
cognitextes.revues.org/406
Ladewig, Silva H. 2012. Syntactic and semantic integration of gestures into speech: Structural, cog-
nitive, and conceptual aspects. Ph.D. dissertation, Faculty of Social and Cultural Sciences, Euro-
pean University Viadrina, Frankfurt (Oder).
Ladewig, Silva H. and Jana Bressem forthcoming. New insights into the medium ‘hand’: Discover-
ing recurrent structures in gestures. Semiotica.
Lausberg, Hedda and Han Sloetjes 2009. Coding gestural behavior with the NEUROGES –
ELAN system. Behavioral Research Methods 41(3): 841–849.
Liebal, Katja, Simone Pika and Michael Tomasello 2006. Gestural communication in orang-utans
(Pongo pygmaeus). Gesture 6(1): 1–38.
Lücking, Andy, Kirsten Bergmann, Florian Hahn, Stefan Kopp and Hannes Rieser 2010. The
Bielefeld Speech and Gesture Alignment Corpus (SaGA). Paper presented at the Multimodal
Corpora Workshop, hosted at the 7th Language Resources and Evaluation Conference
LREC 2010, Malta.
Martell, Craig 2002. Form: An extensible, kinematically-based gesture annotation scheme. Pro-
ceedings ICSLP-02: 353–356.
Martell, Craig 2005. FORM: An experiment in the annotation of the kinematics of gesture. Ph.D.
dissertation, Department of Computer and Information Sciences, University of Pennsylvania.
McNeill, David 1992. Hand and Mind. What Gestures Reveal about Thought. Chicago: University
of Chicago Press.

McNeill, David 2005. Gesture and Thought. Chicago: University of Chicago Press.
Mittelberg, Irene 2006. Metaphor and metonymy in language and gesture: Discourse evidence for
multimodal models of grammar. Ph.D. dissertation, Cornell University, Ithaca, NY. Ann
Arbor, MI: Cornell University: UMI.
Mittelberg, Irene 2007. Methodology for multimodality. One way of working with speech and ges-
ture data. In: Monica Gonzalez-Marquez, Irene Mittelberg, Seana Coulson and Michael J. Spi-
vey (eds), Methods in Cognitive Linguistics, 225–248. Amsterdam: John Benjamins.
Mittelberg, Irene 2010. Geometric and image-schematic patterns in gesture space. In: Vyv Evans
and Paul Chilton (eds.), Language, Cognition, and Space: The State of the Art and New Direc-
tions, 351–385. London: Equinox.
Müller, Cornelia 1998. Redebegleitende Gesten: Kulturgeschichte – Theorie – Sprachvergleich. Ber-
lin: Arno Spitz.
Müller, Cornelia 2000. Zeit als Raum. Eine kognitiv-semantische Mikroanalyse des sprachlichen
und gestischen Ausdrucks von Aktionsarten. In: Ernest W. B. Hess-Lüttich and H. Walter
Schmitz (eds.), Botschaften verstehen. Kommunikationstheorie und Zeichenpraxis. Festschrift
für Helmut Richter, 211–228. Frankfurt am Main: Peter Lang.
Müller, Cornelia 2004. Forms and uses of the Palm Up Open Hand. A case of a gesture family? In:
Cornelia Müller and Roland Posner (eds.), Semantics and Pragmatics of Everyday Gestures,
233–256. Berlin: Weidler.
Müller, Cornelia 2009. Gesture and language. In: Kirsten Malmkjaer (ed.), The Routledge Linguis-
tics Encyclopedia, 510–518. London: Routledge.
Müller, Cornelia 2010. Wie Gesten bedeuten. Eine kognitiv-linguistische und sequenzanalytische
Perspektive. In: Sprache und Literatur 41(1): 37–68.
Müller, Cornelia 2011. Reaction paper. Are ‘deliberate’ metaphors really deliberate? A question
of human consciousness and action. Metaphor in the Social World 1: 61–66.
Müller, Cornelia this volume. Gestures as a medium of expression: The linguistic potential of ges-
tures. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Se-
dinha Teßendorf (eds.), Body – Language – Communication: An International Handbook on
Multimodality in Human Interaction. (Handbooks of Linguistics and Communication Science
38.1.) Berlin/Boston: De Gruyter Mouton.
Müller, Cornelia submitted. How gestures mean. The construal of meaning in gestures with speech.
Müller, Cornelia, Jana Bressem and Silva H. Ladewig this volume. Towards a grammar of gesture.
A form-based view. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David
McNeill and Sedinha Teßendorf (eds.), Body – Language – Communication: An International
Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics and Communi-
cation Science 38.1.) Berlin/Boston: De Gruyter Mouton.
Müller, Cornelia, Silva H. Ladewig and Jana Bressem this volume. Gestures and speech from a lin-
guistic point of view. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David
McNeill and Sedinha Teßendorf (eds.), Body – Language – Communication: An International
Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics and Communica-
tion Science 38.1.) Berlin/Boston: De Gruyter Mouton.
Poizner, Howard, Edward S. Klima and Ursula Bellugi 1987. What the Hands Reveal about the
Brain. Cambridge: Massachusetts Institute of Technology Press.
Pompino-Marschall, Bernd 1995. Einführung in die Phonetik. Berlin: Walter de Gruyter.
Prillwitz, Siegmund, Regina Leven, Heiko Zienert, Thomas Hanke and Jan Henning 1989. Ham-
NoSys Version 2.0 Hamburger Notationssystem für Gebärdensprachen: Eine Einführung. Ham-
burg, Germany: Signum.
Sager, Svend F. 2001. Probleme der Transkription nonverbalen Verhaltens. In: Klaus Brinker,
Gerd Antos, Wolfgang Heinemann and Svend F. Sager (eds.), Text und Gesprächslinguistik.
Ein Internationales Handbuch Zeitgenössischer Forschung, 1069–1085. (Handbücher zur
Sprach- und Kommunikationswissenschaft 16.2.) Berlin: De Gruyter.
1098 V. Methods

Sager, Svend F. and Kristin Bührig 2005. Nonverbale Kommunikation im Gespräch – Editorial. In:
Kristin Bührig and Svend F. Sager (eds.), Osnabrücker Beiträge zur Sprachtheorie 70: Nonver-
bale Kommunikation im Gespräch, 5–17.
Schembri, Adam 2001. Issues in the analysis of polycomponential verbs in Australian Sign Lan-
guage (Auslan). Unpublished doctoral dissertation. University of Sydney.
Sowa, Timo 2006. Understanding Coverbal Iconic Gestures in Shape Descriptions. Berlin: Akade-
mische Verlagsgesellschaft.
Stokoe, William C. 1960. Sign Language Structure: An Outline of the Visual Communication Sys-
tems of the American Deaf. Buffalo, NY: University of Buffalo Press.
Ternes, Elmar 1999. Einführung in die Phonologie. Darmstadt: Wissenschaftliche Buchgesellschaft.
Teßendorf, Sedinha 2005. Pragmatische Funktionen spanischer Gesten am Beispiel des “Gestos de
Barrer”. MA thesis, Department of Philosophy and Humanities, Free University Berlin.
Teßendorf, Sedinha 2008. Pragmatic and metaphoric gestures – combining functional with cogni-
tive approaches. Unpublished manuscript.
Webb, Rebecca 1996. Linguistic features of metaphoric gestures. In: Lynn Messing (ed.), Pro-
ceedings of WIGLS. The Workshop on the Integration of Gesture in Language and Speech. Octo-
ber 7–8, 1996, 79–95. Newark, Delaware: Applied Science and Engineering Laboratories.
Webb, Rebecca 1998. The lexicon and componentiality of American metaphoric gestures. In:
Christian Cave, Isabelle Guaitelle and Serge Santi (eds.), Oralité et Gestualité: Communication
Multimodale, Interaction, 387–391. Montreal: L’Harmattan.
Wittenburg, Peter, Hennie Brugman, Albert Russel, Alex Klassmann and Han Sloetjes 2006.
ELAN: A professional framework for multimodality research. In: Proceedings of LREC
2006, Fifth International Conference on Language Resources and Evaluation.
Wrobel, Ulrike 2007. Raum als kommunikative Ressource. Eine handlungstheoretische Analyse vi-
sueller Sprachen. Frankfurt am Main: Peter Lang.

Jana Bressem, Chemnitz (Germany)

71. Linguistic Annotation System for Gestures


1. Introduction
2. Annotation of gestures
3. Annotation of speech
4. Describing gestures’ relation to aspects of speech
5. The role of speech in the annotation process
6. Practical implementation of the annotation system
7. Conclusion
8. References

Abstract
This chapter outlines an annotation system for gestures grounded in a cognitive linguistic
approach to language use and provides guidelines for the annotation of gestures (gesture
units and phases, form and motivation of form), the annotation of speech as well as the
relation of gestures with speech on a range of levels of linguistic description (prosody,
semantics, syntax, and pragmatics). It addresses necessary aspects for a description of
gestures’ forms and for a reconstruction of their meanings and functions with and without
speech and explicates underlying theoretical and methodological assumptions.

1. Introduction
A wide variety of different transcription or annotation systems are used in the field of
gesture studies. Developed for instance within psychology, anthropology, cognitive lin-
guistics, semiotics, or nonverbal communication research, the foci of the systems vary
immensely both in their theoretical and methodological perspectives (see Bressem
this volume b). They range from a focus on conceptual questions and a primary interest
in meaning and function of co-verbal gestures (e.g., Duncan n.d.; McNeill 1992, 2005;
Sweetser and Parrill 2004) to a focus on gestural form alone, disregarding the gesture's
relation to the verbal utterance (e.g., Martell 2002; Sager 2001). Only a few transcription
or coding systems address the structural properties of the medium “gesture” (Gut et al.
2002; Kipp, Neff and Albrecht 2007) (for an overview of annotation systems see Bressem
this volume b). A systematic linguistic annotation system for gestures is still lacking: no
existing scheme allows for an investigation of gestures' structures, meanings, and functions
both on the level of gesture alone and in relation to speech while also clearly addressing
the theoretical and methodological assumptions that go along with a linguistic perspective
on gestures. The Linguistic Annotation System for Gestures (LASG) aims to fill this gap.
It provides a perspective on the annotation of gestures grounded in a (cognitive) linguistic
approach to language use and a form-based approach to gesture
analysis (Müller, Bressem and Ladewig this volume; Müller, Ladewig and Bressem vol-
ume 2). In addition, it provides guidelines for the annotation of gestures on a range of
levels of linguistic description. The system addresses necessary aspects for a description
of gestural forms and a reconstruction of their meanings and functions with and without
speech. The underlying theoretical and methodological assumptions are spelled out ex-
plicitly. The Linguistic Annotation System thereby offers solid grounds for describing
and detecting a “grammar of gestures” (see Müller, Bressem and Ladewig this volume).
Moreover, it allows for an analysis of gestures from the perspective of a “multimodal
grammar” (Fricke 2012, this volume) and in particular for an examination of the integration
of gestures into spoken utterances (see e.g., Bressem 2012; Fricke 2012, this volume;
Ladewig 2012).
The annotation system is grounded in a cognitive linguistic and form based approach
to gestures, which assumes that speech and gesture are tightly linked and that language
is inherently multimodal (Fricke 2007, 2012, this volume; Mittelberg 2006; Müller 1998,
2007, 2008a; Müller et al. 2005). Gestures are assumed to have “a potential for lan-
guage” by fulfilling the same functions as language (Bühler 2011) and either “express
inner states and feelings, […] regulate the behavior of others, […] or represent objects
and events in the world.” (Müller 2009: 213, this volume). A linguistic perspective on ges-
tures follows two main aims: 1) a description of the structural and functional properties of
gestures, that is a “grammar of gesture” (e.g., Bressem 2012; Fricke 2012; Müller, Bressem
and Ladewig this volume; Ladewig 2012; Müller 2004, 2009, 2010b, submitted; Müller et al.
2005), and 2) an investigation of the relation of speech and gestures in conjunction from
the perspective of a “multimodal grammar” (Bressem 2012; Fricke 2012, this volume; La-
dewig 2012). Linguistic theory and in particular linguistic methods and concepts are under-
stood as “theoretical building blocks from which elements are selectively taken and
carefully adopted in the analysis of gestures” (Bressem and Ladewig 2011: 86–87; see
Müller this volume for a detailed account of a form-based and linguistic approach to
gestures).
A linguistic perspective on gestures supposes that gestures a) can be segmented and
classified, b) show regularities and structures on the level of form and meaning, and
c) have the potential for combinatorics and hierarchical structures. Gestural forms
are assumed to be motivated form Gestalts, that is, meaningful wholes in which, nevertheless,
every aspect of a gesture's form is regarded as potentially meaningful. Form features may
be singled out, and changes in form features may be meaningful (Bressem and Ladewig
2011; Müller 2004, 2010, submitted; Fricke 2012; Ladewig and Bressem forthcoming).
Moreover, gestural form features are not considered to be random. On the contrary, it
is assumed, in particular with respect to performative or recurrent gestures, that form
features recur across speakers and contexts whilst sharing stable meanings (Bressem and
Ladewig forthcoming; Calbris 1990; Fricke 2010; Harrison 2009; Kendon 1995, 2004;
Ladewig 2010, 2011; Mittelberg 2006; Müller 1998, 2004; Müller, Bressem and Ladewig
this volume; inter alia). This perspective on gestural forms results in a particular method-
ological approach, which gives form a prominent role in the process of description and
analysis (e.g., Bressem 2007, 2012, this volume; Ladewig 2007, 2010; Ladewig and Bressem
forthcoming; Müller 1998, 2004, 2010b; Müller, Bressem and Ladewig this volume).

2. Annotation of gestures
The Linguistic Annotation System for Gestures is embedded within a linguistic
approach to gesture, its methodological premises and in particular within the “Methods
of Gesture Analysis (MGA)” (Müller this volume; Müller, Bressem and Ladewig this
volume). The Methods of Gesture Analysis (see Müller, Ladewig and Bressem vol-
ume 2) offer a form-based method to systematically reconstruct the meaning of ges-
tures. It allows for the reconstruction of fundamental properties of gestural meaning
creation and determines basic principles of gestural meaning construction by distin-
guishing four main building blocks: 1) form, 2) sequential structure of gestures in rela-
tion to speech and other gestures, 3) local context of use, i.e., gestures’ relation to
syntactic, semantic, and pragmatic aspects of speech, and 4) distribution of gestures
over different contexts of use. The Methods of Gesture Analysis assumes that the meaning
of a gesture emerges out of a fine-grained interaction of a gesture's form, its sequential
position, and its embedding within a context of use (local and distributed). Thus, a
gesture's meaning is first determined in a (largely) context-free analysis of its form, which
grounds the later context-sensitive analysis of gestures.
The Linguistic Annotation System for Gestures represents particular aspects of the
Methods of Gesture Analysis. It takes up its first three building blocks, namely form,
sequential structure, and local context of use, and transforms them into a format applicable
in annotation software such as ELAN (Wittenburg et al. 2006) or Anvil (Kipp 2001).
Annotation within the Linguistic Annotation System for Gestures is thereby under-
stood as “any type of text (e.g., a transcription, a translation, coding, etc.) that you
enter on a tier. It is assigned to a selected time interval of the video/audio file (e.g.,
to the utterance of a speaker) or to an annotation on another tier (e.g., a translation
is assigned to an orthographic transcription).” (Elan Manual) Accordingly, no explicit
distinction is drawn between transcription (descriptive perspective with a direct relation
to the spoken and gestural utterance) and annotation (analytic perspective with a ref-
erence to units within the transcripts) but a rather broad understanding of annotation
covering both aspects is assumed (see Bird and Liberman 2001).
The structure of the Linguistic Annotation System for Gestures is determined by the
focus on form aspects of gestures. It first provides for the description and motivation of
gestural forms (modes of representation, image schemas, motor patterns, and actions).
Afterwards it addresses gestures in relation to speech on a range of levels of linguistic
description for speech, that is prosody, syntax, semantics, and pragmatics. In doing so,
the Linguistic Annotation System for Gestures offers obligatory as well as optional as-
pects for each of the different levels of linguistic description and as such allows for a
broad or narrow annotation of gestures alone and in relation to speech (see Tab. 71.1
for an overview). The following overview of the Linguistic Annotation System for Ges-
tures presents the individual aspects for the annotation of gestures in their chronological
order.

Tab. 71.1: Overview of levels of annotation in the Linguistic Annotation System for Gestures

Annotation of gestures
   determining units (obligatory): Gesture Unit; Gesture Phases
   annotation of form (obligatory; controlled vocabulary): Hand Shape; Orientation;
      Position; Movement Type; Movement Direction; Movement Quality
   motivation of form (obligatory): Mode of representation (MoR); Action; Motor pattern;
      Image schema

Annotation of speech
   annotation of speech, turn (obligatory): Speech Turn; Speech Turn-translation;
      Speech Turn-Gesture Phases; Speech Turn-Gesture Phases translation
   annotation of speech, intonation unit (obligatory): Intonation Unit;
      Intonation Unit-translation; Intonation Unit-Gesture Phases;
      Intonation Unit-Gesture Phases translation

Annotation of gestures in relation to speech
   prosody (controlled vocabulary): Final pitch movement (obligatory);
      Accent (primary, secondary) (optional)
   syntax (controlled vocabulary): Word Class (obligatory); Syntactic Function,
      Integration (optional)
   semantics (controlled vocabulary): Temporal Relation (obligatory); Semantic Relation,
      Semantic Function (optional)
   pragmatics (controlled vocabulary): Turn (obligatory); Speech Act, Pragmatic Function,
      Dynamic Pattern (optional)
2.1. Determining units of analysis


The first aspect of the linguistic annotation scheme concentrates on determining the
units of analysis and in particular on different levels of movement complexity observ-
able in the execution of gestures. In doing so, the Linguistic Annotation System for
Gestures departs from an understanding of gestures as “communicative movements
of the hands and arms, which, similar to language are used to express the thoughts, feel-
ings, and intentions of a speaker and which actively create the social organization of the
conversation.” (Müller 1998: 13, our translation) The Linguistic Annotation System for
Gestures does not provide for the annotation of self-adaptors, object-adaptors, and
fidgeting, which are considered non-gestural (see e.g., Ekman and Friesen 1969; Ekman
1977; Freedman 1972; see Lausberg and Sloetjes 2009 for an annotation system including
this type of bodily behavior).
Building on Kendon’s work (Kendon 1972, 1980), the Linguistic Annotation System
for Gestures considers gestures to exhibit a phrasal structure, which can be broken
down into a succession of different gesture phases. Moreover, it assumes that “the pat-
tern of movement that co-occurs with the speech has a hierarchic organization which
appears to match that of the speech units” (Kendon 1972: 190) so that gestures form
larger units, which match higher-level units at the verbal level (see also Condon and
Ogston 1966, 1967). Following these distinctions, the Linguistic Annotation System
for Gestures concentrates on two main levels of gestural movements, namely gesture
units and gesture phases and thus provides two separate lines of annotation for these
two levels of gestural unit formation.

2.1.1. Gesture units and gesture phases


Following Kendon (1980, 2004), gesture units are understood to capture the “entire
excursion, from the moment the articulators begin to depart from a position of relax-
ation until the moment when they finally return to one.” (Kendon 2004: 111) The
first step in any analysis of gestures in the Linguistic Annotation System for Gestures
therefore centers on delimiting the beginning and end of gestural movement sequences
from and to positions of rest.
Gesture units themselves are composed of gesture phases, that is, individual movement
phases of gestures that are considered potentially separable units of analysis
(Bressem and Ladewig 2011; Fricke 2012). Here, the Linguistic Annotation System
for Gestures distinguishes the following gesture phases: rest position, preparation,
stroke, hold, and retraction (see e.g., Bressem and Ladewig 2011; Kendon 1972, 1980;
Kita, van Gijn and van der Hulst 1998). The individual gesture phases are understood
to have different structural and functional relevance in creating units of gestural actions
and, in particular, of gestural meaning. Out of the distinguished gesture phases, stroke
phases are considered to be the most important gestural movement phases as the stroke
usually carries the gestural meaning, constitutes the nucleus of gestural units (Kendon
2004; Fricke 2012), and correlates with the most relevant part of the verbal utterance
(see e.g., Butterworth et al. 1978; Kendon 1972, 1980; McNeill 1992). Preparation
and hold phases on the other hand are assumed to play a particular role in creating
coherence among gesture phases (e.g., Fricke 2012). For the segmentation of gestural
movement sequences into gesture phases, the Linguistic Annotation System for Gestures
uses the “frame-by-frame marking procedure” (Seyfeddinipur 2006). This procedure
exploits the sharpness of the video image: movement shows up in blurred frames, stillness
in clear ones. On this basis, different types of transitions in the execution of gestural
movement sequences are distinguished, and gestural movement phases are assigned to
specific types of gesture phases (see also Ladewig and Bressem this volume).
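The grouping step of this procedure can be sketched as follows. The function and its input are hypothetical: it merely groups a per-frame blur flag (however obtained) into movement and still segments; labelling a segment as preparation, stroke, hold, or retraction remains the annotator's analytic decision.

```python
def segment_phases(blurred, fps=25):
    """Group consecutive frames into movement vs. still segments.

    `blurred` holds one flag per frame: True where the image is blurred
    (hand in motion), False where it is sharp (hand still). Each change
    of the flag marks a candidate gesture-phase boundary. Returns
    (start_seconds, end_seconds, kind) triples.
    """
    segments = []
    start = 0
    for i in range(1, len(blurred) + 1):
        # A segment ends at the last frame or where the flag changes.
        if i == len(blurred) or blurred[i] != blurred[start]:
            kind = "movement" if blurred[start] else "still"
            segments.append((start / fps, i / fps, kind))
            start = i
    return segments

# 2 still frames, 3 blurred frames, 2 still frames (at 1 frame per second):
print(segment_phases([False, False, True, True, True, False, False], fps=1))
# -> [(0.0, 2.0, 'still'), (2.0, 5.0, 'movement'), (5.0, 7.0, 'still')]
```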
With these two levels, namely gesture units and gesture phases, the Linguistic Anno-
tation System for Gestures sets up the basis for all annotations within the system.
Gesture units thereby serve as the broadest level of gesture segmentation. Gesture
phases, on the other hand, constitute the lowest level of gesture segmentation and
moreover make up the unit of reference for all following annotations (see Fig. 71.2 for tier
dependency within the annotation system). Intermediate levels, such as gesture phrases,
are not included in the annotation system, as they serve no immediate function in
segmenting the gestural movement for the annotation process.
Apart from segmenting gestural movement sequences into units of different levels of
complexity the concentration on gesture units and gesture phases also serves a further
function. The allocation of gesture phases is not only a prerequisite for determining ges-
tural segments but also for specifying the gesture’s exact relation with units of the
speech stream. Thus, it is a necessary step in detecting the meaning of a gesture.
Furthermore, following Kendon (1972), it is assumed that gestures form larger units, which
match higher-level units on the level of spoken language.
The larger the speech unit, the more body parts there are involved in this movement. For
locutions, for instance, only the head and the gesticulating limb are involved. For locution
groups, there is a shift in the trunk as well. For very high level units, such as “discourse”
or “listening,” there is a major change in the speaker’s total bodily position (Kendon
1972: 205).
Gesture units thereby most likely correlate with locution clusters, the highest level
of speech within a discourse, which can be understood to be equivalent to a paragraph.
Thus annotating gesture units provides a first approach to the thematic organization
of the conversation and can be helpful in annotating the semantic and functional rela-
tion of gestures with speech, especially with respect to particular types of gestures
(see sections 2.3.4 and 2.3.5).

2.2. Describing the form of a gesture


Describing the form of a gesture is the second annotation block in the Linguistic Anno-
tation System for Gestures and includes the following aspects: 1) the description of
gestures based on the four form parameters of sign language, and 2) the detection of the
motivation of a gestural form by explicating underlying cognitive-semiotic principles
(modes of representation) as well as the gesture's grounding in motor patterns, image
schemas, and actions. As describing gestural forms as well as detecting their motivation
is central for an understanding of a gesture's meaning and function in relation to speech,
a context-free analysis of gestural forms is a necessary analytical step. Accordingly, all
of the above-mentioned aspects are obligatory within the Linguistic Annotation System
for Gestures and need to be annotated for the gesture phase “stroke”. What can be var-
ied, depending on the particular gestural phenomenon focused on, is the depth of the
particular aspects addressed. When focusing on gestures’ relation with motion events,
for instance, the level of detail in the description of the four form parameters may be
varied. One may thus focus on a close description of the parameter “movement”, whereas
the parameters “hand shape” and “orientation” may be annotated in a less detailed
manner (see e.g., Müller 1998, 2000).
System for Gestures proposes that a rough annotation of all form aspects addressed
above is necessary for a sound linguistic analysis of gestures as it lays the basis for a
detection of structures and functions of gestures.

2.2.1. The four form parameters


The Linguistic Annotation System for Gestures approaches the description of gestures’
forms by applying the four parameters “hand shape”, “orientation”, “movement” and
“position in gesture space”, developed for the description of signs (Battison 1974;
Stokoe 1960) to gestures. Taking the four form parameters as the basis of a gestural
form description aims at systematically addressing the form aspects of a gestural Ge-
stalt. In doing so, it allows for a fine-grained description of gestures and for a detec-
tion of gestural patterns and structures (e.g., Fricke 2010, 2012; Kendon 2004; Müller
2004).
For the description of the gestural forms the Linguistic Annotation System for Ges-
tures uses a notation system that focuses solely on the physical appearance of gestures
and advocates a differentiation between an articulatory and a taxonomic description of
gestures’ forms (see Bressem this volume a).
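To make the interplay of the four form parameters and the controlled vocabularies of Tab. 71.1 concrete, here is a minimal sketch of a form annotation for a stroke. The vocabulary sets are illustrative placeholders only, not the actual LASG inventories, and the function name is invented.

```python
# Illustrative subsets standing in for the controlled vocabularies ("x" in Tab. 71.1).
HAND_SHAPES = {"fist", "flat hand", "single fingers", "combinations of fingers"}
ORIENTATIONS = {"palm up", "palm down", "palm lateral", "palm vertical"}
MOVEMENT_TYPES = {"straight", "arced", "circle"}
POSITIONS = {"center", "periphery", "extreme periphery"}

def annotate_stroke(hand_shape, orientation, movement_type, position):
    """Return a form annotation for a stroke, rejecting any value that
    falls outside the controlled vocabulary of its parameter."""
    for value, vocab, name in (
        (hand_shape, HAND_SHAPES, "hand shape"),
        (orientation, ORIENTATIONS, "orientation"),
        (movement_type, MOVEMENT_TYPES, "movement type"),
        (position, POSITIONS, "position"),
    ):
        if value not in vocab:
            raise ValueError(f"{value!r} is not in the {name} vocabulary")
    return {"hand shape": hand_shape, "orientation": orientation,
            "movement type": movement_type, "position": position}

print(annotate_stroke("flat hand", "palm up", "arced", "center"))
```

Constraining annotation values in this way is what a controlled vocabulary buys in practice: it keeps the same phenomenon from being coded under different labels by different annotators.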

2.2.2. The motivation of gestural forms


After concentrating on the gestural forms, the focus of the annotation shifts to their
motivation. Here, the Linguistic Annotation System for Gestures addresses the basic
semiotic processes involved in gestural sign creation. By asking what the hands actually
do when performing a gesture and by taking into account the ephemeral shapes,
movements, and objects that are created, the system aims at a first account of a basic
meaning of gestures (Müller 2010b, submitted).
Research on gestures has shown that gestures exploit motoric patterns of everyday
actions (e.g., Calbris 1990, 2011; Kendon 2004; Müller 1998, 2004; Streeck 1994,
2009), that they evoke geometrical or schematic patterns or gestalts (circles, oval
shapes, squares, planes; Mittelberg 2006, 2010), make use of basic cognitive image-
schematic structures (e.g. SOURCE-PATH-GOAL, CONTAINER/CONTAINMENT
Johnson 1987; in gestures e.g., Cienki 1998, 2005; Ladewig 2011; Mittelberg 2006,
2010), and can thus be understood as forms of exbodied cognitive structures (Mittel-
berg 2010). The Linguistic Annotation System for Gestures thus assumes that in ges-
tures we see and experience the embodied basis of verbal meaning while speaking.
Detecting the motivation of a gestural form is thus assumed to be a crucial step in trac-
ing back and explaining gestures’ characteristics of form, meaning, and function. The
Linguistic Annotation System for Gestures thus places particular emphasis on a de-
tailed account of the gestures’ modes of representation, motor patterns, image schemas
as well as underlying actions.
2.2.2.1. Modes of representation


“The basis for gestural meaning is – provided that gestures don’t point – mimesis.” (Mül-
ler 2010a: 48, our translation) Drawing on Aristotle’s concept of mimesis, gestural mime-
sis is distinguished according to the “media” (articulators), the “objects” (the range of
gestural referents such as actions, parts of actions, objects, etc.) as well as the “modes”
(how mimesis is achieved) (Müller 2010a, 2010b, submitted). Based on the mode of
mimesis, two basic techniques by which gestural mimesis is achieved in gestures can be
distinguished: acting and representing (Müller volume 2).
In the acting mode the hands act as if they would perform (parts of) an actual action.
Depending on the reference object, acting gestures depict “actions only”, “actions with
specified object”, and “actions with unspecified object.” As such they may refer to ac-
tions or objects only or to actions with objects. In the representing mode the hands act
as if they were an object themselves. By representing events or objects in the world, the
hands may depict “objects”, “objects in motion”, “located or directed objects” as well
as “objects in spatial relation”. Furthermore, as gestures are able to enact aspects of
motion events they may either depict “motion only” or motion along with their path
and/or manner. Thus, representing gestures may refer to objects or motions only or
to objects in motion (see Müller, Ladewig and Teßendorf in preparation).
Both techniques of representation, acting and representing, fundamentally rest upon
abstraction and metonymic processes working on the side of the producer as well as of the
perceiver (Calbris 1990, 2011; Mittelberg 2006, 2010; Mittelberg and Waugh 2009;
Müller 1998; Teßendorf 2005). Depending on the reference object and cultural norms,
particular traits of objects, actions, or processes are extracted. In the transformation
from practical action to gesture, for instance, gestures carry along the action’s effect.
In the case of objects being gesturally represented, a basic form gestalt has to be
abstracted from a perceived or conceived object.
The gestural mode of representation thus reconstructs the techniques of gestural
mimesis and in so doing is an essential element in gestural meaning construction.
Moreover, it provides access to the grounding of gestures in motor patterns, image
schemas, and action schemas, which in turn advances the understanding of the nature and
meaning of gestural forms and, in particular, their embodied basis.

2.2.2.2. Motor patterns, image schemas, action schemas, and actions


Image schemas are “recurring dynamic pattern[s] of our perceptual interactions and
motor programs.” (Johnson 1987: xiv) Based on (visual) perception, motoric movements
and the handling of objects, they give coherence and structure to our experience
by operating at the pre-conceptual level and are a principle means for achieving mean-
ing structure (Johnson 1987). Evidence for image schemas has been given with respect
to a range of language phenomena such as grammatical structures (Langacker 1987;
Talmy 1983), semantic analyses of verbal systems (Brugman and Lakoff 1988) and
metaphorical concepts (Johnson 1993; Nuñez 2004; Sweetser 1998, 2006). Yet image
schematic structures “exist across all perceptual modalities and are at once auditory, ki-
nesthetic and tactile.” (Oakley 2005) As such, they are also assumed to underlie the cre-
ation and meaning of gestures. Realized in a static (entity) or dynamic fashion (process)
(e.g., Cienki 1997, 1998, 2005; Mittelberg 2006, 2010), they function as common skeletal
structures for the recruitment of gestural forms (Cienki 2005). In doing so, image sche-
matic structures offer valuable insight into the context-free (Müller 2010b) or inherent
meaning of gestures (Ladewig and Bressem forthcoming).
A further piece of the puzzle in understanding the meaning and function of gestures can be
found in their basis in everyday actions. Gestures often constitute re-enactments of
basic mundane actions, grounding the gestures’ communicative actions in real world ac-
tions. By modulating the motion patterns of everyday actions, gestures abstract from the
actions in the real world, making them recognizable as as-if-actions, as signs (Müller
submitted). In this process of derivation from (everyday) actions to gestural meaning,
gestures select and recombine perceptually salient aspects and distinctive elements of
the action (e.g., Calbris 2003; Streeck 1994, 2009; Ladewig 2010, 2011; Müller 2004;
Teßendorf 2008). By being metonymically linked with the action itself, gestures
evoke elements from an action chain, such as actor, action, instrument, or result
(Calbris 1990; Müller 1998; Müller and Haferland 1997; Teßendorf 2008), and use
them for different communicative purposes (Teßendorf 2009). Thus, the concrete bodily
basis of gestures becomes visible in their forms providing metonymic pathways to
everyday actions.

3. Annotation of speech
The second block of the Linguistic Annotation System for Gestures addresses the anno-
tation of speech. Here, the system offers two possible strands: Speech occurring within
the boundaries of a gesture unit can either be annotated based on the notion of turns
(Sacks, Schegloff and Jefferson 1974) or intonation units (Chafe 1994). In both cases,
speech is transliterated or transcribed following the conventions of the “GAT2” (for
the English version see Couper-Kuhlen and Barth-Weingarten 2011). Annotating
speech, either based on the notion of turns or intonation units, is considered obligatory
within the Linguistic Annotation System for Gestures. Additionally, the system offers
the possibility to transliterate or transcribe speech in relation to further facets as, for
instance, speech in relation to the individual gesture phases or the translation of the
spoken utterance into another language. It further gives the opportunity to annotate
prosodic aspects of the spoken utterance, such as final pitch movement or focus accents.

3.1. Annotation of speech based on turns


Following the conversation analytic tradition, speech can be annotated based on the
notion of turns. The basic unit of talk within a conversation analytic perspective is
the Turn Construction Unit (TCU) (Sacks, Schegloff and Jefferson 1974). “The TCU
is […] a unit in conversation which is defined with respect to turn-taking: a potentially
complete turn.” (Selting 2000: 478) A Turn Construction Unit is not a linguistic unit but
rather an interactionally relevant unit ending in a Transition Relevance Place, that is a
possible completion point of turns. It is defined in relation to syntactic and prosodic as-
pects of the verbal utterance (Sacks, Schegloff and Jefferson 1974). The Linguistic
Annotation System for Gestures includes the possibility to annotate speech based on
the notion of turns, as this allows for a focus on interactive structures within talk and on the gestures' relevance for constituting and organizing talk in interaction (see e.g., Bohle 2007; Müller and Paul 1999; Streeck and Hartge 1992).
71. Linguistic Annotation System for Gestures 1107

If necessary, following the annotation of speech into Turn Construction Units, a translation of the verbal utterance may be included. Furthermore, the correspondence of speech with individual gesture phases occurring within the gesture unit may be annotated. This allows for the determination of an exact relation of gesture phases with units of speech (e.g., Bressem 2012; Ladewig 2012).

3.2. Annotation of speech based on intonation units


The spoken utterance can also be annotated based on the notion of intonation units.
“An intonation unit is a sequence of words combined under a single, coherent intona-
tion contour, usually preceded by a pause.” (Chafe 1995: 58) The Linguistic Annotation
System for Gestures includes the possibility to annotate speech based on Chafe's concept of intonation units for the following reasons: 1) Intonation units are characterized and identified on the basis of a variety of form-based criteria and are not
primarily dependent on syntactical units of verbal utterances (Chafe 1987, 1994).
2) Intonation units can contain more than one primary accent, which is of particular
importance when considering sequences of gestures (see Bressem 2012). 3) Most im-
portantly, intonation units make up a unit of mental and linguistic processing which
“verbalizes the speaker’s focus of consciousness” at the moment of speaking (Chafe
1994: 63) and gesturing. Each intonation unit thus verbalizes a different idea, event,
or state; a factor which is of crucial importance in explaining the function of gestures
in the flow of discourse (see Bressem 2012; Ladewig 2012; Müller and Tag 2010).
4) Intonation units provide a good basis of comparison for relating gestures to speech
other than the single word, and offer a way for a statistical analysis of gestures in rela-
tion to speech (Bressem 2012; see Cameron et al. 2009 for the relation of intonation
units and metaphors).
As with the annotation of turns at talk, following the annotation of speech based on intonation units, a translation of the verbal utterance may be included and the correspondence of verbal units and gesture phases may be annotated.

4. Describing gestures’ relation to aspects of speech


The third block of the Linguistic Annotation System for Gestures addresses the anno-
tation of gestures in relation to speech on a range of different levels such as prosody and
syntax as well as in relation to semantic and pragmatic aspects. Within this third block, the Linguistic Annotation System for Gestures focuses on a context-sensitive
analysis of gestures. It addresses gestures in relation to prosodic as well as syntactic as-
pects of the verbal utterance. Apart from these structural aspects, the context-sensitive
annotation concentrates on the temporal position of gestures with respect to the “co-
expressive” speech segment (McNeill 1992; McCullough 2005) and the semantic and
pragmatic function of gestures with speech. Finally, it focuses on the specific placement
of a gesture in relation to a conversational turn as well as to preceding or following ges-
tures. Each of those aspects, annotated for the gesture phase “stroke”, has particular
relevance for understanding and describing the meaning and function of gestures in
language use. Meaning is regarded here in its cognitive, functional, and interactive di-
mensions (Müller 2010b).
1108 V. Methods

4.1. Gestures and the prosody of speech


Prosody of speech, such as intonation, accent, rhythm, and tempo, has a particular rel-
evance for speech on a range of different levels. It influences the illocutionary force of
an utterance, expresses affect and emotion and is particularly relevant in establishing
the information structure of spoken utterances. As research on gestures has shown,
strokes are aligned with prosodic aspects of the utterance in a variety of ways. They
may correlate with stressed and unstressed syllables (e.g., Birdwhistell 1970; Kendon
1980; McClave 1991; Loehr 2004), rhythmic patterns (e.g., Tuite 1993; Loehr 2007),
and terminal pitch contours, for instance (e.g., Duncan 1972; Loehr 2004; McClave
1994). In doing so, prosody and gesture work towards the creation of a “multimodal salience structure” relevant for determining gestural meaning construal and function (Müller and Tag 2010). Accordingly, the Linguistic Annotation System for Gestures includes the annotation of prosodic aspects, focusing on the final pitch movement as well as on focus accents of turns or intonation units. Annotating prosodic aspects of the spoken utterance is regarded as optional and can be excluded if not necessary for the analytical question at hand.

4.1.1. Annotation of final pitch movement of turns or intonation units


The Linguistic Annotation System for Gestures advocates an annotation of final pitch movements following the conventions of the “GAT2” (Couper-Kuhlen and Barth-Weingarten 2011). As the final pitch movement of turns or intonation units has particular relevance for determining the function of a turn or intonation unit as a whole, it is a
central aspect in determining a gesture’s function. Depending on the final pitch move-
ment, gestures may accompany turns or intonation units fulfilling declarative or ques-
tioning function, which in turn has effects on the meaning and function of the gestures.
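Since GAT2 marks the final pitch movement of a unit with a terminal punctuation symbol, this annotation value can in principle be read off a GAT2 transcript mechanically. A minimal sketch in Python (the symbol-to-label mapping follows the GAT2 conventions; the helper function itself is illustrative):

```python
# Final pitch movement symbols of GAT2 (Couper-Kuhlen and
# Barth-Weingarten 2011): the terminal punctuation of a turn or
# intonation unit encodes its final pitch contour.
GAT2_FINAL_PITCH = {
    "?": "rising to high",
    ",": "rising to mid",
    "-": "level",        # printed as an en-dash in GAT2 transcripts
    ";": "falling to mid",
    ".": "falling to low",
}

def final_pitch(unit):
    """Return the final pitch movement encoded at the end of a
    GAT2-transcribed turn or intonation unit."""
    unit = unit.rstrip()
    return GAT2_FINAL_PITCH.get(unit[-1], "unmarked") if unit else "unmarked"
```

Such a helper could pre-fill the final-pitch tier from an existing GAT2 transcription, leaving only unmarked units for manual coding.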

4.1.2. Annotation of focus accent of turns or intonation units


A further aspect coded in relation to the prosodic structure of the verbal utterance is
the relation of gestures and speech accents. Also following the “GAT2” (Couper-Kuh-
len and Barth-Weingarten 2011), the Linguistic Annotation System for Gestures anno-
tates the focus accent of the turn or of the intonation unit, that is the semantically and
pragmatically most relevant pitch accent indicating the focus of the utterance. The
annotation of focus accents may be relevant for determining a gesture's discourse function on a range of different levels. It may indicate the gesture's alignment with the primary focus of the verbal utterance, revealing insights into the structural connection of speech and gesture (e.g., McClave 1991; Loehr 2004). Moreover, it provides insights
into the link between gestures and the thematic structure of the verbal utterance
(e.g., Loehr 2007; McNeill 1992, 2005). Thus, both aspects can contribute to determining
the function of gestures in relation to the verbal utterance.

4.2. Gestures and the syntax of speech


After annotating prosodic aspects, the Linguistic Annotation System for Gestures
focuses on a further aspect of a gesture’s local context of use, that is its relation with
the syntax of speech. A close description of the linguistic context in which a gesture is
placed is fundamental for the reconstruction of gestural meanings and for determining
the structural and functional relevance of gestures in language use (e.g., Bressem 2012;
Fricke 2012; Ladewig 2012; Müller, Lausberg, Fricke et al. 2005). Here, the system pro-
vides for the annotation of three facets: 1) Annotation of word classes, 2) Annotation of
syntactic functions, 3) Integration of gestures into the verbal utterance. Of these three aspects, only the annotation of word classes is considered obligatory within the system. The remaining aspects are regarded as optional in the coding process and
can be annotated if relevant for the particular research question. The annotation is
done for the gesture phase “stroke”.

4.2.1. Annotation of word classes


For the decomposition of the spoken utterance into its constituents, the Linguistic
Annotation System does not specify a particular grammatical theory on which it
needs to be based. The framework can be chosen by the annotator according to the
particular research perspective needed for a grammatical analysis of speech.
Annotating word classes of the spoken utterance is included in the Linguistic Annotation System for Gestures as it provides the necessary basis for determining the gestures' relevance in creating multimodal utterance meaning. As research has shown, gestures may fulfill an attributive function by semantically modifying the nucleus of a
noun phrase (Bressem 2012; Fricke 2012) or they may be inserted in a syntactic gap pro-
viding the nucleus of a noun phrase (Ladewig 2012). Moreover, annotating word classes
of the spoken utterance is required for analyzing the degree of the integration of ges-
tures into the verbal utterance (see section 4.3.). Deictic particles like such (a) or like
this, for instance, may point to a position in the verbal utterance, in which a qualitative
description is needed and instantiated by a concurrent gesture (e.g., Fricke 2007, 2012;
Streeck 1988, 1990, 2002).

4.2.2. Annotation of syntactic functions


Annotating the syntactic function of spoken units is considered an optional facet in ana-
lyzing the gestures’ relation with the syntactic structure of the spoken utterance. It can
be included in the annotation of gestures if relevant for the gestural phenomenon under
investigation. This addresses the questions of what sorts of verbal units gestures corre-
late with or which syntactic functions are taken over by gestures (Ladewig 2012) and
can therefore reveal interesting insights for gesture-speech production. Moreover, it
might be useful for the relation of gestures with aspects of attention phenomena such
as Figure-Ground structures (Bressem 2012) or the selection of subjects in utterances
(e.g., Parrill 2008). In addition, it might prove to be useful for analyses focusing on
viewpoints in gesture (e.g., McNeill 1992).

4.2.3. Annotating gestures’ integration into speech


Annotating how gestures are integrated into speech is the last aspect of describing their
relation with the syntax of speech. Gestures can be integrated into the verbal utterance
in different ways thereby showing “degrees of integrability” (Fricke 2012). Gestures may
be positionally integrated into the verbal utterance by a syntactic gap (Ladewig 2012) or
by temporal overlap (Bressem 2012; Fricke 2012). They may furthermore be integrated
cataphorically through the deictic such (a), the pronouns this, these, or the adverb here
(see Fricke 2007, 2012; Streeck 1988, 1990, 2002). The different degrees of integrability
need to be perceived as a continuum based on the type of integration (e.g., by being cat-
aphorically integrated or through occupation of syntactic gaps), the distribution of
information over the different modalities (i.e., redundance, supplementation or substitu-
tion), and the occurrence of the modalities (i.e., temporal overlap or linear succession)
(Ladewig 2012).
Annotating the type and degree of integration is considered optional within the system and needs only to be incorporated if necessary. The Linguistic Annotation System for Gestures, however, allows for the annotation of the type of integration, as it is of particular importance in analyzing gestures from a linguistic point of view and from the perspective of a multimodal grammar and, in particular, in determining a gesture's function in creating a multimodal utterance (meaning).

4.3. Gestures and the semantics of speech


Following the annotation of the gestures’ relation with the syntax of speech, the Lin-
guistic Annotation System for Gestures focuses on a further aspect of the gestures’
local context of use, that is their relation with the semantics of speech. Here, the system focuses on the temporal positioning of gestures with regard to speech, on their semantic relation to it, and on the semantic function of gestures. All three aspects contribute fundamentally not only to
the meaning and function of gestures but more importantly they add to understanding
their role in creating a multimodal utterance meaning.
When examining gestures and the semantics of speech, the Linguistic Annotation
System for Gestures departs from the concept of “co-expressiveness” (McNeill 1992,
2005), assuming that speech and gesture “express the same underlying idea unit but
[they] express it in their own ways.” The “co-expressive segment might be a lexical affil-
iate, but there is no necessity for it.” (McNeill 2005: 22, 37) The Linguistic Annotation
System for Gestures furthermore accepts that a slight temporal distance between
speech and gesture is possible for both modes to be considered co-expressive (see
also de Ruiter 2000; Sowa 2005). To count as co-expressive, both the speech segment
and the gestures need to be “interpretable as collectively referring to the same
thing” by specifying “a single conceptual category […] that the spoken and visible ele-
ments collectively” refer to (Engle 2000: 26).
In examining the relation of gestures and the semantics of speech, the Linguistic
Annotation System for Gestures follows a three-step procedure: First, the temporal po-
sitioning of gestures with regard to speech is identified. Second, the semantic relation of
gestures, and third the semantic function of gestures with respect to speech is deter-
mined. Annotating the temporal relation of the gestures with the co-expressive speech
segment is regarded obligatory within the system. Annotating the semantic relation and
functions is considered optional and needs only to be incorporated if necessary for the
research question.

4.3.1. Annotation of gestures’ temporal relation


The temporal position of a gesture in relation to speech is of core importance for establishing the gesture's local meaning, because elements of the utterance that are expressed simultaneously are perceived as belonging together. Considering the temporal co-occurrence of the verbal and gestural modality, a
distinction may be drawn between “serial combinations, that is sequential strings of
words and gestures” and “parallel combinations”, that is “simultaneous expressions
that overlap in time” (Andrén 2010: 64) as well as the occurrence of gestures alone.
Based on these distinctions, the Linguistic Annotation System for Gestures distin-
guishes the following possible temporal relations of gestures with the co-expressive
speech segment:

(i) Pre- and post-positioning, that is the linear combination of speech and gesture, in
which gestures may either precede or follow the co-expressive speech segment.
(ii) Parallel, that is the simultaneous combination of speech and gestures, in which ges-
tures are executed in temporal overlap with the co-expressive speech segment.
(iii) Gesture alone, that is the occurrence of gestures without co-expressive speech: gestures have no direct spoken counterpart at the moment of being uttered but occur in pauses, in syntactic gaps, or in larger speechless segments.
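These three temporal relations can be operationalized over time-stamped annotation intervals. A sketch under our own assumptions (intervals as start/end pairs, any temporal overlap counting as a parallel combination; the system itself does not prescribe this operationalization):

```python
def temporal_relation(gesture, speech):
    """Classify the temporal relation of a gesture stroke to its
    co-expressive speech segment.  Intervals are (start, end) pairs in
    seconds; speech is None when no co-expressive segment exists."""
    if speech is None:
        return "gesture alone"
    g_start, g_end = gesture
    s_start, s_end = speech
    # Any temporal overlap counts as a parallel combination.
    if g_start < s_end and s_start < g_end:
        return "parallel"
    return "pre-positioned" if g_end <= s_start else "post-positioned"
```

Applied to the stroke tier and the co-expressive speech tier, this yields the obligatory temporal-relation annotation automatically.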

4.3.2. Annotation of gestures’ semantic relation


Based on the annotation of the temporal position of gestures, the system focuses on the
semantic relation of the gestures with speech by examining the expressed semantic features of both modalities (see e.g., Beattie and Shovelton 1999, 2007; Bergmann,
Aksu and Kopp 2011; Bressem 2012; Ladewig 2012) and/or the image schematic pat-
terns visible in the gestural form parameters (see section 2.2.2.2). In doing so, the ges-
tures’ semantic relation to the co-expressive speech is characterized as one of the
following (see Bressem 2012; Ladewig 2012):

(i) Redundant: The gesture matches the semantic features or image schemas in speech
so that the features or image schemas may be identical or included among the set
of semantic features or image schemas expressed in speech.
(ii) Complementary/Supplementary: Speech and gesture do not match in the semantic
features or image schemas but the gesture contributes semantic features or image
schemas to speech “thus forming a subset of the meaning of the superordinate
modality, namely speech.” (Gut et al. 2002: 8)
(iii) Contrary: Speech and gesture do not match in the semantic features or image sche-
mas, but rather carry contrary features so that speech and gesture do not form an
overlapping set of features or image schemas.
(iv) Replacing: Gestures are used without speech.
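If the semantic features or image schemas of each modality are annotated as sets of labels, the four relations can be approximated by set comparison. A simplified sketch (it reduces contrariety to feature disjointness; a full analysis would check for explicitly contradictory feature pairs):

```python
def semantic_relation(gesture_features, speech_features):
    """Approximate the four semantic relations by comparing the sets of
    annotated semantic features / image schemas of the two modalities.
    Simplification: 'contrary' is reduced to disjointness of the sets."""
    if not speech_features:
        return "replacing"
    if gesture_features <= speech_features:
        return "redundant"
    if gesture_features.isdisjoint(speech_features):
        return "contrary"
    return "complementary/supplementary"
```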

4.3.3. Annotation of gestures’ semantic function


Starting from the examination of the gestures’ semantic relation with the co-expressive
speech segment, the Linguistic Annotation System for Gestures focuses on the gestures’
semantic function with regard to the verbal utterance. By comparing the semantic
features and/or image schemas communicated in speech and gestures, the system cate-
gorizes the function of the gesture on the semantics of the spoken utterance as one of
the following:

(i) Emphasizing, when expressing redundant semantic features or image schemas.
(ii) Modifying, when expressing complementary semantic features or image schemas.
(iii) Additive, when carrying contrary semantic features or image schemas.
(iv) Substitutive, when expressing contrary semantic features or image schemas in the
absence of speech.

Gestures may thus either illustrate and emphasize what has already been uttered
verbally, modify the verbal meaning, or replace the verbal meaning (see e.g., Engle
2000; Freedman 1977; Fricke 2007, 2012; Gerwing and Allison 2009; Gut et al. 2002;
Kendon 1987; Kipp 2004; McNeill 1992, 2005; Müller 1998; Scherer 1979; Wilcox
2002). In doing so, gestures embody elements of the verbal meaning, mark salient infor-
mation, and highlight and foreground information in the flow of discourse (Alibali and
Kita 2010; Andrén 2010; Goodwin 2000; Ladewig 2012; Müller and Tag 2010).
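Because both the semantic relation (section 4.3.2) and the semantic function use closed value lists, the function label can be derived mechanically from the relation label. A sketch of this mapping as a small controlled vocabulary (the labels follow the two lists above; the helper is our own addition):

```python
# Mapping from the semantic relation (section 4.3.2) to the semantic
# function (section 4.3.3) of a gesture with respect to speech.
SEMANTIC_FUNCTION = {
    "redundant": "emphasizing",
    "complementary/supplementary": "modifying",
    "contrary": "additive",
    "replacing": "substitutive",
}

def semantic_function(relation):
    """Derive the gesture's semantic function from its annotated
    semantic relation to the co-expressive speech segment."""
    return SEMANTIC_FUNCTION[relation]
```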

4.4. Gestures and the pragmatics of speech


After having annotated the semantic relation of gestures with the spoken utterance,
the Linguistic Annotation System for Gestures addresses the pragmatics of the spo-
ken utterance. Here, gestures are regarded with respect to their positioning within
turns, their dynamic use over time, their relation with the verbal speech act, and
their pragmatic function. Of these four aspects, the first two (positioning within turns and dynamic use over time) are obligatory within the Linguistic Annotation System for Gestures. Annotating the relation with the verbal speech act and the pragmatic function is optional and needs only to be taken into account if necessary
for the particular research question. All annotations are done for the gesture phase
“stroke”.
Focusing on these structural as well as functional aspects adds information on the
local context of use of gestures and specifies further aspects of the local meaning con-
struction in gestures. It furthermore exemplifies the fact that gestures possess the same functional properties as language and are able to fulfill all three functions of language, namely representation, expression, and appeal (see section 1).

4.4.1. Annotation of gestures’ position within turns


Gestures “are part and parcel of sequentially structured interactive processes of the
social organization of conversations, of joint actions and embodied communicative ac-
tivities.” (Müller submitted) In so doing, gestures relate to the verbal utterance in a
variety of ways (syntactically, temporarily and semantically, see above) leading also
to differences in their relation to the structure of turn taking as a social activity and
pragmatic aspects of the spoken utterance. Gestures may be positioned in the middle, at the end, or at the beginning of the turn-constructional unit and as such may either function as turn entry or turn exit (Bavelas et al. 1992; Bohle 2007; Müller and Paul 1999;
Streeck and Hartge 1992) or even as a turn-holding device (Bohle 2007; Streeck and
Hartge 1992). At the beginning of the speaker’s turn, they would indicate the wish to
become next speaker; at the end of the turn, they would complete the turn, for instance,
by filling up a speech-pause or indicate the wish to maintain the right for the succeeding
turn. Annotating the placement of gestures in relation to the turn-constructional com-
ponent contributes to examining the means and functions of gestural meaning construc-
tion with respect to interactive aspects and furthermore sets the ground for evaluating
their pragmatic functions. Accordingly, the Linguistic Annotation System for Gestures
annotates the gestures' placement either as “beginning of turn”, “end of turn”, or “middle of turn”.
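Given time-aligned turn-constructional units, the three placement labels can be assigned automatically. A sketch (the 20% margin is an assumed heuristic for illustration; the system leaves the exact operationalization to the annotator):

```python
def turn_position(stroke_time, turn):
    """Label a stroke's placement within a turn-constructional unit
    (times in seconds) as beginning, middle, or end of turn."""
    start, end = turn
    relative = (stroke_time - start) / (end - start)
    if relative < 0.2:
        return "beginning of turn"
    if relative > 0.8:
        return "end of turn"
    return "middle of turn"
```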

4.4.2. Annotation of gestures’ relation to the verbal speech act


Regarding the pragmatic function of gestures from a speech-act theoretical point of
view (Austin 1962; Searle 1969), it becomes clear that gestures also perform speech
acts contributing to the verbal speech acts in a variety of ways. Some gestures primarily
express propositional content (gestures with a representational function or referential
gestures), others primarily realize the illocutionary force (gestures with a performative
function), while other gestures execute perlocutionary effects in the first place (see Müller this volume). Although some gestures are more apt at expressing speech acts than others (concrete and abstract referential gestures versus recurrent gestures or emblems), gestures in general perform speech acts regardless of their type. Accordingly, while embodying aspects of the propositional content, as in gestures with a representational function, for instance, these gestures also perform a speech act, such as providing an explanation, and thus have a pragmatic function in a wider sense (see
Müller submitted, this volume).
With respect to verbal speech acts, the Linguistic Annotation System for Gestures
annotates

(i) whether the gestures express propositional content,


(ii) whether they primarily relate to the illocutionary force or,
(iii) whether they primarily affect the perlocutionary force of the speech act.

4.4.3. Annotation of gestures’ speech act function


Based on the annotation of how gestures relate to the verbal speech act, the Linguistic
Annotation System for Gestures adds a further specification of the gestures’ speech act
function. Based on Searle’s classification of speech acts, the gestures can be annotated
as being related to or functioning as assertives, directives, commissives, expressives, and declarations (Searle 1969), and they can be classified as direct or indirect speech acts (see
also Kacem 2012). Furthermore, it is possible to specify the perlocutionary force of
gestures by classifying them either as performative (Kendon 1995, 2004; Müller
1998), modal or parsing (Kendon 2004), as discursive (Müller 1998), as “displaying
the communicative act of the speaker and act upon speech as speech-performatives”,
or as “performatives” when aiming at “a regulation of the behavior of others”
(Teßendorf 2008; see also Brookes 2004, 2005; Müller 1998; Müller and Speckmann
2002; Streeck 2005).

4.4.4. Annotation of the dynamic pattern of gestures over time


The meaning of a gesture is not created on the spot. Rather, gestures take part in meaning-making processes, which evolve dynamically over time (Kappelhoff and Müller
2011; Kolter et al. 2012; Ladewig 2012; Müller 2007, 2008a, 2008b; Müller and Tag
2010). Gestures are often used in more or less loose conjunction with other gestures
and often build gestural units of different complexity. This may concern small sequences
of gestures such as in the description of objects (e.g., Müller 1998), gestural repetitions
(e.g., Bressem 2012; Fricke 2012) or the successive use of pragmatic gestures (e.g., Ken-
don 2004; Müller 2004, submitted; Teßendorf 2008), in which gestures are employed in short succession. Yet it may also concern the use of gestures over larger
time spans, such as for instance, their use in a narration told in a conversational inter-
action (McNeill 1992, 2005; Müller 1998), in which the recurrence of a gesture is rele-
vant in terms of communicative dynamism (McNeill 1992, 2005) or the functional
rheme/theme structure by highlighting new rather than given information (McNeill
1992; Müller 2003). The dynamic use of gestures over time thereby creates a “multimo-
dal salience structure” for speakers and hearers (Müller and Tag 2010), which not only
serves as a central means for multimodal meaning construction but also for the coordi-
nation of interaction between speakers, thereby assuring mutual understanding and
alignment in interactions.
Accordingly, regarding the dynamic pattern of gestures over time, the Linguistic
Annotation System for Gestures annotates the different occurrences of a gestural
form. This is done for the recurrent use of similar or the same gestural form features over the course of a discourse, as, for instance, in so-called “catchments” (McNeill 1992), but also for the succession of gestures describing one and the same reference object. This allows for the observation of new and emerging verbo-gestural meaning or the foregrounding of established and activated verbo-gestural meaning.
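Once strokes are annotated with their form features, candidate catchments can be retrieved by grouping strokes over recurring feature bundles. A sketch (the choice of handshape and movement as the grouping key, and the dictionary-based record format, are our assumptions):

```python
from collections import defaultdict

def catchment_candidates(strokes):
    """Group annotated strokes by their form features; feature bundles
    used more than once over a discourse are candidate catchments
    (McNeill 1992).  Returns {feature bundle: [stroke times]}."""
    groups = defaultdict(list)
    for stroke in strokes:
        key = (stroke["handshape"], stroke["movement"])
        groups[key].append(stroke["time"])
    return {key: times for key, times in groups.items() if len(times) > 1}
```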

5. The role of speech in the annotation process


By being grounded in a linguistic approach to gestures and based on the Methods of
Gesture Analysis, the Linguistic Annotation System for Gestures is determined by a
focus on form aspects of gestures, which results in a methodological approach giving
form a prominent role in the process of description and analysis (see section 2). This
focus on form leads to a particular role of speech in the annotation process. Based
on the assumption that a description of gestural form features should precede an ana-
lysis of their meaning(s) and function(s), the Linguistic Annotation System for Gestures
sets out with a separation of gesture and speech. Gestures are, first and foremost, inves-
tigated independently from speech. Only successively are gesture and speech brought
together, leading to an investigation of form, meaning, and function of gestures alone
as well as in relation to speech.
The concentration on gestural form and thus on an annotation of gestures indepen-
dent from speech is reflected in the setup of the Linguistic Annotation System for Gestures. Gesture and speech analysis are related to each other only after having completed
the annotation of gestural forms and meanings. Accordingly, annotating speech as well
as the gestures’ relation to prosody, semantics, syntax, and pragmatics of speech is part
of the third and last annotation block of the system. Hence, speech is not taken into account in the first annotation steps of the Linguistic Annotation System for Gestures but is gradually added throughout the annotation process. Gesture and
speech are taken into account as two separate articulatory modalities with gesture
being integrated with speech both structurally and functionally, while exhibiting mod-
ality specific forms, structures, and patterns. The separation of speech and gesture in
the annotation process is thereby understood as a necessary heuristic procedure in
order to detect and describe the “gestures’ potential for language” (Müller 2009).

6. Practical implementation of the annotation system


The Linguistic Annotation System for Gestures has been designed for its application in
the annotation software ELAN, a professional tool for the creation of complex annota-
tions on video and audio resources, which supports multileveled transcriptions (Witten-
burg et al. 2006). The Linguistic Annotation System for Gestures is, however, also applicable to other annotation software such as Anvil (Kipp 2004) or Exmaralda (Schmidt 2004), for instance. In the following, we will briefly address its implementation in the annotation software ELAN.
The tiers of the annotation file consist of the obligatory and optional aspects dis-
cussed in sections 2–4. Each aspect is annotated on a separate tier (see Fig. 71.1). An-
notations of gestural forms are done separately for both hands. Tiers are organized in hierarchies, leading to particular dependencies of the individual tiers (aspects of annotations) on each other (see Fig. 71.2). Individual tiers are furthermore associated with controlled vocabularies, which are user-definable lists of values. Controlled vocabularies present the list of values possible for the
particular obligatory as well as optional aspects of annotation. In doing so, controlled
vocabularies not only accelerate and enhance the annotation but, more importantly,
they make it less prone to errors allowing for the creation of consistent and comparable
corpora. The Linguistic Annotation System for Gestures contains controlled vocabul-
aries for almost all tiers (see Tab. 71.1). The hierarchical set up of the tiers along
with the controlled vocabularies is stored by means of a template file, holding the
skeleton of a transcription within the Linguistic Annotation System for Gestures.
For an example of an annotation based on the Linguistic Annotation System for Gestures, along with the hierarchical dependencies of the tiers and associated controlled vocabularies, see Figs. 71.1 and 71.2.
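The tier hierarchy together with its controlled vocabularies can be thought of as a small data structure that the template file serializes. A minimal stand-in in Python (tier names and vocabulary values are invented for illustration, not the system's official template):

```python
# Hypothetical fragment of an ELAN-style template: tier name ->
# (parent tier, controlled vocabulary).  None means no vocabulary,
# i.e. free-text annotation values.
TEMPLATE = {
    "gesture_unit": (None, None),
    "gesture_phase": ("gesture_unit",
                      {"preparation", "stroke", "hold", "retraction"}),
    "temporal_relation": ("gesture_phase",
                          {"pre-positioned", "post-positioned",
                           "parallel", "gesture alone"}),
}

def validate(tier, value):
    """Accept an annotation value only if the tier has no controlled
    vocabulary or the value is listed in it."""
    _parent, vocabulary = TEMPLATE[tier]
    return vocabulary is None or value in vocabulary
```

Constraining values this way is what makes the resulting corpora consistent and comparable, as described above.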
Apart from the practical advantages going along with the outlined setup of a file based on the Linguistic Annotation System for Gestures, it offers further benefits for the process of analysis and evaluation. If necessary, the implementation of the Linguistic Annotation System for Gestures in ELAN allows for assessing inter-rater agreement, rater coding, etc.
(see Wittenburg et al. 2006). Furthermore, the search system of ELAN provides the
means to construct complex queries in single or multiple files, offering a valuable
means for research based on larger corpora. The feature of exporting the annotation
in a range of file formats (e.g., Hypertext Markup Language, Excel) allows for the pos-
sibility of running further analyses and queries in other (statistical) programs and thus
makes the Linguistic Annotation System for Gestures a useful tool for descriptive as
well as experimental approaches.

Fig. 71.1: Example of Linguistic Annotation System with speech annotation based on the notion
of turn

7. Conclusion
The present article has outlined a perspective on the annotation of gestures grounded in
a (cognitive) linguistic approach to language use and provided guidelines for the anno-
tation of gestures and speech on a range of levels. By explicating underlying theoretical
and methodological assumptions in describing gestures’ forms and reconstructing their
meanings and function with and without speech, the annotation system complements
the list of existing annotation systems (e.g., Gut et al. 2002; Kipp, Neff and Albrecht
2007; Lausberg and Sloetjes 2009) by offering a systematic description at a range of levels of linguistic description. Furthermore, by offering obligatory as well as optional aspects of annotation, the system offers flexibility and expandability, allowing it to be adjusted to a range of research questions addressed with different depths of description and analysis. The Linguistic Annotation System for Gestures is thus also applicable in a range of disciplines interested in a linguistic and form-based perspective on gestures and their relation to speech, such as (cognitive) linguistics, semiotics, and cognitive science.

Fig. 71.2: Tier dependency in the Linguistic Annotation System for Gestures

Acknowledgments
We are grateful to the Volkswagen Foundation for supporting this work with a grant for
the interdisciplinary project “Towards a grammar of gesture: evolution, brain and
linguistic structures” (www.togog.org).

8. References
Alibali, Martha W. and Sotaro Kita 2010. Gesture highlights perceptually present information
for speakers. Gesture 10(1): 3–28.
Andrén, Mats 2010. Children’s Gestures from 18 to 30 Months. Centre for Languages and Litera-
ture, Centre for Cognitive Semiotics. Lund: Lund University.
Austin, John L. 1962. How to Do Things with Words. Oxford: Clarendon Press.
Battison, Robin 1974. Phonological deletion in American Sign Language. Sign Language Studies 5: 1–19.
Bavelas, Janet Beavin, Nicole Chovil, Douglas A. Lawrie and Allan Wade 1992. Interactive ges-
tures. Discourse Processes 15: 469–489.
Beattie, Geoffrey and Heather Shovelton 1999. Do iconic hand gestures really contribute anything
to the semantic information conveyed by speech? An experimental investigation. Semiotica
123(1/2): 1–30.
Beattie, Geoffrey and Heather Shovelton 2007. The role of iconic gesture in semantic communi-
cation and its theoretical and practical implications. In: Susan D. Duncan, Justine Cassell and
Elena Tevy Levy (eds.), Gesture and the Dynamic Dimension of Language, 221–241. Philadel-
phia: John Benjamins.
Bergmann, Kirsten, V. Aksu and Stefan Kopp 2011. The relation of speech and gestures: Temporal
synchrony follows semantic synchrony. Paper presented at the 2nd Workshop on Gesture and
Speech in Interaction – GESPIN, 5–7 September. Bielefeld, Germany.
Bergmann, Kirsten and Stefan Kopp 2006. Verbal or visual? How information is distributed across
speech and gesture in spatial dialog. In: David Schlangen and Raquel Fernandez (eds.), Pro-
ceedings of brandial 2006, the 10th Workshop on the Semantics and Pragmatics of Dialogue,
90–97. Potsdam University Press, Germany.
Bird, Steven and Mark Liberman 2001. A formal framework for linguistic annotation. Speech
Communication 33(1/2): 23–60.
Birdwhistell, Ray 1970. Kinesics and Context. Essays on Body Motion Communication. Philadel-
phia: University of Pennsylvania Press.
Bohle, Ulrike 2007. Das Wort ergreifen – das Wort übergeben: Explorative Studie zur Rolle rede-
begleitender Gesten in der Organisation des Sprecherwechsels. Berlin: Weidler.
Bressem, Jana 2007. Recurrent form features in coverbal gestures. Unpublished manuscript.
Bressem, Jana 2012. Repetitions in gesture: Structures, functions, and cognitive aspects. Ph.D. disser-
tation, Faculty of Social and Cultural Sciences, European University Viadrina, Frankfurt (Oder).
Bressem, Jana this volume a. A linguistic perspective on the notation of form features in gestures.
In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Sedinha
Teßendorf (eds.), Body – Language – Communication: An International Handbook on Multi-
modality in Human Interaction. (Handbooks of Linguistics and Communication Science
38.1.) Berlin/Boston: De Gruyter Mouton.
Bressem, Jana this volume b. Transcription systems for gestures, speech, prosody, postures, gaze.
In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Sedinha
Teßendorf (eds.), Body – Language – Communication: An International Handbook on Multimodality
in Human Interaction. (Handbooks of Linguistics and Communication Science
38.1.) Berlin/Boston: De Gruyter Mouton.
Bressem, Jana and Silva H. Ladewig 2011. Rethinking gesture phases: Articulatory features of ges-
tural movement? Semiotica 184(1–4): 53–91.
Brookes, Heather 2004. A repertoire of South African quotable gestures. Journal of Linguistic
Anthropology 14(2): 186–224.
Brookes, Heather 2005. What gestures do: Some communicative functions of quotable gestures in
conversations among Black urban South Africans. Journal of Pragmatics 37: 2044–2085.
Brugman, Claudia and George Lakoff 1988. Cognitive topology and lexical networks. In: Steven
Small, Garrison Cotrell and Michael Tanenhaus (eds.), Lexical Ambiguity Resolution: Perspec-
tives from Psycholinguistics, Neuropsychology, and Artificial Intelligence, 477–508. San Mateo,
CA: Morgan Kaufmann.
Bühler, Karl 2011. Theory of Language. The Representational Function of Language. Amsterdam:
John Benjamins.
Butterworth, Brian, Geoffrey Beattie, Robin N. Campbell and Philip T. Smith 1978. Gesture and
silence as indicators of planning in speech. In: Robin N. Campbell and Philip T. Smith (eds.),
Recent Advances in the Psychology of Language: Formal and Experimental Approaches, 347–
360. New York: Plenum Press.
Calbris, Geneviève 1990. The Semiotics of French Gestures. Bloomington: Indiana University Press.
Calbris, Geneviève 2003. From cutting an object to a clear cut analysis. Gesture as the representa-
tion of a preconceptual schema linking concrete actions to abstract notions. Gesture 3(1): 19–46.
Calbris, Geneviève 2011. Elements of Meaning in Gesture. Amsterdam: John Benjamins.
Cameron, Lynn, Robert Maslen, Zazie Todd, John Maule, Peter Stratton and Neil Stanley 2009.
The discourse dynamics approach to metaphor and metaphor-led discourse analysis. Metaphor
and Symbol 24(2): 63–89.
Chafe, Wallace L. 1987. Cognitive constraints on information flow. In: Russell S. Tomlin (ed.),
Coherence and Grounding in Discourse, 21–51. Amsterdam: John Benjamins.
Chafe, Wallace L. 1994. Discourse, Consciousness, and Time: The Flow and Displacement of Con-
scious Experience in Speaking and Writing. Chicago: University of Chicago Press.
Cienki, Alan 1997. Some properties and groupings of image schemas. In: Marjolijn Verspoor, Kee
Dong Lee and Eve Sweetser (eds.), Lexical and Syntactical Constructions and the Construction
of Meaning, 3–15. Amsterdam/Philadelphia: John Benjamins.
Cienki, Alan 1998. Straight: An image schema and its transformations. Cognitive Linguistics 9:
107–149.
Cienki, Alan 2005. Image schemas and gesture. In: Beate Hampe (ed.), From Perception to Mean-
ing: Image Schemas in Cognitive Linguistics, 421–441. Berlin: De Gruyter Mouton.
Condon, William S. and William D. Ogston 1966. Sound film analysis of normal and pathological
behavior patterns. Journal of Nervous and Mental Disease 143(4): 338–347.
Condon, William S. and William D. Ogston 1967. A segmentation of behavior. Journal of
Psychiatric Research 5: 221–235.
Couper-Kuhlen, Elizabeth and Dagmar Barth-Weingarten 2011. A system for transcribing talk-in-
interaction: GAT 2 translated and adapted for English by Elizabeth Couper-Kuhlen and Dagmar
Barth-Weingarten. Gesprächsforschung – Online Zeitschrift zur verbalen Interaktion 12: 1–51.
Crystal, David 1969. Prosodic Systems and Intonation in English. Cambridge: Cambridge Univer-
sity Press.
De Ruiter, Jan Peter 2000. The production of gesture and speech. In: David McNeill (ed.),
Language and Gesture, 284–311. Cambridge: Cambridge University Press.
Duncan, Starkey 1972. Some signals and rules for taking speaking turns in conversations. Journal
of Personality and Social Psychology 23(2): 283–292.
Duncan, Susan n.d. Coding “Manual”. Accessed 28.07.2012 via http://www.mcneilllab.uchicago.
edu/pdfs/Coding_Manual.pdf.
Ekman, Paul 1977. Facial expression. In: Aron W. Siegman and Stanley Feldstein (eds.),
Nonverbal Behavior and Communication, 97–116. Hillsdale, NJ: Lawrence Erlbaum.
Ekman, Paul and Wallace V. Friesen 1969. The repertoire of nonverbal behavior: Categories, ori-
gins, usage and coding. Semiotica 1: 49–98.
Engle, Randi A. 2000. Toward a theory of multimodal communication combining speech, gestures,
diagrams, and demonstrations in instructional explanations. Ph.D. dissertation, School of Edu-
cation, Stanford University, Stanford, CA.
Freedman, Norbert 1972. The analysis of movement behaviour during the clinical interview. In:
Aron Wolfe Siegman and Benjamin Pope (eds.), Studies in Dyadic Communication, 153–175.
New York: Pergamon Press.
Freedman, Norbert 1977. Hands, words and mind. On the structuralization of body movements
during discourse and the capacity for verbal representation. In: Norbert Freedman and Stanley
Grand (eds.), Communicative Structures. A Psychoanalytic Interpretation of Communication,
219–235. New York: Plenum Press.
Fricke, Ellen 2007. Origo, Geste und Raum: Lokaldeixis im Deutschen. Berlin: Walter de Gruyter.
Fricke, Ellen 2010. Phonaestheme, Kinaestheme und multimodale Grammatik: Wie Artikula-
tionen zu Typen werden, die bedeuten können. In: Sprache und Literatur 41(1): 70–88.
Fricke, Ellen 2012. Grammatik multimodal: Wie Wörter und Gesten zusammenwirken. Berlin: De
Gruyter.
Fricke, Ellen this volume. Towards a unified grammar of gesture and speech. In: Cornelia Müller,
Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Sedinha Teßendorf (eds.),
Body – Language – Communication: An International Handbook on Multimodality in Human
Interaction. (Handbooks of Linguistics and Communication Science 38.1.) Berlin/Boston: De
Gruyter Mouton.
Gerwing, Jennifer and Meredith Allison 2009. The relationship between verbal and gestural con-
tributions in conversation: A comparison of three methods. Gesture 9(3): 312–336.
Goodwin, Charles 2000. Action and embodiment within situated human interaction. Journal of
Pragmatics 32: 1489–1522.
Gut, Ulrike, Karin Looks, Alexandra Thies and Dafydd Gibbon 2002. Cogest: Conversational ges-
ture transcription system version 1.0. Fakultät für Linguistik und Literaturwissenschaft, Uni-
versität Bielefeld, ModeLex Tech. Rep 1.
Harrison, Simon 2009. Grammar, gesture, and cognition: The case of negation in English. Ph.D.
dissertation, Université Michel de Montaigne, Bordeaux 3.
Johnson, Mark 1987. The Body in Mind. The Bodily Basis of Meaning, Imagination, and Reason.
Chicago: University of Chicago Press.
Johnson, Mark 1993. Conceptual metaphor and embodied structures of meaning: A reply to Ken-
nedy and Vervaeke. Philosophical Psychology 6(4): 413–422.
Johnson, Mark 2005. The philosophical significance of image schemas. In: Beate Hampe (ed.),
From Perception to Meaning: Image Schemas in Cognitive Linguistics, 15–33. Berlin: De Gruy-
ter Mouton.
Kacem, Chaouki 2012. Gestenverhalten an deutschen und tunesischen Schulen. Ph.D. dissertation,
Technische Universität Berlin.
Kappelhoff, Hermann and Cornelia Müller 2011. Embodied meaning construction. Multimodal
metaphor and expressive movement in speech, gesture, and in feature film. Metaphor and
Social World 1: 121–153.
Kendon, Adam 1972. Some relationships between body motion and speech. In: Aron W. Siegman
and Benjamin Pope (eds.), Studies in Dyadic Communication, 177–216. Elmsford, NY: Pergamon Press.
Kendon, Adam 1980. Gesture and speech: Two aspects of the process of utterance. In: Mary
Ritchie Key (ed.), Nonverbal Communication and Language, 207–288. The Hague: Mouton.
Kendon, Adam 1987. On gesture: Its complementary relationship with speech. In: Aron W.
Siegman and Stanley Feldstein (eds.), Nonverbal Behavior and Communication, 65–97. Hillsdale,
NJ: Lawrence Erlbaum.
Kendon, Adam 1995. Gestures as illocutionary and discourse structure markers in Southern Ital-
ian conversation. Journal of Pragmatics 23: 247–279.
Kendon, Adam 2004. Gesture. Visible Action as Utterance. Cambridge: Cambridge University Press.
Kipp, Michael 2001. Anvil – a generic annotation tool for multimodal dialogue. In: Proceedings
of the 7th European Conference on Speech Communication and Technology (Eurospeech),
1367–1370. Aalborg, Denmark.
Kipp, Michael, Michael Neff and Irene Albrecht 2007. An annotation scheme for conversational
gestures: How to economically capture timing and form. Journal on Language Resources and
Evaluation – Special Issue on Multimodal Corpora 41(3–4): 325–339.
Kita, Sotaro, Ingeborg van Gijn and Harry van der Hulst 1998. Movement phases in signs and
co-speech gestures, and their transcription by human coders. In: Ipke Wachsmuth and Martin
Fröhlich (eds.), Gesture and Sign Language in Human-Computer Interaction, 23–35. Berlin:
Springer.
Kolter, Astrid, Silva H. Ladewig, Michela Summa, Sabine Koch, Thomas Fuchs and Cornelia
Müller 2012. Body memory and emergence of metaphor in movement and speech. An
interdisciplinary case study. In: Sabine Koch, Thomas Fuchs, Michaela Summa and Cornelia
Müller (eds.), Body Memory, Metaphor, and Movement, 202–226. Amsterdam: John
Benjamins.
Kopp, Stefan, Paul Tepper and Justine Cassell 2004. Towards an integrated microplanning of
language and iconic gesture for multimodal output. Paper presented at ICMI 2004, October
13–15, State College, PA.
Ladd, D. Robert 1996. Intonational Phonology. Cambridge: Cambridge University Press.
Ladewig, Silva H. 2007. The family of the cyclic gesture and its variants – systematic variation of
form and contexts. http://www.silvaladewig.de/publications/papers/Ladewig-cyclic_gesture_pdf;
accessed January 2008.
Ladewig, Silva H. 2010. Beschreiben, suchen und auffordern – Varianten einer rekurrenten Geste.
In: Sprache und Literatur 41(1): 89–111.
Ladewig, Silva H. 2011. Putting the cyclic gesture on a cognitive basis. CogniTextes 6. http://
cognitextes.revues.org/406
Ladewig, Silva H. 2012. Syntactic and semantic integration of gestures into speech: Structural, cog-
nitive, and conceptual aspects. Ph.D. dissertation, Faculty of Social and Cultural Sciences, Euro-
pean University Viadrina, Frankfurt (Oder).
Ladewig, Silva H. and Jana Bressem this volume. The notation of gesture phases – a linguistic per-
spective. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and
Sedinha Teßendorf (eds.), Body – Language – Communication: An International Handbook on
Multimodality in Human Interaction. (Handbooks of Linguistics and Communication Science
38.1.) Berlin/Boston: De Gruyter Mouton.
Ladewig, Silva H. and Jana Bressem forthcoming. New insights into the medium hand – Discovering
structures in gestures based on the four parameters of sign language. Semiotica.
Langacker, Ronald 1987. Foundations of Cognitive Grammar. Vol. 1: Theoretical Prerequisites.
Stanford, CA: Stanford University Press.
Lascarides, Alex and Matthew Stone 2006. Formal semantics for iconic gesture. In: David
Schlangen and Raquel Fernandez (eds.), brandial’06 Proceedings, 64–71. Potsdam, Germany:
Potsdam University Press.
Lausberg, Hedda and Han Sloetjes 2009. Coding gestural behavior with the NEUROGES–ELAN
system. Behavioral Research Methods 41(3): 841–849.
Loehr, Daniel 2004. Gesture and intonation. Ph.D. dissertation, Graduate School of Arts and
Sciences, Georgetown University, Washington, DC.
Loehr, Daniel 2007. Aspects of rhythm in gesture and speech. Gesture 7(2): 179–214.
Martell, Craig 2002. Form: An extensible, kinematically-based gesture annotation scheme. Paper
presented at the International Conference on Language Resources and Evaluation. European
Language Resources Association. 29–31 May. Las Palmas.
McClave, Evelyn Z. 1991. Intonation and gesture. Ph.D. dissertation, Georgetown University, Washington, DC.
McClave, Evelyn Z. 1994. Gestural beats: The rhythm hypothesis. Journal of Psycholinguistic
Research 23(1): 45–66.
McCullough, Karl Erik 2005. Using gestures during speaking: Self-generating indexical fields.
Ph.D. dissertation, University of Chicago.
McNeill, David 1992. Hand and Mind. What Gestures Reveal about Thought. Chicago: University
of Chicago Press.
McNeill, David 2005. Gesture and Thought. Chicago: University of Chicago Press.
Mittelberg, Irene 2006. Metaphor and metonymy in language and gesture: Discourse evidence for
multimodal models of grammar. Ph.D. dissertation, Cornell University.
Mittelberg, Irene 2008. Peircean semiotics meets conceptual metaphor: Iconic modes in gestural
representations of grammar. In: Alan Cienki and Cornelia Müller (eds.), Metaphor and Ges-
ture, 145–184. Amsterdam: John Benjamins.
Mittelberg, Irene 2010. Geometric and image-schematic patterns in gesture space. In: Vyv Evans
and Paul Chilton (eds.), Language, Cognition, and Space: The State of the Art and New Direc-
tions, 351–385. London: Equinox.
Mittelberg, Irene and Linda Waugh 2009. Multimodal figures of thought: A cognitive-semiotic
approach to metaphor and metonymy in co-speech gesture. In: Charles Forceville and Eduardo
Urios-Aparisi (eds.), Multimodal Metaphor, 329–356. Berlin: De Gruyter Mouton.
Müller, Cornelia 1998. Redebegleitende Gesten: Kulturgeschichte – Theorie – Sprachvergleich. Ber-
lin: Arno Spitz.
Müller, Cornelia 2000. Zeit als Raum. Eine kognitiv-semantische Mikroanalyse des sprachlichen
und gestischen Ausdrucks von Aktionsarten. In: Ernest W. B. Hess-Lüttich and H. Walter
Schmitz (eds.), Botschaften verstehen. Kommunikationstheorie und Zeichenpraxis. Festschrift
für Helmut Richter, 211–228. Frankfurt am Main: Peter Lang.
Müller, Cornelia 2003. On the gestural creation of narrative structure: A case study of a story told
in conversation. In: Monica Rector, Isabella Poggi and Trigo Nadine (eds.), Gestures: Meaning
and Use, 259–265. Porto, Portugal: Universidade Fernando Pessoa.
Müller, Cornelia 2004. Forms and uses of the Palm Up Open Hand. A case of a gesture family? In:
Cornelia Müller and Roland Posner (eds.), Semantics and Pragmatics of Everyday Gestures,
233–256. Berlin: Weidler.
Müller, Cornelia 2007. A dynamic view on metaphor, gesture and thought. In: Susan Duncan,
Justine Cassell and Elena T. Levy (eds.), Gesture and the Dynamic Dimension of Language. Essays
in Honor of David McNeill, 109–116. Amsterdam: John Benjamins.
Müller, Cornelia 2008a. Metaphors. Dead and Alive, Sleeping and Waking. A Dynamic View. Chi-
cago: University of Chicago Press.
Müller, Cornelia 2008b. What gestures reveal about the nature of metaphor. In: Alan Cienki and
Cornelia Müller (eds.), Metaphor and Gesture, 249–275. Amsterdam: John Benjamins.
Müller, Cornelia 2009. Gesture and language. In: Kirsten Malmkjaer (ed.), Routledge’s Linguistics
Encyclopedia, 214–217. Abingdon: Routledge.
Müller, Cornelia 2010a. Mimesis und Gestik. In: Gertrud Koch, Martin Vöhler and Christiane
Voss (eds.), Die Mimesis und ihre Künste, 149–187. Paderborn: Fink.
Müller, Cornelia 2010b. Wie Gesten bedeuten. Eine kognitiv-linguistische und sequenzanalytische
Perspektive. In: Sprache und Literatur 41(1): 37–68.
Müller, Cornelia this volume. Gestures as a medium of expression: The linguistic potential of ges-
tures. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Se-
dinha Teßendorf (eds.), Body – Language – Communication: An International Handbook on
Multimodality in Human Interaction. (Handbooks of Linguistics and Communication Science
38.1.) Berlin/Boston: De Gruyter Mouton.
Müller, Cornelia volume 2. Gestural modes of representation as techniques of depiction. In: Cornelia
Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Jana Bressem (eds.),
Body – Language – Communication: An International Handbook on Multimodality in Human
Interaction. (Handbooks of Linguistics and Communication Science 38.2.) Berlin/Boston: De
Gruyter Mouton.
Müller, Cornelia submitted. How gestures mean – The construal of meaning in gestures with
speech.
Müller, Cornelia and Harald Haferland 1997. Gefesselte Hände. Zur Semiose performativer Ges-
ten. Mitteilungen des Deutschen Germanistenverbandes 44(3): 29–50.
Müller, Cornelia, Silva H. Ladewig and Jana Bressem this volume. Gestures and speech from a lin-
guistic point of view. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David
McNeill and Sedinha Teßendorf (eds.), Body – Language – Communication: An International
Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics and Communica-
tion Science 38.1.) Berlin/Boston: De Gruyter Mouton.
Müller, Cornelia, Silva H. Ladewig and Jana Bressem volume 2. Methods of linguistic gesture ana-
lysis. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and
Jana Bressem (eds.), Body – Language – Communication: An International Handbook on Mul-
timodality in Human Interaction. (Handbooks of Linguistics and Communication Science 38.2.)
Berlin/Boston: De Gruyter Mouton.
Müller, Cornelia, Silva H. Ladewig and Sedinha Teßendorf in preparation. Gestural modes of rep-
resentation – revisited. Unpublished manuscript.
Müller, Cornelia, Hedda Lausberg, Ellen Fricke and Katja Liebal 2005. Towards a grammar of
gesture: Evolution, brain, and linguistic structures. Berlin: Antrag im Rahmen der Förderinitia-
tive “Schlüsselthemen der Geisteswissenschaften Programm zur Förderung fachübergreifender
und internationaler Zusammenarbeit”.
Müller, Cornelia and Ingwer Paul 1999. Gestikulieren in Sprechpausen. Eine konversations-
syntaktische Fallstudie. In: Hartmut Eggert and Janusz Golec (eds.), … wortlos der Sprache
mächtig. Schweigen und Sprechen in Literatur und sprachlicher Kommunikation, 265–281.
Stuttgart, Germany: Metzler.
Müller, Cornelia and Gerald Speckmann 2002. Gestos con una valoración negativa en la conver-
sación cubana. DeSignis 3: 91–103.
Müller, Cornelia and Susanne Tag 2010. The embodied dynamics of metaphoricity: Activating
metaphoricity in conversational interaction. Cognitive Semiotics 6: 85–10.
Nuñez, Rafael E. 2004. Embodied cognition and the nature of mathematics: Language, gesture,
and abstraction. In: Kenneth Forbus, Dedre Gentner and Terry Regier (eds.), Proceedings of the
26th Annual Conference of the Cognitive Science Society, 36–37. Mahwah, NJ: Lawrence
Erlbaum.
Oakley, Todd 2005. Image schemas. In: Beate Hampe and Joseph E. Grady (eds.), From Perception
to Meaning: Image Schemas in Cognitive Linguistics. Berlin: De Gruyter Mouton.
Parrill, Fey 2008. Subjects in the hands of speakers: An experimental study of syntactic subject and
speech-gesture integration. Cognitive Linguistics 19(2): 283–299.
Sacks, Harvey, Emanuel A. Schegloff and Gail Jefferson 1974. A simplest systematics for the orga-
nization of turn-taking for conversation. Language 50(4): 696–735.
Sager, Svend F. 2001. Probleme der Transkription nonverbalen Verhaltens. In: Klaus Brinker,
Gerd Antos, Wolfgang Heinemann, and Svend F. Sager (eds.), Text und Gesprächslinguistik.
Ein internationales Handbuch zeitgenössischer Forschung, 1069–1085. (Handbücher zur
Sprach- und Kommunikationswissenschaft 16.2.) Berlin: De Gruyter.
Scherer, Klaus R. 1979. Die Funktionen des nonverbalen Verhaltens im Gespräch. In: Klaus R.
Scherer and Harald G. Wallbott (eds.), Nonverbale Kommunikation: Forschungsberichte zum
Interaktionsverhalten, 25–32. Weinheim, Germany: Beltz.
Schmidt, Thomas 2004. EXMARaLDA: Ein System zur computergestützten Diskurstranskription.
In: Alexander Mehler and Henning Lobin (eds.), Automatische Textanalyse. Systeme und Meth-
oden zur Annotation und Analyse natürlichsprachlicher Texte, 203–218. Wiesbaden, Germany:
Verlag für Sozialwissenschaften.
Searle, John R. 1969. Speech Acts: An Essay in the Philosophy of Language. Cambridge: Cam-
bridge University Press.
Selting, Margret 2000. The construction of units in conversational talk. Language in Society 29(4):
477–517.
Seyfeddinipur, Mandana 2006. Disfluency: Interrupting speech and gesture. Ph.D. dissertation,
University Nijmegen, the Netherlands.
Slama-Cazacu, Tatiana 1976. Nonverbal components in message sequence: “Mixed syntax”. In:
William C. McCormack and Stephen A. Wurm (eds.), Language and Man: Anthropological Is-
sues, 217–227. The Hague: Mouton.
Sowa, Timo 2005. Understanding Coverbal Iconic Gestures in Object Shape Descriptions. Ph.D.
dissertation. Berlin: Akademische Verlagsgesellschaft.
Stokoe, William 1960. Sign Language Structure. Buffalo, NY: Buffalo University Press.
Streeck, Jürgen 1988. The significance of gesture: How it is established. IPrA Papers in Pragmatics
2(1/2): 60–83.
Streeck, Jürgen 1993. Gesture as communication I: Its coordination with gaze and speech. Com-
munication Monographs 60(4): 275–299.
Streeck, Jürgen 1994. Culture, meaning, and interpersonal communication. In: Mark L. Knapp
and Gerald R. Miller (eds.), Handbook of Interpersonal Communication, 286–319. London:
Sage Publications Ltd.
Streeck, Jürgen 2002. Grammars, words, and embodied meanings: on the uses and evolution of so
and like. Journal of Communication 52(3): 581–596.
Streeck, Jürgen 2005. Pragmatic aspects of gesture. In: Jacob L. Mey (ed.), Encyclopedia of Lan-
guage and Linguistics, vol. 5: Pragmatics, 71–76. Oxford: Elsevier.
Streeck, Jürgen 2009. Gesturecraft. The manu-facture of meaning. Amsterdam: John Benjamins.
Streeck, Jürgen and Ulrike Hartge 1992. Previews: Gestures at the transition place. In: Peter Auer and
Aldo di Luzio (eds.), The Contextualization of Language, 138–158. Amsterdam: John Benjamins.
Sweetser, Eve 1998. Regular Metaphoricity in Gesture: Bodily-Based Models of Speech Interaction.
Oxford: Elsevier.
Sweetser, Eve 2006. Negative spaces: Levels of negation and kinds of spaces. GRAAT 35: 313–332.
Sweetser, Eve and Fey Parrill 2004. What we mean by meaning: Conceptual integration in gesture
analysis and transcription. Gesture 4(2): 197–219.
Talmy, Leonard 1983. How language structures space. In: Herbert L. Pick and Linda P. Acredolo
(eds.), Spatial Orientation: Theory, Research, and Application, 225–282. New York: Plenum Press.
Teßendorf, Sedinha 2005. Pragmatische Funktionen spanischer Gesten am Beispiel des “Gestos de Bar-
rer”. Unpublished MA thesis, Department of Philosophy and Humanities, Free University Berlin.
Teßendorf, Sedinha 2008. Pragmatic and metaphoric gestures – combining functional with cogni-
tive approaches. Unpublished manuscript.
Teßendorf, Sedinha 2009. From everyday action to gestural performance: Metonymic motivations
of a pragmatic gesture. Paper presented at the Second Aflico Conference, Lille. 10–12 May.
Tuite, Kevin 1993. The production of gesture. Semiotica 93(1–2): 83–105.
Wilcox, Sherman 2002. The iconic mapping of space and time in signed languages. In: Liliana Al-
bertazzi (ed.), Unfolding Perceptual Continua, 255–281. Amsterdam: John Benjamins.
Williams, Robert F. 2008. Gesture as a conceptual mapping tool. In: Alan Cienki and Cornelia
Müller (eds.), Metaphor and Gesture, 55–92. Amsterdam: John Benjamins.
Wittenburg, Peter, Hennie Brugman, Albert Russel, Alex Klassmann and Han Sloetjes 2006. ELAN: A
professional framework for multimodality research. In: Proceedings of the Fifth International
Conference on Language Resources and Evaluation (LREC), 1556–1559.

Jana Bressem, Chemnitz, Germany
Silva H. Ladewig, Frankfurt (Oder), Germany
Cornelia Müller, Frankfurt (Oder), Germany
72. Transcription systems for sign languages:
A sketch of the different graphical representations
of sign language and their characteristics
1. Socio-historical and structural features of sign languages
2. Notation systems
3. Annotation systems
4. Problems and perspectives
5. References

Abstract
Sign languages are natural languages developed and used in all parts of the world where
there are deaf people. Although different from each other, these languages, which have
been exposed to varying degrees of communitisation and institutionalisation, share a
significant number of common structures. As in the study of any other language, the
linguistic study of visual-gestural languages, now a scientific field in its own right, faces
the unavoidable problem of their graphical representation. This article provides an
overview, albeit non-exhaustive, of existing solutions to this challenge. We begin by
reviewing the main difficulties posed by the graphical representation of these languages,
which have no writing system of their own, taking into consideration their modal and
structural characteristics (section 1). We then present the major types of graphical
representations that have been developed for sign languages, noting their respective
strengths and limitations (sections 2 and 3). In section 4, we outline the problems that remain unsolved at this
point, if one aims to achieve transcription of a growing number of sign language corpora
that would meet research needs while respecting the original features particular to these
languages.

1. Socio-historical and structural features of sign languages
In the course of history, the existence of signing deaf communities is attested in various
sources (e.g. Saint-Augustin 2002 [De Magistro]; Montaigne 2009 [Essais, II.12]). And
yet, sign languages as such have been virtually ignored until very recently. While the
idea of the educability of the deaf emerged in the Renaissance, sign languages acquired
real visibility only with the educational enterprise initiated in 1760 by Abbé Charles-
Michel de l’Epée (1712–1789). By introducing the revolutionary principle of mass
education of the deaf based on their “mimicry”, de l’Epée gave the deaf and their
language a social existence, one that lasted a little over a century. This initial phase of
recognition was, however, followed by a new century of denial, from the prohibition of
sign languages at the Congress of Milan (1880) until the deaf awakening of the 1960s to
1980s. Sign language linguistics itself is still a young domain, initiated in the 20th
century by Stokoe’s (1960) study of American Sign Language (ASL); it has since become
an active and theoretically diverse field.
Given that the deaf (always a minority) have been socially recognised as speakers
only during brief periods, it is hardly surprising that no sign language has developed
a writing system. The vast majority of spoken languages are also unwritten. The major
difference, however, is that an existing writing system can be adapted with relative ease
to any spoken language over time. No such option exists for sign languages, which have
no similar written tradition to build upon. In light of the writing processes used for
spoken languages, sign languages pose very specific problems, related to their modality
and its structural consequences.
Crucially, sign languages exploit the availability of all manual and bodily articulators
(hands, head, shoulders, facial expressions, eye gaze), which can be used simultaneously
or in succession. This parametric multi-linearity is combined with a sophisticated use of
the space in front of the signer that is opened by these articulators; thus, most syntac-
tico-semantic relations in sign languages are spatialised (establishment of reference to
entities, time and space, pronominalisation and maintained reference). It is widely agreed
that these characteristics underlie the difficulties raised by the graphical representation of
these languages: complex temporal relations between the articulators (co-articulation,
hold and overlap), appropriate use of space through pointing signs and continued use
of the areas thus activated, variability of gestural units through the modification of
location and/or orientation (e.g., Bergman et al. 2001; Johnston 1991; Miller [1994]
2001; Stokoe 1987). However, the descriptive approach proposed by the semiological
model (Cuxac 1996, 2000) points to an additional difficulty.
In addition to conventional lexical units that are widely recognised in the literature
(as “frozen signs” or even “signs” or “words”), this model considers as central another
type of unit, non-conventionalised but employing a limited number of structures
(termed transfer units). Although listed in the literature as “classifier constructions”
(see Emmorey 2003) or “productive signs” (Brennan 1990), these units are generally
analyzed as peripheral and non-linguistic. Yet, they represent 30–80% of sign language
discourse (Sallandre 2003; Antinoro Pizzuto et al. 2008). If we adopt the semiological
perspective that considers units of this type the very heart of sign languages, their inclusion increases the difficulty of graphical representation, as they are based on a semiosis of continuity, given their "illustrative intent" of saying through showing. In sign language discourse, these units are tightly intertwined with the conventional units ("non-illustrative intent"), and the discourse as a whole alternates between the two intents.
Significantly, the two moments in history that gave rise to an explicit linguistic reflection on sign languages were both accompanied by the development of graphical systems for their representation: Bébian (1825) and Stokoe (1960). The development of the modern linguistic study of these languages has likewise been accompanied by a proliferation of graphical systems (for a review, see Boutora 2005). A classification of these inventions must take two variables into account. The first is the goal of the representation system: transcription only (the graphic representation of produced data) or also written communication. The second, and more important, is the semiological nature of the system used, in particular whether it employs specific symbols and its own internal logic, or a pre-existing writing system (de facto, the written form of the national spoken language). On this basis, we can distinguish two sets, which we term "notation systems" and "annotation systems." Notation systems are autonomous and specific systems, sometimes intended for written communication, which share central semiological features: they are mono-linear and focused, at least in their design, on the representation of lexical signs in terms of their visual form, outside any discourse context. In contrast, annotation systems, which are based on the written form of the spoken
72. Transcription systems for sign languages 1127

language, are intended to represent discourse, and are used only in the context of linguistic research.

2. Notation systems
From the moment sign language was taken into consideration in the education of deaf children, its graphical representation became an issue, particularly for the creation of dictionaries for teachers and students. We will not detail the graphical means used in such dictionaries from the late 18th century onwards, referring the reader instead to the full review by Bonnal (2005). Two methods (which could be combined) were used for these representations: drawing (enhanced by symbols indicating movement, and at times consisting of a sequence of drawings) and, more dominantly, descriptions of the signifier form of a sign written in the spoken language.
The first independent notation system is Bébian's (1825) Mimographie, developed for the sign language used at the Institution de Paris, and intended for purely pedagogical purposes (e.g. Fischer 1995). Bébian's aim was not to provide a written form of signed discourse, but simply to represent the "mimicked signs." This approach was remarkably ahead of its time. The idea is that each sign is represented by a linear sequence indicating the relevant body part ("l'organe qui agit," the acting organ, represented through 86 characters), its movement (68), its position (14), and, if needed, the facial expression (20) as well. The Mimographie system, although never implemented and not the only attempt at notation in the 19th century (Piroux 1830), is foundational, and serves as the basis for all modern notation systems, starting with Stokoe (1960).
For Stokoe (1960, [1965] 1976), the creation of a notation, rather than a transcription, was part of the demonstration of the linguistic status of American Sign Language: its characters, cheremes, were intended to be equivalent to phonemes, and thus to prove the existence of a double articulation. His analysis of signs is directly inspired by Bébian but diverges in several respects. Focusing only on manual aspects, Stokoe retains only the handshape (vs. Bébian's conformation, which included orientation as well), adds the parameter of location, and removes facial expression. Stokoe's system is composed of 55 cheremes (19 handshapes, 12 locations, 24 movements), using symbols borrowed from the Latin alphabet and the numerical system, along with some invented for the purpose, and is generally devoid of iconicity (see Fig. 72.1). This model is the direct source of the vast majority of systems used over the next two decades, as the linguistic study of other sign languages developed, with the focus now shifting to transcription.

Fig. 72.1: Notation in Stokoe’s (1960) system: The sign [SNAKE] in American Sign Language
(Martin 2000).
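Stokoe's parametric analysis can be illustrated as a data structure: a sign is a simultaneous bundle of values drawn from small finite inventories, and a chereme-like contrast is a difference in exactly one parameter. A minimal sketch in Python; the inventory members and example signs below are invented placeholders, not Stokoe's actual symbols:

```python
# Illustrative sketch of Stokoe-style parametric notation: a sign is a
# simultaneous bundle of (location, handshape, movement) values drawn
# from finite inventories. The inventory members are invented
# placeholders, not Stokoe's actual characters.
from dataclasses import dataclass

TAB = {"neutral_space", "face", "trunk"}                 # locations (tab)
DEZ = {"B_flat_hand", "G_index_hand", "5_spread_hand"}   # handshapes (dez)
SIG = {"upward", "wiggle", "contact"}                    # movements (sig)

@dataclass(frozen=True)
class StokoeSign:
    tab: str  # location
    dez: str  # handshape
    sig: str  # movement

    def __post_init__(self):
        # A well-formed notation uses only symbols from the inventories.
        if self.tab not in TAB or self.dez not in DEZ or self.sig not in SIG:
            raise ValueError("symbol outside the cheremic inventory")

def minimal_pair(a: StokoeSign, b: StokoeSign) -> bool:
    """Signs differing in exactly one parameter: the analogue of a
    phonemic contrast in spoken language."""
    return sum([a.tab != b.tab, a.dez != b.dez, a.sig != b.sig]) == 1

s1 = StokoeSign("neutral_space", "G_index_hand", "wiggle")
s2 = StokoeSign("neutral_space", "B_flat_hand", "wiggle")
print(minimal_pair(s1, s2))  # → True (the signs differ only in handshape)
```

The point of the sketch is the simultaneity: unlike a phoneme string, nothing in the triple is ordered in time, which is precisely what makes a mono-linear graphical rendering of such bundles awkward.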

2.1. Notation for transcription


Miller ([1994] 2001: 13–16) provides a detailed genealogy of the major systems derived
from Stokoe’s (see http://sign.let.ru.nl/groups/slcwikigroup/wiki/c6573/). Variations
between them typically stem from theoretical developments (generative and post-generative phonology), the addition of the "orientation" parameter (following Battison 1973), adaptation to different sign languages, and options for the parametric linear sequencing of symbols. However, each system is only comprehensible to the research team using it.
One noteworthy system based on Stokoe's is HamNoSys, developed in Hamburg (Prillwitz et al. 1989). HamNoSys is intended to enable the phonetic transcription of all sign languages, and therefore includes a considerable number of symbols (more than 200 basic symbols); it has gradually been enhanced for the notation of spatial cues and non-manual aspects (facial expression, body movements, prosody, eye gaze). Unlike Stokoe's system, HamNoSys employs iconic symbols and shows strong internal systematicity (see Fig. 72.2 below for an illustration). Yet it faces a serious legibility problem, particularly for the recording of discourse. Nevertheless, thanks to its early digitization and compatibility with annotation software (see 3.2 below), it is integrated into large lexical databases, such as iLex (Hanke and Storz 2008) or the Auslan database (Johnston 2001), and is a notation system that has been frequently used in sign language research.

[Figure content: HamNoSys notations for the glosses "bears", "Goldilocks somewhere wandering", and "deep forest somewhere wandering"]
Fig. 72.2: Notation in HamNoSys (Bentele 1999)

Whatever their respective contributions, there is general agreement that these various systems have limitations. The most notable is their virtual inability to represent discourse sequences and to capture their constituent principles. Their inherent mono-linearity prevents a readable representation of the spatio-temporal relations that are essential for sign language syntax. In addition, these systems were established solely on the analysis of decontextualised manual signs, abstracting away from their use in discourse (modifications of internal parameters, discursive framing of the conventional sign by non-manual components).

2.2. Notation for transcription and written communication


Whether designed from a pedagogical perspective (Bébian) or from a research perspective (Stokoe and its derivatives), few notation systems have been intended to allow written communication. This was the intended purpose of the unfinished D'Sign system (Garcia 2000; Jouison 1990, 1995), whose originality lay from the outset in its consideration of all articulators, both manual and non-manual, at the discourse level. However, the only system designed with the dual objective of serving both research and communication is SignWriting (1974–2011; see Sutton [1995] 1999), which can now claim to approach the status of a writing system.

Although it is an alphabetic type of notation, like its predecessors, the significant innovation of SignWriting lies in its semiographic aspect, adding the analogical to the digital (see Fig. 72.3). This system represents each gestural production as a multi-parameter composition and as a whole (each "graphic cell" includes, analogically, the symbols of the various articulators, allowing us to see a body creating a space and a gaze), thus allowing a detailed reconstruction of spatial phenomena. SignWriting was designed to evolve through use and was quickly adopted by deaf signers. It is taught at various schools around the world and supported by numerous publications using the system (see http://www.signwriting.org/). Over the past decade, the system has been the object of detailed experiments led by the Italian deaf team directed by E. Antinoro Pizzuto (e.g. Pizzuto, Chiari, and Rossini 2008; Pizzuto et al. 2008), revealing the previously unattainable possibility for a deaf signer to accurately reconstruct discourse rich in transfer units from a SignWriting text of Lingua dei Segni Italiana (Italian Sign Language), whether written or transcribed (Di Renzo et al. 2006). However, limitations remain. In the absence of spelling rules, the system often allows multiple representations of the same sign. There are also analogical shortcomings, notably the absence of an explicit and economical marking of the spatial processes of anaphora, as well as problems of computational compatibility (see below).

Fig. 72.3: SignWriting transcription of the beginning of a story in LIS (Di Renzo et al. 2009). The circled part shows a gestural unit (our "graphic cell"). The space of the cell analogically represents the signing space.
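The "graphic cell" described above can be thought of as a small two-dimensional scene: each articulator symbol carries a position within the cell, so spatial relations are shown analogically rather than listed linearly. A minimal data-structure sketch; the articulator labels, glyph names, and coordinates below are invented for illustration, not actual SignWriting symbols:

```python
# Sketch of a SignWriting-style "graphic cell": each articulator symbol
# is placed at a 2D position inside the cell, so the cell as a whole
# depicts the signing space analogically. Names and coordinates are
# invented for illustration.
from typing import NamedTuple

class PlacedSymbol(NamedTuple):
    articulator: str   # e.g. "right_hand", "gaze", "head"
    symbol: str        # the glyph chosen for that articulator
    x: float           # horizontal position within the cell (0..1)
    y: float           # vertical position within the cell (0..1)

def cell_articulators(cell: list) -> set:
    """Which articulators does this cell represent simultaneously?"""
    return {p.articulator for p in cell}

def right_of(a: PlacedSymbol, b: PlacedSymbol) -> bool:
    """An analogical spatial relation, readable directly off the layout."""
    return a.x > b.x

cell = [
    PlacedSymbol("head", "head_circle", 0.5, 0.9),
    PlacedSymbol("gaze", "gaze_arrow_left", 0.45, 0.85),
    PlacedSymbol("right_hand", "flat_hand", 0.7, 0.5),
    PlacedSymbol("left_hand", "index_hand", 0.3, 0.5),
]
print(sorted(cell_articulators(cell)))  # → ['gaze', 'head', 'left_hand', 'right_hand']
```

The contrast with the mono-linear notations of section 2.1 is that spatial relations (here, `right_of`) need no dedicated symbol: they fall out of the layout itself, which is what the text means by "adding the analogical to the digital."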

The alternative to the various limitations of specific notation systems has been, and remains, the written form of the spoken language. From the very beginning of linguistic research (e.g. Stokoe 1960; Klima and Bellugi 1979), sign language researchers have resorted to graphic representations based on a "gloss," that is, the representation of isolated sign language signs (or sequences of signs) by written words (or sequences of words) of the national spoken language, taken to stand for the signs of the sign language studied. These gloss-based notations have recently been systematised to overcome the limitations of mono-linearity.

3. Annotation systems
3.1. Multi-Linear annotation
Johnston (1991) proposes a system he calls the Interlinear System, a primary concern of which is the representation of the signifier form via HamNoSys, alongside the essential use of written English. The issue, at least initially (see Johnston 2001), was to ensure access to the data. The Interlinear System is presented as multi-linear; however, the typical relation between the lines is simple superposition, showing the same phenomena at different levels of analysis. The fields used are:

(i) a “phonetic” notation in HamNoSys (actually, a notation of the signs in the citation
form, complemented, if necessary, by the spatial specifications available in
HamNoSys),
(ii) facial expression,
(iii) an English gloss of conventional signs,
(iv) any signs of mouthing, and
(v) a translation in English.

Actual parametric multi-linearity affects only fields (ii) and (iv).


The situation is different in annotation systems based on musical score notation,
which allow both horizontal and vertical reading of information. The multi-linearity
of these systems is indeed semiotic, parallel lines representing the respective time
frames and thus inter-relating the various (manual and non-manual) parameters. This
approach is prominent among researchers who consider non-manual aspects and parametric multi-linearity as essential linguistic aspects. Such is the system developed by Baker-Shenk (1983), the first in the literature to provide a musical-score-type transcription. Cuxac (1996, 2000) employs the same principle, but adds two new parts, thus proposing a four-part system:

(i) a parametric score transcription (right hand, left hand, two hands, body, face, eye
gaze and facial expression),
(ii) systematic recovery of each element of the score, here numbered and explained,
(iii) a literal gloss translation in written French,
(iv) a translation in standard French.

Parts (ii) and (iii) are original and intended to accurately assign each annotated element to its meaning component, thus partially compensating for the loss of information about signifier forms.
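The musical-score principle lends itself to a simple data structure: one tier of time-aligned intervals per articulator, with co-occurrence read off vertically. A minimal sketch; the tier names and annotation values below are invented for illustration:

```python
# Sketch of musical-score ("partition") multi-linearity: each articulator
# gets its own tier of time-aligned intervals, and a vertical reading
# recovers everything that co-occurs at a given moment. Tier names and
# values are illustrative.
from typing import NamedTuple

class Interval(NamedTuple):
    start: float  # seconds
    end: float
    value: str

score = {
    "right_hand": [Interval(0.0, 0.8, "flat hand moves forward")],
    "eye_gaze":   [Interval(0.2, 1.0, "gaze to addressee")],
    "face":       [Interval(0.0, 0.5, "raised eyebrows")],
}

def vertical_slice(score: dict, t: float) -> dict:
    """Vertical reading: what is each articulator doing at time t?"""
    return {
        tier: iv.value
        for tier, ivs in score.items()
        for iv in ivs
        if iv.start <= t < iv.end
    }

print(vertical_slice(score, 0.3))  # all three articulators are active here
```

Horizontal reading is just each tier's list in order; the vertical slice is what mono-linear notations cannot express without breaking the simultaneity apart.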

Other multi-linear systems (e.g. Boyes-Braem 2001; Fusellier-Souza 2004; Sallandre 2003) add the use of a spreadsheet, which provides several advantages: precise time coding of the annotated elements, the addition of images from the video signal referring specifically to the time code, the possibility of simple queries, and the export of information to a database. The use of spreadsheets thus enables computerised annotation and quantitative analysis of the annotated data.
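A spreadsheet of this kind is essentially a flat table of time-coded annotation rows, which is what makes simple queries and quantitative summaries straightforward. A minimal sketch; the rows, tier labels, and values below are invented:

```python
# Sketch of spreadsheet-style annotation (cf. Boyes-Braem 2001): one row
# per annotated element, with a time code, a tier label and a value.
# Rows and labels are invented; the point is that tabular storage makes
# simple queries and quantitative analysis trivial.
import csv
import io
from collections import Counter

tsv = """\
time_code\ttier\tvalue
00:00:01.200\tunit_type\ttransfer unit
00:00:02.950\tunit_type\tconventional sign
00:00:04.100\tunit_type\ttransfer unit
00:00:04.100\tgaze\tgaze to addressee
"""

rows = list(csv.DictReader(io.StringIO(tsv), delimiter="\t"))

# Simple query: distribution of unit types in the annotated stretch.
unit_counts = Counter(r["value"] for r in rows if r["tier"] == "unit_type")
print(unit_counts.most_common())  # → [('transfer unit', 2), ('conventional sign', 1)]
```

Exporting such a table to a database, as the systems cited above do, is then a matter of loading the same rows into whatever query engine is at hand.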

3.2. Multimedia annotation software


A major step was taken at the turn of the millennium with the advent of multimedia annotation software. Software of this type is designed on the principle of the musical score and incorporates many additional modules. Its crucial contribution is the alignment of the annotation with the video signal. Researchers can now annotate with a precision that would have been difficult, if not impossible, with older systems, which forced the annotator to stop watching, memorise the item under consideration, and only then annotate. The constraining mental gymnastics involved led to potential loss of information and increased the difficulty of interpretation.
Multimedia software tools, freely available online, can be divided into two categories according to their diffusion: little-known precursor systems, and systems in wide distribution. One of the first systems in the former category was SyncWRITER (Hanke and Prillwitz 1995), which relied on a HamNoSys grid synchronized to a video signal, and was used mostly in 1989–1992. SignStream was first used in 1999 at Boston University (Neidle 2001) and remains in use today (version 2.2.2). We should also mention SLAnnotation, released by the PRESTO research centre in Toulouse, France (http://www.irit.fr/presto/resultats.html). This modest annotation tool notably allowed the addition of annotations in sign language (via a video that appears in a separate window) alongside traditional written annotation.
The most fully developed systems, which have gradually gained ground in the scientific community, are ANVIL and ELAN. In both programs, annotation files can be imported and exported in various formats (e.g. text, spreadsheet), providing flexibility and compatibility to the user. Another major advantage of these programs is the set of annotation tools available, enabling semi-automatic annotation (e.g. types of segmentation, a merge function), and tools for information retrieval (e.g. statistical tools).
ANVIL (Kipp 2001) is widely used by computer scientists in the field of sign language and by linguists specializing in human gesture. ANVIL tracks enable the integration not only of text but also of icons and colours, making it attractive and a useful aid in intuitive data analysis. The software also incorporates a 3D visualization tool for motion-capture data that is synchronized with the video signal and the 2D annotation grid (Kipp 2012).
Unlike ANVIL, ELAN (http://www.lat-mpi.eu/tools/elan/), developed at the Max Planck Institute for Psycholinguistics in Nijmegen (MPI), is an open-source program. As such, it is regularly updated with a focus on the needs of its users (Brugman and Russel 2004; Crasborn and Sloetjes 2008). Its ease of use and flexibility have made it the dominant software in the linguistic research of sign languages. Some research centres have abandoned their locally developed tools in favour of ELAN, while adapting some of its features. Thus, the Hamburg team no longer uses SyncWRITER, but has integrated HamNoSys into ELAN, interfacing with its lexical database iLex.

ELAN is programmed in Java and stores its annotations in XML; it is compatible with Mac OS as well as with Windows and Linux. It is part of a set of freeware tools available on the Max Planck Institute platform (Language Archiving Technology), where it is connected, in particular, to the Arbil tool for the precise (and almost exhaustive) administration of metadata (IMDI metadata tools). The first step in the creation of an annotation grid is the definition of the template (e.g. tiers, types and stereotypes, controlled vocabulary), allowing a hierarchy of information determined on the basis of the desired analysis. As shown in Fig. 72.4, the ELAN grid is organized in distinct parts: (a) the video, which progresses in sync with the other elements, (b) the annotation grid, and (c) additional textual and numerical elements. The video alignment indicator (or cursor) is symbolized by a thin vertical line in the grid (b), thus keeping track of the video reference. Once the annotation template is established, annotation can be performed simultaneously by multiple annotators, thus promoting interaction between users.


Fig. 72.4: Screen capture of annotation with ELAN (Garcia et al. 2011)
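Since ELAN's annotations are stored as XML, a template of tiers with time-aligned annotations can be read back mechanically. The fragment below imitates that organisation in a deliberately simplified, EAF-like form; the real ELAN (.eaf) schema is richer (time slots, linguistic types, controlled vocabularies), and the tier names and values here are invented:

```python
# Deliberately simplified, EAF-like XML: tiers containing time-aligned
# annotations. The real ELAN (.eaf) schema is richer; this sketch only
# shows the tier/annotation organisation and how it can be read back
# with the standard library.
import xml.etree.ElementTree as ET

doc = """<ANNOTATION_DOC>
  <TIER id="gloss_right_hand">
    <ANNOTATION start="1200" end="1780" value="GLOSS-1"/>
    <ANNOTATION start="1800" end="2400" value="GLOSS-2"/>
  </TIER>
  <TIER id="gaze">
    <ANNOTATION start="1200" end="2600" value="addressee"/>
  </TIER>
</ANNOTATION_DOC>"""

root = ET.fromstring(doc)
tiers = {
    tier.get("id"): [
        (int(a.get("start")), int(a.get("end")), a.get("value"))
        for a in tier.findall("ANNOTATION")
    ]
    for tier in root.findall("TIER")
}
print(sorted(tiers))  # → ['gaze', 'gloss_right_hand']
```

It is this machine-readability, rather than any particular schema detail, that underlies the new types of query and quantitative analysis discussed below.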

The convenience and speed of multimedia tools enable the detailed annotation of large corpora within reasonable timeframes, undoubtedly opening a new era in sign language research. The resulting annotations become, in effect, machine-readable, enabling new and innovative types of analysis (e.g. diversification of statistical queries, lexicometry). The use of such software thus has a real heuristic dimension, enabling new types of observation. However, these software packages are complex computerised tools that require regular updating and maintenance that only large research institutes can provide.

4. Problems and perspectives


The five international workshops of the ESF project InterSign 1998–2000 (Bergman et al. 2001) testify to the emerging interest in the problems of discourse annotation among sign language researchers, an interest that grows with the expansion of large sign language corpora (e.g. the Workshops on the Processing of Sign Languages, LREC 2004, 2006, 2008, 2010; Sign Linguistics Corpora Network 2009–2010, http://www.ru.nl/slcn/). The three key issues are the optimisation of annotation tasks, the automated processing of large corpora, and the possibility of sharing corpora (both data and metadata) to enable comparisons between sign languages. The latter rests on the establishment of minimal international standards of annotation and data documentation. Beyond the technological aspects and the classic disagreements related to the choice of theoretical models and diverse research interests, many of the current difficulties are due to the absence of a notation system for sign language, in the strict sense, and (therefore) to the fact that the basic medium of annotation remains the written form of national spoken languages.

4.1. The major disadvantages of using the written words of spoken languages

Difficulties associated with the very diversity of the spoken languages used for this purpose are compounded by the semiotic heterogeneity in the use of a spoken language's written form, and by the almost total absence of standardised practices, even within the same country. Thus, writing can be used within the same annotation to describe the signifier form ("gaze towards the addressee"), its sense ("interrogative facial expression"), or to provide grammatical information (e.g. Cl. for "classifier"), the latter potentially in abbreviated forms (e.g. NS for "name signs," and DV for "depicting verbs," in Chen Pichler et al. 2010).
However, the key issue is the external segmentation induced by the use of a written language. Beyond the typological gap between spoken and sign languages in general, each sign language has its own organization, like any other language, which may not correspond to the words and units of the spoken language of the same region. The use of the spoken/written language thus increases the risk of biasing the analysis, given that a written word often carries morphosyntactic information (category, grammatical inflection) which it superimposes on the annotated unit. The linguist can, of course, conventionalise the use of label-words (e.g., following the standards of spoken language research, establishing masculine singular nouns and infinitive verbs as default non-inflected forms), or stipulate that gloss-words refer to the signifier form only, without aiming to indicate meaning or discourse function (Cuxac 2000). However, and this is even more significant in long corpora, this practice necessarily presents a biased image of the sign language and does not remove the potential risk of bias in the analysis (on these points, see Pizzuto and Pietrandrea 2001; Di Renzo et al. 2009).
But there is more. The risk of segmenting sign language units according to the structures of the annotating spoken language is often downplayed by sign language linguists who believe that the basic units of sign languages are conventional lexical signs, not very distinct from the lexemes of spoken languages. In an approach that considers transfer units as central (see section 2.1), however, this becomes a considerable difficulty, since
transfer units are, at best, equivalent to a clause in spoken language (e.g. "the thin elongated vertical shape moves slowly towards a fixed horizontal oval shape"). Whatever the theoretical status granted to such units, no sign language linguist can now deny their massive presence in discourse (e.g. Emmorey 2003; Liddell 2003). The problem of their graphical representation, and the inadequacy of spoken language glosses for this purpose, are thus unavoidable. Ultimately, for Pizzuto, Rossini, and Russo (2006), Garcia (2006, 2010), and Antinoro Pizzuto and Garcia (in preparation), these problems are epistemological. Pizzuto and Pietrandrea (2001) have stressed that the so-called "gloss" or "annotation" of sign language cannot claim this status in the sense it has in spoken language research, to the extent that it lacks (now quasi-systematically) the initial level of transcription of the signifier form. This is not a problem in the annotation of any spoken language, even one without its own writing system, since spoken languages can always be transcribed phonetically using the International Phonetic Alphabet. The video clip incorporated into the so-called annotation software (see section 3.2) is not a functional equivalent, all the less so given that the sign language signal corresponds to highly multi-linear signifiers.
The Berkeley Transcription System (BTS), proposed by Slobin and Hoiting's team (Hoiting and Slobin 2002; Slobin et al. 2001), was explicitly developed to find alternatives to the glossing problems of sign language. It provides a graphical (mono-linear) symbolisation not at the level of the sign but at that of the morpheme, lexical as well as grammatical. Originally designed for the transcription of American Sign Language and Sign Language of the Netherlands (NGT), the Berkeley Transcription System, which is connected to the CHILDES system (MacWhinney 2000), allows the transcription of any sign language. However, its primary medium remains written English, albeit supplemented by all available graphical resources (e.g. typographical variants, arrows, brackets, figures) in rigorously standardised uses. In effect, this system simply transfers the limitations of glossing to the sub-lexical level. Another problem, aside from its low readability, is the rigidity introduced by the labelling system it provides, which is based on specific theoretical assumptions that are not necessarily shared.
For Johnston (1991, 2001, 2008), the only real issue is a consistent use of glossing. This requires the lemmatisation of the lexicon of the studied sign language, which is posited as essential to any "modern" (i.e. machine-readable) corpus, and requires the creation, prior to any annotation, of a lexical database built on the systematic assignment of ID-glosses to these lemmas, to be enhanced and revised later in light of the analysis of large corpora (see the Auslan database). While such a lexical database undoubtedly ensures internal glossing consistency, its constituents may be subject to debate (the criteria for defining the lexical unit, and the predetermination of the lemma and its variants prior to discourse analysis; see Konrad 2011). The underlying focus on conventional signs alone cannot resolve the issue of the annotation of transfer units, let alone the problem of their representation.
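The logic of ID-glossing can be sketched as a small lexical database: every conventional sign in the corpus is annotated with a unique ID-gloss that points at one lemma, so glossing stays consistent across annotators and files. The lemmas, variants, and corpus tokens below are invented for illustration, not drawn from the Auslan database:

```python
# Sketch of an ID-gloss lexical database in the spirit of Johnston's
# proposal: each ID-gloss maps to one lemma and its form variants, and
# every corpus token should resolve to an entry. Lemmas, variants and
# tokens are invented.
lexical_db = {
    "HOUSE": {"lemma": "HOUSE", "variants": ["HOUSE-A", "HOUSE-B"]},
    "GO": {"lemma": "GO", "variants": ["GO"]},
}

corpus_tokens = ["HOUSE", "GO", "HOUSE", "WANDER"]

def unlemmatised(tokens, db):
    """Tokens glossed with labels absent from the database: candidates
    for revision of the lexicon, or non-conventional (transfer) units
    that an ID-gloss lexicon cannot cover by design."""
    return sorted({t for t in tokens if t not in db})

print(unlemmatised(corpus_tokens, lexical_db))  # → ['WANDER']
```

The consistency check is exactly where the limitation noted above surfaces: a transfer unit has no lemma to resolve to, so it falls through the database no matter how carefully the lexicon is curated.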

4.2. Perspectives
On the technical level, we can expect rapid progress in the fields of image analysis and automatic recognition, and in the development of tools for video overlay and segmentation (Braffort and Dalle 2008; Collet, Gonzalez, and Milachon 2010). However, two main
types of advances are necessary if we are to resolve the problems posed by the current annotation practices of sign language corpora.
First, while the internal consistency of annotations is, naturally, a prerequisite for any productive and significant automatic processing of corpora, the equally necessary lemmatisation must take into account non-conventional units and their components, which form 30–80% of these corpora. Cuxac's (2000) hypothesis of a morphemic compositionality, many components of which are shared by conventional units and transfer units alike, seems to open a promising alternative route for the creation of lexical databases that are more faithful to the structures of sign language (Garcia 2006, 2010).
At the same time, progress is needed in the development and/or improvement of notation systems that allow an analytical representation of the signifier form in discourse, which remains the only way to allow a rigorous elaboration of annotations linked to the signal itself. On this point, we believe that much can be expected from the continuation of the experiments on SignWriting noted above (see Bianchini et al. 2011) and from current efforts to integrate SignWriting into annotation software (e.g. Antinoro Pizzuto and Garcia in preparation).

5. References
Antinoro Pizzuto, Elena, Isabella Chiari and Paolo Rossini 2008. The representation issue and its
multifaceted aspects in constructing sign language corpora: Questions, answers, further prob-
lems. In: Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odjik,
Stelios Piperidis and Daniel Tapias (eds.), Proceedings of LREC 2008, Sixth International Con-
ference on Language Resources and Evaluation, 150–158. Paris: European Language Resources
Association.
Antinoro Pizzuto, Elena and Brigitte Garcia in preparation. Annotation tools for sign language
(SL): The nodal problem of the graphical representation of forms. In: Terry Janzen and Sher-
man Wilcox (eds.), Cognitive Approaches to Signed Language Research. Berlin: De Gruyter
Mouton.
Antinoro Pizzuto, Elena, Paolo Rossini, Marie-Anne Sallandre and Erin Wilkinson 2008. Deixis,
anaphora and highly iconic structures: Cross-linguistic evidence on American (ASL), French
(LSF) and Italian (LIS) Signed Languages. In: Ronice Müller de Quadros (ed.), Proceedings
of TISLR9, Theoretical Issues in Sign Language Research Conference, 475–495. Petrópolis,
Rio de Janeiro Brazil: Editora Arara Azul.
Baker-Shenk, Charlotte 1983. Nonmanual behaviors in sign languages: Methodological concerns
and recent findings. In: William Stokoe and Virginia Volterra (eds.), Sign Language Research,
175–184. Burtonsville Maryland: Linstock Press.
Battison, Robbin 1973. Phonology in American Sign Language: 3-D and digitvision. Paper pre-
sented at the California Linguistic Association Conference, Stanford, CA.
Bébian, Auguste 1825. Mimographie, ou Essai d’Écriture Mimique, Propre à Régulariser le Lan-
gage des Sourds-Muets. Paris: Louis Colas.
Bentele, Susan 1999. HamNoSys. Sample of sentences from Goldilocks. http://www.signwriting.
org/forums/linguistics/ling007.html.
Bergman, Brita, Penny Boyes-Braem, Thomas Hanke and Elena Pizzuto (eds.) 2001. Sign Tran-
scription and Database Storage of Sign Information, special issue of Sign Language and Lin-
guistics 4(1/2). Amsterdam: John Benjamins.
Bianchini, Claudia S., Gabriele Gianfreda, Alessio di Renzo, Tommaso Lucioli, Giulia Petitta,
Barbara Pennacchi, Luca Lamano and Paolo Rossini 2011. Ecrire une langue sans forme écrite:
Réflexions sur l’écriture et la transcription de la Langue des Signes Italienne (LIS). In: Gilles
Col and Sylvester N. Osu (eds.), Transcrire, écrire, formaliser, 1. Traveaux Linguistiques Cer-
LiCO, 71–89. Rennes, France: Presses Universitaires de Rennes.
Bonnal, Françoise 2005. Sémiogenèse de la langue des signes française: étude critique des signes
attestés sur support papier depuis le XVIIIe siècle et nouvelles perspectives de dictionnaires.
Ph.D. dissertation, University of Toulouse Le Mirail, France.
Boutora, Leila 2005. Bibliographie sur les formes graphiques des langues des signes. Projet
RIAM-ANR LS Script, 25 pages. http://lpl-aix.fr/˜fulltext/4786.pdf.
Boyes-Braem, Penny 2001. Sign language text transcription and analyses using ‘Microsoft Excel.’
In: Brita Bergman, Penny Boyes-Braem, Thomas Hanke and Elena Antinoro Pizzuto (eds.),
Sign Transcription and Database Storage of Sign Information, special issue of Sign Language
and Linguistics 4(1/2): 241–250. Amsterdam: John Benjamins.
Braffort, Annelies and Patrice Dalle 2008. Sign language applications: Preliminary modelling.
Universal Access in the Information Society, special issue 6(4): 393–404. Berlin: Springer.
Brennan, Mary 1990. Productive morphology in British Sign Language. In: Siegmund Prillwitz and
Tomas Vollhaber (eds.), Proceedings of the International Congress on Sign Language Research
and Application, Hamburg’90, 205–228. Hamburg: Signum.
Brugman, Hennie and Albert Russel 2004. Annotating multimedia/multi-modal resources with
ELAN. In: Maria Teresa Lino, Maria Francisca Xavier, Fátima Ferreira, Rute Costa and Ra-
quel Silva, with the collaboration of Carla Pereira, Filipa Carvalho, Milene Lopes, Mónica Cat-
arino and Sérgio Barros (eds.), Proceedings of LREC 2004, Fourth International Conference on
Language Resources and Evaluation, 2065–2068. Paris: European Language Resources
Association.
Chen Pichler, Deborah, Julie A. Hochgesang, Diane Lillo-Martin and Ronice Muller de Quadros
2010. Conventions for sign and speech transcription of child bimodal bilingual corpora in
ELAN. Langage Interaction Acquisition 1(1): 11–40. Amsterdam/Philadelphia: John Benjamins.
Collet, Christophe, Matilde Gonzalez and Fabien Milachon 2010. Distributed system architecture
for assisted annotation of video corpora. In: Nicoletta Calzolari, Khalid Choukri, Bente Mae-
gaard, Joseph Mariani, Jan Odjik, Stelios Piperidis and Daniel Tapias (eds.), Proceedings of
LREC 2008, Sixth International Conference on Language Resources and Evaluation, 49–52.
Paris: European Language Resources Association.
Crasborn, Onno and Han Sloetjes 2008. Enhanced ELAN functionality for sign language corpora.
In: Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odjik, Stelios Pi-
peridis and Daniel Tapias (eds.), Proceedings of LREC 2008, Sixth International Conference on
Language Resources and Evaluation, 39–43. Paris: European Language Resources Association.
Cuxac, Christian 1996. Fonctions et structures de l’iconicité dans les langues des signes. Thèse de
Doctorat d’Etat, Paris 5 University.
Cuxac, Christian 2000. La Langue des Signes Française (LSF). Les Voies de l’Iconicité. Faits de
Langue, Paris: Ophrys.
Di Renzo, Alessio, Luca Lamano, Tommaso Lucioli, Barbara Pennachi and Luca Ponzo 2006. Ital-
ian Sign Language: Can we write it and transcribe it with Sign Writing? In: Proceedings of
LREC 2006, Fifth International Conference on Language Resources and Evaluation, 11–16.
Genova, Italy. http://www.lrec-conf.org/proceedings/lrec2006/
Di Renzo, Alessio, Gabriele Gianfreda, Luca Lamano, Tommaso Lucioli, Barbara Pennacchi, Paolo
Rossini, Claudia Bianchini, Giulia Petitta and Elena Antinoro Pizzuto 2009. Representation –
Analysis – Representation: novel approaches to the study of face-to-face and written narratives
in Italian Sign Language (LIS). Spoken Communication. Colloque International sur les langues
des signes (CILS), Namur, 16–20 November 2009, Palais des Congrès de Namur, Belgium.
Emmorey, Karen (ed.) 2003. Perspectives on Classifier Constructions in Sign Languages. Mahwah,
NJ: Lawrence Erlbaum.
Fischer, Renate 1995. The notation of sign languages: Bébian’s Mimographie. In: Heleen F. Bos
and Gertrude M. Schermer (eds.), Sign Language Research 1994, Proceedings of the Fourth
European Congress on Sign Language Research, 285–302. (International Studies on Sign Lan-
guage and Communication of the Deaf 29.) Hamburg: Signum.
Fusellier-Souza, Ivani 2004. Sémiogenèse des Langues des Signes. Étude de langues des signes pri-
maires (LSP) pratiquées par des sourds brésiliens. Ph.D. dissertation, Paris 8 University.
Garcia, Brigitte 2000. Contribution à l’histoire des débuts de la recherche linguistique sur la Langue
des Signes Française (LSF); les travaux de Paul Jouison. Ph.D. dissertation, Paris 5 University.
Garcia, Brigitte 2006. The methodological, linguistic and semiological bases for the elaboration of
a written form of LSF (French Sign Language). In: Proceedings of LREC 2006, Fifth Interna-
tional Conference on Language Resources and Evaluation, 31–36. Genova, Italy. http://www.
lrec-conf.org/proceedings/lrec2006/
Garcia, Brigitte 2010. Sourds, surdité, langue(s) des signes et épistémologie des sciences du lan-
gage. Problématiques de la scripturisation et modélisation des bas niveaux en Langue des
Signes Française (LSF). Habilitation thesis (Habilitation à Diriger des Recherches), Paris 8 University.
Garcia, Brigitte, Marie-Anne Sallandre, Camille Schoder and Marie-Thérèse L’Huillier 2011. Ty-
pologie des pointages en Langue des Signes Française (LSF) et problématiques de leur anno-
tation. In: Proceedings of TALN Conference 2011, 107–119. Montpellier, France. http://
degels2011.limsi.fr/actes/themes.html
Hanke, Thomas and Siegmund Prillwitz 1995. SyncWRITER. Integrating video into the transcrip-
tion and analysis of sign language. In: Trude Schermer and Heleen Bos (eds.), Proceedings of
the Fourth European Congress on Sign Language Research, 303–312. (International Studies on
Sign Language and Communication of the Deaf 29.) Hamburg: Signum.
Hanke, Thomas and Jakob Storz 2008. Ilex – a database tool for integrating sign language corpus
linguistics and sign language lexicography. In: Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis and Daniel Tapias (eds.), Proceedings of
LREC 2008, Sixth International Conference on Language Resources and Evaluation, 64–67.
Paris: European Language Resources Association.
Hoiting, Nini and Dan Slobin 2002. Transcription as a tool for understanding: The Berkeley Tran-
scription System for sign language research (BTS). In: Gary Morgan and Bencie Woll (eds.),
Directions in Sign Language Acquisition, 55–75. Amsterdam: John Benjamins.
Johnston, Trevor 1991. Transcription and glossing of sign language texts: Examples from Auslan
(Australian Sign Language). International Journal of Sign Linguistics 2(1): 3–28.
Johnston, Trevor 2001. The lexical database of Auslan (Australian Sign Language). In: Ronnie Wilbur
(ed.), Sign Language and Linguistics 4(1/2): 145–169. Amsterdam/Philadelphia: John Benjamins.
Johnston, Trevor 2008. Corpus linguistics and signed languages: No lemmata, no corpus. In: Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis
and Daniel Tapias (eds.), Proceedings of LREC 2008, Sixth International Conference on Lan-
guage Resources and Evaluation, 82–87. Paris: European Language Resources Association.
Jouison, Paul 1990. Analysis and linear transcription of sign language discourse. In: Siegmund
Prillwitz and Tomas Vollhaber (eds.), Proceedings of the International Congress on Sign Lan-
guage Research and Application, Hamburg’90, 337–354. Hamburg: Signum.
Jouison, Paul 1995. Ecrits sur la Langue des Signes Française (LSF). Edition critique établie par
Brigitte Garcia. Paris: L’Harmattan.
Kipp, Michael 2001. Anvil – a generic annotation tool for multimodal dialogue. In: Proceedings
of Eurospeech 2001, 1367–1370. Aalborg. http://www.lrec-conf.org/proceedings/lrec2002/pdf/289.
pdf
Kipp, Michael 2012. Multimedia annotation, querying and analysis in ANVIL. In: Mark T. May-
bury (ed.), Multimedia Information Extraction, Chapter 19. IEEE Computer Society Press.
Klima, Edward S. and Ursula Bellugi 1979. The Signs of Language. Cambridge, MA: Harvard
University Press.
Konrad, Reiner 2011. Die Lexikalische Struktur der Deutschen Gebärdensprache im Spiegel Em-
pirischer Fachgebärdenlexikographie. Zur Integration der Ikonizität in ein Korpusbasiertes Lex-
ikonmodell. Tübingen: Narr.
Liddell, Scott K. 2003. Grammar, Gesture, and Meaning in American Sign Language. Cambridge:
Cambridge University Press.
MacWhinney, Brian 2000. The CHILDES Project: Tools for Analyzing Talk. Mahwah, NJ: Lawr-
ence Erlbaum.
Martin, Joe 2000. A linguistic comparison. The notation systems for signed languages: Stokoe
Notation and Sutton SignWriting. http://www.signwriting.org/forums/linguistics/ling016.html.
Miller, Christopher 2001. Some reflections on the need for a common sign notation. In: Brita
Bergman, Penny Boyes-Braem, Thomas Hanke and Elena Pizzuto (eds.), Sign Transcription
and Database Storage of Sign Information, special issue of Sign Language and Linguistics 4
(1/2): 11–28. Amsterdam: John Benjamins. First published [1994].
Montaigne, Michel de 2009. Les Essais. Paris: Gallimard.
Neidle, Carol 2001. SignStream™: A database tool for research on visual-gestural language. In:
Brita Bergman, Penny Boyes-Braem, Thomas Hanke and Elena Pizzuto (eds.), Sign Transcrip-
tion and Database Storage of Sign Information, special issue of Sign Language and Linguistics 4
(1/2): 203–214. Amsterdam: John Benjamins.
Piroux, Joseph 1830. Le Vocabulaire des Sourds-Muets (Partie Iconographique). Nancy, France:
Grimblot.
Pizzuto, Elena and Paola Pietrandrea 2001. The notation of signed texts: Open questions and in-
dications for further research. In: Brita Bergman, Penny Boyes-Braem, Thomas Hanke and
Elena Pizzuto (eds.), Sign Transcription and Database Storage of Sign Information, special
issue of Sign Language and Linguistics 4(1/2): 29–45. Amsterdam: John Benjamins.
Pizzuto, Elena, Paolo Rossini and Tommaso Russo 2006. Representing signed languages in written
form: Questions that need to be posed. In: Proceedings of LREC 2006, Fifth International Con-
ference on Language Resources and Evaluation, 1–6. Genova, Italy. http://www.lrec-conf.org/
proceedings/lrec2006/
Prillwitz, Siegmund, Regina Leven, Heiko Zienert, Thomas Hanke and Jan Henning 1989. Ham-
burg Notation System for Sign Languages. An Introductory Guide, HamNoSys Version 2.0.
Hamburg: Signum Press.
Saint-Augustin 2002. Le Maître. Traduction, présentation et notes de Bernard Jolibert, 2ème édition revue et corrigée. Paris: Klincksieck.
Sallandre, Marie-Anne 2003. Les unités du discours en Langue des Signes Française. Tentative de
catégorisation dans le cadre d’une grammaire de l’iconicité. Ph.D. dissertation, Paris 8 University.
Slobin, Dan I., Nini Hoiting, Michelle Anthony, Yael Biederman, Marlon Kuntze, Reyna Lindert,
Jennie Pyers, Helen Thumann and Amy Weinberg 2001. Sign language transcription at the
level of meaning components: The Berkeley Transcription System (BTS). In: Brita Bergman,
Penny Boyes-Braem, Thomas Hanke and Elena Pizzuto (eds.), Sign Transcription and Data-
base Storage of Sign Information, special issue of Sign Language and Linguistics 4(1/2): 63–
104. Amsterdam: John Benjamins.
Stokoe, William C. 1960. Sign language structure. Studies in Linguistics – Occasional Papers 8.
Buffalo, NY: Department of Anthropology and Linguistics, University of Buffalo. (Revised
edition Silver Spring, MD: Linstock Press [1978]).
Stokoe, William C. 1987. Sign writing systems. In: John V. van Cleve (ed.), Gallaudet Encyclopedia
of Deaf People and Deafness, Volume 3, 118–120. New York: McGraw-Hill.
Stokoe, William C., Dorothy Casterline and Carl Croneberg 1976. A Dictionary of American Sign
Language on Linguistic Principles. Washington, DC: Gallaudet College. First published [1965].
Sutton, Valerie 1999. Lessons in SignWriting. Textbook and Workbook. La Jolla, CA: Deaf Action
Committee for Sign Writing. First published [1995].

Brigitte Garcia, UMR Structures Formelles du Langage
(University Paris 8 and CNRS), France
Marie-Anne Sallandre, UMR Structures Formelles du Langage
(University Paris 8 and CNRS), France
