BARRY HESELWOOD

PHONETIC TRANSCRIPTION
IN THEORY AND PRACTICE

© Barry Heselwood, 2013

Edinburgh University Press Ltd


22 George Square, Edinburgh EH8 9LF

www.euppublishing.com

Typeset in Times by
Servis Filmsetting Ltd, Stockport, Cheshire,
and printed and bound in Great Britain by
CPI Group (UK) Ltd, Croydon CR0 4YY

A CIP record for this book is available from the British Library

ISBN 978 0 7486 4073 7 (hardback)


ISBN 978 0 7486 9101 2 (webready PDF)
ISBN 978 0 7486 9102 9 (epub)

The right of Barry Heselwood to be identified as author of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988.


Contents

List of Tables ix
List of Figures x
Preface xiii
Acknowledgements xv

Introduction 1

1 Theoretical Preliminaries to Phonetic Notation and Transcription 5
1.0 Introduction 5
1.1 Phonetic Transcription and Spelling 5
1.1.1 Logography and phonography 6
1.1.2 Sound–spelling correspondence 6
1.1.3 Speech, writing and the linguistic sign 9
1.1.4 Spoken and written languages as translation
equivalents 14
1.2 Phonetic Symbols and Speech Sounds 15
1.2.1 Speech sounds as discrete segments 15
1.2.2 Complexity of speech sounds 18
1.2.3 Speech sounds vs. analysis of speech sounds 19
1.3 Phonetic Notation, General Phonetic Models and the Role of
Phonetic Theory 20
1.3.1 Phonetic transcription as descriptive phonetic models 24
1.3.2 Phonetic transcription as data reduction-by-analysis 25
1.4 Content of Phonetic Models 26
1.5 Respelling as Pseudo-Phonetic Transcription 28
1.5.1 Transliteration as pseudo-phonetic transcription 29
1.6 Orthographic Transcription 32
1.6.1 Interpretation of spellings and transcriptions 33
1.7 Status and Function of Notations and Transcriptions 35


2 Origins and Development of Phonetic Transcription 37
2.0 Introduction 37
2.1 Representation of Pronunciation in Writing Systems 37
2.2 Phonographic Processes in Writing Systems 38
2.2.1 The rebus principle 38
2.2.2 Syllabography 39
2.2.3 The acrophonic principle 40
2.2.4 The notion ‘segment’ revisited 41
2.2.5 Subsegmental analysis 45
2.2.6 Diffusion and borrowing of writing systems 46
2.2.7 Anti-phonography 47
2.3 The Development of Phonetic Theory 48
2.3.1 Phonetic theory in the pre-Modern world 49
2.3.2 Phonetic theory in the Early Modern world 51
2.3.3 Phonetic terminology in the ‘English School’ 65
2.3.4 Phonetic theory in the late eighteenth and nineteenth
centuries 66
2.3.5 From correspondence to representation 69
2.3.6 Spelling reform 70

3 Phonetic Notation 73
3.0 Introduction 73
3.1 Organic-Iconic Notation 74
3.1.1 Korean Hangŭl 75
3.1.2 Helmont’s interpretation of Hebrew letters 76
3.1.3 Wilkins’s organic-iconic symbols 77
3.1.4 Bell’s Visible Speech notation 79
3.1.5 Sweet’s organic-iconic notation 80
3.1.6 The Passy-Jones organic alphabet 82
3.2 Organic-Analogical Notation 83
3.2.1 Wilkins’s analogical notation 83
3.2.2 Lodwick’s analogical notation 86
3.2.3 Sproat’s analogical notation 88
3.2.4 Notation for a voiced alveolar trill in Wilkins,
Bell/Sweet and Passy-Jones 90
3.3 Analphabetic Notation 92
3.3.1 Jespersen’s analphabetic notation 93
3.3.2 Pike’s analphabetic notation 95
3.4 Alphabetic Notation and the Structure of Symbols 97
3.4.1 Pre-nineteenth-century alphabetic notation 101
3.4.2 Lepsius’s Standard Alphabet 106
3.4.3 Ellis’s palaeotype notation 109
3.4.4 Sweet’s romic notation 111
3.4.5 IPA notation 112
3.4.6 Extensions to the IPA 119
3.4.7 IPA Braille notation 124
3.4.8 Pitch notation 126

3.4.9 Notation for voice quality and long domain categories 128
3.4.10 SAMPA notation 129
3.4.11 Notation for infant vocalisations 130
3.4.12 Using notations 132
3.5 Ordering of Components and Homography in Composite
Symbols 134
3.6 Hierarchical Notation 137

4 Types of Transcription 141


4.0 Introduction 141
4.1 Specific and Generic Transcriptions 142
4.2 Orientation of Transcriptions 143
4.3 Broad and Narrow Transcriptions 144
4.4 Systematic and Impressionistic Transcriptions 145
4.5 General Phonetic Transcription 147
4.6 Phonemic Transcription 148
4.7 Allophonic Transcription 155
4.8 Archiphonemic Transcription 157
4.9 Morphophonemic Transcription 158
4.10 Exclusive and Inclusive Transcriptions 160
4.11 Dynamic Transcription 161
4.11.1 Parametric transcription 163
4.11.2 Gestural scores 165
4.11.3 Intonation and rhythm 166
4.12 Instrument-Dependent and Instrument-Independent
Transcriptions 170
4.13 Transcriptions as Performance Scores 170
4.13.1 Nonsense words 171
4.13.2 Transcriptions as prescriptive models 173
4.13.3 Spelling pronunciation 174
4.13.4 Active and passive readings of transcriptions 175
4.14 Third Party Transcriptions 175
4.15 Laying Out Transcriptions 175

5 Narrow Impressionistic Phonetic Transcription 178


5.0 Introduction 178
5.1 Pressure-Waves, Auditory Events and Sounds 179
5.2 The Auditory System and Auditory Perception of Speech 180
5.2.1 Just noticeable differences 184
5.3 Perception of Speech 185
5.4 Is Speech Processed Differently from Non-Speech Stimuli? 191
5.5 The Issue of Consistency 194
5.6 The Issue of Veridicality 195
5.7 The Content of Perceptual Objects 198
5.8 The Objects of Analysis for Impressionistic Transcription 201
5.9 Phonetic Judgements and Ascription 204
5.10 Objections to Impressionistic Transcription 206

5.11 Who Should Make Impressionistic Transcriptions? 209
5.12 Conditions for Making Transcriptions 211
5.13 Comparing Transcriptions and Consensus Transcriptions 215
5.14 Are Some Kinds of Data Harder to Transcribe Than Others? 220

6 Phonetic Transcription in Relation to Instrumental and Other Records 223
6.0 Introduction 223
6.1 Instrument-Dependent Transcriptions 225
6.1.1 Instrument-determined transcriptions 225
6.1.2 Instrument-informed transcriptions 228
6.2 Functions of Instrument-Dependent Transcriptions 229
6.2.1 Annotating function 229
6.2.2 Summarising function 233
6.2.3 Corpus transcriptions 234
6.3 Indexed Transcriptions 235
6.4 Impressionistic Transcription and Instrumental Records 236
6.5 Phonetic Domains, Phonetic Theory and Their Relations 240
6.5.1 Articulatory domain 243
6.5.2 Aerodynamic domain 245
6.5.3 Acoustic domain 246
6.5.4 Auditory domain 247
6.5.5 Perceptual domain 248
6.5.6 Phonetic categories as domain-neutral 249
6.6 Multi-Tiered and Multilayered Transcriptions 250

7 Uses of Phonetic Transcription 251


7.0 Introduction 251
7.1 Transcription in Dictionaries 251
7.2 Transcription in Foreign Language Learning and Teaching 253
7.3 Transcription in Phonetics Learning and Teaching 256
7.4 Transcription in Speech Pathology and Therapy 256
7.5 Transcription in Dialectology, Accent Studies and
Sociophonetics 257
7.6 Transcription in Conversation Analysis 261
7.7 Transcription in Forensic Phonetics 263

Glossary 265
References 268
Appendix: Phonetic Notation Charts
IPA Chart Revised to 2005 295
Elaborated Consonant Chart from Esling (2010) 297
ExtIPA Chart Revised to 2008 298
VoQS Chart 1994 299
IPA Braille Chart 2009 300
Index 304


List of Tables

Table 1.1 Types of writing-system units and their corresponding pronunciation units 7
Table 1.2 Separate letters corresponding to front and back allophones
of /ɡ/ in written Azeri 8
Table 2.1 Consonantal manner terminology in the ‘English School’
of phonetics in the sixteenth and seventeenth centuries 65
Table 3.1 Examples of Jespersen’s notation for phonetic categories 94
Table 3.2 Conventions for interpreting Pike’s analphabetic notation
for [t] 96
Table 5.1 Pressure-waves, auditory events and sounds 179
Table 5.2 Alignments of variant transcriptions 216
Table 5.3 Comparison of variant transcriptions and what they have in
common 219


List of Figures

Figure 1.1 Two views of the relationship between language, speech and writing 9
Figure 1.2 Classification of notation in writing 10
Figure 1.3 The relationship of phonetic transcription to language 13
Figure 1.4 Correspondences and equivalences between
expression-forms in translations 14
Figure 1.5 Segmentation of So does she keep chickens? into acoustic
classes 17
Figure 1.6 Categories, dimensions and models in a small,
two-dimensional, abstract taxonomic space 21
Figure 1.7 The mapping of speech phenomena onto a theoretical
model creates a descriptive model 25
Figure 1.8 Transliteration as pseudo-transcription and respelling 30
Figure 1.9 Classification of phonetic notation and transcription in
terms of status 35
Figure 2.1 Units used for spelling the written signs of language A are
used for representing the pronunciation of spoken signs in
language B 46
Figure 2.2 Late twelfth- or early thirteenth-century vocal tract diagram
entitled Sūrat makhārij al-hurūf ‘Picture of the outlets of the
letters’ from Miftāh al-‘Ulūm ‘The Key to the Sciences’ by
Al-Sakkāki 51
Figure 2.3 (a) Robinson’s ‘scale of vowels’ diagram of 1617;
(b) Bell’s ‘scale of lingual vowels’ of 1867 with his Visible
Speech symbols; (c) Jones’s drawings of cardinal vowel
tongue positions of 1918, based on X-ray photographs 56
Figure 2.4 Wallis’s 1653 sound chart ‘Synopsis of all letters’ 58
Figure 2.5 Wilkins’s sound chart of 1668 61
Figure 2.6 Holder’s table of consonants (left) and ‘scheme of the
whole alphabet’ (right) 63
Figure 3.1 Articulatory configurations motivating the Hangŭl letters 75

Figure 3.2 Helmont’s diagram of Hebrew bēth (left) and his vocal tract diagram (right) 77
Figure 3.3 Wilkins’s organic alphabet and articulatory diagrams of
1668 78
Figure 3.4 Bell’s vocal tract diagrams for consonants and vowels 79
Figure 3.5 Sweet’s (1906) organic symbols for (a) consonants and
(b) vowels 81
Figure 3.6 The Passy-Jones organic alphabet 82
Figure 3.7 The analogical symbols of Wilkins 84
Figure 3.8 The analogical symbols of Lodwick with a transcription of
the Lord’s Prayer 87
Figure 3.9 Sproat’s analogical symbols for consonants 89
Figure 3.10 Organic symbols for a voiced alveolar trill 90
Figure 3.11 Structural classification of alphabetic phonetic symbols with
examples 99
Figure 3.12 Vowel symbols of Iceland’s ‘First Grammarian’ 101
Figure 3.13 Hart’s new letter-shapes 104
Figure 3.14 EPG frames showing simultaneous central and lateral
channels for airflow during (a) [lsˁ] in the word θˡˁaim
‘pain’ (Al-Rubū‘ah dialect), (b) [lzˁ] in the word ðˡˁahr
‘back’, and (c) [lzˁ] in the word ðˡˁabʕ ‘hyena’ (Rijāl Alma‘
dialect) 123
Figure 3.15 Halliday’s use of musical staves to show pitch dynamics in
speech 126
Figure 3.16 Consonant chart from Canepari (2005: 168) 135
Figure 4.1 Steele’s (1775: 47) adaptation of musical notation 142
Figure 4.2 Overlapping but distinct sets of allophones of /d/ and /b/
at an assimilation site 151
Figure 4.3 Dynamic transcriptions in Pike’s ‘sequence diagrams’ for
(a) [abop] and (b) [zʒɣn] 162
Figure 4.4 Parametric transcription of Good morning 164
Figure 4.5 Gestural score for palm 166
Figure 4.6 Steele’s transcription of a ‘bombastic’ manner of reciting
lines from Thomas Leland’s Orations of Demosthenes 167
Figure 4.7 (a) F0 trace; (b) orthographic transcription with accent and
tone marking; (c) interlinear tonetic transcription with
iconic representation of pitch height, accentual prominence,
and pitch movement; (d) ToBI transcription 168
Figure 4.8 Relations between speech, instrumental records and
transcriptions in instrument-determined,
instrument-informed and instrument-independent
transcriptions 170
Figure 5.1 The human auditory response area 183
Figure 5.2 Korean ‘denasalised’ alveolar stop, with IPA symbol
alternatives, from the phrase miguŋ nodoŋ ‘American labour’ 211
Figure 6.1 Praat waveforms, spectrogram and labelled text grids for
segmentation and annotation 224

Figure 6.2 Spectrogram of a dragonfly with aligned multi-tiered transcription showing segment overlap 225
Figure 6.3 Palatographic frames showing onset, steady state and offset
of a lateral articulation 227
Figure 6.4 Example of an annotated spectrogram and waveform
incorporating measurement data 229
Figure 6.5 Acoustic and palatographic displays of Libyan Arabic /miʃ
ɡdar/ ‘was not able to’ showing total overlap of alveolar
and velar articulations and the release of /d/ 230
Figure 6.6 Acoustic display of Libyan Arabic wagt ‘time’ with
epenthetic [ə] separating /ɡ/ from /t/ 231
Figure 6.7 Spectrogram, waveform, laryngoscopic images and
spectrum (FFT and LPC) of the Iraqi Arabic word /saʕiːd/
‘happy’ realised as [saˁʕ̆iːd] 232
Figure 6.8 Annotated waveform and spectrogram focusing on a
particular realisation of English /t/ 233
Figure 6.9 Intensity, Fx (pitch) and Qx (closed quotient) traces from
an utterance of What are you talking about? annotated with
ExtIPA, IPA and VoQS notation 234
Figure 6.10 Averaged FFT spectrum and laryngogram indexed to a
specific transcription of the Arabic word /waʕʕad/ ‘to make
someone promise’ showing voice quality features in the
realisation of the geminate pharyngeal /ʕʕ/ 235
Figure 6.11 Spectrograms indexed to a generic allophonic transcription
of English lilt to show typical clear and dark allophones
of /l/ with formant tracks 236
Figure 6.12 Multi-tiered transcription showing (A) signal-oriented
transcription summarising acoustic records (spectrogram
and speech waveform); (B) speaker-oriented transcription
summarising an articulatory record (larynx waveform);
(C) listener-oriented impressionistic transcription 240
Figure 6.13 Phonetic domains in a chain of cause and effect which map
independently to phonetic categories 241
Figure 6.14 Domain-neutral theoretical model and domain-specific
descriptive models 243
Figure 6.15 (a) Midsagittal vocal tract diagram representing generic
physical articulatory space with IPA symbol [s] at the
relevant place of articulation; (b) region of abstract
articulatory space containing [s] as the product of category
intersection 244
Figure 6.16 Vowel plot as a model of normalised acoustic space
showing the grand mean distributions and standard
deviations of the English dress, trap and strut vowels
for different groups of speakers 246
Figure 6.17 Centroid for a token of [s] 247
Figure 7.1 Pages from Ellis’s SED fieldwork notes with IPA
transcriptions 260


Preface

Why write a book on phonetic transcription? After more than half a century of
major advances in instrumental phonetics which have rightly taken credit for
broadening and deepening our knowledge of the structure of speech, it can appear
to many that symbols and transcription have had their day. What, it might be
asked, can [d] ever tell us that spectrograms, palatograms and the like cannot? If
traditional transcription is not to fade away or be made the amanuensis of auto-
mated forms of analysis, then a case must be made for it on the grounds that it can
express something which instruments cannot. Arguments need to be put against
the view that there is nothing to be gained in phonetics by listening analytically
to people speaking and transcribing what we hear. Marshalling the arguments
provides the opportunity not only to examine critically the aims and methods of
transcription but also to think about how phonetic symbols work in relation to
phonetic theory on the one hand and phonetic data on the other; to consider, that
is, the manner of their semiosis. This book attempts to address these issues and to
place them in the context of the historical emergence of transcriptional resources
from resources for writing language, the development of phonetic theory, and
their coming together to make what I refer to as proper phonetic transcription
possible.
If any time and place can be identified as when and where the ideas for this
book originated, it is nearly twenty-five years ago when I started teaching pho-
netics to speech and language therapy students at Leeds Polytechnic, later Leeds
Metropolitan University. There were quite intensive practical phonetics classes
and tests involving transcriptions of clinical as well as non-clinical speech
samples which had to be marked. Anyone who has had to transcribe difficult
clinical speech data, and judge the accuracy of others’ transcriptions, might agree
that there is nothing quite like it for making one realise that fair copies do not,
and cannot, exist. And yet not all transcriptions are equally insightful. It was the
knowledge, expertise and insightfulness of my then colleague Stephen Mallinson
which showed me that the twists and turns of the transcription process which
threaten to entrap one in endless indecision can be transformed from a maze of
blind alleys into a labyrinth whose path, after leading you deeper into a chaotic
world of sounds, leads you out again past a pleasingly ordered array of symbols
and diacritics. It is a transformation that only takes place once one has a thorough
practical grasp of phonetics, a good understanding of phonetic theory in all its
aspects, and the right balance of faith and doubt in one’s ability to make a good
transcription: belief that it is possible, but uncertainty that one has ever quite
managed to do it.
I have been fortunate enough to collaborate over many years with Sara
Howard on various aspects of phonetic analysis and transcription, benefitting
greatly from her knowledge and experience, and finding her appetite and enthu-
siasm for intractable phonetic data a true inspiration. Much of the content of this
book would hardly have been imaginable otherwise.
The scope of the book has had to be limited to keep it within constraints
of space and pressures of time. Consequently I have not looked at shorthand
systems, despite their obvious relevance and historical contribution to the repre-
sentation of pronunciation, on the grounds that they are not used by phoneticians
for phonetic transcription and are not as independent of language-specific lexical,
grammatical, phonological and spelling systems as phonetic notation aims to be.
Transcription of non-speech vocal phenomena inseparably woven into spoken
communication, such as laughter and sighing, has not been included although
infant pre-speech vocalisations are briefly looked at. Transcriptional resources
for other aspects of human communicative behaviours such as gesture, gaze and
proxemics, and notation for discourse structure, have also been omitted as being
outside the usual meaning of ‘phonetic’ as pertaining to the sounds of speech.
Intonationists will probably be disappointed in the greater emphasis on segmental
transcription, but one aim of the book is to bolster the legitimacy of segments
as theoretically respectable elements of auditory-perceptual speech analysis and
denotata for phonetic symbols.

Barry Heselwood
February 2013


Acknowledgements

Many people have indirectly influenced the content of this book, far too many to
list. But I should like to mention, in alphabetical order, those whose direct advice
and assistance, on small points or on larger issues, have been a help even if they
were not aware of it at the time: Munira al-Azraqi, Michael Ashby, Martin Ball,
Martin Barry, Helen Barthel, Monica Bray, Emanuela Buizza, Elena Coope-
Bellido, Ian Crookston, James Dickins, Gerry Docherty, Martin Duckworth,
Robert Englebretson, John Esling, Paul Foulkes, Tony Fox, Alaric Hall, Zeki
Majeed Hassan, Sara Howard, Mark Jones, Miho Kamata, Pat Keating, Ghada
Khattab, Maha Kolko, Young-Shin Kim, Rachael-Anne Knight, Sujuan Long,
Michael MacMahon, Reem Maghrabi, Stephen Mallinson, Samia Naïm, Sue
Peppé, Leendert Plug, Robin Le Poidevin, Rawya Ranjous, Raouf Shitaw, Mark
Shouten, Fiona Skilling, Alison Tickle, Clive Upton, Gareth Walker, Juan Wang,
Janet Watson, Dominic Watt, Frances Weightman, John Wells, Anne Wichmann;
also all those, not already named, who attended meetings of the Phonetic
Transcription Group in Leeds convened by Sara Howard and myself. Needless to
say, they bear no responsibility for how I have used their advice and assistance,
any errors and inconsistencies being entirely mine. I am also grateful to students
who over the years have contributed their ideas in phonetic transcription classes,
often noting things which I missed and raising issues I had not before thought
about.
Thanks also to David Thomas for agreeing to have his painting on the cover,
to the Faculty of Arts and the School of Modern Languages and Cultures at
Leeds University for funded sabbatical leave, and to colleagues in Linguistics
and Phonetics for their much-valued support and collegiality. I would also like
to express gratitude to Gillian Leslie at Edinburgh University Press for her
patience and advice in steering the book towards publication, and to Fiona Sewell
for diligent copy-editing, Sue Lightfoot for compiling the index, and Rachel
Arrowsmith for assistance with proof-reading. Last but very far from least, I am
grateful to my wife and family for their forbearance while much of my time and
attention was consumed in pursuit of completing this book.


(Stoop) if you are abcedminded, to this claybook, what curios of signs (please
stoop), in this allaphbed! Can you rede (since We and Thou had it out already)
its world? It is the same told of all. Many. Miscegenations on miscegenations.
James Joyce, Finnegans Wake


Introduction

Phonetic transcription is concerned with how the sounds used in spoken lan-
guage are represented in written form. The medium of sound and the medium
of writing are of course very different, having absolutely no common forms or
substance whatsoever, but over the ages people have found ways to represent
sounds using written symbols of one kind or another, ways that have been more
or less successful for their purposes. This book aims to explore the history and
development of phonetic transcription as a particular example of technographic
writing and to examine critically the problems attending its theory and practice.
A good many academic books include ‘theory and practice’ in their title, and I
offer no apology for doing so in a work on phonetic transcription. Theory and
practice have shaped the resources for transcription by pulling often in contrary
directions through obedience to different priorities. Theory, being concerned
with the logic and consistency of category construction, has made many attempts
to impose itself on the design of phonetic notation systems, but practice has
almost always rebelled, finding the demands of theory too inflexible and too
forgetful of the practical need to make and read transcriptions with a minimum
of difficulty. The failure of many proposed notation systems has illustrated that
the only valid test for a notation is ‘practice, not abstract logical principles’
(Abercrombie 1965: 91). It is in phonetic transcription that theory and practice
have to make compromises – practice must not ignore the rigour of theory or it
will lose its accuracy of expression, and theory cannot afford to overlook the
needs and constraints of practice or practitioners will lose patience with it. It
might be objected that I have over-theorised in places, that we can get by per-
fectly well using symbols as imitation labels with attached definitions and be
guided by professional intuition, but if we are to understand what we are really
doing with notations and transcriptions and be able to justify them, then we do
need to expose their theoretical foundations to critical scrutiny, and strengthen
them if need be. It is as well to understand the tools of one’s trade conceptually
and structurally if one can.
The idea of representing something by means of something else is inher-
ently problematic and contradictory but lies at the very heart of language itself.
Phonological forms of words, themselves meaningless, are used in spoken
language to stand for meaningful things; likewise orthographic forms in written
language. How is it possible for one thing to stand for, or represent, something
else? If I write the word roses no roses appear on the page. Even a good painting
of roses gets us no closer. We might be tempted to think that a photograph of an
object is somehow a more faithful representation than a word-form or an artist’s
painting, but there are still no roses on a photograph of roses; and it may, after
all, have been plastic or paper roses which were photographed. In representations
of sound there is the same absence of the thing represented. No sounds emanate
from the notes on a musical score, or from a page of phonetic symbols. Phonetic
notation, orthographic word-forms, crotchets and quavers, artists’ paintings and
even photographs can only represent something by convention. Whatever means
are developed for representing things, they have to be interpreted, and there has
to be sufficient agreement on how to interpret them if they are to do their job.
Phonetic theory is the source of interpretation for phonetic symbols and is what
essentially distinguishes them from the characters used in written language; it is
the difference, for example, between the phonetic symbol [b] and the alphabetic
letter <b>.
I have just said that representation works by one thing standing for something
else, and yet it also has to stand for itself if it is to be recognised. The phonetic
symbol [b] stands for a particular bundle of phonetic categories but it also stands
for a type of graphic shape, or glyph, consisting of a bowl and an ascending
stroke attached to the left of it, for without that shape it would not be recognised
as that symbol. There is always, therefore, a self-signifying function in the figura
(see Section 1.1.3) of any sign or symbol as well as a deictic function. It is as
if it is saying ‘Look at me, I look like this and I stand for that.’ Once we have
recognised it, however, we need to forget the symbol and attend to that which
it represents. The less distracted we are by the symbol itself, the easier this will
be. But this conflicts with the commonly held, and on the face of it reasonable,
belief that a good representation should resemble the thing it represents as faith-
fully as possible, which implies profusion of detail. At the head of his section on
‘Symbols’, Jespersen (1889: 12) quotes from Thomas Carlyle’s Sartor Resartus:
‘In a symbol there is concealement [sic] and yet revelation.’ A central purpose
of this book is to try to understand what it is that remains in concealement and
to explicate what it is that is revealed when we use phonetic symbols, and to
show that much of this depends on the principles according to which symbols
are constructed; furthermore, that this in turn is crucially dependent on phonetic
theory. The inevitable circularity in these relationships means that a symbol as
part of a notation system cannot tell us anything theoretical that we do not already
know, but can in transcriptions tell us particulars which we do not already know,
and that indeed is symbols’ ultimate purpose in transcriptions. For example, if
someone tells me that such-and-such a variety of English realises final singleton
/t/ as [h], I know nothing more about /t/ or [h] as phonological and phonetic
entities, but I do now know more about that variety of English. It would be a
mistake, however, to think I now know everything about the realisation of final
singleton /t/ in that variety because [h] as a representation normalises for all kinds
of variables, such as pharyngeal volume and tongue elevation, not considered by
phonetic theory to be important in relation to [h]. Theory, therefore, determines
what a symbol reveals and what it conceals, and symbol design determines how
its revelations are displayed.


A practical solution to the inflexibilities and impracticalities of theory,
and to the problem of overly detailed representation, is to acknowledge with
Abercrombie (1967: 120) the advantages of arbitrariness in symbol systems just
as Saussure acknowledged it in his theory of the linguistic sign. It is the arbitrari-
ness of the relation between a word-form and its meaning that gives human lan-
guage its extensive and enduring power to signify, and the same principle applies
to a sophisticated symbol system such as phonetic notation. The seventeenth-
century project to design a universal philosophical language failed to acknowl-
edge this fundamental point, and so have iconic and analogical phonetic notation
systems. In both cases, theories about the phenomena to be represented have dic-
tated the forms of representation, the consequence being that, should the theory
be revised, the forms of representation become obsolete. The same happens to
phonographically reformed spellings when pronunciations change. Sweet’s first
response to Bell’s Visible Speech organic notation recognised this weakness
(Sweet 1877: 100–1), but it was not long before Sweet succumbed to the famil-
iar delusion of every age, that things are now, at last, properly understood well
enough. Writing only four years later, he declared himself a committed champion
of Bell’s approach with the justification that ‘[i]f we impartially survey the whole
field of phonetic knowledge, we shall see that the great majority of the facts are
really as firmly established as anything can well be’ (Sweet 1881: 184). One only
has to call to mind a few of the many, many discoveries in phonetics over the
course of the twentieth century, and the continuing additions to our knowledge
and revisions to our theoretical frameworks as we make our way through the first
quarter of the twenty-first, to see how wide of the mark Sweet was.
Arbitrariness of symbols should not prevent us from appreciating their power
to activate representations in the minds of those exposed to them and thus to
appear, from a subjective point of view, to have a necessary connection with what
they signify, becoming subjectively iconic in a Peircean sense. How many pho-
neticians trained in the cardinal vowel system can see [e] and not ‘hear’ cardinal
vowel number 2, perhaps even hear Daniel Jones’s production on F natural in
New Philharmonic pitch if they are familiar with his recordings, before starting
to retrieve its IPA phonetic label? The iconic power symbols accrue, despite their
logical arbitrariness, tends to protect and preserve symbol–denotatum relations,
thereby conferring considerable stability on a notation system once it has been
adopted, very much as with the spellings of written language. The relationship
we have with symbols, as with written words, is more materially immediate than
with what they signify, an insight which has led psychoanalysts such as Jacques
Lacan to declare the primacy of the signifier from a psychological point of view
in contrast to the logical parity of signifier and signified in Saussure’s concep-
tion of the structure of the linguistic sign (Benvenuto and Kennedy 1986: 24).
Proposals to make changes to how things are symbolised have to be well founded
and well argued to have a chance of success. That there is a certain irrationality
in our psychological relations with symbols is evident if one asks how likely is it
that anyone would seriously propose a swastika glyph for a new phonetic symbol.
The world-wide success of IPA-style notation in the discipline of phonetics
rides on the near-universal familiarity among literate peoples with the basic stock
of symbol shapes experienced through exposure to written forms of languages
using roman alphabetic letters. This is true even of users of other writing systems
such as Arabic, Chinese, Hindi and Thai, who can hardly escape the reach of
roman-based writing systems. No doubt this is due in large part to the spread
of English as an international language in the wake of political and economic
influence and domination by English-speaking nations. Roman alphabetic letters
themselves have come about through adaptation of letters by literate speakers of
many different languages over millennia in a process which is quite accurately
captured in Joyce’s phrase ‘miscegenations on miscegenations’. To regard IPA
notation as historically misbegotten, however, does not mean we should regard it
as unfit for its purpose. Its fitness or otherwise will be determined by the practical
needs of phoneticians requiring resources for transcription. Whether this nota-
tion will meet the needs of future generations of phoneticians is something we
cannot be in a position to know, but it is unlikely that they will not engage with
the practicalities of transcription whilst continuing to theorise about phonetics,
and either stick with the principles of the IPA and its notation or give birth to a
new ‘miscegenation’.

1 Theoretical Preliminaries to Phonetic Notation and Transcription

1.0 Introduction
In this first chapter, a number of points of theory need to be clarified concerning
both the relationship between spoken and written language, and the status of pho-
netic transcription as a particular kind of technographic writing for representing
speech. In the course of clarification I hope to define proper phonetic notation
and proper phonetic transcription, to distinguish them from the notion of a pho-
nographic orthography, and to give theoretical expression to respelling and trans-
literation in relation to phonetic transcription. An issue of overriding importance
throughout the book is what exactly phonetic symbols denote and what transcrip-
tions represent. The issue is tackled largely from an assumption that the notion of
a ‘segment’ is valid providing we take a sophisticated view of it as being rooted
in the mental world of perception, not the physical world of measurable proper-
ties. Arguments for this position are put forward in Section 1.2.1 and returned to
in Chapter 2 Section 2.2.4. Like the concept of the phoneme in phonology, the
segment is often denied, but something remarkably like it seems to be reinstated
quickly if only to provide a concept about which statements can be predicated.

1.1 Phonetic Transcription and Spelling


Much of the discussion of phonetic transcription in this chapter is concerned with
the differences between transcription and spelling and thus between spoken and
written language.1 In any consideration of written language there has to be some
account of the many different writing systems that have arisen in the relatively
short time since written language first appeared around the end of the fourth mil-
lennium bce. Writing also features prominently in Chapter 2, where the emer-
gence of transcription out of phonographic processes in writing systems is traced.
It will therefore be useful to outline briefly the main conceptual division of how
writing represents language, that is to say whether its units represent meaningful
words and morphemes (logography) or meaningless units of sound structure such
as syllables, or consonants and vowels (phonography).2 The division is based on
Sampson (1985: 32–5).

1.1.1 Logography and phonography

Although none of the writing systems we know about are completely logographic,
and few if any are completely phonographic, the distinction is a crucial one in
principle. Logography means that a word or morpheme is written with its own
character and contains no information about how the corresponding spoken word
is pronounced. Words with identical or similar pronunciations may have entirely
different written characters. In Chinese, for example, 握 ‘hold, grasp’ and 卧 ‘lie
down’ are both pronounced [ˋwo] but the characters are silent about any phonetic
similarity. By contrast, phonography means that each character corresponds to an
expression unit of spoken language such as a syllable, a consonant or a vowel.
Words with identical pronunciations will be written the same. The English words
date (fruit) and date (calendar) are pronounced and spelt identically although
they are clearly different lexical items synchronically and etymologically. While
it is easy to see that logography has little to do with phonetic transcription, it is
also easy to assume that phonetic transcription is a phonographic writing system,
an assumption that has in fact been made by scholars of writing such as Sampson
(1985: 33). I will explain below in Section 1.1.3 why I think this is a mistake.
The logography–phonography distinction is in practice more of a continuum
when actual writing systems are analysed and we see logographic and phono-
graphic principles at work. For example, written Chinese is often held to be logo-
graphic (Sampson 1985: 145–71, but see DeFrancis 1989: 99–121, who argues
it is morphosyllabic) but makes extensive use of phonography albeit in a rather
opaque manner. Written English is more obviously phonographic but not all hom-
ophones are spelt the same – hair–hare, blue–blew, sight–site, moat–mote and
so on. Even in Spanish, often cited as highly phonographic in its spellings, there
are a few non-homographic homophones – for example vaca ‘cow’ and baca
‘roofrack’, both pronounced [ˈbaka], haya ‘beech tree’ and halla ‘there is’, both
pronounced [ˈaja]. In written languages the extent to which logographic and pho-
nographic principles are in evidence in typical written texts varies so that some
writing systems, such as Ancient Egyptian and Chinese, are more logographically
oriented than others, and some, like Spanish and the Japanese kana syllabaries,
more phonographically oriented than others. Processes of phonography in writing
increase the orientation towards pronunciation and create resources which can be
used for transcription as well as for spelling (see Chapter 2 Section 2.2).
A type of writing that manifests both logographic and phonographic features
is what is sometimes called morphophonemic writing or morpho-phonography.
English exhibits this category when morphemes are given invariant spellings
despite variant phonological forms. The regular plural inflection, for example,
has the phonological variants /-s, -z, -ɪz/ in spoken English but invariant <-s> in
written English, although of course <-s> does not always spell the plural mor-
pheme (see also Chapter 2 Section 2.2.7 and Chapter 4 Section 4.9).
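
By way of illustration (the following sketch is mine, not part of the original text), the context-dependence of the spoken plural variants can be stated as a small Python rule; the consonant classes used are simplified assumptions, and the point is simply that the written form appends an invariant <-s> whichever spoken variant is chosen.

    # Illustrative sketch: spoken variants of the English regular plural morpheme.
    # The written form appends an invariant <-s>; only the spoken form varies.
    SIBILANTS = {"s", "z", "ʃ", "ʒ", "tʃ", "dʒ"}   # condition the /-ɪz/ variant
    VOICELESS = {"p", "t", "k", "f", "θ"}           # condition the /-s/ variant

    def plural_variant(final_sound: str) -> str:
        """Return the phonological variant of the plural after the stem-final sound."""
        if final_sound in SIBILANTS:
            return "ɪz"
        if final_sound in VOICELESS:
            return "s"
        return "z"                                  # voiced non-sibilants and vowels

    for word, final in [("cat", "t"), ("dog", "ɡ"), ("horse", "s")]:
        # <cats>, <dogs>, <horses> all end in written <-s>; the spoken variant differs.
        print(f"<{word}s>  plural variant: /-{plural_variant(final)}/")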

1.1.2 Sound–spelling correspondence

Relationships between elements of writing and elements of pronunciation I shall,
following common practice, talk of as correspondences. It will be useful first, and
in preparation for discussions in later sections and chapters, to summarise and
exemplify the different kinds of units in writing systems that can be put into cor-
respondence with units of pronunciation. Daniels (1996: 4, 2001: 43–4) proposes
six fundamental kinds of characters in writing systems, distinguished by their
relationships of correspondence to units of pronunciation in spoken language, and
which cannot be further analysed into components having their own correspond-
ences. Logosyllabograms (or morphosyllabograms) are units that function in
written language to spell whole words or morphemes but which also correspond
to discrete syllables in spoken language if, in the language in question, words
are typically monosyllabic as is the case in Chinese. The character 撒 ‘to scatter’
spells the whole written word and the spoken language equivalent is pronounced
[ˇsa]. The character can therefore be said to correspond to the pronunciation-form
[ˇsa]. A syllabogram is a unit of writing that corresponds to a discrete syllable
in speech and which is used for spelling any words whose spoken equivalents
contain that syllable regardless of meaning. The characters of an abjad, or con-
sonantary, correspond only to consonants in spoken language while those of an
abugida correspond to a consonant-plus-vowel sequence. Vowels in abugidas
correspond to systematic additions to a base consonant character which on its
own often represents a consonant plus /a/ as a kind of default vowel – an abugida
is thus a vocalically augmented abjad. Note that an abjad can, as in Arabic, have
optional diacritics corresponding to vowels whereas the vocalic augmentation in
abugidas is obligatory. In an alphabet there are autonomous characters which can
be put into correspondence with vowels as well as consonants. The final type is
a featural system in which ‘the shapes of the characters correlate with distinc-
tive features of the segments of the language’ (Daniels 1996: 4). Written Korean
is given as an example; Arabic and Hebrew pointing, and the niguri and maru
diacritics in Japanese kana scripts, are also featural (see Chapter 2 Section 2.2.5).
Table 1.1 presents examples of the six types.

TABLE 1.1: Types of writing-system units and their corresponding pronunciation units
Chinese logosyllabograms 撒, 苏, 色 (‘scatter’, ‘revive’, ‘colour’), corresponding to /ˇsa, ˉsu, `se/
Japanese hiragana syllabograms さ, す, せ, corresponding to /sa, su, se/
Arabic abjad consonant letter س, corresponding to /s/
Amharic abugida consonant-plus-vowel letters ሠ, ሡ, ሤ, corresponding to /sa, su, se/
Spanish alphabet consonant and vowel letters s, a, u, e, corresponding to /s, a, u, e/
Korean featural feature letter ᄉ, corresponding to [dental] (Sampson 1985: 124–5 calls this feature ‘sibilant’)

‘Sound–spelling correspondence’ is a general term, neutral with respect both to
type of writing-system unit, and to the size of the sound elements of speech. It is
common to come across the term ‘grapheme–phoneme correspondence’ in litera-
ture dealing with reading and writing but there are problems with it. ‘Grapheme’
means different things in different theoretical approaches to writing systems, and
‘phoneme’ means different things in different phonological theories, the implica-
tions of which for phonemic transcription are considered in Chapter 4 Section
4.6. Concerning ‘grapheme’, some writers follow Pulgram (1965) in using it for
the minimal distributional element of writing in a given writing system whether
this be a logogram, syllabogram or alphabetic letter. Others, such as DeFrancis
(1989: 54), reserve the term for written characters that correspond systemati-
cally to minimal elements of sound in spoken language. The latter use brings its
own problems in cases of so-called ‘silent’ letters, which occur frequently in,
for example, English and French spelling. English made is spelt <made> and
transcribed phonemically as /meɪd/ (or /mejd/). The final <-e> can be regarded
either as part of a discontinuous digraph <a-e> corresponding to the diphthong
/eɪ/, or, as Venezky (1970: 50) advocates, as a diacritical letter telling us that
the grapheme <a> in this context corresponds to /eɪ/, preventing made becom-
ing mad. Similar problems attend the <b> in comb and climb. Daniels (2001:
66–7) favours ditching the term ‘grapheme’ altogether. The notoriously many
and contentious definitions of ‘phoneme’ in the phonological literature preclude
review here (see Chapter 4 Section 4.6), but on a very general level the term can
be understood as a distinctive consonant or vowel without regard for contextual
(allophonic) variation.
It is rare for the allophones of a phoneme to have separate corresponding
letters but Azeri furnishes an example. In this Turkic language /ɡ/ has a front
allophone before front vowels and a back allophone before back vowels. Azeri,
at different periods, has been written using Arabic, Roman and Cyrillic letters
and in each case the two allophones of /ɡ/ have had their own letter as shown in
Table 1.2.

TABLE 1.2: Separate letters corresponding to front and back allophones of /ɡ/ in written Azeri (from Coulmas 1996: 30)
Roman Cyrillic Arabic
Front allophone g Ҝ ‫گ‬
Back allophone q Γ ‫ق‬

The rarity of different allophones of a phoneme being in correspondence with
different letters depends to some extent on how one does one’s phonological
analysis. For example, many languages have vowel–glide pairs which are in
complementary distribution, e.g. English [u] and [w], [i] and [j], and which
have their own corresponding letters <u, w>, <i, y>. If these glides are regarded
as non-nuclear allophones of vowels, then examples of allophone–letter corre-
spondences may not be so hard to find.
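
To make the Azeri case concrete, the following toy sketch (my illustration, based only on Table 1.2 above) chooses the Roman letter for /ɡ/ from the frontness of the following vowel; the vowel classes are assumed and simplified, and real Azeri orthography is of course not reducible to a single rule.

    # Toy sketch: letter choice for the front and back allophones of Azeri /ɡ/,
    # using the Roman letters from Table 1.2. Vowel classes are simplified assumptions.
    FRONT_VOWELS = {"e", "i", "ə", "ö", "ü"}
    BACK_VOWELS = {"a", "ı", "o", "u"}

    def letter_for_g(following_vowel: str) -> str:
        """Pick the letter corresponding to the allophone of /ɡ/ before the given vowel."""
        if following_vowel in FRONT_VOWELS:
            return "g"    # front allophone
        if following_vowel in BACK_VOWELS:
            return "q"    # back allophone
        raise ValueError(f"unrecognised vowel: {following_vowel}")

    print(letter_for_g("e"))   # g
    print(letter_for_g("a"))   # q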
Letters can correspond to what structuralist phonologists call an ‘archipho-
neme’, which is the result of the neutralisation of a phonemic opposition in a
particular phonotactic context. Trubetzkoy (1933/2001: 12 n.1) gives the fol-
lowing three examples. The three-way oppositions between voiced, voiceless
and aspirated plosives in Ancient Greek were neutralised before /s/. Letters were
invented to correspond to the sequence of the neutralised stop + /s/. For example,
<ψ> corresponded to the sequence comprising the archiphoneme /P/, resulting
from neutralisation of the /b–p–pʰ/ oppositions, plus a following /s/. The letter
<T> in the Avestan alphabet corresponded to an archiphoneme /T/ representing
the neutralisation of /t–d/ (<t-d>) in prepausal and pre-obstruent positions. The
Devanagari script has a letter representing the archiphoneme resulting from the
neutralisation of the nasals /m–n–ɳ–ɲ–ŋ/ before stops (see Bright 1996: 385).
The correspondence of letters to archiphonemes is rather surprising because it
demonstrates that whoever invented letters for that purpose realised that there
was something different, not necessarily about the sound itself at that position in
the phonotactic structure, but about its distinctiveness in that position. It attests to
some conscious appreciation of distinctiveness as an abstract structural property
of a system. Some writing resources have thus developed as a consequence of an
analysis as deep, if not as detailed, as any in modern phonological theory.
By conceiving of the relationships between sound units of spoken language
and graphic units of written language as relations of correspondence I am delib-
erately taking a non-representationalist view of written language. That is to say, I
do not take the Aristotelian view (De Interpretatione 16a3) that writing represents
speech (Figure 1.1a). I take instead the view, elaborated in Section 1.1.3, that
language can be expressed in spoken and written forms but that its ontology as a
system of lexis and grammar is equally independent of, and dependent on, both
(Figure 1.1b). It is the purpose of phonetic transcription to embody an analysis
of its spoken expression. A theoretical account of how it does so is outlined in
Section 1.3.


FIGURE 1.1: Two views of the relationship between language, speech and
writing: (a) that speech expresses language and writing represents speech;
(b) that both speech and writing independently express language. The
dotted arrow in (b) indicates that relations of correspondence can be set
up between elements of speech and elements of writing.

1.1.3 Speech, writing and the linguistic sign

Resemblances between phonetic transcription and phonographic writing are
obvious but potentially misleading. They are both forms of writing in the wider
sense of graphic representations of some aspect of language, and they may
even employ notation which is visually the same, but their purposes are quite
different. Spelling uses notation to write items of lexis and grammar which by

w.facebook.com/groups/QOU
SomoudBarghouthy
10 Phonetic Transcription in Theory and Practice
definition are language-specific, whereas phonetic transcription uses notation to
write an analysis of pronunciation-forms using language-independent symbols.
By a pronunciation-form I mean something pronounced, either real words of a
particular language or nonsense words, looked at from a perspective which is
neutral with respect to speaking and listening. The general term I shall use for
the elements of spelling is character (Coulmas 1996: 72), a term that includes
logograms, syllabograms, the letters of consonantaries and abugidas, alphabetic
letters and also punctuation marks. For the elements of phonetic transcrip-
tion I shall use the general term symbol to include all resources for segmental,
suprasegmental and parametric transcription, including diacritics. The term glyph
is a superordinate term for characters and symbols and is useful for referring to
the graphic form of a character or symbol. Figure 1.2 shows this classification of
notation by purpose.

FIGURE 1.2: Classification of notation in writing. Notation in writing comprises glyphs, which divide by purpose into spelling and transcription: characters are graphic resources for expressing lexis and grammar, symbols are graphic resources for expressing analyses of pronunciation.

The three attributes of a ‘letter’ discussed by Abercrombie (1949/1965) –
figura, potestas and nomen – are applicable to symbols as well as characters.
They obviously both have written shape (figura), and can be referred to by
some kind of name (nomen), for example the names given to phonetic symbols
in Pullum and Ladusaw (1996). What is meant by potestas ‘power, ability,
value’ is not so straightforward. Abercrombie takes it to be the pronunciation,
in which case there would be no difference between a character and a symbol,
and indeed he points out that the term ‘letter’ has traditionally been ambiguous
between written character and speech sound. It is perhaps more useful to interpret
potestas as the value a character or symbol has in its contexts of usage, that is
to say, its power or ability to distinguish one linguistic form from another; this
interpretation seems to have been given to it by the Icelandic ‘First Grammarian’
in the twelfth century who took the littera doctrine from the Ars Grammatica
of Donatus (Haugen 1972: 51–61). The value of a character is that it is a distin-
guishable unit of spelling, while the value of a phonetic symbol is its ability to
express an analysis of a distinguishable unit of pronunciation (see Figure 1.2)
or, to put it another way, to denote a model onto which a distinguishable unit of
pronunciation can be mapped (see Section 1.3).
Because phonetic transcription is a form of writing, there is a temptation
to think of it as an alternative way of spelling, one that is more faithful to
pronunciation-forms than orthographies usually are, particularly in languages
notorious for complicated sound–spelling correspondences such as English and
French, or in languages that use writing systems which are more logographically
oriented such as Chinese. This temptation is likely to be strengthened by the fact
that most of the symbols of the IPA, currently the most commonly used phonetic
notation system, are derived from roman alphabetic letters and have the same or
similar shapes. But it is of fundamental importance to understand that phonetic
transcription is not an orthography for the words and morphemes of any lan-
guages. Its purpose is to express, in a language-independent notation, an analysis
of pronunciation-forms. There is also a widespread misunderstanding that the
main purpose of spelling, especially in phonographically oriented writing, is to
provide information about pronunciation, and that writing systems are defective
to the extent that they cannot provide for one-to-one sound–spelling correspond-
ences, and spellings are defective to the extent that they do not employ sound–
spelling correspondences consistently and systematically. While information
about pronunciation can be gleaned from spelling with varying degrees of relia-
bility, the primary purpose of spelling is to identify which words and morphemes
are being written. The reader will generally already know how to pronounce the
spoken form of those words and morphemes. As the philologist Max Müller
expressed it using Isaac Pitman’s 1876 alphabet in the magazine Fortnightly
Review, ‘[r]aitiŋ woz never intended tu foutograf spouken laŋgwejez’ (quoted in
Baker 1919: 209). To appreciate these points and their implications more fully,
it is necessary to consider briefly what a linguistic system is, and the relationship
between spoken and written language.
There has been a long tradition, already alluded to in Section 1.1.2 above,
stretching back to Aristotle in ancient Greece and persisting through to the writ-
ings of Saussure, that written language represents speech (Coulmas 2003: 2–12).
The view still has currency, having been more recently expressed for example by
DeFrancis (1989: 6–7) and Daniels (1996: 3). But challenges to this view have
come from the recognition that spoken and written discourses have their own
particular features such that the one cannot be seen merely as the transfer of the
other into a different medium (Vachek 1945–9, 1973; McIntosh 1961; Pulgram
1965; Halliday 1985; Mulder 1994), and from theorising about the relationship
between language, speech and writing. Critical perspectives on the relationship
between spoken and written language are found in Harris (1986) and Olson
(1994). For written language to be a representation of spoken language, concepts
relating to linguistic structure such as ‘word’ and ‘syllable’, Olson argues, would
already have to have been explicitly recognised before the invention of writing.
Olson (ibid.: 68) proposes the reverse, that ‘awareness of linguistic structure is a
product of a writing system not a precondition for its development’ (my italics).
Olson’s claim, that linguistic structure is only accessible for analysis once lan-
guage has a written form, may, however, be mistaken. A vigorous tradition of
grammatical scholarship arose in India during the early centuries of the first mil-
lennium bce culminating in descriptions of Sanskrit still regarded as exemplary
linguistic analyses, for example Pāṇini’s Aṣṭādhyāyī ‘Eight Books’. It is very
possible that these analyses were first carried out in the absence of literacy and
were orally transmitted from memory, only later being set down in written form
(Allen 1953: 15; Misra 1966: 19; however, for evidence of Pāṇini’s possible
literacy see Bronkhorst 2002).
Whether Olson is correct or not, there is no logical precedence of spoken lan-
guage over written language. While it is accepted that spoken language existed
for tens of thousands of years before writing was invented, and that human
beings acquire spoken language before learning to read and write, it is logically
possible for there to be a written language without a corresponding spoken lan-
guage. Words and morphemes, the basic abstract items of language that possess
meaning and grammatical properties, are equally independent from sound and
from visual marks, but without sound they cannot be spoken or heard and without
visual marks they cannot be written or seen. The fact that phylogenetically and
ontogenetically the linguistic harnessing of sound predates the linguistic har-
nessing of visual marks has little if anything to do with any intrinsic properties
of lexis and grammar. Explanation for these historical and developmental facts
has to be sought in the evolution of cultural practices in human society (Trigger
2004) and the course of biological maturation in individuals from birth through
infancy into childhood and beyond (Locke 1993). Because originally language
only manifested through speech, when language started to be written it might
well seem as if it were speech that was being written.
The adaptation of Saussure’s concept of the linguistic sign in Figure 1.3 shows
that the relationship of phonetic transcription to spoken language is not analo-
gous to the relationship of spelling to written language. Saussure’s linguistic sign
has two aspects (Saussure 1974: 65–7): the ‘signified’, which can be interpreted
broadly as the meaning of the sign, and the ‘signifier’, which I will interpret
as pertaining to the observable manifestation of the sign.3 The terms ‘content’
and ‘expression’ are often used instead of signified and signifier respectively.
‘Expression’ can be thought of as the clothing that a sign wears so that it can be
recognised. In written language, spelling is the clothing while in spoken language
it is the pronunciation. Phonetic transcription is a way of setting down in notation
an analysis of what the clothing of spoken language is made of. An analogous
description of what the clothing of written language is made of would be the
naming of the characters used in the spelling of written signs. We can also, of
course, name the symbols used in a phonetic transcription using, for example,
the symbol names given in Pullum and Ladusaw (1996) and recommended,
although not officially adopted, by the IPA (IPA 1999: 31, 166–84, 188–92). In
doing so, we are treating a transcription symbol as a sign whose content is its
phonetic definition and whose expression is a glyph, that is to say the glyph is
the ‘spelling’ of the sign. The point is that, unlike spelling, phonetic transcription
does not express linguistic-semantic meaning; it expresses an analysis of pronun-
ciation. For example, the IPA transcription [ˈtʰeɪbəɫ] does not express the same
as the spelling <table> – the latter expresses the word table whereas the former
comprises symbols which express categories such as aspirated alveolar plosive,
close-mid front closing diphthong, etc.

[Diagram: LANGUAGE divides into SPOKEN LINGUISTIC SIGNS and WRITTEN LINGUISTIC SIGNS, each with a Content and an Expression (pronunciation using speech sounds; spelling using characters); phonetic transcription uses symbols to express an analysis of pronunciation.]

FIGURE 1.3: The relationship of phonetic transcription to language

When spellings for a written language become fixed and an orthography is established, the criterion for correct spelling is not how closely it matches
pronunciation but whether the correct expression units, i.e. characters, have
been used and are in the right sequence. Pronunciation can vary widely, and
change over time, without affecting spelling. To take an example from current
British English, the correct spelling of the word party is <party> whether or not
the spoken word uses plosive [t], spirantised [s̝] or glottal [ʔ] to realise the /t/
phoneme, or just a hint of breathy voice in a pronunciation we can transcribe as
[pʰɑː̤ɪ]; even if its pronunciation became homophonous with the word pie the
identity of party would still be expressed in written English by its spelling as
<party>.
Having said that phonetic transcription is not an alternative spelling system, it
has to be pointed out that there are transcriptions which do have functions more
like those of spelling, and may be considered a type of spelling. This is most true
of representations of postulated invariant underlying forms in phonology, such
as in morphophonemic transcription, which are discussed in Chapter 4 Sections
4.6 and 4.9.
To summarise, phonetic transcription embodies in a written form an analysis
of the expression elements of spoken language by using symbols which have
phonetic definitions drawn from phonetic theory. By contrast, spelling uses char-
acters as the written expression of language. The characters themselves have no
theoretical definitions.

1.1.4 Spoken and written languages as translation equivalents

It is justifiable to regard the relationship between the spoken and written forms
of a language as a translation relationship (Mulder 1994: 54). To write a spoken
word down, or to read out a written word, involves identifying equivalent items
in two different systems in much the same way that translating from one language
into another does (the difficulty, or even impossibility, of finding precise transla-
tion equivalents between languages does not affect the argument, nor am I nec-
essarily claiming that written and spoken words are absolute equivalents within
the same language). When literate translators translate between English book and
French livre, six correspondences and equivalences between expression-forms
are implicated (indicated by double-headed arrows) as shown in Figure 1.4.

[Diagram: English spoken [bʊk] and written <book>; French spoken [livʁ] and written <livre>; double-headed arrows link the four expression-forms.]

FIGURE 1.4: Correspondences and equivalences between expression-forms in translations

The expression-forms of English are completely different from the equivalent expression-forms of French; it is (near-)equivalence of meaning that connects
them all. The same is true if we look only at English or only at French. Ignoring
the visual similarity of characters and symbols, the spoken expression-form [bʊk]
and the written expression-form <book> have nothing in common as expression-
forms: the former is a pronunciation-form, the latter a spelling-form. Their only
connection is via the abstract lexical item book for which they are both expres-
sions in different media.
It is worth pursuing this point a little further by looking at logographic writing,
where the translation nature of spoken language and written language relation-
ships is more obvious. The Chinese logogram for ‘below’ is <下> while the
spoken form of the word is [ˋɕiɛ].4 No properties of the one in any way suggest
any properties of the other any more than properties of the English spelling-form
<book> suggest the French pronunciation-form [livʁ], or properties of the French
spelling-form <livre> suggest properties of English [bʊk]. The Saussurean doc-
trine of the arbitrariness of the linguistic sign holds sway over all these relation-
ships of cross-linguistic equivalence and cross-medium correspondence.
Insight into these relations, and into the question of whether writing is used
to represent speech, is provided by the phenomenon of xenography (from Greek
ξένος ‘stranger’), also called heterography. A xenogram, or heterogram, is a
loanword written in the spelling of the donor language but pronounced as the spoken translation equivalent in the borrowing language. An example would be if English were to spell book as <livre> but read it aloud as [bʊk]. The French spell-
ing would provide no information about the English pronunciation but would
identify the lexical item in logographic fashion. The similarity to translation is
apparent when we see that xenography is the exploitation of the relation shown
in Figure 1.4 between [bʊk] and <livre>. Xenograms have occurred here and
there throughout the history of writing in situations of language contact and the
borrowing of writing systems. Coulmas (1996: 564, see also Gelb 1969: 105–6,
where they are referred to as allograms) mentions Sumerian spellings being used
to correspond to Akkadian pronunciations (sometimes called Sumerograms),
Aramaic spellings corresponding to Middle Persian pronunciations (sometimes
called Aramaeograms; see also Skjærvø 1996: 517–20), and Chinese characters
corresponding to Japanese pronunciations in Japanese kanji. Xenography shows
that the only absolutely crucial correspondences between written and spoken
language are at the level of lexis and grammar.

1.2 Phonetic Symbols and Speech Sounds


At first sight it may seem self-evident that what phonetic symbols denote are
speech sounds. They are often talked of in this way, but there are three major
difficulties to consider: the notion of a single discrete speech sound itself as an
identifiable object, the indeterminate complexity of speech, and the problem of
real-world extension.

1.2.1 Speech sounds as discrete segments

The notion of a single discrete speech sound, often referred to as a ‘segment’, is highly problematic in the context of spoken language. It has become com-
monplace in phonetics and phonology to regard the segment as a ‘fiction’
(Abercrombie 1965: 122, 1991: 30–1; Laver 1994: 568) and to stress the para-
metric nature of continuous speech, but the fictional status of the segment needs
some critical discussion if we are not to fall into the trap of dismissing it as some-
thing devoid of any kind of reality. It is perfectly possible to produce an isolated
steady-state vowel sound such as [a], or nasal such as [m], or fricative such as
[s], or lateral approximant such as [l], and quite feasible with some practice to
produce isolated stops of various kinds with release bursts unaccompanied by
vowels, such as [p]. These sounds can be produced by speakers and perceived
by listeners, they are discrete, and they are every bit as materially real as speech.
But we cannot meaningfully call them segments because they are not part of a
larger item: the term ‘segment’ implies ‘segment of’ an articulated structure.
When we look at the phonetic structure of speech we do not find it composed of
discrete sounds strung together, in Hockett’s (1955: 210) simile, like beads on
a string. The phenomenon of formant transitions nicely illustrates the problem
of segmentation. Experiments in speech perception have shown that information
about the place of articulation of a stop consonant is contained in the formants
of adjacent vowels as they undergo changes in frequency caused by changes in
vocal tract shape. The presence of the transitions is enough to cause listeners to
hear the stops. Formant transitions are, from an auditory-perceptual perspective,
part of the structure of the stops as much as they are part of the acoustic structure
of the vowels. The resonant properties of the transitions are vocalic but they are
encoding information about stops which are not vocalic. The form of the transi-
tion information, we might say, belongs to the vowels carrying it but the value of
the information belongs to the stop articulations causing, and being perceptually
cued by, the transitions. It is impossible to segment between the form and the
value of the information.
The ‘fiction’ that Abercrombie and Laver talk about comes from treating
speech as if it were constructed from the kind of discrete vocal sounds which
we know can exist outside of speech. But it does not take much to abstract
sounds perceptually from speech and equate their qualities with the qualities of
these discrete sounds, for example equating the vowel sound in the pronuncia-
tion of cat with an isolated [a]. We can then treat phenomena such as formant
transitions as if they result from contextual influences on otherwise discrete and
spectrally stable sounds. The fact that we can do this attests to some normalising
and integrating processes in our perceptual and cognitive systems enabling us to
identify segments in our perceptions (Repp 1981: 1462; Raphael 2005: 200–1;
and see Chapter 5 Section 5.3) and to operate with the notion ‘segment’ as a
pre-theoretical model of the kind that may have facilitated the development of
alphabetic writing.
Postulated contextual influences on putatively discrete and stable segments
are referred to in phonetics and phonology as ‘coarticulation’, a phenomenon
which Laver points out is a further fiction necessitated by the fiction of the
segment, an ‘antithetic error’ which Abercrombie (1989/1991: 31) sees as a case
of enabling two wrongs to make a right. It needs to be appreciated, though, that
in phonetics as in literature, fiction is not the same as fantasy. Analysing and
describing speech in segmental terms, and transcribing it with discrete phonetic
symbols, are based on a principled understanding of the structure of speech and
how it can fruitfully be analysed, not on unbridled invention or naïve assumption.
It may even parallel quite closely how we process the time-varying speech signal
in terms of stable percepts when we listen to it, rather than parallelling speech
production processes (see Chapter 5 Section 5.3). Nonetheless, it is absolutely
necessary to remember that symbols in a segmental transcription do not in them-
selves accurately reflect the temporal structure of speech as revealed instrumen-
tally; readers of a transcription with sufficient knowledge of phonetics will not
be misled into thinking that they do. Because we can analyse speech in terms of
segments does not, and should not, commit us to the view that it is produced in
terms of segments.
One way to align segmental transcriptions with the temporal structure of
speech is to exploit the fact that the acoustic signal can be segmented into discrete
acoustic classes (Fant 1962: 11; Barry and Fourcin 1990: 32–3, 40). The prime
acoustic classes in speech are silence, transience, aperiodicity and periodicity.
Silence occurs in the structure of speech as the acoustic correlate of the articula-
tory hold phase of a voiceless oral stop; transience occurs when there is a sudden
release of air pressure causing a single pressure pulse, for example the release
burst of a plosive; aperiodicity is found as a result of air being forced through a partial articulatory stricture under pressure to create the turbulence of fricatives characterised by the quasi-random variation of frequency and amplitude;
periodicity is characterised by regularly repeated pressure pulses of very similar
frequency and amplitude resulting from vocal fold vibration, occurring in all
voiced sounds. Acoustic classes can occur singly or in certain combinations. A
voiced fricative, for example, combines aperiodicity and periodicity; a voiced
plosive combines transience and periodicity. All in all we can set up six basic
acoustic classes: four simple ones and two compound ones. The spectrogram and
synchronised waveform of the utterance So does she keep chickens? in Figure 1.5
show how speech can be segmented into these acoustic classes. The phonetic
transcription underneath is an approximate indication of how the classes relate to
the phonetic structure of the utterance.
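How such a segmentation into acoustic classes might be carried out automatically can be sketched in a few lines of code. This is a minimal illustration only, not the procedure behind Figure 1.5: the frame size, thresholds and pitch range are assumptions made for the example, and the two compound classes are left aside.

    import numpy as np

    def acoustic_class(frame, sr):
        """Assign one analysis frame to a simple acoustic class:
        s = silence, t = transience, a = aperiodicity, p = periodicity."""
        rms = np.sqrt(np.mean(frame ** 2))
        if rms < 1e-4:
            return 's'                                  # (near-)silence
        if np.max(np.abs(frame)) / rms > 8:             # single sharp pulse, burst-like
            return 't'
        # normalised autocorrelation peak in the voice-pitch range (~60-400 Hz)
        ac = np.correlate(frame, frame, mode='full')[len(frame) - 1:]
        ac = ac / (ac[0] + 1e-12)
        lo, hi = int(sr / 400), int(sr / 60)
        peak = ac[lo:min(hi, len(ac) - 1)].max()
        return 'p' if peak > 0.4 else 'a'               # periodic (voiced) vs aperiodic

    def classify(signal, sr, frame_ms=30):
        n = int(sr * frame_ms / 1000)
        return ''.join(acoustic_class(signal[i:i + n], sr)
                       for i in range(0, len(signal) - n, n))

    if __name__ == '__main__':
        sr = 16000
        t = np.arange(int(0.3 * sr)) / sr
        silence = np.zeros(len(t))
        vowel = 0.5 * np.sin(2 * np.pi * 120 * t)       # crude periodic (voiced) stretch
        fricative = 0.1 * np.random.randn(len(t))       # turbulent, aperiodic noise
        print(classify(np.concatenate([silence, vowel, fricative]), sr))
        # prints something like 'ssssssssssppppppppppaaaaaaaaa'

A classification of this kind deals only in acoustic classes; as argued below, it loses track of linguistic-phonetic information that straddles the class boundaries.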
Further acoustic subclasses could be set up by reference to spectral and ampli-
tude discontinuities such as can be seen in Figure 1.5 at the points marked on the
waveform by the arrows. Yet further subclasses could be established on the basis
of the distribution of acoustic energy (see Turk, Nakai and Sugahara 2006 for
discussion of criteria for acoustic segmentation). If a different symbol were to be
assigned to each subclass then we could use symbols to express categories that
occur discretely and objectively in speech as segments. The reason we do not do
this may be partly because phonetic notation is still firmly rooted in the tradition
of focusing on the articulatory domain of speech (MacMahon 1996: 821), but it is
surely mostly because we would lose track of the linguistic-phonetic information

FIGURE 1.5: Segmentation of So does she keep chickens? into acoustic classes. s = silence, t = transience, a = aperiodicity, p = periodicity
which is distributed across acoustic class boundaries (see for example Fowler
1986: 11–13). This information is important because what phonetics is most
interested in is not speech as a catenation of noises but speech as the pronun-
ciation of language. There is experimental evidence that we perceive speech in
‘temporal compounds’ (Warren 2008: 198–9) which may contain many changes
of acoustic class extending over at least a whole syllable and encompassing reali-
sations of several phonemes, from which we can then ‘infer’ the presence of a
segmental structure (see Chapter 5 Section 5.3). General phonetic categories are
of interest because of how they can be put into relations with the phonological
categories of spoken language. It is more fruitful to deal in phonetic categories
that more closely match our phonological categories than in ones that refer only
to acoustic classes.
For example, it is useful if the symbol [d] can be interpreted to include formant
transitions in adjacent vowels as well as a hold phase and a burst; all these phe-
nomena and the acoustic classes in which they are embedded are relevant to
[d] as the realisation of a phonological item /d/. They may well also be highly
relevant to the stability of the auditory correlate of [d], despite considerable dif-
ferences in formant transition patterns depending on the frontness or backness of
adjacent vowels.
Further discussion of the notion of a segmental speech sound, and a defence
of its legitimacy in phonetic description, is presented in Chapter 2 Section 2.2.4.

1.2.2 Complexity of speech sounds

The second difficulty with the claim that symbols denote speech sounds is that,
even in the case of an isolated steady-state sound, the processes and events going
on are too numerous to identify. As Pike (1943: 152) has counselled, ‘no phonetic
description, no matter how detailed, is complete’. Speech is a series of overlap-
ping events taking place in articulation, aerodynamics, acoustic transmission,
auditory reception and perception, which are interlocking domains connected
in a chain of cause-and-effect relations of a complex and often non-monotonic
kind. No transcription can ever hope to denote all the events in even one of these
domains, never mind all of them, nor can any phonetician claim to know every-
thing about them all. To take a very simple example, the vowel sound transcribed
by the IPA symbol [ɑ] involves the following events (the list is by no means
exhaustive):
1. In the articulatory domain:
contractions and relaxations of the intercostal and abdominal muscles,
contraction of various intrinsic laryngeal muscles,
repeated opening and closing of the glottis,
lowering of the jaw and tongue, and
retraction of the tongue root into the pharynx.
2. In the aerodynamic domain:
movement of air up the trachea,
increases and decreases in subglottal air pressure, and
jets of air releasing into the pharynx.
3. In the acoustic domain:
rapid oscillations of countless air particles at thousands of different frequencies and amplitudes, and
the formation of a standing wave in the vocal tract with pressure and veloc-
ity nodes.
4. In the auditory domain:
rapid oscillations of the eardrum and the perilymph fluid,
repeated stimulations of the hair cells in the inner ear, and
repeated firings of many auditory nerves.
5. In the perceptual domain:
awareness in consciousness of a sound having a particular pitch, timbre
and loudness.
In transcription all these are distilled down to [ɑ], a static visual object, and it
is far from clear how we should characterise the relationship between all these
myriad events and a single symbol. We cannot describe or observe all the indi-
vidual events, even by marshalling a whole battery of instrumental techniques.
We cannot even know, at the lower levels of detail, how many events take place.
If we claim that phonetic symbols denote sounds, then we have to admit that we
do not fully know what it is they actually denote because we cannot fully know
everything about sounds. This situation is not of course unique to phonetic nota-
tion but is shared by all forms of representation – whatever is represented, we
cannot know everything about it. Our view of the thing represented is selective,
shaped by properties of our perceptual and cognitive systems, by our experiences
and by the purpose for which we wish to represent it, otherwise it would be an
exact copy, like the map in Borges’s story ‘that was of the same Scale as the
Empire and that coincided with it point for point’.5 Because phonetic symbols
express an analysis of speech, and because we can only analyse things in terms
of what we know about them, it follows that phonetic symbols cannot, at any one
time, denote anything beyond the limits of what is known about speech at that
time. It is the role of phonetic theory to systematise our knowledge of speech by
identifying the important parameters along which it varies to give rise to distin-
guishably different sound-types – place of articulation, degree of stricture, glottal
settings and so on. It is from these parameters and parameter-values that phonetic
theory constructs its models, and, as discussed in Section 1.3, it is these models
that phonetic symbols denote.

1.2.3 Speech sounds vs. analysis of speech sounds

The third and final of the three serious difficulties attending any claim that
phonetic symbols denote speech sounds concerns the problem of real-world
extension. The same problem is encountered by the claim that in language words
directly denote things (Lyons 1977: 216). Suppose we did want to use symbols
to denote actual speech sounds. We hear a sound si and denote it with the symbol
σi. We then hear another sound, sj, which to our ears sounds the same as si, but
we cannot use the symbol σi because that denotes si. Things soon get out of hand
because of the sheer numbers involved. If symbols denote individual sounds
then each symbol must denote a sound produced at a particular time and place
and no other. With this restriction all transcriptions would have to be specific
transcriptions (see Chapter 4 Section 4.1 for the distinction between specific
and generic transcriptions) and all transcriptions would have to be unique, just
as, if words denoted individual things, there could be no generic reference, only
specific reference. Furthermore, symbols in these conditions would only serve
as substitutes for sounds, needed for no reason other than that sound cannot be
put onto paper – we could instead carry round sacks of recordings in the manner
of Swift’s Lagado professors.6 Symbols in transcriptions would therefore not be
capable of embodying a phonetic analysis of the expression elements of spoken
language because they would not be denoting theoretical categories, but would
only denote specific non-equivalent events (or sets of events); they would not
even be embodying pre-theoretical analyses of the kind required to judge that two
things share some common property. Nor is it a solution to say that a phonetic
symbol in a transcription denotes the set or class of sounds of which specific
sounds are members. A set of sounds is a potentially ever-growing collection of
individual sounds simply giving us more and more of the same. It is only when
we come to consider criteria for assigning sounds to sets that we start to get
somewhere. If we assign sounds to the same set because they sound the same
then we are indeed applying a pre-theoretical analysis to recognise the similarity;
our symbol can then denote this similarity. If we have a theory that can account
for the similarity then we are applying a theoretical analysis and our symbol can
denote the theoretical category or categories in terms of which we make that
analysis. These issues are explored further in Section 1.3.

1.3 Phonetic Notation, General Phonetic Models and the Role of Phonetic Theory

The answer to the problems raised in Section 1.2 is to regard a system of phonetic
notation as a system for denoting general phonetic models. Models are either
theoretical or pre-theoretical. Theoretical models are generated by the categories
of a theory whereas pre-theoretical models are abstractions from experience and
more like prototypes in recognition memory (Johnson 2007: 30–2) or imitation
labels. If we use a symbol in the absence of a phonetic theory then we have to
find some way of defining the model it denotes without recourse to a theory.
The alternative to a theoretical definition is an ostensive definition. What «b»
denotes can be defined ostensively as the sound at the beginning of the spoken
word bee.7 Phonetic theory plays no part in such a definition. Ostensive defini-
tions can be refined into something more general and abstract by saying that «b»
is what the spoken words bee, boot, bark and so on have in common. Ostensive
definitions of this kind rely firstly on one’s having experienced the relevant
spoken words, and secondly on one’s ability to notice and abstract the relevant
similarity from them. Pre-theoretical phonetic models can therefore be defined in
terms of the commonalities shared by members of sets of known pronunciation-
forms. But there is a circularity here: the very phenomena one wishes to model
are furnishing the models to be used for modelling them. Circularity is broken if
we have an adequate phonetic theory to provide the definitions for our models
and the categories for their generation. What we think of as the sounds of speech are constellations of events whose complexity, as we have seen in Section 1.2, defies exhaustive description. To deal with this intransigence we theorise about
the most salient identifiable events marking off one distinguishable sound from
another and set them up as a network of interrelated theoretical categories, in the
dimensions of an abstract taxonomic space. The intersections of these catego-
ries generate theoretical models as shown in a simple two-dimensional space in
Figure 1.6; it should be understood that in fact there is no limit to the number
of dimensions that can be set up. The role of phonetic theory in relation to pho-
netic notation is thus crucial on two counts: it furnishes us with categories for
the analysis of speech, and it enables us to set up these categories as models in a
non-circular way. In other words, it provides the denotata for phonetic notation.
Part of the task of phonetic theory is to chart abstract taxonomic space by setting
up the kinds of dimensions and categories that observable phonetic data can be
mapped onto, to decide how many dimensions and categories are required, and
to work out which categories can and cannot co-occur.

                                    DIMENSION x
                      Category i      Category j      Category k
DIMENSION y
   Category c         Model ci        Model cj        Model ck
   Category d         Model di        Model dj        Model dk
   Category e         Model ei        Model ej        Model ek

FIGURE 1.6: Categories, dimensions and models in a small, two-dimensional, abstract taxonomic space
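The way intersecting categories generate models can be made concrete in a short sketch. The dimensions and category labels below are illustrative inventions, not an inventory proposed here; the point is simply that every cell of the taxonomic space is generated by the theory, whether or not any sound can be mapped onto it.

    from itertools import product

    # A deliberately tiny taxonomic phonetic space: the dimensions and the
    # categories on each dimension are illustrative choices only.
    dimensions = {
        'voicing': ['voiceless', 'voiced'],
        'place':   ['bilabial', 'alveolar', 'pharyngeal'],
        'manner':  ['plosive', 'nasal', 'fricative'],
    }

    # Each theoretical model is an intersection of one category per dimension,
    # generated independently of whether the sound is attested or even possible.
    models = {' '.join(combo) for combo in product(*dimensions.values())}

    print(len(models))                              # 2 x 3 x 3 = 18 models
    print('voiced bilabial plosive' in models)      # True: the model denoted by [b]
    print('voiced pharyngeal nasal' in models)      # True, although no such sound is possible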

I shall call any notation system not underpinned by a phonetic theory ‘pseudo-notation’ and its symbols ‘pseudo-phonetic symbols’; a system of
notation which is underpinned by phonetic theory I shall call proper notation
and its symbols proper phonetic symbols. ‘Pseudo’ and ‘proper’ are not to be
taken as value terms. The role of phonetic theory in relation to phonetic nota-
tion is therefore crucial. It is responsible for distinguishing between a proper
notation which qualifies as a technographic writing system with a scientific
basis (Mountford 1996: 628) on the one hand, and a pseudo-notation based
on abstraction from experienced exemplars on the other hand. Commonly
encountered forms of pseudo-transcription are respelling and transliteration (see
Section 1.5 below).
Any expression element from any glottographic writing system can be used
to represent some aspect of pronunciation on the basis of correspondences
between elements of a writing system and elements of pronunciation without
phonetic theory playing any part. This is how the rebus principle arose and
how phonography has gained ground in the diachrony of writing systems (see
Chapter 2 Section 2.2). Phonetic theory is not a prerequisite in such cases; all
that is needed is an ability to make same-or-different judgements about pronun-
ciation in a pre-theoretical manner. If phonetically untrained literate English
speakers hear a proper name they have not heard before and do not know how to
spell it, they can try to write it using the letters of the English alphabet to repre-
sent the sounds that they identify. The result will be a pseudo-transcription and
the person will have used the letters as a pseudo-notation system. It is impor-
tant to understand that they will not thereby have spelled the name whether or
not the result is the same arrangement of letters as the spelling. One can only
spell a word if one knows the spelling. If one guesses a spelling, one does so
via pseudo-transcription – witness the idiosyncratic sound–spelling relations
in proper names such as <Cholmondeley> - [ˈʧʌmli], <Keighley> - [ˈkiːθli],
<Salkeld> - [ˈsafəld] (from Wells 2008). In pseudo-transcription, sounds will
not have been identified through theoretically informed phonetic analysis, and
therefore the transcription will not be expressing such an analysis. It does,
however, express a pre-theoretical analysis of the kind needed to make similar-
ity judgements. Conversely, if a reader is presented with an unknown name in
written form, the spelling can take on the properties of a pseudo-transcription
if the reader tries to extract information about its pronunciation. The key point
about a proper phonetic transcription is that it expresses an analysis into theo-
retically defined categories. A pseudo-transcription does express some kind of
analysis, but into elements that are not theoretically defined. They will be
known through ostensive definition which by its nature relies on experience,
not on knowledge of theory. Compare, for example, the theoretical definition
of [b] as ‘voiced bilabial plosive’ and the ostensive definition of «b» as ‘the
first sound in the word bee’. Different kinds of knowledge are required to
understand these definitions and different kinds of analyses are undertaken by
applying them.
Pseudo-notation is a set of graphic resources for expressing a pre-theoretical
analysis of pronunciation, and pseudo-transcription is the deployment of a
pseudo-notation to express a pre-theoretical analysis. Proper phonetic notation is
a set of graphic resources for expressing a theoretically informed analysis, and
proper transcription is the deployment of proper phonetic notation.
Transliteration tends in practice also to be pseudo-transcription (see Section
1.5.1 below). The process by which one language borrows and adapts a writing
system from another language involves pseudo-transcription in which the expres-
sion elements are transferred into the borrowing language as pseudo-notation
(see Figure 2.1 in Chapter 2).
A distinction needs to be made between graphic resources for notation being
taken, on the one hand, entirely from an orthography and, on the other hand,
being developed or created as a special phonetic system of notation. I shall call
notation ‘proto-phonetic’ if it is based on phonetic theory but uses only ortho-
graphic resources. We therefore have three possibilities for the status of a pho-
netic notation system (see also Figure 1.9 in Section 1.7 below):

1. Pseudo-notation – denoting models not defined by phonetic theory; comprising orthographic characters which then take on the status of pseudo-phonetic symbols; enclosed in double angled brackets, e.g. «b».
2. Proto-notation – denoting models defined by phonetic theory; comprising orthographic characters which then take on the status of proto-phonetic symbols; enclosed in ornate parentheses, e.g. ﴾b﴿.
3. Proper notation – denoting models defined by phonetic theory; comprising
a special notation system of proper phonetic symbols; enclosed in square
brackets, e.g. [b].

The status of a transcription is defined by the status of the notation system in which it is written. The same glyph can be a spelling letter, a pseudo-phonetic
symbol, a proto-phonetic symbol, or a proper phonetic symbol depending on
the purpose for which it is used and how it is read and interpreted. The glyph
‘b’ can be used as the letter <b> in spelling the English written words bat, blue,
debt, climb, or as a pseudo-phonetic symbol «b» in transcribing a spoken word
perceived to contain a sound that the spoken words bee, boot, bark etc. have in
common, or as a proto-phonetic symbol ﴾b﴿ in transcribing a spoken word con-
taining a sound analysed as a voiced bilabial plosive where the symbol comes
from an orthography, or as a proper phonetic symbol [b] in transcribing a spoken
word containing a sound analysed as a voiced bilabial plosive where the symbol
comes from a phonetic notation system.
A phonetic symbol can be defined as a glyph in relation with a phonetic deno-
tatum. Proper symbols and proto-symbols can be defined formally as in (1.1)
where R is a denoting relation:

(1.1) Phonetic symbol = Glyph R theoretical phonetic model


Example: [b] or ﴾b﴿ = ‘b’ R voiced bilabial plosive

Pseudo-phonetic symbols are glyphs in relation with non-theoretical denotata such as ostensive definitions based on commonalities as in (1.2).

(1.2) Pseudo-phonetic symbol = Glyph R ostensive definition


Example: «b» = ‘b’ R what bee, bat, crab have in common
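Definitions (1.1) and (1.2) can be restated as a small data-structure sketch: a symbol is a glyph paired with a denotatum, and the kind of denotatum determines its status. The class and attribute names are invented for illustration, and the proper/proto distinction, which turns on whether the glyph belongs to a special notation system or an orthography, is only noted in a comment.

    from dataclasses import dataclass
    from typing import Tuple, Union

    @dataclass(frozen=True)
    class TheoreticalModel:
        """A model generated by phonetic theory, e.g. ('voiced', 'bilabial', 'plosive')."""
        categories: Tuple[str, ...]

    @dataclass(frozen=True)
    class OstensiveDefinition:
        """A pre-theoretical abstraction from experienced exemplars."""
        exemplars: Tuple[str, ...]

    @dataclass(frozen=True)
    class PhoneticSymbol:
        glyph: str
        denotatum: Union[TheoreticalModel, OstensiveDefinition]

        @property
        def status(self) -> str:
            # Proper vs proto would further depend on whether the glyph comes from
            # a special notation system or from an orthography (not modelled here).
            return ('pseudo' if isinstance(self.denotatum, OstensiveDefinition)
                    else 'proper or proto')

    proper_b = PhoneticSymbol('b', TheoreticalModel(('voiced', 'bilabial', 'plosive')))
    pseudo_b = PhoneticSymbol('b', OstensiveDefinition(('bee', 'bat', 'crab')))

    print(proper_b.status, '/', pseudo_b.status)   # same glyph, different denotata and status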

What distinguishes a proper symbol from a proto-symbol is that it is a member of a set of symbols which is not co-extensive with the set of orthographic letters
used for spelling a written language. The IPA symbol [b] has systematic relations
with symbols such as [ɓ] and [ʘ] which are not used for spellings; the letter <b>
has sequential relations with <a>, <c>, etc. in the order of the alphabet. Proper
symbols and proto-symbols denote analytic models whereas pseudo-symbols
tend to denote holistic prototype models.
Proper phonetic notation will not be as constrained as pseudo- and proto-
notation by limits on the graphic resources available and on the number of
distinctions among the sounds and parameters of speech that can be notated. It
ought also to be less biased towards particular languages and types of languages,
although language biases are probably always going to feature to some extent
in transcriptional practice (see Chapter 5 Section 5.11). As Ladefoged (1990:
343–4) has pointed out, ‘[o]nce a language has been learned one is living in a
room with a limited view. [. . .] Even skilled phoneticians will fail to recognise
auditory distinctions to which they are completely unaccustomed.’ It has to be
acknowledged also that special systems of phonetic notation such as the IPA
have in-built biases reflecting the linguistic context of their origins and develop-
ment (see Chapter 3 Section 3.4.5).
Once it is set up, phonetic theory generates its complex models from catego-
ries independently of experience. For example, the IPA chart generates the model
‘pharyngeal nasal’ from the categories ‘pharyngeal’ and ‘nasal’ although no such
sound is possible, and therefore no symbol has been provided for it. Obviously,
no such model could come about as a result of abstraction from experience
because no such sounds will ever have been experienced.
In so far as the models denoted by phonetic notation are constructed by
phonetic theory independently of specific languages, they are general phonetic
models. Phonetic symbols can be said to denote descriptive phonetic models
when they are used in relation to language data in transcriptions, and to repre-
sent, or refer to, those phenomena which are mapped onto the general phonetic
models.

1.3.1 Phonetic transcription as descriptive phonetic models

A phonetic notation system on its own denotes the categories and models in
terms of which analyses of pronunciation can be made. When used in a tran-
scription of speech data the theoretical models denoted by symbols become
descriptive models through having observed phenomena mapped onto them (for
the distinction between theory and description on which this approach is based
see Mulder 1975). Transcribers have to judge whether the phenomena meet the
criteria for being mapped onto a particular model (see Chapter 5 Section 5.9).
The phenomena in question may be linguistic, in the sense of realising categories
of linguistic structure such as phonemes or tones, or may be paralinguistic or
extralinguistic – the only limitation is that they must be produced by the human
vocal tract. They may also belong to any of the domains of phonetic phenomena,
of which it is useful to recognise five: articulatory, aerodynamic, acoustic, audi-
tory and perceptual (see Chapter 6 Section 6.5).
At this point we need to distinguish between denoting on the one hand, and
representing or referring to on the other hand, in relation to phonetic symbols.
A descriptive model is the conjunction of a theoretical model which is denoted
by the phonetic symbol, and certain speech phenomena which are mapped
onto the theoretical model and which are referred to, or represented by, the
symbol; these relations are diagrammed in Figure 1.7. Symbols in transcriptions
are descriptive models. Whenever I talk about phonetic symbols representing
sounds or referring to sounds in the ensuing sections and chapters it should be
understood in the way just explained. In addition to representing and referring
to sounds, symbols also express an analysis of them by virtue of the theoretical
models they denote – the representing/referring capacity is extensional, while
the analysis-expressing capacity is intensional. That is to say, a potentially
infinite number of referents can have one and the same analysis, or, in other
words, an infinite number of descriptive models can relate to a single theoreti-
cal model.
[Diagram: the THEORETICAL MODEL [a] (low front unrounded vowel) and the PRONUNCIATION PHENOMENA (sounds judged to meet the relevant criteria for the theoretical model [a]) are linked by a mapping relation; the symbol [a] denotes the former and represents the latter, the conjunction constituting the DESCRIPTIVE MODEL.]

FIGURE 1.7: The mapping of speech phenomena onto a theoretical model creates a descriptive model
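The relations diagrammed in Figure 1.7 can also be sketched procedurally. In this sketch the criterion check merely stands in for the transcriber’s judgement; nothing here is a real phonetic analysis, and all names are invented for illustration.

    from dataclasses import dataclass, field
    from typing import Callable, List, Tuple

    @dataclass
    class DescriptiveModel:
        symbol: str                      # the glyph used in the transcription
        denotes: Tuple[str, ...]         # the theoretical model (intensional)
        represents: List[str] = field(default_factory=list)   # mapped phenomena (extensional)

    def map_phenomenon(model: DescriptiveModel, phenomenon: str,
                       meets_criteria: Callable[[str], bool]) -> None:
        """Map an observed phenomenon onto the theoretical model denoted by the
        symbol, if it is judged to meet the criteria for that model."""
        if meets_criteria(phenomenon):
            model.represents.append(phenomenon)

    a_model = DescriptiveModel('a', ('low', 'front', 'unrounded', 'vowel'))

    # Any number of observed tokens can be mapped onto one and the same analysis.
    for token in ['vowel token 1', 'vowel token 2']:
        map_phenomenon(a_model, token, meets_criteria=lambda p: True)

    print(a_model.denotes)      # one theoretical model ...
    print(a_model.represents)   # ... potentially many referents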

Phonetic transcriptions, then, are composed of descriptive phonetic models. A phonetic transcription is a proper phonetic transcription if the descriptive
models derive from pronunciation phenomena being mapped onto theoretical
models and a special phonetic notation is used for writing it; it is a proto-phonetic
transcription if it is written with orthographic characters; and it is a pseudo-
phonetic transcription if the descriptive models derive from phenomena being
mapped onto pre-theoretical models of what several pronunciation-forms have in
common. Again it should be stressed that the terms ‘proper’, ‘proto’ and ‘pseudo’
are not value terms. Proper phonetic transcription is not intrinsically better than
pseudo- or proto- transcription; how good a transcription is depends on how well
it fulfils its aims and purposes. The differences are nevertheless very important
and hinge on whether there is a body of phonetic theory underpinning the nota-
tion to provide it with consistent phonetic definitions, and whether the notation
comprises a set of special symbols linked to the theory by interpretative conven-
tions such as those of the IPA.

1.3.2 Phonetic transcription as data reduction-by-analysis

Representing the myriad events of continuous speech as a linear sequence of a relatively small number of stationary graphic objects, rather than being an unfor-
tunate limitation, is precisely what makes transcription useful. It is a process
of data reduction in which the transcriber tries to make static order out of a
seeming dynamic chaos by analysing an utterance in terms of known phonetic
categories. It can furnish us with a visual record of an analysis of a particular
observed utterance by denoting the categories which, in the judgement of the
transcriber, are the most appropriate ones for mapping the phonetic phenomena
onto. Sounds as auditory events appear and disappear in an audio recording just
as they do in live speech. Although it is possible to slow playback down without
affecting the pitch of the speaker’s voice, the constantly changing signal makes
it difficult to recognise recurring patterns in a speaker’s pronunciation of the
kind a phonetician, dialectologist, sociolinguist, conversation analyst, speech
pathologist or forensic phonetician might be interested in. Patterns can be seen
much more easily in a transcription when the eye can scan the page at leisure.
But a specific transcription does both more and less than arrest the sounds of
speech as they fly by. Whereas an audio recorder simply registers whatever hits
the microphone, a transcriber has to make judgements about what hits his or her
ear and make decisions about how to represent it. Inevitably during this analytic
and interpretative process certain aspects of the raw speech signal will escape the
transcriber’s notice, or be judged not worth including in the transcription. The
transcriber’s own language background, and experience in doing transcription,
will partly determine what escapes notice and what is judged relevant. In this
sense, in addition to the impossibility of capturing all speech events, a transcrip-
tion contains less than the utterance it purports to represent. That is to say, a
narrow phonetic transcription could always contain more if more time and effort
were spent on it, though one has to recognise the law of diminishing returns. On
the other hand, a consideration of the theory-dependence of transcription leads
to the conclusion that in a crucial sense a transcription contains more than the
raw utterance contains. It contains a classification, based on the categories of
phonetic theory, of what the transcriber thinks are the relevant constituent parts
of the phonetics of the utterance. Abercrombie makes precisely this point when
he says ‘phonetic transcription records not an utterance but an analysis of an
utterance’ (Abercrombie 1967: 127). This truth should never be overlooked when
we think of phonetic transcription as a form of data reduction: the fact that it
expresses a data analysis means that it is also data-enhancing. This is the import
of Thomas Carlyle’s observation that ‘[i]n a symbol there is concealement [sic]
and yet revelation’.
Phonetic transcription helps to make spoken language more available for
further phonological analysis by, ironically, representing it in a written form. By
so doing it does, to some extent, imprison it in ‘the written language bias’ that
Linell (1982) saw in linguistics in general. For example, segmental transcrip-
tions usually take the ‘word’ as the basic unit of utterance structure and employ
the convention of bounding words with spaces despite the absence of spaces
between the pronunciations of words in continuous speech. Parametric transcrip-
tion is more faithful to speech in this respect. Nevertheless, weighing against this
written language bias is the ability of phonetic transcription to capture aspects of
the prosody of spoken language, and paralinguistic and extralinguistic features
such as voice quality, tempo and loudness, most of which have no common
parallels in written language. Although writing can use devices such as enlarged
characters, changes of case and font, different colours and so on for emphasis
and other effects, these are not systematic and are not all routinely employed
outside of advertising and graphic design. By contrast, it is impossible for spoken
language not to have voice quality, pitch, tempo and loudness, all of which are
manipulated by speakers for communicative purposes of one kind or another.
Any system of phonetic notation should provide resources for representing these
kinds of features in transcriptions.

1.4 Content of Phonetic Models


Theoretical models belong to theories not to data. It follows that the content of a
theoretical model cannot be of the same kind as the contents of data. In Chapter 6
Section 6.5 I propose that the categories of phonetic theory should be conceived
of as neutral with respect to the domains of articulation, aerodynamics, acoustics,
auditory processing and perception, despite the largely articulatory terminology
of systems such as the IPA, so that phonetic symbols are independent of these
domains whilst being interpretable within each domain through domain-specific
conventions. That is to say, the theoretical categories of phonetics inhabit first
and foremost taxonomic phonetic space, and inhabit specific domains by general
phonetic conventions.
What, then, is the content of the theoretical model denoted by, for example,
the IPA symbol [b]? According to Principle 2 of the IPA (1999: 159), it is
‘voiced, bilabial, plosive’, the categories that intersect to generate the model.
This is surely the correct way to define the content of a theoretical model so
that it can be exhaustively defined, providing that we can maintain domain-
neutrality. When we use the term ‘labial’, does it always and only refer to labial
activity, that is to say is it confined to the articulatory domain? This question is
taken up and discussed in Chapter 6 Sections 6.4 and 6.5 in relation to multi-
tiered transcriptions in which each tier takes a different perspective on the data:
speaker-oriented transcriptions take an articulatory perspective in which symbols
have articulatory interpretations, signal-oriented transcriptions take an acoustic
perspective in which symbols have an acoustic interpretation, and listener-
oriented transcriptions take an auditory-perceptual perspective in which symbols
need to be interpreted accordingly. Transcriptions expressing an interpretation
of articulatory and acoustic records have to denote, respectively, articulatory
and acoustic categories to be meaningful, likewise transcriptions expressing an
auditory-perceptual analysis. ‘Labial’ from an acoustic perspective denotes nega-
tive formant transitions and whatever else is thought to be an acoustic correlate of
‘labial’. In an impressionistic transcription ‘labial’ denotes auditory-perceptual
correlates – what labiality sounds like – and, importantly, from an articula-
tory perspective it denotes articulatory correlates rather than being defined
exclusively in articulatory terms. That is to say, phonetic transcription is better
served if phonetic categories are set up as domain-neutral with domain-specific
correlates. Historically, phonetic categories have tended to be overwhelmingly
articulation-based, which has led to problems in making and reading transcrip-
tions without direct access to articulatory data (Heselwood 2008b: 90–2).
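One way to picture the proposal is as a lookup from a domain-neutral category to its correlates in whichever domain a transcription tier is oriented to. The correlate wordings below are rough illustrative glosses drawn from the discussion above, not exhaustive phonetic definitions.

    # Domain-neutral categories with domain-specific interpretative conventions.
    CORRELATES = {
        'labial': {
            'articulatory': 'constriction formed at the lips',
            'acoustic': 'negative (falling) formant transitions and related cues',
            'auditory-perceptual': 'what labiality sounds like to the listener',
        },
    }

    def interpret(category: str, domain: str) -> str:
        """Read a category as a transcription tier oriented to a given domain would."""
        return CORRELATES[category][domain]

    print(interpret('labial', 'acoustic'))         # a signal-oriented reading
    print(interpret('labial', 'articulatory'))     # a speaker-oriented reading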
Exhaustive definition of a theoretical model does not entail exhaustive
definition of a descriptive model. While it is true that what a symbol denotes is
exhaustively determined by the structure of taxonomic phonetic space, what it
represents, or refers to, is a mixture of known and unknown real-world proper-
ties in whichever domain the transcription is oriented to. In the case of [b] the
speech phenomena we map onto this model may have many unknowns, such as
the position of the tongue-tip, the volume of the buccal chamber, the tilt of the
epiglottis, the height of the larynx and so on. Until we know everything about
speech phenomena and can structure phonetic space so finely that no detail need
ever be unaccounted for, a descriptive model in transcription will in a sense
always represent more than it denotes. This means that the analysis expressed
by the theoretical model is not an exhaustive analysis in so far as our knowledge
of the speech phenomena in question is incomplete. That is to say, we must not
mistake classifications for descriptions (O’Connor 1973: 125–8; Howard and
Heselwood 2013: 73–9). Our understanding of [b] as a descriptive model in a
transcription depends not on knowing everything it is made of as a datum, but
on knowing how it relates to other objects in taxonomic phonetic space along
certain dimensions. The question of what something is made of is a question to
be levelled at the speech phenomena which are mapped onto theoretical models,
not at the theoretical models themselves. Phonetic instruments have a pivotal
role when our ears cannot answer such questions. Their revelations can lead to
the setting up of additional dimensions in abstract articulatory or acoustic space
so that its structure becomes finer and more of the content of speech phenomena
can be mapped onto models defined in that enriched space.
Taking this view of the content of phonetic models allows us, I suggest, to
accept Ladefoged’s (1990: 338) assertion that ‘the symbols are not symbols
for phones; they are simply shorthand for what a phonologist would regard as
a bundle of features’, whilst also accepting Ashby’s (1990: 23) rival claim that
‘they represent sound types’. Accommodation of these apparently conflicting
positions is achieved if we take Ladefoged’s view to be true of the theoretical
model denoted by a symbol in a notation system, and Ashby’s to be true of the
descriptive models represented by a symbol in a transcription.

1.5 Respelling as Pseudo-Phonetic Transcription


Respelling is a strategy, used in some monolingual and bilingual dictionaries
and language teaching materials, for indicating pronunciation more accurately
than the normal spelling does. Respelling uses orthographic conventions but
regularises their correspondences with elements of pronunciation so that, as
far as possible, the same character always corresponds to the same pronuncia-
tion element. The pronunciation elements they correspond to can be thought of
roughly as phonemes, although usually no explicit phoneme theory is invoked
for identifying them. A need for respelling is often felt when spelling has become
standardised and fixed while pronunciation has continued to change. In such con-
ditions sound–spelling correspondences become more opaque and irregular so
that readers who do not know the pronunciation of the word cannot reliably work
it out from the orthography. Respellings are a means of trying to re-establish
more direct sound–spelling correspondences and maintain a transparently phono-
graphic written language. I will try now to characterise what respellings are from
a theoretical point of view.
The important question is whether respellings are best seen as a type of spell-
ing or a type of phonetic transcription. This question in effect asks if they are
expressions of written words, or analyses of the expressions of spoken words.
Expressions of written words have the function of enabling the reader to recog-
nise those words via their written form. Respellings, it could be argued, do not
have this function because the word has usually already been identified by its
normal spelling. It could only clearly be said to have a word-identifying function
if it were replacing the conventional spelling as part, for example, of a spelling
reform programme. The purpose of the respellings we are considering is to give
the reader a better idea of the pronunciation of the item than the normal spell-
ing provides. But how far can it be said to embody explicitly an analysis of the
spoken form? The analysis embodied in a proper phonetic transcription relies on phonetic theory for its recovery. A reader with no knowledge of phonetic theory cannot recover that analysis. Yet some analysis of sound–spelling cor-
respondences in the language has to have taken place to decide which letters
should be used in the respelling. Analysis into sound-types of the kind required
for phonographic writing systems is therefore presupposed. We can characterise
this awareness of sound-types as pre-theoretical phonetic knowledge and char-
acterise respellings as embodying a pre-theoretical analysis of pronunciation.
Being pre-theoretical, it has no explicit classificatory framework within which
to make its analysis, whereas proto- and proper phonetic transcriptions do.
Respellings are in effect transcriptions made outside of any theoretical phonetic
framework and qualify as pseudo-transcriptions as defined in Section 1.3 above.
The orthographic resources used in respelling therefore take on the status of a
pseudo-phonetic notation.
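What the regularisation behind respelling amounts to can be shown with a toy mapping from pronunciation elements to fixed respelling units; the units chosen here are invented for the example and are not taken from any particular dictionary’s system.

    # Each pronunciation element is always rendered by the same character sequence.
    RESPELL = {'ʧ': 'ch', 'ʌ': 'u', 'm': 'm', 'l': 'l', 'i': 'ee'}

    def respell(phonemes):
        """Render a (pre-theoretical) phonemic analysis with regularised spellings."""
        return ''.join(RESPELL[p] for p in phonemes)

    # <Cholmondeley>, pronounced [ˈʧʌmli], respelled transparently:
    print(respell(['ʧ', 'ʌ', 'm', 'l', 'i']))   # -> 'chumlee'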

1.5.1 Transliteration as pseudo-phonetic transcription

Transliteration is defined by Coulmas (1996: 510) as the ‘one-to-one conversion of the graphemes of one writing system into those of another writing system’. It
involves replacing the expression elements of written language signs with a dif-
ferent set of expression elements, e.g. writing English words using Arabic letters,
or Hindi words using Japanese syllabograms. The English and Hindi words still
have to be recognisable as English and Hindi words but they no longer wear
their normal clothing because the spelling–sound correspondences of Arabic
and Japanese have been transferred into the writing of English and Hindi. The
conversion cannot proceed without reference to the pronunciation of both of the
languages involved. Examination of an example of the English word boot trans-
literated into Arabic characters will make this clear.
If someone with sufficient knowledge of English and Arabic is asked to trans-
literate English <boot> into Arabic characters they are very likely to write it as
<بوت>.8 There is nothing about <ب> to suggest it is the appropriate character
to transliterate <b>, and the same is true of the other characters. The characters
are chosen not because of any intrinsic properties they have linking them to
the English characters (although as it happens there may be distant historical
links – see Gardiner 1916) but because they have correspondences with closely
comparable phonemes in the two languages. The English letter <b> corresponds
to the English phoneme /b/, exceptions such as debt and comb notwithstanding,
and the Arabic letter <ب> corresponds to the Arabic phoneme /b/; the English
digraph <oo> corresponds mostly to English /uː/, and Arabic <و> corresponds
to Arabic /uː/ (also to /w/); English <t> corresponds to the English phoneme /t/
and in written Arabic <ت> corresponds to /t/. It is these correspondences that
determine the form of the transliteration. In fact, there need be no reference
to the English spelling at all. When <ب> is used in writing English boot it is
in effect a transcription of spoken English /b/. If it is carried out outside of a
phonetic theory it is a pseudo-transcription which can then function as a respell-
ing, or even as a first spelling if the language has not previously had a written
form. This is the principal process by which writing systems are adapted for
writing other languages, a process that has been repeated many, many times in
the history of human literacy. Most transliteration, then, is a process of pseudo-
transcription which can become established as a spelling or a respelling. That is
to say, it can function as the expression of the written sign as well as expressing
a pre-theoretical analysis of the corresponding spoken sign. The two functions,
spelling and pseudo-transcription, will share the same glyphs unless and until the
spoken sign is affected by pronunciation changes without corresponding changes
in spelling.
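The route through pronunciation can be made explicit with a small sketch: the spelling of the source word is mapped to phonemes, and the phonemes to letters of the target script via that script’s own sound–spelling correspondences. The correspondence tables are tiny illustrative fragments covering only the boot example.

    # Fragmentary correspondence tables for the <boot> example only.
    ENGLISH_SPELLING_TO_PHONEME = {'b': 'b', 'oo': 'uː', 't': 't'}              # <boot> -> /buːt/
    PHONEME_TO_ARABIC_LETTER = {'b': '\u0628', 'uː': '\u0648', 't': '\u062A'}   # ب, و, ت

    def pseudo_transcribe(spelling_units):
        """Go from source spelling units to phonemes, then to target-script letters;
        the result can then function as a respelling of the loanword."""
        phonemes = [ENGLISH_SPELLING_TO_PHONEME[unit] for unit in spelling_units]
        return ''.join(PHONEME_TO_ARABIC_LETTER[p] for p in phonemes)

    print(pseudo_transcribe(['b', 'oo', 't']))   # the letters for b + uː + t in Arabic script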
The fact that the process we have been considering can be carried out on
unwritten languages as well as on written languages rather shows the term
‘transliteration’ as we have used it so far to be a misnomer. This is clear also in
cases of ‘transliterating’ logograms. Transliteration of a logogram into elements
of a phonographic writing system can only be done through reference to the
pronunciation in spoken language of the word represented in written language
by the logogram. For example, the Chinese character <下> meaning ‘below’ is
written as <xià> using the roman alphabet Pinyin system. The choice of which
roman letters to use is not determined by any sound–spelling correspondences
to be found in the relationship between spoken / ˋɕiɛ / and written <下> (see
Section 1.1.1). Clearly the choice of letters in the Pinyin spelling is determined
instead by properties of the spoken form with reference to the kinds of sound–
spelling correspondences that the letters <xia> take part in in languages that use
roman letters. It is therefore really a case of pseudo-transcription functioning as
a respelling. Figure 1.8 diagrams the process.

FIGURE 1.8: Transliteration as pseudo-transcription and respelling. [Diagram:
three signs are shown, each with a content and an expression element – the
spoken sign (BELOW / ˋɕiɛ), the Pinyin written sign (BELOW / xià) and the
logographic written sign (BELOW / 下). The spoken expression is transcribed
as the Pinyin spelling, which in turn serves as a respelling of the
logographic written sign.]

An example of transliteration in the other direction, from roman letters to
Chinese logograms, also makes it clear that transliteration makes reference to
pronunciation. In the People’s Republic of China the three-syllable foreign word
Obama (the surname of the president of the USA at the time of writing) is written
in Chinese using the syllabograms <奥 巴 马>, corresponding from left to right
to the Pinyin spellings ào bā mǎ. The syllabograms originated as logograms
for ‘mysterious’, ‘adhesive’ and ‘horse’ respectively. The choice of Chinese
characters is determined by a matching of pronunciation, not letter–character
equivalences.
It is clear from the above discussion and examples that transliteration as
usually practised is not, strictly speaking, transliteration. That is to say, one
cannot simply list the characters of two writing systems, find some criteria such
as positions in the lists or visual similarity for pairing them up, and expect the
result to make linguistic sense. Even if one uses criteria based on pronunciation
the result may not be satisfactory. For example, the English postalveolar frica-
tive /ʃ/ corresponds in spelling to the English digraph <sh>, but Arabic /ʃ/ cor-
responds to the Arabic grapheme <‫>ش‬. A strict transliteration into Arabic of the
English orthographic form <shy> would be < > using the one-to-one conver-
sions <s> → <‫>س‬, <h> → <‫>ه‬, <y> → <‫>ي‬, but this would not be regarded as a
helpful way to write the English word shy using Arabic spelling.9 The preferred
solution < > is a respelling via pseudo-transcription (the diacritic <◌َ> corre-
sponds to a short /a/ vowel).
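
The difference between the two procedures can be sketched as two small mapping
functions. The sketch below is merely illustrative: the tables are partial, the
function names are invented for the example, and the rendering of the English
diphthong /aɪ/ as <اي> is an assumption, not a claim about the preferred Arabic
respelling.

    # Illustrative sketch: strict transliteration maps letters one for one,
    # with no reference to pronunciation; pseudo-transcription maps units of
    # pronunciation onto letters instead.
    letter_to_arabic = {"s": "س", "h": "ه", "y": "ي"}
    phoneme_to_arabic = {"ʃ": "ش", "aɪ": "اي"}   # /aɪ/ pairing is an assumption

    def strict_transliterate(spelling):
        # one roman letter in, one Arabic letter out
        return "".join(letter_to_arabic[letter] for letter in spelling)

    def respell_from_pronunciation(phonemes):
        # phonemes, not letters, are the units that get mapped
        return "".join(phoneme_to_arabic[p] for p in phonemes)

    print(strict_transliterate("shy"))              # <s>, <h>, <y> converted separately
    print(respell_from_pronunciation(["ʃ", "aɪ"]))  # the digraph <sh> plays no role

The point of the sketch is simply that the second function never consults the
English spelling at all, which is why the digraph <sh> causes it no difficulty.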
Strict transliteration does, however, have its uses. One of these uses, ironically
enough, concerns phonetic notation systems. For example, Ellis (1869: 15) presents
tables showing one-to-one equivalence between Bell’s organic notation and his own
palaeotype symbols, and MacMahon (1996: 837) adds equivalent IPA symbols. A
symbol in a cell in one table is equivalent to the symbol in the corresponding cell
in the other table, e.g. Bell’s symbol [] is equivalent to Ellis’s [sh] and IPA [ʃ].
Dobson’s edition of Robert Robinson’s Art of Pronuntiation (1617) transliterates
Robinson’s invented symbols into a mixture of IPA symbols and English ortho-
graphic letters (see discussion in Dobson 1957: xi–xv); Robinson’s symbol [ƨ]
transliterates as the IPA symbol [u], for example. ‘IPA Braille’ is another and very
recent example (see Englebretson 2009; and see Chapter 3 Section 3.4.7), in which
every symbol on the IPA chart has an equivalent braille form, as is the SAMPA
notation for use in emails (Wells 1995b; and see Chapter 3 Section 3.4.10). The cri-
terion for transliteration of phonetic notation is that the symbols denote comparable
models and can represent the same pronunciation phenomena.
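
Such transliteration of notation amounts to a symbol-for-symbol table lookup.
The following minimal sketch assumes a handful of standard SAMPA pairings and
an arbitrary example word; the function name is invented for the illustration.

    # Illustrative sketch: strict transliteration between notation systems is
    # a one-to-one symbol substitution, legitimate because the paired symbols
    # denote comparable phonetic models. Only a few pairings are listed.
    ipa_to_sampa = {"ʃ": "S", "ʒ": "Z", "θ": "T", "ð": "D",
                    "ŋ": "N", "ə": "@", "ɪ": "I"}

    def transliterate_notation(ipa_symbols):
        # unpaired symbols such as [p] pass through unchanged, since SAMPA
        # keeps the roman letters of the IPA with their usual values
        return "".join(ipa_to_sampa.get(s, s) for s in ipa_symbols)

    print(transliterate_notation(["ʃ", "ɪ", "p"]))   # IPA [ʃɪp] -> SAMPA "SIp"

A conversion in the reverse direction would simply invert the table; nothing in
the process requires reference to any particular pronunciation being transcribed.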
Further uses for strict transliteration are in the classification of documents and
other bibliographic control measures (Wellisch 1978: 31–7), and for assigning
keystrokes on a keyboard to characters other than standard keyboard characters.
For example, the Wǔbǐzìxíng method of typing Chinese characters, also known
as Wubi, or Wang Ma, assigns keys to characters on the basis of a character’s
stroke-structure, not its pronunciation. The two-character word <银行> meaning bank
(financial) is entered by typing ‘qvtf’, which is a kind of transliteration process
although nobody would use <qvtf> as a spelling for writing the word. The Q key
is used for the left-hand part of the <银> component because it is in the area of
the keyboard for characters with strokes falling to the left, while the V key is in
the area for characters with a hook stroke so is used for entering the right-hand
part; the left-hand and right-hand parts of the <行> component are assigned to
the T and F keys because of left-falling and horizontal strokes respectively. The
pronunciation is represented in the Pinyin pseudo-transcription spelling <yín
háng>, which has no relationship at all to the assigning of keystrokes.


1.6 Orthographic Transcription


When a piece of spoken language is written down using spelling conventions it
is not the expression elements of spoken language which are being transcribed;
rather it is the expression elements of the corresponding written language which
are being written. There is therefore a process not unlike translation taking place
in which the spoken language is in a sense translated into written language (see
Section 1.1.4). An orthographic transcription of a word will be the same regard-
less of how pronunciation of the word might vary within and across speakers,
because it is the orthography that determines how the words will be written,
not their pronunciation. In many varieties of English the phonetic form [ɹoʊd]
(or something similar) will occur three times in Jane rode down the road and
rowed across the river, but it will correspond to three different spellings in an
orthographic transcription because there are three different words, each with its
own sequence of letters. To carry out an orthographic transcription of spoken
language one has to recognise the words and know how to spell them (and rec-
ognise the grammar and know how to punctuate it), but one does not have to do
any phonetic analysis of the spoken forms.
The road–rode–rowed example shows the influence of morphology on
English orthography. The -ed past tense inflection distinguishes the weak verb
row from the strong verb ride as well as from the noun road. In addition to the
effects of historical changes in pronunciation, it is the intrusion of morphology
into orthography that makes English spelling appear illogical to anyone who
thinks the job of spelling is to represent pronunciation. English spelling is best
categorised as morpho-phonographic in so far as morphemes that have alterna-
tions in spoken language tend to have a single invariant spelling in written lan-
guage. This is particularly true of inflectional morphology such as plural -s, third
singular indicative -s, possessive -s, past tense -ed and past participle -ed, and only
marginally less true of stem morphology where some exceptions can be noted:
witness the spelling differences in maintain~maintenance, rigour~rigorous, for
example. We will see in Chapter 4 Sections 4.6 and 4.9 how a quest for phono-
logical invariance in spoken language can affect how transcriptions are made and
interpreted.
Orthographic transcription of a more logographically oriented language such
as Chinese makes even more obvious the fundamental difference between tran-
scribing the expression elements of spoken language and writing the expression
elements of the corresponding written language. It is impossible to make an
orthographic transcription of spoken Chinese using traditional Chinese charac-
ters unless one recognises the lexical items and knows the characters with which
they are written. Perhaps a little surprisingly, the same is in fact true of any
language, no matter how phonographic its orthography might be. If we do not
know a word, or do not know its written form, and we write it on the basis of
knowledge of its pronunciation, then we are not transcribing it orthographically
but making a pseudo- or proto-transcription using the orthographic resources as
a pseudo- or proto-notation. What we write may be identical to an orthographic
transcription, but it will have come not from knowledge of how to spell that
particular word but from knowledge of general sound–spelling correspondences.

Failure to appreciate this point has led sometimes to misuse of the term ‘ortho-
graphic transcription’, at least from the point of view of the distinction between
transcription and spelling that I have been at pains to draw. In Guendouzi and
Müller (2006), for example, the term is used to cover what I would describe as
the employment of orthographic resources in pseudo- or proto-transcription. The
authors are concerned with producing, for clinical purposes, accurate transcripts
of the spoken language of speech and language therapy clients so that these can
be analysed in a largely Conversation Analysis framework. It is therefore very
useful for the transcripts to represent aspects of speech behaviours such as voice
quality and tempo. When the authors say that an orthographic transcription ‘has
to be detailed and as faithful as possible to the data at hand’ (Guendouzi and
Müller 2006: 36), they are moving away from translation of spoken into written
language and moving towards representing aspects of the expression elements of
spoken language. An orthographic transcription leaves no room for variation of
detail or faithfulness – it translates the grammar and lexis of a piece of spoken
language (an utterance) into written language by adhering to spelling practices.

1.6.1 Interpretation of spellings and transcriptions

If we consider the question of interpretation of characters in spellings and
symbols in transcriptions, the lack of analogy between spelling and transcrip-
tion ought to be apparent. The symbols used in phonetic transcription have to be
interpreted as standing for something, which is not true of individual characters
used in spellings. Proto- and proper symbols denote theoretical models and
therefore are interpreted in terms of those models and the dimensions of phonetic
space that define them. But when we see the letter <g> in the written form of
the English words goat, ghost, gaol, sign, badge, cough, weigh and so on it is
pointless to ask what it denotes or represents, or how to interpret what it means;
indeed, it is not in the least clear that it has anything in common across this set of
written words beyond its graphic form, being called ‘Gee’ and being numbered
seventh in alphabetical order. Any further synchronic interpretation is likely to be
no less fanciful than Clarence’s assertion, on the authority of a wizard, that G is a
disinheritor.10 A literate user of English only needs to know which words contain
it in their spelling and where to put it, or where to expect it when reading. It is not
even necessary to know explicitly how it corresponds to units of pronunciation,
although literate language users do have some explicit knowledge of sound–letter
correspondences which enables them to attempt pronunciations of newly encoun-
tered written words and to make a stab at spelling newly encountered spoken
words; proper names commonly pose these problems. The essential point here
is that the letter <g> is not there primarily to supply information about how to
pronounce the words. The primary function of the arrangement of letters in spell-
ings is to identify words the pronunciation of which will already be known. The
interpretative process in relation to spelling is primarily at the level of lexis and
grammar. In conclusion, we can make the general statement that the characters
of written language do not denote anything at all except their function as distinct
characters in an orthography. That is to say, they have only a self-signifying
function in writing system scripts.

By contrast, when it comes to seeing the symbol [ɡ] in a phonetic transcription
we do need to know how to interpret it. What it denotes, every time it is used,
is a model fixed and defined by phonetic theory comprising an intersection of
particular categories in phonetic space. We need to understand the categories to
understand the symbol, and know what kinds of phenomena can be mapped onto
the model comprising them. This point is perhaps clearer when there is no close
resemblance between a symbol and a character in a phonographically oriented
writing system, for example the IPA symbols [ʘ] and [ʢ]. The symbol [ɡ] in
the IPA notation system denotes a theoretical model which may be defined as
a ‘voiced posterodorso-velar plosive’. Each term in the definition can only be
properly interpreted through knowledge of the phonetic theory underpinning
these categories. The category ‘voiced’ is a category onto which can be mapped
vocal fold vibrations of the modal type which, according to current understand-
ing, involves aerodynamic-myoelastic action throwing the true vocal folds into
sustainable quasi-periodic vibration; ‘posterodorso-’ is a category onto which can
be mapped actions of that part of the dorsum of the tongue lying opposite the soft
palate and identified as the active articulator; the category ‘velar’ is for mapping
involvements of the soft palate identified as having the role of passive articula-
tor; ‘plosive’ is a category onto which can be mapped the complex sequences of
events in which intra-oral pressure is manipulated and converted into transient
acoustic energy. Each constituent category thus has a necessary connection with
particular parts of a comprehensive theoretical account of how speech is pro-
duced by the human vocal tract, an account which at its deeper levels draws on
theoretical knowledge from the disciplines of anatomy and physiology, aerody-
namics and acoustics. Full interpretation of proper phonetic notation and proper
phonetic transcription is therefore heavily theory-dependent, which is not the
case with characters such as alphabetic letters, syllabograms or logograms. While
the latter two invoke the concepts of syllable and word respectively, theoreti-
cal understanding of these concepts is not a requirement for literacy. To enable
correct use of any phonetic notation system, a set of conventions for its interpre-
tation must be supplied, defining what the symbols denote.
This brings us to consideration of pseudo-phonetic notation and pseudo-
phonetic transcription. What is the interpretative process when «g» is used as a
pseudo-phonetic symbol? Because it is a pre-theoretical model its interpretation
is not dependent on any body of theory. Instead it is dependent on experience of
pronunciation-forms containing phenomena that map onto «g», a pre-theoretical
model abstracted from experiencing what word-forms such as goat, again, bag
and so on have in common, though not gnat, sign, badge. We can think of «g»
as an imitation label for a particular type of sound which we can recognise and
repeat.
Spellings can be read as transcriptions and vice versa, as we have seen in
Section 1.5 on respelling and transliteration. But it should always be borne
in mind that spellings are expression-forms in written language which can be
put into correspondence with expression-forms in spoken language, whereas
transcriptions, whether pseudo, proto- or proper, represent analyses of spoken
language expression-forms through denoting pre-theoretical or theoretical
models.

1.7 Status and Function of Notations and Transcriptions

The status of a phonetic notation system, and of transcriptions made with it, is
crucially dependent on its relationship to a body of theoretical phonetic knowl-
edge and on the graphic resources available. A further factor in assessing status
is whether a transcription is specific or generic (see Chapter 4 Section 4.1).
Figure 1.9 illustrates this classification. It is possible for the status to be differ-
ent for a transcriber and a reader of a transcription depending on their level of
familiarity with phonetic theory.

FIGURE 1.9: Classification of phonetic notation and transcription in terms
of status. [Diagram: notation and transcription divide into pseudo-, proto-
and proper. Pseudo- and proto-notation denote models established through
abstraction from experience and are notated with orthographic characters
which become pseudo- or proto-phonetic symbols (pseudo- in « », proto- in
( )); proper notation denotes models established by phonetic theory and is
notated with phonetic symbols in [ ]. These may function as general phonetic
models, which are not in a mapping relation with phonetic data and denote
only phonetic models/categories, or as descriptive phonetic models, which
are in a mapping relation with phonetic data and represent an analysis of
data in terms of phonetic models/categories. Descriptive models are either
specific transcriptions, in which the data are from a single observed
pronunciation, or generic transcriptions, in which the data are from an
indefinitely large class of observed and/or postulated pronunciations.]

Function refers to the purpose to which a transcription is put by a tran-
scriber or a reader. The most common function of a transcription is probably to
express an analysis of pronunciation, whether specific or generic (see Chapter 4
Section 4.1). This is a passive function in so far as it does not influence pronun-
ciation but is providing knowledge about pronunciation. However, we shall see
in Chapter 4 Section 4.13 that transcriptions can have active functions as perfor-
mance scores and prescriptive models.
The various functions of transcriptions can be used in different contexts, some
of which, such as lexicography, language teaching, speech therapy and conversa-
tion analysis, have already been mentioned. These contexts will be revisited along
with other contexts such as dialectology and forensic phonetics in Chapter 7.

Notes
1. For the first two chapters of this book I shall be using the term ‘phonetic transcription’
in a wide sense without distinguishing between broad and narrow, impressionistic and
systematic, or even between phonetic and phonemic, except where such distinctions
are explicitly indicated.
2. The terms morphography or morphemic writing are sometimes used where the unit
represented is a morpheme rather than a word. I shall use logography to include mor-
phographic writing unless otherwise stated.
3. Saussure conceived of it as a ‘sound image’ in the speaker’s mind.
4. This character derives historically from the incorporation of the phonetic element 卜
having the phonetic value [bǔ], which bears no relation to / ˋɕiɛ/, the phonological
form corresponding to 下 in modern Mandarin Chinese. It is one of the approximately
33 per cent of characters in written Chinese that do not have any components that cor-
respond to any elements of the spoken form of the word (DeFrancis 1989: 110–12);
it thus truly qualifies as a logogram.
5. ‘On Exactitude in Science’ in Jorge Luis Borges (1975), A Universal History of
Infamy, London: Penguin.
6. In Swift’s Gulliver’s Travels, part III, ch.V: ‘since Words are only Names for Things,
it would be more convenient for all Men to carry about them, such Things as were
necessary to express the particular Business they are to discourse on’.
7. I shall use double angle brackets to enclose symbols representing pre-theoretical
models, and square brackets to enclose symbols representing theoretical models.
8. The direction of Arabic writing is from right to left.
9. <ه> is the isolated form.
10. Shakespeare’s Richard III, act 1, scene i.

2
Origins and Development of Phonetic Transcription

2.0 Introduction
In Chapter 1 I described proper phonetic transcription as a technographic form
of writing in which the symbols have phonetic definitions supplied by phonetic
theory. In this chapter I will look at how writing became available as a means
of representing pronunciation and consider the rise of the discipline of phonetics
as a means of analysing and describing it. I will then attend to how writing and
phonetics have come together to provide the practical and theoretical resources
that have enabled proper phonetic notation and transcription to develop. Going
back through history it is apparent that these resources have arisen independently
in different cultures and periods, and that what I call pseudo-notation and pseudo-
transcription have been widespread in the transmission and adaptation of writing
systems. Proper phonetic notation and transcription require phonetic theory
and analysis, and have therefore not been so widespread. They did, however,
develop in the work of the phoneticians of ancient India and Greece, among the
medieval grammarians of the Middle East, and among the spelling reformers
of Renaissance and Early Modern Europe. But it was not until the nineteenth
century that phonetic notation started to become systematically separate from the
characters of written language, and transcription systematically and conceptually
separate from spelling.

2.1 Representation of Pronunciation in Writing Systems


Whether or not writing has been language-dependent from its very beginnings
is partly a matter of definition. Systematic use of visual marks may have started
independently of language as a means of expressing extralinguistic meanings
and concepts directly rather than as a means of identifying language-specific
words. A modern-day example is the use of a sign such as to warn of danger.
It will be read very differently depending on the language of the reader – English
‘danger!’, German ‘Achtung!’, Spanish ‘¡peligro!’ etc. – but it can also be read
differently in the same language, e.g. ‘hazard!’, ‘be careful!’, ‘keep away!’ etc.,
because it represents a concept or set of concepts, not a word. It is technically
a semasiogram; it expresses a meaning independently of any particular lan-
guage, as do mathematical symbols. Some scholars are happy to call this kind
of graphic communication writing, while others prefer to call it proto-writing or
partial writing, or exclude it from writing altogether (see critical discussions in
Sampson 1985: 29–32; Harris 1986: 57–75; DeFrancis 1989: 3–64; Boone 2004:
313–17). Because it is not tied to any specific language we can classify it as ‘non-
glottographic’. Pronunciation can start to be represented once writing has become
glottographic and takes on the function of expressing language-specific words in
visual form. Because written words have spoken equivalents expressed through
pronunciation, it becomes possible to link the visual marks of writing with
recurrent aspects of pronunciation and to systematise these links into explicit
sound–spelling correspondences. Once this happens the resources are there for
pseudo-notation and pseudo-transcription. One question for us is when and how,
and also why, the conditions for this have arisen in the history of writing.

2.2 Phonographic Processes in Writing Systems


Phonographic writing could not have come into existence without some kind of
analysis of pronunciation, albeit of a pre-theoretical kind. Characters of written
language take on, in addition to their status as written language expression
elements, the status of pseudo-phonetic symbols representing properties that
auditory-perceptual experience suggests are shared by the expressions of differ-
ent spoken words. Historically, these properties have been at various levels: the
whole word-form, the syllable, the segment, or segment constituent.

2.2.1 The rebus principle

A simple kind of analysis of pronunciation is that which enables homonymic
relations to be established. It is on this kind of analysis that the rebus principle
rests. Supposing we had in English the logogram <۲> for the word rye. We could
use it to represent the homophonous word wry as well. To recognise homo-
phones one has to pay attention to the pronunciations as well as the meanings
of the words and be able to notice that they sound the same, although without
necessarily being able to give any sort of phonetic account of the similarity. The
judgement as to the sameness of pronunciation only need be holistic for rebus
writing, so there is no call for analysis of the pronunciation into any constituent
parts and no notion of a ‘speech sound’ other than the sound-impression of the
spoken word-form as a whole. Punning exploits homophony, occurring among
non-literate as well as literate speech communities. Rebus writing is an early step
in the phonographic orientation of writing although, as Harris (1986: 67) points
out, it is still logographic. In the above example the word wry is represented by
the logogram <۲> just as much as is the word rye, but the choice of that particu-
lar logogram is made by reference to pronunciation and is therefore phonetically
motivated, whereas logograms themselves typically have semantically moti-
vated origins, although these may become opaque over time as has happened in
Chinese (Sampson 1985: 150; see examples of diachronic change in Li 1992).
The process by which <۲> would extend to be the expression of the written word
wry is a process of pseudo-transcription in which ‘۲’, at least temporarily, has
the status of a pseudo-phonetic symbol, «۲», representing an abstraction of what
the pronunciations of rye and wry have in common. It would not have this status
when used as the expression of the written word wry, having instead the status
of a character.
Baines (2004: 163) cites studies advancing the claim that both logographic
and phonographic writing of Ancient Egyptian are exemplified in archaeological
finds dating from the late fourth millennium bce, and that the rebus principle
may already have been employed at that time. These finds from the site known
as tomb U-j at Abydos in Upper Egypt might be the oldest language-dependent
writing that we know about. If the archaeologists’ interpretations of the U-j finds
as reported in Baines (2004) are accurate, then the rebus principle may be as old
as writing itself, in which case phonography has been present in glottographic
writing since its beginnings.

2.2.2 Syllabography

If <۲> started to be used in the writing of all words containing the syllable [raɪ]
in corresponding spoken words – writing, riding, ripen, arise and the like – then
it would correspond recurrently and systematically to that syllable and could be
used as a pseudo-phonetic symbol to transcribe it. The invention of syllabograms
requires analysis of pronunciation at a deeper level than rebus writing. Instead of
the judgement of sameness being made over whole words it has to be made over
syllables, therefore requiring segmentation of speech into syllables even if sylla-
ble boundaries are not precisely or consistently established and there is no formal
definition of a syllable. The real significance of this only becomes apparent in
the context of polysyllabic words in which constituent syllables are themselves
meaningless, having no semantic or grammatical content. The pseudo-phonetic
symbol «۲» is now available to represent that abstracted spoken syllable on its
own, as an expression element divorced from content. The pre-theoretical model
denoted by the symbol can be defined ostensively as what pronunciations of
rye, wry, writing, arise and so on have in common. Divorcing expression from
content, it could be argued, is the single most important step that has to be taken
in order for any form of phonetic notation to develop. In languages where words
are generally monosyllabic, such as Chinese, it may not be so obvious that
expression can be divorced from content because all occurring syllables will be
word-forms. This might account for why written Chinese is not as phonographic
as most other written languages (Robertson 2004: 34). Although most compound
characters in Chinese consist historically of a ‘phonetic’ and a ‘signific’, i.e.
one character present for its spoken expression value and another for its content
value, the logic underlying the structure of compounds has lost its systematicity
due to three thousand years or more of pronunciation changes (Sampson 1985:
156).
Syllabography has arisen historically in contexts of what Wellisch (1978) calls
script conversion, using the writing system of one language to write another.
Script conversion often involves the reinterpretation of spelling elements such
that they change their relationships of correspondence with the spoken language
in a phonographic direction. For example, as we saw in Chapter 1 Section 1.1.4,
the Akkadians and Japanese adapted, respectively, Sumerian and Chinese logo-
grams as syllabograms. The fact that script conversion tends to increase pho-
nographic orientation may be responsible for the view, current until relatively
recently and articulated particularly by Gelb (1969: 200–5), that there is some
teleology at work guiding the development of writing from hazy beginnings in
pictography to the polished clarity of alphabetic writing. This view has been
heavily criticised by Harris (1986), Olson (1994) and Coulmas (2003: 197–8) and
is hard to reconcile with a number of facts, chief of which is the observation that
most languages are written using stable mixed systems of writing in which logo-
graphic and phonographic elements co-exist. Akkadian happily continued to use
Sumerian logograms as xenograms alongside syllabograms derived from logo-
grams, and Japanese continues to do the same with its Chinese logogram-derived
kanji (Coulmas 2003: 74), although kana spellings are gradually replacing kanji
in some morpheme and word classes (Nomura 1988, cited in Smith 1996: 210).

2.2.3 The acrophonic principle

Acrophony takes a logogram or syllabogram and uses it to correspond to the first
sound in that word or syllable; it can then be used in the spelling of any word
containing that sound in its pronunciation. For example, our logogram <۲> for
the word rye could be used to correspond to the initial [r]; we could then use it
in the spelling of red, crab, berry and so on. It therefore takes pre-theoretical
phonetic analysis of pronunciation further than syllabography and provides the
means to represent speech as a segmental structure below the level of the sylla-
ble. Once speech is seen as segmental, and the segments are associated with indi-
vidual characters, they become objects with an abstracted existence of their own;
written characters, in addition to spelling words, can then take on the function of
representing these segments independently of the words they occur in, and we
have the conditions for a pre-theoretical kind of segmental pseudo-notation. The
character can be seen as denoting a pre-theoretical model abstracted from what
we perceive the spoken forms of red, crab, berry and so on to have in common.
Acrophony thus involves establishing an initial sound and separating it from
the rest of the pronunciation-form. The Ancient Egyptian consonantal signs, in
use by 3000 bce, came about through acrophony (Sampson 1985: 78) coupled
with the need to be able to write proper names, particularly foreign ones.
Segmental pseudo-transcription can therefore be said to date from at least this far
back in history, at least with respect to consonants. Examples of the manipulation
of expression elements as objects independent from the words they are used to
spell can be seen in the Early Dynastic inscriptions of Ancient Egypt. It is not
known if they were ever pronounced, but their significance lies in the conceptual
and physical separation of expression from content without which the develop-
ment of any form of phonetic notation and transcription would not be possible.
Centuries later, the Chinese in the third century ce developed fǎnqiè, a kind
of acrophonic procedure in which characters could be used for their syllable
onset values and others for their syllable rhyme values; writers could thus create
nonsense words by combining them to write non-occurring syllables in a pseudo-
phonetic transcription. Except for explicit phonetic analysis of tones, the Chinese
did not develop phonetic analysis and classification beyond division into onsets
and rhymes until phonetic scholarship came in from India some centuries later
(Halliday 1981: 131–5). Once phonetic analysis was incorporated as a result
of Indian influences, the syllabogram characters in fǎnqiè could be regarded as
changing their status from pseudo-phonetic to proto-phonetic notation.

2.2.4 The notion ‘segment’ revisited

In Chapter 1 Section 1.2.1, the notion of a speech sound as a discrete segment
realising a discrete phonological element was critically examined. We need
to return to it here in the light of the claim that the notion is dependent on the
prior existence of an alphabetic writing system. This claim has been advanced
by Faber (1992) using psycholinguistic evidence from studies of reading ability
alongside evidence from the history of writing. It sits comfortably with other
claims by scholars such as Olson (1994) that written language provides models for
the analysis of spoken language (see Chapter 1 Section 1.1), and has become
quite strongly entrenched in modern linguistics. Fraser (2005: 116), for example,
in a generally insightful discussion of types of representation of speech, confi-
dently claims that ‘[i]t is well-established that it is only through acquisition of
alphabetic literacy that an analysis of speech into segments becomes available to
language users’.
Faber’s arguments can, I think, be met on two fronts: firstly, whether accept-
ance of her case means that the notion of segments has no legitimacy in linguistic
and phonetic theory; secondly, whether her case is persuasive and ought to be
accepted. On the first point, I think the answer has to be no. If it is true that seg-
mental awareness only arises among users of an alphabetic writing system, this
is no reason to regard the segment as an illegitimate analytic concept for phone-
ticians and phonologists. In the sphere of syntax, language users can only parse
sentences if they have been taught grammar, but we do not take this to mean
that we have to dispense with notions such as noun and verb, particle and affix.
Whatever contingencies might be responsible for the notion of a segment as a
constituent of the structure of speech, whether we should apply the notion or not
depends on how well it facilitates analysis. All theoretical notions are arbitrary,
but some are more appropriate than others. I agree with Laver (1994: 110) that
the segment is an appropriate notion in phonetic theory providing we understand
how to apply it. Regarding the persuasiveness of Faber’s arguments, I find it
lacking. Her arguments are essentially of two kinds: psycholinguistic and histori-
cal. The psycholinguistic evidence is cited in the main from three papers pub-
lished in an issue of Cognition. One of these studies is Morais, Bertelson, Cary
and Alegria (1986), in which illiterate and ex-illiterate (having become literate in
adulthood) speakers of European Portuguese were tested on various consonant,
vowel and syllable segmentation tasks. Illiterate subjects were able to segment
initial [p] with 18.6 per cent accuracy, compared to 62.5 per cent and 83.3 per
cent for poor readers and better readers respectively. Figures for vowel segmenta-
tion were 55.2 per cent for illiterates and 85.0 per cent for both groups of readers.
Literate subjects performed considerably better, but the task was by no means
beyond all the illiterate subjects, refuting the claim that alphabetic literacy is a
prerequisite for consonant and vowel segmentation. Responses of illiterates were
15.2 per cent correct for separation of a [pl-] cluster into [p] and [l]. Another
study cited is Mann (1986), which compared phoneme awareness in school-age
Japanese readers of kanji and syllabaries, and school-age American alphabetic
readers. Awareness of phoneme-sized units was exhibited by Japanese fourth-
grade children (c. 9 years) who had had no instruction in alphabetic reading. In
the light of her results, Mann (ibid.: 89) suggests that ‘the capacity for manipu-
lating phonemes could be part and parcel of a language acquisition device’. The
third study cited by Faber is Read, Yun-Fei, Hong-Yin and Bao-Qing (1986),
which compared segmentation ability in two groups of literate Chinese speakers:
one group who had learned the alphabetic Pinyin spelling system in addition to
learning traditional Chinese characters, and one group who had only learned the
traditional logographic characters. Non-alphabetic readers scored 21 per cent
correct on non-words and 37 per cent correct on real words, compared to 83
per cent and 93 per cent correct responses by the alphabetic readers (ibid.: 38).
Again, the results confirm that segmentation skills are by no means completely
lacking in the absence of alphabetic knowledge and experience. All three studies
cited by Faber in fact suggest that segmentation at the level of individual sounds
can be performed by around a quarter of language users without prior familiarity
with an alphabetic writing system, although accuracy and consistency of per-
formance improve dramatically among those who are in the habit of using one.
There is anecdotal fieldwork evidence of illiterate speakers undertaking quite
sophisticated segmental analysis. Trubetzkoy (1937/2001: 37), for example,
relates how an illiterate Circassian speaker told him: ‘Where we pronounce a
strong s the H̤ak˚əc˚ pronounce it that way too, but in words where we pronounce
a very weak s, they replace it by č.’
The historical argument concerns the supposed uniqueness of the early Greek
alphabet in having letters for vowels as well as consonants and thus being fully
segmental. The introduction of vowel letters into the alphabet by the Greeks
was at one time hailed as a major intellectual advance on the Semitic abjad
(Carpenter 1933), suggesting implicitly or explicitly that the Semitic speakers
had lacked the insight into spoken language structure to appreciate the existence,
or importance, of vowels (see Bernal 1987b: 393–9). The segmental nature of
the Greek alphabet as it existed after being adapted from the Canaanite abjad is
explained by Faber, following Sampson (1985: 100–2; see also Gelb 1969: 181,
and Coulmas 2003: 127), as having arisen not through segmental analysis but
through a misinterpretation of certain letters which corresponded in Semitic lan-
guages to consonants that had no equivalents in Greek. The Greeks instead used
them to represent Greek vowels similar in quality to the vowels in the Canaanite
letter-names (Allen 1981: 115), perhaps thinking that this was how they had
always been used. Once this had happened, and only once this had happened,
Greek letters could be seen as representing individual discrete vowel sounds as
well as consonant sounds. Beforehand, the notion ‘segment’ in relation to vowels
could not, according to Faber, be said to have existed.
The historical evidence, to my mind, supports a contrary view. The practice
of matres lectionis in archaic Semitic writing shows clearly that resources for
representing vowels had in fact been developed before the Greeks, during the
second millennium bce (Gelb 1969: 197), and may in fact have influenced the
Greek usage of vowel letters (Bernal 1987a; Coulmas 1996: 329). In matres
lectionis (‘mothers of reading’), letters corresponding to glide consonants were
used to indicate vowels of a similar auditory quality to the glides. The letters
corresponding to consonantal /w/ and /j/, for example, were used to indicate the
long /uː/ and /iː/ vowels. Pairing of semivowels and vowels relies on accurate
recognition of phonetic similarity and suggests that experimental observation
may have been involved in the process: it is by holding steady in the form of
discrete sounds the articulation of [w] and [j] that one observes them becoming,
respectively, [u] and [i]. Characters corresponding solely to vowels date in fact
from very early in the history of writing. They are found in Ancient Egyptian
from before 2000 bce (Gelb 1969: 168). Although they were not used very
often, and never became systematically integrated into the Ancient Egyptian
writing system, they attest to awareness of vowel sounds separate from conso-
nantal sounds a long time before the Greek alphabet appeared. They cannot be
explained away as mistakes arising from the adaptation of a writing system to
another language with a different inventory of consonants. Even a consonantary
without any letters corresponding to vowels would attest to the same ability to
segment as an alphabet containing vowel letters. The only difference is that the
vowel segments have no corresponding letters. If a spoken [CVC] structure cor-
responds to a written <CC> structure, then the [V] has been left out of account,
but it can only be left out by detaching it from the Cs, unless one claims it was
simply not noticed at all. The small vowel inventories of Ancient Egyptian and
Semitic languages, and the lexico-semantic stability of their consonantal roots,
placed less importance on vowels than on consonants for word identification.
Vowels mainly expressed inflections; their distribution would have been much
more predictable from grammatical context than in an Indo-European language
like Greek, so their representation in writing was not so necessary. It is still
the case in Arabic, a modern Semitic language, that written texts are typi-
cally unvowelled for precisely these reasons. Although our word ‘consonants’
implies their dependence on vowels, in traditional Sanskrit grammar the word
is vyañjana, which, according to one authority, comes from the verb vy-añj- ‘to
manifest’ because consonants manifest meaning (Allen 1953: 81). Because lan-
guages tend to have many consonants but fewer vowels, consonants will have a
higher functional load and differentiate word-forms more than vowels.
Faber (1992: 127) regards the Chinese fǎnqiè as non-segmental and adduces it
to support the view that the segment is a notion dependent on alphabetic writing,
not one that helped to shape it. However, the fǎnqiè process of separating a syl-
lable onset from a syllable rhyme will result in segmentation into a consonant
and a vowel in any open syllable with a single onset consonant. Although there
is now some doubt whether CV syllables are universally the first syllable type
to appear in language acquisition (Savinainen-Makkonen 2007), it is generally
accepted that CV and V are the most widely attested syllable types across the
world’s languages, being found in all known languages (Kenstowicz 1994: 254),
and CV is certainly an extremely common syllable type in Chinese (Yip 2000:
20). If the syllable is the basic unit of production and perception (Levelt and
Wheeldon 1994), then, as Warren (2008: 201) points out, speakers and listeners
will have direct access to monosyllabic lexical items, and if the structure is V
the process of inferring segmental content will be maximally easy. To infer the
segmental content of a CV syllable only requires recognition that something has
been appended to the V. This analytic process can be repeated to deal with more
complex syllables.
The history of phonographic writing contains, from its earliest stages, evi-
dence that language users were able to segment speech into the same kinds of
consonantal and vocalic elements that IPA symbols denote. Segmentation may
even predate writing, if the Indian phoneticians of the first half of the first mil-
lennium bce did not use a writing system. Segmentation is identified by Allen
(1953: 18–19) as the second of the three main stages of ancient Indian phonetic
analysis – between articulatory processes and prosodic features – resulting in the
establishment of much the same consonantal and vocalic segments as modern
analysis would establish (see table in Allen 1953: 20). Daniels (2001: 70) claims
that by the time writing reached India discrete consonants and vowels were
already fully understood.
Whether the segment in speech is a ‘natural unit’ of auditory perception or is a
notion that arose when people deliberated about how language could be analysed
or written amounts to the same thing: that the human mind is capable of apply-
ing a segmentation procedure to spoken language without the idea having been
suggested by alphabetic letters adapted through misinterpretation. Perceptual
and cognitive constraints determine which kinds of properties of the speech
signal tend to be noticed and, as a result, can come to be regarded as objects
which combine together to build speech. Pre-literate children’s sensitivity to syl-
lables and to onset–rhyme division, as evidenced in studies such as Bowey and
Francis (1991), and evidence from naming tasks that speakers parse syllables
and store them in their mental lexicons (Levelt and Wheeldon 1994), attest to a
perceptual-cognitive bias in humans which may be responsible for driving the
development of writing in the direction of syllabograms and alphabetic letters
via rebus writing and acrophony. The same biases seem to underlie the poetic
devices of alliteration, assonance and rhyme which are found in pre-literate oral
poetry as well as in written literatures (Finnegan 1977: 93–6). The prevalence
of CV syllable types in languages means that these perceptual-cognitive biases
will encounter ample input to feed and reinforce an analysis into two segments:
a consonant and a vowel. These then become models in terms of which analyses
of more complex structures can be made. We ‘find’ segments in the structure of
speech not because they are there in any physically objective sense, but because
we are predisposed to conclude they are there, either innately or through learning.
Modern physics describes a world very different from the world as it appears to
us, or the way it appears to a bee or a bat (see for example Nagel 1974). What
causes it to appear to us the way it does is our perceptual-cognitive make-up. A
physical description of the world includes descriptions of pressure-waves, but
we do not experience speech as pressure-waves; we experience it as sound with
a concatenated structure (see Chapter 5 Sections 5.1, 5.7 and 5.8). The phenom-
enologist Merleau-Ponty (1945/2002: 240, original italics) comments that ‘seen
from the inside, perception owes nothing to what we know in other ways about
the world, about stimuli as physics describes them and about the sense organs as
described by biology’.
If the notion ‘segment’ or ‘speech sound’ exists as a pre-theoretical model
in a listener, and if it is available to take part in complementary processing (the
interaction of auditory input with higher-level stored information; see Chapter 5
Section 5.4), it will predispose the listener to experience speech as consisting of
segments of the kind that can be produced as isolated sounds. It could arise as a
pre-theoretical model through judgements about what is common to words such
as bee, bar, boot and bee, tea, key, etc.

2.2.5 Subsegmental analysis

So far we have seen that phonography has existed from very early in the history
of literacy, perhaps from its very beginnings, and that the units implicated in pho-
nography can be whole word-forms (rebus writing), syllables (syllabograms) and
phoneme-like segments (the acrophonic principle). Phonography can also impli-
cate units smaller than the segment. The introduction of naqt ‘diacritical point-
ing’ into the Arabic alphabet in the seventh century ce indicates that analysis of
place and manner of articulation of Arabic consonants had already been carried
to some level of sophistication. For instance, the Aramaic letter <‫ >ح‬was used in
early Arabic writing not only for spelling words having the pharyngeal /ħ/ in their
spoken form, but also for words with uvular /χ/ and postalveolar /ʤ/ (or possibly
palatal /ɟ/), resulting in homographs and near-homographs. A diacritical dot was
placed over the letter to create a new letter <‫ >خ‬corresponding to /χ/, and placed
below to create <‫ >ج‬corresponding to /ʤ/. According to Revell’s analysis of the
pointing system (Revell 1975: 182–3), the criterion for dot placement was place
of articulation: it was placed above for sounds further back in the vocal tract
and below for those further forward. Arabic phoneticians, like the earlier Indian
phoneticians, started their descriptions at the back of the vocal tract, which they
described as being ‘higher’ than the front. Diacritical dotting can thus be seen to
be motivated by the iconicity of this perspective.
The Hebrew dagesh diacritical dotting was introduced for similar reasons of
disambiguation. It indicated a stop consonant while its absence corresponded to
a homorganic fricative (Coulmas 2003: 116).
In the Japanese katakana and hiragana syllabaries the niguri (double slanted
‘ditto’ marks) and maru (small circle) diacritics represent voice and voiceless-
ness respectively (cf. the IPA voicelessness diacritic [◌̥]) when added to CV
syllabograms in which the C corresponds to labial /p/ or /b/: katakana ビ cor-
responds to /bi/, and ピ corresponds to /pi/; hiragana び corresponds to /bi/, and
ぴ corresponds to /pi/. The niguri was introduced in the twelfth century CE, the
maru in the sixteenth (DeFrancis 1989: 135). They are added to a base character
which on its own corresponds to a CV syllable where the C is /h/.
Uniquely in the history of written language, there is one example of a writing
system designed so that characters consistently correspond to elements smaller
than segments. The Hangŭl (also spelled Han-gul, Han’gŭl, Hankul, Hangeul)
system, developed in the fifteenth century ce for writing Korean, is founded
on an analysis of consonants and vowels into component articulatory features
(Sampson 1985: 124–9; King 1996: 219–20). The importance of this has been
downplayed by some on the grounds that not all the features needed are rep-
resented, and that literate Koreans are unaware of a featural dimension in the
system (DeFrancis 1989: 196–8). Nevertheless, Hangŭl does provide examples
of feature-level correspondence between written and spoken forms similar in
principle to the examples furnished by Arabic and Hebrew pointing, and by the
Japanese niguri and maru diacritics, but extending through almost the whole
system rather than being peripheral additions. A proper phonetic notation based
on Hangŭl characters has in fact been developed and is exemplified in the 1999
IPA Handbook (p. 123; see also Chapter 3 Section 3.1.1).

2.2.6 Diffusion and borrowing of writing systems

When a writing system which was developed for one language is borrowed to
write another language it often has to be adapted to suit the structural properties
of the borrowing language (Wellisch 1978). In addition to differences of mor-
phological structure, there will also be different consonants and vowels, so that
close attention to the pronunciation of words is necessary in order to decide how
to deploy the writing system. An ability to compare pronunciations in the two
languages would seem to be an obvious prerequisite for adapting any elements
of a writing system to write another language phonographically.
The process by which the spelling units of one language are used to write
another language can be modelled using the concept of pseudo-transcription as
diagrammed in Figure 2.1. Spelling units from language A are interpreted as
standing for sounds on the basis of sound–spelling correspondences in language
A. They are then used to stand for the sounds in the expression of spoken words
in language B. That is to say, they are used as a pseudo-notation to make pseudo-
transcriptions which become the expressions of the written signs in language B.

FIGURE 2.1: Units used for spelling the written signs of language A are
used for representing the pronunciation of spoken signs in language B.
This pseudo-transcription then becomes the spelling for the written signs
in language B. [Diagram: language A and language B are each shown with a
spoken sign and a written sign, each comprising content and expression;
the link between the two languages is labelled ‘Pseudo-transcription’.]

2.2.7 Anti-phonography

There are counter-influences to phonography at work in the world of writing
which can be seen clearly in modern English spelling. Although written English
is predominantly phonographic, different spellings for homophonous words such
as hair–hare and too–two pull in a direction away from the rebus principle and
represent a resistance to phonography which displeases proponents of spelling
reform but enables ambiguity to be avoided in written texts. Attempts by spelling
reformers to change this through further phonographic orientation have not met
with any great success among the general literate English-speaking public, who
seem to value the morpho-phonographic features of English spelling whereby
alternations correspond to invariant spellings. Among the beneficiaries of this
resistance are the regular plural/possessive/third singular present alternants all
spelt <-s>, the regular past tense/past participle alternants all spelt <-ed>, and
stems that undergo pronunciation changes in suffixation but whose spellings
remain intact, e.g. atom~atomic, climate~climatic, photograph~photography.
Invariant spellings facilitate lexical and grammatical recognition in reading and
prevent what would otherwise be an increase in the number of spelled forms that
have to be learned and remembered.
The use of logograms in written languages alongside phonography suggests
that phonography does not necessarily yield preferred resources for writing.
Logograms maintained a vivid presence in Ancient Egyptian writing for three
millennia, actually increasing from some 700 in the Middle Kingdom (c. 2000
bce to 1650 bce), when a full consonantary was already in use, to around 5,000
in the Graeco-Roman period (332 bce to c. 400 ce) (Ritner 1996: 74). In China,
the introduction of phonographic roman Pinyin spellings was not intended to
replace traditional Chinese characters and shows no sign of doing so, despite
some voices calling for this since the Cultural Revolution in the 1960s (Wellisch
1978: 77–81). Although there are indications that logographic Japanese kanji,
Chinese characters that spell Japanese translation equivalents as xenograms, are
in decline in some word classes in written Japanese (see Section 2.2.2), they
are still very much a living part of the written language despite the presence of
the highly phonographic hiragana and katakana syllabaries. As Coulmas (2003:
180) points out, logography seems to have some appeal, which may also help
to explain the practice of xenography. If written language is supposed to repre-
sent spoken language then xenography is an exceedingly strange way to do it.
Commenting on the continuing use of logographic resources in writing systems,
Cooper (2004: 92) warns us not to underestimate ‘the ideological investment
a culture has in its traditional script’. The power of ideology is also evident in
modern spelling reform debates; see for example Johnson (2005: 119–48) for a
critical analysis of the debate on the 1996 German orthographic reforms.
It is interesting that Korean Hangǔl writing, devised according to phonetic
analysis and therefore unambiguously phonographic in conception, has become
increasingly morpho-phonographic over the centuries (King 1996: 223), receiv-
ing an official impetus in this direction in 1933 with the publication of the
Guide for the Unification of Korean Spelling by the Korean Language Research
Society (Sampson 1985: 139). Hangǔl readers and writers have thereby shown
a preference for spellings to be invariant with respect to morphemes rather than
with respect to pronunciation.
In examples of what I have called ‘anti-phonography’ in writing – and more
could be given – greater importance is put on lexical and grammatical iden-
tity than on sound–spelling correspondences, a practice inconsistent with the
Aristotelian view that written language consists of signs for representing spoken
language.

2.3 The Development of Phonetic Theory


I have discussed how phonographic processes in the history of writing made the
expression units of written language available as a resource that can be used for
what I have termed pseudo-notation and pseudo-transcription. We have seen that
language users seem to have been able to deploy characters in this capacity for
writing proper names and in adapting writing systems from other languages from
very early on in historical times. However, for characters to become a proto- or
proper phonetic notation system there has to be, as I have said earlier, a body
of theoretical phonetic knowledge that can provide phonetic definitions and
interpretations for the elements of the notation system. This existed at various
levels of sophistication in the ancient world in India and Greece, and in medieval
times among the grammarians of the Middle East, but in western Europe ‘[t]he
discipline of phonetics did not appear until the early modern period’ (Law 1997:
262). The lack of interest in phonetics in the Europe of the Middle Ages (Robins
1990: 87) is symptomatic of a wider lack of interest in observational method.
Philosophy in Europe at that time was overwhelmingly theological; debate
among medieval scholastics concerning the correct way to obtain knowledge
tended to revolve around whether divine revelation was the only source of true
knowledge or whether knowledge could also be arrived at through human reason-
ing. Everyday observable facts were hardly accorded any importance (Russell
1961: 428). But phonetics cannot really be studied if this epistemological attitude
prevails. Attempts to establish the facts of speech production can only be founded
on observation. This may be why at the present day those phonologists in the
generative tradition who take a rationalist stance on linguistics tend not to be
much interested in phonetic detail or its representation (for example, Bromberger
and Halle 2000: 24–5).
In order to trace how proper phonetic notation evolved from pseudo-notation
and proto-notation, in the sections that follow I review the emergence of the
main theoretical approaches to phonetics in the pre-Modern world up to the
European Renaissance (Section 2.3.1), and the Early Modern world up to the late
eighteenth century (Sections 2.3.2 and 2.3.3), followed by crucial developments
during the nineteenth century (Section 2.3.4) which end with the establishment of
the International Phonetic Association. The Association set the general tenor of
what phonetic notation would be like up to the present day (see Chapter 3 Section
3.4.5). Phonetic theory continues to develop, pushed along by technologies such
as sound spectrography, laryngoscopy and other instrumental means of phonetic
research, but the basic formula of the International Phonetic Alphabet, roman-
based base symbols with diacritics, keeps pace and continues to provide for the
transcriptional needs of phoneticians. For an account of the first century of the
IPA, see MacMahon (1986).


The chapter concludes with a section comparing and contrasting sound–spell-
ing and sound–symbol relations, and a final section on spelling reform.

2.3.1 Phonetic theory in the pre-Modern world

As far as we know, theorising about pronunciation was first undertaken in
ancient India before the time of Pāṇini, possibly in the absence of written lan-
guage (Allen 1953: 15; Varma 1961: 12; Misra 1966: 19 – but see Bronkhorst
2002) and therefore possibly without the resources for notating sounds in any
manner at all. Consonants and vowels were classified according to the articula-
tory criteria of place, manner and voicing in much the same way as in the modern
IPA system of phonetic classification. In fact Allen (1953: 7) takes the view,
with regard to the development of phonetics in western Europe in the nineteenth
century, that ‘Henry Sweet takes over where the Indian treatises leave off.’ The
motivation behind the development of phonetic theory in ancient India was reli-
gious. Sacred Vedic texts were recited, not written, and accuracy of pronuncia-
tion was highly valued to the point where mispronunciation put one at risk of
damnation. In order to instruct believers in correct pronunciation it was necessary
to understand how speech is produced. When they had use of the Brāhmī and
Brāhmī-derived alphabets from about the third century bce, the Indian phoneti-
cians did not explicitly distinguish between letters as units of spelling and letters
as symbols for representing aspects of pronunciation. Letters thus had a dual use:
as units of expression for written language and, because phonetic descriptions
attached to them, as transcription symbols for representing the expression units of
spoken language. Because their descriptive framework for classifying consonants
and vowels was the product of theorising about speech production, each letter
in its pronunciation-representing capacity was part of a proto-phonetic notation
system. In the hands of the Indian phoneticians the letters could be used as proto-
symbols having precise phonetic definitions, and therefore could be used for
proto-transcription. These phoneticians could read written language either, like
the literate layman, as spellings for words, or as representing a phonetic analysis
of spoken language.
In ancient Greece the rudiments of a science of phonetics, including a division
into consonants and vowels, can be seen in writings by Plato and Aristotle in the
fourth century bce. It developed further under the Stoics in the third to second
centuries bce. As was the case with the Indian grammarians, the motivation
behind the study of phonetics in Greece was often prescriptive. Grammarians
wished to preserve the pronunciations of Hellenic Greek and protect them from
changes taking place due to koineisation and the spread of Greek to speak-
ers of other languages whose pronunciation of Greek was influenced by those
languages (Robins 1990: 20). The Greeks developed methods for phonotactic
analysis and analysed sounds into manners of articulation, dividing them into
stops and continuants and setting up three triads of aspirated–unaspirated–voiced
plosives. Although the Greeks fell short of an accurate account of voicing, there
are hints in certain texts that they understood more about it than they have often
been given credit for. Terminology was used with explicit phonetic definitions
such that alphabetic letters came to have phonetic definitions associated with
them, giving them the status of proto-symbols for use in phonetic analysis and
proto-transcription in addition to their status as letters for use in spelling. It is
notable, though, that no terms were coined by the Greeks, or by the Romans after
them, for denoting places of articulation.
Turning attention to the Middle Eastern grammarians of the medieval period,
it has been suggested that they learnt their phonetics from India (Danecki 1985).
However, there is no direct evidence for this and the circumstantial evidence is
very thin (Law 1990); Bakalla (1983: 49), for example, believes that ‘Arabic
phonetics grew up largely independently of the general scientific tradition of the
pre-Muslim world.’ Greek influences may be more likely (Semaan 1963: 10;
Versteegh 1977: 21–5; Odisho 2011), although Carter (2007) argues against this
possibility, pointing out that Arab scholars were careful to acknowledge external
sources but no such acknowledgments are found in their phonetic writings. We
have already seen that the deployment of diacritical pointing in written Arabic
around the late seventh century ce was guided by phonetic observation. By
the time of Sībawayh, the most renowned of the medieval grammarians of the
Middle East, in the late eighth century ce, a situation existed similar to that which
obtained in India over a thousand years previously: there was a comprehensive
framework for phonetic classification based on careful observation of articula-
tory processes in which the letters of the Arabic abjad were given phonetic defi-
nitions, and allophonic and dialectal variants were described (Al-Nassir 1993).
The Middle Eastern grammarians therefore had the means at their disposal for
proto-phonetic notation and transcription. In fact advances were made beyond
the bounds of the writing system when ways were devised of notating features
such as vowel nasalisation, which is not contrastive in Arabic. Bakalla (1983:
55–7) relates that dots, circles and superscript letter-shapes were used for this
purpose in the tajwīd tradition for instructing correct recitation of the Qur’ān.
These non-orthographic resources can be regarded as proper phonetic notation
according to the definition proposed in Chapter 1 Section 1.3. Similar transcrip-
tional devices were independently invented by Iceland’s ‘First Grammarian’ in
the twelfth century ce (Haugen 1972: 15–19), attesting to a phonetic knowledge
which has been described as unrivalled in western Europe at that time (Robins
1990: 82; Vineis and Maierú 1994: 187) but which remained virtually unknown
until the nineteenth century. The ‘First Grammarian’ carried out a classificatory
analysis of Icelandic vowel distinctions based on length, nasality and openness
and, significantly for us, proposed new letters for them by systematically adding
diacritics to the five vowel letters of the roman alphabet (Haugen 1972: 15–19,
34–41; and see Chapter 3 Section 3.4.1). There was no attempt, however, to clas-
sify consonants other than by their letter-names, and noting that whether their
names have a CV or VC structure correlates with the stop–continuant distinction:
<b> is called ‘bee’, <f> is called ‘eff’ etc.
Prescriptivism provided the initial motivation for phonetic scholarship in the
medieval Middle East in much the same way as in ancient India and Greece.
Accurate pronunciation of the Qur’ān was and remains important for Muslims.
New converts whose first language was not Arabic had to be taught how to recite
sacred verses, but in the ideas of the Middle Eastern phoneticians one can see an
interest in phonetics for its own sake, reaching levels of analysis over and above
what is required for instruction in ‘correct’ pronunciation. Commenting on the
Sirr al-Sinā‘at al-‘Irab ‘The Secret of the Inflectional Endings’ by Ibn Jinni
(tenth century ce), which is ostensibly a prescriptive work, Mehiri (1973: 76)
describes it as ‘un véritable traité de phonétique’. Ibn Jinni likened the vocal tract
to a flute through which air is blown, with the places of articulation functioning
like the finger-holes to give different qualities of sound. This is the insight of a
phonetician, not a prescriptivist.
The first known diagram of the vocal tract appeared in the late twelfth- or
early thirteenth-century Arabic treatise Miftāh al-‘Ulūm ‘Key to the Sciences’ by
Al-Sakkākī and is reproduced in Figure 2.2. Each letter is written beside the place
of articulation of the corresponding consonant. We can interpret the diagram to
the effect that the letters become proto-symbols and the places of articulation are
identified as part of the theoretical models that the proto-symbols denote. I am
not, of course, claiming that Al-Sakkākī would have explained it in these terms.

FIGURE 2.2: Late twelfth- or early thirteenth-century vocal tract diagram
entitled Sūrat makhārij al-hurūf ‘Picture of the outlets of the letters’ from
Miftāh al-‘Ulūm ‘The Key to the Sciences’ by Al-Sakkākī. Dotted line
indicates the nasal passage with a nostril above the lip.

2.3.2 Phonetic theory in the Early Modern world

Challenges to medieval European modes of thought brought in the Renaissance
at around the time that vernacular languages were gaining status in Europe. A
burning question in many quarters was how these languages, regarded heretofore
as inferior illiterate dialects, should be written. Attention to this question, along
with a more empirical approach to knowledge, was probably a major impetus to
the emergence of phonetic theory in western Europe in the sixteenth and seven-
teenth centuries.
In deciding how words in French, Italian, Spanish and other Romance ver-
naculars should be spelled, two guiding principles came into conflict, namely
etymology and pronunciation. Proponents of etymological spellings tended to
be Roman Catholic by religion and socially hierarchical, desiring to show close
links between their own spoken language and Latin, the language of the Roman
Catholic Church. By contrast, those who favoured taking pronunciation as the
guide tended to be Protestant and socially egalitarian. They saw etymological
spellings as a barrier to literacy for the population at large and an attempt to pre-
serve written language for social and religious elites. An influential figure in the
fight against etymological spellings for French was the Calvinist Louis Meigret
in the mid-sixteenth century. Spoken French had drifted further from its Latin
origins than most other Romance dialects and there was an anxiety that phoneti-
cally based spelling would not only seriously obscure the Latin etymologies but
also create large homograph sets and render grammatical and lexical identities
opaque. Meigret, however, did not accept these objections, taking his justifica-
tion from the Aristotelian thesis that writing is the representation of speech. He
regarded any spelling that was not true to pronunciation as a ‘superstition’ – we
can perhaps see contempt for Roman Catholicism in his use of this term. Meigret
went to the length of insisting that some of his works be printed in his own
phonetically motivated respellings, as a result of which they were not widely
read (Tavoni 1998: 25). A somewhat similar fate befell Le Maître phonétique,
the forerunner of the current Journal of the International Phonetic Association,
which published its contributions in IPA notation until 1971. Daniel Jones was
lamenting already in 1912 that because of this policy ‘many valuable articles are
simply lost to the world’ (Collins and Mees 1999: 128). A compromise form of
writing French was proposed in which phonetic spellings would be written on a
lower line with etymological ones above wherever the etymology was obscured
by a phonetic spelling (Tavoni 1998: 25). Like many compromises, it pleased
no one and no one took it up. The headmaster of St Paul’s School in London,
Alexander Gill, practised a more acceptable kind of compromise for English,
resorting to etymology only where sounds he described as ‘indistinct or waver-
ing’ made phonetic spelling problematic. He seems to have been referring to
reduced vowels and proposing that non-reduced alternants should motivate their
spelling, a strategy found in some phonological analyses of English schwa, for
example Hammond (1999: 206), and which in effect is what English spelling does
anyway. Another compromise was proposed by Desainliens (aka Holyband), in
which unnecessary letters were to be retained but identified by ‘a speciall marke’
(Desainliens, The French Littelton, Dedication, cited in Danielsson 1955: 65).
It is not hard to see that this would make spellings even more complicated and
written texts more taxing to read.
Attention to the spelling of vernacular languages in Europe was not confined
to the Romance world. The same debates were going on in Germany, Denmark,
the Netherlands and England, often mixing nationalism into the arguments to
advocate spellings that would mirror the national tongue and mark it as different
from neighbouring cognate languages.


The egalitarians who favoured the phonetic orientation of spellings over the
etymological were following the injunction of Quintilian in the first century ce to
write a language as it is spoken rather than speak it as it is written. Writing it as
it is spoken is, in the absence of phonetic theory, to practise pseudo-transcription
by prioritising the identity of sounds in spoken language equivalents over the
identity of words and morphemes in written language. It means that awareness of
pronunciation is sharpened and before long a need is felt for a better understand-
ing of speech and speech sounds. When this need is felt acutely enough it can
only be satisfied by developing a theoretical approach to phonetics.
A nascent general phonetic theory can be seen in sixteenth-century western
Europe in the works of Jacob Madsen in Denmark and Petrus Montanus in the
Netherlands (Kemp 2006: 473–7), who coined hundreds of new technical terms
but had little subsequent influence (Abercrombie 1993: 311), but it gained its
strongest momentum in England in the work of John Hart (c. 1501–74) and other
scholars of the time who were motivated by a commitment to spelling reform
in the wake of the sound–spelling dislocations occasioned by the English Great
Vowel Shift, and by an interest in observing how speech sounds are made. They
are the first of the ‘English School of Phonetics’ discussed by Firth (1946; see
also Albright 1958; Collins and Mees 1999: 455–71). Hart acknowledged Meigret
as a key influence on his thinking and rejected etymological spellings almost as
vigorously, arguing strongly in favour of phonetic spellings. Speech sounds he
likened to Aristotelian ‘elements’ and regarded letters as ‘their markes’ and ‘the
Images of mannes voice’ (Hart 1551: 29–34, in Danielsson 1955: 118). These
views are similar to those of Sir Thomas Smith (1513–77), an English diplomat
stationed in Paris, who wrote that ‘writing may truly be described as a picture of
speech’ (Smith 1568: 5, in Danielsson’s edition, 1983: 31). Smith puts forward
an Aristotelian case for the naturalness of sound–letter relationships, despite
recognising that writing takes its nature ‘by a postulate’ rather than, as he says
speech does, ‘by creation’. Arguing syllogistically that ‘if a by itself is a, and b,
b; taken together they make ab’ (Smith 1568: 8, in Danielsson’s edition, 1983:
43), he claims that for spellings to disturb this simple orthographic logic upsets
the natural order, for example using digraphs such as <th> and <sh> for single
sounds; curiously, though, he has no objection to a single letter standing for
a cluster of two sounds, as <x> for final /-ks/, even proposing Greek <ψ> for
English final /-ps/, which suggests he did not fully understand the archiphonemic
nature of <ψ> in Greek orthography (see Trubetzkoy 1933/2001: 12 n.1). Hart
displays a similar attitude when he makes the case for writing to be governed by
‘due order and reason’ (Hart 1569: title page) instead of the disorder he saw in
contemporary English spellings.
Hart’s descriptions of the production of sounds are more perceptive and
detailed than Smith’s, and on the whole reasonably accurate as far as they go.
He noted the presence of aspiration in English voiceless plosives, which Smith
did not (though he remarks on it in Welsh), and represented it in writing, for
example writing pipe as <p-heip>, albeit somewhat inconsistently in relation
to /t/ and /k/ (Jespersen 1907: 13–14). He did not provide any description
or explanation of aspiration, though, beyond saying that ‘ui brẹð ðe h softli’
(Hart’s spellings). There are other important gaps in Hart’s accounts. He offers
no description of the production of [l], for example, and nor did Smith; and
although Hart distinguished between voiced and voiceless sounds, like the
Greeks and Middle Eastern grammarians he did not appreciate the mechanism
of voicing, describing the difference only in auditory-impressionistic vocabu-
lary such as ‘soft’ (voiced) and ‘hard’ (voiceless). Salmon (1995: 142–6) gives
an account of Hart’s attempt to establish triads of aspirated–voiceless–voiced
stops in English on Thrax’s model for Greek, abandoning it when faced with
the facts of his own phonetic analyses of English sounds. Smith also mentions
the Greek categories as subdivisions of the ‘mute’ consonants, but never actu-
ally fully applied the terms to English, probably because he was unable to make
them fit. Moreover, his statements that /p/ and /t/ are the same in English as in
Latin indicate either that he was unaware of the unaspirated–aspirated differ-
ence between the two languages, or that he was referring to an English-accented
Latin.
Both Hart and Smith realised that the phonography of the Latin alphabet was
inadequate for expressing the sounds of English and devised some notational
devices of their own (see Chapter 3 Section 3.4.1). If we take their respec-
tive versions of letters for /ʃ/ and look at how they defined them we can see
the extent to which their definitions are theoretical or ostensive.1 Taking first
Smith’s [ ], which he names [ɛʃ], he gives a list of keywords such as she, shed,
shine, ash, blush but provides no description of how the sound is produced. An
experimental analysis is performed in which he compares it on the one hand to
the sequence [sh-] constructed by prepending [s] to hell in order to show that the
result does not sound like shell, and on the other to the sequence [sj-] constructed
by prepending [s] to yell in order to show that this yields a pronunciation more
like shell. Smith thus defines [ ] ostensively and justifies it experimentally by
drawing attention to its palatality (without identifying it as such) but does not
offer an account of its production.
Hart gives two descriptions of the production of [ʃ] (Hart 1569: §38b, in
Danielsson 1955: 195; Hart 1570: §2b, in Danielsson 1955: 242) for which he
provides the new letter <ȣ>. Both descriptions are less than precise about tongue
configuration, saying that the tongue is drawn ‘inward’ to the upper teeth and
that [ʃ] is distinguished from [s] and [z] by the tongue not touching the palate. In
contrast to Smith, Hart does attempt to define the uniqueness of [ʃ] in articulatory
terms, although not as accurately as Danielsson (1955: 221) is prepared to give
him credit for. But it does mean that of the two, Hart is the more theoretically
inclined in providing an interpretation of his letter which is not solely osten-
sive. Consequently, Hart’s [ȣ] has more of the proper phonetic symbol about it
than Smith’s [ ] and reaches a level of phonetic description comparable to that
achieved by the medieval Middle Eastern linguists such as Sībawayh and Ibn
Sīnā (Avicenna), whose descriptions of Arabic [ʃ] refer to a narrowing relation
between the middle part of the tongue and the hard palate (El-Saaran 1951: 247;
Semaan 1963: 39–40; Al-Nassir 1993: 15).
Danielsson (1955: 54) is clear that Hart ‘had devised his new orthography to
serve both as a reformed spelling of English and as a general phonetic alphabet’.
Hart’s primary aim, however, was to reform spelling. In so far as he developed a
phonetic theory it was to guide orthographic decisions away from the irregulari-
ties and morpho-phonographic tendencies of English spelling firmly towards a
completely phonographic writing. His notation was there to provide the resources
for it. It is clear that he desired to go a long way in the direction of phonography
to provide spellings which are ‘shallow’ in Sampson’s (1985: 43–5) sense of
being close to the surface phonetics of speech. His distinct spellings for strong
and weak forms of English gradable words show sensitivity to differences in
their pronunciation, and he provides spellings for assimilated and elided forms –
for example, weak-form ‘and’ spelled as <ănd> before vowels and <ăn> before
consonants, and ‘as’ with <z> before voiced sounds and <s> before voiceless ones
(Danielsson 1955: 187). Although primarily a spelling reformer, Hart shows
the kind of observational acuity without which an adequate theory of phonetics
cannot develop. He is part of the wider trend towards observation and description
that formed the beginnings of the scientific methods that became more firmly
established in the following century.
Additional observations about speech sounds and speech production were
made in the late sixteenth and seventeenth centuries which helped to advance
phonetic understanding and provide the knowledge for more detailed phonetic
descriptions. In talking of the seventeenth-century scholars who wrote on phonet-
ics, Abercrombie (1993: 310) has remarked: ‘Their contribution to the history of
the subject is not to be despised. They succeeded in constructing the foundations
of a true general phonetics.’
Robert Robinson, a contemporary of Shakespeare, published The Art of
Pronuntiation in 1617 not so much to reform spelling as to devise a way of
describing pronunciation so that learners of foreign languages could learn native-
like forms of speech. He created a vowel chart, perhaps the first ever, showing in
a diagrammatic representation of the mouth the relationship of the tongue to five
points along the palate (see Figure 2.3a).
At each point Robinson indicated five associated vowel qualities, in short and
long variants, using his own set of symbols, although neither the open–close
dimension nor lip-shape is incorporated into the scheme. Figures 2.3b and 2.3c
show that very similar scalar diagrams were used by Bell (1867: 74) and Jones
(1918/1972: 32). For consonants, Robinson used his own adaptations of existing
letters and designed new ones, using diacritics to distinguish between voiced and
voiceless (Dobson 1957: xii–xiii, 23–4). He defined the characters in terms of
five locations for vowels and three for consonants (‘outer’, ‘middle’ and ‘inner’),
and four consonantal manner distinctions (‘mute’ = plosive, ‘semi-mute’ = nasal,
‘greater obstrict’ = fricative, ‘lesser obstrict’ = approximant) plus a fifth for
‘the peculiar’ [l] (ibid.: 14–24). Assignment of sounds to these categories is not
always in agreement with modern phonetics: [θ] and [ð] are placed in the ‘inner’
region along with velars, behind [s] and [z]. Comparing his solution for [ʃ] with
Smith’s and Hart’s, Robinson tells us in a passage reminiscent of Smith that he
derived his symbol [xx] from the sequence [xox] (= [jsj]) because ‘it seems to be
but one consonant sound, nor indeed can it be discerned to be otherwise, vnlesse
by a very diligent obseruation’ (Robinson 1617 (not paginated), italics added).
That he did not give [ʃ] the status of a primitive suggests he thought in reality

FIGURE 2.3: (a) Robinson’s ‘scale of vowels’ diagram of 1617.
A = larynx, B = front of palate, C = tongue root. Robinson (1617), The
Art of Pronuntiation, facsimile edition, edited by R. C. Alston, Menston:
The Scolar Press, 1969; (b) Bell’s ‘scale of lingual vowels’ of 1867 with
his Visible Speech symbols. Bell (1867), Visible Speech: The Science of
Universal Alphabetics, London: Simpkin, Marshall and Co.; (c) Jones’s
drawings of cardinal vowel tongue positions of 1918, based on X-ray
photographs. Jones (1918/1972), An Outline of English Phonetics,
Cambridge: Cambridge University Press, ninth edition

it was two sounds, which would explain why he did not classify it or give it a
description to compare with Hart’s. Nevertheless, Robinson’s scheme marks an
advance on the work of Hart for its conception of a notation free from the influ-
ence of any irregularities in the sound–spelling correspondences of traditional
orthography, and for the setting up of a small number of theoretical phonetic
categories to account for all the consonants and vowels he could discern. His
notation therefore meets the requirement of a proper phonetic notation more fully
than Smith’s or Hart’s because it is more explicitly based on theory, however
inadequate we might nowadays judge that theory to be. Its purpose was not
to replace extant orthography but to be able to represent the expression ele-
ments of spoken language. His symbols can therefore be said to denote general
phonetic models that have theoretical definitions. Their use in proper phonetic
transcription is exemplified in a number of surviving manuscripts in the Bodleian
Library, most extensively in a transcription of a poem by Richard Barnfield,
Lady Pecunia, which runs to 56 six-line stanzas. Robinson may therefore argu-
ably be the first phonetician to produce proper running phonetic transcriptions
in English; they can be classed as generic, broad and systematic (see Chapter 4
Sections 4.1, 4.3 and 4.4).
An interesting feature of Robinson’s notation is the way he represented voice
and voicelessness as consonantal prosodies or ‘long domain’ features, which
‘strikingly anticipated Firthian prosodic analysis’ (Abercrombie 1993: 311).
Voiced and voiceless cognates were given the same base symbol and an ‘aspi-
rate’ mark was placed above the first consonant symbol of a syllable if the onset
and/or coda contained any voiceless consonants: [↼] = voiceless onset, [⇁] =
voiceless coda, [ϟ] = voiceless onset and coda. Dobson (1957: xiii) complains
that this is ‘ill-conceived’, but it has some merit as an analysis of English onset
and coda clusters in which, with a handful of optional exceptions, obstruents
agree in voicing (Gimson 1980: 239–53).
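
The logic of this marking can be made concrete with a small sketch, given below
as a modern illustration only: the toy consonant inventory and the function name
are invented for the purpose and are not Robinson’s. It selects which of his three
marks, if any, a syllable receives, given the consonants of its onset and coda.

# A minimal sketch of Robinson's 'long domain' marking of voicelessness (1617):
# voiced and voiceless cognates share one base symbol, and a single mark signals
# whether the onset and/or coda of the syllable contain any voiceless consonants.
VOICELESS = set("ptkfs")  # toy inventory for illustration only

def robinson_mark(onset, coda):
    onset_voiceless = any(c in VOICELESS for c in onset)
    coda_voiceless = any(c in VOICELESS for c in coda)
    if onset_voiceless and coda_voiceless:
        return "ϟ"   # voiceless onset and coda
    if onset_voiceless:
        return "↼"   # voiceless onset
    if coda_voiceless:
        return "⇁"   # voiceless coda
    return ""        # no voiceless consonants: no mark

print(robinson_mark("st", "mp"))  # ϟ
print(robinson_mark("b", "n"))    # (no mark)
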
In the latter half of the seventeenth century four figures are generally cred-
ited with having made the most progress in the English School of phonetics:
John Wallis, John Wilkins, William Holder and Francis Lodwick. Wallis
(1616–1703) attracted controversy for accusations and counter-accusations
regarding claims about his achievements, for which Firth (1946: 109) is
unforgiving, but Kemp (1972: 13), while not excusing Wallis’s dishonesty, is
a little more understanding of how academics sometimes succumb too much
to vanity.
In the Tractatus de Loquela, prefaced to his Grammatica Linguae Anglicanae
of 1653, Wallis, a founding member of the Royal Society, presents a classifi-
catory scheme for vowels and one for consonants.2 These are summarised in
tables of intersecting categories much like the modern IPA chart in principle
if not in detail (reproduced in Figure 2.4). Vowels are defined as the intersec-
tions of two dimensions, front–back and close–open, each having three values:
guttural–palatal–labial and wide–medium–narrow respectively, specifying nine
vowel qualities; Bell’s (1867: 73) nine primary vowels, and Sweet’s (1877:
12), are defined by almost identical categories (Kemp 1972: 46) but presented
in tabular form more iconically with the high–mid–low categories on the verti-
cal axis, where Wallis places his wide–medium–narrow on the horizontal axis.
Wallis gives other dimensions (open, round, obscure, fat, thin) in the cells in
a somewhat ad hoc manner. The table for consonants shows four dimensions:
the manner dimension mute–semi-mute–semivowel built on the place dimen-
sion labial–palatal–guttural, and a thin–fat dimension (which Wallis describes
variously as a spread–rounded or narrow–wide distinction) built on an aspi-
rate–non-aspirate dimension (although the thin–fat distinction does not apply to
non-aspirates). For an extensive discussion of Wallis’s knowledge of phonet-
ics, how it compared to that of other scholars of the time, and the meanings
of his terms, see Kemp (1972: 39–66). For our purposes we should note that
his terminology originates in a theoretical approach even if it is at times rather
vague (Kemp 1972: 48), and that Wallis tried to fit vowels and consonants into
the same place-of-articulation dimension of ‘labial’, ‘palatal’ and ‘guttural’,
anticipating some modern attempts such as Catford’s polar coordinates (Catford
1977: 182–7).


FIGURE 2.4: Wallis’s 1653 sound chart ‘Synopsis of all letters’

The significance of a tabular presentation of sounds in the development of
phonetic theory can hardly be overestimated. By setting up phonetically defined
dimensions whose categories intersect, phonetic models are generated which
become the denotata for phonetic notation. That is to say, instead of symbols
denoting real-world phenomena with all the problems that that conception of
symbols brings (see Chapter 1 Section 1.2.3), they can denote products of
a theory. In this manner, orthographic characters are transmuted into proper
phonetic symbols. Tables with dimensions defined in terms of articulatory
phonetic theory are models of an abstract taxonomic phonetic space in a way
that labelled diagrams of the vocal tract such as Robinson’s for vowels and
Al-Sakkākī’s for consonants are not. Labelled vocal tract diagrams associate
parts of the vocal tract with particular sound qualities whereas tables define the
articulated dimensions of a more abstract conception of phonetic space with
at least the potential to be domain-neutral. Wallis may not, of course, have
thought of his tabular arrangement in quite these terms, but it liberated symbols
from their orthographic origins to guarantee them the potential for a freedom
they had never had before, allowing them to be put to the service of phonetics
as a scientific notation. Abercrombie’s (1993: 312) verdict on Wallis, that ‘his
De Loquela is an unsatisfactory book in many ways’, overlooks this very sig-
nificant step in the often parallel development of phonetic theory and phonetic
notation.
We can see in Wallis’s table how it generates sound-types which he recog-
nises as not occurring in speech (mugitus ‘mooing’, gemitus ‘groaning’),3 just
as we saw in Chapter 1 Section 1.3 how the IPA chart generates ‘pharyngeal
nasal’ although no such sound is possible. Compared to Robinson, Wallis is less
venturesome in his symbol set – his only new symbol is [ɴ̄ ], which denotes a
voiced velar nasal – but the models they denote have firmer theoretical foun-
dations resulting from a more systematic attempt to chart taxonomic phonetic
space.
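
By way of modern illustration only, the short sketch below shows how intersecting
dimensions generate every combination as a potential model, some of which the
theory then sets aside as not occurring, much as Wallis excludes mugitus and
gemitus and the IPA chart marks certain cells as impossible; the category labels
in the sketch are chosen for convenience and are not Wallis’s.

# Sketch: a cross-classificatory chart as a generator of phonetic models.
# Every cell of the place-by-manner grid is generated by the theory; some
# cells are then flagged as not occurring (cf. the shaded cells of the IPA chart).
from itertools import product

places = ["bilabial", "alveolar", "velar", "pharyngeal"]
manners = ["plosive", "nasal", "fricative"]
not_occurring = {("pharyngeal", "nasal")}

for place, manner in product(places, manners):
    note = "  (judged not to occur)" if (place, manner) in not_occurring else ""
    print(f"{place} {manner}{note}")
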
There is a line of development from Hart through Robinson to Wallis in
which phonetic observations become more systematic though not always more
accurate, phonetic theory is more prominent, and a more universalist perspective
is evident. Where exactly we draw a line and say that proper phonetic notation in
the western Early Modern world starts will be to some extent arbitrary, but there
is enough to show that Wallis was clearly operating in a manner informed by
observation and theorising which was closer in method to modern phonetics than
his predecessors. He also showed more concern to make his scheme applicable
to other languages, including the non-European languages Hebrew and Arabic.
Like any other pre-Modern phonetician, he can be criticised for errors that seem
elementary to us. For example, he says that in the production of [θ, ð] the air
exits through ‘a round shaped hole’ while for [s, z] it escapes ‘through a slit’
(Wallis 1765: 23, tr. Kemp 1972: 173) and he fares no better than Robinson, and
rather worse than Smith and Hart, on the ‘esh test’. Wallis excluded [ʃ], and the
affricates [ʧ, ʤ], from his table, regarding them as compounds made up of the
sequences [sj, tj, dj]. Kemp (1972: 60) conjectures that Wallis may have based
his analyses on pre-coalescent pronunciations of words such as nation, nature,
soldier (see Cruttenden 2001: 76, 190) rather than on words such as shop, ash,
church, judge in which [ʃ, ʧ, ʤ] do not result from coalescence. This greater
uncertainty about [ʃ] in the later writers Robinson and Wallis, also seen in
eighteenth-century accounts of English pronunciation (e.g. Walker 1791: 4), may
be connected with coalesced pronunciations of words such as sugar starting to be
perceived as vulgarisms (see Beal 1999: 144–51).
Bishop John Wilkins, brother-in-law to Oliver Cromwell and, like Wallis, a
founder member of the Royal Society, lived from 1614 to 1672. His reputation
among modern linguists is for his work on a ‘universal language’, the famous
Essay Towards a Real Character and a Philosophical Language (1668), that
would by-pass natural languages and allow world-wide communication in terms
of supposedly universal semantic categories each having its own written charac-
ter. This semasiographic project was carried out in an intellectual climate much
influenced by Francis Bacon (Salmon 1983: 128) in which there was little faith
in the ability of natural languages to express truth clearly and distinctly. In the
five years prior to his death in 1626, Bacon had written in his unfinished work,
The Great Instauration, about what he called the ‘idols of the mind’, four types
of preconceptions or inclinations in the minds of human beings which tend to
prevent us from apprehending truths. The type he called ‘idols of the market-
place’ were responsible for the false belief that we have rational control over
our use of language, and for our failure to see that language can control our
thought. In a sentence which looks forward to activation models of the mental
lexicon, Bacon asserts that ‘words react on the understanding; and this it is that
has rendered philosophy and the sciences sophistical and inactive’ (Spedding,
Ellis and Heath 1858: IV 60–1, quoted in Carlin 2009: 19). The desire to estab-
lish a universal philosophical language in the seventeenth century had both
religious and scientific motivations. On the religious side, it was a programme
to tackle the linguistic chaos which ensued, according to the Old Testament,
after the destruction of the Tower of Babel. Latin had functioned as a kind of
universal language in Roman Christendom but the rise of vernaculars, and the
strength of the Reformation, had weakened its status (Clauss 1982: 532–3). In
the opinion of many, a state of linguistic homogeneity needed to be restored
to mankind. On the scientific side, advances in the taxonomic classification of
the natural world led to a belief that all reality and human experience could be
similarly classified and a system of universal categories set up as the content
elements of a universal language. Reality and language would then ‘form two
isomorphic systems’ (Hüllen 1986: 119) over which the idols of the market-
place would have no power. Each category would be assigned a written charac-
ter which in some versions would be pronounced as the translation equivalent
of the language of the reader – that is to say, the character would be a semasio-
gram – while in other versions, Wilkins’s being one, each character would be
assigned a pronunciation. For this purpose, Wilkins tried to establish universal
phonetic categories, much as does the IPA. The linking of a universal perspec-
tive on phonetics with the idealism of international communication came about
again in the late nineteenth and early twentieth centuries when spelling reform-
ers and the Esperanto movement made common cause in challenging national
orthographies and national languages, bolstering their positions with reasoning
from phonetics.
Wilkins is important for his contribution to both phonetic theory and phonetic
notation. Regarding phonetic theory, his classification of consonants showed
more awareness of articulatory structures than Wallis’s and he made a more
succcessful attempt to incorporate vowels into the same scheme. His cross-
classificatory sound chart, shown in Figure 2.5, is therefore a more sophisticated
model of articulatory phonetic space and each symbol consequently denotes a
more exact general phonetic model.
Regarding notation, Wilkins devised symbols based on the postures of the
speech organs during the production of consonants and vowels in so far as
they were understood. The symbols of this ‘organic alphabet’ bear no relation
to alphabetic letters but are motivated by the shapes of the articulators and the
passage of the airstream, their iconicity depending on observation and theory.
Wilkins makes no mention of the Dutch philosopher and alchemist Franciscus
Mercurius ab Helmont, who the year before had published his account of the
Hebrew alphabet (Helmont 1667) with cutaway sagittal drawings of the vocal
tract to try to prove that Hebrew letters constituted a ‘natural’ organic alphabet.
Wilkins’s drawings are stylistically and anatomically very similar, including an

FIGURE 2.5: Wilkins’s sound chart of 1668. Reproduced with the


permission of the Brotherton Collection, Leeds University Library

‘at rest’ diagram with numbered articulators. Although Wilkins did not intend his
organic symbols to be used as transcription symbols, they marked an important
step away from orthographic thinking. The importance of this step is summed up
by Heselwood et al. (2013: 12):

Organic symbols explicitly identify sounds as objects of study independently
of any writing system and therefore imply the possibility of phonetics as a
language-independent discipline drawing on the disciplines of anatomy and
physiology.

In their role as ‘pictures of the letters’ the organic symbols linked the letters
to articulation to give them concrete phonetic interpretations, thus acting as
shorthand definitions for the accompanying letters which were used as phonetic
notation. For example, the organic symbol for [F] (= IPA [f]) shows the two lips
touching but with a line bisecting them to indicate that air is passing between
them. For [P] (= IPA [p]) there is no bisecting line, and for [V] (= IPA [v]) the
line has a single oscillation at the left end to indicate vibration of the epiglottis,
which Wilkins took to be the source of voicing (the vocal tract is oriented to face
right; see Chapter 3 Section 3.1 on organic notation).
One year after Wilkins’s Essay, William Holder’s Elements of Speech was
published, although it was probably completed before Wilkins’s work appeared
(Salmon 1972: 152). Holder lived from 1616 to 1698. That he continues the
general Aristotelian view of writing’s relation to speech is evident when he says
(Holder 1669: 63) that ‘[l]anguage is a connexion of audible signes [. . .] Written
language is a description of the said audible signes by signes visible.’ Holder has
a view of spoken language very similar to Smith’s and Hart’s in which the sounds
we make are ‘natural elements’ but the meanings are ‘artificial’ and come about
by ‘institution and agreement’ (ibid.: 9–11). How the ‘audible signes’ are to be
written is something which can be reasoned about rather than resulting from the
operation of ‘uncertain fabulous relations’ beyond our knowledge. Although he
talks of written language as providing a ‘description’ of spoken language, Holder
did not propose organic symbols. Like Smith, Hart and Wallis, he gave phonetic
definitions to existing roman letters, took [θ] from Greek, and used a few extra
ones, for example [ȣ] representing a ‘labio-guttural’ vowel, a glyph previously
used by Hart for [ʃ]. Holder employed the diacritic [‘] to denote voicelessness
when added to a sonorant consonant, for example [L‘] (= IPA [l̥]), but for nasali-
sation when added to a fricative, for example [S‘] (= IPA [s̃]); see Figure 2.6. The
general strategy of co-opting roman alphabetic letters, taking letters from other
alphabets and adding new letters and diacritics to create a notation system was to
become, over two centuries later, the recognised strategy of the IPA for enlarging
its stock of symbols (see Chapter 3 Section 3.4.5).
Albright (1958: 8–12) thinks Holder’s lasting importance in phonetics can be
reduced to his invention of the [ŋ] symbol for a velar nasal, although the symbol
itself did not appear because, as Holder explains, the printer had no type for it.
In fact, Alexander Gill had already come up with something very similar in his
Logonomia Anglica of 1619 (Abercrombie 1981: 212). Albright’s rather dis-
missive evaluation, perhaps premised on the erroneous view that Holder merely
followed Wilkins (Albright 1958: 11), ignores some quite profound passages in
Holder which have led Kemp (1981a: 42) to compare him to the ancient Indian
grammarians and Abercrombie (1993: 315) to hail him as ‘the most important
17th century figure’ in phonetics. Holder’s description of voicing (Holder 1669:
23) is the first comprehensive account in western phonetic literature which, even
if it does not quite attain the accuracy of modern descriptions (Abercrombie
1986: 4–5, 1993: 318–19), ‘provides the conceptual rudiments of what we know
as the aerodynamic-myoelastic theory of phonation, and the source–filter model
of speech production’ (Heselwood et al. 2013: 12). It refers to breath from the
lungs passing between approximated vibrating cartilages in the larynx to create
a tone which is ‘sweetened and augmented’ by resonance in the supralaryngeal
vocal tract. In Abercrombie’s (1986, 1993) discussions of the ‘hylomorphism’ of
Holder’s framework, we can see a clear identification of ‘matter’ and ‘form’ in
speech production with the ‘source’ and ‘filter’ respectively of modern speech
acoustic theory. The matter, or material of speech, is the airstream, which can
be voiced or voiceless and which remains undifferentiated until given different
forms by the variable filter of the supralaryngeal vocal tract. Holder’s hylomor-
phic scheme and the modern source–filter scheme can be mapped onto the three
functional components of speech production in parallel, as in (2.1).

(2.1) Holder:                 Matter                     Form
      Functional components: Initiation    Phonation    Articulation
      Acoustic theory:       Source                     Filter

Some confusion over whether glottal [h] and [ʔ] count as sounds comes through
in Holder (1669: 72–3), which is not a great surprise when we consider difficul-
ties later writers have had in distinguishing between phonatory and articulatory
functions in the larynx.
Holder’s descriptions of several sounds are notable for detail and accuracy.

FIGURE 2.6: Holder’s table of consonants (left) and ‘scheme of the whole
alphabet’ (right). From Holder (1669: 62, 96)

His account of [l] and [r] would only look outdated in a modern phonetics text-
book for its seventeenth-century language. Muscles are identified, and the trilling
action of [r] is described in aerodynamic-myoelastic terms: ‘born stiffely, as with
a Spring, by the Muscles, (especially by the Genioglosse) and agitated by strong
impulse of Breath’ (Holder 1669: 50).
The syntagmatic axis of speech gets more attention from Holder than from
other writers of the time. He sees speech as successive openings and closings of
the vocal tract, each cycle separated by an ‘appulse’, an approach of an active
articulator towards a passive one, very much in the same vein as the ‘frame and
content’ view of syllables based on mandibular cycles of Davis and MacNeilage
(2005). Analysis of places and manners of articulation is more modern-sounding
in Holder than in previous accounts, with greater consistency in situating the
terminology in relation to the different domains of phonetics, and there is more
emphasis on what would nowadays be called the phonemic or phonological
function of speech sounds (Fromkin and Ladefoged 1981: 4). Finally, it is worth
drawing attention to Holder’s account of the process of hearing in the Appendix
to Elements of Speech, where he identifies the components of the outer and
middle ear, the ‘three very little Bones’, and refers to the ‘inward ear’ which
connects to the auditory nerve.
The last of the English School phoneticians to be considered here is Francis
Lodwick (1619–94). His Essay Towards an Universall Alphabet was published by
the Royal Society in 1686 but had already been circulating for some years amongst
scholars interested in universal languages (Abercrombie 1948/1965: 49). It pre-
sents an organic-analogical alphabet (see Chapter 3 Section 3.2.2, and Figure 3.8)
in tabular form which only partly follows the structure of the vocal tract and uses
numbers to label the rows and columns (‘ranks’ and ‘files’) instead of phonetic
terminology. Although we can see the network of cross-classifications showing
which consonantal correlations are proportional to other correlations, Lodwick
does not identify the phonetic bases of these relationships, leaving the reader to
work them out. This absence of phonetic explanation means that Lodwick did
not really add very much to phonetic theory, although his principle ‘that no one
Character have more than one Sound, nor any one Sound be expressed by more
than one Character’ (Lodwick 1686: 127, in Salmon 1972: 236) is close to the
IPA principle, first articulated in 1888, that ‘[t]here should be a separate letter for
each distinctive sound’. One other interesting point of theory, although he gives
no rationale for it, is that voiceless obstruents are derived from more ‘primitive’
voiced ones. In the history of how voiced and voiceless obstruents have been
handled in descriptive frameworks, we have here perhaps for the first time the
suggestion that voiced obstruents are more basic than voiceless ones. It is not clear
whether Lodwick conceived of the relationship being one in which voicelessness
is added to derive voiceless obstruents, or voice is taken away, though the former
is implied by the device of adding a stroke to denote voicelessness.
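
Lodwick’s principle amounts to requiring a one-to-one relation between characters
and sounds. The sketch below is an illustrative check only, with invented toy data:
it reports any sound written with more than one character, the direction that a
simple character-to-sound table cannot rule out by itself.

# Sketch: testing a character-to-sound table against Lodwick's principle that no
# character has more than one sound and no sound has more than one character.
from collections import Counter

def lodwick_violations(mapping):
    # A dict already rules out one character having two sounds, so only the
    # converse needs checking: one sound written with two or more characters.
    counts = Counter(mapping.values())
    return {sound: [ch for ch, s in mapping.items() if s == sound]
            for sound, n in counts.items() if n > 1}

# English orthography fails the test: both <c> and <k> can spell /k/.
print(lodwick_violations({"c": "k", "k": "k", "b": "b"}))  # {'k': ['c', 'k']}
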
By the late seventeenth century phonetic knowledge in England had reached
a level broadly comparable to the Middle Eastern grammarians of some eight
hundred years before. It would not reach the level of attainment of the ancient
Indian grammarians of over two thousand years before until the nineteenth
century.

2.3.3 Phonetic terminology in the ‘English School’

One indication of a mature scientific discipline is a stable and consistent termi-
nology so that the same phenomena are referred to in the same way by different
scholars. By this indicator, phonetics was still making its way through early
adolescence in the seventeenth century, with no two writers using the same set of
classificatory terms. Table 2.1 presents the manner of articulation terms employed
by the major figures from Smith to Holder against the closest IPA equivalents;
Lodwick has been left out because he did not use phonetic terminology to classify
sounds. We get a sense of each scholar trying to find the most appropriate terms
for the categories as they understood them. Influence from classical writings sur-
faces most clearly in Wallis but there are differences in how classically derived
terms are used. For example, Robinson and Holder use ‘mute’ for plosives, in
line with the term’s classical origins (from Greek aphōna via Latin mutae; Allen
1981: 117–18), and ‘mute’ was used in this sense as late as the early 1840s by
Pitman (see Kelly 1981: 251–2), while Wallis and Wilkins use it for all voice-
less sounds. There is conspicuous uncertainty here about whether ‘mute’ refers
to absence of sound generated at places of articulation or in the glottis, probably
because the mechanism of voicing was not known except by Holder.
Terms vary in their relations to different phonetic domains. They had not
settled into the predominantly articulatory basis of modern phonetic categories.
‘Mute’, ‘sonorous’, ‘hard’ and ‘soft’ are auditory-perceptual concepts; ‘obstrict’,
‘aspirate’, ‘breathless’, ‘breath’ and ‘pervious’ are aerodynamic; while ‘closed’,
‘occluse’, ‘open’ and ‘partial’ are articulatory, as are ‘thin’ and ‘fat’, which Wallis
uses to refer to the size and shape of articulatory constrictions. Wilkins’s inclina-
tion towards aerodynamic terms may reflect the focus on airflow expressed in his
organic-iconic diagrams and symbols (see Chapter 3 Figure 3.3). This mixture

TABLE 2.1: Consonantal manner terminology in the ‘English School’ of phonetics
in the sixteenth and seventeenth centuries

IPA           Smith              Hart              Robinson          Wallis                   Wilkins                  Holder
Plosive       Mute               Stopped breath    Mute              Primitive/Closed         Breathless               Plenary/Occluse/Mute
Fricative                        Continual breath  Greater obstrict  Thin Derived/Open/       Mouth-breathing          Partial/Pervious
                                                                     Aspirate and sibilant    ‘Blæse’ (lisping)
Approximant   Semi-vocal/Liquid  Semi-vocal        Lesser obstrict   Fat                                               Semi-vocal
Voiceless     Hard               Breath, hard      Aspirate          Mute                     Mute                     Breath
Voiced        Soft               Sound, soft       No term           Semi-mute                Sonorous                 Voice
Nasal         Semi-vocal/Liquid  Semi-vocals       Semi-mute         Semi-vocal               Nose-breathing           Nasal

of terms from different domains shows an empirical taxonomic approach which
had not yet decided on its methods of observation and classification and had not
oriented itself into a single overall direction. It was later to do so by attending to
physiological causes of speech sounds and the anatomical structures responsible
for them. Much of the impetus in this direction came from the Indian tradition,
which came to the notice of modern western linguists only towards the end of the
eighteenth century, but we should not overlook the steps which were taken in this
direction by the ‘English School’. For example, we have already seen that Holder
had greater insight into phonation than his contemporaries because of knowledge
of laryngeal structure and vibration.
Holder’s conception of a basic distinction between ‘breath’ and ‘voice’ is the
one the Indians operated with under the terms śvāsa and nāda (Allen 1953: 33–4),
and which may have been independently developed in the medieval Middle East
by Sībawayh, who coined the terms mahmūs (participle form of Arabic hams
‘whisper’) and majhūr (Arabic jahr ‘clear, outspoken’), possibly as a result of
Greek influence. Holder may have got it from Hart’s ‘breath–sound’ dichotomy
by applying his more accurate knowledge of phonation. It is the distinction used
by Sweet (1906: 9–12), based on Bell (1867: 45–6), and perpetuated in Jones
(1918/1972: 19–22), who equates breath with ‘voiceless’, the latter being pre-
ferred by Abercrombie (1967: 26–7) and now in widespread use. Of all the terms
in Table 2.1, these are the only ones with a presence in modern phonetic tax-
onomy, although ‘sonorous’ and the concept of sonority have become centrally
important in theories of the syllable (Laver 1994: 503–5; see also Botma 2011).

2.3.4 Phonetic theory in the late eighteenth and nineteenth centuries

The eighteenth century saw very little progress in phonetics until the final quarter.
This is in great contrast to the nineteenth, by the end of which huge advances
had been made in phonetic theory and also in the application of technology to
the study of speech. It is not an exaggeration to say that by the start of the twen-
tieth century phonetics had become a science in Europe linked with the scientific
study of anatomy and physiology and of acoustics (Albright 1958: 19), but first it
had to forge its own identity separate from the interests of language teaching and
spelling reform. It is in the nineteenth century, particularly the second half, that
we see most directly the roots of modern theoretical and experimental phonetic
science and the development of our current resources for phonetic transcription.
Notation systems are dealt with in Chapter 3, where their relations to phonetic
theory will be examined in some detail and in a historical context; consequently
at this point comments on these matters will be kept brief.
In general at this period phonetic theory was more closely tied to issues of
notation than to instrumental methods and experimental procedures, the latter
being carried out by physical scientists who viewed speech as the product of a
system of pumps, tubes and valves rather than as the spoken manifestation of
language. Symbols in notation systems had to be defined, and this was usually
done in relation to how the symbolised sound was understood to be produced,
that is to say in terms of articulatory phonetic theory.
Several ingredients came together from the late eighteenth through to the
mid-nineteenth centuries which all contributed significantly to the formation
of phonetics as a science. Marking the start of the last quarter of the eighteenth
century was Joshua Steele’s (1700–91) An Essay Towards Establishing the
Melody and Measure of Speech of 1775. Steele was concerned with the pro-
sodic structure of speech and particularly with its representation. He adapted
terms and notational devices from music in his analyses of rhythm, intonation
and other dynamic features. Steele’s work went largely unappreciated at the
time (Sumera 1981: 103), but some of his resources have made a reappearance
in the extensions to the IPA with the same applications to speech, for example
allegro, f(orte), p(iano) (Duckworth, Allen, Hardcastle and Ball 1990); there are
also resemblances to later interlinear intonational transcriptions (see Chapter 4
Section 4.11.3) and to Halliday’s (1970: 52) representations of intonational pitch.
One of the first representations of vowels in an abstract vowel space was pre-
sented in the 1781 Dissertatio Physiologico-Medica de Formatione Loquelae of
Christoph Hellwag (1754–1835) in the form of a ‘vowel triangle’ (Kemp 2001:
1469–70). It has clear similarities to the cardinal vowel system of Daniel Jones
(e.g. Jones 1918/1972: 31–9) and the modern IPA vowel quadrilateral.
Lexicography rather than general phonetics was more in the ascendency at this
time as shown in the number of English pronouncing dictionaries which appeared
with various ways of representing consonants, vowels and word-accent (Beal
2008). In his Grand Repository of the English Language of 1775 Thomas Spence
(1750–1814) produced ‘a genuine, scientific, phonetic alphabet’ (Abercrombie
1948/1965: 68). The letters of this alphabet are modifications of the roman
alphabet and are presented in alphabetical order with keyword exemplifications
but without phonetic descriptions. It is questionable whether Spence really adds
anything to general phonetic science, although he can be applauded for showing
that it is possible to regularise the grapheme–phoneme correspondences of
English into a ‘broad phonemic system’ (Beal 1999: 89).
John Walker (1732–1807) achieved greater fame than Spence with his A
Critical Pronouncing Dictionary of 1791. Walker’s classification scheme shows
no advance on those of Wilkins or Holder, and his phonetic descriptions are some-
times less perceptive. He does not appear to have understood Holder’s account of
voicing despite referring to it. Labiodental fricatives he describes as produced ‘by
pressing the upper teeth upon the under lip’ (Walker 1791: 6), which fails to assign
active and passive roles accurately to the articulators. He seems unsure whether
the sounds corresponding to English orthographic <SH> and <TH> are single
sounds or not, describing them rather confusedly as ‘mixed or aspirated’, having
‘a hiss or aspiration joined with them, which mingles with the letter’ (Walker
1791: 4–5). While Walker seems to have had an acute ear for detecting subtleties
of sound, he lacked a corresponding acuity in matters of phonetic theory.
A huge influence on phonetics, because of the need to apply it as a tool in his-
torical and comparative linguistics, came from the work of Sir William Jones, a
British legal official stationed in India. Although resemblances between Sanskrit
and European languages had been noted from the late sixteenth century (Robins
1990: 150), it was Jones’s presentation of his famous paper with the unpromising
title Third Anniversary Discourse in 1786 that established beyond doubt the sys-
tematic relationship of Sanskrit to Greek and Latin, and set historical linguistics
on a footing where it could apply the taxonomic approaches current in botany
and biology to historical linguistic data. Interest in Sanskrit and the availability
of Sanskrit texts brought ancient Indian phonetics into European scholarship such
that, according to Allen (1953: 7) as we saw above (Section 2.3.1), ‘Henry Sweet
takes over where the Indian treatises leave off’, although Alexander J. Ellis had
already made a study of Indian ideas, as had the German-trained American lin-
guist W. D. Whitney.
The two biggest influences on Sweet were probably Ellis and Bell, but he also
greatly admired the Norwegian phonetician Johan Storm and took serious note
of what was going on in Germany in the work of Carl Merkel, Eduard Sievers
and Wilhelm Viëtor. Sweet’s contributions to phonetic theory have been evalu-
ated by Kelly and Local (1984), who stress the attention to detail, consistency
of description and comprehensiveness of scope of his work compared to his
predecessors such as Bell. A. J. Ellis (1814–90) is best known for his researches
on, and conjectured phonetic descriptions of, English pronunciation from the
Old English period through to his own time, and for his work with Isaac Pitman
on systems of notation (see Kelly 1981) leading to his own palaeotype system
(Ellis 1867). Alexander Melville Bell (1819–1905) is probably best known for
his Visible Speech, also dated 1867, an experiment in organic alphabet creation
based on detailed analyses of consonant and vowel production. It combines the
principles of Wilkins’s organic and systematic alphabets so that ‘all Relations
of Sound are symbolized by Relations of Form’ (Bell 1867: 35). Sweet at first
resisted the organic approach to notation but soon became a convert (see Chapter
3 Sections 3.1.4 and 3.1.5).
Advances in phonetic theory at this time owed much to comparative and his-
torical linguistics on the one hand, and to medical understandings of anatomy and
physiology on the other. Although the development of the comparative method
in the first half of the century by scholars such as Rasmus Rask, Franz Bopp and
Jakob Grimm sought to establish language relationships through shared sounds,
the emphasis was on their lexical distribution rather than the phonetic structure
of the sounds themselves (Morpurgo Davies 1998: 163). It soon became clear,
however, in the attempts at internal reconstruction by scholars such as August
Schleicher and Friedrich Schlegel, that their methods would require a phonetic
theory sophisticated enough to account for phenomena covered by sound laws
such as Grimm’s Law and Verner’s Law. On the practical side, a good nota-
tion system makes for more concise, accurate and systematic descriptions of
historical and comparative data, as Ellis (1867: 1–2) remarked. From the 1850s,
articulatory and acoustic frameworks of phonetic description became available in
Germany through the work of the physiologists Ernst Brücke and Carl Merkel,
and the physicist Hermann von Helmholtz, who used their scientific knowledge
to study properties of speech sounds. Helmholtz (1821–94), for example, identi-
fied separate vowel resonances in the mouth and pharynx, and undertook experi-
ments in synthetic speech (Dudley and Tarnoczy 1950). By applying advances in
medical technology, techniques of laryngoscopy, aerometry and direct palatog-
raphy were developed for investigation of the articulatory domain of phonetics,
and the acoustic domain started to become more amenable to investigation with
Scott’s Phonautograph, invented in 1859. These developments in the understand-
ing of the physical properties of speech made it possible to give a more explana-
tory account of historical sound changes, and formed the foundation for Eduard
Sievers’s achievements in general phonetic theory and its application to historical
linguistics and linguistic phonetics (see Kohler 1981).
But perhaps the invention with the greatest impact on phonetic transcription
took place in 1877, the year after Sievers’s Grundzüge der Lautphysiologie and
the year of Sweet’s Handbook of Phonetics. This was the invention by Thomas
Alva Edison, himself hard of hearing since childhood, of a device for audio
recording and playback. Without audio recording we would not be able to collect
speech from different speakers and store it for later analysis, nor would we be
able to listen to the same utterance again and again, which is essential for analytic
listening. All impressionistic transcription would have to be live, and its inability
to keep up with continuous speech would make it a rather poor tool. Nowadays
we take recorded speech very much for granted, but as phoneticians we should
probably be more thankful for this invention than for anything else to be found in
phonetics laboratories. It is hard to overestimate the impact that sound recording
has had on the development of phonetics as a data-driven science.
Henry Sweet’s Handbook of Phonetics of 1877 and Sievers’s Grundzüge
der Phonetik of 1881 show us the state of phonetic theory in western Europe
in the years leading up to the formation of the IPA. Both authors stressed the
importance of accurate phonetic descriptions of living languages and the value
of practical phonetic skills. Both were also suspicious of instrumental phonetics
and tried to discourage it, which in hindsight looks rather Canute-like given the
ubiquity of instrumental methods in phonetics today, and somewhat misplaced
in the light of what they have revealed to us about the articulatory and acoustic
structure of speech. Nevertheless, it would be unwise to dismiss Sweet’s plea
that instrumental methods should not be allowed to supersede auditory methods,
a plea taken up later in this book in Chapters 5 and 6.
The context in which the International Phonetic Alphabet, the most well-
known and widely used phonetic notation system, had its beginnings was formed
from the influences outlined above coupled with the desire to make pronunciation
clearly representable in written form. Two groups who made common cause in
pursuit of this desire were spelling reformers and teachers of modern languages,
and several of the most influential and energetic founders of the International
Phonetic Association were both, including the leading figure Paul Passy.

2.3.5 From correspondence to representation

In summary, the process by which the phonographic orientation of writing and
the development of phonetic theory have made possible a proper phonetic nota-
tion and proper phonetic transcription is one where relations of correspondence
change into relations of denotation and representation. Phonography provides
written characters which correspond to units of pronunciation. Phonetic theory
provides models for units of pronunciation. If written characters are used to
denote these models then they are being used as general phonetic symbols. When
speech phenomena are mapped onto these models, then the phonetic symbols
denote descriptive models and can be said to represent those phenomena. It is
the difference in function between correspondence and representation that essen-
tially distinguishes spelling from phonetic transcription. Failure to distinguish
correspondence relations from representation relations is responsible for the
Aristotelian doctrine that written language represents spoken language, and it
sustains the energies of spelling reformers who wish to regularise sound–spelling
correspondences. Even in a fully regular and consistent phonographic writing in
which sound–spelling correspondences were entirely isomorphic with symbol–
sound representations, spelling and transcription would still be different activi-
ties with different purposes and interpretations: the former identifies meaningful
items of linguistic content in written language, the latter embodies an analysis of
meaningless items of linguistic expression in spoken language.
Pseudo-phonetic transcription was possible more or less from the beginning of
glottographic writing. Writing a foreign name in Ancient Egyptian uniconsonan-
tal characters results in a spelling of that name, but the procedure by which the
spelling is constructed is one which exploits the possibility of a representational
relation between the speech phenomena observed when the name is pronounced
and the pre-theoretical models abstracted from experiences of hearing similar
sounds. That is to say, pseudo-transcription has always been one way of produc-
ing new spellings.

2.3.6 Spelling reform

We have seen how phonetics in sixteenth-century England began in the service of
spelling reform to make literacy easier to acquire and foreign languages easier to
study, and then became increasingly focused on description and taxonomy. The
emergence of phonetics as a more scientific discipline in the nineteenth century
gave a surer basis to taxonomic categories and terminology. It also loosened
the ties with spelling reform, but it was some time before it cut them. The 1949
Principles of the International Phonetic Association recognises reformed spell-
ing as an application of IPA symbols, thus giving symbols the status of letters, an
aim dropped in the 1999 version. The application of phonetics to language learn-
ing and teaching does not now have such a strong presence in the IPA’s journal
as it did in the early 1990s. Phonetics is not now primarily seen as existing to
support the learning and teaching of languages but as a body of theoretical and
practical knowledge about how speech is structured to be put more in the service
of phonology, sociolinguistics and speech technology than language pedagogy.
Spelling reform is usually thought of as a policy to increase the transpar-
ency of sound–spelling correspondences, but several of the lasting examples of
reformed spellings in the history of English orthography have in fact had the
opposite effect. They date from the fifteenth through to the seventeenth centuries,
when some letters were introduced into spellings for etymological reasons where
the spoken forms had no corresponding sounds to motivate them (Scragg 1974:
56–9). The consequence is that sound–spelling correspondence becomes com-
plicated and increasingly lexically specific. The <b> introduced into the French
loan dette to give us debt through conscious reference to Latin debitum raises the
question as to whether we should say that <eb> corresponds to /ɛ/, or <bt> cor-
responds to /ɛt/, or whether we can simply leave
<b> out of correspondence relations altogether as preferred by Carney (1994:
213). All options are lexically restricted (cf. bet, met, set, get, let, web) such that
the spelling has similarities to a xenogram (see Section 1.1.4): we use the Latin-
influenced spelling <debt> for the written language form of debt, and pronounce
it /dɛt/ in the spoken form. The lasting success of Latinising etymologically
driven changes to English spelling, whether based on true or false etymologies,
further exemplifies the power of anti-phonography in glottographic writing and
the systemic independence of written and spoken language despite their obvious
close association.
Phonographically motivated spelling reformers have generally had an uphill
battle. They advocate in effect a state of affairs in which spelling would be iso-
morphic with transcription and spellings would be performance scores function-
ing as prescriptive models. Their motives are socially progressive, arguing that
it will facilitate literacy for the masses and open up greater access to foreign lan-
guages by making them easier to learn from written sources. However, the egali-
tarian aims of spelling reform tend to be undermined when it comes to deciding
whose pronunciation a reformed spelling should be based on. John Hart, one
of the earliest proponents of reforming English spelling phonographically (see
Section 2.3.2), was forthright in his views on this, deliberately echoing Quintilian
in saying it should be based on the speech of the learned, and most emphatically
not on the speech of ‘the unexpert vulgar’. How it should be decided whose pro-
nunciation will shape a reformed orthography is a serious problem which is likely
to cause attempts at spelling reform to flounder, particularly in the case of a lan-
guage like English with social and geographical variation extending over nations
and continents. If reformed spellings were to follow the speech of the learned
elite in Quintilian’s quomodo sonat fashion, then an Alcuinian policy of ad lit-
teras (see Chapter 4 Section 4.13.3) would have to be imposed on the ‘unexpert
vulgar’ if they were to gain any benefit from the enterprise. The benefit would
come at the price of abandoning local norms of speech in a top-down, centralised
policy of prescriptive accent levelling. Henry Sweet (1877: 196), for example,
advocated the teaching and testing of pronunciation in schools so that it would
match a reformed spelling. If bath and trap words were to have different vowel
letters because different vowel qualities are used by the social elite, then either
everyone has to use those vowel qualities or the spelling reform is only meaning-
ful to those native speakers who already make the vowel quality distinction and
do not need to be told; it would of course have benefits for non-native learners of
English. Spelling reform in a language exhibiting large-scale social and regional
variation can hardly be other than anti-democratic if it is to have any significant
effect for its native-speaker population. The only way to avoid this totalitarian-
ism is for each variety to develop its own spellings, in which case reading will be
either more restricted or more demanding, and cross-variety written communica-
tion put in jeopardy.
The strongest linguistic arguments against phonographically driven spelling
reform are founded on the view, expressed in Section 1.1, that the ontology of
language as a lexico-grammatical system is equally independent of writing and
speech, and that characters and sounds are alternative sets of clothing enabling
language to be made manifest in different media for communicative purposes of
all kinds. Any correspondence relations that can be set up between speech and
writing are merely incidental and irrelevant for the functioning of the language
qua language. Weighing against this view, however, is the undeniable impor-
tance of phonographic processes in the history of written language, as outlined in
Section 2.2, which seems to be evidence that literate language users have always
valued at least some transparency in sound–spelling correspondence. It may be
that two different needs have to be reconciled: the need for spoken language and
written language each to function effectively on its own terms, including social-
indexical functions, and the need for literate users to be able to translate between
spoken and written language as effectively as possible. Trying to force reforms
to meet the latter need may upset the balance that has to be struck. Nevertheless,
the phonographic tendencies in written language which have given hope to spell-
ing reformers have been fortuitous for the development of resources for phonetic
notation and transcription.

Notes
1. The term ‘letter’ did not mean the same among earlier writers as it does today. Instead
of meaning only the alphabetic letters of written language, it was formerly used to mean
an element, or unit, of linguistic analysis neutral with respect to written and spoken
language which could manifest as a written character or a sound (see Abercrombie
1949, 1993: 316–18).
2. The first edition was 1653. The edition consulted is the sixth, of 1765, in Kemp’s
(1972) facsimile edition with translation from the original Latin.
3. In fact these categories would generate nasalised continuants which do occur.

3
Phonetic Notation

3.0 Introduction
The purpose of a system of phonetic notation is to function as a resource for
denoting theoretical models which become descriptive models when used in
transcriptions (see Chapter 1 Section 1.3.1, Chapter 4 Section 4.0).
There are two sides to phonetic notation, namely the design of the glyph and
its denotation. The history of written language and phonetic notation is full of
the same glyph being used with different values. Just to take a random example,
the ‘bullseye’ glyph ‘ʘ’ seems to have started life as a variant of the Greek letter
theta <θ> in the Umbrian alphabet, for which it was also used in Tocharian; it
was drafted several centuries later into the Gothic alphabet invented by the Greek
Bishop Ulfilas (aka Wulfila) in the fourth century ce for IPA [w] (Coulmas 1996:
168), appears with the phonetic value [nd] in the Turkish Yenisei runes (ibid.:
515), corresponds to [s] in the Berber Tifinagh alphabet (ibid.: 504), was used in
late eighteenth-century America by William Thornton for IPA [ʍ] (Abercrombie
1981: 210, 216), turns up in the Vai syllabary in 1820s Liberia for [ku] (Coulmas
1996: 538), and then in 1976 became the IPA symbol for the bilabial click
(Pullum and Ladusaw 1996: 132). Fascinating as the history of individual glyphs
is, the focus in this chapter will be on the principles behind notation systems and
how they function as a whole to denote phonetic categories.
Phonetic notations come in different types. They can be constructed accord-
ing to different principles and be used in transcriptions to express analyses at
different levels of phonetic, phonological and morpho-phonological structure.
This chapter is concerned with describing principles of notation construction and
how they relate to phonetic theory; Chapter 4 will consider the different types of
transcriptions which can be made by employing notation systems.
Any phonetician engaged in transcription is likely to be in sympathy with
Sweet (1877: 100) when he asserts that ‘[t]he notation of sounds is scarcely less
important than their analysis’. Of course, analysis is more important than notation
because without it there is nothing to symbolise, but without notation we cannot
express analyses so succinctly and conveniently. Once sufficient familiarity with
phonetic theory and transcription conventions is attained, phonetic analysis can
be read from notation relatively quickly and easily providing the notation is user-
friendly. The ultimate aim of a system of proper phonetic notation is to be able
to denote all the categories of phonetic classification that one’s phonetic theory
identifies, and thus to denote all points in taxonomic phonetic space. Each point
in that space is a model onto which phonetic data can be mapped (see Chapter 1
Section 1.3). Another way to think of this is to say that a notation system should
be able to populate the taxonomic phonetic space mapped out by phonetic theory
with symbols so as to leave no yawning gaps. How symbols denote categories is
an issue that leads to looking at the internal structure of symbols as well as rela-
tionships between symbols and denotata. These issues will be addressed for each
type of notation considered in the following sections. The issue of whether there
is information value in the sequential arrangement of symbol components – that
is to say, the question of whether symbols are functionally ordered or function-
ally simultaneous – will be considered in Section 3.5.
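
The idea of notation populating a taxonomic phonetic space can be made concrete with a small computational sketch. The following Python fragment is purely illustrative: it reduces the space to three dimensions with a handful of values each, and the symbol assignments are an arbitrary toy inventory rather than a claim about any actual notation system. Its only purpose is to show what it means for a notation to leave gaps in the space that a phonetic theory maps out.

from itertools import product

# A deliberately tiny 'taxonomic phonetic space': each point is a bundle of
# categories, one value per classificatory dimension.
DIMENSIONS = {
    "voicing": ["voiceless", "voiced"],
    "place":   ["bilabial", "alveolar", "velar"],
    "manner":  ["plosive", "nasal", "fricative"],
}

# A partial symbol inventory: a mapping from category bundles to symbols.
# The assignments below are illustrative only.
SYMBOLS = {
    ("voiceless", "bilabial", "plosive"): "p",
    ("voiced", "bilabial", "plosive"): "b",
    ("voiced", "bilabial", "nasal"): "m",
    ("voiceless", "alveolar", "plosive"): "t",
    ("voiced", "alveolar", "plosive"): "d",
    ("voiced", "alveolar", "nasal"): "n",
    ("voiceless", "alveolar", "fricative"): "s",
    ("voiced", "alveolar", "fricative"): "z",
    ("voiceless", "velar", "plosive"): "k",
    ("voiced", "velar", "plosive"): "g",
}

def gaps(dimensions, symbols):
    # Return every point in the taxonomic space to which no symbol is assigned.
    space = product(*dimensions.values())
    return [point for point in space if point not in symbols]

for point in gaps(DIMENSIONS, SYMBOLS):
    print("no symbol for:", point)

Running the sketch lists the category bundles left unsymbolised, for example the voiceless bilabial nasal, which is the computational analogue of a gap in taxonomic phonetic space.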

3.1 Organic-Iconic Notation


An organic notation is one in which symbols denote categories defined in terms of
articulators or articulatory states and actions. It is therefore anchored firmly in the
articulatory domain and can be thought of as populating abstract articulatory space
with symbols. In organic notation, abstract articulatory space is the taxonomic
phonetic space. It has been customary to classify as ‘organic’ only those nota-
tion systems which explicitly and systematically set out to denote sounds by their
articulatory formation such that each symbol can be analysed into components
denoting individual articulators. I shall follow this custom, but it should be noted
that any phonetic notation is organic to the extent that the conventions for its inter-
pretation take an articulatory perspective. The great problem with an organic bias
in phonetic notation is that in practice most phonetic analysis is not directly articu-
latory but either perceptual or, since the invention of spectrography, acoustic.
In the history of phonetic notation, organic systems have either been iconic,
so that there is some visual similarity between the symbol and what it denotes,
or analogical, so that the same denotatum is always denoted by the same symbol
but without visual similarity. In analogical notation the relation between symbol
and denotatum is therefore arbitrary. Examples of the iconic type are Bishop
John Wilkins’s organic alphabet of 1668 and Alexander Melville Bell’s Visible
Speech symbols of 1867. The analogical type is exemplified by the symbols of
Francis Lodwick’s 1686 Universall Alphabet and Amasa D. Sproat’s symbols of
1857. The division into iconic and analogical notations is not, however, always
clear-cut. The characters of the Korean Hangŭl orthography and the symbols of
the Passy-Jones alphabet are somewhere in the middle, but they will be dealt with
under the ‘organic-iconic’ heading here.
The most complete and transparent kind of organic-iconic notation would be
one where the whole configuration of the vocal tract during the production of a
sound was depicted in a symbol, but such symbols would not be easy to read and
write, being in effect highly detailed drawings of physical vocal tract space; nor
would they be selective in expressing an analysis of the particular sound being
represented – all parts of the vocal tract would appear to be equally implicated
and equally important in contributing to the formation of the sound. To be useful
and informative, segmental organic-iconic symbols need to be selective, stylised
diagrams of those articulators identified as responsible for producing the sound in
question, the selection being the responsibility of phonetic theory. There should
be a one-to-one relationship between an organic-iconic symbol and an articula-
tory category such that, for example, labiality is always denoted by the same
graphic representation of the lips, and plosiveness, and voicing, and so on. Each
organic-iconic symbol thus denotes an articulatory category. Whole consonants
and vowels are then represented by composite multi-category symbols.
The great advantage claimed for a good organic-iconic notation is that it is
maximally analytic and maximally transparent to any reader with sufficient
knowledge of the vocal tract. One great disadvantage is that it tends to be difficult
to use in practice, but a further disadvantage is that it cannot be used to denote
dimensions of classification which cannot be tied to a particular articulatory
parameter, for example sonority, sibilance or rhoticity. These disadvantages are
no doubt partly responsible for the fact that none of the organic-iconic notations
which have been devised have been widely or lastingly adopted by phoneticians,
despite enthusiastic support from leading phoneticians such as Henry Sweet (e.g.
Sweet 1881).
Some examples of organic-iconic notation systems are discussed in the fol-
lowing sections.

3.1.1 Korean Hangŭl

The first known notation system on organic-iconic principles is the Korean
Hangŭl orthography (see Chapter 2 Section 2.2.5) introduced in the fifteenth
century to replace Chinese characters for the spelling of Korean words. Most
Hangŭl letters are complexes constructed from characters that represent par-
ticular articulatory configurations, as shown in Figure 3.1. For example, in the

FIGURE 3.1: Articulatory configurations motivating the Hangŭl letters. Reproduced with kind permission from King Sejong the Great: The Everlasting Light of Korea, p. 92

letters <ᄃ, ᄀ> (transliterated respectively as <d, g>), the upper horizontal line
is a component character corresponding to the palate oriented with the front to
the left; the lines contacting it correspond, respectively, to closures at the alveolar
and velar places of articulation.
Under the influence of the structure of Chinese characters, Hangŭl letters
are composed into blocks corresponding to syllables, so that trisyllabic datugo
‘fighting, quarrelling’ is written with three syllable blocks (separated from each
other here for ease of identification) as 다 투 고. The Hangŭl letters belong to
an orthography but have a phonetic theory underpinning their design (Sampson
1985: 124–9; King 1996: 219–20). They therefore constitute a proto-phonetic
notation system as well as an orthographic system. This system has been devel-
oped into a proper phonetic notation system by Hyun Bok Lee of Seoul National
University. Called the International Korean Phonetic Alphabet (IKPA), it was
first published by the Korean Language Society in 1971. Using the Hangŭl
organic principles for the construction of complex symbols, Lee uses diacritics
and modifying strokes to extend the notation to cover sounds not found in Korean
and arranges the symbols linearly instead of in syllable blocks. Transcription of
datugo then becomes [ᄃ ᅡ ᄐ ᅮ ᄀ ᅩ] with each consonant and vowel clearly
separate in sequence from left to right. IKPA transforms Hangŭl characters from
proto-symbols into proper phonetic symbols, although organic-iconicity is dif-
ficult to identify in relation to some of the characters and they are not all system-
atically deployed throughout the system. The category ‘fricative’, for example,
is denoted by a subscript circle similar to the IPA voicelessness diacritic, but not
all fricative symbols have it. However, in principle, each articulatory category is
denoted by a separate symbol the graphic shape of which is based on some aspect
of how the vocal tract implements that category. An example transcription using
IKPA is given in Lee (1999: 123).
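
The relationship between syllable blocks and a linearised sequence of letters can be illustrated with standard Unicode tools, since modern encodings store each Hangŭl block as a single precomposed character with a canonical decomposition into its constituent jamo. The short Python sketch below uses that decomposition only as an analogy for the block-to-linear rearrangement just described; it is not an implementation of IKPA, and the example word is the datugo cited above.

import unicodedata

# The three Hangŭl syllable blocks of datugo as written orthographically.
blocks = "다투고"

# Canonical (NFD) decomposition splits each precomposed syllable block into
# its constituent jamo, giving a linear left-to-right sequence of consonant
# and vowel letters comparable to the IKPA transcription quoted above.
linear = unicodedata.normalize("NFD", blocks)

for ch in linear:
    print(hex(ord(ch)), unicodedata.name(ch))
# e.g. 0x1103 HANGUL CHOSEONG TIKEUT, 0x1161 HANGUL JUNGSEONG A, ...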

3.1.2 Helmont’s interpretation of Hebrew letters

A curious twist on the organic-iconic approach is found in a book by the Dutch
philosopher and alchemist Franciscus Mercurius ab Helmont published in 1667.
Helmont tried to show that the letters of the Hebrew alphabet represented the
articulatory configurations for the corresponding sounds. But his interpretations
of the letters as a kind of phonetic tablature notation, as if they were like Korean
Hangŭl, led him to incorrect conclusions about the formation of the sounds.
His description of [b], based on the shape of the letter bēth <ב>, would have
it that ‘[l]ingua cum maxima corporis sui parte, valide admodem palato appli-
catur, adeo, ut propterea mucro ejus antrorsum quadantenus incurvetur’ (‘the
largest part of the body of the tongue is applied fully to the palate, so much
so that its tip is to some extent curved forwards’) (Helmont 1667: 60–1). A
similar tongue position is attributed to [m] from the letter-shape of mēm <ם>:
‘Lingua palatum leniter attingit, prout et labia sese leniter exosculantur’ (‘the
tongue strikes the palate softly, and according as the lips are gently kissed by
each other’) (ibid.: 74). His diagram for [b] is given in Figure 3.2 alongside his
vocal tract diagram.


FIGURE 3.2: Helmont’s diagram of Hebrew bēth (left) and his vocal
tract diagram (right). Reproduced with the permission of the Brotherton
Collection, Leeds University Library

3.1.3 Wilkins’s organic-iconic symbols

The Frenchman Honorat Rambaud may have been the first European to experi-
ment with an organic alphabet (Abercrombie 1948/1965: 50), but better known
and more influential is the one devised by John Wilkins, bishop of Chester, in the
seventeenth century (see Chapter 2 Section 2.3.2). As with Hangŭl and IKPA, the
organising principle is that each subsegmental category is denoted by a symbol,
and symbols combine into complex symbols – ‘natural Pictures of the Letters’ –
to represent segment-sized sounds. Wilkins’s organic alphabet is reproduced in
Figure 3.3, which shows for each sound how, according to the phonetic under-
standing of the time, the vocal tract is modified compared to the partly labelled
at-rest diagram in the lower right of the table. For voiced sounds the epiglottis
is shown in two positions to indicate its oscillation, which Wilkins erroneously
thought was the voicing mechanism, despite his claim to have read Holder
(Wilkins 1668: 357). Airflow is also represented for all sounds except oral stops
and the first three vowels, the latter presumably because the view is frontal in
order to show lip-shape; airflow is shown bifurcated in the case of laterals and
issuing from the nose in the case of nasals. In the top right of each picture is an
organic-iconic symbol intended to capture the essential articulatory state shown
in the diagram; note that these symbols are oriented to the right whereas the dia-
grams are oriented to the left. Wilkins did not intend these organic symbols to be
used in transcriptions; instead he assigned to each one a non-organic upper case
roman alphabetic symbol, shown in the top left corner.


FIGURE 3.3: Wilkins’s organic alphabet and articulatory diagrams of 1668. Reproduced with the permission of the Brotherton Collection, Leeds University Library

3.1.4 Bell’s Visible Speech notation

The Visible Speech notation of Alexander Melville Bell (Bell 1867) is nowadays
the most well-known organic-iconic notation system (MacMahon 1996: 838).
As Bell himself explained, ‘[i]t is the aim of this System of Letters to write
every sound which the mouth can make, and to represent it exactly as the mouth
makes it’ (Bell 1867: 70, italics added). It is devised on basically the same prin-
ciples as Wilkins’s organic alphabet, but there are differences in the categories
denoted and how they are expressed. Bell provides diagrams of the vocal tract
for consonants and vowels, reproduced here in Figure 3.4. The principal organs
of speech are labelled with numbers and shown in their neutral at-rest positions
except for the tongue-body and tongue-tip, which are shown in both lowered
and raised positions. The epiglottis is represented (not with great anatomical
accuracy) but not labelled, reflecting the fact that Bell knew it played no role in
voicing, which he correctly attributed to vibration of the vocal ligaments (Bell
1867: 46). Voicing as a separate feature is denoted by the symbol [ɪ], indicative
of the vocal folds meeting along the midline of the glottis; when combined with
other features into a segmental symbol, it becomes a short line ‘inserted within
the consonant curve’ (ibid.: 66), for example [] represents a velar articulation,
[] a velar articulation with voicing. Bell’s symbols are less explicitly organic
and more diagrammatic than Wilkins’s but they go a considerable way towards

FIGURE 3.4: Bell’s vocal tract diagrams for consonants and vowels (Bell
1867: 38)

justifying his claim that ‘the sound of every symbol is deducible from the form
of the symbol itself’ (ibid.: 99), though the further claim that this can be done
‘without any encumbrance to the reader’s memory’ is perhaps less justifiable.
Although the symbols are iconically motivated, one has to learn and remember
what they stand for; it is hardly self-evident. One thing which is soon apparent
when looking at proposed organic-iconic notations is that the same vocal organ
can motivate different iconic representations. The symbol [] could denote open
lips if attention were to be focused on the right-hand part of the symbol, and there
is nothing intrinsic in [] to tell us it stands for a low back vowel with widened
pharynx if we have not memorised the conventions. Interpretative conventions
are no less necessary with iconic symbols than with other kinds. That is to say,
their denotation is not completely determined by their form. It is also question-
able whether users find it more convenient to interpret a complex symbol in terms
of its constituent parts than to memorise it as a whole.
The great value of Bell’s Visible Speech to us nowadays, apart from its value
as an experiment in organic-iconic notation, is that it shows us explicitly the state
of phonetic theory in the latter half of the nineteenth century, reminding us that
much of what we take to be the sophistication of modern phonetics was in fact
current at that time despite the absence of modern instrumentation. His apprecia-
tion of English contextual devoicing is a good example (Bell 1867: 67).

3.1.5 Sweet’s organic-iconic notation

Although Sweet recognised that Bell’s organic symbols were at the mercy of
changes in phonetic theory (Sweet 1877: 100–1), within three or four years he had
come to the view that enough was known for certain about speech production to
justify opting for an organic-iconic notation system (Sweet 1881: 183). Any tink-
ering about with it that might become necessary was, in his opinion, a small price

FIGURE 3.5: Sweet’s (1906) organic symbols for (a) consonants and (b) vowels

to pay for avoiding the arbitrariness and ‘cross-associations’ of symbols based on
roman alphabetic letters. By ‘cross-associations’ Sweet meant the problem of, for
example, English and French phoneticians interpreting roman-based symbols in
terms of their typical letter–sound correspondences in English and French, which
he saw as particularly likely in the case of vowels (Sweet 1881: 181–2).
Sweet revised aspects of Bell’s notation (see Figure 3.5) to increase the
simplicity and distinctiveness of certain symbols, for example the symbols for
nasals, thus making them easier to use. But he also made changes based on
theoretical differences concerning the production of certain sounds, for example
glides. While he was highly respectful of Bell’s analysis of vowels, Sweet did
not adopt Bell’s set of glide symbols, objecting to his category ‘glide’ on two
grounds (Sweet 1881: 197–9). The first was that it confused two distinctions:
consonant–vowel and syllabic–non-syllabic (cf. the consonant–contoid and
vowel–vocoid distinctions introduced by Pike (1943: 143–5)). Secondly, Sweet
did not accept that there could be a category of stricture between close vowel and
fricative consonant. It is not clear from Bell’s description of glides as ‘intermedi-
ate to consonants and vowels’ (Bell 1867: 69) whether he really meant intermedi-
ate in stricture or in some other sense, but modern phonetic theory does in fact
recognise that the stricture for [j], for example, tends to be closer than for [i], as
can be seen when they occur in sequence in English yeast, but not close enough
to produce the friction of [ʝ]. Sweet proposed that non-syllabic vowels be sym-
bolised by reducing the size of the vowel symbol, so that IPA [j] becomes [],
a smaller version of [] (= IPA [i]), being then the same height as a consonant
symbol (Sweet 1881: 204–5). Sweet (1906: 52–62) then used the term ‘glide’ for
coarticulatory transitional sound qualities produced epiphenomenally as a result
of the vocal tract moving from the articulation of one sound to the articulation of
the following sound, or between a sound and silence.

3.1.6 The Passy-Jones organic alphabet

The last serious attempt to launch an organic notation was by Paul Passy and
Daniel Jones (see Passy 1907). Although the symbol shapes are obviously
heavily influenced by those of Bell and Sweet (see Figure 3.6), they are made
to look more like familiar roman letters (Collins and Mees 1999: 52–3) and thus
to loosen their iconic connection with vocal tract structures. In consequence,
any advantages conferred by iconicity are diminished, while the disadvantages
of unfamiliarity remain, which may be one reason why this notation was soon
abandoned.

FIGURE 3.6: The Passy-Jones organic alphabet (Le Maître phonétique 1907, Supplement)

In the Passy-Jones system, the size of the symbol also has signification. A
small version of a symbol denotes a retracted place of articulation relative to
the larger version. Labiodental symbols are smaller versions of bilabial ones,
alveolars of dental ones, and uvulars of velar ones. The system also contains
‘bronchiales’ (probably because of Sweet’s view that Arabic [ħ] and [ʕ] are
produced below the glottis (Sweet 1904: 37)) which are symbolised by smaller
versions of the ‘laryngeale’ symbols. An obvious problem with the distinctive
use of symbol size is knowing which is intended if a symbol is used on its
own. Size is also used to distinguish between a ‘roulée’ (trill) and a ‘semi-
roulée’ (tap or flap). The straight line inside the consonant curve is halved in
length, no doubt motivated by the idea that a tap is like half a trill, i.e. one beat
instead of the typical two or three beats found in singleton trills (Laver 1994:
219). Jones (1918/1972: 47) describes a trill as ‘a rapid succession of taps’
and a flap as ‘a single tap’ without mentioning the very different mechanisms
modern phonetic theory takes to be responsible for their production (see Laver
1994: 224).

3.2 Organic-Analogical Notation


Symbols in organic-analogical notation systems are more arbitrary than those in
organic-iconic systems. The principle of analogical notation is that each phonetic
category is consistently denoted by the same symbol or symbol component.
However, the way this is done varies considerably in different notation systems,
as does the way in which the notation system relates to phonetic theory in terms
of explicitness and accuracy. These differences can be seen in the examples con-
sidered in the following sections.

3.2.1 Wilkins’s analogical notation

In the same work in which he published his organic symbols, Wilkins provided
a chart of analogical symbols (Wilkins 1668: 376). It is reproduced here in
Figure 3.7, where we can see his list of consonantal roman letters and digraphs
(and one trigraph) given in lower case in column 1 and upper case in column
9. In row 1 he gives the vowel letters ([ ] = the strut vowel, IPA [ʌ]; see
Wilkins 1668: 363). Column 2 and row 2 contain, respectively, the equivalent
analogical symbols for the consonants and vowels in isolation. Sounds repre-
sented in rows 3–17 are based on a straight vertical stroke, which is tilted for
the semivowels – backwards for [w], forwards for [j], perhaps motivated by
their respective relationships to front and back vowels. A short stroke adjoined
at the top of the obstruent symbols in this set denotes voice; adjoined at the
base it denotes voicelessness; this device is also used to distinguish voiced and
voiceless liquids.1 Place of articulation is denoted by the way this short stroke is
adjoined: at a 45° angle it denotes labial; horizontal extending in one direction
from the vertical denotes apical; horizontal extending in both directions denotes
dorsal. Manner of articulation is expressed through the addition of a curve at the
end of the short stroke to denote a fricative or affricate. Sounds in rows 18–25
are based on curves. The sibilant fricatives in rows 18–21 have the orientation


FIGURE 3.7: The analogical symbols of Wilkins. Reproduced with the permission of the Brotherton Collection, Leeds University Library

of their double ‘snake-like’ curves reversed to show voice and voicelessness;
the addition of a short horizontal stroke at the top distinguishes the ‘hushing’
postalveolars from the ‘hissing’ alveolars (in Wilkins’s terminology, the ‘dense
whistling’ from the ‘subtle whistling’). Reversal of the curve orientation is used
to distinguish median from lateral liquids (rows 22–5). For nasals (rows 26–31),
voicelessness is denoted by the addition of a curve of the same kind as is used to
denote fricative/affricate articulation.
The crucial requirement for analogical notation is consistency in the relation
between symbol and denotation, but clearly this does not obtain through all of
Wilkins’s system. Among the consonants, there are really four different analogi-
cal subsystems in which the same symbol or device, e.g. a short horizontal stroke,
or a curve, signifies something different in each one. What it signifies in a par-
ticular subsystem is determined by its context, in the same way that a tilde in IPA
notation may signify nasalisation, creaky voice or velarisation/pharyngealisation
depending on whether it appears above the main symbol or below it, or strikes
through it – [l̃], [l̰], [ɫ]. The four systems divide into phonetic classes as follows:
I (rows 3–17) = plosives and non-sibilant fricatives; II (18–21) = sibilants; III
(22–5) = liquids; IV (26–31) = nasals.
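
The context-dependence of the tilde can also be seen in how its three placements are encoded as distinct characters in Unicode. The short Python sketch below is illustrative only; the code points are standard Unicode, and the phonetic interpretations attached to them in the comments are those given in the paragraph above.

import unicodedata

# The 'same' tilde glyph carries a different meaning in each position.
nasalised = "l" + "\u0303"   # l + COMBINING TILDE (above): nasalisation
creaky    = "l" + "\u0330"   # l + COMBINING TILDE BELOW: creaky voice
velarised = "\u026B"         # LATIN SMALL LETTER L WITH MIDDLE TILDE: velarisation/pharyngealisation

for s in (nasalised, creaky, velarised):
    print(s, [unicodedata.name(ch) for ch in s])
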
Wilkins presents his analogical symbols in the form of a list ordered within
each of the above classes according to place of articulation and voicing.
Anterior sounds precede posterior ones, and voiced precede voiceless. A
curious exception to place of articulation ordering is that the first three in the
list are the symbols for [h, w, j] with the glottal sound placed first despite
its classification as the farthest back consonant in Wilkins’s sound chart (see
Figure 2.5 in Chapter 2). Because it is presented simply as a list, and not as
a sound chart showing cross-classification, we cannot tell from the analogical
symbols themselves what phonetic features they stand for. We have to work
it out by applying our knowledge of the roman-based symbols in column 1,
which, with a few small alterations, are the symbols in the top left of the
organic chart in Figure 3.3 above, and in the sound chart in Figure 2.5 in
Chapter 2.
The modern principle of equipollence is evident in the fact that both voice
and voicelessness are denoted, the former by a line adjoined at the top of the
symbol, the latter by a line adjoined at the base. This compares to the more
privative expression of the voicing contrast in Lodwick’s notation (see Section
3.2.2 below).
There is no componential analysis implied in the vowel notation. Three
vowels are symbolised by a small ring, the other three by a small half-ring. The
placement of the ring or half-ring in the vertical plane distinguishes between the
members of each set of three. The high front [i] is missing despite Wilkins’s
treating it elsewhere and providing keyword examples (Wilkins 1668: 363).
There seems to be no phonetic basis for whether a vowel is denoted by a ring or
a half-ring. Vowel symbols are adjoined to consonant symbols to form composite
symbols for syllables in an abugida-like fashion, but with ordering of the conso-
nant and vowel symbols to distinguish between VC and CV syllables (columns
3–8 and 10–15 respectively).

3.2.2 Lodwick’s analogical notation

Francis Lodwick published his ‘universal alphabet’, an analogical notation
system, in 1686 (reproduced as Figure 3.8). As with Wilkins’s symbols, it is
really only the consonants which are symbolised according to the analogical
principle, a shortcoming which may reflect the generally greater theoretical
understanding of consonants than of vowels not just in late seventeenth-century
England but throughout the history of phonetics. In contrast to Wilkins, Lodwick
presents his analogical consonant symbols in a table rather than a list (although
the vowels are presented in list form), but he does not label the classificatory
dimensions of the table with anything except numbers for the rows and columns.
Nowhere in either his first or second essay on this subject did he mention any
articulators or manners of articulation; in fact there are virtually no phonetic
terms other than ‘vowel’ and ‘consonant’. Nonetheless, the arrangement of the
table into rows and columns (ranks and files in his terminology) shows awareness
of places of articulation and manners of articulation through the pairing of ana-
logical symbols with roman orthographic letters, and their implied correlations
and proportionalities.
Lodwick sets up symbols for voiced stops as ‘radical characters’ in row 1,
from which others are derived by the systematic addition of further symbol
components, thus implying a privative analysis of manner of articulation in
which voiced consonants lack voicelessness and plosives lack continuance and
nasality. Voicelessness in stops is denoted by adding an ‘n’-shaped component
to the lower right of the stem; it is doubled into an ‘m’-shape to denote a voiced
nasal. Fricatives are denoted by a lobe joined to the lower right of the stem, and
voicelessness in fricatives by a stroke extending to the left from the stem. The
representation of voicelessness is therefore not consistent across stops and frica-
tives. It is hard also to see analogical systematicity in the symbols for the oral
sonorants in columns 7–11. Column 12 is interesting in that the symbol is paired
with the Hebrew letter aleph and seems to be intended to denote a glottal stop
(Abercrombie 1948/1965: 53; Salmon 1972: 153).
No explanation is given for why Lodwick treated voiceless obstruents as
derived but voiced ones as primitives, a practice which is at odds with the modern
view that the unmarked type of obstruent is voiceless. Neither does he indicate
what he thinks distinguishes the sounds in column 2 from those in column 6 – is
it dental versus alveolar? Various anomalies soon come to light on inspection of
Lodwick’s table of consonants. The symbol in column 7 row 4 is for the voice-
less lateral [ɬ] but appears in a row of voiced sounds; the symbols in columns
8 and 9 denote glottal [h] and palatal [j] respectively but have the same hook to
the top left as distinguishes the dental/alveolars in column 2 from the labials in
column 1; place of articulation is not indicated for sonorants as it is for obstru-
ents; the symbol for [z] has the leftward extending stroke that elsewhere denotes
voicelessness.
The general principle of Lodwick’s symbols is to add a bound element to
a base free element, the latter on its own denoting a voiced plosive. The base
symbols are of the simple integral type, and his complex symbols are structurally
of the form Z = (Y ← a) (see Section 3.4 below).
In his symbols for vowels it is hard to see any componential analysis at all
and the diacritics ‘do not seem to be arranged with any method’ (Abercrombie
1948/1965: 53). Lodwick merely substitutes lines, angles and curves for the
alphabetic letters, and it is not hard to see that some of them are clearly motivated

FIGURE 3.8: The analogical symbols of Lodwick with a transcription of the Lord’s Prayer

by the corresponding roman letter-shape, e.g. [׀] based on <i>, [o] based on <o>,
[c] based on <e>.
The value to us of the analogical notations devised by Wilkins and Lodwick is
largely that they show that componential analysis, at least of consonants, was not
only being undertaken in the English phonetic tradition (there is ample evidence
of this in the work of John Hart a century earlier; see Chapter 2 Section 2.3.2),
but that, despite errors and inconsistencies, thought was being given to denoting
them in a new system of notation free from the often ambiguous associations of
alphabetic letters, which Lodwick suspected would lead people ‘to spell words
according to their old and corrupt Custom, whatsoever Rules shall be set to the
contrary’ (Lodwick 1686: 132, reproduced in Salmon 1972: 241).
Abercrombie (1948/1965: 52) has described transcriptions in Lodwick’s
system as syllabic; see Lodwick’s transcription of the Lord’s Prayer on the
same page as the table of symbols (Figure 3.8). But the notation system is not
a syllabography according to Daniels’s (2001: 43–4) typology of notation for
writing systems (see Chapter 1 Section 1.1.2). Rather, it is an abjad with vowel
diacritics.

3.2.3 Sproat’s analogical notation

The eighteenth century seems not to have taken an analogical approach to
phonetic notation, but in the mid-nineteenth century the American Amasa D.
Sproat designed an explicitly analogical notation system in which his aim was
‘to conform the shapes of the letters to their classification’ (Sproat 1857: 10).
Sproat’s effort has gone more or less unnoticed in the phonetic notation literature.
While the consonants are arranged in a cross-classificatory table with labelled
phonetic dimensions (see Figure 3.9), the vowels are, as with Wilkins and
Lodwick, presented in a list. There is some advance on Wilkins’s and Lodwick’s
treatment of vowels, however, in so far as Sproat indicates vowel length iconi-
cally by the length of one of the component strokes.
The consonant symbols are highly systematic and consistent. Symbols for all
labial sounds have a common [] component, all ‘gingival’ (= dental and alveo-
lar) sounds have a common [] component, and all ‘palatal’ (= postalveolar,
palatal and velar) sounds have a common [] component. The labial and palatal
symbols are the same shape as Bell’s ‘lip’ and ‘back of tongue’ symbols of ten
years later (see Bell 1867: 66). Sproat has an additional ‘guttural’ class, the
symbols for the members of which all have a [‹] component. These symbols on
their own stand for the voiceless plosives [p, t, k, ʔ] respectively. They are ‘close
atonic’ sounds in Sproat’s terminology, meaning close contact of the articulators
without an accompanying glottal tone. The symbols for voiced plosives (‘close
tonic’ sounds) are derived by adding a horizontal rightward-extending stroke to
the base of the voiceless symbol, so that where [t] is symbolised as [], [d] is
symbolised as [ ]. To denote a nasal (‘close nasitonic’), this stroke is placed
halfway up the base character: [˫] is therefore the symbol for [n]. The logic
behind the notation is easy to discern from looking at the table, but the reasons
for some of the classifications fly in the face of modern phonetic taxonomy.
Although Sproat’s setting voiceless plosives up as the basic sound-type has
support in modern phonetics in the sense that they are close to being a universal
class (Ladefoged and Maddieson 1996: 47–53), and voiceless stops are generally
considered to be the unmarked type, his placing of ‘r’, ‘hl’ and ‘l’ in the guttural
class is inexplicable and, as far as I can tell, unprecedented. From his description,
Sproat claims that the sound corresponding to Welsh orthographic <ll> is [x], not
[ɬ] (Sproat 1857: 37–8).

FIGURE 3.9: Sproat’s analogical symbols for consonants

In both Lodwick’s and Sproat’s alphabets, one sound at each place of articula-
tion is denoted integrally. Symbols for other homorganic sounds are then derived
from them by the addition of strokes expressing analogies of manners of articu-
lation. Lodwick regards voiced plosives as the basic ‘primitive’ type of sound,
with voicelessness as a ‘distinct Characteristicall Addition’, whereas for Sproat
voiceless (‘atonic’ in his terminology) plosives are basic and voicing is added to
derive voiced (‘tonic’) plosives. In the iconic notations of Bell and Sweet, the
base horseshoe-shaped symbols [   ], which Bell calls ‘stems’, denote
fricatives (Bell 1867: 51), with additions to denote other manners. We can see in
these differences how phonetic theory, not just phonetic observation, impinges
on the structure of the notation system to set up certain categories having a
default status conceptually similar to underspecificationist approaches in modern
phonology (for example, Archangeli 1988).
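
The notion of a default base symbol from which others are derived can be sketched schematically in code. In the Python fragment below, the string tags are stand-ins for graphic components that cannot be reproduced here, and the base categories assigned to each system simply restate the comparison made in the preceding paragraph (treating the bare Bell/Sweet stems as voiceless fricatives is an inference from the voicing line described in Section 3.1.4). It is a toy model of the derivational logic, not a rendering of any of the actual alphabets.

# Base (default) category per system, as described above.
BASE_CATEGORY = {
    "Lodwick": ("voiced", "plosive"),         # voiced stops as 'radical characters'
    "Sproat": ("voiceless", "plosive"),       # 'close atonic' sounds as bases
    "Bell/Sweet": ("voiceless", "fricative"), # bare horseshoe 'stems' denote fricatives
}

def derive(system, place, voicing, manner):
    # Build a schematic symbol: the base for a given place plus added marks
    # for whatever departs from the system's default category.
    base_voicing, base_manner = BASE_CATEGORY[system]
    symbol = f"{system}:base({place})"
    if voicing != base_voicing:
        symbol += f"+{voicing}-mark"
    if manner != base_manner:
        symbol += f"+{manner}-mark"
    return symbol

# Sproat derives a voiced plosive by adding a mark to the voiceless base,
# whereas Lodwick derives a voiceless plosive by marking the voiced base.
print(derive("Sproat", "alveolar", "voiced", "plosive"))
print(derive("Lodwick", "alveolar", "voiceless", "plosive"))
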
Complex organic-analogical symbols are structurally similar to organic-iconic
symbols in that they are built up from primitive components, each component
denoting a phonetic category. The difference is that analogical symbols are not
constrained by having to be visually associated with a speech organ. Greater
freedom of design gives scope for symbol relationships to be more explicitly
analogous and for the logic of their construction to be more transparently evident.
However, the question of whether users actively interpret analogical symbols
analytically or holistically still remains. It may be more efficient for practical
purposes to use holistic strategies, which may be a contributing reason for the
prevalence of arbitrary alphabetic notations like the IPA.

3.2.4 Notation for a voiced alveolar trill in Wilkins, Bell/Sweet and Passy-Jones

In this section the influence of phonetic theory on the form of organic-iconic
symbols is illustrated by taking a close look at how a voiced alveolar trill, IPA
[r], is symbolised in three organic-iconic notation systems: those of Wilkins, Bell
and Sweet (who represent it in the same way), and Passy and Jones. They are
shown in Figure 3.10.

FIGURE 3.10: Organic symbols for a voiced alveolar trill. (a) Wilkins,
(b) Bell and Sweet, (c) Passy and Jones

Wilkins’s organic symbol for a voiced alveolar trill, shown in the top right-
hand corner of Figure 3.10a, is based on the postures of the speech organs during
production of the sound as depicted in the cutaway vocal tract diagram, but with
the orientation turned horizontally through 180°. In the diagram, the epiglottis
is shown in two positions to represent the dynamics of its trilling action, which
Wilkins believed was the source of voicing (Wilkins 1668: 380). It is denoted in
the symbol by the undulation at the rear of the tongue. Likewise, the tongue-tip
is shown in two positions to indicate its trilling action, not only in the vocal tract
diagram but also in the organic symbol. Interestingly, Jespersen (1889: 29) stated
that because of ‘the want of a fixed configuration [. . .] the exactest manner of
writing a trill would, accordingly, be a series of signs denoting the extreme posi-
tions between which it swings’. In addition, Wilkins provides descriptive con-
ventions where [R] is classed as a sound produced by ‘Trepidation or Vibration,
against the inmost part of the Palate’ (ibid.: 369).
Wilkins mentions familiarity with Holder’s views on phonetics (ibid.: 357),
and Holder’s attribution of voicing to vibrations of ‘cartilaginous bodies’ in
the larynx (Holder 1669: 23) could be construed as identifying the epiglottis,
although it seems from Holder’s description that he was most probably referring
to the arytenoid cartilages and the glottis (Abercrombie 1986: 4). Wilkins’s error
in assigning voicing to the epiglottis points up the vulnerability of organic-iconic
phonetic symbols having to be redesigned whenever there is any revision of the
phonetic theory they are based on.
In the Visible Speech notation of Alexander Melville Bell, the symbol for
the voiced alveolar trill is, as with Bell’s other symbols, less obviously iconic
than Wilkins’s and consequently needs more explanation (see Figure 3.10b).
For example, the basic horseshoe shape for consonants denotes apical articula-
tions when oriented with the opening at the top, which is inconsistent with the
other three orientations, in which it is the closed end of the horseshoe which
correlates with where the closure occurs:  = dorso-velar,  = dorso-palatal, 
= bilabial; all the vocal tract diagrams in Bell (1867) are oriented facing right.
The short line protruding into the horseshoe denotes voicing through being a
simplified version of the symbol [ɪ], in which Bell says ‘[the] vocalising condi-
tion of the glottis is pictured’ (Bell 1867: 46), that is to say the approximation
of the vocal ligaments in the glottal midline is represented. The trilling action
of the apical articulation is shown by the separate element [] placed to the right
of the main symbol, and ‘denotes a loose vibration or quiver of the organ to
which the symbol applies’ (ibid.: 47, original italics), making it less organic
than Wilkins’s symbol – otherwise it would appear to be taking place outside
the vocal tract – and also less iconic. Wilkins’s representation in the diagram
of epiglottal trilling and tongue-tip trilling uses the same device of showing
the endpoints of the oscillations, thereby suggesting they are the same kind
of action. This parallel is lost in his symbol, however, and is also absent in
Bell’s use of a straight line for the glottal trilling of voice but a crooked line
for other kinds of trilling. Bell’s manner of symbolising a trill does not include
the symbol component for a stop consonant (a line closing the consonant curve,
e.g. [] = IPA [d]), thereby excluding trills from the class of stops. The Passy-
Jones symbol for a trill (Figure 3.10c) places a closing line inside the consonant
curve, for example the ‘roulée dentale’ [ ] and the ‘roulée vélaire’ [ ], indicat-
ing by analogy with plosives that it belongs to the class of stops (for example
[ ] = IPA [d]; note the different positionings of the line denoting voice, either
within the consonant curve or outside it, but always oriented in the direction of
the curve’s bisection).
We can see from these comparisons that iconic symbol form is closely related
to questions in phonetic theory. How voicing is denoted depends on how it is
thought to be caused physiologically, and how a trill is denoted has implications
for whether it is classified as a stop or not. The latter issue is still unresolved. For
example, Laver (1994: 218–21) and Ball and Rahilly (1999: 78) treat trills as a
type of stop; Catford (1977: 128) regards it as intermediate in stricture between
a stop and a fricative; Sweet (1906: 34) sets up trills as ‘a special variety of
unstopped consonants’. Differences of opinion on this point do not require a
change in the IPA [r] symbol precisely because it is not an organic-iconic symbol.
The problem with truly iconic symbols is that they have the double function of
denoting abstract phonetic categories and denoting physical vocal tract configu-
rations as if they are one and the same thing. That is to say, they conflate physical
and abstract articulatory space (see Chapter 6 Section 6.5.1). This can be seen
most clearly in Wilkins’s organic symbols. The flexibility of IPA notation comes
partly from the dissociation of these functions such that the symbol denotes only
the categories whose definitions and roles are supplied by the interpretative con-
ventions, which can change without the symbol having to change. In contrast to
organic symbols, the meaning of an IPA symbol is maximally underdetermined
by its form in the same way that word-meanings are maximally underdeter-
mined by their word-forms, apart from marginal cases of onomatopoeia and
sound symbolism. This arbitrariness of form–meaning relations allows meanings
and forms – contents and expressions – to shift independently of each other in
processes of natural language change and in scientific notation systems. What
distinguishes the latter is the stabilising role of the interpretative conventions.

3.3 Analphabetic Notation


The term ‘analphabetic’ was coined by Jespersen, who devised a typographically
complicated notation incorporating Greek as well as roman letters, superscripts
and subscripts, numbers and different typefaces (Jespersen 1889). His system
was used by Bally and Sechehaye to supplement Saussure’s descriptions of
speech sounds when they edited his lectures from students’ notes to create the
famously posthumous Course in General Linguistics (see Saussure 1974: 41–9).
Jespersen’s system will be looked at in the next section.
Analphabetic notation is formulaic and is perhaps the most explicitly analytic
of all types of phonetic notation, the depth of analysis being of course dependent
on the sophistication of the phonetic theory on which it is based. The essential
principle is that each phonetic category is separately denoted by a discrete letter
or number assigned to it, a principle which may first have been put forward by
Erasmus Darwin, Charles Darwin’s grandfather, in the notes to a poem published
in 1803 (Abercrombie 1967: 113). Abercrombie (ibid.) mentions a ‘rather simple’
analphabetic notation devised in 1821 by a Birmingham school teacher called
Thomas Wright Hill which was not published until 1860. The formulaic nature of
analphabetic notation enables an unlimited number of classificatory dimensions
to be separately denoted. After Jespersen, it was taken up some decades later
by Kenneth Pike, who wished to go beyond the limitations of short descriptions
such as ‘voiceless alveolar plosive’ to bring out the complexity of segmental
structure. His ‘functional analphabetic symbolism’ (Pike 1943: 154–6), described
by Abercrombie (1967: 114) as taking the analphabetic principle as far as it can
probably go, is considered in Section 3.3.2 below. Analphabetic notation is more
tightly bound to phonetic theory than most other kinds, and bears much con-
ceptual resemblance to feature geometry in having a hierarchy of organisational
layers (see Section 3.6 below). As with iconic and analogical symbols, changes
in phonetic theory about how a sound is produced would precipitate changes in
the sound’s formula as well as in the conventions for its interpretation, whereas, by
contrast, the equivalent alphabetic symbol, such as an IPA symbol, would remain
the same despite a change in its denotation.
The richer the theory on which an analphabetic notation is based, the longer
the formulae will be. Hill’s ‘rather simple’ system symbolised [p] as ‘1/1’ where
the numerator denoted the first active articulator numbering from the front of the
mouth, and the denominator denoted the first passive articulator; the separating
line denotes complete closure.

3.3.1 Jespersen’s analphabetic notation

Frustrated with the ‘bewildering confusion’ of symbols and phonetic terms which
he saw around him at the time the IPA was coming into existence, Jespersen
took inspiration from the possibility of basing a notation system for phonetics
on the formulaic notation of chemistry, an idea which he knew had already been
proposed for anthropology (Jespersen 1889: 2). He started with the premise that
‘[a]ll sounds are equally compounds’ (ibid.: 6), from which he challenged the
alphabetic aspect of Bell’s symbols, although very much embracing his organic
approach. Jespersen saw some of the same lack of componential transparency in
some of Bell’s notation as in roman alphabetic letters, which he blamed for illogi-
calities such as the manner of denoting voiceless vowels by adding a voiceless
symbol to a symbol which contains a component denoting voice. For example,
[ſ] (= IPA [i]) contains the voice symbol [ɪ] as its stem, and so does the voiceless
version in which the open glottis ‘aspirated’ symbol is linked to it by the ‘linking’
[˚] symbol to give [ſ˚O] (= IPA [i̥]). On a strict interpretation, Bell’s [ſ˚O] and IPA
[i̥] denote voiceless and voiced at the same time. In concluding that ‘we must,
in fact, symbolize not sounds, but elements of sounds’ (ibid.: 7, original italics),
Jespersen was consciously continuing what he, along with many others, believed
was the teleological advance of writing systems from pictography through logog-
raphy and syllabography to modern alphabetic writing (ibid.: 8–9).
There is a parametric conception to Jespersen’s scheme. He envisaged arrang-
ing information about each active articulator’s contribution to a sound on a sepa-
rate line, using an assortment of letters and numerals to denote different kinds of
categories systematically. A selection is shown in Table 3.1.
An example of Jespersen’s notation is given in (3.1) equivalent to IPA [t],
with explanations.

(3.1)  β 0f   Complete closure of tongue-tip against alveolar ridge
       ε3     Glottis in the voiceless state

A feature of Jespersen’s notation worth examination is that the conventions
for a symbol are contextually determined.

TABLE 3.1: Examples of Jespersen’s notation for phonetic categories

Notation                 Phonetic category denoted
Greek letters            Active articulators: α = lower lip, β = tongue-tip, γ =
                         tongue-body, δ = soft palate, ε = larynx and vocal folds,
                         ζ = respiratory organs
Lower case roman         Passive articulators: a = endolabial, b = neutral labial,
letters                  c = exolabial, d = interdental, e = dental, f = alveolar
                         (printed as t in the original, p. 14), g = palatoalveolar,
                         h = palatal, i = postpalatal, j = velar, k = uvular,
                         l = pharyngeal
Arabic numerals          Median stricture degree with 0 = complete closure, and
                         also glottal states and degree of syllable stress
Roman numerals           Lateral stricture degree
Italic letters           Lax sounds
Thick (bold) type        Tense sounds
r                        Trilling
,,                       Inactivity of an articulator
< and >                  ‘Less than’ and ‘more than’ when modifying numerals
( and )                  Gliding towards or away from an unreached position
..                       Length (continuation of preceding specification)

Some terminology has been modified in line with current IPA usage.

What a numeral denotes depends on which Greek letter it is associated with. In
the context of α (labial), β (coronal) and γ (dorsal), numerals denote degree of
articulatory stricture, but
in the context of δ they denote degree of velopharyngeal port (VPP) aperture,
for which Jespersen recognises four values: 0 for VPP closure, 1 for a ‘nasal
twang’, 2 for nasal consonants and most nasalised vowels (Jespersen mentions
Portuguese), and 3 for French nasalised vowels, which he says ‘are of a differ-
ent order’ (ibid.: 30). Jespersen (ibid.: 38) defended this practice on the grounds
that it is too difficult to devise different easy-to-use symbols for everything that
needs to be denoted.
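
The context-dependence of these conventions can be made concrete with a small
illustrative sketch. The Python fragment below is hypothetical (it is not
Jespersen's formalism, and the names are mine); it interprets a numeral according
to the Greek letter governing it, using only the values given in Table 3.1 and in
the text above.

# Illustrative sketch of context-dependent interpretation in a Jespersen-style
# formula. Category labels follow Table 3.1 and the surrounding text; the
# names and data structures are hypothetical stand-ins.

ACTIVE_ARTICULATORS = {
    'α': 'lower lip', 'β': 'tongue-tip', 'γ': 'tongue-body',
    'δ': 'soft palate', 'ε': 'larynx and vocal folds', 'ζ': 'respiratory organs',
}

# After δ the numerals denote velopharyngeal port aperture, not stricture degree.
VPP_APERTURE = {
    0: 'velopharyngeal closure',
    1: "'nasal twang'",
    2: 'nasal consonants and most nasalised vowels',
    3: 'French-type nasalised vowels',
}

def interpret_numeral(greek: str, value: int) -> str:
    """What a numeral denotes depends on the Greek letter it is associated with."""
    if greek in ('α', 'β', 'γ'):
        return 'complete closure' if value == 0 else f'stricture degree {value}'
    if greek == 'δ':
        return VPP_APERTURE[value]
    if greek == 'ε':
        return 'voiceless glottal state' if value == 3 else f'glottal state {value}'
    return f'value {value}'

# The two lines of (3.1), Jespersen's formula for IPA [t]:
for greek, value in [('β', 0), ('ε', 3)]:
    print(ACTIVE_ARTICULATORS[greek], '->', interpret_numeral(greek, value))
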
Despite Jespersen’s commitment to articulatory-based notation, there is little
objective confirmation for his category distinctions. It is unlikely, for example,
that the four degrees of VPP aperture for which he provides notation were ever
directly observed, and very likely that they express auditory-perceptual judge-
ments about levels of nasal resonance. Jespersen repeats various speculations
by contemporary phoneticians about how French nasalised vowels were pro-
duced, showing that phonetic theory was seeking articulatory explanations for
auditory-perceptual phenomena. As phonetic theory advances, different expla-
nations become available. Laver (1980: 86), for example, has pointed out that
nasal-like ‘cul-de-sac’ resonances can be produced by certain configurations in
the laryngopharynx without any VPP opening. To accommodate this discovery,
the notation system would have to be modified, not just the conventions for its
interpretation.

3.3.2 Pike’s analphabetic notation

Compared to Jespersen’s notation, Pike’s ‘functional analphabetic symbolism’ is
both simpler and more complicated. It is simpler in that only roman alphabetic
letters are employed, albeit in upper and lower case, italic and roman, and, at
least for English speakers, there is the advantage that letters for categories are
derived acrophonically from English phonetic terms. The hierarchical organisa-
tion of categories is also easier to discern. What makes it more complicated is the
level of detail it goes into such that categories from every possible dimension of
phonetic classification have to be denoted for every sound. For example, Pike’s
formula for IPA [t], believe it or not one of the shortest of his formulae, is shown
in (3.2).

(3.2) MaIlDeCVveIcAPpaatdtltnransfsSiFSs. (Pike 1943: 155)

There are thirty-four letters in the formula compared to only five characters in
Jespersen’s formula for [t] (three letters and two numerals). The conventions for
interpreting the formula are presented hierarchically in Table 3.2.
The denotation of a letter depends on where in the hierarchy it comes. As in
Jespersen’s system, then, the interpretation of a component of the formula is
context-dependent. For example, ‘a’ denotes alveolar when governed by C and
p, but articulatory strength when governed by C and r.
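
The point about hierarchy-governed interpretation can be sketched in the same
informal way. The fragment below models only the two readings of 'a' just
mentioned; the data structure and function are hypothetical stand-ins, not Pike's
own formalism.

# Illustrative sketch of path-dependent interpretation in a Pike-style
# analphabetic formula: the same lower-case letter receives a different
# reading depending on the capitals that govern it (cf. Table 3.2).

READINGS = {
    ('C', 'p'): {'a': 'alveolar (point of articulation)'},
    ('C', 'r'): {'a': 'strength of articulation'},
}

def interpret(governing: tuple, letter: str) -> str:
    """Read a letter relative to the capitals governing it in the hierarchy."""
    return READINGS.get(governing, {}).get(letter, 'no reading in this context')

print(interpret(('C', 'p'), 'a'))   # alveolar (point of articulation)
print(interpret(('C', 'r'), 'a'))   # strength of articulation
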
Pike’s hierarchy of categories shows very clearly how dependent a nota-
tion is on phonetic theory. His categories are overwhelmingly oriented towards
articulation and aerodynamics, that is to say to events which are confined to
the speaker’s vocal tract. This reflects the articulatory theory on which they are
based. Of the some 160 category terms in the hierarchy, 140 are articulatory
or aerodynamic. In addition to these, there are three labelled ‘acoustic impres-
sions’ – ‘loud’, ‘normal’, ‘soft’ – and fourteen covering syllabic function and
stress. Under ‘segmental type’, Pike has three categories which are rather dif-
ferent from the others and in practice very difficult to apply, not least because
of some vagueness in Pike’s explanations. They are ‘instrumental’, ‘real’ and
‘perceptual’. An instrumental segment, or ‘phone’, is one which is ‘identified or
identifiable by some instrumental means; repeated contiguous or noncontiguous
utterances of the same instrumental phone will (by definition) be found identi-
cal, within the range of sensitivity of some particular instrument’ (Pike 1943:
115). It is not clear, though, how Pike envisages that the same instrumental
phone can be repeated in different utterances. A real segment ‘is one which the
average normal ear, after training, elimination of phonemic prejudice, and so on,
would identify, or be physiologically capable of identifying; in repetitions of a
real phone any variation detectable only by instruments is below the threshold
of perceptual ability of the ear’ (ibid.). Pike seems here to be referring to the
normalising function of the auditory system by which sounds that can be shown
instrumentally to differ sound the same to a listener. A perceptual segment ‘is
one which a particular ear at a particular time believes it identifies; repeated
utterances which are to a particular observer occurrences of the same perceptual
phone, may to someone else be different perceptual phones’ (ibid.).

TABLE 3.2: Conventions for interpreting Pike’s analphabetic notation for [t]
(from Pike 1943: 154-6)
M = Productive mechanism
a = Airstream mechanism
I = Initiator
l = Lung air (pulmonic)
D = Direction
e = Egressive
C = Controlling mechanism (relating to articulation and phonation)
V = Valvate stricture
v = Velic closure
e = Esophageal closure
I = Degree of constriction
c = Complete closure
A = Acme stricture (the stricture with the greatest degree of closure)
P = Primary (taking place in the oral cavity)
p = Point of articulation
a = Alveolar
a = Articulator
t = Tongue-tip
d = Degree of articulation
t = Duration
l = Long
t = Type of articulation
n = Normal (that is, not a tap, flap or trill)
r = Relative strength of:
a = Articulation
n = Normal (that is, neither fortis nor lenis)
s = Shape of articulator
f = Flat
s = Straight
S = Segmental type
i = Instrumental (that is, capable of being verified instrumentally)
F = Phonetic function
S = In the syllable
s = Syllabic contoid (Pike is here considering sounds in isolation
which he takes by definition always to constitute a syllable)

The reason for focusing in on these three categories is that Pike appears to be trying to build
into his notation a means of denoting the status of a sound in terms of whether it
is instrumentally validated, and whether there is agreement between observers.
As far as I am aware, no other notation system has tried to incorporate this kind
of meta-analysis into its symbolisation. To specify segment type in these terms
would mean subjecting every piece of data to be transcribed to instrumental
analysis, and also to some procedure to determine whether listeners with ‘normal
average ears’ would hear the same sound. Pike may have felt it necessary to
allow for this because his method relies on inferring vocal tract events accurately
from auditory analysis.

3.4 Alphabetic Notation and the Structure of Symbols


In the context of writing systems, the term ‘alphabet’ refers to letters which
have a correspondence with analytical units of speech at the level of indi-
vidual consonants and vowels (Daniels 1996: 4). Extending this into phonetic
notation means that systems like Lodwick’s analogical symbols and Bell’s
Visible Speech could justifiably be called alphabets. Indeed, those authors
presented them as ‘universal’ alphabets. However, it is useful in a typology
of phonetic notation to have a much narrower definition of ‘alphabetic’. Kelly
and Local (1989: 58) point to the integral nature of alphabetic symbols in
which categories are ‘non-componentially represented’. The integral–compo-
nential distinction is easily illustrated if we compare the integral IPA symbol
[d] with various composite equivalents, as in (3.3) (original terms replaced
with current IPA terms). Note, though, that there is an integral element to
some of the composites such that one symbol component denotes more than
one category.

(3.3)  IPA integral [d]                   ‘voiced alveolar plosive’
       Wilkins’s organic-analogical [˥]   ‘voiced alveolar’ + ‘plosive’
       Lodwick’s organic-analogical [ ]   ‘voiced plosive’ + ‘alveolar’
       Sproat’s organic-analogical [ ]    ‘alveolar plosive’ + ‘voiced’
       Bell’s organic-iconic []           ‘alveolar’ + ‘plosive’ + ‘voiced’

The base symbols of an alphabetic notation are the glyphs of a set of alpha-
betic letters used as integral symbols. All symbolisations of general phonetic
models in an alphabetic notation comprise at least one integral symbol which
functions as a base. A base symbol can occur on its own as a simple symbol, for
example [d], or be modified by the addition of diacritics such as [d̥], or of what
Wells (1995b) calls pseudo-diacritics such as the right top hook in [ɗ], to form
a complex symbol.2 The relationship between a base and a diacritic is similar to
that between a stem and an affix in morphology, for which Lyons (1977: 521)
provides the general formula given in (3.4).

(3.4) X+a→Y

In relation to alphabetic phonetic symbols, X stands for any base symbol, a any
diacritic, and Y a resulting composite symbol. Lyons’s formula only states an
additive relation between the constituents, but we can go further in analysing
constituent relations and point out that a, being bound, is dependent on, or subor-
dinate to, X, which is not bound. a also determines, or modifies, X in a particular
way which is in opposition to the way that X may be determined by another
element, b or c. For example, the base symbol [d] is determined differently by
the different diacritics in the complex symbols [d̥ d̤ d̪] although the dependency
relations are the same. Dependent determinative relations can be shown by the
formula in (3.5).3

(3.5) Z = (Y ← a)
Example d̥ = (d ← ̥)

Base symbols can combine in compounds to denote a single phonetic model
such as IPA [dz] or [ɡ͡b]. The symbol components in a compound are independ-
ent and mutually determining. That is to say, in [ɡ͡b] neither symbol component
is dependent for its occurrence on the other – the [ɡ] component determines the
[b] component and vice versa. This kind of relationship can be represented as in
(3.6).

(3.6) Z = (X ↔ Y)
Example dz = (d ↔ z)
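
The relation types in (3.5) and (3.6) can be modelled as simple data structures.
The sketch below is a hypothetical illustration in Python (the class names and
renderings are mine, not part of any proposal in the text): a complex symbol is a
free base determined by a bound diacritic, while a compound joins two free,
mutually determining bases.

# Illustrative sketch of the symbol structures in (3.5) and (3.6).
# The classes and their string renderings are hypothetical stand-ins;
# they model symbol structure only, not the phonetic models denoted.

from dataclasses import dataclass

@dataclass
class Base:                 # a free, integral base symbol, e.g. [d]
    glyph: str
    def render(self) -> str:
        return self.glyph

@dataclass
class Complex:              # (3.5) Z = (Y ← a): base determined by a bound diacritic
    base: Base
    diacritic: str
    def render(self) -> str:
        return self.base.render() + self.diacritic

@dataclass
class Compound:             # (3.6) Z = (X ↔ Y): two free, mutually determining bases
    left: Base
    right: Base
    def render(self) -> str:
        return self.left.render() + self.right.render()

d_voiceless = Complex(Base('d'), '\u0325')      # [d̥] = (d ← ring below)
dz_compound = Compound(Base('d'), Base('z'))    # [dz] = (d ↔ z)
print(d_voiceless.render(), dz_compound.render())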

It is important to appreciate that these relationships are between symbol com-
ponents, not their denotata. Neither are they between components of glyphs.
To take the last point first, the glyph ‘b’ can be analysed into a bowl and a left-
hand ascender, and the glyph ‘ɓ’ into a bowl, left-hand ascender and right top
hook. In the phonetic symbol [b] these components have no separate denoting
function, while in [ɓ] the top hook does. These judgements can only be made
by considering how the symbols denote their respective theoretical models.
Concerning the point about relations between denotata, different relations
obtain in the symbol [ɡ͡b] and in its denotatum ‘voiced labial-velar plosive’.
In [ɡ͡b] the [ɡ] and [b] components do not denote ‘voiced velar plosive’ and
‘voiced bilabial plosive’, which shows that a compound symbol such as the
IPA symbols [ɡ͡b] and [d͡z] can retain integralness. The constructional relation
means that their denotations are not the same as [ɡb] and [dz], which are not
symbols for single phonetic models but in each case for two models occur-
ring adjacently in a transcription. Similarly, the symbol [d̥] does not imply
that voicelessness is somehow dependent on alveolar plosiveness. That we are
talking about the structure of symbols and not the structure of the phonetic
models they denote is clear when we consider the distinction between graphi-
cally continuous and discontinuous symbols. This distinction accounts for the
difference between a true diacritic and a pseudo-diacritic. Adding the true
diacritic [ʲ] to a base results in a discontinuous complex symbol such as [dʲ];
by contrast, adding the pseudo-diacritic ‘right top hook’ results in a continuous
complex symbol such as [ɗ]. Clearly there is no such distinction to be made
between their denotata, just as the dependency of a plural morpheme on its
base morpheme, for example in book+s, in no way implies that the phenom-
enon of plurality depends on the phenomenon of a book. Figure 3.11 sets out
a structural typology of alphabetic phonetic notation with examples from the
IPA which makes use of all the types.

FIGURE 3.11: Structural classification of alphabetic phonetic symbols with
examples (tree: SYMBOL branches into SIMPLE and COMPOSITE; COMPOSITE
into COMPLEX and COMPOUND; COMPLEX and COMPOUND each into
CONTINUOUS and DISCONTINUOUS)

One type of symbol can be embedded within another. For example, in [dzʲ] we
have a continuous compound constituent determined by a bound discontinuous
diacritic, the structure of which can be expressed as in (3.7).

(3.7)  Z = ((Y = (W ↔ X)) ← a)
       Example dzʲ = ((dz = (d ↔ z)) ← ʲ)

In [aʉ] we have a discontinuous compound denoting a diphthong, one element
of which is denoted by the continuous complex symbol [ʉ]. Its structure can be
expressed as in (3.8).

(3.8) Z = (X ↔ (Y ← a))
Example aʉ = (a ↔ (u ← ‘bar’))

Where there is more than one diacritic, for example [ɗ̪ʷ] with three diacritics, we
have co-determination of the base symbol, which can be represented as in (3.9).

(3.9)  Z = a → Y ← b                 Example  ɗ̪ʷ = ‘top hook’ → d ← ʷ
               ↑                                              ↑
               c                                               ̪

The arrangement of a, b, c around Z in a constellation represents the lack of
functional ordering of the bound diacritical components in [ɗ̪ʷ] – placing the
diacritics in a different arrangement does not denote a different model. Where
functional ordering can be established, for example in [ʰt] versus [tʰ], it can be
shown formulaically as in (3.10) with a double arrow. Note that the denotation of
[ʰ] is the same whether it precedes or follows the base symbol but the resulting
complex symbols are not equivalent (for more on functional ordering in symbols
see Section 3.5 below).

(3.10) Z = (a ⇒ Y) Z = (Y ⇐ a) Z = (a ⇒ Y ⇐ a)
ʰt = (ʰ ⇒ t) tʰ = (t ⇐ ʰ) ʰtʰ = (ʰ ⇒ t ⇐ ʰ)

Where there is formal identity of two diacritics which denote different categories,
for example in [ã] and [a̰], the structure is not functionally ordered and should
therefore be represented as in (3.11).

(3.11) Z = (Y ← a) Z = (Y ← a) Z = (a → Y ← b)
ã = (a ← ̃ ) a̰ = (a ← ̰) ã̰ = ( ̃ → a ← ̰)

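Extending the same illustrative model (again hypothetical), functional ordering
as in (3.10) can be captured by recording whether a bound element precedes or
follows the base, while the co-determining but unordered diacritics of (3.9) and
(3.11) can simply be held as a set:

# Illustrative sketch, extending the model above: ordered versus unordered
# determination of a base symbol. Names and renderings are hypothetical.

from dataclasses import dataclass, field

@dataclass
class OrderedComplex:       # (3.10): (a ⇒ Y) and (Y ⇐ a) are distinct symbols
    base: str
    before: str = ''
    after: str = ''
    def render(self) -> str:
        return self.before + self.base + self.after

@dataclass
class Constellation:        # (3.9)/(3.11): co-determining diacritics, no ordering
    base: str
    diacritics: frozenset = field(default_factory=frozenset)

pre  = OrderedComplex('t', before='\u02b0')   # [ʰt] = (ʰ ⇒ t)
post = OrderedComplex('t', after='\u02b0')    # [tʰ] = (t ⇐ ʰ)
print(pre.render() != post.render())          # True: the ordering is distinctive

a_nas_creak = Constellation('a', frozenset({'\u0303', '\u0330'}))   # nasalised + creaky
a_creak_nas = Constellation('a', frozenset({'\u0330', '\u0303'}))
print(a_nas_creak == a_creak_nas)             # True: no functional ordering
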
The distinction between simple and compound symbols in alphabetic nota-
tion is similar to the uniliteral–multiliteral distinction made by Daniel Jones
(1918/1972: 336–8), although, inappropriately in my view, he applied it in a
typology of transcription rather than of notation. The criterion is whether a single
general phonetic model is denoted by a single base symbol glyph or by more
than one. For example, the IPA symbol [ð] is uniliteral whereas IPA [dz] is mul-
tiliteral, being transparently a motivated amalgam of [d] and [z]. Symbols with
diacritics such as [dʲ] and pseudo-diacritics such as [ɗ] are uniliteral complex
symbols. By definition, simple alphabetic symbols are uniliteral and complex
symbols are those with bound modifying diacritics or pseudo-diacritics. But not
all multiliteral symbols are compounds of free symbols. In Ellis’s palaeotype
notation, multiliteral [dh] denotes the equivalent of IPA [ð] but the [h] compo-
nent is not a free symbol. Ellis (1867: 16) defines it as a diacritic which cannot
occur on its own. Its structure is therefore that of (3.5), not (3.6).
Despite the theoretical advantages claimed for non-alphabet notation systems,
it remains a fact that very few transcriptions have ever been made with them
other than by their inventors. Generally, they have never got beyond the design
stage with a few short illustrative transcriptions intended to promote them. It is
perhaps paradoxical that purpose-built notations have not been as successful as
notations like the IPA which have been adapted from sets of already existing
alphabetic characters. Success, however, depends on what is being judged. If
phonetic notation systems are judged on how successfully they denote separate
phonetic categories, then some kind of analogical notation would doubtless carry
the day. The problem for analogical notations is that ultimately their success is
judged by how they fare in transcriptions, not how they fare in charts. While
charts reveal the logic or otherwise of symbol design, it is in transcriptions that
a notation has to prove itself, and the history of transcription is a history of the
familiar being more highly valued than the logical. We have seen earlier that
even in iconic notation, conventions are required for interpreting the symbols
because icons cannot be completely free of arbitrariness. An icon is meaningful
only to those who already know what it is supposed to stand for – witness, for
example, the baffling icons on household appliances and car dashboards. If we
cannot dispense with conventions, then the advantages claimed for iconic and
analogical notations are severely undermined.
Alphabetic phonetic symbols, much more than any iconic or analogical
symbols, are directly the descendants of resources for spelling, and of particular
relevance is their use in respelling and script conversion. In respelling, orthog-
raphies are readjusted to maintain sound–spelling correspondences which have
been put out of line by phonological change, and in script conversion, new cor-
respondences are established in one language borrowing the writing system of
another with a different phonology. We can see an example of the former in
English in the late twelfth-century Orrmulum after the vowel mergers of late Old
English, and more extensively in the works of John Hart after the Great Vowel
Shift. Examples of the latter are legion throughout the spread of alphabetic
writing. This intimate link with spelling means that the glyphs already have a
long history of use in lengthy texts as orthographic letters, and therefore look
more at home in transcriptions than purpose-made iconic or analogical symbols,
despite not fitting so transparently or neatly into charts and tables.

3.4.1 Pre-nineteenth-century alphabetic notation

Writers on phonetics who confined their symbols to the characters of their
written language were, by the definitions proposed in Chapter 1 Section 1.3,
creating proto-phonetic notation by using letters to denote sounds for which they
could offer some theoretical account. This was largely the practice in ancient and
medieval times with the notable exception of the anonymous ‘First Grammarian’
in twelfth-century Iceland (see Chapter 2 Section 2.3.1). He identified thirty-six
possible phonemic and allophonic vowel qualities in Old Norse. To represent
them he had to invent new symbols based on the five Latin vowel letters to deal
with nasalisation, length and vowel height, for example [o̜] for a vowel more
open than [o], [ȯ] for a nasalised vowel and [ó] for a long vowel (Haugen 1972:
15–19, 34–41). Figure 3.12 presents the full set.

FIGURE 3.12: Vowel symbols of Iceland’s ‘First Grammarian’

There are simple integral symbols such as [a e o], and complex symbols
with bound diacritics either continuous, for example [ę ø], or discontinuous, for
example [á ȧ ȧ́]. Structurally, the complex symbols are of the types illustrated in
(3.5) and (3.9), that is to say they comprise a base determined by one or more
bound diacritics without functional ordering. His innovation for consonants was
to symbolise geminate consonants with small roman capitals instead of double
letters (Haugen 1972: 46).
Iceland’s First Grammarian may have been one of the first actually to create
proper phonetic symbols. He did so in a way which has been repeated many times
since, namely by modifying existing letter-shapes from the roman alphabet, ‘the
most profitable source of new letters’ (Abercrombie 1981: 211). That the roman
alphabet has provided most of the material resources for alphabetic phonetic
notation is the result of historical coincidence. The two things which have coin-
cided are the longstanding ubiquity of the roman alphabet as the basis for written
languages in western Europe and the fact that modern phonetics has developed
mostly in this part of the world.
Around 1180 a manuscript called the Orrmulum (‘forrþi þatt Orrm itt
wrohhte’) appeared, written by an Augustinian cleric named Orrm, or Orrmin
(from Old Norse meaning ‘worm, serpent’ and ‘worm-man, serpent-man’), in
what has been identified as a Lincolnshire dialect. His significance in phonetics is
that he employed a unique spelling system by adapting, extending and systema-
tising orthographic features in other scribes’ work (Anderson and Britton 1999:
306). His main concern was to represent in writing the consonant and vowel
quantities of the spoken Lincolnshire English of the time, which still had conso-
nant gemination as well as distinctive vowel length. A number of vowel mergers
involving both qualities and quantities had taken place in Old English, particu-
larly in the East Midlands varieties, resulting in dislocations of sound–spelling
correspondences which Orrm’s spellings sought to mend. The most striking
device in the Orrmulum is the use of double consonant letters to indicate that a
preceding vowel is short, for example <upp> for up, <annd> for and, and indeed
his own name <Orrm>. We use this device in present-day English orthography
in derivations such as let–letting, hid–hidden, rob–robbery etc. Where doubling
could be confused with consonant gemination, he used a breve over the vowel;
long vowels were marked with an acute accent. We can only speculate why he
did not generalise the use of the vowel quantity diacritics, seeming to prefer the
typologically odd double consonant device. The oddness of this device is that it
increases the graphic quantity of one element, the consonant, to represent a rela-
tively smaller phonological quantity of the preceding element, the vowel, thus
not only displacing the locus of the vowel quantity feature but implying a quan-
tity increase in the consonant. It is very reminiscent of the orthographic repre-
sentation of quantity in Scandinavian languages, where, in monosyllabic words,
short vowels are always followed by long consonants (Elert 1964). For example,
Swedish hat ‘hate’ pronounced [hɑːt] is opposed to hatt ‘hat’ pronounced [hatː].
One wonders if some knowledge of this in Old Norse orthography influenced
Orrm or other scribes from whom he may have heard about it; Old Norse speak-
ers had been settled in eastern England during the ninth and tenth centuries.
Haugen (1972: 75) admits to being tempted to suppose that the Icelandic First
Grammarian may have studied in England, and even that there may have been
some connection between him and Orrm, though Haugen dismisses the possibility
due to lack of evidence (ibid.: 72 n.2). Regarding consonants, in written English
there were one-to-many spelling–sound relations such as <c> ↔ /k, ʧ/, <ʒ> ↔ /j,
ɡ, ʤ/, <sc> ↔ /sk, ʃ/, which Orrm wished to disentangle. He used the H-digraphs
<sh> and <ch>, which ironically became such tokens of irrationality for spell-
ing reformers in later centuries, for example Sir Thomas Smith. Smith (1568:
166–71) also complained of contemporary spellings that showed vowel quantity
by consonant doubling, such as <henne> for hen in which ‘the last syllable (ne)
has no sound’ (ibid.: 49). Anderson and Britton (1999: 316) suggest a rational
phonological basis for Orrm’s H-digraphs, namely that, unlike the other non-
plosive obstruents /f, θ, s/, /ʃ/ and /ʧ/ did not have voiced allophones, which may
have motivated the use of <h> as a digraph component. However, Scragg (1974:
46–7) notes that <h> in Latin had a diacritical function in <ch> and suggests
that the creation of <sh> was modelled on it. To deal with the trivalency of <g>,
Orrm restricted <ʒ> to correspond only to /j/, adapted Carolingian minuscule
<g> for /ʤ/, and by combining graphical features of both coined the new letter
<gˉ> to correspond to the much more common phoneme /ɡ/. Orrm’s may be the
first systematic attempt to tidy up sound–spelling correspondences in English by
providing phonetically motivated respellings, but with no phonetic observations
or conventions for interpretation his use of letters to represent pronunciation has
to be seen as pseudo-notation and the text as pseudo-transcription.
Iceland’s First Grammarian and England’s Orrm were isolated figures in the
history of phonetics whose work remained unknown or neglected for centuries.
By contrast, the sixteenth-century English orthoepists and spelling reformers
started a tradition of phonetic theorising and concern over the representation of
pronunciation which has a fairly clear, if not direct, line to modern phonetics.
We have seen above that some in this tradition experimented with non-alphabetic
notation, but most tried to use roman alphabetic letter-shapes as far as possible. It
was when the roman alphabet could not provide an appropriate symbol, and some
alternative had to be sought, that close attention had to be paid to the phonetics
of the recalcitrant sounds in order to try to understand what was different about
them. The basic principle of one-symbol-one-sound has been in evidence, albeit
often overridden, throughout the history of phonographic writing and has been
carried into alphabetic phonetic notation as an explicit aim, so it is no surprise
that the first targets of the sixteenth-century English spelling reformers were
English digraphs, although frustration with digraphs was not a feature of notation
in the following century. Hart designed consonantal letters, shown in Figure 3.13
with keywords, to replace digraphic <sh>, <th>, <ch> and also a letter for /ʤ/.
Hart’s aim was twofold: to reform English spelling and to provide a general pho-
netic alphabet (Danielsson 1955: 54). In their capacity as new letters for English
spellings, these are letters which have correspondences with units of pronuncia-
tion and should be presented in orthographic angled brackets, but in their capacity
as elements in a general phonetic alphabet they are phonetic symbols denoting
phonetic models and can be presented in phonetic square brackets.
Robinson (1617) devised his own theoretical scheme of classification for con-
sonants and vowels, and a set of symbols to go with it. They are all clearly based
on roman letter-shapes through reorientations of one kind or another, with some
invented symbols. Symbols for long vowels are fashioned by turning short vowel
symbols either through 180° ([n – u], [ɛ – ɜ], [t – ʇ]) or back to front ([s – ƨ]), or
by a different symbol altogether ([e – ɤ]) (see Chapter 2 Figure 2.3). Consonantal
symbols are also based on roman letter-shapes, several of them on vowel letters,
for example [a] for a labial plosive. The point of great significance is that they
qualify by the definition proposed in Chapter 1 Section 1.3 as a set of proper
phonetic symbols which Robinson used to make proper phonetic transcriptions.
Structurally, Robinson’s symbols are all simple integral symbols except for his
method of indicating voice and voicelessness (see Chapter 2 Section 2.3.2).
FIGURE 3.13: Hart’s new letter-shapes with keywords

Later seventeenth-century writers who used integral symbols were less
disposed than Robinson to design new ones. Wallis used upper case roman
letter-shapes in italics for consonants and did not seem to be too bothered about
digraphs. He distinguished [θ] and [ð] as [Th] and [Dh], and used [Ch] and [Gh]
for [x] and [ɣ], regarding [ʃ ʒ ʧ ʤ] as compound sounds formed by adding [Y]
(= IPA [j]) to [S Z T D] respectively (Wallis 1765: 37–8, in Kemp 1972: 201–5);
we can see here an analogical principle at work in the context of integral symbols
used in composites. He represented the velar nasal by adding an overline to the
symbol for the alveolar nasal to give [N̅ ], which we have to regard as an inte-
gral but discontinuous simple symbol rather than a base plus diacritic complex
symbol because the overline is never used in any other symbol. For vowels,
Wallis used lower case italic forms of the five Latin vowels and employed acute,
grave, circumflex and breve diacritics to distinguish nine monophthongs. His
notation for diphthongs is very similar to some modern solutions, regarding them
as compounds which are analysed into a ‘preposed’ vowel and a ‘subjoined’
vowel, or glide (ibid.: 36, and 199), giving us [ay ey aw ow] etc. Wallis’s nota-
tion can be seen in Figure 2.4 in Chapter 2. From a structural point of view his
monophthong vowel symbols are either simple, or complex comprising a base
and a bound modifying diacritic; his diphthong symbols are biliteral compounds
despite his describing them as ‘subjoined’ constructions.

Like Wallis, Wilkins used upper case for consonants, though curiously not for
the bilabial nasal,4 and lower case for vowels. Also like Wallis, he was not shy
of digraphs. In addition to Wallis’s [Th] and [Dh], Wilkins has [Sh] and [Zh] but
also [mh Nh Ngh Lh Rh] as voiceless (‘mute’ in his terminology; see Chapter 2
Section 2.3.3) correlates of [m N Ng L R], thus extending analogical use of ‘h’
as a component. Wilkins’s sound chart is shown in Chapter 2 Figure 2.5. For
vowels, he introduces a new symbol [ȣ], which seems to be equivalent to IPA
[ʊ], [ι] equivalent to [ɪ], and a symbol for a schwa-like vowel in the form of a
‘Y’ with an attached hook at the base (see Figure 2.5), paired quite insightfully
as the voiced correlate of [H] (= IPA [h]) if one accepts the lack of an intrinsic
tongue-position specification for schwa (see Bates 1995: 266–7; Giegerich 1999:
191). Pairing schwa with [h] in this manner is also found in Tucker (1773: 26–8),
who sets them up as the basic vowel and consonant, describing them as the
‘sonorous’ and ‘spirate’ roots from which ‘all our other vocal sounds are made
to spring’. He used [υ] as a schwa symbol, placing it at the end of his ‘reformed
alphabet’ (ibid.: 8).
The practice of upper case for consonants and lower case for vowels, and the
acceptance of some digraphs, is continued in Holder’s alphabetic notation except
that voicelessness (‘breath’) in nasals and liquids is denoted by a reversed apos-
trophe [M‛ N‛ Ng‛ L‛ R‛] because Holder objected to over-use of H-digraphs
(Holder 1669: 67–72). The reversed apostrophe is a bound element modifying
a base and exemplifies the structural type presented in (3.5) above, the digraphs
exemplifying the (3.6) type. H-digraphs and a turned apostrophe [‛] were later
both to appear in Ellis’s palaeotype notation to denote voicelessness (see
3.4.3 below). Lower case consonant symbols are sometimes given as variants,
sometimes with [θ] and [ϑ] instead of [th] and [dh]. An innovation of Holder’s
which anticipates the modern use of * in linguistics is an ‘obelisk’ († rotated 90°
anti-clockwise) placed to the left of a symbol to show that the sound-type is not
attested in a language; he uses it with voiceless sonorant consonants and nasal-
ised continuant consonants. Digraphic [oo] appears as a vowel symbol as does
Wilkins’s [ȣ] with the same phonetic value.
Isaac Newton in his younger years tried his hand at alphabetic phonetic nota-
tion, inventing [ഗ] for IPA [ð] and reversing it vertically to provide a symbol for
IPA [ʒ]. Hebrew ‫ ע‬and ‫ ש‬were employed for [ŋ] and [ʃ], the former continuing a
belief found throughout phonetic writings of the time, for example Wallis (1765:
17–18, in Kemp 1972: 160–3), Holder (1669: 57), that the sound correspond-
ing to ‫ ע‬was originally a velar nasal, though Wilkins (1668: 358) was doubtful.
Newton also noted that a bilabial trill is possible and represented it by [pw]
(Elliott 1954: 10).
In the eighteenth century no substantial advances in notation took place, and
in fact some backward steps are evident, such as the proposal in Yeomans’s 1759
Abecedarian that the voiced and voiceless dental fricatives should share the same
[ə] symbol, though reversed ‘c’ [ɔ] for IPA [ʧ] and [ʞ] for [ʃ] are reasonable
enough suggestions (see MacMahon 1994: 19); [ʞ] has in fact been added to
the 2008 ExtIPA chart (see Appendix) to denote velodorsal articulation, that is
to say, the active movement of the soft palate towards the dorsum of the tongue
as reported in Ball et al. (2004). One of the few attempts at a comprehensive
notation was that developed by Thomas Spence for his dictionary The Grand
Repository of the English Language, published in Newcastle-upon-Tyne in 1775.
Spence had an ingenious way of avoiding the discontinuity of H-digraphs by
stripping off the left-hand vertical of the H and adjoining the remainder to the
preceding S, Z, T, D or W component to fashion single continuous compound
symbols, or amalgams. He was not the first, though, to propose amalgam letters.
Tucker (1773: 5–6) created a letter to correspond to the velar nasal by adding the
tail of ‘g’ to ‘n’ (the IPA adds the tail of script ‘ɡ’). Another of Tucker’s inven-
tions was an amalgam of the long ‘ſ ’ (‘long s’) and ‘h’ to form a letter <ɦ> to
replace the <sh> digraph, perhaps inspired by the typographic ligature form <ſh>.
He also conjoined ‘ɑ’ and ‘w’ to give a vowel letter corresponding to the vowel
in thought. Spence’s alphabet is similar in its achievement to Orrm’s spellings of
six hundred years before in that it regularises sound–spelling correspondences to
provide phonetic respellings for English without showing how it can be applied
more widely as phonetic notation. Taken in their respective historical contexts,
Orrm’s is the far greater achievement for having had no precedents. Spence does,
however, provide keywords and better attains one-to-one correspondence than
does Orrm.

3.4.2 Lepsius’s Standard Alphabet

Richard Lepsius (1810–84) was a German Egyptologist motivated to construct a
universal orthography by a missionary desire to spread the Bible to peoples with
no written languages (Lepsius 1863: 26–30). Putting his religious zeal to one
side, Heselwood et al. (2013: 15) conclude, looking at his proposed alphabet, that
‘we see in it the preoccupations of the philologist more clearly than those of the
phonetician’. Although he groups sounds into articulatory classes, he offers no
theoretical definitions and no overall theoretical scheme of classification, prefer-
ring to focus on providing an adequate set of symbols (Kemp 1981b: 61), saying:
A comprehensive exposition of the physiological basis would here be out of
place. We must limit ourselves to facilitating the understanding of the system.
This will be best accomplished by not separating the phonic from the graphic
system, but by presenting the former immediately in its application to the
latter. We do not enlarge, therefore, on the definition of Voice and Sound, of
Vowel and Consonant, and other physiological explanations, and shall only
refer to them as necessity may demand. (Lepsius 1863: 46)
Nevertheless, Albright (1958: 29–30) notes a number of features which make
the Standard Alphabet a forerunner of the IPA: the use of roman alphabetic
letters supplemented by diacritics and non-roman letter-shapes, application of a
phonemic principle to prioritise sounds with a distinguishing function, and use
of digraphs for diphthongs. Albright neglects to point out also the avoidance of
digraphs for consonants, the adherence to a ‘one-sound-one symbol’ approach,
and the policy of using roman letters as far as possible for the kinds of sounds
they are most commonly in correspondence with in orthographies. For example,
Lepsius is insistent that fricatives should not be symbolised by letter-shapes asso-
ciated with plosives. He presents his symbols in italic form, which I shall follow.

The issue of whether to employ digraphs has always been an important one
which has divided opinions amongst phoneticians not so much into proponents
and opponents, but into those with a stronger aversion and those with a weaker
aversion who are more inclined to accept them as solutions to problems if
the alternatives are deemed to be worse. Kemp (1981b: 46–7) lists twenty
roman-based phonetic alphabets between 1668 and 1880, of which eleven used
digraphs. Avoidance of digraphs generally results in a profusion of diacritics
and this is true of Lepsius’ notation, leading some to call it a ‘diacritic’ alphabet
(Albright 1958: 28), and Sweet (1877: viii) to declare it ‘impracticable for
ordinary use’. The structural type given in (3.5) is thus much preferred by
Lepsius to the type given in (3.6). The symbols which are not roman-based are
mostly from Greek, for example [χ] (= IPA [x]), [γ] (= IPA [ɣ] or [ʁ]), [δ] (=
IPA [ð]), [θ] and [ϑ] (= IPA [θ]), all of which have also been incorporated into
the IPA with the same or similar values but with typographical harmonisation, a
particular obsession of Daniel Jones (Collins and Mees 1999: 290). For glottal
stop, Lepsius uses the apostrophe [’], a common device for the orthographic
representation of glottal stop and used by Boas, Goddard, Sapir and Kroeber
(1916: 14) for glottal stop in American languages; Lepsius also created the
symbols for ejectives adopted by the IPA by incorporating the apostrophe in [t’
k’] etc. to show that the glottis is closed, only opening after the ‘explosion’ of
the consonant (Lepsius 1863: 140). He classifies glottal stop as lenis and has as
its fortis counterpart a sound symbolised by two vertically aligned apostrophes.
These are the only base symbols in his main consonant chart which are not
roman or Greek letter-shapes. His examples for these sounds are, respectively,
the Arabic hamza <ء> (corresponding to /ʔ/) and ‘ayn <ع> (corresponding
to /ʕ/). Lepsius (1863: 186) explains that the symbol for the fortis correlate
‘shows its phonetic relation to the weaker ‫’ء‬, parallelling, but not mentioning,
the traditional Arabic explanation that the hamza letter was adapted from the
top portion of the ‘ayn letter to indicate its homorganicity (Heselwood 2012).
Either Lepsius imitated that adaptation, or he came up with the iconic device
independently.
While roman-based notation systems are stuck with the shapes of roman
letters for their symbols, which, being integral, have no componential structure
analogous to phonetic structure, the use of diacritics provides an opportunity to
introduce an analogical principle, and Lepsius exploits this, as of course does the
modern IPA. An acute accent over ‘guttural’ base symbols denotes palatal place
of articulation, for example [γ́] for IPA [ʝ], and also over base symbols of other
places of articulation to denote secondary palatalisation, for example [ṕ] for IPA
[pʲ]. This creates a problem in representing a palatalised guttural such that IPA
[c] and [kʲ] both have to be [ḱ]. A superscript dot is used for ‘guttural’, distin-
guishing [ṅ] (= IPA [ŋ]) from ‘dental’ [n] (= IPA alveolar [n]), and [ṙ] (= IPA
[ʀ]) from [r] (= IPA alveolar [r]).
In dealing with vowels, Lepsius constructed a vowel space represented as
a triangle defined by [a] at the apex and [i] and [u] at base left and right. In
common with a number of writers on vowels in the first half of the nineteenth
century (Kemp 1981b: 52–4), he tried to associate vowel qualities with colours,
putting red at the top, yellow to the left and blue to the right. In so far as these
colours are conceived of as inhabiting the vowel space, one might argue that the
symbol [a] denotes a ‘red’ vowel in Lepsius’ notation as much as it denotes an
‘open’ one. As a method of specifying vowel qualities, it is not very satisfac-
tory and was heavily criticised at the time. Whatever the theoretical shortcom-
ings in his vowel scheme, Lepsius was able to situate vowels in relation to the
triangle points and to each other, and symbolise them in a systematic manner
using keywords from the better-known languages to indicate their qualities.
Until Jones developed his cardinal vowel framework and gave vowel qualities
a tangible language-independent identity, attempts at vowel classification had to
resort to keywords as the only known points of reference, articulatory descrip-
tions being increasingly imprecise as vowels become more open. The problem
with keywords is firstly that there are always variant pronunciations (Ladefoged
1967: 53–4), and secondly that pronunciations change over time. Without sound
recordings we cannot know what exact pronunciation of English hate, or French
vote, or German Bär Lepsius (1863: 53) had in mind.
To underscore his practical aims, in what must represent a prodigious amount
of work, Lepsius provides Standard Alphabet transcriptions of texts for seventy-
eight ‘literary languages’ and thirty-one ‘illiterate languages’ drawn from many
language families all across the world. Under ‘literary languages’ he includes
not only dead ones such as Old Egyptian, and long-established living literary
languages like Arabic and Chinese, but also the Khoisan language Nama, having
only very recently had the Standard Alphabet applied to it by Europeans. The
click symbols will be familiar to all modern phoneticians: Lepsius invented
[!], described as ‘cerebral’ (= retroflex); hence the subscript dot which the IPA
has kept, although the sound is now classed as (post)alveolar and the symbol
regarded as integral, dental [ǀ] and lateral [ǁ]; but it was not until after Köhler,
Ladefoged, Snyman, Traill and Vossen (1988) argued for their adoption that the
IPA gave them its official blessing in place of the unpopular [ʇ ʖ ʗ], which were
withdrawn the following year. The palatal click has the palatal diacritic on the
dental symbol but in the IPA is represented integrally as [ǂ], a symbol first pro-
posed at a missionary conference held in South Africa in 1856 at which [+] was
put forward instead of [!]; see Lepsius (1863: 80 n.2). For examples of the many
suggestions for click symbols since the seventeenth century, and the puzzled
descriptions of the sounds by early travellers, see Breckwoldt (1979). Listed
among the ‘illiterate languages’ is Cherokee, even though the syllabary devised
around 1820 by Sequoyah (Scancarelli 1996) is given with Standard Alphabet
equivalents (Lepsius 1863: 294).
The format of presenting transcriptions of different languages alongside ver-
sions of the same text in that language’s orthography accompanied by explana-
tory notes is still continued by the IPA; see for example the Principles of 1949
and the 1999 Handbook, and illustrations of analyses of languages using IPA
symbols and categories are regularly featured in the IPA’s official journal. The
influence of Lepsius on IPA notation and transcription is therefore considerable,
not just in the respects for which Albright gives him credit, but also in a legacy
of symbols and the practice of presenting texts from many different languages
with explanatory notes. Despite the objections of many to his profusion of dia-
critics, we probably have Lepsius to thank for firmly establishing Z = (Y ← a) as

a standard derived structural type for a phonetic symbol, a type which continues
to prove its worth in the IPA.

3.4.3 Ellis’s palaeotype notation

Alexander J. Ellis, a mathematics graduate, started working on phonographic
alphabets with Isaac Pitman in 1843. Both had been working independently on
alphabetic projects for six or seven years before that date, motivated by spelling
reform, which they, like many others before and after them, saw as a prerequisite
for a socially desirable mass literacy and an aid to foreign language learning.
Albright (1958: 23) relates how they faced condemnation from conservative
elements in the English church and press, seeing opposition to their efforts as
defence of the vested interests of privilege and power. The landmark in their col-
laboration was the English Phonotypic Alphabet of 1847, the evolution of which,
and the thinking behind it, has been described in detail by Kelly (1981), who
notes that ‘[t]he labours of Pitman and Ellis during the ten years that preceded
the 1847 alphabet can be said to have established phonetics as a modern science
in Great Britain’ (ibid.: 262). It brought the era of amateur phonetics ‘firmly
to a close’ (ibid.: 263). Kelly also notes that they made a major contribution to
the notational resources available to Ellis in his later work on English dialects,
Ellis himself saying that beforehand there had been no adequate alphabet. The
phonotypic alphabet is a simple uniliteral notation with a clear predominance of
roman letters, and its intended function as a means of spelling is evident in the
provision of upper case as well as lower case forms having the same phonetic
value. Adaptations from non-roman alphabets are used as well, such as Greek [Σ]
paired with a lower case [ʃ] for IPA [ʃ], a reversed form for the voiced cognate
paired with [ʒ], Cyrillic [Є] paired with [c̡] for IPA [ʧ], [Ƌ] paired with lower
case [đ] for IPA [ð]. Reversal is one of the few devices available for analogical
formations using only simple uniliteral symbols, but it cannot be used exten-
sively. Basic roman vowel letter-shapes are modified in various ways to maintain
a simple uniliteral look, for example upper case [Ө] and lower case [ɵ] for IPA
[ɔ].
It may have been the restrictions of simple uniliteral symbols, and the adverse
reaction of many to the diacritics in Lepsius’ alphabet for constructing complex
symbols, which turned Ellis in the direction of multiliteralism after he parted
company with Pitman and began work on palaeotype notation in an endeavour
‘to ascertain what were the sounds of human speech, and reduce them to a set of
symbols’ (Ellis 1867: 3). The term ‘palaeotype’ means the old types, that is to say
‘the letters of the Roman Alphabet in their original Latin senses’ (Ellis 1869: 1).
Some of the criticisms of palaeotype levelled by Eustace (1969: 34), who
declared it ‘incomprehensible’, are countered by Local (1983: 2–3), who argues
from textual evidence that it should be seen not as a single coherent notation
system but as comprising ‘a complex mixture of the phonetic and the phonologi-
cal, the systematic and the fortuitous’. Nevertheless, there are principles which
are applied throughout the palaeotype notation which make it overwhelmingly
multiliteral. Ellis (1867: 3–4) shows concern for printers and compositors in his
reasons for avoiding the use of diacritical marks except for voice qualities, and

in keeping as far as possible to roman letters supplemented by italics. He decides
against co-opting Greek letters on aesthetic grounds, although many might say
that mixing roman and italic letters in digraphs has an aesthetic awkwardness
about it making it uneasy on the eye, especially in running transcriptions – Sweet
(1881: 179) thought it ‘sprawly’.
Because Ellis wished to avoid diacritics he felt the need to have some symbols
which carry out a diacritical function by modifying the denotation of a preced-
ing symbol, but which have no phonetic value by themselves. Some of these he
reserved only for ‘complete’ transcriptions intended to show subtle shades of
sound, in contrast to ‘approximative’ transcriptions which have a lower resolu-
tion (see Chapter 4 Section 4.3). For example, [ʞ] is ‘a diacritical sign placed
after a vowel to indicate a more guttural, that is, either a less palatal, or a less
labial sound’ (Ellis 1867: 18), and [j] has as its diacritical function ‘to give a
palatal modification to a preceding consonant’ (ibid.: 17). Multiliteral symbols
with these diacritical components have the structure Z = (Y ← a), showing that
palaeotype [tj] is structurally the same as IPA [tʲ] in having a bound element
subordinate to a free element. Ellis also presents [h] as a diacritical symbol (ibid.:
16) (IPA [h] is symbolised as [ʜ]), but in fact there is variation in how it modi-
fies preceding symbols, leading to the conclusion that there are several symbols
with the same letter-shape ‘h’. That is to say, it is a homograph, which may be
one reason why Sweet (1881: 180–1) proposed alternatives. In [sh] (= IPA [ʃ]) it
modifies [s] by changing it to postalveolar, in [th] (= IPA [θ]) it modifies [t] by
making it fricative, and in [lh] (= IPA [ɬ]) it modifies [l] by making it voiceless.
Voicelessness can also be denoted by [ʻ] placed before the modified symbol, an
example of symbol synonymy. [h] can also be added to vowels ‘to indicate any
required variety, provided it be distinctly characterized’ (ibid.), meaning it can
be defined howsoever one wishes, though Ellis later discontinued this practice
(Local 1983: 10).
Many of the multiliteral symbols in palaeotype notation are analysable into a
base symbol and a diacritical symbol, which makes them in a sense only super-
ficially multiliteral compared to the digraphs in Wallis and Holder, for example,
although the homographic nature of ‘h’ as a component in multiliteral symbols
means that some of Ellis’s H-digraphs are more properly like compounds. It
is not much more than sleight of hand to say that ‘h’ in [sh] is a diacritical
symbol with its own denotation which modifies ‘s’, rather than saying that [sh]
as a whole compound symbol denotes a postalveolar fricative. Other diacritical
symbols differ from IPA diacritics only in being full-sized symbols – compare,
for example Ellis’s [lj] and IPA [lʲ] – but many also in spatial position and/or
font, as with palaeotype [am] compared to IPA [ã] for nasalisation, and [oɥ] com-
pared to [o̜] for less rounded. Because Ellis’s diacritical symbols cannot be used
on their own, they are occurrence-dependent on non-diacritical, or base, symbols
just as IPA diacritics are, with the consequence that there is little danger of mis-
parsing the symbol strings, although Ellis does provide a ‘diaeresis’ symbol [,]
just in case, ‘to separate groups of letters which would otherwise have a com-
pound signification’ (Ellis 1867: 31). The true compounds are the long vowel and
diphthong symbols, of which there are very many, for example [ii] (= IPA [iː])
and [ai] (= IPA [aɪ]), and which give occasion for trigraphs when a diacritical

symbol is added, for example [aah] (= IPA [ɑː]) defined as ‘long of (ah)’ (= IPA
[ɑ]), having the structure Z = ((X ↔ Y) ← a).
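
(As a purely illustrative aside, not part of palaeotype or of the IPA, the bracketed structural formulas can be modelled as small tree-like data structures; in the Python sketch below, with class names of my own devising, Free and Bound stand for free and bound (diacritical) symbols, and the two examples mirror Z = (Y ← a) and Z = ((X ↔ Y) ← a).)

from dataclasses import dataclass
from typing import Union

@dataclass
class Free:              # a free symbol, able to occur on its own, e.g. 'a', 't'
    glyph: str

@dataclass
class Bound:             # a bound, diacritical symbol, occurrence-dependent on a base, e.g. 'h', 'j'
    glyph: str

Symbol = Union[Free, "Compound", "Modified"]

@dataclass
class Compound:          # (X ↔ Y): two free symbols jointly denoting one sound
    left: Symbol
    right: Symbol

@dataclass
class Modified:          # (Y ← a): a base with a subordinate diacritical element
    base: Symbol
    modifier: Bound

t_j = Modified(Free("t"), Bound("j"))                        # palaeotype [tj], cf. IPA [tʲ]
aah = Modified(Compound(Free("a"), Free("a")), Bound("h"))   # palaeotype [aah]: Z = ((X ↔ Y) ← a)
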
Symbols for accents and tones are provided in the form of configurations
of dots and turned apostrophes placed after the vowel or final consonant of an
accented or tone-bearing syllable. For example, navigation is transcribed as
[nav:igee·shən], where [:] denotes secondary accent and [·] primary accent. A
rising tone is represented iconically as [.·], a falling tone as [·.].
Ellis also developed notation for work specifically on English dialects and
for a reformed spelling of English. For the former, he created ‘glossotype’, a
watered-down version of palaeotype without turned letter-shapes and with only
one italic form, [r] for IPA [r]. It has breves and macrons for marking short and
long vowels. For a reformed spelling of English ‘received pronunciation’, a
designation coined by Ellis, ‘glossic’ was proposed, which ‘had the advantage of
not requiring any letters besides those of the ordinary alphabet’ (Albright 1958:
27). Glossic has the same aim as Spence’s 1775 alphabet, namely to provide
one-to-one letter–phoneme correspondences for English, and is therefore a
phonographic spelling system rather than a transcription system. It differs from
Spence’s notation in confining itself to available orthographic resources, which,
in conjunction with conventions informed by phonetic theory, makes it a proto-
phonetic notation.
Ellis’s work exemplifies the tension between concern with spelling reform
and the pursuit of phonetics as a discipline incorporating scientific methods and
requiring its own specialist technographic notation. The former pulled him in the
direction of consistent language-specific sound–spelling correspondences and
led to glossic, whereas the latter saw him aiming to create in palaeotype ‘a tool
for what we would now call impressionistic transcription’ (Local 1983: 5), that
is to say aiming for a universal, language-independent notation ‘for representing
nuances of speech sounds in dialectal and other philological studies’ (Albright
1958: 27).

3.4.4 Sweet’s romic notation

The same dual interest in general phonetics and in spelling reform characterised
much of the work of Henry Sweet, a pivotal figure in the history of phonetic nota-
tion as well as ‘the major force in phonetics’ in the Britain of his time (Collins
and Mees 1999: 42). He took Ellis’s palaeotype and fashioned a more usable
notation out of it which became the basis of the IPA. Ironically, he envisaged it
as a ‘temporary compromise’ (Sweet 1877: 102) until phonetic theory was com-
plete enough for an accurate organic notation to take over. Within a few years he
thought this time had arrived and he became a propagandist for his own revised
version of Bell’s organic notation. In fact, the roman-based IPA has so far proved
rather less temporary and far more widely used than any organic notation system
has ever proved to be, though with considerably more compromises of one kind
or another as we will see in Section 3.4.5 below.
Among the changes Sweet made to palaeotype are a greater use of diacritics,
especially on vowel symbols, with a consequent reduction in the use of digraphs;
restriction of italics to modifier symbols; abolition of capital letters; and

inclusion of letter-shapes ‘from the Anglo-Saxon, Greek, and various European
alphabets, and from Pitman’s Phonotypy’ (Sweet 1881: 217). Some of Ellis’s
turned letters are kept and new ones proposed, including [ɟ] with its modern IPA
value. Transliterations from Sanskrit solve some problems, such as [c] for voice-
less palatal stop. The legacy of many of these policies is still evident in the IPA,
which in many ways is a direct descendant of Sweet’s romic notation (Albright
1958: 37).
Sweet called his notation ‘romic’ ‘because based on the original Roman values
of the letters’ (Sweet 1877: 102), and coined the terms ‘narrow’ and ‘broad’
(ibid.: 105) to distinguish between the full set of symbols required for represent-
ing ‘minute shades’ of sound in a scientific general-phonetic analysis, and the
much smaller sets which suffice to express distinctive differences of sound in a
phonological analysis of a particular language (see Chapter 4 Section 4.3). We
can see here the influence of Ellis’s palaeotype philosophy and his division of
notation into a set motivated by phonographic respelling and a set motivated by
a scientific approach to phonetics. Examples of applying broad romic to English,
French, German, Dutch, Icelandic, Swedish and Danish appear in Sweet (1877:
109–68).
Accent marking in romic follows Ellis in the use of [·] and [:] (Sweet 1877:
91, 190), but tonal notation does not. It follows Bell’s (1867: 83; see Sweet 1877:
xvii, 94) symbols, given in (3.12), which have become IPA symbols though not
currently with exactly the same meanings (see IPA 1999: 183–4).

(3.12) [ā] = level tone, [á] = simple rising tone, [à] = simple falling tone,
[â] = compound rising-falling tone, [ǎ] = compound falling-rising tone

3.4.5 IPA notation

If it may be permitted to make an analogy between notation systems and some
of the great cities of the world, then compared to analogical systems the IPA is
a London rather than a Haussmann’s Paris or a grid-plan New York. There is
certainly evidence of planning and purposeful design, but the legacies of history
dominate the landscape. Sweet (1881: 182) thought that the arbitrary relation
between symbol-shape and sound would render agreement on the conventions of
roman alphabetic notation ‘impossible’, but this did not prove sufficiently true
to prevent IPA notation becoming established internationally. Daniel Jones’s
zealous promotion and defence of IPA notation throughout his professional
career no doubt played an important part in its success (Collins and Mees 1999:
424–5), but the main reason for its healthy survival into the twenty-first century
is probably its familiarity and practicality. However, it did not come straight into
existence in its current form. It has been through many modifications since the
first version of the IPA chart appeared in 1889. The dimensions for the back of
the vocal tract were particularly unstable over the first four decades of the IPA’s
history. For example, if we look at how sounds currently classed as ‘pharyngeal’
were accommodated we can see reflected a great uncertainty in phonetic theory
about these sounds. The laryngoscopic work of Esling and colleagues since the
mid-1990s is even now prompting another rethink of how sounds in this part of

the vocal tract should be classified (see for example Esling 2005, 2010: 688–90).
The 1889 chart has no pharyngeal sounds on it, the term ‘guttural’ being used for
uvulars, and this had not improved by 1892. By 1899 ‘uvular’ had appeared and
pharyngeals came under the ‘guttural’ label, between ‘laryngeal’ and ‘uvular’,
but in 1905 they descended to ‘bronchial’, below laryngeal sounds, which were
then termed ‘guttural’. The bronchial analysis probably comes from Sweet’s
(1904: 37) belief, expressed the previous year, ‘that the Arabic hā is simply a
bronchial hiss’. This kind of jigging around of categories shows that it is not only
organic notations which are affected by changes in phonetic theory. The impor-
tant difference, though, is that integral symbols only have to change their location
in taxonomic phonetic space, whereas organic symbols also have to change their
glyphic forms because these are determined by their taxonomic categories.
The basic alphabetic IPA symbols are simple integral symbols. For example,
[p t c k q] are all voiceless plosives but they have nothing in common as symbol
shapes, and [b m] are both bilabial but also share no common symbol component.
They are all unanalysable as symbols in contrast to their denotata. However,
when we look beyond the legacy of the roman alphabet we find that not all uni-
literal symbols are wholly integral. The analogical principle can be seen in the
design of many new symbols. MacMahon (1996: 823) points out the descend-
ing right tail denoting ‘retroflex’ [ʈ ɖ ʂ ʐ ɳ ɻ ɽ ɭ], the ascending right hook-top
denoting ‘glottalic implosive’ [ɓ ɗ ɠ ʛ] and the visual similarity of the nasal
symbols [m ɱ n ɳ ɲ ŋ ɴ], based on the letter-shapes of roman <N, n>. There is
also the descending left tail denoting ‘palatal’ [ɟ ɲ], motivated by [j], although
it does not appear on all palatal symbols. Lateral symbols are based on roman
<L, l> (or on Greek <λ>) [l ɫ ɬ ɮ ɭ ʎ ʟ]. In the case of the nasals and laterals,
we can take the base symbol to stand for the categories ‘nasal’ and ‘lateral’ in
general, as is done for denoting nasal and lateral release with the diacritics [dⁿ
dˡ]. The class of rhotic sounds, notoriously hard to group together on phonetic
grounds but nevertheless felt to have something in common (Lindau 1985), are
based on the letter-shapes of <R, r> [r ɾ ɽ ɹ ɻ ɺ ʀ ʁ], and the rhotic hook [˞] can be
said to denote ‘rhoticity’ in general despite there being as yet no wholly satisfac-
tory definition of this term. The phonetic value of these common visual elements
remains constant across the set of symbols just as does the phonetic value of a
diacritic. The difference between the ascending hook of [ɗ] and the apostrophe
diacritic of [t’] is only in the graphic continuity of the glyph, meaning that both
are equally complex in terms of symbol structure compared to the simple integral
symbols [d] and [t]. We can distinguish between them as complex continuous
and complex discontinuous; wholly integral symbols are by definition simple,
never complex, though they can be multiliteral. The lateral click symbol [ǁ], for
example, can be analysed graphically into ‘ǀ’ + ‘ǀ’ but the separate pipes do not
separately denote a category, though perhaps with some stretch of the imagina-
tion one could say that the placing of two pipes side by side is iconic of laterality.
The distribution of simple and complex continuous symbols through the
taxonomic phonetic space represented by the IPA chart is the result of grafting
purposely designed symbols onto the stock inherited from the roman alphabet,
and exemplifies the London as opposed to Parisian or New York nature of the
IPA.

Iconicity is not entirely lacking either. Many diacritics have an iconic resem-
blance to what they denote. The voiceless diacritic [ ̥] resembles the Japanese
maru diacritic in form and denotation, and also Bell’s (1867: 35) Visible Speech
symbol [s] for ‘throat open’, although there is disagreement on the definition of
voicelessness, with some phoneticians defining it as an open glottis and others
as the absence of glottal vibrations (see discussion in Esling and Harris 2005:
350–1), which latter of course includes [ʔ]. The dental diacritic [ ̪ ] resembles
a tooth, and the raised, lowered, advanced and retracted tongue root diacritics
[ ̝ ̞ ̘ ̙ ] iconically represent the relevant direction in the form of a line perpendicu-
lar to a reference mark (with a vocal tract facing left). The consonant and vowel
charts are arranged iconically to represent vocal tract space oriented with the
front to the left and the back to the right, closer articulations to the top and more
open articulations to the bottom. The tone and word-accent marks are iconic on a
similar principle to the raised and lowered diacritics. Pitch height is represented
by the height of a line perpendicular to a vertical reference line [˥ ˦ ˧ ˨ ˩], and
extent and direction of pitch movement by an angled line meeting the vertical
reference line [˩˥ ˥˩ ˦˥ ˩˨ ˧˦˧].
Because the total set of symbols is a mixture of historical accident and pho-
netically motivated design, it is quite difficult to present a typology of IPA
symbols. I have listed in (3.13) what I think are the relevant typology parameters;
this should be read in conjunction with the general typology in Figure 3.11 in
Section 3.4 above.

(3.13) (a) Graphic properties of symbols


● Literality: uniliteral [a], [b] vs. multiliteral [au], [dz]
● Graphic continuity of symbol: continuous [œ], [ɓ] vs. discon-
tinuous [au], [p’]
● Boundedness of symbol component: bound [˞], [ ̪ ] vs. free
symbols [a], [b]
● Amalgamation of free symbols: amalgamated [æ], [ʧ] vs. non-
amalgamated [aʊ], [ɡ͡b]
(b) Relationship of symbols to denotata
● Analyticity of symbol: integral [a], [b] vs. analysable [au], [ɓ]
● Iconicity: iconic [ ̥], [˦] vs. non-iconic [a], [b]
● Self-allusion: self-allusive [ʰ], [ʂ] vs. non-self-allusive [ ̤], [s]
● Organicity: organic [ ̪ ], [ ̥] vs. non-organic [˦], [ʰ]
● Analogicality: analogical – retroflex series, ejectives vs. non-
analogical – velar series, plosives
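
(As a purely illustrative aside, since all the parameters in (3.13) are binary, a symbol’s typological profile can be modelled as a simple record of yes/no values; the sketch below uses field names of my own devising and fills in values only where they follow from the examples in (3.13) and the surrounding discussion, leaving the rest unspecified.)

from dataclasses import dataclass
from typing import Optional

@dataclass
class SymbolProfile:
    # (a) graphic properties of symbols
    multiliteral: Optional[bool] = None    # [au], [dz] vs. uniliteral [a], [b]
    continuous: Optional[bool] = None      # [œ], [ɓ] vs. discontinuous [au], [p’]
    bound: Optional[bool] = None           # [˞], [ ̪ ] vs. free [a], [b]
    amalgamated: Optional[bool] = None     # [æ], [ʧ] vs. [aʊ], [ɡ͡b]
    # (b) relationship of symbols to denotata
    analysable: Optional[bool] = None      # [au], [ɓ] vs. integral [a], [b]
    iconic: Optional[bool] = None          # [ ̥], [˦] vs. [a], [b]
    self_allusive: Optional[bool] = None   # [ʰ], [ʂ] vs. [ ̤], [s]
    organic: Optional[bool] = None         # [ ̪ ], [ ̥] vs. [˦], [ʰ]
    analogical: Optional[bool] = None      # retroflex series vs. velar series

# [ɓ]: uniliteral, graphically continuous, free, analysable (the hook-top denotes 'implosive')
hooktop_b = SymbolProfile(multiliteral=False, continuous=True, bound=False, analysable=True)

# [au]: multiliteral, graphically discontinuous, non-amalgamated, analysable into two free symbols
a_u = SymbolProfile(multiliteral=True, continuous=False, bound=False,
                    amalgamated=False, analysable=True)
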

As emphasised in Section 3.4 above, we have to be careful to distinguish between
the composition of a symbol and the component categories of the model it
denotes. The roman heritage of the IPA has the consequence that internal symbol
relations vary in how they mirror internal model relations. For example, [œ] is
analysable into ‘o’ and ‘e’, yet it denotes the rounded correlate of [ɛ], not [e]. It
would be more consistent for [œ] to denote the close-mid vowel, and a symbol
be constructed from ‘ɔ’ + ‘ɛ’ to denote the open-mid vowel. The inconsistency
arises from the use of [ø] for the close-mid vowel because of the phonographic

value of the letter <ø> in Danish orthography (Pullum and Ladusaw 1996: 136),
and the phonographic value of <œ> in written French (ibid.: 139). The ‘ash’
symbol [æ] also transparently combines two vowel symbol glyphs ‘a’ and ‘e’, but
its denotation of a vowel between [a] and [ɛ] suggests it should be constructed
with ‘ɛ’; interestingly, Holder (1669: 81) suggested [æ] for a vowel in the ‘space
between a and e’, and centuries before him the Icelandic ‘First Grammarian’ con-
structed his new vowel letters by combining features of two roman vowel letters
to denote a vowel quality intermediate between them. For example, he says that
‘e̜ is written with the loop of a, but with full shape of e, since it is a blending of
the two, spoken with the mouth less open than for a, but more than for e’ (Haugen
1972: 15). The mid-centralised diacritic [ ̽] could be construed as iconic if one
focused on the point of intersection and took the four extremities to represent the
four corners of the vowel quadrilateral, but it is not clear if this actually is its
motivation (Pullum and Ladusaw 1996: 234).
The ‘self-allusion’ parameter needs some explication. It identifies those
symbols whose design is motivated by the denotation of another pre-existing
symbol. Allusive relationships are quite loose and only quasi-systematic. If x
alludes to y, the relationship can be glossed as nothing more precise than ‘x has a
sufficient connection with y to motivate being modelled on it’. What the sufficient
connection is varies considerably from case to case. The aspiration diacritic [ʰ] is
clearly motivated by the glottal fricative symbol [h], and the secondary articula-
tion diacritics are motivated by the denotation of the main symbol of the same
shape, for example [ʲ] denoting palatalisation and [j] denoting a palatal approxim-
ant. There is in fact some inconsistency among the secondary articulation diacrit-
ics in that the manner of articulation of the main symbol is not always appropriate
for the diacritic, although a ‘sufficient connection’ can be identified in a post hoc
manner. While [ʲ] and [j] both denote open approximation, the velarisation and
pharyngealisation diacritics [ˠ] and [ˤ] are based on base symbols which denote
fricatives (despite the rarity of voiced pharyngeal fricatives pointed out by Laufer
(1996)), but the place of articulation is appropriate. Howard and Heselwood
(2002: 385) point out that this makes symbolisation of a voiced affricated velar
plosive problematic: [ɡˠ] should be interpreted by the conventions as a velarised
velar. A modified integral symbol involves self-allusion if part of the denotation
of the base symbol is also part of the denotation of the modified symbol. In [ɖ],
for example, the categories ‘voiced’ and ‘plosive’ are integrally denoted by the
[d] component, ‘retroflex’ analytically by the right tail. To avoid logical contra-
diction in this arrangement, we would have to say either that [d] as a component
denotes only ‘voiced plosive’ and that ‘alveolar’ is denoted by the absence of
another component, or that [d] has a different signification in [d] and [ɖ], and
again in [d̥]. The first alternative is a default interpretation of the kind found, for
example, in the Amharic syllabary, where an unmodified character corresponds
to a consonant plus /a/, but for other vowels the character has to be modified by
the addition of another component: < ለ > corresponds to /la/, < ሉ > to /lu/, < ሊ > to
/li/. There is an exact parallel here with IPA [d], [ɖ] and [ɗ]. The second alterna-
tive means that composite symbols have to be understood as not always being
simply the sum of their parts, so that a certain integralness still characterises their
semiosis.

These examples show how difficult it can be to sort out the logic of inter-
component relationships in complex symbols, and of symbol–denotatum rela-
tions, in a notation system like the IPA which has not been constructed according
to a single coherent plan, and which makes ad hoc use of analogy only where
the absence of an integral symbol allows. This state of affairs contrasts with
a strictly analogical system such as Lodwick’s (see Section 3.2.2) and is very
similar to the semantics of compound lexical items, where the meanings are not
predictable from the meanings of the constituent items. For example, the seman-
tic role relationships in modifier–head constructions such as window-cleaner and
vacuum-cleaner, and letter-writing and hand-writing, are quite different, window
and letter having the role of goal, vacuum and hand that of instrument. Despite
the looseness of self-allusion, the partnership of analogy and self-allusion has
featured quite prominently in the creation of new symbol-shapes for the IPA and
seems to have met the needs of transcribers quite successfully.
The organicity parameter also requires some comment in the light of the
observation above that any notation system is organic to the extent that its con-
ventions for interpreting symbols refer to articulation, and this is certainly the
case with respect to the IPA. However, it is complicated by the presence of aero-
dynamic terms such as ‘plosive’ and ‘fricative’, and auditory-perceptual terms
such as ‘creaky’ (the denotation of [ ̰ ]) and ‘no audible release’ (the denotation
of [˺]). The terms ‘voiced’ and ‘voiceless’ are more auditory-perceptual than
articulatory, as also is ‘click’. IPA categories are therefore a bit of a mixed bag,
and while the bag mostly contains articulatory categories, the non-articulatory
ones apply to a great number of sounds, among them some of the most commonly
occurring types, for example voiced and voiceless plosives and fricatives.
One resource in the roman alphabet which the IPA has not exploited is the
set of full-size capital letters. A proposal to use [A] for a low central vowel was
rejected, and there are currently none in the stock of symbols. The reason for the
exclusion of full-size capitals is probably twofold: firstly, concern for maintain-
ing visual harmony in transcriptions, where the occurrence of a full-size capital
letter in the middle of a string of lower case symbols would look ungainly;
secondly, capitals have significations in roman-based orthographies which are
inappropriate in phonetic transcription, such as marking the start of a sentence,
proper nouns and so on. One of the inconveniences of Ellis’s palaeotype was the
retention of capital letters for just such purposes. They have, however, been used
in archiphonemic transcription (see Chapter 4 Section 4.8) and feature in the
VoQS voice quality notation (see Section 3.6 below) as well as in intonational
notation systems such as ToBI (see Section 3.4.8 below), and in Wells’s (1995b)
SAMPA notation (see Section 3.4.10 below).
The IPA does use small capitals such as [ʙ, ɢ, ʜ, ɪ, ɴ, ʀ] for sounds related,
though in different ways, to their lower case forms. While [ʙ] shares the same
place of articulation as [b] but not the same manner, [ɢ], [ɴ], [ʀ] and [ʜ] share
the same manner but not the same place as [ɡ], [n], [r] and [h] respectively.
Small capital [ɪ] and its rounded partner [ʏ] are the only ones for vowels apart
from the [ɶ] amalgam. The higgledy-piggledy nature of the deployment of small
capitals is another consequence of the fact that the IPA symbol stock has grown
in a somewhat unplanned manner over many decades. Their deployment would

doubtless be among the many changes were the IPA to start again from scratch
in a wholesale notation reform. Before any such reform is contemplated, though,
thought should be given to whether it would make transcription any easier, not
just to whether it would make the notation more logical. Boas et al.’s (1916: 10)
suggestion of using small capital vowel symbols to stand in a voiceless–voiced
relation with lower case vowel symbols – for example, [ɪ] as equivalent to IPA
[i̥] – is far more logical, but has never been widely adopted as a practice, perhaps
because the logic of it is nevertheless arbitrary and could not easily be general-
ised to consonants.
A key theoretical assumption underlying the classificatory scheme of the IPA
consonant chart is that the location of the greatest degree of constriction in the
supralaryngeal vocal tract is a highly important determinant of the properties
which distinguish consonants from each other. The ancient Indian phoneticians
took this view, and their influence via Sir William Jones and other Sanskritists
influenced the theorising of the phoneticians who contributed to the formation
of the IPA. But it was also seen as highly important by the ‘English School’
phoneticians of the seventeenth century who constructed consonant charts with
places and manners of articulation as the intersecting dimensions (see Chapter 2
Section 2.3.2), although some of them were less than accurate in their views of
how particular consonants were articulated. The point I wish to make is that it is a
theoretical assumption and not a given truth that the place of maximum constric-
tion is the key to a particular sound quality. Law (1990: 219–20) draws attention
to a different approach in phonetic descriptions of Arabic by medieval Middle
Eastern phoneticians, who were more concerned with where the air flowed out of
the vocal tract than with the place of maximum constriction. The implications of
this different focus are clearest when dealing with lateral and nasal consonants.
The place of maximum constriction for [l] is where the midline of the vocal tract
intersects with the alveolar ridge, but the air flows out over the lateral margins of
the tongue a bit farther back, more or less at the side of the palatal region, though
this may be quite variable. The IPA classification ‘alveolar lateral approxim-
ant’ can be construed as contradictory in that the alveolar stricture is a complete
closure, not a stricture of open approximation – the stricture of open approxima-
tion where the air flows out is molar rather than alveolar, but this is not precisely
captured by the term ‘lateral’ and will be different for a retroflex [ɭ], palatal [ʎ]
or velar [ʟ]. Similarly, [m] is classified as a bilabial nasal because the maximum
constriction is formed by the lips, leading some phoneticians to class it as a type
of stop (e.g. Ball and Rahilly 1999: 85), but the function of the closure is not the
same as in the production of [b], where the air is released from the same place as
the closure once the articulators part. Instead, the air during [m] flows continu-
ously through the velopharyngeal port and out through the nose, so that it is less
than obvious where the maximum constriction is in the path of the airstream –
Catford (1977: 138) proposes calling [m] a ‘nareal approximant’, implying it is at
the nostrils. Looking at Al-Sakkāki’s medieval vocal tract diagram in Figure 2.2
(Chapter 2), we can see that the Arabic letters < ل > and < ض > corresponding to
the lateral sounds [l] and [ɮˤ] are placed along the sides of the tongue, although
the nasal letters < م > and < ن > corresponding to [m] and [n] are placed at the
lips and alveolus respectively.

In similar vein, Vaissière (2007: 63) points out that what distinguishes French
and English coronal consonants is not the precise place of contact or the precise
part of the active articulator making the contact, but the overall shape of the
tongue. It is overall tongue shape which determines the pattern of airflow and the
resonance of cavity excitation. When we turn our attention to vowels, the point of
maximum vocal tract width may in fact be as important as the place of maximum
constriction as a determinant of resonance quality.
Taking the location of maximum constriction, and the degree of constriction
at that point, as primary determinants of consonantal quality leads quite naturally
to setting up place and manner of articulation as dimensions in abstract articula-
tory space. When these are set out in a chart, the chart becomes like a form of
tablature notation in which symbols ‘denote what speakers do with their lips and
tongues, not what sounds they make’ (Heselwood 2008b: 85). The consequences
for impressionistic phonetic transcription of using symbols defined largely in
tablature terms is discussed in Heselwood (ibid.), where an important shift away
from ostensive definitions based on experience of sounds towards a more theory-
driven mode of phonetic definition is identified as having taken place as a result
of the 1989 IPA Kiel Convention. Ladefoged (1990: 338) identifies what appears
to be a contradiction, or at least an ambiguity, in Principle 2 of the post-Kiel IPA
Principles regarding what IPA symbols denote: they are ‘intended to be a set of
symbols for representing all the possible sounds of the world’s languages’, but
also to designate intersections of phonetic categories (IPA 1999: 159). The two
characterisations can be reconciled if we make the distinction between theoreti-
cal models and descriptive models discussed in Chapter 1 Section 1.3.1, which is
in effect the difference between notation and transcription. I suggest we take the
view that a symbol as part of a notation system denotes a theoretical model as a
product of an intersection of categories, but when it is used in transcription it rep-
resents a sound, either specifically or generically, by virtue of denoting a descrip-
tive model. There is always more material content in a sound in the articulatory
and acoustic domains than can be accounted for by phonetic categories because
of the indeterminate complexity of events involved (see Chapter 1 Section 1.2.2).
In the perceptual domain, categories do not have phenomenal reality because they
are theoretical constructs, but phoneticians can ascribe their perceptions to catego-
ries by procedures which rely on the distinction between recognition memory and
declarative memory (Johnson 2007: 32; and see Chapter 5 Section 5.9). We recog-
nise sounds because we have a remembered history of experiencing sounds, and
we can analyse them because of our knowledge of phonetic theory. The proposal
in Chapter 6 Section 6.5 of expressly distinguishing between the various domains
of phonetics, between physical and abstract spaces within those domains, and of
conceiving of taxonomic phonetic space in domain-neutral terms, ought to allow
analysis of any phonetic data to be mapped onto phonetic categories without those
categories having to imply a particular speaker or listener behaviour. In other
words, it frees the symbols on IPA charts from being a type of tablature notation.
A principle applying to integral symbols in the conventions of the IPA, and
which relates to the theoretical importance attached to the point of maximum con-
striction, concerns the denotation of active and passive articulators. The principle
is not immediately obvious, though, because of the lack of explicit mention of

categories for active articulators in the traditional three-term labels, and it is not
consistently applied. Let us examine in detail the symbol [t̪], denoting ‘voiceless
dental plosive’. The three-term label makes no mention of the active articulator,
although, as Catford (1977: 145) advises, ‘in formal analytical designations one
must use the fullest possible term’. The full term in this case is ‘voiceless apico-
dental plosive’, explicitly identifying the apex, or tip, of the tongue as the active
articulator. The principle of symbolisation here is that the passive articulator is
denoted separately by a diacritic while the active articulator is denoted integrally
by the base symbol. We can see this repeated in [t̼ ], where the diacritic denotes the
upper lip. Although the diacritic is defined as ‘linguolabial’ on the IPA chart, it may
make more sense to call [t̼ ] ‘apico-labial’, with the ‘apical’ category being denoted
integrally; blade articulation can be denoted by adding the ‘laminal’ diacritic [ ̻ ].
Inconsistency comes when we look at the presence of a diacritic for apical, [ ̺ ], so
that [t̺] in effect has ‘apical’ denoted twice. Further inconsistency can be seen with
the advanced and retracted articulation diacritics. For example, [t̠] denotes that the
place of articulation is retracted, but how do we know what it is retracted from? This
information is contained in the integral base symbol, which is understood to denote
the ‘alveolar’ category if no other place is specified. Inconsistency, however, is not
necessarily a reason for getting rid of something if it is found to be useful. Being
able to specify apical articulation as opposed to laminal is indeed useful, as is being
able to denote advanced and retracted articulation. To make these transcriptional
resources more consistent and logical may not make them any easier to use.
The phonemic principle has guided the IPA in the development of its notation
from its beginnings. Increasingly, however, users have wanted to be able to tran-
scribe details which may not function to distinguish between words in any known
language but may have social-indexical importance, or distinguish one accent or
dialect from another, or one speaker from another, or even two realisations of the
same item by the same speaker. A narrowing down to almost microscopic levels
of detail in phonetic analysis, and consequent demands for narrow transcription
resources, reflects a current which has flowed through linguistic phonetics over
the last century or more, moving the focus from language-teaching (the earlier
names of what became the International Phonetic Association were L'Association
Phonétique des Professeurs Anglais, The Phonetic Teachers’ Association and
L'Association Phonétique des Professeurs de Langues Vivantes), in which
knowledge of distinctive sound differences is crucial for good pronunciation,
towards a more scientific and research-driven discipline, in which instrumental
investigation has played an increasingly central role. A good stock of diacritics
and other devices for representing observed details of speech in all its mani-
festations is therefore now much more of a priority than it was in the days of
Paul Passy and colleagues. We should not forget, however, that this current was
already on the move, before the IPA had been thought of, in the work of the
German experimentalists of the nineteenth century (Kohler 1981).

3.4.6 Extensions to the IPA

The extensions to the IPA (ExtIPA) came about as a result of a working party
set up by the IPA Congress held in Kiel in 1989 with a remit to propose

symbols suitable for the transcription of atypical speech, particularly the
kinds of speech behaviours encountered by speech therapists and speech
pathologists. It took as its starting point previous proposed systems such as the
symbols for the phonetic representation of disordered speech drawn up some
ten years earlier (PRDS 1980) and suggestions in Shriberg and Kent (1982)
and Vieregge (1987). The working party’s own proposals were approved by
the Congress and published in Duckworth et al. (1990). There have been some
additions since, for example those suggested by Bernhardt and Ball (1993).
The most recent ExtIPA chart is the one revised to 2008 (reproduced in the
Appendix).
It is interesting to see which of the typological parameters in (3.13) above
are most evident in the current ExtIPA set. Use of existing free symbols in
amalgamations is seen in [ʪ ʫ ʩ], which are the only new main symbols, but a
new type of symbol construction appears in [ʬ] and [ʭ] for bilabial and bidental
percussives, involving vertical reduplication which is iconic of the percussing
articulators. The only precedent in the IPA set is the double pipe [ǁ], which, as
remarked earlier, could be said to be a reduplication of single pipe [ǀ] and iconic
of laterality. Discontinuity of graphic shape is seen in these percussive symbols,
and is a prevailing parameter in the addition of more diacritics. Diacritics rather
than new symbols are used for denoting additional places of articulation not
normally found in typical speech: dentolabial [p͆ b͆ m͆ f͆ v͆], labioalveolar
[p͇ b͇ m͇ f͇ v͇] and interdental [t̪͆ d̪͆ n̪͆ r̪͆ θ̪͆ ð̪͆ l̪͆]. Something of the inconsistency seen in the way
active and passive articulators are denoted in the IPA is evident here. In dento-
labials, the [ ͆ ] denotes the lower teeth, which function as the active articulator,
so the base symbol has to carry the information about the passive articulator;
this goes against the principle of the base symbol integrally denoting the active
articulator. The labioalveolars, however, adhere to it as do the interdentals, which
also introduce the iconicity of the active articulator being positioned between the
upper and lower teeth – [n̪͆]. But if the upper diacritic denotes the upper teeth and
the lower denotes the lower teeth, as no doubt they must in the bidental percus-
sive [ʭ], then although this is more logical it reverses the illogical but iconically
motivated denotations of [ ͆ ] = lower teeth in dentolabials and [ ̪ ] = upper teeth in
dentals; the PRDS symbol was [ ̪ ] with a raised inverted bridge above. Curiously,
I have noticed students in practical phonetics classes get confused about which
teeth and which lip are involved when they make labiodental and dentolabial
articulations. An attempt to regularise the symbol for a labiodental nasal by
replacing [ɱ] with [m̪ ] was opposed by the IPA and was withdrawn when the
ExtIPA symbols were officially adopted by the International Clinical Phonetics
and Linguistics Association (ICPLA) in 1994 (IPA 1999: 186–7). The left tail
of [ɱ], attached to the rightmost leg of [m], stands out as anomalous because it
is not by analogy with symbols in a shared category. Both of the other such tails
are on the velars [ɡ ŋ]. When [ɱ] was first added to the IPA chart, some time no
later than 1914, no language was known to use it except as an allophone of /m/
in labiodental contexts. It thus violated the general IPA principle of only sym-
bolising sounds known to be phonemically distinctive in at least one language.
An example of its distinctive use had to wait until Paulian’s study of the Kukuya
dialect of the Congolese language Teke to come to light (Paulian 1975, cited in

Pullum and Ladusaw 1996: 112). A minimal pair is /kì mààlà/ ‘to complete the
rest’ and /kì ɱààlà/ ‘to laugh at’.


Some comment needs to be made about the percussives, sounds which
uniquely do not require the initiation of an airstream, but only local displacement
of air. The sound symbolised by [ʭ] is the sound arising from raising the jaw so
that the lower teeth strike against the upper teeth with a sharp cracking sound, but
there is some confusion over what exactly is the sound represented by [ʬ], often
informally described as a ‘lip-smack’. It appears on the 1994 ExtIPA chart but
the conventions in Duckworth et al. (1990) and Ball, Code, Rahilly and Hazlett
(1994) make no mention of it, and although it appears again on the version
revised to 1997 published in IPA (1999: 193), it is not listed there as a symbol
(see ibid.: 188) – it is mentioned in passing as only ‘occasionally found’ (ibid.:
187). Its IPA number should be 600, as it immediately precedes [ʭ] on the chart,
which is numbered 601, the last number assigned to an IPA symbol being 599,
the number for subscript acute [ ̗]. The informal term ‘lip-smack’ suggests not
a percussive sound but a suction sound which arises from the two lips parting.
If this is the sound intended, and I have heard it demonstrated this way, then it
could be symbolised [ʬ↓], but it would have to be removed from the percussive
row. A true bilabial percussive sound, which is much duller in timbre, can be
made by raising the jaw so that the lower lip strikes the upper lip; it can be made
without the lips parting. By quickly raising and lowering the jaw so that the lips
meet, part, meet again and part again, a rapid succession of [ʬ ʬ↓ ʬ ʬ↓] can be
heard.
Another innovation on the ExtIPA chart is a row for ‘nareal fricatives’, which,
like the new place of articulation categories, makes use of a diacritic, [ ͋ ], placed
over a nasal base symbol instead of designing new main symbols. In Duckworth
et al. (1990: 276) there are two ways of symbolising audible nasal friction, the
choice of which to use depending on whether the ‘intended’ sound was a nasal
or not. Ligatured [h‿n] was to be used if [n] was intended and produced with
audible nasal friction, whereas if, for example, [z] was intended then it should be
represented as [z͋ ]. For the first time in IPA conventions, the speaker’s intention
became a criterion for the use of a symbol. Unsurprisingly, it occasioned some
controversy and the distinction was withdrawn in 1994 by ICPLA (IPA 1999:
187). Thereafter, the diacritical representation [n͋ ] was generalised to cover all
instances of nasal friction, although some authors continued to use the ligatured
symbols without implying a nasal target sound. Controversy has continued,
however, in cleft palate studies over how to classify and symbolise different
kinds of nasal airflow; see discussions in Grunwell and Harding (1996), Peterson-
Falzone, Trost-Cardamone, Karnell and Hardin-Jones (2006) and Howard (2011:
132–3). If friction noise is generated at the velopharyngeal port rather than the
nostrils, then the ‘velopharyngeal friction’ diacritic [ ͌ ] can be used. If the friction
in either case is voiceless, then the voiceless diacritic is employed.
Looking at the ExtIPA chart, the most obvious difference from the IPA
chart is that columns and rows have been introduced to accommodate symbol-
plus-diacritic combinations. The only exceptions are the amalgamations [ʪ ʫ ʩ].
Amalgamated symbols on the IPA chart are placed in the ‘other symbols’
category outside the table. If this had been done in the ExtIPA context, and the

diacritics confined to the ‘diacritics’ box, there would in fact be no need for an
ExtIPA table. Setting that point aside, perhaps the most interesting fact about
the ExtIPA set from the point of view of the history of phonetic notation is that
no new integral main symbols have been introduced, and no new analogical fea-
tures either.5 Overwhelmingly, the policy has been to use diacritics on existing
symbols, a policy which relies heavily on self-allusion, as can be seen in some
of the new diacritics. The nareal fricative [ ͋ ], velopharyngeal friction [ ͌ ] and
denasal [ ͊ ] allude to the IPA [ ̃], and the dentolabial and interdental diacritics
allude to [ ̪ ]; so does the bidental percussive main symbol, while the bilabial
percussive alludes to the labialisation diacritic [ʷ], which itself alludes to the
main symbol [w].
Another innovation of ExtIPA is the provision for ‘indeterminate’ transcrip-
tion ‘to mark sounds about which the transcriber is uncertain’ (IPA 1999: 187–8).
The symbol for indeterminacy is listed as a ‘balloon’ (ibid.: 188) inside which
a symbol or a category label abbreviation can be placed, but examples on the
ExtIPA chart are given in parentheses with overline and underline. An inde-
terminate vowel, for example, is symbolised [(V̄)], an indeterminate voiceless
plosive as [(P̄l.vls)], although [(P̥̄)] seems to me preferable – the PRDS symbols
included (S̥) for an unspecified voiceless stop (Grunwell 1987: 294) and one
wonders why ExtIPA did not follow this symbolisation. Although the IPA did
not have any official means of symbolising a whole class of sounds, a turned ‘k’
was suggested in 1911 to stand for ‘a consonant in general’ (MacMahon 1994:
19) and is mentioned in the 1949 IPA Principles along with a turned ‘u’ as a
generalised vowel symbol.
Silent articulations, or ‘mouthings’, are enclosed within parentheses. These
are useful for infant babbling behaviours, where they are commonly encountered
(Vihman 1996: 110, 215–16).
Some previously unofficial but widely used symbols are given official recog-
nition in ExtIPA, for example the ‘unaspirated’ diacritic [˭], and further diacritics
introduced for what are termed on the chart ‘strong’ and ‘weak’ articulations (not
to be confused with the traditional terms ‘fortis’ and ‘lenis’; see Duckworth et
al. (1990: 276–7)), but listed a little confusingly as ‘stronger’ and ‘weaker’ (IPA
1999: 189). The ‘sliding articulation’ diacritic, introduced on the suggestion of
Bernhardt and Ball (1993: 35–6) and exemplified by [θ͢s], is an example of a new
symbol which is iconic of an articulatory movement. It is provided for represent-
ing two sounds with adjacent places of articulation which are produced ‘within
the timing slot for one segment’ (ibid.), thus implying misarticulations.
The reason for the PRDS symbols and their ExtIPA successors was to provide
notation specifically for clinical transcriptions of atypical speech, but they often
prove useful for typical speech as well. An example concerns the symbols [ʪ ʫ]
for simultaneous lateral and central airflow, a manner regarded as ‘atypical’ by
Ball and Local (1996: 56) and ‘found with some misarticulations of target alveo-
lar fricatives’ (IPA 1999: 187). The risk in confining a category to ‘atypical’ or
‘misarticulated’ speech is illustrated in the fact that sounds for which [ʪ] and [ʫ]
are appropriate symbols have recently been observed in some dialects of Arabic
in south-western Saudi Arabia. Figure 3.14 presents electropalatographic frames
of one token of pharyngealised [lsˁ] and two tokens of pharyngealised [lzˁ]

(Watson, Heselwood, Al-Azraqi and Naïm 2012). The slightly greater degree of
contact for the voiced correlate is consistent with previous EPG studies of frica-
tive articulations (Dagenais, Lorendo and McCutcheon 1994; McLeod, Roberts
and Sita 2006).

(a) [ʪˁ]: EPG frames 199–210; (b) [ʫˁ]: EPG frames 415–426; (c) [ʫˁ]: EPG frames 524–535

FIGURE 3.14: EPG frames showing simultaneous central and lateral
channels for airflow during (a) [lsˁ] in the word θˡˁaim ‘pain’ (Al-
Rubū‘ah dialect), (b) [lzˁ] in the word ðˡˁahr ‘back’, and (c) [lzˁ] in the
word ðˡˁabʕ ‘hyena’ (Rijāl Alma‘ dialect)

Notation for representing subtle but important differences in the timing
of voicing is a welcome and very useful addition to the ExtIPA symbol set,
although in my opinion the conventions for interpreting them would benefit
from not invoking the concept of typicality – speech is speech, and any judge-
ments as to its typicality or otherwise are best made separately rather than
appearing in the conventions. The voicing notation introduces a feature which
has generally not appeared in IPA or other segmental notation systems, namely
sequential ordering, an issue addressed in Section 3.5 below. Although moti-
vated by the requirements of clinicians and researchers in clinical phonetics,
‘the ExtIPA taxonomy broadens the symbolic capability of the basic IPA nota-
tional system with considerably more detail than was formerly possible’ (Esling
2010: 694).


3.4.7 IPA Braille notation

This account draws heavily on Englebretson (2009), which gives a thorough and
up-to-date review of the history of Braille versions of the IPA from Merrick and
Potthoff (1934), with which Daniel Jones was associated, to the official adoption
of IPA Braille by the International Council on English Braille (ICEB) in 2008.
Braille is a system invented by the Frenchman Louis Braille in the 1820s for
transliterating the characters of writing systems to represent them in the form
of patterns of raised dots which can be felt by the fingertips. It is the standard
medium of written language for blind literate language users. The basic unit is a
‘braille cell’, which is a matrix of six dots in a 2-column, 3-row array: ⠿. The dots
can be referred to by numbers as shown in (3.14).

(3.14) 1 4
2 5
3 6

There are braille versions of several alphabets, abjads and syllabaries. Each
letter or syllabogram has a unique combination of dots. Roman <a> = <a> = 1,
<b> = <b> = 12, <c> = <c> = 14, <d> = <d> = 145, <z> = <z> = 1356. Like
shorthand, the braille systems of many languages employ contractions which use
single cells or cell combinations to represent common strings of letters or whole
words. For example, contracted English braille uses the cell <&> = 12346 to
transliterate the string <and>.
Dot-assignment in braille transliteration is based on alphabetic order. The
letters <a–j> have only dots in the top two rows; letters <k–t> repeat the same
patterns in the same order but with the addition of dot 3, so that <a>1 is to <k>13
as <b>12 is to <l>123; letters <u–z> recruit dot 6 to the same homologous
pattern, except for <w>, which is 2456 and which was absent from the French
alphabet when braille was invented.
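
(As a purely illustrative aside, the dot-numbering homologies just described can be checked computationally: in Unicode’s braille patterns block, dot n of a cell sets bit n−1 of the character’s offset from U+2800, so a few lines of Python suffice to generate cells from their dot numbers. The function name below is my own.)

def dots_to_cell(dots):
    """Return the Unicode braille character for a set of dot numbers (1-6)."""
    return chr(0x2800 + sum(1 << (d - 1) for d in dots))

print(dots_to_cell({1}))         # ⠁ = <a>, dots 1
print(dots_to_cell({1, 3}))      # ⠅ = <k>, dots 13: the <a> pattern plus dot 3
print(dots_to_cell({1, 3, 6}))   # ⠥ = <u>, dots 136: the <k> pattern plus dot 6
print(dots_to_cell({1, 2}))      # ⠃ = <b>, dots 12
print(dots_to_cell({1, 2, 3}))   # ⠇ = <l>, dots 123: <a>1 is to <k>13 as <b>12 is to <l>123
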
The ‘IPA Braille’ notation presented in Englebretson (2009) is intended to
replace previous rival and mutually unintelligible braille versions of the IPA,
the Braille Authority of the United Kingdom (1990) version and the Braille
Authority of North America (1997) version, in both of which serious short-
comings have been identified (Englebretson 2009: 71–5). It takes the Merrick
and Potthoff version as its starting point, but with a complete revision of the
diacritics. Its governing principle is faithfulness to the symbols of the 2005 IPA
chart, so that every symbol on it has a unique braille transliteration equivalent.
Faithfulness has the consequence that the anomalies and inconsistencies of the
IPA discussed above in Section 3.4.5 are mostly reproduced.
The six-dot cell means that if more than 63 symbols are needed, there will
have to be multi-celled symbols. To produce braille versions of the 180-odd
symbols of the 2005 IPA chart, considerable thought has to be given to the task.
True to the transliterative nature of braille writing, those IPA symbols which are
glyphically the same as roman alphabetic letters are simply given those braille
forms, so that [a] is [a], [b] is [b], etc.6 IPA modifications such as the right tail
of retroflexes and right hook-top of voiced implosives are mirrored in IPA Braille

by the cell for the base symbol being modified by a prefix cell denoting retroflex
or implosive. For example, [d] = [d], [ɖ] = [4d] and [ɗ] = [8d]. These complex
symbols have the same structure as their IPA equivalents, as shown in (3.15).

(3.15) Z = (Y ← a)
Example 8d = (d ← 8)

The IPA symbol for a voiced pharyngeal fricative, [ʕ], is a turned version of the
glottal stop symbol [ʔ], and a glyphic relation is found in the IPA Braille symbol
[62], which contains the glottal stop symbol [2] in a composite with [6], which
latter also occurs as part of composite glyphs for some labiodentals, labial-
velars, the voiceless palatal fricative and some non-front vowels, thus exhibiting
considerable homography. The pharyngealisation diacritic is triliteral [@62],
the cell @ being a prefix indicating that what follows is a secondary articulation
diacritic. Each subtype of diacritic has its own prefix depending on whether in
the IPA it is placed above, below or on the same level as the main symbol: the
subscript position of diacritics such as voiceless and voiced is denoted iconically
by [,], superscript position by [�], and level position, such as the velarisation/
pharyngealisation tilde, by ["]. It is a moot point whether this kind of device is
denoting phonetic categories or denoting facts about IPA notation, or denoting
the former via the latter. These notational devices make the internal structural
relations of complex symbol glyphs much more complicated than in the glyphs
of IPA symbols, often introducing another level of subordinate determination.
Even though the conceptual structure of IPA notation is faithfully adhered to,
the expression of the conceptual relations in terms of glyphic form are different
because of the quite severe limitation imposed by the smaller inventory of braille
cells and their conventionally linear arrangement. For example, IPA [dˤ] has the
form and internal structure given in (3.16), where i represents a component which
gives us information about its governing component’s status, in this case that it
is a diacritic of the IPA superscript class. The IPA notational equivalent is given
underneath in (3.16) for comparison.

(3.16)             d �62 = (d ← ((26)←@))
        Structure  Z = (Y ← ((a b)←i))
                   dˁ = (d←ˁ)

We can see in the above examples and discussion how the principle of faith-
fully following the structure of IPA symbols in fact leads the notation away
from faithfulness in certain other respects, as with SAMPA notation (see Section
3.4.10). Unlike IPA diacritics, diacritics in IPA Braille are all biliteral or trilit-
eral composites. Englebretson (2009: 78–9) emphasises that the sheer number of
IPA symbols for which braille equivalents have to be designed using the basic
unit of the braille cell is bound to give rise to unsystematic correspondences and
usages. IPA Braille therefore has to be approached on its own terms as an arbi-
trary mixed bag of integral and analysable symbols, just like the IPA, but with
an extra layer of semiosis due to the direct transliteration of glyphic elements, so
that, for example, in addition to the notational relationship between [4d] and the
category bundle ‘voiced retroflex plosive’ there is also an analysable translitera-
tion relation between ‘4d’ and ‘ɖ’.
At present there is no braille version of ExtIPA notation, or of VoQS, but there
are plans to develop them (Englebretson 2009: 82).

3.4.8 Pitch notation

The sphere of activity in which pitch notation is most often encountered is music.
Since the Middle Ages, western music has been notated by placing notes on staves
each line and space of which, in conjunction with a clef and a key signature,
denotes a pitch on a musical scale (see McCawley 1996: 847–8). The first person
to realise that musical notation conventions could be used in phonetics seems to
have been Robert Robinson in his 1617 The Art of Pronuntiation (see Dobson
1957: 21–2), although the first to apply it comprehensively within the framework
of a prosodic theory was probably Joshua Steele (1775; see Abercrombie 1965:
35–44; Sumera 1981: 101–3; and see Chapter 4 Section 4.11.3). Daniel Jones, in
Jones and Plaatje (1916), used musical notation to give exact pitch values to the
tones of Sechuana (Tswana) (see Collins and Mees 1999: 149), and several other
writers have used it for other languages. Halliday (1970: 52) uses musical staves to
indicate pitch dynamics, reproduced in Figure 3.15. Musical notation has not gen-
erally been found to be the most convenient method and, as Fox (2000: 183) points
out, it makes a misleading analogy between musical and linguistic uses of pitch.

FIGURE 3.15: Halliday’s use of musical staves to show pitch dynamics in speech.
Halliday (1970), A Course in Spoken English: Intonation, Oxford: Oxford University Press

Representation of pitch is not a common feature in writing systems, even for
tone languages, although Lao and Thai are exceptions (Diller 1996: 464–5). It
did develop in Greek in the second century bce in the form of acute [ ́], grave
[ ̀] and circumflex [ ̂] accents. The acute denoted a high pitch, circumflex a high
falling tone, and a falling or level pitch was marked by the grave accent (Threatte
1996: 276–7). These glyphs have endured to the present day as tone marks in
orthographies and phonetic notations with a variety of values, usually exploiting
their iconicity in one way or another.
Pitch, as Fox (2000: 179, original italics) reminds us, ‘is a phonetic feature
with a variety of phonological functions’, and the notational devices for repre-
senting pitch are most often employed for denoting phonological categories of
lexical tone, word-accent and intonation, categories which are language-specific
and subject to contextually determined variation, rather than for denoting single
pitch values as in music. Whereas the absolute pitch of a note is crucial in music,
at least in the context of a given key, it is only relative pitch which phoneticians
are generally interested in, and typically with a low level of resolution (see e.g.
Pike 1943: 27–9). There is an assumption behind pitch-marking conventions
that only a small number of relative categories are needed, the IPA providing
for five – extra high, high, mid, low and extra low; there are no guidelines on
precisely what counts as ‘high’ rather than ‘extra high’. Quite what motivates the
terms ‘high’ and ‘low’ for the auditory sensation of pitch is not altogether clear,
but they make intuitive sense to listeners, and pitch-based musical notation places
tones with faster vibrations higher on a stave than those with slower vibrations,
thus reflecting perception iconically.7
Iconicity in phonetic pitch notation has persisted from the Greek accents
through Robinson (1617) and Bell (1867) to the present-day IPA, although the
interpretative conventions have been subject to changes. In Bell’s Visible Speech
and in the 1949 IPA Principles, acute accent denoted a rising tone, grave accent
a falling tone, as indeed was the case in Joshua Steele’s system (Steele 1775:
9–11), with level tone denoted by a macron (IPA 1949: 18). This is the system
in Chinese Pinyin writing, with an inverted circumflex [ˇ] for the falling-rising
compound tone. In the revised 1999 version of the IPA Handbook, the reader is
warned not to interpret acute and grave accents iconically (IPA 1999: 14) since
they now mean simply ‘high tone’ and ‘low tone’ respectively; iconicity is not
entirely absent, however, in the height of the right endpoint of the accent mark
relative to the left endpoint. There is also an element of iconicity in the denota-
tion of ‘extra high’ and ‘extra low’ tones through repetition of the tone mark – [á]
for ‘high’, [a̋] for ‘extra high’. In IPA Braille, tone marks follow the shape and
iconicity of the IPA marks [˥ ˦ ˧ ˨ ˩], based on Chao’s (1930) tone marks for
Chinese but with reorientation. For example, IPA ‘high’ [˦] is [_c], ‘low’ [˨]
is [_-].
For pitch movements over intonation units, if similar notational devices are
used to those for more local marking of pitch on individual syllables, then the
relevant stretch of speech has to be identified. The IPA provides a single vertical
line│ to demarcate rhythm groups and a double vertical ║ for intonation groups.
Ball et al. (1994: 75) warn that the current IPA notation for intonation ‘may be
more of a hindrance than a help to many phoneticians’. This view is borne out by
the fact that writers on intonation have not tended to use it, adapting instead the
basic acute, grave and circumflex symbols with their own interpretative conven-
tions, for example Cruttenden (1997: xvi) and Wells (2006: 260).
The simplest obvious way to represent pitch iconically is in the form of pitch
curves drawn to give an impression of height and movement. An example from
Daniel Jones (1909: 88, original transcription) is given in (3.17).

(3.17)

Jones also combined these curves with musical notation; see Collins and Mees
(1999: 60–5, 239–45) for a critical account of Jones’s approach.
Not all proposed notation for pitch has been iconic. Halliday (1967) uses
numbers for intonation patterns, and Gandour (1979: 96) numbers the tones of
Thai from 1 = low to 5 = high – the compound high falling tone is represented
as 51. The four tones of Mandarin Chinese are often referred to conveniently in
discourse by the numbers 1 to 4, and capital letters have been recruited to denote
tones by acrography (H for high, M for mid, L for low, etc.), for example in ToBI
notation (Tone and Break Indices; see Beckman and Ayers 1994), X-SAMPA
notation (Extended Speech Assessment Methods Phonetic Alphabet; see Wells
1995b; and see Section 3.4.10 below) and INTSINT notation (International
Transcription System for Intonation; see Hirst 2004). Conveniently, all symbols
in ToBI notation can be typed with the ‘caps lock’ on. In addition to H and L,
it uses [*] to denote accent, [!] for a stepped accent (homosymbolic with IPA
‘postalveolar click’ and VoQS ‘harsh voice’), [ˉ] for a phrase accent, and [%] for
a boundary accent (see Chapter 4 Section 4.11.3). Arbitrary capitals are the norm
for reconstructed ‘proto-tones’ where pitch values are unknown but distinctive-
ness is hypothesised (Fox 2000: 185).
Hirst’s (2004) INTSINT system is unique in that it comes close to specifying
absolute pitches when used in specific transcriptions. If the F0 range in Hz is
ascertained for an utterance, then the pitch categories T (= Top of the range), M
(= Middle of the range) and B (= Bottom of the range) have absolute values. An
algorithm then computes target values for the other categories of H and L, and U
(= Upstep) and D (= Downstep).
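The general idea can be sketched in a few lines of Python. The fragment below is purely illustrative: the T, M and B values follow the description just given, while the formulas for H, L, U and D are assumptions made for the sketch and are not Hirst’s (2004) published procedure.

def intsint_targets(f0_min_hz, f0_max_hz, previous_hz=None):
    """Schematic INTSINT-style targets for one utterance (illustrative only)."""
    top, bottom = f0_max_hz, f0_min_hz
    targets = {'T': top, 'M': (top + bottom) / 2, 'B': bottom}
    if previous_hz is not None:
        # Relative tones anchored to the preceding target (assumed formulas):
        targets['H'] = (previous_hz + top) / 2       # higher than the last target
        targets['L'] = (previous_hz + bottom) / 2    # lower than the last target
        targets['U'] = previous_hz * 1.1             # small upstep
        targets['D'] = previous_hz * 0.9             # small downstep
    return targets

print(intsint_targets(120, 240, previous_hz=180))
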

3.4.9 Notation for voice quality and long domain categories

Kelly and Local (1989: 34–5) identify what they call ‘holistic listening’, in
which the listener tries to ‘attend closely to the details of the overall character-
istics of longer stretches’ of speech. Notation for representing such phenomena
is provided for in the ExtIPA set and in the VoQS (Voice Quality Symbols) set
‘intended as a hypernotational scheme for marking long pieces of segmental
transcription’ (Esling 2010: 694).

The VoQS notation comes from bringing together Laver’s (1980) voice quality
categories with notational innovations from the IPA and ExtIPA (Duckworth et
al. 1990: 278; Ball, Esling and Dickson 1995: 73). For denoting phonation types
and supralaryngeal settings, roman capitals are employed acrographically as
symbols along with IPA and ExtIPA diacritics. Falsetto, for example, is symbol-
ised as [F], (modal) voice as [V], anterior phonation as [V̟ ]; raised and lowered
larynx are denoted by [L̝] and [L̞], left and right offset jaw by [J͕ ] and [J͔ ]. Greek
capital [Θ] is used for ‘protruded tongue voice’, and two Cyrillic capitals make an
appearance to stand for atypical ‘airstream types’: [Ю] for ‘tracheo-œsophageal
speech’, and [И] for ‘electrolarynx speech’.
For features of loudness and speech rate, including pausing, the ExtIPA set
provides conventions adapted from musical terminology and notation: f (forte)
for ‘loud speech’, p (piano) for ‘quiet speech’, allegro and lento for ‘fast speech’
and ‘slow speech’, and also terms such as crescendo/decrescendo for ‘getting
louder/quieter’ and accelerando/rallentando for ‘getting faster/slower’. Pauses
are denoted by full stops in parentheses, which can be used so that each full stop
represents a ‘silent beat’ of pause (Ball et al. 1994: 73). With this interpretation,
it is a rhythmic symbol rather than simply a symbol for momentary cessation of
speech, and therefore similar in function to the stress mark [ˈ].
An extremely useful innovation for deployment of symbols denoting voice
quality and other long domain categories is the ‘labelled braces’ notation,
which can be used in conjunction with orthographic transcription as well as
segmental phonetic transcription. Labelled braces allow the transcriber to
demarcate the extent of a long domain category in relation to the segmental
or orthographic transcription. An example is given in (3.18), in which the
information in the labelled braces is presented on the following lines for ease
of reading, rather than on the same line as shown on the VoQS chart (see
Appendix).

(3.18)  ðɛn i tʰoʊɫ ðə weɪtə (.) tə ɡɛʔ ðə bɪɫ
        Rate and loudness  {moderato}  {lento}  {f allegro f}
        Voice quality      {V  {V̰  V̰ }  V}
        Then he told the waiter to get the bill.
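For machine-readable work, the content of a labelled-braces transcription like (3.18) can be held as a set of spans over the segmental or orthographic tier. The sketch below is not part of the VoQS conventions, and the word-index alignments are illustrative guesses, since the exact extent of each domain depends on the original layout.

words = ['ðɛn', 'i', 'tʰoʊɫ', 'ðə', 'weɪtə', '(.)', 'tə', 'ɡɛʔ', 'ðə', 'bɪɫ']
domains = [
    # (tier, label, first word index, last word index) – alignments assumed
    ('rate/loudness', 'moderato',  0, 4),
    ('rate/loudness', 'lento',     5, 5),
    ('rate/loudness', 'f allegro', 6, 9),
    ('voice quality', 'V',         0, 9),
    ('voice quality', 'V̰',         3, 5),
]
for tier, label, start, end in domains:
    print(f"{tier:14} {label:10} over: {' '.join(words[start:end + 1])}")
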

3.4.10 SAMPA notation

The notation known as SAMPA (Speech Assessment Methods Phonetic Alphabet)
and its extended form X-SAMPA (Wells 1995b) is, despite having been rendered
unnecessary by Unicode, interesting because firstly it is a relatively recent
development and therefore able to be designed with up-to-date knowledge of
phonetics, and secondly it was tightly constrained from the outset by the form in
which ASCII files could be sent as email messages.8 As might be guessed from
this, the motivation for SAMPA notation was to enable phonetic transcriptions
to be put in the body of an email message rather than having to be in attached
documents. Only ASCII characters with numbers between 32 and 126 could be
successfully transmitted (ibid.: 1), giving a basic set of 93 symbol components,
or glyphs. Although this is more than the basic set of 63 IPA Braille forms (see
Section 3.4.7 above), it is well short of the 180 or so on the 2005 IPA chart.
The challenge, then, as with braille, was to find the best way to transliterate
IPA symbols with a much smaller set of glyphs at one’s disposal. Do we find,
for example, the same attempt to copy the analogical structure of IPA retroflex
and implosive symbols, to give some unity to the set of nasal symbols, and so
on? The answer is a mixed yes and no. Retroflex symbols are transliterated by a
grave accent diacritic added to an alveolar base symbol: [tˋ] = IPA [ʈ], [dˋ] = [ɖ]
etc., not to be confused with the apostrophe which in SAMPA marks palatalisa-
tion, not ejectives: [t’] = IPA [tʲ]. Similarly, voiced implosives are symbolised by
adding [_>] to the voiced plosive symbol: [b_>] = IPA [ɓ], [d_>] = [ɗ] etc., but
note the additional complexity in which the underscore functions to tell us that
the following component is to be interpreted as a diacritic. For example, [t_w] is
equivalent to IPA [tʷ]. This parallels the situation in IPA Braille shown in (3.16)
above. The SAMPA underscore is therefore structurally an i-type component
determining the a-type ‘>’ component.
When we come to nasals, we find they are not all designed on roman nasal
letter-shapes. The labiodental nasal is capital [F], and the palatal nasal is capital
[J], glyphs not used in the IPA but which denote ‘falsetto’ and ‘jaw’ in the VoQS
symbols. There is an obvious reason to prioritise non-conflict with IPA symbols,9
but it does mean that several VoQS symbols cannot be used unless they are
explicitly identified as such; capital [C], denoting ‘creak phonation’ in VoQS,
transliterates IPA [ç], [V] transliterates [ʌ], and [W], by the same logic, translit-
erates [ʍ]. There is extensive homosymbolism of [\] as a diacritic, which can be
seen as a result of the constraints on the symbol set. In [G\] (= IPA [ɢ]), [X\] (=
[ħ]) and [N\] = [ɴ]) it functions as a marker of uvular place of articulation, but
while in the latter two it can be interpreted as retracted articulation applied to
another consonant of the same manner class ([N] = IPA [ŋ], [X] = IPA [χ]), in
the former it is not quite analogous because [G] is equivalent to IPA [ɣ], not [ɡ].
Two voiced fricative symbols, [B] (= [β]) and [R] (= [ʁ]), become homorganic
trill symbols with the addition of [\], but the other symbols with [\], a selection
listed in (3.19) with their SAMPA derivations and IPA equivalents, are a bit of a
ragbag group, although they have in common that [\] changes only one category
from the same classificatory dimension.

(3.19)  J ([ɲ]), J\ ([ɟ]) – nasal–plosive; K ([ɬ]), K\ ([ɮ]) – voiceless–voiced;
        L ([ʎ]), L\ ([ʟ]) – palatal–velar; M ([ɯ]), M\ ([ɰ]) – vowel–glide;
        p ([p]), p\ ([ɸ]) – plosive–fricative; r ([r]), r\ ([ɹ]) – trill–approximant.
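The correspondences cited in this section are enough to illustrate how SAMPA transliteration can be mechanised as a simple lookup. The Python sketch below includes only the mappings mentioned above (with the retroflex mark typed as the ASCII backquote); a full converter would need the complete table in Wells (1995b).

SAMPA_TO_IPA = {
    't`': 'ʈ', 'd`': 'ɖ',            # retroflexes: alveolar base plus grave/backquote
    "t'": 'tʲ',                      # apostrophe marks palatalisation
    't_w': 'tʷ',                     # underscore introduces a diacritic component
    'F': 'ɱ', 'J': 'ɲ', 'J\\': 'ɟ', 'N': 'ŋ', 'N\\': 'ɴ',
    'B': 'β', 'B\\': 'ʙ', 'R': 'ʁ', 'R\\': 'ʀ',
    'G': 'ɣ', 'G\\': 'ɢ', 'X': 'χ', 'X\\': 'ħ',
    'K': 'ɬ', 'K\\': 'ɮ', 'L': 'ʎ', 'L\\': 'ʟ',
    'M': 'ɯ', 'M\\': 'ɰ', 'p\\': 'ɸ', 'r\\': 'ɹ',
    'C': 'ç', 'V': 'ʌ', 'W': 'ʍ',
}

def sampa_to_ipa(symbols):
    """Transliterate a list of SAMPA symbols, passing unknown ones through."""
    return ''.join(SAMPA_TO_IPA.get(s, s) for s in symbols)

print(sampa_to_ipa(['t`', 'F', 'r\\']))   # ʈɱɹ
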

3.4.11 Notation for infant vocalisations

Researchers into the vocal behaviours young infants display prior to canonical
babbling have not been comfortable using the analytic categories that underlie
systems of notation for transcribing spoken language and have thus had misgiv-
ings about representing infant vocalisations with standard phonetic symbols such
as those of the IPA (Oller 2000: 4). The reason for this unease is not so much
the fact that the vocalisations are not pronunciations of words as that the actual
structure and timbre of the sounds infants produce are different. Mackenzie Beck
(2010) gives an extensive overall account of organic changes to vocal tract struc-
tures and functions responsible for initiation, articulation and phonation over the
course of life from infancy to old age, and Hodge (2013: 8–19) reviews a number
of key studies on the effects on vocal productions of the continuing development
of the vocal tract through the early years of life. The ratio of oral to pharyngeal
cavity size is not constant during development due to the different growth rates of
the various anatomical structures. The larynx is very high at birth, level with the
second spinal vertebra, compared to the seventh by puberty, making the pharynx
very short. The tongue is relatively large in infants, filling the mouth more than
it does by the time speech emerges. Infants’ breathing tends to be nasal rather
than oral because of the need to breathe while feeding, so vocalisations are often
nasalised, a feature augmented by the ‘close relationship of laryngeal and velo-
pharyngeal cavities’ (Vihman 1996: 104). All these factors give physical articu-
latory space (see Chapter 6 Section 6.5.1) a significantly different shape as well
as different dimensions compared to an older child or adult. As a consequence,
the acoustic structure of a vocoid with an articulatory configuration analogous
to an older child’s [ɛ] or [u] will have different formant spacing and a different
auditory-perceptual quality.
Two of the first child language specialists to respond to these fundamental
differences in the phonetics of early infant sounds were D. Kimbrough Oller
and Rachel Stark (see e.g. Oller 1980, 2000; Stark 1986). Oller refers to the
sounds infants make which are not vegetative, that is to say not coughs, belches
or cries, as ‘proto-phones’ (Oller, Eilers, Neal and Schwartz 1999: 225; Oller
2000: 193–4) and, with various colleagues, pioneered methods for charting the
development of proto-phones into proper phones through the babbling stages
into early forms of spoken language. This task required a framework of cat-
egories onto which observed infant vocalisations could be mapped and which
could, if one wished, be denoted by some kind of notation. Oller’s categories
are mostly acoustically defined in terms of spectral and F0 stability, tempo-
ral relations of syllable-like constituents, and rapidity of formant transitions,
parameters which bind otherwise limitless physical properties of acoustic
signals into a fabric recognisable as an approximation to speech. Vihman
(1996: 100) characterises Oller’s perspective as bridging the gap between
acoustic analysis and phonetic transcription (see for example Oller 2000: 12,
fig. 1.2), but it may be more accurate to identify abstract acoustic space (see
Chapter 6 Section 6.5.3) as containing most of the models of the framework.
The category ‘quasi-resonant nucleus’ accounts for vocoid-like sounds which
have a recognisable formant structure, and the acrographic symbol [QRN] is
employed to denote it. Other categories, or ‘infraphonological properties’, with
different acoustic characteristics are ‘squeal’, ‘growl’ and ‘goo’, which can be
notated as [SQ GR GO]. Oller et al. (1999: 227) regard the IPA conventions
as inappropriate for these kinds of vocalisations, but Heselwood and Howard
(2008: 389) suggest that IPA symbols can be used to augment these proto-
phone symbols if one wishes to give some information about auditory quality,
for example that a particular QRN or GR is closer to [a] in auditory quality
than to any other adult vowel. An example is given in (3.20) with VoQS nota-
tion as well.

(3.20)  ʔ  QRN  GR  SQ
           a    ɣ   i
        {Ṽ          Ṽ}

In this case, the conventions for the IPA symbols would have to have the general
caveat that the symbols are to be understood as ‘have something of the quality
of x’ or, as I have heard it endearingly put, have to be understood as ‘kiddywink
symbols’. Esling’s (forthcoming) contention that infant vocalisation data ‘illus-
trate that laryngeal quality is primal, that control of the articulatory and perhaps
acoustic cues of speech originates in the pharynx and that the acquisition of the
ability to produce manners of articulation spreads from the pharynx’, if true, pre-
dicts interesting times for transcribers of infant proto-speech, particularly when
we consider the facts about infant laryngeal and pharyngeal anatomy. It also pre-
sents a challenge to the more or less received opinion that the first vowel qualities
to emerge in early vocalisations are associated with the low front quadrant of the
vowel quadrilateral (Hodge 2013: 12–15), unless one postulates a developmental
discontinuity of the kind that Jakobson (1968: 21–2) is now widely criticised for.

3.4.12 Using notations

Notations exist in order to provide resources for expressing analyses of phonetic
data in transcriptions of one kind or another (see Chapter 4), just as the theoreti-
cal models they denote exist to facilitate those analyses. On their own, theoretical
models and symbols may hold a fascination but they are, from a practical point
of view, vacuous. It should also be fully understood that it is not imperative to
embody one’s analyses in transcriptional form. If it is more convenient and more
useful to spell them out in descriptive language, then that is what one should do.
But if transcription is called for, then a number of points can helpfully be borne
in mind about using notation systems.
Responsible usage of notation requires a good understanding of its conven-
tions and of phonetic theory so that transcriptions remain accurately interpret-
able, but a notation should not be regarded as a rigid system of rules shackling the
phonetically informed imaginations of transcribers. If one wishes to transcribe
something for which one set of conventions does not provide a ready-made
symbol, new complex symbols can be formed from existing symbols, or ele-
ments from another notation system can be used providing this is made clear, as
in (3.20) above. When new sound-types come to light, such as the labiodental
flap reported for many central African languages by Olson and Hajek (1999),
or the recent discovery of a pharyngeal tap (see Chapter 6 Section 6.2.1), then
thought has to be given to how it should best be symbolised. In the case of the
labiodental flap, a new symbol was proposed and accepted by the IPA. It is
formed analogically by grafting the ‘fish-hook’ curl of the alveolar tap [ɾ] onto
the voiced labiodental fricative symbol [v] to give the graphically continuous [ѵ]
which appears on the 2005 IPA chart. The proposed symbol for the pharyngeal
tap is [ʕ̆] (Esling 2010: 696, 699–700), a graphically discontinuous symbol com-
prising the voiced pharyngeal fricative base symbol and a diacritic defined in the
conventions as ‘extra short’. However, the sound is not an extra-short fricative.
The use of the breve for a tap is found in Olson and Hajek (1999: 101), who use
[w̆ ] for a bilabial tap and [v̆] for the labiodental tap. It seems to have been Eunice
Pike (1946) who first used the breve over a base symbol with the same place of
articulation to denote a tap or flap. These quasi-systematic solutions show how
the mixture of historical accident and purposeful design continues to characterise
IPA notation and how it can continue to be exploited to meet new demands. The
‘elaborated consonant chart’ in Esling (2010: 695–700) contains further exam-
ples of the manner in which existing IPA, including ExtIPA, notational resources
can be harnessed to create complex symbols; see Appendix.
Not only new discoveries but also new developments in phonetic theory,
resulting in new categories, have to find expression in notation systems and their
conventions. The advantage of a system based on integral symbols such as the
IPA is that the conventions can be changed while the base symbols themselves
remain unaltered. This is an advantage not only for practical considerations of
font design and printing, but also for preserving the domain-neutral status of
notation (see Chapter 6 Section 6.5). A reconceptualisation of abstract articula-
tory vowel space such as is proposed in Esling (2005) does not change what the
vowels sound like, nor does it change their acoustic properties. It does, however,
change the taxonomic categories because they are articulation-based, and it
changes how the articulatory domain relates to the acoustic, aerodynamic, audi-
tory and perceptual domains.
The need for narrow transcriptions mentioned above at the end of Section
3.4.5 means that transcribers sometimes have to use symbols with a slightly dif-
ferent meaning, which can be explained in a note if clarification is felt to be nec-
essary. For example, a narrow transcription of northern British English What’s
up lad? might look like (3.21).

(3.21)  s̝̆ʷ’ʊ̥̆ p˺ l(ɣ) a̽ d̥
        {allegro ʔ  ʔ allegro}  {L̝2  L̝3}

The [ʔ] is being used here to show that the glottis is closed throughout the first
syllable, although this meaning is not recoverable from IPA, ExtIPA or VoQS
conventions. The voiceless diacritic has therefore to be interpreted as lack of
voicing due to glottal closure in [ʊ̥̆] but due to glottal opening in [d̥]. The vowel
in the first syllable may have no acoustic reality as a segment, but be perceived
as present because of the labial quality of the ejective fricative and the listener’s
expectation (see Chapter 5 Section 5.6). The rather cumbersome way of indicat-
ing with numerals that the larynx gets lower during the second syllable after
having been raised for the ejective could be improved perhaps by a grave accent
instead of the numbers: [Lˋ] = ‘descending larynx’, a dynamic category; [Lˊ]
would then denote ‘ascending larynx’.
The best advice is to be as free with the symbols as their conventions allow,
and not to be afraid to introduce temporary conventions providing there is good
reason to do so, that it is clear what the temporary modifications are, and that they
do not contradict the theoretical framework of the notation. Base symbols can be
used as diacritics to indicate a lesser degree, or a hint of, what they denote. In a
transcription of lunch pronounced without final glottalisation one might wish to
indicate an epenthetic oral stop between /n/ and /ʃ/ by transcribing [lʌntʃ]. If one
does not accept that the phonological form of lunch is /lʌnʃ/ and insists on /lʌnʧ/,
then the oral stop will not be analysed as epenthetic. Here we can see that pho-
nological analysis may determine the content of a narrow phonetic transcription.
Canepari (2005) contains a wealth of symbols and combinations of symbols
and diacritics modelled on the IPA but which Canepari calls canIPA. Vowel
symbols are mapped onto rectangular vowel grids with accompanying sagit-
tal ‘orogram’ and coronal ‘labiogram’ diagrams (ibid.: 121–5), and consonant
symbols onto a chart with eighteen rows for manner of articulation and sixty-
five columns for place of articulation (see Figure 3.16). These are also liberally
illustrated with orograms (ibid.: 166–95). The orograms show that the symbols
have a clear articulatory denotation, but they are not justified by reference to
any instrumental articulatory data. The canIPA is an attempt to build a more
richly structured general phonetic taxonomy denoted by systematic use of
diacritics and pseudo-diacritics, making it much more analogical than the IPA
although still based on integral roman-derived glyphs. Returning to the city
metaphor, it is like trying to turn London into Haussmann’s Paris without first
making a convincing case for such a massive project. As far as I am aware,
canIPA is not used except by its inventor, a familiar fate in the history of nota-
tion invention.
IPA notation can be supplemented by other notational devices as is advocated
by Vaissière (2007: 59–60), who finds that the diacritics do not denote what she
wishes to denote. Her supplementary notation combines articulatory, acoustic
and auditory-perceptual categories to provide greater detail. For example, she
characterises French /i/ as in (3.22).

(3.22) {palatal (⇑F3F4)3200Hz}

‘Palatal’ is clearly an articulatory designation; ‘F3F4’ denotes a spectral
prominence formed from F3 and F4 in proximity and thus a perceptual sali-
ence through auditory integration; ‘3200Hz’ denotes the centre-frequency of
the prominence; the underline indicates that F3 is a front cavity resonance; and
the arrow indicates that F3 is about as high as F3 can be. By using this supple-
mentary notation, Vaissière is able to show cross-linguistic differences in the
descriptive models denoted by IPA symbols. That is to say, for example, that
the data mapped to IPA [i] in French can be specified as different from the data
mapped to it in English.

3.5 Ordering of Components and Homography in Composite Symbols

The internal relations of symbol components have not received as much atten-
tion by commentators as they deserve, and may not have been much in the minds
of those who designed the symbols. But it might be fruitful to analyse symbol-
internal relations because it can provide one way of typologising phonetic
notation, as we have seen in Section 3.4 above and in Figure 3.11. The relevant
parameter to consider here is ordering, more specifically functional ordering,


FIGURE 3.16: Consonant chart from Canepari (2005: 168) showing twenty-four places
of articulation between palatal and laryngeal. © Canepari (2005), Handbook of Phonetics, LINCOM

such that symbol components may be functionally ordered or not functionally
ordered. If they cannot be shown to be functionally ordered, then we can say
that they are functionally simultaneous. The distinction is based on Mulder
and Hervey (1975/1980). Functional ordering means that a sequence ab has
information value associated not only separately with a and b but also with the
fact that a precedes b, that is to say in the ordering relation between them. There
can only be separate information value in the order if it contrasts with a different
information value in a different order, ba.
In simple integral symbols there is clearly no ordering of components because
the symbol cannot be analysed into components. In [b] the categories ‘voiced’,
‘bilabial’ and ‘plosive’ can thus be said to be denoted simultaneously, which in
fact mirrors the simultaneity of the three-term label ‘voiced bilabial plosive’ –
‘bilabial voiced plosive’ cannot be a different theoretical model. The same is true
of an analysable symbol such as [ɖ] or [n̥] or [l̰], because there is no informa-
tion value in the spatial relationships of the separate components of the symbol.
The descending right tail denoting retroflex cannot be attached at the top, and
although the diacritic denoting voiceless can be placed over the symbol as an
over-ring [n̊ ] it does not change the denotation of the whole complex symbol,
which remains ‘voiceless alveolar nasal’. When it comes to [l̰], however, we
have to be careful not to be misled into thinking that the diacritic’s position
under the base symbol to denote creaky voice does carry information which
contrasts with the information conveyed when it is placed above to denote
nasalisation in [l̃], or through it to denote velarisation/pharyngealisation in [ɫ].
The reason why we should not come to this conclusion is that we are dealing
here with homographs, or we could coin the term homosymbols, that is to say
two symbols which have the same figurae whilst each maintaining its own
potestas. The situation is analogous to homographs in spelling such as English
just (adverb) and just (adjective), which can in fact occur together in They are
just just, which is also the case in [ɫ̰̃]. The fact that we recognise which is which
by their positions under, over or through the symbol is not the same as saying
that the position determines the denotation of the tilde component. This should
be clear if we compare it with the aspirated diacritic in [ʰt] and [tʰ], where the
denotation of the diacritic itself remains the same but the denotation of the two
complex symbols is not the same. In this case, the denotation of the complex
symbol as a whole is shown to rely on the positional relations expressed in (3.10)
in Section 3.4 above.
Functional ordering of symbol components in the IPA framework only came
about with the introduction of the ExtIPA ‘voicing’ symbols, the conventions
for which allow for [ʰ] to be used for pre-aspiration as we have just seen, but
further examples of the positional relations of symbol components are found in
the ExtIPA symbol set. Analogous to the placing of [ʰ], the placing of voiced
diacritic [ ̬] can be to the left or the right of a base symbol to indicate prevoicing
and post-voicing respectively. For example, [ ̬z] indicates that voicing begins
before the friction, and [z ̬] that it continues after the friction. The position of the
diacritic in relation to the base symbol thus carries an information value separate
from that of the diacritic itself. To show differences of voice timing during the
hold phase of a consonant, subscript half-rings are added as a kind of diacritic
for the diacritics. For example, [s̜̬] denotes that there is voicing for the first part
of the fricative, [s̬̹] that there is voicing in the second part; in a complementary
arrangement, [z̜̥] denotes absence of voicing in the first part of the fricative, and
[z̥̹] absence in the second part. Enclosing the diacritic in half-rings, [s̜̬̹] and [z̜̥̹],
means ‘partial’ voicing or devoicing, presumably in the middle of the segment’s
duration. Because the half-rings are dependent on, and determine, the voicing
diacritics, we again have an additional layer of embedded structure in the whole
symbol, shown in (3.23), in which the i component has an ordered relation to the
a component.

(3.23)            Z = (Y ← (a ⇐ i))          Z = (Y ← (i ⇒ a))
        Example   z̥ ̹ = (z ← ( ̥ ⇐ ̹ ))        ̜z̥ = (z ← ( ̜ ⇒ ̥))
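One way to make the point concrete is to treat a complex symbol as a triple of base, diacritic and side, so that the side on which the diacritic is written carries its own information value. The Python sketch below is not an IPA or ExtIPA convention; the glosses simply paraphrase the interpretations given above.

ORDERED_DIACRITICS = {
    ('ʰ', 'left'):  'pre-aspiration of',
    ('ʰ', 'right'): 'post-aspiration of',
    ('̬', 'left'):  'voicing beginning before',
    ('̬', 'right'): 'voicing continuing after',
}

def gloss(base, diacritic, side):
    """Gloss a (base, diacritic, side) triple; the position is part of the meaning."""
    return f"{ORDERED_DIACRITICS[(diacritic, side)]} [{base}]"

print(gloss('t', 'ʰ', 'left'))    # cf. [ʰt]
print(gloss('z', '̬', 'right'))   # cf. [z ̬]
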

Positional relations play a more pervasive role as a design feature in
Jespersen’s analphabetic notation, where different permutations of letters denote
subdivisions in the place of articulation dimension. For example, ‘fg’ means
between alveolar and postalveolar, but closer to alveolar; ‘gf’ means closer to
postalveolar (see Jespersen 1889: 12–14). By contrast, Pike’s (1943) analpha-
betic formulae are unordered because it does not matter how a formulaic string
is presented; the meanings and the hierarchical relations can be recovered from
the notation. For each category in Pike’s system the sequence of letters is fixed
and therefore has no information value; the random jumbling of all the letters of
a formula would result in something uninterpretable.
A multiliteral symbol such as Ellis’s palaeotype [sh] (= IPA [ʃ]) is simultane-
ous because the reverse [hs] is not a single symbol and therefore it cannot con-
trast with it. The same is true of this and other H-digraphs in, for example, Wallis
(1765) and Holder (1669), and is not counter-evidenced by the possibility of the
sequence [hs] in a transcription because [hs] would be two symbols representing
different sounds which happen to be contiguous. Ellis’s trigraphic multiliter-
als [shj] (= IPA [ʃʲ]) and [ljh] (= IPA [ɬʲ]) appear on the face of it to implicate
functional ordering of [h] and [j], but in fact this is homosymbolism because [h]
does not mean the same thing in both instances: it is a voiceless diacritic in [ljh]
but not in [shj].
Briefly surveying the notation systems examined in this chapter, it is evident
that functional ordering is relatively rare compared to functional simultaneity in
symbol-internal relations. It is not found at all in the core IPA, only in ExtIPA,
and not in the IPA Braille and SAMPA transliterations of the IPA. Lodwick’s
analogical symbols are all unordered complexes, as also are Wilkins’s, although
less obviously so because of homosymbolism. Whether by conscious or uncon-
scious design, or by accident, the strong tendency for simultaneity of symbol
structure reflects the temporal stability of a speech sound as a perceived object
in which all its properties tend to be apprehended at once, rather than reflect-
ing the dynamism and temporal distribution revealed by instrumental analyses,
conferring an agreeable suitability on segmental notation for impressionistic
transcription.

3.6 Hierarchical Notation


We have seen a kind of hierarchy in the internal relations of composite symbols
where one type of relation is embedded within another, but hierarchical relations
are also encountered in the denotata of symbols. These relations can themselves
be denoted, typically in the form of a tree with nodes and connecting lines of
the kind familiar from syntactic analyses – what are sometimes called ‘arboreal’
representations. Hierarchical notation is mostly encountered in phonology rather
than phonetics. In arboreal representations, nodes lower down the tree are said to
be dependent on, or dominated by, the nodes they are connected to above. The
approach to phonology known as feature geometry makes extensive use of trees
to show how the theory views the internal structure of a segment as hierarchically
organised. There are various versions of feature-geometrical tree structures, but
a common one is given in (3.24), adapted from Gussenhoven and Jacobs (1998:
180).

(3.24)                     ROOT
            LAR                        SUPRALAR
     Spread Voice Constr         Nas   Cont   Lat
                                       PLACE
                       LABIAL      CORONAL      DORSAL        RADICAL
                    Distr Round    Ant Distr    Back High Low

The tree represents various facts as assumed by the theory, such as that whether
a segment is nasal or not, or lateral or not, depends on a supralaryngeal adjust-
ment, and for a segment to be rounded it has to be labial. Individual consonant
and vowel segments can be represented in the form of trees to show only those
feature components which the theory postulates for them.
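For readers who find it helpful, a feature-geometric segment of the kind shown in (3.24) can also be thought of as a nested data structure, with dependency facts following from where a feature is allowed to attach. The Python sketch below is illustrative only, and the particular segment (a simplified, hypothetical [m]-like segment) is not taken from any published analysis.

# A hypothetical, simplified [m]-like segment encoded after the tree in (3.24).
segment_m = {
    'ROOT': {
        'LAR': {'Voice': True},
        'SUPRALAR': {
            'Nas': True,
            'PLACE': {'LABIAL': {'Round': False}},
        },
    },
}

def has_feature(node, name):
    """Depth-first search for a feature or class node anywhere under `node`."""
    for key, value in node.items():
        if key == name:
            return True
        if isinstance(value, dict) and has_feature(value, name):
            return True
    return False

print(has_feature(segment_m, 'Nas'))   # True
print(has_feature(segment_m, 'Lat'))   # False
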
Dependency phonology expresses internal segment relations with its own
notation and conventions. For example, the difference between a voiceless frica-
tive, a nasal and a liquid is shown in (3.25), from Anderson and Ewen (1987:
153).

(3.25)  {|V:C|}       {|V → C|}      {|V → V:C|}
        Voiceless     Nasal          Liquid
        fricative

These notations show that for a voiceless fricative, according to the theory of
dependency phonology, the V-component (open vocal tract and voicing) and the
C-component (closed vocal tract, no voicing) are ‘mutually dependent’ because
the vocal tract is open and there is no voicing; in nasals, V dominates C, while in
liquids, V dominates a less constricted articulation.
Hierarchical notation has often been applied to represent syllabic structure and
sometimes also the structure of rhythm and intonation groups, in order to denote
internal structural relations of a hierarchical kind theorised to obtain within those
structures. The traditional structure of the syllable has it that the coda and nucleus
form a closer relationship than the onset and nucleus, which is denoted in the tree
in (3.26).

(3.26)            σ
          ONSET       RHYME
                  NUCLEUS   CODA

Hierarchical relations involving accent- and tone-bearing syllables in rhythm
and intonation groups can be usefully represented in trees. In (3.27), syllables
with greater degrees of prominence are shown as dominating weaker syl-
lables. The words sort and out have rhythmic beats, and out is also the tonic
syllable.

(3.27)

sort it out

Notes
1. Voiced and voiceless are termed ‘sonorous’ and ‘mute’ respectively by Wilkins.
2. For convenience I shall use the term ‘diacritic’ to cover true and pseudo-diacritics
unless the distinction needs to be made.
3. In this structural formula and those that follow, Z represents the whole symbol; W, X
and Y represent free integral base symbols which can combine in compounds; and a, b
and c represent bound elements (diacritics and pseudo-diacritics).
4. Wilkins describes voiced [m] as mugitus ‘mooing, lowing’, saying it is ‘counted of
difficult pronunciation in the end of words’ (Wilkins 1668: 358). He seems here to
have followed but misunderstood Wallis, who used the term mugitus for a mooing
sound ‘not yet represented in any alphabet’ (Wallis 1765: 18–19, in Kemp 1972:
163–5) and which seems to be a nasalised labial fricative or approximant such as IPA
[ṽ] or [w̃].
5. The symbol [ʞ] denotes a single articulatory parameter, ‘velodorsal articulation’, and
is therefore not an integral symbol.
6. IPA Braille has braille versions of square and slant brackets (Englebretson 2009: 80),
but for convenience I will enclose braille symbols in IPA brackets.

7. Interestingly, higher frequencies are processed above lower frequencies in the temporal
lobes of the auditory cortex, though this can hardly be the explanation. More relevant
might be the tendency for the larynx to elevate for higher F0s and depress for lower F0s.
8. For convenience I shall use ‘SAMPA’ as a cover term for SAMPA and X-SAMPA.
9. Compatibility with the IPA is an important consideration when introducing or using
non-IPA symbols (Ball, Rahilly and Tench 1996: 73).

4
Types of Transcription

4.0 Introduction
In this chapter I consider how phonetic notation can be used to represent differ-
ent levels and aspects of the analysis of pronunciation and how these uses have
been typologised. Where relevant, I shall draw attention to examples in writing
systems which can be seen as pre-theoretical examples of an awareness of these
levels of analysis. A key distinction in transcription is between phonetic on
the one hand and phonemic, or phonological, on the other. It is now universal
practice to enclose phonetic transcriptions, denoting general phonetic models, in
square [ ] brackets and phonemic or phonological transcriptions in slant / / brack-
ets. The square bracket convention was employed by Ellis (1867: 8) to enclose
‘approximative’ (i.e. broad; see Section 4.3 below) transcriptions, although it
is not clear if he was the first to adopt it; other transcriptions were enclosed in
parenthesis ( ) brackets, a practice also followed by Sweet (1877). Square brack-
ets enclose alphabetic phonetic symbols throughout Jespersen (1889). According
to Makkai (1972: 4), it was not for another several decades that the slant bracket
convention for enclosing phoneme symbols was introduced, in Trager and Bloch
(1941).
Phonetic transcription derives from a conjunction of data and the symbols of
a notation system for denoting theoretical models as the products of category
intersections (see Chapter 1 Section 1.3.1). Data are ultimately traceable to real
observed utterances even when a transcription expresses a generalisation about a
whole population of speakers. Data can be in the form of a single consonant or
vowel, or tone, or longer stretches of sounds in pronunciations of words, phrases
and sentences. In the latter case, a segmental transcription is an ordered set of
symbols, meaning that a different sequence of symbols would be a different
transcription. In parametric transcriptions, however, each parameter is an ordered
set of parameter values, the transcription as a whole being an unordered set of
parameters because rearranging their relative vertical positions does not give us
a different transcription.
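The ordered/unordered distinction can be illustrated with a small sketch. The Python fragment below is purely illustrative; the parameter names and values are invented for the purpose, and simply show that reordering the segments yields a different transcription while reordering the parameter rows does not.

segmental = ['k', 'a', 't']                    # ordered: ['t', 'a', 'k'] differs
parametric = {                                 # unordered set of ordered value tracks
    'velum':   ['raised', 'raised', 'raised'],
    'voicing': ['off', 'on', 'off'],
    'lips':    ['spread', 'open', 'spread'],
}

assert segmental != list(reversed(segmental))                    # order matters
assert parametric == dict(reversed(list(parametric.items())))    # order does not
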


4.1 Specific and Generic Transcriptions


A specific transcription comprises symbols which purport to express an analysis
of a specific instance of pronunciation at a particular time and place, as schema-
tised in (4.1); the arrow glosses as ‘maps onto’. That is to say, a specific tran-
scription is tied to a particular utterance.

(4.1) [a] ← pᵢ, where pᵢ is a specific pronunciation of a low front unrounded
vowel, e.g. in a specific pronunciation of English hat

It is in specific utterances that phoneticians make contact with the reality of their
subject matter. Specific transcriptions have, therefore, a special significance as
analyses of bits of reality and record what we think we know about that reality. They
will tend to be quite narrow if the transcriber is trying to capture features of speech
in a previously undescribed language, or which may be idiosyncratic to the speaker
or to a particular group of speakers of which the speaker in question is representa-
tive, or which may have a particular significance in their context of occurrence.
Transcriptions made by fieldworkers working with native-speaker consult-
ants, by speech pathologists working with clients exhibiting atypical speech, by
forensic phoneticians analysing the speech of individual criminals and suspects,
and by conversation analysts looking at particular instances of interactive talk
are specific transcriptions. The first specific transcriptions may well have been
Joshua Steele’s (1775) attempts to capture the rhythm and intonation of recita-
tions of Shakespeare and other literary works using his adaptations of musical
notation. The first line of his transcription of David Garrick delivering Hamlet’s
famous soliloquy is shown in Figure 4.1 (see also Section 4.11.3 and Figure 4.6).
In his Life of Samuel Johnson (volume 2, the year 1775) Boswell laments that
Johnson’s ‘deliberate and strong utterance’ was not captured by Steele for
posterity.

FIGURE 4.1: Steele’s (1775: 47) adaptation of musical notation. First line
of his transcription of David Garrick reciting Hamlet’s soliloquy. See
Section 4.11.3 for key to notation.

An example of a generic transcription is when we say that in many varieties of
English the vowel in hat is [a]. Although it will be based ultimately on specific
observed instances, otherwise it would merely be imagined, the transcription is
not intended to represent any single specific observed instance, but is generalis-
ing about the most typical pronunciation known to be found in certain varieties
of English. A generic transcription can be defined as in (4.2).

(4.2) [a] ← p, where p = {pᵢ, pⱼ, . . . , pₙ}

A generic transcription can thus be thought of as representing an indefinitely


large class of past, present and future productions by an unspecified number of
unidentified speakers forming a particular kind of population. The method that
leads to generic transcriptions is therefore inductive: a general statement is made
on the basis of observed particulars.
Generic transcriptions can function as prescriptive models in dictionaries and
language teaching texts showing readers the ‘correct’ or recommended pronunci-
ation of words (see Section 4.13.2 below). Because of their generality, they tend
to be broad rather than narrow, representing only what is thought to be common
across all or most members of the population in question. Whether a transcription
is specific or generic will normally be clear from the context, though there are no
notational devices for distinguishing them.

4.2 Orientation of Transcriptions


The IPA recognises that transcriptions can take one of two perspectives on speech:
a speaker-oriented or a listener-oriented perspective (IPA 1999: 36–7). As the
terms indicate, their respective aims are to represent what speakers do and what
listeners hear. The IPA (ibid.: 37) notes ‘an assumption implicit in phonetic tran-
scription that the form to be transcribed is common to speaker and hearer’. I would
like to advocate that we take this assumption apart and explicitly acknowledge that
certain kinds of transcriptions are typically made from one perspective or the other
and are therefore not common to both perspectives. Systematic transcriptions
can be viewed unproblematically as neutral with respect to speakers and hearers
because within a language variety they share the same system. Impressionistic
transcription (see Section 4.4 below and Chapter 5), however, typically expresses
an analysis of what speech sounds like to a phonetically trained listener, and I
claim in Chapter 5 that this is why it is a valuable practice in phonetics; but it
can also be motivated by trying to ascertain what a speaker is doing, for example
in a clinical context when instrumental evidence will often not be available. An
impressionistic transcription is speaker-oriented to the extent that it is presented as
such by the transcriber. Transcriptions which summarise or annotate instrumental
records such as palatograms and articulograms are unambiguously speaker-
oriented, but it may be useful to introduce a third orientation for transcriptions
which summarise or annotate acoustic records (see Chapter 6). That is to say,
the category of signal-oriented transcriptions should perhaps be recognised. A
substantial body of research (e.g. Hamlet and Stone 1976; Ladefoged, Harshman,
Goldstein and Rice 1978; Maurer, Gröne, Landis, Hoch and Schönle 1993) has
confirmed Ellis’s (1889: 1277) observation ‘that two or three different positions
of the mouth may produce the same resonance’. Speakers are adept at compensa-
tory articulations through exploiting ‘motor equivalences’ (Perkell 1997: 366).
Conversely, quite different signal properties can come from very similar vocal
tract shapes when F0 is varied (Carrell, Smith and Pisoni 1981; Maurer et al.
1993). Transcriptional annotations of spectrograms denoting vowel qualities from
formant patterns therefore cannot unproblematically claim to be speaker-oriented.


4.3 Broad and Narrow Transcriptions


A distinction which has proved of lasting value in using phonetic notation
systems is that between narrow and broad transcriptions (Laver 1994: 550–61;
IPA 1999: 28–30). But it is not always clear how this distinction relates to other
distinctions in transcription, particularly the systematic–impressionistic and pho-
nemic–allophonic distinctions, but also the quantitative–qualitative distinction
drawn by a number of phoneticians (Abercrombie 1964a: 35–6). Abercrombie
(ibid.: 35), for example, says of the terms ‘they had come to be used as equiva-
lents of phonemic and allophonic respectively’ by 1911, but something of this
equivalence is evident in the earliest writings on the matter more or less at the
start of the era of modern phonetics. In 1867 Ellis introduced the terms ‘approxi-
mative’ and ‘complete’, having the same import as broad and narrow, in relation
to uses of his palaeotype notation. Local (1983: 6–7) points out that Ellis uses
‘approximative’ almost in the sense of phonemic when he says it is for denoting
‘nearly related sounds’ where the differences between them are ‘unimportant
for the discussion in hand’ (Ellis 1867: 8). ‘Complete’ transcription was to be
employed if one wished to pick out finer distinctions to show how sounds which
are closely similar differ from each other. For example, Ellis (ibid.: 34) uses [r]
as an approximative symbol to cover the distinctions represented by the more
detailed complete symbols [r r ʀ ɹ ɹ .r] ([.r] = a vigorous trill; ibid.: 31). He also
identifies several symbols as not for use in approximative transcriptions. It was
Sweet, who described Ellis as ‘the pioneer of scientific phonetics in England’
(Sweet 1877: vii), who, for applications of his romic notation, first coined the
terms broad and narrow (ibid.: 105). In talking of the need ‘to have an alphabet
which indicates only those broader distinctions of sound which actually corre-
spond to distinctions of meaning’ (ibid.: 103), it is clear that for Sweet ‘broad’
meant what we would now call phonemic and ‘narrow’ meant general phonetic,
the former to be used for transcribing particular languages for practical purposes
and the latter to serve a more scientific function of expressing detailed general
phonetic analyses. Following on from Sweet, Jones (1918/1972: 332–3) explic-
itly defines broad transcription as phonemic, and the term is still used with this
sense (IPA 1999: 28). Jones then identifies two types of narrow transcription.
One is allophonic, while the other is ‘when use is made of “exotic” or inconven-
ient letters’ even though more familiar ones could be used. Abercrombie (1964a:
35) equates this distinction with his own distinction between ‘simple’ and
‘comparative’ transcriptions, which he applies to both phonemic and allophonic
transcription (ibid.: 22).
To cut through this confusion of distinctions, I would like to draw the broad–
narrow distinction in a more straightforward way than it has perhaps been drawn
before, which I think is also more in line with an intuitive interpretation of the
terms. I propose to base it entirely on how much detail is represented. The terms
‘broad’ and ‘narrow’ are thus relative and should be understood as being on a
continuum that has a logical endpoint of maximum broadness but no logical
endpoint of maximum narrowness. Any transcription could be made narrower
by adding another symbol or diacritic to it, a limit being reached only because of
limits on the skills of the transcriber or the delicacy of the notation. The absolute

w.facebook.com/groups/QOU
SomoudBarghouthy Types of Transcription
limit of broadness would be a highly uninformative transcription in which only
145

the occurrence of indeterminate sounds is represented, by, for example, using


the ExtIPA symbol [(¯_)]. In practice, the broadest transcriptions will be phone-
mic (see Section 4.6); inclusion of archiphoneme symbols makes a phonemic
transcription broader still (see Section 4.8). Establishing a ‘broad–narrow’ con-
tinuum does not mean the terms always have to be used explicitly as comparative
terms or with qualifying adjectives such as ‘very’. Phoneticians will understand
reasonably well what is implied if a transcription they have not seen is described
as broad or narrow.
In practice, the broad–narrow distinction does not simply rest on how many
symbols and diacritics are in a transcription. For example, IPA [t̪] looks narrower
than [p] but in fact both are denoting three categories: voiceless dental plosive,
and voiceless bilabial plosive. Similarly, [n̥] looks as if it contains more detail
than [n] but again both are denoting three categories: voiceless alveolar nasal,
and voiced alveolar nasal. Whether a particular category is denoted by a sepa-
rate diacritic to form a complex symbol or is part of the denotation of a simple
integral symbol is not always consistent in IPA notation because of its roman
alphabet basis and historical bias towards the sound systems of English, French
and German. The narrowness of a transcription has to be assessed by the delicacy
of the analysis it expresses, not by counting diacritics, especially, but not only, in
the context of the more ‘exotic’ languages.

4.4 Systematic and Impressionistic Transcriptions


This distinction concerns the method by which a transcription is made. In system-
atic transcriptions, as the term suggests, judgements about the content of the tran-
scription are made by reference to knowledge of a phonological system, whereas
in impressionistic transcriptions they are made by reference to the transcriber’s
sense-impressions, which are then ascribed to phonetic categories. While impres-
sionistic transcriptions are necessarily specific because sense-impressions can
only come from specific utterances, systematic transcriptions can be either spe-
cific or generic. If a transcription denotes the phonemic or allophonic content of
a particular utterance then it is specific and represents a phonological analysis of
that utterance, but it has general validity because any speaker of the same variety
could produce an utterance which would be analysed as comprising the same
systematic elements despite phonetic differences.
The terms ‘systematic’ and ‘impressionistic’ were first introduced to distin-
guish two types of phonetic transcription by Abercrombie (1953: 32), although
he disclaimed originating them. They have continued to prove extremely
important and useful. The factor which divides systematic from impressionistic
transcription is predictability, a factor first identified as crucial to information
value by the mathematician Claude Shannon at the Bell Telephone Laboratories
(Shannon 1948). In a phonemic transcription, which is the most systematic
kind, it is already known how the phonemes are typically realised in different
phonological environments in the relevant language or language variety, and
this knowledge can be encoded in conventions accompanying the transcription
(Abercrombie 1964a: 23), or a source can be cited where such conventions are to
be found. The prerequisite for making a systematic transcription is that the tran-
scriber has access to the knowledge encoded in the conventions for the language
variety being transcribed. By definition, systematic transcriptions are always and
only transcriptions of a particular language or language variety for which such
knowledge is available (IPA 1999: 28). Ball (1991: 61) sums up the position in
saying that ‘[i]t is not possible to use a phonemic transcription when you do not
know the phonology’. As we will see in Section 4.6 below, the relationship with
phonology means that systematic transcriptions carry with them the assumptions
and definitions of the particular phonological theory within the framework of
which the analysis they express has been made.
The confusion of the broad–narrow distinction with the systematic–
impressionistic distinction is understandable in that systematic transcriptions are
all relatively broad because the phonetic detail is in the conventions, not in the
transcription. However, there are degrees of broadness in systematic transcrip-
tions. The broadest are those where the symbols have the least specific phonetic
information. These are phonemic transcriptions, and especially archiphonemic
transcriptions. The conventions for interpreting phonemic transcriptions may
employ realisation statements containing allophonic transcriptions, which can
vary to some extent in how broad or narrow they are. For example, the state-
ment giving the contexts in which English /l/ is realised with a ‘dark’ allophone
might use the IPA symbol [ɫ], which covers both velarised and pharyngealised,
or be narrower in specifying it as velarised with the symbol [lˠ]. To take another
example, an allophonic transcription of English will mark aspiration on voiceless
plosives in certain contexts, but generally not the degree of aspiration, although
this is systematic cross-linguistically. Cho and Ladefoged (1999) have shown
that languages exploit different ranges of voice onset time (VOT) values in their
aspirated plosives depending on place of articulation, which could be indicated
in narrower systematic transcriptions with superscript length marks, for example
[pʰ, tʰˑ, kʰː], but supplied by conventions in broader ones. See Section 4.6 below
for how the phenomenon of free variants also affects the broadness of systematic
transcription.
Impressionistic transcriptions can range across the broad–narrow continuum
depending on how much detail the transcriber thinks it useful to represent, or on
the phonetic knowledge, confidence or skill of the transcriber. A transcription of
a previously unstudied language or language variety might, at least as a first pass
transcription, be quite broad, denoting only what Ashby (1990: 25) has called
‘basic level categories’. An example would be using the symbol [b] to denote
‘something [b]-like’ which on closer analytic listening one symbolises with
[bʷ], or [b̥], or [b̥ʷ]. None of the transcriptions is influenced by any knowledge
of the system, but some are narrower than others. The defining characteristic of
impressionistic transcription is that there are no conventions other than those
supplied by general phonetic theory for interpreting the transcription symbols.
A segmental impressionistic transcription, such as is commonly made with IPA
symbols, requires general phonetic conventions so that it is understood that the
structure of speech in the articulatory and acoustic domains is not discretely seg-
mental. Of course, such conventions do not need to accompany transcriptional
texts because they are part of the general knowledge-base of phoneticians. A
segmental impressionistic phonetic transcription comes with implicit general
phonetic conventions concerning not only what the symbols themselves represent
but also what is implied by particular sequences of symbols in terms of coar-
ticulatory phenomena. That is to say, in making an impressionistic transcription,
nothing should be assumed to be predictable unless universally predictable in all
languages, although in practice it may not be possible for transcribers to break
completely free of influences from their own linguistic knowledge as speakers
in order to avoid being biased by predictions, even unconsciously (see Chapter
5 Section 5.11). It is also becoming increasingly evident from phonetic research
that what is universally predictable about speech may be surprisingly limited.
Impressionistic transcription is the subject of Chapter 5, where it is discussed
at length. Here, I shall just note the uses to which impressionistic transcriptions
are put. They have in common that they are specific transcriptions resulting
from auditory-perceptual, or auditory-visual-perceptual, analysis of particular
utterances of particular speakers on particular occasions. The kinds of contexts
in which auditory analysis and impressionistic transcription are carried out
include speech pathology and therapy (Heselwood and Howard 2008), studies
of speech development (Vihman 1996: 100–1), forensic cases (Nolan 1997:
759–61), conversation analysis (Jefferson 2004; Local and Walker 2012),
linguistic and dialectological fieldwork (Orton and Dieth 1962; Abercrombie
1964a: 36), and phonetic ear-training (Jones 1918/1972: 350); see relevant sec-
tions in Chapter 7.
One type of transcription which does not always fit the impressionistic–
systematic dichotomy is a corpus transcription (see Chapter 6 Section 6.2.3).
Corpus transcriptions are non-impressionistic in so far as they denote the products
of statistical analyses of corpora, and they are non-systematic if no conventions
other than general phonetic conventions have to be consulted to interpret them.

4.5 General Phonetic Transcription


Transcriptions can be made which are neither impressionistic, because they are
not representations of actual utterances, nor systematic, because they are not
related to a phonological system. Such transcriptions can be called general pho-
netic transcriptions. One such example is a phonetic nonsense word. Nonsense
words are constructed from phonetic symbols often to function as performance
scores in practical phonetics exercises (see Section 4.13). General phonetic tran-
scriptions of this kind denote syntagmatic structures comprising general phonetic
models but represent not an analysis of any data, only of potential data. In effect,
the analysis precedes the data, which come into existence once the score has been
performed.
In phonetic research investigating anthropophonic possibilities, it might be
necessary to represent hypothesised sound-types which have not actually been
observed other than in the attempts of the researcher to produce them. For
example, as far as I am aware a simultaneous click and voiced implosive has not
been attested in any language, but it is possible to produce a click while using
its initiatory velar closure as an articulatory closure for [ɠ], resulting in sounds
which could be transcribed [ɠ͡!], [ɠ͡ǁ] etc. Although absolute simultaneity of their
bursts is probably impossible for aerodynamic reasons, and thus predicted to be
impossible by phonetic theory – the click articulation has to be released while the
velar closure is still in place – they are temporally close enough to be considered
a single complex sound. In Ladefoged and Traill’s (1994) terminology, the click
has a voiced implosive accompaniment. Ejective nasal stops are not reported
for any language (Ladefoged 1997: 618), but it is quite easy to maintain an oral
closure and open the velopharyngeal port to release glottalic egressive air. The
result could be transcribed as [ŋ̥’], [n̥’] etc.
There may be occasions when one wishes to include information which is
predictable by general phonetic theory, for example in explaining phonetic theory
itself. In such cases, the transcription becomes narrower. When a lateral conso-
nant immediately follows a homorganic plosive without any intervening vocalic
segment, phonetic theory explains that the intra-oral air pressure is released
laterally. This explanation functions as a convention in the interpretation of
transcriptions such as [tl dl cʎ ɟʎ], but it can be denoted explicitly with the IPA
lateral release diacritic, [tˡl] etc., and similarly with nasal release. For reasons
such as these, general phonetic transcriptions can range along the broad–narrow
continuum.
When phonetic interpretations of instrumental records are expressed in tran-
scriptions without reference to any system-specific conventions, they are general
phonetic transcriptions (see Chapter 6 Section 6.1).
General phonetic transcriptions may also be employed to exemplify the use of
phonetic notation systems without reference to any utterances or languages. The
purposes of general phonetic transcriptions are therefore very much internal to
the preoccupations of general phonetics as a theoretical and practical discipline.

4.6 Phonemic Transcription


Exploitation of sound–spelling correspondences in the history of writing has
provided resources for pseudo-transcription in the absence of phonetic or
phonological theory. These were the letters of abjads, abugidas and alphabets
(see Chapter 1 Section 1.1.2). The level of phonological structure which these
most closely relate to is the phonemic level, attesting to either a pre-theoretical
appreciation of the functional equivalence of allophonic variants, or a lack of
appreciation of their phonetic differences. Either way, the letters were available
as pseudo-phoneme symbols. The most common resource for phonemic tran-
scription nowadays is the IPA.
Because phonemic transcriptions express how the units of contrast in a phono-
logical system are employed to provide the phonological forms of lexical items,
they can be classified as ‘system transcriptions’ as a subtype of systematic tran-
scriptions. That is to say, they denote only units of a system and give no informa-
tion about their realisation. Phonemic transcription therefore cannot be done until
a phonological analysis of the language or language variety has established at
least a provisional phonemic inventory and assigned a symbol to each phoneme.
Phoneme symbols give us more information when seen in the context of the
whole inventory, or at least some structurally defined part of the inventory. To
know that language L has a /b/ phoneme, for example, tells us more if we know
what other labial oral stop phonemes it has. Arabic has only /b/, English has /p/
and /b/, Punjabi has /pʰ p b/ while Hindi has /pʰ p bʰ b/.
Precisely what a phoneme symbol denotes depends crucially on the theo-
retical framework of phonology within which the transcriber is working. The
conventions for interpreting the symbols will be couched implicitly or explicitly
within the terms of a particular theoretical view of what phonemes are. Presented
outside of a phonological theory, a phonemic transcription cannot be properly
interpreted, becoming only approximately indicative of a certain pronuncia-
tion. There is no room in this section to summarise all the definitions of what a
phoneme is that have appeared in the literature over the last century or more, but
two radically different schools of thought can be discerned dividing views of the
phoneme into, on the one hand, an object in the internal grammar of speakers
and, on the other, an analytic construct existing only in a phonological theory.
The first view is common to all versions of generative phonology which have
not explicitly rejected the concept of the phoneme, and the second characterises
most functionalist theories of phonology. They have in common the fundamental
insight of the phoneme principle that sounds are grouped together in language-
specific sets such that lexical items in that language cannot be differentiated in
pronunciation by sounds belonging to the same set. The sounds [p] and [pʰ] in
English cannot be the only difference in the pronunciation of two different words
because they belong to the same set: [pʊɫ] and [pʰʊɫ] are taken by English speak-
ers to be slightly different pronunciations of the same word pull, the first being
much less typical but nonetheless possible. In Punjabi they belong to different
sets and do distinguish one word from another: [pʊl] means bridge and [pʰʊl]
means flower. These examples show that in English [p] and [pʰ] are allophones
of the same phoneme /p/, whereas in Punjabi they are allophones of two different
phonemes, /p/ and /pʰ/.
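
The minimal-pair reasoning behind this grouping can be sketched in code; the two toy ‘lexicons’ below simply restate the pull, bridge and flower examples and stand in for real dictionaries.

```python
def contrast(sound_a, sound_b, lexicon):
    """Return True if swapping sound_a for sound_b in some word-form of the
    lexicon yields the form of a different lexical item (a minimal pair)."""
    forms = {form: meaning for form, meaning in lexicon}
    for form, meaning in lexicon:
        if sound_a in form:
            other = form.replace(sound_a, sound_b)
            if other in forms and forms[other] != meaning:
                return True
    return False

english = [("pʊɫ", "pull"), ("pʰʊɫ", "pull")]        # two pronunciations, one word
punjabi = [("pʊl", "bridge"), ("pʰʊl", "flower")]    # two different words

print(contrast("pʰ", "p", english))   # False: [p] and [pʰ] do not contrast
print(contrast("pʰ", "p", punjabi))   # True: they belong to different phonemes
```
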
According to generative phonology, English speakers have the phoneme /p/ as
an object in their mental grammar along with rules to determine which allophone
of /p/ to use in which phonological contexts: [pʰ] in syllable-initial contexts
such as pool, [p] when preceded by /s/ as in spool, and so on. Under this view, a
phoneme is like a symbol in a Turing machine, an object which is manipulated by
instructions. In the version of generative phonology known as Optimality Theory
(Prince and Smolensky 1993) there are, to avoid rule duplication (Kenstowicz and
Kisseberth 1977: 136), constraints instead of rules, but their role is essentially the
same: to ensure that the objects obey the instructions so that the right forms are
in the right places in the output of the grammar. The symbol /p/ in a phonemic
transcription of English by a generative phonologist thus denotes an object exist-
ing in the speaker’s grammar, and because the speaker’s grammar exists in the
speaker’s mind, that is where the /p/ must be. The content of a phoneme is taken
to be, in most versions of generative phonology, a set of distinctive features or
distinctive feature-values, or distinctive gestures, which distinguish that phoneme
from all other phonemes. These features or gestures are the minimal phonologi-
cal objects in the grammar and are usually considered to be universal. That is to
say, all languages select a subset from a universal set. Therefore /p/ denotes not
just the phoneme but also its content as selected from a universal set of minimal
phonological entities.
The purpose of phonemic transcriptions in generative phonology is to specify
the phonological form of underlying representations of lexical items (see Section
4.9 below). Versions of generative phonology differ in how abstract they are
prepared to allow underlying representations to be, but they work in the same
way to accommodate alternations. For example, some generative phonologists
would say that atom has the underlying representation /atɒm/ in order to account
for atomic; rules or constraints derive the observed pronunciations [ˈatəm] and
[əˈtɒm-] by reference to suffixation and stress placement. Others would reject
this as the underlying representation because it is never the surface form of any
alternant. Instead, they would have two separate underlying representations,
/atəm/ and /ətɒm-/. To interpret a generative phonemic transcription fully, one
has to know not only the feature-system or gestural system supplying the content
of phonemes, but also what kinds of processes are said to apply to it to derive
surface forms.
Functionalist phonologists downplay the existence of phonemes in speakers’
grammars on the grounds that phonemes cannot be directly observed. These
phonologists have a more empirical theoretical orientation, in contrast to the
rationalist orientation of generative approaches (see Chapter 5 Section 5.10). In
functionalism, a phoneme is a model defined by a theory to account for patterns
of distribution and distinctive function of observable sounds in speakers’ pro-
nunciations. For example, in English the sounds [p pʰ ʔ͡p . . . p̪] (the dots indicate
there could be others) do not distinguish one lexical item from another, and tend
to be found in complementary phonological contexts. The phoneme /p/ is exten-
sionally defined as this set of sound-types in relation to a particular distinctive
function. This relation can be expressed in notation as in (4.3).

(4.3) {[p pʰ ʔ͡p . . . p̪]} R di (R = ‘in relation with’; di = a particular distinctive function; see Mulder 1989: 156–9; Dickins 1998: 130)

Its intensional definition, that is to say its value in the phonological system, at
least in the more structurally oriented functionalist approaches, is a negative
one based on the fact that it is none of the other phonemes, so that /p/ = ~{/t
k b d ɡ . . . /etc.} (~ = ‘not’). Its intensional definition can also be expressed
as a set of distinctive features, but it is important to appreciate that these
distinctive features are not the same as the distinctive features of generative
phonology. They are derived from how the phoneme relates to other phonemes
in a network of structural relations in terms of correlations and proportionali-
ties and are therefore language-specific, not universal. While English /b/ has
the feature ‘voiced’ in addition to ‘labial’ and ‘plosive’, Arabic /b/ does not
because there is no /p/ to contrast it with. In a fully universalist phonology,
/b/ has the same content in all languages. It is not assumed by functionalists
that speakers have any knowledge of these structural relations, or that speak-
ers group sounds together in phoneme sets according to the same criteria that
phonologists use. A phoneme symbol in a transcription made from a function-
alist theoretical position of this kind therefore denotes a significantly different
kind of object from that denoted by a generative phonologist’s transcription.
Rather than being like a symbol in a Turing machine, it is more like an object
in Shannon’s (1948) theory of communication, an object with a certain infor-
mation value for the phonologist.
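
The contrast between the two conceptions can be sketched informally; the feature names and allophone set below are illustrative stand-ins, not claims about either theory’s actual inventory.

```python
# A generative-style /p/: an object in the grammar, a bundle of feature values.
generative_p = {"voice": False, "labial": True, "plosive": True}

# A functionalist-style /p/: an extensional set of sound-types in relation to a
# distinctive function, cf. (4.3); "d1" is a stand-in label for that function.
functionalist_p = {"allophones": {"p", "pʰ", "ʔ͡p", "p̪"}, "function": "d1"}

# The questions each object answers differ: set membership on the one hand,
print("pʰ" in functionalist_p["allophones"])   # True
# the value of a universal feature on the other.
print(generative_p["voice"])                   # False
```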


We therefore cannot equate a phoneme symbol in generative phonology with
the same symbol in functionalist phonology. They denote different things, and
are the same symbol only in form, not in content, and therefore not really the
same symbol. Strictly speaking, their relationship is one of homography only.
Whatever phonological theory a phonemic transcription comes from, its
focus is on the role sounds have in the lexical distinctions of a language. From a
phonetic point of view, a phonemic transcription is a minimally low-resolution
account of pronunciation. For example, the transcription /bɒtl/ for English bottle
covers a range of phonetic possibilities including [ˈbɒtˡɫ], [ˈbɒʔəɫ], [ˈbɒʔɤ] etc.
all of which are implied in the phonemic transcription, or at least not excluded
from its interpretation.
The extent to which predictability is regarded as an important criterion in
phonological analysis can have a bearing on the form of a phonemic transcrip-
tion. Heselwood (2007) has argued that many occurrences of schwa in English
are predictable and they should therefore not appear in phonemic transcriptions.
For example, words such as today and abbot have to be pronounced with a schwa
only because the phonetic clusters [td-] and [-bt] are disallowed, so the phonemic
transcriptions should be /tdeɪ/ and /abt/. Phonologists who do not accept the argu-
ment will include schwa, transcribing them as /tədeɪ/ and /abət/.
There are a number of other theoretical issues which will divide transcribers
and be reflected in their phonemic transcriptions. There is not room to cover
all of them here, but I shall discuss a few different kinds of examples as illus-
trations. The first concerns how assimilations are phonemicised. In English,
road-mending can optionally be pronounced the same as robe-mending through
the very common process of assimilation of final /d/ to a following initial /m/
(Cruttenden 2001: 285). When this kind of assimilation happens, road is then
often represented in phonemic transcription as /rəʊb/. An alternative view is that
it should still be transcribed /rəʊd/ because the set of allophones which can occur
is different from the set which can occur at the end of robe in robe-mending. In
the latter, the final stop has to be [b] and can be released either orally or nasally,
whereas the final stop at the end of road in road-mending can be either [b] or [d],
and if it is [b] it can be released only nasally. The difference in the two sets can
be represented using a Venn diagram as in Figure 4.2.

[Venn diagram: the /-d/ of road-mending allows {[d], [dⁿ], [bⁿ]}; the /-b/ of robe-mending allows {[bⁿ], [b]}; the two sets overlap at [bⁿ]]
FIGURE 4.2: Overlapping but distinct sets of allophones of /d/ and /b/ at an assimilation site

The fact that the contents of the two sets are not the same can be used to justify
an analysis in which [ɹəʊbnmɛndɪŋ] can be phonemicised as /rəʊdmɛndɪŋ/ if it is
known that [ɹəʊdmɛndɪŋ] is lexically equivalent but that [ɹəʊbmɛndɪŋ] with oral
release of [b] is not. The importance of the identity or non-identity of sets when
assessing assimilatory phenomena is that it enables us to focus, in Saussure’s
terms, on langue without being distracted by parole. In individual utterances,
neutralisation takes place when there is complete assimilation – when road-
mending is pronounced so that it is indistinguishable from robe-mending – but
the task in a phonemic transcription, as a system transcription, is to represent the
facts of the system (langue), not single utterances, and if the system allows for
variants at a particular point in structure, then our phonemic analysis must take
account of the whole set of variants, not just the one that happens to occur in a
particular utterance. Taking this approach, the biuniqueness objection (Chomsky
1964: 94; Lass 1984: 27–30) is easily avoided.
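
The same reasoning can be put in set-theoretic terms; the sketch below uses the allophone sets of Figure 4.2, with ⁿ marking nasal release.

```python
# Allophone sets at the assimilation site (Figure 4.2).
road_final = {"d", "dⁿ", "bⁿ"}   # possible realisations of /-d/ in road-mending
robe_final = {"bⁿ", "b"}         # possible realisations of /-b/ in robe-mending

print(road_final & robe_final)    # {'bⁿ'} - complete assimilation overlaps the sets
print(road_final == robe_final)   # False - the sets, and so the phonemes, differ
```
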
The above example involves disagreement as to which of two phonemes
should be represented as occurring in a particular type of context, but there is
no disagreement that the two phonemes are part of the inventory. In the next
example, the case of English [ŋ], however, the disagreement is over whether
it is an allophone of /n/ or of an additional /ŋ/ phoneme. Minimal pairs such
as sin–sing [sɪn–sɪŋ], fan–fang [fan–faŋ], run–rung [ɹʌn–ɹʌŋ] and so on can
be adduced to support the need to assign [ŋ] to a different phoneme from [n].
An alternative analysis is to say that words such as sing, fang and rung actually
have a final /ɡ/ in their postulated underlying phonological form, in which case
the occurrence of [ŋ] can be attributed to its influence. The statement in (4.4)
describes the distribution of [ŋ] as an allophone of /n/ when preceding a velar
consonant in the same stem.

(4.4) /n/ → [ŋ] / __velar C #

For this statement to work there has to be a process which removes the underly-
ing velar consonant in cases such as sing pronounced [sɪŋ], but not from words
such as stronger, which have to be treated as lexical exceptions (note that the
postulated /ɡ/ is in the same stem strong as the nasal). Those who propose the
analysis of [ŋ] as an allophone of /n/ usually justify it by pointing out that [ŋ]
in English, unlike the other nasals, is not found word-form-initially. These pro-
ponents argue that it is better to avoid having phonemes in the inventory with
very restricted distributions. Furthermore, there are varieties of English in which
sing, fang, rung and so on are pronounced with a final [ɡ] (Wells 1982: 46), for
which (4.4) is an unproblematic analysis; these varieties retain the pronunciation
of these words which were general to English before the seventeenth century and
which is still reflected in their spelling.
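
Read as a rewrite operation, (4.4) might be sketched as follows; the segment-list representation and the separate treatment of /ɡ/-deletion are simplifying assumptions for illustration only.

```python
VELARS = {"k", "ɡ"}

def realise_nasal(stem):
    """Apply /n/ -> [ŋ] before a velar consonant within the same stem."""
    out = []
    for i, seg in enumerate(stem):
        if seg == "n" and i + 1 < len(stem) and stem[i + 1] in VELARS:
            out.append("ŋ")
        else:
            out.append(seg)
    return out

print(realise_nasal(["s", "ɪ", "n", "ɡ"]))   # ['s', 'ɪ', 'ŋ', 'ɡ'], giving [sɪŋ] once /ɡ/ is deleted
print(realise_nasal(["f", "a", "n"]))        # ['f', 'a', 'n'] - no velar follows
```
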
Phonological quantity provides another example of theory determining phone-
mic transcription. There has been a longstanding debate about whether English
vowel distinctions in words such as sheep–ship, pool–pull, caught–cot are based
primarily on a long–short quantity distinction, as in Jones (1918/1972: ch. XIV),
or on the quality distinctions [i]–[ɪ], [u]–[ʊ], [ɔ]–[ɒ] (Gimson 1980: 96–100;
Cruttenden 2001: 94–6). Phonetically, both kinds of distinctions are present,
but we need only denote one of them if the other is predictable and can be sup-
plied by convention. Depending on one’s analysis, phonemic transcriptions of
the above words can therefore be as in (4.5) or as in (4.6), both of which see the
distinction as operating on the paradigmatic axis, or axis of substitution of one
element for another.

(4.5) /ʃiːp – ʃip/, /puːl – pul/, /kɔːt – kɔt/

(4.6) /ʃip – ʃɪp/, /pul – pʊl/, /kɔt – kɒt/

In fact they are often transcribed to show both kinds of difference, as in (4.7),
which is not really phonemic but broad allophonic, because either the quantity
difference or the quality difference is redundantly represented (Fox 2000: 32).

(4.7) /ʃiːp – ʃɪp/, /puːl – pʊl/, /kɔːt – kɒt/

Alternatively, a long vowel can be analysed as a sequence of the same short
vowel repeated so that phonetic [iː] is phonemicised as bi-phonemic /ii/, the solu-
tion favoured by Pike (1947: 138) providing the second vowel commutes with
other vowels, or as a sequence of a short vowel plus a glide, as in /ij/. In these
analyses, sheep (/ʃiip/ or /ʃijp/) and ship (/ʃip/) differ only on the syntagmatic
axis, or axis of combination of elements, not the paradigmatic. The vowel-plus-
glide analysis was extended controversially by Bloch and Trager (1942) to deal
with other long monophthongs in English, such as [ɑː], by postulating a voiced
post-vocalic allophone of /h/ and phonemicising [ɑː] as /ah/, an analysis which
Fox (2000: 37) notes is reached by looking at how length operates in the whole
system rather than at oppositions at a particular position in syntagmatic structure.
A moraic view of the syllable suits bi-phonemic analyses in which each
phoneme occupies a separate mora. The theory of autosegmental phonology has
an a priori prohibition on the same phoneme occurring in succession, known as
the ‘obligatory contour principle’. To avoid this while keeping the number of
vowel phonemes to a minimum, a single vowel can be shown as associating to
two morae as in (4.8), a representation requiring association lines and syllable (σ)
and mora (μ) node symbols as additional notational devices.

(4.8)      σ
          / \
         μ   μ
          \ /
           V

How vowel quantity is represented in phonemic transcription will depend on
theoretical considerations such as keeping the inventory of phonemes as small
as possible, on one’s theory of the syllable, and on whether one takes a paradig-
matic, syntagmatic or prosodic view of quantity (Fox 2000: 19–20).
Like vowel quantity, consonantal quantity can also be analysed in different
ways with implications for transcription. Languages like Italian and Arabic make
distinctive use of consonantal length in pairs of words such as Italian fato ‘fate’
contrasting with fatto ‘made’, and Arabic kasara ‘he broke (something)’ contrast-
ing with kassara ‘he smashed (something)’. Again, considerations of economy
come into play in deciding whether these languages have short and long conso-
nant phonemes – /t/–/tː/, /s/–/sː/ etc. – or repetition of the same phoneme: /tt/,
/ss/ etc. Autosegmental phonology can deal with this through association of one
phoneme to two consonantal positions across a syllable boundary (Gussenhoven
and Jacobs 1998: 161–2).
Phonotactic analysis can also have a bearing on phonemic transcription.
Heselwood (2007, 2008a) argues that initial consonant clusters in English are
phonologically simultaneous. Because their sequence is fixed – /sp- st- sk-/
occur, for example, but not */ps- ts- ks-/ – the order in which they occur has
no information value. This can be shown in phonemic transcription by joining
the members of initial clusters with a tie bar: /s͡p-/ etc. This contrasts with non-
simultaneous /sp-/ in pairs such as sport–support, where the schwa in the latter is
said to have the function not of a phoneme but of allowing /s/ and /p/ to permute
in forms such as perceive. They can only permute if separated by schwa, which
marks them as being in an ordered relation, not a simultaneous relation.
Special notation for representing phonotactic relations has been developed by
Mulder (1968: 118–19, 1987: 41; see also Heselwood 2008a). Peripheral phono-
tactic positions are labelled in relation to the nuclear position (n) of a phonotactic
construction, typically occupied by a vowel. Positions before the nucleus are
called ‘explosive’ (e), those after it ‘implosive’ (i). These are arbitrary terms but
motivated by Saussure’s conception of the syllable and the function of conso-
nants surrounding its vowel (Saussure 1974: 51–8). In the English word slips,
for example, the phonemes and their positions can be represented as in (4.9).
The phonemes in peripheral positions are said to determine the phoneme in the
nuclear position by virtue of the positions they occupy.

(4.9) {/s/, e1}, {/l/, e2}, {/i/, n}, {/p/, i1}, {/s/, i2}
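
Position labelling of this kind lends itself to a simple sketch; the division of the word-form into explosive, nuclear and implosive parts is taken as given rather than computed.

```python
def label_positions(explosive, nucleus, implosive):
    """Label phonemes with Mulder-style phonotactic positions, as in (4.9)."""
    labelled = [(p, f"e{i + 1}") for i, p in enumerate(explosive)]
    labelled.append((nucleus, "n"))
    labelled += [(p, f"i{i + 1}") for i, p in enumerate(implosive)]
    return labelled

# English 'slips': /s l/ before the nucleus /i/, /p s/ after it.
print(label_positions(["s", "l"], "i", ["p", "s"]))
# [('s', 'e1'), ('l', 'e2'), ('i', 'n'), ('p', 'i1'), ('s', 'i2')]
```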

In choosing a symbol for a phoneme some principled criteria have to be
invoked if selection is not to be arbitrary. For example, varieties of English
can be found in which the voiceless alveolar plosive /t/ has the allophones [ɾ],
which is not voiceless, [ʔ] and [h], which are not alveolar, and [s̝], which is not
a plosive. The two principal criteria which seem to have been used most consist-
ently are to select the ‘canonical’ allophone and use the simplest character shape.
The notion ‘canonical’ is not easy to define, and has been characterised as ‘the
perceived “basicness”’ of a variant (Dickins 1998: 253), which generally means
the variant with the widest distribution or most common occurrence, although
it is not clear how these can be reliably established in the face of stylistic and
sociophonetic variation. In practice, the allophone occurring in singleton syllable
onset contexts in citation-form speech is the one chosen. In Spanish, for example,
the symbols /b d ɡ/ are used for phonemes which are realised as fricatives or
approximants intervocalically but as plosives when not preceded by a vowel.
In the case of English /t/, we can identify the canonical allophone as [tʰ] by this
method. The criterion of the simplest character shape is then brought to bear to
omit the [ʰ] diacritic; slant brackets are then placed round the remaining /t/.
In his influential classification of types of transcription, Abercrombie (1964a:
17–21) proposed a distinction between a simple phonemic transcription and
a comparative phonemic transcription; Jones’s (1918/1972: 334–6) simple–
complex distinction is essentially the same. The distinction hinges on how far the
set of symbols sticks closely to the shapes of the letters of the roman alphabet
and dispenses with ‘exotic’ letters and diacritics. When Abercrombie made this
distinction, using non-roman letters and adding diacritics was typographically
challenging, a consideration which put simple transcriptions at a premium.
Nowadays, in the age of digital word-processing with numerous fonts available
containing all kinds of characters and diacritics, transcriptions are no longer
constrained by access to symbols, but Abercrombie’s emphasis on simplicity
of character-shape for phoneme symbols still has force from the point of view
of reading transcriptions. It is easier on the eye if symbols are familiar and una-
dorned with diacritics, a fact recognised throughout the development of the IPA.
Furthermore, because phoneme symbols denote oppositional entities, they do not
need to contain, and one might argue should not contain, phonetically specific
information. Indeed, Abercrombie advocates departing from the principle of sim-
plicity only if one wishes to ‘be more phonetically specific’ (1964a: 21), though
he is not altogether clear about how this is compatible with being phonemic. His
example of transcriptions of English using /ɹ/ as opposed to /r/ can perhaps be
justified on the grounds that this phoneme does not have a wide allophonic range
like that of /t/, and therefore the symbol for the principal allophone can be used
as the phoneme symbol, despite its not being a roman alphabet letter, to draw
attention to its median approximant nature.

4.7 Allophonic Transcription


Because allophones belong to phonemes, the concept of an allophone is just as
theory-dependent as the concept of the phoneme. Exactly what an allophone
symbol denotes therefore varies according to the theoretical framework of the
transcriber, and again the major difference is between generative and functional-
ist theories. To illustrate the difference, consider the statement in (4.10).

(4.10) /p/ → [pʰ] / #__V (# = syllable boundary)

In a generative approach, this rule states that the object in the speaker’s mental
grammar denoted by /p/ changes to become [pʰ] when it occurs as the only item
in a syllable onset and is followed by a vowel. The allophone denoted by [pʰ] is
also a mental object conceived of as comprising specifications for pronunciation.
It represents the speaker’s articulatory intentions (Bromberger and Halle 2000:
24–5), not the actual execution of those intentions. The arrow represents the
mental process of deriving [pʰ] from /p/. In fact these derivations are often given
in the form of changes applied to feature specifications as in (4.11) – note that
slant brackets are typically not used in these feature-changing rules.

(4.11) [−voice, +labial, −aspirated, −continuant] → [−voice, +labial, +aspirated, −continuant] / # __ V

Here, an object specified as [−aspirated] changes into one specified as [+aspirated].
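
A rule of the (4.11) type can be sketched as an operation on feature bundles; the feature names and the crude two-flag environment test are illustrative only.

```python
def aspirate(segment, syllable_initial, before_vowel):
    """Feature-changing rule of the (4.11) type: [-aspirated] -> [+aspirated]
    for a voiceless labial non-continuant in the stated environment."""
    if (not segment["voice"] and segment["labial"]
            and not segment["continuant"] and syllable_initial and before_vowel):
        changed = dict(segment)
        changed["aspirated"] = True
        return changed
    return segment

p = {"voice": False, "labial": True, "continuant": False, "aspirated": False}
print(aspirate(p, syllable_initial=True, before_vowel=True))    # aspirated: True
print(aspirate(p, syllable_initial=False, before_vowel=True))   # unchanged, e.g. after /s/
```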


Nothing in a generative interpretation of (4.10) or (4.11) denotes anything
outside the speaker’s mind, and the statement as a whole can be taken as an
expression of part of the speaker’s knowledge of the phonology of the lan-
guage. A functionalist interpretation of (4.10) is quite different, and sees it as
stating which member of the set of sounds [p pʰ ʔ͡p . . . p̪] is found to occur
in the context in question. Under this view, the arrow has nothing to do with
a speaker’s real-time mental processing and there is no change of one object
into another, nor is it implied that the speaker selects this allophone from the
set. It is instead an expression of the phonologist’s analysis of what can be
observed.
A key difference between phonemic and allophonic transcription, at least from
a functionalist point of view, is that phonemes are defined negatively by their
place in a network of phonological relations, whereas allophones are members of
extensional sets and therefore have positivist phonetic identities. We can identify
[pʰ] by a voiceless labial closure released with aspiration, but we cannot tell if it
is an allophone of a /p/ phoneme as in English, or a /pʰ/ phoneme as in Punjabi
or Korean, unless we know what the other phonemes in the system are.
To show explicitly that phonetic symbols are being used to represent sounds
in their systematic roles as allophones rather than as general phonetic sound-
types, Mulder (1989: 304) encloses the square-bracket transcription within slant
brackets. In /[pʰiːɫ]/, for example, the slant brackets denote that the contents of
the square brackets are realisations of phonemes.
Symbols in an allophonic transcription can be considered to exist at the
interface between the positivist models of phonetic theory and the negativist
structures of oppositional relations which form the phonematic systems of
languages. The long and fruitless search for invariant content of phonemes
was based on the erroneous belief that all allophones of a phoneme would
have some common positive phonetic property. It was motivated by the
assumption that phonemes are perceptual units, that listeners needed to be able
to recognise them in speech, and that they could only do so by detecting some-
thing which was present in all a phoneme’s allophones (see Chapter 5 Section
5.3). Belief in phonemic invariance has more or less ceased (Raphael 2005:
200), and there is now recognition among psycholinguists, for example Remez
and Trout (2009: 259), not only among phonological theorists, that phonemes
are entities with no essential phonetic properties linking their allophones, and
that the only thing they all always have in common, by definition, is their dis-
tinctive function.
It was noted in Section 4.4 above that allophonic transcription can vary
along the broad–narrow continuum and that an important reason for this is the
existence of free variant allophones. Allophones which are in complementary
distribution can be handled by interpretative conventions which identify their
contexts of occurrence. For example, in English, [tʰ] is the predictable allophone
of English /t/ when it occurs as a singleton in the onset of a stressed syllable in
words such as tea, taking, intense. But which allophone of /t/ will occur inter-
vocalically between a stressed and unstressed syllable is not predictable system-
atically. In betting for example, the allophones [t], [ɾ] and [ʔ] are all possible.
We cannot predict which will occur on a given occasion even though one may
be more probable than the other in speakers of certain social backgrounds and
in certain communicative contexts. In final position, /t/ may also be alveolar or
glottal, with the added free variation between released [t ʔ] and unreleased [t˺
ʔ˺]. In phonetic dictations for ear-training, students are often asked to listen
out for which variant is used and to transcribe it, whilst using phoneme symbols
where variants are predictable. Specifying free variants moves the transcription,
or at least parts of it, into a narrower region of the continuum and starts to blur
the line between systematic and impressionistic transcription because, in repre-
senting which variant occurs on a particular occasion, we are mixing specific
transcription into an otherwise generic one. Free variation seems to be behind
Jones’s (1918/1972: 334) example of comparative allophonic transcription. To
maintain allophonic transcription as strictly generic, free variation could be
dealt with by including all the variant allophones linked by alternation-tildes
in a multiple symbol such as [t~ɾ~ʔ], or by including them in the conventions.
However, if all allophonic variation is dealt with by conventions, the tran-
scription will be phonemic, not allophonic. It seems to me therefore that, for
a transcription to be truly allophonic and contain information not recoverable
from conventions, it has to be specific and not generic when dealing with free
variation. Furthermore, it seems to me that the simple–comparative distinction
drawn by Abercrombie (1964a: 22) and taken up by Jones (1918/1972: 333–4)
may have come about in response to free variants requiring a more extended set
of symbols than is needed for a truly systematic transcription, together with the
strong preference, originating with Ellis and persisting through the formation of
the IPA, for symbols based as closely as possible on roman alphabetic letter-
shapes for typographical convenience and in recognition of the fact that they are
highly familiar to most users.

4.8 Archiphonemic Transcription


The notion of archiphoneme is a highly theoretical one, but something like it has
occasionally been manifest in writing systems (see Chapter 1 Section 1.1.2). The
Avestan letter <T>, for example, can be described as a pseudo-archiphoneme
symbol.
In phonology, the theory of the archiphoneme is mainly associated with
the Russian Prague School linguist Nikolai Trubetzkoy and later with the
French neo-Praguian André Martinet (Akamatsu 1988: 10). Generative phonolo-
gists have not paid it much attention, not finding it useful for their purposes.
Trubetzkoy was concerned to account for why certain phonemic oppositions,
rather than certain phonemes, seemed to be systematically excluded from certain
contexts. A well-known example from English comes from the absence of an
opposition between voiced and voiceless plosives in initial clusters after /s/.
Words such as spy, sty, sky had previously been, and often still are, phonemically
analysed as /spaɪ/, /staɪ/, /skaɪ/, but whereas the /p/ in pie contrasts with a /b/ in
buy, the /p/ in spy does not: */sbaɪ/ is unattested in English as the form of a dif-
ferent lexical item. Trubetzkoy was not content to attribute this sort of state of
affairs to accident and had the insight to see how it might be accounted for by a
general principle. The solution he devised was to say that what occurs after the /s/
in spy is neither /p/ nor /b/ but only what they have in common (Trubetzkoy 1969:
77–9). Following a suggestion by Jakobson, he called this item an archiphoneme.
Because he envisaged phonemes as comprising distinctive features, he was able
to define an archiphoneme as comprising only the distinctive features common
to both phonemes. In the case of /p/ and /b/, the common features are ‘labial’
and ‘plosive’, and the archiphoneme, usually symbolised /P/, has just these fea-
tures and no others; the opposition between /p/ and /b/ is said to be neutralised.
Transcriptions of spy, sty, sky expressing an archiphonemic analysis are therefore
/sPaɪ/, /sTaɪ/, /sKaɪ/ (for a critique of how these archiphonemes are derived, see
Heselwood 2008a: 9–10). Capitalising the symbol for the voiceless member of
each pair of phonemes rather than the voiced one is motivated by another point
of phonological theory, which is that oppositions based on voice are privative.
According to this view, /b/ has the feature ‘voiced’ but /p/ does not (we have
seen in Chapter 3 that there have been different views about this). With ‘voice’
consequently out of the picture, /P/ makes more sense than /B/, although both are
logically adequate. Various other proposals have been made about how to sym-
bolise archiphonemes, some of which include the symbols for all the phonemes
affected by the neutralisation. For example, instead of /P/, both /p/ and /b/ can be
combined in a ‘multiple symbol’ /p/b/, or /p-b/, or <p/b> (cf. Jones’s (1918/1972:
337) ‘multiliteral’ transcriptions). Using a single symbol is a direct symbolisa-
tion, while using a multiple symbol is an indirect symbolisation (Akamatsu 1988:
329). Probably for practical reasons, the direct option became the preferred one
amongst most phonologists who accepted the value of the archiphoneme concept.
For a lengthy discussion of the theoretical issues involved in representing archi-
phonemes with symbols, see Akamatsu (1988: 314–31). Trubetzkoy himself did
not seem overly concerned with its symbolisation, having at various times used
Greek letters, roman capitals and multiple symbols.
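
The derivation of the archiphoneme’s content can be sketched as set intersection over feature sets; the feature sets below are deliberately simplified and assume the privative treatment of voice just described.

```python
features = {
    "p": {"labial", "plosive"},             # voicelessness treated as absence of 'voiced'
    "b": {"labial", "plosive", "voiced"},   # the privative view mentioned above
}

def archiphoneme(phoneme_a, phoneme_b):
    """Content of the archiphoneme: the distinctive features common to both."""
    return features[phoneme_a] & features[phoneme_b]

print(archiphoneme("p", "b"))   # {'labial', 'plosive'} - the content of /P/
```
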
The general principle of the archiphoneme can be seen behind the proposal
to have ‘inclusive’ transcriptions (see Section 4.10 below) to represent variant
vowel qualities, although the vowels involved would not be candidates for archi-
phonemic analysis according to archiphoneme theory. Sundby (1983: 152–5)
explains the suggestion to use them in this way for the Dictionary of Early
Modern English Pronunciation, 1500–1800. For example, [A] is defined as a
cover symbol for [a æ ɑ], so that pronunciations of words like grass can be given
one inclusive transcription with conventions linking [a] to the north of England,
[ɑ] to the south, and so on.

4.9 Morphophonemic Transcription


Roman capitals feature in the representation of morphophonemes and there are
some similarities between archiphonemes and morphophonemes, although the
differences between them are more important. Pseudo-morphophonemes can be
identified in various spelling systems, English being one of them, and are evidence
for a pre-theoretical appreciation of a certain kind of relationship between a set
of sounds and a morpheme. The members of the sets tend to be the same as or
similar to those which are neutralised to derive archiphonemes. The relationship of
characters with morphemes seems to have quite a strong appeal to users of written
language. Korean Hangǔl, for example, has moved in the direction of morpho-
phonography from purely phonographic origins (see Chapter 2 Section 2.2.7).
A morphophoneme denotes a morpheme by using a phoneme symbol asso-
ciated with one of its regular alternants. For example, the plural morpheme in
English has, notwithstanding possible archiphoneme analyses, the regular alter-
nant /s/ after a stem-final voiceless obstruent (caps, cats, tacks, cuffs, breaths
etc.) and /z/ elsewhere (bibs, lids, figs, doves, bells, pens, peas etc.). (Words like
glasses, matches etc. I leave out for convenience of illustration.) Because /z/ has
the least restriction on its contextual distribution, it is chosen as the base form of
the plural morpheme and represented in curly brackets by the roman capital {Z}.
The brackets denote its grammatical status as a morphophoneme, not a phoneme
or archiphoneme. It is important to note that, unlike in the case of archipho-
nemes, the distribution of the alternants is determined not solely by phonological
context but also by grammatical identity. The phoneme /s/ can occur in some of
the contexts where /z/ occurs, for example in else, pence, peace, but not as the
phonological form of the plural morpheme. The symbol {Z} therefore denotes an
alternation between /s/ and /z/, and because this alternation is morpheme-specific
(ignoring for convenience the possessive and the third singular present tense
morphemes), it can also denote the plural morpheme itself. The alternants /s/
and /z/, as phonemes, have only phonological identity, but {Z} has grammatical
identity. The regular spelling of the plural with <-s> has a distribution which cor-
responds in written English with {Z} in spoken English, although, by extension
to irregular forms of the plural, {Z} can be used to denote plural in words like
oxen, sheep, geese and so on. That is to say, it can stand directly as a symbol for
the plural morpheme regardless of the regular /s~z/ alternations.
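
The alternation which {Z} abbreviates can be sketched as a small realisation function; the class of voiceless obstruents is abbreviated here, and the glasses, matches type is again left out.

```python
VOICELESS_OBSTRUENTS = {"p", "t", "k", "f", "θ"}   # abbreviated for illustration

def realise_plural_Z(stem_phonemes):
    """Realise the morphophoneme {Z}: /s/ after a stem-final voiceless
    obstruent, /z/ elsewhere (regular plurals only)."""
    final = stem_phonemes[-1]
    return stem_phonemes + ["s" if final in VOICELESS_OBSTRUENTS else "z"]

print(realise_plural_Z(["k", "a", "t"]))   # ['k', 'a', 't', 's']  cats
print(realise_plural_Z(["l", "ɪ", "d"]))   # ['l', 'ɪ', 'd', 'z']  lids
```
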
Returning to the archiphoneme, we can see that it is implicated in the regular
plural alternation because, after a voiceless obstruent, the /s–z/ opposition is neu-
tralised – by most accounts */kapz/, */katz/ etc. are non-attested forms in English.
Accepting this, one of the alternants has to be /S/. By the same reasoning over
the non-occurrence of /s/ after a voiced obstruent, another alternant has to be
/Z/, and because both /s/ and /z/ can occur after a sonorant, /z/ has to be a third
alternant. The point of this discussion is not to argue for a particular position on
these issues, but to show that what a morphophonemic symbol actually denotes
depends on one’s decisions regarding a number of theoretical factors.
Morphophonemes can be used for all alternations, not just those involving
mono-phonemic morphemes. An often-quoted example is furnished by German,
in which it is claimed that there is no voice opposition in word-form-final
contexts (a claim that has been challenged; see Port and Crawford 1989). The
/t/ (or archiphoneme /T/) at the end of German Rat ‘advice’ remains /t/ in the
plural form Räte, but the /t/ (or /T/) at the end of Rad ‘wheel’ is replaced by /d/
in the plural Räder. In Rad–Räder there is therefore a /t–d/ alternation while in
Rat–Räte there is no alternation, only /t/. To express this difference, the morpho-
phoneme symbol {T} is employed to transcribe Rad as {raT} (Hyman 1975: 79).
The symbol {T} here denotes the alternation /t–d/ and can be used wherever such
an alternation is found in the language.
The motivation behind morphophonemes, and behind morpho-phonographic
spelling, is to have a single invariant form for a single item of grammar. If there
is only one plural morpheme, or only one morpheme with the meaning-content
‘wheel’, then in the thinking of many linguists this should be reflected in its
pairing with only one expression-form. It tends to be linguists of a generative
persuasion who are keenest to try to establish one-to-one relations between
content and expression, a tendency which may stem from a desire to minimise the
number of items which have to be stored in the grammar (Kenstowicz 1994: 60),
but the effect is to tip transcription more in the direction of spelling and less in
the direction of an analysis of pronunciation. Generative phonology has focused
more on the phonological structure of lexical items than on phonological struc-
ture as a network of relations between phonological items such as phonemes, to
the point where the term ‘phoneme’, if not the phoneme principle, has all but
disappeared from generative phonology’s discourse (Dresher 2011: 262) – it is
historically significant that in Chomsky and Halle (1968: 10–11), probably the
single most influential text in phonology since its publication, all terms with the
word ‘phoneme’ in them are rejected. The motivation for morphophonemic rep-
resentations propelled generative phonology to do without a phonemic represen-
tation intervening between the morphophonemes and the phonetic form. Instead,
all phonemes become in effect morphophonemes (they are sometimes called
‘systematic phonemes’; for example, Hyman 1975: 80–2) so that all morphemes,
stems as well as affixes, have a single ‘underlying representation’ from which
‘surface forms’ are derived by treating morphological alternations and allophonic
variants in the same way. We can see this if we return to the English regular
plural, where the derivation in (4.13) is preferred to that in (4.12) (the forms of
the rules are not relevant here, only their effects).

(4.12) /kat{Z}/ → /kats/ → [kats], by one rule specifying the /s/ phoneme
alternant after a voiceless obstruent, and another specifying its reali-
sation in this context.

(4.13) /katz/ → [kats], by a single rule stating that /z/ is realised as [s] after
a voiceless obstruent.

The justification for the /z/ in /katz/ is that [z] is the pronunciation of the plural
having the widest distribution. The symbol /z/ therefore has grammatical as well
as phonological identity, taking on a function normally the responsibility of
spelling.

4.10 Exclusive and Inclusive Transcriptions


In his account of how James Murray developed his notation for the New English
Dictionary (later the Oxford English Dictionary), MacMahon (1985: 80) identi-
fies one problem Murray faced: how to represent the pronunciations of words
when so many variant pronunciations exist and when, as a deliberate policy,
no single accent of English had been selected as the model. Daniel Jones’s
distinction between exclusive and inclusive transcriptions (Jones 1918/1972:
338–40) comes from recognition of the same problem. What are excluded
or included are variant pronunciations – ‘diaphonic’ variants in Jones’s ter-
minology. A ‘diaphone’ is ‘a sound used by one group of speakers together
with other sounds which replace it consistently in the pronunciation of other
speakers’, and it also encompasses stylistic variants used by the same speaker
(Jones 1918/1972: 53). It is thus similar to a sociophonetic variable. Specific
transcriptions are by definition exclusive because only a single variant can be
present at any one time. Generic transcriptions are in practice inclusive because
there are always variant pronunciations in any language or language variety, or
any speaker.
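
An inclusive transcription of this kind can be pictured as a cover symbol plus conventions listing the variants it includes; the mapping below is a toy illustration in the spirit of Sundby’s [A], not anyone’s actual conventions.

```python
# A cover symbol with conventions linking each included variant to a group of
# speakers; an exclusive transcription would pick just one of the variants.
DIAPHONE_A = {"a": "north of England", "ɑ": "south of England", "æ": "elsewhere"}

def expand_inclusive(transcription, cover, symbol="A"):
    """List the exclusive transcriptions included under one cover symbol."""
    return {transcription.replace(symbol, variant): group
            for variant, group in cover.items()}

print(expand_inclusive("ɡrAs", DIAPHONE_A))
# {'ɡras': 'north of England', 'ɡrɑs': 'south of England', 'ɡræs': 'elsewhere'}
```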

4.11 Dynamic Transcription


This section is divided into three subsections. The first two consider two types
of dynamic transcription which try to give an account of the articulatory domain
by denoting the changing relationships of the speech organs during speech pro-
duction. The third subsection looks at the transcription of intonation and rhythm,
which are perhaps even more obviously inherently dynamic.
Before looking at the two conceptually very similar forms of dynamic articu-
latory transcription, a few words need to be said about their notation. The nota-
tion systems featured in Chapter 3 mostly provided for representing sounds as
if they are static objects without any internal dynamics, although the ExtIPA
voicing notation can deal with some segment-internal changes; in Chapter 6
Section 6.0 there are some further suggestions for how dynamics can be incor-
porated into transcriptions using segmental notation (see Chapter 6 Figure 6.2 for
an example). A symbol such as [b] or [a] is necessarily static in its denotation
because it denotes an intersection of categories, and intersections are logically
simultaneous – all their elements have to be present simultaneously in order to
intersect. When we look at the referring function of a symbol rather than the
denoting function, that is to say when we look at descriptive models in transcrip-
tions rather than theoretical models in notation charts (see Chapter 1 Section 1.3
for this distinction), we are dealing with phonetic data which comprise a multi-
plicity of time-varying parameters in all the phonetic domains. In the articulatory
domain, the tongue, lips and vocal folds are continually changing their spatial
locations and states; in the aerodynamic domain, there are continual changes in
air pressure and modes of airflow; in the acoustic domain, there are continual
changes of frequency and amplitude; in the auditory domain, there are continual
changes in the information being processed and relayed to the auditory cortex;
and in the perceptual domain, sense-impressions constantly fade and give way
to new ones, although there are normalising effects in perception which confer
some perceptual stability without clear correlates in the other domains (see
Chapter 5 Section 5.3). Ceaseless change is therefore the reality of the materi-
als of speech, and it is recognition of this fact which has prompted alternative
non-segmental representations. The two types of dynamic articulatory transcrip-
tion which have been developed, parametric and gestural, denote articulatory
parameter categories which change their values through time. Notation in these
dynamic transcriptions is relatively unimportant and there is not a lot to say about
it. The only crucial thing is that each parameter must be made clearly identifi-
able. Usually this is done with labels in the form of initial letters such as TT for
tongue-tip, or abbreviations such as VEL for velum, and often by employing
different styles of lines for representing fluctuations in parameter values. Pike
(1947: 10), for example, denotes lip-shape with a line of [oooo]s for ‘rounded’
and struck-through [oooo]s for ‘unrounded’ (not to be confused with the [oooo]s
superimposed on a line representing the lower lip in Tench (1978: 40)),
‘tongue-tip’ by a line of full stops [....], ‘tongue blade’ by low acutes [ ́ ́ ́ ́], and
‘tongue-back’ by low graves [ ̀ ̀ ̀ ̀]. They are plotted on a five-line stave represent-
ing degrees of stricture. Figure 4.3 shows examples.

FIGURE 4.3: Dynamic transcriptions in Pike’s ‘sequence diagrams’
for (a) [abop] and (b) [zʒɣn]. From Pike (1947: 10). Reproduced with
kind permission of the University of Michigan Press. © Pike (1947),
Phonemics, University of Michigan Press.

Voicing in parametric transcriptions is often represented by a line of [xxxx]s
or a continuous zigzag, and voicelessness by [0000]s or a continuous straight
line. Compared to static segmental notation, it is only when dynamic notation
is deployed in transcriptions that one can see its representational potential. The
other crucial thing in dynamic representations, which is in fact their raison
d’être, is that the parameters must be carefully aligned to represent the relation-
ships between them at specific points in the time course, so that, for example, we
can see when the velum closes in relation to other events such as the lowering of
the tongue-tip and the offset of voicing in a sequence like [ns] in once.

4.11.1 Parametric transcription

A naïve interpretation of the sequential arrangement of segmental transcriptions
might conclude that speech is made up of a series of stable sounds which
instantaneously give way to each other as the time course of speech progresses.
Phoneticians and linguists of course know that this is not the case, but it is
possible to transcribe speech to show explicitly that this is not the case and to
try to capture the inherent dynamism of speech. In parametric transcription,
each parameter of speech is separately represented through time to show how
and when its value changes relative to the values of other parallel parameters.
Changes can be represented iconically with upward and downward movements
of articulators denoted by upward and downward movements of a line read
from left to right, and voicing by an oscillating line. In a word such as English
sprinkle the voicing parameter would be shown to remain in the vibrating state
from the parting of the lips for [p], through the raising of the tongue-tip for [ɹ]
and its consequent lowering, continuing while the tongue-front is raised and
the velum lowered for the regressively nasalised [ɪ̃], ceasing to vibrate after the
tongue-back has made contact for [ŋ] at the moment the velum is raised for [k],
and resuming again when the tongue makes contact for the final [l]. The oro-
nasal process parameter would be shown to lower while the tongue is raised for
the vowel before the tongue-back makes contact with the velum, and so on. This
kind of transcription represents speech as a simultaneous bundle of time-varying
articulatory parameters which overlap with each other in patterns of coarticu-
lation and which have no segmental structure marked out by straight vertical
boundaries. A very defensible claim can be made, as it has been by Abercrombie
(1964b/1965: 123) and Tench (1978: 41–2), that a parametric transcription is
much closer to the articulatory realities of speech production than any type of
segmental phonetic transcription. Instrumental records such as palatograms,
articulograms and spectrograms consistently show extensive overlapping of
articulatory movements associated with realisations of adjacent phonemes, and
even of non-adjacent phonemes (Farnetani and Recasens 2010: 325).
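
A parametric transcription can be thought of, computationally, as a set of labelled parameter tracks sharing a common time axis, from which the value of any parameter at any moment – and the relative timing of changes across parameters – can be read off. The sketch below illustrates that idea in Python; the parameter names, breakpoint times and values are invented for the sake of the example and do not transcribe any actual utterance.

# A minimal sketch of a parametric representation: each articulatory
# parameter is a list of (time_ms, value) breakpoints on a shared time axis.
# Parameter names, times and values are invented for illustration only.

def value_at(track, t):
    """Linearly interpolate a parameter track at time t (ms)."""
    if t <= track[0][0]:
        return track[0][1]
    if t >= track[-1][0]:
        return track[-1][1]
    for (t0, v0), (t1, v1) in zip(track, track[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)

# 0 = open/lowered/off, 1 = closed/raised/on (an arbitrary scale)
tracks = {
    "lips":       [(0, 1), (40, 0), (400, 0)],             # release of a bilabial closure
    "tongue_tip": [(0, 0), (60, 1), (120, 0), (400, 0)],   # raising then lowering
    "velum":      [(0, 1), (250, 0), (330, 1), (400, 1)],  # lowering for nasalisation
    "voicing":    [(0, 0), (40, 1), (300, 0), (360, 1), (400, 1)],
}

for name, track in tracks.items():
    samples = [round(value_at(track, t), 2) for t in range(0, 401, 100)]
    print(f"{name:>10}: {samples}")

Because every track refers to the same time axis, aligning them is trivial, and it is precisely this alignment which lets the relative timing of events – the raison d’être of parametric representation – be inspected directly.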
Parametric transcriptions are clearly speaker-oriented in that they purport to
represent what speakers do. There can only really be a valid specific parametric
transcription, however, if there are instrumental data from the utterance to inform
it, showing for a particular utterance exactly when the tongue-tip started to rise
in relation to bilabial release, when the velum lowered in relation to the vowel
articulation, and so on, in which case it becomes a summary of instrumental data

(see Chapter 6 Section 6.2.2). In the absence of specific data, parametric tran-
scriptions have to be approximate and speculative, and in so far as speculation is
informed by application of what phonetic theory tells us (Howard and Heselwood
2013: 94), they are generic. Theoretical understanding of coarticulation and the
relative timing and coordination of articulatory activities is of course premised
on instrumental data from previous specific utterances which feed into phonetic
theorising. Whether a parametric transcription can be impressionistic is a moot
point and probably depends much on one’s theory of speech perception, and of
perception in general. Direct realist theories would presumably regard the idea
as unproblematic because, according to direct realism, the speaker’s articulatory
actions are the unmediated objects of perception (see Chapter 5 Section 5.7).
For those who are not direct realists the parameters would have to be under-
stood as auditory-perceptual correlates, or auditory-visual-perceptual correlates,
of articulatory actions the perception of which is constructed from the speech
signal in ways which are not easily determined and may in fact be in principle
indeterminate. If the transcriber is also the speaker, then a specific impressionis-
tic parametric transcription is possible, though not necessarily accurate, through
careful introspection of proprioceptive and kinaesthetic sensations coupled with
auditory-perceptual impressions (and visual-perceptual if carried out in front of
a mirror), in which case the ‘impressions’ are multi-modal.
Parametric transcriptions are not commonly used outside of a pedagogical
context because they are difficult to read and write, though reading them can be
made easier with a segmental annotation, as in Figure 4.4. The same information
is, though, contained in segmental transcriptions provided the general phonetic
conventions implicit in them are not ignored. However, as Tench (1978) empha-
sises, parametric transcriptions are extremely useful in phonetics pedagogy. In
fact they can make manifest the conventions implicit in segmental transcriptions
by illustrating clearly how articulations are timed and coordinated. They are often

FIGURE 4.4: Parametric transcription of Good morning from Tench
(1978: 41). 1 = lower lip; 2 = tongue-tip; 3 = tongue-front; 4 = tongue-back.
Tench (1978), ‘On introducing parametric phonetics’, Journal of the
International Phonetic Association 8, 34–46.

used, for example, in explaining VOT distinctions, and a good exercise for stu-
dents is to summarise an articulatory description of the pronunciation of a word
such as sprinkle by means of a parametric transcription based on introspection of
their own production.

4.11.2 Gestural scores

At first glance, a gestural score looks just like a parametric transcription, and
indeed it can be considered a particular type of parametric transcription. What
makes it different is the theoretical framework with which the term ‘gesture’
has become very closely associated. Gestural scores can be viewed as paramet-
ric transcriptions constructed according to the general theoretical approach to
phonetics and phonology taken in what is known as ‘Articulatory Phonology’
in the work of Browman and Goldstein (1989, 1990, 1992). This approach sees
no reason to treat phonetic and phonological representations as different, main-
taining that speakers organise their phonological systems in terms of abstract
gestures which are specified dynamically for time of inception relative to other
gestures, and for amplitude of movement, in an abstract articulatory space.
Because these abstract gestures are said to exist in speakers’ internal grammars,
articulatory phonology is a generative theory which is at pains to stress that the
gestures it postulates are not actual observable movements of articulators but
specifications for such movements (Browman and Goldstein 1989: 75). Whether
and how the gestures actually occur in speakers’ mouths are not a matter for
the grammar, as we have seen before in generative phonology (see Section 4.6
above), but a matter of individual performance. The theory allows for gestures to
remain unexecuted due to being completely overlapped in the internal representa-
tion by one or more other gestures.
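
As a rough computational analogue, a gestural score can be represented as a set of gestures, each specifying a tract variable, a target and an activation interval, with coproduction modelled by overlap between intervals. The sketch below, in Python, uses invented tract variable labels, targets and times purely for illustration; it is not Browman and Goldstein’s task-dynamic model, only a hint at the general shape of such a representation.

# A rough sketch of a gestural score as data: each gesture names a tract
# variable, a target descriptor and an activation interval in ms.
# Labels, targets and times are invented for illustration.
from dataclasses import dataclass

@dataclass
class Gesture:
    tract_variable: str   # e.g. lip aperture, tongue body constriction degree
    target: str           # e.g. 'closed', 'narrow pharyngeal', 'wide'
    onset_ms: int
    offset_ms: int

score = [
    Gesture("LIPS", "closed", 0, 80),
    Gesture("GLO",  "wide", 0, 120),            # glottal abduction gesture
    Gesture("TB",   "narrow pharyngeal", 40, 260),
    Gesture("VEL",  "wide", 180, 340),          # velum-lowering gesture
    Gesture("LIPS", "closed", 240, 340),
]

def active_at(score, t):
    """Return the gestures whose activation intervals contain time t."""
    return [g for g in score if g.onset_ms <= t <= g.offset_ms]

for t in (60, 200, 300):
    print(t, "ms ->", [f"{g.tract_variable}:{g.target}" for g in active_at(score, t)])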
A gestural score, like a parametric transcription, is a two-dimensional plane
with the gestures arranged in abstract articulatory space on the y-axis and time
on the x-axis. It therefore closely resembles an orchestral musical score, which
is not a record of an actual performance but a set of instructions telling each
musician when to play what. A written-out gestural score such as is shown in
Figure 4.5 therefore denotes a mental gestural score in the speaker’s phonologi-
cal grammar. Interestingly, with time forming one of the axes of gestural scores,
the usual understanding that synchronic structures of grammar are outside time is
necessarily contradicted because of the inherent dynamism of the basic notion of
a gesture. Trubetzkoy (1938/2001: 48) articulated the general view of synchronic
grammars, which probably began in modern western linguistics with Saussure,
that ‘language structure (langue) is timeless; it is only in speech (parole) that
temporal relations emerge’.
Typologically, a gestural score is by definition a systematic transcription and
must also be a generic transcription because the gestural scores of individual
speakers are not observable and it is not the observable articulatory movements
which are represented in it. In so far as a parametric transcription can be spe-
cific and impressionistic, it is possible to view gestural scores and parametric
transcriptions as enjoying a broad–narrow relationship in which the paramet-
ric transcription gives an account of the actual articulatory movements of the


[Figure: gestural score with tiers for VEL (wide), TB tongue body constriction
degree (narrow pharyngeal; TBCD), LIPS lip aperture (two labial closure
gestures; LA) and GLO (wide), plotted against time from 0 to 400 ms, with an
aligned segmental transcription [pʰɑm].]

FIGURE 4.5: Gestural score for palm from Browman and Goldstein (1989:
76). Continuous tract variable motions added to box notation with aligned
segmental transcription. Reproduced with kind permission of Haskins
Laboratories, Yale University

speaker. That is to say, a gestural score can represent phonological structure and
a parametric transcription a particular realisation.
As a systematic transcription, a gestural score can be said to be neutral with
respect to speakers and hearers of the same language variety because they are
assumed to share the same system of gestural organisation in their internal
grammars.

4.11.3 Intonation and rhythm

The phenomena to be handled in intonational transcription have generally been
regarded as more slippery than segmental phenomena. Not only is there ‘no
comparable standard alphabet’ (Beckman and Venditti 2010: 610), but it is prob-
ably fair to say that there is no comparable conceptual and theoretical framework
for providing a set of general phonetic categories for an intonational alphabet to
denote. Intonational categories are not products of category intersections, and there
is no procedure as neat as the commutation test for pairing forms and meanings.
Joshua Steele’s remarkable work has already been mentioned in Section 4.1 above,
and deserves further examination here as the first comprehensive framework and
notation for handling intonation and rhythm. His analyses of English prosody
are not so greatly improved on in modern accounts. In Figure 4.6 we can see his
representation of a ‘bombastic’ style of speech using notation taken from music
to transcribe pitch movements, rhythmic stress and quantity in a multi-tiered and
multilayered transcription. The staff shows a bass clef and time signature of 3/4.


FIGURE 4.6: Steele’s transcription of a ‘bombastic’ manner of reciting
lines from Thomas Leland’s Orations of Demosthenes. Steele (1775:
51). Oblique lines = acute and grave accents showing pitch height, and
direction and extent of movement; tails show quantity – And marked as
a crotchet, now a minim, de-li-be a quaver triplet, pres- one and a half
crotchets; ɼ = a crotchet rest; ┐ = a quaver rest; ∆ = heavy cadence,
... = light cadence, .. = lightest cadence; ʻ = forte, ’ = piano; zigzags =
crescendo and decrescendo

The importance of pitch as an intonational parameter, and the ease with which
F0, the acoustic correlate (though often not the auditory correlate) of pitch, can
be extracted from acoustic speech signals and displayed as in Figure 4.7a, have
led to some phoneticians being persuaded that an F0 trace makes transcription
of intonation redundant. That an F0 trace will be more faithful to the signal
than an impressionistic, signal-oriented transcription of pitch (see Section 4.12
below) is no doubt true, at least for modal voice if less so for other voice-source
types (Beckman and Venditti 2010: 605), but it does not provide an analysis into
categories and it treats every detail as equally important.1 The vexed question is
what kind of categories the pitch movements of an utterance should be analysed
into. The categories most applicable to what can be observed in an F0 trace,
and to how perception of intonation is reported by listeners, relate to height and

direction of movement. Notation for these is provided for in the IPA level and
contour tones and various other adaptations of the basic set of acute, grave and
circumflex accents inherited from Greek (see Chapter 3 Section 3.4.8).
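
For readers curious how a trace like the one in Figure 4.7a is obtained, the sketch below shows one common approach in simplified form: frame-by-frame autocorrelation. It assumes a mono signal array and a sampling rate, and it omits the voicing detection, smoothing and octave-error correction that practical pitch trackers need.

# A simplified autocorrelation pitch tracker: for each analysis frame, find
# the lag with the greatest autocorrelation within a plausible F0 range and
# convert it to Hz. Real F0 extractors add voicing detection, smoothing and
# octave-error handling on top of this.
import numpy as np

def f0_track(signal, sr, frame_ms=40, hop_ms=10, fmin=75, fmax=400):
    frame, hop = int(sr * frame_ms / 1000), int(sr * hop_ms / 1000)
    lag_min, lag_max = int(sr / fmax), int(sr / fmin)
    track = []
    for start in range(0, len(signal) - frame, hop):
        x = signal[start:start + frame] * np.hanning(frame)
        ac = np.correlate(x, x, mode="full")[frame - 1:]   # lags 0 .. frame-1
        lag = lag_min + int(np.argmax(ac[lag_min:lag_max]))
        track.append(sr / lag)
    return track

# Toy test: a 150 Hz complex tone built from five harmonics
sr = 16000
t = np.arange(0, 0.5, 1 / sr)
tone = sum(np.sin(2 * np.pi * 150 * k * t) / k for k in range(1, 6))
print([round(f) for f in f0_track(tone, sr)[:5]])   # values close to 150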
Figure 4.7 presents four different representations of the intonation of a spe-
cific utterance of an English interrogative sentence. Figure 4.7a is an acoustic
F0 trace with frequency and time scales. This is clearly a parametric represen-
tation, showing F0 values changing continuously, and is iconic with reference
to intuitive, everyday descriptive terms such as ‘high’ and ‘low’, ‘falling’ and
‘rising’ for pitch. It is also the most phonetic of the representations because no
category judgements have been made, which in fact excludes it from being a
transcription, and it is quantifiable in Hz at every time point. The orthographic
transcription in Figure 4.7b has accent and iconic tone marks as used by most
phoneticians in the British tradition, as found in Cruttenden (1997: xvi) and
Wells (2006: 260), for example. This kind of transcription is in a sense seg-
mental in that discrete categories are denoted by discrete symbols at particular
points in syntagmatic structure, even if those categories are realised in a highly

[Figure: (a) F0 trace plotted from 80 to 180 Hz over 0–1000 ms; (b) Did ˈJohn
see ˈJane toˎday; (c) interlinear tonetic transcription; (d) H* !H* L* L¯ L%]

FIGURE 4.7: (a) F0 trace (arrows identify local perturbations; see text);
(b) orthographic transcription with accent and tone marking; (c)
interlinear tonetic transcription with iconic representation of pitch height,
accentual prominence, and pitch movement; (d) ToBI transcription: H*
= accented syllable in the upper pitch range; !H* = downstepped version
of H*; L* = accented syllable in the lower pitch range; Lˉ = low phrase
accent; L% = low boundary tone

distributed manner. It is also phonological because the categories denoted are
set up in a system of contrasts. The low fall on day is taken to be linguistically
different from a high fall, not simply different in pitch. In Figure 4.7c is an
interlinear tonetic transcription also common in the British tradition; see for
example Jones (1918/1972; Jones has an extra line to represent the middle of
the pitch range), Crystal (1969), O’Connor and Arnold (1973) and Cruttenden
(1997). Here we have a more phonetic representation rather than a representa-
tion of phonological categories, although the latter are perhaps not entirely
absent. We can also see that it is halfway between being parametric and being
segmental: the dots are discrete but lie on obvious trajectories of continuity
mimicking the iconicity of the F0 trace. Iconicity is also present in the size of
the dots indicating perceptual prominence. Finally, in Figure 4.7d, there is a
transcription using ToBI notation (Tone and Break Indices; see Beckman and
Ayers 1994), perhaps the most influential system of the period since the late
1980s, particularly among American intonationists, which denotes phonologi-
cal categories in a segmental fashion. What happens between the symbols has
to be accounted for by ‘interpolation rules’, rather like coarticulation between
vowels and consonants.
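
The idea of interpolation rules can be illustrated with a toy calculation: F0 is specified only at the tonal targets, and everything between them is filled in. The target times and values below are invented, and real intonational phonologies posit more nuanced trajectories than the straight lines assumed here.

# Toy illustration of interpolation between tonal targets: F0 is specified
# only at the targets, and values in between are filled in by straight-line
# interpolation. Target times and values are invented for illustration.
import numpy as np

targets = [(200, 180.0),   # H*   (time in ms, F0 in Hz)
           (450, 150.0),   # !H*
           (800, 105.0),   # L*
           (950, 95.0)]    # L- L% at the phrase edge

times = np.array([t for t, _ in targets])
hz = np.array([f for _, f in targets])

for t in (300, 600, 900):
    print(f"{t} ms -> {np.interp(t, times, hz):.1f} Hz")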
If every detectable change in F0 direction were to be transcribed we would
not see the wood for the trees, but in order to separate the wood from the trees
we find ourselves applying phonological criteria to exclude phonetic detail.
For example, there are highly localised F0 perturbations at the edges of vowels
due to the voicing behaviours of adjacent consonants. Examples are marked
by arrows on Figure 4.7a, where slight raising is evident after voiceless /s/ and
/t/, and a slight dipping after voiced /ʤ/. Unless these effects are the focus of
interest, there is little point in transcribing them, but the decision not to do so
is a phonologically motivated one. Systems of intonation transcription therefore
tend to be phonological ones in which the denotata for the notation are pitch
patterns believed to be units in the phonology of the language being transcribed.
Intonation transcription is consequently much more language-specific than seg-
mental transcription, and this is particularly and explicitly the case with ToBI.
Once a set of pitch patterns has been decided on for a language, F0 contours can
be made to fit them by smoothing out local perturbations with overlaid ‘close-
copy stylisations’ or marking of ‘tone targets’ (see Beckman and Venditti 2010:
605, 619).
The most common way to transcribe rhythm is to mark accentual prominence,
although this may not be appropriate for all languages. What exactly underlies
our sense of rhythm in speech is still a matter of theoretical debate, the terms
of which are still largely shaped by the syllable-timed versus stress-timed
dichotomy formulated by Pike (1947: 13) and the attempts to escape the contro-
versy it has engendered, by finding some objective instrumental measure such
as in Ramus, Nespor and Mehler (1999) and the pairwise variability index (PVI)
presented in Grabe and Low (2002). Rhythmic beats are usually marked by [ˈ]
placed in front of the syllable judged to be carrying the beat, a practice found for
representing word-accent from early in the English lexicographic tradition. Silent
beats can be represented using the ExtIPA bracketed full stops: [(..)] symbolises
two silent beats.
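
To give a concrete sense of what an objective durational measure of rhythm looks like, the sketch below computes a normalised pairwise variability index over a list of successive interval durations, following the general form of the nPVI in Grabe and Low (2002): the mean, across successive pairs, of the absolute durational difference normalised by the pair’s mean duration, scaled by 100. The durations used are invented for illustration.

# Normalised pairwise variability index (nPVI): for each pair of successive
# interval durations, take the absolute difference divided by the pair's mean,
# then average over all pairs and scale by 100. Durations are invented.

def npvi(durations):
    pairs = list(zip(durations, durations[1:]))
    terms = [abs(a - b) / ((a + b) / 2) for a, b in pairs]
    return 100 * sum(terms) / len(terms)

vowel_durations_ms = [62, 145, 58, 170, 75, 160]    # alternating short/long
print(round(npvi(vowel_durations_ms), 1))           # high variability

even_durations_ms = [100, 105, 98, 102, 99, 103]    # near-isochronous
print(round(npvi(even_durations_ms), 1))            # low variability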


4.12 Instrument-Dependent and Instrument-Independent Transcriptions

The relationship between transcriptions and instrumental records is dealt with in
Chapter 6, but a short outline of the distinction between instrument-dependent
and instrument-independent transcriptions is useful in this chapter on transcrip-
tion typology. A transcription can be instrument-dependent in two ways, which
I shall call ‘instrument-determined’ and ‘instrument-informed’. By ‘instrument-
determined’ I mean specifically that the data for the transcription are the
instrumental records, not the speech which they are records of. An instrument-
determined transcription transcribes the original speech only indirectly, and
in fact does not require it to have been heard at all. An ‘instrument-informed’
transcription is one in which information from instrumental records is consulted
to help the transcriber make judgements about how to transcribe what has been
heard. By contrast to both types of instrument-dependent transcriptions, an
instrument-independent transcription, as the term suggests, is not derived at all
from information in instrumental records.
The different relationships between speech, instrumental records and tran-
scriptions are diagrammed in Figure 4.8.

(a) Instrument-determined transcription: speech → instrumental record → transcription

(b) Instrument-informed transcription: speech → transcription, with information
    from the instrumental record informing the transcription

(c) Instrument-independent transcription: speech → transcription

FIGURE 4.8: Relations between speech, instrumental records and
transcriptions in instrument-determined, instrument-informed and
instrument-independent transcriptions

4.13 Transcriptions as Performance Scores


If a transcription is read and pronounced, either aloud or silently, then it is func-
tioning as a performance score much like a musical score for a musician. The
transcription instructs the reader what speech sounds to produce. Transcriptions
in dictionaries, and language learning and teaching materials, can have this func-
tion. In these contexts the transcriptions will normally be generic ones represent-
ing typical or recommended pronunciations for the reader to practise. In speech
therapy a therapist might use a transcription to provide a client with a model
pronunciation to be practised, which could be of real words or nonsense words.

In the learning and teaching of practical phonetics a tutor will often compose
nonsense words in the form of general phonetic transcriptions for students to
perform by stringing together more or less arbitrary selections of symbols, for
example [ʁɛˈʈɯɸp’], [vʊˌliɻeɓæˈħam̥ ]. Although such a transcription is not
expressing an analysis of any actual pronunciation-form, whether specific or
generic, it is expressing an analysis of a potential pronunciation-form which is
actuated when performed. A nonsense word is a string of language-independent
general phonetic models functioning as instructions for the production of a
pronunciation-form which has no non-fortuitous connection to a lexical item.
Tutors may also create nonsense-word transcriptions as performance scores for
themselves to produce so that students can practise identifying and transcribing
the sounds. The students’ transcriptions of these productions are specific general
phonetic transcriptions.
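
Such performance scores can even be generated mechanically, by drawing symbols at random from a chosen inventory and slotting them into syllable templates. The sketch below is one way of doing this; the inventory, templates and stress placement are arbitrary choices made only for the sake of the example, not a principled phonetic sampling.

# Generate nonsense-word performance scores by slotting randomly chosen
# symbols into syllable templates. Inventory and templates are arbitrary.
import random

consonants = list("pbtdkgmnɸβszʃʒxɣlɻjw") + ["tʃ", "ɖ", "ħ", "ʕ"]
vowels = list("ieɛaɑɔouyøɯ")
templates = ["CV", "CVC", "CCV"]          # C = consonant slot, V = vowel slot

def nonsense_word(n_syllables=2):
    syllables = []
    for _ in range(n_syllables):
        template = random.choice(templates)
        syllables.append("".join(random.choice(consonants) if slot == "C"
                                 else random.choice(vowels)
                                 for slot in template))
    return "ˈ" + "".join(syllables)       # stress arbitrarily on the first syllable

print([nonsense_word() for _ in range(3)])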
Transcriptions as performance scores may have a role in some phonetic
fieldwork procedures. If the fieldworker has made a specific transcription of a
consultant’s pronunciation, he or she may wish to repeat it back to the same or
another consultant to check its accuracy.
A different kind of performance score will be part of any speech synthesis
system. The sounds to be synthesised have to have encoded instructions (Carlson
and Granström 2010: 783–4) which can be read by the system and converted into
synthetic speech.

4.13.1 Nonsense words

Nonsense words are expression-forms without content. If they comprise charac-
ters of written language – syllabograms or letters – then they are orthographic
nonsense words; if they comprise phonetic notation then they denote general
phonetic models functioning as performance scores for spoken nonsense words.
Because an orthographic nonsense word cannot be identified as a lexical item, its
pronunciation has to be gleaned from its spelling which then takes on the status
of a pseudo-transcription or a proto-transcription, depending on whether it is
denoting pre-theoretical or theoretical models.
Nonsense-word performance scores were first introduced into practical pho-
netics training by Jean Passy, the brother of Paul Passy, the main founder of the
International Phonetic Association (Collins and Mees 1999: 21), but nonsense
words have a much longer history than that. The honour of being the first ever
written nonsense words may go to what Baines (2004: 182) calls ‘pseudo-
writing’ found in Egyptian Early Dynastic inscriptions (early third millennium
bce). These are collections of consonantal signs that as far as is known do not
spell real words but had the function of displaying prestige in public places in a
society where literacy was the property of an elite few. We do not know if they
were intended to be pronounced, but they were at least potential performance
scores, although readers would have had to supply vowel sounds themselves as
they do in modern abjad writing systems.
The Greek Stoic grammarians undertook the distributional analysis of sounds
in Greek syllables. By manipulating consonants and vowels independently, they
constructed sound sequences that conformed to the phonotactic structures of

real Greek words but were not actual Greek words, for example βλίτυρι ‘blityri’
(Robins 1990: 28). More adventurously, they constructed sequences that did not
conform to Greek phonotactic patterns although still restricted to Greek conso-
nants and vowels. These activities are only possible if expression is separated
from content in both written and spoken language. By writing these nonsense
words, the grammarians were using Greek letters as a phonetic notation system
which at that time was already beginning to have some phonetic theory behind
it. Phonetic descriptions in the Téchnē Grammatiké, usually attributed to the
Alexandrian grammarian Dionysius Thrax (c. 100 bce), though not without
controversy (Matthews 1994: 67), show the ability to cross-classify sounds on
the basis of shared manner features such as aspiration and laminar airflow and to
give the features phonetically appropriate terms, although terms for the different
places of articulation were lacking (Allen 1981: 119–21). If we cannot accord
the Stoic nonsense words the full status of proper phonetic transcriptions on the
grounds that they only used notation resources supplied by their orthography, we
should recognise the presence of some phonetic theorising, which, by the crite-
rion advanced in Section 1.3, distinguishes between pseudo-transcription on the
one hand and proto- and proper transcription on the other. Their nonsense word
transcriptions may therefore be regarded as rudimentary proto-transcriptions.
Around the third century ce in China a procedure known as fǎnqiè (fǎn ‘oppo-
site’ + qiè ‘cut’) was developed, in which syllables were divided into initials
and finals with some characters corresponding to the one and some to the other;
initials are syllable-onset consonants and finals are the syllable rhyme and associ-
ated lexical tone (Halliday 1981: 130–8). DeFrancis (1989: 119) describes it as a
telescoping procedure, as if English cat were to be spelt by combining the char-
acter corresponding to the onset of cup with those corresponding to the rhyme of
rat to give <c(up r)at>. Tables were constructed with the initials on the y-axis and
the finals on the x-axis. In addition to specifying all the actually occurring syl-
lables of Chinese, the tables generated all the possible syllable-types and enabled
the distinction to be made between occurring and non-occurring syllables, thus
focusing attention on sounds as entities with an existence apart from their use
in the pronunciation of words by separating them off from lexis and grammar.
It also provided a pseudo-phonetic notation for writing foreign words, but it
was not widely used and was never standardised. Its significance in the history
of writing is that it is an example of language users deliberately manipulating
expression elements of written language as objects with an independent existence
outside of the written words they are normally used to spell.
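
DeFrancis’s telescoping analogy is easy to make concrete in code: take the onset of one word and the rhyme of another and splice them together. The sketch below does this for English spellings, using the crude assumption that everything before the first vowel letter is the onset; it illustrates only the combinatorial principle, not anything about Chinese syllable structure.

# A toy illustration of the fanqie 'telescoping' principle: spell a syllable by
# combining the onset of one word with the rhyme of another. The onset/rhyme
# split used here is a crude orthographic rule adopted only for illustration.

VOWELS = set("aeiou")

def split_onset_rhyme(word):
    for i, letter in enumerate(word):
        if letter in VOWELS:
            return word[:i], word[i:]
    return word, ""

def fanqie(onset_word, rhyme_word):
    onset, _ = split_onset_rhyme(onset_word)
    _, rhyme = split_onset_rhyme(rhyme_word)
    return onset + rhyme

print(fanqie("cup", "rat"))    # -> 'cat'
print(fanqie("dog", "pin"))    # -> 'din'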
An intriguing episode in the deliberate manipulation of expression elements to
create written forms lacking content is encountered in the innovative wordplay
of Virgilius Maro Grammaticus in the seventh century ce (Law 1997: 224–40).
He took the classical hyperbatonic devices of tmesis and synchysis, extending
them to strip words of their content through a procedure he called scinderatio
fonorum, itself manifesting a reverse tmetic construction from Latin scindere
‘to split apart’ and ratio ‘order’, plus his own coinage fonum, which his usage
shows meant a word-form as a construction of sounds – for the content aspect
of words he used the term verbum (Law 1997: 237–9). Tmesis is the division of
a compound into its morpholexical constituents, while synchysis is the random

reordering of words in a sentence. Both tropes take content items as their inputs.
Grammaticus, however, extended these to take expression items as input. He
then split them up and redistributed them in what Vineis and Maierú (1994: 164)
describe as ‘most unscrupulous manipulations of the signifier’. The Latin quan-
dolibet vestrum gero omni aevo affectum ‘I bear affection for you at all times’ is
scrambled into ge ves ro trum quando tum affec omni libet aevo (example from
Law 1995: 85). Scattered among preserved meaningful items such as quando and
aevo are meaningless items such as trum and ro. Because there are no such lexical
items, the only source of information about how these are to be pronounced is the
arrangement of the letters, which take on the status of pseudo-phonetic symbols,
and the spelling becomes a pseudo-transcription.
The eighth-century Middle Eastern grammarian and phonetician Al-Khalīl
derived nonsense words through an ‘anagrammatic method’ (Sara 2009: 1) in
which he took the consonantal roots of Arabic lexemes and rearranged them into
all their possible permutations, some of which were non-occurring in the lexicon.
An example is given in (4.14) on the root ‘d r s’ with semantic field glosses.

(4.14) √drs – studying
       dsr – caulking (archaic)
       rds – rolling sth. smooth
       rsd – *
       sdr – being perplexed
       srd – continuing without interruption

Al-Khalīl’s knowledge of the phonetic theory of the time, which he himself did
much to establish, and the writing of the nonsense words using only the charac-
ters of the Arabic abjad make them proto-phonetic nonsense-word transcriptions.
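
Al-Khalīl’s anagrammatic method is, in effect, an exhaustive permutation of the root consonants, which is trivial to reproduce. The sketch below generates the six orderings of a triliteral root and checks each against a toy lexicon that simply restates the glosses in (4.14); the lexicon is illustrative, not a dictionary of Arabic.

# Al-Khalil's anagrammatic method as exhaustive permutation of a triliteral
# root, checked against a toy lexicon restating the glosses in (4.14).
from itertools import permutations

attested = {
    "drs": "studying",
    "dsr": "caulking (archaic)",
    "rds": "rolling sth. smooth",
    "sdr": "being perplexed",
    "srd": "continuing without interruption",
}

for perm in permutations("drs"):
    root = "".join(perm)
    print(root, "-", attested.get(root, "* (non-occurring)"))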

4.13.2 Transcriptions as prescriptive models

A transcription can function as a prescriptive model if it is said to represent
the correct or recommended pronunciation of an item. In practice, prescriptive
models are selected from descriptive models, but it is possible in principle for
someone to claim that a particular form of pronunciation is correct yet for nobody
ever to have pronounced the item like that. Something like this situation must
have obtained when Esperanto words were first created in the late 1870s. In such
cases a transcription begins as a string of general phonetic models, only becom-
ing descriptive phonetic models once the pronunciation has gained currency.
For transcriptions to function as prescriptive models there has to be some
distinction between pronunciations that are correct and incorrect, or standard and
non-standard, or recommended and not recommended. Prescriptive transcriptions
are aimed at readers who either want to know what the correct or standard pro-
nunciation is as a matter of interest, or want to acquire it as their own pronuncia-
tion. Dictionaries and language learning and teaching texts are the most obvious
contexts in which transcriptions will function as prescriptive models, some
pronouncing dictionaries distinguishing in various ways between recommended,
non-recommended and ‘incorrect’ but occurring pronunciations. The Longman

Pronunciation Dictionary (Wells 2008), for example, prints recommended forms
in light blue, with non-recommended but not incorrect forms in black, and scares readers
away from incorrect forms with a warning triangle. Speech therapy is another
context in which prescriptive models are used. However, it should be noted that
the question of what counts as a ‘correct’ pronunciation in speech therapy, and
thus what should function as a prescriptive model, will often be different from
a ‘standard’ pronunciation (Docherty and Khattab 2008: 612) when the client’s
social environment is taken into account.

4.13.3 Spelling pronunciation

If spelling is used as a source of information about pronunciation rather than
as the clothing by which written words are recognised, it can lead to spell-
ing pronunciation. When this happens, spellings become pseudo-transcriptions
functioning as prescriptive models. Pronunciation can change lastingly because
of this. For example, many written English words with initial <h-> entered the
language from French Latin-derived words which had never had a corresponding
/h-/ since Classical Latin times. The presence of <h-> in the written forms of such
words as hotel, hospital, herb prompted English speakers in the late eighteenth
century, who wished to dissociate themselves from the perceived vulgarity of
H-dropping, to restore /h-/ hyper-correctively in the spoken forms (Beal 1999:
171–4; see also Scragg 1974: 41). In some varieties of modern British English
a similar trend is being noted in the reversal of H-dropping, particularly among
younger female speakers (Stoddart, Upton and Widdowson 1999: 76; Williams
and Kerswill 1999: 158). Some of the etymological spellings resulted in lexically
specific pronunciation changes. For example, the <l> in fault was introduced to
the French loan faute to show its historical connection to Latin fallitus (past par-
ticiple of fallere ‘to fail’). Speakers then took the <l> as an instruction to insert
an /l/, giving us the modern standard pronunciation.
Taking spellings as prescriptive models for pronunciation was at the heart of
the Carolingian reparatio reforms in Roman Catholic western Europe overseen
by Alcuin of York in the late eighth and early ninth centuries ce, under the direc-
tion of Charlemagne. Latin no longer had any native speakers and had undergone
divergent linguistic changes in different parts of Europe where Romance dialects
had taken root, eventually to become national languages during the course of the
Renaissance. The break-up of Latin left it as a liturgical and learned language
whose pronunciation was at the mercy of local vernaculars, a state of affairs that
worried Charlemagne. Alcuin’s remedy was to instigate an ad litteras policy to
preserve ‘correct’ Latin pronunciation in order to stave off corrupting vernacu-
lar influences. The ad litteras policy meant pronouncing Latin ‘to the letters’
such that written Latin words became pseudo-transcriptions with the function of
prescriptive performance scores – in Coulmas’s words, ‘[t]he image became the
model’ (Coulmas 2003: 97). Insistence that Latin should be spoken to reflect its
spelling marked a reversal of the principle, articulated by Quintilian in the first
century ce, that Latin should be written to reflect its pronunciation, a clearly
Aristotelian view of the relationship between writing and speech, and itself a
call to use letters as pseudo-phonetic notation. It is as if the Romans left accurate

segmental pseudo-transcriptions of ‘correct’ Latin for Alcuin to use as prescrip-
tive models for non-native speakers.
Alcuin’s influence can be seen physically in modern printed and word-
processed phonetic transcriptions using fonts based on Times Roman type-
faces. Times Roman is partly modelled on the shapes and proportions of the
Carolingian minuscule script introduced under Alcuin’s direction to be a standard
bookhand for scribes.

4.13.4 Active and passive readings of transcriptions

A transcription can be read actively or passively. By passive reading I mean that
a transcription conveys information to readers which they can store as passive
knowledge. A reading is active when that information affects users’ speech
through a transcription functioning as a prescriptive model or as a performance
score. To exemplify the distinction, consider [tʰeɪbəɫ] as a transcription of
English table. We use this information passively if we simply register the fact
that this is how many people pronounce the word. We use it as a prescriptive
model if we decide that we should pronounce it this way, and we use it as a per-
formance score if we rehearse the pronunciation that it represents, either silently
or aloud.

4.14 Third Party Transcriptions


There is not a lot of point in transcriptions if other people cannot use them as a
source of knowledge about the data which have been transcribed. However, tran-
scriptions are usually made with specific aims and purposes, so that it may not
always be appropriate to draw conclusions about one kind of phenomenon from
a transcription which was made in order to analyse another kind. For example, if
a transcription incorporating some representation of pitch and rhythmic structure
was originally made in order to analyse vowel qualities in different prosodic
contexts, it might be the case that a broader transcription of pitch and rhythm
was deemed sufficient alongside a narrower transcription of vowel qualities. It
would then be inappropriate to base a detailed analysis of pitch and rhythm on
that transcription instead of returning to the original recordings to make a closer
analysis of those aspects. This point echoes the concern of Docherty and Foulkes
(2000: 112–17), who cast some doubts on the use of third party transcriptions
in phonological analysis and theorising when the circumstances surrounding the
transcriptions are not known. In any use of third party transcriptions, account
must be taken of the extent to which the transcriber’s methods and focus of inter-
est match one’s own. When presenting one’s own transcriptions it is therefore
helpful to others if the methods and aims are made as explicit as possible.

4.15 Laying Out Transcriptions


Ball and Local (1996: 69–70) distinguish between transcriptions in working
records and what they call ‘presentation’ transcriptions. The former are for the
use of the researcher and colleagues and will be likely to be very detailed. These

transcriptions will have been made before further analysis, for example phono-
logical analysis, has been carried out, and may have been revised and updated as
more data are listened to. There may be rough explanatory notes and alternative
transcriptions yet to be decided upon. By contrast, a presentation transcription is a
finished product ready for inclusion in a publication or presentation. In preparing a
presentation transcription, decisions will have to be made about what to include and
what to exclude, and how to lay it out. Account should be taken of who is likely to
read it and what we want to tell them about the speech or the speaker (Heselwood
and Howard 2008: 392). It can be distracting if there is more detail than is required
to illustrate the points being made or to support the overall analysis.
If a transcript of a conversation is presented, thought should be given to how
layout might imply certain relationships between the interlocutors. Ochs (1979)
makes the point that readers tend to see a dialogue in terms of the first speaker
being active and directing, the second being passive and responding. If the focus
is on one speaker whose utterances are given in phonetic transcription while the
other speaker’s utterances are given orthographically, Bucholtz (2000: 1453)
warns that this might stigmatise one speaker as ‘other’, an issue to be aware of
in clinical and legal contexts.
It is usual in transcriptions of more than a single word to leave word-spaces,
although they have no phonetic reality in fluent speech. From a theoretical point
of view, word-spaces belong to spelling not to phonetic transcription, but tran-
scribers have not tended to agree with Sweet (1877: 108) that ‘[w]ord-division is
perfectly useless to those readers who are practically familiar with the particular
language’. Long uninterrupted strings of symbols are not easy on the eye and
are harder to process, the more so the more diacritics they have. Compare, for
example, the transcriptions in (4.15) where (a) has no word-spaces and (b) has
double word-spaces.

(4.15) a) ðəɹʷʊ̜kswə̃meɪkɪ̃ŋə̃nɔ̜fɫkəkʰɒ̜fə̃ni ̽
b) ðə ɹʷʊ̜ks wə̃ meɪkɪ̃ŋ ə̃n ɔ̜fɫ kəkʰɒ̜fə̞ni̽

Nevertheless, there are times when word-spaces are not possible or not appro-
priate. If the language being transcribed is unknown to the transcriber, then
word boundaries will not be identifiable without further investigation, so whole
stretches between pauses will have to be transcribed without word-spaces. When
there is coalescence across a word boundary it may be difficult and inappropriate
to split the coalescence with a word-space. For example, a coalesced pronuncia-
tion of the English phrase but you did should be transcribed as in (4.16a) rather
than as in (4.16b).

(4.16) (a) [bəʧə ˋdɪd]


(b) [bət ʃə ˋdɪd]

General phonetic conventions suggest an interpretation of (b) in which the
dynamics of the [t ʃ] sequence are different from [ʧ] in terms of durations and
amplitude rise times, with significant implications for auditory quality, and might
lead to (b) being lexicalised as but she did.

As well as using double word-spacing, it is advisable to leave at least one clear
line between each line of transcription, and in general to avoid an impression of
overcrowdedness while still maintaining textual cohesion. In multi-tiered and
multilayered transcriptions it is important that each tier and layer is easy to read
but also that it is clear how the tiers and layers relate to each other.

Note
1. Automatic pitch extraction is not without its own problems of validity; see Johnson
(2003: 31).

5
Narrow Impressionistic Phonetic Transcription

5.0 Introduction
In this chapter I shall argue that the value of narrow impressionistic phonetic
transcription is that it is a method for representing in proper phonetic notation an
analysis of what speech sounds like to a phonetically trained listener. Ezra Pound
strove in his imagist doctrine for a poetic language which would be an accurate
objective expression of subjective experience (Moody 2007: 226), and that
captures fairly well what impressionistic phonetic transcription tries to achieve.
What we are doing when we make an impressionistic analysis of speech is trying
to express holistically experienced exemplars as realisations of the products of
category intersections by exploiting the relation between a theoretical model
and a descriptive model. It is the categories of the theoretical models that confer
some measure of objectivity. The centre of attention in impressionistic analysis
is on sound-as-heard, not sound-as-produced or sound-as-transmitted. In fact in
Section 5.1 sound for our purposes is defined as something which exists only
in the experience of hearing it. Experiences cannot be measured in the way that
articulatory gestures and pressure-waves can, but what experienced phenomena
and measured phenomena have in common is that they are amenable to analysis –
not into ontologically the same kinds of constituents, but an analytic approach is
possible in both cases, guided by the same body of phonetic theory.
Narrow impressionistic phonetic transcriptions are specific transcriptions.
They take as their raw data particular utterances of individual speakers. The aim
may be to focus on a particular utterance, as in the context of forensic phonetics
and conversation analysis, or on that individual, as in a clinical context, or the
focus of interest may be wider, inferring the speech patterns of a speech commu-
nity from the utterances of one or two speakers. Whatever the aim, it is a complex
process requiring, in the words of Abraham Tucker, the eighteenth-century
author of Vocal Sounds, ‘constant close attention’ in which the transcriber
faces ‘a continual hazard of blunders’ (quoted in Abercrombie 1948/1965: 63).
In order to try to understand the process, it can be broken down into different
stages (Knight 2011). First of all, speech has to be heard. The process of hearing
can be divided into reception and perception. Reception is carried out by the
peripheral auditory system, and the subcortical and lower cortical structures
of the central auditory system, with perception the responsibility of the higher
cortical levels. Once we are conscious of perceiving speech, we must then make
judgements about its phonetic properties by bringing to bear our knowledge of
phonetic theory together with our experience of hearing and producing speech as
phoneticians, before selecting symbols to express those judgements.
In this chapter we will start by looking at how the auditory system mediates
between acoustic pressure-waves in the physical world on the one hand, and our
brains and minds on the other. In other words, we need to try to understand how
sounds get into our heads. We will then consider how best to regard speech sounds
from the point of view of their being objects of perception and consciousness,
before tackling the issues that make narrow impressionistic phonetic transcription
a challenging and somewhat controversial practice in an age where instrumental
analyses of speech are increasingly dominant in speech research of all kinds.
Before looking at how the auditory-perceptual system mediates between
the physical world and consciousness, some conceptual and terminological
distinctions need to be made.

5.1 Pressure-Waves, Auditory Events and Sounds


Perhaps the most important concept and term is that of sound itself. There is a
puzzle which asks whether a tree crashing to the ground in a remote uninhabited
forest makes a sound. If we use the term ‘sound’ to refer to the pressure-waves
created by the impact, then the answer is of course ‘yes’. But if by ‘sound’ we
mean conscious awareness of an auditory sensation, then the answer is ‘no’
because there are no auditory sensations going on if there is nobody around to
have them. I shall reserve the term ‘sound’ for conscious awareness of an audi-
tory sensation caused by response to external pressure-waves. The term ‘auditory
events’ I shall use for what goes on in the auditory system of which we are not
consciously aware when we process pressure-wave stimuli. Table 5.1 sum-
marises these distinctions, between which there are cause-and-effect relations
such that pressure-waves cause auditory events which in turn cause sounds to be
heard. The crucial point is that the term ‘sound’ applies only to what goes on in
conscious experience, not to the events that lead up to and cause that experience.

TABLE 5.1: Pressure-waves, auditory events and sounds

                  Pressure-waves           Auditory events                Sounds as perceptual objects
  Location        Ambient medium, e.g.     Peripheral and central         Consciousness
                  surrounding air          auditory systems
  Key properties  Frequency                Transductions of pressure-     Pitch and timbre
                  Amplitude                wave properties into           Loudness
                  Duration                 properties of sounds           Length

The example of the tree in the forest shows that, by the scheme in Table 5.1,
there can be pressure-waves without there being associated auditory events or
sounds, but it is also quite possible for there to be auditory events and sounds
without pressure-waves in an ambient medium to cause them. In fact it is impos-
sible for normally hearing people’s auditory systems to be inactive whilst awake.
The American composer John Cage, whose piece ‘4 minutes 33 seconds’ is
famous for consisting only of ‘silence’, once went into an anechoic chamber, a
room built to have no ambient or reflected pressure-waves, in order to experience
complete silence. When he came out he said he was surprised because he had
heard two continuous pitches: a high pitch and a low pitch. The person in charge
of the chamber explained that the high pitch was caused by his nervous system
and the low pitch was the sound of his blood circulating (Sorensen 2009: 143).
The irritating, and in extreme cases debilitating, condition known as tinnitus arises
when there is abnormal activity somewhere in the auditory system, often of a vas-
cular nature, which causes the person to hear a ringing sound. Possibly related
to tinnitus, though a clear connection has not been established, are spontaneous
oto-acoustic emissions, which occur in over half the normal hearing population.
These originate from outer hair cell activity in the inner ear and create very weak
pressure-waves in the external meatus (ear canal) (Zurek 1981), thus reversing the
usual direction of cause and effect between pressure-waves and auditory events.
Oto-acoustic emissions can also be evoked by the presence or immediate after-
math of pressure-wave stimuli, often called ‘echoes’ because they sound similar
to the stimulus sound. There can furthermore be components in a sound which
have no direct corresponding acoustic correlate although they are systematically
caused by properties of an acoustic stimulus. Combination tones are an example,
and are part of the auditory system’s response to tones which are present in the
acoustic signal. We shall return to this phenomenon in Section 5.6 below.
Auditory hallucinations can be experienced in which people report hearing
voices speaking inside their heads, an experience often associated with schizo-
phrenia. In this situation, there are no pressure-waves and there may be no
auditory events, at least not at the subcortical levels. Unless otherwise stated,
when I refer to sounds I will mean conscious awareness of sound as a perceptual
object caused by pressure-waves, mediated by auditory events and available as
an object of attention about which judgements can be made, including phonetic
judgements.
Because of the crucial role of auditory processing and conscious aware-
ness, the analysis which impressionistic transcription expresses can be termed
‘auditory-perceptual analysis’ providing we remember that attention has to be
directed at the perceptual objects before phonetic judgements can be made; per-
ceptual processes do not themselves carry out phonetic analysis. In cases where
the transcriber can also see the speaker’s face, for example a video recording
with a synchronised soundtrack, we can call it ‘audiovisual perceptual analysis’.

5.2 The Auditory System and Auditory Perception of Speech


As everyone knows, we hear with our ears. Although information entering
through other senses such as vision can heavily influence what we hear, as the

McGurk effect demonstrates (McGurk and MacDonald 1976), it is the reception,

processing and perception of speech by the auditory system which we will focus
on in this section because all impressionistic transcription, by definition, involves
listening to speech in order to analyse and transcribe it. Because speech, as the
expression of linguistic form, is spoken to be heard and perceived, Heselwood
and Howard (2008: 382) have described the auditory system in the context of
phonetic analysis as ‘a perceptual tool exactly tailored to the natural conditions
of the phenomena we wish to investigate’. That is to say, when we listen as tran-
scribers we are using the same apparatus as when we listen as users of spoken
language and, other conditions being equal (though often they are not equal; see
Section 5.11 below), will be sensitive to the same differences in sound quality.
For this reason, the human auditory system has been called ‘the best normalisa-
tion system yet developed’ (Foulkes, Scobbie and Watt 2010: 730), meaning
that linguistically irrelevant acoustic differences in parameters such as VOT, F0,
formant frequencies, duration, speech rate and so on are smoothed out when a
listener makes an auditory judgement as to whether two vowel tokens count as
the ‘same’ vowel, and should therefore be represented by the same symbol in a
transcription. It is interesting in this context to note that manufacturers of hi-fi
equipment use human listeners as the final arbiters of sound quality in their prod-
ucts. The auditory system therefore has an absolutely crucial role in impressionis-
tic transcription, being the system which transforms external pressure-waves into
neural signals, which in turn are transformed into sounds as particular objects
of consciousness. It is the analysis of these objects of consciousness which we
record in a transcription. How these objects of consciousness arise may help us
understand something of the complex nature of the relationship between a tran-
scription and what it purportedly represents.
The auditory system as a whole can be divided into peripheral and central
systems. The peripheral system comprises the structures of the outer, middle
and inner ear, while the central system contains the subcortical neural pathways
linking the ear to the auditory cortex, and the auditory cortex itself, which is
located in the temporal lobes of the brain. For most people, heard speech is pro-
cessed in the auditory cortex in the left temporal lobe, which is contralaterally
related to the right ear. Speech entering the left ear is first processed by the cortex
in the right temporal lobe and then integrated with the information in the left tem-
poral lobe after passing through the corpus callosum, the bridge between the two
hemispheres of the brain. Experiments have shown a ‘right-ear advantage’ effect
in listening to speech, which is explained by the contralateral connection to the
auditory cortex in the left temporal lobe (Bryden 1988; Franc and Styne 1991).
The peripheral auditory system takes in an acoustic signal and converts it into
an electrical signal via the mechanics of the middle ear and the fluid and mem-
brane oscillations of the inner ear. Unlike a good-quality microphone, however,
the ear is not a high-fidelity instrument. During the process of conversion, or
transduction, there is a certain amount of reshaping of the sound due to the prop-
erties of the structures involved. These changes mean that the signal reaching
the brain is not quite the same as the same signal reaching an acoustic analysis
system designed to produce spectrograms and waveforms. We should therefore
not necessarily expect judgements about sounds based on auditory analysis

always to be consistent with judgements made by looking at spectrograms and
waveforms of the same input signals.
The central auditory system processes the output of the peripheral auditory
system and takes it up through the brainstem to the auditory cortex, fashioning
it into something the listener becomes aware of, which can be referred to as an
‘auditory-perceptual object’, or ‘auditory percept’. During this journey, the lis-
tener is not aware of the auditory events prior to the formation of the percept, but
many events take place. Electrical impulses leave the cochlea along the auditory
nerve fibres (ANFs) in a bundle which is part of the VIIIth (vestibulocochlear)
cranial nerve; cranial nerves are ones that enter the brainstem directly rather
than via the spinal cord. The VIIIth is classified as a ‘special somatic afferent
nerve’, which means it carries information relating to one of the ‘special’ senses
(hearing), from a sensory receptor (somatic, in this case the ear) inwards to the
brain (afferent). There also are some efferent fibres which carry information out-
wards from the brain to the cochlea, which are implicated in sharpening sudden
responses to frequencies by inhibiting neighbouring frequencies – a bit like
preventing red paint from running into adjacent yellow paint so as to preserve
the distinctness of the colours. ANFs carrying low-frequency information, from
the apical end of the basilar membrane, form the centre of the bundle exiting the
cochlea, with those on the outside carrying the high-frequency information. The
general principle by which the VIIIth cranial nerve encodes auditory information
is that frequency is related to where in the bundle the active fibres are situated,
while the intensity of a sound is related to the rate at which the neurons fire, and
probably also to the timing of the firings. Not all ANFs reach the auditory corti-
ces. Some terminate at the various synapses between the cochlea and the brain.
Quite how speech is processed as it travels up towards the auditory cortex
is not well understood, but it is generally accepted that speech and non-speech
stimuli are processed by the same subcortical structures, and enter the auditory
cortex at the same points, that is to say there is no specialisation of speech pro-
cessing at these levels (Bernstein 2005: 80). One thing we do know is that, at the
lower cortical levels, processing takes place with smaller inputs in terms of fre-
quency ranges and time windows, and at higher levels there is more integration
as time windows increase markedly (Kluender, Coady and Kiefte 2003: 66–7).
The effect is to smooth out small spectral irregularities, ‘cleaning’ the stimuli as
they approach the moment of perception.
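
By way of a loose numerical analogy only, and not as a model of cortical processing, the short Python sketch below shows how averaging over progressively wider windows irons out small irregularities in a series of values, in the spirit of the integration just described; the window sizes and the values themselves are invented.

```python
# Toy numerical analogy only: averaging over progressively wider windows
# smooths away small irregularities in a series of values, loosely echoing
# the increasing integration described above. All values are invented.

def smooth(values, window):
    """Average each point with its neighbours over the given window size."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

jagged = [10, 12, 9, 30, 11, 10, 13, 9, 28, 12, 11, 10]  # small 'spikes' at 30 and 28

for window in (1, 3, 5):  # progressively wider integration windows
    print(window, [round(v, 1) for v in smooth(jagged, window)])
```
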
Properties of the auditory system determine the auditory response area of
human hearing, defined as the range of frequencies in pressure-waves which can
be detected and processed. These properties also determine the level of inten-
sity each frequency must attain in order to be detected and perceived as sound.
At very high intensities, pressure-waves are not perceived as sound but felt as
pain. Figure 5.1 shows the auditory response area bounded at the lower edge by
the minimum audibility curve and at the upper edge by the pain threshold. The
shape of the auditory response area can be thought of as describing the absolute
transfer function of the auditory system as it mediates between pressure-waves
in the external environment and the consciousness of the listener. That is to say,
for human beings, by definition there cannot be sounds caused by pressure-waves
with properties outside the auditory response area.

FIGURE 5.1: The human auditory response area, bounded below by the threshold of hearing (the minimum audibility curve) and above by the pain threshold. [Axes: sound level in dB SPL (0 dB = 20 µPa), from 0 to 130 dB, against frequency in kHz, from 0.125 to 16 kHz.]

An important feature of the minimum audibility curve is that it is not a straight
line. It shows that our hearing is more sensitive to frequencies in the approximate
range 500–4,000 Hz than to other frequencies. The importance of this is that this
range contains virtually all the frequencies which are required for transmitting
speech, and is about the range carried by most telephone systems. Although it
does not include the range in which are found the fundamental frequencies (F0)
of periodic sounds, which lie typically between c. 90 and 350 Hz depending on
age and gender, listeners are adept at extracting pitch information from harmonic
structure. The auditory system computes the frequency difference between adja-
cent harmonics which is equal to the F0; pitch perceived in the absence of F0 is
known as residue pitch, but it seems that even when F0 is present in a complex
tone what we perceive is actually a residue pitch (Moore 1997: 188–9), most
probably computed from harmonics in the high-sensitivity range of the response
area.
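
The arithmetic behind residue pitch can be illustrated with a minimal sketch which simply takes the spacing between adjacent harmonics as an estimate of F0; the harmonic frequencies below are invented, and real pitch extraction by the auditory system is of course far more complex.

```python
# Illustrative sketch of residue pitch: F0 estimated from the spacing between
# adjacent harmonics. Harmonic values are invented; real pitch extraction by
# the auditory system is far more complex.

def residue_pitch(harmonics_hz):
    """Estimate F0 as the mean spacing between adjacent harmonics."""
    spacings = [h2 - h1 for h1, h2 in zip(harmonics_hz, harmonics_hz[1:])]
    return sum(spacings) / len(spacings)

# Harmonics 8-12 of a 200 Hz voice, i.e. components lying in the high-
# sensitivity 500-4,000 Hz range, with the fundamental itself absent.
harmonics = [1600, 1800, 2000, 2200, 2400]
print(residue_pitch(harmonics))  # -> 200.0, the perceived (residue) pitch
```
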
Once signals have reached the brain, they are processed in a hierarchy of
synaptic levels, the auditory cortex being the lowest level of the hierarchy. The
first three levels seem, from experimental evidence, to be unisensory, dealing
only with auditory input and probably not distinguishing between speech and
other signals arriving via the auditory nerve; however, certain phonetically and
phonologically important events are represented at these lower levels, such as
VOT intervals (Bernstein 2005: 81–2), and normalisation processes may also
take place here (McQueen and Cutler 2010: 501–3). The governing pattern
appears to be that as the signal proceeds higher up the cortical hierarchy, the
processing becomes more oriented to phonetic information rather than general
auditory information, paying more attention to properties relevant to speech, but
the mixture of auditory and phonetic processing in the lower stages may be what
enables listeners to be simultaneously aware of linguistic and indexical percepts
in the same stimuli (Bernstein 2005: 86–7).

5.2.1 Just noticeable differences

Within the auditory response area, a crucial question is how sensitive the auditory
system is to changes and differences in pressure-wave stimuli. It is these relative
thresholds, or ‘just noticeable differences’, which express our ability to judge that
two sounds are the same or different.
A ‘just noticeable difference’ (JND), also called a ‘difference limen’ (DL), is
the ‘smallest detectable change in a stimulus’ (Moore 1997: 359). For sounds,
JNDs apply to the acoustic parameters of amplitude, frequency and duration.
In fact within a speech sound such as a vowel with complex spectra, amplitude
and frequency are really two sides of the same coin: frequencies are detected
because their amplitudes are higher than adjacent frequencies. JNDs vary across
the auditory response area, tending, other things being equal, to be smallest in
the region of greatest sensitivity. Experiments to determine JNDs have tradition-
ally used non-speech sounds (generally pure tones) and synthetic speech, so it
may be problematic to generalise the results to discrimination of real speech
sounds. Furthermore, for speech sounds, JNDs are also affected by the language
or languages the hearer speaks through the experiential process of attunement.
For these reasons it is difficult to be precise about JND values, but it is possible
to get some general indications from the experimental literature to help us under-
stand how perceptual constraints can limit the delicacy of narrow impressionistic
phonetic transcription.
For frequency, the JND is about a twelfth of a semitone, or one thirtieth of a
critical bandwidth (Howard and Angus 2001: 125). Linguistic-phonetic distinc-
tions based on pitch changes are very much larger than this, but F0 differences
distinguishing some individual speakers may not be.
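
To give a more concrete sense of the figure just cited, the following sketch converts a JND of one twelfth of a semitone into Hz at a few arbitrarily chosen frequencies.

```python
# Rough worked example: a JND of one twelfth of a semitone expressed in Hz
# at a few arbitrary frequencies.

SEMITONE = 2 ** (1 / 12)           # frequency ratio of one semitone
JND_RATIO = SEMITONE ** (1 / 12)   # one twelfth of a semitone

for f in (100, 250, 1000, 3000):   # Hz
    print(f, round(f * (JND_RATIO - 1), 2))   # JND in Hz at that frequency
```
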
Quené (2007) reports a JND for speech tempo differences of 5 per cent. For
segment durations, Bochner, Snell and MacKenzie (1988) found that normal-
hearing listeners had JNDs of 10–15 per cent while hearing-impaired listeners
needed differences of 15–30 per cent before they noticed them. Quantity distinc-
tions of vowel length in languages tend to operate with durational differences
in the range of 30–90 per cent (Lehiste 1970: 34), while geminate consonants
are anything between 1.5 and 3 times as long as their singleton counterparts
(Ladefoged and Maddieson 1996: 92).
JNDs for vowel formants, the major determinants of vowel quality, are more
difficult to assess and vary considerably across individuals. Listeners respond
differently to spectral changes in vowels presented in isolation and to vowels
flanked by consonants, probably due to the presence of formant transitions.
Using synthetic vowels, Fry, Abramson, Eimas and Liberman (1962) found that
in the latter condition, JNDs sharpen at phoneme boundaries whereas in the
former condition they do not. Mermelstein (1978) replicated these results and
also noted that JNDs depended on whether formant frequency was increased
or decreased, on how close a formant is to another formant, on how close a
harmonic is to the formant frequency, and on the duration of the vowel. The
results he obtained showed that, on average, the JND for F1 is 60 Hz, and for F2
is around 175 Hz.
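
As an illustration of how such figures might be used, the sketch below checks whether two invented vowel tokens differ by more than the average JNDs quoted above for F1, F2 and segment duration; the thresholds are those reported in the studies cited, but the comparison procedure is only a sketch and makes no claim about how listeners actually operate.

```python
# Illustrative check of whether two vowel tokens differ by more than the
# average JNDs quoted above (F1 ~60 Hz, F2 ~175 Hz, duration ~10-15%).
# The token measurements are invented; listeners vary considerably.

JND_F1_HZ = 60
JND_F2_HZ = 175
JND_DURATION = 0.125   # midpoint of the 10-15 per cent range

def discriminable(a, b):
    """Return, per parameter, whether the difference exceeds its JND."""
    mean_dur = (a["dur"] + b["dur"]) / 2
    return {
        "F1": abs(a["F1"] - b["F1"]) > JND_F1_HZ,
        "F2": abs(a["F2"] - b["F2"]) > JND_F2_HZ,
        "duration": abs(a["dur"] - b["dur"]) / mean_dur > JND_DURATION,
    }

token_a = {"F1": 500, "F2": 1500, "dur": 100}   # Hz, Hz, ms (invented)
token_b = {"F1": 540, "F2": 1700, "dur": 120}
print(discriminable(token_a, token_b))
# -> {'F1': False, 'F2': True, 'duration': True}
```
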

5.3 Perception of Speech

What JND studies confirm is Moore’s (2010: 481) point that the normally func-
tioning human auditory system can comfortably handle the kinds of linguisti-
cally important acoustic differences thrown at it by speakers. It does not have
to operate at near capacity except perhaps when attending to fine phonetic detail
in dialectological and sociophonetic studies, forensic phonetics and disordered
speech, all of which involve focusing on indexical information of one kind or
another in individual speakers. Put simply, our ears are easily up to the task of
perceiving speech even in quite poor channel conditions, though it would be
a mistake to assume that this entails that transcribers are easily up to the task
of making impressionistic transcriptions. The falsity of such an assumption
becomes clear if we conceptually separate perception of speech as manifesting
linguistic structure from perception of speech as phonetic structure. The goal of
the former is to identify words, phrases and sentences and is carried out by listen-
ers when comprehending the content of speech, whereas the goal of the latter is to
analyse the phonetic structure into component phonetic categories and is carried
out by phoneticians. Ability in the kind of perception necessary for comprehen-
sion of spoken language is achieved by all humans in the normal acquisition and
use of a language, but ability in the phonetic analysis of speech only comes with
specialist training and practice within a theoretical framework.
The speech perception literature focuses mostly on perception of speech
as spoken language and sees word recognition as the central problem to be
explained. Unfortunately there is some looseness not only of terminology but of
theoretical grip in much of the discussion. What is described as lexical informa-
tion is really phonetic or phonological information, a mistake which follows
from the assertion that words are composed of phonemes, an assertion made so
often in many linguistic contexts that I will not cite any particular ones. There are
three objections I wish to make to the assertion. Firstly, it is phonological forms
which are composed of phonemes, not lexical items. Words are abstract items
in grammar which are independent of, and neutral with respect to, both pronun-
ciation and spelling (see Chapter 1 Section 1.1.2). Homonyms clearly illustrate
this point: bat (flying mammal) and bat (in cricket or table-tennis) are different
lexical items but have the same phonological form /bat/ (and happen also to have
the same spelling <bat>). Information coming from pressure-waves through the
auditory system can only give rise to perception of sound, not of lexical items.
Awareness of lexical items from incoming speech involves cognitive processing,
where semantic, contextual and general knowledge play a role in, for example,
disambiguating homonyms (Swinney 1981) – we cannot tell which bat has been
said merely by perceiving its phonetic properties. Lexical items only exist in the
minds of speakers and listeners: the function of the speech signal is to ‘push the
right button’ in the mind of the listener and activate the grammatical and seman-
tic information in their mental lexicon.
Secondly, and related to the first point, phonemes are devoid of meaning,
and it is logically incoherent to assert that any combination of meaningless ele-
ments can create meaning. That is to say, a meaningful lexical item is not itself
a function of combining meaningless elements. Such a combination can express
a lexical item, and in cases like /bat/ can express more than one, but it cannot be
a lexical item. If combinations of phonemes created lexical items, then a phono-
logical form such as English /klamp/ would contain the words clam, lamb, lamp,
am, amp in addition to clamp, and a meaningless combination such as /plɛm/
would by definition be a lexical item.
Thirdly, and most crucially, the concept of the phoneme is a model in phono-
logical theory which cannot be assumed to be a unit of perception in listeners,
or even a unit of any kind in any behaviours associated with what speakers and
listeners do with spoken language. The assumption that units found to be useful
for the analysis of language must also be the units that speakers and listeners
operate with is unwarranted. We should only accept that phonemes are perceptual
units if there is experimental evidence to support this view. Quite to the contrary,
however, experiments show that listeners can identify whole syllables faster than
any of the phonemes supposedly contained in them, only identifying individual
consonants and vowels by a subsequent process of analysis (Warren 2008: 199).
A problem with the proposal that phonemes are units of perception can be
illustrated by variant pronunciations of the same word involving radically dif-
ferent allophones. Amongst other possibilities, bat can be pronounced [bat] or
[baʔ] in many varieties of English. If a listener takes them to be equivalent, it
is because of a lexical identity judgement rather than a phoneme identity judge-
ment. If attention is directed to what they sound like, they are likely to be judged
as different. Asked what the difference in sound is between English dogs and
docks, the /ɡ–k/ difference is much more likely to be picked out than the /z–s/
difference because of the lexical dog–dock distinction, whereas although /z/ and
/s/ are different phonemes they express the same morpheme in this context.
One of the tricky aspects of talking about perception is specifying at what
level of awareness a percept exists. For example, British English-speaking listen-
ers are not generally consciously aware of the ‘clear’ and ‘dark’ allophones of /l/,
which could be taken as evidence that the phoneme /l/ is their percept. However,
they can often spot an Australian or North American pronunciation of a word like
leave by the initial [ɫ], in which case [ɫ] must be a percept. But because they are
not attending to the quality of the /l/ they only notice it if it stands out as different
from their own. Phonetic training is partly about sharpening conscious aware-
ness of small differences in sounds which go unnoticed in most communicative
situations where the goal is comprehension of spoken language, and then being
able to give an account of the differences in terms of phonetic theory. That is to
say, it is about attending to the phonetic structure of speech and bringing it into a
higher level of conscious awareness where it can be subjected to analysis. For the
phonetician and the transcriber, the units of perception have to become units of
attention so that phonetic judgements can be made about them. Phonetic judge-
ments, then, come about through an interaction between the learned categories of
phonetic theory and the integrative processes in the auditory cortex which unify
disparate peripheral and subcortical auditory events into perceptual objects. This
interaction is under some degree of attentional control.
The postulation of auditory integration, however, can be problematic.
Integration of adjacent spectral components, which underlies perceptual scales
such as the Bark and ERB (equivalent rectangular bandwidth), is not problematic
because it can be explained by auditory filter bandwidths (Hayward 2000: 140–3)
and critical band theory (Howard and Angus 2001: 74–9). But explaining the inte-
gration of spectrally disparate and temporally distributed auditory components
into a unified percept without invoking some homunculus-like agent directing the
operation is much more difficult (Bernstein 2005: 90–1). A famous integration
problem is posed by the McGurk effect, in which auditory and visual information
combine to shape the final perception (McGurk and MacDonald 1976). Subjects
are presented with an acoustic signal such as the syllable-sequence [ba ba ba] and
a film or video of a person’s face articulating another syllable-sequence such as
[ɡa ɡa ɡa]. The two are presented synchronously so that it looks as if the person
is producing the heard syllable. Subjects report hearing [da da da] when they
look at the face, but [ba ba ba] when they shut their eyes. Visual information
thus radically influences the auditory perception, and carries on doing so even
when the subject knows that the acoustic signal is always [ba]. For those who
experience the integration, it is impossible to make themselves hear [ba] while
looking at the face. How or where the two sensory inputs combine is not fully
understood, but the fact that speakers of different languages in different cultures
do not all respond in the same way suggests that high-level cognitive processes
interact with the effect (Bernstein 2005: 90–2; Alsalmi in preparation). A ques-
tion posed by Bernstein, Auer and Moore (2004) is whether auditory and visual
information actually fuse into a single perceptual object which is neither solely
auditory nor solely visual, but is amodal (known as the ‘common format’ view),
or whether the sensory modalities remain distinct but associate through some
sort of cross-reference mechanism to determine how the multisensory stimuli are
perceived. These researchers conclude the latter scenario is correct, after consid-
ering evidence that auditory adaptation effects are not influenced by visual cues,
and that the McGurk effect can be overcome if the listener is familiar with the
speaker (ibid.: 207).
Perception of speech is made the easier, so it is often assumed, the more the
phonological distinctions of the language are realised with large articulatory and
acoustic differences. Languages tend to contrast vowel qualities which are fairly
evenly distributed through the vowel space rather than all crammed into one
corner of it, a principle known as dispersion (Hayward 2000: 170–2; Flemming
2002: 15–51; Raphael 2005: 156–8; Schwartz, Boë, Vallée and Abry 2007:
106–7). The dispersion principle is somewhat harder to apply to consonants,
though, as exemplified by the fricatives /f, θ, s, ʃ/ in English and their voiced
cognates /v, ð, z, ʒ/. They are crowded into the place-of-articulation region
between labiodental and post-alveolar, making no use of the larger area from
palatal through to pharyngeal. Western Desert and Greenlandic distinguish dental
from alveolar plosives (Lass 1984: 148), as does New York English (Wells 1982:
515–16), instead of placing them further apart. Either these systems are unstable
and will change fairly quickly to be better suited to the perceptual abilities of
their speakers, or the concept of dispersion needs to be more subtle and sophis-
ticated to incorporate a richer model of phonetic space based on better knowl-
edge of what kinds of distinctions listeners can make and how they make them.
Confusion matrices, and perceptual maps constructed from them (for example
in Johnson 2003: 59–74), can provide a chart of phonetic space but they do not
identify the parameters and parameter-values constituting the fabric of that space.
They may tell us that [f] and [θ] are closer neighbours than [f] and [s], but not
why. And we should not simply assume that they would be closer in speakers of
a language that makes no phonological use of [s].
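
One crude way of making the notion of dispersion concrete is to compare the average separation of vowel qualities in the F1–F2 plane for a spread-out system and a crowded one. The sketch below does this with invented formant values and a simple Euclidean distance in Hz which, for the reasons just given, should not be mistaken for a claim about the right metric for perceptual space.

```python
from itertools import combinations

# Crude illustration of dispersion: mean pairwise distance between vowels in
# the F1-F2 plane. Formant values are invented, and Euclidean distance in Hz
# is used only for simplicity, not as a claim about perceptual distance.

def mean_pairwise_distance(vowels):
    dists = [((f1a - f1b) ** 2 + (f2a - f2b) ** 2) ** 0.5
             for (f1a, f2a), (f1b, f2b) in combinations(vowels.values(), 2)]
    return sum(dists) / len(dists)

dispersed = {"i": (300, 2300), "a": (750, 1300), "u": (320, 800)}   # spread out
crowded = {"i": (300, 2300), "e": (350, 2200), "y": (310, 2100)}    # one corner

print(round(mean_pairwise_distance(dispersed)))   # large mean separation
print(round(mean_pairwise_distance(crowded)))     # much smaller
```
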
Often discussed alongside dispersion is the notion of auditory enhancement.
If two or more independently controllable articulatory gestures contribute to pro-
ducing a particular auditory effect such that the effect would be less if only one
gesture were to be employed, then we can say that the other gestures enhance the
effect. An example is provided by the ‘emphatic’ consonants in Arabic, which
have a markedly ‘dark’ auditory quality associated with them, mostly manifested
in adjacent vowels. Pharyngeal narrowing, lip-protrusion and hollowing of the
tongue-body all contribute independently to creating a large mouth chamber,
which, with a reduced volume in the pharynx, has the acoustic effect of bringing
F1 and F2 close together, centred on 1 kHz. The resulting wide band of low-
frequency resonance causes auditory events which are perceived as a ‘dark’ or
‘heavy’ timbre compared to the ‘lighter’ timbre of vowels in which F1 and F2
are more widely separated (Khattab, Al-Tamimi and Heselwood 2006: 142–4).
Some applications of the notion of auditory enhancement take one feature to
be primary and the others secondary (for example, Stevens and Keyser 1989),
the latter adding a kind of top-up. But, after a decades-long search for invariant
properties in the speech signal, it is now generally accepted that it is rare to find a
phonetic feature whose presence is essential to the creation of a percept in speech
(Goldinger 1998: 251; Raphael 2005: 200), so it may be preferable to think in
terms of mutual enhancement rather than one feature being enhanced by other
less important ones. It is the dynamics of vocal tract configurations as a whole
which produce sounds, not just activity in one part.
Dispersion, auditory enhancement and the existence of multiple cues for pho-
nological contrasts have all been explained as being motivated by the need to
make speech easier to perceive and therefore easier to comprehend. In Lindblom’s
‘hyper- and hypo-’ theory (Lindblom 1990) this need is in conflict with the
speaker’s preference for pronunciations requiring less effort. ‘Hyperarticulation’
caters for the listeners’ preference for clarity, while ‘hypoarticulation’ concedes
to the preferences of speakers to do as little as possible. Hale and Reiss (2000:
180–1) have ridiculed this ‘Manichean’ battle of tendencies by arguing that the
observable facts of diachronic and synchronic pronunciation behaviour can be as
well accounted for by ‘dysfunctionalist’ principles, which make things more dif-
ficult for speakers and hearers, as by functionalist principles, which make things
easier for them. These authors certainly have a point when this is looked at from
a logical perspective, but it is altogether less plausible to explain hyperarticulated
speech as being motivated by the speaker’s desire to show how well he or she
can accomplish difficult tasks, or to explain hypoarticulation as speakers doing
their best to conceal their utterances from listeners (notwithstanding elocution
competitions and deliberate mumbling so as not to be understood).
The transition from auditory events to conscious awareness of speech sounds
is an area of research relying at least as much on intelligent informed specula-
tion as on empirical evidence, though increasingly there are experimental results
to draw on. Pressure-waves differ in their spectral and temporal characteristics,
and it is transductions of these characteristics which are responsible for how we
perceive differences between sounds. Though we do not fully understand how
these auditory events interact to create percepts, some aspects of this are none-
theless becoming clear as a result of research. For example, it is discontinuities
and abrupt changes in the acoustic structure of the pressure-waves which appear
to be the most important ‘landmarks’, as Stevens (2005) has called them, rather
than stable, steady-state portions of sound. Remez and Trout (2009: 243) claim
that it is the time-varying nature of stimuli, not the stimuli themselves, which is
‘critical for eliciting phonetic perception’. This view fits in with the particular
sensitivity of the auditory system to spectral changes (Moore 2010: 470). Recall
that the function of the efferent auditory nerve fibres is largely to suppress the
response of the peripheral system to adjacent frequencies in the cochlea prior to
their transmission up to the auditory cortex by the ANFs. This effect has been
found for time-varying stimuli such as formant transitions as well (Lacerda and
Moreira 1982: 93). The overall effect of these dynamic changes is to amplitude-
modulate the stream of speech at a rate of 3 or 4 Hz, which has been said to be
essential for speech perception (Delgutte 1997: 508). Kluender et al. (2003: 65)
conclude that it is a ‘fundamental principle that perceptual systems respond pri-
marily to change’, and cite evidence from several studies that this is true across
all sensory modalities, not just hearing. The paradox is that although the auditory
system seems to be always on the lookout for changes in the properties of the
signal, the perceptual objects which we become conscious of by directing our
attention at speech appear to have some local stability. It is as if our percepts are
fashioned by a mechanism which assumes stability until certain dynamic thresh-
olds are crossed. For example, we are not aware of all the moment-to-moment
changes in formant frequency values during a monophthongal vowel which can
easily be seen on a spectrogram. We are only aware of the vowel as an object
which is unchanging over its duration, or, to use terminology more appropriate
for perception, over its length. Nor are we aware of the formant transitions which
are a crucial ingredient in the perception of stop consonants; we are only aware
of the stop consonant as an internally unchanging object.
One of the most difficult challenges is to try to discover how listeners’ audi-
tory systems can sort through simultaneous and competing sounds in order to
group similar stimuli together and track them through time. The problem can be
illustrated by taking the example of an orchestra. Imagine listening to a record-
ing of a piece of orchestral music while you can also hear other sounds going
on around you at the same time, such as people talking, the bins being emptied
outside, traffic noise, a dog barking and such like. Not only is it possible to pick
out the music and follow it, but it is also possible to concentrate on listening to
the solo violin, or the flutes, or some other instruments, or to the bins if you so
wish. When we remember that the complex pressure-waves reaching our ear-
drums carry all these acoustic signals as one set of vibrations, the problem would
appear to be insoluble. One influential attempt to explain how this is possible is
known as ‘auditory scene analysis’ (Bregman 1990).
The basic principle of auditory scene analysis is that incoming signals are
streamed automatically during cortical processing so that component signals of
a similar type are allocated to the same stream. Conscious attention can then be
brought to bear to identify and classify the contents of the streams (Sussman
2005: 1287). This makes intuitive sense because of our awareness of the several
distinct kinds of sounds – the music, the bins clanging, the traffic – and our
awareness of the different instruments in the orchestra. It is easy to imagine these
various components being diverted into parallel streams so that we can decide
which stream we want to pay attention to, even if we cannot understand how
this is achieved. Remez (2005: 32–3) has objected to the notion of streaming
on the grounds that when we consider the acoustic structure of speech we find
radically different kinds of acoustic classes, as elaborated in Chapter 1 Section
1.2.1 (see Figure 1.5). The objection rests on an assumption that transients would
be streamed separately from aperiodic continuants, and periodic sounds would
be separate again. Depending on how finely the analysis were carried out, there
could be many streams for different kinds of subclasses. In these conditions there
would be no perception of speech as such, merely a clashing of noises as different
streams rush through the system. Listeners would not experience any unification
of these ‘noises’ into a structured whole. But this is precisely what listeners do
experience. On the basis of evidence from experiments using synthetic speech
composed just of sine-waves to represent formant frequencies, Remez (ibid.:
36–8) argues that listeners can and do integrate sounds, which they first experi-
ence as non-speech noises, to form analysable speech percepts once they are told
it is speech. Conscious deliberation and expectation therefore can play a role in
creating perceptions, as can be observed when dealing with unintelligible real
speech. Listeners sometimes report recordings of unintelligible speech becoming
intelligible if a plausible target utterance is suggested to them. The unintelligible
sounds suddenly seem to shape themselves into the suggested utterance. There
are constraints on the kind of stimuli components which can be integrated into
speech percepts, just as there are limits on what a listener can be persuaded to
hear in an unintelligible utterance. Stimuli must have a certain resemblance to
speech, and suggested targets must have a certain resemblance to what is being
listened to. What we do not know is how imprecise the resemblances can be yet
still work. Evidence from perception of degraded acoustic signals indicates that
robust phonetic percepts can arise from impoverished stimuli (Moore 2010: 481).
It is the rich redundancy of the signal in the auditory system which is responsible
for this robustness. There is more information carried to the auditory cortex than
is necessary. For example, closure duration, burst spectrum, VOT, F0, spectral
tilt, vowel duration, and F2 and F3 formant transitions can all provide informa-
tion assisting in perception of [k] in continuous speech, but they do not all have to
be present. Various subsets will suffice, and perception experiments rarely iden-
tify any parameter as absolutely essential for a certain perceptual object to come
into consciousness. To pursue the example of [k], the parameters listed above
involve quite different kinds of acoustic classes and events, yet somehow the lis-
tener experiences [k] without being aware of how all these parameters contribute
to the experience. In an auditory scene analysis account, they would belong to
different streams unless there are other factors at work. Bregman (1990: 529–94)
identifies several possible factors including rhythm, pitch and timbre and adduces
experimental evidence for them. Speech from the same speaker all comes from
the same noise sources and through the same vocal tract, which imposes its
own gross transfer functions on the acoustic signal, perceived as characteristic
timbres, or sound qualities. Speakers produce a fundamental frequency (F0)
which does not fluctuate erratically with sudden large jumps but tends to vary
along smooth trajectories. Speakers speak with a voice quality and a rhythmic
structure which are also not characterised by erratic shifts. These spectral and
temporal continuities may enable listeners to integrate what would otherwise be
disparate acoustic events into a fabric which is perceived as continuous speech
even when other sounds mask some of the speech or when a portion of recorded
speech has been spliced out and replaced with non-speech noise (Parker and
Diehl 1984; Warren 2008: 198). Sussman (2005: 1296) conjectures that during
processing ‘the integration mechanism is “searching” for elements that likely
form a perceptual unit’. We should not overlook possible top-down effects on
integration either. Listeners are, after all, most interested in what speakers are
saying rather than how they are saying it and so pay attention to semantic coher-
ence and discourse cohesion.
To sum up, there is much we do not know about how speech is processed
and perceived, but we do understand enough to be able to put forward a general
picture of how pressure-wave stimuli are transduced, so that information con-
tained in them can reach the brain and a listener become aware of hearing sounds
of different kinds and qualities.

5.4 Is Speech Processed Differently from Non-Speech Stimuli?

Speech and non-speech signals are processed by the same structures at least up
to the fourth level in the cortical hierarchy. Above that, there may be increasing
specialisation of processing reliant on distinguishing speech from other kinds
of acoustic stimuli. A question which has been central to this issue is whether
speech has properties that other acoustic stimuli do not share. If it does, then
the processing of those properties is speech-specific by definition. A number
of properties have been put forward as special to speech. One is that, presented
with continuous speech at a typical rate of 10–15 segments per second (Studdert-
Kennedy and Goldstein 2003: 238), listeners can recognise the successive
consonants and vowels with much greater accuracy than successive non-speech
noises of similar durations, frequencies and intensities presented at the same rate.
Non-speech noises at that rate seem to happen too quickly for our perceptions
to keep up (Mole 2009: 212). This ability may be explicable without invoking a
special speech mode of perception if we take into account two important factors.
One is that listeners must have mental representations of pronunciation-forms
which they can refer to, and which interact with properties of the signal in what
Lindblom (1990: 207–12) calls signal-complementary processing, whereas they
are unlikely to have representations of strings of arbitrary non-speech noises. The
concept of signal-complementary processing clearly has implications for tran-
scription because it predicts that transcribers who speak the language can have
different percepts from those who do not. This point is returned to in Section
5.11 below. The second factor is that information about speech segments is
distributed across much of the sequence, but this is not true of concatenations of
noises. Indeed, the very concept of a speech segment is, as discussed in Chapter
1 Section 1.2.1, problematic because of the syntagmatic distribution of informa-
tion. In comparing the ability of listeners to identify consonants and vowels in a
sequence with their ability to identify arbitrary sequences of non-speech noises,
like is not being compared with like. Both factors are likely to relate to process-
ing not at the lower subcortical levels of the central auditory system but at the
higher cortical levels, where differentiation between speech and non-speech is
thought to take place (Bernstein 2005: 84–7).
Other properties which have been deemed to evidence that speech is pro-
cessed differently are categorical perception, duplex perception and audiovisual
fusion. Categorical perception is contrasted with continuous perception and has
been claimed not only to be specific to the perception of consonants as opposed
to vowels, but also to be absent in the perception of non-speech sounds. The
clearest examples come from experiments which manipulate VOT values and
show that listeners are more sensitive to differences across a category boundary
than to differences within a category (see review in Liberman 1996), having
attuned their sensitivity during language acquisition (Vihman 1996: 73–97). If
a language has VOT values below, say, 20 ms for realisations of /b/ and above
20 ms for realisations of /p/ (much like English), then English-speaking listeners
will notice a marked difference between tokens having values of 15 and 25 ms,
but only a small difference, if any, between tokens with values of 5 and 15 ms,
or 25 and 35 ms, although the absolute differences are 10 ms in each case. Note
that this pattern cannot be explained by the general Weber–Fechner law, which
states that differences between quantities have to be larger with larger quantities
before they are perceived. For example, we can easily see that a 2 cm object is
bigger than a 1 cm object, but we cannot see that one building is 1 cm taller than
its neighbour. This law would predict that the 5–15 ms difference would be easier
to perceive than the 15–25 ms difference, but experiments have refuted such a
prediction. The claim that categorical perception is specific to the perception of
speech sounds in language-using humans has an obvious appeal in relation to
the categorical structure of phonological systems, but it has been undermined by
studies showing that the effect occurs with non-speech acoustic stimuli in adult
and infant humans and also in the responses of certain non-human mammals such
as chinchilla rodents and macaque monkeys (see review in Kuhl 1989). From
these and similar experimental studies, Hauser and Fitch (2003: 173) conclude
that ‘when a claim has been made that a particular mechanism X is special to
speech, animal studies have generally shown that the claim is false’. Only the
speech-specific nature of categorical perception is in doubt, however, not the
phenomenon itself. Electrophysiological studies using measures such as event-
related potentials (ERP) have shown categorical perception to be a robust audi-
tory processing strategy for VOT and place of articulation (Molfese, Fonaryova
Key, Maguire, Dove and Molfese 2005).
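
The classical ‘discrimination follows identification’ account of categorical perception can be sketched numerically. Assuming an identification function with a boundary at 20 ms VOT, as in the example above, and an invented slope, the sketch predicts that the 15–25 ms pair will be far more discriminable than the 5–15 ms or 25–35 ms pairs despite the equal 10 ms physical differences; it is illustrative only, not a model of any particular experiment.

```python
import math

# Sketch of the classical 'discrimination follows identification' account.
# The 20 ms VOT boundary comes from the example above; the slope of the
# identification function is invented for illustration.

BOUNDARY_MS = 20.0
SLOPE = 0.6   # invented; a steeper slope means a sharper category boundary

def p_voiceless(vot_ms):
    """Probability of labelling a token as /p/ rather than /b/."""
    return 1 / (1 + math.exp(-SLOPE * (vot_ms - BOUNDARY_MS)))

def predicted_discriminability(vot1, vot2):
    """Probability that the two tokens receive different labels."""
    p1, p2 = p_voiceless(vot1), p_voiceless(vot2)
    return p1 * (1 - p2) + p2 * (1 - p1)

for pair in ((5, 15), (15, 25), (25, 35)):   # each pair 10 ms apart
    print(pair, round(predicted_discriminability(*pair), 2))
# The cross-boundary pair (15, 25) comes out far more discriminable than the
# two within-category pairs, despite the equal physical difference.
```
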
Duplex perception is the phenomenon of hearing the same acoustic stimulus as
a non-speech sound in one context of listening but as part of the sound structure
of speech in another context. The now-famous example of this phenomenon was
presented in Liberman, Isenberg and Rakerd (1981). If the formant transitions
from a stop consonant into a vowel are cut from the signal and played into one
ear via headphones, listeners report hearing a brief ‘chirp’ type of non-speech
noise. When the same stimulus is accompanied by the rest of the vowel played
into the other ear temporally aligned to follow the transitions, listeners still hear
the ‘chirp’ but they also hear a CV syllable – /da/ or /ɡa/ depending on the direc-
tions of the transitions. Furthermore, as the transition directions are changed
to be appropriate for /d/ or /ɡ/, listeners respond in a continuous fashion to the
changing ‘chirp’ but in a categorical fashion to the shift from /d/ to /ɡ/ signalled
by those very same transitions. The switch from hearing a given stimulus as non-
speech to hearing it as speech was found by Remez using synthetic sine-waves
(see Section 5.3 above). The evidence seems strong that human listeners can
hear the same pressure-wave stimulus in two distinct modes, which have been
labelled ‘auditory’ for non-speech listening and ‘phonetic’ for speech listening
by Studdert-Kennedy (1982), who described them as ‘different, active, “atten-
tional” modes of scanning the signal for information’ (ibid.: 10). However, a
duplex perception effect was induced in listeners by Fowler and Rosenblum
(1990) using the noises of wooden and metal doors slamming, indicating that the
effect is not confined to perception of speech.
Audiovisual integration has been discussed in the speech literature mostly in
relation to the McGurk effect, explained in Section 5.3 above, but the importance
of visual information in speech perception has been demonstrated in a number of
other kinds of experiments (Massaro 2004) and the results have been placed in
the context of the multisensory nature of perception generally, not just of speech
(Calvert and Thesen 2004). Moreover, evidence for the influence of visual
information on auditory perception has been found outside of speech. Saldana
and Rosenblum (1993) present results of an experiment in which the same cello
notes were heard as plucked or bowed depending on what the listeners saw the
cellist do. If percepts resulting from audiovisual integration are not specific to
speech, it could still be true that the degree of integration, and the importance of
the phenomenon, might be. Until the inventions of telephony, radio transmission
and recording and their widespread use over the last century or so, speech almost
always took place, and mostly still does, in contexts where speaker and listener
are in visual contact and in close enough proximity that the much faster speed
of light over sound is not apparent. The potential importance for interpersonal
activity and social organisation of the communicative content of speech will
always have motivated listeners to use whatever perceptual cues are available in
order to comprehend the speaker, a task which includes accurate perception of
quite subtle distinctions of sound. These are the kinds of factors that might make
audiovisual integration of speech stimuli highly valued by humans (Mole 2009:
221–2), more so than other kinds of stimuli.
To conclude, at least provisionally, on the issue of whether there is a speech-
oriented phonetic mode of perception distinct from a non-speech-oriented audi-
tory mode, we can say that while all acoustic stimuli are conveyed to the auditory
cortex without differentiation, and are processed in the same ways at the lower
cortical levels, humans might pay a different level of attention to sounds which
they identify as belonging to speech. Of course this begs the question of how
they make this identification. An answer may lie in the importance of the ability
to identify members of our species and communities by their voices, and the
consequent tuning of our perceptual systems assisted by the fact that listeners are
also speakers producing the same kind of output. The streaming of the diverse
acoustic material of speech into a single coherent fabric, and the ease with which
this is done, may attest to a special attentional sensitivity to speech without
entailing a special mode of perception.

5.5 The Issue of Consistency


One of the most common arguments against the value of impressionistic tran-
scription is that it is subjective and therefore inherently unreliable and inconsist-
ent. It has often been observed not only that different transcribers come up with
different analyses of the same data (Shriberg and Lof 1991; Wester, Kessens,
Cucchiarini and Strik 2001), but that the same transcriber may express a different
analysis on subsequent occasions (Kerswill and Wright 1990). This inconsistency
of data analysis makes impressionistic transcription a non-scientific procedure,
a problem recognised in the formative years of the IPA. Its recognition moti-
vated the first two of the six principles governing phonetic notation published in
August 1888 in ðə fonetik tîtcər, the forerunner of Le Maître phonétique, which
later became the Journal of the International Phonetic Association. The first
principle states that there ‘should be a separate letter for each distinctive sound’,
and the second principle advocates representing ‘very similar shades of sound’
with the same symbol. These principles are phonological rather than purely
phonetic, as Ladefoged (1990: 338) has commented. The term ‘distinctive’
means both ‘auditorily distinct’ and having the attested function of distinguish-
ing lexical items in at least one language (Esling 2010: 687). The fear, shared by
Bloomfield and other American linguists, was that impressionistic transcription
would be undisciplined in its use of symbols and diacritics, varying too much
across transcribers. Science aims for consistency, and the test of consistency
is whether results are replicated when the same methods and procedures are
applied to the same phenomena. It is important to point out, however, that even
in instrumental phonetics total and absolute consistency is rarely achievable.
Although the same piece of equipment with the same settings will always yield
the same analysis of the same data, thus making instrumental analysis immune to
the intra-transcriber problem, different pieces of equipment can produce different
results, as can the same equipment with different settings. Foulkes et al. (2010:
370) warn that these variables can introduce non-trivial differences into one’s
results, for example giving different formant frequency values. Inter-transcriber
inconsistency is therefore replicated to some extent in instrumental analysis. Nor
should we forget that measurements require someone to decide how measure-
ment points are to be identified, either manually or by algorithm. Rather than a
strict division between consistent and inconsistent methods, the situation is more
of a continuum with certain methods being more or less consistent, and thus more
or less scientific, than others. Impressionistic transcription is unarguably much
further towards the inconsistent end than instrumental analysis, but steps can be
taken to control the conditions under which transcriptions are made to bolster its
credentials to some considerable extent (see Section 5.12 below).
The main counter-argument to the unreliability criticism lies not, however, in
trying to convince sceptics that the shortcomings of auditory-perceptual analysis
can be remedied, but in that its purpose is different from instrumental analysis:
namely, to analyse the sounds of speech as perceptual objects, not as objects
in the external physical world. By their very nature, perceptual objects have
the property of plasticity – they can be altered by context and experience. For
example, experiments have shown that listeners adapt to sounds in such a way
that category boundaries are shifted. Warren (2008: 222–4) gives examples of
this general auditory ‘criterion shift rule’ by which there is a ‘continual calibra-
tion (i.e. verification or modification) of criteria used for perceptual evaluation
of sensory input’ (ibid.: 224). One clear example involves VOT’s well-known
function of distinguishing between voiced and voiceless stops in languages. If lis-
teners are repeatedly exposed to tokens of /t/ with short VOTs, they adapt to them
and become less quick to accept slightly longer VOTs as tokens of /t/. That is to
say, their /t–d/ boundary has shifted in the direction of /t/ so that tokens which
would before have been perceived as /t/ are now perceived as /d/. Adaptation to
/d/ of course shifts the boundary in the other direction. Adaptation effects have
also been found with place of articulation cues.
The processes by which perceptual objects are created, together with their
plasticity, raise the question of their relationship with the external stimuli
which caused them. For our purposes, this specifically concerns the relation-
ship between, on the one hand, properties of pressure-waves and the vocal tracts
which produced them and, on the other hand, the expression in an impressionistic
transcription of the content of speech sounds as perceptual objects. This question
brings us to the issue of veridicality.

5.6 The Issue of Veridicality


The notion of veridicality concerns how accurately a representation corresponds
to the thing it purports to represent. To assess the veridicality of an impres-
sionistic phonetic transcription, then, we first have to have a clear idea of what
exactly it purports to represent. Because phonetics covers the diverse domains
of articulation, acoustic transmission and auditory perception, phonetic notation
can in principle be employed to denote categories in any of these domains (see
Chapter 6 Section 6.5), but we have seen in the history and development of pho-
netic notation a bias towards the representation of what speakers do in their vocal
tracts to produce a particular sound. That is to say, the focus has been on the
articulatory domain. The organic-iconic notations of Bishop Wilkins, Alexander
Melville Bell and Henry Sweet (see Chapter 3 Section 3.1) provided resources
for expressing an analysis of a complex articulation into its component parts, or
at least what were at the time believed to be its components (recall, for example,
Wilkins’s erroneous identification of the epiglottis as the locus of voicing). This
focus is also evident in non-iconic analogical, analphabetic and alphabetic nota-
tion systems including the modern IPA, whose consonant and vowel charts can
be interpreted as instructions on what to do with one’s articulators (Heselwood
2008b). If an impressionistic transcription purports to represent an analysis of a
speaker’s articulation, then the extent to which it is veridical can be evaluated
if we have clear objective evidence of what the vocal organs were doing, for
example in the form of electropalatograms, articulograms, ultrasound images,
X-ray film etc. Acoustic evidence can also be used in so far as articulatory events
can be inferred from spectrograms and acoustic waveforms, for example tongue
position from the position of a vowel in the F1–F2 plane, or a stop closure from
a short period of silence, or close approximation of articulators from the presence
of aperiodic acoustic energy, or laryngeal tension from irregular voice pulses,
and so on.
Evaluating how veridical an impressionist transcription is with respect to
activities in the speaker’s vocal tract, or with respect to the acoustic structure of
the pressure-wave, is more problematic than it might at first seem. To see it as a
process of ticking off one-to-one correspondences between elements in the tran-
scription and elements in the articulation or acoustic structure, the way one might
tick off one-to-one correspondences between objects in a room and a photograph
of that room, is too simplistic.
Let us first of all take an example from outside speech. The example concerns
musical tones. If two tones of, say, 1,245 Hz (D♯6) and 1,480 Hz (F♯6) are pre-
sented to the ear simultaneously, many listeners will also hear a lower-pitched
tone of 235 Hz (A♯3) at the same time. This tone is called a difference tone
because it has a frequency equal to the difference between the frequencies of
the two presented tones (the principle is the same as residue pitch). Other tones
may also be heard which are equal to the lowest presented tone minus the differ-
ence tone, called the second order difference tone (1,245 − 235 = 1,010 Hz), the
lowest tone minus twice the difference tone, called the third order difference tone
(1,245 − (2*235) = 775 Hz), and so on. Note that these combination tones are
always lower in frequency than the lowest presented tone. People vary consider-
ably in how many combination tones they hear (Howard and Angus 2001: 229).
Some modern composers, for example the Hungarian György Ligeti, exploit the
phenomenon of combination tones in their music.
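
The arithmetic of this example can be set out explicitly as follows, using the figures given above.

```python
# Worked arithmetic for the combination-tone example given above.

f_low, f_high = 1245, 1480                   # presented tones (D#6, F#6), Hz

difference_tone = f_high - f_low             # 235 Hz, close to A#3
second_order = f_low - difference_tone       # 1245 - 235 = 1010 Hz
third_order = f_low - 2 * difference_tone    # 1245 - 470 = 775 Hz

print(difference_tone, second_order, third_order)   # 235 1010 775
```
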
The relevance of combination tones for transcription is that were two musi-
cians to transcribe the interval or chord they heard when presented with the
two-tone stimulus specified above, and one transcribed it as the two-note (minor
third) interval D♯6–F♯6, the other as the three-note (second-inversion D♯ minor)
chord A♯3–D♯6–F♯6, can we say that the first transcription is veridical but the
second is not? Looking at the spectrum of the input pressure-wave we would
see two tones matching the two notes of the interval. What we would not see
would be a third tone corresponding to A♯3. A simple tick-box approach would
indeed conclude that the first transcription was accurate and the second inaccu-
rate. However, if we apply our knowledge of how the auditory system responds
to these kinds of tones, then we might expect to see A♯3 in a transcription. The
second transcription is veridical with respect to this expectation. There is there-
fore a very compelling justification for saying that both transcriptions are correct
because it is equally plausible that they are veridical, not with respect to the
objectively measurable acoustic stimulus, but with respect to the contents of the
two transcribers’ perceptual objects.
Analogous examples can be found concerning speech. We have already noted
the phenomenon of residue pitch, where the auditory system computes perceived
pitch from harmonics, and the same kind of relationship with the acoustic signal
occurs in relation to formant resonances. When two or more formants are within
about 3.5 Bark, they are integrated by the auditory system into a single percep-
tual formant (Bladon 1983: 311–13; Hayward 2000: 154–6) with a Bark value
which does not directly map to an acoustic formant at a corresponding Hz value.
The resonance peak in auditory space may actually map to a valley in acoustic
space, being the amplitude-weighted mean of the acoustic formant frequencies.
The auditory system transforms the spectra of vowels, and approximants such
as [ɹ] (Heselwood and Plug 2011), such that the individual formants within the
integration band are not separately heard. Residue pitch and perceptual formants
illustrate that auditory qualities and acoustic spectra have different shapes, but
because the shapes have predictable relationships, perception of acoustic spec-
trum S as auditory quality Q may be taken to be a veridical perception.
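
A hedged sketch may make the 3.5 Bark integration idea more concrete. The Bark conversion below uses one widely cited approximation which is assumed here rather than taken from the sources just cited, and the formant frequencies and amplitudes are invented; where two formants fall within 3.5 Bark of each other, their amplitude-weighted mean frequency is returned as a single ‘perceptual formant’.

```python
# Hedged sketch of the 3.5 Bark integration idea. The Hz-to-Bark conversion
# uses one widely cited approximation (an assumption, not from the sources
# above); formant frequencies and amplitudes are invented for illustration.

def hz_to_bark(f_hz):
    """Approximate Bark value for a frequency in Hz."""
    return 26.81 * f_hz / (1960 + f_hz) - 0.53

def perceptual_formant(f_a, amp_a, f_b, amp_b, criterion_bark=3.5):
    """If the two formants lie within the criterion distance in Bark, return
    their amplitude-weighted mean frequency as a single 'perceptual formant';
    otherwise return both formants unchanged."""
    if abs(hz_to_bark(f_a) - hz_to_bark(f_b)) <= criterion_bark:
        return (f_a * amp_a + f_b * amp_b) / (amp_a + amp_b)
    return (f_a, f_b)

# F1 and F2 close together (cf. the 'dark' timbre discussed in Section 5.3):
print(perceptual_formant(700, 1.0, 1100, 0.8))   # integrated into one value
# F1 and F2 widely separated: no integration.
print(perceptual_formant(500, 1.0, 2300, 0.8))
```
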
A less straightforward example in phonetic transcription might concern the
presence or absence of a short schwa vowel in a word such as English collapse.
Judgements of this kind concerning schwa have indeed been shown to be prob-
lematic in evaluating transcriptions (Wester et al. 2001). Even if acoustic analysis
shows no vocalic segment between /k/ and /l/, an impressionistic transcription
with a schwa would be justified if the transcriber heard it, and could be argued
to be veridical because we know that the auditory system is capable of integrat-
ing subtle cues in the adjacent sounds into a percept of schwa (see for example
Coleman 1994: 318–20; Wells 1995a: 403; Patterson, LoCasto and Connine
2003; Simpson 2005: 56–7). However, we do not know as much about how this
integration happens, or what precisely the inputs to the integration process are,
as we do about combination tones.
Providing we can exclude hallucinatory and other effects not associated with
the relevant stimulus, we are justified in assuming that the contents of a listener’s
perceptual objects are related to properties of the stimulus as transduced by the
auditory system (and visual system in the case of audiovisual-perceptual analy-
sis) in systematic ways. Kerswill and Wright (1990) carried out a study in which
transcribers were asked to transcribe a phonetician’s pronunciations of phrases
containing an alveolar-to-velar assimilation site at a word boundary, for example
road collapsed. Transcriptions were compared with EPG records of the utter-
ances which showed three conditions at assimilation sites: no assimilation, partial
assimilation and complete assimilation. About half of the complete assimilations
were transcribed as alveolar despite there being no EPG evidence for this. Using
the EPG evidence as criterial, and taking a simplistic view of veridicality, half
the transcriptions would be dismissed as non-veridical. Kerswill and Wright are
careful to point out, though, that lack of alveolar contact in the EPG record is not
proof of the absence of what they call ‘an abstract auditory parameter that might
be labelled alveolarity’ (ibid.: 272), cued by other aspects of the signal caused
by some residual alveolar gesture which EPG does not detect. It is ‘data of con-
sciousness’ of these kinds that Merleau-Ponty (1945/2002: 8) cites to refute what
he calls the ‘constancy hypothesis’, namely that stimuli and perception enjoy a
stable, point-by-point correspondence, i.e. that the one is simply a topological
transform of the other.
For veridicality to have any application in evaluating the content of a percep-
tual object we have to be able to state what would count as non-veridical. If a
tick-box approach is not considered a valid procedure, then it is not clear how we
can confidently conclude non-veridicality. Any mismatch between the contents
of a perceptual object and the contents of that which has caused the perception
could in principle, one might argue, be explained in the same way that combina-
tion tones or perception of a ‘missing’ schwa or an assimilated alveolar can be
explained. Failure to explain would simply be due to an insufficient understand-
ing of how perceptual objects are formed from auditory input.
In relation to impressionistic transcription, are we then committed to accepting
the veridicality, or at least not accepting the non-veridicality, of any and all tran-
scriptions of the same piece of speech? Because we cannot inspect another per-
son’s perceptions, a transcriber’s claim that ‘I heard it’ is unassailable. Recalling
Abercrombie’s (1967: 127) crucially insightful statement that ‘phonetic tran-
scription records not an utterance but an analysis of an utterance’, transcriptions
of the same utterance will differ if transcribers have made different judgements
about how to categorise the contents of their perceptions, perhaps because of dif-
ferent interpretations of what certain categories mean; that is to say, their criteria
for category assignment are not exactly the same. There are thus grounds for
rejecting transcriptions if it can be shown that an erroneous judgement has been
made through misinterpretation of a category. It is important therefore that the
conventions for a notation system are as clear, detailed and explicit as possible
and firmly grounded in phonetic theory. The issue of ‘correctness’ of impres-
sionistic transcriptions is taken up again in Section 5.9 below.

5.7 The Content of Perceptual Objects


The issue of veridicality of perception in relation to events external to the listener
leads us to consider how we can characterise the content of a perceptual object
such as a speech sound as it exists in conscious awareness, and whether in con-
sidering perceptual objects we can ignore the external events. There are different
views on this issue which divide at a deep philosophical level and which have
crucial implications for how impressionistic phonetic transcriptions relate to raw
speech data, and therefore for the claims that can be made about what such tran-
scriptions can tell us about speech.
An influential approach has been to identify vocal tract activity as the object
of perception, an approach taken by ‘motor theories’ of speech perception very
much associated with the Haskins Laboratories in the US (for example Liberman
and Mattingly 1985). The idea is that listeners refer to, or ‘recruit’, the knowl-
edge they have as speakers about how sounds are produced in order to perceive
speech spoken by others. As a speaker, I know what the syllables I produce
sound like and I know, in a procedural sense, how to produce them, so perhaps
when I hear someone else produce that syllable it activates my knowledge of
how I produce it. At that point, but not before, I perceive the sounds which have
been produced. Proponents of this view point to the subsequent discovery of
‘mirror’ neurons as a mechanism for how production and perception knowledge
could be matched up. Mirror neurons have been found in Broca’s area of the
human brain (Iacoboni et al. 1999), where they could be responsible for how
an auditory stimulus might trigger, at a subvocal level, the patterns of motor
activation necessary for imitating it. We will consider imitation as a strategy in
making impressionistic transcriptions in Section 5.12 below. Robust connections
between motor representations of speech in the anterior cortex and sensory rep-
resentations in the posterior cortex have been hypothesised by Honda (1996: 49)
on the basis of electromyographic data from extrinsic tongue-muscle activities
in the production of different vowel qualities. These connections could provide
the supporting structures for perception and imitation. According to the motor
account of perception, the content of a perceptual object in the context of speech
would be potential movements of speech organs which parallel the actual move-
ments of the speaker’s organs. The account thus fits well with the articulatory
bias in the conventions of phonetic notation systems. Using the symbol [b], for
example, would express an analysis of a perceptual object containing mirror-
neuronal stimulations corresponding to bilabial closure, velic closure and vocal
fold vibration.
A somewhat similar approach, also well suited to the articulatory bias of pho-
netic notation and also associated with the Haskins Laboratories, is encountered
in the theory known as ‘direct realism’ (Fowler 1986), except that instead of the
listener’s own vocal gestures being the objects of perception it is the speaker’s
gestures which are perceived directly. This therefore comes into the general
class of ‘distal’ theories of auditory perception (O’Callaghan and Nudds 2009:
10). It holds that we can tell what it is that is causing a perceived sound, as well
as where it is being caused, giving us potentially valuable information about our
environment. Direct realism claims that there are no cognitive processes inter-
vening between the external object and its perception (Fowler 1986: 4). In terms
of the scheme presented in Table 5.1, direct realism maintains that pressure-
waves have the true imprint on them of the vocal tract gestures which caused
them; this imprint is faithfully transmitted to and through the auditory system
and exhaustively determines the form of the perceptual object such that it may be
claimed that the listener in fact ‘hears’ the vocal tract gestures, not properties of
the acoustic pressure-waves (ibid.: 6). Moreover, listeners hear them directly, not
through any processes of inference. The theory guarantees complete veridicality
of perception, though not of judgements about perception (and therefore cannot
guarantee accurate transcriptions). Because the notion of ‘gesture’ is understood
in this approach to be a synergetic complex constituted by a coordination of
actual vocal organ movements, the theory cannot be easily tested by trying to
find out what listeners can perceive in detail about how a speaker has produced
an utterance (for example whether they can somehow directly perceive a velar
closure at the beginning of a pronunciation of cat), nor does it specify what level
of detail is relevant (for example individual vibrations of vocal folds in a vowel).
The theory predicts that a listener should be able to reproduce accurately not just
the sounds that another speaker has made, but the very same vocal tract actions
by which they were produced. Ohala (1986: 76–7), in a reply to Fowler (1986),
draws attention to various pieces of research which show that different speakers
can produce the ‘same’ sound with significantly different gestural components.
This observation has been made in relation to American English /r/, for instance,
which some speakers produce as retroflex [ɻ] and some as a ‘bunched’ sound
for which Laver (1994: 302) has proposed the symbol [ψ], yet the two types
‘sound virtually alike’ (Ohala 1986: 76). Ohala lists, as challenges to direct
realism, ventriloquism and the compensatory articulations of speakers who for
clinical reasons produce speech sounds in atypical ways. If a listener perceives
[b] yet the speaker made no lip-closing gesture – lenition studies such as Lavoie
(2001) show that Fowler and Galantucci (2005: 636) are wrong when they say
that the lips always make contact in realisations of /b/ – then what exactly is
being perceived according to a direct realist account? A motor theory account
can surmount the challenge by saying that properties of the acoustic stimulus
appropriate for [b] can trigger a labial activation pattern in the listener’s motor
system. In direct realism the acoustic signal is said to be transparent, yet there
is no labial closing gesture to be seen through it. Fowler has answered this kind
of objection by appealing to the notion of a mirage, saying that the effect of a
compensatory articulation is to mimic a ‘normal’ articulation (Fowler 1990: 533)
the way that a mirage mimics water, that is to say to put the same imprint on the
signal and thereby to cause the same perception. This move in effect concedes
that the acoustic signal from a compensatory articulation, or a ventriloquial
articulation, is not transparent, at least not in the sense of affording a true view
of the source. The listener’s perceptual object is clearly false if it contains a
purportedly real labial closure when in fact there was no labial closure. At some
point in the process of perception the wrong interpretation of the signal has been
made. If interpretation comes into the process at all, then perception is not direct
but partly shaped by an interpretative process internal to the listener’s auditory-
perceptual system, such as that carried out by the ‘input analyzers’ postulated by
Fodor (1984: 45).
Direct realism seems to be naïve realism in its belief that our percep-
tual apparatus has no effect on our perceptions, and that perceptual objects are
identical to external objects. These beliefs ignore the transfer, or filter, function
of the auditory system, which reshapes the properties of pressure-waves into
psychoacoustic objects such that there is no complete isomorphism. We simply
cannot find out what the pressure-waves would sound like in the absence of the
filter function of the auditory system, for without the auditory system we would
perceive nothing at all. The simple formula in (5.1a), stating that a percept can be
accounted for by a filter function applied to a stimulus, cannot be reformulated
as (5.1b) because a percept is not amenable to inverse filtering, which means it
cannot meaningfully be compared with the stimulus (S = stimulus, P = percept,
f = filter function).

(5.1) (a) P = S + f
(b) S = P − f
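A minimal computational sketch may help to make the point concrete. If we assume, purely for illustration, that f behaves as a many-to-one smoothing of an input spectrum – a gross simplification of real auditory processing – then two quite different stimuli can yield one and the same percept, and no inverse of f can exist. The function and the values below are hypothetical and correspond to nothing in particular in the auditory system.

```python
import numpy as np

def toy_auditory_filter(spectrum, band=4):
    """A crude many-to-one stand-in for f: average the spectrum over fixed bands."""
    return spectrum.reshape(-1, band).mean(axis=1)

s1 = np.array([1.0, 1.0, 1.0, 1.0, 0.2, 0.2, 0.2, 0.2])  # one stimulus spectrum
s2 = np.array([2.0, 0.0, 1.5, 0.5, 0.4, 0.0, 0.1, 0.3])  # a quite different stimulus

p1, p2 = toy_auditory_filter(s1), toy_auditory_filter(s2)
assert np.allclose(p1, p2)  # the two stimuli produce identical 'percepts', so the
                            # percept cannot be inverse-filtered back to the stimulus
```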

The clinical condition of auditory agnosia provides evidence that an interpre-
tative function must be operating in speech perception. In this condition, listen-
ers can hear sounds perfectly well, and discriminate between them, but cannot
make sense of them (Badecker 2005; Ingram 2007: 161–3). They have percep-
tual objects of some kind because they are aware of sounds, but they cannot
tell what the contents of those objects are; that is to say, they cannot interpret
them or make analytic judgements about them. It is not necessary to suffer from
clinical auditory agnosia to experience a sound that one cannot interpret. Infants
must be frequently in that situation when hearing sounds for the first time, and
occasionally adults will hear sounds they have not heard before and be unable to
recognise or categorise them, for example when listening to a language or accent
with consonants and vowel qualities they have not heard before, or to a musical
instrument they have not come across before. When phoneticians find themselves
in this situation they can call on their knowledge of the categories of phonetic
theory to assist them in their interpretation of what they hear.
I contend that the complex of articulatory activities responsible for a speech
percept is an indeterminate one – that our perceptual system does not have pro-
cedural knowledge of everything about vocal tract actions that contribute to, or
could contribute to, a percept on any given occasion. This contention is not the
same as saying that we cannot know the full contents of a percept, providing of
course that we do not subscribe to direct realism. It does mean, though, that we
cannot be certain about the causes of perceptions and therefore cannot be certain
about what is happening in the speaker’s vocal tract. What we can say is that,
for any speech-related perceptual object O, there will have been a coordinated
complex of vocal organ movements of varying probabilities which generated an
acoustic signal having many-to-many, one-to-many and/or many-to-one relations
with those vocal tract events. The acoustic signal is transduced by the audi-
tory system, which, as outlined earlier in Section 5.2, imposes its own transfer
function before O is formed by the integration of auditory events. By ‘varying
probabilities’ I mean that, if a [b] is perceived, it is most probable that there
was a complete bilabial closure, and therefore very reasonable for the system to
interpret the transduced acoustic information to this effect, and for the listener to
judge that there was. But, as ventriloquial speech testifies (Howard and Jordan
2009: 32), there are other ways of inducing a [b]-percept which are less common
and therefore less probable, but not impossible, as causal explanations for it. It is
the occurrence of low-probability articulatory causes which are likely to fool us
the way mirages fool us. The question still remains: what is being perceived? I
propose to leave that question for philosophers to pursue further. I suggest that,
for purposes of trying to elucidate what impressionistic transcriptions are tran-
scriptions of, it may be more fruitful and useful to ask: what is the transcribing
listener aware of such that a transcription can represent it? In other words, what is
it which is ascribed to phonetic categories when an auditory-perceptual analysis
is made during the activity of impressionistic transcription?

5.8 The Objects of Analysis for Impressionistic Transcription


It is tautologous under the definition of ‘sound’ offered in Section 5.1 that when
a listener is aware of hearing a sound, it is a sound which the listener is aware of.
In order to direct attention to the sound and make phonetic judgements about it,
it is only necessary for the listener to be aware of it as a sound-sensation, not as
being caused by any kind of external event. This is not to deny external events as
causes of sounds, just to point out that the actual causes of a particular sound S
can be disregarded when making judgements about its auditory qualities (Scruton
1997: 2, 2009: 57–8). Because the terminology of phonetic classification is
overwhelmingly articulatory, in the phenomenalist approach outlined below care
has to be taken not to interpret the judgement that S is bilabial as a judgement that
the speaker made a bilabial articulation. Hammarström (1958: 34) has made the
point that ‘[i]f a listener hears the same sound twice and if it is shown that the two
sounds were articulated quite differently, this information is obviously irrelevant
on the auditory level’. In a direct realist approach, however, the two kinds of
judgements are the same because perceptual objects are constituted only by their
causes. For direct realists, the objects of analysis for an impressionistic phonetic
transcription have to be vocal tract activities, not sound-sensations, a requirement
which makes direct realism a physicalist theory. Shriberg and Kent (2003: 3),
for example, take a direct realist position on impressionistic transcription when
they say that the purpose of phonetic transcription is ‘to represent the produc-
tion of speech sounds’. Fowler’s acceptance of mirage perceptions concedes
that listeners can be mistaken about speakers’ vocal tract activities, a concession
which aligns direct realism with the physicalism of experimental phonetics in a
common suspicion that the sense of hearing is inadequate to the task of finding
out about the physical realities of speech; see Section 5.10 below.
It is not difficult to show that the sense of hearing is indeed inadequate to
this task, but what I want to propose is that this inadequacy is not relevant if
the aim of auditory-perceptual analysis is to inspect the contents of perceptual
objects rather than the contents of any external events which may have directly or
indirectly, in whole or in part, caused them. If we want to get at external causes
then there are more successful ways to do so than through perceptual analysis.
Instruments for articulatory and acoustic analysis will do a better job, but they
cannot tell us what speech sounds like. The contents of palatograms and spec-
trograms are not the contents of perceptual objects, nor are the raw transductions
of the auditory system, because these have disappeared within at most 400 ms
(Remez 2005: 39).
Let us take the first of these points first. If the content of an auditory per-
ceptual object is not to be located outside of the experience of being aware of
a sound-sensation, then one of two positions is being taken: either a phenom-
enological position or a phenomenalist position. Both positions would focus
on the subjective experience of hearing sound, but phenomenalism holds that
the very ontology of sound as a phenomenon is confined to the experience of
hearing it, while phenomenology is agnostic on that point, allowing for sounds
to exist independently of the hearer, for example at the source. The definition of
‘sound’ we have been using, represented in Table 5.1, is more consistent with
phenomenalism in that the events giving rise to the external stimuli, and the
events during auditory processing, are not themselves sounds or parts of sounds,
though they can be said to be events which are disposed to affect our percep-
tions in certain ways. A useful analogy might be a factory which makes objects
(which, for some reason, never leave the factory) from raw materials delivered
to it. The objects only exist in the factory and therefore can only be analysed and
described from inside the factory. The claim that they can be observed to exist,
and can be described from a subjective point of view, makes phenomenalism an
empirical philosophy, albeit a subjective and not a physicalist one. For example,
nasality as a percept is observed to exist by the hearer in the hearer’s experience
of it. Although we can point to external events such as velum-lowering, nasal
airflow and acoustic spectra with particular pole–zero pairs, and responses of
the auditory system to these spectra, they do not have the phenomenal character
of sound. Nasality as a perceptual object is qualitatively different from any of
the articulatory, aerodynamic, acoustic and auditory raw materials out of which
we can, at least in principle, say it was made. In fact we know that nasality as a
percept can be caused by configurations in the larynx and pharynx, not only by
nasal resonance (Laver 1980: 86). It should be emphasised that a phenomenalist
account of sound does not commit one to a phenomenalist account of the world
in general. In the puzzle of the tree crashing down in a deserted forest, we can
coherently say there is no sound while at the same time saying that there is a tree,
there is a forest and there are pressure-waves. All these are primary objects, i.e.
their existence is not dependent on anyone experiencing them, whereas sound is
a secondary object: it has no existence outside of being heard, yet it is nonethe-
less ‘a real part of the objective world’ (Scruton 2009: 58) in the same way that
the products of the factory in the above example are things in the world at large.
The second point concerned the brief time for which the auditory system
holds the transduced signal: no more than 400 ms. Does this mean that auditory-
perceptual objects only last for this short time, or do they persist? If they persist,
it must be in some coded form which can be stored and retrieved for inspection.
The fact that we can make same-or-different judgements about sounds well after
400-ms intervals, and that voices can be recognised after long periods of time, is
evidence for their persistence. The content of auditory-perceptual objects, then,
must have the form of codings in memory and it must be these codings about
which phonetic judgements are made, for example a judgement of nasality or
bilabiality. Baddeley (2004: 3) claims that working memory has a ‘phonological
loop’ which can hold a memory of sound for about 2 seconds, during which time
it is coded. It may be that the optimum time to make phonetic judgements is while
sounds are in this loop, in which case anything exceeding the duration of the loop
cannot be so effectively judged and may be affected by ‘resonance’, or ‘echoes’,
which accrue non-phonetic information from activation spreading through the
mental lexicon (Johnson 2007: 36–7), and which may introduce lexical biases on
phonetic judgements. If the speech to be transcribed is not recorded, and the tran-
scription has to be done ‘live’, then there is only one 2-second bite at the cherry.
A recording means that the 2-second optimum window can be repeatedly opened,
although one has to be careful not to open it too often (see Section 5.12 below).
To sum up, a sound as an auditory-perceptual object has a complex rela-
tionship with events in the articulatory and acoustic domains, and also with
the events these cause in the auditory system. The information responsible for
the unity and stability of a perceptual object is distributed dynamically across
the spectral and temporal structure of the acoustic signal in ways which are still
not fully understood, and which make it crucially important to engage in what
Kelly and Local (1989: 34–5) call ‘holistic listening’. The unity and stability
exist only in the experience of perceiving and make it possible to talk coherently
about speech having a segmental structure, despite the coarticulatory processes
in speech production and the absence of observable phonetically defined, as
opposed to acoustically defined, segments in the acoustic signal, and also make
it possible to identify the perceived segments with the isolated sounds discussed
in Chapter 1 Section 1.2.1 and Chapter 2 Section 2.2.4. It is the phenomenal
contents of auditory-perceptual objects which form the objects of analysis for
impressionistic transcription.

5.9 Phonetic Judgements and Ascription


Once an auditory-perceptual object is available in the transcriber’s conscious-
ness, phonetic judgements have to be made about its content and how that content
can be mapped onto the theoretical models denoted by proper phonetic symbols.
Following a terminological suggestion by Dickins (1998: 109–10), I shall call
this mapping process ascription. Ascription requires access to stored knowledge
about phonetic categories, which distinguishes it from what happens in everyday
speech perception if a listener directs attention to the sounds but does not have
knowledge of phonetic theory and its categories. Two different types of memory
are involved in this distinction: recognition memory and declarative memory
(Johnson 2007: 30–2). In ascribing the contents of a perceptual object to, for
example, the categories ‘voiced’, ‘bilabial’ and ‘plosive’, a transcriber engaged
in proper phonetic transcription has first to recognise the object using recognition
memory. This process has been characterised by Ashby (1990) as one in which
the input is matched against stored prototypes based on best exemplars using what
measurement theory calls nominal measurement. In nominal measurement, ‘the
only relation which holds among observations is that of same/different’ (ibid.:
24). Once a best match has been made, and the input assigned to a prototype
category, declarative memory then draws on remembered, explicit knowledge of
phonetic theory, which provides the categories for its ascription. The difference
between pseudo-transcription on the one hand and proto- and proper transcrip-
tion on the other is that in pseudo-transcription there is no declarative memory to
bring to bear. Recognition memory accesses stored exemplars, enabling judge-
ments to be made about which sounds different words have in common, but it
may not be appropriate to say that ascription takes place.
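The two stages described here can be caricatured in a short sketch in which recognition memory does no more than find the best-matching stored prototype, and declarative memory then supplies the phonetic categories for ascription. The labels, feature vectors and matching metric below are entirely hypothetical; in particular, the distance function merely stands in for whatever same/different matching recognition memory actually performs.

```python
# Stored prototypes (recognition memory) and their phonetic-theory categories
# (declarative memory); the feature vectors are arbitrary illustrative values.
PROTOTYPES = {
    "b-like": (1.0, 0.0, 0.2),
    "p-like": (0.1, 0.0, 0.2),
}
CATEGORIES = {
    "b-like": ("voiced", "bilabial", "plosive"),
    "p-like": ("voiceless", "bilabial", "plosive"),
}

def recognise(percept):
    """Recognition: return the label of the best-matching stored prototype."""
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(PROTOTYPES, key=lambda label: distance(percept, PROTOTYPES[label]))

def ascribe(percept):
    """Ascription: map the recognised prototype onto theoretical categories."""
    return CATEGORIES[recognise(percept)]

print(ascribe((0.9, 0.1, 0.3)))  # ('voiced', 'bilabial', 'plosive')
```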
We have repeatedly noted the articulatory bias of the terms used in the pho-
netic classification of sounds, and this poses an obvious problem for impres-
sionistic transcription (Heselwood 2008b). Principle 2 of the IPA states that
the representation of sounds by IPA symbols ‘uses a set of phonetic categories
which describe how each sound is made’ (IPA 1999: 159). Rather than have to
commit ourselves to truths about the causes of speech sounds in order to ascribe
them to theoretical models, I propose, in line with the generally phenomenal-
ist stance outlined above, that in the context of impressionistic transcription
the articulation-based terminology should be interpreted in auditory-perceptual
terms. For example, [b] denotes the auditory-perceptual quality which we experi-
ence when a speaker produces a voiced bilabial plosive, though the same quality
might be experienced when a speaker does something different. The symbol
when used in an impressionistic transcription therefore does not denote vocal
fold vibration or a closure of the two lips, nor does it denote the acoustic corre-
lates of these events. Although these articulatory and acoustic events are, accord-
ing to phonetic theory, the most probable causes of the experience, they are not
essential causes. The auditory-perceptual categories ‘voicedness’, ‘bilabiality’
and ‘plosiveness’ are thus strongly motivated by the probable causes of these
impressions, but the ascription of an auditory-perceptual object to the symbol
[b] can be valid in their absence. The relationship between auditory-perceptual
categories such as ‘voicedness’ on the one hand, and articulatory and acoustic
events such as vocal fold vibration and fundamental frequency on the other, can
be characterised as ‘fuzzy’ (Ashby 1990: 21–2). Principle 2 of the IPA should, in
order to legitimate impressionistic transcription, be amended to read something
like ‘a set of categories which describe how each sound is typically, or most
probably, made’. The symbol [b] in an impressionistic transcription needs to be
glossed as something like ‘a sound which sounded the same as a sound produced
with vocal fold vibration, closed lips and a closed velopharyngeal port’ (see
Howard and Heselwood 2002: 388–9; Heselwood 2008b: 91).
By setting up auditory-perceptual analogues of articulatory categories in
such a way as to avoid a necessary causal connection between them, we provide
resources for the analysis of auditory-perceptual objects and move auditory anal-
ysis into territory mapped by phonetic theory. Without such resources, impres-
sionistic transcription would be pseudo-transcription, because it could only
operate by making judgements about what different utterances have in common
through abstracting from experience – it would be unable to give an analytic
account of what it is they have in common. To square the circle of experience and
taxonomy in the context of impressionistic transcription, a symbol such as [b]
has to stand for two things: firstly, a distinct auditory quality (Esling 2010: 687),
that is to say a cardinal quality in auditory-perceptual space which may well be
in the form of a prototype, as proposed by Ashby (1990), or an ‘exemplar-based
generalisation mechanism’, preferred by Johnson (2007: 35); and secondly, a
bundle of categories developed by theorising about what it is that makes different
sound-types distinct from each other, which, for historical reasons, is couched in
articulatory terms. These categories structure an abstract phonetic space which
can be modelled by charts such as the IPA chart. Ascription is the process of
mapping from a point in auditory-perceptual space to a point in abstract phonetic
space, from a cardinal quality to an intersection of categories.
If we cannot say that speech-sound perception itself is determined by
phonetic theory, we can say that judgements about its content certainly are.
Separating perceptual objects from judgements about their content means that
judgements can be wrong without implying that the perceptual object itself
is somehow a faulty object. For example, transcribing a low rising tone as a
falling-rising tone is a fairly common error in student transcriptions if in the
intonation group there is a high head preceding the tonic syllable. The step
down in pitch before the rise is mistakenly judged to be part of the tone: the
perceptual content has been misanalysed prior to being ascribed to a category,
but it has not been malformed.
We should also distinguish errors of judgement and analysis from errors of
ascription, and errors of ascription from errors of symbol use. An ascriptional
error in a narrow impressionistic transcription would be instanced by a failure
to be as fine-grained in one’s analysis as the set of categories allows one to be,
and so failing to make a distinction which one is able to perceive; for example,
ascribing falling-rising tones to the category of a simple rising tone. Symbol use
errors arise through inadequate knowledge of the interpretative conventions of the
notation system, for example using IPA [◌̰] instead of [◌̃] to denote nasalisation.
Recognising that these different kinds of errors can arise without having to
attribute them to faulty perception gives us a way of approaching the issue of dif-
ferent transcriptions of the same utterance, and the question as to whether some
transcriptions can be said to be wrong even if no transcription can be said to be
definitively right.
In addition to internal analysis of segments, judgements have to be made about
sequential order. Experimental results summarised in Warren (2008: 129–30)
suggest that it is easier for listeners to identify the order of speech sounds when
they are presented in syllables with natural transitional and onset-decay charac-
teristics than when presented as stable, steady-state sounds. It is counter-intuitive
that it should be harder to judge sequential order when the units have clear
objective boundaries than when they overlap, but only if we forget that it is their
sequence in auditory-perceptual objects which is being judged, not the sequence
of events in the acoustic signal, despite the causal connection between them.
It seems to be the case that perceptual objects do indeed have clearly defined
boundaries.

5.10 Objections to Impressionistic Transcription


Practical and theoretical objections to the value of impressionistic transcription
have been expressed, sometimes in strong terms, which need to be countered as
far as they can be. Some of them have been commented on already, particularly in
the sections on consistency (5.5) and veridicality (5.6). Heselwood and Howard
(2008: 381–3) address some of these objections, and Heselwood (2009) examines
the more philosophical objections – physicalist and rationalist – before present-
ing a phenomenalist response. What follows draws largely on these accounts.
If the speaker’s vocal tract behaviours and/or the acoustic signal are the
ultimate phenomena to be analysed and described, then listening to speech and
performing impressionistic analysis is not the best method to employ. Compared
to the consistency and high resolution of instrumental analysis, it lacks reliability
and cannot quantify data in more than gross relative terms such as quieter–louder,
shorter–longer, higher–lower in pitch, darker–lighter in timbre, more like [æ]
than [ɛ] etc. It is less reliable because of inter- and intra-listener variation in
making judgements about the same piece of speech, and it has been criticised for
lacking validity because it sometimes disagrees with more objective instrumental
records (Shriberg, Kwiatkowski and Hoffmann 1984). Scientific investigation of
the real world aims to be both reliable and valid; that is to say, results should be
replicable and should tell us something about whatever it is we want to know
about, for example a speaker’s tongue and lip movements when producing the
diphthong [aʊ], and the spectral dynamics of its acoustic structure. The charge
of invalidity can be refuted, however, if we frame the purpose of impressionistic
analysis in terms of giving a phonetic account of what an utterance sounds like;
in other words, of analysing the phonetic content of auditory-perceptual objects.
These objects exist only in consciousness and cannot be accessed directly by
instruments. Framed in this way, impressionistic transcription can have high
resolution if carried out with careful analytic listening, a firm grasp of phonetic
theory and its categories, and full use of the transcriptional resources available.
The senses, including hearing, have been distrusted as sources of information
about the external world at least since Democritus in the latter half of the fifth
century bce (Russell 1961: 89), and rejected as scientific tools by the physicalist
empiricism of Enlightenment philosophers such as John Locke in the seventeenth
century. Like Democritus, Locke regarded what the senses can discern as ‘sec-
ondary qualities’ of things, which get in the way and prevent our apprehension
of ‘the real constitution’, or the ‘primary qualities’ of things as they really are
(Locke 1690: book II, ch. 23, section 11). Because ‘there is no discoverable
connection between any secondary quality and those primary qualities that it
depends on’ (ibid.: book IV, ch. 3, section 12), our sense-impressions cannot
tell us about the ‘real constitution’ of external things. What Locke is saying is
that the sound-sensation of hearing a vowel such as [a] has no connection to the
production of the vowel of the kind that can tell us about the behaviour of the
speaker’s vocal tract, or about the acoustic structure of the pressure-waves. The
connections are, as far as we can understand them, arbitrary ones. In the words
of O’Callaghan and Nudds (2009: 6), sounds under this view ‘lack a constitutive
ontological connection with vibrations or activities of objects we ordinarily count
as sound sources’. The same thinking is behind Scruton’s conception of sounds as
secondary objects, rather than secondary qualities, except that for Scruton there is
no need to suppose dependency, arbitrary or otherwise, on primary qualities. The
human auditory system and human consciousness could just as well be arranged
in such a way that a listener had a very different kind of auditory experience as
a result of those particular external events, although an out-and-out reductionist
account would disagree, saying that we just do not know enough about causes
and effects to explain sensory experience in terms of physical causes (see for
example Dennett 1991). In order to reach and apprehend the ‘real constitution’
of things in a non-arbitrary way, we need, according to Locke, instruments which
are not distracted by sense-impressions. The methods of instrumental phonetics
are compatible with physicalist empiricism because they do not rely on knowing
about speech via the senses. We find this attitude expressed uncompromisingly
by the experimental phonetician Edward Scripture, who went as far as to say that
phoneticians should be ‘congenitally deaf and totally ignorant of any notions
concerning sound’ (quoted in Kohler 2007: 49). To get reliable, high-resolution
information about speech, we should use instruments designed for articulatory
and acoustic analysis, not trust our ears, which will only give us a false analysis.
Notwithstanding the fact that ultimately we can only find out via our senses
whatever it is that instruments have discovered, for example by using our eyes
to look at spectrograms or arrays of numbers, the physicalist argument against
impressionistic transcription appears to be a forceful one until we consider the
implications of accepting it. Taking it to its extreme, we would avoid listening to
speech altogether for phonetic purposes. The consequences for assessing speech
intelligibility and the effects on speech of speech impairments, for example,
would be calamitous. If we want to explain why an individual’s speech lacks
intelligibility, or how it is affected by an impairment, we can only begin to do so
by ‘bringing phonetic knowledge to the act of listening’ (Heselwood and Howard
2008: 382); in other words, by making exactly those judgements required for
the analysis of auditory-perceptual objects and their ascription to phonetic cat-
egories. Compensatory articulations, the possibility of ventriloquial speech, and
restorative effects in auditory processing mean that intelligibility cannot be reli-
ably predicted from articulatory or acoustic data.
The same counter-argument applies in non-clinical contexts of speech analy-
sis. It is well known, for example, that for listeners to hear a glottal stop there
does not need to be a glottal closure, just a sufficient, and sufficiently sudden,
reduction in amplitude and F0 (Hillenbrand and Houde 1996). T-glottalling is a
widespread accent feature in English with high social-indexical meaning in many
speech communities (Wells 1982; see also chapters in Foulkes and Docherty
1999), but we cannot reliably infer its existence as a perceptual object by inspect-
ing glottograms or spectrograms: it exists only in the hearing of it (Heselwood
2009: 29–30). It is only when it is heard that it can convey its social-indexical
meaning. The judgement that the perceptual object contains a glottal stop can
then be expressed with the symbol [ʔ] despite articulatory and acoustic evidence
of continuous phonation.
The physicalist objection, then, is that our ears are simply not as good at
finding out about what speakers’ vocal tracts do, or about the acoustic structure
of pressure-waves, as instruments are. A very different objection to impression-
istic phonetic transcription comes from rationalism, the philosophy underlying
generative linguistics and most closely associated with the French seventeenth-
century philosopher René Descartes (Chomsky 1966: 72–3). While empiricists,
including physicalists, are interested in the output of the speaker’s vocal tract
and want to make it as fully accessible as possible, rationalists regard it as irrel-
evant. What they are interested in is not the output of the speaker but the output
of the grammar (Hale and Reiss 2000: 173). Grammatical output is a product of
computations in the mind, and it is the output of the phonological component
of the grammar which phonetic symbols, according to linguistic rationalists,
should represent. Chomsky and Halle (1968: 294) have expressed this by saying
that what a phonetic transcription represents is ‘what a speaker of a language
takes to be the phonetic properties of an utterance’. Because it is only intentions
to produce sounds which are specified in the grammar, not sounds themselves,
phonetic transcription should be interpreted as representing those intentions
(Bromberger and Halle 2000: 23–5). A speaker’s articulations may not always
be a faithful reflection of the intentions behind them, because of performance
factors such as fatigue or haste, or speech impairments of one kind or another.
If we transcribe the output of the speaker, so the argument goes, we may lose
sight of the intentions specified by the grammar amongst all the performance
noise. In the contexts of dialectological or sociophonetic fieldwork, or clinical
speech assessment, or conversational interaction analysis, the inability to get
inside the minds of the speakers to transcribe their intentions would leave tran-
scribers stranded with blank sheets of paper. The logic of the rationalist position
is that the speaker, being the only person with access to his or her own mind
and the intentions in it, is the only person capable of making a valid specific
transcription. Such a transcription, assuming the technical knowledge to make
one, would necessarily be systematic rather than impressionistic, denoting only
those categories represented in the phonological component of the grammar, not
the phonetic categories to be observed in the speech output. A further problem
with the rationalist position is that we cannot establish the form of the speaker’s
intentions. That is to say, we cannot, even by introspection as native speakers or
as phoneticians, know whether our intentions are specified in articulatory terms,
or auditory terms, or some mixture of both. We therefore cannot get at the object
of analysis and transcription even when it is in our own minds.
Complete acceptance of the rationalist argument leads, just like complete
acceptance of the physicalist argument, to the conclusion that there is no point in
listening to speech in order to analyse and describe it. It is generally assumed that
unintelligible speakers intend to be intelligible, that is to say intend to realise the
categories specified by the grammar. A rationalist transcription of an unintelligi-
ble utterance would be identical to a transcription of a fully intelligible utterance
providing it could be established that the same intentions lay behind it. In fact a
transcription of silence could be the same if it were claimed that there had been
the appropriate intentions to speak. Transcriptions of intentions would be of little
use to a clinician and would make many clinical interventions seem needless.
Carney (1979) has argued that systematic transcriptions are too remote from
observed speech behaviour to have any clinical value. By the same token, a tran-
scription of one accent, sociolect or dialect would be the same as another unless
it could be shown that the outputs of their grammars were different in relevant
respects. How, we might ask, could that be shown other than by observation and
analysis of speaker outputs? Impressionistic phonetic transcription is merely one
way to record these analyses.

5.11 Who Should Make Impressionistic Transcriptions?


The first qualification for making a narrow impressionistic transcription is suf-
ficient knowledge of phonetic theory and the conventions of the notation system.
To say that phonetic transcriptions should be done by phoneticians is rather
obvious, but a less obvious and interesting recent discovery from brain-imaging
is that, compared to the rest of the population, phoneticians are much more likely
to have either a split Heschl’s gyrus or multiple gyri in the left auditory cortex
(Golestani, Price and Scott 2011), a feature thought to be congenital and which
may confer greater auditory processing power. Golestani et al. speculate that
people with this auditory advantage may be drawn to a profession such as pho-
netics because they find they have a natural ability in it. There is therefore some
reason to believe that phoneticians, though perhaps not all of them, are more
suited to making impressionistic transcriptions than others not simply because of
their phonetic training. It also suggests that their auditory-perceptual objects may
be to some extent different.
An important question is whether impressionistic transcriptions should be
made by phoneticians who are familiar with the language and accent to be tran-
scribed, or by phoneticians with little or no familiarity with it. Familiarity means
the transcriber has representations of the relevant pronunciation-forms in his or
her mental lexicon which can become activated and provide ‘false echoes’ of the
sounds to be transcribed. Lindblom (1990: 408) quotes Lashley (1951: 112) in
this context as saying that ‘the input is never into a quiescent or static system,
but always into a system which is already actively excited and organized’; that
is to say perceptual systems are in states of resonance arising from the interac-
tion of bottom-up input and top-down activation (Grossberg 2003: 425). These
false echoes might in effect become the perceptual objects which are then ana-
lysed, ascribed to categories, and transcribed. Laver (1994: 556–7) emphasises
this risk and also the risk of ‘categorical contamination’ from the transcriber’s
language onto analysis and transcription of speech in other languages. Speaking
on behalf of phoneticians generally, Ladefoged (1990: 340–1) has admitted that
‘[f]ew of us could ever make a totally impressionistic transcription’, adding that
‘most phonetic observations are made in terms of a phonological framework’.
Most phoneticians would probably agree with Laver and Ladefoged on these
issues. Lindblom (1990: 408), for example, makes the same point when he says
that ‘if we know a certain language, we cannot help imposing that knowledge
on the signal’. He identifies ‘signal-complementary processes’ as the source of
this interference. Kelly and Local (1989: 58) make a similar point about general
phonetic categories when they say that ‘[l]inguists observe and record with an
ear to the kinds of entities their theory contains’. One argument in favour of
transcriptions being done by transcribers who are native speakers of the variety
is that they are likely to be more finely tuned to all the cues distributed through
the signal which give rise to perception of the sounds the variety uses; what
Ladefoged (1990: 345) has nicely described as ‘the delicate locally woven fabric
of individual languages’. To the counter-argument that these transcribers would
therefore be biased, one could respond by saying that all transcribers are biased
by what they are most finely tuned into, so one may as well use transcribers who
are tuned into the speech to be transcribed. However, the strength of assumptions
that native speakers make about the sounds of their languages can be responsi-
ble for judgements which are strikingly at odds with the judgements of others.
Kim (2011) presents a fascinating study demonstrating that Korean speakers
adamantly identify as nasals sounds which English-speaking listeners judge to
be plosives. She found that spectrographic and aerometric data were consistent
with lack of nasality, but because they pattern phonologically with nasals, native
speakers report perceiving them as nasals. Figure 5.2 presents a spectrogram of
an alveolar example, clearly showing a burst release similar to that of oral [d].
Although it is very probably not possible to escape completely from the ‘per-
ceptual grid’ (Laver 1994: 556) of one’s own linguistic background as a speaker
and hearer, as Kim’s study shows, the grid may be loosened by increased experi-
ence of listening to, analysing and transcribing a wide range of speech material
from a wide range of speakers and languages. Firstly, there is evidence from
another result in the Golestani et al. (2011) study that experience in phonetic
listening can enlarge the pars opercularis in the auditory cortex, which may
increase sensitivity to distinctions of sound. Secondly, the more exemplars there
are in a listener’s exemplar store, the more items there are to take part in signal-
complementary processing through being activated by the signal, with the result
that the system should have a higher discriminative power. This is simply another
way of saying that the more sounds one has heard, and the more often one has
heard them, the more one is likely to be able to discriminate between sounds.

FIGURE 5.2: Korean ‘denasalised’ alveolar stop, with IPA symbol alternatives,
from the phrase miguŋ nodoŋ ‘American labour’. Waveform and spectrogram with
kind permission from Kim (2011: 52). K = Korean speakers’ percept, E = English
speakers’ percept. (Kim uses [n0] to symbolise this denasal sound.) Arrows point
to release bursts of [ñ]/[d̃] and [d] for comparison.

However much it may be possible to counter the influence on perception and
analysis of one’s own linguistic background, there still remains the practical
question of who is best placed to make a transcription of variety V of language
L. Is it a speaker of variety V, or someone with no previous exposure to it? The
answer may depend on the purpose of the transcription, chiefly on whether the
purpose is to reflect how a speaker of that variety who is also a phonetician
perceives it, or whether it is to investigate the phonetics of the variety from as
neutral a standpoint as possible. In practice, there is much to be gained from com-
paring transcriptions by transcribers who have different levels of familiarity with,
and knowledge of, the pronunciation of the variety (see Section 5.13 below).

5.12 Conditions for Making Transcriptions


It was mentioned in Section 5.5 above that, by paying attention to the conditions
under which an impressionistic transcription is to be made, its reliability can
be maximised. Firstly, I would like to plead for an end to live impressionistic
transcription, or at the very least for a highly sceptical attitude towards it should
it be necessary to undertake it, and certainly for speech of more than a couple
of syllables, unless it is delivered in the context of ear-training. Amorosa, von
Benda, Wagner and Keck (1985) found that live transcriptions of clinical speech
missed significant amounts of detail, particularly where the details deviated from
the norm, and even whole words were missed. Heselwood and Howard (2008:
385) make the point that a transcriber simply cannot keep up with a speaker.
Normal speech rate is around five syllables per second (Laver 1994: 541) and,
depending on the phonotactics of the language, there may be up to half a dozen
or more segments per syllable. In fluent speech, the transcriber could be faced
in each second with twenty or thirty segments to transcribe, not to mention
suprasegmental transcription. Each segment requires the process outlined above
involving analysis of the perceptual object, ascription to phonetic categories, and
the writing of symbols and diacritics. There is no going back to retrieve anything
which was missed except by trying to remember it while more speech is passing
by. Nor can one ask the speaker to repeat something, because they may pro-
nounce it differently. It has also been found that it is harder to concentrate on the
phonetics of an utterance, and ignore the linguistic content, in live transcription
(Oller and Eilers 1975; Amorosa et al. 1985).
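The arithmetic behind these figures is straightforward and can be set out explicitly; the speech rate is the one just cited, and the range of segments per syllable is illustrative.

```python
syllables_per_second = 5              # typical fluent speech rate (Laver 1994: 541)
for segments_per_syllable in (4, 6):  # an illustrative phonotactic range
    print(segments_per_syllable * syllables_per_second, "segments per second")
# prints 20 and 30: the 'twenty or thirty segments' a live transcriber faces each second
```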
The problems of live transcription can easily be overcome by recording
speech, either audio-only recording or audiovisual recording, although this brings
with it its own problems. It is important to try to ensure high-quality recording by
using good equipment, and by careful placing of the microphone and, if making
a visual recording as well, the camera. The sampling rate should be high enough
to capture the high frequencies of fricatives, which can reach 12 kHz or more,
so a sampling rate of at least 24 kHz is needed. The microphone should be about
an inch or two from the speaker’s lips and displaced to one side so that plosive
bursts do not peak the response, and the camera should be placed so that the
speaker’s mouth and face are always in view. It is easier to ensure these standards
are met in studio or laboratory recordings; the more ‘natural’ the setting, the
harder it is to meet them. Inevitably, some balance has to be struck between the
technical quality of the speech recording and what we could call its ethnological
quality, or social authenticity. Field recordings of good technical quality can now
be made using radio microphones with a transmitter attached unobtrusively to the
speaker, transmitting the signal to a recorder placed out of sight. For advice on
recording speech, see Ladefoged (2003: 16–26).
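The sampling-rate advice follows from the Nyquist criterion: a recording can only represent frequencies up to half its sampling rate. A short illustrative check makes this explicit; the rates below are arbitrary examples.

```python
def nyquist_hz(sampling_rate_hz):
    """The highest frequency a given sampling rate can represent."""
    return sampling_rate_hz / 2

for rate in (16_000, 24_000, 44_100):
    print(rate, "Hz sampling captures 12 kHz fricative energy:", nyquist_hz(rate) >= 12_000)
```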
Once a recording has been made, the transcriber has various options about how
to listen to it. Some of these will be matters of personal preference, for example
whether to use ear-buds, use headphones with or without closed cups, or listen in
free-field conditions. Free-field listening should be done in a quiet environment,
but again there may be personal preference as to whether this means an acousti-
cally treated room or an ordinary quiet room. Our auditory-perceptual acuity is
remarkably good at picking out speech signals against noise, so it may make little
or no difference if low levels of non-speech noise are present – they almost always
are when we use spoken language, so completely noise-free speech is unnatural.
Digital audio recordings can be ‘cleaned up’ by using filters designed to remove
or attenuate noise if this is judged to be necessary. Interestingly, though, it has also
been found that adding noise can sometimes increase perceptual clarity (Warren
2008: 170–2), an observation made by William Holder in the seventeenth century.
Holder (1669: 165–8) describes how, when he ‘beat a Drum fast and loud’ behind
a deaf person, that person was enabled to hear speech which he otherwise could
not. Holder also relates anecdotal evidence of a similar effect experienced by deaf
persons when travelling in noisy horse-drawn coaches over cobbled streets. He
explains these effects, saying that sudden loud noises increase the tension of the
tympanic membrane, making it more conductive of other lesser vibrations.
Ideally, it is preferable for recordings to be done by someone other than the
transcriber, so that impressionistic transcription can be carried out without prior
knowledge of the linguistic content of the utterances – so-called ‘blind’ listening.
Different opinions have been expressed about this, with Pye, Wilcox and Siren
(1988) suggesting that knowing in advance the words and phrases in an utter-
ance can facilitate transcription, but Ingrisano, Klee and Binger (1996) taking the
opposite view. The effect on an otherwise unintelligible utterance of suggesting
a linguistic target is a serious warning that prior expectation can have a profound
influence on impressionistic transcription, even in trained phoneticians of long
experience (Howard and Heselwood 2002: 394–5), and speaks loudly in favour
of ‘blind’ transcription.
Presented with recorded utterances, transcribers then have to decide how
long a chunk of speech to listen to at once, and how many times to listen to it. If
Baddeley (2004) is correct about the ‘phonological loop’ holding about 2 seconds
of speech, then its capacity is about ten syllables, or up to about fifty or sixty
identifiable segments, which is too many to deal with. On the one hand, we want
to minimise how much speech we need to analyse at once, but on the other hand,
given the phenomenon of coarticulation and the fact that segments as perceptual
objects are formed from material which is distributed over at least as much as 1
second’s worth of the acoustic speech signal (West 1999), we need as much in
the loop as will be relevant for making analytic judgements. It is impossible to be
precise, but chunks of four or five syllables may be about optimum. If chunking
is done approximately by rhythm-groups it will tend to reduce lexical effects to
the extent that rhythm-group boundaries occur within words, although this is not
the case in all languages; to counter any resulting discontinuities, the last syllable
of one chunk can be repeated as the first syllable of the next. For example, the
phrase that was about the seventeenth of November could be chunked as in (5.2):

(5.2) that was ab
      about the sev
      seventee
      teenth of Nov
      November

So that the transcriber does not get familiar with the lexical material whilst pre-
paring it into chunks, this task ought ideally to be carried out by someone else.
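A rough sketch of such chunking, assuming the utterance has already been divided into syllables, might run as follows; the syllabification of the example phrase is purely illustrative and the procedure is not offered as anyone’s actual practice.

```python
def chunk_syllables(syllables, chunk_size=4, overlap=1):
    """Overlapping chunks: the last syllable of one chunk begins the next."""
    chunks, start = [], 0
    while start < len(syllables):
        chunks.append(syllables[start:start + chunk_size])
        if start + chunk_size >= len(syllables):
            break
        start += chunk_size - overlap
    return chunks

# A purely illustrative syllabification of the phrase in (5.2).
syllables = ["that", "was", "a", "bout", "the", "se", "ven", "teenth", "of", "No", "vem", "ber"]
for chunk in chunk_syllables(syllables):
    print(" ".join(chunk))
```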
How many times to listen to a chunk is also a question to which it is difficult
to give a precise answer. Many playback systems have a loop repeater which will
play a selected portion of the recording over and over again. But we should not
overdo it. Over-listening can start to play ‘tricks’ on the ear, and make us doubt
what we thought we previously heard. Aware of this, Shriberg et al. (1984: 459)
recommend listening no more than three times to try to resolve disagreement
between transcribers over an item. However, within a chunk, one can direct
one’s attention to particular aspects of it, such as the place of articulation of
the first consonant, the vowel quality, the pitch movement or whether the final
stop is released. Directed analytic listening as advocated by Ashby, Maidment
and Abberton (1996) means that in effect, although the chunk as a whole may
present to the auditory-perceptual system many times, each item about which a
judgement is made is only an object of attention a much smaller number of times.
Nonetheless there are still risks. The phenomenon of ‘criterion shift’ (mentioned
in Section 5.5 above), in which category boundaries shift towards recent exem-
plars, is highly relevant in this regard.
Imitation of what one is hearing is a most useful strategy for making analytic
judgements about the data being transcribed. Catford (1977: 5–6) refers to this
process as ‘motor empathy’. Its usefulness is not dependent on accepting a motor
theory of speech perception, although it may well be the case that mirror neurons
are implicated in the process (see Section 5.7 above). According to some theo-
rists of language evolution, mirror neurons were key to the emergence of the
ability to imitate compound actions of the kind necessary for human language
(Arbib 2003). In Baddeley’s memory model, the ‘phonological loop’ contains
a ‘subvocal rehearsal process’ (Baddeley 2004: 3), allowing the hearer to copy
the contents of the loop. His suggestion is that this arrangement ‘evolved for the
purpose of language acquisition’. For purposes of the phonetic analysis of utter-
ances, this subvocal rehearsal process can be vocalised to identify the most prob-
able way in which the speaker produced what we judge we are perceiving, and
by doing so enable us to ascribe what we perceive to phonetic categories having
articulatory definitions. If, in my own estimation, I reproduce the ‘same’ sound as
I perceive, for example [b], then ascribing it to the category ‘plosive’ entails not
that the speaker produced it as a plosive, but only that producing it as a plosive
is the most probable and usual way, according to phonetic theory, to trigger the
auditory-perceptual object which we analyse as [b].
Digital audio technology provides further options for the transcriber in terms
of the speed and direction of playback. Speech can be slowed down or speeded
up while the original pitch is maintained through a process known as ‘warping’.
Slowed playback gives more time to make a decision about a sound, and faster
playback may help to get a more global sense of distributed coarticulatory
effects. Reverse play turns onsets into offsets and vice versa, which can help in
making decisions about diphthongs, affricates and clusters. Ladefoged (2003:
26–7) advocates listening to vowels at half speed and using reverse play if it
seems appropriate. It has to be appreciated, however, as Heselwood and Howard
(2008: 386) point out, that speech is not then being listened to in its natural
state and, furthermore, the analysis is starting to shift from listener-oriented to
speaker-oriented. That is to say, using these techniques is motivated by the same
desire to try to ascertain what the speaker’s vocal tract is doing as motivates the
use of articulatory instrumentation. The auditory-perceptual object is different
when the playback speed or direction is altered, which may mean that the tran-
scription can only claim to be a transcription of slowed or speeded-up speech,
or of reverse-play speech, the latter normalised by presenting it as a standard
left-to-right transcription.
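As an illustration of how such altered-playback versions might be prepared, the following sketch assumes the Python libraries librosa and soundfile are available; the file names are placeholders:

```python
import librosa
import soundfile as sf

# load the recording at its native sampling rate
y, sr = librosa.load("utterance.wav", sr=None)

# half-speed playback with the original pitch preserved ('warping');
# rate < 1 slows the signal down, rate > 1 speeds it up
slowed = librosa.effects.time_stretch(y, rate=0.5)

# reverse play: the samples are simply flipped in time
reversed_y = y[::-1]

sf.write("utterance_half_speed.wav", slowed, sr)
sf.write("utterance_reversed.wav", reversed_y, sr)
```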

5.13 Comparing Transcriptions and Consensus Transcriptions


Golestani et al.’s (2011) evidence suggests that a mixture of predisposition and
experience could lead to inter-subjective differences in the processing and result-
ing perception of speech sounds, and therefore to differences in transcription.
Whatever the reasons for transcriptional differences, received opinion holds that
the inherently subjective nature of perception makes it advisable to seek inter-
transcriber agreement on impressionistic transcriptions. Critical scrutiny of this
opinion, however, does throw some doubt on the reasoning behind it. Firstly,
there is no logic in the argument that several subjective accounts add up to a
more objective account. Secondly, the phenomenalist stance taken in relation
to impressionistic transcription means that different transcribers cannot access
the same data, because the data for transcription are auditory-perceptual objects
existing only in the consciousness of the hearer. Seeking transcription agreement
makes sense if we are trying to establish that the speech sample sounds the same
to different phonetically trained listeners, which is certainly an interesting and
worthwhile aim. It does not, however, make sense to try to reach a consensus
transcription unless it is explicitly a speaker-oriented exercise.
Having said all that, it is nevertheless common for transcribers to compare
impressionistic transcriptions, and also common for them to try to reach a con-
sensus. In practice, most of us feel more confident about our transcriptional skills
if other phoneticians’ transcriptions are similar to our own, even if that feeling
is somewhat irrational. To compare transcriptions, or to reach consensus, it is
important that like is compared with like, and thus important that all the tran-
scribers did their transcriptions under the same conditions. The options available
to transcribers have been outlined in Section 5.12 above. On one view, the matter
is as simple as ensuring that everyone wore the same headphones, used the same
playback settings, listened the same number of times and so on. But there might
be an argument for saying that each transcriber should transcribe in the condi-
tions they find most amenable. If one person prefers closed-cup earphones and
another prefers free-field listening, then if headphones are insisted on for both,
one person is transcribing in what for them is an amenable condition while the
other is not. If we compare their transcriptions, are we really comparing like
with like? There seems to be no obvious answer to this question, but it might be
a question worth pursuing in a controlled experiment.
Two further aspects of the like-with-like issue concern transcription align-
ment and symbol interpretation (Cucchiarini 1996: 143–9), both of which make
any automated procedure of transcription comparison rather difficult. Table
5.2 illustrates the importance of alignment and interpretation. In column 1, two
impressionistic transcriptions are given of an atypical production of the English
word slippers. In column 2 they are aligned in a straightforward, symbol-by-
symbol, linear manner; in column 3 alignment has been rearranged according to
interpretations of the symbols so that like is compared to like. That is to say, align-
ment should take more heed of the transcription conventions than of the symbols.

TABLE 5.2: Alignments of variant transcriptions
                   1                    2                        3
Transcriptions         a b c d e f g h i    a b c d e f g h i
A. ɬ6)jeɪʔ͡pʰʃ˞         ɬ6) j ə ʔ͡pʰ ʃ˞       ɬ6) j ə ʔ͡p ʰ ʃ˞
B. s ( l̥ ɪəʔ pə̥ɹ̥z̥      s ( l̥ ɪ ə ʔ p ə̥ ɹ̥ z̥   s l̥ ɪ ə ʔ p ə̥ ɹ̥ z̥
˥ ˥ ˥

The alignment in column 2 suggests very little agreement between the two
transcribers A and B. In fact, by the often-used formula for percentage agreement
in (5.3), where the criterion for agreement is usually using the same symbol, it
comes out at 0 per cent.

(5.3)  % agreement = (A × 100) / (D + A)

       A = number of agreements
       D = number of disagreements
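A minimal sketch of the calculation in (5.3), assuming the two transcriptions have already been segmented and aligned into symbol pairs (the function name and the toy pairs are illustrative):

```python
def percentage_agreement(pairs):
    """pairs: list of (symbol_A, symbol_B) tuples from two aligned
    transcriptions; agreement is credited only where the symbols match."""
    agreements = sum(1 for a, b in pairs if a == b)
    disagreements = len(pairs) - agreements
    return agreements * 100 / (disagreements + agreements)

# a toy alignment: two matches out of four pairs gives 50 per cent
print(percentage_agreement([("t", "t"), ("ɪ", "ɪ"), ("p", "b"), ("s", "z")]))  # 50.0
```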

In Table 5.2, the fricative segment in 2a is shown as partly voiced and lateral by
A but voiceless and median by B. In 2b, A has a voiced median palatal approxim-
ant where B has a devoiced alveolar lateral. In 2fghi, A has nothing correspond-
ing to the symbols in B’s transcription. Taking a more phonetically intelligent
approach to alignment, column 3 reveals considerable agreement, almost to the
point where they could be expressions of the same judgements about the same
auditory-perceptual objects. Even so, a simple symbol-by-symbol comparison
yields only 25 per cent agreement if no attention is paid to symbol options. The
contents of 3ab are almost identical except for absence of laterality at the start in
B’s version. Very similar tongue position is implied in 3c; in 3ef the difference
corresponds to phasing of labial and glottal gestures; and in 3ghi pretty much
the same component categories can be identified, although they are bundled into
symbols in different ways. Alternative bundling of categories into symbols is to
be expected, given the distributed nature of the information which ends up being
streamed into perceptual objects. The lesson to be learned from the example in
Table 5.2, and many others like it, is that when comparing transcriptions we have
to try to see beyond what is written on the page to evaluate how the categories
present in one transcription match up to those in another, by referring to the tran-
scription conventions and our general knowledge of phonetics.
In addition to the alignment issue, Cucchiarini (1996: 137) has pointed out two
crucial weaknesses in the application of the percentage agreement measure to
transcriptions. Firstly, it assumes that all disagreements are of equal magnitude,
so that [t]–[d] is no less of a disagreement than [t]–[ɡ]. Secondly, it takes no
account of how chance effects differ according to how many categories transcrib-
ers are choosing from. In a phonemic transcription the choice may be /t/ or /d/,
where chance agreement is 50 per cent, but in a narrower transcription choice
might be [t]–[t̪]–[tʰ]–[d]–[d̪]–[d̥], where chance agreement reduces to 16.7 per
cent. This can in principle be corrected for by the kappa (κ) coefficient in (5.4):

(5.4)  κ = (Po – Pe) / (1 – Pe)

       Po = observed % agreement
       Pe = expected % agreement
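A sketch of (5.4), with expected agreement estimated naively as 1/k for k equally likely and independent categories (precisely the assumption questioned below):

```python
def kappa(observed_agreement, n_categories):
    """Chance-corrected agreement: observed_agreement is a proportion (0-1);
    expected agreement is taken as 1/n_categories, i.e. all options are
    assumed equally likely and independent."""
    expected = 1 / n_categories
    return (observed_agreement - expected) / (1 - expected)

print(kappa(0.80, 2))  # ~0.60: two options, e.g. /t/ vs /d/
print(kappa(0.80, 6))  # ~0.76: six options, e.g. [t]-[t̪]-[tʰ]-[d]-[d̪]-[d̥]
```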

Cucchiarini (ibid.: 142) points out, though, that it is only possible to work out
expected agreement if all the options are independent and have an equal chance
of selection each time one of them is chosen, but in phonetic transcription there
are syntagmatic dependencies which prevent this condition from being met.
Chance effects therefore cannot properly be calculated in percentage agreement
procedures applied to phonetic transcriptions.
To avoid the problems of percentage agreement measures, Cucchiarini pro-
poses using distance matrices. If it is possible to quantify how similar one sound
is to another, then [t]–[d] can be shown to be less of a disagreement than [t]–[ɡ],
for example in sharing the same place of articulation. Distance matrices can
be built on judgements of articulatory distance (Vieregge, Rietveld and Jansen
1984) or on auditory-perceptual distance (Picone, Goudie-Marshall, Doddington
and Fisher 1986), and could also be constructed according to acoustic distances if
it were felt that would serve any purpose. Degree of agreement is then expressed
as an ‘average-distance metric’ with the formula in (5.5):

(5.5)  D = (1/N) Σ di   (i = 1, …, N)

       D = average distance, N = number of symbol pairs
       d = distance between pair members, i = index
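A sketch of the average-distance calculation in (5.5), assuming a distance matrix is available as a lookup table; the distances below are invented for illustration and are not Cucchiarini's values:

```python
def average_distance(pairs, distance, max_distance=14):
    """pairs: list of (symbol_A, symbol_B) from two aligned transcriptions;
    distance: dict mapping a frozenset of two symbols to a distance value;
    a symbol paired with nothing (None) receives the maximum score."""
    total = 0.0
    for a, b in pairs:
        if a is None or b is None:
            total += max_distance
        elif a == b:
            total += 0
        else:
            total += distance[frozenset((a, b))]
    return total / len(pairs)

# invented distances: [t]-[d] counts as closer than [t]-[ɡ]
toy_matrix = {frozenset(("t", "d")): 1, frozenset(("t", "ɡ")): 4}
print(average_distance([("t", "d"), ("t", "t")], toy_matrix))   # 0.5
print(average_distance([("t", "ɡ"), ("t", None)], toy_matrix))  # 9.0
```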

Returning to the transcriptions in Table 5.2, applying the distance matrices given
in Cucchiarini (ibid.: 154–5) as far as possible (no matrices are provided for
diacritics), the value for D is 7.17 when the transcriptions are not aligned (the
maximum score of 14 is given when a symbol is paired with zero), but only 0.42
after alignment (taking the mean where one symbol is paired with two symbols,
and assuming no distance between [ʰ] and [ə̥]).
The average-distance measure better captures our judgements about the
similarity of the two aligned transcriptions than does the percentage agreement
measure, but until there is a widely agreed method of expressing distances
between sounds numerically, different distance matrices will give different
values for D. In Vieregge et al. (1984), distances were established empirically
in a listening experiment in which subjects made proprioceptive judgements. It
is quite likely that familiarity with sounds presented for judgement will have
an influence on how similar or different they seem, meaning that the ideal of
empirically established, universal, general phonetic distance matrices is prob-
ably a remote prospect. A theoretical approach could be taken instead, whereby
categories are placed at points in domain-neutral, multidimensional, taxonomic
phonetic space, like an enormously elaborate and all-inclusive IPA chart, with
adjacent points separated by what we could call one unit of phonetic distance
(UPD). However, to set up such a scheme is unrealistic; it would doubtless
occasion disagreement about how many categories there are in phonetic clas-
sification, particularly categories on ordinal scales (Ashby 1990: 24), and over
which categories should be placed where; and it would have to be rejigged and
distances recalculated if new categories were introduced. The benefit of being
able to quantify phonetic distance is that any two or more transcriptions could be
compared for closeness of agreement in a reliable and consistent manner and this
could be done automatically by computer. Meanwhile, transcribers are probably
better advised to compare their transcriptions qualitatively first by discussing
disagreements and identifying those which are really down to alternative ways
of symbolising the same or extremely similar denotata, for example [ɥ]–[jʷ]–[wʲ]
(Heselwood and Howard 2008: 388). Where a measure of agreement has to be
provided, for example in a publication using transcriptional data, any percentage
agreement figure or average-distance metric should be augmented by a short
account of the kinds of disagreements which could not be resolved and how they
impact on the conclusions drawn from the transcriptions.
The notion of a consensus transcription seems to legitimate the idea that there
is a ‘correct’ transcription, and that this is more likely to be reached by a collec-
tive effort than by individual effort. It was pointed out in Section 5.9 above that
there is no privileged ground from which to claim that a particular impression-
istic transcription is more accurate than another. Nobody is in possession of the
kind of facts which could validate an impressionistic transcription – if they were,
there would be no point in going to the trouble of making an impressionistic
transcription: we could just ask for the facts.
There are two approaches to deriving a consensus transcription from the tran-
scriptions of different transcribers. One is to try to identify a point in phonetic
space towards which a set of variants seem to be converging – transcriptions
derived in this way have been called ‘compromise transcriptions’ (Shriberg et al.
1984: 458); the other is to eliminate what the variants do not have in common – I
shall call these ‘common transcriptions’. The first of these approaches is advo-
cated by Shriberg et al., who propose, as one of their seventeen ‘consensus
rules’, that in the case of two variant transcriptions, a sound ‘somewhere midway
between the two transcriptions’ should be adopted (ibid.: 461). There are two
problems with this compromise approach. Firstly, it will not always be clear in
multidimensional phonetic space what point is being converged on, and whether
we should be considering articulatory or auditory space. If the variants are [h]
and [ɸ], for example, then we could simply take the midpoint on the IPA chart
and agree [ʂ], but this ignores the dimensions of lip-shape and tongue-tip raising
as well as the auditory feature of sibilance; it also ignores the auditory similar-
ity of [h] and [ɸ], suggested by their allophonic relation in languages such as
Japanese (Okada 1999: 118). And how should we handle [t’], [ǀ], [t͈˭] where dif-
ferent airstream mechanisms are involved? Shriberg et al.’s own example of [d]
and [ð] yielding the compromise [d̪] is somewhat arbitrary – one could also com-
promise with [ð ‗]. Secondly, adopting compromise variants means that the tran-
scription expresses judgements that none of the transcribers made. Proponents
of this approach need to explain how judgements that were not made can have
greater validity than ones which were.
The ‘common transcription’ approach, eliminating categories not present in all
the variants, avoids the problems of compromise transcriptions but loses resolu-
tion, sometimes catastrophically. Table 5.3 shows variant transcriptions by four
transcribers of the realisation of the lateral reflex of the Arabic voiced emphatic
dental fricative /ðˁ/ produced by a speaker of the Rijāl Almā’ dialect in south-
west Saudi Arabia (from the working records of Watson et al. 2012).

TABLE 5.3: Comparison of variant transcriptions and what they have in common

1        2      3      4        Commonality
zˁlˤ     ɮˁ     lˁ     ðˁˡ      All: voiced pharyngealised coronal continuant with laterality
                                Majority: fricative component (1, 2, 4)
                                Disagree: dental (4)/alveolar (1, 2, 3); sonorant (3)/fricative (1, 2, 4);
                                apical contact (2, 3)/grooved (1)/slit (4); simultaneity of lateral and
                                central airflow (4)/lateral only (2, 3)/central-lateral sequence (1)

What these transcriptions have in common could be represented reasonably
well as [(L‗¯ˤ )], which glosses as ‘some sort of alveolar lateral with pharynge-
alisation’. But it misleads one into thinking that none of the four transcribers
perceived a central component in the articulation. To be able to make more accu-
rate common transcriptions requires the introduction of symbols to cover two or
more categories, which in the above case would mean a category covering lateral
and lateral + central articulations. It would not be practical, however, to do this
because it would have to be done for all the hundreds, or thousands, of possible
cover categories. Analogical notation systems, where each category is denoted
by a separate symbol component, might allow for common transcriptions more
easily by leaving out non-agreed components, but they have been found too
inconvenient for other reasons and have never really got further than the design
and illustration stages. A consensus transcription based on this approach there-
fore runs aground due to a lack of transcriptional resources for expressing cover
categories and has to resort to non-symbolic expressions. In extreme cases, there
may be no categories in common across the variants. Nothing could be expressed
in the transcription except ‘indeterminate sounds’.
To conclude this section, there is certainly value to be had in comparing tran-
scriptions intelligently and discussing differences. It can also be instructive to
present all the variants for others to see. What seems far less certain is whether
there is much value in deriving compromise or common transcriptions, unless
we are taking a speaker-oriented perspective and trying to establish what the
speaker’s vocal tract actions were by using our ears as substitutes for more accu-
rate and reliable technologies. Even so, care has to be taken that the results of
consensual processes are phonetically coherent and informative. The coherence
of deriving [ʂ] from [h] and [ɸ] is, to say the least, doubtful. In a clinical setting
where the transcription is intended to guide intervention, say with a child having
problems realising /s/ due to a cleft palate, it would be far less than helpful.
Regarding common transcriptions, they are only informative in so far as there is
agreement between transcribers. The more disagreement there is, the less detailed
the derived common transcription becomes and the less information it contains.

5.14 Are Some Kinds of Data Harder to Transcribe Than Others?
There are two related aspects to the question in the heading of this section. One
concerns different kinds of sounds and whether some are inherently more prob-
lematic than others when it comes to impressionistic transcription, and the other
concerns different kinds of speakers and speech data.
From experience and from research it seems that not all classes of sounds are
equally easy or difficult to analyse and transcribe. Vowels have always been
seen as particularly tricky. The early eleventh-century Persian scholar Ibn Sīnā
(Avicenna), after having presented analyses of the consonants of Arabic, admit-
ted in exasperation that ‘[a]s for the vowels, their conditions seem to escape
me’ (Semaan 1963: 48–9). John Wilkins (1668: 363) identified eight ‘easily
distinguishable’ vowel qualities but declared that any more would ‘prove of so
difficult distinction, as would render them useless’. William Holder (1669: 80–1)
conceded that ‘very many’ vowel qualities are possible but also thought only
eight were distinctive, and judged the vowels the most difficult class of sounds
‘to be discerned and described’ (ibid.: 27). One cannot help remarking here
that there are exactly eight primary cardinal vowels in Daniel Jones’s system,
four front and four back. Although Bell (1867: 71–2) had developed a classifica-
tion system for vowels (notably with nine primary qualities) which Sweet (1881:
184–5) praised as ‘perfect’, nevertheless in 1911 Rippmann was exhorting that
‘we must pull ourselves together, for we have come to the vowels, and they are
very troublesome’ (Rippmann 1911: 32). Vowels are harder to analyse and clas-
sify than consonants because the identities and relationships of active and passive
articulators, which form the basis for consonant classification, are not so easy to
establish. The tongue is further away from the superior speech organs such that,
firstly, it is not obvious from proprioception and kinaesthesia where the parts of
the tongue are in relation to the parts of the palate and pharynx, and secondly,
the acoustics of vowels are not tied so closely to specific constriction locations,
being more a function of the distribution of volume across the buccal and phar-
yngeal chambers and the area of the opening to the outside world at the lips. In
addition, Ball et al. (1996: 8) point out that for vowels ‘there is almost an infin-
ity of possible permutations of tongue and lip positions’. The less peripheral the
vowel, the more these problems of analysis tend to increase (Eisen, Tillman and
Draxler 1992; Maassen, Offereinga, Vieregge and Thoonen 1996), and among
peripheral vowels transcribers usually experience more difficulty distinguishing
between the low back ones (O’Connor 1973: 110). Furthermore, F0 can affect
perception of vowels in ways which are not seen with consonants (Carrell et al.
1981; Maurer et al. 1993), as can voice quality (Lotto, Holt and Kluender 1997).
The spectral tilt in breathy voice is characterised by the first harmonic having a
high amplitude, which makes F1 sound lower and the vowel consequently higher.
Long vowels and diphthongs tend to be easier to deal with than short vowels
(Norris, Harden and Bell 1980; Pye et al. 1988). Difficulties transcribers have
with vowels are responsible for consonants being transcribed more narrowly and
vowels more broadly in many transcriptions (Crystal 1982).
Eisen et al. (1992) report that transcriptions of voiced laterals and nasals
tend to be more consistent than transcriptions of other types of consonants.
Interestingly, both these classes are marked by anti-resonances in their acoustic
spectra, and their members are sometimes found in cognate forms – for example
Spanish naranja and Portuguese laranja ‘orange’, German Orgel and English
organ – and in second language substitutions (e.g. Wong and Setter 2002; Zhang
2007), suggesting they form a perceptual class. Maassen et al. (1996) also iden-
tify nasal consonants as relatively easy, along with oral stops, compared to frica-
tives and affricates, which they found to be more challenging for transcribers.
Place of articulation of oral stops is one thing, but making fine judgements about
voicing and aspiration can complicate matters. For fricatives, not only place
of articulation but also what Laver (1994: 140) calls aspect of articulation can
be quite hard to make judgements about. Fricative spectra are caused by high-
pressure air continuously slamming into the body of stationary air just in front
of the articulatory constriction. The size and shape of the channel through which
the high-pressure air exits have a crucial effect on the quality of the perceived
sound, as do the size of the chamber containing the stationary air and the reflec-
tive properties of its surfaces. These factors add further dimensions to phonetic
space and make auditory-perceptual analysis more complicated.
The second aspect of the question in the heading of this section concerns dif-
ferent kinds of speakers and speech data. In Section 5.11 the focus was on the
transcriber, and one question was whether a transcription is more valid if done
by a speaker of the language or language variety being transcribed, or at least
by someone familiar with it. I concluded that there is no single answer to this
question, and that it depends on the purpose of the transcription. Looking at it
this time with the focus on the difficulty of transcription, the question is whether
it is easier to transcribe a variety one is familiar with than an unknown variety.
Unsurprisingly perhaps, there is no single answer to this question either, except
to say that one is likely to encounter different kinds of difficulties in each case,
connected with influences from one’s own linguistic and phonetic experience.
Speech can be typical or atypical, the latter generally being found when a
speaker has some developmental or medical condition preventing their speech
from being typical of their speech community. Transcription of atypical speech
poses problems not unlike the transcription of an unknown language (Heselwood
and Howard 2008: 383–4) in that we should not make any assumptions about
its probable phonetic content, even if we know the medical condition affecting
the speech (ibid.). When atypical speech is unintelligible the similarity with an
unknown language is particularly obvious. Atypical speech data are probably the
most common type of data to be analysed and transcribed impressionistically.
It is especially difficult to transcribe speech when the speaker has an atypical
vocal tract structure or when the vocal organs function atypically. Some atypical
speech data defy transcription because of shortcomings in the notation system,
but some may even defy phonetic analysis if phonetic theory has not anticipated
them. The development of the ExtIPA symbols and conventions and the VoQS
set was prompted by the demand for transcriptional resources to cope with atypi-
cal speech and, like the IPA, they are under regular review to respond to new
demands.
The vocalisations of infants and the early speech of young children have
been identified as having their own special difficulties for transcription such that
researchers have questioned whether the symbols and conventions for transcrib-
ing adult speech are appropriate (see Chapter 3 Section 3.4.11). In dealing with
infant and child data, transcribers have to be conversant with the special analytic
categories which have been developed to account for them and the symbols used
for representing them, and be able to make appropriate judgements when listen-
ing.
Many transcribers are less comfortable transcribing suprasegmental speech
phenomena than they are with segmental transcription. Vaissière (2005: 253)
notes that language users have great skill in detecting and interpreting intona-
tional distinctions which are just too subtle for easy identification and measure-
ment. Ability to make analytic judgements about intonation, however, varies
across individual transcribers, but can be considerably improved with instruction
and practice. Anecdotal evidence from teachers of practical phonetics suggests
that musical experience and training are beneficial, but may actually make recog-
nition of phonologically motivated categories of intonational tones more difficult
when F0 peaks are not aligned strictly with accented vowels, as Beckman and
Venditti (2010: 621–6) point out is often the case.

6
Phonetic Transcription in Relation to
Instrumental and Other Records

6.0 Introduction
Although experimental phonetics using specially designed technological devices
dates back to the early nineteenth century and had made sophisticated advances
in the work of Rousselot, Scripture and Panconcelli-Calzia by the late nineteenth
and early twentieth centuries, the necessary equipment was only available to very
small numbers of researchers. The situation remained like this until the second
half of the twentieth century. The first instrument to make a huge impact on
phonetic research was the sound spectrograph, developed at the Bell Telephone
Laboratories in the US, which started to become publicly available in the late
1940s (Koenig, Dunn and Lacy 1946), enabling precise measurements to be made
on broad-band spectrograms of such key acoustic properties as vowel formant
resonances, fricative spectra, and vowel and consonant durations, and measure-
ments of F0 on narrow-band spectrograms. Never before had these phenomena
been made so readily visible and quantifiable to phoneticians. But it is since the
late 1980s or so that instrumental means of investigating the phonetic structure of
speech have become much more widely accessible, as part of the ‘digital revolu-
tion’ in computer technology and its applications. In addition to spectrography,
we now have computerised instruments that reveal articulatory activities, such
as palatography, articulography, laryngoscopy, laryngography and ultrasound
imaging. Information gained in these ways can be displayed for all who have
sufficient phonetic training to see and interpret. Unsurprisingly, these new
windows onto the objective properties of speech have had the effect of relegat-
ing auditory-perceptual analysis from the league of scientific methods, and either
making phonetic transcription obsolete or co-opting it as a means of representing
the results of instrumental analysis. Instrumental information quickly came to be
seen as a means of checking the validity of auditory-perceptual analyses, on the
assumption that the objects of instrumental and auditory-perceptual analyses are
the same. After looking at how phonetic transcription can be used in the service
of instrumental analyses, I shall, on the back of the case presented in Chapter 5
for the value of impressionistic phonetic transcription, argue that this assumption
is misconceived, and that impressionistic transcription and instrumental records
have a complementary relationship rather than a competitive one (Howard 2011:
135).
It is good practice when presenting most instrumental records of speech to
provide an aligned transcription, exploiting the convention that time runs from
left to right in transcriptions and in instrumental records that have a temporal
dimension. Software packages for instrumental analysis often provide a facility
for adding aligned transcriptions to the display, as in Figure 6.1.

FIGURE 6.1: Praat waveforms, spectrogram and labelled text grids for
segmentation and annotation. With kind permission from Barthel (2013)
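As a simple illustration of what such an aligned annotation amounts to, segment labels can be stored as start–end–label triples; the sketch below writes them out in a tab-separated format of the kind that audio editors such as Audacity can display as a label track alongside the waveform and spectrogram (times and symbols are invented):

```python
# each tuple: (start time in seconds, end time in seconds, IPA label)
segments = [
    (0.00, 0.18, "d"),
    (0.18, 0.44, "ɹ"),
    (0.44, 0.62, "æ"),
]

# write the intervals as a tab-separated label file for import into an editor
with open("utterance_labels.txt", "w", encoding="utf-8") as f:
    for start, end, label in segments:
        f.write(f"{start:.3f}\t{end:.3f}\t{label}\n")
```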

Alignment of discrete symbols with continuously changing dynamic records
has to contend with the fact that the parameters of speech ‘are neither instantane-
ous nor aligned simultaneously’ (IPA 1999: 35). Segmentation lines demarcat-
ing ‘segments’ have to be interpreted loosely as indicating the portion of the
record which displays phenomena most closely associated with the categories
denoted by a particular symbol. Traditional segmentation lines do not allow for
overlap of phenomena associated with different symbols, but this can be shown
by adapting the labelled ‘curly braces’ convention from the ExtIPA symbol set.
Figure 6.2 presents an example of a spectrogram with aligned symbols where
overlap is indicated in this manner. For convenience, the transcription is arranged
in tiers, which can be done systematically to show different classes of sounds
on each tier. In Figure 6.2, the first tier has obstruent symbols, the second has
sonorant consonant symbols and the third has vowel symbols. This multi-tiered
arrangement can be made to suit the syntagmatic structure of the utterance by not
assigning classes to the same tier which commonly occur in adjacent positions.
Something of the dynamics can be expressed by placing different symbols at the
opening and closing curly braces, as has been done using {d d˞},{ɹ̝ ɹ}, {ə ə̃}, {ɱ
ɱ̊}, {l̥ l}; more could be added between them if one so wished.
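One way of thinking of such a multi-tiered, overlapping arrangement is as a set of labelled time-spans, one list per tier; the following sketch (times invented, labels taken from the example above) simply reports where spans on different tiers overlap:

```python
# tiers of (start, end, label-at-onset, label-at-offset) spans; overlapping
# spans on different tiers model the labelled 'curly braces' convention
tiers = {
    "obstruent": [(0.00, 0.12, "d", "d˞")],
    "sonorant":  [(0.10, 0.27, "ɹ̝", "ɹ")],
    "vowel":     [(0.25, 0.41, "ə", "ə̃")],
}

# report any temporal overlap between spans on different tiers
items = [(tier, span) for tier, spans in tiers.items() for span in spans]
for i, (tier_a, a) in enumerate(items):
    for tier_b, b in items[i + 1:]:
        overlap = min(a[1], b[1]) - max(a[0], b[0])
        if overlap > 0:
            print(f"{tier_a} {a[2]}..{a[3]} overlaps {tier_b} {b[2]}..{b[3]} "
                  f"by {overlap * 1000:.0f} ms")
```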
Aligned transcriptions can perform a number of functions which are discussed
in the following sections.


FIGURE 6.2: Spectrogram of a dragonfly with aligned multi-tiered
transcription showing segment overlap

6.1 Instrument-Dependent Transcriptions


Every instrumental record is a record of a specific utterance, therefore instrument-
dependent transcriptions are by definition specific transcriptions. Like any spe-
cific transcription, they can be relatively broad or narrow. They can also be
systematic or general phonetic, depending on whether the transcriber knows
which phonological system lies behind the data and uses that knowledge
explicitly to represent language-specific system categories in the transcription.
It is often useful to provide a narrow general phonetic transcription and also a
systematic one so that the details in the instrumental record can be related to
phonological structure (Abercrombie 1954/1956: 112–13; see also Barry and
Fourcin 1990: 36–8); an additional orthographic transcription can be added to
relate the specific utterance to its system sentence, resulting in ‘multilayered’
transcriptions (Müller and Ball 2006). Transcriptions representing non-segmental
aspects of speech such as intonation, stress-accent, rhythm, tempo, amplitude and
voice quality can be used in the interpretation of instrumental records in addition
to segmental transcription.
Two types of instrument-dependent transcriptions, namely instrument-
determined and instrument-informed, were identified in Chapter 4 Section 4.12,
so I shall discuss and exemplify these in turn. Then in Section 6.2 a distinction
common to both types of instrument-dependent transcriptions is drawn on the
basis of their functions, whether annotating or summarising. The difference is not
a great one but I believe it is worth distinguishing between them.

6.1.1 Instrument-determined transcriptions

Transcriptions are instrument-determined if they take as the data to be tran-
scribed a record of instrumental analysis of speech instead of the original speech
itself. That is to say, a phonetician skilled in reading instrumental records can
make a transcription without hearing the original speech. The relationship of the
transcription to the speech is indirect, as shown in Chapter 4 Figure 4.8, via the
direct relationship with the instrumental record. The task of reading ‘mystery
spectrograms’, often set for students of acoustic phonetics, usually includes
making instrument-determined transcriptions as a way of expressing their solu-
tions to the mystery.
The purpose of instrument-determined transcriptions is to interpret the instru-
mental record in terms of the categories of phonetic theory. That is to say, a
symbol denotes a phonetic category, and not some part of the graphics of the
display, because the display represents an analysis of an utterance as carried out
by the instrument. In Figure 6.4 below, for example, [ʰ] denotes aspiration, not
some category of visual marking, because we know that that category of visual
marking is employed by the instrument to represent the kind of noise made by
aspiration. Instrument-determined transcription is therefore a kind of translitera-
tion from one graphic representation to another where both have, or are taken to
have, the same denotata in terms of the categories of phonetic theory.
Transcriptions of records showing articulatory data are speaker-oriented
but those of acoustic records, as mentioned in Chapter 4 Section 4.2, are
best regarded as signal-oriented because of the difficulty of inferring speaker
behaviours from acoustic analysis. An important point about the use of pho-
netic notation in instrument-determined transcription is that typically it both
over-analyses and under-analyses the data compared to its use in instrument-
independent impressionistic transcriptions of speech. Over-analysis comes from
the fact that instrumental records only show those aspects of speech which they
are capable of showing and are designed to show. Palatography, for example,
only displays patterns of contact on a defined area of the palate. It does not
show what made the contact, and it does not tell us anything about lip-shape,
phonation or resonance. Nevertheless, transcriptions aligned with palatograms
generally use symbols denoting categories for which there is no evidence in
the display – the integral nature of IPA symbols makes this in many instances
inevitable. Conversely, instruments reveal details for which no symbols are
available, so in this respect the instrumental data are under-analysed. Thomas
Carlyle’s insight that symbols both reveal and conceal is highly relevant in this
context.
By way of exemplifying these points, Figure 6.3 contains seventeen palato-
graphic frames showing the approach (frames 304–8), hold (frames 309–16) and
release (frames 317–20) phases of a lateral articulation. The hold phase shows
asymmetrical contact quite typical of an alveolar lateral consonant where the
air is exiting from the right-hand side. If we use a symbol for an alveolar lateral
such as [l] or [ɬ] we are specifying whether it is voiced or voiceless, sonorant
or obstruent, but none of these features can be read from the frames. To avoid
this we can use the ExtIPA ‘indeterminate’ notation with a capital L and double
underline – [(L ¯‗)] – to denote just ‘alveolar lateral’. The frames are quite detailed
as to where the palate is being touched, i.e. much further back on the left than the
right, but these details cannot be represented in alphabetic symbols such as the
IPA or ExtIPA, which have no means of representing asymmetrical articulation.
FIGURE 6.3: Palatographic frames showing onset, steady state and offset
of a lateral articulation. There is asymmetrical tongue contact during the
steady state, but some symmetry can be seen in the way the dynamics of
the onset are reproduced in the offset. The symbol denotes only 'alveolar
lateral'.

Organic notation fares rather better in this respect. Bell's 'front mixed divided'
symbol with Sweet’s ‘unilateral modifier’ gives us [], but the absence of the
voiced symbol implies voiceless. Anyone wishing to resurrect organic phonetic
notation might have some success in the context of instrument-determined tran-
scription, where there is a clear argument to be made for its usefulness.
Another factor responsible for mismatches of detail between instrumental
records and instrument-dependent transcriptions is the dynamic nature of the
former compared to the essentially static nature of the latter if non-parametric
notation is employed. We have seen in Figure 6.2 how some limited dynamism
can be incorporated into segmental transcriptions. The top and bottom rows of
Figure 6.3 show the approach and release phases of the lateral articulation, which
prove too much of a challenge to any non-parametric transcription. Consequently,
what could be significant facts in the instrumental record, for example compar-
ing the dynamics and symmetry of approaches and releases in articulation,
cannot be represented in transcriptional form. The distinction between phonetic
description and phonetic classification (O’Connor 1973: 125–8; Howard and
Heselwood 2013: 73–9) is clearly pointed up here. Phonetic categories derive
from taxonomic classification, which focuses on whatever are thought to be the
defining aspects of a sound’s production which distinguish it phonetically from
other sounds. We have seen in Chapter 3 Section 3.4.5 that in modern phonetic
theory the location and degree of a constriction in the vocal tract during the
hold phase of a sound, in so far as a hold phase can be identified, are taken to
be the most important criteria for classifying a sound and for characterising the
speaker’s articulatory ‘targets’. Because symbols denote categories, a wealth of
detail which would appear in a complete articulatory description of how a sound
is produced is excluded in segmental transcriptions. In principle, a parametric
transcription could incorporate many of these details because it is not giving any
privileged status to steady states, notional or real, over approaches and releases
and other activities regarded from a segmental point of view as transitions, and
thus taken to be of lesser importance for classification. The practical complica-
tions of drawing parameters to show these kinds of details are, however, usually
enough to deter attempts by transcribers to do so.

6.1.2 Instrument-informed transcriptions

If information from instrumental records has been consulted in the course of
making an impressionistic transcription to help the transcriber make transcrip-
tional decisions, then it is an instrument-informed transcription but not an
instrument-determined one. The relationship of the transcription to the original
speech data is different from that in instrument-dependent transcription because
it includes direct analysis of the speech, as well as indirect analysis via direct
analysis of the instrumental record (see Figure 4.8 in Chapter 4). The resulting
transcription is a mixture of symbolising the contents of the transcriber’s percep-
tual objects and the contents of the instrumental record. Normally it is impos-
sible to recover afterwards from the transcription which is which, this being
one argument for the kind of multi-tiered transcription advocated in Section
6.4 below. During instrument-informed transcription, the balance between
perceptual and instrumental inputs probably varies throughout the process. At
one point the transcriber may be relying predominantly on the one, using the
other as a check, and at another point reversing this balance. For example, there
may not be enough information in the vowel formants on a spectrogram to
decide if the vowel is [æ] or [ɛ], or it may be difficult to decide from a fricative
spectrum whether one is dealing with [ʒ] or [ʓ]. One then listens to the speech
recording to decide. Conversely, listening to a syllable with a plosive onset may
leave one undecided between lightly voiced or devoiced. One then looks at the
waveform and spectrogram, where the presence or absence of voicing is likely
to show up.
The motive for using instrumental methods to investigate articulation is to
discover facts about the production of speech; any transcription using this infor-
mation is thus speaker-oriented. It is therefore logical in instrument-informed
transcription for the instrumental evidence to be given greater weight. If the
transcriber hears [l] but palatograms show no tongue–palate contact in the
alveolar region, then [l] should not appear in an instrument-informed transcrip-
tion. An acoustic record might, however, show a resonance pattern consistent
with [l], appearing to validate the transcriber’s perception. This exemplifies the
suggestion made before that we should distinguish between speaker-oriented
and signal-oriented transcription, because, in a speaker-oriented transcription,
if we transcribed [l] on the acoustic evidence in these circumstances we would
be factually wrong about what the speaker did. We would not, however, be
factually wrong about what the acoustic record contained. A similar argument
is put in Section 6.4 concerning auditory-perceptual transcription. Instrument-
dependent transcriptions are therefore either speaker-oriented or signal-oriented,
and the facts being represented in the transcriptions are either articulatory or
acoustic facts respectively, although the latter will often be couched in articula-
tory terms.

6.2 Functions of Instrument-Dependent Transcriptions


To show that a transcription is instrument-dependent, and that the focus is
on the instrumental data, the transcription symbols can be placed beneath the
instrumental record. There are basically two functions that instrument-dependent
transcriptions may be used for. As stated in Section 6.1 above, the distinction is
not a fundamental one but is in my opinion nevertheless worth making.

6.2.1 Annotating function

When a transcription annotates an instrumental record, it has little or no value on
its own. Often one is only interested in specific features of an instrumental record
and therefore one annotates only those features. For example, in the spectrogram
and waveform in Figure 6.4 only the aspirated plosives have been transcribed.
Presented on their own, they are out of context. In an acoustic study of aspirated
plosives it will be useful for the researchers in their private working records
to identify where all the tokens of aspirated plosives are. A convenient way to
do this is by using phonetic notation to label the relevant parts of the display.
They can then be located easily later for further spectral and temporal analysis.
Annotated records are also useful in publications and presentations for drawing
readers' and audiences' attention to the relevant parts of the display. They can
incorporate numerical data as well, as has been done in Figure 6.4 for the dura-
tions of aspiration.

FIGURE 6.4: Example of an annotated spectrogram and waveform
incorporating measurement data

In Figure 6.5 a series of palatographic frames is shown over a spectrogram and
waveform from an utterance of the phrase mish gdar ‘was not able to’ in Libyan
Arabic. The point of interest here is the articulatory overlap resulting in a short
period when there are simultaneous alveolar and velar closures in a /ɡd-/ onset
cluster sequence. The evidence for temporal overlap of gestures is contained in
frame 168, where it can be seen that there are complete closures across the first
four rows and the last row of the palate. The closure on the last row marks the
end of the velar closure seen in the preceding frames, and the anterior closure is
the start of the alveolar occlusion, which continues through to frame 175. With
a sampling rate of 100 frames per second, we can work out that the total overlap
lasted up to 20 ms, which we can indicate in the annotation. Information about
the overlap is not present in the acoustic record because the release of the /ɡ/ is
masked by the more anterior closure, which also makes it impossible to measure
the respective durations of the stops on the waveform or spectrogram.

FIGURE 6.5: Acoustic and palatographic displays of Libyan Arabic
/miʃ ɡdar/ 'was not able to' showing total overlap of alveolar and velar
articulations in frame 168 and the release of /d/ in frame 176; arrows
show the respective time points on the spectrogram. Adapted with kind
permission from Shitaw (in preparation)
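As a back-of-the-envelope check of the timing claims, the frame-to-duration arithmetic can be made explicit (a sketch; the frame numbers are those cited above):

```python
FRAME_RATE = 100                 # palatographic frames per second
FRAME_MS = 1000 / FRAME_RATE     # 10 ms per frame

# simultaneous alveolar and velar closure is visible only in frame 168, so
# the true overlap lies somewhere between just over 0 ms and two frame
# periods, i.e. up to about 20 ms
overlap_frames = 1
print((overlap_frames + 1) * FRAME_MS)   # 20.0

# the alveolar occlusion is visible from frame 168 through frame 175,
# a hold phase of roughly eight frames
print((175 - 168 + 1) * FRAME_MS)        # 80.0
```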


Figure 6.6 shows a spectrogram and waveform of the Libyan Arabic word
wagt ‘time’ in which an epenthetic [ə] intrudes between the coda consonants.
The aligned graph beneath is constructed from analysis of palatographic data
to represent the amplitudes and time courses of the tongue-back gesture for /ɡ/
and the tongue-tip gesture for /t/. It tracks the percentage contact in the posterior
and anterior parts of the palate respectively. What it reveals is that, despite the
presence of a vowel between the consonants, the gesture for /t/ begins before the
gesture for /ɡ/ has reached the end of its downward trajectory. The [ə] symbol
on the graph is placed to show that the overlap of the gestures occurs during
the vowel. The information in the display helps to support an analysis in which
the presence of the vowel is explained as the product of a looser timing relation
between the gestures than is seen in Figure 6.5.
FIGURE 6.6: Acoustic display of Libyan Arabic wagt 'time' with
epenthetic [ə] separating /ɡ/ from /t/. Graph showing partial overlap of
tongue-back gesture (diamonds) and tongue-tip gesture (squares). [ə]
intrudes between the gesture maxima. % contact scale on y-axis, time on
x-axis. Adapted with kind permission from Shitaw (in preparation)

Another example of annotated records is given in Figure 6.7. The utterance is
of the Arabic word /saʕiːd/ 'happy' by an Iraqi speaker. The dotted line on the
spectrogram shows where the spectrum was taken from, and the laryngoscopic
images show the configuration of the epilaryngeal tube at four points in the time
course of the utterance. The images are shown with the back of the pharynx
at the top. Image (iii) is the point of maximum closure during the pharyngeal
tap, which is asymmetrical – the left cuneiform cartilage makes more complete
contact than the right one. The sound file and laryngoscopic video from which
the frames were extracted were kindly supplied by John Esling and Zeki Majeed
Hassan and contain what is probably the first ever laryngoscopically observed
example of a pharyngeal tap and the first confirmation of this articulation (see
Esling forthcoming). Esling (2010: 696, 700) proposes the symbol [ʕ̆] for it
and describes the action as an inward flexing of the aryepiglottic folds with the
cuneiform cartilages functioning as the elbows of the mechanism (John Esling,
personal communication).

FIGURE 6.7: Spectrogram, waveform, laryngoscopic images and
spectrum (FFT and LPC) of the Iraqi Arabic word /saʕiːd/ 'happy'
realised as [saˁʕ̆iːd]. (i) glottis open for [s], epilaryngeal tube relatively
unconstricted; (ii) cuneiform cartilages (C) approximating during [aˁ];
(iii) cuneiforms meet and make contact with retracting tubercle of
epiglottis (E) for 17 ms during [ʕ̆]; (iv) they move away again during [iː].
Spectrum is from the time at the dotted line during the tap. See text.

Figure 6.8 contains an annotated waveform and spectrogram in which atten-
tion is drawn by the segmentation lines to a lenited realisation of English /t/. The
transcription [t ̞̞] with a double ‘lowered’ diacritic is informed by the acoustic
analysis to distinguish it from the preceding realisation of /s/.

FIGURE 6.8: Annotated waveform and spectrogram focusing on a
particular realisation of English /t/. With kind permission from Buizza
(2010: 44).

6.2.2 Summarising function

A summarising transcription can be meaningfully presented without the instru-


mental record in the same way that a direct transcription of speech can be pre-
sented without the audio recording. A competent summary of an instrumental
record should tell a reader of the transcription the significant information which
he or she would get from looking at the original instrumental record itself.
Meeting this condition, it can stand in place of the instrumental display, whereas
an annotating transcription cannot. In Figure 6.9, the top of the display shows
changes in acoustic intensity by the thickness of the line, pitch is represented
by the Fx trace derived from a larynx waveform, and the Qx trace derived from
the same waveform shows closed quotient values; breathy voice is assumed for
values below 40 per cent. Below the orthographic transcription, the information
in the three display lines is summarised such that it can be gleaned from the tran-
scriptions without reference to the displays.

FIGURE 6.9: Intensity, Fx (pitch) and Qx (closed quotient) traces from an
utterance of What are you talking about? annotated with ExtIPA, IPA and
VoQS notation

6.2.3 Corpus transcriptions

In addition to summarising instrumental information about a specific utter-
ance, transcription can be used to summarise a corpus of phonetic data. If, in an
acoustic study of vowels, for example, formant values have been averaged from
a corpus to derive a mean F1 and F2, a vowel symbol could be used to express
the resulting averaged vowel quality. We can call this a corpus transcription.
When presenting corpus transcriptions it needs to be made clear to what extent
the tokens which went into the calculation had formant values consistent with
the transcription.
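A sketch of how an averaged vowel quality, and its consistency with the individual tokens, might be computed; the formant values and the reference points below are invented for illustration, and the nearest-point assignment is a deliberately naive stand-in for a proper vowel classification:

```python
# F1/F2 measurements (Hz) for tokens of one vowel drawn from a corpus
tokens = [(742, 1520), (710, 1588), (760, 1475), (731, 1549)]

mean_f1 = sum(f1 for f1, _ in tokens) / len(tokens)
mean_f2 = sum(f2 for _, f2 in tokens) / len(tokens)

# invented F1/F2 reference points for a handful of symbols; a measurement is
# assigned to whichever reference point is nearest in the F1-F2 plane
reference = {"a": (780, 1550), "æ": (690, 1660), "ɐ": (720, 1430)}

def nearest_symbol(f1, f2):
    return min(reference, key=lambda v: (reference[v][0] - f1) ** 2 +
                                        (reference[v][1] - f2) ** 2)

corpus_symbol = nearest_symbol(mean_f1, mean_f2)

# how many individual tokens are themselves nearest to that symbol?
consistent = sum(1 for f1, f2 in tokens if nearest_symbol(f1, f2) == corpus_symbol)

print(round(mean_f1), round(mean_f2), corpus_symbol)          # 736 1533 a
print(f"{consistent}/{len(tokens)} tokens consistent with [{corpus_symbol}]")  # 2/4 ...
```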
A corpus transcription is by definition generic because it is not tied to a partic-
ular utterance. It will be either speaker- or signal-oriented depending on whether
the data are articulatory or acoustic, and it can be systematic, in which case it
is likely to be broad. However, it could in principle be very narrow because it
expresses an object which is the same size as a single utterance. A mean F1–F2
coordinate will occupy a specific point in the plane, which one might identify as
between [a], [æ] and [ɐ] and decide to express as [a˔]. A narrow corpus transcrip-
tion of this kind will be neither impressionistic nor systematic. It is not impres-
sionistic because it does not express a perceptual object, and it is not systematic
because the details are not being supplied by language-specific conventions.

6.3 Indexed Transcriptions


Instrument-dependent transcriptions privilege the instrumental record over the
transcription, but this relationship is reversible so that the focus can be on the
transcription as the main expression of the analysis. Instrumental records can be
indexed to a particular part of a transcription to provide extra insight about the
data (Howard and Heselwood 2013: 94). To show that the focus is on the tran-
scription, it can be placed above the instrumental records. In Figure 6.10, an FFT
spectrum averaged over 35 ms of pharyngeal constriction and a larynx (Lx)
waveform of six glottal cycles from the same portion of the Arabic word
/waʕʕad/ are indexed to the relevant part of an impressionistic transcription.
These instrumental records show us that the phonation during [ʕː] has the acous-
tic and articulatory characteristics of breathy voice, i.e. negative spectral tilt and
long open quotient (Hayward 2000: 231–6), although breathiness is not repre-
sented in the transcription because it was not perceived as breathy. A mismatch
like this between impressionistic analysis and instrumental analyses should
prompt the research question ‘Why?’ (see Section 6.4 below).
FIGURE 6.10: Averaged FFT spectrum (left) and laryngogram (right)
indexed to a specific transcription of the Arabic word /waʕʕad/ 'to make
someone promise' showing voice quality features in the realisation of
the geminate pharyngeal /ʕʕ/: spectral tilt of −5.2 dB, and long open
quotients averaging 64 per cent in the larynx waveform, both indicative of
breathiness

The transcription to which records are indexed does not of course have to be
impressionistic. It could be any type of transcription whatsoever, but there is a
particular value to this practice when it is done in relation to the expression of
auditory-perceptual analysis in specific transcriptions. This will be discussed in
the next section. However, instrumental records from a specific utterance can
also usefully be indexed to generic transcriptions to show articulatory or acoustic
features thought to be typical of speech in a particular language or variety. For
example, to show the typical difference between clear and dark allophones of
/l/ in English varieties of English, a spectrogram could be indexed to a generic
allophonic transcription of a word such as lilt, as in Figure 6.11.

FIGURE 6.11: Spectrograms indexed to a generic allophonic transcription of English lilt to show typical clear and dark allophones of /l/ with formant tracks

6.4 Impressionistic Transcription and Instrumental Records


In Chapter 5, impressionistic phonetic transcription was examined in detail and
argued to be a valuable method for recording the analysis of perceptual objects
created in the transcriber’s experience of hearing speech. In this section, I shall
compare auditory-perceptual analysis with instrumental analysis and consider the
implication of the comparison for phonetic transcription.
First of all, I would like to repeat an anecdote related by the art critic Kenneth
Clark as reported by Gombrich (1972: 5):
A master of introspection, Kenneth Clark, has recently described to us most
vividly how even he was defeated when he attempted to ‘stalk’ an illusion.
Looking at a great Velázquez, he wanted to observe what went on when the
brush strokes and dabs of pigment on the canvas transformed themselves into
a vision of transfigured reality as he stepped back. But try as he might, step-
ping backward and forward, he could never hold both visions at the same time,
and therefore the answer to his problem of how it was done always seemed
to elude him.

Clark’s experience illustrates that looking at the brush strokes in the paint, that
is at the isolatable components composing the Velázquez painting as a visible
object, is a qualitatively different experience from experiencing the aesthetics of
the painting as a work of art. The aesthetics cannot be experienced by staring at
the brush strokes. The pigments and the brush strokes are amenable to quantita-
tive analysis of various kinds to yield data about the chemical composition of the
paints, the sizes of the brush strokes and so on, but the aesthetics of the painting is
not. Phoneticians are in much the same situation in relation to speech. Listening
to isolatable bits of acoustic material such as vowel formants, plosive bursts or
movements of F0, the auditory equivalents of brush strokes and pigments, is
not the same as experiencing the piece of speech they are isolatable from. That
is to say, phonetic structure itself is not reducible to physical analysis because
part of what makes it phonetic structure and not just acoustic structure is what
we hear when we engage in analytic listening as phoneticians. That is to say,
acoustic structure is not the same as phonetic structure. I am suggesting that the
phonetic structure of speech is like the aesthetic structure of a painting, or of a
piece of music. A Beethoven sonata cannot be heard by isolating the harmonic
structures of the tones. The sum of all possible analyses of the physical materials
of a Velázquez or a piece of performed music will never add up to an account of
what we see or hear when we experience the aesthetic structure, although it will
contribute, along with theories of visual and auditory perception, to an account of
what we see or hear when we look at the brush strokes and the dabs of pigment,
or listen to notes played on a piano. Similarly, instrumental analyses of articu-
latory actions and acoustic signals do not add up to an account of the phonetic
structure of speech as we hear it, although they will contribute to an account of
what we hear when we listen to isolatable bits of acoustic material, and contribute
to an account of how speakers produced them. For this reason, some proponents
of impressionistic transcription are strongly of the view that it is inappropriate
to use instrumental analysis to validate auditory-perceptual analysis (see for
example Heselwood 2009; Howard and Heselwood 2011). In a sentence reso-
nant of Clark’s anecdote, Howard and Heselwood (2011: 941) remark that ‘[o]ne
does not take a photometer to the Louvre to verify that one is seeing the Mona
Lisa’. The question is, how can images of the glottis, or tongue, or tongue–palate
contact, or formant values and waveform durations prove what is or is not in the
perceptual objects about which a transcriber makes phonetic judgements? They
do of course prove what was in the articulatory structure or acoustic structure,
but not what is judged to be in the phonetic structure. Pointing to a rather low
F2 does not invalidate a perceptual judgement that the vowel in question sounds
front, any more than the vowel sounding front invalidates the low F2.
When Oller and Eilers (1975: 301), writing about transcription accuracy, say
that ‘the listener may perceive elements which are not present in the acoustic
signal, and/or he may fail to perceive elements which are present’, they are
assuming that it is possible to tell exactly what the phonetic content of an acoustic
signal is, and also that the purpose of impressionistic transcription is to express
it. In my view it is possible in principle to give an exhaustive account of what
the acoustic content of an acoustic signal is, but not its phonetic content. If a
transcriber judges the phonetic content to contain [l] when the acoustic record

shows no known acoustic correlates for it, then this is certainly something to try
to explain, and if we find an explanation then we have advanced our understand-
ing of the relationship between the acoustics of speech and speech perception,
between acoustic structure and phonetic structure. If we do not find an explana-
tion then we have not advanced our understanding, but that is no justification for
saying that the perceptual judgement is wrong. Looking for a perceived [l] in the
acoustic signal is like looking for the Mona Lisa’s smile in the brush strokes: it
simply isn’t there, and never can be. Nor, however, is it there even if we find
the expected acoustic correlates. What we can say in that case is that we have an
explanation for the perception of [l].
When we have instrumental records of the articulatory or acoustic structure of
an utterance, we can, as outlined in Section 6.2 above, annotate or summarise the
information in the form of an instrument-dependent transcription. We can also
add an impressionistic transcription to record the phonetic judgements of a phone-
tician arrived at by listening. In Howard and Heselwood (2011) the result is called
a two-tier transcription, in which the transcription expressing the instrumental
analysis is labelled ‘I’ and the one expressing the auditory-perceptual analysis is
labelled ‘P’. An ‘I’ transcription is speaker- or signal-oriented and a ‘P’ transcrip-
tion is listener-oriented. A two-tier transcription presents both orientations or,
if articulatory and acoustic records of the same utterance are annotated or sum-
marised, a signal-oriented transcription is added and the transcription has three
tiers, giving complementary perspectives on the utterance from the articulatory,
acoustic and perceptual domains. None of the transcriptions in a multi-tiered tran-
scription of this kind should be taken as inherently more veridical than the others.
These different orientations also bring to prominence a fundamental point
about the relationship between general phonetic categories and phonetic domains
which is both a weakness and a strength for a notation system such as the IPA,
which is predominantly integral rather than componential. The same symbols,
denoting the same phonetic categories, are used whether the analysis of speech
is articulatory, acoustic or auditory-perceptual, meaning that they should be
conceived of as domain-neutral, although for historical and practical reasons the
terminology tends to be taken from the articulatory domain. O’Connor (1973:
104–5) identifies the terms ‘plosive’, ‘roll’ (= trill), ‘flap’ and ‘fricative’ as audi-
tory (or perceptual) (see also Pike 1943: 70 for ‘fricative’ as an acoustic-auditory
term), but ‘plosive’ and ‘fricative’ are really aerodynamic terms (Heselwood
2008b: 89), and ‘roll’ and ‘flap’ surely describe types of tongue behaviour. There
are of course domain-specific categories for the analysis of acoustic structure, for
example long-lag and short-lag VOT categories, but no IPA symbols explicitly
denote them, which may help to explain why aspiration and VOT often fail to
be properly distinguished. The advantage of domain-neutral categories is that the
same symbols can be used regardless of the domain in which the analysis was
conducted, and an analysis carried out in a specific domain can be interpreted
in domain-neutral terms. For example, ‘lateral’, despite being a term taken from
the articulatory domain, stands as a general phonetic category with correlates in
each of the domains. In the articulatory domain it means a particular range of
tongue–palate contact configurations; in the acoustic domain, a particular range
of resonance patterns containing a zero (Stevens 1998: 546); in the auditory

domain, transductions of the acoustic resonance patterns which give rise to a
range of perceptions each having its own phenomenal character. One can also
consider aerodynamics as a domain (see Section 6.5.2 below), in which ‘lateral’
means airflow exiting round one side or both sides of the tongue. To strengthen
this advantage, phonetic theory needs to develop consistent and coherent ways of
translating between these domains so that a category such as ‘lateral’ is robustly
domain-neutral. The weakness in having the same categories and symbols for
all domains is that they tend to be used and interpreted in terms of the domain
about which most is known and in terms of which it is easiest to define one’s
categories, and that generally means articulation. This tendency easily slips into
an interpretation in which one domain is considered to be where ‘the truth’ lives.
Ladefoged’s (1990: 344) dictum that ‘[f]or the phonetician there is no universal
truth independent of the observer’ can be extended to say there is no universal
truth independent of the domain of observation.
From the point of view of phonetic research, it is interesting to see where
the different tiers of analysis do and do not seem to be in agreement, and to see
whether patterns of agreement and disagreement are consistent. Lack of congru-
ence between them generates research questions to pursue and hypotheses to test
concerning how the articulatory, acoustic and auditory correlates relate to each
other and where these relations are and are not monotonic (Stevens 1997: 463). It
can also be highly informative for clinicians dealing with atypical speech; see for
example Gibbon (1990). Howard and Heselwood (2011) present five examples
from atypical and typical speech in which what is perceived maps differently
onto phonetic categories from the mappings of instrumental analysis. One of their
examples shows acoustic and laryngographic data from a production of the Arabic
phrase bēt Darīn ‘Darin’s house’ (/beːt daˈriːn/) spoken by a Syrian speaker, in
which there is voicing throughout the realisation of /t/, but no voicing, and a posi-
tive VOT, in the realisation of /d/; see Figure 6.12. Nevertheless, Arabic listeners,
and most English listeners with no knowledge of Arabic, hear them respectively
as [t] and [d], not [d] and [t]. What the acoustic ingredients are that trigger these
perceptions is difficult to say, but no doubt signal-complementary processing
is involved originating in listeners’ phonologised experience, prompting the
authors to question whether IPA symbols can ever in practice be as language-
independent as they are intended to be (ibid.: 947). Figure 6.12 has been adapted
from the original figure by adding an extra tier of transcription to annotate the
acoustic and laryngographic records separately.
When auditory-perceptual judgements of the phonetic structure are not congru-
ent with the articulatory and acoustic structures revealed by instruments, it would
be a big mistake to say they are therefore wrong. Judgements of phonetic quality,
as opposed to articulatory or acoustic structure, are judgements about the phenom-
enal character of auditory-perceptual experience, which does not reduce to articu-
latory and acoustic facts. What it is like to perceive a sound as [t] or [d] can only
be known by experiencing the perception. In the now famous thought-experiment
in Jackson (1986), the question is asked whether Mary, who is confined inside a
colourless room but with access to all conceivable facts about the physical world,
can know what it is like to see something red. Jackson’s answer is that, although
she may be able to imagine what it would be like, she could not actually know.


FIGURE 6.12: Multi-tiered transcription showing: (A) signal-oriented transcription summarising acoustic records (spectrogram and speech waveform); (B) speaker-oriented transcription summarising an articulatory record (larynx waveform); (C) listener-oriented impressionistic transcription. Adapted from Howard and Heselwood (2011: 946)

Things that cannot be known through physicalist facts are termed qualia. Instead
of Mary, imagine her sister Jane, who is confined to a room in which no sound
can exist. Jane has access to all possible facts about the articulatory and acoustic
properties of speech and of the human auditory system. Can she know what it is
like to hear a glottal stop? Or to hear the difference between [t] and [d], [æ] and
[a], a falling tone and a rising tone? My contention is that a complete knowledge
of phonetics requires knowledge of physical facts and of qualia. The former can
be approached theoretically and instrumentally but the latter only practically and
through human experience. It may have been this kind of distinction which Sweet
had in mind when he said that in phonetics ‘[t]heoretical knowledge is not enough’
(Sweet 1906: 4). When we perceive speech sounds we do not hear theoretical cat-
egories or intersections of categories; we perceive sounds. Categories themselves
are silent. I have pointed out elsewhere (Heselwood 2008b: 88) that ‘[t]he IPA
chart would be as true in a silent world as in our world of sound’. Neither do we
perceive tongue movements or formant frequencies. Equally, spectrograms do not
depict tongue positions and articulograms do not depict formants, nor do either
of them contain phonetic categories. In the next section these issues are explored
and a scheme of how the different domains of phonetics relate to each other and
to the categories of phonetic theory is proposed.

6.5 Phonetic Domains, Phonetic Theory and Their Relations


Figure 6.13 sketches how the different phonetic domains can be related to each
other and to the categories of phonetic theory.
Apart from the perceptual domain, each domain can be conceived of as


[Figure 6.13 is a diagram linking five domain boxes to a central shaded box. The articulatory domain (physical space = vocal tract anatomy and physiology; abstract articulatory space structured by articulatory categories), the aerodynamic domain (physical space = vocal tract airways; abstract aerodynamic space structured by aerodynamic categories), the acoustic domain (physical space = pressure-waves; abstract acoustic space structured by acoustic categories), the auditory domain (physical space = transduction of physical acoustic space; abstract auditory space structured by psychoacoustic categories) and the perceptual domain (no physical space; auditory-perceptual space structured by exemplar-based perceptual categories) each map onto the central box: abstract domain-neutral taxonomic phonetic space structured by intersecting theoretical categories denoted by symbols.]

FIGURE 6.13: Phonetic domains in a chain of cause and effect which map independently to phonetic categories. Brackets link domains which together form the input to the next domain. Phonetic symbols denote entities in taxonomic phonetic space (shaded box), but refer to, or represent, entities in specific domains (clear boxes).

comprising a real-world physical space and an abstract space structured by
domain-specific categories. It makes little sense to try to identify a real-world
physical space in the perceptual domain because of the non-physicalist phenome-
nal view of perception argued for in Chapter 5 Section 5.8, and Section 6.4 above.
Abstract space is a system of category relations set up by the interface of phonetic
theory and theories developed in disciplines relevant to the domain, for example
anatomy and physiology for the articulatory domain, fluid dynamics for the aero-
dynamic domain, acoustics for the acoustic domain and neuro-anatomy for the
auditory domain. In principle, and as seen to some extent in practice, each domain
could have its own notation for denoting its categories. In Figure 1.5 in Chapter
1, acoustic classes are denoted by letters such as p for periodicity, t for transient

etc. Alternatively, or in addition, each domain could have its own conventions for
interpreting a common notation such as the IPA, or, when used with instrumental
records, a multi-tiered approach could be taken in which a tier was assigned to
each category and instances of that category were picked out in a binary absent–
present notation. Whichever notational practice is adopted, it allows for multi-
tiered transcriptions expressing analyses from the various domains.
The main point of this scheme is that, firstly, general phonetic categories
should be interpreted as domain-neutral despite domain-specific origins and con-
notations in terminology, and secondly, analyses of events in one domain can be
mapped onto general phonetic categories independently of any analysis in any
other domain. Most of the time we would expect the same utterance to be mapped
onto the same categories from each domain of analysis, but as we have seen in
a number of cases, this does not always happen. ‘Domain mismatches’ are often
interesting and can lead to advances in our understanding of phonetics when we
seek explanations for them, but they should not be taken as evidence that map-
pings from one domain are more accurate than from another.
The domains are arranged in cause-and-effect relations familiar from the
‘speech chain’ concept (Denes and Pinson 1963) and the division of sound
production into phases (Catford 1977: 2–6), but some have a closer relationship
than others. For example, the articulatory and aerodynamic domains both inhabit
the vocal tract and so share the same overall physical space; their combined
output creates the acoustic domain. There is also a close relationship between the
acoustic and auditory domains in that the latter is a transduction of the former,
and there is a sense in which they combine to form the input to perception. That
is to say, the auditory domain cannot provide input to perception on its own and
is thus completely dependent on the acoustic domain, whereas synthetic speech
shows that the acoustic domain is not so completely dependent on the articula-
tory and aerodynamic domains. Each domain will be discussed in turn, but first
some explanation is offered of how the scheme in Figure 6.13 relates to notation
and transcription and to theoretical and descriptive models, as distinguished in
Chapter 1 Section 1.3.
Abstract taxonomic phonetic space contains the denotata, in the form of
theoretical models comprising bundles of category intersections, for phonetic
symbols. The various phonetic domains contain the phenomena which can be
referred to, or represented by, phonetic symbols, the conjunction of which with
the theoretical models denoted by those symbols creates descriptive models. The
scheme thus allows for domain-specific descriptive models. That is to say, the
conjunction of the theoretical model denoted by [b] with certain phenomena in
the articulatory domain creates a different descriptive model from that which is
created when it conjoins with phenomena from the acoustic domain, or from the
perceptual domain. These relations are diagrammed in Figure 6.14.
The theoretical model [b] can be characterised disjunctively as in (6.1).

(6.1) [b] = { ([b]Ar) ∨ ([b]Ae) ∨ ([b]Ac) ∨ ([b]Au) ∨ ([b]Pe) }

Disjunctivity means that in fact the symbol for the theoretical model [b] is
polysemous. All the [b] symbols in (6.1) are related in meaning because, by

[Figure 6.14 is a diagram: the theoretical model [b] links to relevant articulatory, aerodynamic, acoustic, auditory and perceptual phenomena ([b]Ar, [b]Ae, [b]Ac, [b]Au, [b]Pe), each of which yields a corresponding descriptive articulatory, aerodynamic, acoustic, auditory or perceptual model.]

FIGURE 6.14: Domain-neutral theoretical model and domain-specific descriptive models

definition, they all denote the same theoretical model whilst representing differ-
ent kinds of data. We are therefore dealing here not with homonymy but with
polysemy. The semantic relation of hyponymy is also evident in these relation-
ships. The symbol [b] in its capacity of denoting a theoretical model has a super-
ordinate relation to the various descriptive models symbolised in (6.1), although
indistinguishable in expression from its hyponym symbols. It parallels cases in
lexical-semantic structure such as cat as a superordinate term for all felines, and
cat as a co-hyponym of lion, tiger etc.

6.5.1 Articulatory domain

The physical space in the articulatory domain is the anatomy and physiology
of the vocal tract. It is therefore a three-dimensional Euclidean space of height,
length and breadth. Time is a fourth dimension through which vocal tract
shape changes within physiological constraints. The specific–generic distinc-
tion is applicable. Specific physical articulatory space is assumed to be unique
to each speaker and undergoes changes during the course of life (Mackenzie
Beck 2010). It can be investigated using various instruments such as X-ray,
endoscopy, articulography, ultrasound imaging, magnetic resonance imaging
and palatography (see Stone 2010). Generic physical articulatory space is the
generalised anatomical and physiological description of the vocal tract in terms
of the categories set up by the disciplines of anatomy and physiology, which

provide phonetics with terms and definitions for the articulators. In diagrams
it is usually presented two-dimensionally in the sagittal, coronal or transverse
planes and is assumed to be applicable to all speakers with typical vocal tracts;
see Figure 6.15a. In some clinical contexts, for example cleft lip and palate,
speakers may have atypical vocal tracts, in which case their articulatory behav-
iour cannot be modelled so accurately by reference to generic articulatory
space.
FIGURE 6.15: (a) Midsagittal vocal tract diagram representing generic physical articulatory space with IPA symbol [s] at the relevant place of articulation. Adapted from the IPA Handbook (1999: 7). IPA (1999), Handbook of the International Phonetic Association, Cambridge: Cambridge University Press; (b) region of abstract articulatory space containing [s] (in bold) as the product of category intersection.

Abstract articulatory space is a multidimensional system of categories based
on generic physical articulatory space. The categories are established by how
phonetic theory says that sounds are formed by the vocal tract, and are named
using mostly anatomical terminology. For example, ‘alveolar’ is a category

named from the alveolus, established because theory tells us that it is impor-
tantly involved in the production of a particular class of sounds. The organic
notations which have been proposed and developed over the ages (see Chapter
3 Sections 3.1 and 3.2) would be appropriate as domain-specific notation for
articulatory categories, having been specially designed for that very purpose.
For example, Sweet’s symbol [] for a voiced alveolar plosive could be mapped
onto a domain-neutral IPA [d] to show that there really was an alveolar closure
with vocal fold vibration and without a lowered velum. Phoneticians wishing to
give unambiguously speaker-oriented transcriptions may find it more convenient,
however, to use the IPA symbol and refer to a domain-specific set of articulatory
conventions so that it is interpreted in articulatory terms.

6.5.2 Aerodynamic domain

The specific–generic distinction can also be applied in this domain. Specific
physical space in the aerodynamic domain is constituted by the tube-like ducts
in the vocal tract through which air flows. Patterns and modes of airflow can
be investigated using aerometric instruments, with nasal and oral airflow sepa-
rated for measurement using a Rothenberg mask (Shadle 2010: 66–7). Abstract
aerodynamic space is the set of aerodynamic categories which phonetic theory
tells us are important for classifying and distinguishing speech sounds, for
example turbulent and laminar airflow, relative volume velocities, subglottal
and supraglottal pressures and so on. Some abbreviatory notations from fluid
dynamics exist for these categories and category values – Psg for subglot-
tal pressure, ml/sec for volume velocity in millilitres per second – but other
notational devices could be harnessed for annotational purposes, such as t for
turbulence etc.
The aerodynamic domain tends not to receive as much attention in phonetics
as the articulatory and acoustic domains (Catford 1977: 9; Shadle 2010: 39–40),
although all phoneticians since the time of the ancient Indian grammarians
have stressed the fact that without a movement of air in the vocal tract there
could be no natural speech. One problem which has perhaps prevented atten-
tion to the aerodynamic domain is the difficulty in getting aerodynamic data
over and above the rather crude observations we can make without instruments.
Of all the domains, this is the one in which methods of data gathering have
the largest disruptive effects on the phenomena one wants to observe, some of
them being highly invasive or demanding of the experimental subject (Shadle
2010: 62–8). For these reasons, aerodynamic analyses are less likely to impact
on phonetic analysis than analyses performed in other domains, and less likely
to require some kind of notational representation. Consequently, the concept of
aerodynamic space may be of limited practical value in current phonetics, but it
is interesting to note that Law (1990: 219–20) identifies a focus on airflow as
a distinguishing mark in medieval Middle Eastern phonetics, compared to the
focus on stricture locations in the ancient Indian tradition. It is the latter which
we find in the IPA system, but there are some difficulties in classifying certain
types of sounds such as laterals and nasals in this way, as discussed in Chapter
3 Section 3.4.5.


6.5.3 Acoustic domain

Physical space in the acoustic domain is constituted by the speech-generated
pressure-waves as they propagate through a medium. Because this medium is
almost exclusively the air, Shadle (2010: 39) includes speech acoustics within
speech aerodynamics, but for our purposes it may be advantageous to identify
it as a separate domain. Specific physical acoustic space is therefore unique to
every utterance, with generic space being what is thought common to all similar
utterances. The main instrumental method for investigating acoustic space is
spectrography, and a spectrogram can usefully be seen as analogous to a vocal
tract diagram, with frequency and amplitude as the analogues of articulators: it is
relationships between frequency and amplitude which define spectra.
The acoustic classes identified in Chapter 1 Section 1.2.1 are part of the fabric
of generic acoustic space and are assumed to be common to all similar utterances.
Abstract acoustic phonetic space is composed of categories identified by pho-
netic theory as important for the acoustic analysis of speech and includes the like
of VOT, amplitude rise-time, formant centre-frequency and bandwidth, formant
transition, energy-density maximum and so on. The most common representation
of abstract phonetic space in the acoustic domain is a formant chart with formants
F1 and F2 (or F1 and F2 − F1) as the dimensions which define acoustic models
of vowel qualities. In Figure 6.16, formants are given normalised kHz values in
order to account for inter-speaker vocal tract size variation, using the method in
Watt and Fabricius (2002) in which S is a centre of gravity (centroid) value for a

FIGURE 6.16: Vowel plot as a model of normalised acoustic space showing the grand mean distributions and standard deviations of the English dress, trap and strut vowels for different groups of speakers. Adapted by kind permission from Kamata (2008: 247).

formant. The x-axis is the centroid value for F1 subtracted from the value for F2,
while the y-axis is the F1 centroid value.
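
A simplified sketch of centroid normalisation of this general kind is given below; it divides each formant value by a speaker-specific centroid and is intended only as an illustration, since the published Watt and Fabricius (2002) procedure derives S in a more constrained way. The speaker values in the example are invented.

```python
# A simplified sketch in the spirit of centroid ('S') normalisation; the published
# Watt and Fabricius (2002) procedure derives S from specific point vowels, so this
# generic version is for illustration only.

def s_normalise(vowel_means):
    """vowel_means: dict mapping vowel label -> (F1, F2) in Hz for one speaker.
    Returns vowel label -> (F1/S(F1), F2/S(F2)), i.e. formant values expressed
    relative to the speaker's own centroid."""
    s_f1 = sum(f1 for f1, _ in vowel_means.values()) / len(vowel_means)
    s_f2 = sum(f2 for _, f2 in vowel_means.values()) / len(vowel_means)
    return {v: (f1 / s_f1, f2 / s_f2) for v, (f1, f2) in vowel_means.items()}

# Hypothetical speaker means for three vowels:
speaker = {"dress": (600, 1900), "trap": (800, 1650), "strut": (700, 1350)}
print(s_normalise(speaker))
```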
For consonants with aperiodic acoustic energy – fricatives and stop bursts –
a two-dimensional phonetic space can be set up in which centroid values are
plotted on a frequency × amplitude plane. An example for a token of [s] is given
in Figure 6.17, with the centroid marked with the phonetic symbol.
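
The following sketch shows a generic centre-of-gravity calculation of the kind that yields such a value; it is not necessarily identical to the procedure in Ladefoged (2003) referred to in Figure 6.17.

```python
# A generic spectral centre-of-gravity (centroid) calculation for a fricative token.
import numpy as np

def spectral_centroid_hz(signal, sample_rate):
    """Amplitude-weighted mean frequency (Hz) of the magnitude spectrum."""
    sig = np.asarray(signal, dtype=float)
    spectrum = np.abs(np.fft.rfft(sig * np.hanning(len(sig))))
    freqs = np.fft.rfftfreq(len(sig), 1.0 / sample_rate)
    return float(np.sum(freqs * spectrum) / np.sum(spectrum))
```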
We have seen that acoustic classes can be notated with letter-labels such as
p for ‘periodic’, and this could be extended to a notation for symbolising pho-
netic categories specific to the acoustic domain. Alternatively, domain-neutral
symbols could be used with domain-specific conventions to give the symbols
acoustic interpretations. This would be relevant if vowel symbols were placed
on formant charts as annotations, for example, or if an unambiguously signal-
oriented transcription were called for.

FIGURE 6.17: Centroid for a token of [s], calculated at 6453 Hz following the procedure in Ladefoged (2003: 157)

6.5.4 Auditory domain

Linking the auditory domain more closely to the acoustic domain than to the
perceptual domain may seem odd, particularly as I have made liberal use of the
term ‘auditory-perceptual’ throughout Chapters 5 and 6. As explained above, it
is justified because it is the auditorily transformed acoustic signal which forms
the input to perception, but also the different constitution of perceptual space
makes it hard to discuss it in the same terms as the concept of auditory space
(see Section 6.5.5 below).
Listeners are not consciously aware of the auditory domain as defined here.
It is where the automatic processing of sound takes place to produce the per-
cepts that form in our consciousness; the percepts themselves inhabit perceptual
space. Physical auditory space is an auditory transduction of physical acoustic
space such that the elements of the latter have correlates in the former which
are in principle predictable by psychoacoustic theory. Similarly, abstract audi-
tory space is a transform of abstract acoustic space, with pitch, timbre and
loudness instead of frequency and amplitude. Bark, ERBs, mels and semitones
have been developed as psychoacoustic scales for measuring pitch and timbre
(Moore 1997: 107–9; Hayward 2000: 140–2; Howard and Angus 2001: 76); for

measuring loudness, the sone scale is commonly used for stimuli above 40 dB
SPL (Moore 1997: 58). Progress in auditory processing research should lead to a
better understanding of phonetic space in the auditory domain and to the setting
up of further dimensions to structure it. Formant charts can be adapted to model
auditory space by using psychoacoustic scales on the x- and y-axes instead of
physical frequency scales. When plotting values onto a Bark or ERB chart, the
auditory integration of spectral components as hypothesised by psychoacoustic
theory can be represented and could be incorporated into domain-specific con-
ventions for the auditory interpretation of phonetic symbols, for example on a
Bark chart.
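For illustration, the following sketch converts frequencies in Hz to Bark and ERB-rate values using standard published approximations (attributed to Traunmüller 1990 for Bark, and Glasberg and Moore 1990 for ERB-rate), which is all that is needed to replot formant values on such a chart.

```python
# Standard approximations for converting Hz to two psychoacoustic scales, so that
# formant values can be replotted on a Bark or ERB-rate chart.
import math

def hz_to_bark(f_hz):
    """Bark value, using Traunmüller's (1990) approximation."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

def hz_to_erb_rate(f_hz):
    """ERB-rate (number of ERBs), after Glasberg and Moore (1990)."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)

for f in (500, 1500, 2500):   # hypothetical formant values in Hz
    print(f, round(hz_to_bark(f), 2), round(hz_to_erb_rate(f), 2))
```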
Auditory features such as sibilant, sonorant and grave are discussed by
Ladefoged (1997: 611–16), who includes voice and vowel height as auditory
features, and also brightness, a function of the difference between F1 and F2′ (or
Z1 and Z2 on a Bark scale). Some provisional proposals for how auditory space
could be structured into dimensions are given by Flemming (2002: 18–25), who
develops ordinal scales for a number of auditory correlates of acoustic categories
such as F1, F2, F3, a category of ‘noise frequency’ to rank fricatives by height
of their centre-of-gravity spectral moment, and various categories of ‘loudness’
for overall rankings of sonority; for ranking within classes such as fricatives and
stop bursts, and VOT, categories are given numbers.
Johnson (2007: 35) provides a formula, given here in (6.2), for calculat-
ing auditory distance between a sound entering the auditory system and the
hearer’s stored exemplars by comparing auditory spectrograms, or cochleagrams
(Johnson 2003: 56–7), i.e. spectrograms which have been transformed by a psy-
choacoustic algorithm.

(6.2) Auditory distance = √Σ(xᵢ − xⱼ)²

      x = measurable parameter
      i, j = indices

On the basis of similarity computed as ‘least auditory distance’, Johnson envis-
ages being able to predict the level of activation of an exemplar, thus providing a
promising means of quantifying the signal-complementary processing discussed
in Chapter 5 Sections 5.4 and 5.11.
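
Expressed in code, (6.2) amounts to a Euclidean distance over the cells of two equally sized arrays; the sketch below assumes the incoming sound and the stored exemplars have already been converted to auditory spectrograms of matching dimensions, and the function and variable names are mine rather than Johnson's.

```python
# Euclidean 'auditory distance' as in (6.2), computed between an incoming auditory
# spectrogram and stored exemplars; arrays and labels would be supplied by the analyst.
import numpy as np

def auditory_distance(incoming, exemplar):
    """Square root of the summed squared differences over all cells of two
    equally sized arrays (e.g. auditory spectrograms / cochleagrams)."""
    x_i = np.asarray(incoming, dtype=float)
    x_j = np.asarray(exemplar, dtype=float)
    return float(np.sqrt(np.sum((x_i - x_j) ** 2)))

def least_distant_exemplar(incoming, exemplars):
    """exemplars: dict mapping label -> stored array; returns the label whose
    exemplar lies at the least auditory distance from the incoming sound."""
    return min(exemplars, key=lambda label: auditory_distance(incoming, exemplars[label]))
```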

6.5.5 Perceptual domain

As mentioned before, it makes little sense to think of the perceptual domain
inhabiting a physical space. It is therefore differently constituted from the other
domains and is most usefully conceived of as an abstract space populated by
best exemplars of sound-types forming prototypical phonetic categories (Ashby
1990). These prototype categories are associated in long-term memory with
the taxonomic categories of phonetic theory, so that when a phonetician hears
an instance of the sound-type [b] it can be mapped onto the intersection of the
categories ‘bilabial’, ‘voiced’ and ‘plosive’, although the sound is unlikely to be
perceived in these terms.

The perceptual domain is the domain of perceptual objects and is therefore
necessarily a discrete kind of abstract space, because perceptual objects are by
definition discrete. Auditory-visual fusion as demonstrated in the McGurk effect
means that the location of an object in auditory perceptual space can be partly
determined by visual information, and that the location of an object in visual
perceptual space can be partly determined by auditory information; for example,
looking at a face articulating [ɡ] synchronised with audio [b] results for most
people in a [d]-perception, explained as a fusion of the two cross-modal input
stimuli (McGurk and McDonald 1976).
Perceptual judgements in all modalities, because we are conscious of what we
perceive, cannot but involve interpretation, a process in which we, as it were, tell
ourselves what we are perceiving. The exemplar-based categories available to
us in terms of which we can judge what it is we are perceiving are partly deter-
mined by biologically specified cognitive and perceptual constraints, and partly
by our experiences of perceiving things, for example the sounds of our own
language or other languages. It is biologically specified that humans cannot hear
sounds over about 22 kHz, although dogs and some other animal species can;
and it is the presence of /l/ and /r/ in English which is responsible for English
speakers being able to discriminate their realisations much better than Japanese
speakers who do not have this distinction in their language, although Japanese
infants’ performance is no different from that of American infants (Kuhl et al.
1997). What we can and cannot perceive results from a combination of nature
and nurture.
Phonetic theory has not yet developed a model of the structure of perceptual
phonetic space, nor is there terminology available for consistent descriptions
of phonetic percepts much beyond the terms of phonetic taxonomy. When IPA
symbols are used in impressionistic transcriptions they denote articulatorily
defined models while referring to, and representing, phenomena which the tran-
scriber judges meet the criteria for being mapped onto those models. For example,
the symbol [ʔ] in an impressionistic transcription has to be read as something like
‘a sound that sounded as if it was made with a glottal closure’, even though instru-
mental evidence may prove that the glottis was not in fact fully closed.

6.5.6 Phonetic categories as domain-neutral

Historical association of abstract taxonomic phonetic categories with the articu-
latory domain can profitably be severed, or at least weakened, in general pho-
netic theory now we understand more about the other domains and that each
domain has to be investigated in its own terms: we cannot claim to know fully
what is happening in one domain on the basis of knowledge of another domain.
Taxonomic phonetic space should therefore be conceived of as domain-neutral
such that phonetic categories, despite their domain-specific names, can be
interpreted according to conventions appropriate to whichever domain is being
analysed. The category ‘labial’, for example, means something different in each
domain, and it is the task of phonetic theory to explain how the correlates of
‘labial’ relate to each other across those domains.


6.6 Multi-Tiered and Multilayered Transcriptions


Phonetic transcriptions can be aligned with other kinds of transcriptions to build
up what Müller and Ball (2006) have called ‘multilayered transcriptions’ to try
to capture in transcriptional form a more complete account of an utterance as a
communicative event. There is not scope in this book to go into other kinds of
transcription of a non-phonetic kind, but examples with which phonetic tran-
scriptions can be aligned are transcriptions of gaze and gesture (Damico and
Simmons-Mackie 2002, 2006), transcriptions of discourse features (Müller and
Guendouzi 2006), and proxemics and kinesics (Poyatos 2002: 140–2).
It is useful to distinguish hierarchically between ‘multilayered’ and ‘multi-
tiered’ transcription by defining a ‘layer’ of transcription as being composed
of one or more ‘tiers’. For example, a segmental layer can have an allophonic
tier and a phonemic tier, or tiers separating different classes of sounds as in
Figure 6.2, or tiers orienting to different domains of phonetics as in Figure 6.12.
To a segmental layer can be added a prosodic layer, which can have separate tiers
for phonetic and phonological analysis, or for intonational and rhythmic analysis,
for lexical tones, or for representing pauses and dysfluencies. Added to these can
come a layer of gaze and gesture transcription, and another layer for conversation
analysis transcription (Walker 2013: 471–2; and see Chapter 7 Section 7.6), and
so on. It is also useful to provide, wherever relevant and feasible, an orthographic
transcription at the summit of the hierarchy to relate the other transcriptions to
system sentences, or fragments of system sentences.
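
As a purely illustrative sketch, a hierarchy of this kind can be represented in software as an orthographic form at the summit, a list of layers, and within each layer a set of named tiers; the class names and example content below are invented and do not correspond to any existing tool.

```python
# One possible way of organising a multilayered transcription in software, with each
# layer composed of one or more named tiers; names and example content are invented.
from dataclasses import dataclass, field

@dataclass
class Layer:
    name: str
    tiers: dict = field(default_factory=dict)      # tier name -> transcription string

@dataclass
class MultilayeredTranscription:
    orthographic: str                              # orthographic transcription at the summit
    layers: list = field(default_factory=list)

example = MultilayeredTranscription(
    orthographic="lilt",
    layers=[
        Layer("segmental", {"allophonic": "[lɪɫt]", "phonemic": "/lɪlt/"}),
        Layer("prosodic", {"stress": "ˈlilt"}),
    ],
)
```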

7 Uses of Phonetic Transcription

7.0 Introduction
In this brief survey of some of the main uses of phonetic transcription I will try
to characterise the kinds of transcriptions employed in terms of the typological
distinctions discussed in Chapters 3 and 4, the information they are providing and
to whom, and the functions they are performing. I will start with those uses for
which little or no knowledge of phonetic theory on the part of users is assumed.

7.1 Transcription in Dictionaries


There are many different sorts of dictionaries: monolingual and bilingual diction-
aries, dictionaries of standard usage, dialect and slang dictionaries, pronounc-
ing dictionaries and specialist technical dictionaries. It is in dictionaries that
members of the general public are most likely to encounter phonetic transcrip-
tions of one kind or another.
Transcriptions in dictionaries are generic transcriptions. They are aimed at
users who are assumed to have no specialist knowledge of phonetics and for
whom a transcription therefore does not represent an analysis into the categories
of phonetic theory. However, the notation does embody some kind of analysis if
only into categories of pre-theoretical sound-types. That is to say, users will have
some notion of a difference between consonants and vowels, and different kinds
of consonants and vowels, from their own experience of being literate language
users. It is therefore meaningful for them to see the pronunciation of a word
analysed into discrete symbols; this is analogous to an analysis of the spelling of
words into letters. Keywords will tell them through ostensive definition which
sound-types the symbols represent. For the average user, then, transcriptions in
dictionaries are pseudo-transcriptions, although they will have been made by lin-
guists for whom they are proper transcriptions because for them the notation has
theoretical content. The status of a transcription as pseudo- or proper thus depends
crucially on the phonetic knowledge of the user as well as on the intentions of
the transcriber. Pronunciation dictionaries such as the English Pronouncing
Dictionary (EPD) started by Daniel Jones, now in its eighteenth edition (Roach,

Setter and Esling 2013), and the Longman Pronunciation Dictionary (Wells
2008) use IPA-based phonetic notation to provide broad systematic transcriptions
of all entries, this being their raison d’être. Although pronunciation dictionar-
ies provide keywords, and there are often explanations of important phonetic
and phonological concepts such as assimilation and weak forms, the specialist
phonetic notation is likely to deter users who have no prior phonetic knowledge,
and to attract those who do. The readership of pronunciation dictionaries will
therefore share with the compilers an understanding of phonetic categories such
that the transcriptions represent analyses, not just indications, of how the words
are pronounced.
Appreciation of what a dictionary transcription is saying as a generalised
record of pronunciation is a passive use of the transcription. Usage becomes
active when a transcription functions as a prescriptive model for users to base
their own pronunciation on, and as a performance score from which they can
rehearse their pronunciation. The extent to which pronunciation information is
given over and above that which is contained in the spellings varies from the
marking only of word-accent, through phonetically motivated respellings (see
Congleton 1979: 71–3 for examples of respellings in Samuel Johnson’s 1755
A Dictionary of the English Language and for the claim that Johnson instigated
respelling as a lexicographic method), to broad phonetic transcriptions of all
headwords.1 Pronunciation information of this kind only made its way into dic-
tionaries very gradually. Various schemes for making the spelling of European
vernaculars more phonetic appeared from the sixteenth century onward, but they
remained isolated efforts, despite showing that some of their inventors had a
good grasp of phonetic analysis. They may, however, have influenced some of
the means and methods that seventeenth- and eighteenth-century lexicographers
used for representing pronunciation and which are still found in modern diction-
aries. For example, Hart (1551: 164) used an acute accent placed over a vowel
letter to mark word-accent in English, and we find this device in the first English
dictionaries to indicate aspects of pronunciation (Beal 2008: 150). Although
other means have been used, such as raised periods placed before or after the
vowel letter, the acute accent was adopted in the early IPA charts. Jones replaced
it in the fourth edition of the EPD with the vertical stroke (superior), which
became the standard IPA primary stress mark. Placement of accent marking
has varied, appearing either before or after the vowel letter corresponding to the
nucleus of the accented syllable, and also either before or after the sequence of
letters corresponding to the whole syllable. It is usual now to place it before the
letter or symbol corresponding to the first sound in the accented syllable.
Phonetic transcriptions in dictionaries are typically based on citation form
pronunciations and are of the systematic segmental type, often broad enough
to count as phonemic. Different devices have been used in dictionaries to mark
quantity and quality distinctions, particularly of vowels. These are usually in the
form of diacritics such as macrons, circumflexes, breves and colons, but invented
letters have been used, as have numbers in association with vowel letters, these
devices first appearing in English dictionaries in the latter half of the eighteenth
century (Beal 2008: 161). In fact it was to dictionary-making that most of the
phonetic activities of that time were directed, resulting in experimentation with

respellings and adapted notations such as Thomas Spence’s amalgam letter-
shapes (see Chapter 2 Section 2.3.4, and Chapter 3 Section 3.4.1).


The kinds of prosodic properties that one may find represented in dictionaries
are primary and secondary word-accent, syllabification and, for tone languages,
lexical tone. We have already mentioned various ways of marking accent.
Syllabification has been marked by periods, hyphens and spaces while acute,
grave and other kinds of accent marks have been used for lexical tones, in addi-
tion to numbers.
The notation used for transcriptions in dictionaries is now predominantly
based on the IPA (Esling 2010: 678) but some dictionaries developed their own
notation. This was famously the case with the New English Dictionary (later the
Oxford English Dictionary) for which the editor, James Murray, designed his
own system to try to avoid what he saw as problems with Ellis’s palaeotype and
glossic notations and Sweet’s romic system, but ending up with a notation never
used since (MacMahon 1985: 90–1), and described by Collins and Mees (2008:
180–1) as ‘not merely ill-organised and over-elaborate but often inconsistent’. As
Abercrombie (1977/1991: 88) points out, it uses more than fifty different vowel
symbols.
An interesting and, as far as I know, unique way of ordering transcriptions
in a dictionary is found in the dialect dictionary constructed from the Survey of
English Dialects material. Transcriptions representing alternative pronunciations
for a headword are ordered according to the vowel symbol in the first stressed
syllable, starting with vowels in the ‘high front’ area of the vowel quadrilateral
and continuing round anti-clockwise, with central vowels coming last (Upton,
Parry and Widdowson 1994: 6); in effect, it follows Daniel Jones’s cardinal
vowel numbering but without a primary–secondary distinction.

7.2 Transcription in Foreign Language Learning and Teaching
Concern with phonetic notation has been closely associated with language
teaching since the Renaissance, particularly so in the context of languages
such as English and French in which pronunciation changes have severely
disrupted phonographic letter–sound correspondences in the orthography. This
concern strengthened in the nineteenth century and led to the foundation of the
International Phonetic Association via its predecessor organisations, due to the
efforts mainly of teachers of foreign languages who also had an interest, and
considerable skills, in phonetics.
Currently, the use of phonetic transcription in the teaching of foreign lan-
guages varies from hardly any or none to systematically helping students acquire
pronunciation skills in the target language. This will depend largely on the
teacher’s knowledge of phonetics and confidence in using phonetic transcrip-
tion, but even where teachers have these skills most students are unlikely to have
much if any understanding of phonetic theory. Consequently their knowledge of
what symbols represent will be confined to associating a particular symbol with
a particular sound, that is to say treating it as an imitation label, learning the
association either from hearing the target sound demonstrated or from example

keywords in their own language judged the same as, or similar to, sounds in the
target language. When sounds are similar but not the same, students may fail to
notice the difference. English learners of French, Italian or Spanish, for example,
often fail to notice that voiced plosives are prevoiced in these languages and
produce them as devoiced because they unwittingly think they are the same as in
English. Transcriptional representations can draw attention to these differences.
Martinet (1986) relates how, when he was teaching English to French stu-
dents, he devised his own symbol variants to try to avoid ‘cross-associations’ of
the kind that so worried Sweet. To prevent his students giving a uvular ‘Parisian
r’ value to English ‘r’, Martinet invented a symbol which ‘began like a z (or 7)
and ended like a 6’ (ibid.: 39) and which became known as ‘le zed à ventre’.
From his account, it was a highly successful strategy and managed to dissociate
English /r/ from French /r/ in the minds of the students. Arabic speakers learning
French may find it helpful to associate French <r> with Arabic <‫ >غ‬and exploit
their common correspondence with [ʁ], avoiding association with Arabic <‫>ر‬,
which transliterates into roman as <r> and corresponds variably to [ɾ] and [r].
In addition to phonetic transcription helping students with the pronunciation
of single words, Wells (1996: 239) points out that it can be of great benefit when
it comes to connected speech features which are not usually reflected in spelling;
spelling is context-free at the level of the word whereas speech is highly context-
sensitive. English de-alveolarisation (Cruttenden 2001: 285), for example, is
never shown in spelling across word boundaries, further illustrating that the
primary purpose of spelling is not to indicate pronunciation but to identify
words. Learners, however, do not always appreciate this point and may treat the
spellings as reliable guides to pronunciation, leading to ‘spelling pronunciation’
from which connected speech processes are excluded. Even in well-established
compounds, spelling remains unaffected by assimilation and other connected
speech phenomena such as elision, so that the extremely common pronunciation
of handbag as [hambaɡ] or football as [fʊp˺bɔɫ] comes as a surprise to learners
of English, who go for a [handbaɡ] or [fʊtbɔɫ] reading, treating the spelling as a
pseudo-transcription. It is only in Latinate prefixes that English spelling reflects
historical assimilations, such as imprudent from the Latin derivation imprudens
(in + prudens). Most transcription courses and workbooks aimed at language
learners give plenty of practice in the transcription of connected speech phenom-
ena (for example Lecumberri and Maidment 2000). For some languages with
phonographically oriented writing systems, such as Spanish, Japanese, Turkish
and Arabic, spelling does in fact mostly provide a reliable guide and can be
interpreted as pseudo-transcription, but in others it can be quite misleading, as
is notoriously the case in English and French. Logographically oriented writing
systems such as Chinese encode much less systematic information about pronun-
ciation, so any information has to be given using some system of transcription
such as Pinyin transliterations or IPA.
Proposals have been made that language learners should first encounter broad
phonetic transcriptions instead of orthographic forms in their textbooks, only
learning the spelling of words later. It is claimed this approach improves not only
students’ pronunciation but also their spelling. Advocates of phonic spellings for
teaching literacy to children are of the same persuasion.

Because most users of language learning texts will not have studied phonet-
ics, transcriptions have to be quite broad. Diacritics will tend to confuse and
may even frighten not only students but also some teachers. Broad transcriptions
obviously contain less information than narrower ones and to that extent may
sometimes be less useful, although this will be offset by their being easier to
read. A teacher using transcription in a language teaching class will need to strike
a balance between the level of detail provided in a transcription and students’
ability to interpret the notation.
Transcriptions in the context of language learning and teaching are almost
always going to be generic transcriptions functioning as prescriptive models
for students to base their own pronunciations on, and as performance scores
for them to practise their pronunciations from. If teachers dictate something for
students to transcribe, then strictly speaking the students will be making specific
transcriptions, although what they are interested in is not the idiosyncrasies of the
teacher’s pronunciations but the teacher’s exemplification of ‘correct’ or typical
pronunciations; broad rather than narrow transcriptions will usually therefore be
more appropriate.
There are textbooks specifically for teaching pronunciation of languages from
a theoretical perspective, which often devote the early chapters to general pho-
netics so that students can attain a reasonable understanding of how the vocal
tract works and therefore interpret transcription symbols in a more sophisticated
way. A widely used textbook of this kind for teaching English pronunciation is
Roach (2000). Since Jones (1918) there has been a tradition in theoretical books
on English phonetics of addressing the needs of foreign learners of English,
reflecting the emergence of phonetics as an academic discipline in Europe from
the world of foreign language teaching. The best known of these in current
use is probably Cruttenden (2001), updating Gimson’s An Introduction to the
Pronunciation of English, which first appeared in 1962 and went to four edi-
tions. These texts offer detailed phonetic descriptions of spoken English, based
on the RP variety but with increasing attention being paid to other varieties
(for example Cruttenden 2001: 84–90). For the language learner using these
texts to be able to appreciate the differences between these varieties as they are
expressed in phonetic notation, it is necessary to have a solid understanding of
phonetics.
There are language teaching texts that do not require so much phonetic
knowledge, phonetic notation being restricted to the representation of pho-
nemes. For example, Hewings (2004) presents a list of phoneme symbols for
English using IPA notation and uses them for indicating connected speech
phenomena. The assimilated pronunciation of hot potato is given as ‘ho/p/
potato’ (Hewings 2004: 81), which inserts a phoneme symbol into an otherwise
orthographic form to show the assimilation. Readers are not introduced to pho-
nological theory and no arguments are put forward to justify phonemicising the
assimilation as /p/ (see Chapter 4 Section 4.6). It therefore expresses not really
a phonological analysis but a broad phonetic one. Phoneme symbols are being
used not for their theoretical content but as a convenient practical resource for
respelling.
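
Purely by way of illustration – the rule, the function and the examples below are invented here, not taken from Hewings – the mechanics of this kind of practical respelling can be sketched in a few lines of Python:

def respell_assimilation(phrase):
    """Toy rule: respell word-final <t> as /p/ when the next word begins with <p>,
    as in hot potato -> ho/p/ potato; it operates on ordinary spelling, not on phonemes."""
    words = phrase.split()
    out = []
    for i, word in enumerate(words):
        nxt = words[i + 1] if i + 1 < len(words) else ''
        if word.endswith('t') and nxt.startswith('p'):
            out.append(word[:-1] + '/p/')  # insert the phoneme symbol into the spelling
        else:
            out.append(word)
    return ' '.join(out)

print(respell_assimilation('hot potato'))  # ho/p/ potato

Nothing in such a procedure depends on phonological theory; the phoneme symbol simply serves as a respelling device, which is the point being made above.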

7.3 Transcription in Phonetics Learning and Teaching

It is difficult to conceive of learning and teaching phonetics without using pho-
netic notation and phonetic transcription quite extensively. It is standard for pho-
netics textbooks to include an IPA chart and to use phonetic symbols regularly
throughout the text. Individual symbols appearing in descriptions and explana-
tions of speech production processes are denoting theoretical models.
In the teaching of practical phonetics, symbols and transcriptions are often
used as performance scores in production practice and as specific transcriptions
in ear-training. If a student is shown a symbol such as [ɑ] or a ‘nonsense’ string
of symbols such as [ʁɛʈɯɸp’] and asked to produce it, it can hardly be called
a transcription of the student’s utterance because it predates the utterance (and
the utterance may in fact not match the transcription). It is composed of symbols
that denote theoretical models. The success of the performance is judged by how
well it maps onto the models in the transcription. If, in ear-training,
a phonetics tutor performs the transcription score [ʁɛʈɯɸp’] and asks students
to transcribe the performance, then what they produce is a specific transcription.
The first phonetician to use nonsense strings in practical phonetics seems to have
been Jean Passy (Collins and Mees 1999: 21; and see Chapter 4 Section 4.13.1),
brother of Paul Passy, the leading figure in the founding of the International
Phonetic Association. The practice is now well established as a valuable part of
the syllabus in phonetics teaching.
Transcription is used in the marking and assessment of practical phonetics
when examiners transcribe a student’s productions of consonants and vowels in
order to judge how accurate they are. Clearly these are specific transcriptions and
will typically be quite narrow, because the examiner wishes to capture as much
detail of the production as possible in order to assess it.
In dealing with consonants and vowels, segmental transcription is the most
common kind to be encountered in phonetics learning and teaching, but para-
metric transcription also has a valuable contribution to make (Tench 1978). A
generic parametric transcription is a very useful way of illustrating how phonetic
theory views the temporal relations between the actions of different speech
organs during speech production. In fact it can be regarded as a model of such
relations. For a parametric transcription to be a specific transcription it would
have to be based on information about a specific utterance. The broad–narrow
distinction can be applied to parametric transcriptions (see Chapter 4 Section
4.11) such that those based on detailed measurements of temporal relations
would be classed as narrow while those that give a more abstract general picture
would be classed as broad. Narrow specific parametric transcriptions could only
meaningfully come from instrumental methods of observation and measurement
which can validate them (Howard and Heselwood 2013: 94).

7.4 Transcription in Speech Pathology and Therapy


Speech pathology is one of the contexts in which specific transcriptions are most
commonly made, for both therapeutic intervention and research purposes. Any
transcription that attempts to capture the idiosyncrasies of a particular speaker’s
atypical speech is a specific transcription, and if a written record is to be made of
such idiosyncrasies preliminary to phonological analysis then phonetic transcrip-
tion is a useful and convenient way to do this (Heselwood and Howard 2008:
381). Nevertheless, generic transcriptions do play a role in the speech pathology
literature, as when it might be said, for example, that speakers with cleft palates
typically realise /s/ as [s͋ ]. Here the symbol stands for an indefinitely large class
of past, present and future productions by a particular but indefinite population
of speakers. In this case the population is defined clinically rather than geo-
graphically or socially, but the principle is the same as with the example of [a]
presented as the typical vowel in hat in certain varieties of English. The ExtIPA
set of symbols provides [s͋ ] as the theoretical model in terms of which a specific
or generic use of [s͋ ] in a transcription can be interpreted, i.e. ‘voiceless alveolar
grooved fricative with simultaneous nasal airflow’.
Clinical transcriptions are most commonly in the form of segmental transcrip-
tions with prosodic and voice quality features also represented as appropriate (Ball
et al. 1996: 60). The narrowness or broadness of transcriptions will vary as the
focus of interest shifts to particular parts of an utterance and as the transcriber sees
fit (Grunwell 1987: 35). It is usually more appropriate to make impressionistic
transcriptions rather than systematic ones because it cannot be predicted in advance
of an analysis what implications the speakers’ pathology or immaturity has for the
structure of their phonological system (ibid.: 34; Heselwood and Howard 2008:
383–4). In fact Ball et al. (1996: 60) claim that ‘only a detailed phonetic transcrip-
tion has any validity with disordered speech’; they provide examples to show that
narrow transcription provides more accuracy and insight into atypical speech than
do broad transcriptions (ibid.: 82–7). The authors warn, however, against using
transcriptions to assess the severity of a speaker’s speech difficulties (ibid.: 93–5).
Because pathological speech may contain sounds that are not normally found
in speech, special notations have been devised to cope with them. These have not
always been the same in different countries, particularly in the context of speech
related to cleft palate (Howard 2011: 131). However, the current ExtIPA notation
(see Chapter 3 Section 3.4.6) has been introduced into clinical phonetics training
(Ball 2006: 60–1) and is firmly established as part of the transcriptional resources
for researchers in clinical phonetics. Another specialist clinical notation based on
the PRDS set, which was the forerunner of ExtIPA, is the one devised for use
with the PETAL speech assessment procedures (see Parker 1999). An articula-
tion it recognises which the IPA and ExtIPA do not is ‘bilabiolingual’, denoted
by a bilabial symbol with a strikethrough, [ᵽ], and distinguished from IPA lingu-
olabial, which in PETAL is described as a ‘very advanced indeed’ type of coronal
consonant with ‘tongue tip advanced to upper lip’, symbolised as [t̟̟]. The differ-
ence is that in [ᵽ] the lower lip closes against the tongue whereas in [t̟̟] it does not.

Another innovation is [L→] to represent a unilateral articulation.

7.5 Transcription in Dialectology, Accent Studies and Sociophonetics

Transcriptions made from dialectological, accent studies or sociophonetic field-
work observations and recordings are specific transcriptions, but it is generally
not the speech of the particular recorded individuals which is the focus of inter-
est. Individual speakers are recorded because they are believed to be representa-
tive of a particular geographically or socially defined population of speakers, and
it is the phonetic behaviour of that whole population which is of interest. The
private specific transcriptions of fieldworkers might be presented in publications
as generic transcriptions by induction.
If the phonological system of the language variety being studied is not known
then transcriptions cannot be systematic and will have to be impressionistic. They
can be broad or narrow depending on the needs of the transcriber, but a specific
transcription which is very narrow may be capturing details that are speaker-
specific and therefore not generalisable. Some editing of the specific transcrip-
tions may be necessary in order to filter out idiosyncrasies once they have been
identified as such. Often phonetic transcriptions in the context of dialectology,
accent studies and sociophonetics will be used as a starting point for phonologi-
cal analysis so that a phonological description of the variety, or some aspect of
it, can be made. It is therefore essential that any editing is done carefully to avoid
discarding details that may turn out to be phonologically important.
Dialectological traditions tended to develop their own notation systems, which
became deeply rooted and widely used before IPA symbols and the use of square
and slant brackets became well established. Roman letter-shapes, often with dia-
critics added, were adapted and given in italics to set them off from surrounding
text. Lepsius’s Standard Alphabet was heavily influenced by this practice (see
Chapter 3 Section 3.4.2). In the study of Semitic languages, for example, dental
fricatives were transcribed with underlines, so that ṯ ḏ were employed instead of
IPA [θ ð]. Although in more recent works IPA equivalents are often given (for
example Watson 2012: 10–16), journals and other publications specialising in
Semitic languages regularly still employ their own notation systems. Because
they are richer than the orthography on which they are based, and the diacriti-
cal additions are motivated by the need to represent some aspect of phonetic or
phonological structure, such systems can be regarded as proper phonetic nota-
tion systems in so far as an appreciation of phonetic and/or phonological theory
guides their usage, although transcription is not always clearly separated from
transliteration. Often, no systematic distinctions were made between phonetic
and phonemic representations in traditional dialectology, largely because dialec-
tologists were, and in some cases still are, working independently of develop-
ments in theoretical phonology. Dialectologists, at least in the past, have been
more oriented towards a philological approach to language study than towards
a modern linguistic approach. That is to say, their main aims were to chart the
lexical distribution of sounds through time and across dialects, and to give an
account of dialectal differences in pronunciation in terms of sound correspond-
ences, rather than to work out how sounds functioned in synchronic phonological
systems.
Sometimes two traditions have used the same glyph for different kinds of
sounds. In Indian dialectology an underline was used not for fricatives but as
a ligature for affricate symbols: t͟s d͟z = IPA [ʧ ʤ] (Grierson 1928: 2). Indian
and Semitic dialectology have both used a combining subscript dot, for example
ṣ. In Indian dialectology it was introduced by Franz Bopp to denote retroflex
(or ‘cerebral’ in Indian dialectological terminology) consonants, equivalent to
IPA [ʂ], while in the Semitic tradition it denotes an ‘emphatic’ consonant. This
difference of denotation of the subscript dot led Lepsius (1863: 74) to use an
underline for Arabic emphatics, which may solve one problem but only creates
another. Exactly what the IPA equivalent of s is in Semitic languages depends on,
firstly, which Semitic language is being described, and secondly, one’s view of
the articulatory correlates of emphasis in that language. In Arabic, for example,
there is still debate on whether it is pharyngealisation, velarisation or uvularisa-
tion (Laufer and Baer 1988: 181–5; Heselwood and Hassan 2011: 20), and also
about which consonants actually are emphatic (Heselwood and Al-Tamimi 2011:
123–5). One can see here how questions of phonetic and phonological theory
impact on notational and transcriptional practice in dialectology.
In dialectological studies taking a historical perspective, to incorporate too
much phonetic detail in transcriptions would obscure the kinds of phonetic
and phonological relationships across time and locations which one is trying
to describe. Again taking the example of emphasis in Semitic languages, it is a
common view that the emphatics originated as ejectives in Proto-Semitic (Kogan 2011:
60–1) but have changed in many modern Semitic languages into pulmonic
sounds with some kind of constriction in the pharyngeal region of the vocal tract.
If we wish to express the historical identity of the modern emphatic reflexes with
the Proto-Semitic emphatics, using the subscript dot is a convenient way to do
it. The phonetic differences can be described and represented in IPA symbols if
one wants to show how the emphatics have changed. It is similar in principle to
the phoneme–allophone relation, except that the ‘allophones’, or reflexes, are
distributed over time (and may have changed their phonemic status through splits
or mergers).
A serious problem, however, attends the use of IPA symbols, which are inte-
gral and quite phonetically specific, when we do not know enough about the
details of pronunciation at a particular time or in a particular place. Weninger,
Khan, Streck and Watson (2011: 5) exemplify the difficulty of having a unified
transcription for all Semitic languages when they ask: ‘How should, e.g., Ugaritic
ṣ be transcribed in IPA, when all we know about this phoneme is that it is the
product of the merger of *s̱, *ṭ and *ṣ́?’ (* = reconstructed proto-form). Grierson
(1928: 1) explains that, in comparing linguistic forms across the languages of
India, vowels were not represented in the Linguistic Survey of India by IPA
symbols because, firstly, the precise vowel sounds in the remoter locations were
not always known, and secondly, much of the material was collected by people
‘who were not skilled phoneticians’. Instead, alphabetic vowel letters with
various diacritics were used, illustrated by English keywords. For example, ä is
defined as the vowel in English hat, but of course we need to know how words
like hat were pronounced in 1920s British English, and which accent Grierson
had in mind, before we can interpret his ostensive keyword definitions with
phonetic accuracy.
Collection of material by skilled phoneticians was not a problem in the Survey
of English Dialects (SED) project, overseen by Harold Orton and Eugen Dieth
in the 1950s and based at the University of Leeds (Orton and Dieth 1962; Upton
et al. 1994). Among the fieldworkers was Stanley Ellis, widely respected for his
exceptional ear for phonetic qualities and famous in the forensic field for having
pinpointed the village where the speaker on the hoax Yorkshire Ripper tape was
from (see Ellis’s account in Ellis 1994). Transcriptions in the SED were made
using the symbols and ‘modifiers’ of the 1951 IPA chart (the term ‘diacritics’ was
not used on charts until the 1979 revision). Even among experienced transcrib-
ers, some variation was found in how they interpreted the instruction to make
‘impressionistic’ transcriptions (Viereck 1973: 79). Figure 7.1 shows two pages
of Ellis’s fieldwork notes made in Horton-in-Ribblesdale, North Yorkshire, con-
taining IPA symbols supplemented with phonetic observations and descriptions.

FIGURE 7.1: Pages from Ellis’s SED fieldwork notes with IPA
transcriptions. Yorkshire Response Books 13, dated November 1952.
Reproduced with the permission of the Brotherton Collection, Leeds
University Library

Accent studies overlap considerably with dialectology and with sociophonetics
(Foulkes and Docherty 1999: 4–6). In so far as they are different, they usually
take a synchronic perspective and concentrate on phonetic and phonological
features which distinguish one accent of a language from other accents, without
systematically investigating how this variation maps onto social variables, and
without presenting more than background information on the history and origins
of particular sounds. Phonetic notation and transcription are clearly hugely
important in accent studies for representing those sounds and features which
are considered to characterise a particular accent. The use of IPA symbols in
Wells (1982) and Foulkes and Docherty (1999), for example, is extensive and
shows how indispensable a proper phonetic notation system is for describing and
comparing the different ways in which the same language can be pronounced. In
the ‘standard lexical sets’ in Wells (1982: 127–68) we can see the influence of the
philological concern with lexical distributions of sounds in related language vari-
eties, but in the narrowness of the transcriptions we can see the importance Wells
places on phonetic exactness when we know enough of the phonetic facts. As
in dialectology, transcriptions in accent studies tend to be generalised transcrip-
tions because it is the characterisation of the speech of a relatively large group of
speakers which is the aim, for example all speakers identifiable by their speech
as coming from Liverpool, or from the Scottish Highlands, the southern United
States, Hong Kong etc., notwithstanding variation within those populations.
These also tend to be auditory-perceptual transcriptions, as they are often made
either live ‘in the field’ or from audio recordings made in the field. Although
instrumental analyses have been, and continue to be, carried out in the context of
accent studies (Docherty and Foulkes 1999: 52–4), auditory-perceptual analysis
has the advantage that it will only capture features which are perceivable, and it
is only perceivable features which everyday language users can use to identify
which accent group a particular speaker belongs to. Investigating how listeners
might do this is the concern of perceptual dialectology (Preston 1989). It has
to be remembered, of course, that which features are perceived is not always a
straightforward matter. Listeners’ own speech habits and exposure to other varie-
ties are confounding variables (see Chapter 5 Section 5.11).
‘Sociophonetics’ is a recent term introduced to cover that part of sociolinguis-
tics which is concerned with how variation in the pronunciation of a language
correlates with social variables, that is to say with ‘socially structured variation
in speech’ (Foulkes et al. 2010: 704). It takes primarily a synchronic perspective
in which phonetic detail is of paramount importance, including details which can
only be investigated instrumentally. Sociophonetics, it is probably fair to say,
tries to get to grips with the variation within accent-groups to show how it is non-
random and governed by social factors such as age, gender and socio-economic
class. Transcription is for sociophonetics ‘a tool of the trade’ (Kerswill and
Wright 1990: 256), widely used both for recording auditory analyses of speech
(but see Thomas (2011: 1, 145) for a less positive view), and for annotating
instrumental records. Typically, a sociophonetic variable is identified and repre-
sented in phonetic notation in parenthesis brackets (Chambers and Trudgill 1980:
61); the parentheses mark a sound-type without committing to its phonological
status, and the variable can be defined in relation to a particular phonotactic context. For example,
in a sociophonetic study of intervocalic rhotics in non-rhotic British English, (r)
could include ‘linking’ and ‘intrusive’ R even if the analyst does not regard them
as belonging to the /r/ phoneme, whereas /r/ in slant brackets would exclude them
for those who regard them as the result of an insertion rule (for example, Wells
1982: 222–7; McMahon 2000: 280). The variants of the variable are represented
in narrow phonetic transcription to indicate particular instantiations of it.
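
By way of illustration only – the speaker groups, variant symbols and counts below are invented, and are not drawn from any actual study – the variable–variant relationship can be sketched in Python as a simple tabulation of narrowly transcribed tokens:

from collections import Counter

# Invented variant codings for intervocalic (r) tokens from two hypothetical speaker groups.
tokens = {
    'older':   ['ɹ', 'ɹ', 'ʋ', 'ɾ', 'ɹ'],
    'younger': ['ʋ', 'ʋ', 'ɹ', 'ʋ', 'ɹ'],
}

for group, variants in tokens.items():
    counts = Counter(variants)
    total = sum(counts.values())
    summary = ', '.join(f'[{v}] {n}/{total}' for v, n in counts.most_common())
    print(f"(r) in the '{group}' group: {summary}")

The parenthesised (r) names the abstract variable; the square-bracketed symbols are its narrowly transcribed instantiations.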

7.6 Transcription in Conversation Analysis


As with speech pathology, the particularities of the pronunciation phenomena
that conversation analysts are interested in have prompted the development of a
special notation, including conventions for prosodic features, which has become
standard in this area of research (Walker 2013: 469). Exemplified and discussed
in Jefferson (2004), the notation provides a means of showing phonetic details
that may not be important for lexical-phonological distinctions but which can
have considerable importance in the structure of conversational interaction.
Conversation analysis (CA) transcripts crucially preserve lexical and grammati-
cal identities through the use of normal orthography, but a mixture of respelling
and special notation is also employed where attention is to be drawn to pronun-
ciation details. Often they are all mixed together in the same word. An example
is suppose represented as <s’po̲:ze> in Local and Walker (2012: 257). The under-
line representing ‘some form of stress via pitch and/or amplitude’ (Jefferson
2004: 25) is a special CA transcription convention; the colon for vowel length
is a transliteration of IPA [ː] also used in SAMPA; the apostrophe is a standard
orthographic marking of elision, and the ‘silent’ final <e> firmly belongs to the
orthography of English diphthongs; the <z> and omission of a <p> letter can be
seen as either phonographic respelling or phonetic transcription, but the former
is probably the best way to view it because it is crucial in CA that lexical and
grammatical information is represented, and this is the important function of
spelling, not of phonetic transcription. Whether the use of the special notation
can be regarded as proper phonetic transcription in the way I have defined it in
Chapter 1 Section 1.3 depends on the extent to which the notation is based on
phonetic theory. The conventions in Jefferson (2004) are not presented as con-
sisting of categories defined by a theory, but they do have definitions that provide
for consistency in their use and they do express an analysis of pronunciation
beyond the capabilities of the standard orthography. To take an example, degree
signs enclosing talk indicate that ‘the sounds are softer than the surrounding talk’
(Jefferson 2004: 27). However, there is a level of vagueness in the definition
that precludes ‘soft’ from being an adequate theoretical category compared, for
example, to an IPA category. It is not clear if ‘softer’ necessarily means quieter,
or if it could also mean a ‘soft’ or ‘lax’ or ‘breathy’ voice quality. This vagueness
may not matter from the point of view of CA practitioners, who will judge the
adequacy of the notation by how well it enables them to express the analyses they
want to make. Indeed, Walker (2013: 469) is satisfied that transcriptions using
Jefferson’s system ‘are by and large suited to their purpose’, although he identi-
fies shortcomings in a critical discussion of Jefferson’s notation (ibid.: 469–71;
see also Hepburn and Bolden 2013), identifying ambiguities in the meaning
of capitalisation and inadequacies in dealing with pitch. The relationship of
CA notation to a body of phonetic theory is not as tight as it is where a proper
phonetic notation such as the IPA is concerned. CA transcription therefore lies
somewhere between the pseudo-transcription of respelling and a proper phonetic
transcription explicitly based on theoretically defined phonetic categories. Rather
than being pre-theoretical, its notation provides what we could characterise as
quasi-theoretical models, at least from a phonetic point of view; they may of
course be fully theoretical from a CA point of view.
Transcriptions of conversations will clearly be specific transcriptions. They
record observations of individuals’ productions of speech on particular occa-
sions. From repeated observations of a given kind of phenomenon a conversation

w.facebook.com/groups/QOU
SomoudBarghouthy Uses of Phonetic Transcription 263
analyst may wish to make a generalisation and express it in transcriptional form,
in which case it will be a generic transcription.
Importantly, it is not the phonological structure of lexical items that conver-
sation analysts are analysing, but the subtleties of interactional behaviour as
manifested when people talk to each other. Transcription is therefore impres-
sionistic. Use of instrumental records such as spectrograms is becoming more
common (Walker 2013: 457), offering the opportunity to align transcriptions
with spectrographic information and to index spectrograms, waveforms and
spectra to particular points in a transcription (see Chapter 6 Sections 6.2 and
6.3). Local and Walker (2012), for example, align IPA phonetic transcriptions
to spectrograms and waveforms to show how features such as voicing continu-
ation across word boundaries, anticipatory coarticulation and gestural reduction
signal that the speaker is going to continue talking rather than yield the floor to
another speaker.
IPA notation can be used in CA transcripts, although there is no standard way
of marking it out as IPA symbols. There is some potential for confusion where
the same symbols or diacritics have different interpretations. For example, the
apostrophe indicates ejective (glottalic egressive) production in IPA notation
but elision in CA transcriptions. Square brackets are the normal way to mark
symbols as IPA notation but in CA they demarcate sections of overlapping talk
(Jefferson 2004: 24), so are not readily available to be used unambiguously with
their IPA function. Walker (2013: 471–2) suggests a multilayered approach to
CA transcription in which phonetic transcriptions can be placed on a separate
layer aligned with Jefferson-style transcriptions, the latter showing the sequential
structure of interactive talk, the former its phonetic structure.
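
A minimal sketch of what such a multilayered, time-aligned arrangement could look like is given below in Python; the tier names, labels and timings are invented for illustration, and the structure is deliberately simpler than the TextGrid-style formats often used for this kind of work:

from dataclasses import dataclass

@dataclass
class Interval:
    start: float   # seconds from the beginning of the recording
    end: float
    label: str     # e.g. an IPA string or a Jefferson-style annotation

@dataclass
class Tier:
    name: str
    intervals: list

# Two aligned layers over the same stretch of talk: a Jefferson-style layer and an IPA layer.
jefferson_tier = Tier('Jefferson', [Interval(0.00, 0.45, "s'po:ze")])
ipa_tier = Tier('IPA', [
    Interval(0.00, 0.12, 's'),
    Interval(0.12, 0.38, 'pʰoː'),
    Interval(0.38, 0.45, 'z'),
])

def label_at(tier, t):
    """Return the label of the interval containing time t, or '' if none does."""
    for iv in tier.intervals:
        if iv.start <= t < iv.end:
            return iv.label
    return ''

print(label_at(jefferson_tier, 0.2), '|', label_at(ipa_tier, 0.2))

Indexing both layers to the same time axis is what allows the sequential structure and the phonetic structure of the talk to be read off together, and to be aligned with a waveform or spectrogram if one is available.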

7.7 Transcription in Forensic Phonetics


In both legal and academic spheres of activity, forensic phonetics has a higher
profile now than ever before. Professional practitioners are phonetically trained,
often being, or having been, academic phoneticians, and use proper phonetic
notation and transcription as a tool for making written records of their analyses
of individual criminals’ and suspects’ speech samples. Practice varies somewhat
depending on whether a forensic phonetician works alone or as part of a team, but
an informal survey of experienced members of the profession in the UK reveals
there is a common general approach to using phonetic transcription which has
similarities to its use in clinical work. Forensic phoneticians produce private
transcriptions for their own reference, varying in narrowness according to the
features felt to be important, but they also lay some of these out as public tran-
scriptions if they are to be presented in reports for counsel. This is usually done
in appendices with some explanations for the benefit of phonetically untrained
lawyers, but, because the opposing side in a case has the right to see all evidence,
transcriptions might be shown to another forensic phonetician, in which case it
can be useful to preserve the details in a narrow transcription. Transcriptions will
be predominantly specific because, as with clinicians, it is speech samples from
individuals that are the focus of interest and it is idiosyncratic speech behaviours
that are most useful for speaker identification (Nolan 1997). Because these may
include behaviours resulting from pathologies, there is considerable overlap with
the practice of transcription in clinical contexts.
A further similarity with clinical phonetic transcription is that recordings are
often of poor quality, even more so in forensic contexts when they have been
made surreptitiously with hidden microphones or from intercepted phone calls.
Fraser (2003) gives some advice on how orthographic transcriptions of these
poor-quality recordings can be made more accurate by awareness of the phonet-
ics of speech and of the nature of speech perception.
Transcriptions done for forensic purposes are typically in the form of seg-
mental transcriptions using IPA and ExtIPA notation with prosodic and voice
quality features added as necessary. Speaking rate, articulation rate and fluency
are often important factors, which can be computed from acoustic analysis and
summarised in a transcription. Transcriptions may also be used to annotate spec-
trograms, and it is common practice to construct vowel plots from acoustic analy-
ses, labelled with vowel phoneme symbols or keywords such as Wells’s (1982:
127–68) lexical set names, to map out an individual speaker’s vowel space.
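
A sketch of how such a plot might be generated is given below, using Python and matplotlib; the formant values are invented for illustration and do not represent any measured speaker:

import matplotlib.pyplot as plt

# Hypothetical mean formant values (Hz) labelled with Wells-style lexical set keywords.
vowels = {
    'FLEECE': (300, 2300),
    'TRAP':   (750, 1750),
    'LOT':    (600,  900),
    'GOOSE':  (320,  900),
}

fig, ax = plt.subplots()
for keyword, (f1, f2) in vowels.items():
    ax.scatter(f2, f1)
    ax.annotate(keyword, (f2, f1), textcoords='offset points', xytext=(5, 5))

ax.invert_xaxis()   # high F2 (front vowels) on the left
ax.invert_yaxis()   # low F1 (close vowels) at the top
ax.set_xlabel('F2 (Hz)')
ax.set_ylabel('F1 (Hz)')
ax.set_title('Speaker vowel space (illustrative values only)')
plt.show()

Reversing the axes in this way gives the plot the familiar orientation of the vowel quadrilateral, with close front vowels at the top left.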
Whether transcriptions of forensic data are subjected to any procedures
designed to assess inter-transcriber reliability (see Chapter 5 Section 5.13)
depends on whether the forensic phonetician works alone or in a team. Those
who work alone sometimes do a second transcription at a later date for compari-
son, but short deadlines can make that difficult to achieve as a matter of routine.

Note
1. Johnson was apparently dismissive of the worth of showing pronunciation in a diction-
ary beyond word-accent placement. Boswell reports a conversation in which he men-
tioned to Johnson that Sheridan had a scheme for showing how vowels are pronounced.
Johnson replied that ‘Sheridan’s Dictionary may do very well; but you cannot always
carry it about with you: and, when you want the word, you have not the Dictionary’
(The Life of Samuel Johnson, vol. 2, the year 1772). The same could be said, of course,
about any information in a dictionary.


Glossary

Words in bold appear as entries.

abjad – set of letters for written language which can be put into correspondence
with consonants in spoken language; vowels are either not represented, or repre-
sented by bound diacritics.

abugida – set of letters for written language which can be put into correspond-
ence with consonant-plus-vowel sequences in spoken language. It differs from
a syllabogram in that the base letter corresponds to a consonant, with an added
modification corresponding to a vowel. It can thus be thought of as a vocalically
augmented abjad.

alphabet – set of letters for written language which can be put into correspond-
ence with consonants and vowels in spoken language.

analogical (notation) – a notation system in which denotata are consistently denoted by the same symbol.

analphabetic (notation) – a formulaic notation in which each theoretical category is separately denoted by a symbol.

character – a glyph used in written language such as a letter, syllabogram or logogram.

diacritic – a phonetic symbol which denotes only a single theoretical category, e.g. palatalisation, voicelessness, or aspiration, and modifies a base symbol.

generalised transcription – a transcription representing the typical pronunciation of a speaker-group or accent; it represents an indefinitely large class of past, present and future utterances.

gestural score – a transcription in which articulatory gestures are represented
(see also parametric transcription).

glyph – the graphic form of a character or symbol.

iconic (notation) – a notation system in which the symbols resemble their denotata.

letter – a character belonging to an alphabet or abjad or abugida.

logogram – a character for written language corresponding to a word, not to sounds.

notation (phonetic) – a set of phonetic symbols.

organic (notation) – a notation system in which the denotata are anatomical structures of the vocal tract and their physiological relationships.

orthography – a standard, or codified, set of spellings for a language, e.g. the spellings for English words given in The Oxford English Dictionary.

parametric transcription – a transcription in which phonetic parameters are shown to vary through time during speech.

performance score – a transcription to be read aloud from.

pronunciation – the phonetic realisations of the phonological form/s of a linguistic item such as a word or phrase, e.g. the word table can be pronounced [tʰeɪbəɫ, tˢeɪbəɫ, tʰeɪbɫ, teːbəɫ . . . etc.]; phonological forms are set up to account for the phonetics of pronunciations.

proper symbol (phonetic) – a glyph used in a phonetic notation system to denote a model in phonetic theory.

proto-symbol – a character in an orthography used as a phonetic symbol.

pseudo-symbol – a phonetic symbol which has an ostensive definition instead of denoting a theoretical model.

specific transcription – a transcription representing a phonetic analysis of a single utterance by a particular speaker on a particular occasion.

syllabary – a set of characters for writing language which can be put into cor-
respondence with syllables in spoken language.

syllabogram – a character belonging to a syllabary.

symbol (phonetic) – see proper symbol.

spelling – the use of characters in a writing system to represent linguistic items; e.g. the word cat is spelt using the letters <c>, <a> and <t> from the English writing system.

transcription (phonemic) – the use of symbols to represent phonemes.

transcription (phonetic) – the use of phonetic symbols to represent a phonetic analysis of spoken language.

transliteration – the replacement of the characters of one writing system by the characters of another writing system; in practice, transliteration tends to rely on character–sound correspondences.

writing system – the set of elements used for writing a language, e.g. the writing
system for English comprises the 26 letters of the Roman alphabet plus all the
punctuation marks.


References

Abercrombie, David (1948), ‘Forgotten phoneticians’, Transactions of the Philological Society 47, 1–34. Reprinted with extra footnotes in Abercrombie (1965), pp. 45–75.
Abercrombie, David (1949), ‘What is a “letter”?’, Lingua 2, 54–63. Reprinted
in Abercrombie (1965), pp. 76–85.
Abercrombie, David (1953), ‘fənetik transkripʃənz’, Le Maître phonétique, 32–4.
Abercrombie, David (1954), ‘The recording of dialect material’, Orbis 111,
232–5. Reprinted in Abercrombie (1965), pp. 108–13.
Abercrombie, David (1964a), English Phonetic Texts, London: Faber and Faber.
Abercrombie, David (1964b), ‘Parameters and phonemes’, in The Child Who
Does Not Talk, Clinics in Developmental Medicine 13, London. Reprinted
in Abercrombie (1965), pp. 120–4.
Abercrombie, David (1965), Studies in Phonetics and Linguistics, London:
Oxford University Press.
Abercrombie, David (1967), Elements of General Phonetics, Edinburgh:
Edinburgh University Press.
Abercrombie, David (1977), ‘The indication of pronunciation in reference
books’, paper presented to the Dictionaries Group, European Group of
Educational Publishers, Peebles, June 1977. Reprinted in Abercrombie
(1991), pp. 85–90.
Abercrombie, David (1981), ‘Extending the Roman alphabet: Some ortho-
graphic experiments of the past four centuries’, in R. E. Asher and Eugénie J.
A. Henderson (eds), Towards a History of Phonetics, Edinburgh: Edinburgh
University Press, pp. 206–24.
Abercrombie, David (1986), ‘Hylomorphic taxonomy and William Holder’,
Journal of the International Phonetic Association 16, 4–7. Reprinted in
Abercrombie (1991), pp. 33–6.
Abercrombie, David (1989), ‘Segments’, in Abercrombie (1991), pp. 27–32.
Abercrombie, David (1991), Fifty Years in Phonetics, Edinburgh: Edinburgh
University Press.
Abercrombie, David (1993), ‘William Holder and other 17th-century phoneti-
cians’, Historiographia Linguistica 20, 309–30.

Akamatsu, Tsutomu (1988), The Theory of Neutralization and the Archiphoneme in Functional Phonology, Amsterdam: John Benjamins.
Albright, R. W. (1958), The International Phonetic Alphabet: Its Backgrounds
and Development, Bloomington: Indiana University.
Allen, W. S. (1953), Phonetics in Ancient India, London: Oxford University Press.
Allen, W. S. (1981), ‘The Greek contribution to the history of phonetics’,
in R. E. Asher and Eugénie J. A. Henderson (eds), Towards a History of
Phonetics, Edinburgh: Edinburgh University Press, pp. 115–22.
Al-Nassir, A. A. (1993), Sibawayh the Phonologist, London: Kegan Paul.
Alsalmi, Jehan (in preparation), The Influence of Native Language on Audio-
Visual Integration During Speech Perception, PhD thesis, University of
Leeds.
Amorosa, H., U. von Benda, E. Wagner and A. Keck (1985), ‘Transcribing
detail in the speech of unintelligible children: A comparison of procedures’,
British Journal of Disorders of Communication 20, 281–7.
Anderson, John and Derek Britton (1999), ‘The orthography and phonology of
the Ormulum’, English Language and Linguistics 3, 299–334.
Anderson, John and Colin Ewen (1987), Principles of Dependency Phonology,
Cambridge: Cambridge University Press.
Arbib, Michael A. (2003), ‘The evolving mirror system: A neural basis for
language readiness’, in Morten H. Christiansen and Simon Kirby (eds),
Language Evolution, Oxford: Oxford University Press, pp. 182–200.
Archangeli, Diana (1988), Underspecification in Yawelmani Phonology and
Morphology, New York: Garland Press.
Ashby, Michael (1990), ‘Prototype categories in phonetics’, Speech, Hearing
and Language 4, 21–8.
Ashby, Michael, John Maidment and Evelyn Abberton (1996), ‘Analytic lis-
tening: A new approach to ear-training’, Speech, Hearing and Language 9,
1–10.
Baddeley, Alan D. (2004), ‘The psychology of memory’, in A. D. Baddeley,
M. D. Kopelman and B. A. Wilson (eds), The Essential Handbook of Memory
Disorders for Clinicians, Oxford: John Wiley and Sons, pp. 1–13.
Badecker, William (2005), ‘Speech perception following focal brain injury’,
in David B. Pisoni and Robert E. Remez (eds), The Handbook of Speech
Perception, Oxford: Blackwell, pp. 524–45.
Baines, John (2004), ‘The earliest Egyptian writing: Development, context,
purpose’, in Stephen D. Houston (ed.), The First Writing, Cambridge:
Cambridge University Press, pp. 150–89.
Bakalla, Muhammad H. (1983), ‘The treatment of nasal elements by early Arab
and Muslim phoneticians’, in Cornelis H. M. Versteegh, Konrad Koerner
and Hans-J. Niederehe (eds), The History of Linguistics in the Near East,
Amsterdam: John Benjamins, pp. 49–69.
Baker, A. (1919), The Life of Sir Isaac Pitman, New York: Pitman and Sons.
Ball, Martin J. (1991), ‘Recent developments in the transcription of non-normal
speech’, Journal of Communication Disorders 25, 59–78.
Ball, Martin J. (2006), ‘Transcribing at the segmental level’, in Nicole Müller
(ed.) Multilayered Transcription, San Diego: Plural, pp. 41–67.

Ball, Martin J. and John Local (1996), ‘Current developments in transcrip-
tion’, in Martin J. Ball and Martin Duckworth (eds), Advances in Clinical
Phonetics, Amsterdam: John Benjamins, pp. 51–89.
Ball, Martin J. and Joan Rahilly (1999), Phonetics: The Science of Speech,
London: Edward Arnold.
Ball, Martin J., Chris Code, Joan Rahilly and Diane Hazlett (1994),
‘Non-segmental aspects of disordered speech: Developments in transcrip-
tion’, Clinical Linguistics and Phonetics 8, 67–83.
Ball, Martin J., John H. Esling and Craig Dickson (1995), ‘The VoQS system
for the transcription of voice quality’, Journal of the International Phonetic
Association 25, 71–80.
Ball, Martin J., Rachel Manuel and Nicole Müller (2004), ‘An atypical articu-
latory setting as learned behaviour: A videofluorographic study’, Child
Language Teaching and Therapy 20, 153–62.
Ball, Martin J., Joan Rahilly and Paul Tench (1996), The Phonetic Transcription
of Disordered Speech, San Diego: Singular.
Barry, William J. and Adrian J. Fourcin (1990), ‘Levels of labelling’, Speech,
Hearing and Language 4, 31–43.
Barthel, Helen (2013), Phonetics in the Media: English and German Radio
Newscasts and Change of Speech Rate while Reading, MA research disserta-
tion, University of Leeds.
Bates, Sally (1995), Towards a Definition of Schwa: An Acoustic Investigation of
Vowel Reduction in English, PhD thesis, University of Edinburgh.
Beal, Joan C. (1999), English Pronunciation in the Eighteenth Century, Oxford:
Clarendon Press.
Beal, Joan C. (2008), ‘Pronouncing dictionaries I: Eighteenth and early
nineteenth centuries’, in A. P. Cowie (ed.), The Oxford History of English
Lexicography. Vol. 2: Specialised Dictionaries, Oxford: Oxford University
Press, pp. 149–75.
Beckman, Mary E. and G. M. Ayers (1994), Guidelines for ToBI Labelling,
Version 2.0, Columbus: Ohio State University, Linguistics Department.
Beckman, Mary E. and Jennifer J. Venditti (2010), ‘Tone and intonation’, in
William J. Hardcastle, John Laver and Fiona E. Gibbon (eds), The Handbook
of Phonetic Sciences, Oxford: Wiley-Blackwell, second edition, pp. 603–52.
Bell, Alexander Melville (1867), Visible Speech: The Science of Universal
Alphabetics, London: Simpkin, Marshall.
Benvenuto, Bice and Roger Kennedy (1986), The Works of Jacques Lacan: An
Introduction, London: Free Association Books.
Bernal, Martin (1987a), ‘On the transmission of the alphabet to the Aegean
before 1400 BC’, Bulletin of the American Schools of Oriental Research 267,
1–19.
Bernal, Martin (1987b), Black Athena: The AfroAsiatic Roots of Classical
Civilisation. Vol. 1: The Fabrication of Ancient Greece 1785–1985, London:
Free Association Books.
Bernhardt, Barbara and Martin J. Ball (1993), ‘Characteristics of atypical
speech currently not included in the extensions to the IPA’, Journal of the
International Phonetic Association 23, 35–8.

Bernstein, Lynne E. (2005), ‘Phonetic processing by the speech perceiving brain’, in David B. Pisoni and Robert E. Remez (eds), The Handbook of Speech Perception, Oxford: Blackwell, pp. 79–98.
Bernstein, Lynne E., Edward T. Auer and Jean K. Moore (2004), ‘Audiovisual
speech binding: Convergence or association?’ in Gemma Calvert, Charles
Spence and Barry E. Stein (eds), The Handbook of Multisensory Processes,
Cambridge, MA: MIT Press, pp. 203–23.
Bladon, R. Anthony (1983), ‘Two-formant models of vowel perception:
Shortcomings and enhancements’, Speech Communication 2, 305–13.
Bloch, Bernard and George L. Trager (1942), Outline of Linguistic Analysis,
Baltimore: Linguistic Society of America.
Boas, Franz, P. E. Goddard, Edward Sapir and A. L. Kroeber (1916), Phonetic
Transcription of American Indian Languages, Publication 2415, Washington,
DC: Smithsonian Institute.
Bochner, Joseph H., Karen B. Snell and Douglas J. MacKenzie (1988),
‘Duration discrimination of speech and complex stimuli by normally hearing
and hearing-impaired listeners’, Journal of the Acoustical Society of America
84, 493–500.
Boone, Elizabeth Hill (2004), ‘Beyond writing’, in Stephen D. Houston (ed.),
The First Writing, Cambridge: Cambridge University Press, pp. 313–48.
Botma, Bert (2011), ‘Sonorants’, in Marc van Oostendorp, Colin J. Ewen,
Elizabeth Hume and Keren Rice (eds), The Blackwell Companion to
Phonology. Vol. 1: General Issues and Segmental Phonology, Malden, MA:
Wiley-Blackwell, pp. 171–94.
Bowey, J. A. and Francis, J. (1991), ‘Phonological analysis as a function of age
and exposure to reading instruction’, Applied Psycholinguistics 12, 91–121.
Braille Authority of North America (1997), Braille Formats: Principles of Print
to Braille Transcription, Louisville, KY: American Printing House for the
Blind.
Braille Authority of the United Kingdom (1990), The International Phonetic
Alphabet (Revised to 1979), London: Royal National Institute of Blind People.
Breckwoldt, G. H. (1979), ‘African click sounds, early descriptions and sym-
bols’, in Harry Hollien and Patricia Hollien (eds), Current Issues in the
Phonetic Sciences, Amsterdam: John Benjamins, pp. 509–20.
Bregman, Albert S. (1990), Auditory Scene Analysis, Cambridge, MA: MIT
Press.
Bright, William (1996), ‘The Devanagari script’, in Peter T. Daniels and
William J. Bright (eds), The World’s Writing Systems, Oxford: Oxford
University Press, pp. 384–90.
Bromberger, Sylvain and Morris Halle (2000), ‘The ontology of phonology
(revised)’, in Noel Burton-Roberts, Philip Carr and Gerard Docherty (eds),
Phonological Knowledge, Oxford: Oxford University Press, pp. 19–37.
Bronkhorst, Johannes (2002), ‘Literacy and rationality in Ancient India’,
Asiatische Studien/Études Asiatiques 56, 797–831.
Browman, Catherine P. and Louis Goldstein (1989), ‘Articulatory gestures as
phonological units’, Haskins Laboratories Status Report on Speech Research
SR 99/100, 69–101.

Browman, Catherine P. and Louis Goldstein (1990), ‘Tiers in articulatory
phonology, with some implications for casual speech’, in John Kingston
and Mary Beckman (eds), Papers in Laboratory Phonology I: Between the
Grammar and the Physics of Speech, Cambridge: Cambridge University
Press, pp. 341–76.
Browman, Catherine P. and Louis Goldstein (1992), ‘Articulatory phonology:
An overview’, Phonetica 49, 155–80.
Bryden, M. (1988), ‘An overview of the dichotic listening procedure and its
relation to cerebral organisation’, in K. Hugdahl (ed.), Handbook of Dichotic
Listening: Theory, Methods and Research, Chichester: John Wiley and Sons,
pp. 1–43.
Bucholtz, Mary (2000), ‘The politics of transcription’, Journal of Pragmatics
32, 1439–65.
Buizza, Emanuela (2010), Plosive Lenition: Frication and Affrication of /t/ in
RP English Spontaneous Speech, MA research dissertation, University of
Leeds.
Calvert, G. A. and T. Thesen (2004), ‘Multisensory integration: Methodological
approaches and emerging principles in the human brain’, Journal of
Physiology, Paris 98, 191–205.
Canepari, Luciano (2005), A Handbook of Phonetics, Munich: Lincom Europa.
Carlin, Laurence (2009), The Empiricists, London: Continuum Books.
Carlson, Rolf and Björn Granström (2010), ‘Speech synthesis’, in William
J. Hardcastle, John Laver and Fiona E. Gibbon (eds), The Handbook of
Phonetic Sciences, Oxford: Wiley-Blackwell, second edition, pp. 781–803.
Carney, Edward (1979), ‘Inappropriate abstraction in speech-assessment proce-
dures’, British Journal of Disorders of Communication 14, 123–35.
Carney, Edward (1994), A Survey of English Spelling, London: Routledge.
Carpenter, Rhys (1933), ‘The antiquity of the Greek alphabet’, American
Journal of Archaeology 37, 8–29.
Carrell, T., L. Smith and D. Pisoni (1981), ‘Some perceptual dependen-
cies in speeded classification of vowel color and pitch’, Perception and
Psychophysics 29, 1–10.
Carter, Michael G. (2007), ‘The origins of Arabic grammar’, in Ramzi
Baalbaki (ed.), The Early Islamic Grammatical Tradition, Aldershot:
Ashgate, pp. 1–26. Originally published 1972 as ‘Les origines de la gram-
maire arabe’, Revue des Études Islamiques 40, 69–97.
Catford, J. C. (1977), Fundamental Problems in Phonetics, Edinburgh:
Edinburgh University Press.
Chambers, Jack and Peter Trudgill (1980), Dialectology, Cambridge: Cambridge
University Press.
Chao, Yuan-Ren (1930), ‘A system of tone-letters’, Le Maître phonétique 45,
24–7.
Cho, Taehong and Peter Ladefoged (1999), ‘Variation and universals in VOT:
Evidence from 18 languages’, Journal of Phonetics 27, 207–29.
Chomsky, Noam (1964), ‘Current issues in linguistics’, in Jerry A. Fodor and
Jerrold J. Katz (eds), The Structure of Language: Readings in the Philosophy
of Language, Englewood Cliffs, NJ: Prentice Hall, pp. 50–118.

Chomsky, Noam (1966), Cartesian Linguistics, New York: Harper and Row.
Chomsky, Noam and Morris Halle (1968), The Sound Pattern of English, New
York: Harper and Row.
Clauss, Sidonie (1982), ‘John Wilkins’ Essay Toward a Real Character: Its
place in the seventeenth-century episteme’, Journal of the History of Ideas
43, 531–53.
Coleman, John (1994), ‘Polysyllabic words in the YorkTalk synthesis system’,
in Patricia A. Keating (ed.), Phonological Structure and Phonetic Form:
Papers in Laboratory Phonology III, Cambridge: Cambridge University
Press, pp. 293–324.
Collins, Beverley and Inger Mees (1999), The Real Professor Higgins: The Life
and Career of Daniel Jones, Berlin: Mouton de Gruyter.
Collins, Beverley and Inger Mees (2008), ‘Pronouncing dictionaries II: Mid-
nineteenth century to the present day’, in A. P. Cowie (ed.), The Oxford
History of English Lexicography. Vol. 2: Specialised Dictionaries, Oxford:
Oxford University Press, pp. 176–218.
Congleton, J. E. (1979), ‘Pronunciation in Johnson’s Dictionary’, in
J. E. Congleton, J. Edward Gates and Donald Hobar (eds), Papers on
Lexicography in Honor of Warren N. Cordell, Terre Haute: Dictionary
Society of America, Indiana State University, pp. 59–81.
Cooper, Jerrold S. (2004), ‘Babylonian beginnings: The origin of the cuneiform
writing system in comparative perspective’, in Stephen D. Houston (ed.),
The First Writing, Cambridge: Cambridge University Press, pp. 71–99.
Coulmas, Florian (1996), The Blackwell Encyclopedia of Writing Systems,
Oxford: Blackwell.
Coulmas, Florian (2003), Writing Systems, Cambridge: Cambridge University
Press.
Cruttenden, Alan (1997), Intonation, Cambridge: Cambridge University Press,
second edition.
Cruttenden, Alan (2001), Gimson’s Pronunciation of English, London: Edward
Arnold, sixth edition.
Crystal, David (1969), Prosodic Systems and Intonation in English, Cambridge:
Cambridge University Press.
Crystal, David (1982), ‘Terms, time and teeth’, British Journal of Disorders of
Communication 17, 3–19.
Cucchiarini, Catia (1996), ‘Assessing transcription agreement: Methodological
aspects’, Clinical Linguistics and Phonetics 10, 131–56.
Dagenais, P. A., L. C. Lorendo and M. J. McCutcheon (1994), ‘A study of voic-
ing context effects upon consonant linguapalatal contact patterns’, Journal
of Phonetics 22, 225–38.
Damico, Jack S. and Nina Simmons-Mackie (2002), ‘The base layer and the gaze/
gesture layer of transcription’, Clinical Linguistics and Phonetics 16, 317–27.
Damico, Jack S. and Nina Simmons-Mackie (2006), ‘Transcribing gaze and
gesture’, in Nicole Müller (ed.), Multilayered Transcription, San Diego:
Plural, pp. 93–111.
Danecki, Janusz (1985), ‘Indian phonetical theory and the Arab grammarians’,
Rocznik Orientalistyczny 44, 127–34.

Daniels, Peter T. (1996), ‘The study of writing systems’, in Peter T. Daniels
and William J. Bright (eds), The World’s Writing Systems, Oxford: Oxford
University Press, pp. 3–17.
Daniels, Peter T. (2001), ‘Writing systems’, in Mark Aronoff and Janie Rees-
Miller (eds), The Handbook of Linguistics, Oxford: Blackwell, pp. 43–80.
Danielsson, Bror (1955), John Hart’s Works on English Orthography and
Pronunciation, Stockholm: Almqvist and Wiksell.
Davis, Barbara L. and Peter F. MacNeilage (2005), ‘The frame/content
theory of speech evolution: From lip smacks to syllables’, Primatologie 6,
305–28.
DeFrancis, John (1989), Visible Speech: The Diverse Oneness of Writing
Systems, Honolulu: University of Hawai’i Press.
Delgutte, Bertrand (1997), ‘Auditory neural processing of speech’, in William
J. Hardcastle and John Laver (eds), The Handbook of Phonetic Sciences,
Oxford: Blackwell, pp. 507–38.
Denes, P. B. and E. N. Pinson (1963), The Speech Chain, New York: Bell
Telephone Laboratories.
Dennett, Daniel (1991), Consciousness Explained, Boston: Little, Brown.
Dickins, James (1998), Extended Axiomatic Linguistics, Berlin: Mouton de
Gruyter.
Diller, Anthony (1996), ‘Thai and Lao writing’, in Peter T. Daniels and William
J. Bright (eds), The World’s Writing Systems, Oxford: Oxford University
Press, pp. 457–66.
Dobson, E. J. (1957), The Phonetic Writings of Robert Robinson, London:
Oxford University Press.
Docherty, Gerard and Paul Foulkes (1999), ‘Derby and Newcastle: Instrumental
phonetics and variationist studies’, in Paul Foulkes and Gerard Docherty
(eds), Urban Voices, London: Edward Arnold, pp. 47–71.
Docherty, Gerard and Paul Foulkes (2000), ‘Speaker, speech, and knowledge
of sounds’, in Noel Burton-Roberts, Philip Carr and Gerard Docherty (eds),
Phonological Knowledge, Oxford: Oxford University Press, pp. 105–29.
Docherty, Gerard and Ghada Khattab (2008), ‘Sociophonetics and clinical lin-
guistics’, in Martin J. Ball, Mick Perkins, Nicole Müller and Sara Howard
(eds), The Handbook of Clinical Linguistics, Oxford: Wiley-Blackwell, pp.
603–25.
Dresher, B. Elan (2011), ‘The phoneme’, in Marc van Oostendorp, Colin J.
Ewen, Elizabeth Hume and Keren Rice (eds), The Blackwell Companion to
Phonology. Vol. 1: General Issues and Segmental Phonology, Malden, MA:
Wiley-Blackwell, pp. 241–66.
Duckworth, Martin, George Allen, William Hardcastle and Martin Ball (1990),
‘Extensions to the International Phonetic Alphabet for the transcription of
atypical speech’, Clinical Linguistics and Phonetics 4, 273–80.
Dudley, Homer and T. H. Tarnoczy (1950), ‘The speaking machine of Wolfgang
von Kempelen’, Journal of the Acoustical Society of America 22, 151–66.
Eisen, B., H. G. Tillman and C. Draxler (1992), ‘Consistency of judgements
in manual labelling of phonetic segments’, Proceedings of the International
Conference on Language Processing ’92, Banff, Canada, pp. 871–4.

Elert, C-C. (1964), Phonologic Studies of Quantity in Swedish, Uppsala: Almqvist and Wiksell.
Elliott, Ralph W. V. (1954), ‘Isaac Newton as phonetician’, Modern Language
Review 49, 5–12.
Ellis, Alexander J. (1867), ‘On palaeotype: Or, the representation of spo-
ken sounds, for philological purposes, by means of the ancient types’,
Transactions of the Philological Society 12, October Supplement, 1–52.
Ellis, Alexander J. (1869), On Early English Pronunciation with Especial
Reference to Shakspere and Chaucer, London: Asher and Co.
Ellis, Alexander J. (1889), On Early English Pronunciation with Especial
Reference to Shakspere and Chaucer, Part 5, London: Asher and Co.
Ellis, Stanley (1994), ‘The Yorkshire Ripper enquiry: Part 1’, Forensic
Linguistics: The International Journal of Speech, Language and the Law 1,
197–206.
El-Saaran, M. H. A. (1951), A Critical Survey of the Phonetic Observations of
the Arab Grammarians, PhD thesis, University of London.
Englebretson, Robert (2009), ‘An overview of IPA Braille: An updated tac-
tile representation of the International Phonetic Alphabet’, Journal of the
International Phonetic Association 39, 67–86.
Esling, John H. (2005), ‘There are no back vowels: The laryngeal articulator
model’, Canadian Journal of Linguistics 50, 13–44.
Esling, John H. (2010), ‘Phonetic notation’, in William J. Hardcastle, John
Laver and Fiona E. Gibbon (eds), The Handbook of Phonetic Sciences,
Oxford: Wiley-Blackwell, second edition, pp. 678–702.
Esling, John H. (forthcoming), ‘The articulatory function of the larynx and
the origin of speech’, plenary paper presented at the 38th Meeting of the
Berkeley Linguistics Society, February 2012.
Esling, John H. and Jimmy G. Harris (2005), ‘States of the glottis: An
articulatory phonetic model based on laryngoscopic observations’, in
William J. Hardcastle and Janet Mackenzie Beck (eds), A Figure of Speech:
A Festschrift for John Laver, Mahwah, NJ: Lawrence Erlbaum Associates,
pp. 347–83.
Eustace, S. S. (1969), ‘The meaning of palaeotype in A. J. Ellis’s On Early
English Pronunciation, 1869–89’, Transactions of the Philological Society
68, 31–79.
Faber, Alice (1992), ‘Phonemic segmentation as epiphenomenon: Evidence
from the history of alphabetic writing’, in Pamela Downing, Susan D. Lima
and Michael Noonan (eds), The Linguistics of Literacy, Amsterdam: John
Benjamins, pp. 111–34.
Fant, Gunnar M. (1962), ‘Descriptive analysis of the acoustic aspects of
speech’, Logos 5, 3–17. Reprinted in Ilse Lehiste (ed.) (1967), Readings in
Acoustic Phonetics, Cambridge, MA: MIT Press, pp. 93–107.
Farnetani, Edda and Daniel Recasens (2010), ‘Coarticulation and connected
speech processes’, in William J. Hardcastle, John Laver and Fiona E. Gibbon
(eds), The Handbook of Phonetic Sciences, Oxford: Wiley-Blackwell, second
edition, pp. 316–52.
Finnegan, Ruth (1977), Oral Poetry, Cambridge: Cambridge University Press.

Firth, J. R. (1946), ‘The English School of phonetics’, Transactions of the
Philological Society 45, 92–132.
Flemming, Edward S. (2002), Auditory Representations in Phonology, New
York: Routledge.
Fodor, Jerry (1984), The Modularity of Mind, Cambridge, MA: MIT Press.
Foulkes, Paul and Gerard Docherty (1999), ‘Urban voices: Overview’, in Paul
Foulkes and Gerard Docherty (eds), Urban Voices, London: Edward Arnold,
pp. 1–24.
Foulkes, Paul, James M. Scobbie and Dominic Watt (2010), ‘Sociophonetics’,
in William J. Hardcastle, John Laver and Fiona E. Gibbon (eds), The
Handbook of Phonetic Sciences, Oxford: Wiley-Blackwell, second edition,
pp. 703–54.
Fowler, Carol A. (1986), ‘An event approach to the study of speech perception
from a direct realist perspective’, Journal of Phonetics 14, 3–28.
Fowler, Carol A. (1990), ‘Calling a mirage a mirage: Direct perception of
speech produced without a tongue’, Journal of Phonetics 18, 529–41.
Fowler, Carol A. and Bruno Galantucci (2005), ‘The relation of speech percep-
tion and speech production’, in David B. Pisoni and Robert E. Remez (eds),
The Handbook of Speech Perception, Malden, MA: Blackwell, pp. 633–52.
Fowler, Carol A. and L. D. Rosenblum (1990), ‘Duplex perception: A com-
parison of monosyllables and slamming doors’, Journal of Experimental
Psychology: Human Perception and Performance 16, 742–54.
Fox, Anthony (2000), Prosodic Features and Prosodic Structures, Oxford:
Oxford University Press.
Franc, Boris and Draga Styne (1991), ‘Electric ear circuits’, Vox Machinalia
1, 1–11.
Fraser, Helen (2003), ‘Issues in transcription: Factors affecting the reliability
of transcripts as evidence in legal cases’, International Journal of Speech,
Language and the Law 10, 203–26.
Fraser, Helen (2005), ‘Representing speech in practice and theory’, in William
J. Hardcastle and Janet Mackenzie Beck (eds), A Figure of Speech: A
Festschrift for John Laver, Mahwah, NJ: Lawrence Erlbaum Associates, pp.
93–128.
Fromkin, Victoria A. and Peter Ladefoged (1981), ‘Early views of distinctive
features’, in R. E. Asher and Eugénie J. A. Henderson (eds), Towards a
History of Phonetics, Edinburgh: Edinburgh University Press, pp. 3–8.
Fry, Denis B., Arthur S. Abramson, P. D. Eimas and Alvin M. Liberman (1962),
‘The identification and discrimination of synthetic vowels’, Language and
Speech 5, 171–89.
Gandour, J. T. (1979), ‘Tonal rules for English loanwords in Thai’, in
T. L. Thongkum, V. Panupong, P. Kullavanijaya and M. R. K. Tingsabadh
(eds), Studies in Thai and Mon-Khmer Phonetics and Phonology: In Honor of
Eugénie Henderson, Bangkok: Chulalongkorn University Press, pp. 94–105.
Gardiner, Alan H. (1916), ‘The Egyptian origin of the Semitic alphabet’,
Journal of Egyptian Archaeology 3, 1–16.
Gelb, Ignace Jay (1969), A Study of Writing, Chicago: University of Chicago
Press, revised edition.

Gibbon, Fiona E. (1990), ‘Lingual activity in two speech disordered children’s
attempts to produce velar and alveolar stop consonants: Evidence from elec-
tropalatographic (EPG) data’, British Journal of Disorders of Communication
25, 329–40.
Giegerich, Heinz J. (1999), Lexical Strata in English, Cambridge: Cambridge
University Press.
Gimson, A. C. (1980), An Introduction to the Pronunciation of English, London:
Edward Arnold, third edition.
Goldinger, Stephen D. (1998), ‘Echoes of echoes? An episodic theory of lexical
access’, Psychological Review 105, 251–79.
Golestani, Narly, Cathy J. Price and Sophie K. Scott (2011), ‘Born with an ear
for dialects? Structural plasticity in the expert phonetician brain’, Journal of
Neuroscience 31, 4213–20.
Gombrich, E. H. (1972), Art and Illusion, London: Phaidon Press, fourth edi-
tion.
Grabe, Esther and E. L. Low (2002), ‘Durational variability in speech and the
rhythm class hypothesis’, in Carlos Gussenhoven and Natasha Warner (eds),
Papers in Laboratory Phonology VII, Berlin: Mouton, pp. 515–46.
Grierson, G. A. (1928), Linguistic Survey of India. Vol. I, Part II: Comparative
Vocabulary, Calcutta: Government of India Central Publications Branch.
Grossberg, Stephen (2003), ‘Resonant neural dynamics of speech perception’,
Journal of Phonetics 31, 423–45.
Grunwell, Pamela (1987), Clinical Phonology, London: Croom Helm, second
edition.
Grunwell, Pamela and Anne Harding (1996), ‘A note on describing types of
nasality’, Clinical Linguistics and Phonetics 10, 157–61.
Guendouzi, Jacqueline A. and Nicole Müller (2006), ‘Orthographic transcrip-
tion’, in Nicole Müller (ed.), Multilayered Transcription, San Diego: Plural,
pp. 19–39.
Gussenhoven, Carlos and Haike Jacobs (1998), Understanding Phonology,
London: Edward Arnold.
Hale, Mark and Charles Reiss (2000), ‘Phonology as cognition’, in Noel
Burton-Roberts, Philip Carr and Gerard Docherty (eds), Phonological
Knowledge, Oxford: Oxford University Press, pp. 161–84.
Halliday, Michael A. K. (1967), Intonation and Grammar in British English,
The Hague: Mouton.
Halliday, Michael A. K. (1970), A Course in Spoken English: Intonation,
Oxford: Oxford University Press.
Halliday, Michael A. K. (1981), ‘The origin and early development of Chinese
phonological theory’, in R. E. Asher and Eugénie J. A. Henderson (eds),
Towards a History of Phonetics, Edinburgh: Edinburgh University Press, pp.
123–40.
Halliday, Michael A. K. (1985), Spoken and Written Language, Oxford: Oxford
University Press, second edition.
Hamlet, S. and M. Stone (1976), ‘Compensatory vowel characteristics resulting
from the presence of experimental dental prostheses’, Journal of Phonetics
4, 199–218.

Hammerström, Göran (1958), ‘Representation of spoken language by written
symbols’, Miscellanea Phonetica III, 31–9.
Hammond, Michael (1999), The Phonology of English, Oxford: Oxford
University Press.
Harris, Roy (1986), The Origin of Writing, London: Duckworth.
Hart, John (1551), The Opening of the Unreasonable Writing of our Inglish
Toung. Reprinted in Danielsson (1955), pp. 109–64.
Hart, John (1569), An Orthographie. Reprinted in Danielsson (1955),
pp. 165–228.
Hart, John (1570), A Methode. Reprinted in Danielsson (1955), pp. 229–50.
Haugen, Einar (1972), First Grammatical Treatise: An Edition, Translation and
Commentary, London: Longman, second edition.
Hauser, Marc D. and W. Tecumseh Fitch (2003), ‘What are the uniquely
human components of the language faculty?’, in Morten H. Christiansen and
Simon Kirby (eds), Language Evolution, Oxford: Oxford University Press,
pp. 158–81.
Hayward, Katrina (2000), Experimental Phonetics, London: Longman.
Helmont, Franciscus Mercurius ab (1667), Alphabeti Vere Naturalis Hebraici
Brevissima Delineatio, Sulzbaci: A. Lichtentaler.
Hepburn, Alexa and Galina B. Bolden (2013), ‘The conversation analytic
approach to transcription’, in Jack Sidnell and Tanya Stivers (eds),
Handbook of Conversation Analysis, Oxford: Wiley-Blackwell, pp. 57–76.
Heselwood, Barry (2007), ‘Schwa and the phonotactics of RP English’,
Transactions of the Philological Society 105, 148–87.
Heselwood, Barry (2008a), ‘Simultaneous phonemes in English’, Linguistica
Online 7, http://www.phil.muni.cz/linguistica/art/heselwood/hes-001.pdf
Heselwood, Barry (2008b), ‘Features of tablature notation in the International
Phonetic Alphabet’, Leeds Working Papers in Linguistics and Phonetics 13,
85–94.
Heselwood, Barry (2009), ‘A phenomenalist defence of narrow impressionistic
phonetic transcription as a clinical and research tool’, in Victoria Marrero
and Idaira Pineda (eds), Linguistics: The Challenge of Clinical Application,
Madrid: Euphonia Ediciones, pp. 25–31.
Heselwood, Barry (2012), ‘Ayn’, in Lutz Edzard and Rudolf de Jong (eds),
Encyclopedia of Arabic Language and Linguistics, Brill Online.
Heselwood, Barry and Feda Al-Tamimi (2011), ‘A study of the laryn-
geal and pharyngeal consonants in Jordanian Arabic using nasoendoscopy,
videofluoroscopy and spectrography’, in Zeki Majeed Hassan and Barry
Heselwood (eds), Instrumental Studies in Arabic Phonetics, Amsterdam:
John Benjamins, pp. 101–27.
Heselwood, Barry and Zeki Majeed Hassan (2011), ‘Introduction’, in Zeki
Majeed Hassan and Barry Heselwood (eds), Instrumental Studies in Arabic
Phonetics, Amsterdam: John Benjamins, pp. 1–25.
Heselwood, Barry and Sara Howard (2008), ‘Clinical phonetic transcription’,
in Martin J. Ball, Mick Perkins, Nicole Müller and Sara Howard (eds), The
Handbook of Clinical Linguistics, Oxford: Wiley-Blackwell, pp. 381–99.
Heselwood, Barry and Leendert Plug (2011), ‘The role of F2 and F3 in the

perception of rhoticity: Evidence from listening experiments’, Proceedings
of the XVIIth International Congress of Phonetic Sciences, 867–70.
Heselwood, Barry, Zeki Majeed Hassan and Mark J. Jones (2013), ‘Historical
overview of phonetics’, in Mark J. Jones and Rachael-Anne Knight (eds),
The Bloomsbury Companion to Phonetics, London: Bloomsbury, pp. 5–20.
Hewings, Martin (2004), Pronunciation Practice Activities Book, Cambridge:
Cambridge University Press.
Hillenbrand, J. M. and R. A. Houde (1996), ‘Role of F0 and amplitude in the
perception of intervocalic glottal stops’, Journal of Speech and Hearing
Research 39, 1182–90.
Hirst, Daniel J. (2004), ‘Lexical and non-lexical tone and prosodic typology’,
in Bernard Bel and Isabelle Marlien (eds), Proceedings of the International
Symposium on Tonal Aspects of Languages, Beijing: Chinese Academy of
Social Sciences, pp. 81–8.
Hockett, Charles F. (1955), A Manual of Phonology, Bloomington: Indiana
University Publications in Anthropology and Linguistics, Memoir 11.
Hodge, Megan M. (2013), ‘Development of the vowel space in children’, in
Martin J. Ball and Fiona E. Gibbon (eds), Handbook of Vowels and Vowel Disorders,
New York: Psychology Press, pp. 1–23.
Holder, William (1669), Elements of Speech, facsimile edition, ed. R. C. Alston,
Menston: Scolar Press, 1967.
Honda, K. (1996), ‘Organization of tongue articulation for vowels’, Journal of
Phonetics 24, 39–52.
Howard, David and James Angus (2001), Acoustics and Psychoacoustics,
Oxford: Focal Press, second edition.
Howard, Sara (2011), ‘Phonetic transcription for speech related to cleft pal-
ate’, in Sara Howard and Annette Lohmander (eds), Cleft Palate Speech:
Assessment and Intervention, Oxford: John Wiley and Sons, pp. 127–44.
Howard, Sara and Barry Heselwood (2002), ‘Learning and teaching phonetic
transcription for clinical purposes’, Clinical Linguistics and Phonetics 16,
371–401.
Howard, Sara and Barry Heselwood (2011), ‘Instrumental and perceptual pho-
netic analyses: The case for two-tier transcriptions’, Clinical Linguistics and
Phonetics 25, 940–8.
Howard, Sara and Barry Heselwood (2013), ‘The contribution of phonetics to
the study of vowel development and disorders’, in Martin J. Ball and Fiona
E. Gibbon (eds), Handbook of Vowels and Vowel Disorders, New York:
Psychology Press, pp. 61–112.
Howard, Sara and Zoe Jordan (2009), ‘Speaking under articulatory constraints:
What ventriloquist speech can tell us about impaired speech production’,
in Victoria Marrero and Idaira Pineda (eds), Linguistics: The Challenge of
Clinical Application, Madrid: Euphonia Ediciones, pp. 32–40.
Hüllen, Werner (1986), ‘The paradigm of John Wilkins’ Thesaurus’, in
R. R. K. Hartmann (ed.), The History of Lexicography, Amsterdam: John
Benjamins, pp. 115–25.
Hyman, Larry M. (1975), Phonology: Theory and Analysis, New York: Holt,
Rinehart and Winston.

Iacoboni, M., R. P. Woods, M. Brass, H. Bekkering, J. C. Mazziotta and
G. Rizzolatti (1999), ‘Cortical mechanisms of human imitation’, Science 286,
2526–8.
Ingram, John C. L. (2007), Neurolinguistics: An Introduction to Spoken Language
Processing and its Disorders, Cambridge: Cambridge University Press.
Ingrisano, D., T. Klee and C. Binger (1996), ‘Linguistic context effects on tran-
scription’, in Thomas W. Powell (ed.), Pathologies of Speech and Language:
Contributions of Clinical Phonetics and Linguistics, New Orleans: ICPLA,
pp. 45–6.
IPA (1949), The Principles of the International Phonetic Association.
International Phonetic Association.
IPA (1999), Handbook of the International Phonetic Association, Cambridge:
Cambridge University Press.
Jackson, Frank (1986), ‘What Mary didn’t know’, Journal of Philosophy 83,
291–5.
Jakobson, Roman (1968), Child Language, Aphasia and Phonological
Universals, The Hague: Mouton.
Jefferson, Gail (2004), ‘Glossary of transcript symbols with an introduction’,
in Gene H. Lerner (ed.), Conversation Analysis: Studies from the First
Generation, Amsterdam: John Benjamins, pp. 13–31.
Jespersen, Otto (1889), The Articulations of Speech Sounds, Marburg:
N. G. Elwert.
Jespersen, Otto (1907), John Hart’s Pronunciation of English, Heidelberg: Carl
Winter’s Universitätsbuchhandlung.
Johnson, Keith (2003), Acoustic and Auditory Phonetics, Malden, MA:
Blackwell, second edition.
Johnson, Keith (2007), ‘Decisions and mechanisms in exemplar-based phonol-
ogy’, in Maria-Josep Solé, Patrice Speeter Beddor and Manjari Ohala (eds),
Experimental Approaches to Phonology, Oxford: Oxford University Press,
pp. 25–40.
Johnson, Sally (2005), Spelling Trouble: Language, Ideology and the Reform of
German Orthography, Clevedon: Multilingual Matters.
Jones, Daniel (1909), The Pronunciation of English, Cambridge: Cambridge
University Press.
Jones, Daniel (1918/1972), An Outline of English Phonetics, Cambridge:
Cambridge University Press, ninth edition.
Jones, Daniel and Sol Plaatje (1916), A Sechuana Reader, London: London
University Press.
Kamata, Miho (2008), An Acoustic Sociophonetic Study of Three London
Vowels, PhD thesis, University of Leeds.
Kelly, John (1981), ‘The 1847 alphabet: An episode of phonotypy’, in
R. E. Asher and Eugénie J. A. Henderson (eds), Towards a History of
Phonetics, Edinburgh: Edinburgh University Press, pp. 248–64.
Kelly, John and John Local (1984), ‘The modernity of Henry Sweet’, Henry
Sweet Society Newsletter 2, 3–9.
Kelly, John and John Local (1989), Doing Phonology, Manchester: Manchester
University Press.

Kemp, J. A. (1972), John Wallis’s Grammar of the English Language, London:
Longman.
Kemp, J. A. (1981a), ‘Early descriptions of nasality’, in R. E. Asher and
Eugénie J. A. Henderson (eds), Towards a History of Phonetics, Edinburgh:
Edinburgh University Press, pp. 35–49.
Kemp, J. A. (1981b), ‘Introduction to Lepsius’s Standard Alphabet’, in Lepsius
(1863), pp. ix*–99*.
Kemp, J. A. (2001), ‘The development of phonetics from the late 18th to the late
19th centuries’, in Sylvain Auroux, E. F. K. Koerner, Hans-Josef Niederehe
and Kees Versteegh (eds), History of the Language Sciences, Berlin: Walter
de Gruyter, pp. 1468–80.
Kemp, J. A. (2006), ‘Phonetics: Precursors to modern approaches’, in Keith
Brown (ed.), Encyclopedia of Language and Linguistics. Vol. 9, Amsterdam:
Elsevier, pp. 470–89.
Kenstowicz, Michael (1994), Phonology in Generative Grammar, Cambridge,
MA: Blackwell.
Kenstowicz, Michael and Charles Kisseberth (1977), Topics in Phonological
Theory, New York: Academic Press.
Kerswill, Paul and Susan Wright (1990), ‘The validity of phonetic transcrip-
tion: Limitations of a sociolinguistic research tool’, Language Variation and
Change 2, 255–75.
Khattab, Ghada, Feda Al-Tamimi and Barry Heselwood (2006), ‘Acoustic
and auditory differences in the /t/–/tˤ/ opposition in male and female speak-
ers of Jordanian Arabic’, in Sami Boudelaa (ed.), Perspectives on Arabic
Linguistics XVI, Amsterdam: John Benjamins, pp. 131–60.
Kim, Young-Shin (2011), An Acoustic, Aerodynamic and Perceptual Investigation
of Word-Initial Denasalisation in Korean, PhD thesis, University College,
London.
King, Ross (1996), ‘Korean writing’, in Peter T. Daniels and William J. Bright
(eds), The World’s Writing Systems, Oxford: Oxford University Press,
pp. 218–27.
Kluender, Keith R., Jeffry A. Coady and Michael Kiefte (2003), ‘Sensitivity to
change in perception of speech’, Speech Communication 41, 59–69.
Knight, Rachael-Anne (2011), ‘Towards a cognitive model of phonetic tran-
scription’, Proceedings of the Phonetics Teaching and Learning Conference
2011, University College London, pp. 17–20.
Koenig, W., H. K. Dunn and L. Y. Lacy (1946), ‘The sound spectrograph’,
Journal of the Acoustical Society of America 17, 19–49.
Kogan, Leonid (2011), ‘Reconstructing Proto-Semitic and models of classifi-
cation’, in Stefan Weninger (ed.), The Semitic Languages: An International
Handbook, Berlin: De Gruyter Mouton, pp. 54–151.
Kohler, Klaus J. (1981), ‘Three trends in phonetics: The development of the
discipline in Germany since the nineteenth century’, in R. E. Asher and
Eugénie J. A. Henderson (eds), Towards a History of Phonetics, Edinburgh:
Edinburgh University Press, pp. 161–78.
Kohler, Klaus J. (2007), ‘Beyond laboratory phonology: The phonetics of
speech communication’, in Maria-Josep Solé, Patrice Speeter Beddor and

Manjari Ohala (eds), Experimental Approaches to Phonology, Oxford:
Oxford University Press, pp. 41–53.
Köhler, Oswin, Peter Ladefoged, Jan Snyman, Anthony Traill and Rainer
Vossen (1988), ‘The symbols for clicks’, Journal of the International
Phonetic Association 18, 140–2.
Kuhl, Patricia K. (1989), ‘On babies, birds, modules and mechanisms: A
comparative approach to the acquisition of vocal communication’, in
R. J. Dooling and S. H. Hulse (eds), The Comparative Psychology of
Audition, Hillsdale, NJ: Lawrence Erlbaum Associates, pp. 379–422.
Kuhl, Patricia K., Shigeru Kiritani, Toshisada Deguchi, Akiko Hayashi, Erica B.
Stevens, Charmaine D. Dugger and Paul Iverson (1997), ‘Effects of language
experience on speech perception: American and Japanese infants’ perception
of /ra/ and /la/’, Journal of the Acoustical Society of America 102, 3135–6.
Lacerda, Francisco and Henrique Onofre Moreira (1982), ‘How does the
peripheral auditory system represent formant transitions? A psychophysical
approach’, in Rolf Carlson and Björn Granström (eds), The Representation of
Speech in the Peripheral Auditory System, Amsterdam: Elsevier Biomedical
Press, pp. 89–94.
Ladefoged, Peter (1967), ‘The nature of vowel quality’, in Three Areas of
Experimental Phonetics, Oxford: Oxford University Press, pp. 50–142.
Ladefoged, Peter (1990), ‘Some reflections on the IPA’, Journal of Phonetics
18, 335–46.
Ladefoged, Peter (1997), ‘Linguistic phonetic descriptions’, in William J.
Hardcastle and John Laver (eds), The Handbook of Phonetic Sciences,
Oxford: Blackwell, pp. 589–618.
Ladefoged, Peter (2003), Phonetic Data Analysis, Oxford: Blackwell.
Ladefoged, Peter and Ian Maddieson (1996), The Sounds of the World’s
Languages, Oxford: Blackwell.
Ladefoged, Peter and Anthony Traill (1994), ‘Clicks and their accompani-
ments’, Journal of Phonetics 22, 33–64.
Ladefoged, P., R. Harshman, L. Goldstein and L. Rice (1978), ‘Generating
vocal tract shapes from formant frequencies’, Journal of the Acoustical
Society of America 64, 1027–35.
Lashley, K. S. (1951), ‘The problem of serial order in behavior’, in L. A. Jeffress
(ed.), Cerebral Mechanisms in Behavior, New York: John Wiley and Sons,
pp. 112–46.
Lass, Roger (1984), Phonology, Cambridge: Cambridge University Press.
Laufer, Asher (1996), ‘The common [ʕ] is an approximant and not a fricative’,
Journal of the International Phonetic Association 26, 113–17.
Laufer, Asher and Thomas Baer (1988), ‘The emphatic and pharyngeal sounds
in Hebrew and in Arabic’, Language and Speech 31, 181–205.
Laver, John (1980), The Phonetic Description of Voice Quality, Cambridge:
Cambridge University Press.
Laver, John (1994), Principles of Phonetics, Cambridge: Cambridge University
Press.
Lavoie, Lisa M. (2001), Consonant Strength: Phonological Patterns and
Phonetic Manifestations, New York: Garland.

Law, Vivien (1990), ‘Indian influence on early Arabic phonetics – or coin-
cidence?’, in Kees Versteegh and Michael G. Carter (eds), Studies in the
History of Arabic Grammar II, Amsterdam: John Benjamins, pp. 215–27.
Law, Vivien (1995), Wisdom, Authority and Grammar in the Seventh Century,
Cambridge: Cambridge University Press.
Law, Vivien (1997), Grammar and Grammarians in the Early Middle Ages,
London: Longman.
Lecumberri, M. Luisa Garcia and John Maidment (2000), English Transcription
Course, London: Edward Arnold.
Lee, Hyun Bok (1999), ‘Korean’, in Handbook of the International Phonetic
Association, Cambridge: Cambridge University Press, pp. 120–3.
Lehiste, Ilse (1970), Suprasegmentals, Cambridge, MA: MIT Press.
Lepsius, Richard (1863), Standard Alphabet for Reducing Unwritten Languages
and Foreign Graphic Systems to a Uniform Orthography in European Letters,
second edition, ed. J. Alan Kemp, Amsterdam: John Benjamins, 1981.
Levelt, W. J. M. and L. Wheeldon (1994), ‘Do speakers have access to a mental
syllabary?’, Cognition 50, 239–69.
Li, Leyi (1992), Tracing the Roots of Chinese Characters: 500 Cases, Beijing:
University of Language and Culture. (In Chinese).
Liberman, Alvin M. (1996), Speech: A Special Code, Cambridge, MA: MIT
Press.
Liberman, Alvin M. and Ignatius Mattingly (1985), ‘The motor theory of
speech perception revised’, Cognition 21, 1–36.
Liberman, Alvin M., D. Isenberg and B. Rakerd (1981), ‘Duplex perception
of cues for stop consonants: Evidence for a phonetic mode’, Perception and
Psychophysics 30, 133–43.
Lindau, Mona (1985), ‘The story of /r/’, in Victoria A. Fromkin (ed.), Phonetic
Linguistics: Essays in Honor of Peter Ladefoged, Orlando: Academic Press,
pp. 157–68.
Lindblom, Björn (1990), ‘Explaining phonetic variation: A sketch of the H&H
theory’, in William J. Hardcastle and A. Marchal (eds), Speech Production
and Perception Modelling, Dordrecht: Kluwer Academic, pp. 403–40.
Linell, Per (1982), The Written Language Bias in Linguistics, Linköping:
Linköping University, Department of Communication Studies.
Local, John (1983), ‘Making a transcription: The evolution of A. J. Ellis’s pal-
aeotype’, Journal of the International Phonetic Association 13, 2–12.
Local, John and Gareth Walker (2012), ‘How phonetic features project more
talk,’ Journal of the International Phonetic Association 42, 255–80.
Locke, John L. (1993), The Child’s Path to Spoken Language, Cambridge, MA:
Harvard University Press.
Lotto, A.J., L. L. Holt and K. R. Kluender (1997), ‘Effect of voice quality on
perceived height of English vowels’, Phonetica 54, 76–93.
Lyons, John (1977), Semantics. Vol. I, Cambridge: Cambridge University
Press.
Maassen, B., S. Offereinga, W. Vieregge and G. Thoonen (1996), ‘Transcription
of pathological speech in children by means of ExtIPA: Agreement and
relevance’, in Tom Powell (ed.), Pathologies of Speech and Language:

Contributions of Clinical Phonetics and Linguistics, New Orleans: ICPLA,
37–43.
Mackenzie Beck, Janet (2010), ‘Organic variation of the vocal apparatus’,
in William J. Hardcastle, John Laver and Fiona E. Gibbon (eds), The
Handbook of Phonetic Sciences, Oxford: Wiley-Blackwell, second edition,
pp. 155–201.
MacMahon, Michael K. C. (1985), ‘James Murray and the phonetic notation
in the New English Dictionary’, Transactions of the Philological Society 83,
72–112.
MacMahon, Michael K. C. (1986), ‘The International Phonetic Association:
The first 100 years’, Journal of the International Phonetic Association 16,
30–8.
MacMahon, Michael K. C. (1994), ‘A mid-18th-century use of [ə], [ɔ] and [ʞ]
as phonetic symbols’, Journal of the International Phonetic Association 24:
19–20.
MacMahon, Michael K. C. (1996), ‘Phonetic notation’, in Peter T. Daniels
and William J. Bright (eds), The World’s Writing Systems, Oxford: Oxford
University Press, pp. 821–46.
Makkai, Valerie Becker (1972), Phonological Theory: Evolution and Current
Practice, New York: Holt, Rinehart and Winston.
Mann, Virginia A. (1986), ‘Phonological awareness: The role of reading expe-
rience’, Cognition 24, 65–92.
Martinet, André (1986), ‘“Le zed à ventre” or a functional approach to phonetic
notation’, Journal of the International Phonetic Association 16, 39–45.
Massaro, Dominic W. (2004), ‘From multisensory integration to talking heads
and language learning’, in G. Calvert, C. Spence and B. E. Stein (eds),
Handbook of Multisensory Processes, Cambridge, MA: MIT Press, 153–76.
Matthews, Peter (1994), ‘Greek and Roman linguistics’, in Giulio Lepschy (ed.),
History of Linguistics. Vol. II: Classical and Medieval Linguistics, London:
Longman, pp. 1–133.
Maurer, D., B. Gröne, T. Landis, G. Hoch and P. W. Schönle (1993),
‘Re-examination of the relation between the vocal tract and the vowel sound
with electromagnetic-articulography (EMA) in vocalizations’, Clinical
Linguistics and Phonetics 7, 129–43.
McCawley, James D. (1996), ‘Musical notation’, in Peter T. Daniels and
William J. Bright (eds), The World’s Writing Systems, Oxford: Oxford
University Press, pp. 847–54.
McGurk, H. and J. MacDonald (1976), ‘Hearing lips and seeing voices’,
Nature 264, 746–8.
McIntosh, Angus (1961), ‘“Graphology” and meaning’, Archivum Linguisticum
13, 107–20.
McLeod, S., A. Roberts and J. Sita (2006), ‘Tongue/palate contact for the pro-
duction of /s/ and /z/’, Clinical Linguistics and Phonetics 20, 51–66.
McMahon, April (2000), Lexical Phonology and the History of English,
Cambridge: Cambridge University Press.
McQueen, James M. and Anne Cutler (2010), ‘Cognitive processes in speech
perception’, in William J. Hardcastle, John Laver and Fiona E. Gibbon (eds),

The Handbook of Phonetic Sciences, Oxford: Wiley-Blackwell, second edi-
tion, pp. 489–520.
Mehiri, Abdelkader (1973), Les théories grammaticales d’Ibn Jinni, Tunis:
Publications de l’Université de Tunis.
Merleau-Ponty, Maurice (1945/2002), Phenomenology of Perception, London:
Routledge.
Mermelstein, Paul (1978), ‘Difference limens for formant frequencies of
steady-state and consonant-bound vowels’, Journal of the Acoustical Society
of America 68, 572–80.
Merrick, W. Percy and W. Potthoff (1934), A Braille Notation of the
International Phonetic Alphabet (1932) with Keywords and Specimen Texts,
London: National Institute for the Blind.
Misra, Vidya Niwas (1966), The Descriptive Technique of Pānini, The Hague:
Mouton.
Mole, Christopher (2009), ‘The motor theory of speech perception’ in Matthew
Nudds and Casey O’Callaghan (eds), Sounds and Perception, Oxford:
Oxford University Press, pp. 211–33.
Molfese, Dennis L., Alexandra P. Fonaryova Key, Mandy J. Maguire, Guy
O. Dove and Victoria J. Molfese (2005), ‘Event-related evoked potentials
(ERPs) in speech perception’, in David B. Pisoni and Robert E. Remez (eds),
The Handbook of Speech Perception, Malden, MA: Blackwell, pp. 99–121.
Moody, A. David (2007), Ezra Pound: Poet. Vol. 1: The Young Genius 1885–
1920, Oxford: Oxford University Press.
Moore, Brian C. J. (1997), An Introduction to the Psychology of Hearing, San
Diego: Academic Press, fourth edition.
Moore, Brian C. J. (2010), ‘Aspects of auditory processing related to speech
perception’, in William J. Hardcastle, John Laver and Fiona E. Gibbon (eds),
The Handbook of Phonetic Sciences, Oxford: Wiley-Blackwell, second edi-
tion, pp. 454–88.
Morais, José, Paul Bertelson, Luz Cary and Jesus Alegria (1986), ‘Literacy
training and speech segmentation’, Cognition 24, 45–64.
Morpurgo Davies, Anna (1998), ‘Nineteenth century linguistics’, in Giulio
Lepschy (ed.), History of Linguistics. Vol. IV: Nineteenth-Century Linguistics,
London: Longman.
Mountford, John (1996), ‘A functional classification’, in Peter T. Daniels
and William J. Bright (eds), The World’s Writing Systems, Oxford: Oxford
University Press, pp. 627–32.
Mulder, Jan W. F. (1968), Sets and Relations in Phonology, Oxford: Clarendon
Press.
Mulder, Jan W. F. (1975), ‘Linguistic theory, linguistic descriptions, and the
speech-phenomena’, La Linguistique 11, 87–104. Reprinted in Jan W. F.
Mulder and Sándor Hervey (1980), The Strategy of Linguistics, Edinburgh:
Scottish Academic Press, 15–28.
Mulder, Jan W. F. (1987), ‘Effective methodology and effective phonological
description’, La Linguistique 23, 19–42.
Mulder, Jan W. F. (1989), Foundations of Axiomatic Linguistics, Berlin:
Mouton de Gruyter.

Mulder, Jan W. F. (1994), ‘Written and spoken languages as separate semiotic
systems’, Semiotica 101, 41–72.
Mulder, Jan W. F. and Sándor Hervey (1975), ‘Language as a system of sys-
tems’, La Linguistique 11, 3–22. Reprinted in Jan W. F. Mulder and Sándor
Hervey (1980), The Strategy of Linguistics, Edinburgh: Scottish Academic
Press, 73–87.
Müller, Nicole and Martin J. Ball (2006), ‘Assembling and extending the tool
kit’, in Nicole Müller (ed.), Multilayered Transcription, San Diego: Plural,
pp. 149–60.
Müller, Nicole and Jacqueline A. Guendouzi (2006), ‘Transcribing at the dis-
course level’, in Nicole Müller (ed.), Multilayered Transcription, San Diego:
Plural, pp. 113–33.
Nagel, Thomas (1974), ‘What is it like to be a bat?’, Philosophical Review 83,
435–50.
Nolan, Francis (1997), ‘Speaker recognition and forensic phonetics’, in
William J. Hardcastle and John Laver (eds), The Handbook of Phonetic
Sciences, Oxford: Blackwell, pp. 744–67.
Nomura, Masaaki (1988), Kanji no Mirai [The Future of Kanji], Tokyo:
Chikuma Shobo.
Norris, M., J. R. Harden and D. M. Bell (1980), ‘Listener agreement on articu-
lation errors of four- and five-year-old children’, Journal of Speech and
Hearing Disorders 45, 378–89.
O’Callaghan, Casey and Matthew Nudds (2009), ‘Introduction: The philosophy
of sounds and auditory perception’, in Casey O’Callaghan and Matthew
Nudds (eds), Sounds and Perception, Oxford: Oxford University Press,
pp. 1–25.
Ochs, E. (1979), ‘Transcription as theory’, in E. Ochs and B. Schieffelin (eds),
Developmental Pragmatics, New York: Academic Press.
O’Connor, J. D. (1973), Phonetics, Harmondsworth: Penguin.
O’Connor, J. D. and G. F. Arnold (1973), Intonation of Colloquial English,
London: Longman, second edition.
Odisho, Edward Y. (2011), ‘Journey of scientific heritage: An exclusive Arab/
Muslim enterprise or a multi-ethnic multi-religious one?’, Parole de l’Orient
36, 201–18.
Ohala, John J. (1986), ‘Against the direct realist view of speech perception’,
Journal of Phonetics 14, 75–82.
Okada, Hideo (1999), ‘Japanese’, in Handbook of the International Phonetic
Association, Cambridge: Cambridge University Press, pp. 117–19.
Oller, D. Kimbrough (1980), ‘The emergence of speech sounds in infancy’,
in G. H. Yeni-Komshian, J. F. Kavanagh and C. A. Ferguson (eds),
Child Phonology. Vol. 1: Production, New York: Academic Press,
pp. 93–112.
Oller, D. Kimbrough (2000), The Emergence of the Speech Capacity, Mahwah,
NJ: Lawrence Erlbaum Associates.
Oller, D. Kimbrough and Rebecca E. Eilers (1975), ‘Phonetic expectation and
transcription validity’, Phonetica 31, 288–304.
Oller, D. Kimbrough, Rebecca E. Eilers, A. Rebecca Neal and Heidi K.

Schwartz (1999), ‘Precursors to speech in infancy: The prediction of speech
and language disorders’, Journal of Communication Disorders 32, 223–45.
Olson, David (1994), The World on Paper, Cambridge: Cambridge University
Press.
Olson, Kenneth S. and John Hajek (1999), ‘The phonetic status of the labial
flap’, Journal of the International Phonetic Association 29, 101–14.
Orton, Harold and Eugen Dieth (eds) (1962), The Survey of English Dialects:
Introduction, Leeds: Edward Arnold.
Parker, Ann (1999), PETAL: Phonological Evaluation and Transcription of
Audio-Visual Language, Milton Keynes: Speechmark.
Parker, Ellen M. and Randy L. Diehl (1984), ‘Identifying vowels in CVC syl-
lables: Effects of inserting silence and noise’, Perception and Psychophysics
36, 369–80.
Passy, Paul (1907), ‘Alphabet organique’, Le Maître phonétique 22, 55–7.
Patterson, D., P. C. LoCasto and C. M. Connine (2003), ‘Corpora analysis of
frequency of schwa deletion in conversational American English’, Phonetica
60, 45–69.
Paulian, Christiane (1975), Le kukuya, langage teke du Congo: phonologie,
classes nominales, Paris: Société d’études linguistiques et anthropologiques
de France.
Perkell, Joseph S. (1997), ‘Articulatory processes’, in William J. Hardcastle
and John Laver (eds), The Handbook of Phonetic Sciences, Oxford: Blackwell,
pp. 333–70.
Peterson-Falzone, Sally, Judith Trost-Cardamone, Michael P. Karnell and
Mary A. Hardin-Jones (2006), The Clinician’s Guide to Treating Cleft Palate
Speech, St. Louis: Elsevier.
Picone, J., K. M. Goudie-Marshall, G. R. Doddington and W. Fisher (1986),
‘Automatic text alignment for speech system evaluation’, IEEE Transactions
on Acoustics, Speech, and Signal Processing 34, 780–4.
Pike, Eunice V. (1946), Dictation Exercises in Phonetics, Glendale: Summer
Institute of Linguistics.
Pike, Kenneth L. (1943), Phonetics, Ann Arbor: University of Michigan Press.
Pike, Kenneth L. (1947), Phonemics, Ann Arbor: University of Michigan Press.
Port, Robert F. and Penny Crawford (1989), ‘Incomplete neutralisation and
pragmatics in German’, Journal of Phonetics 17, 257–82.
Poyatos, Fernando (2002), Non-Verbal Communication Across Disciplines. Vol.
1: Culture, Sensory Interaction, Speech, Conversation, Amsterdam: John
Benjamins.
PRDS (1980), ‘The phonetic representation of disordered speech’, British
Journal of Disorders of Communication 15, 217–23.
Preston, Dennis R. (1989), Perceptual Dialectology: Nonlinguists’ Views of
Areal Linguistics, Dordrecht: Foris.
Prince, Alan S. and Paul Smolensky (1993), Optimality Theory: Constraint
Interaction in Generative Grammar, Rutgers University Centre for Cognitive
Science, Report 2.
Pulgram, Ernst (1965), ‘Graphic and phonic systems: Figurae and signs’, Word
21, 208–24.

Pullum, Geoffrey K. and William A. Ladusaw (1996), Phonetic Symbol Guide,
Chicago: University of Chicago Press.
Pye, C., K. Wilcox and K. A. Siren (1988), ‘Refining transcriptions: The sig-
nificance of transcriber “errors”’, Journal of Child Language 15, 17–37.
Quené, Hugo (2007), ‘On the just noticeable difference for tempo in speech’,
Journal of Phonetics 35, 353–62.
Ramus, F., M. Nespor and J. Mehler (1999), ‘Correlates of linguistic rhythm in
the speech signal’, Cognition 73, 265–92.
Raphael, Lawrence J. (2005), ‘Acoustic cues to the perception of segmental
phonemes’, in David B. Pisoni and Robert E. Remez (eds), The Handbook of
Speech Perception, Oxford: Blackwell, pp. 182–206.
Read, Charles, Zhang Yun-Fei, Nie Hong-Yin and Ding Bao-Qing (1986), ‘The
ability to manipulate speech sounds depends on knowing alphabetic writing’,
Cognition 24, 31–44.
Remez, Robert E. (2005), ‘Perceptual organisation of speech’, in David B.
Pisoni and Robert E. Remez (eds), The Handbook of Speech Perception,
Oxford: Blackwell, pp. 28–50.
Remez, Robert E. and J. D. Trout (2009), ‘Philosophical messages in the
medium of spoken language’, in Matthew Nudds and Casey O’Callaghan
(eds), Sounds and Perception, Oxford: Oxford University Press, pp. 234–63.
Repp, Bruno (1981), ‘On levels of description in speech research’, Journal of
the Acoustical Society of America 69, 1462–4.
Revell, E. J. (1975), ‘The diacritical dots and the development of the Arabic
alphabet’, Journal of Semitic Studies 20, 178–90.
Rippmann, Walter (1911), English Sounds, London: Dent.
Ritner, Robert K. (1996), ‘Egyptian writing’, in Peter T. Daniels and William J.
Bright (eds), The World’s Writing Systems, Oxford: Oxford University Press,
pp. 73–84.
Roach, Peter (2000), English Phonetics and Phonology, Cambridge: Cambridge
University Press, third edition.
Roach, Peter, Jane Setter and John H. Esling (eds) (2013), English Pronouncing
Dictionary, Cambridge: Cambridge University Press, eighteenth edition.
Robertson, J. S. (2004), ‘The possibility and actuality of writing’, in Stephen D.
Houston (ed.), The First Writing, Cambridge: Cambridge University Press,
pp. 16–38.
Robins, R. H. (1990), A Short History of Linguistics, London: Longman, third
edition.
Robinson, Robert (1617), The Art of Pronuntiation, facsimile edition, ed.
R. C. Alston, Menston: Scolar Press, 1969.
Russell, Bertrand (1961), History of Western Philosophy, London: Allen and
Unwin, second edition.
Saldana, H. M. and L. D. Rosenblum (1993), ‘Visual influences on auditory
pluck and bow judgements’, Perception and Psychophysics 54, 406–16.
Salmon, Vivian (1972), The Works of Francis Lodwick: A Study of his Writings
in the Intellectual Context of the Seventeenth Century, London: Longman.
Salmon, Vivian (1983), ‘Nathaniel Chamberlain and his “Tractatus de literis
et lingua philosophica” (1679)’, in E. G. Stanley and Douglas Gray (eds),

Five Hundred Years of Words and Sounds: A Festschrift for Eric Dobson,
Cambridge: D. S. Brewer, pp. 128–36.
Salmon, Vivian (1995), ‘Some reflections of Dionysius Thrax’s “Phonetics”
in sixteenth-century English scholarship’, in Vivien Law and Ineke Sluiter
(eds), Dionysius Thrax and the Technē Grammatikē, Münster: Nodus
Publikationen, pp. 135–50.
Sampson, Geoffrey (1985), Writing Systems: A Linguistic Introduction, London:
Hutchinson.
Sara, Solomon J. (2009), ‘Al-Khalīl ibn Ahmad Al-Farāhīdī: The sound system
of Arabic’, Journal of Arabic Linguistics Tradition 7, 1–15.
Saussure, Ferdinand de (1974), Course in General Linguistics, Glasgow:
Fontana.
Savinainen-Makkonen, Tuula (2007), ‘Geminate template: A model for first
Finnish words’, First Language 27, 347–59.
Scancarelli, Janine (1996), ‘Cherokee writing’, in Peter T. Daniels and William
J. Bright (eds), The World’s Writing Systems, Oxford: Oxford University
Press, pp. 587–92.
Schwartz, J. L., L-J. Boë, N. Vallée and C. Abry (2007), ‘The dispersion-
focalisation theory of vowel systems’, Journal of Phonetics 25, 255–86.
Scragg, D. G. (1974), A History of English Spelling, Manchester: Manchester
University Press.
Scruton, Roger (1997), The Aesthetics of Music, Oxford: Oxford University
Press.
Scruton, Roger (2009), ‘Sounds as secondary objects and pure events’, in
Matthew Nudds and Casey O’Callaghan (eds), Sounds and Perception,
Oxford: Oxford University Press, pp. 50–68.
Semaan, Khalil I. (1963), Arabic Phonetics: Ibn Siyna’s ‘Risaalah’ on the Points
of Articulation of the Speech Sounds, Lahore: Sh. Muhammad Ashraf.
Shadle, Christine H. (2010), ‘The aerodynamics of speech’, in William J.
Hardcastle, John Laver and Fiona E. Gibbon (eds), The Handbook of
Phonetic Sciences, Oxford: Wiley-Blackwell, second edition, pp. 39–80.
Shannon, Claude E. (1948), ‘A mathematical theory of communication’, Bell
System Technical Journal XXVII, 379–423.
Shitaw, Abderraouf (in preparation), An Instrumental Phonetic Investigation of
Timing Relations in Two-Stop Consonant Clusters in Tripolitanian Libyan
Arabic, PhD thesis, University of Leeds.
Shriberg, Larry D. and Raymond D. Kent (1982), Clinical Phonetics, New
York: John Wiley and Sons.
Shriberg, Larry D. and Raymond D. Kent (2003), Clinical Phonetics, Boston:
Allyn and Bacon, third edition.
Shriberg, Larry D. and G. L. Lof (1991), ‘Reliability studies in broad and nar-
row phonetic transcription’, Clinical Linguistics and Phonetics 5, 225–79.
Shriberg, Larry D., J. Kwiatkowski and K. Hoffmann (1984), ‘A procedure
for phonetic transcription by consensus’, Journal of Speech and Hearing
Research 27, 456–65.
Simpson, Adrian P. (2005), ‘“From a grammatical angle”: Congruence in
Eileen Whitley’s phonology of English’, York Papers in Linguistics 2, 49–90.

Skjærvø, P. Oktor (1996), ‘Aramaic scripts for Iranian languages’, in Peter T.
Daniels and William J. Bright (eds), The World’s Writing Systems, Oxford:
Oxford University Press, pp. 515–35.
Smith, Janet S. (Shibamoto) (1996), ‘Japanese writing’, in Peter T. Daniels
and William J. Bright (eds), The World’s Writing Systems, Oxford: Oxford
University Press, pp. 209–17.
Smith, Thomas (1568), De Recta et Emendata Linguae Anglicae Scriptione,
Dialogus. Bror Danielsson’s critical edition with English translation from the
Latin, 1983, Stockholm: Almqvist and Wiksell.
Sorensen, Roy (2009), ‘Hearing silence: The perception and introspection of
absences’, in Matthew Nudds and Casey O’Callaghan (eds), Sounds and
Perception, Oxford: Oxford University Press, pp. 126–45.
Spedding, J., R. Ellis and D. Heath (eds) (1858), The Works of Francis Bacon,
London: Longman.
Sproat, Amasa D. (1857), An Endeavour Towards a Universal Alphabet,
Chillicothe, OH: Author, reissued by Kessinger.
Stark, Rachel E. (1986), ‘Prespeech segmental feature development’, in Paul
Fletcher and Michael Garman (eds), Language Acquisition, Cambridge:
Cambridge University Press, second edition, pp. 149–73.
Steele, Joshua (1775), An Essay Towards Establishing the Melody and Measure
of Speech, to be Expressed and Perpetuated by Peculiar Symbols, facsimile
edition, ed. R. C. Alston, Menston: Scolar Press, 1969.
Stevens, Kenneth N. (1997), ‘Articulatory–acoustic–auditory relationships’,
in William J. Hardcastle and John Laver (eds), The Handbook of Phonetic
Sciences, Oxford: Blackwell, pp. 462–506.
Stevens, Kenneth N. (1998), Acoustic Phonetics, Cambridge, MA: MIT Press.
Stevens, Kenneth N. (2005), ‘Features in speech perception and lexical access’,
in David B. Pisoni and Robert E. Remez (eds), The Handbook of Speech
Perception, Oxford: Blackwell, pp. 125–55.
Stevens, Kenneth N. and Samuel Jay Keyser (1989), ‘Primary features and their
enhancement in consonants’, Language 65, 81–106.
Stoddart, Jana, Clive Upton and J. D. A. Widdowson (1999), ‘Sheffield dia-
lect in the 1990s: Revisiting the concept of NORMS’, in Paul Foulkes and
Gerard Docherty (eds), Urban Voices, London: Edward Arnold, pp. 72–89.
Stone, Maureen (2010), ‘Laboratory techniques for investigating speech articu-
lation’, in William J. Hardcastle, John Laver and Fiona E. Gibbon (eds), The
Handbook of Phonetic Sciences, Oxford: Wiley-Blackwell, second edition,
pp. 9–38.
Studdert-Kennedy, Michael (1982), ‘On the dissociation of auditory and
phonetic perception’, in Rolf Carlson and Björn Granström (eds), The
Representation of Speech in the Peripheral Auditory System, Amsterdam:
Elsevier Biomedical Press, pp. 9–26.
Studdert-Kennedy, Michael and Louis Goldstein (2003), ‘Launching language:
The gestural origin of discrete infinity’, in Morten H. Christiansen and
Simon Kirby (eds), Language Evolution, Oxford: Oxford University Press,
pp. 235–54.
Sumera, Magdalena (1981), ‘The keen prosodic ear: A comparison of the

notations of rhythm of Joshua Steele, William Thomson and Morris Croll’,
in R. E. Asher and Eugénie J. A. Henderson (eds), Towards a History of
Phonetics, Edinburgh: Edinburgh University Press, pp. 100–12.
Sundby, Bertil (1983), ‘Transcribing orthoepistic data’, in E. G. Stanley and
Douglas Gray (eds), Five Hundred Years of Words and Sounds: A Festschrift
for Eric Dobson, Cambridge: D. S. Brewer.
Sussman, Elyse S. (2005), ‘Integration and segregation in auditory scene analy-
sis’, Journal of the Acoustical Society of America 117, 1285–98.
Sweet, Henry (1877), A Handbook of Phonetics, Oxford: Clarendon Press.
Sweet, Henry (1881), ‘Sound notation’, Transactions of the Philological Society
18, 177–235.
Sweet, Henry (1904), ‘The Arabic throat sounds again’, Le Maître phonétique,
36–7.
Sweet, Henry (1906), A Primer in Phonetics, Oxford: Clarendon Press, third
edition.
Swinney, D. (1981), ‘Lexical processing during sentence comprehension:
Effects of higher-order constraints and implications for representation’, in
T. Myers, John Laver and John Anderson (eds), The Cognitive Representation
of Speech, Amsterdam: North-Holland, pp. 201–9.
Tavoni, Mirko (1998), ‘Renaissance linguistics: Western Europe’, in Giulio
Lepschy (ed.), History of Linguistics. Vol. III: Renaissance and Early Modern
Linguistics, London: Longman, pp. 1–108.
Tench, Paul (1978), ‘On introducing parametric phonetics’, Journal of the
International Phonetic Association 8, 34–46.
Thomas, Erik R. (2011), Sociophonetics: An Introduction, Basingstoke:
Palgrave Macmillan.
Threatte, Leslie (1996), ‘The Greek alphabet’, in Peter T. Daniels and William
J. Bright (eds), The World’s Writing Systems, Oxford: Oxford University
Press, pp. 271–80.
Trager, George L. and Bernard Bloch (1941), ‘The syllabic phonemes of
English’, Language 17, 223–46.
Trigger, Bruce G. (2004), ‘Writing systems: A case study in cultural evolu-
tion’, in Stephen D. Houston (ed.), The First Writing, Cambridge: Cambridge
University Press, pp. 39–68.
Trubetzkoy, Nikolai S. (1933/2001), ‘The systematic phonological representa-
tion of languages’, in Anatoly Liberman (ed.), N. S. Trubetzkoy, Durham,
NC: Duke University Press, pp. 11–13.
Trubetzkoy, Nikolai S. (1937/2001), ‘On a new critique of the concept of the
phoneme’, in Anatoly Liberman (ed.), N. S. Trubetzkoy, Durham, NC: Duke
University Press, pp. 22–38.
Trubetzkoy, Nikolai S. (1938/2001), ‘Quantity as a phonological problem’, in
Anatoly Liberman (ed.), N. S. Trubetzkoy, Durham, NC: Duke University
Press, pp. 44–9.
Trubetzkoy, Nikolai S. (1969), Principles of Phonology, Berkeley: University
of California Press.
Tucker, Abraham (1773), Vocal Sounds, facsimile edition, ed. R. C. Alston,
Menston: Scolar Press, 1969.

Turk, Alice, Satsuki Nakai and Mariko Sugahara (2006), ‘Acoustic segment
duration in prosodic research: A practical guide’, in Stefan Sudhoff, Denisa
Lenertová, Roland Meyer, Sandra Pappert, Petra Augurzky, Ina Mleinek,
Nicole Richter and Johannes Schließer (eds), Methods in Empirical Prosody
Research, Berlin: Mouton de Gruyter, pp. 1–28.
Upton, Clive, David Parry and J. D. A. Widdowson (1994), The Survey of
English Dialects: The Dictionary and Grammar, London: Routledge.
Vachek, Josef (1945–9), ‘Some remarks on writing and phonetic transcription’,
Acta Linguistica V, 86–93.
Vachek, Josef (1973), Written Language: General Problems and Problems of
English, The Hague: Mouton.
Vaissière, Jacqueline (2005), ‘Perception of intonation’, in David B. Pisoni
and Robert E. Remez (eds), The Handbook of Speech Perception, Oxford:
Blackwell, pp. 236–63.
Vaissière, Jacqueline (2007), ‘Area functions and articulatory modeling as a
tool for investigating the articulatory, acoustic, and perceptual properties
of sounds across languages’, in Maria-Josep Solé, Patrice Speeter Beddor
and Manjari Ohala (eds), Experimental Approaches to Phonology, Oxford:
Oxford University Press, pp. 54–71.
Varma, Siddeshwar (1961), Critical Studies in the Phonetic Observations of
Indian Grammarians, Delhi: Munshi Ram Manohar Lal.
Venezky, Richard L. (1970), The Structure of English Orthography, The Hague:
Mouton.
Versteegh, Kees (1977), Greek Elements in Arabic Linguistic Thinking, Leiden:
E. J. Brill.
Viereck, Wolfgang (1973), ‘A critical appraisal of the Survey of English
Dialects’, Orbis XXII, 72–84.
Vieregge, W. H. (1987), ‘Basic aspects of phonetic segmental transcription’,
Zeitschrift für Dialektologie und Linguistik Beihefte 54, 5–55.
Vieregge, W. H., A. C. M. Rietveld and C. I. E. Jansen (1984), ‘A distinctive
feature based system for the evaluation of segmental transcription in Dutch’,
in M. P. R. van den Broecke and A. Cohen (eds), Proceedings of the Xth
International Congress of Phonetic Sciences, Dordrecht: Foris, pp. 654–9.
Vihman, Marilyn May (1996), Phonological Development: The Origins of
Language in the Child, Cambridge, MA: Blackwell.
Vineis, Edoardo and Alfonso Maierú (1994), ‘Medieval linguistics’, in Giulio
Lepschy (ed.), History of Linguistics. Vol. II: Classical and Medieval
Linguistics, London: Longman, pp. 134–346.
Walker, Gareth (2013), ‘Phonetics and prosody in conversation’, in Jack Sidnell
and Tanya Stivers (eds), Handbook of Conversation Analysis, Oxford: Wiley-
Blackwell, pp. 455–74.
Walker, John (1791), A Critical Pronouncing Dictionary and Expositor of the
English Language, London: Robinson and Cadell.
Wallis, John (1653), Grammatica Linguae Anglicanae, facsimile first edition,
ed. R. C. Alston, Menston: Scolar Press, 1969.
Wallis, John (1765), Grammatica Linguae Anglicanae, London: G. Bowyer,
sixth edition, in Kemp (1972).

Warren, Richard M. (2008), Auditory Perception: Analysis and Synthesis,
Cambridge: Cambridge University Press, third edition.
Watson, Janet C. E. (2012), The Structure of Mehri, Wiesbaden: Otto
Harrassowitz.
Watson, Janet C. E., Barry Heselwood, Munira Al-Azraqi and Samia Naïm
(2012), ‘Lateral articulations of Arabic ḍād in south-western Saudi
Arabia: Electropalatographic evidence’, paper presented at the British
Association of Academic Phoneticians Colloquium, University of Leeds,
26–8 March.
Watt, Dominic and Anne Fabricius (2002), ‘Evaluation of a technique for
improving the mapping of multiple speakers’ vowel spaces in the F1~F2
plane’, Leeds Working Papers in Linguistics and Phonetics 9, 159–73.
Wellisch, H. H. (1978), The Conversion of Scripts: Its Nature, History and
Utilization, New York: John Wiley and Sons.
Wells, John C. (1982), Accents of English, Cambridge: Cambridge University
Press, 3 vols.
Wells, John C. (1995a), ‘New syllabic consonants in English’, in Jack Windsor
Lewis (ed.), Essays in Honour of Professor J. D. O’Connor, London:
Routledge, pp. 401–12.
Wells, John C. (1995b), ‘Computer-coding the IPA: A proposed extension of
SAMPA’, http://www.phon.ucl.ac.uk/home/sampa/x-sampa.htm
Wells, John C. (1996), ‘Why phonetic transcription is important’, Malsori
(Journal of the Phonetic Society of Korea) 31–2, 239–42.
Wells, John C. (2006), English Intonation, Cambridge: Cambridge University
Press.
Wells, John C. (2008), Longman Pronunciation Dictionary, London: Longman,
third edition.
Weninger, Stefan, Geoffrey Khan, Michael P. Streck and Janet C. E. Watson
(2011), ‘Introduction’, in Stefan Weninger (ed.), The Semitic Languages: An
International Handbook, Berlin: De Gruyter Mouton, pp. 1–6.
West, Paula (1999), ‘Perception of distributed coarticulatory properties of
English /r/ and /l/’, Journal of Phonetics 27, 405–26.
Wester, Mirjam, Judith M. Kessens, Catia Cucchiarini and Helmer Strik (2001),
‘Obtaining phonetic transcriptions: A comparison between expert listeners
and a continuous speech recognizer’, Language and Speech 44, 377–403.
Wilkins, John (1668), An Essay Towards a Real Character and a Philosophical
Language, London: Sa. Gellibrand.
Williams, Ann and Paul Kerswill (1999), ‘Dialect levelling: Change and con-
tinuity in Milton Keynes, Reading and Hull’, in Paul Foulkes and Gerard
Docherty (eds), Urban Voices, London: Edward Arnold, pp. 141–62.
Wong, C. S. P. and Jane Setter (2002), ‘Is it “night” or “light”? How and why
Cantonese-speaking ESL learners confuse syllable-initial [n] and [l]’, in A.
James and J. Leather (eds), New Sounds 2000: Proceedings of the Fourth
International Symposium on the Acquisition of Second Language Speech,
University of Klagenfurt, pp. 351–9.
Yip, P. (2000), The Chinese Lexicon: A Comprehensive Survey, New York:
Routledge.

Zhang, Wei (2007), ‘Alternation of [n] and [l] in Sichuan dialect, Standard
Mandarin and English: A single case study’, Leeds Working Papers in
Linguistics and Phonetics 12, 156–73.
Zurek, P. M. (1981), ‘Spontaneous narrowband acoustic signals emitted by
human ears’, Journal of the Acoustical Society of America 69, 514–23.

Appendix: Phonetic Notation Charts

IPA Chart Revised to 2005

Elaborated Consonant Chart from Esling (2010)
Source: Figure 18.6 ‘An elaborated phonetic chart of consonants based on the 2005 IPA chart’ © Hardcastle, Laver and Gibbon (2010), Handbook of Phonetic Sciences, 2nd edition, Wiley-Blackwell

ExtIPA Chart Revised to 2008
Source: © ICPLA 2008

VoQS Chart 1994
Source: © 1994 Martin J. Ball, John Esling, Craig Dickson

IPA Braille Chart 2009
Source: © Englebretson (2009), ‘An overview of IPA Braille: An updated tactile representation of the International Phonetic Alphabet’, Journal of the International Phonetic Association 39, 67–86

Index
Note: Page references in bold are to the Glossary, ‘f’ refers to figures, ‘t’ to tables, and ‘n’ to notes.

Abercrombie, David, 3, 10, 16, 26, 55, 59, 63, see also International Phonetic Alphabet
66, 88, 92–3, 101, 144, 145, 155, 157, alphabetic writing, 16, 40, 41–3, 93, 101
163, 198, 253 alphabets, 7, 7t, 97, 265
abjads, 7, 7t, 265 Al-Sakkākī, 51, 51f, 117
abugidas, 7, 7t, 265 Amorosa, H. U. et al., 212
accent studies, 257–8, 260–1 analogical notation, 265; see also organic-
acoustic classes, 16–18, 17f analogical notation
acoustic domain, 19, 246–7, 246f, 247f analphabetic notation, 92–3, 265
acrophony, 40–1 Jespersen’s analphabetic notation, 92, 93–4,
aerodynamic domain, 18, 245 94t, 137
Akkadian, 15, 40 Pike’s analphabetic notation, 92–3, 95–7,
Albright, R. W., 62, 106, 107, 108, 109 96t, 137
Alcuin of York, 71, 174–5 Ancient Egyptian, 6, 39, 40, 43, 47, 70, 171
Al-Khalīl, 173 Ancient Greek, 8–9, 42–3, 49–50, 127, 171–2
Allen, W. S., 44, 49, 68 Anderson, John, 102–3
allograms, 15 anti-phonography, 47–8, 71
allophones, 8, 149, 151–2, 154–5 aperiodicity, 17
allophonic transcription, 155–7 Arabic
alphabetic notation and the structure of abjads, 7, 265
symbols, 97–101, 99f acoustic displays, 230–3, 230f, 231f, 232f,
pre-nineteenth-century alphabetic notation, 235, 235f
101–6 consonants, 188
Ellis’s paleotype notation, 100, 109–11, dialects, 122–3, 123f
112, 137, 141 multi-tiered transcription, 239, 240f
ExtIPA, 105, 119–23, 128, 136–7, 298f naqt pointing, 45
infant vocalisations, 130–2, 222 phonetic theory, 50–1, 54, 66, 117
IPA Braille notation, 31, 124–6, 127, 300–3f writing, 43
Lepsius’s Standard Alphabet, 106–9, 258, Aramaic, 15, 45
259 arbitrariness of symbols, 3, 14
pitch notation, 126–8, 126f, 168f archiphonemes, 8–9, 157–8
SAMPA notation, 129–30 archiphonemic transcription, 157–8
Sweet’s romic notation, 111–12 Aristotle, 9, 52, 62, 174
using notations, 132–4 articulatory domain, 18, 243–5, 244f
voice quality and long domain categories, articulatory phonology, 165
128–9 articulography, 223, 243

Ashby, Michael et al., 28, 146, 204, 205, 214
Cage, John, 180
assimilation, 151–2, 197, 252, 254, 255 Canepari, Luciano, 134, 135f
atypical speech, 120, 122, 142, 200, 215, Carlyle, Thomas, 2, 26, 226
221–2, 239, 257 Carney, Edward, 71, 209
audio recording, 69, 212, 264 Carter, Michael G., 50
audiovisual integration, 193 categorical contamination, 210
audiovisual perceptual analysis, 180, 212 categorical perception, 192
auditory agnosia, 200–1 Catford, J. C., 92, 117, 119, 214
auditory domain, 19, 247–8 Chao, Yuan-Ren, 127
auditory enhancement, 188 characters, 10, 11, 265
auditory events, 179–80, 179t Cherokee, 108
auditory integration, 186–7 Chinese
auditory nerve, 182–3, 189 fǎnqiè, 40–1, 43, 172
auditory perception of speech, 180–4 logography, 6, 14, 15, 30–1, 32, 39
auditory-perceptual analysis, 180 Pinyin, 30–1, 42, 47, 127
auditory response area, 182, 183f, 184 tones, 127, 128
auditory scene analysis, 189–91 Wubi (Wang Ma), 31
auditory system, 180–4 Cho, Taehong, 146
autosegmental phonology, 153–4 Chomsky, Noam, 160, 208
Avestan, 9 Clark, Kenneth, 236–7
Azeri, 8, 8t coarticulation, 16, 163–4, 213, 263
Collins, Beverley, 253
Bacon, Francis, 59–60 combination tones, 180, 196–8
Baddeley, Alan D., 203, 213, 214 compensatory articulations, 143, 200, 208
Baines, John, 39, 171 confusion matrices, 187
Bakalla, Muhammad H., 50 consensus transcription, 215, 218–20
Baker, A., 11 consonants
Ball, Martin J. et al., 92, 121, 122, 127, 146, Canepari’s chart, 134, 135f
175–6, 220, 250, 257 Esling’s elaborated chart, 132, 133,
Bark scale, 186, 197, 247–8 297f
Barnfield, Richard, 56–7 quantity, 154
Beal, Joan C., 67 transcription, 221
Beckman, Mary E., 166 conversation analysis (CA), 261–3
Bell, Alexander Melville, 68 Cooper, Jerrold S., 47
Visible Speech notation, 3, 55, 56f, 57, 74, corpus transcription, 147, 234–5
79–80, 79f, 82, 89, 90f, 91, 97, 114, 127, Coulmas, Florian, 15, 29, 40, 47, 174
220 criterion shift, 195, 214
Bernhardt, Barbara, 122 Cruttenden, Alan, 255
Bernstein, Lynne E. et al., 187 Cucchiarini, Catia, 216–17
Bloch, Bernard, 141, 153
Boas, Franz et al., 107, 117 Daniels, Peter T., 7, 8, 11
Bochner, Joseph H. et al., 184 Danielsson, Bror, 52, 54–5
Bopp, Franz, 258–9 Darwin, Erasmus, 92
Borges, Jorge Luis, 19 Davis, Barbara L., 64
Boswell, James, 142, 264n ðə fonetik tîtcər, 194
bracketing conventions, 141 DeFrancis, John, 6, 8, 11
Braille Authority of North America, 124 Democritus, 207
Braille Authority of the United Kingdom, 124 Desainliens, Claude, 52
Braille: IPA Braille notation, 31, 124–6, 127, Descartes, René, 208
300–3f Devanagari script, 9
Bregman, Albert S., 189, 190 development of phonetic theory, 48–9
Britton, Derek, 102–3 the pre-Modern world, 49–51
broad transcription, 144–5 the Early Modern world, 51–64
Browman, Catherine P., 165, 166f terminology in the ‘English School’, 65–6,
Brücke, Ernst, 68 65t, 117
Bucholtz, Mary, 176 late eighteenth–nineteenth centuries, 66–9

development of phonetic theory (cont.)
  from correspondence to representation, 69–70
  spelling reform, 47, 70–2
diacritics, 265
dialectology, 257–60
diaphones, 161
Dickins, James, 204
dictionaries, 251–3
Dieth, Eugen, 259–60
difference limen (DL) see just noticeable differences (JNDs)
difference tones see combination tones
direct realism, 164, 199–200, 202
distinctive features, 7, 149–50, 158
Dobson, E. J., 31, 57
Docherty, Gerard, 175, 260
Donatus, 11
Duckworth, Martin et al., 120, 121
duplex perception, 192–3
dynamic transcription, 161
  gestural scores, 165–6, 166f, 266
  intonation and rhythm, 166–9, 167f, 168f
  notation, 161–3, 162f
  parametric transcription, 141, 163–5, 164f, 266

Edison, Thomas Alva, 69
Eilers, Rebecca E., 237
Eisen, B. et al., 221
Ellis, Alexander J., 31, 68, 143, 144
  paleotype notation, 100, 109–11, 112, 137, 141
Ellis, Stanley, 259–60, 260f
empiricism, 207–8
Englebretson, Robert, 124, 125
English Phonotypic Alphabet, 109
equivalent rectangular bandwidth (ERB), 186, 247–8
Esling, John H., 112–13, 128, 233
  elaborated consonant chart, 132, 133, 297f
Esperanto, 60, 173
Eustace, S. S., 109
exemplar-based generalisation mechanism, 205
exemplars, 21, 178, 204, 210, 214, 241f, 248–9
ExtIPA notation (extensions to the IPA), 105, 119–23, 128, 136–7, 298f

Faber, Alice, 41–2, 43
Fabricius, Anne, 246
featural systems, 7, 7t
feature geometry, 138–9
figura, 10
‘First Grammarian’, 11, 50, 101, 101f, 103, 115
Firth, J. R., 57
Fitch, W. Tecumseh, 192
Flemming, Edward S., 248
foreign language learning and teaching, 253–5
forensic phonetics, 263–4
formant transitions, 15–16, 18
Foulkes, Paul et al., 175, 181, 194, 260
Fowler, Carol A., 193, 199, 200, 202
Fox, Anthony, 126, 127, 153
Fraser, Helen, 41, 264
French, 52, 118
Fry, Denis B. et al., 184
function of transcription, 35, 36
functionalist phonology, 150

Galantucci, Bruno, 200
Gandour, J. T., 128
Garrick, David, 142
Gelb, Ignace Jay, 40
general phonetic transcriptions, 147–8
generalised transcription, 265
generative phonology, 149–50, 160
generic transcriptions, 142–3
German, 159–60
gestural scores, 165–6, 166f, 266
Gill, Alexander, 52, 62
Gimson, A. C., 255
glossic, 111
glossotype, 111
glyphs, 10, 12–13, 23, 73, 266
Goldstein, Louis, 165, 166f
Golestani, Narly et al., 209, 210, 215
Grabe, Esther, 169
Grammaticus, Virgilius Maro, 172–3
graphemes, 7–8
Grierson, G. A., 259
Guendouzi, Jacqueline A., 33
Gussenhoven, Carlos, 138

Hajek, John, 132, 133
Hale, Mark, 188
Halle, Morris, 160, 208
Halliday, Michael A. K., 67, 126, 126f, 128
Hammarström, Göran, 202
Harris, Roy, 38, 40
Hart, John, 53–5, 65t, 66, 71, 101, 103, 104f, 252
Hassan, Zeki Majeed, 233
Haugen, Einar, 102
Hauser, Marc D., 192
Hebrew, 7, 45, 60, 76, 77f
  dagesh pointing, 45
Hellwag, Christoph, 67
Helmholtz, Hermann von, 68

Helmont, Franciscus Mercurius ab, 60, 76, 77f
Hervey, Sándor, 135
Heschl’s gyrus, 209
Heselwood, Barry et al., 61–2, 63, 106, 118, 151, 154, 181, 206, 208, 212, 214, 237, 238, 239
heterography, 14–15
Hewings, Martin, 255
hierarchical notation, 137–9
Hill, Thomas Wright, 92, 93
Hirst, Daniel J., 128
Hockett, Charles F., 15
Hodge, Megan M., 131
Holder, William, 57, 62–4, 63f, 65, 65t, 66, 67, 105, 115, 137, 213, 220
holistic listening, 128, 203
homographs, 45, 52, 110, 125, 134–6, 151
homophones, 6, 38
Honda, K., 199
Howard, Sara, 181, 206, 208, 212, 214, 237, 238, 239
Hüllen, Werner, 60
hyper/hypoarticulation, 188

Ibn Jinni, 51
Ibn Sīnā (Avicenna), 54, 220
ICEB (International Council on English Braille), 124
Icelandic, 50
iconic notation, 266; see also organic-iconic notation
ICPLA see International Clinical Phonetics and Linguistics Association
IKPA (International Korean Phonetic Alphabet), 76
impressionistic transcription, 145, 146–7
  and instrumental records, 236–40, 240f
  see also narrow impressionistic phonetic transcription
indexed transcriptions, 235–6, 235f, 236f
India
  dialectology, 258–9
  phonetic theory, 49
infant vocalisations, 130–2, 222
Ingrisano, D. et al., 213
instrument-dependent transcriptions, 170, 170f, 225
  annotating function, 229–33, 229f, 230f, 231f, 232f, 233f
  corpus transcriptions, 234–5
  instrument-determined transcriptions, 170, 225–8, 227f, 229f
  instrument-informed transcriptions, 170, 228–9
  summarising function, 233–4, 234f
instrument-independent transcriptions, 170
instrumental records, 170f, 223–4, 224f, 225f
  and impressionistic transcription, 236–40, 240f
  indexed transcriptions, 235–6, 235f, 236f
  instrument-dependent transcriptions, 225–35
International Clinical Phonetics and Linguistics Association (ICPLA), 120, 121
International Council on English Braille (ICEB), 124
International Korean Phonetic Alphabet (IKPA), 76
International Phonetic Alphabet (IPA), 48–9, 69, 224
  chart, 295–6f
  consonantal terminology, 65t
  ExtIPA, 105, 119–23, 128, 136, 298f
  IPA Braille notation, 31, 124–6, 127, 300–3f
  notation, 3–4, 11, 24, 64, 108, 112–19
  pitch notation, 127
  and reformed spelling, 70
International Phonetic Association, 48, 69, 119, 253
intonation transcription, 127–8, 166–9, 167f, 168f, 222
INTSINT notation (International Transcription System for Intonation), 128
IPA see International Phonetic Alphabet

Jackson, Frank, 239
Jacobs, Haike, 138
Jakobson, Roman, 158
Japanese
  diacritics, 45, 114
  kana, 6, 7
  kanji, 40, 47
Jefferson, Gail, 262, 263
Jespersen, Otto, 2, 141
  analphabetic notation, 92, 93–4, 94t, 137
JNDs see just noticeable differences
Johnson, Keith, 205, 248
Johnson, Samuel, 142, 252, 264n
Jones, Daniel, 52, 55, 56f, 100, 107, 112, 124, 126, 128, 144, 155, 157, 161, 220, 251, 252; see also Passy-Jones organic alphabet
Jones, William, 67–8, 117
Journal of the International Phonetic Association, 52, 70
Joyce, James, 4
just noticeable differences (JNDs), 184, 185

Kelly, John, 68, 97, 109, 128, 203, 210
Kemp, J. A., 57, 59, 62, 107

Kent, Raymond D., 202
Kerswill, Paul, 197
keywords, 108
Kim, Young-Shin, 210, 211f
Kluender, Keith R. et al., 189
Köhler, Oswin et al., 108
koineisation, 49
Korean
  Hangŭl, 7, 45–6, 47–8, 75–6, 75f
  transcription of, 210, 211f

Lacan, Jacques, 3
Ladefoged, Peter, 23–4, 28, 118, 146, 148, 194, 210, 214, 239, 248
Ladusaw, William A., 10, 12
language, speech and writing, 9, 9f
Lao, 127
laryngography, 223
laryngoscopy, 48, 68, 223
Lashley, K. S., 210
Latin, 52, 54, 60, 67, 70–1, 172–3, 175
Laufer, Asher, 115
Laver, John, 16, 41, 92, 94, 129, 199, 210, 221
Lavoie, Lisa M., 200
Law, Vivien, 48, 117, 172–3, 245
Le Maître phonétique, 52, 194
Lee, Hyun Bok, 76
Lepsius, Richard: Standard Alphabet, 106–9, 258, 259
letters, 10, 72n1, 266
lexical sets, 361
lexicography, 67
Liberman, Alvin M. et al., 192–3
Ligeti, György, 196
Lindblom, Björn, 188, 191, 210
Linell, Per, 26
linguistic signs, 12–14, 13f
listener-oriented transcriptions, 27, 143
Local, John, 68, 97, 109, 111, 122, 128, 144, 175–6, 203, 210, 262, 263
Locke, John, 207
Lodwick, Francis, 57, 65
  analogical notation, 64, 74, 86–8, 87f, 89, 97, 137
logograms, 14, 47, 266
  transliteration of, 30–1
logography, 5, 6; see also Chinese
logosyllabograms, 7, 7t
Low, E. L., 169

Maassen, B. et al., 221
McGurk effect, 187, 249
Mackenzie Beck, Janet, 130–1
MacMahon, Michael K. C., 31, 113, 122, 160–1
MacNeilage, Peter F., 64
Madsen, Jacob, 53
Maierú, Alfonso, 173
Makkai, Valerie Becker, 141
Mann, Virginia A., 42
Martinet, André, 157, 254
Mees, Inger, 253
Mehiri, Abdelkader, 51
Meigret, Louis, 52, 53
memory, 80, 203, 214, 248
  declarative, 118, 204
  recognition, 20, 118, 204
Merkel, Carl, 68
Merleau-Ponty, Maurice, 44–5, 197
Mermelstein, Paul, 184
Merrick, W. Percy, 124
mirror neurons, 198–9, 214
Montanus, Petrus, 53
Moore, Brian C. J., 184, 185
mora, 153
Morais, José et al., 41–2
morphemes, 5–7, 11–12, 32, 36n2, 48, 98, 159–60, 186
morphology, 32, 97
morphophonemic transcription, 158–60
morphophonemic writing (morpho-phonography), 6
morphosyllabograms, 7
motor empathy, 214
Mulder, Jan W. F., 135, 154, 156
Müller, Max, 11
Müller, Nicole, 33, 250
multi-tiered transcription, 167f, 177, 238, 239, 240f, 250
multilayered transcriptions, 167f, 177, 225, 250
Murray, James, 160–1, 253

narrow impressionistic phonetic transcription, 178
  auditory system and auditory perception of speech, 180–4, 183f
  comparing transcriptions, 215–18, 216t
  conditions for, 211–15
  consensus transcriptions, 215, 218–20, 219t
  consistency, 194–5
  content of perceptual objects, 198–201
  and instrumental records, 236–40, 240f
  objections to, 206–9
  objects of analysis, 201–4
  perception of speech, 185–91
  phonetic judgements and ascription, 204–6
  pressure-waves, auditory events and sounds, 179–80, 179t
  qualifications for making, 209–11
  speakers and speech data, 221–2

  speech and non-speech processing, 191–4
  stages of, 178–9
  transcription of sounds, 220–1
  veridicality, 195–8
narrow transcription, 119, 133–4, 144–5; see also narrow impressionistic phonetic transcription
Newton, Isaac, 105
nomen, 10
nonsense words, 171–3
notation see also paleotype notation; phonetic notation; phonetic notation charts; pitch notation; proper notation; proto-notation; pseudo-notation; romic notation
Nudds, Mathew, 207

O’Callaghan, Casey, 207
Ochs, E., 176
O’Connor, J. D., 238
Ohala, John J., 199–200
Old Norse, 101, 102
Oller, D. Kimbrough, 131, 237
Olson, David, 11–12, 40, 41
Olson, Kenneth S., 132, 133
Optimality Theory, 149
organic notation, 266
organic-analogical notation, 83
  Lodwick’s analogical notation, 64, 74, 86–8, 87f, 89, 97, 137
  notation for voiced alveolar trill, 90–2, 90f
  Sproat’s analogical notation, 74, 88–90, 89f
  Wilkins’s analogical notation, 74, 83–5, 84f, 88, 105, 137
organic-iconic notation, 74–5
  Helmont’s interpretation of Hebrew letters, 60, 76, 77f
  Korean Hangŭl, 75–6, 75f
  Passy-Jones organic alphabet, 82–3, 82f, 90f, 91
  Sweet’s organic-iconic notation, 80–1f, 80–2, 89, 90f, 92
  Wilkins’s organic-iconic symbols, 60, 62, 77, 78f, 90–1, 90f
  see also Visible Speech notation
Orrmulum, 101, 102–3
orthographic transcription, 32–3
  interpretation of spellings and transcriptions, 33–4
orthography, 266
Orton, Harold, 259–60
ostensive definitions, 20

pairwise variability index (PVI), 169
palaeotype notation, 100, 109–11, 112, 137, 141
palatography, 68, 223, 226–7, 243
Pāṇini, 12
parametric transcription, 141, 163–5, 164f, 266
Passy, Jean, 171, 256
Passy-Jones organic alphabet, 82–3, 82f, 90f, 91
Passy, Paul, 69; see also Passy-Jones organic alphabet
Paulian, Christiane, 120
perceptual domain, 19, 248–9
perceptual objects, 198–201
performance scores, 170–1, 266
  active and passive readings, 175
  nonsense words, 171–3
  spelling pronunciation, 174–5
  transcriptions as prescriptive models, 173–4
periodicity, 17
PETAL speech assessment, 257
phenomenalism, 202–4, 215
phenomenology, 202
Phonautograph, 68
phonemes, 7, 8, 149–50, 156, 185–6
phonemic transcription, 141, 145–6, 148–55, 267
phonetic categories, 60, 61f, 65–6, 65t, 227–8, 238–9, 249
phonetic description, 227–8
phonetic domains, 238–9, 240–3, 241f, 243f
  acoustic domain, 19, 246–7, 246f, 247f
  aerodynamic domain, 18, 245
  articulatory domain, 18, 243–5, 244f
  auditory domain, 19, 247–8
  perceptual domain, 19, 248–9
  phonetic categories as domain-neutral, 249
phonetic models, 20
  content of, 26–8
  descriptive phonetic models, 24–5, 25f, 27–8
  general phonetic models, 24
  pre-theoretical models, 20, 22
  theoretical models, 20, 21f, 26–7, 28
phonetic notation, 73–4, 266
  alphabetic notation and the structure of symbols, 97–134
  analphabetic notation, 92–7
  hierarchical notation, 137–9
  homosymbols, 136
  ordering of components, 134, 135–7
  organic-analogical notation, 83–92
  organic-iconic notation, 74–83
  and phonetic models, 20–4
  role of phonetic theory, 20–2
  status of, 35–6, 35f
phonetic notation charts
  Esling’s elaborated consonant chart, 297f
  ExtIPA chart, 298f

phonetic notation charts (cont.)
  IPA Braille chart, 300–3f
  IPA chart, 295–6f
  VoQS chart, 299f
phonetic prototypes, 20, 23, 204–5, 248
phonetic symbols, 10, 11
  as descriptive models, 24
  integral symbols, 86, 89, 97, 113, 118–20, 133, 136, 226, 238, 259
  proper phonetic symbols, 21, 23, 266
  and speech sounds, 15–20
phonetic taxonomy, 66, 88, 134, 249
phonetic theory, 20–2; see also development of phonetic theory
phonetic transcription, 267
  brackets, 141
  as data reduction-by-analysis, 25–6
  as descriptive phonetic models, 24–5, 25f
  general phonetic transcription, 147–8
  and phonographic writing, 9–14, 10f, 13f
  purpose of, 11
  segmental transcription, 141
phonetics learning and teaching, 256
phonographic processes in writing systems, 38
  acrophonic principle, 40–1
  anti-phonography, 47–8, 71
  diffusion and borrowing of writing systems, 46
  rebus principle, 21, 38–9
  segments, 41–5
  subsegmental analysis, 45–6
  syllabography, 39–40
phonographic writing, 9–14, 10f, 13f
phonography, 5, 6, 69
phonological loop, 203, 213–14
phonological transcription see phonemic transcription
phonotactic analysis, 154
physicalism, 202, 206–8, 241
Pike, Eunice V., 133
Pike, Kenneth L., 18, 153, 162, 162f, 169
  analphabetic notation, 92–3, 95–7, 96t, 137
pitch notation, 126–8, 126f, 168f
Pitman, Isaac, 11, 65, 68, 109
Plaatje, Sol, 126
polysemy, 242–3
potestas, 10–11
Potthoff, W., 124
Pound, Ezra, 178
prescriptive transcriptions, 173–4
pressure-waves, 179–80, 179t
pronunciation, 266
  representation of in writing systems, 37–8
  spelling pronunciation, 174–5
pronunciation-forms, 10
proper notation, 23
proper phonetic symbols, 21, 23, 266
proper phonetic transcription, 25
proto-notation, 22, 23
proto-phonetic transcriptions, 25
proto-symbols, 23, 266
proto-writing, 38
pseudo-notation, 21, 22
pseudo-phonetic symbols, 21, 23, 34, 38, 39, 266
pseudo-transcription, 21, 22, 25
  borrowing of writing systems, 46f
  respelling as, 28–31, 30f
  and spelling reform, 70
  transliteration as, 29–31, 30f
Pulgram, Ernst, 8
Pullum, Geoffrey K., 10, 12
Punjabi, 149
Pye, C. et al., 213

qualia, 240
Quené, Hugo, 184
Quintilian, 53, 71, 174

Rahilly, Joan, 92
Rambaud, Honorat, 77
rationalism, 150, 206, 208–9
Read, Charles et al., 42
rebus principle, 21, 38–9
Reiss, Charles, 188
Remez, Robert E., 189, 190, 193
respelling as pseudo-phonetic transcription, 28–31, 30f
Revell, E. J., 45
rhythm, transcription of, 67, 129, 138–9, 142, 166–9, 225, 250
Rippmann, Walter, 220
Roach, Peter, 255
Robinson, Robert, 31, 55–7, 56f, 59, 65, 65t, 103, 126
romic notation, 111–12
Rosenblum, L. D., 193

Saldana, H. M., 193
Salmon, Vivian, 54
SAMPA notation (Speech Assessment Methods Phonetic Alphabet), 129–30
Sampson, Geoffrey, 5, 6, 7t, 55
Sanskrit, 12, 43, 67–8, 117
Saussure, Ferdinand de, 12, 14, 36n3, 92, 152, 154, 165
schwa, 151, 197
Scragg, D. G., 103
script conversion, 39–40
Scripture, Edward, 207
Scruton, Roger, 203, 207
segments, 15–18, 41–5

Semaan, Khalil I., 220
semasiograms, 38
Semitic languages, 42–3, 258–9
Sequoyah, 108
Shadle, Christine H., 246
Shakespeare, William, 33n10, 142
Shannon, Claude E., 145, 151
Shriberg, Larry D. et al., 202, 214, 218
Sībawayh, 50, 54, 66
Sievers, Eduard, 68, 69
signal-complementary processing, 191, 210
signal-oriented transcriptions, 27, 143, 226, 229
signified (content), 12
signifier (expression), 12
silence, 16
‘silent’ letters, 8
Smith, Sir Thomas, 53, 54, 65t, 102
sociophonetics, 257–8, 261
sounds
  early charts, 51f, 56f, 58f, 61f, 63f, 67
  as perceptual objects, 179–80, 179t
sound–spelling correspondence, 6–9, 7t
Spanish, 6
speaker-oriented transcriptions, 27, 143, 226, 228–9
specific transcriptions, 142, 266
spectrograms, 223, 224, 225f
spectrography, 48, 74, 246
speech and non-speech processing, 191–4
  audiovisual integration, 193
  categorical perception, 192
  duplex perception, 192–3
  signal-complementary processing, 191, 210
speech chain, 242
speech pathology and therapy, 174, 256–7; see also ExtIPA notation
speech perception, 185–91
  auditory enhancement, 188
  auditory integration, 186–7
  auditory perception, 180–4
  auditory scene analysis, 189–91
  dispersion, 187–8
  hyper/hypoarticulation, 188
  levels of awareness, 186, 188–9
  phonemes, 185–6
speech sounds
  vs. analysis of speech sounds, 19–20
  complexity of, 18–19
  as discrete segments, 15–18
spelling, 5, 267
  etymological spellings, 52
  interpretation of spellings, 33
  logography and phonography, 5, 6, 69
  orthographic transcription, 32–4
  and phonetic transcription, 9–14, 10f, 13f
  purpose of, 11
  respelling as pseudo-phonetic transcription, 28–31, 30f
  sound–spelling correspondence, 6–9, 7t
  spoken and written languages as translation equivalents, 14–15, 14f
spelling pronunciation, 174–5
spelling reform, 47, 70–2
Spence, Thomas, 67, 106, 111
Sproat, Amasa D.: analogical notation, 74, 88–90, 89f
Standard Alphabet, 106–9, 258, 259
Stark, Rachel E., 131
status of notation and transcription, 35–6, 35f
Steele, Joshua, 67, 126, 127, 142, 142f, 166–7, 167f
Stevens, Kenneth N., 189
Storm, Johan, 68
Studdert-Kennedy, Michael, 193
subsegmental analysis, 45–6
Sumerian, 15, 40
Sundby, Bertil, 158
Survey of English Dialects (SED), 259–60
Sweet, Henry, 3, 49, 57, 66, 68, 69, 71, 73, 107, 110, 113, 141, 144, 176, 220, 240
  organic-iconic notation, 80–1f, 80–2, 89, 90f, 92
  romic notation, 111–12
Swift, Jonathan, 20
syllabary, 266
syllables, 40–1, 43–4
syllabograms, 7, 7t, 39, 266
syllabography, 39–40
symbols, arbitrariness of, 3, 14; see also phonetic symbols; proto-symbols; pseudo-phonetic symbols
synchronic grammars, 165
synchysis, 172–3
syntax, 41
systematic transcriptions, 145–6

tempo, 26, 33, 184, 225
Tench, Paul, 162, 163, 164, 164f
Thai, 127, 128
third party transcriptions, 175
Thrax, Dionysius, 54, 172
tmesis, 172–3
ToBI notation (Tone and Break Indices), 128, 168f, 169
Trager, George L., 141, 153
Traill, Anthony, 148
transcription alignment, 16, 163, 166f, 215–17, 224–5, 250, 263
transcription types, 141
  allophonic transcription, 155–7
  archiphonemic transcription, 157–8

transcription types (cont.)
  broad and narrow, 119, 133–4, 144–5
  dynamic transcription, 161–9, 167f, 168f
  exclusive and inclusive, 160–1
  general phonetic transcription, 147–8
  generic transcriptions, 142–3
  instrument-dependent and instrument-independent, 170
  laying out transcriptions, 175–7
  morphophonemic transcription, 158–60
  multi-tiered and multilayered, 27, 250
  orientation of transcriptions, 143
  as performance scores, 170–5
  phonemic transcription, 141, 145–6, 148–55, 267
  specific transcriptions, 142, 266
  systematic and impressionistic, 143, 145–7
  third party transcriptions, 175
  see also narrow impressionistic phonetic transcription; orthographic transcription; phonemic transcription; pseudo-transcription
transience, 16–17
transliteration, 22
  definition, 29, 267
  as pseudo-phonetic transcription, 29–31, 30f
  as respelling, 30f
Trout, J. D., 189
Trubetzkoy, Nikolai S., 8–9, 42, 157–8, 165
Tucker, Abraham, 106, 178
Turing machine, 149, 150

ultrasound images, 196, 223, 243
Upton, Clive et al., 253
uses of phonetic transcription
  accent studies, 257–8, 260–1
  conversation analysis (CA), 261–3
  dialectology, 257–60
  dictionaries, 251–3
  foreign language learning and teaching, 253–5
  forensic phonetics, 263–4
  phonetics learning and teaching, 256
  sociophonetics, 257–8, 261
  speech pathology and therapy, 256–7

Vaissière, Jacqueline, 118, 134, 222
Venditti, Jennifer J., 166
Venezky, Richard L., 8
ventriloquial speech, 200, 201
Vieregge, W. H. et al., 217
Viëtor, Wilhelm, 68
Vihman, Marilyn May, 131
Vineis, Edoardo, 173
Visible Speech notation, 3, 55, 56f, 57, 74, 79–80, 79f, 82, 89, 90f, 91, 97, 114, 127, 220
vocal tract, 51, 51f, 57, 60, 79f
voice onset time (VOT), 146, 165, 181, 183, 192, 195, 238–9, 248
voice quality, 26, 33, 128–9, 191, 220, 235f, 257, 264, 299f
voicing, 54, 62–3, 64, 79
VoQS notation (Voice Quality Symbols), 299f
vowel quality and quantity, 152–3
vowel transcription, 220–1

Walker, Gareth, 262, 263
Walker, John, 67
Wallis, John, 57–9, 58f, 65, 65t, 104, 137, 139n4
Warren, Richard M., 18, 195, 206
Watt, Dominic, 246
Wellisch, H. H., 39
Wells, John C., 97, 252, 254, 260–1
Weninger, Stefan et al., 259
Whitney, W. D., 68
Wilkins, John, 57, 59, 139n4, 220
  analogical notation, 74, 83–5, 84f, 88, 105, 137
  organic alphabet, 60, 62, 77, 78f, 90–1, 90f
  phonetic categories, 60, 61f, 65, 65t
Wright, Susan, 197
writing systems, 267
  diffusion and borrowing, 46
  logography and phonography, 5, 6
  notation classification, 9–11, 10f
  phonographic processes in, 38–48
  representation of pronunciation in, 37–8
  sound–spelling correspondence, 6–9, 7t
  and speech, 11–14, 13f

X-SAMPA notation (Extended Speech Assessment Methods Phonetic Alphabet), 128, 129–30
xenography, 14–15, 47

Yeomans, John, 105
