
1 The nature of speech

1.1 More than a word

'Darling?', came an anxious voice as he edged open the unlocked door to the darkened hotel room. He knew at once it was not his wife's voice. It was the voice of a younger woman; an Australian, he thought.

Sadly, perhaps, the rest of this book will not reveal whether our hero was a leading scientist about to be seduced by a beautiful spy, or just the kind of average fellow who is prone to get out of the hotel lift at the wrong floor without noticing and blunder into someone else's room. Instead, the book will concentrate on an ultimately more intriguing mystery: how speech conveys information. This, broadly, is the subject matter of the discipline of phonetics.

In the fictional example above the unseen woman spoke only one word. It is clear, however, that as she did so a number of quite different kinds of information were conveyed. The hearer identified a particular word ('darling') of the language he shares with the speaker. As indicated by the question mark, he interpreted it to be a query of some kind. Something about the utterance told the hearer that the speaker was anxious; perhaps it was spoken quickly, quietly, and a little breathily. He knew immediately from the voice that the speaker was not his wife; nor, presumably, anyone else he knew very well. But even so he was able to infer, with at least a fair degree of confidence, further facts about the unknown speaker: her sex, her age, and her geographical background. Unlike the word, and its function as a query, these other kinds of information cannot be directly represented in the way an utterance is written, and so an author has to resort to commentary to convey them.

The main point, then, is that any utterance conveys a number of distinct types of information. Two further points need to be made. Firstly, not all types of information conveyed by an utterance are equally intended by the speaker. Speakers clearly do intend to say particular words appropriate to given contexts; but it would be odd to suggest that they normally intend to sound like a tired young woman, or a large man with a cold. The pair of terms 'communicative' and 'informative' have been used for this kind of distinction (Lyons 1977:33). Any aspect of an utterance is informative if it potentially makes the hearer aware of something. Only those aspects which are intended by the speaker to be informative are communicative.

Secondly, whilst the various types of information are distinct, they are all conveyed by a single complex speech signal. The speech signal is the physical link between a speaker and a hearer (the only one if they are out of sight and touch of each other). It consists of very rapid pressure variations in the air, caused by the speaker's speech organs, and sensed by the hearer's ear. Although the speech signal can be analysed in terms of a number of separate acoustic dimensions, corresponding to what we perceive as, for instance, pitch, loudness, and rhythm, it is far from being the case that each type of information will be carried by its own acoustic dimension. Any acoustic dimension will help to carry a variety of information. The pitch of the speech signal, for instance, might tell a listener that the utterance is a question, and that the speaker is a man, and that he is bored. Humans are very skilled at unravelling the information in the speech signal, but the difficulty of building a machine that will replicate some of this skill (an automatic speech recogniser, for example) demonstrates the complexity of the way in which information is represented in the speech signal.

1.2 Information carried by the speech signal

A speaker is often described as communicating a 'message'. Since a speaker in fact uses the speech signal to convey a variety of information, it is better to think of the speaker having not a simple message but a complex communicative intent. This is made up of a number of distinct kinds of information.

Cognitive information is essentially factual, or propositional; it consists of things we know, or could know, as opposed, for instance, to how we feel. Words, and their combination into phrases and sentences, are the primary vehicle of cognitive information, and it is the kind of information which writing copes with best.

Affective information has to do with a speaker's feelings and attitudes. In everyday terms a speaker's 'tone of voice' is one way of conveying affective information, as in 'I didn't mind what he said to me, it was his patronising tone which I objected to'. Choice of words, too, can be important.

Social information in speech might seem to be something the speaker does not choose, and therefore to be merely informative rather than communicative. We think of speakers having an accent which indicates they belong to particular geographically and socially defined communities, and generally see this as an unalterable attribute of the person. But in fact most speakers vary their way of speaking according to the situation and their addressee(s). A speaker may also adjust his or her accent in the direction of that of another person, probably as a way of indicating friendliness or solidarity with that person.

Self-presentational information concerns the speaker's self-image. A speaker who wishes to present an authoritative, knowledgeable persona to the world may adopt a confident tone of voice (probably relatively loud, and involving a moderate degree of muscular tension and clear, precise pronunciation).

Finally, though this list does not necessarily exhaust the kinds of information a speaker may choose to convey in an utterance, there is regulative information. This concerns the management of a spoken interaction. For a conversation to proceed smoothly there have to be some 'traffic rules', to avoid the counterproductive situation of both participants speaking simultaneously or being silent simultaneously. The participant who is speaking will encode signals in the utterance to communicate that he or she is in full flow and shouldn't be interrupted (maybe by speaking louder and speeding up a little), or is nearing the end of his or her conversational turn (by lowering the voice and slowing).

In contrast to the above types of information, which speakers may intend to communicate, are those which they convey willy-nilly. The latter kinds of information leave their trace in the speech signal without intent on the part of the speaker, and are sometimes called indexical because they serve as an index or indicator of aspects of an individual (e.g. Abercrombie 1967:6). Such aspects include the speaker's social background, age, sex, physique, psychological state, and health. The routes by which these various kinds of information leave their traces in the speech signal will be discussed in section 1.4.

1.3 The speech machine

We have considered various kinds of information which originate within the speaker, and an external speech signal in which that information is conveyed. The machine which produces this signal is made up of two parts, the vocal mechanism and the linguistic mechanism. In a sense these are rather like a computer and its software. The vocal mechanism is the physical device concerned with speaking, while the linguistic mechanism is the software which controls it.

The vocal mechanism is shown schematically in Fig. 1.1, which is like an X-ray picture of someone facing to the left. The vocal mechanism consists most obviously of the speaker's mouth, throat, nose, larynx, and associated structures such as the tongue and the vocal cords. But it also consists of the lungs and the muscles which control breathing, since speech requires air; and, very importantly, those parts of the brain and nervous system which control the rest of the vocal mechanism.

Fig. 1.1 Sagittal section of the vocal mechanism

In the production of speech, broadly, air is expelled in a controlled way from the lungs, and the airstream is interfered with at various points. These various kinds of interference create acoustic energy (sound) of different kinds, which is further modified by the shape of the vocal tract (the air passage through the mouth and nose). These processes will be dealt with in detail in Chapter 2.

The linguistic mechanism includes the speaker's language, but goes beyond what is most commonly thought of as a language. The linguistic mechanism consists of the set of conventions which the speaker shares with the relevant language community. These conventions range over aspects such as the following (assuming, say, that the relevant language community is English speaking and in the South East of England): that the word for a canine quadruped is 'dog'; that adjectives come before nouns ('a big dog', not 'a dog big'); that a rising pitch towards the end of an utterance often signals a question; that shifting the pronunciation of the word 'time' in the direction of 'toym' is less educated, while shifting it in the direction of 'tame' is posh or affected; and that the use of a whispery voice can mark what is being said as in some way confidential. The last two of these conventions would be excluded by many definitions of a language, but are an integral part of the linguistic mechanism as a whole.

Linguistics gives us a more structured way of looking at some of these conventions. Fig. 1.2 shows several components of the linguistic mechanism. These could be thought of as a set of resources at the speaker's disposal.

The lexicon is the mental dictionary shared by speakers of a language, linking meanings and pronunciations, and including grammatical information. Arguably it is organised on the basis of morphemes, meaningful sub-word elements such as 'point', '-ing', 'aim', '-less', '-ly', and a set of rules for combining them into words such as 'aiming' and 'pointlessly'.

Syntax is a set of conventions governing the combination of words (grammar in its most familiar sense). It is responsible for the sense the English speaker has, for instance, that 'pass me the butter' is a usable combination of words, and 'me butter the pass' is not.

Phonology is a set of conventions specifying how a language organises sound. The fact that English has a 'th' sound as in 'thin' and French does not, or that 'palm' and 'farm' rhyme in some varieties of English and not in others, are two small examples of phonological differences between languages or varieties.

Prosody provides a set of conventionalised patterns of pitch and timing which can signal the organisation of the words of an utterance, refine the meaning of the utterance, and organise sounds within words. So in 'You're broke again? You never have any money!' the punctuation partially captures the prosody (the utterance is organised into two parts; the first has a questioning role despite its declarative syntax, and the second is spoken with marked emphasis); and the bold type conveys a particular prominence lent to 'never' by prosodic features. Not indicated in the written form is the organisation of 'never' into a more prominent part 'nev' and a less prominent part '-er'.

Tone of voice lies outside the usual definition of language (see 1.5), and provides for the communication of additional information in ways such as loudness, voice quality, and whisper. Core language relies on mapping meaning via discrete abstract categories. Tone of voice involves a more direct signalling system where gradual changes in meaning are mapped onto gradual changes in phonetic signals. Increasing anger may correlate gradiently with increasing loudness. Likewise, different pitch ranges may express a continuum of involvement or enthusiasm across utterances of the words 'Oh, that's great'. The categorical and gradient elements of the linguistic mechanism are represented in Fig. 1.2 by the two cylinders abutting over prosody. This acknowledges that part of prosody, for instance intonation pitch range, is not categorical.

Fig. 1.2 The linguistic mechanism

There must be a point of contact between the linguistic mechanism and the vocal mechanism. In the view adopted here, that point of contact is the phonetic plan of an utterance. From the speaker's point of view this is a specification of all the sound properties which the vocal mechanism will have to achieve during the utterance. It is also likely that the listener will have to derive something similar to the phonetic plan as a stage in interpreting the utterance. The exact nature of the phonetic plan is a difficult issue, and depends on assumptions about how speech is produced and perceived.

1.4 The mapping of information onto the speech signal

This section combines the conceptualisation of the speech machine developed in section 1.3 with the different sources of information discussed in section 1.2. The purpose is to show the variety of routes by which information gets into, or is mapped onto, the speech signal. Fig. 1.3 gives an overview of the mapping process.

Fig. 1.3 Overview of the mapping of communicative intent

1.4.1 The encoding of communicative intent

Communicative intent is shown at the top of Fig. 1.4. Its mapping onto the speech signal is mediated by the linguistic mechanism. The linguistic mechanism can be thought of as defining a kind of code shared between speakers of a language, and the process of mapping communicative intent onto the speech signal as encoding. There is not, however, a simple one-to-one relation between aspects of communicative intent and the distinct resources of the linguistic mechanism. Cognitive information, for example, is not mapped exclusively through lexical choices, and affective information is not conveyed solely through choices of tone of voice.

Fig. 1.4 Mapping of communicative intent onto the linguistic mechanism

Communicating cognitive information, the most factual, message-like element behind speaking, depends on selecting appropriate words, combining them into grammatical structures, choosing the right prosody (e.g. question or statement), and, less obviously, using the right tone of voice (it is possible to override the apparent meaning of an utterance by using an ironic or sarcastic tone of voice). Affective communicative intent (the speaker's attitude) can likewise be conveyed in multiple ways: by choices of prosody and tone of voice certainly, but also by the words chosen, and perhaps by syntax (the lines are not shown in Fig. 1.4 to avoid a visual cat's cradle). Social intent may affect any of the four resources, since finding a way of speaking appropriate to a particular social setting may involve the precise variety of a language chosen, and a particular tone of voice. Self-presentation may depend on choosing the right words, opting for more or less complex syntax, and making phonological and tone-of-voice choices. Regulation of an interaction perhaps recruits fewest resources, the completeness or incompleteness of a turn being communicated to interlocutors mainly by prosodic choices and tone of voice.

1.4.2 The imprinting of indexical factors

It would be less appropriate to talk of indexical factors being encoded, as there is no intention on the part of the speaker, and no obvious code. The metaphor used by Laver (1994:20-21) is that of a handworker producing artifacts, and leaving traces of the apparatus used to produce the artifact and of his or her personal style. Both the apparatus and the style can leave what will be called here their imprint. A cast metal object might have a detectable seam where two halves of the mould in which it was cast joined, and a particular detail of working in its finish characteristic of the individual who worked it. Indexical factors carried in the speech signal will be regarded as the result of a similar kind of imprinting. At the left of Fig. 1.5 indexical factors are shown divided into two overlapping sets, according to whether their imprint is left mainly via the linguistic mechanism or the vocal mechanism.

Fig. 1.5 Imprinting of indexical factors on speech

Consider first two extreme cases. We can tell social indexical information, including geographical, from a person's dialect or accent, as did our hero in section 1.1 when he guessed 'Australian'. A dialect consists in features of the linguistic mechanism specific to a given geographical and/or social group. 'Dialect' is often taken to refer more broadly to any linguistic resource, including for instance grammar ('I don't know nothing about it' is grammatical in many dialects of English) and vocabulary ('to laik' means 'to play' in many parts of the North of England), whilst 'accent' refers specifically to regular pronunciation differences, such as the use of a particular set of vowels, or the use of a glottal stop for certain consonants. Admittedly it was pointed out above that people have some socially oriented communicative choices in how they speak, but for most people such choices only cover a tiny part of the total range of variation which exists in a particular language, and so individuals are likely to reveal themselves reliably as, for instance, a Texan, or a middle class Liverpudlian. It is fairly difficult, however, to imagine how such indexical information would have an effect directly on the vocal mechanism.

Contrast the case of a speaker's health. A simple cold can have a drastic effect on the state of the vocal apparatus: a blocked nose makes it hard to produce words like 'man' properly, which contain nasal consonants, and inflammation of the vocal cords makes the whole voice sound croaky. In the longer term, persistent hoarseness can be a cue to serious diseases of the larynx such as cancer, and in the short term, normal tiredness will also be reflected in aspects of a person's voice. Similarly psychological state, such as momentary stress or longer term conditions such as depression, may also leave its imprint on the speech signal as a result of biochemical effects on the performance of the vocal mechanism.
On the other hand a person's health or psychological state would not be encoded through the linguistic mechanism (utterances such as 'I've got a throat like sandpaper' or 'I'm really depressed' encode a cognitive analysis of the states giving rise to the indexical information, not the indexical information itself).

Between these extremes there will be indexical information which the listener may be able to glean both from the speaker's linguistic resources and from acoustic effects imprinted on the speech wave directly by the vocal tract. Age, for instance, has direct effects on the physiology of the vocal tract, including a lowering of the larynx toward middle age which results in a deepening of the voice, and a hardening of the vocal cords in old age which contributes to a very old person's characteristically quavery voice. But age may also be indicated by aspects of a person's linguistic mechanism, for instance the use of particular words, such as 'wireless' rather than 'radio'; the use of slang expressions, which have a notoriously short life-span (e.g. 'groovy'; 'far out'); and, of more interest to phonetics, the use of particular sounds and pronunciations. It is less apparent that pronunciation, as opposed to words and expressions, changes within a lifetime; but there is no doubt that it does. Informally we may be aware of this when listening to sound recordings (e.g. in films) from some decades ago. Speakers using an educated (i.e. prestige) pronunciation of South East England were much more likely to pronounce 'off' as 'awf' (as in 'awful'), and much less likely to use glottal stops (see Chapter N), than their equivalents today. Speakers may modify their pronunciations to keep up with the developing trends of their speech community, but in general they get left behind enough for pronunciation to be informative about their age.

A speaker's physique is frequently reflected in the speech signal. Large objects have lower natural pitches (compare a violin and a cello). Larger vocal tracts have lower resonances, and larger vocal cords vibrate more slowly. Since there is a tendency for size of vocal tract and vocal cords to correlate with size of person, we have a better than chance ability to guess which of two speakers whose voices we hear is larger. Physique leaves its imprint on the speech signal directly through the vocal tract.

Sex might be regarded as a special case of physique, and it can be inferred with fair reliability. Apart from men's and women's differing (though overlapping) ranges of vocal tract and vocal cord sizes, there are also differences in the proportions of the vocal tract (concerning the pharynx, which is proportionately longer in men) which may help the cuing of sex. However, there are also said to be languages where women speak a different dialect, either in terms of pronunciation or grammar, and there are well attested differences in pronunciation trends between the sexes in English (see e.g. Trudgill 1974:84ff). If it is truly the case that men and women have different dialects, rather than merely being free to make choices from a shared linguistic system, then sex is imprinted not only through the vocal mechanism but also through the linguistic mechanism.
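The remark above that larger vocal tracts have lower resonances can be illustrated with a rough calculation. The sketch below is not taken from this chapter: it assumes the common textbook idealisation of the vocal tract as a uniform tube closed at one end (the vocal cords) and open at the other (the lips), whose resonances fall at odd multiples of c / 4L. The speed of sound and the two tube lengths are illustrative assumptions, and the function name is hypothetical.

```python
# Minimal sketch: resonances of an idealised, uniform vocal tract.
# Assumption (not from the text): the tract behaves like a tube closed at one
# end and open at the other, so resonance n lies at (2n - 1) * c / (4 * L).

SPEED_OF_SOUND_M_PER_S = 350.0  # assumed value for warm, moist air


def tube_resonances(length_m, count=3, speed=SPEED_OF_SOUND_M_PER_S):
    """Return the first `count` resonant frequencies (Hz) of the tube."""
    if length_m <= 0:
        raise ValueError("tube length must be positive")
    return [(2 * n - 1) * speed / (4.0 * length_m) for n in range(1, count + 1)]


if __name__ == "__main__":
    # Both lengths are illustrative assumptions, not measurements.
    for label, length_m in [("shorter tract (0.140 m)", 0.140),
                            ("longer tract (0.175 m)", 0.175)]:
        rounded = [round(f) for f in tube_resonances(length_m)]
        print(label, rounded)
```

Under these assumptions every resonance of the longer tube is lower than the corresponding resonance of the shorter one, which is the direction of the effect described in the text. Real vocal tracts are not uniform tubes, so the figures indicate a trend only.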

1.4.3 Speaker identity and other problems

This overview of the mapping of information onto the speech signal should be seen as a suggestive outline rather than a definitive analysis. Where, for instance, does personality fit in? There is evidence that personality traits correlate with certain features of speech, for instance that extroversion is associated with greater loudness (cf. Scherer 1979:191), but is this a matter of self-presentation, chosen by the speaker, or is it purely indexical, determined in a biological way? Nor are the boundaries between categories as clear-cut as the boxes in Fig. 1.4 would suggest. Adjusting one's way of speaking to make it more like that of an addressee may involve changes in accent, apparently of a social kind; but it may be perceived as a cue to affective information, equivalent to the use of a friendly voice quality and intonation.

Another question is how accurate hearers are at utilising information carried by the speech signal. Undoubtedly many of the factors discussed are often inferred quite inaccurately. We can be surprised to discover the real age of someone who sounded quite young over the phone; we can inadvertently butt in to another person's conversational turn thinking they have signalled the end of their own turn; and of course we can even misunderstand the cognitive content of speech by mishearing a word or misparsing a sentence.

A surprising omission, perhaps, from the list of indexical factors is identity. In everyday life, particularly over the telephone, we successfully identify a person just from speech, and techniques for identifying speakers for forensic and other purposes (see Chapter N) exploit information about identity in the speech signal. The omission reflects the problems surrounding the concept of identity. It can mean the biological entity constituting a person (this is the sense which is of relevance to forensic applications), or for instance a person's membership of certain subgroups of the population. The biological entity is what a fingerprint defines, with very few exceptions.
But the indirectness of the relationship between the biological person and the speech signal, and the multiplicity of information mapped onto it, make it questionable whether much information about the biological individual is available. An alternative is that our sense of hearing a particular person is actually derived from other indexical factors. If one's uncle Herbert is a large middle-aged man from Birmingham with chronic hoarseness, then those factors in themselves suffice for an identification when his habitual Sunday morning telephone call arrives.

The possibilities of voice disguise and mimicry highlight further the limitations of a model such as the one in Fig. 1.3. Some issues can be dealt with, albeit imperfectly. What kind of communicative intent is involved? Perhaps a special case of self-presentation. What linguistic resources are being manipulated? Presumably all four may be, though we tend in particular to think of disguise and mimicry primarily in terms of phonology and tone of voice (adopting a different accent, and speaking loudly and fast, for instance). But the model does not really allow for that part of disguise or mimicry which consists in distorting one's vocal mechanism to make it sound like that of someone else (known in the case of mimicry or unknown in the case of disguise). This is not part of the linguistic mechanism (the shared conventions of the language community hardly include the recipe for sounding like each person in that community), and the model fails to provide a direct input from intention to the vocal mechanism.

So in this respect, and many others, the model is imperfect. Nonetheless it does show how the speech signal at any given moment is determined by a wide variety of factors, and how it is potentially informative to a hearer in many different ways, intended and unintended by the speaker. The next section compares the ways in which different kinds of information are carried by the speech signal.

1.5 Gradience, discreteness, and componentiality

Imagine someone asking 'Have you ever air-mailed a miniskirt to Iceland?' Imagine, too, that the speaker has a cold. As a result of the utterance the hearer should have gleaned two very different kinds of information. The first comprises the speaker's communicative intent, the cognitive content of which is almost certainly novel to the hearer. The second comprises the speaker's state of health, about which the speaker has unintentionally informed the hearer. The way the speech signal carries these two kinds of information contrasts in a number of respects.

As discussed in section 1.4, the cold imprints itself rather directly on the speech signal through its effect on the vocal mechanism: the blocked nose and inflamed vocal cords. 'Miniskirt' will sound a bit like 'bidiskirt' because of the blocked nose, and the whole utterance may sound rather as if it came from a talking frog. The imprinting can vary, but only in the sense that the severity of the cold will be reflected in the degree of distortion of the speaker's normal voice. The imprinting of the cold on the speech signal is gradient, that is, it varies continuously.

If the listener chose to respond not to the communicative intent but to the indexical information, the response, depending on the severity of the cold, might be 'You've got a cold', or 'You've got a bad cold', or 'You've got a terrible cold', or even 'You've got a really terrible cold'. This illustrates that the language code is not gradient but works in terms of choices which are discrete. The respondent either uses an adjective or not, chooses the word 'bad' or 'terrible' or some other word, and so on. This discreteness is central to how language works. It is as though language partitions our mental experience, and allocates a label or symbol to stand for each partition.

Let's imagine now a simplified language, in which each partition is a complete meaning, and the symbol standing for each meaning is a unique, simple sound. The meaning 'I'm exhausted' might be conveyed by a long 'fffff'; 'You bore me' by 'ssssss'; 'Look at this' by 'ooooo', and so on. This situation is schematised below, where geometric symbols are used instead of sound symbols.
[Diagram: meanings M1, M2, M3 and M4, each mapped directly onto its own symbol (sound)]

But what about 'Have you ever air-mailed a miniskirt to Iceland?' Despite the versatility of our vocal tracts, we would soon run out of adequately distinct (and memorable) noises to convey the infinite variety of messages we might need to convey. Not surprisingly, no human language uses this kind of direct mapping of messages onto sound. Instead, all human languages exhibit componentiality: they construct larger units of communication out of smaller discrete components. At one level, roughly speaking, areas of our mental experience are mapped onto a finite number of words (the vocabulary of the language). At another level, these words are mapped onto a comparatively small set (often fewer than 50) of meaningless sound units. This is represented schematically as follows:
[Diagram: meanings M1, M2, M3, M4 ... M? mapped onto words, and words in turn mapped onto symbols (sounds)]

The meaningless sound units are often called phonemes and are what are represented, albeit often inconsistently, by letters in alphabetic writing. So for instance in 'pin', 'tin', 'kin', we have three different words composed of the same sound units except for the first one, which crucially differentiates them. By adding 's' to the beginning of each of these sequences, three new words are created ('stin', in fact, is not an existing word of English, but is one which could be adopted if needed). The crucial fact that all human languages associate meanings with abstract units (words or morphemes) and designate these units by sequences of discrete meaningless sound elements is sometimes known as the dual structure of language. The meaningful units can then be combined into an infinite number of more complex sequential structures (phrases, sentences) according to the syntax of the language in order to convey a limitless set of complex meanings, including that of the novel and improbable utterance 'Have you ever air-mailed a miniskirt to Iceland?'. More precisely, dual structure may be taken to refer to two levels of structure, the grammatical level and the phonological level (Laver 1994:18). Meaningless sound elements at the phonological level serve as building blocks which designate meaningful structures at the grammatical level.

Componentiality and sequential structure thus circumvent the problems that would be encountered if a distinct sound had to be found for every message we wanted to communicate, and they are taken here as definitional properties of language. But it is not the case that the speaker has no control over non-componential, gradient aspects of speech. The listener could, for instance, emphasise the severity of the respiratory ailment perceived in the speaker by saying 'You've got a REALLY TERRIBLE COLD!'; that is, by giving the words in capitals extra pitch variation and length, and perhaps speaking them with a breathy voice, as if gasping at the seriousness of the situation. These uses of extra pitch and length, and of breathy voice, do not share with the units of language the property of being discrete elements which can be structured sequentially, and so are not regarded as part of language itself. They are considered here to be choices from a linguistic resource of tone of voice lying outside the language system, but closely allied to it.

Tone of voice may mimic indexical effects. A person genuinely shocked may undergo physiological changes which make an extended pitch range and a breathy voice an automatic consequence. But in the present example the vocal effects are deliberately chosen by the speaker, and are part of the conventions shared by the speaker and the hearer. Tone of voice perhaps constitutes an exploitation and incorporation into the linguistic mechanism of the kind of gradual vocal effects which naturally inform of certain indexical facts.
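The advantage of componentiality discussed in this section can be made concrete with some simple counting. The following is a minimal sketch, not part of the original argument: the function names are hypothetical, the inventory size of 40 units and the maximum length of 4 are illustrative assumptions, and letters stand in loosely for sound units. The count is an upper bound, since it ignores the restrictions every real language places on which sequences are permissible.

```python
# Minimal sketch: one-sound-per-message versus componential coding.
# All names and numbers here are illustrative assumptions.


def holistic_capacity(distinct_sounds):
    """One unanalysable sound per message: capacity equals the sound count."""
    return distinct_sounds


def componential_capacity(inventory_size, max_length):
    """Count the sequences of 1..max_length units drawn from the inventory.

    This is an upper bound: it ignores restrictions on possible sequences.
    """
    return sum(inventory_size ** k for k in range(1, max_length + 1))


def build_forms(onsets, rest):
    """Build forms that differ only in their first unit(s), as in pin/tin/kin."""
    return [onset + rest for onset in onsets]


if __name__ == "__main__":
    print(holistic_capacity(40))          # 40 messages at most
    print(componential_capacity(40, 4))   # sequences of up to four units
    print(build_forms(["p", "t", "k"], "in"))
    print(build_forms(["sp", "st", "sk"], "in"))
```

Even with short sequences, a small inventory of meaningless units yields far more distinct forms than a scheme with one sound per message, and the second call to `build_forms` shows how unused but possible forms such as 'stin' fall out of the same mechanism.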

16

Sameness and variation Sameness and variation A language, as an abstract code, can be embodied in more than one medium or physical carrier. Speech is one, and writing another. The written medium, as exemplified in the words on this page, clearly reflects the componentiality of language. Words are separated by spaces, and made up of letters which are discrete elements. Each occurrence or token of a letter is an exact replica of the other occcurences of its type. So for instance in big dogs scare her, the spaces reflect the linguistic structure of an utterance consisting of four words, and the two tokens of the letter <s> are identical, as are the two of <g>. The structure of the language system is closely mirrored in its realisation in the printed medium. In speech, there is a potential conflict between the discrete, componential structure of language, and the natural behaviour of the vocal mechanism. Like a gymnast, the vocal mechanism does not, indeed could not, move abruptly from one static posture to another. The laws of physics, governing the velocity, acceleration, and momentum of the gymnasts body, or the vocal organs, conspire against such abruptness. The gymnast, and the vocal mechanism, produce a flowing movement. This can be seen reflected in the spectrogram of the phrase a real worry in Fig. 1.6 [p.20]. A spectrogram is a way of displaying an acoustic analysis of speech produced by a computer or other device. In the picture, time runs from left to right, so the start of the utterance is at the left, while the vertical axis shows the breakdown of the complex speech signal into the different pitches or frequencies which make it up. The darker the pattern, the more sound there is at a particular frequency. Note that there are no breaks between the words, and there are no successive static patterns corresponding to successive sounds. 
This is a carefully selected example, but it vividly demonstrates the fact that in the spoken medium the discrete units of language are realised in a continuous and flowing event. In the written medium of language, it is fluent (even sloppy) handwriting rather than printing or typescript which provides a slightly closer analogy with speech. In a handwritten realisation of 'big dogs scare her'

(1.6) [Figure: handwritten realisation of 'big dogs scare her']

although the words are still separated (unlike in speech), the letters within the words are not. More interestingly, the two occurrences of <g> differ from each other, as do the two of <s>. In the second <g>, the descender loops back up again across itself on the way to the following letter, and in the second <s>, there is no diagonal rising stroke at the start of the letter, as seen below:

[Figure: the two handwritten tokens of <g> and of <s>]

The marks on the paper corresponding to a particular letter-type vary according to the context in which they occur. Everyone's handwriting is different, but most will exhibit equivalent examples of contextual variation. Some of the variation can be interpreted as short-cuts, but in other cases the contextual variation may be purely a matter of habit or convention.

Something very similar is true of speech. A sound will vary in its pronunciation according to adjacent sounds and its position. Both instances of <k> in khaki (the <h> in the spelling is irrelevant to the English pronunciation) require the body of the tongue to make a closure against the roof of the mouth, but the first will be further back in the mouth than the second. This is in keeping with the fact that the first vowel requires the body of the tongue to be near the back of the mouth, and the second requires it to be raised at the front of the mouth. The difference can be felt best if the word is said silently. If the word is whispered, emphasising the explosion, or burst in phonetic terms, of each <k>, the second burst will be heard as higher in pitch. This is because the mouth cavity in front of the release is smaller for the second <k>, and a smaller cavity has higher resonant frequencies.
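The link between cavity size and burst pitch can be illustrated with a rough calculation. The sketch below uses a standard idealisation that is not part of the text above: the cavity in front of the closure is treated as a uniform tube closed at one end, whose lowest resonance is the speed of sound divided by four times the tube length. The two cavity lengths are hypothetical round numbers chosen only to show the direction of the effect, and the function name is likewise an assumption.

```python
# Assumed idealisation: the cavity in front of the /k/ closure behaves like
# a uniform tube closed at one end (a quarter-wavelength resonator).
SPEED_OF_SOUND_CM_PER_S = 35000.0  # approximate value for warm, moist air


def quarter_wave_resonance_hz(cavity_length_cm: float) -> float:
    """Lowest resonance of a tube closed at one end: f = c / (4 * L)."""
    if cavity_length_cm <= 0:
        raise ValueError("cavity length must be positive")
    return SPEED_OF_SOUND_CM_PER_S / (4.0 * cavity_length_cm)


# Hypothetical front-cavity lengths, not measurements:
# a longer cavity for the back <k>, a shorter one for the front <k>.
back_k_hz = quarter_wave_resonance_hz(5.0)
front_k_hz = quarter_wave_resonance_hz(3.0)

print(f"back <k>:  {back_k_hz:.0f} Hz")
print(f"front <k>: {front_k_hz:.0f} Hz")
```

Whatever lengths are chosen, the shorter cavity always yields the higher resonance, which is the relationship the whispered khaki demonstration relies on.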

Phonology

The situation would be chaotic if such variation were random. We could never be sure which linguistic unit a particular burst was representing. What is needed is a set of conventions which express what variation is allowable in the representatives of a particular unit, and what is not. We can consider this with an abstract example using geometric shapes, which we could imagine to be elements in a visual code. The shape on the left of the arrow represents the abstract (or ideal) element of the code. The arrow means 'the following ways of writing the symbol are acceptable', while the crossed-out arrow means 'these variants are not acceptable'.

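(1.7) [Figure: an abstract geometric shape, followed by an arrow and its acceptable variants, and by a crossed-out arrow and its unacceptable variants]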

In fact, it is neither necessary nor feasible to list what is unacceptable; instead we just assume that everything not mentioned in the positive part of the rule is not acceptable. Much the same sort of rules can be expressed for sounds. The following rules, which simplify the situation, express what the khaki case exemplified:

/k/ → [k̠] / __ [ɑ]
/k/ → [k̟] / __ [i]
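Read as a procedure, the two rules amount to choosing a realisation of /k/ according to the vowel that follows. The following is a minimal sketch in Python, not a description of any established tool: the function names are hypothetical, phonemes are represented as plain strings, and only the two environments named in the rules are covered. Any other context is left as an unmodified [k], which is an assumption of the sketch.

```python
# Combining diacritics placed under the symbol.
RETRACTED = "\u0320"  # combining minus sign below: further back
ADVANCED = "\u031F"   # combining plus sign below: further forward

# Only the two environments mentioned in the rules are listed here.
K_ALLOPHONE_BY_FOLLOWING_VOWEL = {
    "ɑ": "k" + RETRACTED,
    "i": "k" + ADVANCED,
}


def realise_k(following: str) -> str:
    """Pick an allophone of /k/ from the following vowel (hypothetical helper)."""
    # Assumption: contexts not covered by the rules yield a plain [k].
    return K_ALLOPHONE_BY_FOLLOWING_VOWEL.get(following, "k")


def realise(phonemes: list[str]) -> list[str]:
    """Map a sequence of phonemes to phones, varying only /k/."""
    phones = []
    for index, phoneme in enumerate(phonemes):
        following = phonemes[index + 1] if index + 1 < len(phonemes) else ""
        phones.append(realise_k(following) if phoneme == "k" else phoneme)
    return phones


khaki = realise(["k", "ɑ", "k", "i"])
print("[" + "".join(khaki) + "]")
```

The input contains the same unit /k/ twice, while the output contains two different phones. A table with only two entries is of course a simplification of what is really a continuous range of tongue positions.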

These rules mean that the linguistic unit or phoneme /k/ can be made at the back of the mouth or nearer the front (the minus and plus under the symbols mean further back and further forward in the vocal tract respectively) depending on the vowel following. In fact a whole range of realisations between these is possible. Notice that the abstract linguistic unit, the phoneme, is shown in slants / /, while its realisation is shown in square brackets [ ]. This is an important convention in phonetics. The variants realising a phoneme are called allophones. Such a rule reconciles a coding system using invariant abstract elements with the pressures towards variability arising from the physical constraints of the vocal mechanism.

Phonology is the part of the language system which deals with sound patterns, including patterns of variability. Like other terms (syntax, the lexicon etc.) describing parts of the language system, phonology can mean both the phenomena we hypothesise to be part of language, and the scientific study of those phenomena. The phonology of a language also comprises its inventory of phonemes (sound-units), and the principles governing their combination. For instance the phonology of many varieties of English specifies seven short-vowel phonemes, as exemplified in the contrasting words pit, pet, pat, pot, putt, put, and the first vowel of potato. And it specifies that spray, stray, and scray are all good sequences of phonemes (scray happens not to be a word, but could be), while fpray, stlay, and tsray are ill-formed.

Phonology also has a role to play in the combination of morphemes. Morphemes are the units which are used in word-building, such as point and less. Although there is arguably a constant relationship between a morpheme and a meaning, there can be variability in the relationship between a morpheme and its pronunciation. English provides many examples of this, but the spelling often disguises them. So when the noun-forming suffix -ity is added to the adjective electric to make electricity, the [k] at the end of electric changes to [s] and the stress shifts from -lec- to -tric- ([ɪˈlɛktrɪk] → [ɪlɛkˈtrɪsɪti]). Like the variation in /k/ according to the vowel that follows it, this is another example of invariant units of a language receiving variable realisation. The part of phonology which specifies these patterns of variation in morphemes is sometimes called morphophonology. Various aspects of phonology are dealt with in Chapter N. In the next Chapter, however, the focus switches from the linguistic mechanism to the vocal mechanism as we examine how it generates the speech signal.
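Two of the patterns described in this section, the constraints on phoneme sequences and the variable pronunciation of a morpheme, can each be written as a small rule. The sketch below is illustrative only and every name in it is hypothetical. The onset check assumes a deliberately simplified pattern (/s/, then a voiceless stop, then a liquid, with /stl/ listed as an exception), which is enough to separate the six examples in the text but is not a full account of English. The suffixation function changes a stem-final /k/ to [s] before -ity and ignores the stress shift.

```python
# Toy phonotactic check and toy morphophonemic rule.
# Phonemes are written as plain strings; all names are hypothetical.

VOICELESS_STOPS = {"p", "t", "k"}
LIQUIDS = {"l", "r"}
EXCLUDED_ONSETS = {("s", "t", "l")}  # assumption: listed by hand as an exception


def is_well_formed_onset(onset):
    """Accept only onsets of the shape /s/ + voiceless stop + liquid."""
    if len(onset) != 3:
        raise ValueError("this sketch only covers three-consonant onsets")
    first, second, third = onset
    return (
        first == "s"
        and second in VOICELESS_STOPS
        and third in LIQUIDS
        and tuple(onset) not in EXCLUDED_ONSETS
    )


def add_ity(stem):
    """Attach -ity, realising a stem-final /k/ as [s] (stress is ignored)."""
    suffix = ["ɪ", "t", "i"]
    if stem and stem[-1] == "k":
        stem = stem[:-1] + ["s"]  # builds a new list; the input is not modified
    return stem + suffix


# Onsets of the six example words from the text, as phoneme tuples.
onsets = {
    "spray": ("s", "p", "r"),
    "stray": ("s", "t", "r"),
    "scray": ("s", "k", "r"),
    "fpray": ("f", "p", "r"),
    "stlay": ("s", "t", "l"),
    "tsray": ("t", "s", "r"),
}
judgements = {word: is_well_formed_onset(onset) for word, onset in onsets.items()}
electricity = add_ity(["ɪ", "l", "ɛ", "k", "t", "r", "ɪ", "k"])

for word, ok in judgements.items():
    print(word, "well-formed" if ok else "ill-formed")
print("".join(electricity))
```

Note that scray passes the check even though it is not a word: the rule describes possible sequences, not the actual vocabulary.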

Fig. 1.6 Spectrogram of 'a real worry', with speech wave at the top and pitch trace at the bottom
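The kind of analysis that a spectrogram displays can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not the method used to produce Fig. 1.6. It analyses a synthetic tone gliding from 500 Hz to 1500 Hz rather than real speech, uses a plain discrete Fourier transform on short overlapping frames, and all names and parameter values (sample rate, frame length, hop) are hypothetical choices. Each frame corresponds to one vertical slice of the picture, and each magnitude to the darkness at one frequency.

```python
import cmath
import math


def spectrogram(samples, frame_length, hop_length):
    """Return one list of magnitudes (bins 0..frame_length//2) per frame."""
    # A Hann window tapers each frame so that its edges do not add spurious energy.
    window = [
        0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_length - 1))
        for n in range(frame_length)
    ]
    frames = []
    for start in range(0, len(samples) - frame_length + 1, hop_length):
        frame = [samples[start + n] * window[n] for n in range(frame_length)]
        magnitudes = []
        for k in range(frame_length // 2 + 1):
            # Naive discrete Fourier transform of the windowed frame at bin k.
            total = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_length)
                for n in range(frame_length)
            )
            magnitudes.append(abs(total))
        frames.append(magnitudes)
    return frames


# Hypothetical analysis settings and a synthetic glide (not real speech).
SAMPLE_RATE = 8000
NUM_SAMPLES = 1600
FRAME_LENGTH = 200   # 25 ms frames
HOP_LENGTH = 100     # frames overlap by half
START_HZ, END_HZ = 500.0, 1500.0

duration = NUM_SAMPLES / SAMPLE_RATE
signal = []
for i in range(NUM_SAMPLES):
    t = i / SAMPLE_RATE
    # Phase of a tone whose frequency rises linearly from START_HZ to END_HZ.
    phase = 2 * math.pi * (START_HZ * t + (END_HZ - START_HZ) * t * t / (2 * duration))
    signal.append(math.sin(phase))

frames = spectrogram(signal, FRAME_LENGTH, HOP_LENGTH)

# Frequency of the strongest component in each frame (the darkest point of each slice).
peak_hz = [
    max(range(len(frame)), key=frame.__getitem__) * SAMPLE_RATE / FRAME_LENGTH
    for frame in frames
]
print(peak_hz)
```

Because the frequency of the glide changes smoothly, the strongest component drifts upward from frame to frame instead of jumping between static values, which is a small-scale analogue of the continuous patterns visible in a spectrogram of speech.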
