Miller J 1990 Speech Perception

From: An Invitation to Cognitive Science: Language, Vol. 1. Ed. by Daniel N. Osherson and Howard Lasnik. 1990. Cambridge, Mass.: MIT Press.

Speech Perception

Joanne L. Miller

[...] stages of the language recognition process: the recognition of the individual speech sounds (consonants and vowels) that [...] language. The study of this aspect of language comprehension, known as speech perception, [...] contributions have come from such diverse fields as linguistics, speech science, psychology, and electrical engineering. Although a complete theory of speech perception is not at hand, much has been learned about both the physical properties of speech and the way in which these physical properties are processed by human listeners. One of the major findings has been that the recognition of speech sounds is far from being simple and straightforward. In this chapter we will examine what makes the problem of speech recognition so complex, and we will discuss possible ways in which the human processing system solves the problem, rendering speech recognition as effortless and accurate as [...]

Preparation of this chapter was supported by NIH grant NS 14394 and NIH BRSG RR7[...]

4.1 Basic Characteristics of Speech Perception

As we learned in chapter 3, any utterance of a natural language can be analyzed in terms of its sound structure, which consists of an ordered sequence of speech sounds, called phonetic segments. Each word of the language is composed of a particular sequence of segments. Take as an example the word suit. It is composed of three phonetic segments, in a particular order: the [...] and the final consonant sound [...] the International Phonetic Ass[...] this sequence of three sounds can be represented as [s], [u], and [t], respectively.
We can think of the problem of speech perception as one of how the listener recognizes the particular sequence of segments that was produced by the speaker [...] phonetic segments. To [...] that every time a speaker said the sound [s], a certain distinctive kind of acoustic energy pattern would be produced; every time the speaker said the [...] yet another distinctive acoustic pattern would be produced [...] speech worked in this way, then for every individual phonetic segment of the language (every distinct consonant and vowel that we perceive) there would be one and only one distinctive acoustic pattern of energy. The task of speech perception would be straightforward. In order to recognize the sequence of phonetic segments intended by the speaker (and hence the intended words), the listener would only have to recognize which of the distinctive energy patterns had been produced, and the order in which these had been produced.

Speech does not work like this (see Liberman et al. 1967), in large part because of the way in which we produce speech. When uttering a given word, such as suit, we do not produce each sound (each segment) independently, that is, first [s], then [u], and then [t], with the next sound beginning only after the previous sound has been completed. This would in fact make speech production very slow and laborious, similar to spelling out loud. Instead, when we are producing the [s] sound, we are already shaping our articulators (tongue, jaw, lips, and so [...]) in preparation for the [u] sound, and we are even preparing to produce [...] This means that the arti[...] within a word overlap one [...] articulated but are instead [...] individual phonetic [...] simple way to single, dis[...] speech signal.

There are two basic wa[...] mapping between the phone[...] speech signal.
These have to do with [...] in which coart[...] segments intended by the speaker and the [...] In a spectrogram the [...] across time; frequency [...] the abscissa (x-axis) [...] As an example, a spectrogram of [su], produced by a male speaker of English, is shown in the left panel of figure 4.1. This [...] more subtle [...]

This can be shown experimentally in the following way (see Yeni-Komshian and Soli 1981). Using modern computer-editing techniques, an experimenter can make a cut in the speech signal at the acoustic [...] between segments A and B and play the segments individually [...] Upon hearing segment A, the listener will identify the consonant [...] with high accuracy, as we would expect. And, upon hearing segment B, the listener will easily identify the vowel [u]. However, upon hearing segment A and being asked to identify the vowel, the listener can also answer quite accurately. This demonstrates that segment A contains information not only about which consonant was produced but also about which vowel was produced; vowel information is not limited to segment B. In other words, information about the consonant and vowel [...] can be transmitted in parallel, by the same segment of the acoustic signal (in this case segment A).

Figure 4.1. Spectrograms of [su] and [šu], produced by a male speaker of English. (Axes: frequency and time; segment A is marked on each panel.)

It is important to emphasize that such parallel [...] coarticulation of the consonant with the upcoming vowel, [...] than the exception in [...] speech. And because information about [...] phonetic segments is [...] in parallel by the same acoustic segment, the listener must have a way [...] each acoustic segment in terms of the information it provides about multiple phonetic segments. This means that speech perception cannot simply be a matter of recognizing each acoustic segment and matching it to a single phonetic segment, one by one, from the beginning to the end of the word.
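As a rough illustration of what a spectrogram represents (energy laid out over frequency, frame by frame through time), the following minimal sketch computes one for a synthetic signal. It is not taken from the chapter: the function names, the sample rate, the frame length, and the two-tone toy signal are all assumptions chosen for the example, and the signal is not speech. It uses a naive short-time discrete Fourier transform written in plain Python.

```python
import cmath
import math

# Illustrative parameters (assumptions, not values from the chapter).
SAMPLE_RATE = 8000   # samples per second
FRAME_LEN = 64       # samples per analysis frame (125 Hz per frequency bin)
HOP = 64             # step between frames (no overlap, for simplicity)


def spectrogram(samples, frame_len, hop):
    """Return a list of frames; each frame is a list of magnitudes,
    one per frequency bin from 0 Hz up to half the sample rate."""
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        # A Hann window reduces leakage between neighbouring bins.
        windowed = [
            x * (0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1)))
            for n, x in enumerate(frame)
        ]
        mags = []
        for k in range(frame_len // 2 + 1):
            # Naive DFT for bin k (slow, but fine for a toy signal).
            acc = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n, x in enumerate(windowed)
            )
            mags.append(abs(acc))
        frames.append(mags)
    return frames


def peak_frequency(mags, sample_rate, frame_len):
    """Frequency (Hz) of the bin with the most energy in one frame."""
    k = max(range(len(mags)), key=mags.__getitem__)
    return k * sample_rate / frame_len


def toy_signal():
    """Hypothetical test signal: a 3000 Hz tone followed by a 500 Hz tone."""
    high = [math.sin(2 * math.pi * 3000 * n / SAMPLE_RATE) for n in range(512)]
    low = [math.sin(2 * math.pi * 500 * n / SAMPLE_RATE) for n in range(512)]
    return high + low


if __name__ == "__main__":
    frames = spectrogram(toy_signal(), FRAME_LEN, HOP)
    for i, mags in enumerate(frames):
        print(i, peak_frequency(mags, SAMPLE_RATE, FRAME_LEN))
```

By construction, the early frames peak at 3000 Hz and the late frames at 500 Hz. Real speech is far less tidy: as the passage above describes, a single stretch of the signal can carry information about more than one phonetic segment.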
4.1.2 Invariance

A closely related complication that coarticulation introduces into the mapping between phonetic segments and acoustic segments is that there [...] a given phonetic segment of the language. Because of coarticulation, the precise form of an acoustic property that is important for the recognition of a particular consonant or vowel changes according to the phonetic context in which the segment is produced.

[...] and [š] (the "sh" sound in ship). These two consonants [...] fricatives, which are produced with a const[...] sufficiently narrow to gen[...] concentration of energy at relatively high [...] than does the frication noise for [š]. This is illustrated by the spectrograms of [su] and [šu] in figure 4.1, which were produced by the same speaker. If you compare the frication noises (labeled segment A) on the two spectrograms, you will notice that the energy for the [s]-noise is shifted upward along the frequency scale (y-axis), compared to the energy for the [š] [...]

So far, there is no problem. The frication noise for [...] higher frequency region than that for [š], so that [...] identify whether [s] or [š] had been spoken simply by noti[...] noise of [s] and [š] is not invariant; instead, it changes, depending on which vowel follows the fricative during production (Mann and Repp 1980). We can see this context-dependency by considering what happens when [...] and [š] are produced in the context of [...] depends not only on which fricative was produced, [...] vowel context in which it was produced, [a] or [u]. The consequence for [...] order to correct[...] following vowel. This means that information after the frication noise [...] relevant to the identification of the fricative consonant. Thus, speech p[...]
