From: An Invitation to Cognitive Science: Language, Vol. 1. Ed. by Daniel N. Osherson and Howard Lasnik. 1990. Cambridge, Mass.: MIT Press.
Speech Perception
Joanne L. Miller
[...] of the language recognition process: the recognition of the individual speech sounds (consonants and vowels) that [...] language. The study of this aspect of language comprehension, known as speech perception, [...] contributions have come from such diverse fields as linguistics, speech science, psychology, and electrical engineering. Although a complete theory of speech perception is not at hand, much has been learned about both the physical properties of speech and the way in which these physical properties are processed by human listeners. One of the major findings has been that the recognition of speech sounds is far from being simple and straightforward. In this chapter we will examine what makes the problem of speech recognition so complex, and we will discuss possible ways in which the human processing system solves the problem, rendering speech recognition as effortless and accurate as [...]

Preparation of this chapter was supported by NIH grant NS 14394 and NIH BRSG RR7[...]
4.1 Basic Characteristics of Speech Perception
As we learned in chapter 3, any utterance of a natural language can be analyzed in terms of its sound structure, which consists of an ordered sequence of speech sounds, called phonetic segments. Each word of the language is composed of a particular sequence of segments. Take as an example the word suit. It is composed of three phonetic segments, in a particular order: the [...] and the final consonant sound [...] the International Phonetic Association [...] this sequence of three sounds can be represented as [s], [u], and [t], respectively. We can think of the problem of speech perception as one of how the listener recognizes the particular sequence of segments that was produced by the speaker [...] phonetic segments. To [...] that every time a speaker said the sound [s], a certain distinctive kind of acoustic energy pattern would be produced; every time the speaker said the sound [u], yet another distinctive acoustic pattern would be produced; [...] If speech worked in this way, then for every individual phonetic segment of the language (every distinct consonant and vowel that we perceive) there would be one and only one distinctive acoustic pattern of energy. The task of speech perception would be straightforward. In order to recognize the sequence of phonetic segments intended by the speaker (and hence the intended words), the listener would only have to recognize which of the distinctive energy patterns had been produced, and the order in which these had been produced.
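The hypothetical one-to-one scheme described above can be sketched as a simple lookup. This is only an illustration of the idealized case that the chapter goes on to reject; the pattern names and the `recognize` function are invented for the example and stand for no real acoustic measurements.

```python
# Hypothetical one-to-one code: each phonetic segment has exactly one
# distinctive acoustic pattern. The pattern names are made up.
PATTERN_TO_SEGMENT = {
    "pattern_s": "s",
    "pattern_u": "u",
    "pattern_t": "t",
}


def recognize(patterns):
    """Map each acoustic pattern, in order of arrival, to its phonetic segment."""
    return [PATTERN_TO_SEGMENT[p] for p in patterns]


# The word "suit" under this idealized scheme.
print(recognize(["pattern_s", "pattern_u", "pattern_t"]))  # ['s', 'u', 't']
```

Under this assumption the listener needs nothing more than the table and the order of arrival, which is what would make the task straightforward.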
But speech does not work like this (see Liberman et al. 1967), in large part because of the way in which we produce speech. When uttering a given word, such as suit, we do not produce each sound (each segment) independently, that is, first [s], then [u], and then [t], with the next sound beginning only after the previous sound has been completed. This would in fact make speech production very slow and laborious, similar to spelling out loud. Instead, when we are producing the [s] sound, we are already shaping our articulators (tongue, jaw, lips, and so on) in preparation for the [u] sound, and we are even preparing to produce [...] This means that the articulations [...] within a word overlap one another [...] articulated but are instead [...] individual phonetic segments [...] simple way to single, discrete [...] speech signal.
There are two basic ways [...] mapping between the phonetic segments intended by the speaker and the speech signal. These have to do with [...] in which coarticulation [...]
In a spectrogram the [...] across time; frequency is plotted on the ordinate (y-axis) and time on the abscissa (x-axis). As an example, a spectrogram of [su], produced by a male speaker of English, is shown in the left panel of figure 4.1. This [...] more subtle [...]
This can be shown experimentally in the following way (see Yeni-Komshian and Soli 1981). Using modern computer-editing techniques, an experimenter can make a cut in the speech signal at the acoustic boundary between segments A and B and play the segments individually. Upon hearing segment A, the listener will identify the consonant with high accuracy, as we would expect. And, upon hearing segment B, the listener will easily identify the vowel [u]. However, upon hearing segment A and being asked to identify the vowel, the listener can also answer quite accurately. This demonstrates that segment A contains information not only about which consonant was produced but also about which vowel was produced; vowel information is not limited to segment B. In other words, information about the consonant and vowel can be transmitted in parallel, by the same segment of the acoustic signal (in this case segment A). It is important to emphasize that such parallel transmission [...] coarticulation of the consonant with the upcoming vowel, is the rule rather than the exception in speech. And because information about [...] phonetic segments is transmitted in parallel by the same acoustic segment, the listener must have a way of [...] each acoustic segment in terms of the information it provides about multiple phonetic segments. This means that speech perception cannot simply be a matter of recognizing each acoustic segment and matching it to a single phonetic segment, one by one, from the beginning to the end of the word.

Figure 4.1
Spectrograms of [su] and [šu], produced by a male speaker of English. (Axes: frequency against time; the frication noise is labeled segment A.)
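A toy model may help make parallel transmission concrete. In the sketch below, a single acoustic value for segment A depends on both the consonant and the upcoming vowel, so both can be recovered from segment A alone. Every number, the vowel set, and the function names are assumptions invented for illustration; they are not measurements or methods from the chapter, and "sh" stands for the "sh" sound in ship.

```python
# Toy model of parallel transmission. All numbers are hypothetical.
NOISE_CENTER_HZ = {"s": 5000, "sh": 3000}  # invented noise centers per fricative
VOWEL_SHIFT_HZ = {"a": 0, "u": -400}       # invented shift caused by the upcoming vowel


def produce_segment_a(consonant, vowel):
    """Return the noise center of segment A for a consonant-vowel syllable."""
    return NOISE_CENTER_HZ[consonant] + VOWEL_SHIFT_HZ[vowel]


def decode_segment_a(noise_center_hz):
    """Recover (consonant, vowel) from segment A alone by exhaustive search."""
    for consonant in NOISE_CENTER_HZ:
        for vowel in VOWEL_SHIFT_HZ:
            if produce_segment_a(consonant, vowel) == noise_center_hz:
                return consonant, vowel
    raise ValueError("no syllable matches this noise center")


segment_a = produce_segment_a("s", "u")
print(decode_segment_a(segment_a))  # ('s', 'u')
```

The decoder never sees segment B, yet it reports the vowel. In this simplified form, that is the situation the cutting experiment reveals.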
4.1.2 Invariance
A closely related complication that coarticulation introduces into the mapping between phonetic segments and acoustic segments is that there [...] a given phonetic segment of the language. Because of coarticulation, the precise form of an acoustic property that is important for the recognition of a particular consonant or vowel changes according to the phonetic context in which the segment is produced.
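One way to picture this context dependence is a classifier whose category boundary moves with the phonetic context. The sketch below labels a frication noise as [s] or as the "sh" sound in ship, using a boundary that depends on the following vowel. The boundary values, the direction of the shift, and the function name are assumptions made for illustration only, not data from the chapter.

```python
# Hypothetical category boundaries (Hz) for the frication noise, one per
# following vowel. The values are invented for illustration only.
BOUNDARY_HZ = {"a": 4000, "u": 3600}


def identify_fricative(noise_center_hz, following_vowel):
    """Label a frication noise as "s" or "sh" given the vowel that follows."""
    boundary = BOUNDARY_HZ[following_vowel]
    return "s" if noise_center_hz >= boundary else "sh"


# The same noise receives different labels in different vowel contexts.
print(identify_fricative(3800, "a"))  # sh
print(identify_fricative(3800, "u"))  # s
```

A single fixed boundary would give the 3800 Hz noise the same label in both contexts. The sketch only shows that a correct label can require looking past the noise to the vowel that follows.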
[...] [s] and [š] (the "sh" sound in ship). These two consonants are fricatives, which are produced with a constriction [...] sufficiently narrow to generate [...] concentration of energy at relatively higher frequencies than does the frication noise for [š]. This is illustrated by the spectrograms of [su] and [šu] in figure 4.1, which were produced by the same speaker. If you compare the frication noises (labeled segment A) on the two spectrograms, you will notice that the energy for the [s]-noise is shifted upward along the frequency scale (y-axis), compared to the energy for the [š]-noise. So far, there is no problem. The frication noise for [s] is in a higher frequency region than that for [š], so that [...] identify whether [s] or [š] had been spoken simply by noting [...] noise of [s] and [š] is not invariant; instead, it changes, depending on which vowel follows the fricative during production (Mann and Repp 1980). We can see this context-dependency by considering what happens when [s] and [š] are produced in the context of [...] depends not only on which fricative was produced, but also on the vowel context in which it was produced, [a] or [u]. The consequence for [...] in order to correctly [...] following vowel. This means that information after the frication noise is relevant to the identification of the fricative consonant. Thus, speech perception [...]