
Overview of The Speech Production and Recognition Process

EE 627 Speech Signal Processing Lecture 1/2: Overview, Modeling speech, and Categorization of speech sounds
R. Hegde Dept. of Electrical Engg. IIT Kanpur

rhegde @ iitk.ac.in

EE 627 - Speech Signal Processing, Lecture 1/2


Outline

Speech Recognition: Why and What
Modeling the Speech Production Mechanism
Categorization of Speech Sounds


Why?

Speech is a natural mode of communication
Human communication is device driven these days: man to machine, machine to man, machine to machine
Communication with machines must be made natural
Interest of big companies
Success stories
$$$


Applications
Telephone applications
Hands-free operation
Assistive living
Dictation
Translation
Emergency response
Situational awareness
Multimodal processing
Many more ....


Viewing the Speech Signal - Temporal and Spectral


What We Need to Know for Signal-to-Symbol Transformation

Understand the speech production mechanism
Identify and define units for the signal (acoustic) to symbol (language-specific sound units) transformation
Choice of sound units:
  Based on language (phonemes)
  Based on the physiological model (articulatory phonetics)
  Based on the source-system model of production (acoustic phonetics)
  Based on the signal (signal parameters and features)
Describe human sound production in terms of articulatory phonetics and acoustic phonetics
Study the categories of excitation: voiced, unvoiced, silence
Nature of the speech signal in terms of source-system, segmental/suprasegmental, temporal/spectral
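The voiced/unvoiced/silence categories can be illustrated with a rough frame-level sketch. The slides do not prescribe a method; the one below is a common textbook heuristic based on short-time energy and zero-crossing rate. The function names, the 8 kHz sampling rate, the 30 ms frame, and both thresholds are assumptions chosen for illustration and would need tuning on real speech.

```python
import math
import random


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def classify_frame(frame, energy_floor=1e-4, zcr_threshold=0.25):
    """Label a frame 'silence', 'unvoiced', or 'voiced'.

    Both thresholds are hypothetical, not values from the lecture.
    """
    if short_time_energy(frame) < energy_floor:
        return "silence"
    if zero_crossing_rate(frame) > zcr_threshold:
        return "unvoiced"
    return "voiced"


if __name__ == "__main__":
    fs = 8000   # assumed sampling rate in Hz
    n = 240     # 30 ms frame at 8 kHz
    rng = random.Random(0)
    # A 100 Hz tone stands in for voiced speech, uniform noise for unvoiced.
    tone = [0.5 * math.sin(2 * math.pi * 100 * k / fs) for k in range(n)]
    noise = [rng.uniform(-0.3, 0.3) for _ in range(n)]
    quiet = [0.0] * n
    for name, frame in [("tone", tone), ("noise", noise), ("quiet", quiet)]:
        print(name, classify_frame(frame))
```

The energy test runs first so that near-silent frames are not mislabeled by their arbitrary zero-crossing rate.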

Why is Speech Recognition Difficult

Word boundary hypothesis is still an unsolved problem due to continuity, variability, and disfluencies in speakers
Speaking rate variability
Large vocabularies in all languages
Variability in ambient acoustics, channel characteristics, microphone characteristics, background noise
Adaptation to the variability
Practical usability of algorithms: training algorithms that run for days are not useful
Coarticulation: take, stake, tray, straight, butter, Kate ....


Why is Speech Recognition Difficult (continued)

Different phrases sound the same:
  "It's not easy to wreck a nice beach" / "It's not easy to recognize speech"
  "Sly drool" / "Slide rule"
  "say s" / "say yes"
Semantic sense or nonsense:
  Carter plans swell deficit
  Farmer Bill dies in house
  Stud tires out

Why is Speech Recognition Difficult (continued)


The Speech Production and Perception Mechanism


The Speech Recognition Layers


The SR Layers


Speech Recognition - Dimensions


The Physiological Model of Speech Production


The Source-System Model of Speech Production


Analogy between PHY and MATH Model


Descriptive Signal Level Analogy


Resonances of the Vocal Tract


Length of Waves and Tube constraints


Relating Articulation and the Acoustic Spectrum


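Schwa (Neutral Vowel) Fundamentals

From sound wave theory, $f = c/\lambda$, where $f$ is the frequency, $c$ the velocity of sound (35,000 cm/sec), and $\lambda$ the wavelength.
A sound with $\lambda = 10$ meters (1000 cm) has a low frequency: $f = 35{,}000/1000 = 35$ Hz.
A sound with $\lambda = 2$ centimeters has a high frequency: $f = 35{,}000/2 = 17{,}500$ Hz.
The schwa has three main resonant frequencies (formants). Modeling the vocal tract as a uniform tube of length $L = 17.5$ cm, closed at the glottis and open at the lips, the resonant wavelengths are $\lambda_1 = 4L$, $\lambda_2 = 4L/3$, and $\lambda_3 = 4L/5$:
$F_1 = c/\lambda_1 = c/(4L) = 35{,}000/(4 \times 17.5) = 500$ Hz
$F_2 = c/\lambda_2 = 3c/(4L) = 3 \times 35{,}000/(4 \times 17.5) = 1500$ Hz
$F_3 = c/\lambda_3 = 5c/(4L) = 5 \times 35{,}000/(4 \times 17.5) = 2500$ Hz
A neutral vowel has three resonances (formants) at 500, 1500, and 2500 Hz.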
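The schwa arithmetic can be checked with a few lines of Python. This is a minimal sketch that assumes the uniform quarter-wavelength tube implied by the slide's formulas, with resonances at odd multiples of c/(4L). The function and constant names are illustrative, not from the lecture.

```python
SPEED_OF_SOUND_CM_PER_S = 35_000.0  # value used on the slide
TRACT_LENGTH_CM = 17.5              # value used on the slide


def wavelength_to_frequency(wavelength_cm, c=SPEED_OF_SOUND_CM_PER_S):
    """Return f = c / wavelength in Hz, with the wavelength in centimeters."""
    return c / wavelength_cm


def uniform_tube_formants(length_cm=TRACT_LENGTH_CM, count=3,
                          c=SPEED_OF_SOUND_CM_PER_S):
    """Resonances of a uniform quarter-wave tube: F_n = (2n - 1) c / (4 L)."""
    return [(2 * n - 1) * c / (4.0 * length_cm) for n in range(1, count + 1)]


if __name__ == "__main__":
    print(wavelength_to_frequency(1000.0))  # 10 m wavelength -> 35.0
    print(wavelength_to_frequency(2.0))     # 2 cm wavelength -> 17500.0
    print(uniform_tube_formants())          # -> [500.0, 1500.0, 2500.0]
```

Changing `length_cm` shows how a shorter tract shifts all three resonances upward, since each one scales as 1/L.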

The Phoneme Set


Phonetic Transcription - Words


Phonetic Transcription - Digits


Classification Tree for Speech Sounds


The Vowel Space


The Vowel Triangle


The Diphthong Space


Articulatory Models for Vowels


Vowel Space of American English Vowels


/iY/ and /uW/


/ae/ and /aa/


Place and Manner of Articulation


Place of articulation (POA): Dental, Alveolar, Palatal, Velar, Uvular, Pharyngeal, Glottal
Manner of articulation (MOA): Oral and nasal stops


Consonant Classification Based on Place of Articulation

The location where airflow is constricted is the place of articulation
Labial (with lips), Coronal (using tip or blade of tongue), Dorsal (using back of tongue)
Let's see if http://www.chass.utoronto.ca/~danhall/phonetics/sammy.html works for a discussion
Bilabial: p, b, m
Labiodental: f, v
Dental: th, dh
Alveolar: t, d, s, z, l
Post-alveolar: sh, zh, y
Velar: k, g, ng


Consonant Classification Based on Manner of Articulation

Stops: No air through the mouth, with a complete closure of the articulators
  Oral stops: palate is raised, no air escapes through the nose. Air pressure builds up behind the closure and explodes when released /p, t, k, b, d, g/
  Nasal stops: oral closure, but palate is lowered, air escapes through the nose /m, n, ng/
Fricatives: Close approximation of two articulators, resulting in turbulent airflow between them and producing a hissing sound /f, v, s, z, th, dh/
Approximant: Not so close approximation of two articulators, with no turbulence /y, r/
Lateral approximant: Obstruction of the airstream along the center of the oral tract, with an opening around the sides of the tongue /l/
Affricate: Stop immediately followed by a fricative /ch, jh/
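The place and manner groupings from these two slides can be combined into a small lookup table. This is a sketch built only from the phone lists given on the slides; the dictionary layout and the `describe` helper are illustrative choices. A phone that a slide does not list under one of the two dimensions comes back as `None` for that dimension rather than being filled in.

```python
# Place of articulation, as listed on the place slide
# (the slide abbreviates the fifth group as "Post").
PLACE = {
    "bilabial": ["p", "b", "m"],
    "labiodental": ["f", "v"],
    "dental": ["th", "dh"],
    "alveolar": ["t", "d", "s", "z", "l"],
    "post-alveolar": ["sh", "zh", "y"],
    "velar": ["k", "g", "ng"],
}

# Manner of articulation, as listed on the manner slide.
MANNER = {
    "oral stop": ["p", "t", "k", "b", "d", "g"],
    "nasal stop": ["m", "n", "ng"],
    "fricative": ["f", "v", "s", "z", "th", "dh"],
    "approximant": ["y", "r"],
    "lateral approximant": ["l"],
    "affricate": ["ch", "jh"],
}


def invert(groups):
    """Turn {label: [phones]} into {phone: label}."""
    return {phone: label for label, phones in groups.items() for phone in phones}


PLACE_OF = invert(PLACE)
MANNER_OF = invert(MANNER)


def describe(phone):
    """Return (place, manner) for a phone; None where the slides list nothing."""
    return PLACE_OF.get(phone), MANNER_OF.get(phone)


if __name__ == "__main__":
    for phone in ["p", "ng", "sh", "ch"]:
        print(phone, describe(phone))
```

Keeping the two dimensions in separate tables mirrors the slides and makes the gaps visible, for example `sh` has a place but no listed manner.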

POA and MOA Table


ARPA and IPA Transcriptions


References

Thomas Quatieri, Discrete-Time Speech Signal Processing, Prentice Hall
James Glass, Speech Recognition, Open Course Ware, MIT
http://www.chass.utoronto.ca/~danhall/phonetics/sammy.html
Rabiner and Juang, Fundamentals of Speech Recognition, Prentice Hall
