Vowel Quadrilateral
Singer’s Formant
Trained singers can tune their vocal tract resonances (formants) to match the
fundamental frequency or one or more harmonics of the sound source.
The effect of this tuning is a louder sound and improved vocal quality.
Acoustically, the singer’s formant results in a spectral peak around 2500-3000 Hz.
Filters
-Vocal Tract = Band Pass Filter
-eliminates or reduces certain frequencies of the vibrations of the vocal folds.
This is why the output has formants (resonance peaks) rather than all source
frequencies at equal strength.
-Broadly tuned filters = slow attenuation of frequencies outside of the cutoff
frequency
-Sharply tuned filters = fast attenuation outside of the cutoff frequency
-Cutoff frequency = the half-power point, where the level of the frequency
component is decreased by 3 dB (power halved, amplitude scaled by about 0.707).
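The link between "half power" and "3 dB" can be checked with a short calculation. This is a minimal sketch; the function names are illustrative and not from the notes.

```python
import math

def db_from_power_ratio(ratio):
    """Level change in dB for a ratio of powers."""
    return 10 * math.log10(ratio)

def db_from_amplitude_ratio(ratio):
    """Level change in dB for a ratio of amplitudes (pressure, voltage)."""
    return 20 * math.log10(ratio)

# At the cutoff frequency the power is halved, which corresponds to the
# amplitude being scaled by 1/sqrt(2) (about 0.707).
half_power_db = db_from_power_ratio(0.5)
half_power_amp_db = db_from_amplitude_ratio(1 / math.sqrt(2))

print(round(half_power_db, 2), round(half_power_amp_db, 2))
```

Both expressions give the same value, roughly -3 dB, which is why the cutoff is also called the 3 dB down point.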
Sound Spectrography
Time-frequency trade-off: the longer the analysis window (the more time points it
contains), the lower the time resolution and the higher the frequency resolution,
and vice versa.
Smaller/narrow bandwidth = higher frequency resolution
Larger/wide bandwidth = higher time resolution
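The trade-off can be made concrete with a little arithmetic. The sketch below assumes a hypothetical 16 kHz sampling rate and two illustrative window lengths; none of these values come from the notes.

```python
def window_resolution(n_samples, sample_rate_hz):
    """Return (window duration in seconds, frequency bin spacing in Hz)."""
    duration_s = n_samples / sample_rate_hz
    bin_spacing_hz = sample_rate_hz / n_samples
    return duration_s, bin_spacing_hz

fs = 16000  # assumed sampling rate in Hz

# Short window: fine detail in time, coarse detail in frequency.
short_win = window_resolution(80, fs)
# Long window: coarse detail in time, fine detail in frequency.
long_win = window_resolution(800, fs)

print(short_win)
print(long_win)
```

Under these assumptions the 80-sample window lasts 5 ms with bins 200 Hz apart, while the 800-sample window lasts 50 ms with bins 20 Hz apart. The product of duration and bin spacing stays constant, so improving one resolution necessarily costs the other.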
Instruments Used:
Ultrasound: mainly used for capturing tongue posture during speech production
CT (CAT) scan and MRI: the benefit is a 3D reconstruction of tissues
Regular X-Ray:
Harmonic to Noise Ratio
-The ratio of the energy in the fundamental frequency and harmonics to the energy
in the aperiodic noise component of the speech signal, averaged over several cycles.
In irregular speech, the ratio of harmonic energy to noise energy is reduced.
-Jitter, shimmer, and H/N are time-based measures
-In moderate/severe dysphonia, identification of cycle boundaries is
difficult, so these time-based measures may not be reliable.
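The harmonics-to-noise ratio defined above is usually reported in dB. The following is a minimal sketch under a strong simplifying assumption: the periodic and noise components are known separately because the signal is synthetic. With a real recording they would first have to be estimated (for example by averaging over cycles). The signal parameters and names are hypothetical.

```python
import math
import random

def energy(samples):
    """Sum of squared sample values."""
    return sum(s * s for s in samples)

def hnr_db(periodic, noise):
    """Harmonics-to-noise ratio in dB from separate components."""
    return 10 * math.log10(energy(periodic) / energy(noise))

fs = 8000   # assumed sampling rate in Hz
n = 800     # 100 ms of signal

# Periodic part: a 200 Hz fundamental plus its second harmonic.
periodic = [math.sin(2 * math.pi * 200 * t / fs)
            + 0.5 * math.sin(2 * math.pi * 400 * t / fs)
            for t in range(n)]

# Aperiodic part: the same noise pattern at two amplitudes.
rng = random.Random(0)
base_noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
mild_noise = [0.01 * v for v in base_noise]
rough_noise = [0.1 * v for v in base_noise]

hnr_mild = hnr_db(periodic, mild_noise)
hnr_rough = hnr_db(periodic, rough_noise)
print(round(hnr_mild, 1), round(hnr_rough, 1))
```

Raising the noise amplitude tenfold raises its energy a hundredfold, so the second HNR is 20 dB lower than the first. This mirrors the reduced ratio described for irregular speech.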
Cepstral Measures
-A Fourier transform of the logarithm of the power spectrum (the cepstrum), which
shows the extent to which the fundamental frequency and harmonic structure stand
out from the background noise.
-Relative amplitude of the dominant cepstral peak correlates well with perception of
breathiness and abnormal vocal quality
*CPP (cepstral peak prominence) - does not depend on time-related information such
as cycle boundaries, which makes it a robust measure that is easy to compute.
*A decrease in CPP is correlated with poorer voice quality
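The idea behind the cepstral peak can be sketched on a synthetic signal. This example assumes a noisy pulse train standing in for voiced speech, an 8 kHz sampling rate, and a 125-400 Hz search range for the fundamental; all of these are illustrative choices. It reports the raw height of the cepstral peak, which is only a simplified stand-in for CPP (CPP additionally compares the peak with a regression line fitted to the cepstrum).

```python
import cmath
import math
import random

def dft(values):
    """Naive discrete Fourier transform, O(N^2); fine for short frames."""
    n = len(values)
    twiddle = [cmath.exp(-2j * math.pi * m / n) for m in range(n)]
    return [sum(values[t] * twiddle[(k * t) % n] for t in range(n))
            for k in range(n)]

def real_cepstrum(frame):
    """Fourier transform of the log power spectrum of one frame."""
    n = len(frame)
    # Small constant avoids log(0) for empty spectral bins.
    log_power = [math.log(abs(c) ** 2 + 1e-12) for c in dft(frame)]
    # The log power spectrum of a real frame is real and symmetric, so the
    # forward and inverse transforms differ only by the 1/N scale factor.
    return [c.real / n for c in dft(log_power)]

def cepstral_peak(frame, sample_rate_hz, f0_min_hz=125, f0_max_hz=400):
    """Return (quefrency in samples, peak height) in the pitch search range."""
    ceps = real_cepstrum(frame)
    q_lo = int(sample_rate_hz / f0_max_hz)
    q_hi = int(sample_rate_hz / f0_min_hz)
    q_peak = max(range(q_lo, q_hi + 1), key=lambda q: ceps[q])
    return q_peak, ceps[q_peak]

def pulse_train(n, period, noise_sd, seed):
    """Unit pulses every `period` samples plus Gaussian noise."""
    rng = random.Random(seed)
    return [(1.0 if t % period == 0 else 0.0) + rng.gauss(0.0, noise_sd)
            for t in range(n)]

fs = 8000  # assumed sampling rate; a 40-sample period is 200 Hz
clear = pulse_train(400, 40, noise_sd=0.005, seed=0)
noisy = pulse_train(400, 40, noise_sd=0.05, seed=0)

q_clear, peak_clear = cepstral_peak(clear, fs)
q_noisy, peak_noisy = cepstral_peak(noisy, fs)
print(q_clear, q_noisy, peak_clear > peak_noisy)
```

Both frames should peak at a quefrency of 40 samples (5 ms, i.e. 200 Hz), and the noisier frame should show a lower peak. No cycle boundaries are marked anywhere in the computation, which is the property the notes highlight.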
Consonants
Stops and Fricatives
Constriction
Speech production is aerodynamic
Airflow is “egressive”
Vowel Transitions
The transition between the vowel and the consonant (VC) or the consonant and
the vowel (CV)
These transitions allow the listener to tell the difference between consonant
sounds (ex: /θ/ vs. /f/)
Vowel transitions are examples of coarticulation
Place of Articulation
Bilabial (p,b,m,w)
Constriction at the lips/lips come together
Labiodental (f,v)
Lower lip and upper teeth constriction
Constriction by the tongue tip against other areas (“lingual” articulation)
Dental - tongue tip with front upper teeth
Alveolar (t, d, z, s, n, l) - tongue tip with alveolar ridge
Palatal - front of tongue (blade/body) with hard palate
Retroflex (shape of tongue can be different)
Palatal or alveolar
Velar (k, g, ŋ)
back of tongue against soft palate
Pharyngeal fricative (h)
Back of tongue and pharynx (voiceless)
Glottal
(glottal stop)
Glottal consonants are voiceless; the glottis closes rapidly
Ex: “butter” in British English, “ah-ah” in English
Phonetic Description of Consonants
V+ = voiced, V- = voiceless
Stops
Five acoustic cues important for perception of stops
Silence
Burst noise
Aspiration
Voice onset time (VOT)
CV or VC vowel formant transition
Release burst: transient burst noise upon release of the occlusion and impounded air
(Ex: pop sound heard in microphone)
Duration: approx. 10-30 ms for voiced stops, longer for voiceless cognates
Observed in waveforms as a “sudden change in amplitude”
Observed in spectrograms as a “sudden appearance of energy at many frequencies”