INTRODUCTION
Mongolian throat singing has become the common label for a group of different singing techniques
that often include overtone singing. In Mongolia
these techniques form part of traditional folk music,
and are collectively called Hoomii.1,2 Hoomii includes a very low-pitched bass-type singing technique in Mongolia called Kargyraa, also practiced in
some Tibetan monasteries under the name of Dzo-ke.
Dzo-ke and Kargyraa are sung at a low fundamental
frequency of around 70 to 100 Hz and are performed
with or without the enhancement of single overtones.
In 1967, Smith et al.3 postulated that this kind of
singing is produced with double oscillators or asymmetrically vibrating vocal folds. They analyzed
sound recordings of singing monks, using sonagrams. They found what they called odd harmonics as the subjects changed from ordinary voice to
the bass-type singing. The same authors speculated
Accepted for publication June 21, 2000.
Address correspondence and reprint requests to Per-Åke
Lindestad, Department of Logopedics and Phoniatrics, Huddinge University Hospital, S-141 86 Stockholm, Sweden.
e-mail: per-ake.lindestad@logphon.hs.sll.se
This paper is a revised version of one presented at the CoMet
session, Twenty-Third Congress of the International Association of Logopedics and Phoniatrics, Amsterdam, August 1998.
PER-ÅKE LINDESTAD ET AL
judged from the DAT audio recordings made simultaneously with the high-speed recordings as well as
the inverse filtering recordings (see below).
Flow inverse filtering
Flow inverse filtering was performed on a separate
occasion using a face mask and the system of Glottal
Enterprises (Syracuse, NY).8 The mask was tightly
sealed to the subject's face to avoid air leakage.10 As
was the case during the high-speed filming, the subject
alternated between modal voice and throat singing.
The sustained vowel [a:] was chosen due to its high
first formant frequency, which is necessary for a successful inverse filtering analysis. For this reason, it was
not possible to use the vowel [o:] for this recording.
The subject chose a pitch and loudness that were comfortable for throat singing. Based on perceptual evaluation (by consensus of the authors), the voice qualities
of the modal voice and throat singing were considered
the same during the inverse filtering as during the
high-speed filming, and the results were compared although the recordings were nonsimultaneous.
Inverse filtering analysis
The antiresonances for the first and second formants, respectively, and their bandwidths were based
on the modal phonation and set manually by one of
the authors (MS). The settings were kept the same for
the throat singing. The inverse filtering was performed online and the flow signal was recorded using the Soundswell (Stockholm, Sweden) program.11
The flow glottogram from the throat singing was not
as successfully analyzed as the modal phonation
since some ripple, probably from the first formant,
remained. Probably the subject changed his articulation slightly when he changed the type of singing.
However, the analysis was considered satisfactory
for describing the main differences between the flow
glottograms. The flow signal was not calibrated. The
results of the flow glottograms were therefore described qualitatively.
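The antiresonance principle described above can be illustrated with a minimal Python sketch. It is not the Glottal Enterprises or Soundswell implementation: it assumes each formant is modelled as a two-pole digital resonator and is cancelled by the matching two-zero filter, and the formant frequencies, bandwidths, and sampling rate below are hypothetical values chosen only for the demonstration.

```python
import math

def _coefficients(formant_hz, bandwidth_hz, fs):
    """Second-order coefficients for one formant (textbook pole/zero placement)."""
    r = math.exp(-math.pi * bandwidth_hz / fs)   # radius set by the bandwidth
    theta = 2.0 * math.pi * formant_hz / fs      # angle set by the formant frequency
    b1 = -2.0 * r * math.cos(theta)
    b2 = r * r
    gain = 1.0 + b1 + b2                         # gives unity gain at 0 Hz
    return b1, b2, gain

def antiresonator(signal, formant_hz, bandwidth_hz, fs):
    """Cancel one formant with a pair of zeros (the inverse filter stage)."""
    b1, b2, gain = _coefficients(formant_hz, bandwidth_hz, fs)
    out = []
    for n, x in enumerate(signal):
        x1 = signal[n - 1] if n >= 1 else 0.0
        x2 = signal[n - 2] if n >= 2 else 0.0
        out.append((x + b1 * x1 + b2 * x2) / gain)
    return out

def resonator(signal, formant_hz, bandwidth_hz, fs):
    """All-pole formant model, used here only to build a test signal."""
    b1, b2, gain = _coefficients(formant_hz, bandwidth_hz, fs)
    out = []
    for n, x in enumerate(signal):
        y1 = out[n - 1] if n >= 1 else 0.0
        y2 = out[n - 2] if n >= 2 else 0.0
        out.append(gain * x - b1 * y1 - b2 * y2)
    return out

def inverse_filter(signal, formants, fs):
    """Apply one antiresonance per (frequency, bandwidth) pair in cascade."""
    for formant_hz, bandwidth_hz in formants:
        signal = antiresonator(signal, formant_hz, bandwidth_hz, fs)
    return signal

# Hypothetical settings, NOT the values used in the study.
FS = 16000
FORMANTS = [(700.0, 80.0), (1100.0, 90.0)]   # (F1, B1), (F2, B2) in Hz

# Toy "source" with two excitation pulses, shaped by the two formants.
source = [0.0] * 200
source[10] = 1.0
source[90] = 0.5
speech = source
for f_hz, bw_hz in FORMANTS:
    speech = resonator(speech, f_hz, bw_hz, FS)

recovered = inverse_filter(speech, FORMANTS, FS)
max_error = max(abs(a - b) for a, b in zip(source, recovered))
print(max_error)
```

When the antiresonance settings match the formants exactly, the source is recovered. If the articulation changes while the settings are kept fixed, the cancellation is incomplete and formant ripple remains in the output, which is consistent with the ripple reported for the throat singing glottogram.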
Acoustic spectral analysis
The acoustic signal, including both modal voice and
throat singing, was analyzed with narrow-band spectra
and with the help of spectrum sections using the
Soundswell analysis program.11 Spectral analysis was
chosen to examine the harmonics and noise in the
acoustic signal of both modal voice and throat singing.
Journal of Voice, Vol. 15, No. 1, 2001
Kymography
Using a computer program developed at Huddinge
University Hospital,9 kymographic images of the
high-speed glottal and supraglottal vibrations were
created.12 A transversal line was placed across the
glottis on the high-speed image at the place of maximal ventricular fold vibration amplitude, where
the vocal fold vibrations could also be visualized. The program excluded all other lines from the picture frames
and added the chosen line from consecutive images to
form one continuous picture of vibrations over time.
The kymographic image was compared to the sound
signal and to the high-speed images, frame by frame.
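The line-stacking step described above can be sketched in a few lines of Python. This is only an illustration of the principle, not the High-Speed Tool Box program: the frame format (nested lists of pixel intensities), the `toy_frame` helper, and the choice to put time on the horizontal axis are assumptions made for the example.

```python
def kymogram(frames, line_index):
    """Build a kymographic image from a sequence of frames.

    frames: list of 2D images, each a list of pixel rows.
    line_index: index of the transversal line (pixel row) to keep.
    Returns a 2D list: position along the line runs vertically,
    time (frame number) runs horizontally.
    """
    lines = [frame[line_index] for frame in frames]   # keep one line per frame
    return [list(column) for column in zip(*lines)]   # stack the lines over time

def toy_frame(gap_width, width=5, height=3):
    """Hypothetical frame: bright tissue (9) with a dark central gap (0)."""
    centre = width // 2
    half = gap_width // 2
    row = [0 if gap_width > 0 and abs(x - centre) <= half else 9
           for x in range(width)]
    return [list(row) for _ in range(height)]

# Four consecutive toy frames: closed, slightly open, wide open, closing.
frames = [toy_frame(w) for w in (0, 1, 3, 1)]
kymo = kymogram(frames, line_index=1)
for row in kymo:
    print(row)
```

Each column of the result is the chosen line from one frame, so opening and closing of the gap appears as a dark shape that widens and narrows from left to right.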
RESULTS
The analysis was done for a portion of phonation at
which a good close-up image of the voice source was
acquired and the phonation was stable. This portion
contained a transition from what perceptually sounded like modal phonation into the throat singing
mode. The phonation portion was analyzed using
acoustic spectra, digital pictures, and kymography.
Perceptual evaluation
Perceptually, the modal phonation was sonorous
and slightly hyperfunctional/pressed. The perceived
pitch was estimated at approximately D3 (140 Hz).
After transition to throat singing the voice sounded
extremely low pitched, estimated to be one octave
lower around D2 (70 Hz). This mode was characterized by high intensity, sonority, and slight press.
Spectrum analysis
A narrow-band spectrogram of the recorded
phonation is shown in Figure 1. The first part of the
spectrogram (0–0.15 s) represents the end portion of
the previous throat singing phonation and will be disregarded in what follows.
Modal phonation starts just before 0.2 second and
continues to about 0.45 second. In this portion the
partials are well defined above the fundamental of
about 140 Hz. Around 0.45 second noise between
partials occurs. The transition continues to around 0.8
second, and in the section after 0.67 second the fundamental is slightly raised and the partials become
very unstable and difficult to distinguish. At about
0.8 second, as the throat singing mode is established,
the pattern becomes very regular with a new subharmonic
added below the fundamental and with subharmonics between the partials up to 1000 Hz. Figures 2 and 3 show spectrum sections from 0.2 and 0.8
second, respectively. When superimposed on each
other the second section (Figure 3) matches the first
section (Figure 2) with subharmonics added in every
gap between partials up to around 2500 Hz.
FIGURE 1. Narrow-band spectrogram of the analyzed phonation. For explanations see text.
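The relation between the two spectrum sections can be checked with simple arithmetic. The sketch below assumes nominal fundamentals of exactly 140 Hz for modal voice and 70 Hz for throat singing (the text gives these only as approximate values) and lists which components are added when the fundamental is halved.

```python
def partials(f0_hz, upper_hz):
    """All harmonics of f0_hz up to and including upper_hz."""
    return [k * f0_hz for k in range(1, int(upper_hz // f0_hz) + 1)]

# Nominal values; the measured fundamentals were only approximately these.
modal = partials(140, 1000)    # partials of the modal voice
throat = partials(70, 1000)    # partials after the octave drop

# Components present in throat singing but absent in modal voice.
added = [f for f in throat if f not in modal]

print(modal)
print(added)
```

Every modal partial is retained, and exactly one new component falls in each gap between neighbouring partials, plus one below the original fundamental. This is the pattern described for the superimposed sections.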
Flow inverse filtering
Flow glottograms of modal voice and throat
singing are shown in Figure 4. The glottal pulses in
modal phonation are regular with a clear closed
phase, as expected. The fundamental frequency (F0)
for the modal phonation during the inverse filtering
task was somewhat lower than that in the high-speed
recording, B2 (around 120 Hz). The perceptually rated F0 for the throat singing during the inverse filtering task was one octave lower, B1 (around 60 Hz).
The flow glottograms from the throat singing showed
an evident pattern in that every second airflow pulse
was lower in amplitude. The airflow pulses with lower amplitude were interpreted to be the result of
damping of the airflow brought about by the ventricular fold vibrations. Evidently, although it was not
complete, the damping was efficient enough to make
the signal sound as if F0 had been lowered one octave.
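Why damping every second pulse is heard as an octave drop can be illustrated with a synthetic pulse train. The sketch below is a toy model, not a reconstruction of the recorded flow signal: the half-sine pulse shape, the sampling rate, and the scale factor of 0.4 for every second pulse are all assumptions, and the slower flow decrease of the damped pulses is not modelled.

```python
import cmath
import math

FS = 7000        # assumed sampling rate in Hz
PERIOD = 50      # samples per vocal fold cycle, i.e. 140 Hz at FS = 7000
WIDTH = 20       # samples per flow pulse (half-sine), arbitrary

def pulse_train(n_pulses, second_pulse_scale):
    """Train of half-sine pulses; every second pulse is scaled down."""
    signal = []
    for p in range(n_pulses):
        scale = second_pulse_scale if p % 2 == 1 else 1.0
        for n in range(PERIOD):
            signal.append(scale * math.sin(math.pi * n / WIDTH) if n < WIDTH else 0.0)
    return signal

def dft_magnitude(signal, freq_hz):
    """Magnitude of a single DFT component at freq_hz, normalised by length."""
    acc = sum(x * cmath.exp(-2j * math.pi * freq_hz * n / FS)
              for n, x in enumerate(signal))
    return abs(acc) / len(signal)

modal = pulse_train(14, second_pulse_scale=1.0)    # all pulses equal
throat = pulse_train(14, second_pulse_scale=0.4)   # every second pulse damped

modal_70 = dft_magnitude(modal, 70.0)
throat_70 = dft_magnitude(throat, 70.0)
throat_140 = dft_magnitude(throat, 140.0)
print(modal_70, throat_70, throat_140)
```

With equal pulses there is no energy at 70 Hz. As soon as every second pulse is weakened, the waveform repeats only every two cycles and a component at half the original fundamental appears, even though the damping is incomplete and the 140 Hz component is still present.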
High-speed imaging
In modal phonation, normal vocal fold vibrations
with somewhat lower amplitude and a normal mucosal wave were noted. In addition, coexisting low-amplitude ventricular fold vibrations with incom-
FIGURE 2. Spectrum section of the modal voice from the interval 0.2–0.3
second of the spectrogram in Figure 1.
FIGURE 3. Spectrum section of throat singing from the interval 0.8–0.9 second of the spectrogram in Figure 1.
FIGURE 4. Inverse filtered signal: upper line for modal phonation, lower line for the throat singing mode. The y axis
shows transglottal airflow and the x axis shows time. During the throat singing every second pulse is much more shallow,
with a slow decrease in flow, compared to the other pulses, which look similar to the modal pulses.
[Figure 4 panel labels: MODAL VOICE; THROAT SINGING. Vertical axis: GLOTTAL AIRFLOW. Time scale bar: 0.05 sec.]
FIGURE 5. Consecutive frames from the high-speed images during two cycles of vocal fold vibration
and one full ventricular vibration. The frames were chosen from around 0.8 second in the narrow-band
spectrogram of Figure 1. Note that the vocal folds are still closed when the ventricular folds part (images
6–10). The following vocal fold opening and closing can be easily seen (images 11–17). A low-amplitude
ventricular fold vibration takes place during frames 14–21 approximately. The beginning of the next vocal
fold opening can be noted in image 25, but the rest is concealed by the closing ventricular folds, as is the
next vocal fold closure.
FIGURE 6. A kymographic image showing a section of bass-type throat singing. The line that was added from consecutive images to create the picture to the lower right is marked in the left picture. The sound signal is shown in the top image. The white
shadows coming in from top and bottom are the ventricular folds while the dark gray ones represent the true vocal folds. The
somewhat blurred light gray shadow between ventricular closures represents the low-amplitude oscillations of the ventricular
folds between closures. These are almost simultaneous with the vocal fold closure. The vocal folds vibrate with the same frequency as they did in modal phonation throughout the sequence, while the ventricular folds close during every second vocal fold
open phase concealing the next vocal fold closure. Note also that the sound excitation follows every second vocal fold closure
and not the ventricular closure.
REFERENCES
1. Pegg C. Mongolian conceptualizations of overtone singing (Hoomii). Br J Ethnomusicol. 1992;1:31-54.
2. Levin TC, Edgerton ME. The throat singers of Tuva. Sci Am. 1999;281:70-77.
3. Smith H, Stevens K, Tomlinson R. On an unusual mode of chanting by certain Tibetan lamas. J Acoust Soc Am. 1967;41:1262-1264.
4. Hertegård S, Lindestad P. Vocal fold vibrations studied during phonation with high-speed-video imaging. Phoniatr Logop Prog Rep. 1994;9:33-40.
5. Hammarberg B. High-speed observation of diplophonic phonation. In: Fujimura O, Hirano M, eds. Vocal Fold Physiology: Voice Quality Control. San Diego, Calif: Singular Publishing Group; 1995:243-245.
6. Larsson H, Hertegård S, Lindestad P, et al. Vocal fold vibrations studied with high-speed imaging, kymography and acoustic analysis. Phoniatr Logop Prog Rep. 1999;11:7-16.
7. Fuks L, Hammarberg B, Sundberg J. A self sustained vocal-ventricular phonation mode: acoustical, aerodynamic and glottographic evidences. TMH-QPSR. 1998;3:49-59.
8. Rothenberg M. A new inverse filtering technique for deriving the glottal air flow waveform during voicing. J Acoust Soc Am. 1973;53:1632-1645.
9. Larsson H. High-Speed Tool Box. Custom made program. Manual. 1998. Karolinska Institute, Department of Logopedics and Phoniatrics, Huddinge University Hospital, Huddinge, Sweden.
10. Holmberg E, Hillman RE, Perkell JS. Glottal airflow and transglottal air pressure measurements for male and female speakers in soft, normal, and loud voice. J Acoust Soc Am. 1988;84:511-529.
11. Ternström S. Soundswell-Signal Workstation Software. Manual version 3.4, 1996. Nyvalla DSP, Stockholm, Sweden.
12. Švec JG, Schutte H. Videokymography: high-speed line scanning of vocal fold vibration. J Voice. 1996;10:201-205.
13. Von Doersten PG, Izdebski K, Ross JC, et al. Ventricular dysphonia: a profile of 40 cases. Laryngoscope. 1992;102:1296-1301.
14. Blixt V, Pahlberg-Olsson J. The Role of Ventricular Fold Co-vibration in Ventricular Voice [master's thesis]. Dept of Logopedics and Phoniatrics, Huddinge University Hospital, Huddinge, Sweden; 1999.