Journal of Voice
Vol. 15, No. 1, pp. 78-85

© 2001 The Voice Foundation
Voice Source Characteristics in Mongolian Throat Singing Studied with High-Speed Imaging Technique, Acoustic Spectra, and Inverse Filtering
*Per-Åke Lindestad, *Maria Södersten, †Björn Merker, and ‡Svante Granqvist
*Karolinska Institute, Department of Logopedics and Phoniatrics, Huddinge University Hospital, Huddinge, Sweden;
†The Institute for Biomusicology, Mid Sweden University, Östersund, Sweden;
‡Department of Speech, Music, Hearing, Royal Institute of Technology, Stockholm, Sweden
Summary: Mongolian throat singing can be performed in different modes. In Mongolia, the bass type is called Kargyraa. The voice source in bass-type throat
singing was studied in one male singer. The subject alternated between modal
voice and the throat singing mode. Vocal fold vibrations were observed with
high-speed photography, using a computerized recording system. The spectral
characteristics of the sound signal were analyzed. Kymographic image data
were compared to the sound signal, and flow inverse filtering data from the same
singer were obtained on a separate occasion. It was found that the vocal folds
vibrated at the same frequency throughout both modes of singing. During throat
singing the ventricular folds vibrated with complete but short closures at half the
frequency of the true vocal folds, covering every second vocal fold closure. Kymographic data confirmed the findings. The spectrum contained added subharmonics compared to modal voice. In the inverse filtered signal the amplitude of
every second airflow pulse was considerably lowered. The ventricular folds appeared to modulate the sound by reducing the glottal flow of every other vocal
fold vibratory cycle.
Key words: Mongolian throat singing; Ventricular fold phonation; High-speed photography; Kymography; Flow inverse filtering.
Accepted for publication June 21, 2000.
Address correspondence and reprint requests to Per-Åke Lindestad, Department of Logopedics and Phoniatrics, Huddinge University Hospital, S-141 86 Stockholm, Sweden.
e-mail: per-ake.lindestad@logphon.hs.sll.se
This paper is a revised version of one presented at the CoMet session, Twenty-Third Congress of the International Association of Logopedics and Phoniatrics, Amsterdam, August 1998.
INTRODUCTION
Mongolian throat singing has become the common label for a group of different singing techniques that often include overtone singing. In Mongolia these techniques form part of traditional folk music and are collectively called Hoomii.1,2 Hoomii includes a very low-pitched bass-type singing technique in Mongolia called Kargyraa, also practiced in some Tibetan monasteries under the name of Dzo-ke. Dzo-ke and Kargyraa are sung at a low fundamental frequency of around 70 to 100 Hz and are performed with or without the enhancement of single overtones.
In 1967, Smith et al.3 postulated that this kind of singing is produced with double oscillators or asymmetrically vibrating vocal folds. They analyzed sound recordings of singing monks, using sonagrams. They found what they called odd harmonics as the subjects changed from ordinary voice to the bass-type singing. The same authors speculated
that the voice might be the result of an interaction
with subglottal resonances. Ventricular fold vibration
was not mentioned as a possible explanation for the
odd harmonics, although such vibration might play
a role in a double oscillator. In later studies it has
been shown with spectrograms that subharmonics
occur when switching from modal voice to throat
singing.2 This phenomenon was interpreted by the
authors as ventricular fold vibration at half the frequency of the vocal folds. However, until now no direct evidence for these interpretations, such as might
be provided by imaging techniques, has been available. Videostroboscopy gives quite reliable images of regular vibrations, but because it composes an apparent slow-motion image from samples taken over many successive cycles, it cannot show differences between individual cycles. A two-level voice source is therefore better studied with a high-speed system, which gives the time resolution needed for the study of individual vibratory cycles. The Kodak Ektapro (Kodak,
Rochester, NY) system for high-speed imaging has
been used at the Department of Logopedics and Phoniatrics at Huddinge University Hospital for 5 years.
Recordings on video and simultaneous recordings of
sound for acoustic analysis have been performed.4,5
Recently, a different system [Weinberger Speedcam
(Weinberger, Dietikon, Switzerland)] was used. This
provides higher resolution, improved possibilities for
digital storage, and better options for analyses of images and sound.6 In a recent study performed at our
department with this new equipment, Fuks et al.7 reported on self-sustained ventricular fold vibration in
different singing modes, one of which was similar
and possibly identical to Mongolian bass-type
singing. With high-speed imaging technique it could
be shown that the ventricular folds vibrated with half
the frequency of the vocal folds, probably damping
the sound signal of every second glottal pulse in this
mode of singing.
Inverse filtering8 of the oral airflow signal yields an estimate of the glottal volume velocity waveform and thereby allows indirect estimation of the vocal fold vibratory pattern. Applying this technique to a double voice source was expected to add valuable information for interpreting the activity of the voice source. The purpose of the present study was to
further analyze Mongolian throat singing by comparing the voice source characteristics to modal voice in
the same subject. The means for the analyses were
high-speed imaging technique, acoustic analysis, inverse filtering, and kymographic images derived
from the high-speed recordings.
MATERIAL AND METHOD

Subject
Bass-type throat singing was studied in one of the
authors (BM), a male Swedish musicologist who has
received expert instruction from both Tibetan and
Mongolian performers and who is currently a professional performer of throat singing.
Voice material
The subject sustained the back vowel [o:] while alternating between his normal bass voice (henceforth
called modal voice) and throat singing. Vowel, pitch, and loudness were chosen by the subject himself to produce optimal quality of throat singing. A section of sufficient technical quality that contained the transition between the two vocal modes was chosen for subsequent offline acoustic and image analysis.
High-speed and sound recordings
A high-speed camera (Speedcam 500+, Weinberger,
Dietikon, Switzerland) attached to a flexible laryngoscope [Olympus ENF P3 (Olympus, Japan)] was used.
A halogen light source [Storz 600 (Storz, Kreuzlingen, Switzerland)] was connected to the endoscope.
The camera picture rate was 1904 frames/s with a resolution of 256 × 64 pixels and a total recording time
of 1 second.9 The recordings took place in an ordinary
examination setting. During fiberoptic examination
with the camera running, the examiner captured the
last preceding second of phonation by pressing a foot pedal at a chosen moment.
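A quick calculation shows what the stated picture rate can resolve. The sketch below simply divides the picture rate by F0; the F0 values are the perceptual estimates reported in the Results (about 140 Hz in modal voice and about 70 Hz for the throat singing), used here only for illustration. One vocal fold cycle then spans roughly 13.6 frames, and one full two-level cycle roughly 27 frames, which agrees with the 28 images used for one complete cycle in Figure 5.

```python
PICTURE_RATE = 1904.0   # frames per second, as stated for the Speedcam recordings
RECORD_TIME = 1.0       # seconds captured per foot-pedal trigger

def frames_per_cycle(f0_hz, picture_rate=PICTURE_RATE):
    """Number of high-speed frames that fall within one vibratory cycle."""
    return picture_rate / f0_hz

total_frames = int(PICTURE_RATE * RECORD_TIME)
modal = frames_per_cycle(140.0)  # one vocal fold cycle at the perceived modal pitch
gated = frames_per_cycle(70.0)   # two vocal fold cycles = one ventricular closure cycle

print(total_frames, round(modal, 1), round(gated, 1))
```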
Simultaneously with the high-speed digital recordings of the vocal fold vibrations, the subject's voice was recorded into the computer using a microphone [Audiotechnica ATM31 (Audiotechnica, Japan)] at a distance of 30 cm from the subject's mouth. Simultaneous digital audiotape (DAT) recordings were made [Sony DTC-59ES (SONY, Japan)].
When the sound signal was compared to the high-speed images, it was corrected for the time delay caused by the distance that the sound had to travel from the vocal folds to the microphone.
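The size of this correction can be estimated from the acoustic path length. The sketch below assumes a vocal tract length of about 17 cm for an adult male and a speed of sound of 343 m/s; neither value is given in the text, and warm, humid air in the vocal tract would shorten the delay slightly. With the 30-cm microphone distance the delay comes to about 1.4 ms, or roughly 2.6 high-speed frames at 1904 frames/s, so the audio has to be advanced by that amount before it is lined up with the images. The helper names are hypothetical.

```python
SPEED_OF_SOUND = 343.0   # m/s, assumed (dry air at about 20 degrees C)
VOCAL_TRACT = 0.17       # m, assumed glottis-to-lips length for an adult male
MIC_DISTANCE = 0.30      # m, lips-to-microphone distance stated in the text
PICTURE_RATE = 1904.0    # high-speed frames per second

def acoustic_delay_s(path_m, c=SPEED_OF_SOUND):
    """Travel time of sound over a path of length path_m."""
    return path_m / c

def align_audio(samples, audio_rate_hz, delay_s):
    """Drop leading samples so that audio index 0 corresponds to the moment
    the sound left the glottis, i.e. advance the audio by delay_s."""
    shift = int(round(delay_s * audio_rate_hz))
    return samples[shift:]

delay = acoustic_delay_s(VOCAL_TRACT + MIC_DISTANCE)
print(f"{delay * 1e3:.2f} ms = {delay * PICTURE_RATE:.1f} frames")
```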
Perceptual evaluation
A perceptual evaluation regarding voice quality
and pitch was carried out by consensus of the authors. Both modal voice and throat singing were
judged from the DAT audio recordings made simultaneously with the high-speed recordings as well as
the inverse filtering recordings (see below).
Flow inverse filtering
Flow inverse filtering was performed on a separate
occasion using a face mask and the system of Glottal
Enterprises (Syracuse, NY).8 The mask was tightly
sealed to the subject's face to avoid air leakage.10 As was the case during the high-speed filming, the subject alternated between modal voice and throat singing.
The sustained vowel [a:] was chosen because of its high first formant frequency, which is necessary for a successful inverse filtering analysis; for this reason, the vowel [o:] could not be used for this recording.
The subject chose a pitch and loudness that were comfortable for throat singing. Based on perceptual evaluation (by consensus of the authors), the voice qualities of the modal voice and throat singing were considered the same during the inverse filtering as during the high-speed filming, and the results were therefore compared even though the recordings were not simultaneous.
Inverse filtering analysis
The frequencies and bandwidths of the antiresonances for the first and second formants were set manually by one of the authors (MS) on the basis of the modal phonation. The settings were kept the same for the throat singing. The inverse filtering was performed online, and the flow signal was recorded using the Soundswell (Stockholm, Sweden) program.11
The flow glottogram from the throat singing was less successfully analyzed than that from the modal phonation, since some ripple, probably from the first formant, remained. Presumably the subject changed his articulation slightly when he switched type of singing.
However, the analysis was considered satisfactory
for describing the main differences between the flow
glottograms. The flow signal was not calibrated. The
results of the flow glottograms were therefore described qualitatively.
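The principle of formant cancellation can be sketched in a few lines. This is not the Glottal Enterprises implementation: it simply models each antiresonance as a second-order FIR filter (a complex-conjugate zero pair) whose angle is set by the formant frequency and whose radius is set by the bandwidth, scaled to unit gain at 0 Hz so that the mean flow is preserved. The formant frequencies, bandwidths, sampling rate, and the half-wave rectified sine used as a stand-in flow pulse are hypothetical values chosen for illustration, not the settings of this study. In this idealized case the flow is recovered exactly; with real data, residual ripple like that noted for the throat singing remains whenever the antiresonances do not match the actual formants.

```python
import numpy as np

def antiresonance(freq_hz, bw_hz, fs_hz):
    """FIR coefficients [b0, b1, b2]: a zero pair at freq_hz with bandwidth
    bw_hz, scaled to unit gain at 0 Hz (preserves the mean flow level)."""
    r = np.exp(-np.pi * bw_hz / fs_hz)      # zero radius from bandwidth
    theta = 2.0 * np.pi * freq_hz / fs_hz   # zero angle from frequency
    b = np.array([1.0, -2.0 * r * np.cos(theta), r * r])
    return b / b.sum()

def inverse_filter(flow, formants, fs_hz):
    """Cancel each (frequency, bandwidth) formant in turn with an antiresonance."""
    out = np.asarray(flow, dtype=float)
    for freq, bw in formants:
        b = antiresonance(freq, bw, fs_hz)
        out = np.convolve(out, b)[: len(out)]  # causal FIR, same length
    return out

def resonate(x, freq_hz, bw_hz, fs_hz):
    """All-pole resonator that is the exact inverse of antiresonance();
    used here only to synthesize a test signal."""
    b = antiresonance(freq_hz, bw_hz, fs_hz)
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = x[n]
        if n >= 1:
            acc -= b[1] * y[n - 1]
        if n >= 2:
            acc -= b[2] * y[n - 2]
        y[n] = acc / b[0]
    return y

# Hypothetical example: a crude glottal-like flow at 120 Hz through two formants.
fs = 10000.0
formants = [(700.0, 80.0), (1100.0, 100.0)]  # hypothetical F1, F2 and bandwidths
t = np.arange(int(round(0.05 * fs))) / fs
flow = np.clip(np.sin(2 * np.pi * 120.0 * t), 0.0, None)  # half-wave rectified sine
oral = flow
for f, bw in formants:
    oral = resonate(oral, f, bw, fs)
recovered = inverse_filter(oral, formants, fs)
print(np.max(np.abs(recovered - flow)))
```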
Acoustic spectral analysis
The acoustic signal, including both modal voice and
throat singing, was analyzed with narrow-band spectra
and with the help of spectrum sections using the
Soundswell analysis program.11 Spectral analysis was
chosen to examine the harmonics and noise in the
acoustic signal of both modal voice and throat singing.
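A narrow-band spectrum section of this kind can be sketched as a windowed FFT. The essential choice is a long analysis window: to separate components spaced only about 70 Hz apart (subharmonics between the partials of a 140 Hz fundamental), the window must span several 70-Hz periods. The 100-ms Hann window, the sampling rate, the synthetic test signal, and the helper names below are assumptions for illustration, not Soundswell's settings.

```python
import numpy as np

def spectrum_section(x, fs_hz, start_s, window_s=0.1):
    """Magnitude spectrum (dB) of a Hann-windowed excerpt starting at start_s."""
    i0 = int(round(start_s * fs_hz))
    n = int(round(window_s * fs_hz))
    frame = np.asarray(x[i0:i0 + n], dtype=float) * np.hanning(n)
    mag = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs_hz)
    return freqs, 20.0 * np.log10(mag + 1e-12)

def strongest_peak(freqs, db, fmin, fmax):
    """Frequency of the largest spectral value between fmin and fmax."""
    band = (freqs >= fmin) & (freqs <= fmax)
    return freqs[band][np.argmax(db[band])]

# Hypothetical test signal: a 140 Hz tone with a weaker 70 Hz component.
fs = 16000.0
t = np.arange(int(round(0.3 * fs))) / fs
x = np.sin(2 * np.pi * 140 * t) + 0.3 * np.sin(2 * np.pi * 70 * t)
freqs, db = spectrum_section(x, fs, start_s=0.1)
print(strongest_peak(freqs, db, 50, 100), strongest_peak(freqs, db, 120, 160))
```

With a 100-ms window the frequency bins are 10 Hz apart, fine enough to show a component midway between partials 140 Hz apart.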
Kymography
Using a computer program developed at Huddinge
University Hospital,9 kymographic images of the
high-speed glottal and supraglottal vibrations were
created.12 A transversal line was placed across the
glottis on the high-speed image at the place of maximal ventricular fold vibration amplitude, where the vocal fold vibrations could also be visualized. The program excluded all other lines from the picture frames
and added the chosen line from consecutive images to
form one continuous picture of vibrations over time.
The kymographic image was compared to the sound
signal and to the high-speed images, frame by frame.
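The line-scan principle behind the kymographic images can be expressed compactly: take the same line of pixels from every frame and stack the lines in time order. The sketch below assumes the recording is available as a NumPy array of shape (frames, height, width) and that the transversal line is a horizontal image row; the array layout, row index, and helper names are assumptions, and the Huddinge program may differ (for example, by allowing oblique lines). The result has time running down the rows; transposing it gives time running left to right, as in Figure 6.

```python
import numpy as np

def kymogram(frames, row):
    """Build a kymographic image from a high-speed recording.

    frames: array of shape (n_frames, height, width), one grayscale image per frame.
    row:    index of the transversal line placed across the glottis.
    Returns shape (n_frames, width): one row per frame, in time order.
    """
    frames = np.asarray(frames)
    if not 0 <= row < frames.shape[1]:
        raise ValueError("row lies outside the image")
    return frames[:, row, :].copy()

def frame_times(n_frames, picture_rate=1904.0):
    """Time stamp (s) of each kymogram row, for comparison with the sound signal."""
    return np.arange(n_frames) / picture_rate

# Hypothetical data: 100 frames of 64 x 256 pixels.
rng = np.random.default_rng(0)
video = rng.integers(0, 256, size=(100, 64, 256), dtype=np.uint8)
kymo = kymogram(video, row=32)
print(kymo.shape, frame_times(len(kymo))[-1])
```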
RESULTS
The analysis was done for a portion of phonation in
which a good close-up image of the voice source was
acquired and the phonation was stable. This portion
contained a transition from what perceptually sounded like modal phonation into the throat singing
mode. The phonation portion was analyzed using
acoustic spectra, digital pictures, and kymography.
Perceptual evaluation
Perceptually, the modal phonation was sonorous
and slightly hyperfunctional/pressed. The perceived
pitch was estimated at approximately D3 (140 Hz).
After the transition to throat singing, the voice sounded
extremely low pitched, estimated to be one octave
lower, around D2 (70 Hz). This mode was characterized by high intensity, sonority, and slight press.
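The note names above can be read in scientific pitch notation, where A4 = 440 Hz and each octave number starts at C. The helper below is a sketch under that assumption (equal temperament); it gives D3 as about 147 Hz and D2 as about 73 Hz, so the perceived drop of one octave corresponds to halving F0.

```python
import re

SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def note_to_hz(name, a4=440.0):
    """Equal-tempered frequency of a note such as 'D3' or 'F#2'
    in scientific pitch notation (octave numbers start at C)."""
    m = re.fullmatch(r"([A-G])([#b]?)(-?\d+)", name)
    if m is None:
        raise ValueError(f"not a note name: {name!r}")
    letter, accidental, octave = m.groups()
    semitone = SEMITONES[letter] + {"#": 1, "b": -1, "": 0}[accidental]
    midi = 12 * (int(octave) + 1) + semitone  # MIDI note number; A4 = 69
    return a4 * 2.0 ** ((midi - 69) / 12.0)

for note in ("D3", "D2"):
    print(note, round(note_to_hz(note), 1))
```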
Spectrum analysis
A narrow-band spectrogram of the recorded
phonation is shown in Figure 1. The first part of the
spectrogram (0-0.15 s) represents the end portion of
the previous throat singing phonation and will be disregarded in the following.
Modal phonation starts just before 0.2 second and
continues to about 0.45 second. In this portion the
partials are well defined above the fundamental of
about 140 Hz. Around 0.45 second, noise appears between the partials. The transition continues to around 0.8 second, and in the section after 0.67 second the fundamental is slightly raised and the partials become very unstable and difficult to distinguish. At about 0.8 second, as the throat singing mode is established, the pattern becomes very regular, with a new subharmonic added below the fundamental and with subharmonics between the partials up to 1000 Hz. Figures 2 and 3 show spectrum sections from 0.2 and 0.8 second, respectively. When superimposed on each other, the second section (Figure 3) matches the first section (Figure 2), with subharmonics added in every gap between partials up to around 2500 Hz.
FIGURE 1. Narrow-band spectrogram of the analyzed phonation. For explanations see text.
Flow inverse filtering
Flow glottograms of modal voice and throat
singing are shown in Figure 4. The glottal pulses in
modal phonation are regular with a clear closed
phase, as expected. The fundamental frequency (F0)
for the modal phonation during the inverse filtering
task was somewhat lower than that in the high-speed
recording, B2 (around 120 Hz). The perceptually rated F0 for the throat singing during the inverse filtering task was one octave lower, B1 (around 60 Hz).
The flow glottograms from the throat singing showed
a clear pattern in which every second airflow pulse
was lower in amplitude. The airflow pulses with lower amplitude were interpreted to be the result of
damping of the airflow brought about by the ventricular fold vibrations. Evidently, although it was not
complete, the damping was efficient enough to make
the signal sound as if F0 had been lowered one octave.
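Why attenuating every second pulse can make the signal sound an octave lower is easiest to see in an idealized model: a train of identical excitation pulses at F0 in which every second pulse is scaled by a gain g between 0 and 1. Such a train repeats only every two periods, so its spectrum has components at all multiples of F0/2. The components at even multiples (the original harmonics) are proportional to 1 + g, and those at odd multiples (the new subharmonics) to 1 - g, giving a subharmonic-to-harmonic amplitude ratio of (1 - g)/(1 + g). Complete gating (g = 0) yields equally strong components at every multiple of F0/2; partial damping, as in these flow glottograms, yields weaker but clearly present subharmonics. The model, the value g = 0.5, and the sampling rate are illustrative assumptions, not a fit to the data.

```python
import numpy as np

def gated_pulse_train(f0_hz, fs_hz, n_pulses, gain):
    """Unit impulses at F0 with every second impulse scaled by gain (idealized gating)."""
    period = int(round(fs_hz / f0_hz))
    x = np.zeros(period * n_pulses)
    x[::period] = 1.0
    x[period::2 * period] = gain  # the damped (covered) pulses
    return x

def component(x, fs_hz, freq_hz):
    """Magnitude of the DFT bin nearest to freq_hz."""
    spec = np.abs(np.fft.rfft(x))
    k = int(round(freq_hz * len(x) / fs_hz))
    return spec[k]

fs, f0, g = 14000.0, 140.0, 0.5
x = gated_pulse_train(f0, fs, n_pulses=40, gain=g)
sub = component(x, fs, f0 / 2)   # new component one octave below F0
harm = component(x, fs, f0)      # original fundamental
print(round(sub / harm, 3), round((1 - g) / (1 + g), 3))
```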
High-speed imaging
In modal phonation, normal vocal fold vibrations
with somewhat lower amplitude and a normal mucosal wave were noted. In addition, coexisting low-amplitude ventricular fold vibrations with incomplete closures and with the same frequency and in the same phase as the vocal fold vibrations could be observed. Perceptually, this phonation was slightly hyperfunctional. At the onset of throat singing, the ventricular fold vibrations became increasingly irregular.
During a period of approximately 0.12 second (corresponding to the interval between 0.67 and 0.79 second in the narrow-band spectrogram) the irregularity
persisted until the vibrations suddenly assumed a
very regular pattern with the ventricular folds vibrating at the same frequency as the vocal folds but closing only every second vibration (Figures 5 and 6). In
Figure 5 a complete cycle of two vocal fold closures
and one ventricular fold closure is shown divided into 28 images. The two levels of vibration can readily
be identified.
Closure during the ventricular vibrations was complete along the anterior two-thirds where vibration
amplitude was large. There was a chink in the posterior third. The opening phase of the ventricular folds
appeared faster than the closing phase. The vocal
folds on the other hand showed a fast closing phase
and a long and complete closed phase. Closure of the
ventricular folds did not coincide with vocal fold closure but preceded it. Thus the open phase and following closing phase of every second vocal fold vibratory cycle were concealed. At ventricular fold
opening the vocal folds remained closed for a moment before opening. The ventricular fold vibrations
between closures were of low amplitude and simultaneous with the unconcealed vocal fold closures. They also appeared somewhat shorter than those with complete closure (Figures 5 and 6).
FIGURE 2. Spectrum section of the modal voice from the interval 0.2-0.3 second of the spectrogram in Figure 1.
FIGURE 3. Spectrum section of throat singing from the interval 0.8-0.9 second of the spectrogram in Figure 1.
FIGURE 4. Inverse filtered signal (panels: glottal airflow, modal voice and throat singing; time scale bar 0.05 sec). Upper line, modal phonation; lower line, throat singing mode. The y axis shows transglottal airflow and the x axis shows time. During the throat singing every second pulse is much shallower, with a slow decrease in flow compared with the other pulses, which look similar to the modal pulses.
FIGURE 5. Consecutive frames from the high-speed images during two cycles of vocal fold vibration and one full ventricular vibration. The frames were chosen from around 0.8 second in the narrow-band spectrogram of Figure 1. Note that the vocal folds are still closed when the ventricular folds part (images 6-10). The following vocal fold opening and closing can be easily seen (images 11-17). A low-amplitude ventricular fold vibration takes place during frames 14-21, approximately. The beginning of the next vocal fold opening can be noted in image 25, but the rest is concealed by the closing ventricular folds, as is the next vocal fold closure.
FIGURE 6. A kymographic image showing a section of bass-type throat singing. The line that was added from consecutive images to create the picture to the lower right is marked in the left picture. The sound signal is shown in the top image. The white shadows coming in from top and bottom are the ventricular folds, while the dark gray ones represent the true vocal folds. The somewhat blurred light gray shadow between ventricular closures represents the low-amplitude oscillations of the ventricular folds between closures. These are almost simultaneous with the vocal fold closure. The vocal folds vibrate with the same frequency as they did in modal phonation throughout the sequence, while the ventricular folds close during every second vocal fold open phase, concealing the next vocal fold closure. Note also that the sound excitation follows every second vocal fold closure and not the ventricular closure.
Figure 6 shows a phonation section from the bass-type throat singing that has been transformed into a kymographic image. It clearly shows that the vibrations of the ventricular folds coincide with the open phase of every second glottal vibration and that the closing phase of the following vocal fold cycle is concealed by the ventricular folds. Moreover, a comparison of the sound signal with the kymogram shows that the sound excitation follows the vocal fold closure that is not concealed.
DISCUSSION
Mongolian throat singing sounds very different from most other known modes of singing, with its high intensity at an extremely low pitch and its much more restricted F0 range than that of modal
register. Therefore, it appeared plausible that mechanisms other than pure glottal phonation would be involved. In the present study it was again confirmed
that the ventricular folds play an important role in
this mode of sound production.
True ventricular fold phonation is very rare in persons with normal vocal folds, but may occur in patients as a compensatory phonation when the natural voice source does not work, for example, after cordectomy. Moreover, covibrating ventricular folds
are often noted in patients with chronic laryngitis, although it is not clear in what way they affect voice
quality.13,14
Several findings in our study indicate that the ventricular folds are responsible for the change of sound
in throat singing compared to modal phonation.
Since the perceived F0 of the throat singing mode was half that of the modal voice, while the vocal folds vibrated with the same frequency throughout the whole sequence, it seems reasonable to explain the 50% frequency reduction as the effect of an attenuation of every second glottal pulse by the covering ventricular folds. Nevertheless, the whole system of vibrating structures at two levels must be regarded as the sound source. It could be
clearly seen in the high-speed film and the kymography derived from this film that every second vocal
fold closure was concealed by the adducted ventricular folds. This finding was also reflected in the glottal airflow data obtained from inverse filtering. Every
second glottal airflow pulse was markedly attenuated
in throat singing. However, the damping in our flow glottograms was less pronounced than that shown in the glottal airflow data of Fuks et al.,7 who examined one subject using a similar (or possibly identical) singing technique that they called vocal-ventricular mode (VVM). It nevertheless seems reasonable to assume that the two singers used the same kind of voice production, with slight differences in technical performance.
The present data suggest that the voice in bass-type
throat singing is the product of a very complex voice
source, consisting of two vibrating levels. Whether
sound is elicited from only one level or both is still
impossible to say with certainty. What seems to be
true, however, is that the ventricular folds act to create an extremely low-pitched glottal phonation
through the damping of every second glottal excitation. This mode of sound production might be termed
ventricular gating.
CONCLUSIONS
The bass type of throat singing called Kargyraa is
produced with a combination of vocal fold and ventricular fold vibrations, the latter at half the frequency of the former and in a different phase. It seems
plausible that the ventricular folds modulate the
sound by attenuating every second glottal pulse. This
combined voice source produces a very dense spectrum of overtones suitable for overtone enhancement.
Acknowledgments: This study was supported by a
grant from the Swedish National Medical Research Council.
Thanks are due to Britta Hammarberg and Eva B. Holmberg for constructive criticism of the manuscript.
REFERENCES
1. Pegg C. Mongolian conceptualizations of overtone singing
(Hoomii). Br J Ethnomusicol. 1992;1:31-54.
2. Levin TC, Edgerton ME. The throat singers of Tuva. Sci Am.
1999;281:70-77.
3. Smith H, Stevens K, Tomlinson R. On an unusual mode of
chanting by certain Tibetan lamas. J Acoust Soc Am.
1967;41:1262-64.
4. Hertegård S, Lindestad P. Vocal fold vibrations studied
during phonation with high-speed-video imaging. Phoniatr
Logop Prog Rep. 1994;9:33-40.
5. Hammarberg B. High-speed observation of diplophonic
phonation. In: Fujimura O, Hirano M, eds. Vocal Fold Physiology: Voice Quality Control. San Diego, Calif: Singular
Publishing Group; 1995:243-245.
6. Larsson H, Hertegård S, Lindestad P, et al. Vocal fold vibrations studied with high-speed imaging, kymography and
acoustic analysis. Phoniatr Logop Prog Rep. 1999;11:7-16.
7. Fuks L, Hammarberg B, Sundberg J. A self-sustained
vocal-ventricular phonation mode: acoustical, aerodynamic
and glottographic evidences. TMH-QPSR. 1998;3:49-59.
8. Rothenberg M. A new inverse filtering technique for deriving the glottal air flow waveform during voicing. J Acoust
Soc Am. 1973;53:1632-1645.
9. Larsson H. High-Speed Tool Box. Custom made program.
Manual. 1998. Karolinska Institute, Department of Logopedics and Phoniatrics, Huddinge University Hospital, Huddinge, Sweden.
10. Holmberg E, Hillman RE, Perkell JS. Glottal airflow and
transglottal air pressure measurements for male and female
speakers in soft normal and loud voice. J Acoust Soc Am.
1988;84:511-529.
11. Ternström S. Soundswell Signal Workstation Software. Manual version 3.4, 1996. Nyvalla DSP, Stockholm, Sweden.
12. Svec JG, Schutte H. Videokymography: high-speed line
scanning of vocal fold vibration. J Voice. 1996;10:201-205.
13. Von Doersten PG, Izdebski K, Ross JC, et al. Ventricular
dysphonia: a profile of 40 cases. Laryngoscope. 1992;102:
1296-1301.
14. Blixt V, Pahlberg-Olsson J. The Role of Ventricular Fold
Co-vibration in Ventricular Voice [masters thesis]. Dept of
Logopedics and Phoniatrics, Huddinge University Hospital,
Huddinge, Sweden; 1999.
