
Nonlinear source–filter coupling in phonation: Theory a)

Ingo R. Titze b)
Department of Speech Pathology and Audiology, The University of Iowa, Iowa City, Iowa 52242 and
National Center for Voice and Speech, The Denver Center for the Performing Arts, Denver, Colorado 80204

(Received 2 August 2007; revised 10 December 2007; accepted 14 December 2007)


A theory of interaction between the source of sound in phonation and the vocal tract filter is
developed. The degree of interaction is controlled by the cross-sectional area of the laryngeal
vestibule (epilarynx tube), which raises the inertive reactance of the supraglottal vocal tract. Both
subglottal and supraglottal reactances can enhance the driving pressures of the vocal folds and the
glottal flow, thereby increasing the energy level at the source. The theory predicts that instabilities
in vibration modes may occur when harmonics pass through formants during pitch or vowel
changes. Unlike in most musical instruments (e.g., woodwinds and brasses), a stable harmonic
source spectrum is not obtained by tuning harmonics to vocal tract resonances, but rather by placing
harmonics into favorable reactance regions. This allows for positive reinforcement of the harmonics
by supraglottal inertive reactance (and to a lesser degree by subglottal compliant reactance) without
the risk of instability. The traditional linear source–filter theory is encumbered with possible
inconsistencies in the glottal flow spectrum, which is shown to be influenced by interaction. In
addition, the linear theory does not predict bifurcations in the dynamical behavior of vocal fold
vibration due to acoustic loading by the vocal tract. © 2008 Acoustical Society of America.
[DOI: 10.1121/1.2832337]
PACS number(s): 43.70.Bk, 43.75.Rs [BHS] Pages: 2733–2749

a) Readers are referred to [J. Acoust. Soc. Am. 123 (4), 1902–1915 (2008)] for a paper which reports on human subjects in this study.
b) Electronic mail: ingo-titze@uiowa.edu

I. INTRODUCTION

The acoustic features of all vowel productions, and many consonant productions, have generally been described by a linear source–filter theory (Chiba and Kajiyama, 1958; Fant, 1960; Flanagan, 1972; Stevens, 1999). This linear theory is based on the assumption that the source of sound for vowels and voiced consonants (pulsatile airflow in the larynx) is independent of the filter, an acoustic resonator formed by airways known as the vocal tract. It is traditionally assumed that the source–filter combination can be characterized by mathematical convolution of source and filter functions in the time domain or by multiplication of Fourier-transformed source and filter functions in the frequency domain. Time domain convolution and frequency domain multiplication are linear mathematical operations that carry with them the superposition assumption; that is, the output of any combination of inputs is the linear combination of all the individual outputs. Stated another way, the output of any and all input frequencies can at most be an amplitude and phase changed version of these input frequencies. The filter cannot influence the source to produce new frequencies or change the overall energy level of the source. We will show here that this assumption is generally not valid, but under certain conditions is an appropriate simplification.

Many aspects of speech production have been successfully described by a linear source–filter theory. In particular, linear prediction of speech (Markel and Gray, 1976; Atal and Schroeder, 1978) has been the flagship of speech analysis, synthesis, and processing for over 30 years. But it has been recognized all along, however, that the linear theory is more applicable to male speech than female and child speech (e.g., Klatt and Klatt, 1990). As long as the dominant source frequencies lie well below the formant frequencies of the vocal tract, the source is influenced only in simple ways by the filter, mainly in terms of glottal flow pulse skewing and pulse ripple. This mild interaction occurs for most male adult speech, but greater interaction occurs for female and child speech, and even more for singing, where the fundamental frequency range spans more than two octaves and the lower partials of the source cross the formants. In these more intense interactions, bifurcations in the dynamics of vocal fold vibration can occur that may generate sudden F0 jumps, subharmonic frequencies, or changes in the overall energy level at the source. The earliest computer simulations of source–filter coupling (Flanagan, 1968; Flanagan and Landgraf, 1968; Ishizaka and Flanagan, 1972) showed the interactivity clearly. The one-mass vocal fold model did not self-sustain oscillation without a vocal tract (a highly exaggerated coupling effect), and the two-mass model showed sudden frequency discontinuities when F0 passed through the first formant frequency F1. In human phonation, as our companion paper shows, the frequency discontinuities are clearly observable, but to a lesser extent than in these earlier models.

The purpose of this paper is to elucidate the underlying mechanisms of source–filter interaction, both with simple analytical models and with a highly sophisticated computational model. Specific questions of interest are: (1) what is the primary parameter that regulates the degree of interaction, (2) can new frequencies and greater output power be produced with increased interaction, (3) are there regions of

J. Acoust. Soc. Am. 123 (5), May 2008 0001-4966/2008/123(5)/2733/17/$23.00 © 2008 Acoustical Society of America 2733

Downloaded 19 Jun 2013 to 131.104.62.10. Redistribution subject to ASA license or copyright; see http://asadl.org/terms
harmonic stability and instability that can be exploited by a vocalist, (4) are sudden F0 discontinuities (reported in a companion paper on human subjects) always triggered by F0 interactions with F1, the first formant frequency, or can higher harmonic-formant interactions contribute as well, and (5) how does the subglottal system contribute to nonlinear coupling differently from the supraglottal system?

It is hypothesized that humans (and perhaps many animals) have the ability to operate their source–filter system with either linear or nonlinear coupling. One way to express the degree of coupling is through the relative impedances of the source and filter. For linear source–filter coupling, the source impedance (transglottal pressure divided by glottal flow) is kept much higher than the input impedance to the vocal tract (vocal tract input pressure divided by the airflow into the vocal tract). This linear coupling is accomplished by adducting the vocal folds firmly and widening the epilarynx tube (a normally narrow region of the vocal tract above the vocal folds also known as the laryngeal vestibule). The glottal flow is then determined strictly by aerodynamics, while acoustic pressures above and below the glottis have little influence on either the transglottal pressure (which drives the glottal flow) or the intraglottal pressure (which drives the vocal folds). For nonlinear source–filter coupling, the glottal impedance is adjusted to be comparable to the vocal tract input impedance, making the glottal flow highly dependent on acoustic pressures in the vocal tracts (above and below the glottis). This is accomplished by setting specific adduction levels of the vocal folds that match a narrower epilarynx tube. Evidence of nonlinear coupling is the production of new frequencies in the form of distortion products, lowering of the oscillation threshold pressure (the Hopf bifurcation), production of subharmonics or modulation frequencies, sudden F0 jumps, or chaotic vibrations, as either vowel or F0 are changed (companion paper).

In many attempts to model the glottal airflow with explicit mathematical formulas (e.g., Rosenberg, 1971; Fant et al., 1985; Fant and Lin, 1987), nonlinear source–filter coupling in the form of distortion products had already been introduced implicitly by making the glottal flow pulse shape different from the glottal area pulse shape. Usually, the peak of the flow pulse was delayed with respect to the area pulse shape. But such a delay (referred to as flow pulse skewing) cannot be justified if only quasisteady aerodynamic calculations are carried out (i.e., linear source–filter coupling). Rothenberg (1981) showed that the peak delay of the glottal flow pulse is a result of low-frequency source–filter interaction. Zhao et al. (2002) make a case for intraglottal pressure skewing for different glottal shapes based on an aerodynamic treatment that involves flow separation and vortex shedding, but the effect on glottal flow is likely to be small in comparison to the vocal tract loading effect, which has recently been further explored by this author (Titze, 2001; 2004a; b; 2006a). There is now strong evidence that glottal flow pulse skewing always involves the vocal tract, explicitly or implicitly. Constructing a glottal flow pulse shape without consideration of the vocal tract load can lead to inconsistencies in combining the source with the filter.

Source–filter interactions that involve changes in vocal fold vibration have been demonstrated by several investigators. Sudden changes in vocal fold vibration can be triggered by vocal tract length changes (Hatzikirou et al., 2006), a good example of source–filter interaction. Further observations about source–filter coupling were reported by Švec et al. (1999) on human subjects and excised larynges, Mergell and Herzel (1997) on a female subject, Miller and Schutte (2005) on singers, Neumann et al. (2005) on male opera singers, Zhang et al. (2006a,b) on a physically constructed model, Zañartu et al. (2007) with a computer simulation model, and Jiang and Tao (2007) with analytical mathematical methods. Some of these studies will be referred to in more detail later. In singing, vowel modifications (e.g., changing /u/ to /U/ or /i/ to /I/) are used routinely to strengthen a vowel on a certain pitch (Appelman, 1967; Coffin, 1987). Entire singing styles (operatic, musical theatre, yodeling) are based on the concept that certain vowels and voice qualities work best with certain pitches, a concept that would have no explanation if the source–filter system were linear. The entire voice register terminology is based on observed phenomena related to interaction within the source–filter building blocks, which includes the subglottal system (Titze, 2000; Chap. 10). Vocal pedagogues who invented terms like chest voice and head voice were not so naive as to suggest that the source of sound moves from location to location, but rather that interactions with certain parts of the airway are stronger with certain source–filter adjustments and lead to special sensations along the airway. The role of the subglottal system for chest voice was implicated years ago by Van Den Berg (1957) and Vennard (1967). Chest voice production has both a glottal feature (a relatively long closed phase) and significant acoustic coupling to the trachea. For head voice, there appears to be more of an interaction with the supraglottal tract.

In an accompanying paper (Titze et al., 2008), a primary objective is to differentiate purely source-generated bifurcations, including F0 jumps, from vocal tract induced bifurcations. Three vocal exercises are designed in the accompanying paper to deliberately cross the fundamental frequency with the first formant frequency and to observe the resultant nonlinear effects in nine normal males and nine normal females. It is hypothesized that crossing F0 with F1 changes the acoustic load dramatically and that this crossing can destabilize vocal fold vibration. The main goal is to determine the proportion of irregularities that are due to nonlinear source–tract interactions. Expected manifestations of nonlinearity are sudden pitch jumps, subharmonic generation, or chaotic vocal fold vibration. Results indicate that the most frequent bifurcation is a sudden F0 jump, as predicted by Fletcher (1993) for pressure-controlled valves in gas flows. The objective in this paper is to provide a theoretical framework for the bifurcation phenomena in vocal fold vibration with a nonlinear source–filter construct.

II. INTERACTION BASED ON VOCAL TRACT REACTANCE

Because the vocal tract is relatively short in comparison to a wavelength at typical speech fundamental frequencies

FIG. 1. (Color online) Harmonic frequency generation by source–filter interaction; (top left) vocal tract shape; (top right) reactance curves, thin solid line for supraglottal, dashed line for subglottal, and thick solid line for combined; (middle left) sinusoidal glottal area function; (middle right) spectrum of glottal area; (bottom left) glottal flow; (bottom right) spectrum of glottal flow.
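
The sign pattern of the supraglottal reactance curve in Fig. 1 (top right) can be illustrated with a much simpler model than the cascade transmission-line matrices used for the figure. The following is a minimal sketch, assuming a lossless uniform tube with an ideal open end, for which the input reactance is (ρc/A) tan(2πfL/c). The tube length of 17.5 cm, the air density, and the sound speed are assumed values chosen only so that the first resonance falls near 500 Hz; the function names are hypothetical, and the sketch ignores losses, radiation impedance, and the subglottal system.

```python
import math

RHO = 1.14   # air density in kg/m^3 (assumed value)
C = 350.0    # sound speed in m/s (assumed value)


def tube_reactance(freq_hz, length_m, area_m2):
    """Input reactance (Pa*s/m^3) of a lossless uniform tube with an ideal open end."""
    k = 2.0 * math.pi * freq_hz / C              # wavenumber
    return (RHO * C / area_m2) * math.tan(k * length_m)


def first_resonance(length_m):
    """Quarter-wave resonance, where the reactance flips from inertive to compliant."""
    return C / (4.0 * length_m)


if __name__ == "__main__":
    length, area = 0.175, 3.0e-4                 # 17.5 cm, 3 cm^2 (assumed geometry)
    print(f"first resonance: {first_resonance(length):.0f} Hz")
    for f in (100, 250, 400, 600, 750, 900):
        x = tube_reactance(f, length, area)
        kind = "inertive" if x > 0 else "compliant"
        print(f"{f:5d} Hz  X = {x: .3e}  ({kind})")
```

Under these assumptions the reactance is positive below the quarter-wave resonance and negative between the first resonance and the next zero crossing, which is the qualitative behavior described for the supraglottal curve.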

(e.g., at F0 = 200 Hz the vocal tract contains less than 1/8 of a wavelength), and because a speaker or singer wishes to convey all the phonetic variations of a spoken language, the length and shape of the vocal tract cannot be adjusted to resonate many of the source frequencies simultaneously. Thus, unlike in most musical instruments, for which the length and shape of the horn or bore is carefully designed to resonate the dominant source frequencies simultaneously, lining up source frequencies with vocal tract filter resonances is highly selective and rare in human phonation. Apparent “formant-harmonic tuning” occurs in high soprano singing (Sundberg, 1977; Joliveau et al., 2004), but close inspection of the data reveals that F0 is usually slightly less than the formant frequency F1. In some cases, at very high F0, oscillation occurs for F0 > F1. We will show that this is more likely for a falsetto-like vibration regime. Exact tuning of F1 with a harmonic seems to occur only in so-called overtone singing, where a high frequency harmonic is reinforced by a formant that is tuned precisely to its frequency (Rachele, 1996).

For low-pitched speech or singing, the dominant source harmonics (typically F0 through 3F0) are below the first resonance (formant) frequency F1 of the vocal tract. For example, a bass or baritone speaking or singing a note G2 (98 Hz) will reach the first formant of an /i/ or a /u/ vowel only with the third harmonic. For an /a/ vowel, source harmonics higher than the seventh are needed to reach the first formant. Because these higher harmonics often do not have a great influence on the nature of vocal fold vibration, their interaction with a formant is perceived as a vowel or voice quality characteristic, not a source change. But the flow pulse may nevertheless be influenced by higher harmonic source–filter interaction, as will be shown next.

A. Level 1 interaction: Flow pulse dependency on the subglottal and supraglottal vocal tract pressures

A first level of interaction is described in which vocal fold vibration is not significantly disturbed by oscillating pressures above or below the glottis, but glottal flow is. This level occurs in all speech and is therefore worthy of special consideration.

What is common about harmonics whose frequencies are less than F1 is that they all experience positive (inertive) reactance from the vocal tract.¹ Figure 1 (top left) shows a uniform vocal tract (subglottal and supraglottal) separated by the glottis. The top right panel shows the reactance curves (subglottal as a dashed line, supraglottal as a thin solid line, and combined reactance as a thick solid line) up to 1500 Hz. The reactance curves were calculated with cascade transmission line matrices as originally outlined by Sondhi and Schroeter (1987) and further developed by Story et al. (2000). From 0 to 500 Hz, the supraglottal reactance is positive (inertive), whereas from 500 to 1000 Hz it is negative (compliant). The subglottal reactance stays inertive up to 600 Hz for this configuration.

Inertive reactance has been shown to skew the flow pulse (delay its peak relative to that of the glottal area), whether it is subglottal (X1) or supraglottal (X2). For a review of this flow pulse skewing, see Rothenberg (1981), Fant (1986), Fant and Lin (1987), or Titze (2006a). Note the delay in the peak of Ug in Fig. 1 (bottom left) in relation to ag (middle left). Further discussion of these graphs will follow. For glottal flow calculation, the subglottal input impedance and the supraglottal input impedance add together algebraically (in complex form) to produce the effective load impedance for the glottis. This can be shown by letting the input
pressure to the vocal tract (epilarynx tube) be

$$P_e = Z_2 U_g, \tag{1}$$

where Ug is the complex (Fourier transformed) glottal flow and Z2 is the complex supraglottal impedance. Similarly, the subglottal pressure is

$$P_s = Z_1(-U_g), \tag{2}$$

where Z1 is the subglottal impedance. The transglottal pressure is then

$$P_s - P_e = -(Z_1 + Z_2)U_g = -(R_1 + R_2)U_g - i(X_1 + X_2)U_g, \tag{3}$$

where R1 and R2 are the subglottal and supraglottal resistances, $i = \sqrt{-1}$, and X1 and X2 are the corresponding reactances. Note that the combined reactance X1 + X2 shown in Fig. 1 (top right, thick line) is not necessarily symmetric about the horizontal axis because the subglottal and supraglottal peaks do not occur at identical frequencies.

For incompressible, quasisteady glottal flow, the transglottal pressure can also be expressed aerodynamically as

$$P_s - P_e = k_t \tfrac{1}{2} \rho \, \frac{u_g^2}{a_g^2}, \tag{4}$$

where kt is an empirically-determined transglottal pressure coefficient with an average value of about 1.1 (Scherer et al., 1983; Alipour and Scherer, 2007; Fulcher et al., 2006), ρ is the air density, ug is the time-dependent flow, and ag is the time-varying glottal area. Equation (4) is nonlinear in ug and cannot easily be Fourier transformed, and since the resistances and reactances in Eq. (3) are in the frequency domain, Eqs. (3) and (4) cannot be equated. In previous work with a wave-reflection analog of the vocal tract (Titze, 1984), it was shown that the flow had the following closed-form solution:

$$u_g = \frac{a_g c}{k_t} \left\{ -\frac{a_g}{A^*} \pm \left[ \left( \frac{a_g}{A^*} \right)^2 + \frac{4 k_t}{\rho c^2} \left( p_s^+ - p_e^- \right) \right]^{1/2} \right\}, \tag{5}$$

where c is the sound velocity, and A* is an equivalent vocal tract area defined as

$$A^* = \frac{A_s A_e}{A_s + A_e}, \tag{6}$$

with As and Ae being the subglottal and supraglottal (epilaryngeal) entry areas, respectively. Further, in the wave-reflection algorithm (Kelly and Lochbaum, 1962; Liljencrants, 1985; Story, 1995; Titze, 2006b) $p_s^+$ is the incident partial wave pressure arriving from the subglottis and $p_e^-$ is the incident partial wave pressure arriving from the supraglottis. It should be noted that the wave-reflection analogs do not include the exact near-field pressures (Zhao et al., 2002; Zhang et al., 2002), but capture the most important fields for wave propagation in the vocal tract.

Equation (5) is critically important for understanding source–filter interaction because it defines explicitly a coupling parameter, ag/A*. Note that when ag/A* is small, the flow reduces to

$$u_g = a_g \left[ \frac{4 \left( p_s^+ - p_e^- \right)}{k_t \rho} \right]^{1/2}, \tag{7}$$

which becomes a direct proportion to the glottal area ag. If a constant subglottal pressure were to be applied to produce the glottal flow, the incident partial pressure wave $p_s^+$ in the wave-reflection algorithm would be replaced by half of the lung pressure PL with a +1 reflection coefficient, so that the subglottal pressure would then be $P_s = p_s^+(1 + r) = 2p_s^+ = P_L$ (Titze, 1984; Story, 1995). Furthermore, if the incident supraglottal pressure $p_e^-$ were set to zero, then

$$u_g = a_g \left[ \frac{2 P_L}{k_t \rho} \right]^{1/2}, \tag{8}$$

which is the asymptotic condition for linear (noninteractive) source–filter coupling. For linear coupling, the relative phase delays between $p_s^+$ and $p_e^-$ due to wave propagation are responsible for the nonproportionality between ug and ag.

The skewing of the flow pulse as a result of an overall inertive reactance produces new harmonic frequencies in the glottal airflow that are not part of the glottal area waveform. This result is not new (Rothenberg, 1981; Fant and Lin, 1987; Koizumi et al., 1985), but most previous analyses have underestimated the amount of skewing because the exact geometry of the epilarynx tube was not known. The magnetic resonance images produced by Story (1995; 2005) have been the key data sets to validate strong source–filter interaction. The uniform tubes in Fig. 1 produce relatively weak interactions. In the middle panels, a sinusoidal ag waveform is shown on the left and its corresponding single line spectrum is shown on the right. With this sinusoidal area (forced oscillation), glottal flow was produced (bottom left panel) with 0.8 kPa lung pressure in a voice simulator that could be switched between forced oscillation and flow-induced self-sustained oscillation. Equation (5) was combined with a 44-section uniform vocal tract with energy losses and the appropriate radiation impedance was used as previously described (Titze, 2006b). Note that the flow waveform is slightly skewed and has an entire spectrum of frequencies (bottom right). Wave propagation occurred in both subglottal and supraglottal tracts. The source–filter coupling is nonlinear because new frequencies (harmonic distortion frequencies) are created by the vocal tract.

The skewing of the flow pulse guarantees a dominant excitation near glottal closing and raises the energy in the harmonics (Fant, 1986). In the past it has been assumed that the harmonic spectrum of the source comes primarily from vocal fold collision. This may be true for many phonations, but this example shows that vocal fold collision is not essential to produce source harmonics. Nonlinear source–filter coupling can produce a spectrum of source frequencies, with a spectral slope of about −15 dB per octave in this case. Furthermore, the harmonic amplitudes are affected by the reactance curve. The short vertical lines in the top right panel above the reactance curves show the location of the harmonics in the bottom right figure. Note that harmonics 3 and 4 are in negative reactance territory and are depressed slightly in amplitude, relative to harmonics 2 and 5. The simplest explanation for this amplitude depression is that negative

FIG. 2. (Color online) Harmonic frequency generation by source–filter interaction with stronger coupling (Ae = 0.5 cm²); (top left) vocal tract shape; (top right) reactance curves, thin solid line for supraglottal, dashed line for subglottal, and thick solid line for combined; (middle left) sinusoidal glottal area function; (middle right) spectrum of glottal area; (bottom left) glottal flow; (bottom right) spectrum of glottal flow.
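
The coupling parameter ag/A* of Eq. (5) can be evaluated directly from the areas quoted for Figs. 1 and 2. The following is a minimal sketch, assuming SI units, an air density of 1.14 kg/m³, a sound speed of 350 m/s, the + root of Eq. (5), and a static case in which the incident subglottal wave is half the lung pressure and the incident supraglottal wave is zero. The function names are hypothetical, and the static flows it prints are illustrative only; they are not the time-varying flows shown in the figures.

```python
import math

RHO = 1.14   # air density in kg/m^3 (assumed value)
C = 350.0    # sound speed in m/s (assumed value)
KT = 1.1     # transglottal pressure coefficient (average value quoted in the text)


def equivalent_area(a_sub, a_epi):
    """Eq. (6): A* = As * Ae / (As + Ae)."""
    return a_sub * a_epi / (a_sub + a_epi)


def interactive_flow(a_g, a_star, p_s_plus, p_e_minus):
    """Eq. (5) with the + root, so a positive pressure difference gives positive flow."""
    r = a_g / a_star                                   # coupling parameter ag/A*
    radicand = r * r + 4.0 * KT * (p_s_plus - p_e_minus) / (RHO * C * C)
    return (a_g * C / KT) * (-r + math.sqrt(radicand))


def noninteractive_flow(a_g, p_lung):
    """Eq. (8): flow directly proportional to the glottal area."""
    return a_g * math.sqrt(2.0 * p_lung / (KT * RHO))


if __name__ == "__main__":
    a_g, a_sub, p_lung = 0.15e-4, 3.0e-4, 800.0        # 0.15 cm^2, 3 cm^2, 0.8 kPa
    print(f"Eq. (8) limit: {noninteractive_flow(a_g, p_lung) * 1e3:.3f} l/s")
    for a_epi in (3.0e-4, 0.5e-4):                     # Ae of Fig. 1 and Fig. 2
        a_star = equivalent_area(a_sub, a_epi)
        u = interactive_flow(a_g, a_star, p_lung / 2.0, 0.0)
        print(f"Ae = {a_epi * 1e4:.1f} cm^2  ag/A* = {a_g / a_star:.2f}  "
              f"static flow = {u * 1e3:.3f} l/s")
```

With As = 3.0 cm² and ag = 0.15 cm², the two epilarynx areas give ag/A* of 0.1 and 0.35, matching the values quoted in the text, and letting A* grow very large recovers the noninteractive limit of Eq. (8).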

(compliant) reactance integrates the downstream flow and builds up an opposing pressure that reduces the flow at a given frequency. If negative reactance were present at all frequencies, the flow pulse would be skewed to the left because more flow would be accepted during glottal opening than glottal closing as the back pressure builds up. With frequency dependent reactance, selective components that would skew the pulse to the left are reduced in amplitude because, overall, the pulse is still skewed to the right. A more detailed discussion would require the consideration of all the phases of the components and how they are affected by variable reactance.

Figure 2 shows the same set of curves as in Fig. 1, but now the coupling between the source and the filter is increased to a more realistic value. By reducing the epilarynx tube cross-sectional area Ae from 3.0 cm² in Fig. 1 to 0.5 cm² in Fig. 2 (compare upper left graphs), the input impedance to the vocal tract has been increased. This input impedance is scaled by ρc/Ae, the characteristic acoustic impedance of the first section of a tube, where ρ is the density of air and c is the sound velocity. The coupling parameter ag/A* in Eq. (5) was correspondingly increased from 0.1 to 0.35 because it contains Ae in the formula for A* in Eq. (6). The mean glottal area ag was held constant at 0.15 cm² and the subglottal area was held constant at 3.0 cm². Hence, the glottal source impedance remained the same. Note that the combined reactance curve (thick line, upper right in Fig. 2) now has a net upward shift toward positive (inertive) values. Specifically, over the entire 0–1500 Hz frequency range, negative (compliant) reactance occurs only between 600 and 800 Hz.

The harmonic frequencies produced in the flow waveform (bottom left panel of Fig. 2) reflect this increase in coupling strength. An apparent “closed phase” is seen, even though there was no glottal closure. As in Fig. 1, the area function ag was sinusoidal and always remained above zero (no truncation). This brings into question the whole enterprise of inferring vocal fold vibration patterns from inverse-filtered glottal flow, especially in terms of an open phase and a closed phase. Rothenberg and Zahorian (1977) showed that inverse filtering to obtain the glottal area from mouth flow is fundamentally a nonlinear process. Linear prediction cannot accomplish this task.

Comparing the harmonic spectra (bottom right) across Figs. 1 and 2, we see that the strength of the harmonics is again related to the amount of inertive reactance present at the harmonic frequency. For the F0 selected here (200 Hz), the third harmonic gained no strength with increased coupling because it still resides in negative (compliant) reactance territory, as in Fig. 1. The second and the fourth harmonics, however, both experience a slight amplitude increase due to higher reactance. In particular, the fourth harmonic, which was in negative territory, now experiences about zero reactance. This stronger fourth harmonic is responsible for the ripple seen in the flow waveform at the bottom left. (Only three ripple cycles are seen on the increasing flow; the fourth ripple cycle is hidden on the downward slope and on the flat portion near zero flow.) The spectral slope is now only on the order of −10 dB per octave.

Because the source spectrum can be affected by vocal tract reactance, and because this reactance is frequency dependent, a spectrogram with an F0 glide at constant vocal tract shape (identical to the shape shown in Fig. 2, top left) was investigated. This type of a pitch–glide spectrogram was

FIG. 3. (Color online) F0 glide produced with a driven sinusoidal glottal area function with no source–filter interaction; (top left) spectrogram for glottal flow based on sinusoidal glottal area, with reactance curves overlaid vertically; (top right) spectrogram for radiated mouth pressure Po; (bottom left) amplitude envelope of glottal flow; (bottom right) amplitude envelope of radiated mouth pressure.

analyzed for recordings of human subjects in the companion paper (Titze et al., 2008). Figure 3 shows the noninteractive (linear) case as a control for later interactive cases. No interaction with tissue vibration was allowed because the vibration was still forced with a sinusoidal area rather than self-sustained. At the top left we see a spectrogram of the F0 glide (straight sloping line), from 2000 to 100 Hz and back to 2000 Hz. The signal used in the spectrogram was the glottal flow ug, a purely sinusoidal function derived from the sinusoidal area function ag according to Eq. (8). The amplitude envelope of this sinusoidal flow is shown below the spectrogram on the left. Its peak at 4.0 s comes from the fact that the glottal area function (not shown) was programmed to vary inversely with the square root of frequency to approximate realistic amplitudes of vibration.

The spectrogram (top left panel) also shows the reactance curves of Fig. 2, now superimposed on the center of the spectrogram such that they are displayed vertically from 0 to 5000 Hz. Positive reactance is to the left and negative reactance is to the right of center. The reactance fluctuation in the third and fourth formant region (around 3000 Hz) is so large that the subglottal reactance is barely visible on the same scale. This large fluctuation is attributed to the epilarynx tube, which begins to establish its own characteristic quarter-wave resonance near the third formant at 2500–3000 Hz (Titze and Story, 1997).

Note again that no harmonic distortion frequencies are created by the vocal tract for this noninteractive F0 glide, the control case. The single frequency component of the flow (F0, sloping downward from 0.0 to 4.0 s and then upward from 4.0 to 8.0 s) is unaffected by the filter. What is affected by the filter is the oral radiated pressure Po, shown on the right side of Fig. 3. As expected, Po increases near 0.6 s when F0 passes through F2 and near 3.0 s when F0 passes through F1. These increases (and decreases on the other side of a formant) would be perfectly symmetric if the radiation at the mouth were independent of frequency. High frequencies radiate better than low frequencies, however, which produces a greater Po at 2000 than at 100 Hz, even though the peak glottal flow was 0.2 vs 0.8 l/s at these extremes.

Figure 4 shows the flow-interactive case, with Ae remaining at 0.5 cm². Vocal fold vibration still remained forced (no interaction with tissue vibration). The confirming observation is that a series of extra harmonics (2F0 through 4F0) is again created in ug by vocal tract coupling. These harmonics are reinforced when there is positive reactance (positive reactance curve is left of the vertical center line) rather than at the center of the formants. This is especially noticeable in the second harmonic. [The dependence on reactance is also present in the fundamental, but the spectrogram gray scale was deliberately saturated for F0 so that the higher harmonics could be seen.] At the formants, where the reactance changes suddenly from positive to slightly negative, the source harmonics are diminished in their amplitudes. This is further evidenced by the envelope of the flow waveform ug, which is modulated by an uneven treatment of the harmonics by this reactance (note the valleys at 0.8 and 3.0 s in the bottom left panel). The peaks and valleys in the amplitude envelope are unmistakable evidence of nonlinear source–filter coupling because the glottal area and lung pressure were identical to the control case. As a result, the overall peak-to-valley ripple in the Po waveform (bottom right) is less severe than in the linear case of Fig. 3. The dips in the ug amplitude at the formants partially cancel the increase in energy transmission through the vocal tract at the formants (Rothenberg, 1987). Thus, the effect of nonlinear formant–harmonic coupling in the glottal flow (level 1 interaction) is to distribute the acoustic energy over the entire spectrum rather than to accentuate it at the center of a formant.

In summary, rather than attempting to resonate several source harmonics by tuning them to the tube resonances, as is done in many musical instrument designs, a vocalist may attempt to reinforce a cluster of several harmonics with favorable reactance. In most cases, as many harmonics as possible are placed on the lower frequency side of a formant (below the resonance frequency). For high-pitched singing,

FIG. 4. (Color online) F0 glide produced with a driven sinusoidal glottal area function with source–filter interaction (Ae = 0.5 cm²); (top left) spectrogram for glottal flow based on sinusoidal glottal area, with reactance curves overlaid vertically; (top right) spectrogram for radiated mouth pressure Po; (bottom left) amplitude envelope of glottal flow; (bottom right) amplitude envelope of radiated mouth pressure.

this often requires a special vowel, so that at least F0, 2F0, and 3F0 can all be reinforced over a reasonable pitch range. An example is shown in Fig. 5, where subglottal reactance is now shown by the thin line and supraglottal reactance by the thick line. Note the ranges of F0, 2F0, and 3F0 below the reactance curves at the bottom of Fig. 5. Only the subglottal reactance is negative in the 2F0 range, but in the next discussion it will be shown that the combination of a compliant (negative) subglottal reactance and an inertive (positive) reactance provides ideal reinforcement of vocal fold vibration. So, the loss in level 1 interaction may in part be overcome by a gain in level 2 interaction. This example suggests that certain vowels will be favored over other vowels. The utility of this type of interaction is that source and filter frequencies do not need to match exactly, leaving some degree of freedom for articulation and vowel migration with the general goal of strengthening the dominant low-frequency harmonics.

B. Level 2 interaction: Mode of vibration dependency on vocal tract reactance

There is another level of interaction, identified as level 2, which involves a change of the vibration pattern of the vocal folds as a result of vocal tract changes. For this level of interaction, subglottal reactance and supraglottal reactance affect vocal fold vibration differently. While supraglottal reactance is generally most favorable when it remains inertive (positive), the subglottal reactance is sometimes more favorable if it is compliant (negative). The complicating factor is the geometry of the vocal folds, or shape of the glottis, in direct analogy to Fletcher’s (1993) differentiation of inward, outward, and lateral striking valves that can self-sustain oscillation in a pipe. An added complexity, not discussed by Fletcher, is that vocal folds can propagate a surface wave in their tissue, which changes the shape of the valve dynamically. Based on this surface wave, a relation for the mean intraglottal driving pressure on the vocal fold surface was derived previously (Titze, 1988),

$$P_g = P_s\left(1 - \frac{a_2}{a_1}\right) + P_e\,\frac{a_2}{a_1}, \tag{9}$$

where Pg is the mean (entry to exit) intraglottal driving pressure, Ps is the subglottal pressure, Pe is the supraglottal (epilarynx tube) input pressure, a1 is the glottal entry area (lower margin of the vibrating portion of the vocal folds), and a2 is the glottal exit area (upper margin of the vibrating portion of the vocal folds). In deriving Eq. (9), the pressure profile in the glottis was assumed to follow the Bernoulli energy conservation law, the glottal area was assumed to vary linearly from bottom to top, the acoustic pressure Pe was assumed to be the only pressure recovery at glottal exit, and the transglottal pressure coefficient was set to 1.0 for simplicity. [For details of the full derivation, see Titze (1988), p. 1542.]
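
As a hedged numerical illustration of Eq. (9), the following Python sketch evaluates the mean intraglottal pressure for a convergent, a uniform, and a divergent glottis. The function name and all numerical values (Ps = 1.0 kPa, Pe = 0, and the entry and exit areas in cm²) are assumptions chosen only for illustration; they are not taken from the paper.

```python
def mean_intraglottal_pressure(p_s, p_e, a1, a2):
    """Evaluate Eq. (9): Pg = Ps * (1 - a2/a1) + Pe * (a2/a1)."""
    if a1 <= 0.0:
        raise ValueError("glottal entry area a1 must be positive")
    ratio = a2 / a1
    return p_s * (1.0 - ratio) + p_e * ratio


# Assumed illustrative values (not from the paper): Ps = 1.0 kPa, Pe = 0.
P_S_KPA, P_E_KPA = 1.0, 0.0

# Convergent glottis: exit area smaller than entry area (a2 < a1).
pg_convergent = mean_intraglottal_pressure(P_S_KPA, P_E_KPA, a1=0.10, a2=0.05)
# Uniform glottis: a2 = a1, so the subglottal term vanishes.
pg_uniform = mean_intraglottal_pressure(P_S_KPA, P_E_KPA, a1=0.10, a2=0.10)
# Divergent glottis: exit area larger than entry area (a2 > a1).
pg_divergent = mean_intraglottal_pressure(P_S_KPA, P_E_KPA, a1=0.05, a2=0.10)

for label, pg in (("convergent", pg_convergent),
                  ("uniform", pg_uniform),
                  ("divergent", pg_divergent)):
    print(f"{label:10s} Pg = {pg:+.2f} kPa")
```

With these assumed values and Pe = 0, the sign of Pg follows the sign of (1 − a2/a1): positive for the convergent shape, zero for the uniform shape, and negative for the divergent shape.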
FIG. 5. (Color online) (Top) Vocal tract shape for the vowel /U/; (bottom) reactance curves, thick line supraglottal and thin line subglottal, and favorable ranges for F0, 2F0, and 3F0 shown underneath.

As a valve, the vocal folds are closest to the (+, +) case described by Fletcher (1993). In his notation, the valve is

(+, +) if tissue moves laterally with both increasing subglottal pressure (first +) and increasing supraglottal pressure (second +). But, as Eq. (9) indicates, intraglottal pressure is greater for a convergent glottis than for a divergent glottis. Tissue surface waves on the vocal folds (i.e., standing waves, or modes of vibration) can be excited by the airflow to produce self-sustained oscillation, even without vocal tract interaction (Titze, 1988). Note that, even if Ps is a constant and Pe is zero in Eq. (9), an alternating push–pull pressure can be created by an a2/a1 ratio that is less than 1.0 for lateral movement and greater than 1.0 for medial movement. Vocal tract interaction is therefore an important, but not a necessary, condition for self-sustained vocal fold oscillation.

Figure 6 shows a sketch of how the medial surface of the vocal folds may change during vibration, both in terms of the static and the time varying configuration. The sketch is patterned after Hirano (1975). Two different registers of phonation are identified. In the so-called modal register [Fig. 6(a)], vibration is observed over much of the thickness of the vocal folds, so that the entry area a1, where vibration begins, is low in the glottis vertically. To the contrary, in the so-called falsetto register [Fig. 6(b)], vibration is confined mainly to the upper portion of the vocal folds, with a1 being much higher in the glottis. (The point where vibration effectively begins vertically in the glottis has been called the mucosal upheaval point; Yumoto and Kadota, 1998). If we approximate the vibrating portion of the medial surface with a straight line, then the prephonatory shape is both convergent (a2 < a1) and divergent (a2 > a1) for the modal register but mainly divergent (a2 > a1) for the falsetto register. Physiologically, the thyroarytenoid (TA) muscle controls this medial surface shape. When the TA muscle contracts, it thickens and bulges out the lower part of the vocal fold, thereby “squaring up” the glottis and producing modal register. When the TA muscle is relaxed, the bottom of the vocal fold retracts and only the top remains engaged in vibration. The medial surface is more rounded. The vocal ligament is used for adductory positioning and tensing of the vocal fold tissues.

FIG. 6. Sketches of right vocal fold tissue displacement from the glottal midplane in coronal view of (a) modal register and (b) falsetto register. After Hirano, 1975.

It will now be shown how vocal tract pressures can assist or hinder vibration in these basic two registers. Neglecting vocal tract resistance and steady pressures for the first part of the discussion, basic acoustic theory would predict the vocal tract pressures to be

$$P_s = -I_1\,\frac{du}{dt} \quad \text{inertive subglottal tract} \tag{10}$$

$$P_s = P_1 - \frac{1}{C_1}\int u\,dt \quad \text{compliant subglottal tract,} \tag{11}$$

$$P_e = +I_2\,\frac{du}{dt} \quad \text{inertive supraglottal tract} \tag{12}$$

$$P_e = P_2 + \frac{1}{C_2}\int u\,dt \quad \text{compliant supraglottal tract,} \tag{13}$$

where P1 and P2 are constants of integration. With these relations, the intraglottal pressure of Eq. (9) has four possible forms:

$$P_g = -I_1\left(1 - \frac{a_2}{a_1}\right)\frac{du}{dt} + I_2\,\frac{a_2}{a_1}\,\frac{du}{dt} \quad \text{inertive–inertive} \tag{14}$$

$$P_g = \left(P_1 - \frac{1}{C_1}\int u\,dt\right)\left(1 - \frac{a_2}{a_1}\right) + I_2\,\frac{a_2}{a_1}\,\frac{du}{dt} \quad \text{compliant–inertive,} \tag{15}$$

$$P_g = -I_1\left(1 - \frac{a_2}{a_1}\right)\frac{du}{dt} + \left(P_2 + \frac{1}{C_2}\int u\,dt\right)\frac{a_2}{a_1} \quad \text{inertive–compliant,} \tag{16}$$

$$P_g = \left(P_1 - \frac{1}{C_1}\int u\,dt\right)\left(1 - \frac{a_2}{a_1}\right) + \left(P_2 + \frac{1}{C_2}\int u\,dt\right)\frac{a_2}{a_1} \quad \text{compliant–compliant.} \tag{17}$$

These cases are basically the same as the four cases described by Fletcher (1993). An acoustic “circuit” representation of these relations is shown in Fig. 7. Inertance is represented by coils I1 and I2 and compliance is represented by parallel plates C1 and C2, following the symbolism of electric circuitry for inductance and capacitance, respectively. The driving pressure Pg from the above-presented mathematical expressions is also labeled in Fig. 7. For maximum reinforcement of vocal fold vibration, the driving pressure Pg should provide an alternating push–pull on the vocal fold tissue, a push when the glottis is opening and a pull when the glottis is closing. Thus, when du/dt is positive (flow is increasing during glottal opening), Pg should be positive; when du/dt is negative (flow is decreasing during glottal closing), Pg should be negative.

FIG. 7. Acoustic circuit diagrams for subglottal and supraglottal reactance, (a) inertive–inertive, (b) compliant–inertive, (c) inertive–compliant, and (d) compliant–compliant.

Consider first the inertive–inertive case [Eq. (14) and Fig. 7(a)]. Both coefficients in front of du/dt in Eq. (14) should be positive for the push–pull condition. The only way this can occur is if a2/a1 > 1.0 over most of the open portion of the glottal cycle, which means the glottis must be mainly divergent. Falsetto register may provide this configuration quite readily, referring to Fig. 6(b). It is known that the top of the vocal folds spread apart slightly in falsetto register. A net divergent glottis at the top created by this spread would appear to be beneficial to both terms in Eq. (14). For modal register, the divergent configuration occurs over a smaller fraction of the glottal cycle, prior to closure. But maximum pressures occur in this fraction of the cycle, yielding strong excitation in a pulse-like manner. Thus, subglottal inertance I1 generally provides both a help and a hindrance to vocal fold vibration in modal register. On the other hand, supraglottal inertance I2 always provides the favorable push–pull condition, for both registrations and both glottal configurations. The Flanagan and Landgraf (1968) and Ishizaka and Flanagan (1972) simulations had no subglottal tract. Hence, their vocal tract interaction effects were probably exaggerated. The Zhang et al. (2006a, b) investigations had only a subglottal tract, which could underestimate the overall interaction. The Zañartu et al. (2007) simulations included both a subglottal and a supraglottal tract. Given that their vocal fold model was composed of only a single mass, the results can be considered an excellent correction to the Flanagan and Landgraf (1968) model. In a later section, it will be shown that neglect of the subglottal reactance can be dramatic, even with greater degrees of freedom in tissue movement.

The compliant–inertive tract appears to be the most favorable for vocal fold vibration in modal register [Eq. (15) and Fig. 7(b)]. If the glottis is mostly convergent, with divergence occurring only over a small fraction of the cycle prior to closure, the integration of the flow in Eq. (15) produces a steady decrease in intraglottal pressure over the open portion of the glottal cycle (because a2 < a1). This gradual decrease in pressure (stronger during opening and weaker during closing) adds to the dominant push–pull produced by the inertive supraglottal tract, as described before. Tongue-tip trills (McGowan, 1992) have also been shown to be sustained by an upstream compliant reactance. The dominant compliance in tongue trills is a wall compliance (rather than an air compliance), but the effect is similar.

The inertive–compliant tract [Eq. (16) and Fig. 7(c)] is the least favorable for modal register. Supraglottal integration of the flow u raises the intraglottal pressure throughout the open portion of the cycle, creating a greater push during closing than during opening. This is contrary to the desired push–pull condition. In addition, when the glottis is convergent (a2/a1 < 1.0), the inertive subglottal tract further hinders oscillation, as discussed earlier. But some assistance is possible from subglottal inertance in falsetto register, if a2 > a1, also as discussed earlier.

Finally, a compliant–compliant tract [Eq. (17) and Fig. 7(d)] is also not favorable to modal register, but a little more

so than the inertive–compliant combination just discussed. For convergence, although the gradual pressure reduction from the first integration in Eq. (17) is favorable, the second integration is detrimental. The worst of all situations exists for divergence. The glottis is simply blown apart by a uniformly increasing intraglottal pressure. Thus a compliant–compliant vocal tract squelches phonation when the glottis diverges.
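
The qualitative ranking of the four load cases can be checked with a minimal numerical sketch, given here under several loudly stated assumptions that are not part of the original analysis: the flow over the open phase is taken as a half-sine pulse, the ratio a2/a1 is held constant over the cycle, the constants P1 and P2 are set to zero, all inertances and compliances are set to 1 in arbitrary units, and the "push–pull score" (the mean of Pg·du/dt over the open phase) is a hypothetical figure of merit introduced only for this illustration. A positive score means that Pg tends to have the same sign as du/dt. The sketch requires NumPy.

```python
import numpy as np


def driving_pressure(u, t, ratio, sub, supra, I1=1.0, I2=1.0, C1=1.0, C2=1.0):
    """Return (Pg, du/dt) for one load case of Eqs. (14)-(17).

    Assumptions of this sketch: P1 = P2 = 0 and a constant a2/a1 (ratio).
    """
    dudt = np.gradient(u, t)
    # Running trapezoidal integral of u dt, starting from zero.
    steps = 0.5 * (u[1:] + u[:-1]) * np.diff(t)
    integral_u = np.concatenate(([0.0], np.cumsum(steps)))
    # Subglottal pressure, Eq. (10) or Eq. (11).
    p_s = -I1 * dudt if sub == "inertive" else -integral_u / C1
    # Supraglottal pressure, Eq. (12) or Eq. (13).
    p_e = I2 * dudt if supra == "inertive" else integral_u / C2
    # Mean intraglottal pressure, Eq. (9).
    return p_s * (1.0 - ratio) + p_e * ratio, dudt


def push_pull_score(u, t, ratio, sub, supra):
    """Hypothetical figure of merit: mean of Pg * du/dt over the open phase."""
    pg, dudt = driving_pressure(u, t, ratio, sub, supra)
    return float(np.mean(pg * dudt))


# Assumed half-sine flow pulse over a normalized open phase of duration 1.
t = np.linspace(0.0, 1.0, 2001)
u = np.sin(np.pi * t)

CASES = [("inertive", "inertive"), ("compliant", "inertive"),
         ("inertive", "compliant"), ("compliant", "compliant")]
SHAPES = [("convergent", 0.5), ("divergent", 1.5)]  # assumed a2/a1 values

scores = {
    (shape, sub, supra): push_pull_score(u, t, ratio, sub, supra)
    for shape, ratio in SHAPES
    for sub, supra in CASES
}

for (shape, sub, supra), score in scores.items():
    print(f"{shape:10s} {sub:9s}-{supra:9s} score = {score:+.3f}")
```

Under these assumptions the compliant–inertive load gives the largest score for the convergent glottis, the inertive–compliant load gives the most negative one, and the compliant–compliant load is the only case with a negative score for the divergent glottis, in line with the discussion above. The sketch does not model the time-varying a2/a1 ratio produced by the surface wave.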
To determine how the above-noted driving pressures affect the frequency and amplitude of self-sustained oscillation, it is typical to develop an autonomous differential equation of motion for tissue displacement in terms of flow-dependent driving pressures. Fletcher’s (1993) small amplitude analysis of autonomous vibration of simple pressure-controlled valves in gas flows is relevant to vocal fold vibrations, but the alternating convergent–divergent glottis created by surface waves on the tissues was not modeled. Hence, the analysis was basically for a one-mass model (or the x10 mode in the abbreviated mode nomenclature of Titze, 2006b, Chap. 4). Adachi and Sato’s (1996) treatment of two-dimensional lip vibration (transverse and longitudinal to the flow) captured a z10 mode in addition to the x10 mode, but also did not include the surface wave. Titze’s earlier (1988) analysis did include the surface wave, but considered only downstream inertive reactance of the tract (the most typical for low F0 speech). In Chan and Titze (2006) and in Titze (2006b, Chap. 7), upstream inertive reactance was also considered, but not compliant reactance. Thus, to date nobody has derived an autonomous differential equation that includes both inertive and compliant reactance, upstream and downstream, with the inclusion of a surface wave on the tissue.

Given that Fletcher’s (1993) closed-form (analytical) solutions contain both subglottal and supraglottal reactance explicitly, his equations are utilized here to approximate frequency and amplitude changes for autonomous (self-sustained) oscillation. Equations (19) and (20), from Fletcher (1993) p. 2176, were solved with the following parameters (Fletcher’s notation) that are relevant for human phonation: σ1 = σ2 = +1 (Fletcher’s overpressure parameters that define the valve type); W = 1.0 cm (vocal fold length); m = 0.05 g (vocal fold mass in vibration); S1 = S2 = S3 = 0.5 cm² (inferior, medial, superior surfaces of vocal folds, respectively); x0 = 0.03 cm (neutral glottal half-width); p̄2 = 10 000 dyn/cm² ≈ 1 kPa (mean subglottal pressure); θ = 30° (entry and exit angles into glottis); k = (0.05)(2πF0) (tissue resonance bandwidth; 157 Hz at F0 = 500 Hz); X1, X2 (subglottal and supraglottal reactances, respectively; variables).

Both X1 and X2 were varied according to the reactance curves of Fig. 2 (top right). They are redrawn for comparison at the top of Fig. 8. Recall that these reactances apply to a uniform 3.0 cm² subglottal tract and a uniform 3.0 cm² supraglottal tract that has a narrowed 0.5 cm² epilarynx tube.

Fletcher’s analytical equations allow for direct specification of the natural frequency F0 of the vocal fold oscillator in a no-load condition. This frequency was varied from 0 to 1500 Hz as a pitch glide to correspond to human ranges of F0 and to predict the F0 jumps observed in the data of the companion paper (Titze et al., 2008). F was defined to be the true oscillating frequency for any applied load. Figure 8 (middle graph) shows calculations for F − F0, the difference in oscillation frequency from the no-interaction fundamental frequency. In the bottom graph we see the corresponding oscillation threshold pressure Pth.

FIG. 8. (Color online) Fletcher’s (1993) small oscillation analysis; (top) reactance curves; (middle) the difference between the oscillation frequency F and the no load frequency F0; (bottom) oscillation threshold pressure Pth.

Consider first some general trends. The oscillation frequency tends to be mostly below the no-interaction resonance frequency F0. This is because inertive reactance dominates in airways that have a constricted region (the epilarynx tube in this case), creating effectively an increase in the mass of the oscillating system (tissue and air columns collectively). Note that there is a general inverse relation between F − F0 and the supraglottal reactance X2. The highest frequency is at 750 Hz, where X1 = −30 dyn·s/cm⁵ (compliant) and X2 = 0. This frequency is slightly greater than F0. The most notable drop in frequency (−50 Hz) occurs just above 500 Hz, where both X1 and X2 are highly positive. The small peak around 600 Hz is for X1 = X2 = 0, the no-interaction condition for which F = F0.

The threshold pressure Pth basically follows the sum of the reactance curves. Aside from a narrow dip near the tube resonances, where X1 + X2 = 0, the lowest threshold pressure is found in the 750–1000 Hz region, where X1 is negative (compliant) and X2 is positive (inertive). Thus, as stated earlier, the compliant–inertive acoustic load is the most favorable to vocal fold oscillation. Titze and Sundberg (1992) have shown that every doubling of lung pressure above threshold raises the source intensity by about 6 dB. In addi-

tion, Alipour et al. (2001) showed that an approximate 6 dB increase in intensity was obtained in an excised larynx when a vocal tract was added by way of a physical tube that lowered the threshold pressure. Given that the threshold pressure in Fig. 8 varies from below 0.1 to above 0.3 kPa, and given that typical lung pressures for speech range between 0.5 and 1.0 kPa, it is possible that more than one doubling of Pth or 6–12 dB in source intensity could be realized in the lower threshold regions with normal lung pressures. But a word of caution is in order. The Fletcher (1993) equations used for the Fig. 8 calculations did not contain vocal tract losses. Hence, the Pth fluctuations are probably overestimated. Nevertheless, significant changes in source energy are likely with only a few tenths of a kPa reduction in Pth.

In any vocal fold model other than the one-mass model, the a2/a1 ratio is variable throughout the glottal cycle. The amount of dynamic a2/a1 variation can be explained in terms of two dominant modes of vibration of vocal fold tissues (Fig. 9). For prephonatory (static) convergence, an x10 mode has no vertical variation in tissue displacement, as shown in Fig. 9(a). It produces less variation in the a2/a1 ratio than an x11 mode, which has a 180° phase difference between top and bottom displacement, as shown in Fig. 9(b). [For a complete description of tissue modes and their nomenclature, see Titze (2006b), Chap. 4.] The a2/a1 ratio for the x10 mode gradually increases and decreases over the open portion of the glottal cycle, but stays between 0.0 and 1.0. For the x11 mode, however, both convergence and divergence can be experienced over the cycle. Thus, the x11 mode is less dependent on vocal tract interaction than the x10 mode because the pushes and pulls from vocal tract pressures tend to cancel each other over the glottal cycle. This conclusion agrees with what Flanagan and Landgraf (1968) observed with a one-mass model, which only supports an x10 mode, and what Zhang et al. (2006a, b) observed on a physical model. It is also the reason why Zañartu et al. (2007) focused their recent analysis on the one-mass model. All models with a dominant x10 mode are highly affected by vocal tract reactance. In the two-mass model of Ishizaka and Flanagan (1972), the x11 mode was present and vocal tract interaction was reduced. Thus, the percentage of x11 mode excitation (relative to x10) serves as a decoupler of vocal tract interaction, in direct opposition to the narrowing of the epilarynx tube. It offers a vocal tract-independence mechanism of self-sustained oscillation (Titze, 1988; Lucero and Koenig, 2007; Chan and Titze, 2006; Jiang and Tao, 2007). Speakers and singers who adjust their source and filter for linear coupling rely heavily on this mechanism of self-sustained oscillation.

FIG. 9. Sketches of a convergent glottis with two vibrational modes, (a) the x10 mode and (b) the x11 mode.

III. SOURCE–FILTER INTERACTIONS WITH COMPUTER SIMULATIONS

Because of the many simplifications and limitations imposed by the low-dimensional vocal fold models and analytical treatments, it is important to balance the above-presented results with computer simulation models that are capable of handling all levels of interaction and many degrees of freedom in tissue movement.

A. Methods

An L × M × N point-mass model of the vocal folds was used for simulation of flow-induced, self-sustained oscillation. The details of this model are beyond the scope of any single article, but are well documented in Titze (2006b, Chap. 4). L is the number of masses in the medio-lateral direction (7 in this simulation), M is the number of masses in the anterior–posterior direction (5 in this simulation), and N is the number of masses in the inferio–superior direction (5 in this simulation). Thus, the model had 175 point masses, each with two degrees of freedom (horizontal and vertical). Tissue properties were defined with a fiber-gel construct, where the fibers carried the nonlinear stress–strain characteristics of muscle, ligament, and mucosa, and the gel properties were defined with Young’s moduli, shear moduli, and Poisson’s ratios (Titze, 2006b; Chaps. 2–4). Aerodynamic pressures and glottal flow were calculated with a modified Bernoulli equation that included flow separation and jet formulation by rule (Titze, 2006b, Chap. 5). All vocal tract pressures were computed with the well-known wave-reflection analog (Liljencrants, 1985; Story et al., 1996; Titze, 2006b, Chap. 6).

Based on the foregoing autonomous differential equation analyses, it was important to include both a subglottal and a supraglottal vocal tract to assess the individual and combined interaction effects. As in the previous section on level 1 interaction with forced oscillation, the subglottal system had 36 cylindrical sections, each 0.398 cm long, for a total length of 14.33 cm. The supraglottal system had 44 cylindrical sections, each of the same length, for a total of 17.51 cm. The first eight sections of the supraglottal tract again constituted the epilarynx tube, the diameter of which was kept uniform so that a single parameter (Ae) could be varied for coupling strength. The sampling frequency was 44.1 kHz. The medial surface of the vocal folds was nearly flat, with a slight convergence in the inferio–superior direction and a similar convergence (tapering) in the posterio–

anterior direction. Dynamically, the shape of the glottis assumed a variety of mode configurations due to surface wave propagation.

Control parameters for this model were lung pressure PL and simulated muscle activations of the intrinsic laryngeal muscles of the larynx: cricothyroid (CT), thyroarytenoid (TA), lateral cricoarytenoid (LCA), posterior cricoarytenoid (PCA), and interarytenoid (IA). These muscle activations (ranging from 0.0 for no activation to 1.0 for 100% activation) positioned the vocal folds (Titze, 2006b, Chap. 3; Titze and Hunter, 2007) and determined all of the tissue properties. Specific values will be given in the following sections.

B. Results

1. Simulation 1

A no-interaction case was first investigated as a control case. A high–low–high pitch glide was simulated under self-sustained oscillation, following the pattern produced earlier for level 1 interaction and produced by human subjects in the companion paper. Figure 10 shows a spectrogram for the glottal flow waveform, with both the subglottal reactance (thin white solid line) and the supraglottal reactance (thick white line) superimposed vertically. These reactances are the same as in previous figures (Figs. 2–4 and 8) for the uniform tubes with a 0.5 cm² epilarynx tube. The zero line for the reactance curves is set at time = 4 s, and the scaling of the reactance magnitude can be obtained from Fig. 2. Underneath the spectrogram, the amplitude envelope for the glottal flow ug is shown in the middle panel and the amplitude envelope for the glottal area ag is shown in the bottom panel. The following parameters of the model were held constant in the simulations: PL = 1.5 kPa, LCA activity = 31%, IA activity = 30%, PCA activity = 0.0%, vowel = uniform tube with narrowed epilarynx tube as in Fig. 2, with no nasal coupling. CT activity was varied from 90-0-90% and TA activity from 0-10-0%. Details of how these activities affect laryngeal posturing in a speech-like gesture can be found in Titze and Hunter (2007).

FIG. 10. (Color online) Simulation of downward and upward F0 glide produced with a 175 point-mass self-oscillating biomechanical model of the vocal folds with no vocal tract interaction; (top) spectrogram, with reactance curves superimposed vertically, thick white line supraglottal and thin white line subglottal; (middle) glottal flow envelope; (bottom) glottal area envelope.

To simulate no vocal tract interaction, the pressures in the glottis were programmed as described previously in the forced oscillation case (level 1 interaction). Waves propagated through the vocal tract and were radiated from the mouth, but this propagation produced no load on the glottis because the incident pressures (traveling waves) were nullified by programming for glottal flow and pressure calculation. Only aerodynamic pressures and flows were retained. Note that this caused glottal flow and glottal area envelopes to be proportional, as in the forced oscillation case. The glottal area spectrogram (not shown) was identical to the glottal flow spectrogram, indicating that there was no interaction. Both spectrograms showed harmonics; however, these resulted from vocal fold collision in this case instead of flow pulse skewing. (It is difficult to obtain a perfectly sinusoidal area with no collision with self-sustained oscillation.) The broad peak in the center of the ug and ag envelopes was due to greater laxness of the tissue, which caused greater vibrational amplitude when F0 was low. The fundamental frequency (F0) in the glide ranged from 700 to 330 Hz. Note that F0 passed through the positive peak of the reactance curve at about 1.0 s and again at about 7.0 s, but this had no effect on the flow or area waveforms, indicating linear source–filter coupling for this control case.

2. Simulation 2

For the next simulation, subglottal interaction alone was investigated. To uncouple the supraglottal tract, the supraglottal pressure was set to zero in the program for the purpose of glottal flow and intraglottal pressure calculation, but not for wave propagation. Figure 11 shows the result. Note first that the overall signal strength is a little less than for no interaction, judged by height of the ag and ug envelopes, and there is not an exact proportionality between the glottal area and glottal flow. More important, new frequencies have been created due to bifurcations in tissue movement, as seen in the spectrogram on top. At 0.8 s, a period-3 subharmonic occurs, which changes to a period-5 subharmonic at 1.6 s. The timing of the period-3 bifurcation agrees with F0 being in the maximum negative reactance dip and the period-5 bifurcation occurs when 2F0 enters the negative reactance region. Much needs to be understood about the nature and onset of these subharmonic bifurcations, but this is a topic for subsequent research. Suffice it to say here that individual harmonics passing through rapidly changing reactance regions can destabilize the vibration regimes, as the foregoing analysis

demonstrated. Our quantitative interest here is in F0 drops, because they are the most predictable instabilities and the most prevalent in our companion paper on human phonation. An F0 drop of about 50 Hz occurs at 2.4 s. This F0 drop, predicted well by the foregoing small oscillation analysis with Fletcher’s (1993) model (Fig. 8) and by Zhang et al. (2006a, b) on a physical laboratory model that had only subglottal interaction, was a “correction” from a higher F0 that prevailed in the compliant reactance region. The starting F0 was 750 Hz rather than the no-interaction F0 of 700 Hz seen in Fig. 10. Compliance basically adds stiffness to the interactive vibrating system, thereby raising F0, whereas inertance adds mass, thereby lowering F0. An interesting observation is that the 50 Hz pitch jump occurred below the positive peak of the reactance curve. This may be related to the fact that the second harmonic 2F0 entered the compliant region at about 2.4 s, which could have delayed the jump by adding some stiffness to the system before inertive reactance took over. At 3.2 s in Fig. 11, some perturbed vibration occurred, as indicated by the darker (noisier) background. This aperiodic vibration is related to an impedance mismatch instability, i.e., the source impedance became lower, on average, than the vocal tract input impedance, and highly variable due to oscillation. The F0 upglide showed a slight asymmetry in the duration of another brief noisy regime at 5.6 s, indicating a small hysteresis effect.

FIG. 11. (Color online) Simulation of downward and upward F0 glide produced with a 175 point-mass self-oscillating biomechanical model of the vocal folds with subglottal interaction only; (top) spectrogram, with subglottal reactance superimposed vertically with white line; (middle) glottal flow envelope; (bottom) glottal area envelope.

3. Simulation 3

FIG. 12. (Color online) Simulation of downward and upward F0 glide produced with a 175 point-mass self-oscillating biomechanical model of the vocal folds with supraglottal interaction only; (top) spectrogram, with supraglottal reactance superimposed vertically with white line; (middle) glottal flow envelope; (bottom) glottal area envelope.

The next simulation was for supraglottal interaction alone. Here subglottal pressure was set to the lung pressure, but supraglottal traveling waves were kept intact. Results are shown in Fig. 12. The waveform envelopes show that this case has the greatest overall signal energy. (The scale of ug has been increased by a factor of 3 and the scale of ag by a factor of 2 with respect to Figs. 10 and 11.) A weak period-4 bifurcation occurred at 2.0 s, followed by a stronger one at 2.7 s. F0 and 2F0 both went from higher to lower inertive reactance in this bifurcation region. With regard to fundamental frequency changes, note that F0 started lower than in the noninteractive case, 570 Hz instead of 700 Hz. This “pulling down” of F0 toward F1, even when all muscle activities were the same as before, resulted from the inertive reactance of the vocal tract, which (as stated earlier) adds mass to the interactive oscillating system. A very small F0 rise (10–20 Hz) is then seen at the beginning of the strong period-4 bifurcation, correlated with diminished reactance for both 2F0. This small rise, together with the subharmonic regime, delays the eventual larger drop in F0 of about 100 Hz in positive reactance territory. Finally, the vocal fold vibrational amplitude became disproportionably large and slightly unstable in the 3.0–5.0 s region. A hysteresis effect was also seen in that the bifurcations were delayed on the upslope of the glide. Given that these bifurcations did not appear in the

J. Acoust. Soc. Am., Vol. 123, No. 5, May 2008 Ingo R. Titze: Nonlinear source–filter theory 2745

Downloaded 19 Jun 2013 to 131.104.62.10. Redistribution subject to ASA license or copyright; see http://asadl.org/terms
FIG. 13. (Color online) Simulation of downward and upward F0 glide produced
with a 175 point-mass self-oscillating biomechanical model of the vocal folds
with both subglottal and supraglottal interaction, Ae = 3.0 cm²; (top)
spectrogram, with subglottal reactance (thin white line) and supraglottal
reactance (thick white line) superimposed; (middle) glottal flow envelope;
(bottom) glottal area envelope.

FIG. 14. (Color online) Simulation of downward and upward F0 glide produced
with a 175 point-mass self-oscillating biomechanical model of the vocal folds
with both subglottal and supraglottal interaction, Ae = 0.2 cm²; (top)
spectrogram, with subglottal reactance (thin white line) and supraglottal
reactance (thick white line) superimposed; (middle) glottal flow envelope;
(bottom) glottal area envelope.
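
Figures 13 and 14 differ only in the epilarynx tube area Ae (3.0 versus
0.2 cm²). As a rough illustration of why a narrower tube strengthens the
coupling, the sketch below uses the standard lumped-element (low-frequency)
approximation for the acoustic inertance of a short tube, I = ρL/A, with
inertive reactance X = 2πfI. The tube length, air density, evaluation
frequency, and function names are illustrative assumptions, not values taken
from the paper, and the simulations reported here work with traveling waves in
the tract rather than with this lumped formula.

```python
import math

RHO_AIR = 1.14      # kg/m^3, assumed density of warm, humid air
TUBE_LENGTH = 0.03  # m, assumed epilarynx tube length (hypothetical value)


def inertance(area_cm2, length_m=TUBE_LENGTH, rho=RHO_AIR):
    """Lumped acoustic inertance I = rho * L / A of a short tube, in kg/m^4."""
    area_m2 = area_cm2 * 1e-4  # convert cm^2 to m^2
    return rho * length_m / area_m2


def inertive_reactance(freq_hz, area_cm2):
    """Reactance X = 2*pi*f*I of the lumped inertance, in Pa*s/m^3."""
    return 2.0 * math.pi * freq_hz * inertance(area_cm2)


if __name__ == "__main__":
    # Compare the three epilarynx areas used in the simulations at an
    # arbitrary (assumed) frequency of 500 Hz.
    for area in (3.0, 0.5, 0.2):
        x = inertive_reactance(500.0, area)
        print(f"Ae = {area:.1f} cm^2 -> X(500 Hz) = {x:.3e} Pa*s/m^3")
    ratio = inertance(0.2) / inertance(3.0)
    print(f"Inertance ratio (0.2 vs 3.0 cm^2): {ratio:.1f}")
```

Because the lumped inertance scales as 1/Ae, narrowing the tube from 3.0 to
0.2 cm² raises it by a factor of 15 whatever length or density is assumed.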

noninteractive case (Fig. 10), we conclude that they are the result of
supraglottal source–filter interaction.

4. Simulation 4

The next simulation was with combined subglottal and supraglottal
interactions. Thus, it represented a realistic (nonrestricted) situation. The
degree of supraglottal interaction was controlled by two separate values of
the cross-sectional area of the epilarynx tube Ae. For the lowest degree of
interaction (Ae = 3.0 cm²) the epilarynx tube diameter was equal to that of
the uniform tube (recall Fig. 1, top left). Figure 13 shows the simulation
results for this configuration. Both subglottal and supraglottal reactance
curves are shown in white on the spectrogram, supraglottal being the thicker
line. Again the scales are arbitrary, but the relative magnitudes between
subglottal and supraglottal reactance are correct. There were two F0 drops,
one at about 0.8 s, where F0 entered a region of rapidly changing overall F1
reactance and 2F0 entered a region of positive supraglottal reactance, and
another drop at about 3.0 s, where F0 entered the region of positive
supraglottal F1 reactance. Also, 2F0 entered the compliant–compliant region at
the second drop, and a period-3 bifurcation occurred when 2F0 entered this
region, similar to what was seen in Fig. 12. Some evidence of chaotic
vibration was seen near 4.0 s, where the lowest F0 and the highest vibration
amplitude occurred. The overall signal strength was slightly less than that of
the noninteractive case (the scale for the signal envelopes was set back to
that of Figs. 10 and 11).

5. Simulation 5

Figure 14 shows results for a severe case of epilaryngeal constriction
(Ae = 0.2 cm²). The overall signal strength was much higher (scale on the
glottal flow envelope was changed to 3.0 l/s), and stronger bifurcations
occurred. A 100 Hz F0 jump occurred at 1.0 s, which was preceded by a period-4
subharmonic, again as F0 entered the rapidly changing reactance region. When
2F0 entered the minimum reactance regions, around 3.0 s, destabilization
occurred in the vibration, as evidenced by further bifurcations. Strong
hysteresis effects were also evident in that reverse bifurcations occurred at
higher frequencies. Of particular significance is the fact that vibrational
amplitude increased and decreased sharply (and irregularly) near the lowest F0
(middle graph), while the flow amplitude reached a plateau (bottom graph).
This plateau is attributed to the fact that the input impedance to the vocal
tract became higher than the glottal impedance for Ae = 0.2 cm², thus limiting
the flow.

6. Changes in energy levels

Table I shows numerical results for a selected group of dynamical variables at
all levels of interaction discussed earlier. (An intermediate case for
Ae = 0.5 cm² was added to

TABLE I. Output quantities for various degrees of interaction.

Interaction condition                       Mean glottal  Mean glottal  Aerodynamic  Radiated   Efficiency
                                            area (cm²)    flow (l/s)    power (W)    power (W)  (%)
No interaction                              0.0381        0.3284        0.4825       0.0117     2.42
Subglottal only                             0.0336        0.1835        0.2264       0.0033     1.44
Supraglottal only (Ae = 0.5 cm²)            0.0579        0.5014        0.7408       0.0523     7.07
Subglottal and supraglottal (Ae = 3.0 cm²)  0.0347        0.1717        0.2226       0.0014     0.62
Subglottal and supraglottal (Ae = 0.5 cm²)  0.0405        0.1860        0.2228       0.0035     1.59
Subglottal and supraglottal (Ae = 0.2 cm²)  0.0534        0.3200        0.2604       0.0143     5.49
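
The efficiency column of Table I is the ratio of radiated power to aerodynamic
power, so it can be cross-checked from the two power columns. The sketch below
does that and also reports how the radiated power of each case compares with
the no-interaction control. The dictionary keys and function names are
illustrative choices, and the recomputed efficiencies may differ from the
tabulated ones in the last digit because the tabulated powers are themselves
rounded.

```python
# Values transcribed from Table I: (aerodynamic power in W, radiated power in W)
TABLE_I = {
    "No interaction": (0.4825, 0.0117),
    "Subglottal only": (0.2264, 0.0033),
    "Supraglottal only (Ae = 0.5 cm^2)": (0.7408, 0.0523),
    "Sub- and supraglottal (Ae = 3.0 cm^2)": (0.2226, 0.0014),
    "Sub- and supraglottal (Ae = 0.5 cm^2)": (0.2228, 0.0035),
    "Sub- and supraglottal (Ae = 0.2 cm^2)": (0.2604, 0.0143),
}


def efficiency_percent(aero_w, radiated_w):
    """Glottal efficiency: radiated power over aerodynamic power, in percent."""
    return 100.0 * radiated_w / aero_w


def radiated_gain(case, reference="No interaction"):
    """Ratio of radiated power in `case` to radiated power in `reference`."""
    return TABLE_I[case][1] / TABLE_I[reference][1]


if __name__ == "__main__":
    for case, (aero_w, radiated_w) in TABLE_I.items():
        eff = efficiency_percent(aero_w, radiated_w)
        gain = radiated_gain(case)
        print(f"{case:40s} efficiency = {eff:5.2f} %  radiated/control = {gain:5.2f}")
```

Comparing any two rows with `radiated_gain(case, reference=...)` gives the
factors quoted in the discussion of the table, for example the mild
(Ae = 3.0 cm²) versus strong (Ae = 0.2 cm²) interaction cases.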

show the progression with increased nonlinear coupling.) The output quantities
for comparison are mean glottal area, mean glottal flow, aerodynamic power at
the glottis, acoustic power radiated at the mouth, and glottal efficiency (the
ratio of radiated power to aerodynamic power). The numbers represent time
averages over the entire 8.0 s pitch glide. The “no interaction” case (top
row) produced self-sustained oscillation by excitation of the x11 mode with
aerodynamic pressures only. It serves as the control case. Consider first the
“subglottal only” interaction case (row 2). In comparison to “no interaction,”
every dynamic quantity was reduced in magnitude. This is consistent with the
expectation that subglottal inertance generally hinders vocal fold vibration,
and such inertance was present over most of the F0 glide. To the contrary, for
the supraglottal only interaction case (row 3), every dynamic variable was
increased, again consistent with previous predictions and discussions. Mean
glottal area, mean glottal flow, and aerodynamic power increased by a factor
of about 1.5 with supraglottal interaction only, but acoustic radiated power
increased by a factor of more than 4. Glottal efficiency (the ratio of
acoustic radiated power to glottal aerodynamic power) increased by a factor of
about 3. Thus, supraglottal interaction alone would be a highly favored
condition. Unfortunately, the trachea is always present in human phonation.
Its effect cannot be removed, but perhaps altered for better reinforcement of
vocal fold vibration by larynx lowering or raising.

With both subglottal and supraglottal interaction (rows 4–6), the results were
dependent on the degree of interaction, which was controlled by the parameter
Ae, the epilarynx tube area. For Ae = 3.0 cm², the same as the area of the
rest of the uniform tube, the radiated acoustic power was only 0.0014 W,
nearly an order of magnitude less than for no interaction. This suggests that
interaction is not necessarily an advantage. If the impedances are not well
matched, more power can be absorbed internally in the system (Titze, 2002).
Efficiency was only 0.62% compared to 2.42% for no interaction. Mean glottal
flow was about half, as was the aerodynamic power. Mean glottal area remained
about the same.

As the degree of interaction increased with a decrease in the epilarynx tube
area Ae (rows 5 and 6), all dynamical variables increased. For the greatest
interaction, Ae = 0.2 cm², mean glottal area, radiated power, and efficiency
were greater than those for the “no interaction” case, mean glottal flow was
nearly unchanged (0.3200 versus 0.3284 l/s), and aerodynamic power was lower.
This suggests a double advantage, getting more radiated power for less
aerodynamic power used. Hence, the glottal efficiency had doubled, with the
same lung pressure. Comparing mild interaction (Ae = 3.0 cm²) to strong
interaction (Ae = 0.2 cm²), all variables increased categorically. The
radiated acoustic power increased by a factor of 10, as did the glottal
efficiency. We made no attempt to optimize the output variables by choosing
the best value of Ae for the given glottal conditions, but we expect that such
an optimization would yield an even higher efficiency.

IV. DISCUSSION AND CONCLUSIONS

Source–filter interaction has been divided into two levels. Level 1 is the
interaction of glottal airflow with acoustic vocal tract pressures, even if
vocal fold vibration is undisturbed. The interaction parameter is the mean
glottal area divided by the effective (parallel combination) tube area of the
subglottis and supraglottis. For constant adduction of the vocal folds and
constant large tracheal diameters, the coupling parameter becomes the
cross-sectional area of the epilarynx tube. Level 1 interaction produces
harmonic distortion frequencies that contribute to the source spectrum. This
interaction is present in all speech and singing, male and female. It has been
described for nearly three decades, but generally underestimated in magnitude
because details of the lower vocal tract were unknown. Level 1 interaction
contributes to the spectral slope and the spectral ripple in the glottal sound
source, even when the spectrum is purely harmonic and no bifurcations in vocal
fold vibration occur. The supraglottal and subglottal impedances are additive
for this interaction. If both impedances are inertive (positive), a maximum
skewing of the flow pulse is achieved, which increases the maximum flow
declination rate and thereby vocal intensity. Individual harmonics can be
enhanced or suppressed by frequency-dependent reactances that change from
positive to negative.

An interesting discovery was made in this investigation with regard to level 1
interaction. The entire spectrum of source frequencies can (theoretically) be
produced without vocal fold collision. This finding could have an impact on
voice therapy, particularly for vocal fold pathologies resulting from
excessive tissue collision stress. With a sinusoidally varying glottal area
and no vocal fold contact, a

−12 dB/octave spectral slope was shown here to be achievable with an epilarynx
tube cross-sectional area of 0.5 cm².

Level 2 interaction is realized more in high F0 productions for which the
dominant harmonics (F0, 2F0, 3F0, ...) are near the formants. Frequency jumps
and a variety of new source frequencies or instabilities can be produced,
including subharmonics and non-random noise. The instabilities occur mostly
when one of the dominant harmonics encounters sudden changes in reactance,
destabilizing the modes of vibration of the tissue that are affected
differently by reactance. The control parameter for these phenomena is the
same as for level 1 interaction. Computer simulations with a high-dimensional
model showed that vocal efficiency, the ratio of radiated acoustic power to
aerodynamic power, can increase by an order of magnitude when the epilarynx
tube area is narrowed from 3.0 to 0.2 cm², but this narrowing also produced
greater instabilities when dominant harmonics were in unfavorable reactance
regions (i.e., near formants).

A thick and pliable mucosal layer on the vocal folds can lead to
self-sustained oscillation without much reliance on vocal tract reactance. A
parameter for this is the strength of the x11 mode of vibration (characterized
by large vertical phase differences) relative to the x10 mode. Some vocalists
may have the choice to operate in either a nearly linear region, with maximum
harmonic stability, or in a nonlinear region with greater output power and
greater efficiency but at the expense of less harmonic stability.

In the companion paper (Titze et al., 2008), the most frequently occurring
instabilities in human subjects were F0 jumps, in magnitude on the order of
30–40 Hz. But not all subjects exhibited these instabilities, suggesting that
nonlinear interaction varies across subjects, and perhaps even within subjects
for repeated vocalizations. The theoretical analyses here predicted the F0
jumps, both with analytical treatments and with simulation. They are mostly
triggered when F0 passes through F1, but occasionally when 2F0 passes through
F1, as both theory and measurement showed. Larger F0 jumps tend to occur with
greater coupling (i.e., a narrower epilarynx tube).

For the modal voice register, used largely in speech at relatively low F0, the
ideal vocal tract load for self-sustained oscillation would be subglottal
compliance and supraglottal inertance. The x10 mode would get maximum
reinforcement. Unfortunately, this combination does not exist at low F0 in the
human voice. Both reactances tend to be inertive because the trachea and the
supraglottal tract are roughly of equal length. Although this inertive–inertive
combination has been shown to be less favorable for self-sustained oscillation
than the compliant–inertive combination, flow pulse skewing (level 1
interaction) benefits from dual inertive reactance. Hence the two effects are
offsetting. The worst combination for self-sustained oscillation in the modal
register is an inertive subglottal tract and a compliant supraglottal tract.
This combination can occur in speech for low F1 vowels such as /i/ and /u/.
For example, if F0 = 300 Hz, F1 = 250 Hz, and F11 = 600 Hz (first subglottal
formant), then the inertive–compliant condition exists. If interaction is high
(narrow epilarynx tube), the register can flip from modal to falsetto, which
is more sustainable with this acoustic load. Females may have cultivated a
mixed register for speech to avoid this instability, given that a 300 Hz
fundamental frequency is well within their speaking range. The companion paper
shows that females exhibit fewer instabilities on a pitch glide than males,
even though the likelihood of F0 − F1 crossing is greater.

A great amount of future work is needed to determine the extent to which
combinations of subglottal and supraglottal reactance can be exploited,
especially in high-pitched speech, loud speech, and singing on a variety of
vowels. In this paper, all discussions pertained to a neutral-shaped vocal
tract. Detailed interaction effects need to be developed for all vowels and
consonants. Finally, although humans do not have much control over tracheal
diameters, tracheal lengths can be changed with larynx raising and lowering.
Perhaps the subglottal entry configuration can also be changed for more
compliance, and the soft wall construct of the trachea (the portion in contact
with the esophagus) may be useful for introducing subglottal compliance.

ACKNOWLEDGMENT

Funding for this work was provided by the National Institute on Deafness and
Other Communication Disorders, Grant No. 5 R01 DC004224 08.

¹Reactance is the energy-storing part of impedance, in contrast to resistance,
which is the energy-dissipating part. These two parts of impedance are written
as a complex number (real and imaginary part), with reactance being the
imaginary part of the impedance. Positive reactance is labeled inertive
(because the acoustic flow lags behind the pressure in phase) and negative
reactance is labeled compliant (because the acoustic flow leads the pressure
in phase).

Adachi, S., and Sato, M. (1996). “Trumpet sound simulation using a two-dimensional lip vibration model,” J. Acoust. Soc. Am. 99, 1200–1209.
Alipour, F., Montequin, D., and Tayama, N. (2001). “Aerodynamic profiles of a hemilarynx with a vocal tract,” Ann. Otol. Rhinol. Laryngol. 110, 550–555.
Alipour, F., and Scherer, R. C. (2007). “On pressure-frequency relations in the excised larynx,” J. Acoust. Soc. Am. 122, 2296–2305.
Appelman, D. R. (1967). The Science of Vocal Pedagogy: Theory and Application (Indiana University Press, Bloomington, IN).
Atal, B. S., and Schroeder, M. R. (1978). “Linear prediction analysis of speech based on a pole-zero representation,” J. Acoust. Soc. Am. 64, 1310–1318.
Chan, R., and Titze, I. R. (2006). “Dependence of phonation threshold pressure on vocal tract acoustics and vocal fold tissue mechanics,” J. Acoust. Soc. Am. 119, 2351–2362.
Chiba, T., and Kajiyama, M. (1958). The Vowel and Its Nature and Structure (Tokyo-Kaisenikan, Tokyo).
Coffin, B. (1987). Coffin’s Sounds of Singing: Principles and Applications of Vocal Techniques with Chromatic Vowel Chart, 2nd ed. (The Scarecrow Press, Metuchen, NJ).
Fant, G. (1960). The Acoustic Theory of Speech Production (Mouton, The Hague).
Fant, G. (1986). “Glottal flow: Models and interaction,” J. Phonetics 14, 393–399.
Fant, G., and Lin, Q. (1987). “Glottal voice source—Vocal tract acoustic interaction,” Q. Prog. Status Rep. STL-QPSR 4, 13–27.
Fant, G., Liljencrants, J., and Lin, Q. (1985). “A four-parameter model of glottal flow,” STL Q. Prog. Status Rep. 4, 1–13.
Flanagan, J. L. (1968). “Source-system interaction in the vocal tract,” Ann. N.Y. Acad. Sci. 155, 9–17.
Flanagan, J. L. (1972). Speech Analysis, Synthesis, and Perception (Springer, New York).
Flanagan, J., and Landgraf, L. L. (1968). “Self-oscillating source for vocal tract synthesizers,” IEEE Trans. Audio. Electroacoust. AU-16, 57–64.

Fletcher, N. H. (1993). “Autonomous vibration of simple pressure-controlled valves in gas flows,” J. Acoust. Soc. Am. 93, 2172–2180.
Fulcher, L. P., Scherer, R. C., Zhai, G., and Zhu, Z. (2006). “Analytic representation of volume flow as a function of geometry and pressure in a static physical model of the glottis,” J. Voice 20, 489–512.
Hatzikirou, H., Fitch, W. T. S., and Herzel, H. (2006). “Voice instabilities due to source-tract interactions,” Acta. Acust. Acust. 92, 468–475.
Hirano, M. (1975). “Phonosurgery: Basic and clinical investigations,” Otol. (Fukuoka) 21, 239–440.
Ishizaka, K., and Flanagan, J. L. (1972). “Synthesis of voiced source sounds from a two-mass model of the vocal cords,” Bell Syst. Tech. J. 51, 1233–1268.
Jiang, J., and Tao, C. (2007). “The minimum glottal airflow to initiate vocal fold oscillation,” J. Acoust. Soc. Am. 121, 2873–2881.
Joliveau, E., Smith, J., and Wolfe, J. (2004). “Vocal tract resonances in singing: The soprano voice,” J. Acoust. Soc. Am. 116, 2434–2439.
Kelly, J. L., and Lochbaum, C. (1962). “Speech synthesis,” Proceedings of the Fourth International Congress on Acoustics, Paper G42, pp. 1–4.
Klatt, D. H., and Klatt, L. C. (1990). “Analysis, synthesis, and perception of voice quality variations among female and male talkers,” J. Acoust. Soc. Am. 87, 820–857.
Koizumi, T., Taniguchi, S., and Hiromitsu, S. (1985). “Glottal source-vocal tract interaction,” J. Acoust. Soc. Am. 78, 1541–1547.
Liljencrants, J. (1985). “Speech synthesis with a reflection-type line analog,” Doctoral dissertation, Department of Speech Communication and Music Acoustics, Royal Institute of Technology, Stockholm, Sweden.
Lucero, J. C., and Koenig, L. L. (2007). “On the relation between the phonation threshold lung pressure and the oscillation frequency of the vocal folds,” J. Acoust. Soc. Am. 121, 3280–3283.
Markel, J. D., and Gray, A. H. J. (1976). Linear Prediction of Speech (Springer, New York).
McGowan, R. S. (1992). “Tongue-tip trills and vocal-tract wall compliance,” J. Acoust. Soc. Am. 91, 2903–2910.
Mergell, P., and Herzel, H. (1997). “Modeling biphonation—The role of the vocal tract,” Speech Commun. 22, 141–154.
Miller, D. G., and Schutte, H. K. (2005). “‘Mixing’ the registers: Glottal source or vocal tract?,” Folia Phoniatr Logop 57, 278–291.
Neumann, K., Schunda, P., Hoth, S., and Euler, H. A. (2005). “The interplay between glottis and vocal tract during male passaggio,” Folia Phoniatr Logop 57, 308–327.
Rachele, R. (1996). Overtone Singing Study Guide, 2nd ed. (Cryptic Voices Productions, Amsterdam, The Netherlands).
Rosenberg, A. (1971). “Effect of the glottal pulse shape on the quality of natural vowels,” J. Acoust. Soc. Am. 49, 583–590.
Rothenberg, M. (1981). “Acoustic interaction between the glottal source and the vocal tract,” in Vocal Fold Physiology, edited by K. N. Stevens and M. Hirano (University of Tokyo Press, Tokyo), pp. 305–328.
Rothenberg, M. (1987). “Cosi fan tutte and what it means or nonlinear source-tract acoustic interaction in the soprano voice and some implications for the definition of vocal efficiency,” in Laryngeal Function in Phonation and Respiration, edited by T. Baer, C. Sasaki, and K. S. Harris (College-Hill Press, Little, Brown and Company, Boston), pp. 254–269.
Rothenberg, M., and Zahorian, S. (1977). “Nonlinear inverse filtering technique for estimating the glottal-area waveform,” J. Acoust. Soc. Am. 61, 1063–1071.
Scherer, R. C., Titze, I. R., and Curtis, J. F. (1983). “Pressure-flow relationships in two models of the larynx having rectangular glottal shapes,” J. Acoust. Soc. Am. 73, 668–676.
Sondhi, M. M., and Schroeter, J. (1987). “A hybrid time-frequency domain articulatory speech synthesizer,” IEEE Trans. Acoust., Speech, Signal Process. 35, 955–967.
Stevens, K. (1999). “Current studies in linguistics,” Acoustic Phonetics (MIT, Cambridge, MA).
Story, B., Laukkanen, A.-M., and Titze, I. R. (2000). “Acoustic impedance of an artificially lengthened and constricted vocal tract,” J. Voice 14, 455–469.
Story, B. H. (1995). “Speech simulation with an enhanced wave-reflection model of the vocal tract,” Ph.D. dissertation, University of Iowa, Iowa City, IA.
Story, B. H. (2005). “Synergistic modes of vocal tract articulation for American English vowels,” J. Acoust. Soc. Am. 118, 3834–3859.
Story, B. H., Titze, I. R., and Hoffman, E. A. (1996). “Vocal tract area functions from magnetic resonance imaging,” J. Acoust. Soc. Am. 100, 537–554.
Sundberg, J. (1977). “The acoustics of the singing voice,” Sci. Am. 236, 82–91.
Svec, J. G., Schutte, H. K., and Miller, D. G. (1999). “On pitch jumps between chest and falsetto registers in voice: Data from living and excised human larynges,” J. Acoust. Soc. Am. 106, 1523–1531.
Titze, I. R. (1984). “Parameterization of the glottal area, glottal flow, and vocal fold contact area,” J. Acoust. Soc. Am. 75, 570–580.
Titze, I. R. (1988). “The physics of small-amplitude oscillation of the vocal folds,” J. Acoust. Soc. Am. 83, 1536–1552.
Titze, I. R. (2000). Principles of Voice Production (National Center for Voice and Speech, Denver, CO).
Titze, I. R. (2001). “Acoustic interpretation of resonant voice,” J. Voice 15, 519–528.
Titze, I. R. (2002). “Regulating glottal airflow in phonation: Application of the maximum power transfer theorem to a low dimensional phonation model,” J. Acoust. Soc. Am. 111, 367–376.
Titze, I. R. (2004a). “A theoretical study of F0-F1 interaction with application to resonant speaking and singing voice,” J. Voice 18, 292–298.
Titze, I. R. (2004b). “Theory of glottal airflow and source-filter interaction in speaking and singing,” Acta. Acust. Acust. 90, 641–648.
Titze, I. R. (2006a). “Theoretical analysis of maximum flow declination rate versus maximum area declination rate in phonation,” J. Speech Lang. Hear. Res. 49, 439–447.
Titze, I. R. (2006b). The Myoelastic-Aerodynamic Theory of Phonation (National Center for Voice and Speech, Denver, CO).
Titze, I. R., and Hunter, E. J. (2007). “A two-dimensional biomechanical model of vocal fold posturing,” J. Acoust. Soc. Am. 121, 2254–2260.
Titze, I. R., Riede, T., and Popolo, P. (2008). “Nonlinear source-filter coupling in phonation: Vocal exercises,” J. Acoust. Soc. Am. 123, 1902–1915.
Titze, I. R., and Story, B. H. (1997). “Acoustic interactions of the voice source with the lower vocal tract,” J. Acoust. Soc. Am. 101, 2234–2243.
Titze, I. R., and Sundberg, J. (1992). “Vocal intensity in speakers and singers,” J. Acoust. Soc. Am. 91, 2936–2946.
Van Den Berg, J. (1957). “Subglottal pressures and vibration of vocal folds,” Folia Phoniatr. 9, 65–71.
Vennard, W. (1967). Singing: Mechanism and Technique (Fisher, New York).
Yumoto, E., and Kadota, Y. (1998). “Pliability of the vocal fold mucosa in relation to the mucosal upheaval during phonation,” Arch. Otolaryngol. Head Neck Surg. 124, 897–902.
Zañartu, M., Mongeau, L., and Wodicka, G. R. (2007). “Influence of acoustic loading on an effective single mass model of the vocal folds,” J. Acoust. Soc. Am. 121, 1119–1129.
Zhang, C., Zhao, W., Frankel, S. H., and Mongeau, L. (2002). “Computational aeroacoustics of phonation, 2. Effects of flow parameters and ventricular folds,” J. Acoust. Soc. Am. 112, 2147–2154.
Zhang, Z., Neubauer, J., and Berry, D. A. (2006a). “The influence of subglottal acoustics on laboratory models of phonation,” J. Acoust. Soc. Am. 120, 1558–1569.
Zhang, Z., Neubauer, J., and Berry, D. A. (2006b). “Aerodynamically and acoustically driven modes of vibration in a physical model of the vocal folds,” J. Acoust. Soc. Am. 120, 2841–2849.
Zhao, W., Zhang, C., Frankel, S. H., and Mongeau, L. (2002). “Computational aeroacoustics of phonation. 1. Computational methods and sound generation mechanisms,” J. Acoust. Soc. Am. 112, 2134–2146.
