
Effect of Voice Onset Type on Vocal Attack Time

*Ben C. Watson, *R. J. Baken, and Rick M. Roark, *†Valhalla, New York

Summary: Vocal attack time (VAT) is the time lag between the growth of sound pressure (SP) and electroglottographic (EGG) signals at vocal initiation. The characteristics of voice initiation are associated with issues of vocal hygiene, efficiency, and quality. Vocal onsets have commonly been qualitatively characterized into three types: hard, simultaneous, and breathy. This study examines the effect of voice onset type on VAT values in normal speakers. SP and EGG recordings were obtained for 55 female and 57 male subjects while producing multiple tokens of three tasks (sustained /ɑ/ and "always" as unaspirated onsets, and "hallways" as an aspirated onset). Results revealed a significant effect of onset type on VAT, with the mean VAT for the "hallways" (aspirated) task greater than the mean VAT for the sustained /ɑ/ and "always" (unaspirated) tasks. There was no significant VAT difference between the sustained /ɑ/ and "always" tasks. Findings confirm the sensitivity of the VAT measure to vocal onset type and suggest its potential application as an objective and quantitative clinical measure of the type of vocal onset.
Key Words: Voice onset; Vocal attack time; Phonation; Vocal fold physiology.
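
The definition of VAT given in the Summary (the lag between the rise of the SP and EGG signals) can be illustrated with a minimal sketch. It is not the authors' MATLAB implementation: it skips the F0-based band-pass filtering and analytic-signal modeling described in the Methods and starts from two synthetic amplitude envelopes. The function names, the 1 kHz envelope sampling rate, the logistic envelope shape, the 50 ms lag search window, and the sign convention (positive when SP leads EGG) are all assumptions made for the illustration. The 10 ms difference step and the 0.75 figure-of-merit threshold are the values reported in the paper.

```python
import math


def logistic_envelope(n_samples, onset_ms, rise_ms, fs_hz):
    """Synthetic amplitude envelope rising smoothly around onset_ms (illustrative only)."""
    return [
        1.0 / (1.0 + math.exp(-((i * 1000.0 / fs_hz) - onset_ms) / rise_ms))
        for i in range(n_samples)
    ]


def difference_function(env, step):
    """A'(t) = A(t) - A(t - dt), with dt expressed in samples."""
    return [env[i] - env[i - step] for i in range(step, len(env))]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def estimate_vat_ms(env_sp, env_egg, fs_hz, dt_ms=10.0, max_lag_ms=50.0):
    """Return (lag in ms at the correlation peak, peak correlation).

    Assumed sign convention: the lag is positive when the SP envelope
    rises before the EGG envelope.
    """
    step = round(dt_ms * fs_hz / 1000.0)
    d_sp = difference_function(env_sp, step)
    d_egg = difference_function(env_egg, step)
    max_lag = round(max_lag_ms * fs_hz / 1000.0)
    n = len(d_sp)
    best_lag, best_r = 0, -2.0
    for lag in range(-max_lag, max_lag + 1):
        # Pair d_sp[i] with d_egg[i + lag] over the overlapping region.
        lo = max(0, -lag)
        hi = min(n, n - lag)
        r = pearson_r(d_sp[lo:hi], d_egg[lo + lag:hi + lag])
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag * 1000.0 / fs_hz, best_r


FS_HZ = 1000  # demo envelope rate: one sample per millisecond (assumption)
sp_env = logistic_envelope(600, onset_ms=300.0, rise_ms=8.0, fs_hz=FS_HZ)
egg_env = logistic_envelope(600, onset_ms=306.0, rise_ms=8.0, fs_hz=FS_HZ)

vat_ms, fom = estimate_vat_ms(sp_env, egg_env, FS_HZ)
print(f"VAT = {vat_ms:.1f} ms, FOM = {fom:.3f}, accepted = {fom > 0.75}")
```

In this synthetic case the EGG envelope is an exact copy of the SP envelope delayed by 6 ms, so the correlation peak falls at a lag of 6 ms. With real recordings the two difference functions differ in shape, which is why the peak correlation is also useful as a quality check on each token.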

Accepted for publication December 8, 2014.
From the *Department of Speech-Language Pathology, New York Medical College, Valhalla, New York; and the †Department of Otolaryngology, New York Medical College, Valhalla, New York.
Address correspondence and reprint requests to Ben C. Watson, School of Health Sciences and Practice, New York Medical College, 30 Plaza West, Valhalla, NY 10595. E-mail: Watson@nymc.edu
Journal of Voice, Vol. 30, No. 1, pp. 11-14
0892-1997/$36.00
© 2016 The Voice Foundation
http://dx.doi.org/10.1016/j.jvoice.2014.12.004

INTRODUCTION
Voice researchers, teachers, and clinicians have long identified voice initiation as an important aspect of vocal performance and have tied it to issues of vocal hygiene,1,2 efficiency,3,4 and quality.5–7 Indeed, the speed with which the vocal folds adduct to the midline is considered an important variable in the etiology of some voice disorders and a meaningful indicator of neural dysfunction.8–14 It is generally accepted that the initiation of phonation includes two somewhat distinct phases: (1) the prephonatory adjustment phase that is associated with setting the appropriate tension, gross adduction, and aerodynamic forces15; and (2) the attack phase that is associated with the onset of vocal fold oscillation and sound generation. In one of the first voice studies to use ultrahigh-speed cinematography, Moore16 addressed the issue of the movements of the larynx leading to and including the initiation of tone. Moore described the observed differences between what he termed the glottal stroke or glottal shock and the breathed and simultaneous vocal attacks. Based largely on an auditory-perceptual categorization, voice onset continues to be popularly divided into abrupt or hard, normal or simultaneous, and breathy, aspirate, or soft glottal attack.17,18

Baken and Orlikoff19,20 proposed that the time delay between the rise of the sound pressure (SP) and electroglottographic (EGG) signals is related to characteristics of the attack gesture. Specifically, a lag in the rise of the SP signal relative to the EGG signal may correspond to a hard glottal onset while a lead in the rise time of the SP signal may correspond to an aspirate onset. The inter-signal lag, termed vocal attack time (VAT), can be automatically and objectively estimated using a cross-correlation method and is, therefore, free of operator/investigator bias and imprecision. Orlikoff et al21 validated the VAT measure against high-speed videoendoscopy, from which a digital kymogram was generated, and showed that breathy onsets have positive VAT values, whereas hard onsets have negative VAT values. Roark et al22 and Ma et al23 reported normative VAT values for English and Cantonese speakers, respectively, in tasks for which voice onset was not constrained.

The present study builds on the work of Roark et al22 to examine effects of aspirate voice onsets on VAT values in normal speakers. Specifically, we compare VAT values for tasks that can be produced with either hard or simultaneous onsets with a task that requires an aspirate initiation.

SUBJECTS
This study was approved by the Committee for the Protection of Human Subjects (Institutional Review Board) of New York Medical College, and all subjects provided informed consent before data collection began. The subject sample for the present study is the same sample previously described in Roark et al22 and included 55 females (mean age: 28 years; range: 22–50 years) and 57 males (mean age: 29 years; range: 21–50 years). A health history that included chronic speech, voice, cardiac, respiratory, or neurologic disorder was cause for exclusion from the study. A current health status that included upper respiratory infection was cause for a delay in participation until the infection resolved.

METHODS
Simultaneous acquisition of SP and EGG signals was achieved by use of a headphone-mounted microphone (model 333012; RadioShack, Ft. Worth, TX) positioned approximately 5 cm in front of the participant's lips at approximately 15° off midline and an EGG (Glottal Enterprises EG2; Syracuse, NY), respectively. The SP and EGG signals were routed through an amplifier and digitizer (M-Audio Fast Track; Irwindale, CA) and digitized (44.1 ksamples/second; 16-bit resolution) to a laptop computer for storage and subsequent analysis. Signal acquisition used Audacity software (v.3, General Public License, http://audacity.sourceforge.net). Subjects sustained a vowel at comfortable pitch and loudness while gain was adjusted to
achieve adequate amplification without clipping while signals were monitored on a two-channel oscilloscope.

The stimuli "always" and "ahh" (to represent the sustained vowel, /ɑ/) were used to elicit unconstrained, albeit typically unaspirated, voice onsets while the task "hallways" was used to elicit a specifically aspirate onset. The stimuli were printed on individual 3 × 5 inch index cards (five cards per task). The experimenter shuffled the cards and placed them face-down in front of the subject. The subject was instructed to turn over each card, silently read the task item, and then to say the word at a comfortable pitch, loudness, and rate or to sustain the vowel at a comfortable pitch and loudness for approximately 2 seconds. Subjects were instructed that this was not a reaction time task and to pause for several seconds between cards. EGG and SP signals were recorded continuously until the subject completed all 15 cards. The cards were then reshuffled, and one or two more trials were recorded. VAT data for the vowel task were originally reported by Roark et al.22

SP and EGG signals were stored as stereo wave (.wav) files and analyzed using custom algorithms implemented in MATLAB (MathWorks, Natick, MA). The signal analysis process is described in Roark et al22 in sufficient detail to replicate this method and is summarized briefly here.

The data analysis process consists of four steps, each of which includes graphical display and optional acoustic playback. Analysis was allowed to progress to the next stage only if graphic display and/or acoustic playback met criteria for acceptable data quality (eg, production of the correct task):

Step 1, signal verification: signals could be zoomed and panned and/or played acoustically through external speakers to verify accuracy of task target and fidelity of signal quality;

Step 2, signal segmentation: automated identification of a 600-millisecond segment of the SP and EGG signals centered at the approximate time of vocal onset;

Step 3, F0-based frequency filtering and signal modeling: automated extraction of a representative value of fundamental frequency (F0) of the raw EGG signal and band-pass filtering the 600-millisecond segment at 40% of the F0 value. The analytic signal models of the band-pass filtered SP signal and EGG signal are as follows:

$$X_{BP}(t) = A_X(t)\,\sin\big(2\pi f_X(t)\,t\big), \qquad (1)$$

where X is SP or EGG, and $A_X(t)$ is the instantaneous amplitude and $f_X(t)$ is the instantaneous frequency of the modeled band-pass filtered signals (Figure 1);

Step 4, extraction of measures: the instantaneous amplitudes [$A_X(t)$, where X is SP or EGG] were smoothed and difference functions [$A'_X(t) = A_X(t) - A_X(t - \Delta t)$, where X is SP or EGG and $\Delta t = 10$ milliseconds] were derived and normalized to represent local variation in the SP and EGG amplitude models. The normalized cross-correlation of the smoothed difference functions is shown in Figure 2. Here, the peak of the correlation occurs at 9.98 milliseconds, which represents the VAT for this token of "hallways."

FIGURE 1. Smoothed amplitude functions of the band-pass filtered sound pressure (SP) and electroglottographic (EGG) signals, yielding $A_{SP}(t)$ and $A_{EGG}(t)$, respectively.

FIGURE 2. Difference functions, $A'_{SP}(t)$ and $A'_{EGG}(t)$, and the cross-correlation function. The time lag at the peak of cross-correlation provided the vocal attack time, VAT.

VAT measures were grouped by sex and task ("always," /ɑ/, "hallways") and analyzed using a mixed model with sex and task entered as fixed effects. A compound symmetric covariance structure was used for the within-subjects repeated measure (token).

RESULTS
Following Roark et al,24 the linear correlation coefficient, Pearson r, computed for the smoothed SP and EGG difference functions was used as the figure of merit (FOM) for measures of VAT. An a priori criterion FOM of greater than 0.75 applied to the database resulted in the rejection of 32 (2.8%) of the 1133 tokens for /ɑ/, 41 of the 1086 tokens (3.7%) for "always," and 95 of the 971 tokens (9.7%) for "hallways." Increased variability in the production of aspirate onsets likely led to more frequent violation of assumptions underlying our FOM criterion. There was no discernible pattern to the rejection of these
tokens. As a consequence, the higher rejection rate was unlikely to bias the data set. This report is based on analysis of the remaining 3022 tokens.

Table 1 and Figure 3 summarize descriptive statistics for VAT values by task. Table 2 summarizes the mixed model. The effect of sex was not significant. The effect of task was significant (F[2, 199] = 378.28, P < 0.0001). The adjusted mean VAT for the "hallways" task (5.99 milliseconds; standard error [SE]: 0.29 milliseconds) was greater than the adjusted mean VAT for the /ɑ/ (1.89 milliseconds; SE: 0.28 milliseconds) and the "always" (1.83 milliseconds; SE: 0.29 milliseconds) tasks. There was no significant VAT difference between the /ɑ/ and "always" tasks. The Task by Sex interaction effect was also significant (F[2, 199] = 88.26, P < 0.0001). Figure 4 shows both the high variability of VAT values and the nature of this interaction effect. Females showed smaller mean VAT values than males for the /ɑ/ and "always" tasks, but a greater mean VAT value for the "hallways" task.

TABLE 1.
Summary Statistics of VAT (Milliseconds)

Group/Subgroup        N      Mean (SD)     Median   25–75 Percentiles
Females, all tasks    1512   2.66 (5.36)   2.82     0.59 to 6.31
Males, all tasks      1510   3.55 (4.50)   3.67     0.72 to 6.34
All subjects
  /ɑ/                 1101   1.98 (5.32)   2.04     0.99 to 5.03
  Always              1045   1.85 (4.89)   1.85     1.06 to 4.71
  Hallways            876    6.01 (3.21)   5.78     3.97 to 8.19
Females
  /ɑ/                 519    0.78 (5.34)   0.93     -1.81 to 3.83
  Always              535    0.94 (4.95)   1.08     -1.83 to 4.05
  Hallways            458    6.79 (3.08)   6.82     4.83 to 17.09
Males
  /ɑ/                 582    3.04 (4.90)   2.95     11.76 to 6.25
  Always              510    2.81 (4.65)   2.91     0.18 to 5.48
  Hallways            418    5.17 (3.14)   4.88     3.33 to 7.18
Abbreviation: SD, standard deviation.

FIGURE 3. Histograms of VAT measures by task (bin width = 1 ms).

TABLE 2.
Mixed Model of VAT, in Milliseconds: Testing the Fixed Effects of Task With Tokens as Random Effects

Subgroup    Adjusted Mean   SE of Adjusted Mean   Mean Difference   SE of Mean Difference   t Value
/ɑ/         1.89            0.28                  Reference
Always      1.83            0.29                  0.05              0.16                    0.36
Hallways    5.98            0.29                  4.10              0.17                    24.01*

Always      1.83            0.29                  Reference
Hallways    5.98            0.29                  4.15              0.16                    24.47*
*P < 0.0001.

DISCUSSION
Normative VAT values are reported for aspirated and unaspirated voice onsets. We found no significant difference between females and males for VAT values collapsed across the three tasks. There was a significant effect of task on VAT. Adjusted mean VAT values for the vowel and "always" tasks, both typically characterized by an unaspirated onset, differed by 0.06 milliseconds. Both sexes showed longer VAT values for the "hallways" task, a task phonetically constrained to be initiated with an aspirated onset. However, females showed a larger VAT difference between the unaspirated and aspirated onsets (approximately 6 milliseconds) than did the males (approximately 2.2 milliseconds).

Mean VAT for the "hallways" task in this study was shorter than VAT values for the breathy onset condition reported by Orlikoff et al21 for both males (5.17 vs 26.4 milliseconds) and females (6.79 vs 17.8 milliseconds). These differences may reside in the fact that subjects in the Orlikoff et al study intentionally attempted to initiate voicing with a breathy onset. In so doing, they likely exaggerated the aspirate onset gesture.

The present findings reveal the sensitivity of the VAT measure to phonetically constrained differences in the voice onset type. Furthermore, the finding that VAT values for "hallways" reported here were longer than VAT values for unaspirated onsets and shorter than VAT values for the breathy onsets reported by Orlikoff et al21 raises the possibility that VAT is sensitive to the physiology of aspiration. That is, the range of glottal constriction related to generating a sufficiently high Reynolds number is likely to influence the value of VAT. Sensitivity of VAT to degree of glottal constriction supports its potential as an objective and quantitative clinical measure of the type
of vocal onset. Future work will explore the sensitivity of VAT to a wider range of voice onsets demonstrated in clinical populations. Another question to explore is how the variability of VAT might impact its utility as a clinical measure. Data reported here may, in fact, capture the normal range of variability. VAT in clinical populations might exceed the range reported here.

FIGURE 4. Average (± 1 standard deviation) VAT measures by task and sex.

Acknowledgments
The authors express their appreciation to Dr. Qiuhu Shi for his assistance with the statistical analysis.

REFERENCES
1. Froeschels E. Hygiene of the voice. Arch Otolaryngol. 1943;38:122–130.
2. Rees M. Harshness and glottal attack. J Speech Hear Res. 1958;1:344–349.
3. Isshiki N, von Leden H. Hoarseness: aerodynamic studies. Arch Otolaryngol. 1964;80:206–213.
4. Koike Y, Hirano M, von Leden H. Vocal initiation: acoustic and aerodynamic investigations of normal subjects. Folia Phoniatr. 1967;19:173–182.
5. Peters HFM, Boves L, van Dielen JCH. Perceptual judgment of the abruptness of voice onset in vowels as a function of the amplitude envelope. J Speech Hear Disord. 1986;51:299–308.
6. de Krom G. Some spectral correlates of pathological breathy and rough voice quality for different types of vowel fragments. J Speech Hear Res. 1995;38:794–811.
7. Steinhauer K, Grayhack JP, Smiley-Oyen AL, Shaiman S, McNeil MR. The relationship among voice onset, voice quality, and fundamental frequency: a dynamical perspective. J Voice. 2004;18:432–442.
8. Luchsinger R, Arnold G. Voice-Speech-Language. Belmont, CA: Wadsworth; 1965.
9. Koike Y, von Leden H. Pathologic vocal initiation. Ann Otol Rhinol Laryngol. 1969;78:138–148.
10. Leeper HA Jr. Voice initiation characteristics of normal children and children with vocal nodules: a preliminary investigation. J Commun Disord. 1976;3:235–238.
11. Lamprecht A. Untersuchungen zur Stimmschallparametern zu Beginn der Phonation. Folia Phoniatr. 1990;42:302–311. [German].
12. Pesak J, Urbanek K. Incoordination of phonation start in individuals with balbuties. Folia Phoniatr. 1993;45:62–67.
13. Andrade DF, Heuer R, Hockstein NE, Castro E, Spiegel JR, Sataloff RT. The frequency of hard glottal attacks in patients with muscle tension dysphonia, unilateral benign masses and bilateral benign masses. J Voice. 2000;14:240–246.
14. Goberman AM, Blomgren M. Fundamental frequency change during offset and onset of voicing in individuals with Parkinson disease. J Voice. 2008;22:178–191.
15. Faaborg-Andersen K. Electromyography of laryngeal muscles. Technics and results. In: Current problems in phoniatrics and logopedics, vol 3. Basel, Switzerland: Karger; 1965.
16. Moore P. Motion picture studies of the vocal folds and vocal attack. J Speech Disord. 1938;3:235–238.
17. Werner-Kukuk E, von Leden H. Vocal initiation: high speed cinematographic studies on normal subjects. Folia Phoniatr. 1970;22:107–116.
18. Orlikoff RF, Kahane JC. Laryngeal structure and function. In: Lass NJ, ed. Principles of Experimental Phonetics. St. Louis, MO: Mosby; 1996:112–181.
19. Baken RJ, Orlikoff RF. Vocal Fold Adduction Time Estimated From Glottographic Signals. Presented at the 25th Mid-Winter Meeting of the Association for Research in Otolaryngology; St. Petersburg, FL; February 1998.
20. Baken RJ, Orlikoff RF. Estimating vocal fold adduction time from EGG and acoustic records. In: Schutte HK, Dejonckere P, Leezenberg H, Mondelaers B, Peters HF, eds. Programme and Abstract Book: 24th IALP Congress, Amsterdam, The Netherlands; 1998:15.
21. Orlikoff RF, Deliyski DD, Baken RJ, Watson BC. Validation of a glottographic measure of vocal attack. J Voice. 2009;23:164–168.
22. Roark RM, Watson BC, Baken RJ, Brown DJ, Thomas JM. Measures of vocal attack time for healthy young adults. J Voice. 2012;26:12–17.
23. Ma EPM, Baken RJ, Roark RM, Li PM. Effect of tones on vocal attack time in Cantonese speakers. J Voice. 2012;26:670.e1–670.e6.
24. Roark RM, Watson BC, Baken RJ. A Figure of Merit for Vocal Attack Time. J Voice. 2012;26:8–11.
