Ben C. Watson, R. J. Baken, and Rick M. Roark, Valhalla, New York
Summary: Vocal attack time (VAT) is the time lag between the growth of sound pressure (SP) and electroglottographic (EGG) signals at vocal initiation. The characteristics of voice initiation are associated with issues of vocal hygiene, efficiency, and quality. Vocal onsets have commonly been qualitatively characterized into three types: hard, simultaneous, and breathy. This study examines the effect of voice onset type on VAT values in normal speakers. SP and EGG recordings were obtained for 55 female and 57 male subjects while producing multiple tokens of three tasks (sustained /ɑ/ and "always" as unaspirated onsets, and "hallways" as an aspirated onset). Results revealed a significant effect of onset type on VAT, with the mean VAT for the "hallways" (aspirated) task greater than the mean VAT for the sustained /ɑ/ and "always" (unaspirated) tasks. There was no significant VAT difference between the sustained /ɑ/ and "always" tasks. Findings confirm the sensitivity of the VAT measure to vocal onset type and suggest its potential application as an objective and quantitative clinical measure of the type of vocal onset.
Key Words: Voice onset; Vocal attack time; Phonation; Vocal fold physiology.
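The lag measurement that defines VAT can be illustrated with a minimal Python sketch. It is not the authors' MATLAB implementation. It assumes the instantaneous amplitude envelopes of the SP and EGG signals are already available as arrays, uses a hypothetical 5 ms moving-average smoother, takes the 10 millisecond difference interval from the method description, reads the difference function as a backward difference, and adopts the convention that a positive lag means the EGG envelope rises after the SP envelope. All function and parameter names are illustrative.

```python
import numpy as np

def smooth(x, width):
    """Moving-average smoothing with edge padding (output length equals input length)."""
    width = max(1, int(width)) | 1              # force an odd window length
    pad = width // 2
    padded = np.pad(np.asarray(x, dtype=float), pad, mode="edge")
    return np.convolve(padded, np.ones(width) / width, mode="valid")

def difference_function(amp, fs, dt=0.010):
    """Backward difference A(t) - A(t - dt), scaled to unit peak magnitude."""
    shift = max(1, int(round(dt * fs)))         # dt expressed in samples
    diff = np.zeros(len(amp))
    diff[shift:] = amp[shift:] - amp[:-shift]
    peak = np.max(np.abs(diff))
    return diff / peak if peak > 0 else diff

def estimate_vat_ms(amp_sp, amp_egg, fs, dt=0.010, smooth_ms=5.0):
    """Return (lag in ms at the cross-correlation peak, normalized peak value).

    smooth_ms is an assumed smoothing window, not a value from the paper.
    """
    width = int(round(smooth_ms * 1e-3 * fs))
    d_sp = difference_function(smooth(amp_sp, width), fs, dt)
    d_egg = difference_function(smooth(amp_egg, width), fs, dt)
    # np.correlate(a, v)[k] = sum_n a[n + k] * v[n]; positive k: EGG lags SP.
    xcorr = np.correlate(d_egg, d_sp, mode="full")
    norm = np.sqrt(np.sum(d_sp ** 2) * np.sum(d_egg ** 2))
    if norm > 0:
        xcorr = xcorr / norm
    lags = np.arange(-(len(d_sp) - 1), len(d_egg))
    best = int(np.argmax(xcorr))
    return 1e3 * lags[best] / fs, float(xcorr[best])

# Synthetic check: the EGG envelope is the SP envelope delayed by 6 ms.
fs = 10_000
t = np.arange(0, 0.3, 1 / fs)
amp_sp = 1 / (1 + np.exp(-(t - 0.100) / 0.005))
amp_egg = 1 / (1 + np.exp(-(t - 0.106) / 0.005))
vat_ms, peak = estimate_vat_ms(amp_sp, amp_egg, fs)
print(f"estimated VAT: {vat_ms:.2f} ms (normalized peak {peak:.3f})")
```

On this synthetic pair the estimate should land at about the 6 ms delay that was built in. Real recordings would first need the bandpass filtering and amplitude modeling steps described in the method, which this sketch does not attempt.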
Ben C. Watson, et al. Effect of Voice Onset Type on VAT

achieve adequate amplification without clipping while signals were monitored on a two-channel oscilloscope.
The stimuli "always" and "ahh" (to represent the sustained vowel, /ɑ/) were used to elicit unconstrained, albeit typically unaspirated, voice onsets, while the task "hallways" was used to elicit a specifically aspirate onset. The stimuli were printed on individual 3 × 5 inch index cards (five cards per task). The experimenter shuffled the cards and placed them facedown in front of the subject. The subject was instructed to turn over each card, silently read the task item, and then to say the word at a comfortable pitch, loudness, and rate or to sustain the vowel at a comfortable pitch and loudness for approximately 2 seconds. Subjects were instructed that this was not a reaction time task and to pause for several seconds between cards. EGG and SP signals were recorded continuously until the subject completed all 15 cards. The cards were then reshuffled, and one or two more trials were recorded. VAT data for the vowel task were originally reported by Roark et al.²²
SP and EGG signals were stored as stereo wave (.wav) files and analyzed using custom algorithms implemented in MATLAB (MathWorks, Natick, MA). The signal analysis process is described in Roark et al²² in sufficient detail to replicate this method and is summarized briefly here.
The data analysis process consists of four steps, each of which includes graphical display and optional acoustic playback. Analysis was allowed to progress to the next stage only if graphic display and/or acoustic playback met criteria for acceptable data quality (eg, production of the correct task):

Step 1, signal verification: signals could be zoomed and panned and/or played acoustically through external speakers to verify accuracy of task target and fidelity of signal quality;

where X is SP or EGG, and $A_X(t)$ is the instantaneous amplitude and $f_X(t)$ is the instantaneous frequency of the modeled bandpass-filtered signals (Figure 1);

Step 4, extraction of measures: the instantaneous amplitudes [$A_X(t)$, where X is SP or EGG] were smoothed and difference functions [$A'_X(t) = A_X(t) - A_X(t - \Delta t)$, where X is SP or EGG and $\Delta t = 10$ milliseconds] were derived and normalized to represent local variation in the SP and EGG amplitude models. The normalized cross-correlation of the smoothed difference functions is shown in Figure 2. Here, the peak of the correlation occurs at 9.98 milliseconds, which represents the VAT for this token of "hallways."

FIGURE 2. Difference functions, $A'_{SP}(t)$ and $A'_{EGG}(t)$, and the cross-correlation function. The time lag at the peak of cross-correlation provided the vocal attack time, VAT.

VAT measures were grouped by sex and task ("always," /ɑ/, "hallways") and analyzed using a mixed model with sex and task entered as fixed effects. A compound symmetric covariance structure was used for the within-subjects repeated measure (token).

RESULTS
Following Roark et al,²⁴ the linear correlation coefficient, Pearson r, computed for the smoothed SP and EGG difference functions was used as the figure of merit (FOM) for measures of VAT. An a priori criterion FOM of greater than 0.75 applied to the database resulted in the rejection of 32 (2.8%) of the 1133 tokens for /ɑ/, 41 of the 1086 tokens (3.7%) for "always," and 95 of the 971 tokens (9.7%) for "hallways." Increased variability in the production of aspirate onsets likely led to more frequent violation of assumptions underlying our FOM criterion. There was no discernible pattern to the rejection of these
tokens. As a consequence, the higher rejection rate was unlikely to bias the data set. This report is based on analysis of the remaining 3022 tokens.
Table 1 and Figure 3 summarize descriptive statistics for VAT values by task. Table 2 summarizes the mixed model. The effect of sex was not significant. The effect of task was significant (F[2, 199] = 378.28, P < 0.0001). The adjusted mean VAT for the "hallways" task (5.99 milliseconds; standard error [SE]: 0.29 milliseconds) was greater than the adjusted mean VAT for the /ɑ/ (1.89 milliseconds; SE: 0.28 milliseconds) and the "always" (1.83 milliseconds; SE: 0.29 milliseconds) tasks. There was no significant VAT difference between the /ɑ/ and "always" tasks. The Task by Sex interaction effect was also significant (F[2, 199] = 88.26, P < 0.0001). Figure 4 shows both the high variability of VAT values and the nature of this interaction effect. Females showed smaller mean VAT values than males for the /ɑ/ and "always" tasks, but a greater mean VAT value for the "hallways" task.

TABLE 1.
Summary Statistics of VAT (Milliseconds)

Group/Subgroup        N      Mean (SD)     Median   25th to 75th Percentiles
Females, all tasks    1512   2.66 (5.36)   2.82     0.59 to 6.31
Males, all tasks      1510   3.55 (4.50)   3.67     0.72 to 6.34
All subjects
  /ɑ/                 1101   1.98 (5.32)   2.04     0.99 to 5.03
  Always              1045   1.85 (4.89)   1.85     1.06 to 4.71
  Hallways            876    6.01 (3.21)   5.78     3.97 to 8.19
Females
  /ɑ/                 519    0.78 (5.34)   0.93     1.81 to 3.83
  Always              535    0.94 (4.95)   1.08     1.83 to 4.05
  Hallways            458    6.79 (3.08)   6.82     4.83 to 17.09
Males
  /ɑ/                 582    3.04 (4.90)   2.95     11.76 to 6.25
  Always              510    2.81 (4.65)   2.91     0.18 to 5.48
  Hallways            418    5.17 (3.14)   4.88     3.33 to 7.18
Abbreviation: SD, standard deviation.

FIGURE 3. Histograms of VAT measures by task (bin width = 1 ms).

DISCUSSION
Normative VAT values are reported for aspirated and unaspirated voice onsets. We found no significant difference between females and males for VAT values collapsed across the three tasks. There was a significant effect of task on VAT. Adjusted mean VAT values for the vowel and "always" tasks, both typically characterized by an unaspirated onset, differed by 0.06 milliseconds. Both sexes showed longer VAT values for the "hallways" task, a task phonetically constrained to be initiated with an aspirated onset. However, females showed a larger VAT difference between the unaspirated and aspirated onsets (approximately 6 milliseconds) than did the males (approximately 2.2 milliseconds).
Mean VAT for the "hallways" task in this study was shorter than VAT values for the breathy onset condition reported by Orlikoff et al²¹ for both males (5.17 vs 26.4 milliseconds) and females (6.79 vs 17.8 milliseconds). These differences may stem from the fact that subjects in the Orlikoff et al study intentionally attempted to initiate voicing with a breathy onset. In so doing, they likely exaggerated the aspirate onset gesture.
The present findings reveal the sensitivity of the VAT measure to phonetically constrained differences in the voice onset type. Furthermore, the finding that VAT values for "hallways" reported here were longer than VAT values for unaspirated onsets and shorter than VAT values for the breathy onsets reported by Orlikoff et al²¹ raises the possibility that VAT is sensitive to the physiology of aspiration. That is, the range of glottal constriction related to generating a sufficiently high Reynolds number is likely to influence the value of VAT. Sensitivity of VAT to degree of glottal constriction supports its potential as an objective and quantitative clinical measure of the type
TABLE 2.
Mixed Model of VAT, in Milliseconds: Testing the Fixed Effects of Task With Tokens as Random Effects

Subgroup    Adjusted Mean   SE of Adjusted Mean   Mean Difference   SE of Mean Difference   t Value
/ɑ/         1.89            0.28                  Reference
Always      1.83            0.29                  0.05              0.16                    0.36
Hallways    5.98            0.29                  4.10              0.17                    24.01*

Always      1.83            0.29                  Reference
Hallways    5.98            0.29                  4.15              0.16                    24.47*
*P < 0.0001.
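The token bookkeeping reported in the Results can be cross-checked against the N column of Table 1 with a few lines of arithmetic. The sketch below is only a consistency check: the recorded and rejected counts are the ones stated in the text, while the dictionary keys and function names are illustrative.

```python
# Token counts recorded per task and tokens rejected by the FOM > 0.75
# criterion, as stated in the Results. Keys are illustrative labels.
recorded = {"vowel": 1133, "always": 1086, "hallways": 971}
rejected = {"vowel": 32, "always": 41, "hallways": 95}

def retained_tokens(recorded, rejected):
    """Tokens left for analysis in each task after FOM-based rejection."""
    return {task: recorded[task] - rejected[task] for task in recorded}

def rejection_rate_percent(recorded, rejected):
    """Share of recorded tokens rejected in each task, in percent."""
    return {task: 100.0 * rejected[task] / recorded[task] for task in recorded}

retained = retained_tokens(recorded, rejected)
rates = rejection_rate_percent(recorded, rejected)

for task in recorded:
    print(f"{task}: kept {retained[task]}, rejected {rates[task]:.2f}%")
print("total kept:", sum(retained.values()))
```

The retained counts (1101, 1045, and 876) match the all-subjects rows of Table 1 and sum to the 3022 tokens analyzed. The computed rejection rates are about 2.82%, 3.78%, and 9.78%, so the reported 3.7% and 9.7% look truncated rather than rounded.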
Journal of Voice, Vol. 30, No. 1, 2016