
Voice-Vibratory Assessment With Laryngeal Imaging (VALI) Form: Reliability of Rating Stroboscopy and High-speed Videoendoscopy
*Bruce J. Poburka, Rita R. Patel, and Diane M. Bless, *Mankato, Minnesota, Bloomington, Indiana, and Madison, Wisconsin

Summary: Objective. The purpose of the study was to evaluate the inter-judge and intra-judge reliability of raters
using the Voice-Vibratory Assessment with Laryngeal Imaging (VALI) rating form that was developed for assessing
videostroboscopy and high-speed videoendoscopic (HSV) recordings.
Subjects and Methods. Nine speech-language pathologists with an average of 12.8 years of experience with laryngeal imaging were trained to use the VALI form for rating 66 de-identified and randomized samples from patients with voice disorders. Inter-judge reliability for parameters with scale data (amplitude, mucosal wave, nonvibratory portion, supraglottic activity, phase closure, symmetry, and regularity or periodicity) was assessed with intraclass correlations, and parameters with nominal data (glottal closure, vertical level, and free edge contour) were assessed with Fleiss kappa. Intra-judge reliability was assessed using the Spearman rho statistic for scale data and percentage of concordant pairs for nominal data.
Results. Inter-judge reliability for parameters with scale data ranged from 0.57 to 0.96 for stroboscopy and from 0.81 to 0.94 for HSV. For nominal parameters, kappa values ranged from 0.18 to 0.35 for stroboscopy and from 0.13 to 0.33 for HSV. Intra-judge reliability correlations for parameters with scale data ranged from 0.19 to 0.87 for stroboscopy and from 0.28 to 0.85 for HSV. For parameters with nominal data, percentage of concordance ranged from 44% to 78% for stroboscopy and from 52% to 89% for HSV.
Conclusions. The VALI rating form and its training protocol constitute the first a priori developed rating form that includes visual-perceptual ratings of both stroboscopy and HSV. The current form can be used to make reliable visual-perceptual judgments for selected features of vibratory motion from stroboscopy and HSV.
Key Words: Laryngeal imaging; Stroboscopy; High-speed videoendoscopy; Visual-perceptual assessment; Clinical judgments and training.

Accepted for publication December 2, 2016.
Poster presentation at the American Speech-Language-Hearing Association's Annual Convention, 2015.
From the *Department of Speech, Hearing, & Rehabilitation Services, Minnesota State University, Mankato, Minnesota; Department of Speech and Hearing Sciences, Indiana University, Bloomington, Indiana; and the Department of Surgery, Division of Otolaryngology, School of Medicine and Public Health, University of Wisconsin–Madison, Madison, Wisconsin.
Address correspondence and reprint requests to Bruce J. Poburka, Communication Disorders, Minnesota State University, Clinical Sciences Bldg. 326, Mankato, MN 56001. E-mail: bruce.poburka@mnsu.edu
Journal of Voice, Vol. 31, No. 4, pp. 513.e1–513.e14
0892-1997
© 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
http://dx.doi.org/10.1016/j.jvoice.2016.12.003

INTRODUCTION

Stroboscopy and high-speed videoendoscopy (HSV) reveal information about the complex interaction among laryngeal structures, aerodynamic forces, tissue mechanics, and muscle use patterns.1,2 In theory, the vibratory dynamics obtained from stroboscopy and HSV can be evaluated both quantitatively and qualitatively, but both approaches have problems. Quantitative evaluations of videostroboscopy3–7 and HSV8–15 are available but have not been widely adopted in the clinic because the custom-developed software is not readily available and is too time-consuming for most daily clinical practice.16 Qualitative evaluations are hampered by the lack of standards for intra- and inter-institution comparisons and for intra- and inter-clinician judgments.

Nevertheless, for clinical purposes, qualitative judgments are used routinely. Evaluation of laryngeal imaging via visual-perceptual ratings of anatomical markers and movement parameters reflecting both the structure and the function of the vocal folds continues to be common. Although many visual-perceptual forms have been developed a priori as instruments for rating videostroboscopy,17–20 almost none exist for HSV. This may be because HSV technology is used less frequently than videostroboscopy, because users rely on software for quantification, or because clinicians still consider HSV experimental. Regardless of the underlying rationale, the lack of standardization is troublesome because HSV is increasingly used in voice clinics to overcome some of the challenges of videostroboscopy, such as poor temporal resolution and poor pitch extraction or tracking of severely impaired voices.14,21–29 The higher temporal resolution of HSV compared with stroboscopy (eg, 8000 fps versus 30 fps) can be used to capture vibratory phenomena in individuals with severe voice disturbances. Without standards and rating reliability, however, the potential of HSV to overcome the shortcomings of stroboscopy cannot be fully realized.

It is well accepted that for visual-perceptual ratings to be meaningful, they must achieve satisfactory levels of inter- and intra-rater reliability. Numerous rater-related and psychometric scale-related factors have been reported to influence visual-perceptual ratings of laryngeal imaging; however, only a few have been subjected to empirical investigation for visual-perceptual ratings of videostroboscopy. For research purposes, Rosen20 suggested limiting raters to individuals with intra-rater reliability greater than 0.80 and accounting for the variability of the data through pre-study statistical evaluation.20 Clinically, the reliability of visual-perceptual ratings has been reported to improve with a number of factors, such as the use of a multimedia computer-based rating form,18 the use of a calibrated grid-tracking method,30 and control of factors such as clear description of imaging parameters and rater training. Poburka and Bless18 reported that after 4–5 hours of training with a computer-based rating form, subjects with no prior experience rating stroboscopic videos completed ratings that correlated highly with those of experts. Despite decades of videostroboscopy use in the clinic, there are still no standards for rating parameters or for the reliability of clinical judgments, as the existing forms vary in the combination of parameters used. The rating scales used by the forms also vary, from three-point to five-point scales. Additionally, the available forms are constrained by a lack of clear definitions of imaging parameters, a lack of structured observer training, and observer bias.18–20,30–32

To the best of our knowledge, the literature on a priori development of a rating form for visual-perceptual rating, and on factors related to improving the reliability of ratings with high-speed video, is scant. HSV has been used to evaluate vibratory characteristics in healthy subjects,33 in various vocal pathologies,21,23 and to differentiate individuals with healthy voices from those with vocal pathologies,34 but the reliability of the qualitative ratings has varied widely, as it has for perceptual judgments of the auditory signal. A high degree of reliability is required for the findings to be generalizable. Although multiple factors may contribute to the variance, two likely factors are (1) rating parameters, which often were selected from those used to rate videostroboscopy despite the differences in observation dictated by the improved resolution of HSV; and (2) raters, who often have limited observational training specific to HSV. Consequently, we maintain that the first step in a systematic investigation of factors influencing the visual-perceptual ratings of HSV is the development of a standardized form and rater training. Once this is accomplished, subsequent studies could investigate specific factors of influence such as type of pathology, clinical experience, and concurrent information obtained from the case history and auditory signal.

To mitigate the problems discussed above, this paper introduces a new rating form and training protocol that incorporates visual-perceptual ratings of both videostroboscopy (Figure 1) and HSV (Figure 2). This new visuoperceptual assessment tool is called the Voice-Vibratory Assessment with Laryngeal Imaging (VALI) rating form. The VALI form was developed as an update to the Stroboscopy Examination Rating Form19 with the addition of ratings for HSV. The rating instrument and the rater training are inseparable; together, they have a number of features intended to improve the reliability of ratings for both stroboscopy and HSV. For each rating parameter, the VALI form provides (1) a definition of what is being rated, (2) rating instructions specific to the parameter, (3) high-quality graphics illustrating the structural or vibratory phenomenon being rated, and (4) video samples (used during training). The purpose of the current study was to evaluate the inter-rater and intra-rater reliability of raters who used the VALI form and training protocol to rate videostroboscopy and HSV videos. Inter-rater and intra-rater reliability were evaluated for sequentially obtained recordings of videostroboscopy and HSV on the following parameters: amplitude, mucosal wave, nonvibratory portion, supraglottal activity, phase closure, symmetry, regularity or periodicity, glottal closure, vertical level, and free edge contour.

FIGURE 1. VALI rating form for stroboscopy. Voice-Vibratory Assessment with Laryngeal Imaging (VALI)–Stroboscopy (Poburka, B., Patel, R., and Bless, D. 2016).

METHODS

Samples
A total of 30 participants with dysphonia (males = 10, females = 19, child = 1) provided video samples for the study. The participants were recruited at the University of Kentucky Voice Clinic and the Vocal Physiology and Imaging Laboratory after completing institutional review board-approved consent or assent forms. The participants' age range was 10–89 years, with an average age of 49.8 years. Table 1 lists the participants' characteristics in terms of age, gender, and vocal pathology. Auditory-perceptual rating of the participants' voice quality was performed using the Grade, Roughness, Breathiness, Asthenia, Strain (GRBAS) scale35 at the time of the initial examination. The GRBAS scale is a perceptual evaluation scale in which raters evaluate grade, roughness, breathiness, asthenia, and strain. Each GRBAS parameter was rated on a 4-point scale on which 0 = normal, 1 = mild abnormality, 2 = moderate abnormality, and 3 = severe abnormality. The participants fell into groups according to their overall grade of dysphonia (G): grade 0, n = 3; grade 1, n = 6; grade 2, n = 9; and grade 3, n = 12. The participants with an overall grade of 0 had chief complaints of vocal effort, vocal fatigue, or difficulty with high notes during singing.

Videostroboscopic and HSV recordings were captured using PENTAX Medical (Montvale, NJ) models RLS 9100b (stroboscope with camera) and 9710 (high-speed video system with camera), respectively. Recordings were captured at the participants' self-selected typical pitch and loudness on sustained phonation of the vowel /i/. A rigid 70-degree endoscope was used to acquire the recordings. Topical anesthesia was not used for any of the participants. Videostroboscopic recordings were captured at the standard 30 frames per second with a spatial resolution of 640 × 480 pixels, whereas the black-and-white HSV recordings were captured at 4000 frames per second with a spatial resolution of 256 × 512 pixels. HSV recordings were captured with the black-and-white camera instead of the color camera because, with the system used in the study, black-and-white recordings offered superior light and improved clarity (spatial resolution) compared with color recordings.23,24 The HSV and videostroboscopic recordings were captured during the same visit using identical instructions, physical position, and endoscope.

Raters
Based on a power analysis, it was determined that nine raters were needed to have sufficient power to address the questions posed. The power analysis indicated an 80% chance of detecting a significant difference at the 0.05 level of significance with nine raters. Nine speech-language pathologists were recruited to serve as raters. Speech-language pathologists were selected because of their primary interest in evaluating the vibratory pattern, including movements, timing patterns, and judgments about glottal closure.36 Furthermore, they were able to devote the many hours needed to complete the training and ratings. The average experience of the raters with laryngeal imaging was 12.8 years, with a range of 2–25 years. None of the authors participated as raters. Before participating in this study, the raters' prior experience with visual-perceptual assessment was exclusively with videostroboscopy; they had no prior experience with HSV (Table 2). All raters completed institutional review board-approved informed consent before participating in the study.
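The recording settings described under Samples (HSV captured at 4000 fps, stroboscopy at 30 fps) and the 20 fps HSV playback rate reported under Rating process imply very different temporal sampling for the two techniques. The short Python sketch below works through that arithmetic; the 200 Hz fundamental frequency is a hypothetical example value, not a figure from this study.

```python
def frames_per_cycle(capture_fps: float, f0_hz: float) -> float:
    """Frames captured during one glottal cycle at fundamental frequency f0_hz."""
    return capture_fps / f0_hz


def slowdown_factor(capture_fps: float, playback_fps: float) -> float:
    """How many times slower than real time a recording plays back."""
    return capture_fps / playback_fps


def playback_seconds(phonation_s: float, capture_fps: float, playback_fps: float) -> float:
    """Viewing time needed to play a phonation segment back frame by frame."""
    return phonation_s * slowdown_factor(capture_fps, playback_fps)


F0_HZ = 200.0  # hypothetical fundamental frequency, not a value from the study

hsv_frames = frames_per_cycle(4000, F0_HZ)    # 20 frames per glottal cycle
strobo_frames = frames_per_cycle(30, F0_HZ)   # well under one frame per cycle
hsv_slowdown = slowdown_factor(4000, 20)      # 200 times slower than real time
view_time = playback_seconds(0.5, 4000, 20)   # 0.5 s of phonation -> 100 s on screen

print(hsv_frames, strobo_frames, hsv_slowdown, view_time)
```

Because stroboscopy captures far less than one frame per glottal cycle, it must assemble an apparent slow-motion cycle from samples taken across many real cycles, which is why it depends on pitch tracking; HSV instead records many frames within each cycle.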

FIGURE 2. Voice-Vibratory Assessment with Laryngeal Imaging (VALI) for high-speed videoendoscopy.

Rater training
A group-training format was used to train the raters to use the VALI form. Two types of training sessions were available: one was held face to face, and the other was presented online using web conferencing software (GoToMeeting v6.4.10; v7.1.3, Ft. Lauderdale, FL). Five raters completed the training face to face, three raters completed the training online, and one rater completed the training with a combination of both formats. Regardless of format, all training sessions used identical materials, the same presenters, and a similar amount of time. The training involved defining each parameter (Table 3), providing video samples of the end points for each feature for both stroboscopy and HSV, providing instruction on the use of the VALI form, and providing an opportunity to ask questions in an open discussion about the rating process. For each parameter, the raters were provided a definition and a video example of the end points for each feature.23

The parameters selected for rating stroboscopic and HSV recordings were based on the parameters suggested in the literature as those commonly used for making visual-perceptual ratings.1,17,23 The training duration ranged from 2 to 2.5 hours. At the conclusion of training, raters were provided with a booklet containing rating forms for each case, instructions for accessing the online videos via a secure link, suggestions for the rating environment and pace, and contact information for the principal investigator.

TABLE 1.
Age, Gender, Voice Diagnosis, and Overall Grade of Dysphonia on the GRBAS Scale of the Video Samples
Number Age (y) Gender Diagnosis Overall Dysphonia Grade*
1. 10 M Vocal nodules 3
2. 19 M Laryngopharyngeal reflux 1
3. 27 F Muscle tension dysphonia 1
4. 32 M Right vocal fold laryngocele 1
5. 37 F Right vocal fold cyst/left vocal fold reactive lesion 2
6. 37 M Bilateral nodules 0
7. 39 F Muscle tension dysphonia 3
8. 42 F Bilateral vocal fold nodules 2
9. 44 F Bilateral vocal fold edema and erythema 2
10. 45 F Bilateral vocal fold nodules 2
11. 45 F Left vocal fold cyst and erythema 3
12. 47 F Right vocal fold scarring 3
13. 48 M Laryngopharyngeal reflux 1
14. 49 F Left vocal fold paresis 2
15. 50 F Muscle tension dysphonia 3
16. 51 F Right vocal fold scarring/left vocal fold cyst 3
17. 51 M Right vocal fold scarring/left vocal fold granuloma 0
18. 51 M Left vocal fold granuloma 0
19. 53 F Interarytenoid scarring 1
20. 57 F Muscle tension dysphonia 3
21. 57 F Left vocal fold paralysis 2
22. 59 F Presbylarynx/left vocal fold scarring 2
23. 60 F Left vocal fold paralysis 3
24. 60 M Left vocal fold cyst 1
25. 62 F Presbylarynx 2
26. 62 M Bilateral vocal fold scarring 3
27. 64 F Laryngeal tremor 3
28. 70 M Left vocal fold paralysis and bilateral scarring 3
29. 77 F Laryngeal papilloma 3
30. 89 F Presbylarynx and mild tremor 2
* 0 = normal, 1 = mild, 2 = moderate, 3 = severe overall grade (G) or degree of dysphonia.
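As a consistency check, the summary figures reported under Samples (mean age 49.8 years, age range 10–89 years, and the number of participants at each overall grade G) can be recomputed from Table 1. A minimal Python sketch, with the (age, grade) pairs transcribed from the table:

```python
from collections import Counter
from statistics import mean

# (age in years, overall GRBAS grade G) for cases 1-30, transcribed from Table 1.
TABLE_1 = [
    (10, 3), (19, 1), (27, 1), (32, 1), (37, 2), (37, 0), (39, 3), (42, 2),
    (44, 2), (45, 2), (45, 3), (47, 3), (48, 1), (49, 2), (50, 3), (51, 3),
    (51, 0), (51, 0), (53, 1), (57, 3), (57, 2), (59, 2), (60, 3), (60, 1),
    (62, 2), (62, 3), (64, 3), (70, 3), (77, 3), (89, 2),
]

ages = [age for age, _ in TABLE_1]
grade_counts = Counter(grade for _, grade in TABLE_1)

print(len(TABLE_1), round(mean(ages), 1), min(ages), max(ages))  # 30 49.8 10 89
print(sorted(grade_counts.items()))  # [(0, 3), (1, 6), (2, 9), (3, 12)]
```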

Rating process
Raters were blinded to the diagnosis and GRBAS ratings and were unfamiliar with the rating form before the training. Ratings were conducted on sustained phonations. To make the videos as comparable as possible, sustained phonation at self-selected habitual pitch and loudness was segmented from both the stroboscopic and the HSV recordings. All recordings had a complete view of the glottis without epiglottic obstruction. The HSV recordings had a playback rate of 20 frames per second, whereas the stroboscopic recordings were viewed at the standard 30 frames per second. Because the HSV recordings must be viewed at a slowed frame rate, it is impossible to play synchronized audio with HSV recordings at 20 frames per second. Hence, to make the recordings comparable between HSV and videostroboscopy, the audio from the videostroboscopic recordings was disabled. Removing the audio from the videostroboscopic recordings also prevented any bias from the quality of the voice during visual-perceptual rating of the vibratory characteristics.

Raters accessed the video clips via online streaming from a secure website that was available only to them. There were 30 de-identified and randomized voice cases consisting of mixed disorders. None of the cases were repeats from the training session. Each case consisted of a pair of video clips, one recorded using stroboscopy and the other with HSV, resulting in a total of 60 video files, the presentation of which was randomized. For purposes of assessing intra-judge reliability, 10% of the cases (three cases; six clips or files) were repeated and randomly distributed among the 60 files, so the raters evaluated a total of 66 randomly sequenced video clips. Two separate forms were provided for each of the 33 cases (30 cases plus three repeats), one for rating the videostroboscopic recording and the other for rating the HSV recording. The following parameters were evaluated for videostroboscopic recordings: glottal closure, amplitude, mucosal wave, vertical level, nonvibrating portion, supraglottic activity, free edge contour, regularity, phase symmetry, and phase closure (Figure 1). HSV ratings were performed on the following parameters: glottal closure, mucosal wave, amplitude, vertical level, nonvibrating portion, supraglottic activity, free edge contour, glottal cycle periodicity, phase symmetry, and phase closure (Figure 2). Although both forms provided a space for reporting nonvibratory observations (eg, erythema, varices, mucus), such observations were not subjected to statistical analysis because the goal of the study was to evaluate inter- and intra-rater reliability on nine commonly reported parameters of vocal fold structure and function. For each parameter, the forms provided a concise definition, a high-quality graphic, and/or an image sequence taken from video. The stroboscopic form was in color, whereas the HSV form was in black and white to parallel the recordings to be rated. When raters were unable to rate a specific parameter because of stroboscopic tracking errors or physiological obstructions (eg, false vocal fold compression, prolapsed arytenoids), they were instructed to write "could not rate" for that parameter.

Ratings were made on a self-paced basis until all ratings were completed. The rating period was not regulated; the time from training to completion of the ratings ranged from several weeks to several months depending on the rater. If rating questions or dilemmas arose, the raters were instructed to contact the authors for clarification of rating procedures, but no case-specific advice was provided. Only one rater had questions: one concerned the rater's long-standing uncertainty about evaluating mucosal wave, and the other sought clarification about the nature of rating nonvibrating portion.

TABLE 2.
Training Method and Rater Experience With Laryngeal Imaging
Training Method | Number of Raters | Years of Experience
Face to face | 5 | 2–25
Webinar | 4 | 2–19

TABLE 3.
Definition of the Parameters
Features | Definition
1. Glottal closure | Appearance of the glottis during the most closed portion of the glottal cycle
2. Amplitude | Magnitude of lateral movement of the vocal folds
3. Mucosal wave | Magnitude of lateral movement of the mucous membrane
4. Vertical level | The level difference between the two folds at maximum point of contact
5. Nonvibrating portion | Adynamic segments of tissue that appear stiff
6. Supraglottic activity | Constriction of the supraglottic structures
7. Free edge contour | Appearance of the free edge of the vocal folds during maximum abduction
8. Phase closure | Relative duration of consecutive glottal cycles or stroboscopic cycles
9. Phase symmetry | The degree of symmetry between the left and the right vocal folds in terms of opening and closing
10. Regularity* | Consistency of averaged stroboscopic cycles
11. Nonvibratory observations | Structural observations not mentioned elsewhere (eg, mucus, varices, erythema)
Additional features for high-speed videoendoscopy
1. Glottal cycle periodicity | Degree of consistency of the glottal cycles
2. Other voice vibratory patterns | Vibratory motions not mentioned elsewhere (eg, vocal fry, oscillatory breaks)
* Regularity was not rated using high-speed videoendoscopy.

Statistical analyses
All ratings were entered into a spreadsheet using Microsoft Excel (v14.5.7; Microsoft Corporation, Redmond, WA). For parameters with scale data (amplitude, mucosal wave, nonvibrating portion, supraglottic activity, phase symmetry, and regularity or periodicity), ratings were entered directly with no modification. For parameters with nominal data (glottal closure, vertical level, and free edge contour), ratings were converted using a coding scheme to facilitate statistical analysis. The coding scheme assigned a number to each possible rating; for example, a rating of complete glottal closure was coded as 1, a rating of anterior gap was coded as 2, and so on. The phase closure parameter required raters to place a hash mark along a continuous scale ranging from "open phase predominates" to "closed phase predominates." To convert the location of the hash mark to scale data, a transparent overlay was created that converted the location of the hash mark to a number. The scale on the overlay ranged from +10 (closed phase predominates) to -10 (open phase predominates). The midpoint of the scale was zero, which corresponded to a rating of "nearly equal."

Inter-judge reliability was assessed using SPSS Statistics 22 (IBM, Armonk, NY). The parameters of amplitude, mucosal wave, nonvibrating portion, supraglottic activity, phase closure, symmetry, and regularity produce scale data. They were assessed with the intraclass correlation (ICC). This statistic was chosen to measure reliability across the nine raters because it analyzes data structured as groups rather than as paired observations.

Ratings of glottal closure, vertical level, and free edge contour produce nominal data. Accordingly, the Fleiss kappa statistic was used because it measures the reliability of ratings for a fixed number of raters who assign their observations to categories. Intra-judge reliability was assessed with the Spearman rho statistic for parameters with scale data. Spearman rho is a nonparametric statistic that does not assume normally distributed data; the data in this study were not normally distributed.

The parameters with nominal data required a nonparametric approach. Accordingly, intra-judge agreement was evaluated using percentage of concordance: a simple tally, expressed as a percentage, of how often the judges achieved exact agreement with themselves when rating the same case twice.

TABLE 4.
Percent of Cases by Parameter That Were Excluded in Calculation of Inter-judge Reliability (List-wise Deletion) for Stroboscopy and High-speed Videoendoscopy (HSV); Average Grade of Dysphonia G on GRBAS Rating for Cases Where Three or More Raters Could Not Rate
Parameter | Stroboscopy (%) | HSV (%)
Amplitude (right) | 52 | 25
Amplitude (left) | 52 | 22
Mucosal wave (right) | 85 | 31
Mucosal wave (left) | 85 | 34
Nonvibrating portion (right) | 70 | 28
Nonvibrating portion (left) | 61 | 31
Supraglottic activity (anteroposterior) | 43 | 46
Supraglottic activity (mediolateral) | 61 | 49
Phase closure | 46 | 19
Symmetry | 49 | 19
Regularity | 40 | 7
Average grade of dysphonia G on the GRBAS rating for cases where three or more raters could not rate at least one parameter | 2.92 | 1.75

RESULTS
In the current study, depending on the parameter, 40%–85% of the stroboscopy cases involved rating difficulty in which one or more raters were unable to rate a parameter (stated "could not rate") because of stroboscopic tracking errors or obstructions resulting from either abnormal or compensatory conditions (Table 4). When a rater is unable to complete a rating, the ICC requires omission of the case from analysis (list-wise deletion) because of missing values in at least one of the specified parameters. List-wise deletion reduced the sample size for inter-judge reliability, which in turn widened the confidence intervals (CIs). For selected parameters, the CI was either too wide to allow valid interpretation or wide enough that the correlations should be interpreted with caution. The four affected parameters were (1) amplitude (left and right), (2) mucosal wave (left and right), (3) supraglottic activity (mediolateral), and (4) nonvibrating portion (left) (Table 5).

Inter-judge reliability for parameters with scale data ranged from 0.57 to 0.96 for stroboscopy and from 0.81 to 0.94 for HSV (Table 5). HSV parameters (n = 11) with scale data showed very strong ICCs, indicating strong agreement among the raters. With stroboscopy, amplitude (right), mucosal wave (right), and mucosal wave (left) showed moderate ICCs; amplitude (left), nonvibrating portion (left), and supraglottic activity (mediolateral) showed strong ICCs; and the remaining parameters showed very strong ICCs (Table 5). Statistical significance (P < 0.05) was achieved for all parameters regardless of imaging method, indicating consistency (conformity) of the measurements made by multiple raters. For nominal parameters (n = 5), kappa values ranged from 0.18 to 0.35 for stroboscopy and from 0.13 to 0.33 for HSV (Table 6).

Intra-judge reliability correlations for parameters with scale data (n = 11) ranged from 0.19 to 0.87 for stroboscopy and from 0.28 to 0.85 for HSV (Table 7). Statistical significance (P < 0.05) was achieved for 7 of 11 parameters on stroboscopy and for 9 of 11 parameters on HSV. On stroboscopy, five of these seven parameters (amplitude (right), amplitude (left), mucosal wave (right), phase closure, and regularity) achieved moderate values of the Spearman rho correlation coefficient, whereas the correlation was strong for nonvibrating portion (right) and very strong for nonvibrating portion (left). For parameters with nominal data (n = 4), percentage of concordance ranged from 44% to 78% for stroboscopy and from 52% to 89% for HSV (Table 8).
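Besides the ICC, the Statistical analyses section relies on Fleiss kappa (inter-judge, nominal data), Spearman rho (intra-judge, scale data), and percentage of concordance (intra-judge, nominal data). The following is a minimal, dependency-free Python sketch of those three statistics, not the SPSS procedures the authors ran. Only the two glottal-closure codes stated in the text (complete closure = 1, anterior gap = 2) come from the study; the example ratings are hypothetical, and the kappa check uses a widely reproduced 10-case, 14-rater worked example whose kappa is about 0.210. Ties in Spearman rho receive average ranks, an assumption because the paper does not describe tie handling.

```python
from collections import Counter
from math import sqrt

# Codes stated in the text; other categories would continue the sequence (not specified).
GLOTTAL_CLOSURE_CODES = {"complete": 1, "anterior gap": 2}


def category_counts(ratings_per_case, categories):
    """Turn each case's list of coded ratings into per-category counts."""
    counts = []
    for case in ratings_per_case:
        tally = Counter(case)
        counts.append([tally[c] for c in categories])
    return counts


def fleiss_kappa(counts):
    """Fleiss kappa for a cases x categories count matrix (same number of raters per case)."""
    n_cases = len(counts)
    n_raters = sum(counts[0])
    total = n_cases * n_raters
    # Proportion of all assignments that fall in each category.
    p_j = [sum(row[j] for row in counts) / total for j in range(len(counts[0]))]
    # Observed pairwise agreement within each case.
    p_i = [(sum(x * x for x in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]
    p_bar = sum(p_i) / n_cases
    p_e = sum(p * p for p in p_j)  # agreement expected by chance
    return (p_bar - p_e) / (1 - p_e)


def percent_concordance(pairs):
    """Percentage of repeated cases a judge rated identically on both occasions."""
    return 100 * sum(first == second for first, second in pairs) / len(pairs)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their sorted positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rho computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    return cov / sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))


# Hypothetical ratings from three raters on two cases, coded as in the text.
coded = [[GLOTTAL_CLOSURE_CODES[r] for r in case]
         for case in (["complete", "complete", "anterior gap"],
                      ["anterior gap", "anterior gap", "anterior gap"])]
print(category_counts(coded, [1, 2]))  # [[2, 1], [0, 3]]

# Widely reproduced worked example: 10 cases, 14 raters, 5 categories.
EXAMPLE_COUNTS = [
    [0, 0, 0, 0, 14], [0, 2, 6, 4, 2], [0, 0, 3, 5, 6], [0, 3, 9, 2, 0],
    [2, 2, 8, 1, 1], [7, 7, 0, 0, 0], [3, 2, 6, 3, 0], [2, 5, 3, 2, 2],
    [6, 5, 2, 1, 0], [0, 2, 2, 3, 7],
]
print(round(fleiss_kappa(EXAMPLE_COUNTS), 3))                    # 0.21
print(percent_concordance([(1, 1), (2, 3), (1, 1), (2, 2)]))     # 75.0
print(round(spearman_rho([1, 2, 3, 4, 5], [5, 6, 7, 8, 7]), 3))  # 0.821
```

Fleiss kappa fits the inter-judge design because it only requires the same number of raters per case, while percentage of concordance needs nothing more than each judge's two ratings of a repeated case.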

TABLE 5.
Inter-judge Reliability Using Intraclass Correlation (ICC), Degrees of Freedom (df), Confidence Interval (CI), and F value (F) for Stroboscopy and High-speed Videoendoscopy (HSV)
Parameter | Stroboscopy ICC* | df | F | 95% CI | HSV ICC* | df | F | 95% CI
Amplitude (right) | 0.64 | 15 | 4.79 | .360, .845 | 0.83 | 24 | 11.54 | .671, .919
Amplitude (left) | 0.79 | 15 | 7.84 | .587, .914 | 0.81 | 25 | 10.47 | .646, .910
Mucosal wave (right) | 0.57 | 4 | 3.39 | .032, .940 | 0.87 | 22 | 11.79 | .759, .939
Mucosal wave (left) | 0.59 | 4 | 3.16 | .002, .946 | 0.82 | 21 | 9.00 | .683, .920
Nonvibrating portion (right) | 0.96 | 9 | 38.11 | .905, .989 | 0.90 | 23 | 13.18 | .823, .951
Nonvibrating portion (left) | 0.69 | 12 | 4.07 | .400, .884 | 0.87 | 22 | 10.36 | .770, .938
Supraglottic activity (anteroposterior) | 0.93 | 18 | 18.8 | .871, .970 | 0.89 | 17 | 11.42 | .796, .953
Supraglottic activity (mediolateral) | 0.75 | 12 | 5.73 | .503, .907 | 0.85 | 16 | 8.09 | .723, .938
Phase closure | 0.88 | 17 | 10.04 | .791, .951 | 0.85 | 26 | 8.96 | .751, .925
Symmetry | 0.91 | 16 | 11.76 | .833, .963 | 0.94 | 26 | 17.32 | .902, .969
Regularity/Periodicity | 0.96 | 19 | 28.31 | .936, .984 | 0.91 | 30 | 11.58 | .860, .953
* All correlations were significant (P < 0.05).
Wide CI; not interpretable.
Interpret with caution.
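Table 5 reports ICCs together with F values and df; the df column appears consistent with n - 1 cases retained after list-wise deletion (for example, df = 4 for stroboscopic mucosal wave, where 85% of 33 cases were excluded). The paper does not state which SPSS ICC model and type were selected, so the following Python sketch is an illustration under that uncertainty: it implements the Shrout and Fleiss two-way random-effects forms ICC(2,1) and ICC(2,k) and the two-way mixed consistency form ICC(3,k), drops any case containing a "could not rate" entry (represented here as None), and reports F = MSR/MSE. It also includes a hypothetical linear conversion of a phase-closure hash mark to the -10 to +10 overlay scale; the line length and the orientation (open phase at the left end) are assumptions. The check data are the classic 6-case, 4-rater Shrout and Fleiss example, for which ICC(2,1) is about 0.29, ICC(2,k) about 0.62, and ICC(3,k) about 0.91.

```python
from statistics import fmean


def listwise_delete(ratings):
    """Drop any case (row) where at least one rater wrote 'could not rate' (None)."""
    return [row for row in ratings if all(r is not None for r in row)]


def icc_two_way(ratings):
    """Shrout-Fleiss ICCs for an n-cases x k-raters table with no missing values."""
    n, k = len(ratings), len(ratings[0])
    grand = fmean(x for row in ratings for x in row)
    row_means = [fmean(row) for row in ratings]
    col_means = [fmean(row[j] for row in ratings) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between cases
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ms_r = ss_rows / (n - 1)
    ms_c = ss_cols / (k - 1)
    ms_e = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return {
        "ICC(2,1)": (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + k * (ms_c - ms_e) / n),
        "ICC(2,k)": (ms_r - ms_e) / (ms_r + (ms_c - ms_e) / n),
        "ICC(3,k)": (ms_r - ms_e) / ms_r,
        "F": ms_r / ms_e,
        "df": n - 1,
    }


def phase_closure_score(mark_mm, line_mm):
    """Hypothetical overlay: left end -> -10 (open phase predominates), right end -> +10."""
    return 20 * mark_mm / line_mm - 10


# Shrout & Fleiss example (6 cases x 4 raters) plus one hypothetical case with a
# missing rating, which list-wise deletion removes before the ICC is computed.
RATINGS = [
    [9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
    [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7],
    [5, None, 4, 6],
]
result = icc_two_way(listwise_delete(RATINGS))
print({name: round(value, 2) for name, value in result.items()})
# {'ICC(2,1)': 0.29, 'ICC(2,k)': 0.62, 'ICC(3,k)': 0.91, 'F': 11.03, 'df': 5}
print(phase_closure_score(50, 100))  # 0.0, the "nearly equal" midpoint
```

The single-measure and average-measure forms can differ substantially on the same data, which is one reason the reported ICC values should be compared with other studies only when the ICC model and type are known.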

judge reliability in this study are higher than that of our prior
TABLE 6.
Inter-judge Reliability for Parameters With Nominal Data
study comparing videostroboscopy with HSV.23 This unique
Using Fleiss Kappa () for Stroboscopy and High-speed finding of high inter- and intra-judge reliability of HSV com-
Videoendoscopy (HSV) pared with videostroboscopy may be attributed to form specific
training and clear definitions of the parameters on the form. In
Stroboscopy HSV
Patel et al,23 a total of 126 participants (252 video clips) were
Parameter Strength Strength evaluated using visuoperceptual ratings by three raters to in-
Glottal closure 0.35 Fair 0.33 Fair vestigate the clinical value of HSV. Even though the Patel et al23
Vertical level 0.19 Slight 0.13 Slight study had a larger number of cases, the rating forms were simple
Free edge contourright 0.18 Slight 0.21 Fair black and white categorical or visual analog rating scales without
Free edge contourleft 0.21 Fair 0.25 Fair clear definitions on the rating forms. In addition to the clarity
of the rating form, the increased reliability in this study could
be attributed to higher sampling rate of 4000 fps compared with
DISCUSSION
2000 fps in the Patel et al study.23 Deliyski et al37 reported that
This study revealed that the VALI form and training can be used
visual-perceptual rating of the parameter glottal edge was not
to make reliable clinical judgments of at least 8 of 15 param-
affected at frame rates of 4000 or higher, and parameters of
eters on stroboscopy and 11 of 15 parameters on high-speed video
mucosal wave, aperiodicity, and vocal fold contact were mini-
recordings.
mally affected between 4000 and 5333 frames per second, in a
Compared with stroboscopy, HSV had higher levels of inter-
study where three raters evaluated 14 patients and 14 controls
judge and intra-judge reliability for nearly all parameters. This
with HSV at playbacks of 60 and 30 fps. Even though there are
finding is similar to the finding of higher inter-judge agree-
subtle differences in definition of parameters between the current
ment for HSV (54%) compared with videostroboscopy (42%)
study and the study by Deliyski et al,37 comparable high intra-
reported by Olthoff et al.21 Olthoff et al21 did not report intra-
rater reliability was obtained for mucosal wave, regularity, and
judge reliability. Moreover, the values of inter-judge and intra-
glottal closure in the current study using HSV captured at 4000
fps with a playback of 20 fps.
TABLE 7.
Intra-judge Reliability Using Spearman Rho (rs) for Stroboscopy and High-speed Videoendoscopy (HSV)

Parameter                                  Stroboscopy rs   HSV rs
Amplitude (right)                          0.58*            0.70*
Amplitude (left)                           0.52*            0.84*
Mucosal wave (right)                       0.59*            0.81*
Mucosal wave (left)                        0.39             0.45*
Nonvibrating portion (right)               0.66*            0.67*
Nonvibrating portion (left)                0.87*            0.43*
Supraglottic activity (anteroposterior)    0.19             0.28
Supraglottic activity (mediolateral)       0.39             0.84*
Phase closure                              0.48*            0.34
Symmetry                                   0.42             0.85*
Regularity                                 0.52*            0.54*
* Significant correlations (P < 0.05).

TABLE 8.
Intra-judge Reliability for Parameters With Nominal Data Using Percent Concordance for Stroboscopy and High-speed Videoendoscopy (HSV)

Parameter                    Stroboscopy % Concordance   HSV % Concordance
Glottal closure              63                          74
Vertical level               78                          89
Free edge contour (right)    44                          52
Free edge contour (left)     52                          67

In the current study, when HSV was compared with stroboscopy, there were considerably fewer instances where raters could not rate a parameter when using HSV (Table 4). The discrepancy is likely related to differences in how the two techniques obtain images.21,23,38 With HSV, there is no reliance on pitch extraction, and temporal resolution is very high. With stroboscopy, proper tracking depends on accurate pitch extraction, and temporal resolution is lower because the images are obtained by sampling. Consequently, stroboscopy is more prone to problems when moderate to severe or severe dysphonia interferes with tracking. In this study, when three or more raters could not rate at least one parameter using stroboscopy, the average overall grade of dysphonia on the GRBAS35 scale was 2.92, which corresponds to moderate to severe or severe dysphonia (Table 4). The cutoff of three or more of the nine raters was empirically determined for calculating the overall grade of dysphonia, because when three or more raters chose "could not rate," it was unlikely that factors unique to an individual rater accounted for that judgment. The parameters with the highest instances of rating difficulty on stroboscopy were mucosal wave (left and right), nonvibrating portion (left and right), amplitude (left and right), phase closure, symmetry, and regularity. These parameters all reflect dynamic aspects of vocal fold vibratory motion, in contrast to static parameters such as vibratory edge, and the pattern is consistent with findings reported by Patel et al,23 suggesting that HSV might be better than stroboscopy for evaluating mucosal wave, amplitude, phase closure, periodicity, and tissue pliability in cases with severe dysphonia. When rating difficulty was experienced using HSV, the average grade of dysphonia on the GRBAS rating35 for the difficult-to-rate cases was 1.75, suggesting that the difficulties were not created by severe dysphonia. In fact, the parameters with the highest instances of rating difficulty were supraglottic activity, mediolateral
and anteroposterior. In these cases, prior experience with stroboscopy may have introduced some bias in how supraglottic compression was judged. With stroboscopy, the videos can be played in real time, whereas because of the high temporal resolution of HSV, the videos need to be viewed at a slow playback rate of 10–30 frames per second. This high temporal resolution may actually have been a hindrance, because the movements of the supraglottic structures are very slow compared with vocal fold movement. Consequently, supraglottic activity may not be apparent, because it would take many frames of video to appreciate the slow movement of the supraglottic structures. Furthermore, because the raters' prior experience was with stroboscopy only, the very slow movement of the supraglottic structures may have been a dramatic departure from their previous experience and expectations of how supraglottic movement looks, suggesting that additional instruction on this parameter should be included in training.
Compared with previous studies evaluating videostroboscopic rating forms, stronger correlations were achieved using the VALI form, particularly for the parameters of phase closure and regularity.18,20 Poburka and Bless18 reported correlations of 0.54 and 0.71 for phase closure and regularity, respectively. In the current study, the correlations were 0.88 and 0.96. Rosen20 reported a correlation of 0.58 for duration (phase closure in the current study) and correlations of 0.37 and 0.46 for periodicity of the right and the left vocal folds, respectively (regularity in the current study). Because all of these studies employed some form of training, we postulate that the stronger correlations in the current study may be due, at least in part, to the VALI's graphic illustrations and the form-specific training. The graphics helped illustrate the complex temporal and timing patterns associated with the parameters to be rated. The form-specific training was provided before rating, with discussion to help ensure that all raters were looking for the same features. Additionally, clear definitions of the rating form parameters, including indications of which task facilitates rating (eg, normal pitch, normal phonation for glottal closure; fully abducted vocal folds for free edge contour), could be additional reasons for the stronger correlations for videostroboscopic ratings compared with prior studies.
Unexpectedly, weak inter-judge correlations were obtained for glottal closure, vertical level, and free edge contour across both techniques. Because the raters had all of their questions addressed during the training period and felt they were able to rate these parameters, the discrepancy was thought to be related not to the training but to the form itself. For example, the low correlation for glottal closure may have occurred because the VALI form lacked a way of rating glottal closure when the closure pattern was variable (ie, the presence of more than one closure pattern)17 during the examination. Variable closure was not one of the choices for glottal closure patterns. Appropriate modifications to the rating form and training protocol should improve correlations for glottal closure. For vertical level, weak correlations were attributed to insufficient variability in the video cases for this parameter. For free edge contour, raters reported difficulty rating this parameter because the training called for edge to be rated during vocal fold abduction or breathing. However, the video samples provided for the ratings did not include any views of abduction, which made rating the free edge contour difficult.
The parameters of mucosal wave and nonvibrating portion had a high number of instances in which raters could not rate a case, especially for stroboscopy. Between 61% and 85% of cases were excluded from analysis for these parameters. This was likely owing to problems with temporal resolution and tracking that resulted from severe dysphonia.21,23 In the current study, approximately 40% of the participants received an auditory-perceptual grade of 3, indicating severe dysphonia, and 30% had moderate dysphonia. In our prior study, 100% of the participants with severe dysphonia and 64% of participants with moderate dysphonia could not be visually judged on stroboscopy for vibratory parameters.23 The current study had a similar finding. As reported earlier, the average overall grade of dysphonia on the GRBAS scale was 2.92 for cases in which three or more raters could not judge at least one parameter. Furthermore, when a rater is unable to complete a rating, the ICC requires omission of the case from analysis (list-wise deletion), and this reduced the sample size (Table 4). The reduced sample size produced wider confidence intervals (CIs); hence, correlations for the stroboscopy parameters of amplitude (right), mucosal wave (left and right), and nonvibrating portion (left) could not be interpreted validly. Likewise, the correlations for amplitude (left) and supraglottic activity (mediolateral) should be interpreted with caution (Table 5). Furthermore, it could be argued that because list-wise deletion was used, the more difficult-to-rate cases were eliminated from analysis and the correlations were higher than they would have been if all cases had been included. Despite the smaller sample size, significance was reached for the majority of parameters. Future studies should require rating all cases, even if that requires a best attempt.
Although this is a prospective study, there are several limitations. Clearly, a larger number of video samples and raters, including other frequent users of laryngeal imaging such as otolaryngologists and teachers of singing, would be ideal. However, this may be difficult to achieve because of the time commitment for the combined training and subsequent rating of cases. For this controlled study, the analyses were limited to sustained phonation without audio and spanned several disorders and levels of severity. The video cases did not include rest breathing or adduction-abduction tasks, and raters found it difficult to rate free edge contour. Including these tasks would provide a better range for evaluating the various static as well as dynamic features of the vocal folds. It would also be of interest to determine whether including audio with the laryngeal imaging, which more closely simulates typical clinical practice, improves ratings or introduces bias. In this study, we did not limit ratings to any particular diagnostic category. Future studies directed at investigating specific pathologies might help clarify why some parameters were more dependably rated than others and further illuminate the usefulness of this training and rating form for improving the reliability of visual-perceptual ratings of laryngeal
images. Moreover, investigating ratings as they relate to specific levels of dysphonia severity and/or specific disorders could further delineate which form is best for rating laryngeal movement parameters.
CONCLUSIONS
The VALI rating form and training protocol constitute the first a priori developed rating form that can be used to make reliable visuoperceptual evaluations of vocal fold structure and function for HSV. The new rating form and training protocol were developed not only for visuoperceptual evaluation of HSV but also include an updated form for evaluation of videostroboscopy, because at present the routine clinical use of HSV as a sole imaging modality is limited by the limited commercial availability of this technology for clinical use and the excessive amount of time required to view high-frame-rate videos. Moreover, data storage for HSV recordings is still cumbersome for a busy clinic owing to the high frame rate. The VALI rating form is an updated version of the Stroboscopy Evaluation Rating Form. The VALI form includes a definition of the parameter being rated, rating instructions for each parameter, high-quality graphics, and training video samples. The results of this initial prospective study suggest that the VALI form combined with training can be used to make reliable qualitative judgments of videostroboscopy and HSV. Further refinement of the form and training could enhance its usefulness and would likely result in even higher levels of reliability for selected parameters.
High-speed video emerged as a superior imaging technique compared with stroboscopy. Stroboscopy has inherent limitations that constrained its usefulness in cases of severe dysphonia. For selected parameters, problems with pitch extraction and tracking resulted in a high number of instances in which raters could not make a judgment. Such problems were largely overcome with the use of HSV. Additionally, correlations for both inter-judge and intra-judge reliability were higher for HSV. Although the use of HSV enhanced our raters' ability to complete visuoperceptual assessments, the technique also has challenges that limit its practical use in clinical settings. For example, its high frame rate requires large amounts of digital storage, and viewing an entire examination requires an excessive amount of time. At present, primary use of stroboscopy, with selective use of high-speed imaging to resolve difficult imaging conditions, appears to be most effective. Thus, although not all the parameters are equally easy to rate, the form provides a means for intra- and inter-institution clinical rating of laryngeal movements from both HSV and stroboscopic videos.
REFERENCES
1. Hirano M, Bless D. Videostroboscopic Examination of the Larynx. San Diego, CA: Singular Publishing Group, Inc.; 1993.
2. Zanartu M, Mehta DD, Ho JC, et al. Observation and analysis of in vivo vocal fold tissue instabilities produced by nonlinear source-filter coupling: a case study. J Acoust Soc Am. 2011;129:326–339.
3. Woo P. Quantification of videostrobolaryngoscopic findings: measurements of the normal glottal cycle. Laryngoscope. 1996;106(3 pt 2 suppl 79):1–27.
4. Elidan G, Elidan J. Vocal folds analysis using global energy tracking. J Voice. 2012;26:760–768.
5. Hanson DG, Jiang J, D'Agostino M, et al. Clinical measurement of mucosal wave velocity using simultaneous photoglottography and laryngostroboscopy. Ann Otol Rhinol Laryngol. 1995;104:340–349.
6. Saadah AK, Galatsanos NP, Bless D, et al. Deformation analysis of the vocal folds from videostroboscopic image sequences of the larynx. J Acoust Soc Am. 1998;103:3627–3641.
7. Sercarz JA, Berke GS, Arnstein D, et al. A new technique for quantitative measurement of laryngeal videostroboscopic images. Arch Otolaryngol Head Neck Surg. 1991;117:871–875.
8. Yan Y, Chen X, Bless D. Automatic tracing of vocal-fold motion from high-speed digital images. IEEE Trans Biomed Eng. 2006;53:1394–1400.
9. Lohscheller J, Toy H, Rosanowski F, et al. Clinically evaluated procedure for the reconstruction of vocal fold vibrations from endoscopic digital high-speed videos. Med Image Anal. 2007;11:400–413.
10. Patel R, Donohue KD, Unnikrishnan H, et al. Kinematic measurements of the vocal-fold displacement waveform in typical children and adult populations: quantification of high-speed endoscopic videos. J Speech Lang Hear Res. 2015;58:227–240.
11. Mehta DD, Deliyski DD, Quatieri TF, et al. Automated measurement of vocal fold vibratory asymmetry from high-speed videoendoscopy recordings. J Speech Lang Hear Res. 2011;54:47–54.
12. Ikuma T, Kunduk M, McWhorter AJ. Advanced waveform decomposition for high-speed videoendoscopy analysis. J Voice. 2013;27:369–375.
13. Yamauchi A, Imagawa H, Yokonishi H, et al. Evaluation of vocal fold vibration with an assessment form for high-speed digital imaging: comparative study between healthy young and elderly subjects. J Voice. 2012;26:742–750.
14. Krausert CR, Liang Y, Zhang Y, et al. Spatiotemporal analysis of normal and pathological human vocal fold vibrations. Am J Otolaryngol. 2012;33:641–649.
15. Yiu EM, Wang G, Lo AC, et al. Quantitative high-speed laryngoscopic analysis of vocal fold vibration in fatigued voice of young karaoke singers. J Voice. 2013;27:753–761.
16. Woo P. Objective measures of laryngeal imaging: what have we learned since Dr. Paul Moore. J Voice. 2014;28:69–81.
17. Bless DM, Hirano M, Feder RJ. Videostroboscopic evaluation of the larynx. Ear Nose Throat J. 1987;66:289–296.
18. Poburka BJ, Bless DM. A multi-media, computer-based method for stroboscopy rating training. J Voice. 1998;12:513–526.
19. Poburka BJ. A new stroboscopy rating form. J Voice. 1999;13:403–413.
20. Rosen CA. Stroboscopy as a research instrument: development of a perceptual evaluation tool. Laryngoscope. 2005;115:423–428.
21. Olthoff A, Woywod C, Kruse E. Stroboscopy versus high-speed glottography: a comparative study. Laryngoscope. 2007;117:1123–1126.
22. Mehta DD, Deliyski DD, Zeitels SM, et al. Voice production mechanisms following phonosurgical treatment of early glottic cancer. Ann Otol Rhinol Laryngol. 2010;119:1–9.
23. Patel R, Dailey S, Bless D. Comparison of high-speed digital imaging with stroboscopy for laryngeal imaging of glottal disorders. Ann Otol Rhinol Laryngol. 2008;117:413–424.
24. Patel RR, Liu L, Galatsanos NP, et al. Differential vibratory characteristics of adductor spasmodic dysphonia and muscle tension dysphonia on high-speed digital imaging. Ann Otol Rhinol Laryngol. 2011;120:21–32.
25. Patel RR, Pickering J, Stemple J, et al. A case report in changes in phonatory physiology following voice therapy: application of high-speed imaging. J Voice. 2012;26:734–741.
26. Deliyski DD, Hillman RE. State of the art laryngeal imaging: research and clinical implications. Curr Opin Otolaryngol Head Neck Surg. 2010;18:147–152.
27. Mendelsohn AH, Remacle M, Courey MS, et al. The diagnostic role of high-speed vocal fold vibratory imaging. J Voice. 2013;27:627–631.
28. Yamauchi A, Yokonishi H, Imagawa H, et al. Quantification of vocal fold vibration in various laryngeal disorders using high-speed digital imaging. J Voice. 2016;30:205–214.
29. Yamauchi A, Yokonishi H, Imagawa H, et al. Visualization and estimation of vibratory disturbance in vocal fold scar using high-speed digital imaging. J Voice. 2016;30:493–500.
30. Peppard RC, Bless D. A method for improving measurement reliability in laryngeal videostroboscopy. J Voice. 1990;4:280–285.
31. Uloza V, Vegiene A, Pribuisiene R, et al. Quantitative evaluation of video laryngostroboscopy: reliability of the basic parameters. J Voice. 2013;27:361–368.
32. Otto KJ, Hapner ER, Baker M, et al. Blinded evaluation of the effects of high definition and magnification on perceived image quality in laryngeal imaging. Ann Otol Rhinol Laryngol. 2006;115:110–113.
33. Kendall KA. High-speed laryngeal imaging compared with videostroboscopy in healthy subjects. Arch Otolaryngol Head Neck Surg. 2009;135:274–281.
34. Inwald EC, Döllinger M, Schuster M, et al. Multiparametric analysis of vocal fold vibrations in healthy and disordered voices in high-speed imaging. J Voice. 2011;25:576–590.
35. Hirano M, ed. Clinical Examination of Voice. New York: Springer-Verlag; 1981.
36. American Speech-Language-Hearing Association. The roles of otolaryngologists and speech-language pathologists in the performance and interpretation of strobovideolaryngoscopy (Relevant Paper). 1998. Available at: www.asha.org/policy. Accessed November 11, 2016.
37. Deliyski DD, Powell MEG, Zacharias SRC, et al. Experimental investigation on minimum frame rate requirements of high-speed videoendoscopy for clinical voice assessment. Biomed Signal Process Control. 2015;17:21–28.
38. Bonilha HS, Deliyski DD, Gerlach TT. Phase asymmetries in normophonic speakers: visual judgments and objective findings. Am J Speech Lang Pathol. 2008;17:367–376.
