Shear 2001

Shear et al.
DEPRESSION AND ANXIETY 13:166–178 (2001)
RELIABILITY AND VALIDITY OF A STRUCTURED
INTERVIEW GUIDE FOR THE HAMILTON ANXIETY
RATING SCALE (SIGH-A)
M. Katherine Shear, M.D.,1* Joni Vander Bilt, M.P.H.,1 Paola Rucci, D. Stat.,1 Jean Endicott, Ph.D.,2
Bruce Lydiard, M.D.,3 Michael W. Otto, Ph.D.,4 Mark H. Pollack, M.D.,4 Linda Chandler, Ph.D.,5
Jenna Williams, B.S.,1 Arjumand Ali, and David M. Frank1
The Hamilton Anxiety Rating Scale, a widely used clinical interview assessment tool, lacks instructions for administration and clear anchor points for the
assignment of severity ratings. We developed a Structured Interview Guide for
the Hamilton Anxiety Scale (SIGH-A) and report on a study comparing this
version to the traditional form of this scale. Experienced interviewers from
three Anxiety Disorders research sites conducted videotaped interviews using
both traditional and structured instruments in 89 participants. A subset of the
tapes was co-rated by all raters. Participants completed self-report symptom
questionnaires. We observed high inter-rater and test-retest reliability using
both formats. The structured format produced similar but consistently higher
(+ 4.2) scores. Correlation with a self-report measure of overall anxiety was
also high and virtually identical for the two versions. We conclude that in settings where extensive training is not practical, the structured scale is an
acceptable alternative to the traditional Hamilton Anxiety instrument. Depression and Anxiety 13:166–178, 2001. © 2001 Wiley-Liss, Inc.
Key words: anxiety disorders; generalized anxiety disorders; assessment
instrument; outcomes
INTRODUCTION
The Hamilton Anxiety Rating Scale [HAM-A or
HARS: Hamilton, 1959, 1969] is a 14-item clinical interview measure of somatic and psychic anxiety symptoms. This scale was one of the first attempts to
measure the clinical status of patients diagnosed with
neurotic anxiety states quantitatively and has become
one of the most widely used symptom rating scales in
the world. Although the scale assesses a broad range of
symptoms that are common to all eight of the DSM IV
Anxiety Disorders, it is most often used to assess severity of Generalized Anxiety Disorder (GAD). The
Hamilton Anxiety Rating Scale comprises the main
outcome measure in most treatment studies of this disorder. However, in its original form, the scale has no
established reliability aids, such as instructions for administration or for scoring, and there are no scripted
questions to guide the interviewers who administer this
scale. Without such guidelines, the method of administering each item and assigning the level of symptom
severity can be quite arbitrary. These deficiencies could
result in inconsistent use of this scale, which could increase the variability of treatment outcome ratings and
decrease the accuracy of cross-site or cross-rater comparisons [Bruss et al., 1994]. The industry sponsor of
this study was concerned that such variance in outcome
ratings might have contributed to difficulty detecting
true drug-placebo differences in several large and expensive multicenter clinical studies of GAD. To address this problem, we developed a modified form of
the instrument that includes instructions for standard
administration and better-specified anchors for assigning severity ratings.
1 University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania
2 Columbia University, New York, New York
3 Medical University of South Carolina, Charleston, South Carolina
4 Massachusetts General Hospital, Boston, Massachusetts
5 Pfizer Incorporated, New York, New York
Contract grant sponsor: Pfizer Company; Contract grant sponsor: National Institute of Mental Health; Contract grant numbers:
MH-53817, MH30915
*Correspondence to: Dr. M. Katherine Shear, M.D., Western Psychiatric Institute and Clinic, 3811 O'Hara Street, Pittsburgh,
Pennsylvania, 15213. E-mail: shearmk@msx.upmc.edu
Received for publication 11 May 2000; Accepted 12 October 2000
Research Article: HAM-A Anxiety Reliability
The structured interview approach was motivated in
part by the successful development of a structured interview guide for the Hamilton Depression Rating
Scale [SIGH-D; Williams, 1988]. The SIGH-A was
designed to improve the ease and consistency of use by
employing specific instructions for administration,
structured questions for each item, and operationalized
criteria for scoring. The instructions include suggestions for probing and handling boundary problems
in rating severity, as well as for establishing a uniform
time frame during which the symptoms are rated (Fig.
1). Instructions to rate items on a boundary by using
the higher rating may provide more sensitivity at lower
levels of symptoms and might be useful to avoid floor
effects. The purpose of this paper is to report results
of a study designed to determine the reliability of the
structured interview compared to the traditional form
of the Hamilton Anxiety Rating Scale and to document
the correlation between the two forms, when administered to outpatients with different anxiety disorders.
METHODS
The study sample included individuals who sought
treatment for an anxiety disorder at one of three sites
(Western Psychiatric Institute and Clinic, Massachusetts General Hospital, and the Medical University of
South Carolina) between April 1, 1997 and February
28, 1998. Eligible patients signed informed consent as
approved by the Institutional Review Boards associated
with the three study sites. All interviews were videotaped for the purpose of co-rating. In addition to completing structured interviews, all patients completed
self-report questionnaires assessing anxiety-related
symptoms. Participants underwent two interviews on
each of 2 days and were compensated $50 for each day.
Study participants were consenting patients, age 18
years or older, who met criteria for a DSM-IV Anxiety
Disorder of at least 6 months in duration, as determined by trained raters using the Structured Clinical Interview for DSM-IV [SCID-P with Psychotic Screen;
Spitzer et al., 1995]. Participants were excluded from
the study if they had a primary diagnosis of Major Depressive Disorder, Panic Disorder without comorbid
Generalized Anxiety Disorder, or psychotic disorder; if
they met criteria for any psychoactive substance abuse
or dependence within the past 6 months; or if they were
judged unable to participate in the interviews reliably
because of a chaotic lifestyle, practical
problems, or other characteristics judged by the coordinator to be likely to interfere with providing accurate
or complete data.
Participants underwent 2 days of testing within a 7-day period. On each day all participants completed
both the traditional Hamilton Anxiety Rating Scale
and the structured interview form of the Hamilton
scale, with the order of these scales randomly assigned
and counterbalanced across the two testing days. On
day 2 the scales were administered in the opposite order
from day 1. In addition, all participants completed the Beck Anxiety Inventory (BAI). Patients
completed the Patient Global Improvement (PGI) on
day 2 to ensure that there was no substantial change in
clinical state. The mean PGI score on day 2 was 3.73
(SD 1.02) (3 = minimally improved; 4 = unchanged).
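The counterbalanced order of administration described above can be sketched in Python. This is only an illustration under stated assumptions: the paper does not describe how the random assignment was carried out, so the function name assign_orders, the seed argument, and the near-equal split between the two day 1 orders are hypothetical.

```python
import random

def assign_orders(participant_ids, seed=None):
    """Randomly assign a day 1 scale order and reverse it on day 2.

    Assumption (not stated in the paper): about half of the
    participants start with the HAM-A and half with the SIGH-A.
    """
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    schedule = {}
    for position, pid in enumerate(ids):
        if position < half:
            day1 = ("HAM-A", "SIGH-A")
        else:
            day1 = ("SIGH-A", "HAM-A")
        # Day 2 uses the opposite order from day 1.
        schedule[pid] = {"day1": day1, "day2": day1[::-1]}
    return schedule

# Hypothetical participant numbers 1 to 89.
schedule = assign_orders(range(1, 90), seed=2001)
```

Reversing the order on day 2 means that each participant receives each scale first once, so any order effect is spread across both instruments.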
Raters who participated in this study were experienced research interviewers. Similar to standard procedures in pharmaceutical trials, they received a brief
introduction and instructions for using each scale but
did not undergo formal training and certification procedures. There was no cross-site training. In order to
avoid crossover effects, e.g., inadvertent use of the
guidelines from the structured interview, raters at each
site administered either the structured SIGH-A only
or the unstructured HAM-A version only. A different
rater administered each scale on day 1 and day 2, thus
providing a more stringent test of inter-rater reliability
than co-rated videotapes. All interviews were videotaped and sent to Western Psychiatric Institute and
Clinic. Two raters at each site, along with two additional raters from New York State Psychiatric Institute,
carried out co-ratings of 32 videotaped interviews for
the structured interview form of the Hamilton Scale
and Hamilton Anxiety Rating Scale. The 32 tapes were
selected at random, 16 from the first half
of the sample and 16 from the second half of the
sample, with an even distribution of the range of Clinical Global Impression severity scores across sites.
INSTRUMENTS
HAMILTON ANXIETY RATING SCALE
[HARS; HAMILTON, 1959]
This instrument was developed to assess and quantify symptom severity among patients with anxiety
neurosis. Inter-rater reliability has been reported as an
Intraclass Correlation Coefficient of 0.74–0.96 [Bruss
et al., 1994].
BECK ANXIETY INVENTORY [BAI; BECK ET
AL., 1988]
This instrument is a 21-item, self-report questionnaire designed to assess and evaluate the frequency of
anxiety symptoms over a one-week period. This test
assesses two factors: cognitive and somatic symptoms.
The instrument has good internal consistency (α =
0.92), test-retest reliability (r = 0.75; df = 81, P <
.001), and convergent and discriminant validity.
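Internal consistency of the kind reported for the BAI is usually expressed as Cronbach's alpha. A minimal Python sketch of the standard formula follows; the function name cronbach_alpha and the example item scores are hypothetical and are not taken from the study data.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents.

    scores: one list per respondent, each holding k item scores.
    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    k = len(scores[0])
    items = list(zip(*scores))                      # one tuple per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)

# Hypothetical ratings: four respondents, three items scored 0-4.
example = [[0, 1, 1], [2, 2, 3], [3, 4, 3], [1, 1, 2]]
alpha = cronbach_alpha(example)
```

Sample variances (n - 1 denominator) are used throughout; the same choice must be made for the items and for the totals.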
RESULTS
DEMOGRAPHIC AND CLINICAL
CHARACTERISTICS
Eighty-nine adults participated in the study, including 30 at Massachusetts General Hospital, 30 at the
Medical University of South Carolina, and 29 at
Western Psychiatric Institute and Clinic. Sixty percent
were female. Participants' mean age was 37.1 years (SD
10.9), with a range of 19 to 68 years. Eighty-eight percent were European-American, 6% African-American,
4% Hispanic, and 2% other. Thirty-eight (43%) were
married. Three participants (3%) had no high school
diploma or G.E.D.; for 30 participants (34%) the highest degree
was a high school diploma or G.E.D.; and 56 (63%)
had completed some college, graduate school, or vocational/technical/business school following high school.
The mean years of education was 14.4 (SD = 3.4).
Sixty-three participants (71%) were either employed
or were students and 26 (29%) were unemployed.
Figure 1. SIGH-A: Structured interview for the Hamilton Anxiety Rating Scale.
Primary DSM IV diagnoses included Generalized
Anxiety Disorder (n = 46, 52%), Social Phobia (n = 20,
22%), Panic Disorder (n = 13, 15%), and Obsessive
Compulsive Disorder (n = 10, 11%). Considering any
current diagnosis, 74% endorsed Generalized Anxiety
Disorder (n = 66), 33% Social Phobia (n = 29), 24%
Panic Disorder (n = 21), 14% Major Depression (n =
12), 14% Obsessive Compulsive Disorder (n = 12),
10% Dysthymic Disorder (n = 9), and 5% Posttraumatic Stress Disorder (n = 4).
Figure 2. Relationship between SIGH-A and HAM-A total score.
RELIABILITY
Test-retest reliability was assessed with the Intraclass
Correlation Coefficient [Bartko and Carpenter, 1966]
(ICC) for the total score on each instrument, assigned by
raters on day 1 and day 2. Using a two-way random effects model, the ICC for the HAM-A was 0.86 (95% CI
0.78–0.91). The corresponding ICC for the SIGH-A day
1-day 2 ratings was 0.89 (95% CI 0.83–0.93).
Inter-rater reliability was assessed using the ICC for
the co-ratings of the 32 selected videotaped interviews. The ICC was 0.98 (95% CI 0.97–0.99) for the
traditional scale and 0.99 (95% CI 0.98–0.99) for the
SIGH-A. Item correlations for traditional Hamilton
Anxiety Rating Scale ranged from 0.94 to 0.98, except
for item 14 (observational rating of behavior), where
the ICC was 0.76. Item-total correlations for the
structured interview guide ranged from 0.91 to 0.99,
except for item 14, where the correlation was 0.81.
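For readers who want to see how an ICC from a two-way random effects model is obtained, the sketch below computes the single-measure, absolute-agreement form (often written ICC(2,1)) from the ANOVA mean squares. Which ICC variant the study used is not stated beyond "two-way random effects", so this choice is an assumption, as are the function name icc_two_way_random and the example scores.

```python
def icc_two_way_random(ratings):
    """Single-measure ICC(2,1) from a two-way random effects ANOVA.

    ratings: list of n rows (participants), each holding k scores
    (one per rater or testing day).
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for participants, raters, and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

# Hypothetical day 1 / day 2 total scores for five participants.
example = [[18, 20], [25, 24], [31, 29], [12, 15], [22, 22]]
print(round(icc_two_way_random(example), 2))
```

Because the rater mean square enters the denominator, a systematic offset between raters or days lowers this coefficient, which is what distinguishes it from a consistency-only ICC.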
VALIDITY OF SIGH-A AS A MEASURE OF
HAMILTON ANXIETY SCORE
The total score obtained using the structured interview format correlated highly with the total score of
the traditional format on both day one (r = 0.77, P <
.01) and day two (r = 0.75, P < .01). We also examined
the relationship between each of the two versions of
the Hamilton Scale with scores on the Beck Anxiety
Inventory, a different way of obtaining an overall measure of somatic and cognitive anxiety. The correlation
with the BAI was essentially the same for the two forms (0.53 for the
traditional Hamilton scale and 0.57 for the SIGH-A).
This finding provides further confirmation of the convergent validity of the two forms of the instrument.
TABLE 1. Reliability of SIGH-A and HAM-A Scales in Patients With and Without Current GAD.
Measure                                    Total sample             Current GAD              No current GAD
ICC (test-retest) for HAM-A                .86 (.78–.91)            .79 (.66–.87)            .94 (.85–.97)
ICC (test-retest) for SIGH-A               .89 (.83–.93)            .88 (.80–.92)            .93 (.84–.97)
ICC (inter-rater reliability) for HAM-A    .98 (.97–.99)            .98                      .98
ICC (inter-rater reliability) for SIGH-A   .99 (.98–.99)            .98                      .99
HAM-A/SIGH-A correlation                   .77 (day1) .75 (day2)    .70 (day1) .72 (day2)    .89 (day1) .84 (day2)
Internal consistency (alpha) for SIGH-A    .82                      .79                      .88
Internal consistency (alpha) for HAM-A     .85                      .82                      .92
Mean HAM-A total score (day1) (S.D.)       20.58 (8.48)             20.85 (7.47)             19.80 (11.05)
Mean SIGH-A total score (day1) (S.D.)      24.62 (9.09)             24.67 (8.68)             24.48 (10.38)
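The test-retest coefficients in Table 1 for participants with current GAD (n = 66) and without it (n = 23) can be compared with a Fisher z test. The Python sketch below is a rough check under stated assumptions: it treats each ICC like a correlation coefficient, uses the standard error sqrt(1/(n1 - 3) + 1/(n2 - 3)), and refers the statistic to a normal distribution. The paper does not give its exact formula, and the function name fisher_z_difference is hypothetical.

```python
import math

def fisher_z_difference(r1, n1, r2, n2):
    """Compare two independent coefficients via Fisher's z transformation.

    Returns the test statistic and a two-sided P value from the
    standard normal distribution (an approximation).
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Test-retest ICCs from Table 1: no current GAD (n = 23) vs current GAD (n = 66).
ham_z, ham_p = fisher_z_difference(0.94, 23, 0.79, 66)    # HAM-A
sigh_z, sigh_p = fisher_z_difference(0.93, 23, 0.88, 66)  # SIGH-A
```

Under these assumptions the HAM-A comparison gives a statistic of about 2.6 with P < 0.05, in line with the value reported in the Results, while the SIGH-A comparison does not reach P < 0.05.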
RELATIONSHIP BETWEEN THE HAM-A
AND SIGH-A
The structured interview guide was associated with
consistently higher scores. On day 1 the mean score
for this version was 24.6 (SD 9.1), while the mean for
the traditional format was 20.5 (SD 8.4)
(paired-sample t-test = 6.4, df = 88, P < 0.001); on day 2 the means were
23.5 (SD 9.0) and 19.2 (SD 7.6), respectively (paired-sample t-test = 6.8, df = 88, P < 0.001). Considering
individual items, correlation between the two forms
ranged from 0.33 for item 14 (behavior at interview)
to 0.78 for item 5 (trouble concentrating or remembering). In all cases, on both days, the mean item
score from the structured interview was higher than
for the same item on the traditional scale. The magnitude of this difference ranged from a mean of 0.02
(items 8, 9, and 10) to 0.57 (item 4). The mean difference between the scales was 4.1 for day 1 and 4.3
for day 2. The relationship between the two forms,
as obtained by linear regression for day 1, was
SIGH-A score = 7.547 + 0.832 × HAM-A score (Fig.
2). These differences are most likely a result of explicit instructions on the structured interview form,
which describe how to rate severity and indicate how
to resolve boundary differences, as well as the systematic inquiry made for each item. For consistency,
these instructions tell the rater to score questionable
cases at the higher level.
RELIABILITY OF THE HAM-A AND SIGH-A
IN PATIENTS WITH AND WITHOUT
CURRENT GAD
Since the HAM-A is the most frequently used outcome measure in patients with Generalized Anxiety
Disorder, we examined reliability and validity in patients with (n = 66) and without (n = 23) current GAD.
Table 1 compares the reliability results. On the more
stringent test of inter-rater reliability, the HAM-A performs significantly worse in subjects with GAD than in subjects
with other anxiety disorders (t-test on Fisher's z transformation = 2.6; P < 0.05), whereas this is not true of
the SIGH-A.
DISCUSSION
The Hamilton Anxiety Rating Scale is an extensively used assessment instrument, which was developed for rating severity of anxiety symptoms prior to
the development of reliable diagnostic criteria for
different Anxiety Disorders. In some studies it has
shown sensitivity to change and may be useful as an
outcome measure in clinical settings. The HAM-A is
the primary outcome measure most often used in
treatment studies of Generalized Anxiety Disorder,
and it is also used to rate severity of anxiety symptoms in other disorders. The lack of instructions for
administration and the absence of clear anchor points
for severity ratings mean training is somewhat difficult and decisions for both administration and scoring can be idiosyncratic.
We developed a structured interview to provide an
explicit guide for the use of the Hamilton scale. The
study reported here documents good reliability of the
SIGH-A, though the traditional scale also performed
well. Of some interest, there was significantly lower
rater reliability across 2 days on the unstructured
HAM-A in GAD patients compared to non-GAD patients. Scores on the two instruments were highly correlated in this study, with a uniform and reliable
difference between them. One possible reason that the
SIGH-A yielded slightly higher scores on average is
that, unlike the traditional scale, the SIGH-A instructs
clinicians to probe subject responses for frequency, distress, and interference before making ratings. Furthermore, these ratings are based on distinct severity scale
anchor points. Because the HAM-A does not instruct
clinicians to probe subject responses before making ratings, raters may obtain less information from subjects, which may contribute to
lower HAM-A total scores. A second possibility is that the
raters, despite not having formal training on the SIGH-A instrument, may have been aware of the hypothesis of
the study. These rater expectancies could have influenced the ratings in a consistent direction.
Our study is limited by the fact that this was a
sample of subjects who presented to our University-based clinics. The subjects we recruited were similar
to those who come to our settings for treatment of
anxiety and depressive disorders in having a range of
diagnoses and severity. Moreover, collection of data
from three sites improves generalizability. However,
we cannot be certain if results would be similar for patients presenting to other clinical settings or for those
who do not seek treatment. In addition, all raters in
this study had prior experience as research raters; we
do not know whether untrained raters would achieve
similar levels of reliability on either instrument.
We conclude that either form of the Hamilton scale
can be used with confidence by trained research raters.
The main advantage of the traditional format is that it
has been used for many years. However, the advantage
of the structured interview is that it provides instructions to assist in training and increased consistency of
administration and scoring, which may also generate
more appropriate cross-site comparisons and decrease
the variability of treatment outcome ratings. The fact
that raters in this study were all experienced research
assessors may have contributed to the lack of significant differences between the two instruments. Providing clear instructions may be especially useful when
raters are inexperienced and extensive training is impractical. A study comparing ratings in such a situation would be of interest.
ACKNOWLEDGMENTS
The authors acknowledge the contributions of assessors at Columbia University: Richard Blumenthal,
PhD; Miriam Gibbon, MSW; Hillary Glick, PhD; and
Kristin Trautman, MSW; at Massachusetts General
Hospital: Steve Safren, PhD; Isabel Scarinci, PhD;
Naomi Simon, MD; and Sabine Wilhelm, PhD; at the
Medical University of South Carolina: Sarah Book,
MD; Marsha Crawford, RN,C; Naresh Emmanuel,
MD; Michael Johnson, MD; Rebecca Kapp, RN; and
Alex Morton, PharmD; and at the University of Pittsburgh: Ulrike Feske, PhD; Briggett Ford, PhD; Carolyn Hughes, LSW; Mark Jones, LSW; Carl Lejuez, MS;
Mary McShea, M.Ed.; and Pamela Stimac, LSW. This
project was supported in part by a grant to Dr. Shear
from the Pfizer Company and in part by the National
Institute of Mental Health grants MH-53817 and
MH30915.
REFERENCES
Bartko JJ, Carpenter WT. 1966. The Intraclass Correlation Coefficient as a measure of reliability. Psychol Rep 19:3–11.
Beck AT, Epstein N, Brown G, Steer RA. 1988. An inventory for
measuring clinical anxiety: psychometric properties. J Consult
Clin Psychol 56:893–897.
Bruss GS, Gruenberg AM, Goldstein RD, Barber JP. 1994. Hamilton anxiety rating scale interview guide: joint interview and test-retest methods for interrater reliability. Psychiatry Res 53:
191–202.
Hamilton M. 1959. The assessment of anxiety states by rating. Br J
Med Psychol 32:50–55.
Hamilton M. 1969. Diagnosis and rating of anxiety. Br J Psychiatry
Special Pub 3:76–79.
Williams JB. 1988. A structured interview guide for the Hamilton
Depression Rating Scale. Arch Gen Psychiatry 45:742–747.
