The CES-D scale is a short self-report scale designed to measure depressive symptomatology in the general population. The items of the scale are symptoms associated with depression which have been used in previously validated longer scales. The new scale was tested in household interview surveys and in psychiatric settings. It was found to have very high internal consistency and adequate test-retest repeatability. Validity was established by patterns of correlations with other self-report measures, by correlations with clinical ratings of depression, and by relationships with other variables which support its construct validity. Reliability, validity, and factor structure were similar across a wide variety of demographic characteristics in the general population samples tested. The scale should be a useful tool for epidemiologic studies of depression.

The Center for Epidemiologic Studies Depression Scale (CES-D Scale) was developed for use in studies of the epidemiology of depressive symptomatology in the general population. Its purpose differs from previous depression scales which have been used chiefly for diagnosis at clinical intake and/or evaluation of severity of illness over the course of treatment. The CES-D was designed to measure current level of depressive symptomatology, with emphasis on the affective component, depressed mood. The symptoms are among those on which a diagnosis of clinical depression is based but which may also accompany other diagnoses (including "normal") to some degree.

This definition of the variable being measured determines the appropriate criteria of validity and reliability (Standards for Educational and Psychological Tests, 1974). Content validity will be based on the clinical relevance of the symptoms which comprise the items of the scale. Criterion-oriented validity will include correlations with other valid self-report depression scales, correlations with clinical ratings of severity of depression, and discrimination between psychiatric patients and general population samples. Construct validity will be based on what is known about the theory and epidemiology of depressive symptoms. Evidence that the scale is reliable but is also sensitive to current levels of symptomatology will be based on predictability of test-retest changes in scores (e.g., scores of patients before and after treatment, or scores of household respondents before and after "Life Events Losses"). Since several comparable samples (essentially replications) were tested, consistency of results across the
samples will also be shown as indirect evidence of reliability.

The CES-D was designed for use in general population surveys, and is therefore a short, structured self-report measure. It is usable by lay interviewers, acceptable to the respondent, and not substantially influenced by the normal range of conditions during a household interview. The scale was designed for use in studies of the relationships between depression and other variables across population subgroups. To compare results from one subgroup to another, the scale must be shown to measure the same thing in both groups. Therefore, it will be shown that properties of the scale (validity, reliability, factor structure) are similar for the various population subgroups to be studied.

Development of the Scale

The CES-D items were selected from a pool of items from previously validated depression scales (e.g., Beck, Ward, Mendelson, Mock, & Erbaugh, 1961; Dahlstrom & Welsh, 1960; Gardner, 1968; Raskin, Schulterbrandt, Reatig, & McKeon, 1969; Zung, 1965). The major components of depressive symptomatology were identified from the clinical literature and factor analytic studies. These components included: depressed mood, feelings of guilt and worthlessness, feelings of helplessness and hopelessness, psychomotor retardation, loss of appetite, and sleep disturbance. Only a few items were selected to represent each component. Four items were worded in the positive direction to break tendencies toward response set as well as to assess positive affect (or its absence). To emphasize current state, the directions read: "How often this past week did you ..." Each response was scored from zero to three on a scale of frequency of occurrence of the symptom.

Pretests on small "samples of convenience" indicated appropriate performance of the scale and guided minor revisions for clarity and acceptability. The 20-item scale used in the studies reported here is shown in Table 1. The possible range of scores is zero to 60, with the higher scores indicating more symptoms, weighted by frequency of occurrence during the past week.

Field Tests: Methods

First Questionnaire Survey (Q1 Survey)

The CES-D scale was included in a structured interview containing over 300 items, including other scales designed to measure depression or depressed mood (Bradburn Negative Affect, 1969; Lubin, 1967), psychological symptoms (Langner, 1962), well-being (Bradburn Positive Affect, 1969; Cantril Ladder, 1965) and Social Desirability (Crowne & Marlowe, 1960). It also included standard sociodemographic items (age, sex, education, occupation, marital status) and measures of life events, alcohol problems, social functioning, physical illness and use of medications. The interview, which took about an hour, was conducted by an experienced lay interviewer in the home of the respondent.

Probability samples of households designed to be representative of two communities (Kansas City, Missouri, and Washington County, Maryland) were selected. An individual (aged 18 and over) was randomly selected for interview from each household in the sample. Independent samples of households were designated for each week of the study. Strong efforts were made to complete interviews in the assigned week, but up to three weeks (and unlimited numbers of call-backs) were allowed to maximize response rate. Interviewing was done from October 1971 through January 1973 in Kansas City and from December 1971 through July 1973 in Washington County. The response rate in Kansas City was about 75%, with a total of 1173 completed interviews; in Washington County the response rate was about 80%, with 1673 completed interviews. Informed consent was obtained from all respondents. Both sites had a refusal rate of about 17%, plus a small percentage of not-at-home and other reasons for nonresponse.

Demographic distributions of the samples are
reported elsewhere (Comstock & Helsing, in press), as are analyses of characteristics of those who refused to be interviewed (Comstock & Helsing, 1973; Klassen & Roth, 1974). Refusals were significantly more likely to have lower education and come from smaller households than respondents. Analyses have been made of respondents interviewed in the assigned week ("on time") versus the harder to find respondents interviewed in the following three weeks ("late") (Mebane, 1973). Males and working people were slightly overrepresented among the "late" respondents, but the "late" did not differ from the "on time" on the psychological measures in the interview, including the CES-D scale. The samples probably have some underrepresentation of males and the poorly educated. However, they include respondents with a wide range of demographic characteristics, in numbers adequate for analyses of relationships among variables.

Second Questionnaire Survey (Q2 Survey)

The CES-D scale was also included in a slightly revised (mainly shortened) version of the questionnaire (Q2) used in Washington County only, from March 1973 through July 1974 (for three months Q1 and Q2 were used alternately). Samples were drawn for four-week periods. The revision was not expected to affect the CES-D, since the scale was placed very early in both interviews, with identical preceding sections. The major differences between the Q1 and Q2 surveys were: length of interview (60 vs. 30 minutes); the time-basis of the sampling frame (weekly vs. four-week); and the site (Kansas City and Washington County vs. Washington County only). The response rate for the Q2 survey was about 75%, with 1089 completed interviews, and about 22% refusals. Therefore, the obtained sample for Q2 may be slightly less representative than that of the Washington County Q1 survey.

Mail-backs

From May 1973 through March 1974, each respondent to Q2 was asked to fill out and mail back one retest on the CES-D scale either two, four, six, or eight weeks after the original interview. A total of 419 mail-backs was received (about 56% response rate).

ticut (Weissman, Prusoff, & Newberry, 1975). In the Washington County study, seventy patients residing in a private psychiatric facility were selected on the basis of willingness and ability to participate. Each patient was rated on the Rockliff Depression Rating Scale (Rockliff, 1971) by the nurse-clinician who was most familiar with the patient's current status. Immediately following this, the patient was interviewed by one of the interviewers from the Washington County general population survey, using the original interview form (Q1). In the New Haven study, thirty-five people admitted to outpatient treatment for severe depression and scoring seven or higher on the Raskin Depression Rating Scale (Raskin et al., 1969) participated in the study. They were given the CES-D scale and the SCL-90 (Derogatis, Lipman, & Covi, 1973) as self-reports and rated by clinicians on the Hamilton Rating Scale (Hamilton, 1960) as well as the Raskin. The measures were taken upon admission for treatment, after one week, and after four weeks of treatment, using psychotropic medication and supportive psychotherapy.
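As a minimal sketch (not the authors' procedure), the scoring described under Development of the Scale can be written in Python: 20 responses coded 0 to 3 are summed into a 0 to 60 total, which can then be compared with the cutoff of 16 used under Clinical Criteria. Two details are assumptions not stated in this section and are labeled in the code: that the four positively worded items are reverse-scored (3 minus the response), and that they sit at item positions 4, 8, 12, and 16 (Table 1 is not reproduced here). The function and constant names are hypothetical.

```python
from typing import Sequence

# ASSUMPTION: 1-based positions of the four positively worded items.
# Table 1 is not reproduced here, so these positions are illustrative.
POSITIVE_ITEMS = (4, 8, 12, 16)
N_ITEMS = 20
CUTOFF = 16  # the "arbitrary cutoff score of 16" from Clinical Criteria


def score_cesd(responses: Sequence[int], positive_items=POSITIVE_ITEMS) -> int:
    """Return a CES-D total (0-60) from 20 responses coded 0-3.

    Each response is the frequency of the symptom during the past week.
    ASSUMPTION: positively worded items are reverse-scored (3 - x) so a
    higher total always means more symptoms.
    """
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    total = 0
    for position, value in enumerate(responses, start=1):
        if value not in (0, 1, 2, 3):
            raise ValueError(f"item {position}: response {value!r} not in 0-3")
        total += 3 - value if position in positive_items else value
    return total


def share_at_or_above_cutoff(totals: Sequence[int], cutoff: int = CUTOFF) -> float:
    """Proportion of totals scoring at or above the cutoff."""
    if not totals:
        raise ValueError("no scores given")
    return sum(t >= cutoff for t in totals) / len(totals)


if __name__ == "__main__":
    # Hypothetical respondent with no symptoms and maximal positive affect.
    calm = [3 if i in POSITIVE_ITEMS else 0 for i in range(1, 21)]
    print(score_cesd(calm))  # 0
    # Hypothetical respondent at the other extreme.
    worst = [0 if i in POSITIVE_ITEMS else 3 for i in range(1, 21)]
    print(score_cesd(worst))  # 60
    print(share_at_or_above_cutoff([0, 10, 16, 30]))  # 0.5
```

Reverse-scoring keeps the stated 0 to 60 range, because every item still contributes between 0 and 3 points regardless of its wording.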
socioeconomic status people report more physical symptoms while higher socioeconomic status people report more affective symptoms (Crandell & Dohrenwend, 1967). In summary, in a general population sample, we would expect a great deal of heterogeneity, with many people experiencing a few symptoms and a few experiencing many. Therefore, some inter-item correlations may be quite low, but the direction of correlations should be consistent enough to produce reasonably high measures of internal consistency. In a patient group, we would expect higher item means, higher inter-item correlations, and very high internal consistency.

The results support these expectations (see Table 3). Both inter-item and item-scale correlations were higher in the patient sample than in the general population samples (even when the small N and, therefore, greater sampling error of the patient sample are taken into account). Expectations were also confirmed by measures of internal consistency (coefficient alpha and the Spearman-Brown split-halves method; Nunnally, 1967). They were high in the general population (about .85) and even higher in the patient sample (about .90).

This high internal consistency may include some component of response bias, i.e., the tendency of an individual to answer all questions in the same (positive or negative) direction. In the

cyclic in at least some individuals, and the phase (length) of cycles may vary across individuals. The CES-D was designed to be sensitive to possible depressive reactions to events in a person's life; the timing of these events is unpredictable but presumably aperiodic. There are also methodological complications in test-retest measures. For example, there may be biases due to nonresponse, biasing effects of repeated testing, and asymmetric regression toward the mean due to the very skewed distribution of CES-D scores. Furthermore, in the present data, the test-retest time interval was confounded with differences in style of data collection: all initial scores were based on interviews; the short-interval (weeks) retests were different (i.e., self-administered mail-backs); the long-interval (months) retests were the same (i.e., interviews).

In light of these properties of the variable being measured, we would expect only moderate levels of test-retest correlations in the overall samples. Shorter test-retest time intervals should produce somewhat higher correlations than longer intervals. However, if people were grouped according to what we know happened during the time interval, the correlations should be better differentiated. Specifically, life events are expected to introduce variability (i.e., some individuals may react more than others) and thus lower the test-retest correlations. Tables 4 and 5 show that the results were consistent with these predictions.

Table 4 shows the test-retest correlations for those who responded to the request to fill out and mail in a retest of the CES-D (mail-backs) and those who were reinterviewed (Q3). All respondents were retested only once; each time interval represents a different group of people. The correlations were in the moderate range (all but one were between .45 and .70) and were, on the average, larger for the shorter time intervals.

In Table 5, all Q3 respondents (test-retest time interval ranging from three to twelve months) were classified by whether any one of 14 negative life events had occurred in the year prior to the first interview and in the interval between interviews. Those with no life events at either time had the highest test-retest correlation; those with life events at both times had the lowest correlation. Those with events at one time but not the other had intermediate correlations. The correlation for those with no events (r = .54) might be considered the fairest estimate of test-retest reliability, in the sense of repeatability with conditions replicated, for the three- to twelve-month time interval. In the New Haven patient group, the correlation of CES-D scores at admission with scores obtained after four weeks of treatment was .53 (compared with r = .58 for the SCL-90). In this group, "events" had certainly occurred, but the effect of treatment may be assumed to be in the same direction for all or most patients. Therefore, it is reasonable that the correlation was about the same as that in the "no events" group.

Table 5. Test-retest Correlations by Life Events Losses Before Each Test

Validity

Although not designed for clinical diagnosis, the CES-D scale is based on symptoms of depression as seen in clinical cases. Therefore, it should discriminate strongly between patient and general population groups, be sensitive to levels of severity of depressive symptomatology, and reflect improvements after psychiatric treatment. In addition, it should correlate well with other scales designed to measure depression and less well with scales which measure related but different variables; be related to a felt need for psychiatric services; and be sensitive to possible reactive depression in the face of certain life events.

Clinical Criteria

The CES-D scores discriminated well between psychiatric inpatient and general population samples and discriminated moderately among levels of severity within patient groups. Table 2 shows that the average CES-D score for the group of 70 Washington County psychiatric inpatients was substantially and significantly higher than the average for the general population samples. Seventy percent of the patients but only 21% of the general population scored at or above an arbitrary cutoff score of 16. In the patient group, the correlation between the CES-D scale and ratings of severity of depression by the nurse-clinician was .56 (Craig & Van Natta, in press). In the New Haven patient group, the average CES-D score at admission was 39.11, with no score below 16 (note that this group was screened to include only those above 6 on the Raskin scale). The correlations of the CES-D with the Hamilton Clinicians Rating scale and with the Raskin Rating scale were moderate (.44 to .54) at admission. After four weeks of treatment, the correlations were substantially higher (.69 to .75). These correlations were almost as high as those obtained for the 90-item SCL-90 (Weissman et al., 1975).

Self-report Criteria

Table 6 shows correlations of the CES-D scale with other self-report scales in the several samples. (Note that Q2 and Q3 did not include all scales.) In all the samples, the pattern of correlations of the CES-D with other scales gives reasonable evidence of discriminant validity. The highest rs were with scales designed to measure symptoms of depression (i.e., Lubin, Bradburn Negative Affect and Bradburn Balance) or general psychopathology (Langner) and the Cantril Ladder. The correlation of the CES-D with the Bradburn Positive Affect scale was negative, and correlations were low positive with scales designed to measure different variables (medications, disability days, social functioning, aggression). The CES-D correlated moderately with interviewer ratings of depression but low negative to zero with interviewer ratings of cooperation and understanding of the question.

Table 6 also shows support for the concept of a "syndrome" of depression which is more consistent in the patient sample than in the general population samples. In the patient groups, the correlations with other depression scales were higher positive (in the New Haven patients, correlation with the SCL-90 was .83); with the Bradburn Positive Affect, higher negative; and with other scales, the same low positive.

The low negative correlations with the Marlowe-Crowne scale of "social desirability" suggest that there may be some general response set involved in the CES-D scores (see also Klassen, Hornstra, & Anderson, 1975). However, the pattern of correlations in Table 6 suggests that this bias is small and does not entirely mask meaningful relationships with other variables.

Need for Services

In the Q1 and Q2 surveys, the respondents were asked whether they had had an emotional
validity.
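The reliability statistics discussed above can be sketched with NumPy: coefficient alpha, the Spearman-Brown split-half coefficient (Nunnally, 1967), and test-retest Pearson correlations computed separately within life-events groups in the manner of Table 5. This is an illustrative sketch on made-up data, not the analysis code behind Tables 3 to 5; splitting the scale into odd and even items is one common choice and an assumption here, as are all function names.

```python
import numpy as np


def cronbach_alpha(items) -> float:
    """Coefficient alpha for an (n_respondents, k_items) matrix of item scores."""
    x = np.asarray(items, dtype=float)
    k = x.shape[1]
    sum_item_vars = x.var(axis=0, ddof=1).sum()
    total_var = x.sum(axis=1).var(ddof=1)
    return float(k / (k - 1) * (1.0 - sum_item_vars / total_var))


def spearman_brown_split_half(items) -> float:
    """Correlate odd-item and even-item half scores, then step the
    half-length correlation up to full length (Spearman-Brown)."""
    x = np.asarray(items, dtype=float)
    odd_half = x[:, 0::2].sum(axis=1)   # items 1, 3, 5, ... (ASSUMED split)
    even_half = x[:, 1::2].sum(axis=1)  # items 2, 4, 6, ...
    r_half = np.corrcoef(odd_half, even_half)[0, 1]
    return float(2.0 * r_half / (1.0 + r_half))


def retest_r_by_group(time1, time2, groups) -> dict:
    """Pearson r between time-1 and time-2 totals within each group label
    (e.g., hypothetical labels for life events at neither, one, or both times)."""
    t1 = np.asarray(time1, dtype=float)
    t2 = np.asarray(time2, dtype=float)
    g = np.asarray(groups)
    return {
        str(label): float(np.corrcoef(t1[g == label], t2[g == label])[0, 1])
        for label in np.unique(g)
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    trait = rng.normal(size=500)
    # 20 hypothetical items: one shared trait plus independent noise.
    items = trait[:, None] + rng.normal(size=(500, 20))
    print(round(cronbach_alpha(items), 2), round(spearman_brown_split_half(items), 2))
```

Alpha uses sample variances (ddof=1) for both the item variances and the total-score variance; because the same choice appears in numerator and denominator, it does not change the result.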
Life Events

Past research has shown an association of illness, including mental illness, with certain significant life events (Dohrenwend & Dohrenwend, 1974). Table 8 shows the average CES-D scores for those who do and do not report certain events in the year (or during the retest interval for Q3) preceding the interview. The results were as predicted: the more negative the event, the
(a) Overall significance of difference between groups in change scores: p < .01 in one-way analysis of variance and in one-way analysis of covariance, with score at time 1 as covariate.
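The note above refers to a one-way analysis of variance on change scores. A minimal NumPy sketch of the one-way ANOVA F statistic follows; the function name and example groups are hypothetical, the covariance-adjusted version (time-1 score as covariate) is not shown, and for real analyses scipy.stats.f_oneway returns the same F statistic together with a p value.

```python
import numpy as np


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    Each argument holds one group's change scores (e.g., score at a
    later time minus score at time 1).
    """
    samples = [np.asarray(g, dtype=float) for g in groups]
    k = len(samples)
    n = sum(s.size for s in samples)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more scores than groups")
    grand_mean = np.concatenate(samples).mean()
    # Between-group sum of squares: group size times squared deviation of its mean.
    ss_between = sum(s.size * (s.mean() - grand_mean) ** 2 for s in samples)
    # Within-group sum of squares: deviations from each group's own mean.
    ss_within = sum(((s - s.mean()) ** 2).sum() for s in samples)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return float(f_stat), df_between, df_within


if __name__ == "__main__":
    # Hypothetical change scores for three groups.
    group_a = [-2, 0, 1, -1, 0]
    group_b = [-4, -3, 0, -2, -1]
    group_c = [-6, -5, -3, -4, -7]
    print(one_way_anova_f(group_a, group_b, group_c))
```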
along with the SCL-90, the Hamilton, and the Raskin, decreased significantly from the time of admission to one week and to four weeks of treatment (see Table 10). The mean for each of the 20 items was lower after four weeks of treatment than upon admission (tables available on request). The change was particularly large for

tor loadings is quite consistent across the three groups. Including items with loadings above .40 in all three groups, the four factors are readily interpretable as follows:
I. Depressed affect (blues, depressed, lonely, cry, sad)
II. Positive affect (good, hopeful, happy, enjoy)
Including items with loadings of at least .35 in at least two groups would add the items failure, fearful, happy, and enjoy to the depressed affect factor and the items blues, mind, depressed, and talk to the somatic factor. In all three groups, the depressed affect factor shares the largest proportion of the variance (about 16%) and the interpersonal factor, the smallest proportion (about 8%).

Similarity of factor structure of the three samples was estimated by the Factorial Invariance Coefficient, r_ip (Derogatis, Kallmen, & Davis, 1971; Derogatis, Serio, & Cleary, 1972; Pinneau & Newhouse, 1964). The r_ip is a measure of the correlation of the loadings of all items on one factor in one group versus the loadings on one factor in another group. If the factor structure of two groups is similar, the r_ip will be very high when loadings on the same factor in both groups are correlated (the "diagonal" coefficients) and very low when different factors are correlated (the "off-diagonal" coefficients). Comparing Q1 with Q2 and Q1 with Q3, the diagonal coefficients were very high (.87 to .99). The off-diagonal coefficients (i.e., the similarity of different factors) were very low (the largest was .13).

This is very strong evidence that the CES-D has a similar factor structure in two samples from similar populations (Q1 vs. Q2) and across two tests on essentially the same sample (Q1 vs. Q3). The factors found in the general population are consistent with the components of depression built into the scale. However, the high internal consistency of the scale found in all groups argues against undue emphasis on separate factors. The items are all symptoms related to depression. For epidemiologic research, a simple total score is recommended as an estimate of the degree of depressive symptomatology.

Generalizability Across Subgroups

To be useful for epidemiologic studies (e.g., distribution of depression across demographic subgroups), the CES-D scale must have adequate reliability and validity and a similar factor structure within each subgroup of the population. Therefore, the analyses of Tables 3, 4, 6, 7 and 11 were repeated on each of three age groups (under 25, 25-64, over 64), the two sexes, two races (Black and White), three levels of education (less than high school, high school, greater than high school), and the two "need help" groups ("need help," "not need help"). For these analyses, the data from Kansas City and Washington County Q1 and Q2 were combined to maximize numbers in the subgroups. With few exceptions, the results for the total population were confirmed in all subgroups (tables available on request). In all subgroups, coefficient alpha was .80 or above. Test-retest correlations were moderate (.40 or above) in all but three groups (Blacks, age under 25, and "need help"). The subgroup patterns of correlations with other scales (as in Table 6) and relationships to "need help" (as in Table 7) were very similar to those in the total population. The subgroups did not differ from each other or from the total population in factor structure. The "need help" group (which had been found to be similar to the Washington County patient group by various criteria above) was not like that patient group in factor structure but was very similar to the total general population.

sampling design balanced by interviewer may be appropriate.

On the positive side, the results reported here are very favorable for the uses of the CES-D scale for which it was designed. The scale has high internal consistency, acceptable test-retest stability, excellent concurrent validity by clinical and self-report criteria, and substantial evidence of construct validity. These properties hold across the general population subgroups studied. The scale is suitable for use in Black and White English-speaking American populations of both sexes with a wide range of age and socioeconomic status for the epidemiologic study of the symptoms of depression. A group with a high average score may be interpreted to be "at risk" of depression or in need of treatment. The scale is a valuable tool to identify such high-risk groups and to study the relationships between depressive symptoms and many other variables.
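The factor-similarity comparison described above (diagonal coefficients of .87 to .99, off-diagonal coefficients no larger than .13) can be sketched as a matrix comparing every factor's loadings in one sample with every factor's loadings in another. This sketch follows the text's description of r_ip as a correlation of item loadings across groups; it offers a plain Pearson correlation and, as an alternative, Tucker's uncentered congruence coefficient, and neither is claimed to be the exact Pinneau and Newhouse (1964) formula. The loading matrices and function names are hypothetical.

```python
import numpy as np


def factor_similarity(loadings_a, loadings_b, method="pearson"):
    """Compare every factor in group A with every factor in group B.

    Inputs are (n_items, n_factors) loading matrices with items in the same
    row order. Entry [i, j] compares factor i of A with factor j of B across
    all items. "pearson" centers each loading column first (product-moment r);
    "congruence" leaves columns uncentered (Tucker's coefficient).
    """
    a = np.asarray(loadings_a, dtype=float)
    b = np.asarray(loadings_b, dtype=float)
    if a.shape[0] != b.shape[0]:
        raise ValueError("loading matrices must list the same items")
    if method == "pearson":
        a = a - a.mean(axis=0)
        b = b - b.mean(axis=0)
    elif method != "congruence":
        raise ValueError("method must be 'pearson' or 'congruence'")
    # Cosine between columns: Pearson r if centered, congruence otherwise.
    norms = np.outer(np.linalg.norm(a, axis=0), np.linalg.norm(b, axis=0))
    return (a.T @ b) / norms


def diagonal_and_off_diagonal(sim):
    """Smallest same-factor ("diagonal") value and largest cross-factor value."""
    sim = np.asarray(sim)
    off = ~np.eye(sim.shape[0], sim.shape[1], dtype=bool)
    return float(np.diag(sim).min()), float(sim[off].max())


if __name__ == "__main__":
    # Hypothetical two-factor loadings for six items in two samples.
    q1 = np.array([[.7, .1], [.6, .0], [.8, .1], [.1, .7], [.0, .6], [.1, .8]])
    q2 = np.array([[.65, .05], [.6, .1], [.75, .0], [.05, .7], [.1, .55], [.0, .75]])
    for m in ("pearson", "congruence"):
        print(m, diagonal_and_off_diagonal(factor_similarity(q1, q2, m)))
```

With simple-structure loadings like these, centering makes the Pearson cross-factor values negative (one factor's high loadings line up with another factor's low ones), while the uncentered congruence values stay small and positive; this section does not say which convention the reported coefficients follow.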
Crowne, D. P., & Marlowe, D. A new scale of social desirability independent of psychopathology. Journal of Consulting Psychology, 1960, 24, 349-354.
Derogatis, L. R., Kallmen, C. H., & Davis, D. M. FMATCH: A program to evaluate the degree of equivalence of factors derived from analyses of different samples. Behavioral Science, 1971, 16, 271-273.
Derogatis, L. R., Lipman, R. S., & Covi, L. SCL-90: An outpatient psychiatric scale: Preliminary report. Psychopharmacology Bulletin, 1973, 9, 13-27.
Derogatis, L. R., Serio, J. C., & Cleary, P. A. An empirical comparison of three indices of factorial similarity. Psychological Reports, 1972, 30, 791-804.
Dohrenwend, B. S., & Dohrenwend, B. P. (Eds.) Stressful life events: Their nature and effects. New York: Wiley-Interscience, 1974.
Gardner, E. A. Development of a symptom check list for the measurement of depression in a population. Unpublished, 1968.
Hamilton, M. A rating scale for depression. Journal of Neurologic Neurosurgical Psychiatry, 1960, 23, 56-62.
Handlin, V., Klassen, D., Hornstra, R., & Roth, A. Interviewer effects in a community mental health survey. Technical Report, 1974, The Greater Kansas City Mental Health Foundation, Contract PH 43-66-1324, National Institute of Mental Health.
Klassen, D., Hornstra, R., & Anderson, P. The influence of social desirability on symptom and mood reporting in a community survey. Journal of Consulting & Clinical Psychology, 1975, 43, 448-452.
Klassen, D., & Roth, A. Characteristics of non-respondents in the Community Mental Health Assessment survey. Technical Report, 1974, The Greater Kansas City Mental Health Foundation, Contract PH 43-66-1324, National Institute of Mental Health.
Klein, D. F. Endogenomorphic depression. Archives of General Psychiatry, 1974, 31, 447-454.
Langner, T. S. A twenty-two item screening score of psychiatric symptoms indicating impairment. Journal of Health and Human Behavior, 1962, 3, 269-276.
Lubin, B. Manual for the depression adjective check lists. San Diego: Educational and Industrial Testing Service, 1967.
Mebane, I. On time and late respondents. Technical Report, 1973, Center for Epidemiologic Studies, National Institute of Mental Health.
Nunnally, J. C. Psychometric theory. New York: McGraw-Hill, 1967.
Pinneau, S. R., & Newhouse, A. Measures of invariance and comparability in factor analysis for fixed variables. Psychometrika, 1964, 29, 271-281.
Raskin, A., Schulterbrandt, J., Reatig, N., & McKeon, J. Replication of factors of psychopathology in interview, ward behavior, and self-report ratings of hospitalized depressives. Journal of Nervous and Mental Disease, 1969, 148, 87-96.
Rockliff, B. W. A brief rating scale for anti-depressant drug trials. Comprehensive Psychiatry, 1971, 12, 122-135.
Standards for educational and psychological tests. Washington, D.C.: American Psychological Association, 1974.
Trieman, B. Depressive mood among middle class urban ethnic groups. Technical Report, 1975, Contract HSM 42-73-238, National Institute of Mental Health.
Weissman, M. M., Prusoff, B., & Newberry, P. Comparison of the CES-D with standardized depression rating scales at three points in time. Technical Report, 1975, Yale University, Contract ASH-74-166, National Institute of Mental Health.
Zung, W. W. K. A self-rating depression scale. Archives of General Psychiatry, 1965, 12, 63-70.

Acknowledgements

The CES-D Scale was originally developed by Mr. Ben Z. Locke, Chief, Center for Epidemiologic Studies (CES), National Institute of Mental Health, and Dr. Peter Putnam, formerly at CES. The overall program was initiated by Dr. Robert Markush, former Chief, CES. The field studies were carried out by the Epidemiologic Field Station, Greater Kansas City Mental Health Foundation, Kansas City, Missouri (Dr. R. Hornstra, Director) and The Training Center for Public Health Research, Johns Hopkins University, Hagerstown, Maryland (Dr. George Comstock) under contract with CES. The clinical validation studies were carried out by Dr. Thomas Craig, formerly of Johns Hopkins, and Dr. Myrna Weissman, Yale University, under contract with CES. Important advice on and review of this report was provided by those connected with the study, especially Dr. Thomas Craig and Dr. Evelyn Goldberg, Johns Hopkins; by Dr. Len Derogatis, Johns Hopkins; and by the reviewers and editor of this journal.

Author's Address

Lenore S. Radloff, Room 10C-09, Parklawn Building, 5600 Fishers Lane, Rockville, Maryland 20852.