Scale Development Research
A Content Analysis and Recommendations
for Best Practices
Roger L. Worthington
University of Missouri–Columbia
Tiffany A. Whittaker
University of Texas at Austin
The authors conducted a content analysis of new scale development articles appearing
in the Journal of Counseling Psychology over a 10-year period (1995 to 2004). The authors
analyze and discuss characteristics of the exploratory and confirmatory factor analysis
procedures in these scale development studies with respect to sample characteristics,
factorability, extraction methods, rotation methods, item deletion or retention, factor
retention, and model fit indexes. The authors uncovered a variety of specific practices
that were at variance with the current literature on factor analysis or structural equa-
tion modeling. They make recommendations for best practices in scale development
research in counseling psychology using exploratory and confirmatory factor analysis.
The authors contributed equally to the writing of this article. We would like to thank Jeffrey
Andreas Tan for his assistance with the content analysis. Address correspondence to Roger L.
Worthington, Department of Educational, School, and Counseling Psychology, University of
Missouri, Columbia, MO 65211; e-mail: WorthingtonR@missouri.edu
THE COUNSELING PSYCHOLOGIST, Vol. 34 No. 6, November 2006 806-838
DOI: 10.1177/0011000006288127
© 2006 by the Society of Counseling Psychology
CONTENT-ANALYSIS PROCEDURE
We based our selection of articles on two central criteria: We included (a) only
new scale development research articles (i.e., we excluded articles investi-
gating only the reliability, validity, or revisions of existing scales) and (b)
only articles that reported results from EFA and CFA. A paid graduate stu-
dent assistant reviewed the tables of contents for each issue of JCP pub-
lished during the specified time frame. We instructed the graduate student
to err on the side of being overly inclusive, which resulted in the identifi-
cation of 38 articles that used EFA and CFA to examine the psychometric
properties of measurement instruments. The first author reviewed these
articles and eliminated 15 that did not meet the selection criteria, resulting
in 23 articles for our sample. Next, the first author and second author inde-
pendently evaluated the 23 articles to identify and quantify the EFA and
CFA characteristics. The only discrepancies in the independent evaluations
of the articles were due to clerical errors in recording descriptive information (as opposed to disagreement in classification), which we jointly
checked and verified.
We were interested in a number of characteristics of the studies. For stud-
ies reporting EFA procedures, we were interested in the following: (a) sample
characteristics, (b) criteria for assessing the factorability of the correlation
matrix, (c) extraction methods, (d) criteria for determining rotation method,
(e) rotation methods, (f) criteria for factor retention, (g) criteria for item dele-
tion, and (h) purposes and criteria for optimizing scale length (see Table 1).
For studies reporting CFA procedures, we were interested in the follow-
ing: (a) using SEM versus alternative methods as a confirmatory approach,
(b) sample-size criteria, (c) fit indexes, (d) fit-index criteria, (e) cross-validation
indexes, and (f) model-modification issues (see Table 2).
TABLE 1
Characteristic                                                    Frequency
Sample characteristics
  Convenience sample                                                      5
  Purposeful sample of target group                                      10
  Convenience and purposeful sampling                                     6
Criteria used to assess factorability of correlation matrix
  Absolute sample size                                                    1
  Item intercorrelations                                                  1
  Participants-per-item ratio                                             3
  Bartlett's test of sphericity                                           5
  Kaiser-Meyer-Olkin test of sampling adequacy                            7
  Unspecified                                                            11
Extraction method
  Principal-components analysis                                           9
  Common-factors analysis
    Principal-axis factoring                                              6
    Maximum likelihood                                                    3
    Unspecified                                                           1
  Combination of principal-components and common-factors analysis         1
  Unspecified                                                             1
Criteria for determining rotation method
  Subscale intercorrelations                                              2
  Theory                                                                  3
  Both                                                                    1
  Other                                                                   3
  Unspecified                                                            12
Rotation method
  Orthogonal
    Varimax                                                               8
    Unspecified                                                           1
  Oblique
    Promax                                                                1
    Oblimin                                                               3
    Unspecified                                                           4
  Both orthogonal and oblique                                             3
  Unspecified                                                             1
Criteria for item deletion or retention
  Loadings                                                               16
  Cross-loadings                                                         13
  Communalities                                                           0
  Item analysis                                                           1
  Other                                                                   3
  Unspecified                                                             2
  No items were deleted                                                   2
NOTE: Values in each category may not sum to the total number of studies because some studies reported more than one criterion or approach.
the basis for item selection based on (a) predictive utility for a criterion
group (e.g., depressives) or (b) homogeneous item groupings. The method
described in this article is an empirical approach that employs factor analy-
sis to form homogeneous item groupings.
A number of authors have recommended similar sequences of steps to
be taken prior to using factor-analytic techniques (e.g., Anastasi, 1988;
Dawis, 1987; DeVellis, 2003). We review these preliminary steps in the fol-
lowing section because, as is the case in most scientific endeavors, early
mistakes in scale development often lead to problems later in the process.
Once we have described all the steps in some detail, we address the extent
to which the studies in our content analysis incorporated the steps in their
designs.
Although there is little variation between models proposed by different
authors, we rely primarily on DeVellis (2003) as the most current resource.
Thus, the following description is only one of several similar models available
and does not reflect a unitary best practice. DeVellis (2003) recommends the
TABLE 2 (continued)
Model modification
  Lagrange multiplier                                                     3
  Wald statistic                                                          0
  Item parceling                                                          2
NOTE: Values in each category may not sum to equal the total number of studies because
some studies may have reported more than one criterion or approach. AGFI = Adjusted
Goodness-of-Fit Index; AIC = Akaike’s Information Criterion; BIC = Bayesian Information
Criterion; CAIC = Consistent Akaike’s Information Criterion; CFI = Comparative Fit Index;
ECVI = Expected Cross-Validation Index; FA = Common-Factors Analysis; GFI = Goodness-
of-Fit Index; IFI = Incremental Fit Index; NFI = Normed Fit Index; NNFI/TLI = Nonnormed
Fit Index or Tucker-Lewis Index; PCFI = Parsimony Comparative Fit Index; RMR = Root
Mean-Square Residual; RMSEA = Root Mean-Square Error of Approximation; RNI =
Relative Noncentrality Index; SEM = Structural Equation Modeling; SRMR = Standardized
Root Mean-Square Residual.
Researchers should not take the quality of the item pool lightly; a carefully planned approach to
item generation is a critical beginning to scale development research.
Having the items reviewed by one or more groups of knowledgeable
people (experts) to assess item quality on a number of different dimensions
is another critical step in the process. At a minimum, expert review should
involve an analysis of content validity (e.g., the extent to which a set of items
reflects the content domain). Experts can also evaluate items for clarity,
conciseness, grammar, reading level, face validity, and redundancy. Finally,
it is also helpful at this stage for experts to offer suggestions about adding new
items and about the length of administration.
Although it is possible to include additional scales for participants to
complete that may provide information about convergent and discriminant
validity, we recommend that researchers limit such efforts at this stage of
development. We recommend this for two reasons. First, it is wise to keep
the total questionnaire length as short as possible and directly related to the
study’s central purpose. The longer the questionnaire, the less likely poten-
tial participants will be to volunteer for the study or to complete all the items
(Converse & Presser, 1986). Scale development studies sometimes include
as many as 3 to 4 times the number of items that will eventually end up on
the instrument, making inclusion of additional scales prohibitive. Second,
there are several ways that items from other measures may interact with
items designed for the new instrument to affect participant responses and,
thus, to interfere in the scale development process. In particular, it would be
very difficult, if not impossible, to control for order effects of different mea-
sures while testing the initial factor structure for the new scale. Randomly
administering existing measures with the other instruments might contami-
nate participants’ responses on the items for the new scale, but administer-
ing the new items first to avoid contamination eliminates an important
procedure commonly used when researchers use multiple self-report scales
concurrently within a single study. Thus, we believe that it is important to
avoid influencing item responses during the initial phase of scale develop-
ment by limiting the use of additional measures. Although ultimately a mat-
ter of researcher judgment, assessing the convergent and discriminant
validity (e.g., correlation with other measures) is an important step that we
believe should occur later in the process of scale development.
Of the 23 studies in our content analysis, 14 reported a construct or scale
definition that guided item generation, and all but 2 studies indicated that
item generation was based on prior theoretical and empirical literature
in the field. Occasionally, however, we found that articles provided only
sparse details in the introductory material articulating the theoretical
foundations for the research. The studies in our review used various item-
generation approaches. All the approaches involved some form of rational
EFA
Peacock, & Jackson, 1982), there are several conditions under which FA
has been shown to be superior to PCA (Gorsuch, 1990; Tucker, Koopman,
& Linn, 1969; Widaman, 1993). Finally, compared with PCA, the outcomes
of FA should more effectively generalize to CFA (Floyd & Widaman,
1995). Thus, although there may be other appropriate uses for PCA, we
recommend FA for the development of new scales.
An example of the use of FA versus PCA in a simulated data set might
illustrate the differences between these two approaches. Imagine that a
researcher at a public university is interested in measuring campus climate for
diversity. The researcher created 12 items to measure three different aspects of
campus climate (each using 4 items): (a) general comfort or safety, (b) open-
ness to diversity, and (c) perceptions of the learning environment. In a sample
of 500 respondents, correlations among the 12 variables indicated that one
item from each subset did not correlate with any other items on the scale (e.g.,
no higher than r = .12 for any bivariate pair containing these items). In FA, the
three uncorrelated items appropriately drop out of the solution because of low
factor loadings (loadings < .23), resulting in a three-factor solution (each fac-
tor retaining 3 items). In PCA, the three uncorrelated items load together on a
fourth factor (loadings > .45). This example demonstrates that under certain
conditions, PCA may overestimate factor loadings and result in erroneous
decisions about the number of factors or items to retain.
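The simulated comparison above is easy to reproduce. The following sketch (illustrative Python using scikit-learn's maximum-likelihood FactorAnalysis, with made-up data mirroring the 12-item campus-climate example; all names and values are hypothetical) shows how the three uncorrelated items earn near-zero communalities under FA and would therefore drop out rather than form a spurious fourth factor:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n = 500

# Three latent "campus climate" factors, each measured by 3 coherent items
# plus 1 item that is pure noise (uncorrelated with everything else).
latent = rng.normal(size=(n, 3))
X = np.empty((n, 12))
for f in range(3):
    for j in range(3):
        X[:, f * 4 + j] = 0.8 * latent[:, f] + 0.6 * rng.normal(size=n)
    X[:, f * 4 + 3] = rng.normal(size=n)  # the uncorrelated "dud" item
X = (X - X.mean(axis=0)) / X.std(axis=0)

fa = FactorAnalysis(n_components=3).fit(X)

# Communality = proportion of each item's variance explained by the factors.
# The dud items (columns 3, 7, 11) should show near-zero communalities in FA,
# so they would be deleted on loading criteria instead of loading together.
h2 = (fa.components_ ** 2).sum(axis=0)
print(h2.round(2))
```

Because FA models only the common variance, the noise items cannot borrow unique variance the way they do in PCA, which is exactly the behavior the example describes.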
We should also make clear that there are several techniques of FA,
including principal-axis factoring, maximum likelihood, image factoring,
alpha factoring, and unweighted and generalized least squares. Gerbing and
Hamilton (1996) have shown that principal-axis factoring and maximum-
likelihood approaches are relatively equal in their capacities to extract the
correct model when the model is known in the population. However,
Gorsuch (1997) points out that maximum-likelihood extractions result in
occasional problems that do not occur with principal-axis factoring. Prior to
the current use of SEM as a CFA technique, maximum-likelihood extraction
had some advantages over other FA procedures as a confirmatory technique
(Tabachnick & Fidell, 2001). For further discussion of less commonly used
approaches, see Tabachnick and Fidell (2001).
Among the studies in our content analysis, most used some form of FA
(n = 10), but a similar number used PCA (n = 9). One study used a combi-
nation of PCA and FA, and another did not report an extraction method.
(Note: 2 of the 23 studies used only CFA and are not included in the figures
reported earlier.) A cursory examination of the publication dates indicates
that the majority of studies using PCA were published prior to the majority
of those using FA, suggesting a trend away from PCA in favor of FA.
Criteria for determining rotation method. FA rotation methods include
two basic types: orthogonal and oblique. Researchers use orthogonal
rotations when the set of factors underlying a given item set are assumed or
known to be uncorrelated. Researchers use oblique rotations when the fac-
tors are assumed or known to be correlated. A discussion of the statistical
properties of the various types of orthogonal and oblique rotation methods
is beyond the scope of this article (we refer readers to Gorsuch [1983] and
Thompson [2004] for such discussions). In practice, researchers can deter-
mine whether to use an orthogonal versus oblique rotation during the initial
FA based on either theory or data. However, if they discover that the factors
appear to be correlated in the data when theory has suggested them to be
uncorrelated, it is still most appropriate to rely on the data-based approach
and to use an oblique rotation. Although, in some cases, both procedures
might produce the same factor structure with the same data, using an
orthogonal rotation with correlated factors tends to overestimate loadings
(e.g., they will have higher values than with an oblique rotation; Loehlin,
1998). Thus, researchers may retain or reject some items inappropriately,
and the factor structure may be more difficult to replicate during CFA.
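For readers curious about what an orthogonal rotation does mechanically, here is a minimal NumPy sketch of varimax (an illustrative SVD-based implementation, not the exact routine of any statistical package; the loading matrix is hypothetical). A useful property to verify is that rotation redistributes loadings across factors while leaving each item's communality unchanged:

```python
import numpy as np

def varimax(L, max_iter=100, tol=1e-6):
    """SVD-based varimax rotation of a loading matrix L (items x factors).
    Illustrative implementation of Kaiser's varimax criterion."""
    p, k = L.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        Lr = L @ R
        u, s, vt = np.linalg.svd(
            L.T @ (Lr ** 3 - (1.0 / p) * Lr @ np.diag((Lr ** 2).sum(axis=0)))
        )
        R = u @ vt  # best orthogonal rotation for the current criterion
        d_new = s.sum()
        if d_new < d * (1 + tol):
            break
        d = d_new
    return L @ R

# Hypothetical unrotated loadings for six items on two factors
L0 = np.array([[0.70, 0.50], [0.80, 0.40], [0.60, 0.50],
               [-0.50, 0.60], [-0.40, 0.70], [-0.50, 0.80]])
Lr = varimax(L0)
# Row sums of squared loadings (communalities) are invariant under rotation.
print(np.round(Lr, 2))
```

Oblique rotations (e.g., promax, oblimin) relax the orthogonality of R, which is why they can represent correlated factors without inflating loadings.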
Our content analysis showed that relatively few of the studies in our
review reported an adequate rationale for selecting an orthogonal or oblique
rotation method, with only 2 using subscale intercorrelations, 3 using theory,
and 1 using both. Twelve studies did not specify the criteria used to select
a rotation method, and 3 studies actually reported criteria irrelevant to the
task (e.g., although the factors were correlated, the orthogonal solution
matched the prior expectations for the factor solution). Also, 8 studies used
orthogonal rotations despite reporting moderate to high correlations among
factors, and 4 studies did not provide factor intercorrelations.
Criteria for factor retention. Researchers can use numerous criteria to
estimate the number of factors for a given item set. The most widely
known approaches were recommended by Kaiser (1958) and Cattell
(1966) on the basis of eigenvalues, which may help determine the impor-
tance of a factor and indicate the amount of variance in the entire set of
items accounted for by a given factor (for a more detailed explanation of
eigenvalues, see Gorsuch, 1983). The iterative process of factor analysis
produces successively less useful information with each new factor
extracted in a set because each factor extracted after the first is based on
the residual of the previous factor’s extraction. The eigenvalues produced
will be successively smaller with each new factor extracted (accounting
for smaller and smaller proportions of variance) until virtually meaning-
less values result. Thus, Kaiser (1958) believed that eigenvalues less than
1.0 reflect potentially unstable factors. Cattell (1966) used the relative val-
ues of eigenvalues to estimate the correct number of factors to examine
during factor analysis—a procedure known as the scree test. Using the
scree plot, a researcher examines the descending values of eigenvalues to
locate a break in the size of eigenvalues, after which the remaining values
tend to level off horizontally.
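The Kaiser criterion described above amounts to counting eigenvalues of the item correlation matrix that exceed 1.0; the sorted eigenvalues are also exactly what one plots for Cattell's scree test. A small sketch on simulated data (hypothetical two-factor structure; any real analysis would use the observed item matrix):

```python
import numpy as np

# Simulate 300 respondents on 8 items driven by 2 latent factors.
rng = np.random.default_rng(1)
latent = rng.normal(size=(300, 2))
items = 0.8 * np.repeat(latent, 4, axis=1) + 0.6 * rng.normal(size=(300, 8))

# Sorted eigenvalues of the correlation matrix (the scree-plot values).
eigenvalues = np.sort(np.linalg.eigvalsh(np.corrcoef(items, rowvar=False)))[::-1]

# Kaiser criterion: retain factors whose eigenvalues exceed 1.0.
kaiser_k = int((eigenvalues > 1.0).sum())
print(eigenvalues.round(2), kaiser_k)
```

Here two large eigenvalues stand apart from a flat tail of small ones, so the Kaiser rule and the scree break agree on two factors.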
Parallel analysis (Horn, 1965) is another procedure for deciding how
many factors to retain. Generally, when using parallel analysis, researchers
randomly order the participants’ item scores and conduct a factor analysis on
both the original data set and the randomly ordered scores. Researchers
determine the number of factors to retain by comparing the eigenvalues
determined in the original data set and in the randomly ordered data set.
They retain a factor if the original eigenvalue is larger than the eigenvalue
from the random data. This has been shown to work reasonably well when
using FA (Humphreys & Montanelli, 1975) as well as PCA (Zwick &
Velicer, 1986). Parallel analysis is not readily available in commonly used
statistical software, but programs are available that conduct parallel analysis
when using principal-axis factor analysis and PCA (see O’Connor, 2000).
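The permutation-and-compare procedure described above can be sketched directly in NumPy. This is a simplified, PCA-eigenvalue variant of parallel analysis (FA-based variants compare eigenvalues of the reduced correlation matrix instead); the data and function name are hypothetical:

```python
import numpy as np

def parallel_analysis(X, n_sims=100, seed=0):
    """Horn-style parallel analysis sketch using correlation-matrix eigenvalues."""
    rng = np.random.default_rng(seed)
    real = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
    sims = np.empty((n_sims, X.shape[1]))
    for i in range(n_sims):
        # Permute each column independently, destroying inter-item correlation
        # while preserving each item's marginal distribution.
        Xp = np.column_stack([rng.permutation(col) for col in X.T])
        sims[i] = np.sort(np.linalg.eigvalsh(np.corrcoef(Xp, rowvar=False)))[::-1]
    threshold = sims.mean(axis=0)  # some analysts use the 95th percentile
    keep = 0
    for real_ev, random_ev in zip(real, threshold):
        if real_ev > random_ev:
            keep += 1
        else:
            break
    return keep

# Hypothetical data with a known two-factor structure
rng = np.random.default_rng(2)
latent = rng.normal(size=(400, 2))
X = 0.7 * np.repeat(latent, 3, axis=1) + 0.7 * rng.normal(size=(400, 6))
print(parallel_analysis(X))
```

A factor is retained only while its observed eigenvalue exceeds what random data of the same dimensions would produce.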
Approximating simple structure is another way to evaluate factor reten-
tion during EFA. According to McDonald (1985), the term simple structure
has two radically different meanings that are often confused. A factor pat-
tern has simple structure (a) if several items load strongly on only one fac-
tor and (b) if items have a zero correlation to other factors in the solution.
SEM constrains the relationships between items and factors to produce
simple structure as defined earlier (which will become important later).
McDonald (1985) differentiates this from what he prefers to call approxi-
mate simple structure, often reported in counseling psychology research as
if it were simple structure, which substitutes the word small (undefined) for
the word zero (definitive) in the primary definition. Researchers can esti-
mate approximate simple structure by using rotation methods during FA. In
EFA, efforts to produce factor solutions with approximate simple structure
are central to decisions about the final number of factors and about the
retention and deletion of items in a given solution. If factors share items
that cross-load too highly on more than one factor (e.g., > .32), the items
are considered complex because they reflect the influence of more than one
factor. Approximating simple structure can be achieved through item or fac-
tor deletion or both. SEM approaches to CFA assume simple structure, and
very closely approximating simple structure during EFA will likely
improve the subsequent results of CFA using SEM.
The larger the number of items on a factor, the more confidence one has that
it will be a reliable factor in future studies. Thus, with a few minor caveats,
some authors have recommended against retaining factors with fewer than
three items (Tabachnick & Fidell, 2001). It is possible to retain a factor with
only two items if the items are highly correlated (i.e., r > .70) and relatively
uncorrelated with other variables. Under these conditions, it may be appropriate
to consider other criteria (e.g., interpretability) in deciding whether to retain
the factor. Retaining items that fail to contribute meaningfully to any of the
potential factor solutions will make it more difficult to make a final decision about the number
of factors to retain. Thus, the process we recommend is designed to retain
potentially meaningful items early in the process and to optimize scale
length only after the factor solution is clear.
Most researchers begin EFA with a substantially larger number of items
than they ultimately plan to retain. However, there is considerable variation
among studies in the proportion of items in the initial pool that are planned
for deletion. We recommend that researchers wait until the last step in EFA
to trim unnecessary items and focus primarily on empirical scale develop-
ment procedures at this stage in the process so as not to confuse the purposes
of these two similar activities (e.g., item deletion). Thus, researchers should
base decisions about whether to retain or delete items at this stage on their
contribution to the factor solution rather than on the final length of the scale.
Most researchers use some guideline for a lower limit on item factor
loadings and cross-loadings to determine whether to retain or delete items,
but the criteria for determining the magnitude of loadings and cross-loadings
have been described as a matter of researcher preference (Tabachnick &
Fidell, 2001). Larger, more frequent cross-loadings will contribute to factor
intercorrelations (requiring oblique rotation) and lesser approximations of
simple structure (described earlier). Thus, to the degree possible, researchers
should attempt to set their minimum values for factor loadings as high as
possible and the absolute magnitude for cross-loadings as low as possible
(without compromising scale length or factor structure), which will result
in fewer cross-loadings of lower magnitudes and better approximations of
simple structure. For example, researchers might delete items with factor
loadings below .32, or items whose cross-loadings fall within .15 of the
item's highest factor loading. In addition, they should delete items with
absolute loadings higher than a certain value (e.g., .32) on two or
more factors. However, we urge researchers to use caution when using
cross-loadings as a criterion for item deletion until establishing the final
factor solution because an item with a relatively high cross-loading could
be retained if the factor on which it is cross-loaded is deleted or collapsed
into another existing factor.
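The loading and cross-loading guidelines above translate into a simple screening routine. The following sketch (the function name and loading matrix are hypothetical; the .32 and .15 thresholds are the values discussed in the text) flags candidate items from a rotated pattern matrix:

```python
import numpy as np

def flag_items(loadings, min_loading=0.32, min_gap=0.15):
    """Flag items for possible deletion from a rotated pattern matrix
    (items x factors): highest loading below min_loading, or a
    cross-loading within min_gap of the item's highest loading."""
    flags = []
    for i, row in enumerate(np.abs(np.asarray(loadings))):
        ordered = np.sort(row)[::-1]  # loadings in descending magnitude
        if ordered[0] < min_loading:
            flags.append((i, "low loading"))
        elif ordered.size > 1 and ordered[0] - ordered[1] < min_gap:
            flags.append((i, "cross-loading"))
    return flags

# Hypothetical rotated loadings for four items on two factors
L = np.array([[0.72, 0.10],
              [0.65, 0.05],
              [0.25, 0.18],   # highest loading below .32
              [0.48, 0.40]])  # cross-loading within .15 of highest
print(flag_items(L))
```

Consistent with the caution above, cross-loading flags are best treated as provisional until the final factor solution is settled.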
Item communalities after rotation can be a useful guide for item deletion as
well. Remember that high item communalities are important for determining
the factorability of a data set, but they can also be useful in evaluating specific
items for deletion or retention because a communality reflects the proportion of
item variance accounted for by the factors; it is the squared multiple correlation
of the item as predicted from the set of factors in the solution (Tabachnick &
Fidell, 2001). Thus, items with low communalities (e.g., less than .40) are not
highly correlated with one or more of the factors in the solution.
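With orthogonal factors, an item's communality is simply the sum of its squared loadings across factors, so the .40 guideline above is a one-line computation (the loading matrix here is hypothetical):

```python
import numpy as np

# Hypothetical rotated loading matrix (items x factors).
L = np.array([[0.75, 0.10],
              [0.68, 0.12],
              [0.15, 0.20],   # weak item: low loadings on every factor
              [0.08, 0.80]])

# Communality = sum of squared loadings per item (orthogonal factors).
h2 = (L ** 2).sum(axis=1)
low_items = np.flatnonzero(h2 < 0.40)  # candidates for deletion
print(h2.round(2), low_items)
```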
In our content analysis, the most common criteria for item-deletion deci-
sions were absolute values of item loadings and cross-loadings, which were
often used in combination. None of the studies we reviewed reported using
item communalities as a criterion for deletion, and one study used item-
analysis procedures (e.g., contribution to internal consistency reliability).
There were no items deleted in two studies, and two others did not specify
the criteria for item deletion.
Optimizing scale length. Once the items have been evaluated, it is useful to
assess the trade-off between length and reliability to optimize scale length.
Longer scales of relatively highly correlated items are generally more reliable,
but Converse and Presser (1986) recommended that questionnaires take no
longer than 50 minutes to complete. In our experience, scales that take longer
than about 15 to 30 minutes might become problematic, depending on the
respondents, the intended use of the scale, and the respondents’ motivation
regarding the purpose of the administration. Thus, scale developers may find
it useful to examine the length of each subscale to determine whether it is a
reasonable trade-off to sacrifice a small degree of internal consistency to
shorten its length. Some statistical packages (e.g., SPSS) allow researchers to
compare all the items on a given subscale to identify those that contribute the
least to internal consistency, making item deletion with the goal of optimizing
scale length relatively easy. Generally, when a factor contains more than the
desired number of items, the researcher will have the option of deleting items
that (a) have the lowest factor loadings, (b) have the highest cross-loadings,
(c) contribute the least to the internal consistency of the scale scores, and
(d) have low conceptual consistency with other items on the factor. The
researcher should avoid scale-length optimization that degrades the quality of
the factor structure, factor intercorrelations, item communalities, factor load-
ings, or cross-loadings. Ultimately, researchers must conduct a final EFA to
ensure that the factor solution does not change after deleting items.
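The internal-consistency trade-off described above is usually assessed with the "alpha if item deleted" statistic that packages such as SPSS report. A self-contained sketch (hypothetical simulated data; function names are illustrative):

```python
import numpy as np

def cronbach_alpha(X):
    """Cronbach's alpha for an items matrix (respondents x items)."""
    k = X.shape[1]
    return k / (k - 1) * (1 - X.var(axis=0, ddof=1).sum()
                          / X.sum(axis=1).var(ddof=1))

def alpha_if_deleted(X):
    """Alpha recomputed with each item removed in turn."""
    return [cronbach_alpha(np.delete(X, j, axis=1)) for j in range(X.shape[1])]

# Hypothetical 5-item scale: four coherent items plus one weak item.
rng = np.random.default_rng(3)
latent = rng.normal(size=(200, 1))
X = 0.8 * latent + 0.6 * rng.normal(size=(200, 5))
X[:, 4] = rng.normal(size=200)  # the weak item

base = cronbach_alpha(X)
drops = alpha_if_deleted(X)
print(round(base, 2), [round(a, 2) for a in drops])
```

Deleting the weak item raises alpha above the full-scale value, whereas deleting any coherent item lowers it, which is exactly the signal used to trim scale length.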
CFA
SEM versus FA. SEM has become a widely used tool in explaining theo-
retical models within the social and behavioral sciences (see Martens, 2005;
Martens & Haase, 2006; Quintana & Maxwell, 1999; Weston & Gore, 2006).
CFA is one of the most popular uses of SEM and is most commonly employed
during the scale development process to help support the validity of a scale
following an EFA. In the past, a number of published studies have used FA or
PCA procedures as confirmatory approaches (Gerbing & Hamilton, 1996).
With the increasing availability of computer software, however, most
researchers use SEM as the preferred approach for CFA.
data. In this case, researchers may use fit indices to select among compet-
ing models. It is becoming more and more common to compare nonnested
models using predictive fit indices (discussed further on), which indicate
how well a model will cross-validate in future samples.
Some competing models may be equivalent models: although their parameter
configurations appear different, they are mathematically equivalent and
yield the same chi-square test statistics and goodness-of-fit indices
(MacCallum, Wegener, Uchino, & Fabrigar, 1993). Thus, theory should play the strongest
role in selecting the appropriate model when comparing equivalent models.
Another SEM approach that may support the construct validity of a scale
is called multiple-group analysis. In multiple-group analysis, the same
structural equation model may be applied to the data for two or more dis-
tinct groups (e.g., male and female) to simultaneously test for invariance
(model equivalency) across the two groups by constraining different sets of
model parameters to be equal in both groups (for more on conducting
multiple-group analysis, see Bentler, 1995; Bollen, 1989; Byrne, 2001).
Of the 10 studies in the content analysis using a confirmatory SEM
approach, 2 of them used the single-model approach wherein the model
produced by the EFA was specified in a CFA, and 8 of the studies per-
formed model comparisons. Of these 8 studies, 4 evaluated nested models,
but only 3 of the 4 used the chi-square difference test when selecting among
the nested models. All 4 of the studies used fit indices to select among
nonnested competing models. Of the 4 studies comparing alternative
nonnested models, 2 used predictive fit indices when selecting among the
set of competing models. Researchers compared equivalent and nonequiv-
alent models in 2 of the studies in the content analysis. One of these stud-
ies selected a nonequivalent model over 2 equivalent models based on
higher values of the fit indices. In the second study, the authors relied on
theory when selecting among 2 equivalent models.
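The chi-square difference test used for the nested comparisons above is a likelihood-ratio test: the difference in chi-square statistics is itself chi-square distributed with degrees of freedom equal to the difference in model degrees of freedom. A sketch with hypothetical fit statistics:

```python
from scipy.stats import chi2

def chi_square_difference(chi2_restricted, df_restricted, chi2_full, df_full):
    """Chi-square difference (likelihood-ratio) test for nested SEM models.
    The more restricted model has the larger chi-square and df."""
    d_chi2 = chi2_restricted - chi2_full
    d_df = df_restricted - df_full
    return d_chi2, d_df, chi2.sf(d_chi2, d_df)  # sf = upper-tail p value

# Hypothetical fit statistics: one-factor (restricted) vs. three-factor model
d_chi2, d_df, p = chi_square_difference(312.4, 54, 98.7, 51)
print(round(d_chi2, 1), d_df, p)
```

A significant result here favors the less restricted model; a nonsignificant result favors the more parsimonious one.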
Sample-size considerations. The statistical theory underlying SEM is
asymptotic, which assumes that large sample sizes are necessary to provide
stable parameter estimates (Bentler, 1995). Thus, some researchers have
suggested that SEM analyses should not be performed on sample sizes
smaller than 200, whereas others recommend minimum sample sizes
between 100 and 200 participants (Kline, 2005). Another recommendation
is that there should be between 5 and 10 participants per observed variable
(Grimm & Yarnold, 1995); yet another guideline is that there should be
between 5 and 10 participants per parameter to be estimated (Bentler &
Chou, 1987). The findings are mixed in terms of which criterion is best
because it depends on various model characteristics, including the number
of indicator variables per factor (Marsh, Hau, Balla, & Grayson, 1998),
they used particular criteria to evaluate the adequacy of the sample size to
conduct SEM. However, we assessed the sample sizes for all the studies
included in the content analysis and determined that the remaining studies
met the 5:1 ratio of participants to parameters.
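The participants-per-parameter heuristic applied above is straightforward to check once the free parameters are counted. The sketch below assumes a simple independent-clusters CFA (one loading and one error variance per item, factor variances fixed to 1, factors free to covary); other parameterizations count differently, and all numbers are hypothetical:

```python
def cfa_free_parameters(n_items, n_factors):
    """Free parameters in a simple independent-clusters CFA."""
    loadings = n_items                                  # one per item
    error_variances = n_items                           # one per item
    factor_covariances = n_factors * (n_factors - 1) // 2
    return loadings + error_variances + factor_covariances

# Hypothetical study: 12 items, 3 factors, N = 450
params = cfa_free_parameters(12, 3)
ratio = 450 / params
print(params, round(ratio, 1))
```

With 27 free parameters, N = 450 comfortably exceeds the 5:1 participants-to-parameters guideline.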
Overall model fit. Researchers typically use a chi-square test statistic
as a test of overall model fit in SEM. The chi-square test, however, is
often criticized for its sensitivity to sample size (Bentler & Bonett, 1980;
Hu & Bentler, 1999). The sample-size dependency of the chi-square test
statistic has led to the proposal of numerous alternative fit indices that
evaluate model fit, supplementing the chi-square test statistic. These fit
indices may be classified as incremental, absolute, or predictive fit indices
(Kline, 2005).
Incremental fit indices measure the improvement in a model’s fit to the
data by comparing a specific structural equation model to a baseline struc-
tural equation model. The typical baseline comparison model is the null (or
independence) model in which all the variables are independent of each
other or uncorrelated (Bentler & Bonett, 1980). Absolute fit indices measure
how well a structural equation model explains the relationships found
in the sample data. Predictive fit indices (or information criteria) measure
how well the structural equation model would fit in other samples from the
same population (see Table 3 for examples of incremental, absolute, and
predictive fit indices).
We should note that there are various recommendations about reporting
these indices as well as suggested cutoff values for each of these fit indices
(e.g., see Hu & Bentler, 1999; Kline, 2005). Researchers have commonly
interpreted incremental fit index, goodness-of-fit index, adjusted goodness-
of-fit index, and McDonald’s Fit Index (MFI) values greater than .90 as an
acceptable cutoff (Bentler & Bonett, 1980). More recently, however, SEM
researchers have advocated .95 as a more desirable level (e.g., Hu &
Bentler, 1999). Values for the standardized root mean square residual
(SRMR) less than .10 are generally indicative of acceptable model fit.
Values for the root mean square error of approximation (RMSEA) at or less
than .05 indicate close model fit, which is customarily considered accept-
able. However, debate continues concerning the use of these indices and the
cutoff values when fitting structural equation models (e.g., see Marsh, Hau,
& Wen, 2004). One reason for this debate is that the findings are mixed in
terms of which index is best, and their performance depends on various
study characteristics, including the number of variables (Kenny & McCoach,
2003), estimation method (Fan et al., 1999; Hu & Bentler, 1998), model
misspecification (Hu & Bentler, 1999), and sample size (Marsh, Balla, &
Hau, 1996). Researchers should bear in mind that suggested cutoff criteria
are general guidelines and are not necessarily definitive rules.
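Two of the indices discussed above are simple functions of the chi-square statistics, so the cutoffs can be checked by hand. The sketch below uses the standard Steiger-Lind RMSEA formula and Bentler's CFI relative to the independence model; the fit statistics are hypothetical:

```python
import math

def rmsea(chi2_stat, df, n):
    """Root mean-square error of approximation; <= .05 suggests close fit."""
    return math.sqrt(max((chi2_stat - df) / (df * (n - 1)), 0.0))

def cfi(chi2_model, df_model, chi2_null, df_null):
    """Comparative Fit Index relative to the independence (null) model."""
    d_model = max(chi2_model - df_model, 0.0)  # noncentrality of the model
    d_null = max(chi2_null - df_null, d_model)
    return 1.0 - d_model / d_null if d_null > 0 else 1.0

# Hypothetical fit statistics: hypothesized model vs. null model, N = 400
print(round(rmsea(110.2, 51, 400), 3), round(cfi(110.2, 51, 1450.0, 66), 3))
```

Under these hypothetical values the model would meet both the .05 RMSEA and .95 CFI guidelines, though, as the text cautions, such cutoffs are guidelines rather than definitive rules.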
to covary. Neither of the two studies that modified the original structural
equation model cross-validated the respecified model in a separate sample.
Researchers in two of the studies in the content analysis used item parceling
to avoid estimating a large number of parameters and to reduce random
error, an approach we do not recommend.
CONCLUSIONS
APPENDIX
Journal of Counseling Psychology
Scale Development Articles
Reference List (1995 to 2004)
Barber, J. P., Foltz, C., & Weinryb, R. M. (1998). The central relationship questionnaire: Initial
report. Journal of Counseling Psychology, 45, 131-142.
Dillon, F. R., & Worthington, R. L. (2003). The Lesbian, Gay, and Bisexual Affirmative
Counseling Self-Efficacy Inventory (LGB-CSI): Development, validation, and training
implications. Journal of Counseling Psychology, 50, 235-251.
Heppner, P. P., Cooper, C., Mulholland, A., & Wei, M. (2001). A brief, multidimensional, problem-
solving psychotherapy outcome measure. Journal of Counseling Psychology, 48, 330-343.
Hill, C. E., & Kellems, I. S. (2002). Development and use of the helping skills measure to
assess client perceptions of the effects of training and of helping skills in sessions. Journal
of Counseling Psychology, 49, 264-272.
Inman, A. G., Ladany, N., Constantine, M. G., & Morano, C. K. (2001). Development and pre-
liminary validation of the Cultural Values Conflict Scale for South Asian women. Journal
of Counseling Psychology, 48, 17-27.
Kim, B. K., Atkinson, D. R., & Yang, P. H. (1999). The Asian Values Scale: Development, fac-
tor analysis, validation, and reliability. Journal of Counseling Psychology, 46, 342-352.
Kivlighan, D. M., Multon, K. D., & Brossart, D. F. (1996). Helpful impacts in group counsel-
ing: Development of a multidimensional rating system. Journal of Counseling Psychology,
43, 347-355.
Lee, R. M., Choe, J., Kim, G., & Ngo, V. (2000). Construction of the Asian American Family
Conflicts Scale. Journal of Counseling Psychology, 47, 211-222.
Lehrman-Waterman, D., & Ladany, N. (2001). Development and validation of the evaluation
process within supervision inventory. Journal of Counseling Psychology, 48, 168-177.
Lent, R. W., Hill, C. E., & Hoffman, M. A. (2003). Development and validation of the
Counselor Activity Self-Efficacy Scales. Journal of Counseling Psychology, 50, 97-108.
Liang, C. T. H., Li, L. C., & Kim, B. S. K. (2004). The Asian American Racism-Related Stress
Inventory: Development, factor analysis, reliability, and validity. Journal of Counseling
Psychology, 51, 103-114.
Mallinckrodt, B., Gantt, D. L., & Coble, H. M. (1995). Attachment patterns in the psy-
chotherapy relationship: Development of the client attachment to therapist scale. Journal
of Counseling Psychology, 42, 307-317.
Miville, M. L., Gelso, C. J., Pannu, R., Liu, W., Touradji, P., Holloway, P., & Fuertes, J. (1999).
Appreciating similarities and valuing differences: The Miville-Guzman Universality-
Diversity Scale. Journal of Counseling Psychology, 46, 291-307.
Mohr, J. J., & Rochlen, A. B. (1999). Measuring attitudes regarding bisexuality in lesbian, gay
male, and heterosexual populations. Journal of Counseling Psychology, 46, 353-369.
Neville, H. A., Lilly, R. L., Duran, G., Lee, R. M., & Browne, L. (2000). Construction and ini-
tial validation of the Color-Blind Racial Attitudes Scale (CoBRAS). Journal of Counseling
Psychology, 47, 59-70.
O’Brien, K. M., Heppner, M. J., Flores, L. Y., & Bikos, L. H. (1997). The Career Counseling
Self-Efficacy Scale: Instrument development and training applications. Journal of
Counseling Psychology, 44, 20-31.
Phillips, J. C., Szymanski, D. M., Ozegovic, J. J., & Briggs-Phillips, M. (2004). Preliminary
examination and measurement of the internship research training environment. Journal of
Counseling Psychology, 51, 240-248.
Rochlen, A. B., Mohr, J. J., & Hargrove, B. K. (1999). Development of the attitudes toward
career counseling scale. Journal of Counseling Psychology, 46, 196-206.
Schlosser, L. Z., & Gelso, C. J. (2001). Measuring the working alliance in advisor-advisee
relationships in graduate school. Journal of Counseling Psychology, 48, 157-167.
Skowron, E. A., & Friedlander, M. L. (1998). The differentiation of self inventory: Development
and initial validation. Journal of Counseling Psychology, 45, 235-246.
Spanierman, L. B., & Heppner, M. J. (2004). Psychosocial Costs of Racism to Whites Scale
(PCRW): Construction and initial validation. Journal of Counseling Psychology, 51, 249-262.
Utsey, S. O., & Ponterotto, J. G. (1996). Development and validation of the index of race-
related stress. Journal of Counseling Psychology, 43, 490-501.
Wang, Y., Davidson, M. M., Yakushko, O. F., Savoy, H. B., Tan, J. A., & Bleier, J. K. (2003).
The Scale of Ethnocultural Empathy: Development, validation, and reliability. Journal of
Counseling Psychology, 50, 221-234.
REFERENCES
Brown, F. G. (1983). Principles of educational and psychological testing (3rd ed.). New York:
Holt, Rinehart, & Winston.
Browne, M. W., & Cudeck, R. (1992). Alternative ways of assessing model fit. Sociological
Methods and Research, 21, 230-258.
Byrne, B. M. (2001). Structural equation modeling with AMOS: Basic concepts, applications,
and programming. Mahwah, NJ: Lawrence Erlbaum.
Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral
Research, 1, 245-276.
Cattell, R. B. (1974). Radial item parcel factoring vs. item factoring in defining personality
structure in questionnaires: Theory and experimental checks. Australian Journal of
Psychology, 26, 103-119.
Chou, C., & Bentler, P. M. (2002). Model modification in structural equation modeling by
imposing constraints. Computational Statistics and Data Analysis, 41, 271-287.
Comrey, A. L. (1973). A first course in factor analysis. New York: Academic Press.
Comrey, A. L., & Lee, H. B. (1992). A first course in factor analysis (2nd ed.). Hillsdale, NJ:
Lawrence Erlbaum.
Converse, J. M., & Presser, S. (1986). Survey questions: Handcrafting the standardized ques-
tionnaire. Newbury Park, CA: Sage.
Dawis, R. V. (1987). Scale construction. Journal of Counseling Psychology, 34, 481-489.
DeVellis, R. F. (2003). Scale development: Theory and applications (2nd ed.). Thousand Oaks,
CA: Sage.
Fan, X., Thompson, B., & Wang, L. (1999). Effects of sample size, estimation methods, and model
specification on structural equation modeling fit indexes. Structural Equation Modeling, 6, 56-83.
Fassinger, R. E. (1987). Use of structural equation modeling in counseling psychology
research. Journal of Counseling Psychology, 34, 425-436.
Floyd, F. J., & Widaman, K. F. (1995). Factor analysis in the development and refinement of
clinical assessment instruments. Psychological Assessment, 7, 286-299.
Friedenberg, L. (1995). Psychological testing: Design, analysis, and use. Boston, MA: Allyn
and Bacon.
Gerbing, D. W., & Hamilton, J. G. (1996). Viability of exploratory factor analysis as a pre-
cursor to confirmatory factor analysis. Structural Equation Modeling, 3, 62-72.
Gorsuch, R. L. (1983). Factor analysis (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.
Gorsuch, R. L. (1990). Common factor analysis versus principal components analysis: Some
well and little known facts. Multivariate Behavioral Research, 25, 33-39.
Gorsuch, R. L. (1997). Exploratory factor analysis: Its role in item analysis. Journal of
Personality Assessment, 68, 532-560.
Gorsuch, R. L. (2003). Factor analysis. In J. A. Schinka & W. F. Velicer (Eds.), Handbook of
psychology: Research methods in psychology (Vol. 2, pp. 143-164). Hoboken, NJ: John Wiley.
Grimm, L. G., & Yarnold, P. R. (1995). Reading and understanding multivariate statistics.
Washington, DC: American Psychological Association.
Guadagnoli, E., & Velicer, W. F. (1988). The relationship of sample size to the stability of com-
ponent patterns. Psychological Bulletin, 103, 265-275.
Helms, J. E., Henze, K. T., Sass, T. L., & Mifsud, V. A. (2006). Treating Cronbach’s alpha
reliability as data in nonpsychometric substantive applied research. The Counseling
Psychologist, 34, 630-660.
Henson, R. K. (2006). Effect-size measures and meta-analytic thinking in counseling psy-
chology research. The Counseling Psychologist, 34, 601-629.
Hoelter, J. W. (1983). The analysis of covariance structures: Goodness-of-fit indices.
Sociological Methods & Research, 11, 325-344.
Horn, J. L. (1965). A rationale and test for the number of factors in factor analysis.
Psychometrika, 30, 179-185.
Hoyt, W. T., Warbasse, R. E., & Chu, E. Y. (2006). Construct validation in counseling
psychology research. The Counseling Psychologist, 34, 769-805.
Hu, L., & Bentler, P. M. (1998). Fit indices in covariance structure modeling: Sensitivity to
underparameterized model misspecification. Psychological Methods, 3, 424-453.
Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis:
Conventional criteria versus new alternatives. Structural Equation Modeling, 6, 1-55.
Humphreys, L. G., & Montanelli, R. G. (1975). An investigation of the parallel analysis criterion for
determining the number of common factors. Multivariate Behavioral Research, 10, 193-205.
Jöreskog, K. G., & Sörbom, D. (1981). LISREL V: Analysis of linear structural relations by
the method of maximum likelihood. Chicago: International Educational Services.
Jöreskog, K. G., & Sörbom, D. (1984). LISREL 6: A guide to the program and applications.
Chicago: SPSS.
Kahn, J. H. (2006). Factor analysis in counseling psychology research, training, and practice:
Principles, advances, and applications. The Counseling Psychologist, 34, 684-718.
Kaiser, H. F. (1958). The varimax criterion for analytic rotation in factor analysis.
Psychometrika, 23, 187-200.
Kenny, D. A., & McCoach, D. B. (2003). Effect of the number of variables on measures of fit
in structural equation modeling. Structural Equation Modeling, 10, 333-351.
Kline, R. B. (2005). Principles and practice of structural equation modeling (2nd ed.). New
York: Guilford.
Loehlin, J. C. (1998). Latent variable models: An introduction to factor, path, and structural
analysis (3rd ed.). Mahwah, NJ: Lawrence Erlbaum.
MacCallum, R. C. (1986). Specification searches in covariance structure modeling.
Psychological Bulletin, 100, 107-120.
MacCallum, R. C., Roznowski, M., & Necowitz, L. B. (1992). Model modifications in covari-
ance structure analysis: The problem of capitalization on chance. Psychological Bulletin,
111, 490-504.
MacCallum, R. C., Wegener, D. T., Uchino, B. N., & Fabrigar, L. R. (1993). The problem of
equivalent models in applications of covariance structure analysis. Psychological Bulletin,
114, 185-199.
MacCallum, R. C., Widaman, K. F., Zhang, S., & Hong, S. (1999). Sample size in factor analy-
sis. Psychological Methods, 4, 84-99.
Marsh, H. W., Balla, J. R., & Hau, K. T. (1996). An evaluation of incremental fit indices: A clar-
ification of mathematical and empirical properties. In G. A. Marcoulides & R. E. Schumacker
(Eds.), Advanced structural equation modeling: Issues and techniques (pp. 315-353).
Mahwah, NJ: Lawrence Erlbaum.
Marsh, H. W., Balla, J. R., & McDonald, R. P. (1988). Goodness-of-fit indexes in confirma-
tory factor analysis: The effect of sample size. Psychological Bulletin, 103, 391-410.
Marsh, H. W., Hau, K.-T., Balla, J. R., & Grayson, D. (1998). Is more ever too much? The
number of indicators per factor in confirmatory factor analysis. Multivariate Behavioral
Research, 33, 181-220.
Marsh, H. W., Hau, K. T., & Wen, Z. (2004). In search of golden rules: Comment on hypothesis-
testing approaches to setting cutoff values for fit indexes and dangers in overgeneralizing
Hu and Bentler’s (1999) findings. Structural Equation Modeling, 11, 320-341.
Martens, M. P. (2005). The use of structural equation modeling in counseling psychology
research. The Counseling Psychologist, 33, 269-298.
Martens, M. P., & Haase, R. F. (2006). Advanced applications of structural equation modeling
in counseling psychology research. The Counseling Psychologist, 34, 878-911.
McDonald, R. P. (1985). Factor analysis and related methods. Hillsdale, NJ: Lawrence Erlbaum.
McDonald, R. P. (1989). An index of goodness-of-fit based on noncentrality. Journal of
Classification, 6, 97-103.
McDonald, R. P., & Marsh, H. W. (1990). Choosing a multivariate model: Noncentrality and
goodness of fit. Psychological Bulletin, 107, 247-255.
Mulaik, S. A., James, L. R., Van Alstine, J., Bennett, N., Lind, S., & Stilwell, C. D. (1989).
Evaluation of goodness-of-fit indices for structural equation models. Psychological
Bulletin, 105, 430-445.
O’Connor, B. P. (2000). SPSS and SAS programs for determining the number of components
using parallel analysis and Velicer’s MAP test. Behavior Research Methods, Instruments,
and Computers, 32, 396-402.
Quintana, S. M., & Maxwell, S. E. (1999). Implications of recent developments in structural
equation modeling for counseling psychology. The Counseling Psychologist, 27, 485-527.
Quintana, S. M., & Minami, T. (2006). Guidelines for meta-analyses of counseling psychol-
ogy research. The Counseling Psychologist, 34, 839-876.
Reise, S. P., Waller, N. G., & Comrey, A. L. (2000). Factor analysis and scale revision.
Psychological Assessment, 12, 287-297.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6, 461-464.
Sherry, A. (2006). Discriminant analysis in counseling psychology research. The Counseling
Psychologist, 34, 661-683.
Steiger, J. H. (1989). EzPATH: A supplementary module for SYSTAT and SYGRAPH.
Evanston, IL: SYSTAT.
Steiger, J. H., & Lind, J. C. (1980, May). Statistically based tests for the number of common
factors. Paper presented at the annual meeting of the Psychometric Society, Iowa City, IA.
Tabachnick, B. G., & Fidell, L. S. (2001). Using multivariate statistics (4th ed.). Boston:
Allyn and Bacon.
Thompson, B. (2004). Exploratory and confirmatory factor analysis: Understanding concepts
and applications. Washington, DC: American Psychological Association.
Tinsley, H. E. A., & Tinsley, D. J. (1987). Uses of factor analysis in counseling psychology
research. Journal of Counseling Psychology, 34, 414-424.
Tucker, L. R., Koopman, R. F., & Linn, R. L. (1969). Evaluation of factor analytic research
procedures by means of simulated correlation matrices. Psychometrika, 34, 421-459.
Tucker, L. R., & Lewis, C. (1973). A reliability coefficient for maximum likelihood factor
analysis. Psychometrika, 38, 1-10.
Velicer, W. F., & Fava, J. L. (1998). Effects of variable and subject sampling on factor pattern
recovery. Psychological Methods, 3, 231-251.
Velicer, W. F., & Jackson, D. N. (1990). Component analysis versus common factor analysis:
Some issues in selecting an appropriate procedure. Multivariate Behavioral Research,
25, 1-28.
Velicer, W. F., Peacock, A. C., & Jackson, D. N. (1982). A comparison of component and fac-
tor patterns: A Monte Carlo approach. Multivariate Behavioral Research, 17, 371-388.
West, S. G., Finch, J. F., & Curran, P. J. (1995). Structural equation models with nonnormal
variables: Problems and remedies. In R. H. Hoyle (Ed.), Structural equation modeling:
Concepts, issues, and applications (pp. 56-75). Thousand Oaks, CA: Sage.
Weston, R., & Gore, P. A., Jr. (2006). SEM 101: A brief guide to structural equation modeling.
The Counseling Psychologist, 34, 719-751.
Widaman, K. F. (1993). Common factor analysis versus principal components analysis:
Differential bias in representing model parameters? Multivariate Behavioral Research,
28, 263-311.
Worthington, R. L., & Navarro, R. L. (2003). Pathways to the future: Analyzing the contents
of a content analysis. The Counseling Psychologist, 31, 85-92.
Zwick, W. R., & Velicer, W. F. (1986). Factors influencing five rules for determining the num-
ber of components to retain. Psychological Bulletin, 99, 432-442.