(Table continued from previous page)

Variable                         Mean (SD)      No. itemsᵃ   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
2. Extraversion                  5.13 (1.89)    –           02
3. Anxiety                       4.55 (1.85)    –           04  30
4. Tough-mindedness              5.39 (2.05)    –           16  33  12
5. Independence                  5.20 (1.69)    –           04  34  14  26
6. Self-control                  5.81 (1.47)    –           21  28  04  46  22
7. Empirical                     3.30 (2.18)    12          32  05  06  10  09  18
8. Initiating structure          1.09 (4.67)    0           15  12  06  13  12  03  33
9. Participation                  .38 (2.97)    0           17  25  16  24  13  11  32  68
10. Empowerment                  1.09 (4.67)    0           15  12  06  13  12  03  33 100  68
11. Hybrid initiating structure   .64 (3.98)    1           00  13  06  09  11  06  11  89  55  89
12. Hybrid participation         1.21 (3.26)    0           23  21  14  28  15  23  64  64  89  64  35
13. Hybrid empowerment           3.72 (4.98)    0           23  09  06  17  13  11  59  93  69  93  69  78
14. Vroom time-based             8.89 (1.66)    4           21  15  08  10  01  02  09  37  29  37  45  23  31
15. Vroom developmental          7.45 (1.48)    6           22  03  03  03  05  02  39  40  21  40  21  33  50  27
16. Subject matter experts       8.67 (2.30)    3           32  10  11  14  04  12  51  32  15  32  10  31  43  20  14
17. Novice vs. experts           9.20 (2.54)    0           09  03  09  06  01  02  04  21  33  21  23  22  19  22  39  12
18. Leadership rating           34.85 (5.01)    –           32  07  01  08  00  15  25  03  04  03  17  22  17  07  13  32  12
19. Overall performance rating  18.59 (2.94)    –           26  11  12  01  05  01  15  03  01  03  11  10  11  11  11  22  04  72

Notes: Decimal points omitted in correlations; minus signs were lost in reproduction, so entries show magnitudes only. Entries with an absolute value of .19 or greater are significant at p < .05.
ᵃNo. items refers to the number of items in the leadership skills assessment (LSA) key that are unscored. Dashes indicate that this variable is not relevant to the particular scale.
MINDY E. BERGMAN, FRITZ DRASGOW, MICHELLE A. DONOVAN, JAIME B. HENNING AND SUZANNE E. JURASKA
International Journal of Selection and Assessment, Volume 14 Number 3 September 2006
© 2006 The Authors
Journal compilation © Blackwell Publishing Ltd. 2006
statistically significant validity coefficients, many more did
not. Obviously, with only one SJT and one sample, we
cannot reach abiding conclusions about the goodness of
scoring approaches for all SJTs, or the boundary conditions
under which some approaches might be better than others.
What we can conclude is that the validity of an SJT depends
in part on its scoring, and that poor choices could lead to
the conclusion that the SJT's content is not valid when it
may only be the scoring key that is not valid.
Our recommendation to carefully follow standard
validation procedures is not surprising. However, doing
so may be especially important for SJTs. Although a key
may be criterion-related, it might add little value once
cognitive ability and personality measures (which are
widely available and relatively inexpensive) are accounted
for. Because some SJTs (especially multimedia, computerized
SJTs) are costly to construct and, importantly, to
administer, it is not enough to know whether an SJT
predicts a criterion; it must also provide incremental value.
Further, because of the difficulty in determining correct
answers, organizations facing legal challenges to their SJT
use will need to be able to explain not only why the SJT's
content is job relevant but also why a particular scoring
strategy was used. Careful validation should both minimize
legal challenges and help organizations survive those that
do arise.
The challenging aspects of scoring are likely to increase
exponentially as the breadth of the SJT increases. Although
empirical keying could proceed in the same general fashion
regardless of the breadth of the SJT (assuming that the
criterion was not deficient), other scoring strategies would
likely become more complicated. For every content subset
in an SJT, there will be different sets of theories to apply for
theoretical keying and different SMEs to query for expert
keying. The various keys for the subtests could be
combined in more or less optimal ways, such that the best
key for one subtest depends in part on the key for another
subtest. Hybrid scoring systems could then be applied to
the various keys, carrying over these same concerns. To
complicate matters, non-linear scoring (e.g., Breiman
et al.'s (1984) classification and regression tree analysis)
might lead to the highest validity. In short, as the breadth
increases for an SJT, scoring can become more complex.
These issues speak to the importance of test development.
A well-constructed SJT is one that would reflect clear content
domains, rather than contain a hodgepodge of items. As
difficult as scoring SJTs becomes as the breadth of the test
increases, it would be even more complicated if specific
content areas cannot be identified. Without clear content
domains, there is little guidance as to where test constructors
should look for theories to determine the scoring key or how
SMEs should think about the meanings of the items. Thus,
although broader SJTs are likely to have more scoring
difficulties than narrower ones, some of these problems can
be ameliorated if the SJT is carefully constructed to reflect
rational if not theoretical content domains.
Table 3. Hierarchical regressions

Step  Variables entered             β     t       R²     Fᵃ      ΔR²    F for ΔR²ᵇ
1.    Wonderlic                     .32   13.09*  .101   13.61*
2.    Extraversion                  .08   .76     .112   2.44*   .011   .29
      Anxiety                       .01   .14
      Tough-mindedness              .01   .13
      Independence                  .03   .26
      Self-control                  .07   .68
3a.   Empirical                     .17   1.77    .136   2.58*   .023   3.06*
3b.   Initiating structure          .03   .33     .113   2.09*   .001   .13
3c.   Participation                 .03   .35     .113   2.09*   .001   .13
3d.   Empowerment                   .03   .33     .113   2.09*   .001   .13
3e.   Hybrid initiating structure   .18   2.01*   .142   2.72*   .030   4.02*
3f.   Hybrid participation          .14   1.51    .129   2.44*   .017   2.24*
3g.   Hybrid empowerment            .10   1.14    .122   2.28*   .010   1.31
3h.   Time-based Vroom              .02   .25     .113   2.08    .001   .13
3i.   Development-based Vroom       .07   .75     .116   2.17    .004   .52
3j.   SMEs                          .24   2.60*   .161   3.16*   .049   6.72*
3k.   Novice vs. experts            .09   1.05    .121   2.25*   .009   1.18

Notes: Only the incremental additions to the hierarchical regressions are shown. Steps 1 and 2 were the same across all sets of regressions.
ᵃDegrees of freedom for F tests were: step 1 (1, 121), step 2 (6, 116), step 3 (7, 115).
ᵇDegrees of freedom for F tests of the change in R² were: step 2 (5, 116), step 3 (1, 115).
*p < .05.
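The F tests for the change in R² in Table 3 follow the standard incremental-F formula, F = (ΔR²/df₁) / ((1 − R²_full)/df₂). As a check, here is a minimal sketch; the R² values and degrees of freedom below are taken from Table 3 and its notes.

```python
def incremental_f(r2_reduced, r2_full, df1, df2):
    """F statistic for the R^2 gained by adding df1 predictors,
    with df2 residual degrees of freedom in the full model."""
    return ((r2_full - r2_reduced) / df1) / ((1 - r2_full) / df2)

# Step 3j (SMEs): R^2 rises from .112 (step 2) to .161 when one key is
# added, with (1, 115) degrees of freedom for the change.
print(round(incremental_f(0.112, 0.161, 1, 115), 2))  # 6.72, matching Table 3
```

The same call with the step 3e values (.112 to .142) reproduces the 4.02 reported for the hybrid initiating structure key.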
Further, test administration can affect responses. For
example, different instruction sets lead to different
responses that, at the key level, are differentially related
to criteria (McDaniel & Nguyen, 2001; Ployhart &
Ehrhart, 2003). Even instructions that differ in seemingly
minor ways, such as "identify the best option" and
"identify what you would do" (what Ployhart and Ehrhart
(2003) referred to as "should do" and "would do"
instructions), appear to lead to different responses. This
suggests that keys and the constructs they represent vary
not only due to chosen scoring strategies but also because
of the ways that the respondents approach the assessment.
Different keys can lead to an SJT assessing different
constructs even when it was designed to measure performance
in a specific domain, such as our LSA. For the LSA, we keyed
three different theoretical constructs: initiating structure,
participation, and empowerment. On one hand, these keys all
reflect the domain of leadership skills. On the other hand,
these keys refer to different components of the leadership
skills domain and it could be argued that they represent
different constructs. For empirical keys, as well as
contingency theory keys such as the Vroom (2000) keys, different
domains could be the best answer for different questions.
Because of the potentially multi-dimensional nature of
SJTs as well as the many constructs that this method can be
applied to, it may be difficult to conduct meta-analyses on
some of the questions raised here (Hesketh, 1999). To
minimize this potential problem, researchers should
include in their reports, in addition to standard validity
coefficients and effect sizes, descriptions of: (a) the
domain(s) that the items of the SJT measure; (b) the scoring
methods used; and, (c) the instruction set for the SJT.
Without this information, it will be impossible for future
meta-analytic efforts to reach any meaningful conclusions.
Limitations
As with any study, this one has limitations. First, the small
sample size makes strong conclusions difficult. Larger
samples would allow for greater confidence in the results. A
larger sample might permit additional analyses, such as
subgroup differences across ethnicity groups. Further, the
low power afforded by the small sample size makes it
difficult to interpret differences in the validities of the keys.
However, because our predictors and criteria were
collected from different sources (managers and their
supervisors, respectively), some problems common to
concurrent validation, such as common method bias,
are not at issue here.
Further, we must acknowledge that a different empirical
key could emerge with a different or larger sample. The
minimum endorsement criterion is, in part, dependent on
the sample size (i.e., one should not require a minimum
endorsement rate that is unachievable in a particular
sample). Additionally, a sample from a different population
might lead to different results. Large samples from diverse
populations should improve the stability and general-
izability of keys.
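The interplay between the minimum endorsement criterion and the validity cutoff can be made concrete with a hypothetical sketch of option-level empirical keying. This is illustrative, not the authors' exact procedure: the function names and data layout are assumptions, the .19 cutoff echoes the significance note under the correlation table, and the 10% endorsement floor is an invented default.

```python
def point_biserial(endorsed, criterion):
    """Correlation between a 0/1 endorsement vector and a continuous criterion."""
    n = len(endorsed)
    mean_e = sum(endorsed) / n
    mean_c = sum(criterion) / n
    cov = sum((e - mean_e) * (c - mean_c) for e, c in zip(endorsed, criterion)) / n
    var_e = sum((e - mean_e) ** 2 for e in endorsed) / n
    var_c = sum((c - mean_c) ** 2 for c in criterion) / n
    if var_e == 0 or var_c == 0:
        return 0.0
    return cov / (var_e ** 0.5 * var_c ** 0.5)

def empirical_key(option_endorsements, criterion, min_rate=0.10, min_abs_r=0.19):
    """Return +1 / -1 / 0 weights per option (hypothetical sketch).

    option_endorsements: dict option_id -> list of 0/1 endorsements per respondent.
    Options whose endorsement rate is below the floor (or nearly universal)
    stay unscored (0) regardless of their correlation, because the estimate
    is unstable; this is why the key depends on sample size.
    """
    key = {}
    n = len(criterion)
    for opt, endorsed in option_endorsements.items():
        rate = sum(endorsed) / n
        if rate < min_rate or rate > 1 - min_rate:
            key[opt] = 0  # unscored: endorsement too rare (or too common)
            continue
        r = point_biserial(endorsed, criterion)
        key[opt] = (1 if r > 0 else -1) if abs(r) >= min_abs_r else 0
    return key
```

With a larger or different sample, options that were unscored for low endorsement can clear the floor and enter the key, which is exactly the instability discussed above.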
Table 4. Subgroup differences across sex

                              Mean (SD), male   Mean (SD), female   t      d
Wonderlic Personnel Test      27.73 (5.91)      27.11 (5.73)        .55    .11
Extraversion                  4.50 (1.77)       5.43 (1.88)         2.63*  .51
Anxiety                       4.49 (2.02)       4.59 (1.77)         .27    .05
Tough-mindedness              6.30 (2.05)       4.96 (1.91)         3.59*  .69
Independence                  5.20 (1.48)       5.20 (1.79)         .02    .003
Self-control                  6.11 (1.47)       5.67 (1.46)         1.55   .30
Empirical                     3.70 (2.16)       3.11 (2.18)         1.42   .27
Initiating structure          1.33 (4.68)       .98 (4.70)          .39    .07
Participation                 .40 (3.02)        .37 (2.96)          .05    .01
Empowerment                   1.33 (4.68)       .98 (4.70)          .39    .07
Hybrid initiating structure   .78 (4.53)        .58 (3.71)          .26    .05
Hybrid participation          1.10 (3.25)       1.27 (3.28)         .26    .05
Hybrid empowerment            4.03 (4.56)       3.58 (5.20)         .46    .09
Time-based Vroom              9.18 (1.68)       8.76 (1.64)         1.31   .25
Development-based Vroom       7.53 (1.40)       7.41 (1.52)         .40    .08
SMEs                          8.30 (1.52)       8.86 (2.58)         1.26   .24
Novice vs. expert             9.08 (2.69)       9.27 (2.48)         .39    .07
Leadership ratings            34.73 (5.01)      34.90 (5.04)        .58    .11
Overall performance ratings   18.60 (2.45)      18.59 (3.17)        .10    .02

Note: N = 40 male, 83 female; degrees of freedom = 121.
*p < .05.
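The t and d values in Table 4 can be approximately recovered from the group means, SDs, and Ns (40 men, 83 women) with the usual pooled-variance formulas; a sketch, using the Extraversion row as a check (small discrepancies reflect rounding of the tabled means and SDs):

```python
def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation for two independent groups."""
    return (((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)) ** 0.5

def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Standardized mean difference (Cohen's d) using the pooled SD."""
    return (m1 - m2) / pooled_sd(sd1, n1, sd2, n2)

def t_independent(m1, sd1, n1, m2, sd2, n2):
    """Independent-samples t statistic (df = n1 + n2 - 2 = 121 here)."""
    sp = pooled_sd(sd1, n1, sd2, n2)
    return (m1 - m2) / (sp * (1 / n1 + 1 / n2) ** 0.5)

# Extraversion: male 4.50 (1.77), n = 40; female 5.43 (1.88), n = 83.
d = cohens_d(5.43, 1.88, 83, 4.50, 1.77, 40)
t = t_independent(5.43, 1.88, 83, 4.50, 1.77, 40)
# d comes out near the tabled .51 and t near the tabled 2.63*
```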
Most notably, we cannot draw a firm conclusion based
on this single sample in this single organization at this
single time using a single SJT about which scoring method
is best. There are likely to be boundary conditions on the
best scoring method, based on test content, test instruc-
tions, response options, and the like, that will aid in the
determination of the best scoring method for this SJT and
other assessments used in the future. However, this paper
provides a useful guide in both (a) the methods that are
currently available to create SJT keys and (b) the ways to
evaluate the relative effectiveness of keys.
Practical Issues in SJT Scoring
One important issue in all scoring is that there are other
keys that could be constructed. Although there is not an
infinite number of keys for an assessment, the possible
permutations of the pattern of scoring as correct, incorrect,
and zero across the number of options and items is of a very
large magnitude for a test of any reasonable length; a non-
trivial number of these mathematically possible keys are
likely to make some rational or theoretical sense. Further,
different criteria, approaches, and minimum scoring
requirements could lead to a multitude of other empirical
keys than the ones constructed. In short, there are many
ways to create a key within each general scoring strategy.
How one chooses keying systems should depend on the
test's intended use, theory (not just for theoretical keys, but
also to determine which scoring strategies are most useful),
and practical considerations.
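The size of the key space described above is easy to quantify: if every option of every item can independently be keyed correct (+1), incorrect (−1), or unscored (0), a test with i items and k options per item admits 3^(i·k) candidate keys. A quick sketch (the item and option counts below are illustrative, not the LSA's):

```python
def possible_keys(items, options_per_item, weights=3):
    """Number of distinct keys when each option independently receives one of
    `weights` scoring values (+1, -1, or 0 by default)."""
    return weights ** (items * options_per_item)

# Even a short test has an astronomical key space:
print(possible_keys(10, 4))  # 3**40, roughly 1.2e19 candidate keys
```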
Potential Effects of Studying Keys on
Broader Theory
One potential application of studying keys could be
providing support for theories about the content domain
of assessments. Support for a theory would be found when
empirical and theoretical keys overlap greatly in their best
option identification. For example, the LSA could be used
to provide support for a theory of leadership, such as
Vroom's (2000; Vroom & Jago, 1978; Vroom & Yetton,
1973). The extent to which the empirical key (which, by
design, is related to leadership performance) and the
theoretical key identify the same best and worst options
would indicate support for the theory.
The utility of this approach hinges on three issues. First,
the criterion measure must be reasonably construct valid so
that an effective and appropriate empirical key is created.
Second, the theoretical key must be developed carefully so
that it accurately reflects the theory. Finally, unscored
items on the empirical key must be minimized through the
use of a large sample. This is necessary so that there are
enough opportunities to evaluate the congruence of the
empirical and theoretical keys. Additionally, the unscored
options on the empirical key must be examined so that it is
clear whether they are unscored due to low correlations or
low endorsement rates. Options that have low correlations
and meet the minimum endorsement rate on the empirical
key are more informative about the option and its possible
relation to theory than options that are not scored because
they have not met the minimum endorsement criterion. It
may be useful to think of the first case as a score of "zero"
and the second as "unscored", because in the second case it
is unclear what the score will be if the minimum
endorsement requirement is met. As noted in the introduc-
tion, there are many reasons why an option is scored as
zero; some of these reasons are mitigated when the
minimum endorsement criterion is met.
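The congruence check described above can be sketched as an agreement rate between the options the two keys identify as best, counted only over items the empirical key actually scores. This is a hypothetical illustration; the data structures and function name are assumptions, not the authors' procedure.

```python
def key_congruence(empirical_best, theoretical_best):
    """Proportion of empirically scored items (best option is not None)
    whose empirically keyed best option matches the theoretical key."""
    scored = [item for item, best in empirical_best.items() if best is not None]
    if not scored:
        return 0.0
    matches = sum(1 for item in scored
                  if empirical_best[item] == theoretical_best.get(item))
    return matches / len(scored)

# Hypothetical keys: item -> option keyed as best (None = unscored empirically).
empirical = {"item1": "a", "item2": "b", "item3": None, "item4": "c"}
theory = {"item1": "a", "item2": "d", "item3": "a", "item4": "c"}
print(round(key_congruence(empirical, theory), 2))  # 0.67: 2 of 3 scored items agree
```

Note that "item3" is simply excluded, which is why minimizing unscored items (via a large sample) matters for this comparison.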
Conclusion
We described and illustrated a process for determining the
best key(s) from among many possible keys. Keys should be
assessed for validity, incremental validity, adverse impact,
and construct validity as described in this paper. From these
analyses, the best key(s) can be identified. Although this
validation strategy seems basic, studies in the SJT literature
have rarely addressed the potential differential validity of
the multiple keys available for a given test. As demon-
strated here, it is essential that researchers critically
evaluate their SJT keying choices.
As we noted at the start of this paper, the major purpose
of this paper is to stimulate research on the topic of keying
in SJTs. We have reviewed the six general approaches to
scoring that have been examined or discussed in the SJT or
biodata scoring literatures to date, and we have demon-
strated four of them. Other scoring strategies might be
developed in the future, which will expand the possible
repertoire of scoring methodologies. Our goal is to
encourage SJT developers and researchers to investigate
and implement multiple scoring methods in their research
and to publish the various results of these keys. Ideally, in
10 years' time, we would be able to revisit this topic to
conduct a meta-analysis on scoring strategies in order to
assess which approach is best.
Notes
1. Although the empowerment and initiating structure
keys are perfectly negatively correlated, both are
described because they were used in hybrid scoring;
the hybrid keys were not perfectly negatively correlated.
For ease of comparison, both the empowerment and the
initiating structure keys are presented here and included
in the analyses.
2. We must acknowledge that due to our small sample size,
sampling variability and error could also contribute to the
variability of validity coefficients across the keys.
References
Arnold, J.A., Arad, S., Rhoades, J.A. and Drasgow, F. (2000) The empowering leadership questionnaire: The construction of a new scale for measuring leader behaviors. Journal of Organizational Behavior, 21, 249–269.
Ashworth, S.D. and Joyce, T.M. (1994) Developing score protocols for a computerized multimedia in-basket exercise. Paper presented at the Ninth Annual Conference of the Society for Industrial and Organizational Psychology, Nashville, TN, April.
Borman, W.C., White, L.A., Pulakos, E.D. and Oppler, S.H. (1991) Models of supervisory job performance ratings. Journal of Applied Psychology, 76, 863–872.
Breiman, L., Friedman, J.H., Olshen, R.A. and Stone, C.J. (1984) Classification and regression trees. Belmont, CA: Wadsworth.
Campbell, D.T. and Fiske, D.W. (1959) Convergent and discriminant validation by the multitrait–multimethod matrix. Psychological Bulletin, 56, 81–105.
Campbell, J.P. (1990a) Modeling the performance prediction problem in industrial and organizational psychology. In M.D. Dunnette and L.M. Hough (Eds), Handbook of industrial and organizational psychology, Vol. 1 (pp. 687–732). Palo Alto: Consulting Psychologists Press.
Campbell, J.P. (1990b) The role of theory in industrial and organizational psychology. In M.D. Dunnette and L.M. Hough (Eds), Handbook of industrial and organizational psychology, Vol. 1 (pp. 39–73). Palo Alto: Consulting Psychologists Press.
Campbell, J.P., Dunnette, M.D., Lawler, E.E. III and Weick, K.E. (1970) Managerial behavior, performance, and effectiveness. New York: McGraw-Hill.
Cattell, R.B., Cattell, A.K. and Cattell, H.E. (1993) Sixteen personality factor questionnaire (5th Edn). Champaign, IL: Institute for Personality and Ability Testing Inc.
Chan, D. and Schmitt, N. (1997) Video-based versus paper-and-pencil method of assessment in situational judgment tests: Subgroup differences in test performance and face validity perceptions. Journal of Applied Psychology, 82, 143–159.
Chan, D. and Schmitt, N. (2002) Situational judgment and job performance. Human Performance, 15, 233–254.
Chan, D. and Schmitt, N. (2005) Situational judgment tests. In A. Evers, N. Anderson and O. Voskuijl (Eds), Handbook of personnel selection (pp. 219–246). Oxford: Blackwell.
Cleary, T.A. (1968) Test bias: Prediction of grades of Negro and White students in integrated colleges. Journal of Educational Measurement, 5, 115–124.
Clevenger, J., Pereira, G.M., Wiechmann, D., Schmitt, N. and Harvey, V.S. (2001) Incremental validity of situational judgment tests. Journal of Applied Psychology, 86, 410–417.
Cureton, E.E. (1950) Validity, reliability, and baloney. Educational and Psychological Measurement, 10, 94–96.
Dalessio, A.T. (1994) Predicting insurance agent turnover using a video-based situational judgment test. Journal of Business and Psychology, 9, 23–32.
Desmarais, L.B., Masi, D.L., Olson, M.J., Barbara, K.M. and Dyer, P.J. (1994) Scoring a multimedia situational judgment test: IBM's experience. Paper presented at the Ninth Annual Conference of the Society for Industrial and Organizational Psychology, Nashville, TN, April.
Devlin, S.E., Abrahams, N.M. and Edwards, J.E. (1992) Empirical keying of biographical data: Cross-validity as a function of scaling procedure and sample size. Military Psychology, 4, 119–136.
Dodrill, C.B. (1983) Long term reliability of the Wonderlic Personnel Test. Journal of Consulting and Clinical Psychology, 51, 316–317.
Dodrill, C.B. and Warner, M.H. (1988) Further studies of the Wonderlic Personnel Test as a brief measure of intelligence. Journal of Consulting and Clinical Psychology, 56, 145–147.
England, G.W. (1961) Development and use of weighted application blanks. Dubuque: Brown.
Flanagan, J.C. (1954) The critical incident technique. Psychological Bulletin, 51, 327–358.
Hein, M. and Wesley, S. (1994) Scaling biodata through subgrouping. In G.S. Stokes, M.D. Mumford and W.A. Owens (Eds), Biodata handbook: Theory, research, and use of biographical information in selection and performance prediction (pp. 171–196). Palo Alto: Consulting Psychologists Press.
Hesketh, B. (1999) Introduction to the International Journal of Selection and Assessment special issue on biodata. International Journal of Selection and Assessment, 7, 55–56.
Hogan, J.B. (1994) Empirical keying of background data measures. In G.S. Stokes, M.D. Mumford and W.A. Owens (Eds), Biodata handbook: Theory, research, and use of biographical information in selection and performance prediction (pp. 69–107). Palo Alto: Consulting Psychologists Press.
Hough, L. and Paullin, C. (1994) Construct-oriented scale construction: The rational approach. In G.S. Stokes, M.D. Mumford and W.A. Owens (Eds), Biodata handbook: Theory, research, and use of biographical information in selection and performance prediction (pp. 109–145). Palo Alto: Consulting Psychologists Press.
Judge, T.A., Bono, J.E., Ilies, R. and Gerhart, M.W. (2002) Personality and leadership: A qualitative and quantitative review. Journal of Applied Psychology, 87, 765–780.
Karas, M. and West, J. (1999) Construct-oriented biodata development for selection to a differentiated performance domain. International Journal of Selection and Assessment, 7, 86–96.
Krukos, K., Meade, A.W., Cantwell, A., Pond, S.B. and Wilson, M.A. (2004) Empirical keying of situational judgment tests: Rationale and some examples. Paper presented at the 19th Annual Meeting of the Society for Industrial/Organizational Psychology, Chicago, IL.
Legree, P.J., Psotka, J., Tremble, T. and Bourne, D.R. (2005) Using consensus based measurement to assess emotional intelligence. In R. Schulze and R.D. Roberts (Eds), Emotional intelligence: An international handbook (pp. 155–180). Cambridge, MA: Hogrefe and Huber.
Liden, R.C. and Arad, S. (1996) A power perspective of empowerment and work groups: Implications for human resources management research. In G.R. Ferris (Ed.), Research in personnel and human resources management (pp. 205–252). Greenwich, CT: JAI Press.
MacLane, C.N., Barton, M.G., Holloway-Lundy, A.E. and Nickels, B.J. (2001) Keeping score: Expert weights on situational judgment responses. Paper presented at the 16th Annual Conference of the Society for Industrial and Organizational Psychology, San Diego, CA.
Mael, F.A. (1991) A conceptual rationale for the domain and attributes of biodata items. Personnel Psychology, 44, 763–792.
McDaniel, M.A., Morgeson, F.P., Finnegan, E.B., Campion, M.A. and Braverman, E.P. (2001) Use of situational judgment tests to predict job performance: A clarification of the literature. Journal of Applied Psychology, 86, 60–79.
McDaniel, M.A. and Nguyen, N.T. (2001) Situational judgment tests: A review of practice and constructs assessed. International Journal of Selection and Assessment, 9, 103–113.
McHenry, J.J. and Schmitt, N. (1994) Multimedia testing. In M.J. Rumsey, C.D. Walker and J. Harris (Eds), Personnel selection and classification research (pp. 193–232). Mahwah, NJ: Lawrence Erlbaum Publishers.
Mead, A.D. (2000) Properties of a resampling validation technique for empirically scoring psychological assessments. Unpublished doctoral dissertation, University of Illinois at Urbana-Champaign.
Mitchell, T.W. and Klimoski, R.J. (1982) Is it rational to be empirical? A test of methods for scoring biographical data. Journal of Applied Psychology, 67, 411–418.
Motowidlo, S.J., Dunnette, M.D. and Carter, G.W. (1990) An alternative selection procedure: The low-fidelity simulation. Journal of Applied Psychology, 75, 640–647.
Mumford, M.D. (1999) Construct validity and background data: Issues, abuses, and future directions. Human Resource Management Review, 9, 117–145.
Mumford, M.D. and Owens, W.A. (1987) Methodology review: Principles, procedures, and findings in the application of background data measures. Applied Psychological Measurement, 11, 1–31.
Mumford, M.D. and Stokes, G.S. (1992) Developmental determinants of individual action: Theory and practice in applying background measures. In M.D. Dunnette and L.M. Hough (Eds), Handbook of industrial and organizational psychology, 2nd Edn (pp. 61–138). Palo Alto: Consulting Psychologists Press.
Mumford, M.D. and Whetzel, D.L. (1997) Background data. In D. Whetzel and G. Wheaton (Eds), Applied measurement methods in industrial psychology (pp. 207–239). Palo Alto: Davies-Black Publishing.
Nickels, B.J. (1994) The nature of biodata. In G.S. Stokes, M.D. Mumford and W.A. Owens (Eds), Biodata handbook: Theory, research, and use of biographical information in selection and performance prediction (pp. 1–16). Palo Alto: Consulting Psychologists Press.
Olson-Buchanan, J.B., Drasgow, F., Moberg, P.J., Mead, A.D., Keenan, P.A. and Donovan, M.A. (1998) An interactive video assessment of conflict resolution skills. Personnel Psychology, 51, 1–24.
Paullin, C. and Hanson, M.A. (2001) Comparing the validity of rationally-derived and empirically-derived scoring keys for a situational judgment inventory. Paper presented at the 16th Annual Conference of the Society for Industrial and Organizational Psychology, San Diego, CA.
Ployhart, R.E. and Ehrhart, M.G. (2003) Be careful what you ask for: Effects of response instructions on the construct validity and reliability of situational judgment tests. International Journal of Selection and Assessment, 11, 1–16.
Schoenfeldt, L.F. (1999) From dust bowl empiricism to rational constructs in biographical data. Human Resource Management Review, 9, 147–167.
Schoenfeldt, L.F. and Mendoza, J.L. (1994) Developing and using factorially derived biographical scales. In G.S. Stokes, M.D. Mumford and W.A. Owens (Eds), Biodata handbook: Theory, research, and use of biographical information in selection and performance prediction (pp. 147–169). Palo Alto: Consulting Psychologists Press.
Smith, K.C. and McDaniel, M.A. (1998) Criterion and construct validity evidence for a situational judgment measure. Poster presented at the 13th Annual Meeting of the Society for Industrial and Organizational Psychology, Dallas, TX, April.
Stokes, G.S. and Searcy, C.A. (1999) Specification of scales in biodata form development: Rational vs. empirical and global vs. specific. International Journal of Selection and Assessment, 7, 72–85.
Such, M.J. and Hemingway, M.A. (2003) Examining the usefulness of empirical keying in the cross-cultural implementation of a biodata inventory. In F. Drasgow (Chair), Resampling and Other Advances in Empirical Keying. Symposium conducted at the 18th Annual Conference of the Society for Industrial and Organizational Psychology.
Such, M.J. and Schmidt, D.B. (2004) Examining the effectiveness of empirical keying: A cross-cultural perspective. Paper presented at the 19th Annual Conference of the Society for Industrial and Organizational Psychology, Chicago, IL.
Vroom, V.H. (2000) Leadership and the decision-making process. Organizational Dynamics, 28, 82–94.
Vroom, V.H. and Jago, A.G. (1978) On the validity of the Vroom–Yetton model. Journal of Applied Psychology, 63, 151–162.
Vroom, V.H. and Yetton, P.W. (1973) Leadership and decision making. Pittsburgh: University of Pittsburgh Press.
Weekley, J.A. and Jones, C. (1997) Video-based situational testing. Personnel Psychology, 50, 25–49.
Weekley, J.A. and Jones, C. (1999) Further studies of situational tests. Personnel Psychology, 52, 679–700.
Wonderlic Personnel Test Inc. (1992) User's manual for the Wonderlic Personnel Test and the Scholastic Level Exam. Libertyville, IL: Wonderlic Personnel Test Inc.