

Journal of Experimental Psychology: Applied, 2007, Vol. 13, No. 4, 249-272
Copyright 2007 by the American Psychological Association. 1076-898X/07/$12.00 DOI: 10.1037/1076-898X.13.4.249

Further Explorations of Perceptual Speed Abilities in the Context of Assessment Methods, Cognitive Abilities, and Individual Differences During Skill Acquisition

Phillip L. Ackerman, Georgia Institute of Technology
Margaret E. Beier, Rice University

Measures of perceptual speed ability have been shown to be an important part of assessment batteries for predicting performance on tasks and jobs that require a high level of speed and accuracy. However, traditional measures of perceptual speed ability sometimes have limited cost-effectiveness because of the requirements for administration and scoring of paper-and-pencil tests. There have also been concerns about the validity of previous computer approaches to administering perceptual speed tests (e.g., see Mead & Drasgow, 1993). The authors developed two sets of computerized perceptual speed tests, with touch-sensitive monitors, that were designed to parallel several paper-and-pencil tests. The reliability and validity of the tests were explored across three empirical studies (N = 167, 160, and 117, respectively). The final study included two criterion tasks with 4.67 and 10 hours of time-on-task practice, respectively. Results indicated that these new measures provide both high levels of reliability and substantial validity for performance on the two skill-learning tasks. Implications for research and application for computerized perceptual speed tests are discussed.

Keywords: computerized testing, assessment, validity, working memory, cognitive ability

Editor's Note. Gerald Matthews served as the action editor for this article.

Author Note. Phillip L. Ackerman, School of Psychology, Georgia Institute of Technology, Atlanta, Georgia; Margaret E. Beier, Department of Psychology, Rice University, Houston, Texas. This research was partially supported by a grant from the Air Force Office of Scientific Research to the first author. Correspondence concerning this article should be addressed to Phillip L. Ackerman, School of Psychology, Georgia Institute of Technology, 654 Cherry Street, MC 0170, Atlanta, GA 30332-0170. E-mail: phillip.ackerman@psych.gatech.edu

With over one hundred years of ability theory, empirical research, and application experience, much is known about many major ability domains, such as verbal, numerical, and spatial content abilities (e.g., see Carroll, 1993). However, there are some ability domains that are not well represented in the large corpus of investigations; two in particular are perceptual speed (PS) ability and psychomotor (PM) ability. Although these two ability classes have been shown to have good validity for prediction of individual differences during skill acquisition (e.g., see Ackerman, 1988; Alderton, Wolfe, & Larson, 1997; Wolfe, 1997, and a review by Levine, Spector, Menon, Narayanan, & Cannon-Bowers, 1996), there are several reasons why these two domains are underrepresented in both the literature and recent application areas. For PM abilities, the main consideration has been the relative cost and difficulty of using customized apparatus tests in both the laboratory and the field. Assessment of PM speed and accuracy (Fleishman, 1954) typically requires a one-to-one examiner-to-examinee ratio, and devices that are expensive to purchase and costly to maintain (e.g., see Fleishman, 1956; Melton, 1947). Some of these difficulties have been addressed with PC software and touch-sensitive computer monitors that can be rapidly prototyped and implemented. In a series of studies, for example, Ackerman and Cianciolo (1999, 2000) found that much of the variance common to apparatus-based tests of fine PM abilities could be captured using off-the-shelf computers, in a way that maximized examiner-to-examinee ratios and involved minimal maintenance issues.

Assessments of PS abilities, however, have been previously limited mostly to the paper-and-pencil testing format (e.g., see Carroll, 1993; Ekstrom, French, Harman, & Derman, 1976; Guilford & Lacey, 1947; Thurstone, 1944). In addition, very few paper-and-pencil tests of PS abilities are constructed in a fashion that allows for the use of high-speed optical scan scoring. Instead, the examinee typically is asked to write directly on the test itself, such as by circling or crossing out target items in an array of targets and distractor stimuli (see Figure 1 for two example tests). For most of these tests, scoring must be done by hand, and it generally takes as long or longer to score each test as it does for each examinee to complete the test. In fact, the scoring process turns into somewhat of a PS test for the individual scorer, as he or she attempts to count correct, missed, and incorrect responses using a template to match to the examinee's responses.

Figure 1. Example perceptual speed tests. Figure 1A (top panel): Canceling Symbols Test. Figure 1B (bottom panel): Summing to 10 Test. Copyright 1992 Phillip L. Ackerman. Reprinted with permission of the author.

In terms of predicting job performance, measures of PS have a long history of documented validity for a variety of different tasks. The Minnesota Clerical Test, which consists of a Number Comparison and a Name Comparison PS test, was developed in the early 1930s, and though it has been revised several times, it has been in continuous use since then for selection in clerical and other occupations (Andrew, Paterson, & Longstaff, 1979). This test and similar measures have been shown to have validity for selection in a wide variety of occupations, such as sewing machine operators (Otis, 1938), census enumerators, calculator operators, hand transcribers
(Ghiselli, 1942), airplane navigators (Guilford & Lacey, 1947), keypunch operators (Hay, 1951), Air Force weather forecasters (Jenkins, 1953), statistical clerks (Hall & Gough, 1977), air traffic controllers (Ackerman & Kanfer, 1993; Sells, Dailey, & Pickrel, 1984), and many others. Current selection of U.S. military applicants using the Armed Services Vocational Aptitude Battery includes one test of PS (Code Speed) and another test that is highly associated with PS abilities (Numerical Operations; see Ackerman [1988] for a discussion).

In general, PS measures appear to have useful levels of predictive validity for performance on jobs that involve speed and accuracy of clerical or other routine operations (e.g., grocery store checkout clerk, bank teller, insurance claims filing, gift wrapping, etc.). Two additional aspects of the validity of PS for occupational performance should be noted. First, the PS tests typically have lower validity coefficients compared with tests of general intellectual ability. Second, PS tests often have validity coefficients that are more robust (i.e., predicting on-the-job performance as well as training success), whereas general ability tests usually show a decline in validity coefficients from training to on-the-job performance (e.g., see Ghiselli, 1966).

There are two related reasons why most PS tests remain mostly in the domain of the paper-and-pencil format. The first reason is that early efforts to computerize some PS tests indicated that they were highly sensitive to format. For example, the Differential Aptitude Test (DAT) Battery-Clerical Speed and Accuracy Test only showed cross-format correlations of 0.37 and 0.40 for two different samples (Henly, Klebe, McBride, & Cudeck, 1989), in comparison to an alternate-form reliability of 0.96 for the paper-and-pencil version (from the test manual; Bennett, Seashore, & Wesman, 1977). That is, a test that involves rapid scanning of a test form and then making a response with paper and pencil appears to tap different underlying processes than selecting an answer from the computer display and then making a response with a keyboard button. This is perhaps partly because the keyboard response taps less direct processes than selecting or writing in a response on the test itself (or writing a response on a separate optical scan response sheet).

A meta-analysis conducted by Mead and Drasgow (1993) of computerized and paper-and-pencil cognitive ability tests indicated that unspeeded (power) ability tests showed high cross-correlations across the two formats (the authors did not report an average observed correlation, but only a correlation corrected for unreliability of the individual tests, which was 0.97). However, the same meta-analysis indicated much lower correlations for speeded tests. The results were based almost entirely on the Numerical Operations and Code Speed subtests from the Armed Services Vocational Aptitude Battery (ASVAB; estimated correlation = 0.79), the DAT Clerical Speed and Accuracy test (estimated correlation = 0.34), and a mixture of information processing tests and the Wonderlic Personnel Test (estimated correlation = 0.60; the authors report that most corrections were based on test-retest reliability estimates, but a few were from coefficient alpha estimates). Mead and Drasgow reported a mean corrected correlation across these measures of 0.72 between paper-and-pencil and computerized measures of speeded tests. These authors also speculated that the extant computer interface used in most of these studies (standard computer monitor displays and standard keyboards) might be partly responsible for the low correlations between the two different media for testing used for the speeded tests, and that "pen-based computer-operating systems" might "reduce or eliminate" these differences (Mead & Drasgow, 1993, p. 453). The second reason for the lack of transition from paper and pencil to computer administration is that many PS tests involve multiple items (both distractors and targets) that may be responded to on the same page. This also suggests that a computer administration might need to involve a different interface than a simple keyboard-based selection of multiple choices.

The extant literature shows that many researchers have adapted a series of specific tests of PS, usually referred to as "processing speed," to computer administration, such as the Posner task or the S. Sternberg memory scanning task. However, as noted by Kyllonen (1985), these tests have relatively little in common with standard paper-and-pencil tests of PS abilities (e.g., Clerical Speed and Accuracy on the DAT or Code Speed on the ASVAB), and thus appear to be missing much of the underlying variance in PS
abilities that is valid for predicting individual differences in performance during skill acquisition. There is little convergent validity evidence for such information processing tasks for any other domain of abilities, given that these tests are not typically administered in the context of other tests that could usefully determine underlying reference factors. However, see the review of "cognitive speed" abilities by Carroll (1993) for an extended discussion of this issue.

With respect to PS abilities in general, there have been two investigations in the past decade that have focused on validating a taxonomy of PS abilities. Carroll (1993), from his review of the literature, identified two types of PS tests and abilities, namely: "1. Tests of speed in locating one or more given symbols or visual patterns in an extended visual field, with or without distracting stimuli . . . including Cancellation, Finding A's . . . Letter Cancellation, and Scattered X's" and "2. Tests of speed in comparing given symbols presented either side-by-side or more widely separated in a visual field. . . . Clerical Checking, . . . Name Comparisons, Numerical Checking . . ." (Carroll, 1993, p. 350).

There are PS ability tests that do not fit readily into either of these two categories, but Carroll's review provides a high-level starting point for a discussion of PS group factors. The work by Ackerman and Cianciolo (2000), described in the next section, further divided the PS construct space into four factors.

Theoretical Background

The theoretical basis for the role of PS abilities in the context of individual differences during skill acquisition was proposed by Ackerman (1988). The theory states that the three major ability determinants of individual differences during skill acquisition parallel the three broad phases of skill acquisition, termed by Fitts and Posner (1967) as Phase 1 "cognitive," Phase 2 "associative," and Phase 3 "autonomous." The three ability classes that corresponded to these phases of skill acquisition were: Phase 1 = general and content (e.g., verbal, math, and spatial) abilities; Phase 2 = PS abilities; and Phase 3 = PM abilities. Although there have been mixed results among empirical tests of the theory (e.g., see Keil & Cortina, 2001), there have been instances where PS tests have been shown to be good predictors of individual differences during task practice, and they often provide significant incremental predictive validity over and above standard measures of general and broad content abilities (e.g., see Ackerman & Cianciolo, 2000).

Working memory (WM) abilities have been studied extensively in the context of PS and general content abilities, but not in the context of skill acquisition. Although there has been some debate about how WM is related to other cognitive abilities such as general intelligence and broad content abilities (Ackerman, Beier, & Boyle, 2005; Kane, Hambrick, & Conway, 2005; Oberauer, Schulze, Wilhelm, & Süß, 2005), it appears that working memory capacity is most highly related to content abilities that underlie novel problem solving and speed of processing (for a meta-analysis, see Ackerman et al., 2005). Because very little research had been conducted on individual differences in WM when Ackerman's (1988) theory was proposed, little is known about how WM fits within the individual-differences-in-performance-during-skill-acquisition framework.

When Ackerman's (1988) theory was originally proposed, relatively little was known about the taxonomic nature of PS abilities. Ackerman proposed that PS ability tests were essentially small learning tasks in and of themselves. That is, effective performance on such tests was mainly attributable to individual differences in the speed and effectiveness of building relatively simple associative strategies (e.g., acquiring a template for the letter "a" embedded in a series of words, as in the Canceling A's test; or memorizing a small number of symbols to match a set of numbers, as in the Digit-Symbol test). To test this proposition, Ackerman (1990) provided extended practice on several PS tests in order to examine whether the transition from early learning to late skilled performance on the PS tests would result in predictive validity changes similar to a transition from general/content tests to PM tests. The results indicated that there were indeed changes in predictive validities with practice on the individual PS tests, but that such changes were quite different depending on the particular test.

Subsequent research indicated that tests that load on PS factors were heterogeneous in terms of the underlying processing requirements, learning opportunities, and item content, which may give rise to different patterns of predictive validity for various PS tests during skill acquisition. For example, in an investigation of the taxonomic representation of PS abilities, Ackerman and Cianciolo (2000; and also Ackerman & Kanfer, 1993) identified as many as four separable factors of PS abilities, including two that correspond at least generally to Carroll's (1993) two factors (Pattern Recognition and Scanning), but also factors of PS-Memory (tapped by PS tests involving substantial demands on short-term or working memory) and PS-Complex (tapped by tests that involve multiple comparisons within single items, and call for higher levels of stimulus pattern integration).

In addition to this taxonomic work, additional research has supported the proposition that tests representing the four PS factors have somewhat differentiable relations to individual differences in task performance during skill acquisition. In particular, Ackerman and Cianciolo (2000) evaluated the predictive validity of the four identified PS factors over task practice for two tasks: the Kanfer-Ackerman Air Traffic Controller (ATC) Task, which is a consistent and straightforward task that allows for the development of automaticity within 4 or 5 hours of time-on-task (see Ackerman, 1988, 1990; Ackerman & Cianciolo, 2000); and the Wesson International Terminal Radar Approach Control (TRACON) Task, which has continuous high demands on reasoning, planning, and spatial abilities, and which does not result in automatized performance within a 10- to 20-hour period of task practice (see also Ackerman, 1992; Ackerman & Kanfer, 1993; Ackerman, Kanfer, & Goff, 1995). In their study, Ackerman and Cianciolo found stable and substantial (r = 0.4 to 0.7) correlations between PS-Complex ability and performance for both the ATC and TRACON tasks; relatively stable correlations between PS-Memory and performance on both tasks (0.3-0.5); negligible correlations between PS-Pattern Recognition ability and performance on either task; and substantial (0.4-0.5) correlations between PS-Scanning ability and performance on the ATC task, but negligible (0.1 or less) correlations between PS-Scanning ability and performance on the TRACON task.

In practice, however, the number of underlying PS factors that can be reliably derived from a battery of tests depends critically on the exact makeup of the battery and, of course, on the characteristics of the sample tested. Because there is a high degree of common variance among many PS tests, one can construct a battery that shows only one or as many as several underlying, but
correlated, PS factors. It should be noted that the high degree of communality among PS tests often yields complex factor loadings for individual tests (indicating that they are factorially complex measures in terms of their underlying demands). This aspect of PS tests means that some of them load significantly on more than one underlying PS factor. For our current purposes, we will not be attempting to validate the four-factor taxonomy of PS ability factors. Rather, we will focus on broad aspects of PS test format and content.

Technology

In the late 1990s, touch-sensitive computer monitors became readily available on the commercial market, with interfaces for desktop PCs, such as the one marketed by MicroTouch before it was acquired by 3M. These monitors allow for both finger and stylus input (e.g., the TouchPen, a stylus similar in size to a pen, but tethered to the monitor). Regardless of finger or stylus input, the system codes the x,y coordinates and timing when contact is made with the monitor, similar to the information provided by a mouse interface. (This technology preceded the current Tablet PC systems that have entered the market over the past four years, but is similar in operation.) In previous research with these devices, several PM tests were created (e.g., Tapping, Alternate Tapping, Choice and Serial Reaction Time, Maze Tracing, Pursuit Maze Tracing, Rotary Pursuit, and Mirror Star Tracing). These tests, while having some surface-level differences compared with apparatus PM tests, were found to have good reliability and both construct and criterion-related validity (see Ackerman & Cianciolo, 1999, 2000).

Having demonstrated the viability of the touch-sensitive monitor platform for PM assessment, we sought to create a set of PS tests using the same general platform. For tests that involve presentation of one stimulus on the display at a time (e.g., Name Comparison), this is a relatively straightforward implementation. For tests that involve scanning a display of multiple targets and distractors, the implementation is more difficult (because it requires collection of multiple data points within the same screen). In both cases, however, one must keep in mind that the psychometric characteristics (e.g., reliability and validity) of the PS tests may be compromised when the underlying processes required for competent test performance differ from those that underlie the paper-and-pencil versions of the same tests. At a surface level, at least, the use of the TouchPen stylus does more closely mimic the use of a pencil in the traditional paper-and-pencil tests. Also, the computer display does have similar, but not identical, characteristics compared with paper. The main limitation of most off-the-shelf computer displays is that of resolution. For some tests that require highly detailed stimuli (e.g., the Dial Reading Test; see Guilford & Lacey, 1947) or a large amount of text information in a small font (e.g., the Table Reading Test; see Guilford & Lacey, 1947), standard computer displays are not yet adequate for providing the level of detail necessary to match that of paper. For example, paper tests might have as high a resolution as 2400 dots per inch or higher, if lithographic printing is used (though more typically the level of resolution for xerographic copies is about 300-600 dots per inch). In contrast, a 17-inch computer monitor with a 1280 x 1024 resolution only reaches about 100 pixels per inch.

The critical issues for PS tests that might be adapted to computer administration, however, pertain to the reliability and the validity (both construct and criterion-related) of the measures administered with this technology. Should these tests provide acceptable levels of psychometric characteristics, the advantage of using the computer for assessment over paper and pencil for PS tests could be marked by improvements in relative cost and accuracy. The accuracy advantage is based on the fact that hand-scoring paper-and-pencil tests is error prone, and often requires averaging responses from two or more scorers to get reliabilities above 0.90. The computer essentially provides near-perfect scoring reliability in this context. The efficiency of the computer is obvious, in that scores can be calculated online and exported to a database program, without any appreciable time and effort on the part of the examiner.
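The article does not describe its testing software internals; purely as an illustration of the kind of event record such a touch interface yields (all names here are hypothetical, not from the study software), a minimal sketch might look like this:

```python
from dataclasses import dataclass

@dataclass
class TouchEvent:
    x: int     # horizontal screen coordinate, in pixels
    y: int     # vertical screen coordinate, in pixels
    t_ms: int  # contact time relative to item onset, in milliseconds

def inside(event: TouchEvent, box: tuple) -> bool:
    """True if the touch landed inside a response-option box (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return x0 <= event.x <= x1 and y0 <= event.y <= y1
```

Because each contact carries both a position and a time stamp, a response can be scored by testing the touch against each option box in turn, with latency logged automatically alongside the choice; this is what makes fully automated scoring possible.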
Overview of Studies

The main goal for the series of studies reported here was to determine whether it is possible to obtain measures of PS abilities with acceptable psychometric properties that could be administered in a computerized format, and to further validate these measures for predicting individual differences during skill acquisition. Three studies will be reported in this article. In Study 1, PS tests involving single-stimulus presentations were created for computer administration, and assessed in the context of administration of both computerized and paper-and-pencil test versions. These tests represent a broad sample of underlying PS factors (Scanning, Memory, Pattern Recognition) and, to a lesser degree, the PS-Complex factor. In Study 2, the battery of PS tests was expanded to include those that involve simultaneous multiple-stimulus arrays and responses on the display, and the psychometric characteristics of these new tests were examined in the context of the single-stimulus based computerized PS tests. The new tests administered in Study 2 all involve a salient association with PS-Pattern Recognition. In Study 3, a battery of single- and multiple-stimulus based PS tests was administered, along with practice on two criterion skill acquisition tasks, to examine the criterion-related validity of the new PS tests. In each of the studies, reference content ability tests and PM tests were also administered, to provide additional evidence regarding convergent and discriminant construct validation.

Study 3 also included a battery of WM measures. Much research has examined the underlying processes associated with working memory capacity and the relations among working memory and other abilities (for a review, see Ackerman et al., 2005). However, there has been relatively little research on the predictive validity of working memory measures for skill acquisition or skilled performance, and more specifically, whether or not working memory accounts for incremental variance in skill acquisition over and above measures usually used in selection (general ability, PS, and PM abilities). The lack of empirical evidence in this domain was the main impetus for examining the validity of WM measures for the prediction of skilled performance in Study 3.

Study 1

Predictions

Four broad sets of predictions were made for this study. First, the internal reliability of the new computerized PS tests would be generally equivalent to that of the traditional paper-and-pencil versions of the PS tests, and would exceed 0.80 for all tests. Second, we expected significant practice effects between the initial and second administration of the computerized PS tests, on the order of d = 0.5 to 1.5 (based on both theory regarding PS tests and the previous literature with paper-and-pencil PS tests; see Ackerman & Cianciolo, 2000). Third, we expected substantial (greater than r = 0.50) correlations between each of the parallel paper-and-pencil and computerized PS tests. We expected a higher correlation to be found for composite measures of the two sets of PS tests (r > 0.80), based on the principle of aggregation (e.g., see Rushton, Brainerd, & Pressley, 1983). Finally, we expected a high degree of overlap in the construct validity of the paper-and-pencil and computerized PS tests. That is, we expected similar patterns of correlations between a composite paper-and-pencil PS measure and content ability measures (i.e., estimates of verbal, numerical, and spatial abilities), in comparison with correlations between a composite computerized PS measure and the content ability measures.

Method

Participants. Participants were recruited from undergraduate psychology courses and from the campus at large at Georgia Institute of Technology (through flyers distributed at random in campus mailboxes). Inclusion criteria were that participants be native English speakers; have normal or corrected-to-normal hearing, vision, and motor coordination; and be between 18 and 30 years old. One hundred sixty-seven adults participated. The sample had 80 men and 87 women (M age = 20.71, SD age = 1.72, range = 18-27 years).

Apparatus. Pencil-and-paper testing was administered in a laboratory with prerecorded instructions and directions presented over a public address system. Up to 14 examinees were tested at a time. Computerized testing for the PS and PM tests was administered at individual carrels, on Dell Pentium computers running Windows XP with Sony G220 17-inch monitors fitted with 3M/MicroTouch touch-sensitive panels. Instructions were presented visually on the computer screen and also auditorily over headphones. Participants responded by placing a TouchPen on the screen location corresponding to the selected item.

Reference ability tests. Tests administered in this and the other studies are described in detail in the Appendix, along with procedural information (e.g., testing time). Reference content ability tests were administered for verbal, numerical, and spatial ability factors. Four tests were administered for each factor. Verbal reference tests included Multidimensional Aptitude Battery (MAB)-Similarities, Word Beginnings, MAB-Comprehension, and ETS Extended Range Vocabulary. Numerical reference tests included Math Knowledge, Arithmetic, Science Research Associates (SRA) Number Series, and Math Approximation. Spatial reference tests included ETS Cube Comparison, Spatial Analogy, Paper Folding, and Spatial Orientation.

Reference tests for PM ability included Single Tapping, 8-Choice Serial Reaction Time (RT), 4-Choice Serial RT, and 2-Choice Serial RT. For additional details on these tests, see the Appendix and Ackerman and Cianciolo (1999).

Paper-and-pencil PS tests. Eleven paper-and-pencil PS tests were administered, to provide assessments that would parallel the new computerized PS tests. Tests were (1) Name Comparison, (2) Factors of 7, (3) Digit-Symbol, (4) Number Comparison, (5) Coding, (6) Summing to 10, (7) Directional Headings 1, (8) Directional Headings 2, (9) Clerical Abilities (CA)-2, (10) Letter/Number Substitution, and (11) Naming Symbols.

Computerized PS tests. Eleven computerized PS tests, parallel in content to the paper-and-pencil tests, were administered on the computers, with TouchPen responses. For each test, a single stimulus was presented on the screen, and the participant marked his or her selected response from a choice of option boxes by pressing on the screen with the end of a TouchPen stylus. For each test item, as soon as the response was made, the next stimulus was presented without delay. With the exception of the two Directional Headings PS tests (a single 2-minute part each), each test was administered with three 90-sec test parts separated by a 5-sec delay, for a total testing time of 4.5 minutes per test.

These tests were selected to broadly sample the range of PS tests and factors from extant research (e.g., Ackerman & Cianciolo, 1999). Three tests were selected from the PS-Scanning factor (Name Comparison, Number Comparison, CA-2). Three tests were selected from the PS-Memory factor (Digit-Symbol, Naming Symbols, and Coding). Two tests were selected from PS-Complex (Directional Headings 1 and Directional Headings 2), and three tests were selected that had somewhat complex loadings on PS-Scanning, PS-Memory, and PS-Pattern Recognition (Letter/Number Substitution, Factors of 7, and Summing to 10). Both the PS-Pattern Recognition and PS-Complex factors were substantially undersampled in this study; the first because most PS-Pattern Recognition tests require searching through a multiple-stimulus display, and the second because the PS-Complex tests are not amenable to computer display (e.g., the prototypical PS-Complex task, the Dial Reading Test [see Ackerman & Kanfer, 1993], requires fine discrimination of highly detailed figures that cannot be displayed at high enough resolution on standard computer monitors; the paper version is printed at at least 300 dots per inch).

Procedure

The study took place over three 3-hour sessions, totaling 9 hours. The sessions were separated by at least 24 hours, and no more than 48 hours. Each session included some amount of paper-and-pencil ability testing or an unrelated noncomputerized learning task. Five-minute breaks were given after every hour of testing. In Session 1, participants completed the 11 paper-and-pencil PS ability tests, with several content reference tests interspersed between groups of PS tests. That is, the first five PS tests were administered, followed by three reference content ability tests (one each of the verbal, numerical, and spatial tests), followed by a 5-minute break. Next, six additional PS tests were administered, followed by four reference content ability tests.

Session 2 started with the PM tests, followed by a 5-minute break, and then the first five computerized PS tests (in the same order as the paper-and-pencil PS tests administered in Session 1). After another 5-minute break, participants completed five content ability tests, another 5-minute break, and then the remaining six computerized PS tests. In Session 3, participants completed a repetition of the 11 computerized PS tests with new item orders, in the same groups of five and six tests, though the two groups of tests were separated by an unrelated learning task and additional breaks that lasted 50 minutes.
Participants received $100, research credit, or some combination of the two for participation.

Results

Descriptive statistics of the PS tests will be presented first, along with internal, test-retest, and cross-format (paper-and-pencil vs. computerized PS test) correlations. Next, the descriptive statistics for the content and PM reference tests will be presented, along with a factor analysis used in support of building ability composite measures. Finally, correlations between the sets of PS tests and content and PM abilities will be presented.
PS tests: Descriptive statistics and reliability. Means and SDs for the test and retest administrations of the computerized PS tests are provided in Table 1, along with a dependent t test for the difference in means, and effect sizes expressed as Cohen's d (using the formula for within-subject comparisons suggested by Dunlap, Cortina, Vaslow, & Burke, 1996). With the exception of the Number Comparison test (d = 0.02), all of the PS tests showed significant increases in performance from test to retest occasions, ranging from small to medium in magnitude. The rule of thumb adopted here is that d values of 0.20 to 0.49 are small effects, 0.50 to 0.79 are medium-sized effects, and those greater than 0.80 are large effects (e.g., see Cohen, 1988). These results are consistent with practice effects found with paper-and-pencil versions of the PS tests in other studies (e.g., see Ackerman & Cianciolo, 2000).
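For reference, the within-subject effect size of Dunlap et al. (1996) uses the dependent-samples t statistic but removes the inflation introduced by the test-retest correlation, so that d is expressed in the raw-score metric:

$$ d = t_c \sqrt{\frac{2(1 - r)}{n}} $$

where $t_c$ is the dependent t value, $r$ is the test-retest correlation, and $n$ is the number of participants. As a check against Table 1 below: for Name Comparison, $t_c = 7.92$, $r = 0.84$, and $n = 167$ give $d = 7.92\sqrt{2(1 - 0.84)/167} \approx 0.35$, matching the tabled value.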
Internal test reliabilities (based on multiple-part tests and the Spearman-Brown prophecy formula) for the eight paper-and-pencil tests with multiple test parts ranged from 0.89 to 0.95 (M = 0.90). In comparison, internal test reliabilities computed for the 11 computerized PS tests ranged from 0.83 to 0.96 (M = 0.91). Such results are consistent with the conceptualization of PS tests as having highly homogeneous items. Test-retest reliabilities for the computerized PS tests were also generally quite substantial, ranging from 0.72 to 0.84 (M = 0.78), especially when one takes into account both the general findings of significant practice effects and the fact that each test was only 4.5 minutes in duration.
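The table notes below describe this reliability computation only briefly. A minimal Python sketch of the two steps involved follows; the three inter-part correlations are hypothetical values for illustration, not data from the study:

```python
import numpy as np

def mean_r_via_z(rs):
    """Average correlations through Fisher's r-to-z transform and back."""
    zs = np.arctanh(rs)               # r-to-z transformation
    return float(np.tanh(zs.mean()))  # mean z, transformed back to r

def spearman_brown(r_part, k):
    """Prophecy formula: reliability of a test k times the length of one part."""
    return k * r_part / (1 + (k - 1) * r_part)

# Hypothetical inter-part correlations for one three-part, 4.5-minute PS test
part_correlations = np.array([0.78, 0.81, 0.76])   # parts 1-2, 1-3, 2-3
r_bar = mean_r_via_z(part_correlations)
print(round(spearman_brown(r_bar, k=3), 2))        # full-test internal reliability
```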

As expected, given the difference between response formats and the novelty of the computerized test apparatus, the correlations between the paper-and-pencil versions of the PS tests and the computerized PS tests were smaller than the test-retest correlations for the computerized PS tests. Correlations ranged from 0.52 to 0.78 between the paper-and-pencil tests and the initial administration of the computerized PS tests, with a mean of r = 0.62, and they ranged from 0.48 to 0.81 for the paper-and-pencil tests and the retest of the computerized PS tests, with a mean of r = 0.63.

Table 1
Study 1. Psychometric Characteristics of Computerized PS Versus Paper-and-Pencil PS Tests
(Columns for the computerized tests: Initial Test A, M / SD / rxx(a); Retest B, M / SD; correlations with the paper-and-pencil version, r(P&P,A) / r(P&P,B); test-retest r(A,B); t(B-A); d.)

Computerized PS tests
1. Name Comparison: 29.19 6.20 0.89 | 31.73 7.21 | 0.78 0.81 | 0.84 | 7.92** | 0.35
2. Factors of 7: 65.39 16.77 0.96 | 78.05 14.37 | 0.57 0.65 | 0.78 | 14.89** | 0.76
3. Digit-Symbol Substitution: 60.61 10.97 0.94 | 65.49 10.79 | 0.62 0.62 | 0.80 | 8.96** | 0.44
4. Number Comparison: 32.39 6.76 0.87 | 32.39 6.70 | 0.69 0.69 | 0.74 | 0.41 | 0.02
5. Coding: 51.71 8.92 0.93 | 55.69 9.77 | 0.60 0.55 | 0.83 | 8.70** | 0.39
6. Sum to 10: 86.75 10.50 0.93 | 91.24 10.48 | 0.56 0.57 | 0.76 | 7.64** | 0.41
7. Directional Headings 1: 64.84 16.35 0.88 | 73.78 17.67 | 0.58 0.58 | 0.74 | 9.12** | 0.51
8. Directional Headings 2: 36.56 12.17 0.86 | 42.94 11.63 | 0.61 0.67 | 0.72 | 8.94** | 0.52
9. CA-2: 9.15 2.44 0.83 | 10.24 2.78 | 0.67 0.71 | 0.78 | 8.32** | 0.43
10. Letter/Number Substitution: 57.29 9.74 0.88 | 58.73 11.37 | 0.52 0.48 | 0.80 | 2.72** | 0.13
11. Naming Symbols: 79.33 9.80 0.92 | 83.47 10.31 | 0.53 0.50 | 0.79 | 7.38** | 0.37

Paper-and-pencil PS tests (M / SD / rxx(a))
1. Name Comparison: 28.61 6.99 0.89
2. Factors of 7: 52.38 15.55 0.91
3. Digit-Symbol Substitution: 84.21 19.50 0.94
4. Number Comparison: 29.51 6.91 0.93
5. Coding: 45.26 10.37 0.94
6. Sum to 10: 66.04 11.81 0.95
7. Directional Headings 1: 60.00 19.97 —
8. Directional Headings 2: 46.26 14.89 —
9. CA-2: 37.70 8.57 —
10. Letter/Number Substitution: 66.65 12.39 0.94
11. Naming Symbols: 110.82 20.11 0.89

Note. — = cannot be computed (single-part test). For Directional Headings, only Directional Headings 1 was administered in the paper-and-pencil version; Directional Headings 1 and 2 were administered in the computerized version.
a Reliability for paper-and-pencil tests, and for initial computerized tests, computed by calculating the mean (via r-to-z transformation) correlation across three test parts, and then computing reliability via the Spearman-Brown prophecy formula.
** p < .01.
Reference content tests. Descriptive statistics for the reference tests are shown in Table 2. In addition, a confirmatory factor analysis (CFA) was computed using LISREL 8.7 (Jöreskog & Sörbom, 2006) for the reference tests; the factor loadings are also shown in Table 2. The factor solution indicated that the reference tests are good markers for the underlying factors identified in a priori selection, χ²(98, N = 167) = 152.92, p < .05, comparative fit index (CFI) = .97, root mean squared error of approximation (RMSEA) = 0.055. (Note: CFI values greater than .90 and RMSEA values less than .10 indicate adequate model fit; Byrne, 1998.) Correlations among the factors are also provided in Table 2. For further analyses, unit-weighted z-score composites of the ability measures were constructed (e.g., see Cohen, 1990; Thorndike, 1986).

Table 2
Study 1. Reference Tests: Descriptive Statistics and Factor Loadings
(Columns: # of items, M, SD, α, loading on the test's a priori factor.)

Verbal
1. MAB Similarities: 34 | 26.03 | 3.70 | 0.64 | 0.61
2. MAB Comprehension: 28 | 21.23 | 3.04 | 0.64 | 0.48
3. Vocabulary: 48 | 20.56 | 7.07 | 0.78 | 0.82
4. Word Beginnings: open-ended | 29.99 | 9.44 | 0.59a | 0.61
Numerical
5. Math Knowledge: 32 | 22.71 | 5.81 | 0.80 | 0.78
6. Arithmetic: 20 | 7.58 | 4.26 | 0.72 | 0.61
7. Number Series: 20 | 10.93 | 2.65 | 0.72 | 0.61
8. Math Approximation: 40 | 18.58 | 6.89 | 0.84 | 0.90
Spatial
9. Cube Comparison: 42 | 25.04 | 9.35 | 0.87 | 0.73
10. Spatial Analogies: 30 | 20.56 | 5.02 | 0.79 | 0.73
11. Paper Folding: 24 | 15.31 | 5.57 | 0.83 | 0.78
12. Spatial Orientation: 20 | 9.12 | 4.29 | 0.68 | 0.67
Psychomotor (PM)
13. Tapping: 5 | 95.99 | 15.01 | 0.93 | 0.36
14. 8-Choice Serial RT: 50 | 3700.90 | 425.38 | 0.94a | 0.82
15. 4-Choice Serial RT: 50 | 1596.91 | 187.91 | 0.94a | 0.96
16. 2-Choice Serial RT: 50 | 815.37 | 96.02 | 0.94a | 0.88

Correlations between factors (columns: Verbal, Numerical, Spatial)
Numerical 0.37
Spatial 0.42 0.55
Psychomotor 0.34 0.26 0.41

Note. PM = psychomotor; MAB = Multidimensional Aptitude Battery; RT = reaction time. For all tests except Tapping and Serial RT, units are number of items correct minus items wrong. For Tapping, the units are the average total number of taps per trial. For Serial RT, units are ms for correct responses.
a Reliability calculated from the Part 1-Part 2 correlation and the Spearman-Brown prophecy formula.
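Unit-weighted composites of the kind described above weight each test equally by standardizing scores before averaging. A minimal sketch, using hypothetical scores rather than the study's data:

```python
import numpy as np

def unit_weighted_composite(scores):
    """Unit-weighted composite: z-standardize each test (column), then average,
    so every test contributes equally regardless of its raw-score scale."""
    z = (scores - scores.mean(axis=0)) / scores.std(axis=0, ddof=1)
    return z.mean(axis=1)

# Hypothetical data: 167 examinees by 4 verbal reference tests
rng = np.random.default_rng(seed=0)
verbal_scores = rng.normal(loc=50, scale=10, size=(167, 4))
verbal_composite = unit_weighted_composite(verbal_scores)  # one score per person
```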

Validity. Factor analysis of the paper-and-pencil PS tests and of the computerized PS tests, using the Montanelli-Humphreys (1976) parallel analysis criterion, suggests that as many as four underlying factors could be derived from the PS measures. However, deriving a single factor for each set of measures illustrates that there is a substantial degree of communality among the various measures, such that nearly half of the variance is captured by a single general PS factor in each case (see Table 3). The single factor accounts for 48.6% of the variance in the set of 11 paper-and-pencil PS tests that are common to the computerized PS test set. (For comparison purposes, computation of a four-factor solution indicated that the second, third, and fourth factors accounted for only 7.2%, 6.0%, and 4.1% of the variance, respectively.) A single factor accounts for 51.6% of the variance in the initial computerized PS tests, and a similar analysis for the retest scores yields a factor that accounts for 52.7% of the variance in the retest set of computerized tests.

Table 3
Study 1. Factor Loadings (Single-Factor Solutions) for Paper-and-Pencil and A and B Sets of Computerized Versions of PS Tests
(Columns: Paper and pencil, Computer A, Computer B.)

1. Name Comparison: 0.699 | 0.726 | 0.673
2. Factors of 7: 0.664 | 0.662 | 0.784
3. Digit-Symbol Substitution: 0.633 | 0.681 | 0.694
4. Number Comparison: 0.754 | 0.644 | 0.736
5. Coding: 0.617 | 0.720 | 0.731
6. Sum to 10: 0.722 | 0.667 | 0.711
7. Directional Headings 1: 0.740 | 0.788 | 0.692
8. Directional Headings 2: 0.764 | 0.725 | 0.807
9. CA-2: 0.709 | 0.726 | 0.793
10. Letter/Number Substitution: 0.635 | 0.788 | 0.677
11. Naming Symbols: 0.714 | 0.755 | 0.770

Note. PS = perceptual speed.

Although we had no reason to expect that the change of test administration mode would result in strictly equivalent tests, we computed a test of measurement equivalence for the two sets of tests (paper and pencil vs. computerized). To conduct this test, a model was created with two correlated factors (PS Paper-and-Pencil and PS Computerized), each with 11 indicators (for purposes of this test, the first administration of the computerized tests was used). Fit of the initial model was poor, given that as many as four underlying factors can be derived from the common variance among PS tests, χ²(208, N = 167) = 1164.80. Because of the substantial amount of common variance among the factors (see above), and because the purpose of this analysis was not to identify the underlying factor structure of the PS tests, we used this model as the basis of subsequent χ² difference tests to evaluate measurement equivalence. Specifically, we set the paths from the underlying factors to the same tests (different format) to be equal for all tests.
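The measurement-equivalence test just described is a standard nested-model (chi-square difference) comparison; stated for reference:

$$ \Delta\chi^2 = \chi^2_{\text{constrained}} - \chi^2_{\text{free}}, \qquad \Delta df = df_{\text{constrained}} - df_{\text{free}} $$

Under the null hypothesis that the added equality constraints hold, the difference is itself chi-square distributed with Δdf degrees of freedom, so a significant Δχ² for a given cross-format loading pair signals that the test does not measure its factor equivalently across formats.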
The resulting χ² difference tests were significant (p < .01) for only two of the tests, Coding and Naming Symbols, indicating that these measures were not strictly equivalent from paper-and-pencil to computerized administrations, Δχ²(1, N = 167) = 64.52 and Δχ²(1, N = 167) = 18.39, respectively. We also tested a model that combined the paper-and-pencil and computerized factors, to test for factorial equivalence. The results of this model showed that there was a significant difference in model fit for a one-factor versus a two-factor solution, Δχ²(1, N = 167) = 132.62 (p < .01), indicating that there are differences in the underlying factor structure given the format of the test. Thus we conclude that while the measures show evidence of convergent and discriminant relations, the paper-and-pencil and computerized tests are not strictly equivalent.

As with the reference tests, unit-weighted z-score composites were created for the 11 common paper-and-pencil PS tests and for the two administrations of the computerized PS tests. Correlations between these composites and the reference ability composites are shown in Table 4. The patterns of correlations with the reference abilities are largely similar for each of the sets of PS tests (higher correlations with Numerical and PM abilities, and smaller correlations with Verbal and Spatial abilities), but the correlations between these aggregate scores and a general ability composite (which consists of equally weighted Verbal, Numerical, and Spatial ability composite scores) were significantly higher for the computerized PS test composites in comparison with the paper-and-pencil PS test composite (0.596 and 0.591 for the two computerized PS test sets vs. 0.500 for the paper-and-pencil PS tests; see Table 4). Although this is not a large difference in practical terms, it is interesting to note that this difference in correlations is found in spite of common methods among the reference tests and PS tests (that is, both using a paper-and-pencil method of assessment). In contrast, the computerized PS test composites were also significantly more highly correlated with the PM ability composite than the paper-and-pencil PS test composite, but here there was a common method for the two sets of computerized tests.

Table 4
Study 1. Correlations Between Composite Perceptual Speed Scores and Content Ability Composites, With t Tests for Differences
(Columns: P&P, Computer A, Computer B, t(P&P,A), t(P&P,B).)

1. Verbal: 0.298 | 0.357 | 0.446 | -1.10 | -3.00**
2. Numerical: 0.491 | 0.546 | 0.532 | -1.15 | -0.88
3. Spatial: 0.362 | 0.438 | 0.378 | -1.47 | -0.32
4. Psychomotor: 0.455 | 0.575 | 0.550 | -2.54** | -2.06*
General (V + N + S): 0.500 | 0.596 | 0.591 | -2.08* | -2.04*

Note. P&P = paper and pencil; V + N + S = Verbal + Numerical + Spatial. N = 167, df = 164. Composite P&P PS versus Composite Computer A: r = 0.730. Composite P&P PS versus Composite Computer B: r = 0.751. Composite Computer A with B: r = 0.889.
* p < .05. ** p < .01.

For the eight test pairs that had reliability estimates, computation of cross-modality correlations, after correcting for unreliability of the individual tests, increased the mean correlation to 0.74 for initial tests and retests of the computerized tests. This made it clear that the tests did not correlate to the limit of their respective reliabilities (otherwise the corrected correlations would be near unity). Similar results were obtained by correlating aggregate scores for the paper-and-pencil and computerized PS tests. That is, at the composite ability level, the correlations between the paper-and-pencil PS test composite and the computerized PS test composites were 0.73 and 0.75, respectively, for the test and retest sets, while the correlation between the test and retest computerized PS composites was r = 0.89.
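The correction for attenuation referred to above is the classical Spearman formula; stated for reference, with an illustrative calculation (not a value reported in the study):

$$ \hat{r}_{xy} = \frac{r_{xy}}{\sqrt{r_{xx}\, r_{yy}}} $$

where $r_{xy}$ is the observed cross-format correlation and $r_{xx}$ and $r_{yy}$ are the reliabilities of the two tests. For example, an observed correlation of 0.62 between tests with reliabilities of 0.90 and 0.91 disattenuates to $0.62/\sqrt{0.90 \times 0.91} \approx 0.69$, still well below unity.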
Discussion

In general, the predictions were confirmed. The computerized PS tests showed good psychometric characteristics. The new tests showed high levels of internal reliability (ranging from 0.83 to 0.93; M = .91), similar to the values found for the paper-and-pencil PS tests. In addition, there were reasonably high levels of test-retest reliability (ranging from 0.74 to 0.84; M = 0.78), especially in light of the fact that these tests were each only about 4.5 minutes in length. Consistent with expectations that the PS tests are "miniature learning tasks" (Ackerman, 1990), all but one of the tests showed significant mean improvement from initial test to retest, though the improvement ranged from small to medium-sized magnitudes (0.13 to 0.75, excluding the nonsignificant change of 0.02 for Number Comparison; mean improvement d = 0.39).

For evaluation of validity, the direct correlations between paper-and-pencil and computerized PS tests indicated that the correlations were substantial, but they did not indicate that the two testing formats were essentially identical. Direct correlations ranged from 0.52 to 0.78 for the initial administration (M = 0.62), and from 0.48 to 0.81 for the retest versions of the test (M = 0.63). Although multiple factors could be derived from the 11 new PS tests, a single factor accounted for similar amounts of variance among these new tests and the paper-and-pencil PS tests (51% vs. 49% for computerized and paper-and-pencil tests, respectively). Evaluation of convergent and discriminant validity was provided by comparison of the respective composite PS test scores in their correlations with verbal, numerical, spatial, PM, and a general ability composite (verbal + numerical + spatial). The pattern of correlations was similar for both the paper-and-pencil and computerized PS tests (lower correlations with verbal and spatial abilities, and higher correlations with numerical and PM abilities). However, the computerized versions of the PS tests had a higher level of communality with the general ability composite (sharing 36% of the variance vs. 25% of the variance for the paper-and-pencil PS composite), and also a higher level of communality with the PM ability composite (33% of the variance vs. 21% of the variance for the paper-and-pencil PS composite).
Study 2

The goal of the next study was to develop and evaluate a series of PS tests that allowed for the simultaneous presentation of multiple stimuli (including target and distractor stimuli) on the computer screen. This would essentially allow the examinee to scan and select multiple stimuli, which is similar to the format encountered on several paper-and-pencil PS tests. For example, in the paper-and-pencil version of the Scattered X's test, the examinee is presented with sheets of paper that have a random distribution of letters and spaces, and five "x" stimuli. The examinee must scan the page for the x's, and then circle them. For computer administration, it is necessary to present a display with similar characteristics (targets and distractors), and to allow the examinee to select multiple items without starting over after each target selection. The strategy we adopted was to present each of the target and distractor stimuli on the screen simultaneously. When the examinee touched a stimulus location with the TouchPen, the item was changed to indicate that it had been selected (usually by placing a red "x" on top of the stimulus). The stimulus was still visible, and if the examinee chose to change his or her answer, the examinee needed only to touch the location again (and the red "x" disappeared). At the bottom of each screen was a "Next Page" button that the examinee pressed when that screen was completed.
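A minimal sketch of the toggle-and-score behavior just described follows; the function and variable names are illustrative, not taken from the study software, and the scoring rule (number correct minus number wrong) is the one reported for these tests:

```python
# Screen positions currently marked with a red "x"
selected = set()

def on_touch(pos):
    """Each touch toggles the selection mark at that stimulus position."""
    if pos in selected:
        selected.discard(pos)  # second touch removes the red "x"
    else:
        selected.add(pos)      # first touch marks the stimulus as selected

def score_page(targets):
    """Score a completed page as number of items correct minus number wrong."""
    hits = len(selected & targets)
    false_alarms = len(selected - targets)
    return hits - false_alarms
```

The key design point is that selection is stateful and reversible within a page, so examinees can scan freely and correct themselves, just as they can with a pencil and an eraser-free strikethrough on paper.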
The evaluation of these new tests proceeded in a similar fashion to that used in Study 1. The new tests were administered in a test-retest format to provide reliability and practice-effects data. Rather than administering parallel tests for each new test, we administered a selected battery of the single-stimulus display computerized PS tests so that we could evaluate cross-battery correlations. However, we also included a battery of reference content and PM tests, so that the construct validity of the new measures could be examined. The predictions for this experiment were similar to those of Study 1. We expected the new tests to have high levels of internal reliability (greater than 0.80), and to show marked practice effects (d = 0.5-1.5). We expected that a composite from these new tests would be substantially correlated with a composite from the single-stimulus display PS tests, but not as highly as would be expected from parallel tests. Specifically, we expected a correlation in the range of r = 0.5 to 0.6, based on the following: (a) all of the multiple-stimulus PS tests involve a dominant association with PS-Pattern Recognition; (b) most of the single-stimulus PS tests administered in the study involved dominant associations with PS-Scanning, PS-Memory, PS-Complex, or some combination of these factors; and (c) the correlations among the four PS factors in previous research were generally in the range of r = 0.5 to 0.6. Finally, we expected that a composite from the multiple-stimulus PS tests would be more highly correlated with PM ability than would the single-stimulus PS tests, but would otherwise show a similar pattern of correlations with the content ability measures, given the expected relatively high communality among the various PS measures.

Method

Participants. Participants were selected in the same manner as in Study 1. One hundred sixty adults participated. The sample had 90 men and 70 women (M age = 21.06, SD age = 1.74, range = 18-27 years). None of these individuals had participated in Study 1.

Apparatus

Reference ability tests. The same content and PM reference ability tests used in Study 1 were used in Study 2, with the following exception: for verbal ability, the Word Beginnings test was replaced with a Cloze test, which we expected to provide a somewhat more robust indicator for the Verbal ability factor.

PS tests. Seven of the 11 computerized PS tests from Study 1 were administered in Study 2 (see Appendix), where a single item appeared on the screen at a time. In addition, six new PS tests or new versions of the other tests were administered. Two of the tests (Factors of 7 and Summing to 10) represented a different format from that used in Study 1. The other four tests (Canceling Symbols, Finding a and t, Scattered X's, and Finding ∈ and ¥) were new tests, based on other paper-and-pencil measures (e.g., see Ackerman & Cianciolo, 2000). Each of these new/revised tests was constructed in such a way as to mimic the multiple simultaneous stimulus display that is similar in format to the paper-and-pencil versions of these tests. That is, by providing a full screen of stimuli, the tests allowed participants to use or develop scanning strategies that involve searching for the matching stimuli, rather than making a discrete and explicit "yes" or "no" (or "same" or "different") type response for each individual stimulus in isolation. Participants were presented with screens of as many as 286 items on a page, and were told to advance to the next page when they had completed the items. All of these tests are markers for a PS-Pattern Recognition factor (see Ackerman & Cianciolo, 1999), though both Factors of 7 and Finding ∈ and ¥ also load significantly on a PS-Memory factor.

Procedure

The study took place over two 3-hour sessions, totaling 6 hours. The sessions were separated by at least 24 hours, and no more than 48 hours. Procedural details (e.g., breaks, interspersing content and PS test sets) were the same as in Study 1, with the exceptions noted below. In Session 1, participants first completed the PM tests, followed by the first administration of the six multiple-item-display PS tests. Participants then completed a set of the content reference tests, followed by a break, some additional unrelated measures, and one more spatial ability test. In Session 2, participants started with the seven PS tests from Study 1, followed by a break and the remaining reference tests. Following a final 5-minute break, the participants completed retesting of the six new multiple-item-display PS tests (with new item orders). Participants received $60 for participation in the study.

Results

Descriptive statistics of the two sets of PS tests will be presented first, along with internal, test-retest, and cross-format (single-stimulus vs. multiple-stimulus display computerized PS test) correlations. Next, the descriptive statistics for the content and PM reference tests will be presented, along with a factor analysis used in support of building ability composite measures. Finally, correlations between the sets of PS tests and content and PM abilities will be presented.

Descriptive statistics. Means and SDs for the test and retest administrations of the multiple-item-display computerized PS tests are provided in Table 5, along with a dependent t test for the difference in means, and effect sizes expressed as Cohen's d. All of the new PS tests showed significant and substantial increases in performance from test to retest occasions, ranging from d = 0.61 to 1.15; that is, medium to large in magnitude. These results are consistent with practice effects found with paper-and-pencil versions of these PS tests in other studies (e.g., see Ackerman & Cianciolo, 2000), but are also generally equal to or larger than the effect sizes found for the single-item display PS tests described in Study 1.
258 ACKERMAN AND BEIER

Table 5
Study 2. Psychometric Characteristics of Computerized PS Tests and Test-Retest Reliabilities

Initial test (A) Retest (B) Test-retest


a
M SD rxx, M SD ra,b t(b⫺a) d

New computerized perceptual


speed tests (multiple stimulus display)
1. Canceling Symbols 74.38 14.08 0.92 83.68 15.80 0.80 12.15** 0.61
2. Factors of 7 44.58 14.12 0.93 53.39 14.44 0.90 17.67** 0.62
3. Finding a and t 41.38 6.46 0.93 50.82 8.65 0.81 23.54** 1.15
4. Summing to 10 62.83 9.85 0.92 73.66 11.29 0.82 21.22** 1.00
5. Scattered X’s 18.33 4.47 0.96 21.78 5.15 0.80 14.20** 0.71
6. Finding 僆 and ¥ 31.97 6.27 0.94 37.09 6.38 0.80 16.42** 0.82
Single-stimulus perceptual speed tests
1. Name Comparison 50.78 8.68 0.88
2. Digit-Symbol 58.59 9.97 0.92
3. Number Comparison 31.77 9.41 0.88
4. Coding 50.78 8.68 0.92
5. Directional Headings 1 51.39 10.99 0.95
6. Letter/Number Substitution 58.62 9.66 0.91
7. Naming Symbols 80.51 9.41 0.90

Note. N ⫽ 160; df ⫽ 157.


a
rxx ⫽ Internal reliability for the computerized PS tests computed by calculating mean (via r-to-z transformation) correlation across three test parts, and
then computing reliability via Spearman-Brown prophecy formula. For all tests, scores are number of items correct minus number items wrong.
**
p ⬍ .01

Internal test reliabilities for the new computerized multiple-stimulus display PS tests had a narrow range (from 0.92 to 0.96, M = 0.93), in comparison to the analogous reliabilities for the single-stimulus display tests (range from 0.88 to 0.95, M = 0.91). Test–retest reliabilities for the new computerized PS tests were also generally quite substantial, ranging from 0.80 to 0.90 (mean reliability via r-to-z transformation = 0.83), and similar to those of the other PS tests administered in Study 1.

Reference content tests. Descriptive statistics and a CFA for the reference tests are shown in Table 6. The factor solution indicated that the reference tests are good markers for the underlying factors identified in a priori selection, χ2(N = 160, 98) = 158.45, p < .05, CFI = .96, RMSEA = .061. Correlations among the factors are also provided in Table 6. For further analyses, unit-weighted z-score composites of the ability measures were constructed.

Validity. Factor analysis of the single-item display PS tests and/or the multiple-item display PS tests, using the parallel analysis criterion, suggests that four underlying factors could be derived from the aggregate group of 13 computerized PS measures; however, a single factor accounts for nearly five times the amount of variance as the second and subsequent factors (e.g., 43.9% vs. 9.2% for Factor 1 and Factor 2, respectively). Similar to Study 1, deriving a single factor for each set of measures illustrates that there is a substantial degree of communality among the various measures, such that nearly half of the variance is captured by a single general PS factor in each case. The single factor accounts for 51.1% of the variance in the set of seven single-item display PS tests. A single factor accounts for 50.4% of the variance in the initial multiple-item display PS tests, and a similar analysis for the retest scores yields a factor that accounts for 53.9% of the variance in the retest set.
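The parallel analysis criterion retains only those factors whose eigenvalues exceed the eigenvalues expected from random data of the same dimensions. The sketch below is our illustration of the simpler random-data (Horn-type) variant in Python, not the software used for the reported analyses; the version of Montanelli and Humphreys (1976) additionally places squared multiple correlations on the diagonal of the correlation matrix.

    import numpy as np

    def parallel_analysis(scores, n_sims=100, seed=1):
        # Number of factors whose observed eigenvalues exceed the mean
        # eigenvalues of random-normal data of the same size (n x p).
        rng = np.random.default_rng(seed)
        n, p = scores.shape
        obs = np.linalg.eigvalsh(np.corrcoef(scores, rowvar=False))[::-1]
        rand = np.zeros(p)
        for _ in range(n_sims):
            sim = rng.standard_normal((n, p))
            rand += np.linalg.eigvalsh(np.corrcoef(sim, rowvar=False))[::-1]
        rand /= n_sims
        return int(np.sum(obs > rand))

    # ps_scores: a hypothetical 160 x 13 array of scores on the 13
    # computerized PS tests; n_factors = parallel_analysis(ps_scores)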
As with the reference tests, unit-weighted z-score composites were created for the seven single-item display PS tests and for the two administrations of the multiple-item display PS tests. Correlations between these composites and the reference ability composites are shown in Table 7. The patterns of correlations with the reference abilities are largely similar for each of the sets of PS tests (higher correlations with Numerical and PM abilities, and smaller correlations with Verbal and Spatial abilities), but the correlations between these aggregate scores and the spatial ability composite were significantly higher for the single-item display PS test composite in comparison with the multiple-item display PS test composites. No significant differences were found for these two composites and their respective correlations with either the general ability factor or the PM ability factor.

Finally, at the composite ability level, the correlations between the single-item display PS test composite and the multiple-item display PS test composites were 0.63 and 0.66, respectively, for the test and retest sets, while the correlation between the test and retest multiple-item display PS composites was r = 0.89.

Discussion

Generally, the results of the study were consistent with the predictions. Test–retest reliabilities for the six multiple-stimulus PS tests were relatively high and relatively uniform (range = 0.80–0.90; M = 0.83), even in the face of medium to large mean practice effects for all of the tests (mean d = 0.82). In contrast, only one test among the single-stimulus computerized tests administered in Study 1 (a single-stimulus version of the Factors of 7 test) showed as large a practice effect as those found in Study 2, suggesting that the multiple-stimulus array design of the current PS-pattern recognition tests involved a greater degree of performance improvement over practice. In previous studies with paper-and-pencil versions of these tests, similarly large practice effects were found, though the amount of overall test practice was much greater (see Ackerman & Cianciolo, 2000).

Table 6
Study 2. Reference Tests Descriptive Statistics and Factor Solution

Reference test                  # of Items      M        SD      α      Verbal  Numerical  Spatial   PM
1. MAB Similarities                 34        26.86     3.04   0.84     0.63
2. MAB Comprehension                28        22.17     2.75   0.77     0.53
3. Vocabulary                       48        21.10     5.81   0.92     0.65
4. Cloze                            39        46.44     7.29     —      0.60
5. Math Knowledge                   32        23.55     5.52   0.87             0.70
6. Arithmetic                       20         7.79     4.48   0.74             0.69
7. Number Series                    20        11.93     2.70   0.76             0.57
8. Math Approximation               40        19.25     6.13   0.88             0.88
9. Cube Comparison                  42        28.40     8.39   0.94                        0.72
10. Spatial Analogies               30        21.35     5.14   0.88                        0.69
11. Paper Folding                   24        16.42     5.38   0.85                        0.77
12. Spatial Orientation             20        10.09     3.70   0.66                        0.55
13. Tapping                          5        95.54    14.30   0.91                                 0.40
14. 8-Choice Serial RT              50      3668.14   429.21   0.95(a)                              0.73
15. 4-Choice Serial RT              50      1579.32   190.35   0.93(a)                              0.96
16. 2-Choice Serial RT              50       797.11   100.04   0.91(a)                              0.86

Correlations between factors     Verbal   Numerical   Spatial
Numerical                         0.53
Spatial                           0.48      0.53
Psychomotor                       0.21      0.16       0.20

Note. — = cannot be computed; PM = psychomotor; MAB = Multidimensional Aptitude Battery; RT = reaction time.
(a) Reliability calculated from the Part 1–Part 2 correlation and the Spearman-Brown prophecy formula.

In terms of the association between composites of the single-stimulus PS tests and the multiple-stimulus PS tests, the correlations of 0.63 for the initial test and 0.66 for the retest were slightly higher than what was predicted (r = 0.50 to 0.60), although the margin of difference between the predicted values and the obtained values is neither statistically significant nor substantially meaningful in magnitude.

Contrary to our prediction, there was no significant difference in correlations with the PM factor for the two different types of PS tests. The correlations between the respective composites and reference ability factors indicated only that the multiple-stimulus PS tests were significantly less highly associated (than single-stimulus PS tests) with spatial abilities, while not showing significant differences in correlations with the other reference abilities. A previous investigation with paper-and-pencil versions of all of these tests (e.g., Ackerman & Cianciolo, 2000) only indicated that the PS-Complex composite had a significantly higher correlation with a spatial ability composite (r = 0.64 for PS-Complex vs. 0.30, 0.32, and 0.24 for PS-Scanning, PS-Memory, and PS-Pattern Recognition, respectively). Removing the one PS-Complex test (Directional Headings 1) from the PS-Single Stimulus composite did not markedly change the correlation with spatial ability. For comparison purposes, Lohman's (1979) hierarchical representation of spatial abilities placed spatial visualization at the top of the hierarchy and PS abilities at the bottom of the hierarchy. Although the evidence from this study is indirect, it appears that the multiple-stimulus display PS tests would best be located close to the bottom of the hierarchy of spatial abilities, as suggested by Lohman, but the single-stimulus display PS tests would be located slightly higher in the hierarchy, perhaps indicating a greater involvement of spatial information processing in performance on these tests in the computer administration format.

Together, these results indicate that the multiple-stimulus PS tests are reliable and valid, but they also share a moderate amount of common variance with the single-stimulus PS measures.

Table 7
Study 2. Correlations Between Composite PS Scores and Content Ability Composites, With t Tests for Differences

Ability composite       PS-SS    PS-MS(A)   PS-MS(B)   t(SS−MS(A))   t(SS−MS(B))
1. Verbal               0.415     0.353      0.369        −0.99         −0.80
2. Numerical            0.370     0.435      0.433         1.05          1.06
3. Spatial              0.397     0.265      0.275        −2.08*        −2.01*
4. Psychomotor          0.390     0.457      0.378         1.10          0.20
General (V + N + S)     0.510     0.458      0.466        −0.89         −0.78

Note. SS = single stimulus display; MS = multiple stimulus display; A = initial test; B = retest. Composite PS-SS versus PS-MS(A): r = 0.629. Composite PS-SS versus PS-MS(B): r = 0.659. Composite PS-MS(A) with PS-MS(B): r = 0.891.
* p < .05.
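The t statistics in Table 7 compare two correlations that share a variable (the same ability composite correlated with two different PS composites) within one sample. The article does not name the procedure used, but the tabled values and df = 157 (= N − 3) are consistent with the Hotelling–Williams test for dependent correlations. With r_1 and r_2 denoting the correlations of an ability composite with the two PS composites, and r_{12} the correlation between the PS composites:

    t(n-3) = (r_1 - r_2)\sqrt{ \frac{(n-1)(1 + r_{12})}{ 2\,\frac{n-1}{n-3}\,|R| + \bar{r}^{2}(1 - r_{12})^{3} } }

where |R| = 1 − r_1² − r_2² − r_{12}² + 2 r_1 r_2 r_{12} and r̄ = (r_1 + r_2)/2. For the Spatial row, (0.397 − 0.265) with r_{12} = 0.629 and n = 160 yields t ≈ 2.08, matching the table.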

Whether one considers composite cross-correlations in the range of r = 0.63 to 0.66 to be more indicative of a high degree of overlap or a modest degree of overlap will depend on the individual's orientation. From an applied point of view, however, the higher the overlap between the respective composites, the less likely it will be to find useful incremental predictive validity for criterion task measures.

Study 3

For Study 3, our goal was to determine the criterion-related validity of the single-stimulus display and multiple-stimulus display PS tests, in the context of two skill acquisition tasks that have been previously demonstrated to be correlated with PS tests to varying degrees. The first task is the Kanfer-Ackerman ATC task (e.g., see Kanfer & Ackerman, 1989), a somewhat complex task that is highly consistent. At least for some learners, asymptotic skilled performance on the ATC task is normally accomplished within 4 to 5 hours. The second task places much greater demands on planning and decision making under time stress. It is a moderate-fidelity simulation of the "terminal radar approach control" or TRACON air traffic controller task (e.g., see Ackerman, 1992). While performance improvements are found for most learners on this task, learning is much slower, and the task involves novel information processing that precludes the development of automatic processing. We predicted that the two sets of PS tests would show significant and marked predictive validity correlations (r > 0.30) with initial performance on the two tasks. We expected that correlations between PS composites and performance would generally decline for the ATC task, as the participants reached a level of highly skilled, roughly automatic processing (which is also associated with a drop in between-individual variability; see Ackerman, 1987, 1990). Because the underlying processes involved in TRACON performance are relatively stable over task practice, we expected that correlations between PS composites and performance on TRACON would be relatively stable—that is, not show increasing or decreasing correlations.

As in Studies 1 and 2, we also included a battery of reference abilities (verbal, spatial, mathematical) to establish evidence for construct validity. An assessment of working memory ability was also included in order to broaden our assessment of reference abilities. Working memory ability has been shown to be related to all of the reference ability measures included here but has rarely been examined in terms of its predictive validity for skill acquisition (e.g., see Ackerman, Beier, & Boyle, 2002). Because the main purpose of Study 3 was to establish evidence for criterion-related validity, the inclusion of this broad range of reference ability tests allowed us to determine whether or not the PS assessments accounted for unique variance in task performance over and above these abilities.

Method

Participants. Participants were selected in the same manner as in the previous studies. One hundred seventeen adults participated. The sample had 74 men and 43 women, Mage = 21.39, SDage = 2.16, range = 18 to 30 years. Three participants dropped out of the study before the start of the ATC task, leaving a sample of 114 participants for the criterion task performance measures. None of these individuals participated in Study 1 or Study 2.

Apparatus. The apparatus used for the testing portions of this study was identical to that used in Studies 1 and 2. For the two criterion tasks, practice trials were administered at individual carrels, with participants separated by partitions. Up to 16 participants were run at one time. The tasks were administered on Dell and IBM Pentium computers with Trinitron 17" monitors. For the ATC task, visual information was displayed in monochrome text characters on a black background, using standard MS-DOS characters. For the TRACON task, visual information was displayed in color VGA (640 horizontal pixels × 480 vertical pixels) resolution graphics. Audio information from the TRACON task was presented binaurally through headphones, using a SoundBlaster interface. Participants interacted with the task with standard IBM PC 101-key keyboards and a MicroTech 3-button trackball.

Reference ability tests. The same PM and content reference ability tests used in Study 2 were administered in Study 3, with a few exceptions, as follows. The Cloze test was removed from the set of Verbal tests, and Raven's Advanced Progressive Matrices was added to the set of Spatial ability tests. Also, an Alternate Tapping test was added to the set of PM ability tests. Finally, a set of six Working Memory (WM) tests was added to the battery of tests. All of these measures are described in the Appendix.

PS tests. All of the PS tests administered in Study 2 were also administered in Study 3. Two additional computerized PS tests were also added to the battery, namely the Directional Headings 2 test (administered in Study 1 only) and a new computerized single-item display Number Sorting test. In all, there were nine single-item display PS tests and six multiple-item display PS tests, for a total of 15 PS tests.

Criterion Tasks

Kanfer-Ackerman (ATC) task. The ATC task was selected as a criterion because it is procedural, complex, and involves consistent stimulus-response mappings. Extensive data have been collected with this task, including performance from over 5,000 participants across more than 20 different task and participant-sample configurations (for a review, see Ackerman & Kanfer, 1994). These extensive data make it possible to predict learning and performance characteristics for specific samples and practice conditions. Details of the ATC task have been provided elsewhere in the literature (e.g., Ackerman, 1988, 1990; Ackerman & Kanfer, 1994; Goska & Ackerman, 1996; Kanfer & Ackerman, 1989). The key aspects of the ATC task description are discussed in detail in Ackerman (1988), and the elements of the task display are shown in Figure 2. Performance on the task was operationalized as the number of planes landed in a 10-minute trial.

Figure 2. The Kanfer-Ackerman Air Traffic Controller Task. The figure is a literal static representation of the real-time task display. See text for a description of task elements. From "Determinants of individual differences during skill acquisition: Cognitive abilities and information processing" by P. L. Ackerman, 1988, Journal of Experimental Psychology: General, 117, 308. Copyright 1988 by the American Psychological Association. FLT = flight. Pts = points.

TRACON task (Wesson International). The TRACON software used in this study is a modification of the early professional version (V1.52) of TRACON developed by Wesson International that allows for collection of a variety of data. A sample static screen display for TRACON is provided in Figure 3. The following description of the TRACON platform and task trial design is reprinted from Ackerman and Kanfer (1993, p. 417):

The task requires that trainees learn a set of rules for air traffic control, including reading flight strips, declarative knowledge about radar beacons, airport locations, airport tower handoff procedures, en-route center handoff procedures, plane separation rules and procedures, monitoring strategies, and strategies for sequencing planes for maximum efficient and safe sector traversal. In addition, trainees are required to acquire human-computer interface skills: including issuing trackball-based commands, menu retrieval, keyboard operations, and integration between visual and auditory information channels. . . .

Figure 3. Static copy of Terminal Radar Approach Controller (TRACON) screen. There are three major components to the display. The right-hand side of the screen shows Pending (not under control) and Active (under control) flight strips. Each flight strip lists (a) plane identifier, (b) plane type, (c) requested speed, (d) requested altitude, (e) radar fix of sector entry, and (f) radar fix of sector exit (including Tower or Center). The lower part of the screen shows a communications box that gives a printout of the current (and last few) commands issued by the trainee, and the responses from pilots or other controllers. The main part of the screen shows a radar representation of the Chicago sector. Planes are represented by a plane icon and a data tag (which gives the identifier, the altitude, and an indication of current changes in altitude). The sector is bounded by an irregular dotted polygon describing a perimeter. Radar fixes are shown as small (+) figures on the radar screen. Airports are shown with approach cones and a circle indicating the facility proper. A continuous radar sweep is shown (updating at 12 o'clock, every 5 s). Range rings are also displayed, indicating 5-mile distances. From "Predicting individual differences in complex skill acquisition: Dynamics of ability determinants" by P. L. Ackerman, 1992, Journal of Applied Psychology, 77, p. 602. Copyright 1992 by the American Psychological Association.

Performance measurement. As with a previous investigation (Ackerman, 1992),

overall performance was computed as the sum of all flights accepted into the sector that had a final disposition within the simulation time (minus any planes that were incorrectly disposed of—e.g., crashes, not-handed-off, vectored off the radar screen). (Ackerman & Kanfer, 1993, p. 417)
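The quoted scoring rule can be made concrete with a short sketch. The record layout and field names below are our own illustration and are not part of the TRACON software:

    def tracon_score(flights):
        # Flights accepted into the sector that reached a final disposition
        # within the simulation time.
        disposed = [f for f in flights
                    if f["accepted"] and f["disposed_in_time"]]
        # Incorrect dispositions per the quoted rule: crashes, planes not
        # handed off, or planes vectored off the radar screen.
        bad = ("crash", "not_handed_off", "vectored_off_screen")
        errors = sum(1 for f in disposed if f["disposition"] in bad)
        return len(disposed) - errors

Under the trial design described below, a trial score computed this way would range from 0 to 28.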
TRACON Training Trials

Each [training] trial was comprised of 16 overflights and departures (with roughly equal frequency), and 12 arrivals. The planes requested entry to the airspace at irregular intervals that were constrained to require the trainee to be always occupied with at least one active target. The trials were also constrained so that perfect performance (handling all 28 planes successfully) was just beyond the skill level achieved by subject matter experts. Each trial was concluded in 30 min. (Ackerman & Kanfer, 1993, p. 417)

Procedure

The study took place over seven sessions. The first two sessions were 3 hours in duration, and the final five sessions were 3.5 hours each, for a total of 23.5 hours. The sessions were separated by at least 24 hours, and no more than 72 hours. Procedural details (e.g., breaks and the interspersing of content and PS test sets) were the same as in the earlier studies, with the exceptions noted below.

Sessions 1 and 2 were devoted exclusively to ability testing. In Session 3, participants completed instructions on the ATC task, followed by sixteen 10-minute task trials. Session 4 consisted of twelve 10-minute ATC task trials, followed by a break and then the administration of the six WM tests (with a 5-minute break after the first three tests). During ATC task practice, 5-minute breaks were given after every four 10-minute task trials. Session 5 consisted of a 1-hour instructional video for the TRACON task, followed by three 30-minute TRACON task trials. Sessions 6 and 7 each consisted of six TRACON task trials. During TRACON task practice, 5-minute breaks were provided after every two 30-minute trials. A total of 4 hours, 40 minutes of practice was completed on the ATC task, and 7.5 hours of practice on the TRACON task. Participants were compensated $270 each at the end of the study.

Results

Descriptive statistics and reliability estimates for the reference ability tests will be presented first, followed by a description of the psychometric properties of the PS tests. Next, the correlations between the reference abilities and the PS composites will be discussed. The section that follows provides descriptive information on the criterion tasks. Finally, the correlations between the PS tests, reference abilities, and the criterion tasks (ATC and TRACON) will be presented.

Descriptive statistics. Means, SDs, reliability estimates, and factor loadings for the reference ability measures are shown in Table 8.

Table 8
Study 3. Means, SDs, Reliability Estimates for Ability Measures, and CFA Factor Loadings (CFA With a Second-Order g Factor)

Reference measure                     # of Items      M        SD       α     Verbal  Numerical  Spatial   WM     PM
1. MAB Similarities                       34        27.26      3.10   0.80    0.83
2. MAB Comprehension                      28        21.74      2.78   0.80    0.64
3. Vocabulary                             48        21.90      8.15   0.92    0.79
4. Math Approximation                     40        17.91      7.30   0.91            0.82
5. Arithmetic                             20         7.67      4.96   0.79            0.78
6. Math Knowledge                         32        21.44      7.30   0.91            0.68
7. Number Series                          20        10.99      2.97   0.78            0.84
8. Raven's Progressive Matrices           48        36.00      6.99   0.88                       0.75
9. Spatial Analogies                      30        21.21      5.88   0.86                       0.85
10. Verbal Test of Spatial Abilities      24        13.28      5.03   0.80                       0.74
11. Spatial Orientation                   20         9.28      4.64   0.86                       0.61
12. Paper Folding                         24        14.79      5.81   0.86                       0.86
13. ABCD Order                            24        15.76      5.86   0.91                               0.85
14. Alpha Span                            99        36.40     16.25   0.81                               0.57
15. Computation Span                      75        40.51     17.84   0.95                               0.64
16. Backward Digit Span                   99        44.21     19.16   0.81                               0.69
17. Spatial Span                         105        52.97     14.03   0.78                               0.77
18. Word Sentence                         60        26.26     12.60   0.94                               0.67
19. Tapping                                5        97.88     17.58   0.96
20. Alternate Tapping                      5        38.74      5.31   0.91
    Tapping Composite                      —           —         —      —                                       0.71
21. 8-Choice Serial RT                    50     3,721.52    596.13   0.97
22. 4-Choice Serial RT                    50     1,555.92    197.13   0.94
23. 2-Choice Serial RT                    50       791.25    107.42   0.86
    Choice RT Composite                    —           —         —      —                                       0.96
g                                                                             0.53    0.90       0.78   0.92   0.63

Note. WM = working memory; PM = psychomotor; RT = reaction time. N = 117. Factor loadings are from a confirmatory factor analysis with a second-order factor (g). Loadings on g are shown in the last row. The Tapping and Choice RT composites are composite measures of the tapping and serial RT measures, respectively.

A CFA was conducted using LISREL 8.7 (Jöreskog & Sörbom, 2006), with five first-order factors (verbal, numerical, spatial, working memory, and PM) and one second-order general ability factor, g. For purposes of the CFA, the PM measures were divided into two composites (tapping and choice reaction time). The table shows the loadings of each reference test and the PM composites on the first-order factors. The loadings of the first-order factors on the second-order g factor are shown in the last row of the table. Fit of this model was good, χ2(N = 117, 165) = 280.15, p < .05, CFI = 0.97, RMSEA = 0.07, showing that the reference tests are good markers for the underlying factors. Unit-weighted z-score composites were computed for the Verbal, Numerical, Spatial, Working Memory, and PM abilities using the same procedure as outlined previously.
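Unit-weighted composite formation is straightforward to express in code. The sketch below is our illustration (the array names are hypothetical): each marker test is standardized within the sample, and the standardized scores for a factor's markers are averaged with equal weights.

    import numpy as np

    def unit_weighted_composite(scores):
        # scores: (n_participants, n_tests) raw scores for one factor's markers.
        z = (scores - scores.mean(axis=0)) / scores.std(axis=0, ddof=1)
        return z.mean(axis=1)

    # e.g., verbal = unit_weighted_composite(
    #     np.column_stack([similarities, comprehension, vocabulary]))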
Means, SDs, and reliability estimates for the PS tests are shown in Table 9. A CFA was conducted using LISREL with two first-order factors (PS-Single Stimulus Display [PS-SS] and PS-Multiple Stimulus Display [PS-MS]). The factor loadings from this model are also shown in the table. The fit of this model was good, χ2(N = 117, 85) = 154.66, p < .01, CFI = .98, RMSEA = .089, demonstrating that the measures are good markers for two factors of PS abilities. Unit-weighted z-score composites were also created for these PS factors (PS-SS and PS-MS). These composites are used for subsequent analyses.

Correlations between reference ability composites and PS composites are shown in Table 10. As can be seen in the table, correlations among the PS composites and spatial and numerical ability were on average higher than correlations between the PS composites and Verbal ability (correlations between PS and Verbal ability in the 0.20s and 0.30s, compared with correlations from 0.46 to 0.74 with the other reference abilities). Also noteworthy is the large correlation between the PS-SS composite and working memory ability, r = 0.74, which is significantly larger than the correlation of r = 0.54 between PS-MS and working memory, t(114) = 4.66, p < .001.

Criterion Task Performance: Descriptive Statistics

ATC Task. Performance on the ATC task over practice was consistent with results found in other studies that have used this task (e.g., Ackerman & Cianciolo, 2000). Mean performance increased rapidly early in practice and then reached a rough asymptote after 3.5 to 4 hours of time-on-task (see Figure 4a), though only two participants reached maximum performance levels by the end of the practice trials. The effect size comparing initial-to-final task performance was d = 3.44. Interindividual variability declined over the course of practice by a total of 20% (from SD = 10.15 landings/trial to SD = 8.14 landings/trial), even as average performance increased from M = 46.55 landings/trial for Trials 1–4 to M = 68.49 for Trials 25–28. Consistent with earlier studies, there were gender differences in mean performance favoring men (overall difference between genders: d = 0.52).

Table 9
Study 3. Means, SDs, Reliability Estimates, and Factor Loadings for Perceptual Speed Ability Tests

                                                          Factor loadings
Test                              M       SD    rxx(a)    PS-SS    PS-MS
Single stimulus
1. Digit Symbol                 56.68   10.89    0.95      0.78
2. Number Comparison            33.55    7.12    0.95      0.60
3. Coding                       51.56    9.17    0.95      0.75
4. Directional Headings 1       52.54   11.66    0.96      0.83
5. Directional Headings 2       26.62    7.72    0.95      0.79
6. Letter/Number Substitution   59.04   12.16    0.95      0.85
7. Naming Symbols               84.57   11.54    0.96      0.77
8. Number Sorting               13.58    4.07    0.92      0.65
Multiple stimulus
1. Canceling Symbols            78.90   15.81    0.94               0.70
2. Finding a and t              42.87    7.85    0.94               0.82
3. Summing to 10                62.31   11.89    0.94               0.78
4. Scattered X's                19.89    5.56    0.95               0.62
5. Factors of 7                 46.10   15.89    0.96               0.52
6. Finding ∈ and ¥              31.85    7.61    0.95               0.76

Note. PS-SS = Perceptual Speed, Single Stimulus Display; PS-MS = Perceptual Speed, Multiple Stimulus Display. Factor loadings are based on a confirmatory factor analysis. The correlation between the PS-SS and PS-MS factors was r = 0.85 in the CFA results.
(a) Reliability estimates computed by calculating mean correlations (via r-to-z transformation) across three test parts and computing reliability via the Spearman-Brown prophecy formula.

TRACON. Performance on the TRACON task over practice was also consistent with results found in other studies that have used this task (e.g., Ackerman & Kanfer, 1993; Ackerman, Kanfer, & Goff, 1995). Mean performance increased from M = 8.89 planes handled/trial for Trials 1–3 to M = 16.86 planes handled/trial for Trials 13–15 (see Figure 4b). The effect size comparing initial-to-final task performance was d = 1.87. Although several participants came close to handling all 28 planes in the final trials, no participant actually handled all of the flights across a three-trial set. In contrast to the decrease in interindividual differences observed in the ATC task, interindividual variability increased from SD = 5.47 planes handled/trial for Trials 1–3 to SD = 6.85 in Trials 13–15, an increase of 25% in SDs. Consistent with earlier studies, there were gender differences in mean performance favoring men (overall difference between genders: d = 0.65).

Relations between predictors and criteria. Correlations among the five TRACON and seven ATC groups of trials and the reference ability and PS composites are shown in Figure 5.

Table 10
Study 3. Correlations Among Ability Predictor Composites and Criterion Task Performance Scores

Variable             1    2    3    4    5    6    7    8    9    10   11   12   13   14   15   16   17   18
1. Verbal
2. Spatial          0.48
3. Math             0.43 0.62
4. Working memory   0.45 0.69 0.65
5. PS-SS            0.27 0.45 0.52 0.54
6. PS-MS            0.35 0.64 0.62 0.74 0.77
7. Psychomotor      0.21 0.57 0.40 0.56 0.57 0.65
8. ATC 1–4          0.48 0.70 0.51 0.62 0.33 0.54 0.46
9. ATC 5–8          0.40 0.60 0.53 0.57 0.33 0.51 0.42 0.90
10. ATC 9–12        0.34 0.53 0.49 0.56 0.32 0.46 0.42 0.82 0.92
11. ATC 13–16       0.35 0.53 0.48 0.59 0.30 0.45 0.41 0.79 0.87 0.94
12. ATC 17–20       0.33 0.53 0.48 0.60 0.31 0.49 0.48 0.82 0.86 0.92 0.92
13. ATC 21–24       0.34 0.54 0.41 0.59 0.32 0.44 0.46 0.78 0.78 0.83 0.84 0.90
14. ATC 25–27       0.34 0.49 0.41 0.56 0.34 0.38 0.45 0.69 0.71 0.79 0.80 0.82 0.91
15. TRACON 1–3      0.39 0.63 0.43 0.55 0.41 0.57 0.39 0.64 0.55 0.53 0.54 0.55 0.52 0.47
16. TRACON 4–6      0.31 0.52 0.43 0.55 0.34 0.54 0.37 0.65 0.58 0.55 0.56 0.55 0.51 0.45 0.88
17. TRACON 7–9      0.37 0.66 0.46 0.53 0.41 0.57 0.39 0.62 0.55 0.54 0.54 0.53 0.51 0.50 0.85 0.90
18. TRACON 10–12    0.32 0.66 0.49 0.52 0.36 0.53 0.42 0.65 0.57 0.59 0.57 0.59 0.56 0.56 0.79 0.85 0.94
19. TRACON 13–15    0.32 0.64 0.48 0.52 0.37 0.53 0.44 0.65 0.58 0.59 0.59 0.61 0.59 0.59 0.78 0.83 0.90 0.95

Note. ATC = Kanfer-Ackerman Air Traffic Control Task; TRACON = Terminal Radar Approach Control; PS = perceptual speed; SS = single stimulus display; MS = multiple stimulus display. N = 114 to 117. All correlations significantly different from zero, p < .01, except r(Verbal, Psychomotor), p < .05.

Figure 4. Criterion task performance. A (top panel): Performance on the Kanfer-Ackerman Air Traffic Control Task over practice. Shown are 25th percentile, mean, and 75th percentile performance levels by groups of trials (1 trial = 10 minutes time-on-task). B (bottom panel): Performance on the Terminal Radar Approach Controller task over practice. Shown are 25th percentile, mean, and 75th percentile performance levels, by groups of trials (1 trial = 30 minutes time-on-task).

Figure 5. Correlations between ability composite variables and performance over practice on the Air Traffic Controller (ATC) Task and the Terminal Radar Approach Control (TRACON) task. A (top panel): Spatial, Numerical, and Verbal ability composites. B (middle panel): Working memory and Psychomotor ability composites. C (bottom panel): Perceptual Speed-Single Stimulus Display and Perceptual Speed-Multiple Stimulus Display ability composites.

As can be seen in the three panels of Figure 5, the average correlations between TRACON performance and Verbal and PM abilities are generally lower than the correlations between performance on the criterion tasks and the other reference abilities (correlations in the .30s and .40s, compared with correlations ranging from 0.40 to 0.70 for the other abilities), although all correlations are significant. The correlations of PS-SS and PS-MS with task performance are also shown graphically in Figure 5. As can be seen in the figure, correlations between PS-SS and performance on TRACON and ATC are higher than correlations between PS-MS and performance during initial performance. For the ATC task, the correlations of PS-MS with performance and of PS-SS with performance converge over practice, whereas the corresponding correlations do not converge over practice on the TRACON task.

The patterns of stable or declining PS ability correlations with performance on the ATC task are not consistent with the Ackerman (1988) theory, which predicts increasing and then decreasing correlations over task practice. However, these results are consistent with the findings of Ackerman (1990) and Ackerman and Cianciolo (2000), who suggest that the increasing and decreasing patterns found for PS composites in previous studies were likely a function of the heterogeneous nature of PS tests (e.g., Digit/Symbol tests show decreasing correlations, Choice RT tests show increasing correlations, and Number Comparison tests show stable correlations). Thus, the particular selection of PS tests affects the likely underlying pattern of stability and change in the predictive validities of PS tests across skill acquisition trials.

Because significant multicollinearity exists among the reference ability composites (see Table 10), an analysis of partial correlations between ability composites and TRACON and ATC performance was conducted. This analysis revealed that the correlations between the PS-SS and PS-MS composites and TRACON and ATC performance across all trials were nonsignificant after the reference abilities were accounted for (i.e., working memory, verbal, spatial, numerical, and PM abilities partialed out). Because correlations with the criterion tasks were highest for the spatial and working memory composites, similar partial correlation analyses were conducted for these composites, controlling for the other reference abilities. Results of this analysis indicated that spatial ability was significantly related to ATC performance for the first two groups of trials, controlling for all other abilities, but was not significantly related to subsequent performance on the ATC task (rs = 0.31, 0.18, 0.10 ns, 0.09 ns, 0.06 ns, 0.14 ns, 0.11 ns; df = 100). For TRACON, spatial ability was significantly related to performance across all five TRACON groups of trials after controlling for the other reference abilities (rs = 0.29, 0.34, 0.41, 0.39, 0.35 for the five groups of trials, respectively; all correlations significant, p < .01, df = 90).

The opposite pattern emerged for working memory abilities. Specifically, working memory ability was not significantly correlated with performance on the TRACON task after controlling for the other reference abilities, but was significantly related to ATC performance for all groups of trials except one (0.19, 0.13 ns, 0.19, 0.26, 0.27, 0.30, 0.34; df = 100). Of note, these correlations also increased in magnitude over practice. Figure 6 presents these data graphically and shows the correlations between working memory ability and performance on the criterion tasks without controlling for the other reference abilities, as well as the correlations between working memory and performance while controlling for the reference abilities. For TRACON, partialing out only the two PS composites and the PM composite indicated significant effects of WM for the first two groups of trials, but nonsignificant correlations for the remaining groups of trials (0.23, 0.24, 0.17 ns, 0.18 ns, and 0.18 ns; df = 94, respectively). A similar effect was obtained when partialing out the three content abilities (verbal, math, and spatial): the correlation of WM with TRACON performance was significant only for the first group of trials (0.24, 0.19 ns, 0.07 ns, 0.04 ns, 0.06 ns; df = 94, respectively).

Figure 6. Raw and partial correlations (partialing included all other ability composites) for the Working Memory ability composite with criterion task performance. ATC = Air Traffic Controller Task; TRACON = Terminal Radar Approach Control Task.

Aggregated prediction from the seven ability test composites for criterion task performance was good. For the ATC task, squared multiple correlations (R2) were 0.50, 0.38, 0.33, 0.38, 0.39, 0.37, and 0.37 for Groups of Trials 1–7, respectively, and 0.44 for overall performance. For the TRACON task, squared multiple correlations (R2) were 0.45, 0.42, 0.45, 0.41, and 0.40 for Groups of Trials 1–5, respectively, and 0.46 for overall performance. That is, in the aggregate, the ability measures accounted for roughly 40% of the individual differences variance in task performance at the groups-of-trials level on both the ATC and TRACON tasks, and about 45% of the variance in overall performance in each task.
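The partial correlations and squared multiple correlations reported above can be reproduced with ordinary least squares. The sketch below is a generic illustration (the variable names are ours, not from the study materials): a partial correlation is the correlation between two sets of regression residuals, and R2 comes from regressing a criterion on all seven composites at once.

    import numpy as np

    def residuals(y, X):
        # Residuals of y after least-squares regression on X (intercept included).
        X1 = np.column_stack([np.ones(len(y)), X])
        beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
        return y - X1 @ beta

    def partial_r(x, y, controls):
        # Partial correlation of x and y, controlling for the 'controls' matrix.
        return np.corrcoef(residuals(x, controls), residuals(y, controls))[0, 1]

    def r_squared(y, X):
        # Squared multiple correlation of y regressed on X.
        e = residuals(y, X)
        return 1.0 - e.var() / y.var()

    # e.g., spatial ability vs. ATC Trials 1-4, holding the other composites constant:
    # others = np.column_stack([verbal, math, wm, ps_ss, ps_ms, pm])
    # partial_r(spatial, atc_trials_1_4, others)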
Discussion

The goal of this study was to determine the criterion-related validity of single-stimulus and multiple-stimulus PS ability tests for predicting performance across two skill acquisition tasks: the ATC task, which is a relatively consistent, yet somewhat complex task, and TRACON, which is a relatively complex and inconsistent task. Results showed that both single-stimulus and multiple-stimulus PS tests were valid predictors of both types of tasks throughout task performance. Specifically, PS-SS was generally more highly related to initial task performance than PS-MS for both the TRACON and ATC tasks. However, for the less complex task, correlations between PS-SS and performance and between PS-MS and performance converged over task practice. These results suggest that PS-SS would be a better predictor of complex task performance than PS-MS. For predicting performance on less complex or well-learned tasks, however, there may not be any meaningful difference between PS-SS and PS-MS measures. Given the relative ease and efficiency of the computerized PS measures, our results suggest that these measures are viable alternatives to more cumbersome paper-and-pencil tests for valid prediction of skill acquisition across skills varying in complexity.

We also examined the predictive validity of the PS measures controlling for other reference abilities. Our results indicated that neither PS-SS nor PS-MS accounted for significant variance over and above an extensive set of assessments of content abilities (verbal, spatial, mathematical), working memory, and PM abilities.
One apparent reason for this result is that there is a substantial amount of common variance among the PS measures and especially the working memory and PM measures. This high degree of communality between PS measures and working memory has been found in previous research (e.g., see Ackerman et al., 2002), and similar degrees of communality between PM ability and PS measures have also been reported in some studies (e.g., Ackerman & Cianciolo, 2000). Our analysis also revealed that after controlling for the other reference abilities, spatial abilities accounted for incremental variance in TRACON performance, but not in ATC performance after the initial trials. The opposite pattern was found for working memory (i.e., significant relations with ATC performance, but not TRACON performance, after controlling for the other reference abilities). These results are likely a function of the complexity of the TRACON task compared with the ATC task. That is, performance on the TRACON task is a function of managing planes taking off and landing over a longer period of time—requiring strategy and tapping higher-level abilities. In contrast, performance on the ATC task takes place over a shorter period of time and thus is likely more highly predicated on short-term memory abilities—somewhat reflective of performance on working memory ability tests.

One of the goals of this study was to evaluate how working memory ability fits within the broader framework of predicting individual differences during skill acquisition. The results indicate that the role of WM depends on the complexity of the task. It appears that WM does not provide incremental predictive validity for more complex tasks that require continuous attention and effort (e.g., TRACON). In contrast, WM measures did add incremental prediction for consistent skill-learning tasks (e.g., the ATC task). The pattern of relations among the predictor and criterion measures suggests that WM is more predictive in later stages of skill acquisition for these simpler tasks, theoretically similar to PS abilities (perhaps due to the variance shared by these two constructs; see also Ackerman et al., 2002, for similar findings relating WM and PS measures).

General Discussion

Technological innovations in ability assessment historically have been implemented with a goal of maximizing the efficiency of testing and scoring while minimizing the "costs" in terms of the psychometric characteristics of reliability and validity. Mental assessments by Galton in the early 1880s (e.g., Galton, 1885), though mostly psychophysical in content, were administered with specialized apparatus and a one examiner-to-one examinee format. Ebbinghaus (1896–1897) was the first researcher to administer ability tests in a group testing format, using paper-and-pencil test forms. This innovation made it possible to test large numbers of examinees simultaneously, but it came at a potential cost of confounding individual differences in reading comprehension ability with the construct variance of interest (i.e., memory and verbal fluency). The introduction of large-scale multiple-choice paper-and-pencil tests, best exemplified by the Army Alpha Test (Yoakum & Yerkes, 1920) and the Otis Intelligence Test (see Hull, 1928), had much greater advantages in terms of scoring efficiency, but represented a fundamental change in the underlying cognitive demands, from producing a correct response to selecting a response from a list of options (see Carroll, 1982, for a discussion of these issues). The introduction of separate test forms capable of being scored by stencils or optical scanning devices in the late 1910s represented yet another efficiency in scoring technology, but introduced potential confounds related to both the speed with which the examinee could locate and enter the answer on the scoring form and the precision with which the answers were indicated (or changed by erasure).

The use of computers for ability assessment (in contrast to the oral, apparatus, or paper-and-pencil methods) is a relatively recent innovation in the history of applied psychology—large-scale computerized ability assessment only started in the early 1980s, with the introduction of a computerized adaptive version of the ASVAB (e.g., see Moreno, Wetzel, McBride, & Weiss, 1984). The review by Mead and Drasgow (1993) indicated that for relatively unspeeded tests, there was little practical difference in whether the ability was assessed by paper-and-pencil or computer methods.

As noted by Ackerman and Cianciolo (1999), PM ability assessment represented a significantly different problem for computerization in comparison to presenting, say, a four-term verbal analogy item on a computer screen, because of the need to provide a truly interactive interface for the individual to tap, trace, or manipulate objects. Nonetheless, the introduction of touch-sensitive computer monitors allowed for the reliable and valid assessment of some measures of dexterity and motor coordination that were previously only assessed with specialized apparatus. Thus, at the "power" end of the power-speed ability assessment continuum, extensive research indicates that computerized assessment can be as good as paper-and-pencil format testing, and may have some distinct advantages (e.g., it is usually impossible for the individual to answer the wrong question on the computerized administration of a test, in contrast to when the individual has a test booklet and a separate optical scan form with which to record answers). At the "speed" end of the power-speed continuum, the existing research has suggested that there are many difficulties in obtaining equally reliable tests, and there has been little evidence regarding the predictive validity of speed tests, except for those studies noted earlier.

In the current series of studies, we have demonstrated that by using a perhaps more direct human-computer interface (touch-sensitive monitor and TouchPen stylus), it is possible to reliably assess PS abilities that involve both single-stimulus presentation and simultaneous target and distractor presentations. One must keep in mind that PS tests are normally shorter in duration (typically 1.5–5 min) than power tests (which may take 20–40 minutes or longer for each test), and thus are more likely to show reduced comparative reliabilities, ceteris paribus. Internal reliabilities of the new tests were good (from 0.83 to 0.93 for the single-stimulus tests, and from 0.92 to 0.96 for the multiple-stimulus display tests), and test–retest reliabilities were high (from 0.74 to 0.84 and from 0.80 to 0.90 for the single- and multiple-stimulus display tests, respectively), even in the face of small-to-large mean practice effects. Direct cross-correlations between paper-and-pencil and computerized PS tests indicated a generally moderate to high level of communality (from 0.52 to 0.78), but not high enough to indicate that the tests were correlating at the limits of their respective reliabilities, although at the level of composites across multiple tests, the raw correlation was substantial (r = 0.73), indicating that about 53% of the variance in the respective composites was common.

In terms of construct validity indicators, the single-stimulus display computerized PS tests showed higher correlations with the reference content abilities and with a PM ability composite than did their paper-and-pencil counterpart tests. In contrast to the single-stimulus display computerized PS tests, the multiple-stimulus display computerized PS tests showed only lower correlations with the spatial ability composite. With respect to criterion-related validity (Study 3), both the single-stimulus display PS test composite and the multiple-stimulus display PS test composite showed significant predictive validity for the ATC task and the TRACON task throughout the respective practice sessions, though the single-stimulus display PS composite had higher predictive validities for both the ATC and TRACON tasks, compared with the composite based on the multiple-stimulus display PS tests. The single-stimulus PS test composite did show a marked decline in predictive validities with ATC task performance at the later practice sessions, consistent with the similarly declining validity coefficients for the spatial ability composite and the fact that the single-stimulus PS test composite was significantly more highly correlated with spatial ability than was the multiple-stimulus PS test composite.

Conclusions/Implications

Historically, PS ability tests have been shown to have significant and substantial validity for predicting individual differences in performance on skill-acquisition tasks and job performance measures that involve speed and accuracy of responding. In addition, the time needed to administer PS ability tests is generally short in comparison to content ability tests (e.g., verbal, spatial, math), so that a diverse battery of PS tests can be administered to the examinee in a relatively brief period of time. To put this into context, consider that obtaining a robust estimate of the content abilities in these studies required at least about 30 minutes for each ability. Assessment of PS ability with four tests sampled from the set of measures we investigated could be accomplished in under 20 minutes, even with three repetitions of each test. For applied use, it is useful to keep in mind that in the context of the overarching structure of cognitive abilities (Carroll, 1993), PS abilities share less common variance with broad content abilities than content abilities share with each other and with general intelligence. This particular aspect of PS abilities, coupled with the demonstrated validity of PS ability tests, indicates that one might expect a larger increment in overall predictive validity when adding PS measures to an estimate of general intellectual ability than would be obtained with just about any other set of ability measures, for the kinds of occupations where individual differences in speed and accuracy of routine actions are important components of performance.

If one were to assess a wide range of content abilities with multiple measures of each ability, as was accomplished in Study 3, but at the cost of 3 hours of total testing time, one might not expect much value in adding PS and PM tests to the battery, given the high levels of multicollinearity among the various measures. However, if one were to give a much shorter battery of tests (e.g., one verbal, one math, and one spatial ability test), there is likely to be a clear advantage to also including a brief battery of PS and PM tests. If we were to take only the three content ability tests with the highest respective factor loadings (MAB Similarities, Number Series, and Paper Folding; see Table 8) to obtain a general ability composite, the PS and PM tests would provide significant incremental predictive validity for both the ATC and TRACON tasks (roughly 7.5% of additional variance accounted for in each task).

The traditional difficulty with practical use of PS ability assessment has been the need to depend on a paper-and-pencil testing format and the costs associated with hand-scoring test forms. Administration of PS tests with off-the-shelf computer equipment (namely, a standard PC and a touch-sensitive display with a stylus for input) can eliminate the high costs of PS ability assessment. With the increasing availability of touch-sensitive computer devices over the past 10 years (e.g., especially with the introduction of dedicated tablet-based operating systems), and the reduction in the costs associated with their purchase, we expect that it will be more cost-effective to include PS tests in assessment batteries, both for selection purposes and for further exploration of important research questions (e.g., the effects of aging on PS abilities; see Salthouse & Ferrer-Caja, 2003; Thorvaldsson, Hofer, & Johansson, 2006).

Changing the method and procedures of ability assessment is always a matter of concern for effects on reliability and validity, and the current example of transitioning from paper-and-pencil to computerized assessment of PS abilities is no different. Thus, any field application of PS tests will require collection of reliability and validity information prior to the introduction of such tests in an operational environment. The results from the current investigation show that one might expect acceptable results of such a transition in the operational environment. For situations where PS ability assessment might be appropriate, but such testing has not been implemented because of cost-benefit tradeoffs, it is possible that the benefits of assessing PS abilities with computerized procedures may significantly shift the balance in favor of the benefits side of the equation.

References

Ackerman, P. L. (1987). Individual differences in skill learning: An integration of psychometric and information processing perspectives. Psychological Bulletin, 102, 3–27.
Ackerman, P. L. (1988). Determinants of individual differences during skill acquisition: Cognitive abilities and information processing. Journal of Experimental Psychology: General, 117, 288–318.
Ackerman, P. L. (1990). A correlational analysis of skill specificity: Learning, abilities, and individual differences. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16, 883–901.
Ackerman, P. L. (1992). Predicting individual differences in complex skill acquisition: Dynamics of ability determinants. Journal of Applied Psychology, 77, 598–614.
Ackerman, P. L., Beier, M. E., & Boyle, M. O. (2002). Individual differences in working memory within a nomological network of cognitive and perceptual speed abilities. Journal of Experimental Psychology: General, 131, 567–589.
Ackerman, P. L., Beier, M. E., & Boyle, M. O. (2005). Working memory and intelligence: The same or different constructs? Psychological Bulletin, 131, 30–60.
Ackerman, P. L., & Cianciolo, A. T. (1999). Psychomotor abilities via touchpanel testing: Measurement innovations, construct, and criterion validity. Human Performance, 12, 231–273.
Ackerman, P. L., & Cianciolo, A. T. (2000). Cognitive, perceptual speed, and psychomotor determinants of individual differences during skill acquisition. Journal of Experimental Psychology: Applied, 6, 259–290.
Ackerman, P. L., & Kanfer, R. (1993). Integrating laboratory and field study for improving selection: Development of a battery for predicting air traffic controller success. Journal of Applied Psychology, 78, 413–432.
Ackerman, P. L., & Kanfer, R. (1994). Kanfer-Ackerman Air Traffic Controller Task CD-ROM database, data collection program, and playback program manual. Minneapolis: Author.
Ackerman, P. L., Kanfer, R., & Goff, M. (1995). Cognitive and noncognitive determinants and consequences of complex skill acquisition. Journal of Experimental Psychology: Applied, 1, 270–304.
Alderton, D. L., Wolfe, J. H., & Larson, G. E. (1997). The ECAT battery. Military Psychology, 9, 5–37.
Andrew, D. M., Paterson, D. G., & Longstaff, H. P. (1979). Manual for the Minnesota Clerical Test. New York: Psychological Corporation.
Bennett, G. K., Seashore, H. G., & Wesman, A. G. (1977). Differential Aptitude Tests manual. New York: Psychological Corporation.
Byrne, B. M. (1998). Structural equation modeling with LISREL, PRELIS, and SIMPLIS: Basic concepts, applications, and programming. Mahwah, NJ: Erlbaum.
Cantor, J., & Engle, R. W. (1993). Working-memory capacity as long-term memory activation: An individual-differences approach. Journal of Experimental Psychology: Learning, Memory, and Cognition, 19, 1101–1114.
Carroll, J. B. (1982). The measurement of intelligence. In R. J. Sternberg (Ed.), Handbook of human intelligence (pp. 29–120). Cambridge, MA: Cambridge University Press.
Carroll, J. B. (1993). Human cognitive abilities: A survey of factor-analytic studies. New York: Cambridge University Press.
Cobb, B. B., & Mathews, J. J. (1972). A proposed new test for aptitude screening of air traffic controller applicants (FAA-AM-72-18). Washington, DC: U.S. Department of Transportation, Federal Aviation Administration.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale, NJ: Erlbaum.
Cohen, J. (1990). Things I have learned (so far). American Psychologist, 45, 1304–1312.
Cureton, E. E., & Cureton, L. W. (1955). The Multi-Aptitude Test. New York: Psychological Corporation.
DiYanni, R. (1994). Literature: Reading fiction, poetry, drama, and the essay (3rd ed.). New York: McGraw-Hill.
Dunlap, W. P., Cortina, J. M., Vaslow, J. B., & Burke, M. J. (1996). Meta-analysis of experiments with matched groups or repeated measures designs. Psychological Methods, 1, 170–177.
Ebbinghaus, H. (1896–1897). Über eine neue Methode zur Prüfung geistiger Fähigkeiten und ihre Anwendung bei Schulkindern [On a new method for testing mental abilities and its use with school children]. Zeitschrift für Psychologie und Physiologie der Sinnesorgane, 13, 401–459. (Trans. by Wilhelm, 1999)
Ekstrom, R. B., French, J. W., Harman, H. H., & Derman, D. (1976). Kit of factor-referenced cognitive tests. Princeton, NJ: Educational Testing Service.
Fitts, P., & Posner, M. I. (1967). Human performance. Belmont, CA: Brooks/Cole.
Fleishman, E. A. (1954). Dimensional analysis of psychomotor abilities. Journal of Experimental Psychology, 48, 437–454.
Fleishman, E. A. (1956). Psychomotor selection tests: Research and application in the U.S. Air Force. Personnel Psychology, 9, 449–467.
Galton, F. (1885). Some results of the Anthropometric Laboratory. Journal of the Anthropological Institute, 14, 275–287.
Ghiselli, E. E. (1942). A comparison of the Minnesota Vocational Test for Clerical Workers with the general clerical battery of the United States Employment Service. Journal of Applied Psychology, 26, 75–80.
Ghiselli, E. E. (1966). The validity of occupational aptitude tests. New York: Wiley.
Goska, R. E., & Ackerman, P. L. (1996). An aptitude-treatment interaction approach to transfer within training. Journal of Educational Psychology, 88, 249–259.
Guilford, J. P., & Lacey, J. I. (Eds.). (1947). U.S. Army Air Forces Aviation Psychology Program research reports: Printed classification tests (Report No. 5). Washington, DC: U.S. Government Printing Office.
Hall, W. B., & Gough, H. G. (1977). Selecting statistical clerks with the Minnesota Clerical Test. Journal of Psychology: Interdisciplinary and Applied, 96, 297–301.
Hay, E. N. (1951). Mental ability tests in clerical selection. Journal of Applied Psychology, 35, 250–251.
Henly, S. J., Klebe, K. J., McBride, J. R., & Cudeck, R. (1989). Adaptive and conventional versions of the DAT: The first complete test battery comparison. Applied Psychological Measurement, 13, 363–371.
Hull, C. L. (1928). Aptitude testing. New York: World Book Company.
Jackson, D. N. (1985). Multidimensional Aptitude Battery. London, Ontario, Canada: Sigma Assessment Systems.
Jenkins, J. J. (1953). Some measured characteristics of Air Force weather forecasters and success in forecasting. Journal of Applied Psychology, 37, 440–444.
Jöreskog, K., & Sörbom, D. (2006). LISREL (Version 8.7) [Computer software]. Lincolnwood, IL: Scientific Software International.
Kane, M. J., Hambrick, D. Z., & Conway, A. R. A. (2005). Working memory and fluid intelligence are strongly related constructs: Comment on Ackerman, Beier, and Boyle. Psychological Bulletin, 131, 66–71.
Kanfer, R., & Ackerman, P. L. (1989). Motivation and cognitive abilities: An integrative/aptitude-treatment interaction approach to skill acquisition. Journal of Applied Psychology–Monograph, 74, 657–690.
Keil, C. T., & Cortina, J. M. (2001). Degradation of validity over time: A test and extension of Ackerman's model. Psychological Bulletin, 127, 673–697.
Kyllonen, P. C. (1988). Cognitive Abilities Measurement (CAM) Battery (Version 4.0) [Unpublished computer program].
Lehto, J. (1996). Are executive function tests dependent on working memory capacity? Quarterly Journal of Experimental Psychology, 49A, 29–50.
Levine, E. L., Spector, P. E., Menon, S., Narayanan, L., & Cannon-Bowers, J. (1996). Validity generalization for cognitive, psychomotor, and perceptual tests for craft jobs in the utility industry. Human Performance, 9, 1–22.
Lohman, D. F. (1979). Spatial ability: A review and reanalysis of the correlational literature (Tech. Rep. No. 8). Stanford, CA: Stanford University School of Education.
Mead, A. D., & Drasgow, F. (1993). Equivalence of computerized and paper-and-pencil cognitive ability tests: A meta-analysis. Psychological Bulletin, 114, 449–458.
Melton, A. W. (Ed.). (1947). Army Air Forces Aviation Psychology Program research reports: Apparatus tests (Report No. 4). Washington, DC: U.S. Government Printing Office.
Montanelli, R. G., Jr., & Humphreys, L. G. (1976). Latent roots of random data correlation matrices with squared multiple correlations on the diagonal: A Monte Carlo study. Psychometrika, 41, 341–348.
Moreno, K. E., Wetzel, C. D., McBride, J. R., & Weiss, D. J. (1984). Relationship between corresponding Armed Services Vocational Aptitude Battery (ASVAB) and computerized adaptive testing (CAT) subtests. Applied Psychological Measurement, 8, 155–163.
Oberauer, K., Süß, H.-M., Schulze, R., Wilhelm, O., & Wittmann, W. W. (2000). Working memory capacity facets of a cognitive ability construct. Personality and Individual Differences, 29, 1017–1045.
Oberauer, K., Schulze, R., Wilhelm, O., & Süß, H.-M. (2005). Working memory and intelligence–their correlation and their relation: Comment on Ackerman, Beier, and Boyle. Psychological Bulletin, 131, 61–65.
Ohnmacht, F. W., Weaver, W. W., & Kohler, E. T. (1970). Cloze and closure: A factorial study. Journal of Psychology, 74, 205–217.
Otis, J. L. (1938). The prediction of success in power sewing machine operating. Journal of Applied Psychology, 22, 350–366.
Raven, J. C., Court, J. H., & Raven, J. (1977). Raven's Progressive Matrices and Vocabulary Scales. New York: Psychological Corporation.
Rushton, J. P., Brainerd, C. J., & Pressley, M. (1983). Behavioral development and construct validity: The principle of aggregation. Psychological Bulletin, 94, 18–38.
Salthouse, T. A., & Ferrer-Caja, E. (2003). What needs to be explained to account for age-related effects on multiple cognitive variables? Psychology and Aging, 18, 91–110.
Sells, S. B., Dailey, J. T., & Pickrel, E. W. (Eds.). (1984). Selection of air traffic controllers (FAA-AM-84-2). Washington, DC: U.S. Department of Transportation, Federal Aviation Administration.
Taylor, W. L. (1953). "Cloze procedure": A new tool for measuring readability. Journalism Quarterly, 30, 415–433.
Thorndike, R. L. (1986). The role of general ability in prediction. Journal of Vocational Behavior, 29, 332–339.
Thorvaldsson, V., Hofer, S. M., & Johansson, B. (2006). Aging and late-life terminal decline in perceptual speed. European Psychologist, 11, 196–203.
Thurstone, L. L. (1944). A factorial study of perception. Psychometric Monographs, 4, 1–148.
Thurstone, L. L., & Thurstone, T. G. (1941). Factorial studies of intelligence. Psychometric Monographs, 2.
Thurstone, T. G. (1962). PMA (Primary Mental Abilities). Chicago, IL: Science Research Associates.
Turner, M. L., & Engle, R. W. (1989). Is working memory capacity task dependent? Journal of Memory and Language, 28, 127–154.
Wolfe, J. H. (1997). Incremental validity of ECAT battery factors. Military Psychology, 9, 49–76.
Kyllonen, P. C. (1985). Dimensions of information processing speed (AFHRL- Yoakum, C. S., & Yerkes, R. M. (1920). Mental tests in the American
TP-84 –56). Brooks Air Force Base, TX: Air Force Systems Command. Army. London: Sidgwick & Jackson, Ltd.

Appendix
Table A1
Test List for Study 1, Study 2, and Study 3

Test list Study 1 Study 2 Study 3

Single stimulus display


1. Name Comparison P&P PSCA/B PSC PSC
2. Number Comparison P&P PSCA/B PSC PSC
3. Letter/Number Substitution P&P PSCA/B PSC PSC
4. Naming Symbols P&P PSCA/B PSC PSC
5. Digit/Symbol P&P PSCA/B PSC PSC
6. Coding P&P PSCA/B PSC PSC
7. Factors of 7 P&P PSCA/B
8. Summing to 10 P&P PSCA/B
9. Directional Headings 1 P&P PSCA/B PSC PSC
10. Directional Headings 2 P&P PSCA/B PSC
11. CA-2 P&P PSCA/B
12. Number Sorting PSC
Multiple stimulus display
1. Canceling Symbols PSCA/B PSC
2. Finding a and t PSCA/B PSC
3. Summing to 10 PSCA/B PSC
4. Scattered X’s PSCA/B PSC
5. Factors of 7 PSCA/B PSC
6. Finding ∈ and ∞ PSCA/B PSC
Reference ability measures (all paper and pencil)
Verbal
1. MAB Similarities X X X
2. Word Beginnings X
3. MAB Comprehension X X X
4. Vocabulary X X X
5. Cloze X
Numerical/math
1. Math Knowledge X X X
2. Arithmetic (multi-ability) X X X
3. Number Series X X X
4. Math Approximation X X X
Spatial
1. Cube Comparison X X
2. Spatial Analogy X X X
3. Paper Folding X X X
4. Spatial Orientation X X X
5. Verbal Test of Spatial Ability X
6. Raven Advanced Progressive Matrices X
Working memory
1. ABCD Order X
2. Backward Digit Span X
3. Word Sentence X
4. Spatial Span X
5. Alpha Span X
6. Computation Span X
Psychomotor
1. Single Tapping X X X
2. Alternate Tapping X
3. 8-Choice Serial RT X X X
4. 4-Choice Serial RT X X X
5. 2-Choice Serial RT X X X

Note. P&P = paper and pencil; PSC = perceptual speed (computerized); A/B = test-retest administration.

Table A2
Test Descriptions

Perceptual speed
The description below of the Perceptual Speed Tests is excerpted from Ackerman, Beier, and Boyle (2002, p. 573).
Based on previous taxonomic research that has established four major factors of PS ability (e.g., see Ackerman & Cianciolo, 2000), we selected 16
perceptual speed tests to serve as markers for four PS factors: PS-Scanning, PS-Pattern Recognition, PS-Memory, and PS-Complex. Except where
indicated, the tests were locally developed (Ackerman & Cianciolo, 2000), and the initial administration of each test included three separate alternate-form parts, with durations of 1.5–2 min/part.
1. Name Comparison (identify identical or mismatched name pairs).
2. Number Comparison (identify identical or mismatched number pairs).
3. Letter/Number Substitution. Same as Digit/Symbol, but the stimuli were letters and numbers.
4. Naming Symbols (write in single letter code for 5 different simple figures).
5. Digit/Symbol (put numbers next to symbols corresponding to lookup key).
6. Coding (look up and circle a letter or number code for common words).
7. Factors of 7 (circle 2-digit numbers if they are exactly divisible by 7).
8. Summing to 10 (circle pairs of numbers if they sum to 10).
9 and 10. Directional Headings, Part I and Part II. This test of memory, perceptual encoding, and learning was modeled after a test designed by the
FAA Civil Aeromedical Institute (see Cobb & Mathews, 1972). Participants are given items that include a directional letter, an arrow, and a degree
heading (e.g., S ↓ 180). They must decide the direction implied by these indicators, or indicate that conflicting information is presented in the
item. In Part I, a conflict is defined as any mismatch of indicators. In Part II, the more complex version, a conflict exists only if two or more
indicators have a mismatch (both rules are sketched in code after this list). Two parts of the test were administered.
11. Number Sorting (find the largest of 5 large numbers).
12. Clerical Abilities-2 (CA-2) (Psychological Corporation). This test involves looking up names and numbers in tables (scanning). Verbal and
numerical content.
13. Canceling Symbols (scan page for a single target figure among other simple target figures).
14. Finding a and t (scan for instances of “a” and “t” in text passages [passages were in Italian]).
15. Finding 僆 and ¥ (same as Finding a & t, except text was random symbols).
16. Scattered Xs (Thurstone & Thurstone, 1941). This test involves searching pages of random letters for the five Xs; the examinee circles each X.
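To make the two Directional Headings conflict rules concrete, here is a minimal Python sketch. The direction mappings, function names, and the majority-rule reading of the Part II instruction are illustrative assumptions on our part; the article states the rules only verbally, and the original FAA scoring code is not described.

```python
# Hypothetical sketch of the Directional Headings conflict rules (Parts I and II).
# Mappings and the Part II interpretation are assumptions for illustration only.

LETTER = {"N": 0, "E": 90, "S": 180, "W": 270}
ARROW = {"up": 0, "right": 90, "down": 180, "left": 270}

def implied_directions(letter, arrow, degrees):
    """Translate the three indicators into compass headings (in degrees)."""
    return [LETTER[letter], ARROW[arrow], degrees % 360]

def part1_response(letter, arrow, degrees):
    """Part I: any mismatch among the three indicators is a conflict."""
    dirs = implied_directions(letter, arrow, degrees)
    return "conflict" if len(set(dirs)) > 1 else dirs[0]

def part2_response(letter, arrow, degrees):
    """Part II (assumed reading): an item is conflicting only when no two
    indicators agree; if two agree, respond with the majority heading."""
    dirs = implied_directions(letter, arrow, degrees)
    for d in set(dirs):
        if dirs.count(d) >= 2:
            return d
    return "conflict"

# "S down 180": all three indicators point south, so no conflict in either part.
assert part1_response("S", "down", 180) == 180
# "S up 180": Part I flags the mismatched arrow; Part II keeps the majority (south).
assert part1_response("S", "up", 180) == "conflict"
assert part2_response("S", "up", 180) == 180
```

Under this reading, Part II demands more than mismatch detection: the examinee must resolve which indicators agree before responding, which is consistent with the text's description of Part II as the more complex version.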
Verbal ability
1. Similarities. Multidimensional Aptitude Battery (MAB) similarities. This is a test of verbal knowledge. Each item presents two words and
participants must select the option that best describes how the two words are alike. This test has one part, with a time limit of 7 min (Jackson,
1985).
2. Word Beginnings. ETS Word Beginnings. This is a test of verbal fluency. Participants are given three letters and asked to produce as many words
beginning with those letters as time allows. This test has two parts; each part has a time limit of 3 min (ETS Kit: Ekstrom et al., 1976).
3. Comprehension. Multidimensional Aptitude Battery (MAB) comprehension. This is a test of common cultural knowledge. Each item asks for the
correct response to, or the rationale behind everyday situations, cultural conventions or practices. This test has one part, with a time limit of 7
min (Jackson, 1985).
4. Vocabulary. ETS Extended Range Vocabulary Test. This is a classic vocabulary test. Individuals are presented with a word and must choose the
option that most closely matches it in meaning. This test has two parts; each part has a time limit of 7 min (ETS Kit: Ekstrom et al., 1976).
5. Cloze. The cloze test was constructed from a passage selected from a college-level U.S. literature textbook (DiYanni, 1994, p. 55).
The passage was selected to be around 250 words in length (248 words in total). Following the technique originated by Taylor (1953),
a "structural" (Ohnmacht, Weaver, & Kohler, 1970) cloze test was constructed. This entailed leaving the first and last sentences of the passage
intact. Starting with the second sentence, every fifth word was deleted (regardless of its grammatical or contextual relationship) and replaced with
an underlined blank ten spaces long (this procedure is sketched in code after this list). The cloze test included 39 blanks. Participants were instructed to read through the passage and fill in the
blanks with the words that best fit the sentence. If participants did not know the exact word that fit a blank, they were instructed to
guess. Participants were given 10 min to complete the test. Credit was given for either the actual missing word or for a word that fit the gist of the
paragraph (and was grammatically correct in the context of the text).
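The structural cloze construction lends itself to a short illustration. The Python sketch below assumes simple period-based sentence splitting and whitespace tokenization; the original test was presumably constructed by hand, and all names here are hypothetical.

```python
# Minimal sketch of the "structural" cloze construction described above:
# keep the first and last sentences intact, and delete every 5th word of the
# interior sentences regardless of grammar, replacing it with a fixed blank.

def make_cloze(passage, nth=5, blank="__________"):
    """Return (cloze_text, answer_key) for a structural cloze over `passage`."""
    sentences = [s.strip() for s in passage.split(".") if s.strip()]
    first, interior, last = sentences[0], sentences[1:-1], sentences[-1]
    words = " ".join(interior).split()
    answers, out = [], []
    for i, word in enumerate(words, start=1):
        if i % nth == 0:          # every 5th word, regardless of grammar
            answers.append(word)
            out.append(blank)
        else:
            out.append(word)
    return f"{first}. {' '.join(out)}. {last}.", answers

text = ("The test begins here. A passage of roughly two hundred fifty words "
        "would follow in the actual materials. The last sentence stays intact.")
cloze, key = make_cloze(text)
print(cloze)   # blanks appear only in the interior sentence(s)
print(key)     # scoring key: the exact word, or a gist-correct word, earns credit
```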
Numerical/math ability
1. Math Knowledge (Lohman, see Ackerman & Kanfer, 1993). This is a wide range test of mathematical knowledge, from simple computation to
algebra, geometry, and other advanced topics. The test had one part of 32 items, with a 12 min time limit.
2. Arithmetic (multi-ability; Cureton & Cureton, 1955). This test presents a series of math problems requiring a variety of operations such as adding
two fractions and reducing to the lowest term. The most difficult problems require more than one operation. Two parts of 10 items each were
administered with a time limit of 4 min for each part.
3. Number Series. PMA Number Series. This is a test of inductive reasoning in which a series of numbers generated by a rule is provided, and the next
number in the series is to be identified (a toy item generator is sketched after this list). The test has one part, with a time limit of 4 min (Thurstone, 1962).
4. Math Approximation (locally developed). This test was modeled after the numerical approximation test described in Guilford and Lacey (1947; test
number CI706A). Each problem requires that the examinee arrive at an estimated answer and then choose from among five possible answers.
This test had two parts of 20 items each, with a short time limit (4.5 min/part) to discourage exact computations.
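As a concrete illustration of the Number Series item format, the sketch below generates a series from a rule and keys the next term as the answer. The rule shown is invented for illustration; the actual PMA items are proprietary and are not reproduced here.

```python
# Hypothetical Number Series item generator: a rule produces the visible
# series, and the keyed answer is the next term the rule would generate.

def make_series_item(start, rule, length=5):
    """Return (visible_terms, keyed_answer) for a rule-generated series."""
    terms = [start]
    for _ in range(length):
        terms.append(rule(terms[-1]))
    return terms[:-1], terms[-1]

# Example rule (invented): add 3 to each term.
visible, answer = make_series_item(2, lambda x: x + 3)
print(visible, "->", answer)   # [2, 5, 8, 11, 14] -> 17
```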
Spatial ability
1. Cube Comparisons (ETS Kit: Ekstrom et al., 1976). Items in this test illustrate a pair of six-sided cubes, displaying three sides of each cube. Each
side is defined as having a different design, letter, or number. For each pair, the task is to determine whether the blocks could be the same or
must be different, based on possible rotations and constancy of the markings. This test had two parts with 21 items in each part, and a time limit
of 3 min/part.
2. Spatial analogy. A four-term multiple choice test of analogical reasoning with spatial content, similar in structure to verbal analogy tests (i.e.,
A:B::C: a,b,c,d). This test has one part, with a time limit of 9 min (created by P. Nichols; see Ackerman & Kanfer, 1993).
3. Paper Folding (Lohman, see Ackerman & Kanfer, 1993). This test is an adaptation of other classic tests of the same name (e.g., see Ekstrom et al.
1976). Two parts with 12 items each and a time limit of 6 min/part were administered.
4. Spatial Orientation (Lohman, see Ackerman & Kanfer, 1993). This is a test of three-dimensional visualization. Subjects are required to imagine a
block figure, as seen from a different perspective. Two 10-item parts of this test were administered, with a time limit of 2.5 min/part.
5. Verbal Test of Spatial Ability (Lohman, see Ackerman & Kanfer, 1993). This is a test of image generation and manipulation. Subjects are asked to
close their eyes and imagine the items described verbally. Then they are asked a multiple choice question about the items in the image. This test
had one part of 24 items, and is experimenter-paced. Each item takes about 10 sec for the item presentation and 20 sec of allowed response time.
Total completion time is 12 min.
6. Raven's Advanced Progressive Matrices (I + II; Raven, Court, & Raven, 1977). A test of inductive reasoning. Participants are given an item that
contains a figure (with three rows and three columns) with the lower right-hand entry cut out, along with eight possible alternative solutions.
Participants choose the solution that correctly completes the figure (across rows and columns). The test had two parts, a brief Part I (12 items and a
5-min time limit) and a longer Part II (36 items and a 40-min time limit).
Working memory ability
This battery of tests was used in Ackerman, Beier, and Boyle (2002), and the description of the specific tests is excerpted from that source (pp. 571–572).
Working memory tests
Six commonly used working memory tests were adapted for administration in the current study. The tests included a sampling of stimuli that
represent alphabetic (words), numeric, and spatial content. Tests other than ABCD Order were composed of three trials at each set size. Each test
was preceded by task instructions and at least one example, and trials within task were separated by a 2 s fixation screen.
1. ABCD Order: Two categories were used, with five one-syllable words in each category (example: the category “trees” contained member
words birch, elm, fir, oak, and pine). Three study frames were displayed for 5 s each. The first frame indicated the order of two members
from the same category (e.g., “The pine comes before the elm”); the second frame indicated the order of two members from the second
category (e.g., “The rice comes after the beans”); and the third frame indicated the order of the categories (e.g., “The trees come before the
food”). After the third study screen, an eight-choice answer screen was displayed from which participants selected the correct order of the
words. Participants were allowed 15 s to enter a response (in this example, the correct order is “pine elm beans rice”). The use and ordering
of category members were balanced across items, as were the variations of order (i.e., comes before, comes after, does not come before,
does not come after). To increase difficulty after observing a ceiling effect in pilot testing, two categories and related members were used for
Items 1–12 and two different categories and members were used for Items 13–24. This test was modeled after the ABCD order test used in
the CAM Battery (Kyllonen, 1988). Each item was equally weighted for scoring purposes.
2. Backward Digit Span: Digits from one through nine were presented auditorily at the rate of one digit per second while a “Listen” screen was
displayed. At the auditory “Recall” signal, participants were allowed 15 s to type in reverse order the digits presented. Set size ranged from
three to eight digits (18 trials total). Digits were randomly grouped to form trials, with the restriction that digits did not repeat within a trial.
This test was similar to one administered by Oberauer et al. (2000), except that in our version the stimuli were presented auditorily rather
than by computer display. Scoring was comparable to the Alpha Span task.
3. Word-Sentence Span: This test included a sentence verification task and a recall task. Participants were first presented with a common word to
study for 2 s for later recall (e.g., "cross" or "train"). Participants were then asked to verify (T/F) the correctness of a sentence displayed for
a maximum of 6 s. Recall words and verification sentences alternated through the trial, at the end of which participants were prompted to
recall and enter the first two letters of each recall word in the order presented. The test was modeled after one included in the CAM battery
(Kyllonen, 1988). Each sentence contained between five and eight words and was of medium length as compared to similar tasks that require
recall of the last word of the sentence (see, e.g., Lehto, 1996; Oberauer et al., 2000; Turner & Engle, 1989). Sentences were selected to be
easily judged as true or false and to require only common knowledge (e.g., "A canoe is powered by gasoline"). Recall words and
verification sentences were randomly grouped from a stimulus pool to form trials. Set size ranged from two to six words/sentences (15 trials
total). Credit for a perfect trial was given if the first two letters of all study words were recalled in the correct order, and perfect trials were
weighted by set size to compute a final score.
4. Spatial Span: For the secondary task, a three-by-three matrix containing between two and seven red Xs and one blue circle was displayed.
Participants were allowed 7 s to make an odd (O) or even (E) judgment about the number of Xs presented in each stimulus. The recall task
was to identify the pattern of blue circles formed across the stimuli, selecting from a multiple choice response screen of four nine-cell
matrices with different combinations of blue circles. Participants were allowed 7 s to provide a recognition response. Set size ranged from
two to seven stimuli and blue circles in the final configuration (18 trials total). Credit for a perfect trial was given if the correct matrix of
circles was identified, and perfect trials were weighted by set size in computing the final score.
5. Alpha Span: A list of common, one-syllable words was presented auditorily at the rate of one word per second while a “Listen” screen was
displayed. At the auditory “Recall” signal, participants were allowed 15 s to type in alphabetical order the first letter of each of the words
presented. Set size ranged from three to eight words (18 trials total). Words were randomly selected from a stimulus pool without replacement
to form trials, with the restriction that words with the same first letter were not presented together. This test was modeled after the Alpha
Span task used in Oberauer et al. (2000). Credit for a perfect trial was given if all first letters were recalled in the correct order, and perfect
trials were weighted by set size in computing the final score.
6. Computation Span: This test included a verification task and a recall task. Participants were allowed 6 s to verify (T/F) the accuracy of a math
equation and were instructed to remember the displayed solution, regardless of its accuracy. After the final equation of the trial was
displayed, participants were prompted to recall in order each of the presented solutions from the equations (e.g., "Enter the 1st digit").
Each math equation included two operations using digits from 1 through 10, and the provided and actual solutions were always single-digit
numbers (e.g., "(10/2) - 4 = 9"). This task was constructed as a variation of the Computation Span task used in Oberauer et al. (2000),
using slightly more difficult equations derived from stimuli used in Cantor and Engle (1993). We restricted our equation elements to be no
greater than 10 and limited our solutions to one-digit numbers. Equations were randomly grouped from a stimulus pool to form trials. Set
size ranged from three to seven equations/solutions (15 trials total). Credit for a perfect trial was given if all digits were recalled in the
correct order, and perfect trials were weighted by set size in computing the final score (this weighting is sketched in code after this list).
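Because four of the span tests (Word-Sentence, Spatial, Alpha, and Computation Span) share the same all-or-none, set-size-weighted scoring rule, a small sketch may help. The data layout and names below are hypothetical; only the verbal rule comes from the text.

```python
# Sketch of the set-size-weighted scoring used by several of the span tests:
# a trial earns credit only if it is perfectly correct, and a perfect trial
# contributes its set size (not a single point) to the total score.

def weighted_span_score(trials):
    """trials: list of (set_size, responded_sequence, keyed_sequence) tuples."""
    score = 0
    for set_size, responded, keyed in trials:
        if responded == keyed:        # all-or-none credit per trial
            score += set_size         # larger sets count for more
    return score

# Three computation-span-style trials: two perfect, one with an order error.
trials = [
    (3, ["1", "4", "9"], ["1", "4", "9"]),
    (4, ["2", "7", "5", "8"], ["2", "7", "5", "8"]),
    (5, ["3", "6", "1", "9", "2"], ["3", "6", "9", "1", "2"]),
]
print(weighted_span_score(trials))   # 3 + 4 = 7; the imperfect 5-item trial earns 0
```

Weighting by set size rewards success on longer trials more heavily than a simple count of perfect trials would, which matches the tests' emphasis on capacity.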
Psychomotor abilities
(Excerpted from Ackerman & Cianciolo, 2000)
Psychomotor tests: procedure
Tapping. Two tapping tests were administered: a Single Tapping and an Alternate Tapping test.
1. Single Tapping. Examinee was presented with a single target square and was instructed to tap it as rapidly as possible with the TouchPen.
2. Alternate Tapping. Examinee was presented with two target squares and was instructed to tap them as rapidly as possible in alternating order
with the TouchPen.
Both tapping tasks began with a variable 1000 to 2000 msec hold on a home key, followed by a change in target square color signaling trial
commencement, then feedback (number of correct taps per trial, number of error taps per trial). All target squares were of equal size, 2.4 ×
2.4 cm. Performance was measured as the number of correct taps within the trial time limit of 15 seconds. One block of 5 trials was
administered for each test.
Serial RT. In the Serial RT paradigm, the examinee was instructed to press all of the stimulus squares in numerical order.
3. Eight-Item Serial RT. Examinee was presented with 8 target squares arranged equidistant from the “home key” in a circular pattern.
4. Four-Item Serial RT. Same as 8-item Serial RT, with only 4 squares of equal size.
5. Two-Item Serial RT. Same as the 8-item Serial RT, with only 2 squares of equal size.
All Serial RT tasks began with a variable 400 to 800 msec hold on a home key, followed by the random numbering of all target squares, then
feedback (same as the Choice RT feedback). All target squares were of equal size, 2.4 × 2.4 cm. Two blocks of 25 trials were administered
for each test. Performance, measured as mean "RT" (which is actually a total completion time) in ms, was calculated for correct responses only
(Ackerman & Cianciolo, 2000, pp. 276–277). Both performance scores are sketched in code below.
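The two psychomotor performance scores reduce to simple aggregates; the sketch below illustrates both under an assumed data layout (the original scoring software is not described at this level, and all names are hypothetical).

```python
# Sketch of the two psychomotor performance scores described above: tapping is
# scored as correct taps per 15 s trial (averaged over the block of trials), and
# Serial RT as mean trial completion time in ms over correct trials only.

def tapping_score(correct_taps_per_trial):
    """Mean number of correct taps across the block of 15 s tapping trials."""
    return sum(correct_taps_per_trial) / len(correct_taps_per_trial)

def serial_rt_score(trials):
    """trials: list of (completion_time_ms, all_presses_correct) tuples.
    Returns mean completion time over correct trials only."""
    correct_times = [t for t, ok in trials if ok]
    return sum(correct_times) / len(correct_times)

print(tapping_score([52, 55, 51, 54, 53]))                           # 53.0 taps/trial
print(serial_rt_score([(2100, True), (2600, False), (1950, True)]))  # 2025.0 ms
```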

Received December 31, 2006
Revision received March 14, 2007
Accepted April 5, 2007
