
ERGONOMICS, 2000, VOL. 43, NO. 1, 73-105

Development of physical selection procedures for the British Army.
Phase 2: Relationship between physical performance tests and criterion tasks

M. RAYSON*, D. HOLLIMAN and A. BELYAVIN


Centre for Human Sciences, Defence and Evaluation Research Agency,
Farnborough GU14 OLX, UK

Keywords: Personnel selection; Military personnel; Physical fitness; Employment
standards; Manual material handling.

This paper is the second in a series of three to describe the development of physical
selection standards for the British Army. The first paper defined criterion tasks
(single lift, carry, repetitive lift and carry, and loaded march tasks) and set
standards on the criterion tasks for all British Army trades. The principal objective
was to determine which combination of physical performance tests could be best
used to predict criterion task performance. Secondary objectives included
developing so-called 'gender-free' and 'gender-unbiased' models. The objectives
were met by analysing performance data on the criterion tasks and a large battery
of physical performance tests collected from 379 trained soldiers (mean age 23.5
(SD 4.45) years, stature 1734 (SD 79.5) mm, body mass 71.4 (SD 10.58) kg).
Objective 1 was met: the most predictive physical performance tests were identified
for all criterion tasks. Both single lift tasks were successfully modelled using muscle
strength and fat free mass scores. The carry model incorporated muscle endurance
and body size data, but the errors of prediction were large. The repetitive lift
models included measures of muscle strength and endurance, and body size, but
errors of prediction were also large. The loaded march tasks were successfully
modelled incorporating indices of aerobic fitness, supplemented by measures of
strength, endurance or body size and composition. The secondary objectives were
partially fulfilled, though limitations in the data hampered the process. Although
only one model (a loaded march) was gender-free, three models were gender-related
(i.e. contained 'gender' explicitly in the model). The remaining six were
gender-specific (i.e. were appropriate for men or for women). Owing to both a
lower accuracy of prediction in women's scores and a greater tendency for the
women's scores to be distributed around the pass standards, a greater percentage
of women than men were misclassified as passing or failing, resulting in indirect
discrimination. A validation of the models in a separate sample of the user
population of recruits is reported in the third paper in this series.

1. Introduction
1.1. Background
This paper is the second in a series of three that describes the phases involved in
developing physical selection standards for the British Army (Rayson 1997). The first
phase described the methods and findings from a job analysis and identified criterion
tasks that could be used as the basis for developing physical selection standards
(Rayson 1998). The criterion tasks reflected the findings from a job analysis, overlaid
by subject-matter expert opinion and logistical constraints, and comprised single lift,
carry, repetitive lift and loaded march tasks. Soldiers in all trades were allocated to
one of three levels (referred to as levels 1, 2 or 3), which represented the required
standards or acceptance criteria. The criterion tasks and levels are summarized in
table 1. For example, an Infantryman was required to achieve level 1 on the single lift
and carry, level 2 on the repetitive lift, and level 1 on the loaded march.

*Author for correspondence at: Optimal Performance Ltd, Old Chambers, 93-94 West
Street, Farnham GU9 7EB, UK. e-mail: markrayson@cwcom.net

Ergonomics ISSN 0014-0139 print/ISSN 1366-5847 online © 2000 Taylor & Francis Ltd
http://www.tandf.co.uk/journals/tf/00140139.html

Ideally, recruits would be tested directly on the criterion tasks, ensuring a high
content validity. This is not usually practicable, however, for reasons of safety
(e.g. a 44 kg box lift to 1.70 m in novices may not be regarded as acceptable), skill
requirements (e.g. a repetitive 10 kg box lift at six lifts min⁻¹ requires skill as
well as fitness) and logistics (a 2 h march is not viable as a selection test).
Surrogate tests are therefore used to predict performance in many of the armed and
unarmed services. This paper describes the relationships between physical performance
tests and the criterion tasks, potentially enabling a battery of physical selection
tests to be chosen and used for selecting personnel who have the physical
prerequisites for the various British Army jobs.
For the purposes of selecting a battery of potential physical selection tests,
measurements of physical fitness, body size and body composition were considered.
In the context of physical selection, physical fitness may be defined as the physical
capacity to meet the demands of the occupation. Thus, physical capacity will reflect
both innate capability and training-induced enhancement of performance.
Physical fitness may be classified into anaerobic and aerobic components.
Anaerobic fitness refers to the muscle's ability to produce energy without the
presence of oxygen via phosphate compounds stored in the muscle cells or by
anaerobic glycolysis. The size and speed of contraction of the muscle will be the
primary determinants of anaerobic fitness (Jones and Round 1992). Aerobic fitness
refers to the ability of the heart, lungs and circulation to allow the muscle to work
aerobically by supplying oxygen and carbohydrate and fat substrates (Astrand and
Rodahl 1986).

Table 1. Criterion tasks and levels of performance.

| Criterion task/level | Single lift (ammunition box) | Carry (2 × 20 kg water cans) | Repetitive lift and 10 m carry (ammunition box) | Loaded march (12.8 km in 120 min) |
|----------------------|------------------------------|------------------------------|-------------------------------------------------|-----------------------------------|
| 1                    | 44 kg to 1.70 m              | 210 m                        | 44 kg, ground to 1.45 m, 1 min⁻¹ for 20 min     | 25 kg load                        |
| 2                    | 35 kg to 1.45 m              | 90 m                         | 22 kg, ground to 1.45 m, 3 min⁻¹ for 15 min     | 20 kg load                        |
| 3                    | 20 kg to 1.45 m              | 30 m                         | 10 kg, ground to 1.45 m, 6 min⁻¹ for 10 min     | 15 kg load                        |

1.2. Relationship between criterion task and physical performance test performance
The relative importance which different aspects of physical capability have on task
performance depends upon the task in question, the individual performing the task,
and the environment in which it is performed. There have been numerous studies
which have reported on the association between lifting, carrying and marching tasks
(e.g. Sharp et al. 1980, Teves et al. 1985, Beckett and Hodgdon 1987, Nottrodt and
Celentano 1987, Mello et al. 1995, Stevenson et al. 1992, 1994), which are broadly
similar to the criterion tasks identified for this project (Rayson 1998), and a variety
of physical performance tests.
Many studies investigating material handling performance have employed
psychophysical methodology (see Ayoub and Mital 1989 for a review). The majority
of these studies have been omitted from this review as the psychophysical approach is
invalid for assessing true maximal working capability (Dueker et al. 1994).
Knowledge of maximal working capability is essential to the development of
physical selection standards for occupations where physical performance is of
paramount importance. Where more objective evidence of maximum task
performance is lacking, for example for repetitive lift tasks, brief mention is made
of selected psychophysical studies to add to the knowledge base.
Useful clues may be gleaned from published data that assist the process of
assembling a battery of potentially predictive physical performance tests. However,
caution must be exercised in interpreting the applicability of the data from previous
studies where differences exist between the task protocols, the populations under
investigation, the composition of the test batteries and the manner in which gender
was accounted for (i.e. whether the impact of gender was explored when developing
the relationships).

1.3. Predictors of single lift performance


It is readily apparent from the literature that a number of different components of
physical capability are related to brief or single lift performance (Poulsen 1970,
Sharp et al. 1980, Pytel and Kamon 1981, Ayoub et al. 1982, Teves et al. 1985,
Beckett and Hodgdon 1987, Nottrodt and Celentano 1987, Dueker et al. 1994).
These include body size and composition, static and dynamic strength/power, and
static and dynamic endurance.
Measures of body size and composition were not included frequently or
comprehensively in these studies, perhaps because they are not among the best
predictors of lifting capacity. However, fat free mass (FFM) (Sharp et al. 1980, Teves
et al. 1985, Beckett and Hodgdon 1987, Nottrodt and Celentano 1987) consistently
featured among the best predictors of lifting. Body mass and bi-iliac breadth
(Nottrodt and Celentano 1987) were also cited among the better predictors.
Tests of static and dynamic strength have been used most frequently to predict
material handling performance and there has been considerable debate in the
scientific literature as to their relative merits. Some authors (e.g. Chaffin 1974) favour
static tests for reasons of reliability, ease and safety. But material handling comprises
dynamic as well as static muscle contractions involving inertial forces and this has
encouraged a recent trend towards measurement of dynamic strength and simulated
tasks. When these movements are taken into account (e.g. during dynamic strength
measurement) higher correlation coefficients usually result and fewer independent
variables are required to predict performance (Mital et al. 1986b, Ayoub and Mital
1989). For these reasons, dynamic strength testing is favoured by many authors
(Aghazadeh and Ayoub 1985, Pytel and Kamon 1981, Kamon et al. 1982, Kroemer
1983, 1985, Mital 1985, Mital et al. 1986a, b).
The studies reviewed for this paper indicate the value of both static and dynamic
strength tests. Six of the studies included both types of strength test. Of these, three
studies reported the importance of both types of test (Aghazadeh and Ayoub 1985,
Teves et al. 1985, Nottrodt and Celentano 1987) while the remaining three studies
reported dynamic strength tests to be superior predictors (Ayoub et al. 1982, Beckett
and Hodgdon 1987, Mello et al. 1995). The relative superiority of dynamic versus
static tests may depend upon the range and speed of movement and the joint angles
involved in executing the task.
Static strength tests that were strongly associated with lift performance included
upright pull (Sharp et al. 1980, Teves et al. 1985, Nottrodt and Celentano 1987),
back (Poulsen 1970), arm and shoulder strength (Poulsen 1970, Aghazadeh and
Ayoub 1985). Dynamic strength/power tests strongly associated with lift performance
included the Incremental Lift Machine (ILM) test (Ayoub et al. 1982, Teves et
al. 1985, Beckett and Hodgdon 1987, Nottrodt and Celentano 1987), vertical and
broad jump (Beckett and Hodgdon 1987), and isokinetic back extension (Pytel and
Kamon 1981) and isokinetic lift power (Pytel and Kamon 1981, Aghazadeh and
Ayoub 1985). Two endurance tests stood out as being among the better predictors of
the single lifting tasks: a 70 lb hold at elbow height (Ayoub et al. 1982), and push-ups
(Beckett and Hodgdon 1987).
The strength of the relationships between test and task scores can be assessed, in
part, by the magnitude of the correlation coefficients (r) which were cited in all of the
reviewed studies. Caution must be exercised when comparing r between studies as it
is a function of the range of data as well as the strength of the relation between the
two variables. Some of the studies presented r for the genders separately, while
others presented combined data.
The highest r (0.87-0.96) for the genders separately were reported by Pytel and
Kamon (1981) between isokinetic strength scores and lift scores. Nottrodt and
Celentano (1987) also reported high single-gender r between a 1.33 m lift and lean
body mass in females (0.76) and the ILM test in males (0.71). The highest pooled-gender
r were between lift performance and ILM (0.77, Myers et al. 1984; 0.65, Teves
et al. 1985), and between lift performance and FFM (0.88, Sharp et al. 1980; 0.74,
Myers et al. 1984; 0.66, Teves et al. 1985). Upright pull (0.76, Sharp et al. 1980; 0.64,
Teves et al. 1985), and hand grip strength (0.80, Sharp et al. 1980) also correlated
highly. The value of r dropped significantly (typically to 0.3-0.5) when the genders
were analysed separately (e.g. Myers et al. 1984, Teves et al. 1985). Usually, r was
higher for men than for women (e.g. Myers et al. 1984, Teves et al. 1985).
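The dependence of r on the range of the data, and the fall in r when pooled-gender data are split, can be illustrated with a small numerical sketch. The values below are synthetic and chosen purely for illustration (they are not taken from any of the cited studies), and `pearson_r` is a helper written for this example rather than a function from any particular package.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Synthetic scores (assumption: x is a strength test score, y a maximal lift score).
# The two groups differ in their means but share the same weak within-group trend.
group_a_x = [50, 55, 60, 65, 70]
group_a_y = [42, 38, 45, 40, 44]
group_b_x = [80, 85, 90, 95, 100]
group_b_y = [62, 58, 65, 60, 64]

r_a = pearson_r(group_a_x, group_a_y)
r_b = pearson_r(group_b_x, group_b_y)
r_pooled = pearson_r(group_a_x + group_b_x, group_a_y + group_b_y)

print(f"group A r = {r_a:.2f}, group B r = {r_b:.2f}, pooled r = {r_pooled:.2f}")
```

In this sketch the pooled coefficient is high mainly because the two groups differ in their mean scores, which widens the range of the data. It says little about how well the test ranks individuals within either group.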
Several of the studies cited correlation coefficients squared (r²) indicating the
amount of variability accounted for by the physical performance tests, derived from
pooled-gender data in multiple regression models (Sharp et al. 1980, Teves et al.
1985, Beckett and Hodgdon 1987, Nottrodt and Celentano 1987). The latter
produced the model accounting for the most variation, citing an r² = 0.88 for the ILM
test and FFM. When the genders were separated, r² dropped to 0.68 for men and
0.65 for women. These values were still substantially higher than those derived from
the data of Teves et al. (1985), which produced r² = 0.47 for pooled gender data (also
from the ILM test and FFM), 0.33 for men and 0.11 for women. The best model
produced by Beckett and Hodgdon (1987) incorporated vertical jump and push-ups
(r² = 0.82), though the ILM test alone produced an r² = 0.79, and FFM and push-ups
an r² = 0.77. The model of Sharp et al. (1980) included FFM, upright pull and gender,
producing an r² = 0.79. There was a reasonable consistency among the physical
performance tests retained in the multiple regression models (primarily a
combination of FFM and a strength test). The models with their coefficients are
shown in table 2.

1.4. Predictors of repetitive lift performance


There have been few studies investigating repetitive or sustained lifting performance
that have not used psychophysical methodology. Mello et al. (1995) investigated a
repetitive lift task involving the maximum weight male and female participants could
lift from the ground, carry 10 m and lift to a platform at 1.32 m at 1 and 4
lifts min⁻¹ for 1 h. At 1 lift min⁻¹, mean arm power from a Wingate test and bench
press provided the largest correlation coefficients (0.90 and 0.88 respectively). At 4
lifts min⁻¹, FFM of the arm (r = 0.83), total FFM (0.82), sex (0.82) and stature
(0.81) were the most highly correlated tests with maximum weight of lift. The best
equation for the maximum weight at 1 lift min⁻¹ incorporated stature and bench
press and the maximum weight at 4 lifts min⁻¹ incorporated stature, FFM of the arm
and total FFM (both r² = 0.88). The models are shown in table 2.
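As an illustration of how such a prediction equation is applied, the sketch below evaluates the pooled-gender model of Mello et al. (1995) for 1 lift min⁻¹, using the coefficients listed in table 2. The units are an assumption made for this example (stature in cm, bench press in kg, predicted load in kg), since table 2 does not state them, and the input values are hypothetical.

```python
def predict_mrlc_1_lift_per_min(stature_cm, bench_press_kg):
    """Predicted maximum repetitive lift load at 1 lift per minute.

    Coefficients are those listed in table 2 for Mello et al. (1995).
    Assumption: stature in cm, bench press in kg, result in kg.
    """
    return -32.8 + 0.28 * stature_cm + 0.22 * bench_press_kg


# Hypothetical participant, for illustration only.
predicted_kg = predict_mrlc_1_lift_per_min(stature_cm=175, bench_press_kg=70)
print(f"Predicted maximum load: {predicted_kg:.1f} kg")  # prints 31.6 kg
```

An r² of 0.88 means that 12% of the variation in the task score is still unexplained, so an individual prediction of this kind carries an error that the equation alone does not show.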

1.5. Predictors of carry performance


Measurements of anthropometry, body composition, static and dynamic strength,
static and dynamic endurance and aerobic fitness (i.e. all aspects of physical fitness)
were among the best predictor tests in the seven studies reviewed (Myers et al. 1984,
Stevenson et al. 1985, 1988, 1992, 1994, Beckett and Hodgdon 1987, Rice and Sharp
1994).
Measurements of body size and composition were rarely investigated. Stature
was the only measure that appeared among the best predictors, albeit in just one of
the two studies in which it was measured (Rice and Sharp 1994). FFM was
identified as among the best predictors in only one of the three studies in which it
was measured (Myers et al. 1984). Isometric hand grip and upright pull were the
most predictive static strength tests. Hand grip appeared among the best predictors
in five of the six studies in which it was measured (Stevenson et al. 1985, 1988,
1992, 1994, Rice and Sharp 1994). Upright pull appeared as a good predictor in
one of the two studies in which it was measured (Rice and Sharp 1994). Dynamic
lift strength measured on the ILM was a strong predictor in four of the five studies
in which it was measured (Myers et al. 1984, Beckett and Hodgdon 1987,
Stevenson et al. 1988, Rice and Sharp 1994). Broad jump was also among the best
predictors in a single study (Beckett and Hodgdon 1987). Among the endurance
tests, isometric hand grip (Stevenson et al. 1985, 1988), flexed arm hang (Stevenson
et al. 1985, 1988), push-ups (Stevenson et al. 1985, 1988, 1994) and sit-ups
(Stevenson et al. 1992) appeared among the best predictors. Measures of aerobic
fitness (V̇O2max and run time) were also consistently among the better predictors
(Beckett and Hodgdon 1987, Stevenson et al. 1988, 1992, 1994, Rice and Sharp
1994).
Values of r between test scores and carry task scores were generally lower than those
found for the lift tasks. The highest single-gender r (0.74 for hand grip strength, 0.76
for 2-mile run) were reported by Rice and Sharp (1994), which was promising as this task
approximated most closely the carry criterion task under scrutiny in this study. The
data of Stevenson et al. (1988, 1994) also produced a few r > 0.6 for V̇O2max and
push-ups in a sandbag carry task in older men, and hand grip strength and hand grip
endurance in a stretcher carry in older women.

Table 2. Multiple regression models for predicting criterion tasks from the literature. Presented from the reviewed literature are: the single lift (MLC,
maximum lift capacity), repetitive lift (MRLC, maximum repetitive lift capacity), carry and loaded march performance models from various
population samples.

| Criterion task | Author | Best multiple variable models | Sample | r² |
|----------------|--------|-------------------------------|--------|----|
| MLC132 | Sharp et al. (1980) | -8.466 + 0.9933 * FFM + 0.006349 * upright pull - 4.777 * gender | Pooled | 0.79 |
| MLC132 | Teves et al. (1985) | -0.55 + 0.87 * FFM + 0.55 * ILM183 | Pooled | 0.47 |
| MLC elbow | Beckett and Hodgdon (1987) | 5.762 + 0.029 * broad jump * body mass + 0.297 * push-ups | Pooled | 0.82 |
| MRLC132 1 lift/min | Mello et al. (1995) | -32.8 + 0.28 * stature + 0.22 * bench press | Pooled | 0.88 |
| MRLC132 4 lift/min | Mello et al. (1995) | -57.3 + 0.53 * stature + 4.1 * armFFM - 0.7 * FFM | Pooled | 0.88 |
| No. of 34 kg boxes carried 51.4 m in 2 × 5 min | Beckett and Hodgdon (1987) | 373 - 10.214 * 1.5 mile run + 0.029 * broad jump * body mass | Pooled | 0.53 |
| | | 372 - 944 * 1.5 mile run + 0.697 * ILM152 | Pooled | 0.53 |
| Timed 750 m simulated stretcher carry | Stevenson et al. (1988) | 2923 - 16.55 * hand grip - 24.81 * V̇O2max | Females > 34 years | 0.57 |
| No. of 20 kg sandbags carried 50 m in 10 min | Stevenson et al. (1988) | -2.18 + 0.23 * V̇O2max + 0.07 * ILM + 0.01 * hand grip endurance | Males > 34 years | 0.54 |
| | | 0.28 + 0.24 * V̇O2max + 0.02 * hand grip - 0.05 * flexed arm hang | Females > 34 years | 0.61 |
| Maximum load at 6.4 km h⁻¹ | Rayson et al. (1993) | 25.3 * V̇O2max (l min⁻¹) + 0.11 * ankle plantar flexion torque + 0.89 * age - 1.22 * body fat | Females | 0.71 |

Similarly, the multiple regression carry models accounted for less of the variation
than had been found in the lift models. Many of the single-gender models had such
low r² that they were unusable (e.g. Myers et al. 1984, Stevenson et al. 1985, 1992).
However, two of the studies produced models with moderate r² (> 0.5) (table 2). A
measure of aerobic fitness combined with a strength- or muscular endurance-related
measure featured in each equation.

1.6. Predictors of march performance


The test batteries were reasonably comprehensive in the five studies reviewed
(Dziados et al. 1987, Mello et al. 1988, Knapik et al. 1990, Rayson et al. 1993,
Frykman and Harman 1995), encompassing all aspects of physical capability.
Interestingly, it appeared that all aspects of physical capability, including
anthropometry and body composition, strength, endurance and aerobic power
provided the best predictors of march performance.
Some of the studies included a limited number of measurements of body size.
Shoulder diameter (Frykman and Harman 1995) and stature (Rayson et al. 1993)
were the only two measurements to appear among the best predictors. Despite the
widespread measurement of body composition, percentage body fat (Rayson et al.
1993) and FFM (Knapik et al. 1990) were among the best predictors in only two of
the reviewed studies.
Among the strength tests, several isometric and isokinetic variables were among
the best predictors, including isometric upper torso, hand grip and trunk flexion
strength (Knapik et al. 1990), isokinetic knee flexion (Dziados et al. 1987, Mello et al.
1988), knee extension strength (Mello et al. 1988), and plantar flexion strength
(Rayson et al. 1993).
Several of the studies included selective measurements of muscle endurance of the
knee flexors and extensors. Isokinetic knee flexion endurance (Mello et al. 1988) and
squat endurance (Frykman and Harman 1995) were among the best predictors.
Given the endurance nature of a sustained marching task it was surprising to find
that measures of muscle strength were often superior predictors to measures of
muscle endurance. This topic is addressed further in the Discussion.
Measurements of aerobic fitness were among the best predictors in four of the
five studies (Dziados et al. 1987, Knapik et al. 1990, Rayson et al. 1993, Frykman
and Harman 1995). The highest r was recorded between march performance and
V̇O2max (Frykman and Harman 1995) with a value of 0.84. However, the distance of
this march task was only 3 km. Three additional studies incorporating march
distances of between 2 and 20 km produced moderate r between 0.42 and 0.59
between march time and the most highly correlated physical tests of strength (knee
flexion and abdominal strength) and aerobic power.
Three of the studies (Dziados et al. 1987, Knapik et al. 1990, Rayson et al. 1993)
attempted to produce multiple regression models to predict march performance.
Dziados et al.'s model to predict march time over 16 km with an 18 kg load (1987)
produced an r² = 0.21 incorporating knee flexion strength only. The model of Rayson
et al. (1993) produced an r² = 0.71, incorporating V̇O2max, ankle plantar flexion, age
and body fat (table 2), but the model is not directly relevant as it was used to predict
the maximum tolerable load in women for short duration marching at 6.4 km h⁻¹.
Knapik et al.'s attempts to model a 20 km march with 46 kg load (1990) were
hampered by missing data. Their best model incorporated abdominal strength only,
but an r² is not provided. In summary, the best loaded march models included
measures of aerobic fitness, strength and body composition.

1.7. Summary


Variable success has been achieved in relating performance on physical tests with
performance on occupational tasks that approximate the criterion tasks identified for
the British Army. A considerable range in the amount of variation in the criterion
task scores accounted for by the test scores has been reported. Some studies have
shown strong relationships, and while others are weak, some of these weak
relationships may be attributed to the fact that many potential predictor tests were
not included in the reviewed studies. The absence of a measure must not be equated to
the absence of a correlation. Indeed, many of these studies had completely different
objectives from our own and the development of prediction equations was not of
primary importance. Further, the study populations differed from the population of
interest in this investigation, and the manner in which the gender issue had been dealt
with in previous studies (ensuring that the relationships between criterion task and
predictor tests were equally valid for men and women) was not always satisfactory.

2. Objective
The principal objective of this study was to determine which combination of physical
performance tests (in multiple regression equations) could most accurately predict
criterion task performance in trained British Army personnel. Secondary objectives
included developing so-called 'gender-free' models, to provide common physical
selection tests and standards for men and women, and 'gender-unbiased' models, to
ensure the models did not disproportionately misclassify either gender. The physical
performance tests that most successfully fulfilled these criteria would be retained for
validation as selection tests in a separate sample of the user population (to be
published in the third paper in this series).
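One way to make the 'gender-unbiased' criterion concrete is to compare, for each group, the proportion of people whose predicted pass/fail outcome differs from their actual outcome on the criterion task. The sketch below is a minimal illustration under stated assumptions and not the analysis used in this study. The scores are synthetic, higher scores are taken to be better, the pass standard of 44 is hypothetical, and `misclassification_rate` is a helper written for this example.

```python
def misclassification_rate(actual, predicted, standard):
    """Fraction of people whose predicted pass/fail differs from their actual pass/fail.

    Assumption: a score at or above `standard` counts as a pass.
    """
    wrong = sum((a >= standard) != (p >= standard) for a, p in zip(actual, predicted))
    return wrong / len(actual)


STANDARD = 44  # hypothetical pass standard, same units as the scores

# Synthetic criterion task scores and model predictions.
# Every prediction is off by exactly 3 units in both groups.
far_actual = [60, 62, 58, 65, 55]      # group scoring well above the standard
far_predicted = [57, 65, 61, 62, 58]
near_actual = [42, 45, 43, 46, 41]     # group scoring close to the standard
near_predicted = [45, 42, 40, 49, 38]

rate_far = misclassification_rate(far_actual, far_predicted, STANDARD)
rate_near = misclassification_rate(near_actual, near_predicted, STANDARD)

print(f"far from standard: {rate_far:.0%}, near standard: {rate_near:.0%}")
```

With identical prediction errors, only the group whose scores lie close to the standard is misclassified in this sketch. A model can therefore misclassify one group more often than another even when its accuracy is the same for both.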

3. Methods
3.1. Study design
The objectives were met by administering the criterion tasks and a short-listed
battery of physical performance tests to a representative sample of trained soldiers in
a cross-sectional study. The study took place between September and November
1994, at a British Army base in Wiltshire.

3.2. Participants
Each Arm and Service in the British Army was requested by the project client (the
Directorate of Manning (Army)) to nominate ~20 male and 20 female soldiers
representing the variety of trades and range of fitness contained within that Arm and
Service. Only soldiers medically classified as fully deployable were nominated. The
authors do not know how many potential participants were screened out by their
units. Those nominated were then medically screened via questionnaire and any
contra-indications were followed up by a consultation with a civilian physician on
the first test occasion. This process screened out six participants. Finally, all
participants provided informed consent to participate. Three hundred and four men
and 75 women (mean age 23.5 (SD 4.45) years, stature 1734 (SD 79.5) mm, body
mass 71.4 (SD 10.58) kg) took part.

The target sample size of 300 men and 300 women based upon a power analysis
was not achieved for women for two reasons. First, female soldiers were not
represented in all Arms and Services, and were sparsely represented in others,
inevitably leading to a shortfall in numbers. Second, their relatively small numbers
overall (~5% of Army personnel) made recruitment to the study difficult even in
those Arms and Services where female soldiers were reasonably well represented. For
both genders army commitments inevitably took precedence over participation in the
study and this affected both the numbers nominated to take part in the study and
the compliance of the soldiers in attending for testing over a number of days.

3.3. Criterion tasks


The criterion tasks consisted of two single lift tasks, a carry task, three repetitive lift
and carry tasks, and three loaded march tasks. The rationale for these criterion tasks
has been published elsewhere (Rayson 1997, Rayson 1998) and is summarized in
Section 1.1. The participants were asked to perform the criterion tasks to their
individual maximum without exposing themselves to undue risks. Army boots,
lightweight trousers, and shirt and combat jacket were worn at all times. Typically,
the single lift task was performed in the morning and the carry task in the afternoon
of day 1. The repetitive lift task was performed on day 2 and the loaded march on
day 3. Some flexibility in scheduling was required to minimize the conflict with other
duties and to comply with availability of participants, facilities and staff.

3.3.1. Single lift tasks: The two single lift (SL) tasks involved lifting an
ammunition box (dimensions 480 × 200 × 190 mm) with side handles, with increasing
loads from the ground to 1.70 m (SL170) and to 1.45 m (SL145) respectively.
Participants were advised on correct lifting techniques, but essentially the lift was
freestyle. Participants first attempted a load of 10 kg, and after each successful lift,
5 kg (or 4 kg after 40 kg) was added to the box until the participant could not
execute the lift at the first attempt, or until a maximum load of 72 kg had been
achieved. An upper limit of 72 kg load was imposed by the capacity of the
ammunition box. A 10 s limit was imposed for completion of each lift and a
minimum of 1 min rest was enforced between attempts. The maximum load lifted
successfully was recorded as the score (kg).
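The load progression described above can be sketched as follows. This is an illustrative reading of the protocol, not code used in the study. It assumes the 4 kg increment applies to every step from 40 kg upwards, and it models a hypothetical participant who succeeds at every load up to a fixed capacity. The function names are invented for this example.

```python
def single_lift_loads(start_kg=10, switch_kg=40, max_kg=72):
    """Sequence of box loads attempted: 5 kg steps up to 40 kg, then 4 kg steps to 72 kg."""
    loads = [start_kg]
    while loads[-1] < max_kg:
        step = 5 if loads[-1] < switch_kg else 4
        loads.append(min(loads[-1] + step, max_kg))
    return loads


def single_lift_score(capacity_kg):
    """Heaviest load lifted (kg), or None if the first load is failed.

    Assumption: the participant succeeds at every load <= capacity_kg
    and stops at the first failed attempt.
    """
    score = None
    for load in single_lift_loads():
        if load > capacity_kg:
            break
        score = load
    return score


print(single_lift_loads())
# [10, 15, 20, 25, 30, 35, 40, 44, 48, 52, 56, 60, 64, 68, 72]
print(single_lift_score(46))  # 44
```

Under this reading the criterion loads in table 1 (20, 35 and 44 kg) all fall on the progression, so each standard corresponds to a load that was actually attempted.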

3.3.2. Carry: The carry (C) task required participants to walk continuously up and
down a 30 m course at a prescribed pace of 1.5 m s⁻¹, carrying by the handle, one
standard Army issue plastic water can of 20 kg, in each hand, for as long as possible.
The end point was defined by the participant's inability to maintain the prescribed
pace or the inability to maintain a hold on the cans, and the task was scored as the
duration in seconds. No maximum time limit was set.
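Because the pace is fixed, a carry duration converts directly into a distance, which allows the score in seconds to be compared with the carry distances listed in table 1 (210, 90 and 30 m). The sketch below assumes that the table 1 distances are to be covered at the prescribed 1.5 m s⁻¹; the function names are hypothetical.

```python
PACE_M_PER_S = 1.5                            # prescribed walking pace
LEVEL_DISTANCE_M = {1: 210, 2: 90, 3: 30}     # carry standards from table 1


def distance_carried_m(duration_s):
    """Distance covered at the prescribed pace for a given carry duration."""
    return duration_s * PACE_M_PER_S


def required_duration_s(level):
    """Carry duration needed to cover the table 1 distance for a level."""
    return LEVEL_DISTANCE_M[level] / PACE_M_PER_S


def carry_level_achieved(duration_s):
    """Most demanding level (1 is hardest) met by a carry duration, or None."""
    distance = distance_carried_m(duration_s)
    for level in (1, 2, 3):
        if distance >= LEVEL_DISTANCE_M[level]:
            return level
    return None


for level in (1, 2, 3):
    print(level, required_duration_s(level))  # 140.0, 60.0 and 20.0 s
```

Under these assumptions the three standards correspond to seven, three and one lengths of the 30 m course.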

3.3.3. Repetitive lift and carry: The three repetitive lift and carry (RLC) tasks
required participants to lift a loaded (10 kg (RLC10), 22 kg (RLC22) or 44 kg
(RLC44)) ammunition box (dimensions 480 × 200 × 190 mm) from the ground, and
carry it 10 m to a platform of 1.45 m height and place it on the platform (one
shuttle). The participant then picked up the box, carried it back to the start point and
placed it down, flat on the ground under control (also one shuttle). Trades allocated
to the RLC44 worked at 1 shuttle min⁻¹, those allocated to RLC22 at 3
shuttles min⁻¹ and those allocated to RLC10 at 6 shuttles min⁻¹. Participants were
advised on correct lifting techniques, but essentially the manoeuvre was freestyle.
Participants continued until they were unable to lift the box or to sustain the work
rate at the prescribed pace. The maximum duration that the participants achieved,
up to a maximum of 60 min, constituted the score (seconds).
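The work rates above can be combined with the durations listed in table 1 (20, 15 and 10 min) to express each standard as a score in seconds, a number of shuttles and a total carry distance. The sketch below is arithmetic only and assumes that the table 1 durations are the pass standards for the scored task; the dictionary layout and function name are hypothetical.

```python
SHUTTLE_DISTANCE_M = 10  # one shuttle is a 10 m carry in one direction

# Load and work rate from the task description, standard duration from table 1.
RLC_TASKS = {
    "RLC44": {"load_kg": 44, "shuttles_per_min": 1, "standard_min": 20},
    "RLC22": {"load_kg": 22, "shuttles_per_min": 3, "standard_min": 15},
    "RLC10": {"load_kg": 10, "shuttles_per_min": 6, "standard_min": 10},
}


def standard_summary(task):
    """Express a repetitive lift and carry standard in seconds, shuttles and metres."""
    spec = RLC_TASKS[task]
    shuttles = spec["shuttles_per_min"] * spec["standard_min"]
    return {
        "seconds_per_shuttle": 60 / spec["shuttles_per_min"],
        "standard_duration_s": spec["standard_min"] * 60,
        "shuttles_required": shuttles,
        "carry_distance_m": shuttles * SHUTTLE_DISTANCE_M,
    }


for name in RLC_TASKS:
    print(name, standard_summary(name))
```

On these figures the lightest box carries the highest shuttle count (60 shuttles at 10 s each), which fits the earlier remark that a repetitive 10 kg lift at six lifts per minute requires skill as well as fitness.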

3.3.4. Loaded march: The loaded march (LM) required participants to complete,
by walking and running, a 12.8 km flat bitumen course as quickly as possible, with a
15 kg (LM15), 20 kg (LM20) or 25 kg (LM25) Bergen rucksack. Participants started
individually and were encouraged to adopt a pace that would result in the fastest
time. Time to completion constituted the score (minutes).
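The loaded march standard in table 1 (12.8 km in 120 min) implies a required average speed, which the following sketch works out and compares with a completion time. It is a simple illustration; the function names are hypothetical and the 96 min completion time is an invented example.

```python
MARCH_DISTANCE_KM = 12.8
MARCH_STANDARD_MIN = 120  # time allowed in table 1


def average_speed_km_h(time_min):
    """Average speed over the 12.8 km course for a given completion time."""
    return MARCH_DISTANCE_KM / (time_min / 60)


def meets_march_standard(time_min):
    """True if the course was completed within the time allowed in table 1."""
    return time_min <= MARCH_STANDARD_MIN


required_speed = average_speed_km_h(MARCH_STANDARD_MIN)
print(f"Required average speed: {required_speed:.1f} km/h")  # 6.4 km/h
print(meets_march_standard(96), f"{average_speed_km_h(96):.1f} km/h")
```

The required average of 6.4 km h⁻¹ is the same speed as the marching speed in the Rayson et al. (1993) model listed in table 2.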

3.4. Physical performance tests


A battery of physical performance tests was assembled on the basis of three criteria.
First, the literature reviewed in Sections 1.2-1.6 identified a number of physical
performance tests which had been shown to be predictive of similar criterion tasks.
Second, discussions with subject-matter experts identified some additional tests
which assessed aspects of body size or performance which might complement those
determined from the literature review and account for additional variation in the
criterion task scores. Third, pragmatic constraints (e.g. envisaged limited time and
technology at the Recruit Selection Centres where the tests would ultimately be
conducted) impacted on the selection of the performance test battery. All the tests
were scrutinized in a separate pilot study and found to be viable. The participants
were asked to perform all physical performance tests to their individual maximum
without exposing themselves to undue risks. Army physical training (PT)
clothing (T-shirt, shorts and training shoes) was worn for all of the tests.

3.4.1. Anthropometry: Stature (mm), arm span (mm), body mass (kg), bi-acromial
and elbow diameters (mm), neck girth, chest girth (at nipple height and below the
breasts), waist girth (at level of the umbilicus and at the narrowest girth), and gluteal
girths (mm) were measured according to the methods described in Collins (1990).

3.4.2. Body composition: Percentage body fat (%) and FFM (kg) were estimated
via skinfold measurements at the biceps, triceps, subscapular and supra-iliac sites
(Durnin and Womersley 1974), and via two electrical impedance devices used by the
British Army (Bodystat, Bodystat Ltd, Isle of Man, UK; and ElectroLipoGraph,
Biologics, Henley-on-Thames, UK), which employed different algorithms.

3.4.3. Static strength: During all of the static strength tests, one practice attempt at
50% effort was followed by three maximal efforts of 3–5 s, each separated by 30-s
rest intervals. Upright pull (N) at 38 cm (Chaffin 1975, Knapik et al. 1981) and
85 cm height from the ground, and arm flexion strength (N) (Hermansen et al. 1972)
with the arm flexed at 90°, were measured using a dynamometer (Takei, Cranlea,
Birmingham, UK). Hand grip strength of both hands was measured using a
dynamometer (MIE, Medical Research Ltd, Leeds, UK) and the mean score
recorded (Caldwell et al. 1974). Back extension strength (N) was measured following
the method of Hermansen et al. (1972). Plantar flexion strength (N) was measured
using a purpose-built apparatus. The participant was seated with the dominant leg
positioned so that the lower leg was at 90° to the floor and the angle of the knee was
at 80°, with the calliper positioned around the thigh just proximal to the knee. The
participant was instructed to exert a maximal upward push against the calliper by
using the plantar flexor muscles to raise the heel.

3.4.4. Dynamic strength/power: Maximum lift performance was measured between
a handle height of 0.3 m and overhead using a hydro-dynamometer (Pinder and
Grieve 1997) according to the methods of Grieve and van der Linden (1986).
Maximum lift performance on an ILM (kg) from a handle height of 0.3 m to 1.45 m
(ILM145) and to 1.70 m (ILM170) was also assessed (McDaniel et al. 1983) using the
modifications to the procedures proposed by Stevenson et al. (1996).

3.4.5. Muscular endurance: Six measures of muscular endurance were included.
Static arm flexion endurance with an ammunition box weighing 14 kg was measured.
The participant stood with feet shoulder width apart and knees slightly flexed. The
box was picked up and held just clear of the body with the elbows by the side and
flexed at 90° and the lower arm parallel to the floor. The maximum duration that the
participant could maintain this position was recorded (seconds).
Dynamic arm flexion endurance with a 15 kg barbell was measured. The test
involved the participant standing with the back flat against a wall, the barbell
grasped in an underhand grip, with the hands shoulder width apart and the arms
straight. The participant curled and lowered the barbell in a controlled fashion in
time with a metronome. The metronome commenced at 20 one-directional
movements min⁻¹ and increased by 2 movements min⁻¹ thereafter. The endpoint
of the test was defined by the participant's inability to maintain the work rate while
maintaining good 'form', up to a maximum duration of 5 min, and the test was scored
(seconds).
Dynamic shoulder endurance was measured via an upright row manoeuvre with
a 15 kg load on a stacked weight system. The participant knelt on a padded
platform, facing the machine in an upright posture. The bar was grasped with an
overhand grip. The test comprised performing a repetitive upright row manoeuvre at
a prescribed cadence (as for the dynamic arm flexion endurance test), by bringing the
bar to the level of the clavicles, by taking the elbows outwards and upwards, while
the hips and back remained stationary. The endpoint of the test was defined by a
failure to maintain a full range of movement or failure to lift in time with the cadence
and the test was scored (seconds).
The sit-up test used was the Abdominal Curl Conditioning Test as described by
Brewer and Davis (1993), involving curling up and down in time with a prescribed
cadence until failure. The test was scored (seconds).
The push-up test involved performing standardized push-ups at a prescribed
cadence. The participant adopted a prone position on the floor with the hands
positioned at shoulder width. The participant pushed up using the muscles of the
arms and chest until the arms were extended and the lower body was pivoting on the
toes, while maintaining the back straight. The participant then lowered the body
until the elbows were flexed at 90° and then returned to the start position. This
manoeuvre was conducted in time to a metronome at the rate described for the
dynamic arm flexion endurance test. The endpoint of the test was defined as failure
to maintain the required work rate or loss of correct 'form'. The duration was
recorded (seconds).
Pull-ups were performed on a wooden gym beam with the hands shoulder width
apart, and the beam grasped in an underhand grip. From the start position with the
arms straight the participant performed as many pull-ups as possible, by raising the
chin above the bar and returning to the start position. No swinging of the body was
permitted. The test was scored as the number (n) of complete pull-ups achieved
without rest.

3.4.6. Aerobic fitness: Maximal aerobic power was estimated from performance of
the Multistage Fitness Test as described by Ramsbottom et al. (1988). The test
comprises a progressive shuttle run between marker cones placed 20 m apart in a
gymnasium. Speed of running was determined by a bleep on a cassette tape which
increased every 1 min. Participants continued until they could no longer sustain the
prescribed pace and failed to reach two consecutive markers in the allocated time.
The time to failure (seconds) was recorded as the score.

3.4.7. Performance indices: Indices were derived from the physical performance
test results:

· calculation of ratios of girths (e.g. chest to waist and waist to gluteal girth
ratio);
· estimates of percentage body fat and FFM by the three methods described;
· calculation of mean lift power (W) from the hydro-dynamometer test
between different vertical heights (e.g. between the handle heights of 0.7 and
1.0, 0.4 and 1.45, and 0.4 and 1.70 m);
· normalization of some strength/power scores to body mass where it was
physiologically appropriate (e.g. hydro-dynamic lift power, plantar flexion
strength, etc. divided by body mass);
· conversion of the Multistage Fitness Test scores to estimated V̇O2max in
ml.kg⁻¹.min⁻¹, ml.kg⁻⁰·⁶⁷.min⁻¹ and l.min⁻¹; and
· calculation of work indices for pull-ups, sit-ups and push-ups by
multiplication of raw scores (number of repetitions) by body mass.
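
The arithmetic behind several of these indices can be sketched as follows. This is a minimal illustration only: the function names and the example values are hypothetical, and the conversion of Multistage Fitness Test scores to V̇O2max itself is not reproduced because the text does not give that equation. The sketch assumes a relative V̇O2max (ml.kg⁻¹.min⁻¹) is already available and only re-expresses it in the other two units.

```python
def girth_ratio(girth_a_mm, girth_b_mm):
    """Ratio of two girths measured in the same units (e.g. chest to waist)."""
    return girth_a_mm / girth_b_mm


def normalized_to_body_mass(score, body_mass_kg):
    """Strength or power score divided by body mass."""
    return score / body_mass_kg


def work_index(repetitions, body_mass_kg):
    """Work index: raw number of repetitions multiplied by body mass."""
    return repetitions * body_mass_kg


def vo2max_in_all_units(relative_ml_kg_min, body_mass_kg):
    """Re-express a relative VO2max (ml/kg/min) in the three units listed above."""
    absolute_ml_min = relative_ml_kg_min * body_mass_kg
    return {
        "ml_kg_min": relative_ml_kg_min,
        "ml_kg067_min": absolute_ml_min / body_mass_kg ** 0.67,
        "l_min": absolute_ml_min / 1000.0,
    }


# Hypothetical participant: 70 kg, 50 ml/kg/min, 10 pull-ups.
example = vo2max_in_all_units(50.0, 70.0)
pull_up_work = work_index(10, 70.0)
```

Dividing by body mass raised to the power 0.67 rather than 1 is what distinguishes the second unit from the first; the exponent is taken directly from the unit quoted in the list.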

3.5. Procedures
Ethics approval for the study was provided by the Defence Research Agency's
Centre for Human Science Ethics Committee. Medical cover was provided during
the study by the Garrison Medical Centre. All participants received a detailed
briefing and gave informed consent prior to participation. Participants were tested
over 4–7 days, participating in some of the criterion tasks and all of the physical
performance tests. The physical performance tests were administered in a
gymnasium, in three batteries, each with five stations, and the Multistage Fitness
Test was conducted subsequently. The batteries are shown in table 3. Participants
were divided into groups of 20 with four participants allocated to each of the five
stations. Where possible, physically demanding tests were interspersed with
anthropometric measurements to avoid fatigue. Participants began at different test
stations, thereby randomizing any effects of cumulative fatigue on any particular
physical performance test.

3.6. Statistical analysis

For all statistical tests, significance was set at p < 0.05. A product moment
correlation matrix was calculated on the physical performance tests and factor
scores derived using principal components and orthogonal varimax rotation (Harris
1975). The factor analysis was used to identify interdependencies between the
performance test variables and to reduce the set of variables to a smaller number of
independent factors. The factors were named and a meaning was assigned to each.

Table 3. Batteries of physical performance tests.

Station   Battery 1                     Battery 2                     Battery 3

1         Anthropometry                 Skinfolds                     Electrical impedance × 2
2         Upright pulls and back        Static arm flexion            Hand grip
          extension                     strength
3         Plantar flexion               Hydro-dynamic lift            Incremental lift machine
4         Static arm flexion            Shoulder endurance            Dynamic arm flexion
          endurance                                                   endurance
5         Push-ups                      Pull-ups                      Sit-ups

For those measures where the variance increased with the expected value
(heteroscedastic), power transformations were investigated as a means of stabilizing
the variance and it was concluded that the natural logarithms (ln) of some measures
(carry, dynamic and static arm flexion endurance, hand grip, hydro-dynamic lift
power) should be used.
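
The effect of the ln transformation can be illustrated with a small sketch. The two groups of scores below are hypothetical and were chosen so that the spread is proportional to the mean, which is the situation in which a logarithmic transformation stabilizes the variance; they are not data from the study.

```python
import math
import statistics


def ln_transform(values):
    """Natural logarithm of each (strictly positive) score."""
    return [math.log(v) for v in values]


# Hypothetical endurance times (s): the second group is ten times the first,
# so its standard deviation is ten times larger on the raw scale.
low_scores = [8.0, 10.0, 12.0]
high_scores = [80.0, 100.0, 120.0]

sd_raw = (statistics.stdev(low_scores), statistics.stdev(high_scores))
sd_ln = (
    statistics.stdev(ln_transform(low_scores)),
    statistics.stdev(ln_transform(high_scores)),
)
```

On the raw scale the two standard deviations differ by a factor of ten, whereas after the transformation they coincide, because multiplying every score by a constant only shifts its logarithm.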
The relationships between performance on the criterion tasks and the physical
performance tests were calculated using Pearson product moment correlation and
stepwise multiple linear regression. Where all of the criterion task scores were
maximal, the standard least squares procedure was used. Where maximal criterion
task scores could not be assured for a minority of individuals who achieved the limit
imposed by the protocol, a maximum likelihood procedure was used to take account
of the modified distributional form.
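
The exact likelihood used is not stated in the text. One common way of handling scores that are capped at a protocol limit is a right-censored normal likelihood, and the sketch below assumes that form purely for illustration: an uncapped score contributes the normal density at its observed value, while a score at the limit contributes the probability of a true score at or above the limit. The function names and parameters are hypothetical.

```python
import math


def normal_logpdf(x, mu, sigma):
    """Log density of a normal distribution with mean mu and SD sigma."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def normal_upper_tail(x, mu, sigma):
    """P(X >= x) for a normal distribution with mean mu and SD sigma."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0)))


def censored_loglik(observed, predicted, sigma, limit):
    """Log-likelihood of scores that are right-censored at the protocol limit.

    observed  : measured criterion task scores (capped at `limit`)
    predicted : model predictions (e.g. from a linear regression equation)
    """
    total = 0.0
    for y, mu in zip(observed, predicted):
        if y >= limit:
            # Capped score: only known to be at least `limit`.
            total += math.log(normal_upper_tail(limit, mu, sigma))
        else:
            # Uncapped score: ordinary least-squares-type contribution.
            total += normal_logpdf(y, mu, sigma)
    return total
```

Maximizing this quantity over the regression coefficients and sigma (with any general-purpose optimizer) reduces to ordinary least squares when no score reaches the limit.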
All of the relationships were estimated for men and women separately, and the
hypothesis that the relationships were the same was tested in the standard manner by
breaking the hypothesis down into successive tests of parallelism (equal slopes) and
identity (coincident intercepts given equal slopes). If there was a significant
relationship between dependent and independent measures determined by the
standard F-test and a common relationship for the two genders could not be
rejected, a single 'gender-free' relationship was derived. If the hypothesis of
parallelism of the relationships for the two genders was not rejected, but that of
identity was, then a 'gender-related' relationship was defined by including gender
explicitly in the model. If the hypotheses of both parallelism and identity for the two
genders were rejected, then 'gender-specific' relationships were defined, with separate
models for men and women.
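
The decision rule described above can be summarized in a short sketch. The function name and its p-value inputs are hypothetical, and one assumption is made explicit: because identity is tested given equal slopes, a rejected parallelism hypothesis is treated as sufficient for a gender-specific outcome.

```python
def classify_relationship(p_parallelism, p_identity, alpha=0.05):
    """Classify a fitted relationship from the two successive hypothesis tests.

    p_parallelism : p-value for the hypothesis of equal slopes in men and women
    p_identity    : p-value for coincident intercepts, given equal slopes
    alpha         : significance level (0.05 in the text)
    """
    if p_parallelism < alpha:
        # Slopes differ: separate models for men and women.
        return "gender-specific"
    if p_identity < alpha:
        # Equal slopes but different intercepts: gender enters the model.
        return "gender-related"
    # A common relationship could not be rejected.
    return "gender-free"
```

For example, `classify_relationship(0.40, 0.01)` returns `"gender-related"`, corresponding to a single model with gender included as an explicit term.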
All multiple regression equations, referred to henceforth as 'models', were limited
to a maximum of three test variables so as to restrict the most promising physical
performance tests (for nine criterion tasks in total) to a practicable number. Gender
was allowed access as an additional variable, coded as '1' for men and '2' for women.
The relationships between each criterion task and the most predictive physical
performance tests were explored by plotting three graphs. The first showed the
relationship between the single most highly correlated physical performance test and
the criterion task. The second showed the relationship between the measured and
predicted criterion task scores, illustrating the degree of fit. The third showed the
difference between the measured and predicted criterion task scores (i.e. the bias)
against their mean, according to the method proposed by Bland and Altman (1986).
This third figure exposed any lack of agreement between the measured and predicted
scores, which may not have been readily apparent from the second figure. The Bland
and Altman plot also allowed an investigation of the relationship between the error
in the predicted score and the measured value.
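
The quantities plotted in the third graph can be computed as in the sketch below. It assumes the difference is taken as measured minus predicted and that the limits of agreement are the mean difference plus or minus 1.96 SD of the differences; neither convention is stated explicitly in the text, and the example scores are hypothetical.

```python
import math
import statistics


def bland_altman(measured, predicted, k=1.96):
    """Bland and Altman agreement statistics for paired scores.

    Returns the per-pair means (x-axis) and differences (y-axis), the mean
    difference (bias), its standard error and the limits of agreement.
    """
    differences = [m - p for m, p in zip(measured, predicted)]
    means = [(m + p) / 2.0 for m, p in zip(measured, predicted)]
    bias = statistics.mean(differences)
    sd = statistics.stdev(differences)
    return {
        "means": means,
        "differences": differences,
        "bias": bias,
        "se": sd / math.sqrt(len(differences)),
        "limits": (bias - k * sd, bias + k * sd),
    }


# Hypothetical single lift scores (kg).
result = bland_altman([60.0, 50.0, 40.0, 30.0], [58.0, 52.0, 41.0, 29.0])
```

Plotting `result["differences"]` against `result["means"]` gives the agreement plot; a trend in that scatter indicates that the prediction error depends on the size of the score.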
Type I errors (false-negatives) and type II errors (false-positives) identify the
misclassification associated with a model. Type I errors occur when an individual
passes the level or standard on the actual criterion task (e.g. a 35 kg single lift to
1.45 m, or a 12.8 km march with 20 kg in 120 min) but the predicted score derived
from the model fails to achieve the standard. Type II errors occur when the predicted
criterion task score derived from the model achieves the standard but the individual
fails to achieve the standard on the actual criterion task. The combined type I and
type II errors associated with the selected models are cited in the Results.
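
These definitions translate directly into a counting rule, sketched below. The function and its arguments are hypothetical; the sketch assumes that a score exactly equal to the standard counts as a pass, and it uses a flag to distinguish tasks where a higher score is better (lifts, carry duration) from the loaded march, where a shorter time is better.

```python
def misclassification(measured, predicted, standard, higher_is_better=True):
    """Percentage of type I and type II errors as defined in the text.

    Type I  : passes the actual criterion task but is predicted to fail.
    Type II : predicted to pass but fails the actual criterion task.
    """
    def passes(score):
        return score >= standard if higher_is_better else score <= standard

    pairs = list(zip(measured, predicted))
    type_i = sum(1 for m, p in pairs if passes(m) and not passes(p))
    type_ii = sum(1 for m, p in pairs if passes(p) and not passes(m))
    n = len(pairs)
    return {
        "type_i_pct": 100.0 * type_i / n,
        "type_ii_pct": 100.0 * type_ii / n,
        "combined_pct": 100.0 * (type_i + type_ii) / n,
    }


# Hypothetical scores against a 35 kg single lift standard.
lift_errors = misclassification([36.0, 34.0, 40.0, 30.0],
                                [34.0, 36.0, 41.0, 29.0], 35.0)
# Hypothetical march times (min) against a 120 min standard.
march_errors = misclassification([118.0, 125.0], [121.0, 119.0], 120.0,
                                 higher_is_better=False)
```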
The selection of a particular model for each of the nine criterion tasks was based on
the following criteria, listed in order of importance: minimizing the standard deviation
(SD) (i.e. the variation associated with the model); maximizing r² (i.e. the proportion
of variance accounted for by the independent variables); minimizing the mean error
(i.e. the mean difference between the measured and predicted criterion task score); and
minimizing misclassification rates (i.e. the sum of type I and type II errors).

4. Results
4.1. Criterion tasks
The results for the criterion tasks are summarized in table 4. Of the men, 44% and
16.7% achieved the maximum permissible load of 72 kg on the SL145 and
SL170 respectively, distorting the distributions and any descriptive statistics
associated with the men's data. The RLC44 task was performed by men only, as
no women belonged to those trades allocated to the RLC44, and 17% achieved the
maximum time stipulated in the protocol. On the RLC22, 99% of the men and 12%
of the women, and on the RLC10, 98% of men and 48% of women, achieved the
maximum time, thereby limiting the usefulness of the data for developing models.

Table 4. Mean (SD) results for the criterion tasks.

Criterion task (units)    Male mean (SD)     Female mean (SD)

SL145 (kg)                65.7 (7.00)*       36.3 (9.03)
SL170 (kg)                57.1 (10.26)       29.0 (6.76)
Carry (s)                 288 (107.3)        117 (41.0)
RLC44 (s)                 563 (448.6)        n/a
RLC22 (s)                 3578 (226.8)*      1048 (1118)
RLC10 (s)                 3574 (189)*        2311 (1340)
LM25 (min)                103 (10.6)         n/a
LM20 (min)                102 (11.1)         126 (11.0)
LM15 (min)                98 (12.4)          120 (15.6)

SL145 and SL170 are the maximum load (kg) lifted to 1.45 and 1.70 m respectively. RLC44,
RLC22 and RLC10 are the maximum duration (s) sustained on repetitive lift and carry tasks
with 44, 22 and 10 kg respectively. LM25, LM20 and LM15 are the time (min) to complete the
loaded march tasks with 25, 20 and 15 kg loads respectively. *Distributions were distorted for
these data, caused by a sizeable proportion of the sample achieving the maximum duration of
the protocol and, therefore, not achieving maximum performance. RLC44 and LM25 tasks
were not performed by female soldiers.
4.2. Factor analysis

Table 5 provides the output from the factor analysis. Several of the performance
tests had to be excluded from the analysis (gluteal girth, % body fat and FFM
derived from the ElectroLipoGraph device, plantar flexion strength) to retain a valid
sample size (n = 272). Correlations of r > 0.5 are highlighted to indicate the
performance tests with the strongest associations with these factors. Four factors
(Small Size, Overweight, Muscular Strength and Endurance, and Aerobic Fitness)
were identified which accounted for 79% of the variance in the test scores.
Table 5. Factor analysis.

Factor number                             1           2           3           4
Factor name                               Small       Overweight  Muscular    Aerobic
                                          size                    strength/   fitness
                                                                  endurance
Cum % of variance                         34.3        51.8        70.9        79.4

Stature                                   -0.929      -0.122       0.062       0.065
Arm span                                  -0.912      -0.096       0.084       0.138
Body mass                                 -0.706       0.647       0.217       0.011
Body mass index                           -0.161       0.921       0.222      -0.024
Biacromial diameter                       -0.820       0.046       0.190       0.135
Elbow diameter                            -0.711       0.085       0.258       0.087
Neck girth                                -0.668       0.281       0.432       0.228
Chest girth (at nipple height)            -0.478       0.734       0.252       0.060
Chest girth (below breasts)               -0.653       0.509       0.382       0.202
Waist girth (at umbilicus)                -0.431       0.835       0.092      -0.095
Waist girth (at narrowest)                -0.523       0.728       0.273       0.084
Skinfolds (sum)                            0.130       0.849      -0.177      -0.288
% body fat (skinfolds)                     0.375       0.696      -0.329      -0.384
% body fat (Bodystat)                      0.441       0.677      -0.357      -0.304
FFM (skinfolds)                           -0.843       0.209       0.373       0.218
FFM (Bodystat)                            -0.846       0.273       0.363       0.147
38 cm upright pull                        -0.599       0.110       0.596       0.187
85 cm upright pull                        -0.763       0.045       0.375       0.119
Back extension strength                   -0.580       0.240       0.516       0.148
Static arm flexion strength               -0.470       0.067       0.641       0.177
Hand grip strength                        -0.546      -0.112       0.515       0.045
Hydro-dynamic lift power (0.4–1.0 m)      -0.681       0.096       0.586       0.056
Hydro-dynamic lift power (0.7–1.45 m)     -0.613       0.107       0.667       0.054
Hydro-dynamic lift power (0.71–1.70 m)    -0.639       0.096       0.646       0.057
ILM145                                    -0.600       0.164       0.596       0.191
ILM170                                    -0.641       0.181       0.587       0.195
Static arm flexion endurance              -0.378       0.117       0.388       0.029
Dynamic arm flexion endurance             -0.388       0.270       0.577       0.378
Dynamic shoulder endurance                -0.515       0.178       0.672       0.325
Sit-ups                                    0.134      -0.163       0.153       0.663
Push-ups                                   0.048      -0.222       0.722       0.373
Pull-ups                                  -0.134      -0.467       0.637       0.318
Multistage Fitness Test                   -0.390      -0.380       0.240       0.738
V̇O2max (l.min⁻¹)                          -0.754       0.226       0.308       0.483

Inspection of the correlation coefficients between the physical performance tests
and Factor 1 revealed that the most highly correlated variables concerned
predominantly measures of body size. The measures of stature and arm span had
the strongest (negative) correlation with this factor. The negative correlation
coefficients indicate that the factor concerned 'Small Size'. Other tests of body size
(e.g. elbow and bi-acromial width), FFM, muscle strength (e.g. 85 cm upright pull)
and aerobic power (V̇O2max in l.min⁻¹) were also highly negatively correlated with
the factor. Factor 1 accounted for 34% of the variance in the data.
Factor 2 accounted for a further 18% of the variance and included variables
measuring aspects of 'Overweight': body mass index, skinfolds and body fat, body
mass and body girths. Measures of muscular endurance (e.g. pull-ups and push-ups)
and aerobic fitness (e.g. Multistage Fitness Test) were weakly negatively correlated
with this factor.
Factor 3 accounted for a further 19% of variance, incorporating several
muscular endurance measures (e.g. push-ups, dynamic shoulder endurance) and
strength measures (e.g. static arm flexion, lift power), many of which also had high
negative loadings with factor 1 ('Small Size'). Factor 3 was named 'Muscular
Strength and Endurance'.
Factor 4 accounted for the last 8% of the explained variance and included only
two tests with an r > 0.5: maximal aerobic power and sit-ups. This factor was named
'Aerobic Fitness'.

4.3. Relationships between performance on the criterion tasks and the physical
performance tests
The relationships between performance on the criterion tasks and the physical
performance tests were investigated using the techniques described in Section 3.6.
One example is provided for the Single Lift to 1.45 m. Table 6 summarizes the
physical performance tests that were most highly correlated with SL145 (p < 0.001 in
all cases). The relationship between SL145 and FFM, the most highly correlated
physical performance test, is shown in figure 1.
The best SL145 model included back extension strength, FFM, ILM145 divided
by body mass, and gender, producing r² = 0.88 and SD = 6.93 kg (table 7). The
model is displayed in figures 2 and 3. Figure 2 shows the relationship between the
directly measured SL145 scores and the predicted scores derived from the SL145
model. A difference in the slope of the men's and women's regression lines, which is
clearly discernible in figure 2, necessitated the inclusion of the variable 'gender'. The
cluster of data points at the measured load lifted to 1.45 m of 72 kg reflects the fact
that 47% of males reached the maximum permissible load.
Table 6. Simple correlations for a single lift to 1.45 m.

        Pooled (n = 234)              Men (n = 181)                 Women (n = 53)
Rank    Test                r         Test                r         Test                r

1       FFM (skinfolds)     0.859     FFM (Bodystat)      0.638     FFM (Bodystat)      0.618
2       FFM (Bodystat)      0.843     FFM (skinfolds)     0.623     ILM145              0.585
3       V̇O2max (l.min⁻¹)    0.825     V̇O2max (l.min⁻¹)    0.609     ILM170              0.573
4       FFM (ELG)           0.807     ILM170              0.595     FFM (skinfolds)     0.563
5       ILM170              0.806     Body mass           0.588     FFM (ELG)           0.546

Figure 1. Relationship between SL145 and FFM.

Figure 3 shows the degree of agreement between measured and predicted SL145
scores according to the methods of Bland and Altman (1986). The mean difference
between measured and predicted scores was -2.7 kg and the SE of the difference
was 0.38 kg. The limits of agreement were +9.8 to -15.3 kg. Among the women,
the model tended to over-predict the lower scores and under-predict the higher
scores.
Type I and type II errors expressed as a percentage are shown in table 8. For
SL145, combined type I and type II errors totalled 0 and 22.2% for the men and
women respectively at the level 2 standard (35 kg), and 0 and 4.8% for the men and
women respectively at the level 3 standard (20 kg).
The preceding series of statistical procedures relating criterion task and physical
performance test scores was repeated for each of the nine criterion tasks; the
preferred models are shown in table 7 and the percentage of type I and type II errors
in table 8. Not all of the criterion tasks have models for pooled, male and female
samples due to the absence of women from certain trades and due to the proportion
of participants who achieved the maximum duration of some criterion tasks, as
stated in Section 4.1. A natural logarithmic transformation (ln) of the carry data was
performed to remove the heteroscedastic variation.
A summary of those physical performance tests that appear most frequently as
the best predictors of criterion task performance is provided in table 9. Among the
measures of anthropometry, stature (C, LM20), arm span (SL170, C), bi-acromial
width (SL170, RLC22, LM25) and body mass (SL145, SL170) were the best predictors
in two or more criterion tasks. The body composition measures of percentage body fat
(SL145, SL170, LM20) and FFM (SL145, SL170, C, LM15) both featured
frequently. The static strength tests of 38 cm upright pull (SL170, C, RLC22,
RLC10), 85 cm upright pull (C, RLC22, LM20), back extension (SL145, SL170,
RLC10) and hand grip (C, RLC22) were among the best predictors. Both dynamic lift
tests of hydro-dynamic lift power (SL170, RLC44, LM20) and the ILM (SL145,
SL170, C, RLC44, RLC10, LM20, LM15), and the muscular endurance measures of
static arm flexion (LM25, LM15), dynamic arm flexion (C, RLC44, RLC22, RLC10,
LM25), sit-ups (RLC44, LM15) and pull-ups (C, LM20), were among the best
predictors in two or more criterion tasks. Indices of aerobic power derived from the
Multistage Fitness Test (SL145, SL170, C, LM25, LM20, LM15) were also among
the best predictors in many of the criterion tasks.

Table 7. Models for predicting performance on the criterion tasks.

Task and
sample    Model                                              p          r²     SD     n

SL145     + 0.017 * back extension                           < 0.001    0.88   6.93   271
pooled    + 0.999 * FFM (skinfolds)                          < 0.001
          + 6.706 * ILM145/body mass                         < 0.001
          - 6.013 * gender                                   < 0.01
          - 13.2

SL170     + 0.011 * 38 cm upright pull                       < 0.001    0.59   7.59   222
male      + 0.829 * FFM (Bodystat)                           < 0.001
          + 0.014 * back extension strength                  < 0.01
          - 22.5

SL170     + 0.930 * FFM (skinfolds)                          < 0.001    0.40   4.85   63
female    + 5.817 * ILM145/body mass                         < 0.05
          - 19.1

ln C      + 0.022 * pull-ups                                 < 0.001    0.70   0.30   232
pooled    + 0.022 * arm span                                 < 0.001
          + 0.019 * ln dynamic arm flexion endurance         < 0.001
          - 0.174 * gender                                   < 0.05
          + 0.35

RLC44     + 1.527 * lift power (0.4–1.0 m)                   < 0.001    0.55   321    29
male      - 606.689 * ILM170/body mass                       < 0.05
          + 0.027 * sit-ups × body mass                      < 0.05
          + 406.0

RLC22     + 16.51 * dynamic arm flexion endurance            < 0.001    0.45   627    53
female    + 3.284 * hand grip strength                       < 0.05
          - 1440.1

RLC10     + 2.608 * 38 cm upright pull                       < 0.001    0.38   565    25
female    - 801.1

LM25      - 19.765 * V̇O2max (l.min⁻¹)                        < 0.001    0.40   8.46   94
male      + 0.530 * body mass                                < 0.001
          - 0.052 * static arm flexion endurance             < 0.01
          + 142.7

LM20      - 0.072 * Multistage Fitness Test                  < 0.001    0.55   9.48   100
pooled    + 14.134 * gender                                  < 0.001
          + 132.7

LM15      - 0.108 * Multistage Fitness Test                  < 0.001    0.75   8.96   82
pooled    - 11.661 * ln static arm flexion endurance         < 0.001
          - 0.534 * % body fat (skinfolds)                   < 0.05
          + 233.4

The models to predict criterion task performance and their associated statistics are
shown. Column 1 lists the criterion tasks and the sample for which the model was valid
(pooled, male or female). Column 2 lists the equation derived from the stepwise regression
procedure. Column 3 describes p associated with each variable in the equation. Columns 4–6
provide the squared correlation coefficient (r²), SD and number of cases used in the equation
respectively.
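
As a worked illustration of how a model from table 7 is applied, the pooled SL145 equation can be evaluated directly. The coefficients and the gender coding (1 for men, 2 for women) are those reported in the paper; the function name, the argument names and the example participant are hypothetical, and the units (N for back extension strength, kg for FFM, ILM145 and body mass) are assumed from the test descriptions in Section 3.4.

```python
def predict_sl145(back_extension_n, ffm_skinfolds_kg, ilm145_kg,
                  body_mass_kg, gender):
    """Predicted maximum single lift to 1.45 m (kg), pooled model of table 7.

    gender is coded 1 for men and 2 for women, as in the paper.
    """
    return (0.017 * back_extension_n
            + 0.999 * ffm_skinfolds_kg
            + 6.706 * ilm145_kg / body_mass_kg
            - 6.013 * gender
            - 13.2)


# Hypothetical participant: 900 N back extension, 60 kg FFM,
# 60 kg on the ILM145 and 75 kg body mass.
predicted_man = predict_sl145(900.0, 60.0, 60.0, 75.0, gender=1)
predicted_woman = predict_sl145(900.0, 60.0, 60.0, 75.0, gender=2)
```

With identical test scores, the prediction for a woman is 6.013 kg lower than for a man, which is the sense in which this model is gender-related rather than gender-free.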

Figure 2. Relationship between measured and predicted SL145 scores. The relationship is
shown between the directly measured SL145 scores (maximum load lifted to 1.45 m, up to
72 kg) and the predicted scores derived from the SL145 model displayed in table 7
(SL145 = 0.017 * back extension strength + 0.999 * FFM (skinfolds) + 6.706 * ILM145/
body mass - 6.013 * gender - 13.2).

Figure 3. Degree of agreement between measured and predicted SL145 scores. The mean of
the measured and predicted criterion task scores (x-axis) is shown against the difference
of the scores (y-axis) (Bland and Altman 1986), exposing the degree of agreement between
the measured and predicted scores, and the relationship between prediction error and
measured value. The mean difference between measured and predicted scores was
-2.7 kg; the SE of the difference was 0.38 kg. The limits of agreement were +9.8 to
-15.3 kg. Among the women, the model tended to over-predict the lower scores and
under-predict the higher scores.
Table 8. Percentage of type I and type II errors associated with each model.

                            Type I errors (%)            Type II errors (%)

Task     Sample    Level    Pooled   Men     Women       Pooled   Men     Women

SL145    Pooled    2        2.2      0       9.5         3.0      0       12.7
                   3        0        0       0           1.1      0       4.8
SL170    Men       1                 2.7                          5.0
SL170    Women     1                         1.6                          0
C        Pooled    1        4.2      0       16.7        1.3      1.1     1.7
                   2        0        0       0           2.9      0       11.7
                   3        0        0       0           0        0       0
RLC44    Men       1                 13.8                         3.4
RLC22    Women     2        3.2      0       8.9         3.2      0       8.9
RLC10    Women     3        3.0      1.9     4.3         5.1      0       10.9
LM25     Men       1                 0                            5.3
LM20     Pooled    2        5.0      0       29.4        3.0      2.4     5.9
LM15     Pooled    3        4.9      0       11.4        3.7      2.1     5.7

Provided is the percentage of type I errors (false-negatives) and type II errors (false-
positives) by gender and level, indicating the proportion of people who were misclassified
when criterion task performance was predicted using the models shown in table 7.

In terms of the frequency with which the physical performance tests were among
the best predictors of criterion task performance, the Multistage Fitness Test and the
ILM145 were among the most related tests in six of the nine criterion tasks, dynamic
arm flexion endurance in five, FFM and 38 cm upright pull in four, and bi-acromial
width, body mass, 85 cm upright pull, ILM170 and back extension strength in three.
The remaining physical performance tests were among the best predictors of
criterion task performance in two or fewer of the criterion tasks.
Using the methods and criteria defined in Section 3.6, pooled gender models were
formulated for SL145, C, LM20 and LM15. Only the LM15 model was gender-free.
The SL145, C and LM20 models were gender-related in that they contained gender
explicitly in the model. Models for the remaining criterion tasks were gender-specific,
either because the development of a gender-free or gender-related model proved
elusive (e.g. SL170), or because usable data for both males and females on the
criterion task were not available (e.g. RLC44, RLC22, RLC10, LM25).

5. Discussion
5.1. Introduction
The objective was to identify a battery of physical performance tests that could be
best used to predict performance on the nine identified criterion tasks in trained
soldiers. The models should predict performance as accurately as possible, and
ideally be 'gender-free' (to provide common tests and standards for men and women)
and be 'gender-unbiased' in their outcome (not to misclassify disproportionately
either gender).
The implication underpinning the validity of gender-free models is that the
relationship between physical performance test and criterion task performance is
fundamentally the same in both genders, varying in magnitude rather than in quality.
The underlying physiological assumption is that men and women perform physical
tasks in a comparable fashion, utilizing the same energy systems, muscle groups and
ranges of movement: task performance could largely be explained by the same
physical attributes and membership of one or other gender would, therefore, be
immaterial.
This preference for gender-free predictors resulted in the stepwise regression
analysis being run on pooled gender data in the first instance. The validity of the
derived models was then assessed by testing for parallelism and identity. In selecting
a specific model to predict each criterion task, the four criteria described in Section
3.6 were considered. Inevitably, compromises had to be reached to optimize the
fulfilment of all criteria. For example, in selecting a preferred model, subjective
judgement was used to determine if a model with a given SD and r² was superior to
an alternative model with a marginally larger SD but substantially larger r².

Table 9. Summary of simple and multiple correlations between physical performance test
and criterion task scores.

Test/task SL145 SL170 C RLC44 RLC22 RLC10 LM25 LM20 LM15

Stature W F
Arm span W PMW3
Body mass M3 M 3
Biacromial width W W M
% body fat (skinfolds) P
% body fat (Bodystat) P
% body fat (ELG) PW
FFM (skinfolds) PMW3 PMW3 P 1
FFM (Bodystat) PM W PMW3
FFM (ELG) PM
38 cm upright pull 3 MW W 3
85 cm upright pull PM W PM
Back extension strength 3 2 W
Hand grip strength PM 1
Plantar flexion strength M
Lift power (0.4–1.0 m) M3
Lift power (0.7–1.45 m) PM M
Lift power (0.7–1.70 m) PM
ILM145 W3 W1 W W M W
ILM170 PM W 1 W
Static arm flexion endurance M2 PM 3
Dynamic arm flexion endurance 3 M W3 W M
Dynamic shoulder endurance M
Sit-ups 1 MW
Push-ups PM
Pull-ups 3 W
Multistage Fitness Test PM P PM M3 PMW 3 PMW3

The relations are summarized between the criterion task and physical performance test
performance. P (pooled), M (men) and W (women) appear if the physical performance test
score produced one of the five highest correlation coefficients. 1 (p < 0.05), 2 (p < 0.01) and 3
(p < 0.001) refer to the highest level of significance achieved by a physical performance test in
any model (e.g. the code PMW3 indicates the strongest relation between the criterion task and
physical performance test scores).


5.2. Sample

The participant sample suffered from several limitations. The 379 participants
nominated by the British Army reflected a compromise between the representative
sample of men and women from all trades in the Army requested by the authors and
the availability of personnel who could be released from their duties for up to 7 days.
Women were under-represented in the study both in overall numbers and in specific
trades that were required to perform certain criterion tasks (e.g. RLC44, LM25). The
commitment exhibited by some participants in providing maximum safe effort on all
of the criterion tasks and performance tests was questionable at times. Further, data
were lost due to injury and absence from testing appointments, which reduced the data
pool and weakened the statistical analyses. In short, the extent to which the
final sample was representative of other trained personnel was not known; a
validation study on a different population of potential users (recruits), which is
described in a separate study, will resolve these uncertainties and unravel any biases.

5.3. Predictors of single lift tasks


5.3.1. Single predictive tests: The highest correlation coefficients between the
physical performance test and the lift task scores in this study (r = 0.86–0.87) were
larger than many reported elsewhere (Sharp et al. 1980, Stevenson et al. 1985, 1988),
though not as large as those reported by some authors (Beckett and Hodgdon 1987,
Nottrodt and Celentano 1987). The finding that the coefficients were higher for men
than for women, particularly for SL170, was consistent with previous studies (e.g.
Myers et al. 1984, Teves et al. 1985).
The strength of the correlation analyses would have been weakened by two
factors in this study. First, the imposition of a 72 kg maximum load prevented
discrimination between men at the top end of the distribution and led to the authors
using a maximum likelihood procedure to estimate scores in these men. An
alternative option, discarding these 44% of males from the data analysis, was
explored and judged to be less attractive. The planned validation study will verify the
validity of these models and hence the appropriateness of using this statistical
method in this circumstance. Second, the use of relatively large load increments
reduced the sensitivity of the data, especially at the lower end of the distribution
among the women. Both of these weaknesses will be overcome by modification of the
protocols for the subsequent validation study.
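The effect of a load ceiling on score estimation can be illustrated with a short sketch. The code below is not the procedure used in this study, whose details are not given here. It is a generic maximum likelihood fit of a normal distribution to right-censored scores, using simulated data. The distribution parameters (mean 70 kg, SD 12 kg), the sample size, the random seed and the coarse grid search are all assumptions chosen for illustration; only the 72 kg ceiling comes from the text.

```python
import math
import random

def censored_normal_loglik(mu, sigma, observed, n_censored, ceiling):
    """Log-likelihood of a normal model with right-censoring at `ceiling`."""
    ll = 0.0
    for x in observed:
        z = (x - mu) / sigma
        ll += -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)
    if n_censored:
        # P(true score > ceiling); the floor avoids log(0) for extreme parameters.
        tail = 0.5 * math.erfc((ceiling - mu) / (sigma * math.sqrt(2.0)))
        ll += n_censored * math.log(max(tail, 1e-300))
    return ll

def fit_censored_normal(scores, ceiling):
    """Coarse grid-search maximum likelihood estimate of (mu, sigma)."""
    observed = [x for x in scores if x < ceiling]
    n_censored = len(scores) - len(observed)
    best = (-math.inf, None, None)
    for mu_step in range(80, 201):        # mu from 40.0 to 100.0 kg in 0.5 kg steps
        for sigma_step in range(4, 61):   # sigma from 2.0 to 30.0 kg in 0.5 kg steps
            mu, sigma = mu_step * 0.5, sigma_step * 0.5
            ll = censored_normal_loglik(mu, sigma, observed, n_censored, ceiling)
            if ll > best[0]:
                best = (ll, mu, sigma)
    return best[1], best[2]

# Simulated (hypothetical) lift scores, recorded with a 72 kg ceiling.
random.seed(1)
CEILING = 72.0
true_scores = [random.gauss(70.0, 12.0) for _ in range(200)]
recorded = [min(s, CEILING) for s in true_scores]

naive_mean = sum(recorded) / len(recorded)          # biased low by the ceiling
mu_hat, sigma_hat = fit_censored_normal(recorded, CEILING)
print(f"naive mean {naive_mean:.1f} kg, censored ML mean {mu_hat:.1f} kg, SD {sigma_hat:.1f} kg")
```

Treating capped scores as true values pulls the mean down, whereas the censored likelihood uses only the information that those scores were at least 72 kg. Discarding the capped participants, the alternative mentioned above, would remove the upper part of the distribution altogether.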
FFM, an index of whole-body muscularity, was the single most important
predictor test of performance on both single lift tasks, irrespective of the sampling
method (pooled, men or women). Of the studies in the literature which considered
both strength and body composition measurements, only one is consistent with this
finding (Nottrodt and Celentano 1987), though a second study (Teves et al. 1985)
showed FFM and ILM test scores to be equally well correlated with maximum
load lifted. Other studies have shown the ILM test to be a better predictor than FFM
of maximum box lifting (Ayoub et al. 1982, Myers et al. 1984, Beckett and Hodgdon
1987).
Our study incorporated several strategies that should have enhanced the
predictive power of the ILM test above that observed in previous studies. This
study was the first to match the hand heights of lift during the ILM tests and the lift
criterion tasks. Constrained ILM lifting techniques had been the norm in previous
studies, but because of the adverse impact of such constraints on women (Stevenson
et al. 1996), freestyle techniques were permitted in this study. Further, the
recommendations of Stevenson et al. (1996) to coach participants in power lifting
techniques and to administer lighter starting loads and smaller increments for
women were also adopted. However, despite these modifications, FFM remained
the stronger predictor.
This finding might be explained by the fact that FFM, as a measurement
requiring no skill on the part of the participant, provided a more transferable
indicator of lifting performance than do the skill-dependent dynamic lifting tests.
The concept of a ‘generic’ lift task may be a misnomer, and the extent to which skill
may be transferred between a superficially similar lift task and a dynamic lift test
may be smaller than anticipated. Whatever the underlying mechanism, these findings
strongly support the contention of Vogel (1992) that minimum FFM standards have
an important place in selecting military personnel.
Although FFM was the best predictor of single lift score in both men and
women, the next best predictor differed between genders. For SL145, the ILM
test was among the best five predictors for both men and women. In men, estimated
V̇O2max (l.min⁻¹) and body mass were also important. For SL170, body mass was an
important predictor in men, while ILM145 and measures of skeletal size, most
notably bi-acromial width and arm span, were important in women. The greater
importance of anthropometric tests to SL170 might be expected given that the task
involved lifting to a greater height.

5.3.2. Single lift to 1.45 m model: A series of models affording an acceptable fit
between performance at the physical performance tests and SL145 was produced.
The best model (table 7) included tests of muscle strength and body size and
composition (namely back extension strength, FFM, ILM145/body mass and
gender), which accounted for 88% of the variance. FFM and ILM145 were the most
significant contributors to the model. This finding was in line with the work of
Nottrodt and Celentano (1987), who recommended a model comprising FFM and
ILM183 to predict maximal load lifted to 1.33 m. A difference in the slope of the
men’s and women’s regression lines necessitated the inclusion of the variable ‘gender’
in the equation and made a gender-free model invalid.
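The check behind a gender-related model can be sketched as follows: fit a separate least-squares line for each gender and compare the slopes. The data below are invented purely for illustration (they are not from this study), and the variable names are hypothetical. A formal analysis would test the gender-by-predictor interaction rather than compare slopes by eye.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical FFM (kg) and single lift scores (kg); illustrative values only.
ffm_men, lift_men = [55, 60, 65, 70], [60, 66, 72, 78]
ffm_women, lift_women = [40, 44, 48, 52], [30, 33, 36, 39]

slope_m, intercept_m = fit_line(ffm_men, lift_men)
slope_w, intercept_w = fit_line(ffm_women, lift_women)
print(f"men:   lift = {slope_m:.2f} * FFM + {intercept_m:.2f}")
print(f"women: lift = {slope_w:.2f} * FFM + {intercept_w:.2f}")
```

When the slopes differ, as in this invented example, a single pooled line misrepresents at least one group, so gender has to enter the equation or separate models have to be used.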
The SL145 model resulted in a small mean error of prediction (−2.7 kg) with a
95% CI = 13 kg, equating to 21% of the mean SL145 score. This model compares
favourably with other lift models in the literature (see Section 1.3). Only the model
devised by Nottrodt and Celentano (1987) accounted for a higher proportion of the
variance. Further, proposed modifications to the single lift task protocols should
improve the fit and reduce the variance of the model.
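As a rough guide to how such agreement figures relate to one another, the sketch below computes a mean prediction error, a 95% interval half-width and that half-width as a percentage of the mean measured score. It assumes the interval is 1.96 times the SD of the prediction errors, in the manner of Bland and Altman (1986); the exact calculation used for the models is not restated here, and the data are invented. By the same arithmetic, a 13 kg interval equal to 21% of the mean implies a mean SL145 score of about 62 kg.

```python
import statistics

def agreement_summary(measured, predicted):
    """Mean error, 95% half-width (1.96 x SD of errors) and half-width as % of mean."""
    errors = [p - m for m, p in zip(measured, predicted)]
    mean_error = statistics.mean(errors)
    half_width = 1.96 * statistics.stdev(errors)
    pct_of_mean = 100.0 * half_width / statistics.mean(measured)
    return mean_error, half_width, pct_of_mean

# Hypothetical measured and predicted lift scores (kg); illustrative values only.
measured = [50, 60, 70, 55, 65]
predicted = [48, 63, 66, 57, 61]

mean_error, half_width, pct_of_mean = agreement_summary(measured, predicted)
print(f"mean error {mean_error:.1f} kg, 95% half-width {half_width:.1f} kg "
      f"({pct_of_mean:.1f}% of mean)")

# Mean score implied by the figures quoted in the text: 13 kg is 21% of the mean.
implied_mean_sl145 = 13.0 / 0.21
```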
The relatively small mean misclassification rates of 5% for soldiers allocated to
level 2 and 1% for soldiers allocated to level 3 masked considerable variation
between genders. Misclassification rates for the men were zero at both levels (all men
substantially surpassed both standards), while for the women, misclassification rates
were 22% at level 2 and 5% at level 3. The greater misclassification in women was
due partly to the larger error in prediction in women’s scores and partly to a greater
clustering of women’s scores around the pass standard. The model was, therefore,
gender-biased in its outcome (i.e. more women were misclassified than men),
resulting in indirect discrimination against women.
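The link between clustering around a pass standard and misclassification can be shown with a small sketch. It assumes that a participant is misclassified when the predicted score and the measured score fall on opposite sides of the standard; the scores and the 40 kg standard below are invented for illustration and are not the values from this study.

```python
def misclassification_rate(measured, predicted, standard):
    """Percentage of cases where predicted and measured pass/fail outcomes disagree."""
    pairs = list(zip(measured, predicted))
    wrong = sum((m >= standard) != (p >= standard) for m, p in pairs)
    return 100.0 * wrong / len(pairs)

STANDARD_KG = 40  # hypothetical pass standard

# Hypothetical scores (kg): one group far above the standard, one clustered around it.
men_measured, men_predicted = [60, 65, 70, 72, 68], [57, 66, 73, 70, 64]
women_measured, women_predicted = [38, 41, 43, 39, 45], [41, 39, 42, 40, 44]

rate_men = misclassification_rate(men_measured, men_predicted, STANDARD_KG)
rate_women = misclassification_rate(women_measured, women_predicted, STANDARD_KG)
print(f"men {rate_men:.0f}%, women {rate_women:.0f}%")
```

In this invented example the prediction errors are of similar size in both groups (at most 4 kg), yet only the group whose scores sit close to the standard is misclassified.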

5.3.3. Single lift to 1.70 m model: Differences in the slopes and intercepts of the
regression lines resulted in the formulation of gender-specific models for SL170. The
best men’s model (table 7) included tests of muscle strength and body composition
(38 cm upright pull, back strength and FFM), accounting for 59% of the variance. The
mean error of the model was effectively zero and the 95% CI = ~14 kg, equating to
24% of the mean men’s score. Misclassification rates for the men allocated to level 1
totalled 8%.
The best women’s model (table 7) also included tests of muscle strength and body
size and composition (FFM and ILM145/body mass), accounting for 40% of the
variance. The mean error of the model was also zero and the 95% CI = ~10 kg,
equating to 33% of the mean women’s score. Misclassification rates for the women
were very low at 1.6%, but only because just one woman achieved the 44 kg standard
and most scored substantially below it.

5.3.4. Static versus dynamic strength tests: In this study, the dynamic strength tests
were superior to the static tests as predictors of maximum lifting performance (Mital
et al. 1986b). The ILM test was a better single predictor of SL145, and the
hydrodynamic lift power and ILM scores were better single predictors of SL170,
than were the static lift tests (38 and 85 cm upright pull), irrespective of the method
of sampling. The test of static upright pull did, however, feature prominently in the
SL170 model for men, and the static tests were considerably simpler and quicker to
administer.

5.4. Predictors of carry task


5.4.1. Single predictive tests: The repeatability of the carry task was found to be
unacceptable (Rayson and Holliman 1995) when the reliability of the criterion tasks
was verified by administering them to a subsample of soldiers. The mean score on the
second test occasion was 34 s (17%) lower than on the first attempt (p < 0.001), and this
may partly account for the lower correlation coefficients produced between physical
performance test scores and the carry task (r = 0.7). The reasons for the poor reliability
are uncertain. When a work-load is sufficiently light to be maintained for long periods,
the factors which lead to the decision to terminate the test vary between individuals. A
high degree of motivation is key to ensuring maximal voluntary performance and this
can never be assured, even in volunteer participants. The fatigue mechanisms are not
well understood, though there is evidence of both centrally and peripherally mediated
fatigue (Jones and Round 1992). Tests involving higher loads that can be sustained
only for short periods of time are more consistent and probably more accurately reflect
true peripheral limitation rather than central factors (Evans et al. 1983). This may also
explain the better predictive power of hand grip strength over hand grip endurance in
predicting the carry task in the pilot study (Rayson et al. 1994b).
The poor reliability of the carry task failed to instil confidence in the robustness of
any predictive model which could be derived and raised a question as to whether the
carry should be eliminated or modified as a criterion task. This type of continuous,
prolonged carry (e.g. stretcher carrying, water and fuel can carrying) was reported as
a common activity among soldiers of many trades (Rayson 1998), so eliminating it as
a criterion task was not a desirable option. Consequently, despite the poor reliability,
the relationships with the physical performance tests were investigated to gain an
insight into the best predictors of carry performance. The physical performance test
that best predicted the carry task varied by gender. Higher coefficients among women
than men have been reported for carry tasks (Stevenson et al. 1988, Rice and Sharp
1994); this finding was substantiated in this study. For men, a measure of muscle
strength (85 cm upright pull) was the strongest predictor, followed by a measure of
body size (arm span) and a further measure of strength (hand grip). For women, there
was little to differentiate between the predictive power of measures of strength (mean
hydrodynamic lift power and 38 cm upright pull) and body size (arm span).
No directly comparable carry tasks were found in the literature, largely because
the majority of carry tasks in previous studies involved intermittent rather than
continuous work and in some respects were more akin to our RLC task than the
carry (e.g. Ayoub et al. 1982, Beckett and Hodgdon 1987). A few tasks were broadly
similar (Stevenson et al. 1985, 1988, 1992, 1994, Rice and Sharp 1994), but the
test batteries were sometimes very restricted, limiting any
conclusions that may be drawn.
Tests of upright pull have been extensively cited as strong predictors of carry tasks,
and tests of lift strength/power have also been reported as important predictors in
some studies, though not in others (see Section 1.4). The relationship between hand grip
strength and carry task performance has considerable face validity and has been
reported extensively. The most acute problem for participants appeared to be
maintaining a grip on the cans, which is associated with local muscular fatigue of the
hand and wrist flexors (Lind and McNicol 1968, Kearney and Stull 1981). Legg and
Patton (1987) had found that 8 days of sustained manual work and sleep deprivation
resulted in a reduction in hand grip strength in soldiers, though the inclusion of heavy
material handling did not appear to affect the reduction in hand grip performance.
Time to fatigue is dependent upon the proportion of maximum tension exerted
by the muscle, the relationship being hyperbolic (Evans et al. 1983). Fatigue is
induced in sustained contractions > ~15% maximum voluntary contraction
(MVC), above which the blood supply to the muscle is occluded. The 20 kg load in
each hand during this carry task represented > 15% MVC for all participants.
Despite the importance of hand grip strength as a predictor in the pooled and men’s
samples, the test score was notably absent as an important predictor for the
women. The reason for this absence of relation is uncertain but may concern the
small range of MVC in women.
Indices of aerobic power have also been cited as contributing to carry
performance in a number of studies (see Section 1.4), though the controlled pace
of walking at 1.5 m.s⁻¹ in this study would have reduced the importance of aerobic
fitness to performance. Interestingly, estimated V̇O2max (l.min⁻¹) was an important
predictor in both the pooled and men’s samples. However, the selection of an index of
aerobic fitness recorded in l.min⁻¹, rather than a body mass related index (e.g.
ml.kg⁻¹.min⁻¹), suggests that the relationship in men may be more a function of
muscle mass than of aerobic fitness (Myers et al. 1984). Other studies, though, have
found FFM to be unrelated to carry performance (Stevenson et al. 1985, Beckett and
Hodgdon 1987).
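The distinction between the absolute and body mass related indices is a matter of simple arithmetic, sketched below. The function name and the two example soldiers are hypothetical; the conversion itself (litres to millilitres, divided by body mass) is standard.

```python
def relative_vo2max(absolute_l_min, body_mass_kg):
    """Convert absolute VO2max (l/min) to a body mass related index (ml/kg/min)."""
    return absolute_l_min * 1000.0 / body_mass_kg

# Two hypothetical soldiers with identical relative aerobic fitness.
light = relative_vo2max(3.00, 60.0)   # 60 kg soldier, 3.00 l/min
heavy = relative_vo2max(4.25, 85.0)   # 85 kg soldier, 4.25 l/min
print(f"light: {light:.1f} ml/kg/min, heavy: {heavy:.1f} ml/kg/min")
```

Both soldiers score 50 ml.kg⁻¹.min⁻¹, yet the heavier one has the higher absolute value, which is why an index expressed in l.min⁻¹ partly reflects body (and muscle) mass rather than aerobic fitness alone.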

5.4.2. Carry model: To eliminate the heteroscedastic variation in the carry data, a
natural logarithmic (ln) transformation was conducted. The best model (table 7)
included tests of muscle endurance and body size (pull-ups, arm span and dynamic
arm flexion endurance), accounting for 70% of the variation. Differences in the
slopes for men and women necessitated the inclusion of ‘gender’ in the model. The
two most significant physical performance tests were dynamic arm flexion endurance
and arm span. Neither measure had been considered in previous studies, though
related tests (e.g. stature and pull-ups/flexed arm hang) had been included and
several studies had reported their relevance (Stevenson et al. 1985, 1988, Beckett and
Hodgdon 1987).
The mean error of the model was zero but the 95% CI = 0.57 (ln seconds),
equating to ~57% of the mean carry score. Again, mean misclassification rates of
< 5% for the three levels masked some gender differences. Misclassification rates
were higher for women than for men allocated to level 1 (15 versus 1%) and level 2
(12 versus 0%), resulting in gender bias and indirect discrimination against women.
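Because the carry model was fitted on the ln scale, an interval of ±0.57 ln seconds corresponds to a multiplicative band around a predicted time rather than a fixed number of seconds. The sketch below back-transforms the interval, assuming it is symmetric on the ln scale; the 200 s predicted time is a hypothetical value used only to show the arithmetic.

```python
import math

HALF_WIDTH_LN = 0.57  # 95% half-width on the ln(seconds) scale, from the text

lower_ratio = math.exp(-HALF_WIDTH_LN)  # multiplier for the lower bound
upper_ratio = math.exp(HALF_WIDTH_LN)   # multiplier for the upper bound

predicted_s = 200.0  # hypothetical predicted carry time (s)
lower_s = predicted_s * lower_ratio
upper_s = predicted_s * upper_ratio
print(f"{predicted_s:.0f} s predicted: {lower_s:.0f} s to {upper_s:.0f} s "
      f"(x{lower_ratio:.2f} to x{upper_ratio:.2f})")
```

On this reading the band is asymmetric in seconds: roughly 43% below and 77% above the predicted time, so the ~57% figure is best taken as an approximate summary of the spread.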
The proportion of variance accounted for by the best model (70%) failed to
match the model derived in the pilot study (82%, Rayson et al. 1994b), but it was
comparable or superior to the pooled gender models produced by Myers et al. (1984)
and Beckett and Hodgdon (1987).

5.5. Predictors of repetitive lift tasks


The limitations of the repetitive lift data (see Section 4.1) made the analyses and
interpretation of the data difficult. The absence of female data for the RLC44 and
the finding that virtually all of the males achieved the maximum duration on the
RLC22 and RLC10 resulted in the analyses being run on single gender data, thereby
denying the possibility of developing gender-free or gender-related models.

5.5.1. Repetitive lift with 44 kg: The most predictive single physical performance
test in the exclusively male sample was a power to body mass index: mean lift
power from 0.7 to 1.0 m divided by body mass. The only other single predictive test
worthy of mention was a test of muscle endurance, dynamic arm flexion. The best
model (table 7) included tests of muscle strength and endurance (mean lift power,
ILM170 divided by body mass, and sit-ups multiplied by body mass), accounting for
55% of the variance. Misclassification rates totalled 17% for the men. The model
produced a relatively large SD, resulting in a 95% CI = ~600 s, equivalent to 100%
of the mean score.
There were no tasks in the literature involving lifting similar loads to similar
heights with which to make direct comparisons. The closest were probably the box
carry tasks reported by Beckett and Hodgdon (1987) and Mello et al. (1995),
involving objects of similar dimensions with loads of ~35 kg. However, the task
used by Beckett and Hodgdon did not involve lifting and was really a maximum
effort interval carry. Not surprisingly, an index of aerobic fitness was the best
predictor test, though upper body endurance measured via pull-ups and push-ups
also correlated highly. Beckett and Hodgdon’s (1987) model included aerobic fitness
and a measure of explosive power (either broad jump or dynamic lifting).
The task used by Mello et al. (1995) involved identifying the maximum load that
could be sustained for 1 h of repetitive lifting and carrying. Mean arm power on a
Wingate test provided the highest correlation in the pooled gender sample, followed
by bench press. The best model included stature and bench press, and the model
selected by the authors included arm power only. Unfortunately, the study by Mello
et al. was published after completion of this study, and neither the Wingate nor
bench press tests had been included (though push-ups, which are closely related to
bench press, were included) as they appeared to have little functional relevance to our
criterion tasks. However, in all three studies cited, the relevance of upper body
strength and endurance was apparent.

5.5.2. Repetitive lift with 22 kg: The best single predictor tests on the exclusively
female sample included measures of muscle strength, endurance and body size
(ILM145, 38 cm upright pull, dynamic arm flexion endurance and bi-acromial width). The
best model (table 7) included one test of strength (hand grip) and one test of
endurance (dynamic arm flexion), accounting for 45% of the variance. The 95% CI
was an unsatisfactory 1800 s, equivalent to ~68% of the mean task score. The
misclassification rates totalled 18% among the women.
Parallels may be drawn between the RLC22 task and the 20 kg sand bag carry
used by Stevenson et al. (1988, 1992, 1994), and the repetitive box lifts used by Mello
et al. (1995) and by Aghazadeh and Ayoub (1985). Aghazadeh and Ayoub’s paper
lacks details of the relationships between the static and dynamic strength tests, and
task performance. However, they reported isokinetic lift scores to be superior
predictors to static strength scores for predicting floor to shoulder height lifting for
an 8 h shift. The best static strength tests involved shoulder and leg muscles.
Mello et al. (1995) reported FFM of the arm, whole-body FFM, gender and
stature to be the best single predictors of floor to shoulder height lift performance for
an 8 h shift in a pooled-gender sample. The best model from the present study included
none of these physical performance test scores. Neither FFM nor stature was
among the most powerful predictors in the present study.
Stevenson et al. (1988, 1992) allow more detailed comparisons of findings. The
1988 study, using a sand bag carry in younger women, reported maximum lifting,
flexed arm hang, and sit-ups to be the best predictor tests. Neither pull-ups nor
sit-ups were good predictors in this study, but the ILM test was the single best predictor.
Stevenson et al.’s model for older women (1988) included flexed arm hang, ILM and
V̇O2max, which provided a lower r² to the model in this study. The 1992 study, also
using a sand bag carry in younger women, reported V̇O2max to be the only significant
predictor. Other tests of importance in men and older women included hand grip
strength and endurance.

5.5.3. Repetitive lift with 10 kg: The best single predictor tests on the exclusively
female sample included measures of strength and endurance (dynamic arm flexion
endurance, ILM145 and 38 cm upright pull). Even the best model, which involved
38 cm upright pull only (table 7), produced an unacceptable prediction of performance.
Although misclassification levels were moderate (8% overall, 16% for women), the
equation could only account for 38% of the variance, had a mean prediction error of
nearly 1000 s and a 95% CI > 2200 s, equivalent to 75% of the mean scores. No
comparative tasks with RLC10 have been reported in the literature.
Data from this study provide mixed support for the contention of Mital et al.
(1986a, b) that repetitive dynamic tests are superior to strength measures (either
static or dynamic) in predicting sustained lifting performance. In this study, the test
of lift power was the strongest single predictor of RLC44 by a considerable
margin. However, the test of dynamic arm flexion endurance featured as a prominent
predictor of all three repetitive lift tasks, supporting the importance of tests of
dynamic muscular endurance, especially with moderate and lighter loads.
The proportion of MVC that the load represents largely determines
the relative contributions of muscle strength and endurance. The negative
exponential relationship between percentage maximum load and sustainable
duration will determine the score on the task (Legg and Pateman 1984). If the
load reflects a high percentage of an individual’s maximum strength, then maximum
duration will be severely limited. Conversely, if the load reflects 50% or less of an
individual’s maximum capacity, then it is likely that the individual would be able to
sustain the intermittent activity for some time.

5.6. Predictors of loaded march tasks


5.6.1. Single predictive tests: Only Infantry soldiers (exclusively male) were
allocated to LM25 and consequently no women performed this criterion task. For
LM20 and LM15, the best predictors of the march tasks differed somewhat by
gender. The largest correlation coefficients for the women (0.66) were significantly
higher than for the men (0.5) for LM20, though this did not hold true for LM15,
where coefficients were of a similar order for both genders (0.70).
As expected from the literature (see Section 1.5), indices of aerobic fitness
dominated the simple correlations and the models for the three loaded march tasks.
Estimated maximal aerobic power (V̇O2max (l.min⁻¹)), derived from the Multistage
Fitness Test and body mass, was substantially the best predictor for LM25 in men,
while the Multistage Fitness Test score itself was the best predictor of LM20 in the
pooled and men’s samples and of LM15 however the participants were classified
(pooled, male or female).
Additional strong predictors of loaded march performance included measures of
muscle strength (85 cm upright pull, lift power, ILM test) and endurance (dynamic
and static arm flexion, push-ups). In women, stature was also important.

5.6.2. Loaded march models: The best march models for each of the three criterion
tasks included an index of aerobic fitness plus one or more additional physical
performance tests. The LM25 model (table 7) included estimated V̇O2max (l.min⁻¹),
body mass and static arm flexion endurance, which accounted for 40% of the
variance. The LM20 model (table 7) included the Multistage Fitness Test and
gender, accounting for 55% of the variance. The LM15 model (table 7) included the
Multistage Fitness Test, static arm flexion endurance and percentage body fat,
accounting for 77% of the variance. The 95% CI were all between 16 and 18 min,
which equated to ~16% of the mean march times.
The misclassification rates were variable between the march tasks and genders.
The misclassification rates for men were low (2–5%) for all models, largely because
the vast majority of men achieved the required standards of 120 min, many by a
considerable margin. However, the misclassification rates for women were
high (35% for LM20 and 18% for LM15), resulting in gender bias and indirect
discrimination against women. The relatively high percentage misclassification rate
was mainly a result of the women’s scores being distributed around the pass standard.
Recent studies by Rayson et al. (1993) on female soldiers and by Frykman and
Harman (1995) on male soldiers identified stature as a good predictor of the ability to
march with a load. Interestingly, stature appeared important for women in LM15, yet
not in LM20, nor in any of the marches for men. Ankle plantar flexion strength had
also been shown to be a useful predictor (Knapik et al. 1990, Rayson et al. 1993), but
the test did not appear in any of the models with the highest predictive power.
The relationship of percentage body fat and FFM to loaded march performance
is intriguing. Several authors have shown a negative relationship between fatness and
march performance (Dziados et al. 1987, Rayson et al. 1993), though there is little
consensus as to the extent of the impact of fatness. Only in the pooled-gender data
for LM20 does fatness appear as a prominent predictor in this study, though there is
also some evidence of fatness degrading performance in the men’s data for LM20
and LM15. Strangely, in the pooled-gender LM15 model, a greater level of body fat
appears to be associated with enhanced march performance. This finding might be
related to the restricted range of values of percentage body fat in this sample and the
absence of morbidly obese people.

5.7. Summary


The objectives of this study (to identify the most predictive physical performance
tests of criterion task performance in trained British Army personnel and to develop
gender-free and gender-unbiased procedures) have been partly met. Prediction
models have been generated for all criterion tasks and the most predictive physical
performance tests have been identified. However, the accuracy with which criterion
task scores could be predicted varied considerably, and we were only partially
successful in developing gender-free and gender-unbiased models.
Only one model (LM15) was gender-free. Three further models were
gender-related (i.e. contained ‘gender’ explicitly in the model) and the remaining six were
gender-specific (i.e. were appropriate to men or to women only). Owing mainly to the
larger errors associated with predicting the women’s scores and the finding that
women’s scores had a greater tendency than the men’s to be distributed around the
pass standard, many of the models were gender-biased in their outcome, resulting in
a greater proportion of misclassifications and, therefore, indirect discrimination
against women.
Both single lift criterion tasks were successfully modelled using physical
performance tests. A ‘gender-related’ model was derived for SL145, but
‘gender-specific’ models had to be derived for SL170. All single lift models involved measures
of muscle strength (some relative to body mass) and FFM. The level of agreement
between measured and predicted criterion task scores was acceptable and should be
improved further by modifying the criterion task protocols.
The carry criterion task was modelled using a gender-related model but the
errors were large, possibly due to the unsatisfactory reliability of the carry task.
The best model included measures of muscle endurance and body size. The
misclassification rates were gender-biased, resulting in indirect discrimination
against women. Unless an improved carry model can be derived using more
highly motivated participants, options other than using a prediction model will
have to be considered for the carry task. Alternative strategies include eliminating
performance on the carry task as an entry criterion, or administering the criterion
task itself.
The data generated for all three repetitive lift criterion tasks were weak due to
inadequate task protocols. Models were derived on single-gender data but they were
unsatisfactory in their current form, due mainly to the large CI. However, measures
of muscle strength and endurance and body size were identified as predictors of
repetitive lift and carry performance. Extending the maximum duration of the task
protocols in the subsequent validation study should generate better data
distributions and permit new, improved models to be derived.
All of the loaded march criterion tasks were successfully modelled incorporating
indices of aerobic fitness, usually supplemented by measures of strength, endurance
or body size and composition. An absence of female data inevitably resulted in the
generation of a single-gender model for LM25. A gender-related model was
developed for LM20, while a gender-free model was appropriate for LM15. The level
of agreement between measured and predicted criterion task scores was acceptable.
Having developed a series of models to predict performance on the nine criterion
tasks, the models must be validated in a separate sample of the user population of
recruits. The validation study is reported in a separate paper.

6. Conclusions
The principal objective of this study, to determine which combination of physical
performance tests could be best used to predict performance in nine criterion tasks in
trained British Army soldiers, has been met. However, the accuracy with which
criterion task performance can be predicted was variable and unsatisfactory for some
of the criterion tasks. The secondary objective of developing gender-free models was
achieved in only one criterion task, the LM15. Gender-related models were devised
for three criterion tasks, the SL145, C and LM20, and gender-specific models were
devised for the remaining five criterion tasks, SL170, RLC44, RLC22, RLC10 and LM25.
The additional secondary objective of developing gender-unbiased models proved
elusive. This was partly due to the limitations in the data which have been discussed
and also the finding that the women’s scores tended to be distributed around the pass
standards, whereas the men’s scores tended to exceed the pass standards. The result
was a greater proportion of misclassifications and, therefore, indirect discrimination
against women. A further validation study on recruits is necessary to address these
concerns.

Acknowledgements
This work was funded by the Ministry of Defence and conducted at the Defence
Research Agency’s Centre for Human Sciences. The contributions of the following
scientific staff are gratefully acknowledged: Mr Rene Nevola, Mrs Barbara Sage, Mr
Dominic Lineham, Ms Clare Birch, Ms Anne Rothwell, Cpl Dave Willis, CSgt
Gordon Gill, Dr Reg Withey and Dr Mike Stroud at DERA, Mr Andrew Pinder and
Professor Don Grieve from the Royal Free Hospital, and Professor David Jones at
the University of Birmingham. The authors acknowledge the client, the Director of
Manning (Army), for their vision and commitment to this project, and also the
British Army soldiers who took part. Finally, we thank the reviewers of the paper for
constructive comments.

References
AGHAZADEH, F. and AYOUB, M. M. 1985, A comparison of dynamic and static-strength models
for prediction of lifting capacity, Ergonomics, 28, 1409–1417.
ASTRAND, P. O. and RODAHL, K. 1986, Textbook of Work Physiology: Physiological Bases of
Exercise, 3rd edn (Singapore: McGraw-Hill).
AYOUB, M. M. and MITAL, A. 1989, Manual Materials Handling (London: Taylor & Francis).
AYOUB, M. M., DENARDO, J. D., SMITH, J. L., BETHEA, N. J., LAMBERT, B. A., ALLEY, L. R. and
DURAN, B. S. 1982, Establishing Physical Criteria for Assigning Personnel to Air Force
Jobs: Final Report. Air Force Office of Scientific Research, Contract no. F49620-79-C-0006
(Lubbock: Institute for Ergonomics Research, Texas Tech University).
BECKETT, M. B. and HODGDON, J. A. 1987, Lifting and Carrying Capacities Relative to Physical
Fitness Measures. Technical Report 87-26 (Bethesda: Naval Health Research Centre).
BLAND, J. M. and ALTMAN, D. G. 1986, Statistical methods for assessing agreement between
two methods of clinical measurement, Lancet, i (8476), 307–310.
BREWER, J. and DAVIS, J. 1993, Abdominal Curl Conditioning Test (Headingley: National
Coaching Foundation).
CALDWELL, L. S., CHAFFIN, D. B., DUKES-DOBOS, F. N., KROEMER, K. H. E., LAUBACH, L. L.,
SNOOK, S. H. and WASSERMAN, D. E. 1974, A proposed standard procedure for static
muscle strength testing, American Industrial Hygiene Association Journal, 35, 201–205.
CHAFFIN, D. B. 1974, Human strength capability and low-back pain, Journal of Occupational
Medicine, 16, 248–254.
CHAFFIN, D. B. 1975, Ergonomics guide for the assessment of human static strength, American
Industrial Hygiene Association Journal, 36, 505–511.
COLLINS, K. J. (ed.) 1990, Handbook of Methods for the Measurement of Work Performance,
Physical Fitness and Energy Expenditure in Tropical Populations (International Union of
Biological Sciences).
DUEKER, J. A., RITCHIE, S. M., KNOX, T. J. and ROSE, S. J. 1994, Isokinetic strength testing and
employment, Journal of Occupational Medicine, 36, 42–48.
DURNIN, J. V. G. A. and WOMERSLEY, J. 1974, Body fat assessed from total body density and
its estimation from skinfold thickness: measurements on 481 men and women aged 16 to
72 years, British Journal of Nutrition, 32, 77–97.
DZIADOS, J. E., DAMOKOSH, A. I., MELLO, R. P., VOGEL, J. A., KENNETH, L. and FARMER, J. 1987,
Physiological Determinants of Load Bearing Capacity. Technical Report T19-87
(Natick: US Army Research Institute of Environmental Medicine).
EVANS, O. M., ZERBIB, Y., FARIA, M. H. and MONOD, H. 1983, Physiological responses to load
holding and load carriage, Ergonomics, 26, 161–171.
FRYKMAN, P. N. and HARMAN, E. A. 1995, Anthropometric correlates of maximal locomotion
speed under heavy backpack loads, Medicine and Science in Sports and Exercise, 27
(suppl.), 5.
GRIEVE, D. W. and VAN DER LINDEN, J. 1986, Force, speed and power output of the human
upper limb during horizontal pulls, European Journal of Applied Physiology, 55, 425–430.
HARRIS, R. J. 1975, A Primer of Multivariate Statistics (New York: Academic Press).
JONES, D. A. and ROUND, J. M. 1992, Skeletal Muscle in Health and Disease (Manchester:
Manchester University Press).
KAMON, E., KISER, D. and PYTEL, J. 1982, Dynamic and static lifting capacity and muscular
strength of steelmill workers, American Industrial Hygiene Association Journal, 43, 853–857.
KEARNEY, J. T. and STULL, G. A. 1981, Effect of fatigue level on rate of force development by
the grip-flexor muscles, Medicine and Science in Sports and Exercise, 13, 339–342.
KNAPIK, J. J., STAAB, J., BAHRKE, M., O’CONNOR, J., SHARP, M., FRYKMAN, P., MELLO, R.,
REYNOLDS, K. and VOGEL, J. A. 1990, Relationship of Soldier Load Carriage to
Physiological Factors Military Experience and Mood States. Technical Report T17-90
(Natick: US Army Research Institute of Environmental Medicine).
KNAPIK, J. J., VOGEL, J. A. and WRIGHT, J. E. 1981, Measurement of Isometric Strength in an
Upright Pull at 38 cm. Technical Report T3/81 (Natick: US Army Research Institute of
Environmental Medicine).
KROEMER, K. H. E. 1983, An isoinertial technique to assess individual lifting capability,
Human Factors, 25, 493–506.
KROEMER, K. H. E. 1985, Testing individual capability to lift material: repeatability of a
dynamic test compared with static testing, Journal of Safety Research, 16, 1–7.
LEGG, S. J. and PATEMAN, C. M. 1984, A physiological study of the repetitive lifting capabilities
of healthy young males, Ergonomics, 27, 259–272.
LEGG, S. J. and PATTON, J. F. 1987, Effects of sustained manual work and partial sleep
deprivation on muscular strength and endurance, European Journal of Applied
Physiology, 56, 64–68.
LIND, A. R. and MCNICHOL, G. W. 1968, Cardiovascular responses to holding and carrying
weights by hand and shoulder harness, Journal of Applied Physiology, 25, 261–267.
MCDANIEL, J. W., SKANDIS, R. J. and MADOLE, S. W. 1983, Weight Lift Capabilities of Air Force
Basic Trainees. AFAMRL Report TR-83-0001 (Ohio: Air Force Aerospace Medical
Research Laboratory, Wright-Patterson Air Force Base).
MELLO, R. P., DAMOKOSH, A. I., REYNOLDS, K. L., WITT, C. E. and VOGEL, J. A. 1988, The
Physiological Determinants of Load Bearing Performance at Different March Distances.
USARIEM Technical Report T15/88 (Natick: US Army Research Institute of
Environmental Medicine).
MELLO, R. P., NINDL, B. C., SHARP, M. A., RICE, V. J., BILLS, R. K. and PATTON, J. F. 1995,
Predicting lift and carry performance from muscular strength, anaerobic power and
body composition variables, Medicine and Science in Sports and Exercise, 27 (suppl.),
S152.
MITAL, A. 1985, Use of anthropometry and dynamic strength in developing placement and
screening procedures for workers, in H. J. Bullinger and H. J. Warnecke (eds), Towards
the Factory of the Future (Heidelberg: Springer).
MITAL, A., AGHAZADEH, F. and KARWOWSKI, W. 1986a, Importance of isometric and isokinetic
lifting strengths in estimating maximum lifting capacities, Journal of Safety Research,
17, 65–71.
MITAL, A., KARWOWSKI, W., MAZOUZ, A. K. and ORSARH, E. 1986b, Prediction of maximum
weight of lift in the horizontal and vertical planes using simulated job dynamic
strengths, American Industrial Hygiene Association Journal, 47, 288–291.
MYERS, D. C., GEBHARDT, D. L., CRUMP, C. E. and FLEISHMAN, E. A. 1984, Validation of the
Military Entrance Physical Strength Capacity Test. Technical Report 610 (Natick: US
Army Research Institute of Environmental Medicine).
NOTTRODT, J. W. and CELENTANO, E. J. 1987, Development of predictive selection and
placement tests for personnel evaluation, Applied Ergonomics, 18, 279–288.
PINDER, A. D. and GRIEVE, D. W. 1997, Hydro-resistive measurement of dynamic lifting
strength, Journal of Biomechanics, 30, 399–402.
POULSEN, P. E. 1970, Prediction of maximum loads in lifting from measurements of muscular
strength, Communications from the Danish National Association for Infantile Paralysis,
no. 31 (Hellerup, Denmark).
PYTEL, J. L. and KAMON, E. 1981, Dynamic strength test as a predictor for maximal and
acceptable lifting, Ergonomics, 24, 663–672.
RAMSBOTTOM, R., BREWER, J. and WILLIAMS, C. 1988, A Progressive Shuttle Run test to estimate
maximal oxygen uptake, British Journal of Sports Medicine, 22, 141–144.
RAYSON, M. P. 1997, Selection standards for physically-demanding occupations. PhD thesis,
University of Birmingham.
RAYSON, M. P. 1998, The development of physical selection procedures for the British Army.
Phase 1: Job analysis and identification of criterion tasks, in M. A. Hanson (ed.),
Contemporary Ergonomics 1998 (London: Taylor & Francis), 393–397.
RAYSON, M. P. and HOLLIMAN, D. E. 1995, Physical Selection Standards for the British Army.
Phase 4: Predictors of Task Performance in Trained Soldiers. Technical Report
DRA/CHS/PHYS/CR95/017 (Farnborough: Defence and Evaluation Research Agency).
RAYSON, M. P., BELL, D. G., HOLLIMAN, D. E., LLEWELYN, M., NEVOLA, V. R. and BELL, R. L.
1994a, Physical Selection Standards for the British Army: Phases 1 and 2. Technical
Report 94R036 (Farnborough: Army Personnel Research Establishment).
RAYSON, M. P., DAVIES, A. and STROUD, M. A. 1993, The physiological predictors of maximum
load carriage capacity in trained females, in Proceedings, UK Sport: Partners in
Performance (London: Sports Council), 212.
RAYSON, M. P., HOLLIMAN, D. E. and BELL, D. G. 1994b, Physical Selection Standards for the
British Army. Phase 3: Development of Physical Selection Tests and Pilot Study.
Technical Report DRA/CHS/WP04006 (Farnborough: Defence and Evaluation
Research Agency).
RICE, V. J. and SHARP, M. A. 1994, Prediction of performance on two stretcher-carry tasks,
Work, 4, 201–210.
SHARP, D. S., WRIGHT, J. E., VOGEL, J. A., PATTON, J. F., DANIEL, W. L., KNAPIK, J. and KORVAL,
D. M. 1980, Screening for Physical Capacity in the US Army. An Analysis of Measures
Predictive of Strength and Stamina. Technical Report T8/80 (Natick: US Army
Research Institute of Environmental Medicine).
STEVENSON, J. M., ANDREW, G. M., BRYANT, J. T. and THOMSON, J. M. 1985, Development of
Minimum Physical Fitness Standards for the Canadian Armed Forces. Phase 1. DSS
Contract 8SE85-00017 (Kingston, Ont.: Ergonomics Research Laboratory, Queen’s
University).
STEVENSON, J. M., ANDREW, G. M., BRYANT, J. T. and THOMSON, J. M. 1988, Development of
Minimum Physical Fitness Standards for the Canadian Armed Forces. Phase III.
DSS/DND Contract W8477-7-SC02/01-ST (Kingston, Ont.: Ergonomics Research
Laboratory, Queen’s University).
STEVENSON, J. M., BRYANT, J. T., ANDREW, G. M., SMITH, J. T., FRENCH, S. L., THOMSON, J. M.
and DEAKIN, J. M. 1992, Development of physical fitness standards for Canadian Armed
Force’s younger personnel, Canadian Journal of Sport Science, 17, 214–221.
STEVENSON, J. M., BRYANT, J. T., ANDREW, G. M., SMITH, J. T., FRENCH, S. L., THOMSON, J. M.
and DEAKIN, J. M. 1994, Development of physical fitness standards for Canadian Armed
Force’s older personnel, Canadian Journal of Applied Physiology, 19, 75–90.
STEVENSON, J. M., GREENHORN, D. R., BRYANT, J. T., DEAKIN, J. M. and SMITH, J. T. 1996,
Gender differences in performance of a selection test using the incremental lifting
machine, Applied Ergonomics, 27, 45–52.
TEVES, M. A., WRIGHT, J. E. and VOGEL, J. A. 1985, Performance on Selected Candidate
Screening Test Procedures before and after Army Basic and Advanced Individual
Training. Technical Report T13/85 (Natick: US Army Research Institute of
Environmental Medicine).
VOGEL, J. A. 1992, Obesity and its relation to physical fitness in the US military, Armed Forces
and Society, 18, 497–513.