Plymouth Marine
Published 2001 by PRIMER-E Ltd
Cover photos: Steve Smith (University of New England, Armidale, NSW, Australia), Paul Somerfield (Plymouth Marine Laboratory, UK), and Ashley Rowden (NIWA, Wellington, New Zealand)
CONTENTS
INTRODUCTION
CHAPTER 9 Transformations
CHAPTER 13 Data requirements for biological effects studies: which components and attributes of the marine biota to examine?
INTRODUCTION
As a result of the authors' own research interests and the widespread use of community data in pollution monitoring, a major thrust of the manual is the biological effects of contaminants but, again, most of the methods are much more generally applicable. This is reflected in a range of more fundamental ecological studies among the real data sets exemplified here.

The literature contains a large array of sophisticated statistical techniques for handling species-by-samples matrices, ranging from their reduction to simple diversity indices, through curvilinear or distributional representations of richness, dominance, evenness etc., to a plethora of multivariate approaches involving clustering or ordination methods. This manual does not attempt to give an overview of all the options, or even the majority of them. Instead it presents a strategy which has evolved over several years within the Community Ecology/Biodiversity group at Plymouth Marine Laboratory (PML), and which has a proven track record in interpretation of a wide range of marine community data; see, for example, papers listed under Clarke or Warwick in Appendix 3 (which have attained four-figure total citations in SCI journals). The analyses and displays in these papers, and in this manual, almost all draw upon the wide range of routines available in the PRIMER package (though in many cases annotations etc in plots have been further edited by simple importing into graphics programs such as Microsoft Powerpoint).

Note also that, whilst other software packages will not encompass this specific combination of routines, several of the individual techniques can be found elsewhere. For example, the core clustering and ordination methods described here are available in several mainstream statistical packages (SAS, S-Plus, Systat, Statgraphics etc.), and more specialised statistical programs (CANOCO, PATN, PC-ORD, the Cornell Ecology programs, etc.) tackle essentially similar problems, though usually employing different techniques and a different strategy.

Practical use

The arrangement of topics, and level of exposition, have benefited from experience gained at several training workshops funded jointly by FAO, UNEP and UNESCO/IOC, and a series of commercially-run PRIMER courses at Plymouth and venues outside the UK. The advocacy of these techniques thus springs not only from regular use and development within PML's Community Ecology/Biodiversity group but also from valuable feedback from a series of workshops in which practical data analyses were central.

Throughout the manual, extensive use is made of data sets from the published literature to illustrate the techniques. Appendix 1 gives the original literature source for each of these 25 or so data sets and an index to all the pages on which they are analysed. Each data set is allocated a single letter designation and, to avoid confusion, referred to in the text of the manual by that letter, placed in curly brackets (e.g. {A} = Amoco-Cadiz oil spill, macrofauna; {B} = Bristol Channel, Zooplankton; {C} = Celtic Sea, Zooplankton etc).

Literature citation

This 2nd edition of the manual follows the 1st edition closely in respect of the first 15 chapters, though minor revisions have been made throughout. Chapters 16 and 17 are entirely new. Appendix 2 lists some background papers appropriate to each chapter, including the source of specific analyses, and a full listing of references cited is in Appendix 3.

Whilst the manual is genuinely collaboratively authored, for the purposes of directing queries on specific topics it is broadly true that the first author (KRC) bears the responsibility for the chapters on statistical methods (1-7, 9, 11) and the second author (RMW) is mainly responsible for the chapters on interpretation (10, 12-14), the responsibility for Chapters 8 and 15 being shared more or less equally. Chapters 16 and 17 were written by KRC, drawing on the results of joint papers in various authorship combinations by KRC, RMW and Paul Somerfield (also of the Plymouth Marine Laboratory). Since this manual is not accessible within the published literature, referral to the methods it describes would properly be by citing the primary papers on which it is based; these are indicated in the text and Appendix 2. Alternatively, comprehensive discussion of the philosophy (and many of the details) of the multivariate and univariate approaches advocated can be found in Clarke (1993, 1999) and Warwick (1993), respectively, with the newer methods in this edition best summarised in Clarke and Warwick (1998a), Somerfield and Clarke (1995) and Warwick and Clarke (2001).

Acknowledgements

We are grateful to a large number of individuals and institutions for their help and support - please see the detailed list at the end of the manual.

K R Clarke
R M Warwick
2001
Chapter 1
page 1-1
The purpose of this opening chapter is twofold:

a) to introduce some of the data sets which are used extensively, as illustrations of techniques, throughout the manual;

b) to outline a framework for the various possible stages in a community analysis.

Examples are given of some core elements of the recommended approaches, foreshadowing the analyses explained in detail later and referring forward to the relevant chapters. Though, at this stage, the details are likely to remain mystifying, the intention is that this opening chapter should give the reader some feel for where the various techniques are leading and how they slot together. As such, it is intended to serve both as an introduction and a summary.

Stages

is expected to decrease in comparison with control levels”). Note the contrast with the previous stage, however, which is restricted to demonstrating differences between groups of samples, not ascribing directionality to the change (e.g. deleterious consequence).

4) Linking to environmental variables and examining issues of causality of any changes. Having allowed the biological information to “tell its own story”, any associated physical or chemical variables matched to the same set of samples can be examined for their own structure and its relation to the biotic pattern (its “explanatory power”). The extent to which identified environmental differences are actually causal to observed community changes can only really be determined by manipulative experiments, either in the field or through laboratory/mesocosm studies.
to the total number of individuals in the sample, and plot the cumulated percentages against the species rank. This, and the analogous plot based on species biomass, are superimposed to define ABC (abundance-biomass comparison) curves (Warwick, 1986), which have proved a useful construct in investigating disturbance effects. Another example is the species abundance distribution (sometimes termed the distribution of individuals amongst species), in which the species are categorised into geometrically-scaled abundance classes and a histogram plotted of the number of species falling in each abundance range (e.g. Gray and Pearson, 1982). It is then argued, again from empirical evidence, that there are certain characteristic changes in this distribution associated with community disturbance.

Such distributional techniques relax the constraint in the previous category that the summary from each sample should be a single variable; here the emphasis is more on diversity curves than single diversity indices, but note that both these categories share the property that comparisons between samples are not based on particular species identities: two samples can have exactly the same diversity or distributional structure without possessing a single species in common.

3) Multivariate methods are characterised by the fact that they base their comparisons of two (or more) samples on the extent to which these samples share particular species, at comparable levels of abundance. Either explicitly or implicitly, all multivariate techniques are founded on such similarity coefficients, calculated between every pair of samples. These then facilitate a classification or clustering (these terms are interchangeable) of samples into groups which are mutually similar, or an ordination plot in which, for example, the samples are “mapped” (usually in two or three dimensions) in such a way that the distances between pairs of samples reflect their relative dissimilarity of species composition.

Techniques described in detail in this manual are a method of hierarchical agglomerative clustering (e.g. Everitt, 1980), in which samples are successively fused into larger groups, as the criterion for the similarity level defining group membership is relaxed, and two ordination techniques: principal components analysis (PCA, e.g. Chatfield and Collins, 1980) and non-metric multi-dimensional scaling (NMDS, usually shortened to MDS, Kruskal and Wish, 1978).

For each broad category of analysis, the techniques appropriate to each stage are now discussed, and pointers given to the relevant chapters.

UNIVARIATE TECHNIQUES

For diversity indices and other single-variable extractions from the data matrix, standard statistical methods are usually applicable and the reader is referred to one of the many excellent general statistics texts (e.g. Sokal and Rohlf, 1981). The requisite techniques for each stage are summarised in Table 1.1. For example, when samples have the structure of a number of replicates taken at each of a number of sites (or times, or conditions), computing the means and 95% confidence intervals gives an appropriate representation of the Shannon diversity (say) at each site, with discrimination between sites being demonstrated by one-way analysis of variance (ANOVA), which is a test of the null hypothesis that there are no differences in mean diversity between
Univariate examples
Stages | Diversity indices (Ch 8) | Indicator taxa | Biodiversity indices (Ch 17)
1) Representing communities | Means and 95% confidence intervals for each site/condition (Ch 8, 9, 17)
3) Determining stress levels | By reference to historical data for sites (Ch 14, 15) and regional “species pool” (Ch 17)
   | Ultimately a decrease in diversity | Initial increase in opportunists | Loss of taxonomic distinctness
4) Linking to environment | Regression techniques (Ch 11); for causality issues see Ch 12
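The "representing communities" and site-discrimination steps for a single diversity index can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the counts are hypothetical (not taken from any data set in the manual), the function names are invented here, and the 95% interval uses the tabulated Student t value for four replicates per site.

```python
import math
from statistics import mean, stdev

def shannon(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i), over species present."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

# Hypothetical counts: 2 sites x 4 replicates x 5 species (not from the manual).
sites = {
    "site X": [[10, 8, 6, 4, 2], [9, 9, 5, 5, 2], [12, 7, 6, 3, 2], [8, 8, 7, 4, 3]],
    "site Y": [[25, 3, 1, 1, 0], [30, 2, 1, 0, 0], [22, 5, 2, 1, 0], [28, 1, 1, 0, 0]],
}
T_975_DF3 = 3.182  # two-sided 95% Student t quantile for n - 1 = 3 d.f.

diversity = {s: [shannon(r) for r in reps] for s, reps in sites.items()}
for s, h in diversity.items():
    half_width = T_975_DF3 * stdev(h) / math.sqrt(len(h))
    print(f"{s}: mean H' = {mean(h):.3f} +/- {half_width:.3f}")

def anova_f(groups):
    """One-way ANOVA F ratio: between-group over within-group mean square."""
    values = [v for g in groups for v in g]
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

f_ratio = anova_f(list(diversity.values()))
print(f"F = {f_ratio:.1f} on 1 and 6 d.f.")
```

The F ratio would still need to be referred to tabulated F values (or a statistics package) to obtain a significance level; that step is omitted here.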
sites. Linking to the environment is then also relatively straightforward, particularly if the environmental variables can be condensed into one (or a small number of) key summary statistics. Simple or multiple regression of Shannon diversity as the dependent variable, against the environmental descriptors as independent variables, is then technically feasible, though rarely very informative in practice, given the over-condensed nature of the information utilised.

For impact studies, much has been written about the effect of pollution or disturbance on diversity measures: whilst the response is not necessarily unidirectional (under the hypothesis of Huston, 1979, diversity is expected to rise at intermediate disturbance levels before its strong decline with gross disturbance), there is a sense in which determining stress levels is possible, through relation to historical diversity patterns for particular environmental gradients. Similarly, empirical evidence may exist that particular indicator taxa (e.g. Capitellids) change in abundance along specific pollution gradients (e.g. of organic enrichment).

Note though that, unlike the diversity measures constructed from abundances across species, averaged in some way¶, indicator species levels or the number of species in a sample (S) may not initially satisfy the assumptions necessary for classical statistical analysis. For S, the normality and constant variance conditions can usually be produced by transformation of the variable (e.g. log S). However, for most individual species, abundance across the set of samples is likely to be a very poorly-behaved variable, statistically speaking. Typically, a species will be absent from many of the samples and, when it is present, the counts are often highly variable, with an abundance probability distribution which is heavily right-skewed†. Thus, for all but the most common individual species, transformation is no real help and parametric statistical analyses cannot be applied to the counts, in any form. In any case, it is not valid to “snoop” in a large data matrix, of typically 100-250 taxa, for one or more “interesting” species to analyse by univariate techniques (any indicator or keystone species selection must be done a priori). Such arguments lead to the tenets underlying this manual:

a) community data are usually highly multivariate (large numbers of species, each subject to high statistical noise) and need to be analysed en masse in order to elicit the important biological signal and its relation to the environment;

b) standard parametric modelling is totally invalid.

Thus, throughout, little emphasis is given to representing communities by univariate measures, though some definitions of indices can be found at the start of Chapter 8, some brief remarks on hypothesis testing (ANOVA) at the start of Chapter 6, a discussion of transformations (to approximate normality and constant variance) at the start of Chapter 9, an example given of a univariate regression between biota and environment in Chapter 11, and a more extensive discussion of sampling properties of diversity indices, and biodiversity measures based on taxonomic relatedness, makes up Chapter 17. Finally, Chapter 14 gives a series of detailed comparisons of univariate with distributional and multivariate techniques, in order to gauge their relative sensitivities and merits in a range of practical studies.

EXAMPLE: Frierfjord macrofauna

The first example is from the IOC/GEEP practical workshop on biological effects of pollutants (Bayne et al, 1988), held at the University of Oslo, August 1986. This attempted to contrast a range of biochemical, cellular, physiological and community analyses, applied to field samples from potentially contaminated and control sites, in a fjordic complex (Frierfjord/Langesundfjord) linked to Oslofjord ({F}, Fig. 1.1). For the benthic macrofaunal component of this study (Gray et al, 1988), four replicate 0.1 m² Day grab samples were taken at each of six sites (A-E and G, Fig 1.1) and, for each sample, organisms retained on a 1.0 mm sieve were identified and counted. Wet weights were determined for each species in each sample, by pooling individuals within species.

Part of the resulting data matrix can be seen in Table 1.2: in total there were 110 different taxa categorised from the 24 samples. Such matrices (abundance, A, and/or biomass, B) are the starting point for the biotic analyses of this manual, and this example is typical in respect of the relatively high ratio of species to samples (always >> 1) and the prevalence of zeros. Here, as elsewhere, even an undesirable reduction to the 30 “most important” species (see Chapter 2) leaves more

¶ And thus subject to the central limit theorem, which will tend to induce statistical normality.

† It is the authors’ experience, certainly in the study of benthic communities, that the individuals of a species are not distributed at random in space (a Poisson process) but are often highly clustered, either through local variation in forcing environmental variables or mechanisms of recruitment, mortality and community interactions. This leads to counts which, in statistical terms, are described as over-dispersed, combined with a high prevalence of zeros, causing major problems in attempting parametric modelling by categorical/log-linear methods.
Samples
Species               A1  A2  A3  A4  B1  B2  B3  B4
Abundance
Cerianthus lloydi      0   0   0   0   0   0   0   0
Halicryptus sp.        0   0   0   1   0   0   0   0
Onchnesoma             0   0   0   0   0   0   0   0
Phascolion strombi     0   0   0   1   0   0   1   0
Golfingia sp.          0   0   0   0   0   0   0   0
Holothuroidea          0   0   0   0   0   0   0   0
Nemertina, indet.     12   6   8   6  40   6  19   7
Polychaeta, indet.     5   0   0   0   0   0   1   0
Amaena trilobata       1   1   1   0   0   0   0   0
Amphicteis gunneri     0   0   0   0   4   0   0   0
Ampharetidae           0   0   0   0   1   0   0   0
Anaitides groenl.      0   0   0   1   1   0   0   0
Anaitides sp.          0   0   0   0   0   0   0   0
Distributional examples
Table 1.4. Loch Linnhe macrofauna {L}. Abundance/biomass matrix (part only); one (pooled) set of values per year (1963-1973).
Species                      A      B      A      B      A      B      A      B
Scutopus ventrolineatus      0      0      0      0     11   0.05      0      0
Nucula tenuis                2   0.01     13   0.07     16   0.10      6   0.04
Mytilus edulis               0      0      0      0      5   0.09      0      0
Modiolus sp. indet.          0      0      0      0      0      0      0      0
Thyasira flexuosa           93   3.57    210   7.98     28   1.06    137   5.17
Myrtea spinifera           214  27.39    136  17.41      2   0.26    282  36.10
Lucinoma borealis           12   0.39     26   1.72      0      0     22   0.73
Montacuta ferruginosa        1      0      0      0      4   0.02      0      0
Mysella bidentata            0      0      0      0      0      0      0      0
Abra sp. indet.              0      0      0      0     12   0.26      0      0
Corbula gibba                2   0.13      8   0.54      9   0.27      2   0.13
Thracia sp. indet.           0      0      0      0      0      0      0      0
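The construction of abundance-biomass comparison (ABC) curves from such a matrix can be sketched as follows. This is a minimal illustration, not the PRIMER routine: it takes the first abundance (A) and biomass (B) column pair of the partial listing above, ranks species in decreasing order, and cumulates the percentages of the total. Because only 12 of the species are listed, the resulting curves are not those of the full data set, and the function name is invented for this example.

```python
def cumulative_dominance(values):
    """Cumulative % of the total, with species ranked in decreasing order."""
    ranked = sorted((v for v in values if v > 0), reverse=True)
    total = sum(ranked)
    running, curve = 0.0, []
    for v in ranked:
        running += v
        curve.append(100.0 * running / total)
    return curve

# First A and B columns of the partial Table 1.4 listing (12 species only).
abundance = [0, 2, 0, 0, 93, 214, 12, 1, 0, 0, 2, 0]
biomass = [0, 0.01, 0, 0, 3.57, 27.39, 0.39, 0, 0, 0, 0.13, 0]

a_curve = cumulative_dominance(abundance)
b_curve = cumulative_dominance(biomass)

# Species are ranked separately for each curve; zip stops at the shorter one.
for rank, (a, b) in enumerate(zip(a_curve, b_curve), start=1):
    print(f"rank {rank}: abundance {a:6.1f}%  biomass {b:6.1f}%")
```

Plotting both curves against species rank (conventionally on a log scale) and superimposing them gives the ABC display; here the biomass curve starts above the abundance curve for this column.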
finally, reduction of the sample information to the recording only of presence or absence for each species.¶ At the former end of the spectrum all attention will be focused on the dominant counts, at the latter end on the rarer species.

For the clustering technique, representation of the communities for each sample is by a dendrogram (e.g. Fig. 1.7a), linking the samples in hierarchical groups on the basis of some definition of similarity between each cluster (Chapter 3). This is a particularly relevant representation in cases where the samples are expected to divide into well-defined groups, perhaps structured by some clear-cut environmental distinctions. Where, on the other hand, the community pattern is responding to abiotic gradients which are more continuous, then representation by an ordination is usually more appropriate. The method of non-metric MDS (Chapter 5) attempts to place the samples on a “map”, usually in two dimensions (e.g. see Fig. 1.7b), in such a way that the rank order of the distances between samples on the map exactly agrees with the rank order of the matching (dis)similarities, taken from the triangular similarity matrix. If successful, and success is measured by a stress coefficient which reflects lack of agreement in the two sets of ranks, the ordination gives a simple and compelling visual representation of “closeness” of the species composition for any two samples.
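The idea of measuring disagreement between the two sets of ranks can be made concrete with a small sketch. It assumes Kruskal's stress formula 1 with a monotone (isotonic) regression of map distances on the ordered dissimilarities; the manual's own definition is given in Chapter 5 and may differ in detail (e.g. in its handling of ties). The dissimilarities and coordinates are hypothetical, and the function names are invented here.

```python
import math
from itertools import combinations

def monotone_fit(values):
    """Pool-adjacent-violators: least-squares non-decreasing fit to a sequence."""
    blocks = []  # each block is [sum, count]
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while the block means would decrease.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted

def stress(points, dissim):
    """Stress of a configuration against a (symmetric) dissimilarity matrix."""
    pairs = sorted(combinations(range(len(points)), 2),
                   key=lambda jk: dissim[jk[0]][jk[1]])
    dist = [math.dist(points[j], points[k]) for j, k in pairs]
    fitted = monotone_fit(dist)
    numerator = sum((d - f) ** 2 for d, f in zip(dist, fitted))
    return math.sqrt(numerator / sum(d * d for d in dist))

# Hypothetical dissimilarities between four samples.
dissim = [[0, 10, 40, 90],
          [10, 0, 20, 60],
          [40, 20, 0, 30],
          [90, 60, 30, 0]]
good_map = [(0, 0), (1, 0), (3, 0), (6, 0)]   # distances ordered like dissim
poor_map = [(6, 0), (1, 0), (3, 0), (0, 0)]   # first and last samples swapped

print(stress(good_map, dissim), stress(poor_map, dissim))
```

A full MDS routine would iteratively move the points to reduce this value; the sketch only evaluates a given map.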
</> 16
Fig. 1.6. Garroch Head macrofauna {G}. Plots of number of species against number of individuals per species in x2 geometric classes, for the 12 sampling sites of Fig. 1.5.
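Binning species into x2 geometric abundance classes can be sketched as below. The class boundaries (class 1 = 1 individual, class 2 = 2-3, class 3 = 4-7, and so on) are an assumption following the usual convention for such plots, and the counts are hypothetical.

```python
from collections import Counter

def geometric_class(n):
    """x2 geometric class: 1 -> class 1, 2-3 -> class 2, 4-7 -> class 3, ..."""
    return n.bit_length()

def species_abundance_distribution(counts):
    """Number of species falling in each geometric abundance class."""
    tally = Counter(geometric_class(n) for n in counts if n > 0)
    return dict(sorted(tally.items()))

# Hypothetical numbers of individuals per species at one site.
counts = [1, 1, 2, 3, 5, 8, 16, 40, 0, 120]
distribution = species_abundance_distribution(counts)
print(distribution)
```

Plotting the values of `distribution` as a histogram against class number gives one panel of the kind shown in Fig. 1.6.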
¶ The PRIMER routines automatically offer this set of transformation choices, applied to the whole data matrix, but also cater for more selective transformations of particular sets of variables, as is often appropriate to environmental rather than species data.
Multivariate examples
Stages | Hierarchical clustering (Ch 2, 3) | MDS ordination (Ch 5) | PCA ordination (Ch 4)

because it comes into its own in the analysis of environmental samples. Abiotic variables (sediment grain size, salinity, contaminant levels etc.) are usually relatively few in number, are continuously scaled, and their distributions can be transformed so that standard correlation coefficients (and Euclidean distances) are appropriate ways of describing their inter

Table 1.6. Frierfjord macrofauna {F}. Bray-Curtis similarities, after √√-transformation of counts, for every pair of replicate samples from sites A, B, C only (four replicates per site).
      A1  A2  A3  A4  B1  B2  B3  B4  C1  C2  C3  C4
A1     -
A2    61   -
A3    69  60   -
A4    65  61  66   -
C2    40  34  26  29  48  69  62  56  56   -
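How a single entry of such a similarity matrix arises can be sketched as follows, using the Bray-Curtis coefficient in its "twice the shared minimum over the combined total" form, with and without a fourth-root (√√) transformation. The two samples are columns A1 and A2 of the partial Table 1.2 listing, i.e. only 13 of the 110 taxa, so the values printed here are not expected to reproduce the 61 shown for A1/A2 in Table 1.6. Function names are invented for this example.

```python
def bray_curtis(y_j, y_k):
    """Bray-Curtis similarity (%) between two samples (assumes a non-zero total)."""
    shared = sum(2 * min(a, b) for a, b in zip(y_j, y_k))
    total = sum(a + b for a, b in zip(y_j, y_k))
    return 100.0 * shared / total

def fourth_root(sample):
    """Fourth-root transformation, down-weighting the dominant counts."""
    return [v ** 0.25 for v in sample]

# Columns A1 and A2 of the partial Table 1.2 listing (13 taxa only).
a1 = [0, 0, 0, 0, 0, 0, 12, 5, 1, 0, 0, 0, 0]
a2 = [0, 0, 0, 0, 0, 0, 6, 0, 1, 0, 0, 0, 0]

raw = bray_curtis(a1, a2)
transformed = bray_curtis(fourth_root(a1), fourth_root(a2))
print(f"untransformed: {raw:.1f}%  fourth-root: {transformed:.1f}%")
```

Repeating the calculation for every pair of samples fills the lower triangle of the matrix; joint absences contribute nothing to either sum.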
Fig. 1.8. Loch Linnhe macrofauna {L}. Two-dimensional principal components analysis (PCA) ordination of the √√-transformed abundances from the 11 years 1963-1973, omitting the less-common species.
Clarke, 1993). It is possible to employ the same test in connection with PCA, using an underlying dissimilarity matrix of Euclidean distances, though when the ordination is of a relatively small number of environmental variables, which can be transformed into approximate multivariate normality, then abiotic differences between sites can be tested by a standard multivariate equivalent of ANOVA (MANOVA, e.g. Mardia et al, 1979).

Part of the process of discriminating sites, times, treat
Fig. 1.9. Garroch Head macrofauna {G}. a) MDS ordination of Bray-Curtis similarities from √√-transformed species biomass data for the sites shown in Fig. 1.5; b) the same MDS but with superimposed circles of increasing size, representing increasing carbon concentrations in matched sediment samples; c) ordination of (log-transformed) carbon, nitrogen and cadmium concentrations in the sediments at the 12 sites. (Panel titles: Biota; +C; C,N,Cd.)
for drawing general inferences about the pollution status of an isolated group of samples. Even in comparative studies, on the face of it there is not a clear sense of directionality of change (e.g. deleteriousness), when it is established that communities at putatively impacted sites differ from those at control sites. Nonetheless, there are a number of ways in which such directionality has been ascribed in published studies, whilst retaining an essentially multivariate form of analysis (Chapter 15):

a) a meta-analysis - a combined ordination of data from NE Atlantic shelf waters, at a coarse level of taxonomic discrimination¶ - suggests a common directional change in the balance of taxa under a variety of types of pollution/disturbance (Warwick and Clarke, 1993a);

b) a number of studies demonstrate increased multivariate dispersion among replicates under impacted conditions, in comparison to controls (Warwick and Clarke, 1993b);

c) another feature of disturbance, demonstrated in a spatial coral community study (but with wider applicability to other spatial and temporal patterns), is a loss of smooth seriation along transects of increasing depth, again in comparison to controls in time and space (Clarke et al, 1993).

Two methods of linking multivariate biotic patterns to environmental variables are explored in Chapter 11; these are illustrated here by the Garroch Head dump-ground study described earlier (Fig. 1.5). The MDS of the macrofaunal communities from the 12 sites is shown in Fig. 1.9a; this is based on Bray-Curtis similarities computed from (transformed) species biomass values.† A steady change in the community is apparent as the dump centre (site 6) is approached along the western arm of the transect (sites 1 to 6), with a mirrored structure along the eastern arm (sites 6 to 12), so that the samples from the two ends of the transect have similar species composition. That this biotic pattern correlates with the organic loading of the sediments can best be seen by superimposing the values for a single environmental variable, such as Carbon concentration, on the MDS configuration. Fig. 1.9b represents C values by circles of differing diameter, placed at the corresponding site locations on the MDS, and the pattern across sites of the 11 available environmental variables (sediment concentrations of C, N, Cu, Cd, Zn, Ni, etc.) can be viewed in this way (Chapter 11).§

A different approach is required in order to answer questions about combinations of environmental variables, for example to what extent the biotic pattern can be “explained” by knowledge of the full set, or a subset, of the abiotic variables. Though there is clearly one strong underlying gradient in Fig. 1.9a (horizontal axis), corresponding to an increasing level of organic enrichment, there are nonetheless secondary community differences (e.g. on the vertical axis) which may be amenable to explanation by metal concentration differences, for example. The heuristic approach adopted

¶ The effect of carrying out the various graphical and multivariate analyses at taxonomic levels higher than species is the subject of Chapter 10.

† Chapter 13, and the meta-analysis [...] the relative merits and drawbacks of using species abundance or species biomass when both are available; in fact, Chapter 13 is a wider discussion of the relative advantages of sampling particular components of the marine biota, for a study on the effects of pollutants.

§ The PRIMER MDS routine allows simple and flexible superimposition of individual variables in this way, whether environmental (as here) or species values.
Table 1.7. Nutrient enrichment experiment, Solbergstrand mesocosm, Norway {N}. Meiofaunal abundances (shown for copepods only) from four replicate boxes for each of three treatments (Control, Low and High levels of added nutrients).
Halectinosoma gothiceps          0    0    1    1   16   23    8   16    0    1    0    0
Danielssenia fusiformis          1    1    1    1    1    3    8    5    1    0    0    3
Tisbe sp. 1 (gracilis group)     0    0    0    0    0    0    0    0    2   27  119   31
Tisbe sp. 2                      0    0    0    0   45   22   39   25    6    0    3   32
Tisbe sp. 3                      0    0    0    0   86   83   88    0    5   29    0   20
Tisbe sp. 4                      0    0    0    0  151  249  264   87    8    0    0   34
Tisbe sp. 5                      0    0    0    0  129    0    0  115    4    0    1   40
Typhlamphiascus typhlops         4    2    2    4    5    8    4    3    0    0    0    0
Bulbamphiascus imus              1    0    0    2    0    0    0    0    0    0    0    0
Stenhelia reflexa                3    1    0    1    2    0    0    0    0    0    0    0
Amphiascus tenuiremis            1    0    0    0    0    0    2    6    0    0    0    0
Ameira parvula                   0    0    0    0    4    2    3    2    2    0    1    2
Proameira simplex                0    0    0    0    0    2    0    5    0    0    0    0
Leptopsyllus paratypicus         0    0    1    0    0    0    0    0    0    0    0    0
Enhydrosoma longifurcatum        2    2    1    2    3    1    0    0    0    0    0    0
Laophontidae indet.              0    0    0    0    0    0    1    0    0    0    0    0
Ancorabolis mirabilis            3    0    4    4    2   18    3    3   27    3    1    0
Unidentified Copepodites         0    0    1    0    1    1    1    3    0    1    0    0
here is to display the multivariate pattern of the environmental data, ask to what extent it matches the between-site relationships observed in the biota, and then maximise some matching coefficient between the two, by examining possible subsets of the abiotic variables (the BIO-ENV or BVSTEP procedures, Chapters 11 and 16 respectively)¶.

Fig. 1.9c is based on this optimal subset for the Garroch Head sediment variables, namely (C, N, Cd). It is an MDS plot, using Euclidean distance for its dissimilarities,† and is seen to replicate the pattern in Fig. 1.9a rather closely. In fact, the optimal match is determined by correlating the underlying dissimilarity matrices rather than the ordinations themselves, in parallel with the reasoning behind the ANOSIM tests, discussed earlier.

The suggestion is therefore that the biotic pattern of the Garroch Head sites is associated not just with an organic enrichment gradient but also with a particular heavy metal. It is important, however, to realise the limitations of such an “explanation”. Firstly, there are usually other combinations of abiotic variables which will correlate nearly as well with the biotic pattern, particularly as here when the environmental variables are strongly inter-correlated amongst themselves. Secondly, there can be no direct implication of causality of the link between these abiotic variables and the community structure, based solely on field survey data: the real driving factors could be unmeasured but happen to correlate highly with the variables identified as producing the optimal match. This is a general feature of inference from purely observational studies and can only be avoided formally by “randomising out” effects of unmeasured variables; this requires random allocation of treatments to observational units for field or laboratory-based community experiments (Chapter 12).

¶ The BIOENV routine in PRIMER optimises the match over all combinations of abiotic variables. Where this is not computationally feasible, the BVSTEP routine performs a stepwise search, adding (or subtracting) single abiotic variables at each step, much as in stepwise multiple regression. This also allows generalisation to pattern-matching scenarios other than abiotic-to-biotic. For example, BVSTEP allows selection of a subset of species whose multivariate structure matches, to a high degree, the pattern for the full set of species (Chapter 16); this provides a more general alternative to the SIMPER procedure (of Chapter 7), for identifying influential species.

† It is, though, virtually indistinguishable in this case from a PCA, because of the small number of variables and the implicit use of the same dissimilarity matrix for both techniques.
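The subset-matching idea can be sketched as an exhaustive search. This is not the PRIMER BIOENV code: it assumes an ordinary Spearman rank correlation as the matching coefficient, Euclidean distance on environmental variables that are taken to be already transformed and normalised, and entirely hypothetical data for five sites. All function and variable names are invented for the illustration.

```python
import math
from itertools import combinations

def ranks(values):
    """Average ranks (tied values share the mean of their positions)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            result[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation between two equal-length lists."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

def euclidean_dissims(env, subset):
    """Euclidean distances for all site pairs, using the chosen variables only."""
    n = len(next(iter(env.values())))
    return [math.sqrt(sum((env[v][j] - env[v][k]) ** 2 for v in subset))
            for j, k in combinations(range(n), 2)]

def best_subset(biotic_dissims, env):
    """Try every non-empty subset of variables; keep the best-matching one."""
    best = None
    for size in range(1, len(env) + 1):
        for subset in combinations(sorted(env), size):
            rho = spearman(biotic_dissims, euclidean_dissims(env, subset))
            if best is None or rho > best[0]:
                best = (rho, subset)
    return best

# Hypothetical (already normalised) sediment variables at five sites.
env = {"C": [1.0, 2.5, 4.0, 3.0, 0.5],
       "N": [0.8, 2.0, 3.5, 3.2, 0.4],
       "Cd": [0.2, 0.1, 1.5, 0.3, 1.2]}
# Hypothetical biotic dissimilarities for the 10 site pairs, in the order
# (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4).
biotic = [30, 62, 48, 15, 35, 22, 41, 20, 70, 55]

rho, subset = best_subset(biotic, env)
print(f"best subset {subset} with rho = {rho:.2f}")
```

With many variables the number of subsets grows as 2^v - 1, which is why a stepwise search becomes attractive; the sketch makes no attempt at that.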
with some boxes remaining undosed (control, C). Fig. 1.10 shows the MDS plots of the four replicate boxes from each treatment, separately for the copepod and nematode components of the meiofaunal communities (see also Chapter 12). For the copepods, there is a

(Fig. 1.10 panel titles: Copepods; Nematodes.)
SIMILARITY FOR QUANTITATIVE DATA MATRICES

Data matrix

The available biological data is assumed to consist of an array of p rows (species) and n columns (samples), whose entries are counts of each species for each sample, or the total biomass of all individuals or their percentage cover, or some other “quantity” of each species in each sample. This includes the special case where only presence (1) or absence (0) of each species is known. For the moment nothing further is assumed about the structure of the samples. They might consist of one or more replicates (repeated samples) from a number of different sites, times or experimental “treatments” but this information is not used in the initial analysis. The strategy outlined in Chapter 1 is to observe any pattern of similarities and differences across the samples (i.e. let the biology “tell its own story”) and, only later, compare this with known or hypothesised inter-relations between the samples based on environmental or experimental factors.

that is accounted for by each species. Thus each matrix entry is divided by its column total (and multiplied by 100) to form the new array. Such standardisation will be essential if, for example, differing and unknown volumes of sediment or water are sampled, so that absolute numbers of individuals are not comparable between samples. Even if sample volumes are the same (or, if different and known, abundances are adjusted to a unit sample volume), it may still sometimes be biologically relevant to define two samples as being perfectly similar when they have the same % composition of species, fluctuations in total abundance (/biomass /cover) being of no interest. This is not the normal situation, however, changes in total abundance usually having meaningful interpretation in quantitative sampling.

c) A reduction to simple presence or absence of each species may be all that is justifiable. For example, sampling artefacts may make quantitative counts totally unreliable, or concepts of abundance may be difficult to define for some important faunal components.
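Two of the pre-treatments described here, standardisation by column totals and reduction to presence/absence, can be sketched on a small species-by-samples array. The matrix is hypothetical (p = 3 species as rows, n = 2 samples as columns) and the function names are invented for this example.

```python
def standardise(matrix):
    """Express each entry as a percentage of its column (sample) total."""
    totals = [sum(column) for column in zip(*matrix)]
    return [[100.0 * v / t if t else 0.0 for v, t in zip(row, totals)]
            for row in matrix]

def presence_absence(matrix):
    """Reduce quantities to 1 (present) or 0 (absent)."""
    return [[1 if v > 0 else 0 for v in row] for row in matrix]

# Hypothetical counts: the second sample holds a tenth of the individuals
# of the first, but in exactly the same proportions.
counts = [[10, 1],
          [30, 3],
          [60, 6]]

print(standardise(counts))
print(presence_absence(counts))
```

After standardisation the two columns are identical, so any similarity coefficient would treat these samples as perfectly similar even though their total abundances differ tenfold, which is the situation the text cautions is not usually the one of interest.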
itself is always 100%!) or the upper right triangle (the similarity of sample j to sample k is the same as the similarity of sample k to sample j, of course).

Similarity matrices are the basis (explicitly or implicitly) of many multivariate methods, both in the representation given by a clustering or ordination analysis and in some associated statistical tests. A similarity matrix can be used to:

a) discriminate sites (or times) from each other, by noting that similarities between replicates within a site are consistently higher than similarities between replicates at different sites (ANOSIM test, Chapter 6);

b) cluster sites into groups that have similar communities, so that similarities within each group of sites are usually higher than those between groups (Clustering, Chapter 3);

c) allow a gradation of sites to be represented graphically, in the case where site A has some similarity with site B, B with C, C with D but A and C are less similar, A and D even less so etc. (Ordination, Chapter 4).

Species similarity matrix

In a complementary way, the original data matrix can be thought of as describing the pattern of occurrences of each species across the given set of samples, and a matching triangular array of similarities can be constructed between every pair of species. Two species are “similar” (S' near 100) if they have significant representation at the same set of sites, and totally “dissimilar” (S' = 0) if they never co-occur. Species similarities are discussed later in this chapter, and the resulting clustering and ordination diagrams in Chapter 7, but for the bulk of this manual “similarity” refers to between-sample similarity.

entirely equivalent, as can be seen from some simple algebra or by calculating a few examples):

$$S_{jk} = 100 \left\{ 1 - \frac{\sum_{i=1}^{p} |y_{ij} - y_{ik}|}{\sum_{i=1}^{p} (y_{ij} + y_{ik})} \right\} = 100 \, \frac{\sum_{i=1}^{p} 2 \min(y_{ij}, y_{ik})}{\sum_{i=1}^{p} (y_{ij} + y_{ik})} \qquad (2.1)$$

Here $y_{ij}$ represents the entry in the ith row and jth column of the data matrix, i.e. the abundance (/biomass/cover) for the ith species in the jth sample (i = 1, 2, ..., p; j = 1, 2, ..., n). Similarly, $y_{ik}$ is the count for the ith species in the kth sample. |...| represents the absolute value of the difference (the sign is ignored) and min(.,.) the minimum of the two counts; the separate sums in the numerator and denominator are both over all rows (species) in the matrix.

EXAMPLE: Loch Linnhe macrofauna

A trivial example, used in this and the following chapter to illustrate simple manual computation of similarities and hierarchical clusters, is provided by extracting six species and four years from the Loch Linnhe macrofauna data {L} of Pearson (1975), seen already in Fig. 1.3 and Table 1.4. (Of course, arbitrary extraction of “interesting” species and years is not a legitimate procedure in a real application; it is done here simply as a means of showing the computational steps.)

Table 2.1. Loch Linnhe macrofauna {L} subset. (a) Abundance (untransformed) for some selected species and years. (b) The resulting Bray-Curtis similarities between every pair of samples.

(a)  Year:       64   68   71   73      (b)
     (Sample:     1    2    3    4)          Sample   1    2    3    4
     Species                                 1        -
the similarities all appear too low; samples 2 and 3 would seem to deserve a similarity rating higher than 50%. As will be seen later, this is not an important consideration since the most useful multivariate methods depend on the relative order (ranking) of the similarities in the triangular matrix, rather than their absolute values. More importantly, the similarities of Table 2.1b are unduly dominated by the counts for the two most abundant species (4 and 5), as can be seen from studying the form of equation (2.1): terms involving

     Sample:      1     2     3     4          Sample   1    2    3    4
     Myrioche.   2.1    0     0    1.3         3        0   68    -
     Labidopl.   1.7   2.5    0    1.8         4       52   68   42    -
     Amaeana      0    1.9   3.5   1.7
     Capitella    0    3.4   4.3   1.2
     Mytilus      0     0     0     0

Bray-Curtis is the main coefficient calculated by the PRIMER Similarity routine, which also allows a range of transformations of the data.
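The two algebraic forms of equation (2.1) can be checked numerically. The sketch below is only an illustration, not the PRIMER implementation: the function names are invented here, and the abundance values are those of the six-species, four-sample Loch Linnhe subset as it is printed in full, alongside its similarity matrix, in the Chapter 3 clustering example.

```python
# Six species (rows) by four samples (columns); values as printed in the
# Chapter 3 version of the Loch Linnhe subset table.
abundance = {
    "Echinoca.": [1.7, 0, 0, 0],
    "Myrioche.": [2.1, 0, 0, 1.3],
    "Labidopl.": [1.7, 2.5, 0, 1.8],
    "Amaeana":   [0, 1.9, 3.5, 1.7],
    "Capitella": [0, 3.4, 4.3, 1.2],
    "Mytilus":   [0, 0, 0, 0],
}

def sample(j):
    """Column j (numbered from 1) of the species-by-samples matrix."""
    return [row[j - 1] for row in abundance.values()]

def bray_curtis_abs(yj, yk):
    """First form of (2.1): based on absolute differences.

    Assumes the two samples are not both empty (non-zero denominator).
    """
    numerator = sum(abs(a - b) for a, b in zip(yj, yk))
    denominator = sum(a + b for a, b in zip(yj, yk))
    return 100.0 * (1.0 - numerator / denominator)

def bray_curtis_min(yj, yk):
    """Second form of (2.1): based on the minimum of each pair of entries."""
    numerator = sum(2 * min(a, b) for a, b in zip(yj, yk))
    denominator = sum(a + b for a, b in zip(yj, yk))
    return 100.0 * numerator / denominator

# Lower-triangle similarities between every pair of samples.
similarities = {
    (j, k): bray_curtis_min(sample(j), sample(k))
    for j in range(1, 5) for k in range(j + 1, 5)
}
```

Note that the Mytilus row, absent from every sample, contributes nothing to either the numerator or the denominator, so it has no effect on any of the similarities.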
Chapter 2
page 2—4
In fact, for very variable data, choice of transformation can sometimes be more critical than choice of similarity coefficient or ordination technique, and the subject therefore merits a chapter to itself (Chapter 9).

Canberra coefficient

An alternative to transformation is to select a similarity coefficient that automatically balances the weighting given to each species when computed on original counts (/biomass/cover). One such possibility given by Lance and Williams (1967), and referred to as the Canberra coefficient, defines similarity between sample j and sample k as:

[...]

$$r_{jk} = \frac{\sum_{i} (y_{ij} - \bar{y}_{\cdot j})(y_{ik} - \bar{y}_{\cdot k})}{\sqrt{\sum_{i} (y_{ij} - \bar{y}_{\cdot j})^2 \, \sum_{i} (y_{ik} - \bar{y}_{\cdot k})^2}} \qquad (2.3)$$

where $\bar{y}_{\cdot j}$ is defined as the mean value over all species for the jth sample. In this form it is not a similarity coefficient, since it takes values in the range (-1, 1), not (0, 100), with positive correlation (r near +1) if high counts in one sample match high counts in the other, and negative correlation (r < 0) if high counts match absences. There are a number of ways of converting r to a similarity coefficient, the most obvious for community data being S = 50(1+r).
of standardisation carried out for each species, either by the species total or maximum value across all samples);

f) it has the flexibility to register differences in total abundance for two samples as a less-than-perfect similarity when the relative abundances for all species are identical (some coefficients standardise automatically by sample totals, so cannot reflect this component of similarity/difference).

In addition, Faith et al (1987) use a simulation study to look at the robustness of various similarity coefficients in reconstructing a (non-linear) ecological response gradient. They find that Bray-Curtis and a very closely-related modification, the Kulczynski coefficient (Kulczynski 1928)

$$S_{jk} = 100 \, \frac{\sum_{i=1}^{p} \min(y_{ij}, y_{ik})}{2 \left[ \left( \sum_{i=1}^{p} y_{ij} \right)^{-1} + \left( \sum_{i=1}^{p} y_{ik} \right)^{-1} \right]^{-1}} \qquad (2.4)$$

perform most satisfactorily.

by 1 (presence) or 0 (absence) and Bray-Curtis similarity (say) computed. This will have the effect of giving potentially equal weight to all species, whether rare or abundant (and will thus have somewhat similar effect to the Canberra coefficient).

Many similarity coefficients have been proposed based on (0, 1) data arrays; see for example, Sneath and Sokal (1973) or Legendre and Legendre (1998). When computing similarity between samples j and k, the two columns of data can be reduced to the following four summary statistics without any loss of relevant information:

a = the number of species which are present in both samples;

b = the number of species present in sample j but absent from sample k;

c = the number of species present in sample k but absent from sample j;

d = the number of species absent from both samples.
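The reduction of two columns to the four summary statistics can be sketched as below. The function name is invented for illustration; the two columns are samples 1 and 4 of the Loch Linnhe subset, with values as printed in the Chapter 3 version of that table (only presence or absence matters here).

```python
def abcd(yj, yk):
    """Return (a, b, c, d) for two samples, treating any value > 0 as presence."""
    a = sum(1 for u, v in zip(yj, yk) if u > 0 and v > 0)    # present in both
    b = sum(1 for u, v in zip(yj, yk) if u > 0 and v == 0)   # in j only
    c = sum(1 for u, v in zip(yj, yk) if u == 0 and v > 0)   # in k only
    d = sum(1 for u, v in zip(yj, yk) if u == 0 and v == 0)  # absent from both
    return a, b, c, d

# Samples 1 and 4 of the six-species Loch Linnhe subset.
sample_1 = [1.7, 2.1, 1.7, 0, 0, 0]
sample_4 = [0, 1.3, 1.8, 1.7, 1.2, 0]

a, b, c, d = abcd(sample_1, sample_4)
```

The four counts always add up to the number of species p, which is a convenient check on any hand calculation.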
i.e. S is the probability (×100) that a single species picked at random (from the reduced species list) will be present in both samples.

A popular coefficient found under several names, commonly Sorensen or Dice, is

$$S_{jk} = 100 \left[ \frac{2a}{2a + b + c} \right] \qquad (2.7)$$

Note that this is identical to the Bray-Curtis coefficient when the latter is calculated on (0, 1) presence/absence data, as can be seen most clearly from the second form of equation (2.1).¶ For example, reducing Table 2.1a to (0, 1) data, and comparing samples 1 and 4 as previously, equation (2.1) gives:

$$S_{14} = 100 \left\{ \frac{2(0 + 1 + 1 + 0 + 0 + 0)}{1 + 2 + 2 + 1 + 1 + 0} \right\} = 57.1$$

This is clearly the same construction as substituting a = 2, b = 1, c = 2 into equation (2.7).

Several other coefficients have been proposed; Legendre and Legendre (1998) list at least 15, but only one further measure is given here. In the light of the earlier discussion on coefficients satisfying desirable, biologically-motivated criteria, note that there is a presence/absence form of the Kulczynski coefficient (2.4), a close relative of Bray-Curtis, namely:

$$S_{jk} = 50 \left( \frac{a}{a + b} + \frac{a}{a + c} \right) \qquad (2.8)$$

RECOMMENDATIONS

1) In most ecological studies, some intuitive axioms for desirable practical behaviour of a similarity coefficient lead inexorably to the use of the Bray-Curtis measure (or a closely-related coefficient such as that of Kulczynski).

2) Similarities calculated on original abundance (or biomass) values can often be over-dominated by a small number of highly abundant (or large-bodied) species, so that they fail to reflect similarity of overall community composition.

3) Some coefficients (such as the Canberra), which separately scale the contribution of each species to adjust for this, have a tendency to over-compensate, i.e. rare species, which may be arbitrarily distributed across the samples, are given equal weight to common ones. The same criticism applies to reduction of the original matrix to simple presence/absence of each species. In addition, the latter loses potentially valuable information about the approximate prevalence of a species (absent, rare, present in modest numbers, common, very abundant etc).

4) A balanced compromise is often to apply the Bray-Curtis similarity to counts (/biomass/cover values) which have been moderately, √y, or fairly severely transformed, log(1+y) or √√y. All species then contribute something to the definition of similarity, whilst the retention of some information on the prevalence of a species ensures that the commoner species are generally given greater weight than the rare ones.

5) Initial standardisation is occasionally desirable, dividing each count by the total abundance of all species in that sample; this is essential when non-comparable, unknown sample volumes have been taken. Without this column standardisation, the Bray-Curtis coefficient will reflect differences between two samples due both to differing community composition and differing total abundance. The standardisation removes any effect of the latter; whether this is desirable is a biological rather than a statistical question. (Experience with benthic communities suggests that standardisation should usually be avoided, valuable biological information being contained in the abundance, biomass or cover totals). Note, however, that column standardisation does not remove the need subsequently to transform the data matrix, if the similarities are to take account of more than just the few commonest species.†

SPECIES SIMILARITIES

Starting with the original data matrix of abundances (or biomass, % cover etc), the similarity between any pair of species can be defined in an analogous way to that for samples, but this time involving comparison of the ith and lth row (species) across all j = 1, ..., n columns (samples).

¶ Thus the Sorensen coefficient can be obtained in the PRIMER Similarity routine by “transforming” the data to presence/absence and selecting Bray-Curtis similarity.

† In the PRIMER Similarity routine, standardisation is not the default option for sample similarities but, if selected, it is therefore carried out before any transformation.
However, different initial treatment of the data is required, in two respects.

1) Similarities between rare species have little meaning; very often such species have single occurrences, distributed more or less arbitrarily across the sites, so that S' is usually zero (or occasionally 100). If these values are left in the similarity matrix they will tend to confuse and disrupt the patterns in any subsequent clustering or ordination analysis; the rarer species should thus be omitted from the data matrix before computing species similarities.

2) A different form of standardisation of the data matrix is appropriate and (in contrast to the samples analysis) it usually makes sense to carry this out routinely in place of a transformation. Two species could have quite different mean levels of abundance yet be “perfectly similar” in the sense that their counts are in strict ratio to each other across the samples. One species might be of much larger body size, and thus tend to have smaller counts, for example; or there might be a direct host-parasite relationship between the two species. It is therefore appropriate to standardise the original data by dividing each entry by its row (species) total, and multiplying by 100:

$$y'_{ij} = 100 \, \frac{y_{ij}}{\sum_{k=1}^{n} y_{ik}} \qquad (2.10)$$

before computing the similarities (S'). The effect of this can be seen from the artificial example in the following table, for three species and five samples. For the original matrix, the Bray-Curtis similarity between species 1 and 2, for example, is only S' = 33% but the two species are found in strict proportion to each other across the samples so that, after row standardisation, they have a more realistic similarity of S' = 100%. Note that it is not clear that a transformation now serves any useful purpose. Its role in the samples analysis was to reduce (though not totally remove) the large disparities in counts between species; the standardisation by row total has here removed such differences.

Correlation coefficient

The standard product moment correlation coefficient defined in equation (2.3), and subsequently modified to a similarity, is perhaps more appropriate for defining species similarities than it was for samples, in that it automatically incorporates a type of row standardisation. In fact, this is a full normalisation (subtracting the row mean from each count and dividing by the row standard deviation) and it is less appropriate than the simple row standardisation above. In addition, the previous argument about the effect of joint absences is equally appropriate to species similarities: an intertidal species is no more similar to a deep-sea species because neither is found in shelf samples. A correlation will again be a function of joint absences; the Bray-Curtis coefficient will not.

RECOMMENDATION

For species similarities, a coefficient such as Bray-Curtis calculated on row-standardised and untransformed data seems most appropriate. The rarer species (usually at least half of the species set) should first be removed from the matrix, to have any chance of an interpretable clustering or ordination analysis. There are several ways of doing this, all of them arbitrary to some degree. Field et al (1982) suggest removal of all species that never constitute more than p% of the total abundance (/biomass/cover) of any sample, where p is chosen to retain around 50 or 60 species (typically p = 3%, or so, for soft-sediment benthic data). This is preferable to simply retaining the 50 or 60 species with the highest total abundance across all samples,
since the latter strategy may result in omitting several species which are key constituents of a site which is characterised by a low total number of individuals.¶ It is important to note, however, that this inevitably arbitrary process of omitting species is not necessary for the more usual between-sample similarity calculations. There the computation of the Bray-Curtis coefficient downweights the contributions of the less common species in an entirely natural and continuous fashion (the rarer the species the less it contributes, on average), and all species should be retained in these calculations.

DISSIMILARITY COEFFICIENTS

The converse concept to similarity is that of dissimilarity, the degree to which two samples are unlike each other. Though similarity and dissimilarity are just opposite sides of the same coin, the latter is a more natural starting point in constructing ordinations, in which dissimilarities (δ) between pairs of samples are turned into distances (d) between sample locations on a “map”. Thus large dissimilarity implies that samples should be located at a large distance from each other, and dissimilarities near 0 imply nearby location; δ must therefore always be positive, of course.

$$\delta = 100 - S \qquad (2.11)$$

For example, for the Bray-Curtis coefficient this gives:

$$\delta_{jk} = 100 \, \frac{\sum_{i=1}^{p} |y_{ij} - y_{ik}|}{\sum_{i=1}^{p} (y_{ij} + y_{ik})} \qquad (2.12)$$

which has limits δ = 0 (no dissimilarity) and δ = 100 (total dissimilarity).

However, rather than conversion from similarities, other important dissimilarity measures arise in the first place as distances. Their role as implicit dissimilarity matrices underlying particular ordination techniques will be seen more clearly later (e.g. in Principal Components Analysis, Chapter 4).

Euclidean distance

The natural distance between any two points in space is referred to as Euclidean distance (from classical or Euclidean geometry). In the context of a species abundance matrix, the Euclidean distance between samples j and k is defined algebraically as:

$$d_{jk} = \sqrt{\sum_{i=1}^{p} (y_{ij} - y_{ik})^2} \qquad (2.13)$$

This can best be understood, geometrically, by taking the special case where there are only two species so that samples can be represented by points in 2-dimensional space, namely their position on the two axes of Species 1 and Species 2 counts. This is illustrated below for a simple two samples by two species abundance matrix. The co-ordinate points (2, 3) and (5, 1) on the (Sp. 1, Sp. 2) axes are the two samples j and k. The direct distance $d_{jk}$ between them of √[(2-5)² + (3-1)²] (from Pythagoras) clearly corresponds to equation (2.13).

     Sample:   j   k
     Sp 1      2   5
     Sp 2      3   1

[Figure: samples j and k plotted on the Sp 1 and Sp 2 axes, showing the Euclidean and Manhattan distances between them]

It is easy to envisage the extension of this to a matrix with three species; the two points are now simply located on 3-dimensional species axes and their straight line distance apart is a natural geometric concept. Algebraically, it is the root of the sums of squared distances apart along the three axes, equation (2.13). Extension to four and higher numbers of species (dimensions) is harder to envisage geometrically (in our 3-dimensional world) but the concept remains unchanged and the algebra is no more difficult to understand in higher dimensions than three: additional squared distances apart on each new species axis are added to the summation under the square root in (2.13). In fact, this concept of representing a species-by-samples matrix as points in high-dimensional species space is a very fundamental and important one and will be met again in Chapter 4, where it is crucial to an understanding of Principal Components Analysis.

Manhattan distance

Euclidean distance is not the only way of defining distance apart of two samples in species space; an alternative is to sum the distances along each species axis:

¶ The PRIMER Similarity routine will compute Bray-Curtis species similarities, with or without row standardisation and transformation (though the default is as recommended here). Prior to this, the Select Variables option allows reduction of the number of species, by retaining those that contribute p% or more to at least one of the samples, or by specifying the number n of “most important” species to retain. The latter uses the same p% criterion but gradually increases p until only n species are left.
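The effect of the row standardisation in equation (2.10) on species similarities can be illustrated with a small sketch. The three-species, five-sample matrix below is hypothetical: it was constructed here so that species 1 and 2 are in strict 1:5 ratio, which reproduces the 33% and 100% similarities quoted in the discussion of species similarities, and it should not be read as the manual's own table. Function names are likewise illustrative.

```python
# Hypothetical counts: three species (rows) across five samples (columns).
counts = {
    "Species 1": [2, 0, 0, 4, 4],
    "Species 2": [10, 0, 0, 20, 20],   # exactly five times Species 1
    "Species 3": [0, 4, 4, 1, 1],
}

def row_standardise(row):
    """Equation (2.10): each entry as a percentage of its species total.

    Assumes the species occurs in at least one sample (non-zero total).
    """
    total = sum(row)
    return [100.0 * value / total for value in row]

def bray_curtis(u, v):
    """Bray-Curtis similarity between two rows, second form of (2.1)."""
    return 100.0 * sum(2 * min(a, b) for a, b in zip(u, v)) / sum(
        a + b for a, b in zip(u, v))

raw_similarity = bray_curtis(counts["Species 1"], counts["Species 2"])
standardised_similarity = bray_curtis(
    row_standardise(counts["Species 1"]),
    row_standardise(counts["Species 2"]),
)
```

Any two species whose counts are in strict 1:5 ratio give the same raw value of about 33%, whatever the individual counts, which is why standardising by row total is needed before the proportionality shows up as perfect similarity.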
[Figure: schematic with panels labelled “Samples” and “Clustering of samples”]
¶ The PRIMER Similarity routine can generate Euclidean distances (normalised or not, see page 4-6), on either biotic or environmental input matrices.
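For the two-species example used to introduce Euclidean distance, with samples at the points (2, 3) and (5, 1), both distance measures are easily computed. This is a minimal sketch with illustrative function names; the Manhattan form assumes the usual reading of "summing the distances along each species axis", i.e. a sum of absolute differences.

```python
import math

def euclidean(yj, yk):
    """Equation (2.13): root of the summed squared differences over species."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(yj, yk)))

def manhattan(yj, yk):
    """Sum of the absolute differences along each species axis."""
    return sum(abs(a - b) for a, b in zip(yj, yk))

# (Sp. 1, Sp. 2) counts for samples j and k from the worked example.
sample_j = [2, 3]
sample_k = [5, 1]

d_euclidean = euclidean(sample_j, sample_k)   # sqrt((2-5)^2 + (3-1)^2)
d_manhattan = manhattan(sample_j, sample_k)   # |2-5| + |3-1|
```

Both functions accept vectors of any common length, so the same code covers the extension to three or more species described in the text.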
Chapter 3
page 3-1
Cluster analysis (or classification) aims to find “natural groupings” of samples such that samples within a group are more similar to each other, generally, than samples in different groups. Cluster analysis is used in the present context in the following ways.

a) Different sites (or different times at the same site) can be seen to have differing community compositions by noting that replicate samples within a site form a cluster that is distinct from replicates within other sites. This can be an important hurdle to overcome in any analysis; if replicates for a site are

2) Optimising techniques. A single set of mutually exclusive groups (usually a pre-specified number) is formed by optimising some clustering criterion, for example minimising a within-cluster distance measure in the species space.

3) Mode-seeking methods. These are based on considerations of density of samples in the neighbourhood of other samples, again in the species space.

4) Clumping techniques. The term “clumping” is reserved for methods in which samples can be placed in more than one cluster.

5) Miscellaneous techniques.

Cormack (1971) also warned against the indiscriminate use of cluster analysis: “availability of ... classification techniques has led to the waste of more valuable scientific time than any other ‘statistical’ innovation”. The
     Year:        64    68    71    73
     Sample:       1     2     3     4
     Species
     Echinoca.    1.7    0     0     0
     Myrioche.    2.1    0     0    1.3
     Labidopl.    1.7   2.5    0    1.8
     Amaeana       0    1.9   3.5   1.7
     Capitella     0    3.4   4.3   1.2
     Mytilus       0     0     0     0

     Sample     1      2      3      4
     1          -
     2        25.6     -
     3         0.0   67.9     -
     4        52.2   68.1   42.0     -

  →  Sample     1     2&4     3
     1          -
     2&4      38.9     -
     3         0.0   55.0     -

  →  Sample     1    2&3&4
     1          -
     2&3&4    25.9     -
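The successive fusions in the table above can be reproduced with a short sketch of group-average linking, starting from the published sample similarities. This is an illustration only, with invented function names, and it is not how any particular package implements the method: at each step the two groups with the highest average between-group similarity are merged, the average being taken over all pairs of original samples.

```python
from itertools import combinations

# Bray-Curtis similarities between samples 1-4, from the table above.
sim = {(1, 2): 25.6, (1, 3): 0.0, (1, 4): 52.2,
       (2, 3): 67.9, (2, 4): 68.1, (3, 4): 42.0}

def pair_similarity(a, b):
    """Look up the similarity between two individual samples."""
    return sim[(min(a, b), max(a, b))]

def group_average(g1, g2):
    """Mean similarity over all between-group pairs of samples."""
    values = [pair_similarity(a, b) for a in g1 for b in g2]
    return sum(values) / len(values)

def cluster(samples):
    """Agglomerate until one group remains; return the list of fusions."""
    groups = [(s,) for s in samples]
    fusions = []
    while len(groups) > 1:
        g1, g2 = max(combinations(groups, 2),
                     key=lambda pair: group_average(*pair))
        level = group_average(g1, g2)
        fusions.append((g1, g2, level))
        merged = tuple(sorted(g1 + g2))
        groups = [g for g in groups if g not in (g1, g2)] + [merged]
    return fusions

fusions = cluster([1, 2, 3, 4])
for g1, g2, level in fusions:
    print(g1, "joins", g2, "at similarity", level)
```

Samples 2 and 4 fuse first, sample 3 then joins them, and sample 1 joins last, matching the sequence of reduced matrices shown above.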
[Map of the numbered sampling sites; place labels: Cardiff, ENGLAND]
EXAMPLE: Bristol Channel Zooplankton

Collins and Williams (1982) perform hierarchical cluster analyses of zooplankton samples, collected by double oblique net hauls at 57 sites in the Bristol Channel UK, for three different seasons in 1974. This was not a pollution study but a baseline survey carried out by the Plymouth laboratory, as part of a major programme to understand and model the ecosystem of the estuary. Fig. 3.2 is a map of the sample locations, sites 1-58 (site 30 not sampled).

Fig. 3.3 shows the results of a hierarchical clustering using group-average linking on data sampled during April 1974. The raw data were expressed as numbers per cubic metre for each of 24 holozooplankton species, and Bray-Curtis similarities calculated on √√-transformed abundances. From the resulting dendrogram, Collins and Williams select the four groups determined at a 55% similarity level and characterise these as true estuarine (sites 1-8, 10, 12), estuarine and marine (9, 11, 13-27, 29), euryhaline marine (28, 31, 33-35, 42-44, 47-50, 53-55) and stenohaline marine (32, 36-41, 45, 46, 51, 52, 56-58). A corresponding clustering of species and a re-ordering of the rows and columns of the original data matrix allows the identification of a number of species groups characterising these main site clusters, as is seen later (Chapter 7).

The dendrogram provides a sequence of fairly convincing groups; once each of the four main groups has formed it remains separate from other groups over a relatively large drop in similarity. Even so, a cluster analysis gives an incomplete and disjointed picture of the sample pattern. Remembering the analogy of the “mobile”, it is not clear from the dendrogram alone whether there is any natural sequence of community change across the four main clusters (implicit in the designations true estuarine, estuarine and marine, euryhaline marine, stenohaline marine). For example, the stenohaline marine group could just as correctly have been rotated to lie between the estuarine and marine and euryhaline marine groups. In fact, there is a strong (and more-or-less continuous) gradient of community change across the region, associated with the changing salinity levels. This is best seen in an ordination of the 57 samples on which are superimposed the salinity levels at each site; this example is therefore returned to in Chapter 11.

RECOMMENDATIONS

1) Hierarchical clustering with group-average linking, based on sample similarities or dissimilarities such as Bray-Curtis, has proved a useful technique in a number of ecological studies of the last three decades. It is appropriate for delineating groups of sites with
Fig. 3.3. Bristol Channel Zooplankton {B}. Dendrogram for hierarchical clustering of the 57 sites, using group-average linking of Bray-Curtis similarities calculated on √√-transformed abundance data.

distinct community structure (this is not to imply that groups have no species in common, of course, but that different characteristic patterns of abundance are found consistently in different groups).

2) Clustering is less useful (and could sometimes be misleading) where there is a steady gradation in community structure across sites, perhaps in response to strong environmental forcing (e.g. large range of salinity, sediment grain size, depth of water column, etc.). Ordination is preferable in these situations.

3) Even for samples which are strongly grouped, cluster analysis is often best used in conjunction with ordination. Superimposition of the clusters (at various levels of similarity) on an ordination plot will allow any relationship between the groups to be more informatively displayed, and it will be seen later (Chapter 5) that agreement between the two representations strengthens belief in the adequacy of both.
Chapter 4
page 4-1
An ordination is a map of the samples, usually in two or three dimensions, in which the placement of samples, rather than representing their simple geographical location, reflects the similarity of their biological communities. To be more precise, distances between samples on the ordination attempt to match the corresponding dissimilarities in community structure: nearby points have very similar communities, samples which are far apart have few species in common or the same species at very different levels of abundance (or biomass). The word “attempt” is important here since there is no uniquely defined way in which this can be achieved. (Indeed, when a large number of species fluctuate in abundance in response to a wide variety of environmental variables, each species being affected in a different way, the community structure is essentially high-dimensional and it may be impossible to obtain a useful two or three-dimensional representation).

So, as with cluster analysis, several methods have been proposed, each using different forms of the original data and varying in their technique for approximating high-dimensional information in low-dimensional plots. They include:

a) Principal Components Analysis, PCA (see, for example, Chatfield and Collins, 1980);

b) Principal Co-ordinates Analysis, PCoA (Gower, 1966);

c) Correspondence Analysis and Detrended Correspondence Analysis, DECORANA (Hill and Gauch, 1980);

d) Multi-Dimensional Scaling, MDS; in particular non-metric MDS (see, for example, Kruskal and Wish, 1978).

A comprehensive survey of ordination methods is outside the scope of this volume. As with clustering methods, detailed explanation is given only of the techniques required for the analysis strategy adopted throughout the manual. This is not to deny the validity of other methods but simply to affirm the importance of applying, with understanding, one or two techniques of proven utility. The two ordination methods selected

a) PCA is the longest-established method, though the relative inflexibility of its definition limits its practical usefulness more to multivariate analysis of environmental data rather than species abundances or biomass; nonetheless it is still widely encountered and is of fundamental importance.

b) Non-metric MDS is a more recent development, whose complex algorithm could only have been contemplated in an era of advanced computational power; however, its rationale can be very simply described and understood, and many people would argue that the need to make few (if any) assumptions about the data make it the most widely applicable and effective method available.

PRINCIPAL COMPONENTS ANALYSIS

The starting point for PCA is the original data matrix rather than a derived similarity matrix (though there is an implicit dissimilarity matrix underlying PCA, that of Euclidean distance). The data array is thought of as defining the positions of samples in relation to axes representing the full set of species, one axis for each species. This is the very important concept introduced in Chapter 2, following equation (2.13). Typically, there are many species so the samples are points in a very high-dimensional space.

A simple 2-dimensional example

It helps to visualise the process by again considering an (artificial) example in which there are only two species (and nine samples).

     Sample               1    2    3    4    5    6    7    8    9
     Abundance   Sp.1     6    0    5    7   11   10   15   18   14
                 Sp.2     2    0    8    6    6   10    8   14   14

The nine samples are therefore points in two dimensions, and labelling these points with the sample number gives the following plot.
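[Figure: the nine samples plotted against Sp.1 and Sp.2 counts, labelled by sample number]

[Figure: 1-dimensional ordination of the samples along the PC1 axis, labelled by sample number]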
This is an ordination already, of 2-dimensional data on a 2-dimensional map, and it summarises pictorially all the relationships between the samples, without needing to discard any information at all. However, suppose for the sake of example that a 1-dimensional ordination is required, in which the original data is reduced to a genuine ordering of samples along a line. How do we best place the samples in order? One possibility (though a rather poor one!) is simply to ignore altogether the counts for one of the species, say Species 2. The Species 1 axis then automatically gives the 1-dimensional ordination (Sp.1 counts are again labelled by sample number):

The second principal component axis (PC2) is defined as the axis perpendicular to PC1, and a full principal component analysis then consists simply of a rotation of the original 2-dimensional plot:
[Figure: the nine sample points replotted on the rotated PC1 and PC2 axes]
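The rotation can be sketched numerically for the nine-sample, two-species example introduced earlier. The code below is only an illustration under stated assumptions: PC1 is taken as the direction of maximum variance of the sample points (one of the two equivalent definitions given in this chapter), the points are centred on their means before rotating, and the closed-form angle for a 2 x 2 covariance matrix is used instead of a general eigenvalue routine. All names are invented for the example.

```python
import math

# Counts for the two species across the nine samples of the artificial example.
sp1 = [6, 0, 5, 7, 11, 10, 15, 18, 14]
sp2 = [2, 0, 8, 6, 6, 10, 8, 14, 14]

def mean(values):
    return sum(values) / len(values)

def covariance(u, v):
    """Sample covariance (divisor n - 1); covariance(u, u) is the variance."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

var1 = covariance(sp1, sp1)
var2 = covariance(sp2, sp2)
cov12 = covariance(sp1, sp2)

# Angle between the Sp.1 axis and the maximum-variance direction (PC1).
theta = 0.5 * math.atan2(2 * cov12, var1 - var2)
c, s = math.cos(theta), math.sin(theta)

# Scores of each sample on the rotated axes, measured from the centroid.
m1, m2 = mean(sp1), mean(sp2)
pc1 = [c * (a - m1) + s * (b - m2) for a, b in zip(sp1, sp2)]
pc2 = [-s * (a - m1) + c * (b - m2) for a, b in zip(sp1, sp2)]

# Proportion of the total variance lying along PC1.
share_pc1 = covariance(pc1, pc1) / (var1 + var2)
```

Because the transformation is a pure rotation, the total variance of the two PC scores equals the total variance of the two species, and the two sets of scores are uncorrelated.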
perpendicular distances of the points from the line.¶ The second approach comes from noting in the above example that the biggest differences between samples take place along the PC1 axis, with relatively small changes in the PC2 direction. The PC1 axis is therefore defined as that direction in which the variance of sample points projected perpendicularly onto the axis is maximised. In fact, these two separate definitions of the PC1 axis turn out to be totally equivalent and one can use whichever concept is easier to visualise.

Extensions to 3-dimensional data

Suppose that the simple example above is extended to the following matrix of counts for three species.

     Sample   2   3   4   5

An equivalent way of visualising this is again in terms of “best fit”: PC1 is the “best fitting” line to the sample points and, together, the PC1 and PC2 axes define a plane (stippled in the above diagram) which is the “best fitting” plane.

Algebraic definition

The above geometric formulation can be expressed algebraically. The three new variables (PCs) are just linear combinations of the old variables (species), such that PC1, PC2 and PC3 are uncorrelated. In the above example:

     PC1 =  0.62 × Sp.1 + 0.52 × Sp.2 ...
     PC2 = -0.73 × Sp.1 + 0.65 × Sp.2 ...          (4.1)
     PC3 =  0.28 × Sp.1 + 0.55 × Sp.2 ...
f v a r ( P C i ) =TjVariSp.i) (4.2)
has a useful interpretation as the % of the original total variance explained by the i'th PC. For the simple 3-dimensional example above, PC1 explains 93%, PC2 explains 6% and PC3 only 1% of the variance in the original samples.

Ordination plane

This brings us back finally to the reason for rotating the original three species axes to three new principal component axes. The first two PCs represent a plane of “best fit”, encompassing the maximum amount of variation in the sample points. The % variance explained by PC3 may be small and we can dispense with this third axis, projecting all points perpendicularly onto the (PC1, PC2) plane to give the 2-dimensional ordination plane that we have been seeking. For the above example this is:

[Figure: sample points plotted on the (PC1, PC2) plane.]

of the relationship between the samples. A 3-dimensional sample ordination, using the first three PC axes, may give a fuller picture or it may be necessary to invoke PC4, PC5 etc. before a reasonable percentage of the total variation is encompassed. Guidelines for an acceptable level of “% variance explained” are difficult to set, since they depend on the objectives of the study, the number of species and samples etc., but an empirical rule-of-thumb might be that a picture which accounts for as much as 70-75% of the original variation is likely to describe the overall structure rather well.

The geometric concepts of fitting planes and projecting points in (say) 30-dimensional space are not ones that most people are comfortable with (!) so it is important to realise that, algebraically, the central ideas are no more complex than in three dimensions. Equations like (4.1) simply extend to p (=30) principal components, each a linear function of the p species counts. The “perpendicularity” (orthogonality) of the principal component axes is reflected in the zero values for all sums of cross-products of coefficients, e.g. for equation (4.1):
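The two properties just described (the % variance explained by each PC, and the zero sum of cross-products of coefficients) can be checked numerically. The following is a minimal sketch for the simpler two-species case, where the rotation has a closed form; the counts are invented for illustration and are not taken from the manual, and the helper names `cov` and `scores` are hypothetical.

```python
import math

# Hypothetical counts for two species in five samples (not from the manual).
sp1 = [6.0, 0.0, 5.0, 7.0, 11.0]
sp2 = [2.0, 0.0, 8.0, 6.0, 6.0]

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance with divisor n - 1 (variance when a is b)."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

s11, s22, s12 = cov(sp1, sp1), cov(sp2, sp2), cov(sp1, sp2)

# Rotation angle that puts the first new axis along the direction
# of maximum variance of the sample points.
theta = 0.5 * math.atan2(2.0 * s12, s11 - s22)
pc1_coef = (math.cos(theta), math.sin(theta))
pc2_coef = (-math.sin(theta), math.cos(theta))

def scores(coef):
    """PC score: linear combination of the mean-centred species counts."""
    m1, m2 = mean(sp1), mean(sp2)
    return [coef[0] * (x - m1) + coef[1] * (y - m2) for x, y in zip(sp1, sp2)]

pc1, pc2 = scores(pc1_coef), scores(pc2_coef)
var_pc1, var_pc2 = cov(pc1, pc1), cov(pc2, pc2)
total = s11 + s22  # total variance over the original species axes

pct = [100.0 * var_pc1 / total, 100.0 * var_pc2 / total]
cross_product = pc1_coef[0] * pc2_coef[0] + pc1_coef[1] * pc2_coef[1]

print("PC1 coefficients:", pc1_coef)
print("PC2 coefficients:", pc2_coef)
print("% variance explained:", pct)
print("sum of cross-products of coefficients:", cross_product)
```

With more than two species the same quantities would normally come from an eigen-decomposition of the covariance matrix; the closed-form angle is used here only to keep the sketch self-contained.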
EXAMPLE: Loch Linnhe macrofauna

The various options available, and the limitations imposed when constructing an ordination using PCA, are best appreciated in the context of a real data set. A 2-dimensional PCA of the full Loch Linnhe macrofaunal abundance data {L} is shown in Fig. 4.1. The original matrix contained a total of 115 species for the 11 samples, one for each year of the period 1963-1973. Pulp-mill effluent was first discharged to the loch in 1966, with an increased discharge in 1969/70 and a subsequent decrease in 1972/73.

Exclude less-common species

The retention of rarer species in a PCA ordination will have a strongly distorting effect, even supposing that the matrix operations to construct the ordination are possible. For the Loch Linnhe data there are 11 samples in 115-dimensional species space! An initial and drastic reduction in the number of species is necessary for the PCA algorithm to work. In fact, many of the species are represented only by a single individual in a single year and their omission will not be a serious loss to interpretation, but the necessity of making an (essentially arbitrary) decision about the species to exclude is one of the problems with applying PCA to biological community data. By contrast, the clustering methods of the last chapter were applied to a similarity matrix which could be constructed from all species, the rarer ones either being emphasised, as in reduction to presence/absence, or down-weighted automatically (though not ignored totally) by the choice of similarity coefficient and transformation. An ordination method based on this similarity matrix (for example, the MDS method of Chapter 5) clearly scores over PCA in this respect. In fact, Fig. 4.1 is based on a data matrix of only 29 species, those making up more than 3% of the total abundance in at least one of the samples. (The rationale for this type of selection procedure was discussed in the section on species similarities in Chapter 2). Calculation of the principal components is now possible though, even so, the software package needs to handle its computations carefully. A total of 11 sample points will always fit perfectly into 10 dimensions (think of the lower-dimensional analogy again: 3 points in 3-dimensional space will always lie on a 2-dimensional plane). Thus, only 10 (at most) PC axes can be constructed, or to put it another way, all the sample variance can be explained by the first 10 PCs. In fact, the first two PCs in Fig. 4.1 explain 57% of the total variability so the 2-dimensional ordination does not give a fully satisfactory picture of the changing community pattern over the years. If this example were being taken further, it would be advisable to look also at the third PC (at least), perhaps with some form of 3-dimensional perspective plot or by the three separate 2-dimensional plots of (PC1, PC2), (PC1, PC3) and (PC2, PC3). Nonetheless, one main feature is clear from Fig. 4.1: the relatively large change in community composition between 1970 and 1971, and the reversion in 1973 to a community which is more like the earlier years.

Fig. 4.1. Loch Linnhe macrofauna {L}. 2-dimensional PCA ordination of sample abundances (√√-transformed) from the 11 years 1963-1973. PC1 (x-axis) and PC2 (y-axis) together account for 57% of the total sample variability.

Transformation of abundance/biomass

In much the same way as was seen for the calculation of similarity coefficients in Chapter 2, it may be necessary to make an initial transformation of the abundance or biomass values to avoid over-domination of the resulting analysis by the very common species. For the Loch Linnhe data, Capitella numbers in a yearly sample range from 0 to over 4,000 individuals, whereas the bulk of the other species have counts in single or double figures. For untransformed data (and using a covariance-based analysis, as discussed below), the Capitella axis will clearly contain a substantial part of the overall variation of samples in the species space, so that the direction of the PC1 axis will tend to be dictated by that species alone. A more balanced picture will emerge after transformation: Fig. 4.1 is based on √√-transformed abundances.

Scale and location changes

The data matrix can also be normalised (after any transformation has taken place). For each species abundance, subtract the mean count and divide by the
Chapter 4
page 4-6
standard deviation over all samples for that species. This makes the variance of samples along all species axes the same (= 1) so all species are of potentially equal importance in determining the principal components. This normalised analysis is referred to as correlation-based PCA rather than the covariance-based PCA obtained when the data is not normalised (the terminology comes from whether the algebraic extraction of eigenvalues and eigenvectors takes place on the correlation or covariance matrix between species).

When does one use normalisation and when transformation (or both)? In fact, the arguments are somewhat analogous to those seen in Chapter 2 for the computation of similarities. There, techniques which tended to weight all species equally (for example, the calculation of Canberra coefficients) were rejected in favour of methods that maintained a greater (though not overwhelmingly greater) importance for common species than rarer ones. This was achieved through initial √, √√ or log transformation; the equivalent option here would be to take the same transformation and apply covariance-based PCA. It is true that, here, the set of species has first been drastically reduced so that all the rarer species are eliminated; nonetheless there seems no compelling reason why the remaining species should be given equal weight, as they would in a correlation-based PCA.

Note that even if normalisation is used, it still makes sense to perform an initial transformation. This has the effect of reducing the inevitable right-skewness of the spread of sample counts along a species axis (i.e. abundances tend to bunch at smaller values with a long “tail” of occasional large counts). Computation of variances and the resulting normalisation are really designed to cope with data which are, as the name implies, approximately normally distributed; transformation makes this more likely.

PCA OF ENVIRONMENTAL DATA

The conclusion above is that covariance-based PCA would probably be preferred to correlation-based PCA for species abundance matrices, though the summary at the end of this chapter makes it clear that neither is a very satisfactory ordination method for such data. There is one important situation, however, where PCA is a more useful tool and where normalisation is usually essential. This is in the multivariate analysis of environmental rather than species data. In conventional statistical notation, one has a matrix of p columns (variables) by n rows (samples)¶, with the variables perhaps being a mixture of physical parameters (grain size, salinity, depth of the water column) and other environmental or chemical measurements (nutrient levels, heavy metal, hydrocarbon or PCB concentrations etc). Patterns in the environmental data across samples can be displayed in an analogous way to species data, by a multivariate ordination, and techniques for linking the biological and environmental summaries are discussed in Chapter 11.

PCA is more appropriate to environmental variables because of the form of the data (there are no large blocks of zero counts needing to be treated in a special way: it is no longer necessary to select a dissimilarity coefficient that ignores “joint absences” and some form of Euclidean distance measure makes more sense for environmental data). However, a crucial difference between species and environmental data is that the latter will usually have a complete mix of measurement scales (salinity in ‰, grain size in φ units, depth in m, metal concentrations in µg/g, PCBs in ng/g etc). In a multi-dimensional visualisation of the environmental data matrix, the samples are points referred to environmental axes rather than species axes, but what does it mean now to talk about (Euclidean) distance between two sample points in the environmental variable space? If the units on each axis differ, and have no natural connection with each other, then point A can be made to appear closer to point B than point C, or closer to point C than point B, simply by a change of scale on one of the axes (e.g. measuring PCBs in µg/g not ng/g, or depth in fathoms rather than m). Obviously it would be entirely wrong for the PCA ordination to vary with such arbitrary scale changes. There is one natural solution to this: perform a correlation-based PCA, i.e. normalise all axes (after transformations, if any) so that they have comparable, dimensionless scales.

The problem does not generally arise for species data, of course, because though a scale change on the axes may be contemplated (e.g. changing counts from numbers of individuals per core to numbers per m² of sediment surface), the same scale change is made on each axis and the PCA ordination will be unaffected.

The overall guideline would therefore be to use

¶ Note that this is the opposite convention to that used in ecology for abundance matrices, where the rows (species) are the variables (the reason for the different ecological convention is clear: binomial species names are much more neatly displayed as row than column labels!). Input to the PRIMER PCA routine can use either convention. It is unnecessary to transpose the matrix before entry; one simply needs to be careful to specify whether rows or columns are to be treated as the variables.
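The argument about arbitrary measurement scales can be illustrated with a small numerical sketch. The three samples and their depth and PCB values below are invented for illustration (they are not from any data set in this manual), and the function names are hypothetical. The sketch assumes normalisation uses the sample standard deviation; it shows the nearest neighbour of sample A changing with the PCB unit on raw axes, while normalised distances are unaffected.

```python
import math
import statistics

# Hypothetical environmental data: depth (m) and PCB concentration (ng/g)
# for three samples A, B and C. Values are invented for illustration.
depth_m = {"A": 10.0, "B": 12.0, "C": 60.0}
pcb_ng = {"A": 100.0, "B": 900.0, "C": 150.0}

def distance(x, y, s, t):
    """Euclidean distance between samples s and t on two variable axes."""
    return math.hypot(x[s] - x[t], y[s] - y[t])

def nearest_to_a(x, y):
    """Which of B or C lies closer to A in the (x, y) variable space."""
    return "B" if distance(x, y, "A", "B") < distance(x, y, "A", "C") else "C"

def normalise(var):
    """Subtract the mean and divide by the standard deviation over samples."""
    values = list(var.values())
    m = statistics.mean(values)
    sd = statistics.stdev(values)
    return {k: (v - m) / sd for k, v in var.items()}

# The same PCB measurements expressed in ug/g instead of ng/g.
pcb_ug = {k: v / 1000.0 for k, v in pcb_ng.items()}

raw_ng = nearest_to_a(depth_m, pcb_ng)
raw_ug = nearest_to_a(depth_m, pcb_ug)
norm_ng = distance(normalise(depth_m), normalise(pcb_ng), "A", "B")
norm_ug = distance(normalise(depth_m), normalise(pcb_ug), "A", "B")

print("nearest to A on raw axes, PCB in ng/g:", raw_ng)
print("nearest to A on raw axes, PCB in ug/g:", raw_ug)
print("normalised A-B distance (ng/g, ug/g):", norm_ng, norm_ug)
```

Because normalised axes are dimensionless, any ordination built on them (such as a correlation-based PCA) cannot depend on the units in which a variable happened to be recorded.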
Correspondence analyses are a class of ordination methods featuring strongly in French data-analysis literature (for a review in English see Greenacre, 1984). Key papers in ecology are Hill (1973a) and Hill and Gauch (1980), who introduced detrended correspondence analysis (DECORANA). The methods start from the data matrix, rather than a set of dissimilarity coefficients, so are rather inflexible in their definition of sample dissimilarity; in effect, multinomial assumptions generate an implicit dissimilarity measure of “chi-squared” distance. Basic correspondence analysis (CA) has its genesis in a particular model of unimodal species response to underlying (but unmeasured) environmental gradients; an account is outside the scope of this manual but a comprehensive exposition (by C.F.J ter Braak) of CA and related techniques can be found in Jongman et al (1987).¶ The popular DECORANA version of CA has a primary motivation of straightening out an “arch effect” in a

¶ A convenient way of carrying out correspondence analyses, and related canonical methods, is to use ter Braak's excellent CANOCO package.

The ordination technique which is adopted in this manual’s strategy, non-metric MDS, is itself a complex numerical algorithm but it can (and will) be argued that it is conceptually simple. It makes few (if any) model assumptions about the form of the data or the inter-relationship of the samples, and the link between the final picture and the user’s original data is relatively transparent and easy to explain. It addresses both the major criticisms of PCA made earlier: it has great flexibility both in the definition and conversion of dissimilarity to distance and its rationale is the preservation of these relationships in the low-dimensional ordination space.

NON-METRIC MULTIDIMENSIONAL SCALING

The method of non-metric MDS was introduced by Shepard (1962) and Kruskal (1964), for application to problems in psychology; a useful introductory text is Kruskal and Wish (1978), though again the applications described are not ecological. Throughout this manual
Chapter 5
page 5-2
the term MDS refers to Kruskal’s non-metric procedure (metric MDS is possible, and is akin to PCoA, but will not be discussed in any detail).

The starting point is the similarity or dissimilarity matrix among samples (Chapter 2). This can be whatever similarity matrix is biologically relevant to the questions being asked of the data. Through choice of coefficient and possible transformation or standardisation, one can choose whether to ignore joint absences, to emphasise similarity in common or rare species, to compare only % composition or allow sample totals to play a part, etc. In fact, the flexibility of MDS goes beyond this. It recognises the essential arbitrariness of absolute similarity values; Chapter 2 showed that the range of values taken could alter dramatically with transformation (often, the more severe the transformation, the higher and more compressed the similarity values become) and there is no absolute interpretation of a statement like “the similarity of sample 1 to sample 2 is 1.5 times that of sample 1 to sample 3”. The natural interpretation is in terms of the relative values of similarity to each other, e.g. simply that “sample 1 is more similar to sample 2 than it is to sample 3”. This is an intuitively appealing and very generally applicable base from which to build a graphical representation of the sample patterns and, in effect, the ranks of the similarities are the only information used by a successful non-metric MDS ordination.

The purpose of MDS can thus be simply stated: it constructs a “map” or configuration of the samples, in a specified number of dimensions, which attempts to satisfy all the conditions imposed by the rank (dis)similarity matrix, e.g. if sample 1 has higher similarity to sample 2 than it does to sample 3 then sample 1 will be placed closer on the map to sample 2 than it is to sample 3.

EXAMPLE: Loch Linnhe macrofauna

This is illustrated in Table 5.1 for the subset of the Loch Linnhe macrofauna data used to demonstrate hierarchical clustering (Table 3.2). Similarities between √√-transformed counts of the four year samples are given by Bray-Curtis similarity coefficients, and the corresponding rank similarities are also shown. (The highest similarity has the lowest rank, 1, and the lowest similarity the highest rank, n(n-1)/2.) The MDS configuration is constructed to preserve the similarity ranking as Euclidean distances in the 2-dimensional plot: samples 2 and 4 are closest, 2 and 3 next closest, then 1 and 4, 3 and 4, 1 and 2, and finally, 1 and 3 are furthest apart. The resulting figure is a more informative summary than the corresponding cluster analysis in Chapter 3, showing, as it does, a gradation of change from clean (1) to progressively more impacted years (2 and 3) then a reversal of the trend, though not complete recovery to the initial position (4).

Though the mechanism for constructing such MDS plots has not yet been described, two general features of MDS can already be noted.

1) MDS plots can be arbitrarily scaled, located, rotated or inverted. Clearly, rank order information about which samples are most or least similar can say nothing about which direction in the MDS plot is “up” or “down”, or the absolute “distance apart” of two samples: what can be interpreted is relative distances apart, of course.

2) It is not difficult in the above example to see that four points could be placed in two dimensions in such a way as to satisfy the similarity ranking exactly.¶ For more realistic data sets, though, this will not usually be possible and there will be some distortion or stress between the similarity rankings and the corresponding distance rankings in the ordination plot (even in a high-dimensional configuration). This motivates the principle of the MDS algorithm: to choose a configuration of points which minimises this degree of stress, appropriately measured.

¶ In fact, there are rather too many ways of satisfying it and the algorithm described in this chapter will find slightly different solutions each time it is run, all of them equally correct. However, this is not a problem in genuine applications with (say) six or more points. The number of similarities increases roughly with the square of the number of samples and a position is reached very quickly in which not all rank orders can be preserved and this particular indeterminacy disappears.

EXAMPLE: Exe estuary nematodes

The construction of an MDS plot is illustrated with data collected by Warwick (1971) and subsequently analysed in this way by Field et al (1982). A total of 19 stations from different sites and tide-levels in the Exe estuary, UK, were sampled bi-monthly at low spring tides between October 1966 and September 1967.
Table 5.1. Loch Linnhe macrofauna {L} subset. Abundance array after √√-transform, the Bray-Curtis similarities (as in Table 3.2), the rank similarity matrix and the resulting 2-dimensional MDS ordination.

Abundance array (√√-transformed)
Year:          64     68     71     73
Sample:         1      2      3      4
Species
Echinoca.     1.7      0      0      0
Myrioche.     2.1      0      0    1.3
Labidopl.     1.7    2.5      0    1.8
Amaeana         0    1.9    3.5    1.7
Capitella       0    3.4    4.3    1.2
Mytilus         0      0      0      0

Bray-Curtis similarities
Sample          1      2      3      4
1               -
2            25.6      -
3             0.0   67.9      -
4            52.2   68.1   42.0      -

Rank similarities
Sample          1      2      3      4
1               -
2               5      -
3               6      2      -
4               3      1      4      -

[MDS plot of samples 1 to 4 not reproduced.]
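The similarity and rank matrices of Table 5.1 can be reproduced from the transformed abundance array. The following is a minimal sketch using only the values printed in the table; the function and variable names are hypothetical, and the Bray-Curtis coefficient is written in its usual form, 100 × (1 − Σ|y_ij − y_ik| / Σ(y_ij + y_ik)), summing over species.

```python
from itertools import combinations

# Fourth-root transformed abundances from Table 5.1
# (rows = species, columns = samples 1 to 4).
abundance = {
    "Echinoca.": [1.7, 0.0, 0.0, 0.0],
    "Myrioche.": [2.1, 0.0, 0.0, 1.3],
    "Labidopl.": [1.7, 2.5, 0.0, 1.8],
    "Amaeana":   [0.0, 1.9, 3.5, 1.7],
    "Capitella": [0.0, 3.4, 4.3, 1.2],
    "Mytilus":   [0.0, 0.0, 0.0, 0.0],
}

def bray_curtis_similarity(j, k):
    """Bray-Curtis similarity (%) between sample columns j and k."""
    num = sum(abs(row[j] - row[k]) for row in abundance.values())
    den = sum(row[j] + row[k] for row in abundance.values())
    return 100.0 * (1.0 - num / den)

# Similarities for every pair of samples, keyed by 1-based sample numbers.
similarity = {
    (j + 1, k + 1): bray_curtis_similarity(j, k)
    for j, k in combinations(range(4), 2)
}

# Rank 1 goes to the highest similarity, rank n(n-1)/2 to the lowest.
ordered = sorted(similarity, key=similarity.get, reverse=True)
rank = {pair: r for r, pair in enumerate(ordered, start=1)}

for pair in ordered:
    print(pair, round(similarity[pair], 1), rank[pair])
```

Only the `rank` dictionary is needed as input to a non-metric MDS; the absolute similarity values play no further part.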
Three replicate sediment cores were taken for meiofaunal analysis on each occasion, and nematodes identified and counted. This analysis considers only the mean nematode abundances across replicates and season (no seasonal differences were evident in a more detailed analysis), so the data matrix consists of 182 species and 19 samples.

Instead they appear to relate to variables such as sediment type and organic content, and these links are discussed in Chapter 11. For now the question is: what are the stages in the construction of Fig. 5.1?

MDS ALGORITHM

4) Measure goodness-of-fit of the regression by calculating the stress value

$$\text{Stress} = \sqrt{\frac{\sum_j \sum_k (d_{jk} - \hat{d}_{jk})^2}{\sum_j \sum_k d_{jk}^2}} \qquad (5.1)$$
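The stress calculation can be sketched in a few lines. This is an illustration under stated assumptions, not the PRIMER implementation: it assumes (5.1) has the usual Kruskal form (square root of the summed squared differences between the plot distances and their fitted values, scaled by the summed squared distances), that the fitted values come from a monotonic (non-decreasing) regression of distance on dissimilarity, and that there are no tied dissimilarities. The function names and both sets of coordinates are hypothetical; the rank dissimilarities are those implied by Table 5.1.

```python
import math

def monotone_fit(values):
    """Least-squares non-decreasing fit (pool-adjacent-violators algorithm)."""
    blocks = []  # each block holds [mean, size]
    for v in values:
        blocks.append([float(v), 1])
        # Merge backwards while the fitted sequence would decrease.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, n2 = blocks.pop()
            m1, n1 = blocks.pop()
            blocks.append([(m1 * n1 + m2 * n2) / (n1 + n2), n1 + n2])
    fitted = []
    for m, size in blocks:
        fitted.extend([m] * size)
    return fitted

def stress(dissim, coords):
    """Stress of a configuration; dissim maps sample pairs to dissimilarities."""
    pairs = sorted(dissim, key=dissim.get)  # increasing dissimilarity
    d = [math.dist(coords[j], coords[k]) for j, k in pairs]
    d_hat = monotone_fit(d)
    num = sum((a - b) ** 2 for a, b in zip(d, d_hat))
    den = sum(a ** 2 for a in d)
    return math.sqrt(num / den)

# Rank dissimilarities implied by Table 5.1 (rank 1 = most similar pair).
rank_dissim = {(2, 4): 1, (2, 3): 2, (1, 4): 3, (3, 4): 4, (1, 2): 5, (1, 3): 6}

# Two hypothetical 2-d configurations (coordinates invented for illustration).
good = {1: (2.5, -1.0), 2: (0.0, 0.0), 3: (-1.2, 0.5), 4: (1.0, 0.0)}
poor = {1: (0.0, 0.0), 2: (2.5, -1.0), 3: (-1.2, 0.5), 4: (1.0, 0.0)}

print("stress, rank order preserved:", stress(rank_dissim, good))
print("stress, samples 1 and 2 swapped:", stress(rank_dissim, poor))
```

In the first configuration the six inter-point distances increase in exactly the order of the rank dissimilarities, so the fitted values equal the distances and the stress is zero; swapping two samples breaks the ordering and the stress becomes positive.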
ordinate positions in the configuration), the method gradually finds its way down to a minimum of the stress function. This is most easily envisaged in three dimensions, with just a 2-dimensional parameter space (the x, y plane) and the vertical axis (z) denoting the stress at each (x, y) point. In reality the stress surface is a function of more parameters than this of course, but we have seen before how useful it can be to visualise high-dimensional algebraic operations in terms of 3-dimensional geometry. An appropriate analogy is to imagine a rambler walking across a range of hills in a thick fog(!), attempting to find the lowest point within an encircling range of high peaks. A good strategy is always to walk in the direction in which the ground slopes away most steeply (the method of steepest descent, in fact) but there is no guarantee that this strategy will necessarily find the lowest point overall, i.e. the global minimum of the stress function. The rambler may reach a low point from which the ground rises in all directions (and thus the steepest descent algorithm converges) but there may be an even lower point on the other side of an adjacent hill. He is then trapped in a local minimum of the stress function. Whether he finds the global or a local minimum depends very much on where he starts the walk, i.e. the starting configuration of points in the ordination plot.

Such local minima do occur in many MDS analyses, usually corresponding to configurations of sample points which are only slightly different from one another. Often this may be because there are one or two points which bear little relation to any of the other samples and there are several choices as to where they may be placed, or perhaps they have a more complex relationship with other samples and may be difficult to fit into (say) a 2-dimensional picture.

There is no guaranteed method of ensuring that a global minimum of the stress function has been reached; the practical solution is therefore to repeat the MDS analysis several times starting with different random positions of samples in the initial configuration (step 2 above). If the same (lowest stress) solution re-appears from a number of different starts then there is a strong assurance, though never a total guarantee, that this is indeed the best solution. Note that the easiest way to determine whether the same solution has been reached as in a previous attempt is simply to check for equality of the stress values; remember that the configurations themselves could be arbitrarily rotated or reflected with respect to each other.¶ In genuine applications, converged stress values are rarely precisely the same if configurations differ materially.

¶ The arbitrariness of orientation can be a practical nuisance when comparing different ordinations, and it can be helpful to rotate an MDS so that its direction of maximal variation always lies along the x axis. This can be simply achieved by applying PCA to the 2-d MDS co-ordinates (this is not the same thing as applying PCA to the original data matrix of course!); the PRIMER MDS routine does this automatically but also then permits easy user-control of final orientation and reflection.

Degenerate solutions can also occur, in which groups of samples collapse to the same point (even though they are not 100% similar), or to the vertices of a triangle, or are strung out round a circle. In these cases the stress may go to zero. (This is akin to our rambler starting his walk outside the encircling hills, so that he sets off in totally the wrong direction and ends up at the sea!) Artefactual solutions of this sort are relatively rare and easily detected: repetition from different random starts will find many solutions which are more sensible. (In fact, a more likely cause of an ordination in which points tend to be placed around the circumference of a circle is that the input matrix is of similarities when the program has been told to expect dissimilarities, or vice-versa; in such cases the stress will also be very high.) A much more common form of degenerate solution is repeatable and is a genuine result of a disjunction in the data. For example, if the data divide into two groups which have no species in common, or for which all dissimilarities within the groups are smaller than any dissimilarity between groups, then there is clearly no yardstick within our non-parametric approach for determining how far apart the groups should be placed in the MDS plot. It is then not surprising to find that the samples in each group collapse to a point (a commonly met special case is when one of the two groups consists of a single outlying point). The solution is to split the data and carry out an ordination separately on the two groups (or, in the latter case, re-run the MDS omitting the outlier).

Another feature of MDS mentioned earlier is that, unlike PCA, there is not any direct relationship between ordinations in different numbers of dimensions. In PCA, the 2-dimensional picture is just a projection of the 3-dimensional one, and all PC axes can be generated in a single analysis. With MDS, the minimisation of stress is clearly a quite different optimisation problem for each ordination of different dimensionality; indeed, this explains the greater success of MDS in distance-preservation. Samples that are in the same position with respect to (PC1, PC2) axes, though are far apart
on the PC3 axis, will be projected on top of each other in a 2-dimensional PCA but they will remain separate, to some degree, in a 2-dimensional as well as a 3-dimensional MDS.

If the ultimate aim is a 2-dimensional ordination, it may still be useful to carry out a 3-dimensional MDS initially. Its first two dimensions will often provide a reasonable starting point to the iterative computations for the 2-dimensional configuration.¶ In fact, this strategy will tend to reduce the risk of finding local minima or degenerate solutions. The samples are likely to fit more easily into three dimensions, itself reducing the risk of finding a local minimum; the 2-dimensional iteration will then be constrained to start much nearer a global minimum than it would for a purely random initial configuration. Another reason for obtaining higher-dimensional solutions is to compare their stress with that from two dimensions: this is one of several ways in which the accuracy of a 2-dimensional MDS can be assessed.

¶ This procedure is adopted by the PRIMER MDS routine, which also allows the user to specify the number of random re-starts (ideally at least 10). The PRIMER results log contains the co-ordinates of the best (lowest stress) 2-dimensional and 3-dimensional solutions and the stress values for all 2-d and 3-d repeats. It can plot either the optimal 2-d or 3-d configuration.

ADEQUACY OF MDS REPRESENTATION

1) Is the stress value small? By definition, stress increases with reducing dimensionality of the ordination (or in rare cases where a low-dimensional ordination is a perfect representation, stress remains constant). It has therefore been suggested that stress values in 2, 3, 4 etc. dimensions should be compared: if there is a particularly large drop in stress passing from two to three dimensions (say) and only a modest, steady decrease thereafter, this would imply that a 3-dimensional ordination is likely to be a more satisfactory representation than a 2-dimensional one. However, experience with ecological data suggests that clear-cut “shoulders” such as this, in the plot of minimum stress against dimensionality, are rarely seen. It is also undeniable that a 2-dimensional picture will usually be a more useful and accessible summary, so the question is often turned around: not “What is the true dimensionality of the data?” but “Is a 2-dimensional plot a usable summary of the sample relationships, or is it likely to be sufficiently misleading to force its abandonment in favour of a 3- or higher-dimensional plot?” One answer to this is through empirical evidence and simulation studies of stress values. Stress increases not only with reducing dimensionality but also with increasing quantity of data, but a rough rule-of-thumb for 2-dimensional ordinations, using the stress formula (5.1), is as follows.†

Stress <0.05 gives an excellent representation with no prospect of misinterpretation (a perfect representation would probably be one with stress <0.01 since numerical iteration procedures often terminate when stress reduces below this value*).

Stress <0.1 corresponds to a good ordination with no real prospect of a misleading interpretation; 3- or higher-dimensional solutions will not add any additional information about the overall structure (though the fine structure of any compact groups may bear closer examination).

Stress <0.2 still gives a potentially useful 2-dimensional picture, though for values at the upper end of this range too much reliance should not be placed on the detail of the plot; a cross-check of any conclusions should be made against those from an alternative technique (e.g. the superimposition of cluster groups suggested in point 5 below).

Stress >0.3 indicates that the points are close to being arbitrarily placed in the 2-dimensional ordination space. In fact, the totally random positions used as a starting configuration for the iteration usually give a stress around 0.35-0.45. Values of stress in the range 0.2-0.3 should therefore be treated with a great deal of scepticism and certainly discarded in the upper half of this range, especially for a small to moderate number of points (<50 say). Other techniques will be certain to highlight inconsistencies and higher-dimensional ordinations should be examined.

† There are alternative definitions of stress, for example the “stress formula 2” option provided in the MDSCAL and KYST programs. This differs only in the denominator scaling term in (5.1) but is believed to increase the risk of finding local minima and to be more appropriate for other forms of multivariate scaling, e.g. multidimensional unfolding, which are outside the scope of this manual.

* This is true of the MDS routine in PRIMER, for example.

2) Does the Shepard diagram appear satisfactory? The stress value totals the scatter around the regression line in a Shepard diagram, for example the low stress of 0.05 for Fig. 5.1 is reflected in the low scatter in Fig. 5.2. Outlying points in the plot could
be identified with the samples involved; often there are a range of outliers all involving dissimilarities with a particular sample and this can indicate a point which really needs a higher-dimensional representation for accurate placement, or simply corresponds to a major error in the data matrix.

3) Is there distortion when similar samples are connected in the ordination plot? One simple check on the success of the ordination in dissimilarity-preservation is to identify the top 10% or 20% (say) of values in the similarity matrix and draw a line between the corresponding points on the MDS configuration. An inaccurate representation is indicated if several connections are made between points which are further apart on the plot than other unconnected pairs of points.

5) Do superimposed groups from a cluster analysis distort the ordination plot? The combination of clustering and ordination analyses can be a very effective way of checking the adequacy and mutual consistency of both representations. Fig. 5.3 shows the dendrogram from a cluster analysis of the Exe estuary nematode data {X} of Fig. 5.1. Two or more (arbitrary) similarity values are chosen at a spread of hierarchical levels, each determining a particular grouping of samples. In Fig. 5.3, four groups are formed at around a 15% similarity level and eight groups would be determined for any similarity threshold between 30 and 45%.
Fig. 5.5. Dosing experiment, Solbergstrand mesocosm {D}. Nematode abundances for four replicates from each of four treatments (control, low, medium and high dose of hydrocarbons and Cu) after species reduction and log transformation as in Fig. 4.2. a), c) Group-average clustering from Bray-Curtis similarities; clusters formed at two arbitrary levels are superimposed on the 2-dimensional MDS obtained from the same similarities (stress = 0.16). b), d) Group-average clustering from Euclidean distances; clusters from two levels are superimposed on the 2-dimensional PCA of Fig. 4.2. Note the greater degree of distortion in the latter.
These two sets of groupings are superimposed on the MDS ordination, Fig. 5.4, and it is clear that the agreement between the two techniques is excellent: the clusters are sharply defined and would be determined in much the same way if one were to select clusters by eye from the 2-dimensional ordination alone. The stress for Fig. 5.4 is also low, at 0.05, giving confidence that the 2-dimensional plot is an accurate representation of the sample relationships. One is not always as fortunate as this, and a more revealing example of the benefits of viewing clustering and ordination in combination is provided by the data of Fig. 4.2.¶

¶ One option within PRIMER is to run CLUSTER on the ranks of the similarities rather than the similarities themselves. Whilst not of any real merit in itself (and not the default option), Clarke (1993) argues that this could have marginal benefit when performing a group-average cluster analysis solely to see how well the clusters agree with the MDS plot: the argument is that the information utilised by both techniques is then made even more comparable.

EXAMPLE: Dosing experiment, Solbergstrand

The nematode abundance data from the dosing experiment {D} at the GEEP Oslo Workshop was previously analysed by PCA, see Fig. 4.2 and accompanying text. The analysis was likely to be unsatisfactory, since the % of variance explained by the first two principal components was very low, at 37%. Fig. 5.5c shows the MDS ordination from the same data, and in order to make a fair comparison with the PCA the data matrix was treated in exactly the same way prior to analysis.† The stress for the 2-dimensional MDS configuration is moderately high (at 0.16), indicating some difficulty in displaying the relationships between these 16 samples in two dimensions. However, the PCA was positively misleading in its apparent separation of the four high dose (H) replicates in the 2-dimensional space; by contrast the MDS does provide a usable summary which is not likely to lead to serious misinterpretation. This can be seen by superimposing the corresponding cluster analysis results, Fig. 5.5a, onto the MDS. Two similarity thresholds have been chosen in Fig. 5.5a such that they (arbitrarily) divide the samples into 5 and 10 groups, the corresponding hierarchy of clusters being indicated in Fig. 5.5c by thin and thick lines respectively. Whilst it is clear that there are no natural groupings of the samples in the MDS plot, and the groupings provided by the cluster analysis must therefore be regarded with some caution, the two analyses are not markedly inconsistent.

† The same 26 species were retained and a log transformation applied before computation of Bray-Curtis similarities, though, of course, a species reduction would not normally be necessary with MDS or clustering of samples.
Chapter 5
page 5-9
Table 5.2. Road distances (miles) between pairs of selected towns and cities in the UK (part only).
Ln
London Te
Teesside 247 Tn
Taunton 144 302 St
Stranraer 399 186 416 So
Southampton 77 286 67 422 Sh
Shrewsbury 153 168 146 270 163 Pl
Plymouth 211 376 74 490 146 220 Pr
Perth 417 183 452 145 455 306 526 Pz
Penzance 281 442 140 556 217 286 78 592 Ox
Oxford 57 220 109 357 65 104 180 390 250 Nt
Nottingham 122 127 180 283 158 79 254 297 320 94 Nw
Norwich 111 224 248 385 189 195 319 399 389 139 124 Nc
Newcastle 273 35 327 155 311 194 401 150 467 247 153 256 Me
Manchester 184 103 203 218 206 66 277 254 343 142 70 184 128 Lv
Liverpool 197 135 203 219 216 58 278 254 344 154 97 215 153 35 Li
Lincoln 132 119 209 279 184 115 284 293 350 119 36 106 150 84 118 Le
Leeds 190 65 237 217 224 106 312 235 378 160 67 173 91 40 73 67 Kn
Kendal 253 77 270 146 275 124 344 182 410 211 137 244 85 72 72 138 71 In
Inverness 531 297 566 251 569 420 640 115 706 505 411 514 264 368 369 408 349 296 Hu
Hull 168 82 247 252 221 140 321 265 387 156 73 143 117 93 128 37 56 124 380 Hl
Holyhead 259 226 252 315 269 106 326 351 392 210 172 296 249 123 94 201 163 169 465 214 Gc
Gloucester 105 224 78 341 91 75 152 376 218 49 102 179 249 125 128 131 159 194 491 169 181 Gl
Glasgow 392 179 409 84 415 263 483 61 549 350 276 378 143 211 212 272 210 139 169 245 308 334
Fort William 495 274 512 185 517 366 586 105 652 453 379 481 238 314 314 375 313 242 66 347 411 436 103
measure will automatically downweight the contribution of species that are rarer (and thus more prone to random and uninterpretable fluctuations). There is then no necessity to delete species, either to obtain realistic low-dimensional ordinations or to make the calculations viable; the computational scale is determined solely by the number of samples.

4) MDS is generally applicable. MDS can validly be applied in a wide variety of situations; fewer assumptions are made about the nature and quality of the data when using MDS than for (arguably) any other ordination method. It seems difficult to imagine a more parsimonious position than stating that all that should be relied on is the rank order of similarities (though of course this still depends on the data transformation and similarity coefficient chosen). The step to considering only rank order of similarities, rather than their actual values, is not as potentially inefficient as it might at first appear, in cases where we have more faith in the exact value of the (dis)similarities. A simple example which illustrates this is in the reconstruction of genuine maps. Table 5.2 is a lower triangular matrix giving the road distances between a number of major towns and cities in the UK. This is a real distance matrix (for a change!) but it can be input to an MDS in the same way as above, replacing the distances with their rank order. The resulting “map” is shown in Fig. 5.7. Towns and cities are placed fairly close to their true locations though there is some distortion, and stress is not zero, because road distances are not the same as direct (“as the crow flies”) distances. The distortion is most evident in the peninsular regions where road distances are much greater than direct distances, e.g. the placement of Penzance and Plymouth in relation to the Welsh locations. Using a direct distance matrix instead, but again based only on the rank distances, the MDS algorithm now produces Fig. 5.8. With only minor exceptions, this fits on top of the true map of these locations, as indicated by the superimposed coastline: the reconstruction is near perfect, and the stress equals zero. This is a remarkable demonstration of the ability of MDS to generate powerful displays from only rank order information (“Manchester is closer to Leeds than Plymouth is to Penzance” etc.), and such examples can be most useful (and are commonly used) in explaining the purpose and interpretation of ordinations to the non-specialist.†

† For example, Everitt (1978) uses a road distance matrix for the UK to illustrate PCoA, and Clarke (1993) uses great-circle distances between pairs of world cities in an MDS, to illustrate the concept of stress in an MDS, when seeking a 2-d representation of “dissimilarities” arising from an inherently higher-dimensional (3-d) configuration.
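
The replacement of distances by their rank order can be illustrated with a minimal Python sketch. It uses six of the road distances from Table 5.2 (London, Teesside, Taunton and Stranraer); the function name `rank_order` and the dictionary layout are assumptions made for illustration only and are not PRIMER routines. Tied distances are given the mean of the ranks they span.

```python
def rank_order(distances):
    """Map each pair of towns to the rank of its distance (1 = smallest).

    Tied distances share the mean of the ranks they span.
    """
    pairs = sorted(distances, key=lambda pair: distances[pair])
    ranks = {}
    start = 0
    while start < len(pairs):
        end = start
        # Extend the run while the next pair has exactly the same distance.
        while (end + 1 < len(pairs)
               and distances[pairs[end + 1]] == distances[pairs[start]]):
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pair in pairs[start:end + 1]:
            ranks[pair] = mean_rank
        start = end + 1
    return ranks


# Road distances (miles) for four towns, copied from Table 5.2.
road = {
    ("Ln", "Te"): 247, ("Ln", "Tn"): 144, ("Te", "Tn"): 302,
    ("Ln", "St"): 399, ("Te", "St"): 186, ("Tn", "St"): 416,
}
road_ranks = rank_order(road)
```

Only `road_ranks` would then be passed to a non-metric MDS: the statement "London is closer to Taunton than Teesside is to Stranraer" survives, but the mileages themselves do not.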
Fig. 5.7. Non-metric MDS configuration of the road distances (partly given in Table 5.2) between selected UK towns and cities (stress = 0.04).

Fig. 5.8. Non-metric MDS configuration of the same towns and cities as in Table 5.2, but starting from the matrix of direct (“as the crow flies”) distances between every pair (stress = 0).
5) Similarities can be given unequal weight. If some samples are inherently less reliable than others because they are based on smaller amounts of material sampled (perhaps combining the results of fewer replicates), then similarities involving these samples can be given less influence in the construction of the MDS configuration: a weighting term could be added to the definition of stress in equation (5.1). It is also true, though not of much practical significance here, that the algorithm can operate perfectly successfully when the similarity matrix is subject to a certain amount of missing data.†

† Neither of these options are currently implemented in PRIMER. They could only be of importance if data were to arise directly as similarities constructed from pairwise comparisons of biological material, and some of those comparisons are not made or are lost. It is not of relevance if similarities are generated from a species-by-samples data matrix since, usually, either all or none of the similarities involving a particular sample can be calculated. If the latter, then there is clearly no way the sample could feature in the ordination!

MDS WEAKNESSES

1) MDS is computationally demanding. To generate a configuration with a moderate number of samples takes some time on a modern PC (for up to n = 100 samples, a few seconds for each of 10 random starts, say), though speed has become much less of an issue than it once was. MDS on more than about n = 1000 samples is not only rather computationally intensive (processor time increases roughly proportional to n²)* but also increasing sample size generally brings increasing complexity of the sample relationships, and a 2-dimensional representation is unlikely to be adequate in any case. (Of course this last point is just as true, if not more true, for other ordination methods). This scenario was touched on in Chapter 4 and in the discussion of Fig. 5.6, where it was suggested that large data sets can often, with benefit, be sub-divided a priori or on the basis of well-defined subsets from a cluster analysis, and the groups analysed separately by MDS. Representatives (or averages) from each group can then be input to another MDS to display the large-scale structure across groups.

* There are no longer absolute constraints on the number of samples in PRIMER though there will be effective constraints imposed by available memory and processor speed. However, the pertinent question should now be not “how many samples will it handle?” but “how many samples does it make sense to try to represent, approximately, in a single 2-d picture?”

2) Convergence to the global minimum of stress is not guaranteed. As we have seen, the iterative nature of the MDS algorithm makes it necessary to repeat each analysis a number of times, from different starting configurations, to be fairly confident that a solution that re-appears several times (with the lowest observed stress) is indeed the global minimum of the stress function. Generally, the higher the stress, the greater the likelihood of non-optimal solutions, so a larger number of repeats is required; this adds to the computational burden.

3) The algorithm places most weight on the large distances. A common feature of most ordination methods (including MDS and PCA) is that more attention is given to correct representation of the overall structure of the samples than their local structure. For MDS, it is clear from the form of equation (5.1) that the largest contributions to stress will come from incorrect placement of samples which are very distant from each other. Where distances are small, the sum of squared difference terms will also be relatively small and the minimisation process will not be as sensitive to incorrect positioning. This is another reason therefore for repeating the ordination within each large cluster: it will lead to a more accurate display of the fine structure, if this is important to interpretation. An example is given later in Figs. 6.2a and 6.3, and is fairly typical of the generally minor differences that result: the subset of points are given more freedom to expand in a particular direction but their relative positions are usually only marginally changed.

RECOMMENDATIONS

grouped the MDS will reveal this anyway, and when there is a more gradual continuum of change, or some interest in the placement of major groups with respect to each other, MDS will display this in a way that a cluster analysis is quite incapable of doing. For higher values of stress, the techniques should be thought of as complementary to each other; neither may present the full picture so the recommendation is to perform both and view them in combination. This may make it clear which points on the MDS are problematic to position (examining some of the local minimum solutions can help here) and an ordination in a higher dimension may prove more consistent with the cluster groupings. Conversely, the MDS plots may make it clear that some groups in the cluster analysis are arbitrary subdivisions of a natural continuum.
Further details of how confidence intervals are determined, why the ANOVA F ratio and F tables are defined in the way they are, how one can allow to some extent for the repeated significance tests in pairwise comparisons of site means etc, are not pursued here. This is the ground of basic statistics, covered by many standard texts, for example Sokal and Rohlf (1981), and such computations are available in all general-purpose statistics packages. This is not to imply that these concepts are elementary; in fact it is ironic that a proper understanding of why the univariate F test works requires a level of mathematical sophistication that is not needed for the simple permutation approach to the analogous global test for differences in multivariate structure between groups, outlined below.

MULTIVARIATE TESTS

samples and the probability distribution of counts could never be reduced to approximate (multivariate) normality, by any transformation, because of the dominance of zero values. For example, for the Frierfjord data, as many as 50% of the entries in the species/samples matrix are zero, even after reducing the matrix to only the 30 most abundant species!

A valid test can instead be built on a simple non-parametric permutation procedure, applied to the (rank) similarity matrix underlying the ordination or classification of samples, and therefore termed an

is combined with a general randomization approach to the generation of significance levels (Monte Carlo tests, Hope 1968). In the context below, it was described by Clarke and Green (1988).
‘ANOSIM’ FOR THE 1-WAY LAYOUT

Fig. 6.3 displays the MDS based only on the 12 samples (4 replicates per site) from the B, C and D sites of the Frierfjord macrofauna data. The null hypothesis (H0) is that there are no differences in community composition at these 3 sites. In order to examine H0, there are 3 main steps:

Fig. 6.3. Frierfjord macrofauna {F}. MDS ordination as for Fig. 6.2 but computed only from the similarities involving sites B, C and D (stress = 0.11).

1) Compute a test statistic reflecting the observed differences between sites, contrasted with differences among replicates within sites. Using the MDS plot of Fig. 6.3, a natural choice might be to calculate the average distance between every pair of replicates within a site, and contrast this with the average distance apart of all pairs of samples corresponding to replicates from different sites. A test could certainly be constructed from these distances but it would have a number of drawbacks.

a) Such a statistic could only apply to a situation in which the method of display was an MDS rather than, say, a cluster analysis.

b) The result would depend on whether the MDS was constructed in two, three or higher dimensions. There is often no “correct” dimensionality and one may end up viewing the picture in several different dimensions - it would be unsatisfactory to generate different test statistics in this way.

c) The configuration of B, C and D replicates in Fig. 6.3 also differs slightly from that in Fig. 6.2a, which includes the full set of sites A-E, G. It is again undesirable that a test statistic for comparing only B, C and D should depend on which other sites are included in the picture.

These three difficulties disappear if the test is based not on distances between samples in an MDS but on the corresponding rank similarities between samples in the underlying triangular similarity matrix. If $\bar{r}_W$ is defined as the average of all rank similarities among replicates within sites, and $\bar{r}_B$ is the average of rank similarities arising from all pairs of replicates between different sites, then a suitable test statistic is

$$R = \frac{\bar{r}_B - \bar{r}_W}{M/2} \qquad (6.1)$$

where $M = n(n-1)/2$ and $n$ is the total number of samples under consideration. Note that the highest similarity corresponds to a rank of 1 (the lowest value), following the usual mathematical convention for assigning ranks.

The denominator constant in equation (6.1) has been chosen so that:

a) R can never technically lie outside the range (-1, 1);

b) R = 1 only if all replicates within sites are more similar to each other than replicates from different sites;

c) R is approximately zero if the null hypothesis is true, so that similarities between and within sites will be the same on average.

R will usually fall between 0 and 1, indicating some degree of discrimination between the sites. R substantially less than zero is unlikely since it would correspond to similarities across different sites being higher than those within sites; such an occurrence is more likely

¶ The PRIMER ANOSIM routine covers tests for replicates from 1-way and 2-way (nested or crossed) layouts; the ANOSIM2 routine tackles the special case of a 2-way layout with no replication, which needs a modified style of test described at the end of this chapter.
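
A minimal Python sketch of the R statistic may help to fix ideas. It assumes the usual form R = (mean between-site rank - mean within-site rank) / (M/2), with M = n(n-1)/2 and rank 1 assigned to the highest similarity. The similarity values are invented, and the names `rank_similarities` and `anosim_r` are hypothetical (they are not PRIMER routines); ties are given average ranks.

```python
def rank_similarities(sim):
    """Rank pairwise similarities: the highest similarity gets rank 1.

    `sim` maps a pair of sample indices (i, j) to a similarity.
    Tied similarities share the mean of the ranks they span.
    """
    pairs = sorted(sim, key=lambda pair: -sim[pair])
    ranks = {}
    start = 0
    while start < len(pairs):
        end = start
        while end + 1 < len(pairs) and sim[pairs[end + 1]] == sim[pairs[start]]:
            end += 1
        for pair in pairs[start:end + 1]:
            ranks[pair] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def anosim_r(ranks, labels):
    """R = (mean between-site rank - mean within-site rank) / (M / 2)."""
    n = len(labels)
    m = n * (n - 1) / 2
    within = [r for (i, j), r in ranks.items() if labels[i] == labels[j]]
    between = [r for (i, j), r in ranks.items() if labels[i] != labels[j]]
    r_w = sum(within) / len(within)
    r_b = sum(between) / len(between)
    return (r_b - r_w) / (m / 2)


# Invented similarities (%) for two sites with two replicates each.
labels = ["B", "B", "C", "C"]
sim = {(0, 1): 80, (2, 3): 75, (0, 2): 40, (0, 3): 35, (1, 2): 30, (1, 3): 20}
r_observed = anosim_r(rank_similarities(sim), labels)
```

Here every within-site similarity exceeds every between-site similarity, so the sketch returns the upper limit R = 1. Because only the ranks enter the calculation, any monotonic rescaling of the similarities leaves R unchanged.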
Chapter 6
page 6—4
the MDS plot that distinguishes the 1981 and 1983 groups (a point returned to in Chapter 15). This is in contrast with the standard univariate ANOVA (or multivariate MANOVA) test, which will have no power to detect a variability change; indeed it is invalid without an assumption of approximately equal variances (or variance-covariance matrices) across the groups.

(e.g. control and impact) are applied at a number of locations (“blocks”), for example in the different mesocosm basins of a laboratory experiment;

c) a 2-way crossed case with no replication of each treatment/block combination can also be catered for, to a limited extent, by a different style of permutation test.
separate conditions: P and C labels may not be switched. Even so, the number of possible permutations is large (around 20,000).

Notice again that the test is not restricted to balanced designs, i.e. those with equal numbers of replicate samples within sites and/or equal numbers of sites within treatments (although lack of balance causes a minor complication in the efficient averaging of $R_C$ and $R_P$, see Clarke, 1988, 1993). Fig. 6.6b displays the results of 999 simulations (constrained relabellings) from the permutation distribution for R under the null hypothesis H1. Possible values range from -0.3 to 0.6, though 95% of the values are seen to be <0.27 and 99% are <0.46. The observed R of 0.75 therefore provides a strongly significant rejection of hypothesis H1.

H2, which will usually be the more interesting of the two hypotheses, can now be examined. The test of H1 demonstrated that there are, in effect, only three genuine “replicates” (the sites 1-3) at each of the two conditions (C and P).

This is a 1-way layout, and H2 can be tested by 1-way ANOSIM but one first needs to combine the information from the three original replicates at each site, to define a similarity matrix for the 6 “new” replicates. Consistent with the overall strategy that tests should only be dependent on the rank similarities in the original triangular matrix, one first averages over the appropriate ranks to obtain a reduced matrix. For example, the similarity between the three P1 and three P2 replicates is defined as the average of the nine inter-group rank similarities; this is placed into the new similarity matrix along with the 14 other averages (C1 with C2, P1 with C1 etc) and all 15 values are then re-ranked; the 1-way ANOSIM then gives R = 0.74. There are only 10 distinct permutations so that, although this is actually the most extreme R value possible, H2 is only able to be rejected at a p<10% significance level.
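
The arithmetic behind "only 10 distinct permutations" and the resulting 10% level can be checked with a short Python sketch. The function names are hypothetical, and the counting rule (groups of equal size treated as interchangeable, because swapping two whole group labels leaves R unchanged) is an assumption stated here rather than taken from the text. The simulated significance level is taken as (t + 1)/(T + 1), where t of the T simulated values are at least as large as the observed one.

```python
from math import factorial


def distinct_relabellings(group_sizes):
    """Number of distinct ways to allocate samples to groups of given sizes.

    Groups of equal size are treated as interchangeable.
    """
    total = factorial(sum(group_sizes))
    for size in group_sizes:
        total //= factorial(size)
    for size in set(group_sizes):
        total //= factorial(group_sizes.count(size))
    return total


def significance_level(n_as_extreme, n_simulated):
    """Monte Carlo significance level (t + 1) / (T + 1)."""
    return (n_as_extreme + 1) / (n_simulated + 1)


# Six "new" replicates split into two conditions of three sites each.
n_perms = distinct_relabellings([3, 3])
smallest_p = 1 / n_perms

# No simulated value as large as the observed one, out of 999 simulations.
p_simulated = significance_level(0, 999)
```

With six replicates in two groups of three the count is 10, so even the most extreme possible R cannot give a level below 1 in 10; by contrast, 999 simulations allow levels down to 1 in 1000.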
Fig. 6.8. Westerschelde nematodes experiment {W}. MDS of species abundances from 16 different nutrient-enrichment treatments, A to P, applied to sediment cores in each of four mesocosm basins, 1 to 4 (stress = 0.28).
the true value of R (=0.85) is again the most extreme and is almost certainly the largest in the full set; the null hypothesis is decisively rejected. In this case the test is inherently uninteresting but in other situations (e.g. a sites x times study) tests for both factors could be of practical importance.

EXAMPLE: Mesocosm experiment (2-way crossed case with no replication)

Although the above test may still function if a few random cells in the 2-way layout have only a single replicate, its success depends on reasonable levels of replication overall to generate sufficient permutations. A commonly arising situation in practice, however, is where the 2-way design includes no replication at all.¶ Typically this could be a sites x times field study (see next section) but it may also occur in experimental work: an example is given by Austen and Warwick (1995) of a laboratory mesocosm study in which a complex array of treatments was applied to soft-sediment cores taken from a single, intertidal location in the Westerschelde estuary, Netherlands. A total of 64 cores were randomly divided between 4 mesocosm basins, 16 to a basin.

¶ As noted earlier, this case is not covered by the PRIMER ANOSIM routine. It uses a separate routine, ANOSIM2.

The experiment involved 15 different nutrient enrichment conditions and one control, the treatments being applied to the surface of the undisturbed sediment cores. After 16 weeks controlled exposure in the mesocosm environment, the meiofaunal communities in the 64 cores were identified, and Bray-Curtis similarities on root-transformed abundances gave the MDS of Fig. 6.8. The full set of 16 treatments was repeated in each of the 4 basins (blocks), so the structure is a 2-way treatments x blocks layout with only one replicate per cell. Little, if any, of this structure is apparent from Fig. 6.8 and a formal test of the null hypothesis

H0: there are no treatment differences (but allowing the possibility of basin effects)

is clearly necessary before any sort of interpretation is attempted.

In the absence of replication, a test is still possible in the univariate case, under the assumption that interaction effects are small in relation to the main treatment or block differences (Scheffé, 1959). In a similar spirit, a global test of H0 is possible here, relying on the observation that if certain treatments are responsible for community changes, in a more-or-less consistent way across blocks, separate MDS analyses for each block should show a repeated treatment pattern. This is illustrated schematically in the top half of Fig. 6.9: the fact that treatment A is consistently close to B (and C to D) can only arise if H0 is false. The analogy
Fig. 6.9. Schematic diagram illustrating the stages in defining concordance of treatment patterns across the blocks, and the two computational routes for $\rho_{av}$. (Diagram labels: species counts, possibly transformed, for Blocks 1 to 3; Bray-Curtis (say) similarities between every pair of columns, with between-block similarities not used; rank similarities for each block, with row totals; rank correlation between every pair of columns, then averaged; or Kendall's concordance.)
with the univariate test is clear: large interaction effects imply that the treatment pattern differs from block to block and there is little chance of identifying a treatment effect; on the other hand, for a treatment x block design such as the current mesocosm experiment there is no reason to expect treatments to behave very differently in the different basins.

What is therefore required is a measure of how well the treatment patterns in the ordinations for the different blocks match; this statistic can then be recomputed under all possible (or a random subset of) permutations of the treatment labels within each block. As previously, if the observed statistic does not fall within the body of this (simulated) distribution there is significant evidence to reject H0. Note that, as required by the statement of H0, the test makes no assumption about the absence of block effects; between-block similarities are irrelevant to a statistic based only on agreement in within-block patterns.

In fact, for the same reasons advanced for the previous ANOSIM tests (e.g. arbitrariness in choice of MDS dimensionality), it is more satisfactory to define agreement between treatment patterns by reference to the underlying similarity matrix and not the MDS locations. Fig. 6.9 indicates two routes, which lead to equivalent formulations. If there are $n$ treatments and thus $N = n(n-1)/2$ similarities within a block, a natural choice for agreement of two blocks, $j$ and $k$, is the Spearman correlation coefficient

$$\rho_{jk} = 1 - \frac{6}{N(N^2-1)} \sum_{i=1}^{N} (r_{ij} - r_{ik})^2 \qquad (6.3)$$

between the matching elements of the two similarity matrices $\{r_{ij}, r_{ik};\ i = 1, \ldots, N\}$, since these ranks are the only information used in successful MDS plots. The coefficients can be averaged across all $b(b-1)/2$ pairs from the $b$ blocks, to obtain an overall measure of agreement $\rho_{av}$ on which to base the test. A short cut is to define, from the row totals $\{r_{i\cdot}\}$ and grand total $r_{\cdot\cdot}$ shown in Fig. 6.9, Kendall’s (1970) coefficient of concordance between the full set of ranks:

$$W = \frac{12}{b^2 N(N^2-1)} \sum_{i=1}^{N} \left( r_{i\cdot} - \frac{r_{\cdot\cdot}}{N} \right)^2 \qquad (6.4)$$

and then exploit the known relationship between this and $\rho_{av}$:

$$\rho_{av} = (bW - 1)/(b - 1) \qquad (6.5)$$

As a correlation coefficient, $\rho_{av}$ takes values in the range (-1, 1), with $\rho_{av} = 1$ implying perfect agreement and $\rho_{av} \approx 0$ if the null hypothesis H0 is true.
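
The equivalence of the two routes can be checked numerically. The Python sketch below assumes the standard untied forms of Spearman's coefficient, 1 - 6 Σd² / (N(N² - 1)), and of Kendall's W, 12 S / (b² N(N² - 1)) with S the sum of squared deviations of the row totals from their mean. The rank vectors are invented (three blocks, N = 6 rank similarities from n = 4 treatments) and the function names are illustrative only.

```python
from itertools import combinations


def spearman(r_j, r_k):
    """Spearman correlation between two untied rank vectors."""
    n_sim = len(r_j)
    d2 = sum((a - b) ** 2 for a, b in zip(r_j, r_k))
    return 1 - 6 * d2 / (n_sim * (n_sim ** 2 - 1))


def rho_average(blocks):
    """Left-hand route: average Spearman coefficient over all block pairs."""
    pairs = list(combinations(blocks, 2))
    return sum(spearman(a, b) for a, b in pairs) / len(pairs)


def kendall_w(blocks):
    """Right-hand route: Kendall's coefficient of concordance."""
    b = len(blocks)
    n_sim = len(blocks[0])
    row_totals = [sum(block[i] for block in blocks) for i in range(n_sim)]
    mean_total = sum(row_totals) / n_sim
    s = sum((t - mean_total) ** 2 for t in row_totals)
    return 12 * s / (b ** 2 * n_sim * (n_sim ** 2 - 1))


# Invented rank similarities for three blocks (each a permutation of 1..6).
blocks = [
    [1, 2, 3, 4, 5, 6],
    [2, 1, 3, 4, 6, 5],
    [1, 3, 2, 5, 4, 6],
]
b = len(blocks)
direct = rho_average(blocks)
shortcut = (b * kendall_w(blocks) - 1) / (b - 1)
```

For these invented ranks both routes give 17/21, about 0.81. The short cut applies only when every block has the full set of N ranks; with missing cells the pairwise (left-hand) route is the one that remains available.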
Note that standard significance tests and confidence intervals for $\rho$ or $W$ (e.g. as given in basic statistical tables) are totally invalid, since they rely on the ranks $\{r_i;\ i = 1, \ldots, N\}$ being from independent variables; this is obviously not true of similarity coefficients from all possible pairs of a set of (independent) samples. This does not make $\rho_{av}$ any the less appropriate as a measure of agreement whose departure from zero (rejection of H0) is testable by permutation.

For the nutrient enrichment experiment, Fig. 6.10 shows the separate MDS plots for the 4 mesocosm basins. Although the stress values are rather high (and the plots therefore slightly unreliable as a summary of the among treatment relationships), there appears to be no commonality of pattern, and this is borne out by a near zero value for $\rho_{av}$ of -0.03. This is central to the range of simulated values for $\rho_{av}$ under H0 (obtained by permuting treatment labels separately for each block and recomputing $\rho_{av}$), so the test provides no evidence of any treatment differences. Note that the symmetry of the 2-way layout also allows a test of the (less interesting) hypothesis that there are no block effects, by looking for any consistency in the among-basin relationships across separate analyses for each of the 16 treatments. The test is again non-significant, with $\rho_{av}$ = -0.02. The overall negative conclusion to the tests should bar any further attempts at interpretation of these data.

EXAMPLE: Exe nematodes (no replication and missing data)

A final example demonstrates a positive outcome to such a test, in a common case of a 2-way layout of sites and times with the additional feature that samples are missing altogether from a small number of cells. Fig. 6.11 shows again the MDS, from Chapter 5, of nematode communities at 19 sites in the Exe estuary.

Fig. 6.11. Exe estuary nematodes {X}. MDS, for 19 inter-tidal sites, of species abundances averaged over 6 bi-monthly sampling occasions; see also Fig. 5.1 (stress = 0.05).
In fact, this is based on an average of data over six successive bi-monthly sampling occasions. For the individual times, the samples remain strongly clustered into the 4 or 5 main groups apparent from Fig. 6.11. Less clear, however, is whether any structure exists within the largest group (sites 12 to 19) or whether the scatter in Fig. 6.11 is simply the consequence of sampling variation.

Rejection of the null hypothesis of “no site differences” would be suggested by a common site pattern in the separate MDS plots for the 6 times (Fig. 6.12). At some of the times, however, one of the site samples is missing (site 19 at times 1 and 2, site 15 at time 4 and site 18 at time 6). Instead of removing these sites from all plots, in order to achieve matching sets of similarities, one can remove for each pair of times only those sites missing for either of that pair, and compute the Spearman correlation $\rho$ between the remaining rank similarities. The $\rho$ values for all pairs of times are then averaged to give $\rho_{av}$, i.e. the left-hand route is taken in the lower half of Fig. 6.9. This is usually referred to as pairwise removal of missing data, in contrast to the listwise removal that would be needed for the right-hand route. Though increasing the computation time, pairwise removal clearly utilises more of the available information.

Fig. 6.12 shows evidence of a consistent site pattern, for example in the proximity of sites 12 to 14 and the tendency of site 15 to be placed on its own; the fact that site 15 is missing on one occasion does not undermine this perceived structure. Pairwise computation gives $\rho_{av}$ = 0.36 and its significance can be determined by a Monte Carlo test, as before. The (non-missing) site labels are permuted amongst the available samples, separately for each time, and these designations fixed whilst all the paired $\rho$ values are computed (using pairwise removal) and averaged. Here the largest such $\rho_{av}$ value in 999 simulations was 0.30, so the null hypothesis is rejected at the p<0.1% level.

In the same way, one can also carry out a test of the hypothesis that there are no differences across time for sites 12 to 19. The component plots, of the 4 to 6 times for each site, display no obvious features and $\rho_{av}$ = 0.08 (p<18%). The failure to reject this null hypothesis justifies, to some extent, the use of averaged data across the 6 times, in the earlier analyses.

Tests of this form, searching for agreement between two or more similarity matrices, occur also in Chapter 11 (in the context of matching species to environmental data) and Chapter 15 (where they link biotic patterns to some model structure). The discussion there includes use of measures other than a simple Spearman coefficient, for example a weighted Spearman coefficient $\rho_w$ (suggested for reasons explained in Chapter 11), and these adjustments could certainly be implemented here also if desired, using the left-hand route in the lower half of Fig. 6.9. In the present context, this type of “matching” test is clearly an inferior one to that possible where genuine replication exists within the 2-way layout. It cannot cope with follow-up tests for differences between specific pairs of treatments, and it can have little sensitivity if the numbers of treatments and blocks are both small. A test for two treatments is impossible, note, since the treatment pattern in all blocks would be identical.
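
Pairwise removal can be sketched in Python as follows. This is an illustration under stated assumptions, not the PRIMER implementation: the similarities are invented, each time is represented by a dictionary keyed on site pairs, the similarities shared by a pair of times are re-ranked before correlating, and the helper names (`average_ranks`, `spearman`, `rho_av_pairwise`) are hypothetical.

```python
from itertools import combinations


def average_ranks(values):
    """Ranks (1 = smallest), with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        for idx in order[start:end + 1]:
            ranks[idx] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks.

    Assumes neither variable is constant (otherwise the variance is zero).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def rho_av_pairwise(sims_by_time):
    """Average Spearman coefficient over all pairs of times.

    For each pair of times, only site pairs present at both are used.
    """
    rhos = []
    for t1, t2 in combinations(sorted(sims_by_time), 2):
        shared = sorted(set(sims_by_time[t1]) & set(sims_by_time[t2]))
        x = [sims_by_time[t1][pair] for pair in shared]
        y = [sims_by_time[t2][pair] for pair in shared]
        rhos.append(spearman(x, y))
    return sum(rhos) / len(rhos)


# Invented similarities among sites 12-15; site 15 is missing at time 2.
sims_by_time = {
    1: {(12, 13): 70, (12, 14): 60, (13, 14): 65,
        (12, 15): 20, (13, 15): 25, (14, 15): 30},
    2: {(12, 13): 80, (12, 14): 55, (13, 14): 60},
    3: {(12, 13): 75, (12, 14): 50, (13, 14): 66,
        (12, 15): 10, (13, 15): 22, (14, 15): 35},
}
rho_av = rho_av_pairwise(sims_by_time)
```

In this invented case, times 1 and 3 are compared on all six similarities, while comparisons involving time 2 use only the three similarities among sites 12 to 14. Listwise removal would instead discard site 15 from every time.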
Fig. 6.12. Exe estuary nematodes {X}. MDS for sites 12 to 19 only, performed separately for the 6 sampling times (read across rows for time order); in
SPECIES CLUSTERING AND MDS

Chapter 2 (page 2-6) describes how the original data matrix can be used to define similarities between every pair of species; two species are thought of as "similar" if their numbers (or biomass) tend to fluctuate in parallel across sites. The resulting species similarity matrix can be input to a cluster analysis or ordination in exactly the same way as for sample similarities.†

Fig. 7.1 displays the results of a cluster analysis on the Exe estuary nematode data {X} first seen in Chapter 5. The dendrogram is based on Bray-Curtis similarities computed on standardised abundances, as given in equations (2.9) and (2.10). Following the recommendations on page 2-6, the number of species was first reduced, retaining only those that accounted for more than 4% of the total abundance at any one site. Cluster analysis with a greater number of species is possible but the "hit-and-miss" occurrence of the rarer species across the sites tends to confuse the picture. In fact, at a similarity of around 10%, the dendrogram divides fairly neatly into 5 clusters of species, and these groups can be identified with the 5 clusters that emerge from the sample dendrogram, Fig. 5.3. (This identification comes simply from categorising the species by the site groups in which they have the greatest abundance; the correspondence between site and species groupings on this basis is seen to be very close.)

Fig. 7.2 shows the 2-dimensional MDS plot of the same species similarities. The groups determined from the cluster analysis are superimposed and indicate a good measure of agreement. However, both clustering and MDS have worked well here because the sites are strongly grouped, with many species characteristic of only one site group. Typically, species cluster analyses are less clearly delineated than this and the corresponding MDS ordinations have high stress. A more informative approach is often to concentrate on the sample similarities and highlight the species principally responsible for determining the sample groupings in the cluster or ordination analyses.
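The species-similarity calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: equations (2.9) and (2.10) are not reproduced in this chapter, so the standardisation used here (each species expressed as a percentage of its own total across sites) is an assumed reading of them, and the function names and the `counts[species][site]` layout are hypothetical.

```python
from itertools import combinations

def bray_curtis_similarity(u, v):
    """Bray-Curtis similarity (0-100) between two equal-length abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return 100.0 * (1.0 - num / den)

def species_similarities(counts, min_percent=4.0):
    """counts[i][j] = abundance of species i at site j (hypothetical layout).

    Returns the indices of retained species and a dict of pairwise
    similarities between them.
    """
    n_sites = len(counts[0])
    site_totals = [sum(row[j] for row in counts) for j in range(n_sites)]
    # Retain species contributing more than min_percent of the total
    # abundance at any one site.
    kept = [i for i, row in enumerate(counts)
            if any(100.0 * row[j] / site_totals[j] > min_percent
                   for j in range(n_sites))]
    # Assumed standardisation: each species as a percentage of its own total.
    std = {i: [100.0 * y / sum(counts[i]) for y in counts[i]] for i in kept}
    sim = {(a, b): bray_curtis_similarity(std[a], std[b])
           for a, b in combinations(kept, 2)}
    return kept, sim

# Made-up counts: species 0 and 1 fluctuate in parallel, species 3 is rare.
example = [[100, 200, 0], [50, 100, 0], [0, 0, 300], [1, 0, 0]]
kept, sim = species_similarities(example)
```

With these invented counts the rare fourth species is dropped, the two species that rise and fall together are maximally similar, and the species confined to the third site has nothing in common with either. The resulting similarities could then be passed to any clustering or ordination routine.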
Fig. 7.1. Exe estuary nematodes {X}. Dendrogram using group-average linking on Bray-Curtis species similarities from standardised abundance data; the 57 most important species were retained from an original list of 182. The 5 groups defined at arbitrary similarity level of 10% are indicated.
† Computation of species similarities is an option available in the PRIMER CLUSTER routine, and is referred to as inverse analysis by Field et al (1982).
DETERMINING DISCRIMINATING SPECIES

With a wide range of sophisticated multivariate techniques at one's disposal, it is all too easy to lose sight of the original data. A full understanding requires the data matrix to be re-examined in the light of the multi

for abundance data and matches the likely multivariate analysis. (Ordination of the data, using interval means, is indistinguishable from that based on original abundances.)

[Figure: data matrix column labels, Grp (sample groups 1 to 4) and Site]
The effect of the re-ordering here is to concentrate the higher abundances in the diagonal region of the matrix, and it is then relatively easy to identify species which have characteristically different abundance levels between (say) sample groups 1 and 2 (e.g. species 6, 1, 4, 23, 18, 3). However, for a matrix with larger numbers of species and a less satisfactory species ordination, a more automatic, analytical procedure for identifying influential species is preferable, as follows.

Similarity breakdown

The fundamental information on the multivariate structure of an abundance matrix is summarised in the Bray-Curtis similarities between samples, and it is by disaggregating these that one most precisely identifies the species responsible for particular aspects of the multivariate picture.† So, first compute the average dissimilarity $\bar{\delta}$ between all pairs of inter-group samples (e.g. every sample in group 1 paired with every sample in group 2) and then break this average down into separate contributions from each species to $\bar{\delta}$.

For Bray-Curtis dissimilarity $\delta_{jk}$ between two samples j and k, the contribution from the ith species, $\delta_{jk}(i)$, could simply be defined as the ith term in the summation of equation (2.11), namely:

$$\delta_{jk}(i) = 100 \, |y_{ij} - y_{ik}| \Big/ \sum_{i=1}^{p} (y_{ij} + y_{ik}) \qquad (7.1)$$

$\delta_{jk}(i)$ is then averaged over all pairs (j, k), with j in the first and k in the second group, to give the average contribution $\bar{\delta}_i$ from the ith species to the overall dissimilarity $\bar{\delta}$ between groups 1 and 2.‡ Typically, there are many pairs of samples (j, k) making up the average $\bar{\delta}_i$, and a useful measure of how consistently a species contributes to $\bar{\delta}_i$ across all such pairs is the standard deviation SD($\delta_i$) of the $\delta_{jk}(i)$ values.§ If $\bar{\delta}_i$ is large and SD($\delta_i$) small (and thus the ratio $\bar{\delta}_i$/SD($\delta_i$) is large), then the ith species not only contributes much to the dissimilarity between groups 1 and 2 but it also does so consistently in inter-comparisons of all samples in the two groups; it is thus a good discriminating species.

† This is implemented in the PRIMER SIMPER routine ("similarity percentages"), both in respect of contribution to average similarity within a group and average dissimilarity between groups.

‡ Though this is a natural definition, it should be noted that, in the general unstandardised case, there is no unambiguous partition of $\delta_{jk}$ into contributions from each species, since the standardising term in the denominator of equation (7.1) is a function of all species values.

§ The usual definition of standard deviation from elementary statistics is a convenient measure of variability here, but there is no sense in which the $\delta_{jk}(i)$ values are independent observations, and one cannot use standard statistical inference to define, say, 95% confidence intervals for the mean contribution from the ith species.

For the Bristol Channel Zooplankton data {B} of Fig. 7.3, Table 7.1 shows the results of breaking down the dissimilarities between sample groups 1 and 2 into species contributions. Species are ordered by their average contribution $\bar{\delta}_i$ to the total average dissimilarity $\bar{\delta} = \sum_i \bar{\delta}_i = 59.5$. Species which are likely to be good discriminators of groups 1 and 2 are indicated by an asterisk in the $\bar{\delta}_i$/SD($\delta_i$) column. The final column rescales the first column to percentages, i.e. it computes the % of the total dissimilarity $\bar{\delta}$ that is contributed by the ith species and then cumulates these percentages down the rows of the table. It can be seen that many of the species play some part in determining the dissimilarity between groups 1 and 2, and this is typical of such analyses. Here, nearly 90% of the contribution to $\bar{\delta}$ is accounted for by the first twelve species listed, with over 50% accounted for by the first five. The results are in good agreement, of course, with the pattern observed in the condensed matrix format of Fig. 7.3.

Table 7.1. Bristol Channel Zooplankton {B}. Breakdown of average dissimilarity between groups 1 and 2 into contributions from each species; species are ordered in decreasing contribution (part only given).

Sp   Name                        $\bar{\delta}_i$   SD($\delta_i$)   $\bar{\delta}_i$/SD($\delta_i$)   $\sum\bar{\delta}_i$%
 6   Eurytemora affinis          7.7    2.8    2.7*    13.0
 4   Centropages hamatus         7.3    4.4    1.7*    25.2
 3   Calanus helgolandicus       6.8    4.0    1.7*    36.7
 1   Acartia bifilosa            5.7    4.0    1.4*    46.3
23   Temora longicornis          5.6    3.3    1.7*    55.6
18   Pseudocalanus elongatus     4.7    1.5    3.1*    63.5
13   Paracalanus parvus          3.3    4.2    0.8     69.1
15   Pleurobrachia pileus jv     3.1    2.8    1.1     74.3
20   Sagitta elegans jv          2.9    1.9    1.6*    79.1
19   Sagitta elegans             2.1    1.6    1.3     82.5
 8   Gastrosaccus spinifer       2.0    1.8    1.1     85.9
14   Pleurobrachia pileus        1.9    1.6    1.2     89.0
10   Mesopodopsis slabberi       1.7    1.4    1.3     91.9
21   Schistomysis spiritus       1.6    1.4    1.1     94.5
17   Polychaete larvae           1.5    1.3    1.2     97.1
 2   Acartia clausi              0.7    1.8    0.4     98.3

In much the same way, though perhaps of less practical significance, one can examine the contribution each species makes to the average similarity within a group, $\bar{S}$. The average contribution of the ith species, $\bar{S}_i$, could be defined by taking the average, over all pairs
of samples within a group, of the ith term in the similarity definition of equation (2.1) (the second form). The more abundant a species is within a group, the more it will contribute to the similarities. It typifies that group if it is found at a consistent abundance throughout, so the standard deviation of its contribution SD($S_i$) is low, and the ratio $\bar{S}_i$/SD($S_i$) high. Note that this says nothing about whether that species is a good discriminator of one group from another; it may be very typical of a number of groups.

Such a breakdown is shown for group 1 of the Bristol Channel Zooplankton data in Table 7.2. The average similarity within the group is 66.3, with more than two-thirds of this contributed by only three species (6, 18 and 1), the first two of which are found at very consistent levels within the group.

Table 7.2. Bristol Channel Zooplankton {B}. Breakdown of average similarity within group 1 into contributions from each species (part only given).

Sp   Name                        $\bar{S}_i$   SD($S_i$)   $\bar{S}_i$/SD($S_i$)   $\sum\bar{S}_i$%
 6   Eurytemora affinis          19.3   6.3    3.1*    29.1
18   Pseudocalanus elongatus     14.7   2.7    5.4*    51.3
 1   Acartia bifilosa            12.2   6.4    1.9*    69.6
17   Polychaete larvae            3.9   3.1    1.2     75.5
14   Pleurobrachia pileus         3.4   3.8    0.9     80.7
21   Schistomysis spiritus        3.3   3.6    0.9     85.7
15   Pleurobrachia pileus jv      3.3   4.7    0.7     90.7

Limitations of the method

The SIMPER procedure has two main constraints which, to some extent, limit its usefulness.

a) It applies only to Bray-Curtis dissimilarities, whereas one might legitimately want to examine the influence of particular variables in a more general case, e.g. when the variables are not species abundances but environmental measures, and the dissimilarity coefficient is not Bray-Curtis but Euclidean distance.

b) It compares two groups of samples at a time, identifying the influential species only for each specific comparison. Some multivariate patterns, however, are not so readily categorised but represent a continuum of community change in response to one or more underlying gradients.

What is needed here is a more holistic technique, identifying the set of influential species which, between them, capture the full multivariate pattern (whether clustered or forming a gradation), and which operates with any appropriately-defined similarity coefficient. A possible method is suggested in a later chapter (16) on comparing multivariate patterns.

RECOMMENDATIONS

A multivariate display of the samples, either by an ordination or a cluster analysis, is not the end-point of a community analysis; it should be seen as a framework within which the patterns of individual species abundances can be interpreted.

1) This may be by simple re-examination of the data matrix, ordered and re-presented (perhaps averaged within groups) in the light of the information from the multivariate analysis.

2) In the case of a convincing clustering of samples, individual species contributions to the separation of the groups can be examined with the SIMPER procedure. Note that this is not a statistical testing framework, just an exploratory analysis. It indicates which species are principally responsible either for an observed clustering pattern or for differences between sets of samples that have been defined a priori and are confirmed to differ in community structure by the tests of Chapter 6.

3) Species identified in this manner (or by the more general pattern-matching procedures discussed in Chapter 16) are sometimes viewed most effectively in conjunction with the ordination. One at a time, they can be superimposed on an MDS (or PCA) plot, as circles whose varying diameters reflect the abundance changes for that species across samples (see, for example, Fig. 15.3).
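The between-group similarity breakdown of equation (7.1) can be sketched as follows. This is not the PRIMER SIMPER routine itself, only a minimal illustration of the arithmetic; the function name, the list-of-lists data layout and the use of the sample (n - 1) standard deviation are assumptions made for the example, and the two small groups of samples are invented.

```python
from statistics import mean, stdev

def simper_between(group1, group2):
    """Break the average Bray-Curtis dissimilarity between two groups into
    species contributions. Each sample is a list of species abundances.
    Assumes the two groups are not identical (total dissimilarity above 0)."""
    n_species = len(group1[0])
    contribs = [[] for _ in range(n_species)]  # delta_jk(i) for each pair (j, k)
    for yj in group1:
        for yk in group2:
            denom = sum(a + b for a, b in zip(yj, yk))
            for i in range(n_species):
                contribs[i].append(100.0 * abs(yj[i] - yk[i]) / denom)
    rows = []
    for i, values in enumerate(contribs):
        avg = mean(values)
        sd = stdev(values) if len(values) > 1 else 0.0
        rows.append({"species": i, "avg": avg, "sd": sd,
                     "ratio": avg / sd if sd > 0 else None})
    total = sum(r["avg"] for r in rows)        # average dissimilarity
    rows.sort(key=lambda r: r["avg"], reverse=True)
    cumulative = 0.0
    for r in rows:
        cumulative += 100.0 * r["avg"] / total
        r["cum_pct"] = cumulative              # cumulative % of total
    return total, rows

# Invented data: two samples per group, three species.
g1 = [[10, 0, 5], [8, 0, 5]]
g2 = [[0, 10, 5], [0, 12, 5]]
total, rows = simper_between(g1, g2)
```

Each returned row mirrors one line of a table such as Table 7.1: average contribution, its standard deviation, their ratio, and the cumulative percentage. In the invented data the third species has identical abundance everywhere and so contributes nothing to the dissimilarity.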
Margalef's index (d) is used, which also incorporates the total number of individuals (N) and is a measure of the number of species present for a given number of individuals:

$$d = (S - 1) / \log N \qquad (8.2)$$

Equitability

This is often expressed as Pielou's evenness index:

$$J' = H' / H'_{max} = H' / \log S \qquad (8.3)$$

where $H'_{max}$ is the maximum possible value of Shannon diversity, i.e. that which would be achieved if all species were equally abundant (namely, log S).

Simpson

and a further model-based description, Fisher's α (Fisher et al, 1943), which is the shape parameter, fitted by maximum likelihood, under the assumption that the species abundance distribution follows a log series. This has certainly been shown to be the case for some ecological data sets but can by no means be universally assumed, and (as with Brillouin) its use is clearly restricted to genuine (integral) counts.

The final option in this category is the rarefaction method of Sanders (1968) and Hurlbert (1971), which, under the strict assumption that individuals arrive in the sample independently of each other, can be used to project back from the counts of total species (S) and individuals (N), how many species ($ES_n$) would have been 'expected' had we observed a smaller number (n) of individuals:
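As a rough illustration of these indices, the sketch below computes Margalef's d (8.2), Pielou's J' (8.3) and a rarefied species count for a single sample of integer counts. Several things are assumptions rather than statements from the text: natural logarithms are used throughout, Shannon diversity is taken in its usual form H' = -Σ p_i log(p_i), and the rarefaction function uses the standard hypergeometric expression attributed to Hurlbert (1971), which is not reproduced in this section. Function names are hypothetical.

```python
import math

def margalef(counts):
    """d = (S - 1) / log N, equation (8.2)."""
    s = sum(1 for c in counts if c > 0)
    n = sum(counts)
    return (s - 1) / math.log(n)

def shannon(counts):
    """Shannon diversity H' (assumed usual form, natural logs)."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)

def pielou(counts):
    """J' = H' / log S, equation (8.3)."""
    s = sum(1 for c in counts if c > 0)
    return shannon(counts) / math.log(s)

def expected_species(counts, n):
    """Expected number of species in a random subsample of n individuals
    (assumed Hurlbert form; counts must be genuine integer counts)."""
    total = sum(counts)
    return sum(1 - math.comb(total - c, n) / math.comb(total, n)
               for c in counts if c > 0)

# Made-up sample: four equally abundant species.
sample = [10, 10, 10, 10]
d = margalef(sample)
j = pielou(sample)
es_all = expected_species(sample, 40)
es_one = expected_species(sample, 1)
```

For the perfectly even made-up sample, evenness is at its maximum of 1, rarefying to the full sample size returns all four species, and a subsample of one individual necessarily contains one species.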
Fig. 8.1. Hamilton Harbour, Bermuda {H}. Diversity (H') and 95% confidence intervals for macrobenthos (left) and meiobenthic nematodes (right) at six stations.

Fig. 8.2. Indonesian reef corals, South Tikus Island {I}. Total number of species (S), Diversity (H') and Evenness (J') based on coral species cover data along transects, spanning the 1982-3 El Niño.
Caswell's neutral model

In some circumstances, the equitability component of diversity can, however, be compared with a theoretical expectation for diversity, given the number of individuals and species present. Observed diversity has been compared with predictions from Caswell's neutral model (Caswell, 1976). This model constructs an ecologically 'neutral' community with the same number of species and individuals as the observed community, assuming certain community assembly rules (random births/deaths and random immigrations/emigrations) and no interactions between species. The deviation statistic V is then determined which compares the observed diversity (H') with that predicted from the neutral model (E(H')):

$$V = \frac{H' - E(H')}{SD(H')} \qquad (8.8)$$

A value of zero for the V statistic indicates neutrality, positive values indicate greater diversity than predicted and negative values lower diversity. Values above +2 or below -2 indicate 'significant' departures from neutrality. The computer program of Goldman and Lambshead (1989) is useful.

Table 8.1. Hamilton Harbour, Bermuda {H}. V statistics for summed replicates of macrobenthos and meiobenthic nematode samples at six stations.

Station   Macrobenthos   Nematodes
H2        +0.5           -0.1
H3        -5.4           +0.4
H4        -4.5           -0.5
H5        -1.9            0.0
H6        -1.3           -0.4
H7        -0.2           -0.4

This might indicate that the macrobenthic communities are under some kind of stress at these two stations. However, it must be borne in mind that deviation in H' from the neutral model prediction depends only on differences in equitability, since the species richness is fixed, and that the equitability component of diversity may behave differently from the species richness component in response to stress (see, for example, Fig. 8.2). Also, it is quite possible that the 'intermediate disturbance hypothesis' will have a bearing on the behaviour of V in response to disturbance, and increased disturbance may either cause it to decrease or increase. Using this method, Caswell found that the flora of tropical rain forests had a diversity below neutral model predictions!
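The V statistic of equation (8.8) and the rule of thumb for 'significant' departures can be sketched as below. The neutral-model quantities E(H') and SD(H') are assumed to be supplied from elsewhere (for example, from a neutral-model program); this sketch does not compute them. The function names are hypothetical, and the dictionary simply transcribes Table 8.1.

```python
def caswell_v(h_observed, h_expected, sd_h):
    """Deviation statistic V = [H' - E(H')] / SD(H'), equation (8.8)."""
    return (h_observed - h_expected) / sd_h

def departs_from_neutrality(v, threshold=2.0):
    """True when V lies beyond +/- threshold (2 in the text)."""
    return abs(v) > threshold

# V values transcribed from Table 8.1: (macrobenthos, nematodes).
TABLE_8_1 = {
    "H2": (0.5, -0.1), "H3": (-5.4, 0.4), "H4": (-4.5, -0.5),
    "H5": (-1.9, 0.0), "H6": (-1.3, -0.4), "H7": (-0.2, -0.4),
}

flagged_macrobenthos = [s for s, (macro, _) in TABLE_8_1.items()
                        if departs_from_neutrality(macro)]
flagged_nematodes = [s for s, (_, nema) in TABLE_8_1.items()
                     if departs_from_neutrality(nema)]
```

Applied to Table 8.1, only the macrobenthos at stations H3 and H4 fall outside the range of -2 to +2; none of the nematode values do.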
community structure which are not a function of the specific taxa present, and may therefore be related to levels of biological 'stress'.†

1) Rarefaction curves (Sanders, 1968) were among the earliest to be used in marine studies. They are plots of the number of individuals on the x-axis against the number of species on the y-axis. The more diverse the community is, the steeper and more elevated is the rarefaction curve. The sample sizes (N) may differ widely between stations, but the relevant sections of the curves can still be compared.

2) Gray and Pearson (1982) recommend plotting the number of species in x2 geometric abundance classes as a means of detecting the effects of pollution stress. These are plots of the number of species represented by only 1 individual in the sample (class 1), 2-3 individuals (class 2), 4-7 (class 3), 8-15 (class 4) etc. In unpolluted situations there are many rare species and the curve is smooth with its mode well to the left. In polluted situations there are fewer rare species and more abundant species so that the higher geometric abundance classes are more strongly represented, and the curve may also become more irregular or 'jagged' (although this latter feature is more difficult to quantify). Gray and Pearson further suggest that it is the species in the intermediate abundance classes 3 to 5 that are the most sensitive to pollution-induced changes and might best illustrate the differences between polluted and unpolluted sites (i.e. this is a way of selecting 'indicator species' objectively).

3) Ranked species abundance (dominance) curves are based on the ranking of species (or higher taxa) in decreasing order of their importance in terms of abundance or biomass. The ranked abundances, expressed as a percentage of the total abundance of all species, are plotted against the relevant species rank. Log transformations of one or both axes have frequently been used to emphasise or downweight different sections of the curves. Logging the x (rank) axis enables the distribution of the commoner species to be better visualised.

4) k-dominance curves are ranked abundances plotted against species rank, or log species rank (Lambshead et al, 1983). This has a s[...] effect on the curves. Ordering of curves on a plot will obviously be the reverse of rarefaction curves, with the most elevated curve having the lowest diversity. To compare d[...] separately from the number of species, the x-axis (species rank) may be rescaled from 0-100 (relative species rank), to produce Lorenz curves.

EXAMPLES: Garroch Head and Ekofisk macrofauna

Plots of geometric abundance classes along a transect across the Garroch Head {G} sewage-sludge [...] (Fig. 8.3) are given in Fig. 8.4. Note that the curves are very steep at both ends of the transect (the relatively unpolluted stations) with many species represented by only one individual, and they extend across very few abundance classes (6 at station 1 and 3 at station 12). As the dump centre at station 6 is approached the curves become much flatter, extending over many more abundance classes (13 at station 7), and there are fewer rare species.

In Fig. 8.5a, average ranked species abundance curves (with the x-axis logged) are given for the macrobenthos at a group of 6 sampling stations within 250m of the current centre of oil-drilling activity at the Ekofisk field in the North Sea {E}, compared with 10 stations between 250m and 1km from the centre (see inset map in Fig. 10.6a for locations of these stations). Note that the curve for the more polluted (inner) stations is J-shaped, showing high dominance of abundant species, whereas the curve for the less polluted (outer) stations is much flatter, with low dominance

Fig. 8.3. Garroch Head macrofauna {G}. Map showing location of dump-ground and position of sampling stations (1-12); the dump centre is at station 6.

† Two plotting programs of this type [...] package: Geometric class plots, which [...] distribution of geometric abundance (/biomass) [...] plots, which generate ranked species [...] choosing from ordinary, cumulative [...] or dual (abundance-biomass comparison) [...] the remainder of this chapter.
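The x2 geometric abundance classes of Gray and Pearson (1 individual; 2-3; 4-7; 8-15; ...) can be tabulated with a short sketch such as the one below. The function names are hypothetical and the counts are invented; the only trick used is that, for a positive integer, the class number equals the number of binary digits in the count.

```python
from collections import Counter

def geometric_class(count):
    """Class 1 holds 1 individual, class 2 holds 2-3, class 3 holds 4-7, etc.
    Requires a positive integer count."""
    return count.bit_length()

def geometric_class_distribution(counts):
    """Number of species in each x2 geometric abundance class, starting at
    class 1. Assumes one or more species with a non-zero count."""
    classes = Counter(geometric_class(c) for c in counts if c > 0)
    return [classes.get(k, 0) for k in range(1, max(classes) + 1)]

# Invented species counts for one sample.
counts = [1, 1, 1, 2, 3, 4, 7, 8, 15, 16]
distribution = geometric_class_distribution(counts)
```

Plotting `distribution` against class number gives the kind of curve described in item 2: a mode in class 1 for samples rich in rare species, and a flatter profile extending over more classes as abundant species become more prominent.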
Fig. 8.5b shows k-dominance curves for the same data. Here the curve for the inner stations is elevated, indicating lower diversity than at the 250m-1km stations.

Abundance/biomass comparison plots

Whether k-dominance curves are plotted from the species abundance distribution or from species biomass values, the y-axis is always scaled in the same range (0 to 100). This facilitates the Abundance/Biomass Comparison (ABC) method of determining levels of disturbance (pollution-induced or otherwise) on benthic macrofauna communities. Under stable conditions of infrequent disturbance the competitive dominants in macrobenthic communities are K-selected or conservative species, with the attributes of large body size and long life-span: these are rarely dominant numerically but are dominant in terms of biomass. Also present in these communities are smaller r-selected or opportunistic species with a short life-span, which can be numerically significant but do not represent a large proportion of the community biomass. When pollution perturbs a community, conservative species are less favoured in comparison with opportunists. Thus, under pollution stress, the distribution of numbers of individuals among species behaves differently from the distribution of biomass among species.

The ABC method, as originally described by Warwick (1986), involves the plotting of separate k-dominance
Fig. 8.5. Ekofisk macrobenthos {E}. a) Average ranked species abundance curves (x-axis logged) for 6 stations within 250m of the centre of drilling activity (dotted line) and 10 stations between 250m and 1km from the centre (solid line); b) k-dominance curves for the same groups of stations.
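A k-dominance curve of the cumulative kind used in the ABC method can be computed as in the sketch below; applying the same function to abundances and to biomasses gives the two curves that are compared. The function name is hypothetical and the abundance and biomass values are invented purely for illustration.

```python
def k_dominance(values):
    """Cumulative percentage dominance by species rank (rank 1 = most
    dominant). Species with zero values are ignored."""
    ranked = sorted((v for v in values if v > 0), reverse=True)
    total = sum(ranked)
    curve, running = [], 0.0
    for v in ranked:
        running += v
        curve.append(100.0 * running / total)
    return curve

# Invented data for four species (same species order in both lists).
abundance = [50, 30, 20, 0]
biomass = [2.0, 15.0, 40.0, 8.0]

abundance_curve = k_dominance(abundance)
biomass_curve = k_dominance(biomass)
```

Both curves end at 100 by construction, which is why the y-axis shares the same 0 to 100 range whichever attribute is plotted. Note that each curve is ranked separately, so rank 1 on the abundance curve need not be the same species as rank 1 on the biomass curve.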
Fig. 8.6. Hypothetical k-dominance curves for species biomass and abundance, showing 'unpolluted', 'moderately polluted' and 'grossly polluted' conditions.
become transposed at some distance from the dump-centre, when species richness is still high.

Transformations of k-dominance curves

Very often k-dominance curves approach a cumulative frequency of 100% for a large part of their length, and in highly dominated communities this may be after the first two or three top-ranked species. Thus, it may be difficult to distinguish between the forms of these curves. The solution to this problem is to transform the y-axis so that the cumulative values are closer to linearity. Clarke (1990) suggests the modified logistic transformation:

$$y_i' = \log[(1 + y_i) / (101 - y_i)] \qquad (8.9)$$

An example of the effect of this transformation on ABC curves is given in Fig. 8.9 for the macrofauna at two stations in Frierfjord, Norway {F}, A being an unimpacted reference site and C a potentially impacted site. At site C there is an indication that the biomass and abundance curves cross at about the tenth species, but since both curves are close to 100% at this point, the crossover is unclear. The logistic transformation enables this crossover to be better visualised, and illustrates more clearly the differences in the ABC configurations between these two sites.
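The modified logistic transformation of equation (8.9) is a one-line calculation, sketched below. The base of the logarithm is not stated in the text; natural logs are assumed here, and the choice only rescales the axis. The function name is hypothetical.

```python
import math

def modified_logistic(y):
    """Equation (8.9): y is a cumulative percentage dominance in [0, 100].
    Natural logarithm assumed."""
    return math.log((1.0 + y) / (101.0 - y))

# Cumulative values bunched near 100% are spread out after transformation.
raw = [90.0, 95.0, 99.0, 100.0]
transformed = [modified_logistic(y) for y in raw]
midpoint = modified_logistic(50.0)
```

The transform is symmetric about 50%, maps 50% to zero, and stays finite at both 0% and 100% because of the added constants, which is what makes it usable on the y-axis of a k-dominance plot.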
Partial dominance curves

A second problem with the cumulative nature of k-dominance (and ABC) curves is that the visual information presented is over-dependent on the single most dominant species. The unpredictable presence of large numbers of a species with small biomass, perhaps an influx of the juveniles of one species, may give a false impression of disturbance. With genuine disturbance, one might expect patterns of ABC curves to be unaffected by successive removal of the one or two most dominant species in terms of abundance or biomass, and so Clarke (1990) recommended the use of partial dominance curves, which compute the dominance of the second ranked species over the remainder (ignoring the first ranked species), the same with the third most dominant etc. Thus if $a_i$ is the absolute (or percentage) abundance of the ith species, when ranked in decreasing abundance order, the partial dominance curve is a plot of $p_i$ against log i (i = 1, 2, ..., S-1), where

$$p_1 = 100\, a_1 \Big/ \sum_{j=1}^{S} a_j, \quad p_2 = 100\, a_2 \Big/ \sum_{j=2}^{S} a_j, \quad \ldots, \quad p_{S-1} = 100\, a_{S-1} / (a_{S-1} + a_S) \qquad (8.10)$$

Earlier values can therefore never affect later points on the curve. The partial dominance curves (ABC) for undisturbed macrobenthic communities typically look like Fig. 8.10, with the biomass curve (thin line) above the abundance curve (thick line) throughout its length. The abundance curve is much smoother than
the biomass curve, showing a slight and steady decline before the inevitable final rise. Under polluted conditions there is still a change in position of partial dominance curves for abundance and biomass, with the abundance curve now above the biomass curve in places, and the abundance curve becoming much more variable. This implies that pollution effects are not just seen in changes to a few dominant species but are a phenomenon which pervades the complete suite of species in the community. For example, the time series of macrobenthos data from Loch Linnhe (see Fig. 8.11) shows that in the most polluted years 1971 and 1972 the abundance curve is above the biomass curve for most of its length (and the abundance curve is very atypically erratic), the curves cross over in the moderately polluted years 1968 and 1970 and have an unpolluted configuration prior to the pollution impact in 1966. In 1967, there is perhaps the suggestion of incipient change in the initial rise in the abundance curve. Although these curves are not so smooth (and therefore not so visually appealing!) as the original ABC curves, they may provide a useful alternative aid to interpretation and are certainly more robust to random fluctuations in the abundance of a small-sized, numerically dominant species.
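The partial dominance values of equation (8.10) can be sketched as below: each species' value is expressed as a percentage of the total remaining once all higher-ranked species have been set aside. The function name is hypothetical and the numbers are invented.

```python
def partial_dominance(values):
    """Partial dominance p_1 ... p_(S-1), following equation (8.10).
    Species with zero values are ignored."""
    ranked = sorted((v for v in values if v > 0), reverse=True)
    remaining = sum(ranked)
    curve = []
    for v in ranked[:-1]:              # ranks 1 to S-1
        curve.append(100.0 * v / remaining)
        remaining -= v                 # drop this species from the remainder
    return curve

# Invented abundances; the second list inflates only the top-ranked species.
baseline = partial_dominance([50, 30, 20])
with_influx = partial_dominance([500, 30, 20])
```

Comparing the two results shows the robustness property noted in the text: a tenfold influx of the most dominant species changes the first point of the curve but leaves every later point untouched.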
Phyletic role in ABC method

Fig. 8.11. Loch Linnhe macrofauna {L}. Selected years 1966-68 and 1970-72. a)-f) ABC curves (logistic transform). g)-l) Partial dominance curves for abundance (thick line) and biomass (thin line) for the same years.
CHAPTER 9: TRANSFORMATIONS

There are two distinct roles for transformations in community analyses:

a) to validate statistical assumptions for parametric techniques - in the approach of this manual such methods are restricted to univariate tests;

b) to weight the contributions of common and rare species in the (non-parametric) multivariate representations.

The second reason is the only one of relevance to the preceding chapters, with the exception of Chapter 8 where it was seen that standard parametric analysis of variance (ANOVA) could be applied to diversity indices computed from replicate samples at different sites or times. Being composite indices, derived from all species counts in a sample, some of these will already be approximately continuous variates with symmetric distributions, and others can be readily transformed to the normality and constant variance requirements of standard ANOVA. Also, there may be interest in the abundance patterns of individual species, specified a priori (e.g. keystone species), which are sufficiently common across most sites for there to be some possibility of valid parametric analysis after transformation.

UNIVARIATE CASE

For purely illustrative purposes, Table 9.1 extracts the counts of a single Thyasira species from the Frierfjord macrofauna data {F}, consisting of four replicates at each of six sites.

Table 9.1. Frierfjord macrofauna {F}. Abundance of a single species (Thyasira sp.) in four replicate grabs at each of the six sites (A-E, G).

Site:          A      B      C      D      E      G
Replicate
1              1      7      0      1     62     66
2              4      0      0      8    102     68
3              3      3      0      5     93     52
4             11      2      3     13     69     36
Mean           4.8    3.0    0.8    6.8   81.8   55.5
Stand. dev.    4.3    2.9    1.5    5.1   18.7   14.8

Two features are apparent:

1) the replicates are not symmetrically distributed (they tend to be right-skewed);

2) the replication variance tends to increase with increasing mean, as is clear from the mean and standard deviation (s.d.) values given in Table 9.1.

The lack of symmetry (and thus approximate normality) of the replication distribution is probably of less importance than the large difference in variability; ANOVA relies on an assumption of constant variance across the groups. Fortunately, both defects can be overcome by a simple transformation of the raw data; a power transformation (such as a square root), or a logarithmic transformation, have the effect both of reducing right skewness and stabilising the variance.

Power transformations

The power transformations y* = y^λ form a simple and useful family, in which decreasing values of λ produce increasingly severe transformations. The log transform, y* = log_e(y), can also be encompassed in this series (technically, (y^λ - 1)/λ → log_e(y) as λ → 0). Box and Cox (1964) give a maximum likelihood procedure for optimal selection of λ but, in practice, a precise value is not important, and indeed rather artificial if one were to use slightly different values of λ for each new analysis. The aim should be to select a transformation of the right order for all data of a particular type, choosing only from, say: none, square root, 4th root or logarithmic. It is not necessary for a valid ANOVA that the variance be precisely stabilised or the non-normality totally removed, just that gross departures from the parametric assumptions (e.g. the order of magnitude change in s.d. in Table 9.1) are avoided. One useful technique is to plot log(s.d.) against log(mean) and estimate the approximate slope of this relationship (β). This is shown here for the data of Table 9.1.

[Figure: log(s.d.) plotted against log(mean) for the six sites of Table 9.1; β = 0.55]

It can be shown that, approximately, if λ is set roughly equal to 1 - β, the transformed data will have constant variance. That is, a slope of zero implies no transformation, 0.5 implies the square root, 0.75 the 4th root and 1 the log transform. Here, the square root is indicated and Table 9.2 gives the mean and standard
deviations of the root-transformed abundances: the s.d. is now remarkably constant in spite of the order of magnitude difference in mean values across sites. An ANOVA would now be a valid and effective testing procedure for the hypothesis of "no site-to-site differences", and the means and 95% confidence intervals for each site can be back-transformed to the original measurement scales for a more visually helpful plot.

Table 9.2. Frierfjord macrofauna {F}. Mean and standard deviation over the four replicates at each site, for root-transformed abundances of Thyasira sp.

Site:         A      B      C      D      E      G
Mean (y*)     2.01   1.45   0.43   2.42   9.00   7.40
S.d. (y*)     0.97   1.10   0.87   1.10   1.04   1.04

Like all illustrations, though genuine enough, this one works out too well to be typical! In practice, there is usually a good deal of scatter in the log s.d. versus log mean plots; more importantly, most species will have many more zero entries than in this example and it is impossible to "transform these away": species abundance data are simply not normally distributed and can only rarely be made so. Another important point to note here is that it is never valid to "snoop" in a data matrix of, perhaps, several hundred species [...] or indicator species that can be tested with an entirely independent set of samples.

These two difficulties between them motivate the only satisfactory approach to most community data sets: a properly multivariate one in which all species are considered in combination in non-parametric methods of display and testing, which make no distributional assumptions at all about the individual counts.

MULTIVARIATE CASE

There being no necessity to transform to attain distributional properties, transformations play an entirely separate (but equally important) role in the clustering and ordination methods of the previous chapters, that of defining the balance between contributions from common and rarer species in the measure of similarity of two samples.

Returning to the simple example of Chapter 2, a subset of the Loch Linnhe macrofauna data, Table 9.3 shows the effect of a 4th root transformation of these abundances on the Bray-Curtis similarities. The rank order of the similarity values is certainly changed from the untransformed case, and one way of demonstrating how dominated the latter is by the single most numerous species (Capitella capitata) is shown in Table 9.4. Leaving out each of the species in turn, the Bray-Curtis similarity between samples 2 and 4 fluctuates wildly when Capitella is omitted in the untransformed case, though changes much less dramatically under 4th root transformation, which downweights the effect of single species.

Table 9.3. Loch Linnhe macrofauna {L} subset. Untransformed and 4th root-transformed abundances for some selected species and samples (years), and the resulting Bray-Curtis similarities between samples.

Untransformed

Sample:       1     2     3     4
Species                                  Sample   1    2    3    4
Echinoca.     9     0     0     0        1        -
Myrioche.     2.1   0     0     1.3      2        26   -
Labidopl.     1.7   2.5   0     1.8      3        0    68   -
Amaeana       0     1.9   3.5   1.7      4        52   68   42   -
Capitella     0     3.4   4.3   1.2
Mytilus       0     0     0     0

Transformation sequence

The previous remarks about the family of power transformations apply equally here: they provide a continuum of effect from λ = 1 (no transform), for which only the common species contribute to the similarity, through λ = 0.5 (square root), which allows the intermediate abundance species to play a part, to λ = 0.25 (4th root), which takes some account also of rarer species. As noted earlier, λ → 0 can be thought of as equivalent
Chapter 9
page 9-3
to the log_e(y) transformation and the latter would therefore be more severe than the 4th
root transform. However, in this form, the transformation is impractical because the (many)
zero values produce log(0) → -∞. Thus, common practice is to use log(1+y) rather than
log(y), since log(1+y) is always positive for positive y and log(1+y) = 0 for y = 0. The
modified transformation no longer falls strictly within the power sequence; on large
abundances it does produce a more severe transformation than the 4th root but for small
abundances it is less severe than the 4th root. In fact, there are rarely any practical
differences between cluster and ordination results performed following y^0.25 or log(1+y)
transformations; they are effectively equivalent in focusing ...

... between the samples in any way, and therefore do not contribute to the final
multivariate description. The emphasis is therefore shifted firmly towards patterns in the
intermediate and rarer species, the generally larger numbers of these tending to over-ride
the contributions from the few numerical or biomass dominants.

Table 9.5. Loch Linnhe macrofauna ... absence (0) of the six species in t... the resulting
Bray-Curtis similarities.

Presence/absence
              Sample                        Bray-Curtis similarity
Species       1      2      3      4        Sample   1     2     3     4
Echinoca.     1      0      0      0        1        -
[Figure: MDS plots; panel a, no transform; panel b, root transform.]
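The effect of the transformation sequence on the Bray-Curtis coefficient can be tried out numerically. The following is a minimal sketch in Python, not the PRIMER implementation: the two samples are hypothetical counts invented for illustration (one species is strongly dominant in the first sample), and the function names `transform` and `bray_curtis_similarity` are assumptions of this sketch. The similarity is computed as 100 x (1 - sum|y1 - y2| / sum(y1 + y2)) on the transformed counts.

```python
import math

def transform(y, kind):
    """Apply one of the transformations discussed in the text to a single count."""
    if kind == "none":
        return float(y)
    if kind == "sqrt":
        return y ** 0.5
    if kind == "fourth_root":
        return y ** 0.25
    if kind == "log":
        return math.log(1.0 + y)       # log(1+y), so that zero counts stay at zero
    if kind == "presence_absence":
        return 1.0 if y > 0 else 0.0
    raise ValueError(f"unknown transformation: {kind}")

def bray_curtis_similarity(a, b):
    """Bray-Curtis similarity (0-100) between two samples of (transformed) counts."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(x + y for x, y in zip(a, b))
    if den == 0:
        return float("nan")            # undefined when both samples are empty
    return 100.0 * (1.0 - num / den)

# Hypothetical counts for six species in two samples (illustration only).
sample_a = [0, 0, 40, 10, 130, 0]
sample_b = [0, 3, 10, 9, 2, 0]

similarities = {}
for kind in ("none", "sqrt", "fourth_root", "log", "presence_absence"):
    ta = [transform(y, kind) for y in sample_a]
    tb = [transform(y, kind) for y in sample_b]
    similarities[kind] = bray_curtis_similarity(ta, tb)
    print(f"{kind:17s} {similarities[kind]:6.1f}")
```

With these invented counts the similarity rises steadily as the transformation becomes more severe, because the single dominant species contributes progressively less to the coefficient.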
consuming and therefore relatively expensive. One practical means of overcoming this problem
is to exploit the redundancy in community data by analysing the samples to higher taxonomic
levels, such as family, rather than to species. If results from identifications to higher
taxonomic levels are comparable to a full species analysis, this means that:

a) A great deal of labour can be saved. Several groups of marine organisms are taxonomically
   difficult, for example (in the macrobenthos) several families of polychaetes and
   amphipods; as much time can be spent in separating a few of these difficult groups into
   species as the entire remainder of the sample, even in Northern Europe where taxonomic
   keys for identification are most readily available.

b) Less taxonomic expertise is needed. Many taxa really require the skills of specialists to
   separate them into species, and this is especially true in parts of the world where fauna
   is poorly described. For certain groups of marine organisms, e.g. the meiobenthos, the
   necessary expertise required to identify even the major taxa (nematodes and copepods) to
   species is lacking in most laboratories which are concerned with the monitoring of marine
   pollution, so that these components of the biota are rarely used in such studies, despite
   their many inherent advantages (see Chapter 13).

For the marine macro- and meiobenthos, aggregations of the species data to higher taxonomic
levels have been made¹ and the resultant data matrices have been subjected to several forms
of statistical analysis to see how much information has been lost compared with the full
species-level analysis. Although such experiments have not often been done for other
components of the marine biota (e.g. plankton), results from the benthic studies are
remarkably clear in that very little information appears to be lost after a moderate degree
of aggregation.

¹ This can be performed in PRIMER using the Aggregate routine, which pools the species counts
in a data matrix to genus or family level, say, using an aggregation worksheet. The latter
specifies the genus, family, order, class etc designations for each species, and such
aggregation files are also of fundamental importance in computing biodiversity measures
based on the taxonomic relatedness of species in each sample, see Chapter 17.

Methods amenable to aggregation

1) Multivariate methods. Although taxonomic levels higher than that of species can be used
   to some degree for all types of statistical analysis of community data, it is probably
   for multivariate methods that this is most appropriate, in view of the redundancy
   discussed above. All ordination/clustering techniques are amenable to aggregation, and
   there is now substantial empirical evidence that identification only to the family level
   for macrobenthos, and the genus level for meiobenthos, makes very little difference to
   the results (see, for example, Figs. 10.2-10.6, and more recent results described in
   Chapter 16). There are also certain possible theoretical advantages to conducting
   multivariate analyses at a high taxonomic level for pollution impact studies. Natural
   environmental variables which also affect community structure are rarely constant in
   surveys designed to detect pollution effects over relatively large geographical areas.
   In the case of the benthos, these ‘nuisance’ variables include water depth and sediment
   granulometry. However, it is a tenable hypothesis that these variables usually influence
   the fauna more by species replacement than by changes in the proportions of the major
   taxa present. Each major group, in its adaptive radiation, has evolved species which are
   suited to rather narrow ranges of natural environmental conditions, whereas the advent of
   pollution by man has been too recent for the evolution of suitably adapted species.
   Ordinations of abundance or biomass data of these major taxa are therefore
Chapter 10
page 10-3
Fig. 10.2. Nutrient-enrichment experiment, Solbergstrand {N}. MDS plot o f copepod abundances (^-transformed, Bray-Curtis similar
ities) for four replicates from each o f three treatments; species data aggregated into genera and families (stress = 0.09, 0.09, 0.08).
   more likely to correlate with a contamination gradient than are species ordinations, the
   latter being more complicated by the effects of natural environmental variables when
   large heterogeneous geographical areas are considered. In short, higher taxa may well
   reflect well-defined pollution gradients more closely than species.

2) Distributional methods. Aggregation for ABC curves is possible, and family level analyses
   are often identical to species level analyses (see Fig. 10.7).

3) Univariate methods. The concept of pollution indicator groups rather than indicator
   species is well-established. For example, at organically enriched sites, polychaetes of
   the family Capitellidae become abundant (not just Capitella capitata), as do meiobenthic
   nematodes of the family Oncholaimidae. The nematode/copepod ratio (Raffaelli and Mason,
   1981) is an example of a pollution index based on higher taxonomic levels. Such indices
   are likely to be of more general applicability than those based on species level
   information. Diversity indices themselves can be defined at hierarchical taxonomic levels
   for internal comparative purposes, although this is not commonly done in practice.
Fig. 10.3. Loch Linnhe macrofauna {L}. MDS (using Bray-Curtis similarities) of samples from
11 years. Abundances are √√-transformed (top) and untransformed (bottom), with 115 species
(left), aggregated into 45 families (middle) and 9 phyla (right). (Reading across rows,
stress = 0.09, 0.09, 0.10, 0.09, 0.09, 0.02).
Fig. 10.4. Amoco-Cadiz oil spill {A}. MDS for macrobenthos at station “Pierre Noire” in the
Bay of Morlaix. Species data (left) aggregated into phyla (right). Sampling months are
A:4/77, B:8/77, C:9/77, D:12/77, E:2/78, F:4/78, G:8/78, H:11/78, I:2/79, J:5/79, K:7/79,
L:10/79, M:2/80, N:4/80, O:8/80, P:10/80, Q:1/81, R:4/81, S:8/81, T:11/81, U:2/82. The
oil-spill was during 3/78 (stress = 0.09, 0.07).
Fig. 10.5. Indonesian reef corals {I}. MDS for species (p=75) and genus (p=24) data at South
Pari Island (Bray-Curtis similarities on untransformed % cover). The El Niño occurred in
1982-3. 1=1981, 3=1983 etc. (stress = 0.25).
community stress, and a return to the left in 1973 associated with reduced pollution levels
and community stress. This pattern is equally clear at all levels of taxonomic aggregation.
Again, the separation of the most polluted years is most distinct at the phylum level, at
least for the double square root transformed data (and the configuration is more linear with
respect to the pollution gradient at the phylum level for the untransformed data).

Amoco-Cadiz oil-spill

Macrofauna species were sampled at station ‘Pierre Noire’ in the Bay of Morlaix on 21
occasions between April 1977 and February 1982, spanning the period of the wreck of the
‘Amoco-Cadiz’ in March 1978. The sampling site was some 40km from the initial tanker
disaster but substantial coastal oil slicks resulted.

The species abundance MDS has been repeated with the data aggregated into five phyla:
Annelida, Mollusca, Arthropoda, Echinodermata and ‘others’ (Fig. 10.4). The analysis of
phyla closely reflects the timing of pollution events, the configuration being slightly more
linear than in the species analysis. All pre-spill samples (A-E) are in the top left of the
configuration, the immediate post-spill sample (F) shifts abruptly to the bottom right after
which there is a gradual recovery in the pre-spill direction. Note that in the species
analysis, although results are similar, the immediate post-spill response is rather more
gradual. The community response at the phylum level is remarkably clear.

Indonesian reef corals

The El Niño of 1982-3 resulted in extensive bleaching of reef corals throughout the Pacific.
Fig. 10.5 shows the coral community response at South Pari Island over six years in the
period 1981-1988, based on ten replicate line transects along which coral species cover was
determined. Note the immediate post-El Niño location shift on the species MDS and a
circuitous return towards the pre-El Niño condition. This is closely reflected in the genus
level analysis.
Fig. 10.6. Ekofisk oil-platform macrobenthos {E}. a) Map of station positions, indicating
symbol/shading conventions for distance zones from the centre of drilling activity (>3.5km,
1 - 3.5km, 250m - 1km, <250m); b)-d) MDS for root-transformed species, family and phyla
abundances respectively (stress = 0.12, 0.11, 0.13).
Fig. 10.7. Loch Linnhe macrofauna {L}. Shannon diversity (H') and ABC plots over the 11
years, 1963 to 1973, for data aggregated to family level (c.f. Fig. 8.7). Abundance = thick
line, biomass = thin line. (ABC plots: x axis, species rank; y axis, cumulative %.)

Fig. 10.8. Indonesian reef corals {I}. Means and 95% confidence intervals for number of taxa
and Shannon diversity at South Tikus Island, showing the impact and partial recovery from
the 1982-3 El Niño. Species data (left) have been aggregated into genera (right).
Loch Linnhe macrofauna

ABC plots for the Loch Linnhe macrobenthos species data are given in Chapter 8, Fig. 8.7,
where the performance of these curves with respect to the time-course of pollution events is
discussed. In Fig. 10.7 the species data are aggregated to family level, and it is seen that
the curves are virtually identical to the species level analysis, so that there would have
been no loss of information had the samples only been sorted originally into families.

Similar results were produced by replotting the ABC curves for the Garroch Head sewage
sludge dumping ground macrobenthos {G} (Fig. 8.8) at the family level (Warwick, 1988b).

UNIVARIATE EXAMPLE

Indonesian reef corals

Fig. 10.8 shows results from another survey of 10 replicate line transects for coral cover
over the period 1981-1988, in this case at South Tikus Island, Indonesia {I}. Note the
similarity of the species and genus analyses for the number of taxa and Shannon diversity,
with an immediate post-El Niño drop and subsequent suggestion of partial recovery.

Clearly the operational taxonomic level for environmental impact studies is another factor
to be considered when planning such a survey, along with decisions about the number of
stations to be sampled, number of replicates, types of statistical analysis to be employed
etc. The choice will depend on several factors, particularly the time, manpower and
expertise available and the extent to which that component of the biota being studied is
known to be robust to taxonomic aggregation, for the type of statistical analysis being
employed, and the type of perturbation expected. Thus, it is difficult to give general
recommendations and each case must be treated on its individual merits. However, for routine
monitoring of organic enrichment situations using macrobenthos, one can by now be rather
certain that family level analysis will be perfectly adequate. Similarly, for meiofaunal
taxa, there are by now many examples where multivariate analysis of genus-level information
is indistinguishable from that for species.¹ For other components of the marine fauna, and
observational studies not concerned with detection of organic enrichment impacts, the body
of evidence supporting a particular choice of taxonomic level is much smaller.

¹ See also Chapter 16, which exten... providing the tools for quantifying ... ation a
multivariate analysis provides ... indeed the transformation) is varied.
Chapter 11
page 11-1
Table 11.1. Garroch Head dump ground {G}. Sediment metal concentrations (ppm), water depth
at the site (m) and organic loading of the sediment (% carbon and nitrogen), for the
transect of 12 stations across the sewage-sludge dump site (centre at station 6), see
Fig. 8.3.

Station    Cu     Mn    Co    Ni    Zn    Cd     Pb    Cr    Dep    %C     %N
1          26   2470    14    34   160    0      70    53    144    3      0.53
2          30   1170    15    32   156    0.2    59    15    152    3      0.46
3          37    394    12    38   182    0.2    81    77    140    2.9    0.36
4          74    349    12    41   227    0.5    97   113    106    3.7    0.46
5         115    317    10    37   329    2.2   137   177    112    5.6    0.69
6         344    221    10    37   652    5.7   319   314     82   11.2    1.07
7         194    257    11    34   425    3.7   175   227     74    7.1    0.72
8         127    246    10    33   292    2.2   130   182     70    6.8    0.58
9          36    194     6    16    89    0.4    42    57     64    1.9    0.29
10         30    326    11    26   108    0.1    44    52     80    3.2    0.38
11         24    439    12    34   119    0.1    58    36     83    2.1    0.35
12         22    801    12    33   118    0      52    51     83    2.3    0.45
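The transformation and normalisation applied to such variables before a PCA can be illustrated on a single column of Table 11.1. The following is a minimal sketch in Python, under stated assumptions: natural logarithms are used, normalisation means subtracting the mean and dividing by the sample standard deviation (divisor n - 1), and the function names are hypothetical. Variables containing zeros (e.g. Cd) would need an offset before logging, which is not specified here and so is not attempted.

```python
import math

# Cu concentrations (ppm) for stations 1-12, copied from Table 11.1.
cu = [26, 30, 37, 74, 115, 344, 194, 127, 36, 30, 24, 22]

def log_transform(values):
    """Natural log of each value (assumes all values are strictly positive)."""
    return [math.log(v) for v in values]

def normalise(values):
    """Subtract the mean and divide by the sample standard deviation (n - 1 divisor)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

z_cu = normalise(log_transform(cu))
for station, z in enumerate(z_cu, start=1):
    print(f"station {station:2d}  z = {z:6.2f}")
```

After this step every variable has mean zero and unit standard deviation, so that variables measured on very different scales (ppm, metres, percentages) contribute comparably to a Euclidean distance or a PCA.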
Thus the decision to log all the metal data stems not just from the draftsman plots but also
from previous experience that such concentration variables often have standard deviations
proportional to their means; i.e. a roughly constant percentage variation is log transformed
to a stable absolute variance.

Fig. 11.1 displays the first two axes (PC1 and PC2) of a PCA ordination on the transformed
data of Table 11.1. In fact, the first component accounts for much of the variability (61%)
in the full matrix, the first two components accounting for 88%, so the 2-d plot provides an
accurate summary of the sample relationships. Broadly speaking, PC1 represents an axis of
increasing contaminant load: ...

Fig. 11.1. Garroch Head dump ground {G}. Two-dimensional PCA ordination of the 11
environmental variables of Table 11.1 (transformed and normalised), for the stations (1-12)
across the sewage-sludge dump site centred at station 6 (% variance explained = 88%).

... strong pattern of incremental change on moving from the ends of the transect to the
centre of the dump site, which (unsurprisingly) has the greatest levels of organic
enrichment and metal concentrations (a significant exception being Mn).

LINKING BIOTA TO UNIVARIATE ENVIRONMENTAL MEASURES

... variables can be exploited in this way. In the case of the Garroch Head dump ground,
Fig. 11.2 shows the relation between Shannon diversity of the macrofauna samples at the 12
sites and the overall contaminant load, as reflected in the first PC of the environmental
data (Fig. 11.1). Here the relationship appears to be a simple linear decrease in diversity
with increasing load, and the fitted linear regression line clearly has a significantly
non-zero slope (β = -0.29, p < 0.1%).

EXAMPLE: Bristol Channel zooplankton

The cluster analysis of zooplankton samples from 57 sites in the Bristol Channel was seen in
Chapter 3, and the dendrogram suggested a division of the samples into 4 or 5 main clusters
(Fig. 3.3). The matching MDS (Fig. 11.3), whilst in good agreement with the cluster
analysis, reveals a more informative picture of a strong gradient of change from the Inner
Channel to the Celtic Sea sites.
... gradients (Field et al, 1982).^

^ Bubble plots, superimposing environmental data onto an ordination, are a basic feature
provided in the PRIMER MDS plotting routine. The technique can also be useful in a wider
context: Field et al (1982) superimpose morphological characteristics of each species onto a
species MDS of the type seen in Chapter 7, and Warwick and Clarke (1993a; see also Fig.
15.3) give an example of superimposition of biotic variables drawn from the same data matrix
as used to create the MDS. The latter can provide useful insight into the role of individual
taxa in shaping the biotic picture (see section 4 of the PRIMER User Manual/Tutorial),
especially when the number of taxa is small, as is the case for the phylum-level
“meta-analysis” of Chapter 15.

* Note the horseshoe effect (more properly termed the arch effect), which is a common
feature of the ordination from single, strong environmental gradients. Both theoretically
and empirically, non-metric MDS would seem to be less susceptible to this than metric
ordination methods, but without the drastic (and somewhat arbitrary) intervention in the
plot that a technique like detrended correspondence analysis uses (specifically to “cut and
paste” such ordinations to a straight line), some degree of curvature is unavoidable and
natural. Where samples towards opposite ends of the environmental gradient have few species
in common (thus giving dissimilarities near 100%), samples which are even further apart on
the gradient have little scope to increase their dissimilarity further. To some extent,
non-metric MDS can compensate for this by the flexibility of its monotonic regression of
distance on dissimilarity (Chapter 5), but arching of the tails of the plot is clearly
inevitable after dissimilarities of 100% are reached.
Fig. 11.3. Bristol Channel zooplankton {B}. Biotic MDS for the 57 sampling sites (1-29,
31-58) mapped in Fig. 3.2, from the same Bray-Curtis similarities on √√-transformed
abundances used for the cluster analysis of Fig. 3.3 (stress = 0.11).

Fig. 11.4. Bristol Channel zooplankton {B}. MDS of Fig. 11.3, with superimposed codes ...
EXAMPLE: Garroch Head macrofauna

The macrofauna samples from the 12 stations on the Garroch Head transect {G} lead to the MDS
plot of Fig. 11.5a. For a change, this is based not on abundance but biomass values
(root-transformed).† Earlier in the chapter, it was seen that the contaminant gradient
induced a marked response in species diversity (Fig. 11.2), and there is an even more
graphic representation of steady community change in the multivariate plot as the dump
centre is approached (stations 1 through to 6), with gradual reversion to the original
community structure on moving away from the centre (stations 6 through to 12).

† Chapter 14 argues that, where it ... times be more biologically relevant than abundance,
though in practice MDS plots from both will be broadly similar, especially under heavy
transformation, as the data tends towards presence/absence (Chapter 9).

The correlation of the biotic pattern with particular contaminant variables is clearly
illustrated by the superimposition technique discussed above: Fig. 11.5b displays the values
of % carbon in the sediment (Table 11.1) as circles of varying diameter, which confirms the
main axis of the biotic MDS as one of increasing organic enrichment. Several of the metal
concentrations from Table 11.1 show a similar
Fig. 11.5. Garroch Head macrofauna {G}. a) MDS of Bray-Curtis similarities from transformed
species biomass data at the 12 stations (Fig. 8.3); b)-d) the same MDS but with superimposed
circles of increasing size with increasing sediment concentration of C, Mn and Pb, from
Table 11.1. (Stress = 0.05). Panel titles: a) Biota, b) + C, c) + Mn, d) + Pb.
pattern, one exception being Mn, which displays a strong gradient in the other direction
(Fig. 11.5c).

In fact, some of the metal and organic variables are so highly correlated with each other
(e.g. compare the plot for Pb in Fig. 11.5d with 11.5b) that there is little point in
retaining all of them in the environmental data matrix. Clearly, when two abiotic variables
are so strongly related (collinear), separate putative effects on the biotic structure could
never be disentangled (their effects are said to be confounded).

... monotonically along the main MDS axis but cannot be responsible for the division, for
example, between sites 1-4 and 7-9. On the other hand, the relation of salinity to the MDS
configuration is non-monotonic (Fig. 11.6c), with larger values for the “middle” groups, but
now providing a contrast between the 1-4 and 7-9 clusters. Some other variables, such as the
height up the shore (Fig. 11.6d), appear to bear little relation to the overall biotic
structure, in that samples within the same faunal groups are frequently at opposite extremes
of the intertidal range.
[Figure: four MDS panels, a) Biota, b) + MPD, c) + Sal, d) + Ht. Caption: ... salinity and
height up the shore of the sampling locations. (Stress = 0.05).]
LINKING BIOTA TO MULTIVARIATE ENVIRONMENTAL PATTERNS

The intuitive premise adopted here is that if the suite of environmental variables
responsible for structuring the community were known¹, then samples having rather similar
values for these variables would be expected to have rather similar species composition, and
an ordination based on ... would group sites in the same ... plot. If key environmental
variables are omitted, the match between the two plots will deteriorate. By the same token,
the match will also worsen if abiotic data which are irrelevant to the community structure
are included.

¹ These might sometimes include ... e.g. when assessing how sediment meiofaunal communities
might be affected by densities of a structuring (critical) macrofaunal species. There is an
implicit assumption here, of course, that the observed sample patterns are not dominated by
internal stochastic forces, e.g. competitive interactions within the assemblage constituting
the biotic data matrix. If they are, the procedure will fail to “explain” the community
structure in terms of the provided set of environmental variables, naturally.

The Exe estuary nematode data again provides an appropriate example. Fig. 11.7a repeats the
species MDS for the 19 sites seen in Fig. 11.6a. The remaining plots in Fig. 11.7 are of
specific combinations of the six sediment variables: H2S, Sal, MPD, %Org, WT and Ht, as
defined above. For consistency of presentation, these plots are also MDS ordinations but
based on an appropriate dissimilarity matrix (Euclidean distance on the normalised abiotic
variables). In practice, since the number of variables is small, and the distance measures
the same, the MDS plots will be largely indistinguishable from PCA configurations (note that
Fig. 11.7b is effectively just a scatter plot, since it involves only two variables).

The point to notice here is the remarkable degree of concordance between biotic and abiotic
plots, especially Figs. 11.7a and c; both group the samples in very similar fashion. Leaving
out MPD (Fig. 11.7b), the (7-9) group is less clearly distinguished from (6, 11) and one
also loses some matching structure in the (12-19) group. Adding variables such as depth of
the water table and height up the shore (Fig. 11.7d), the (1-4) group becomes more widely
spaced than is in keeping with the biotic plot, sample 9 is separated from 7 and 8, sample
14 split from 12 and 13 etc, and the fit again deteriorates. In fact, Fig. 11.7c represents
the best fitting environmental combination, in the sense defined below, and therefore best
“explains” the community pattern.
[Figure: MDS panels, first panel titled Biota.]

For example, in spite of the very low stress in Fig. 11.7, a 2-d Procrustes fit of 11.7a
with 11.7c will be rather poor, since the (5, 10) and (12-19) groups are interchanged
between the plots. Yet, the interpretation of the two analyses is fundamentally the same
(five clusters, with the (5, 10) group out on a limb etc). This match will probably be
better in 3-d but will be fully expressed, without arbitrary dimensionality constraints, in
the underlying similarity matrices.

This is so defined by Clarke and Ainsworth (1993) because it is algebraically related to the
average of the harmonic mean of each (r_i, s_i) pair. The denominator term, r_i + s_i,
down-weights the contribution of large ranks; these are the low similarities, the highest
similarity corresponding to the lowest value of rank similarity (1), as usual. Note that ρw
and τ tend to give consistently lower values than ρs for the same match; nothing should
therefore be inferred from a comparison of absolute values of ρs, τ and ρw.
[Fig. 11.8, schematic. Box labels: Samples; Species numbers/biomass; Bray-Curtis;
Dissimilarities between samples; MDS ordination; Rank correlation.]
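The matching scheme can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the PRIMER BIO-ENV routine: it uses the ordinary Spearman coefficient (computed on the elements of the two triangular matrices) rather than the weighted form, Bray-Curtis dissimilarity on untransformed counts for the biota, Euclidean distance on normalised variables for the environment, and an exhaustive search over all subsets of variables. The data, the variable names (`grad`, `noise`, `depth`) and all function names are hypothetical.

```python
from itertools import combinations

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2.0 + 1.0
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def normalise(column):
    n = len(column)
    mean = sum(column) / n
    sd = (sum((v - mean) ** 2 for v in column) / (n - 1)) ** 0.5
    return [(v - mean) / sd for v in column]

def bray_curtis_dissimilarity(a, b):
    return 100.0 * sum(abs(p - q) for p, q in zip(a, b)) / sum(p + q for p, q in zip(a, b))

def euclidean(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5

def pairwise(samples, measure):
    """Elements of the triangular matrix, in a fixed order of sample pairs."""
    n = len(samples)
    return [measure(samples[i], samples[j]) for i in range(n) for j in range(i + 1, n)]

def bio_env(biota, env):
    """Rank every subset of environmental variables by its match to the biotic pattern."""
    biotic = pairwise(biota, bray_curtis_dissimilarity)
    z = {name: normalise(column) for name, column in env.items()}
    names = sorted(z)
    results = []
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            points = [[z[name][s] for name in subset] for s in range(len(biota))]
            abiotic = pairwise(points, euclidean)
            results.append((spearman(biotic, abiotic), subset))
    results.sort(key=lambda item: item[0], reverse=True)
    return results

# Hypothetical data: 4 samples x 2 species, and 3 environmental variables.
biota = [[10, 0], [8, 2], [3, 7], [0, 10]]
env = {"grad": [1, 2, 5, 7], "noise": [3, 9, 1, 4], "depth": [2, 2.5, 1, 3]}

results = bio_env(biota, env)
for rho, subset in results[:3]:
    print(f"{rho:5.2f}  {', '.join(subset)}")
```

In this invented example the species counts were constructed to follow the `grad` variable, so that variable alone gives the best match; adding the irrelevant variables degrades the correlation, which is the behaviour the premise of this section leads one to expect.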
The constant terms are defined such that, in both (11.3) and (11.4), ρ lies in the range
(-1, 1), with the extremes of ρ = -1 and +1 corresponding to the cases where the two sets of
ranks are in complete opposition or complete agreement, though the former is unlikely to be
attainable in practice because of the constraints inherent in a similarity matrix. Values of
ρ around zero correspond to the absence of match between the two patterns, but typically ρ
will be positive. It is tempting, but wholly wrong, to refer ρ to standard statistical
tables of Spearman’s rank correlation, to assess whether two patterns are “significantly”
matched (ρ > 0). This is invalid because the ranks {r_i} (or {s_i}) are not mutually
independent variables, since they are based on a large number (N) of strongly interdependent
similarity calculations.

In itself, this does not compromise the use of ρs as an index of agreement of the two
triangular matrices. However, it could be less than ideal because few of the
equally-weighted difference terms in equation (11.3) involve “nearby” samples. In contrast,
the premise at the beginning of this section makes it clear that we are seeking a
combination of environmental variables which attains a good match of the high similarities
(low ranks) in the biotic and abiotic matrices. The value of ρs, when computed from
triangular similarity matrices, will tend to be swamped by the larger number of terms
involving distant pairs of samples, contributing large squared differences in (11.3). This
motivates the down-weighting denominator term in (11.4). However, experience suggests that,
typically, this modification affects the outcome only marginally and, in the interests of
simplicity of explanation, the well-known Spearman coefficient may be preferred.

The BIO-ENV procedure

The matching of biotic to environmental patterns can now take place†, as outlined
schematically in Fig. 11.8. Combinations of the environmental variables are considered at
steadily increasing levels of complexity, i.e. k variables at a time (k = 1, 2, 3, ..., v).
Table 11.2 displays the outcome for the Exe estuary nematodes.

† This is implemented in the PRI... performs a full search, up to a fixed ...

Table 11.2. Exe estuary nematodes ... environmental variables, taken k at ... of biotic and
abiotic similarity ... by weighted Spearman rank corr... ρw; ... overall optimum. See
earlier text ...

k   Best variable combinations (ρw)
1   H2S (.62)    %Org (.54)    Sal (.53)    ...
2   H2S, Sal (.76)    H2S, MPD (.67)    H2S, %Org    Sal, %Org    ...
3   H2S, Sal, MPD (.80)    H2S, Sal, %Org (...)    H2S, Sal, WT    ...
4   H2S, Sal, MPD, %Org (.79)    H2S, Sal, MPD, Ht    ...
5   H2S, Sal, MPD, %Org, Ht (.79)    ...
6   H2S, Sal, MPD, %Org, Ht, WT (.77)

The single abiotic variable which best groups the sites, in a manner consistent with the
faunal patterns, is the depth of the H2S layer (ρw = 0.62); next best is the organic content
(ρw = 0.54), etc. O... the faunal ordination is not essentially 1-dimensional
(Fig. 11.7a), it would not be expected that a single environmental variable would provide a
very successful match, though knowledge of the H2S variable alone does distinguish points to
the left and right of Fig. 11.7a (samples 1 to 4 and 6 to 9 have lower values than for
samples 5, 10 and 12 to 19, with sample 11 intermediate).

The best 2-variable combination also involves depth of the H2S layer but adds the
interstitial salinity. The correlation (ρw = 0.76) is markedly better than for any other
2-variable subset, and this is the combination shown in Fig. 11.7b. The best 3-variable
combination retains these two but adds the median particle diameter, and gives the overall
optimum value for ρw of 0.80 (Fig. 11.7c); ρw drops slightly to 0.79 for the best 4- and
higher-way combinations. The results in Table 11.2 do therefore seem to accord with the
visual impressions in Fig. 11.7.† In this case, the first column of Table 11.2 has a
hierarchical structure: the best combination at one level is always a subset of the best
combination on the line below. This is not guaranteed (although it seems to happen
surprisingly often) since all combinations have been evaluated and simply ranked.

An exhaustive search over v variables involves

    \sum_{k=1}^{v} \frac{v!}{k!\,(v-k)!} = 2^{v} - 1        (11.5)

combinations, i.e. 63 for the Exe estuary study (v = 6), though this number quickly becomes
prohibitive when v is larger than about 15. Above that level, one could consider stepwise
(and related) procedures which search in a more hierarchical fashion, adding and deleting
variables one at a time (this is implemented in the BVSTEP procedure of Chapter 16). In
practice though, it may be desirable to limit the scale of the search initially, for a
number of reasons, e.g. always to include a variable known from previous experience or
external information to be potentially causal. Alternatively, as discussed earlier, scatter
plots of the environmental variables may demonstrate that some are highly inter-correlated
and nothing in the way of improved “explanation” could be achieved by entering them all into
the analysis.

An example is given by the Garroch Head macrofauna study {G}, for which the 11 abiotic
variables of Table 11.1 are first transformed, to validate the use of Euclidean distances
and standard product-moment correlations (page 11-2). As indicated earlier, choice of
transformations is aided by a draftsman plot, i.e. scatter plots of all pairwise
combinations of variables, Fig. 11.9. Here, this is after all the concentration variables,
but not water depth, have been log transformed*, in line with the recommendations on page
11-2.

The draftsman plot, and the associated correlation matrix between all pairs of variables,
can then be examined for evidence of collinearity (page 11-5), indicated by straight-line
relationships, with little scatter, in Fig. 11.9. A further rule-of-thumb would be to reduce
all subsets of (transformed) variables which have mutual correlations averaging more than
about 0.95 to a single representative. This suggests that C, Cu, Zn and Pb are so highly
inter-correlated that it would serve no useful purpose to leave them all in the BIO-ENV
analysis. For every good match that included C, there would be equally good matches
including Cu, Zn or Pb, leading to a plethora of effectively identical solutions. Here, the
organic carbon load (C) is retained and the other three excluded, leaving 8 abiotic
variables in the full BIO-ENV search. This results in an optimal match of the biotic pattern
with C, N and Cd (ρw = 0.78). The corresponding ordination plots are seen in Fig. 11.10. The
biotic MDS of Fig. 11.10a, though structured mainly by a single strong gradient towards the
dump centre (e.g. the organic enrichment gradient seen in Fig. 11.10b), is not wholly
1-dimensional. Additional information, on a heavy metal, appears to improve the
“explanation”.

CONCLUDING REMARKS

Further examples of the BIO-ENV procedure are given in Clarke and Ainsworth (1993), Clarke
(1993), Somerfield et al (1994) and many subsequent applications. For a series of data sets
on impacts on benthic macrofauna around N Sea oil rigs, Olsgard et al (1997, 1998) use the
BIO-ENV procedure in a particularly interesting way. They examine which transformations
(Chapter 9) and what level o f taxonomic aggregation
f This will not always be the case if the 2-d faunal ordination has (Chapter 10) tend to maximise the BIO-ENV correl
non-negligible stress. It is the matching o f the similarity matrices
which is definitive, although it would usually be a good idea to
ation, p. The hypotheses examined are that certain
plot the abiotic ordination fo r the best combination at each value
o f k, in order to gauge the effect o f a small change in p on the i
This actually uses a log(c+x) transformation where c is a constant
interpretation. Experience suggests that combinations giving the such as 1 or 0.1. The necessity fo r this, rather than a simple log(x)
same value o f p to two decimal places do not give rise to ordinations transform, comes from the zero values fo r the Cd concentrations
which are distinguishable in any practically important way, thus in Table 11.1, log(0) being undefined. A useful rule-of-thumb here
it is recommended that p is quoted only to this accuracy, as in is to set the constant c to the lowest non-zero measurement, or the
Table 11.2. concentration detection limit.
Chapter 11
page 11-10
[Draftsman plot panels. Rows: log Mn, log Co, log Ni, log Zn, log Cd, log Pb, log Cr, Depth, log %C, log %N. Columns: log Cu, log Mn, log Co, log Ni, log Zn, log Cd, log Pb, log Cr, Depth, log %C.]

Fig. 11.9. Garroch Head macrofauna {G}. Draftsman plot (all possible pairwise scatter plots) for the 11 abiotic variables recorded at 12 sampling stations across the sewage sludge dumpsite. All variables except water depth have been log transformed.
parts of the community, on the spectrum of rare to common species, may delineate the underlying impact gradient more clearly (see page 9-4), as may some taxonomic levels, higher than species (see page 10-2).

Another question which naturally arises is the extent to which the conclusions from BIO-ENV can be supported by significance tests. This is problematic given the lack of model assumptions underlying this procedure, which can be seen as both a strength (generality, ease of understanding, simplicity of interpretation) and a weakness (lack of a structure for formal statistical inference). A simple RELATE test is available (see Chapter 15) of the null hypothesis that there is no relationship between the biotic information and a specific abiotic pattern, i.e. that ρ is effectively zero. This can be examined by a permutation or randomisation test, of a type met previously in Chapter 6 (Fig. 6.9), in which ρ is recomputed for all (or a random subset of) permutations of the sample labels in one of the two underlying similarity matrices. As usual, if the observed value of ρ exceeds that found in 95% of the simulations, which by definition correspond to unrelated ordinations, then the null hypothesis can be rejected at the 5% level.

Note however that this is not a valid procedure if the abiotic set being tested against the biotic pattern is the result of optimal selection by the BIO-ENV procedure, on the same data. For v variables, this is implicitly equivalent to carrying out 2^v - 1 null hypothesis tests, each of which potentially runs a 5% risk of Type 1 error (rejecting the null hypothesis when it is really true). This rapidly becomes a very large number of tests as v increases, and a naïve RELATE test on the optimal combination is almost certain to indicate a ‘significant’ biotic-abiotic relation, even with entirely random data sets! However, the null hypothesis of ‘no relation’ is often not tenable from the start and testing is then rather irrelevant: the BIO-ENV procedure is best thought of as an exploratory tool. A more convincing confirmatory strategy is to use an initial set of data to suggest an optimal combination of abiotic variables, and an independent data set, utilising only that reduced number of variables, in a RELATE test or a second BIO-ENV analysis. If there were any variables featuring marginally and arbitrarily in the first run of BIO-ENV, they would be unlikely to do so again on the second run.

Design

Two final points can be made about the sampling design. The general subject of experimental and field survey design is an immense one, which cannot be covered here. It is also a problematic area for many of the (non-parametric) multivariate techniques because the lack of formal model structures makes it difficult to define power of statistical procedures, such as the randomisation tests described above and in Chapters 6 and 15. In the context of linking biotic and abiotic patterns, it is intuitively clear that this has the greatest prospect of success if there are a moderately large number of sample conditions, and the closest possible matching of environmental with biological data. In the case of a number of replicates from each of a number of sites, this could imply that the biotic samples, which would be well-separated in order to represent genuine variation at a site, would each have a closely-matched environmental replicate.

Another lesson of the earlier Garroch Head example is the difficulty of drawing conclusions about causality from any observational study. In that case, a subset of abiotic variables were so highly correlated with each other that it was desirable to omit all but one of them from the computations. There may sometimes be good external reasons for retaining a particular member of the set but, in general, one of them is chosen arbitrarily as a proxy for the rest (e.g. in the Garroch Head data, C was a proxy for the highly inter-correlated set C, Cu, Zn, Pb). If that variable does appear to be linked to the biotic pattern then any member of the subset could be implicated, of course. More importantly, there cannot be a definitive causal implication here, since each retained variable is also a proxy for any potentially causal variable which correlates highly with it, but remains unmeasured. Clearly, in an environmental impact study, a design in which the main pollution gradient (e.g. chemical) is highly correlated with variations in some natural environmental measures (e.g. salinity, sediment structure), cannot be very informative, whether the latter variables are measured or not. A desirable strategy, particularly for the non-parametric multivariate analyses considered here, is to limit the influence of important natural variables by attempting to select sites which have the same environmental conditions but a range of contaminant impacts (including control sites of course). Even then, in a purely observational study one can never entirely escape the stricture that any apparent change in community, with changing pollution impact, could be the result of an unmeasured natural variable with which the contaminant levels happen to correlate. Such issues of causality motivate the following chapter on experimental approaches.
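
The exhaustive BIO-ENV search discussed in this chapter can be sketched in a few lines. This is only an illustration under stated assumptions, not the PRIMER routine: it uses the plain (unweighted) Spearman rank correlation rather than ρw, assumes the abiotic variables are normalised before Euclidean distances are taken, and takes the biotic dissimilarities as a ready-made list. All function names (`log_c`, `normalise`, `bioenv`, etc.) and the small data set are hypothetical.

```python
from itertools import combinations
from math import log, sqrt

def log_c(values):
    """log(c + x), with c set to the lowest non-zero value (one rule-of-thumb)."""
    c = min(v for v in values if v > 0)
    return [log(c + v) for v in values]

def normalise(values):
    """Centre on the mean and divide by the sample standard deviation (assumed step)."""
    mean = sum(values) / len(values)
    sd = sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))
    return [(v - mean) / sd for v in values]

def ranks(values):
    """Ranks 1..n, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            out[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return out

def spearman(x, y):
    """Unweighted Spearman correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)

def distances(env, subset):
    """Euclidean distances between all pairs of samples, over the chosen variables."""
    n = len(env[subset[0]])
    return [sqrt(sum((env[v][i] - env[v][j]) ** 2 for v in subset))
            for i, j in combinations(range(n), 2)]

def bioenv(biotic, env):
    """Evaluate all 2^v - 1 subsets; return (rho, subset) pairs, best first."""
    z = {name: normalise(col) for name, col in env.items()}
    names = sorted(z)
    results = [(spearman(biotic, distances(z, s)), s)
               for k in range(1, len(names) + 1)
               for s in combinations(names, k)]
    return sorted(results, key=lambda r: r[0], reverse=True)

# Hypothetical data: 5 samples, 3 abiotic variables; the biotic dissimilarities
# are constructed to follow variable "a" alone.
env = {"a": [0, 1, 3, 7, 15], "b": [3, 1, 4, 1, 5], "c": [2, 7, 1, 8, 2]}
biotic = [abs(env["a"][i] - env["a"][j]) for i, j in combinations(range(5), 2)]
results = bioenv(biotic, env)
best_rho, best_subset = results[0]
```

Because every subset is evaluated and simply ranked, the list returned by `bioenv` has 2^v - 1 entries; taking only its first entry and then testing it on the same data is exactly the selection problem described above.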
In Chapter 11 we have seen how both the univariate and multivariate community attributes can be correlated with natural and anthropogenic environmental variables. With careful sampling design, these methods can provide strong evidence as to which environmental variables appear to affect community structure most, but they cannot actually prove cause and effect. In experimental situations we can investigate the effects of a single factor (the treatment) on community structure, while other factors are held constant or controlled, thus establishing cause and effect. There are three main categories of experiments that can be used:

1) ‘Natural experiments’. Nature provides the treatment, i.e. we compare places or times which differ in the intensity of the environmental factor in question.

2) Field experiments. The experimenter provides the treatment, i.e. environmental factors (biological, chemical or physical) are manipulated in the field.

3) Laboratory experiments. Environmental factors are manipulated by the experimenter in laboratory mesocosms or microcosms.

The degree of ‘naturalness’ (hence realism) decreases from 1-3, but the degree of control which can be exerted over confounding environmental variables increases from 1-3.

In this chapter, each class of experiments is illustrated by a single example. Unfortunately all these concern the meiobenthos, since this component of the biota is very amenable to community level experiments (see Chapter 13), whereas experiments with other components of the marine biota have mainly been concerned with populations of individual species, rather than communities.

In all cases care should be taken to avoid pseudoreplication, i.e. the treatments should be replicated, rather than a series of ‘replicate’ samples taken from a single treatment (pseudoreplicates, e.g. Hurlbert, 1984). This is because other confounding variables, often unknown, may also differ between the treatments. It is also important to run experiments long enough for community changes to occur; this favours components of the fauna with short generation times (see Chapter 13).

It should be made clear at the outset that the treatment of experiments in this chapter is somewhat cursory. The subject of ecological experiments requires a book of its own, indeed it gets an excellent one in Underwood (1998). The latter, however, in common with most biologically-oriented texts on experiments, is almost entirely concerned with univariate analysis of attributes (a population abundance, a diversity measure, etc). Experiments with multiple outcomes which are analysed by multivariate methods are still far from commonplace, though becoming more evident in recent literature (for papers with a methodological bent see, for example, Anderson 2001a,b, Chapman and Underwood, 1999, Krzanowski (in press), Legendre and Anderson, 1999, McArdle and Anderson, 2001, Underwood and Chapman, 1998).

‘NATURAL EXPERIMENTS’

It is doubtful whether so called ‘natural experiments’ deserve to be called ‘experiments’ at all, and not simply well-designed field surveys, since they make comparisons of places or times which differ in the intensity of the particular environmental factor under consideration. The obvious logical flaw with this approach is that its validity rests on the assumption that places or times differ only in the intensity of the selected environmental factor (treatment); there is no possibility of randomly allocating treatments to experimental units, the central tool of experimentation and one that ensures that the potential effects of unmeasured, uncontrolled variables are averaged out across the experimental groups. Design is often a problem, but statistical techniques such as two-way ANOVA, e.g. Sokal and Rohlf (1981), Underwood (1981), or two-way ANOSIM (Chapter 6), may enable us to examine the treatment effect allowing for differences between sites, for example. This is illustrated in the first example below.

In some cases natural experiments may be the only possible approach for hypothesis testing in community ecology, because the attribute of community structure under consideration may result from evolutionary rather than ecological mechanisms and we obviously cannot conduct manipulative field or laboratory experiments over evolutionary time. One example of a community attribute which may be determined by evolutionary mechanisms relates to size spectra in marine benthic communities. Several hypotheses, some complementary and some contradictory, have
Chapter 12
page 12-2
Table 12.1. Tasmania, Eaglehawk Neck {T}. Mean values per core sample of univariate measures for nematodes, copepods and total meiofauna (nematodes + copepods) in the disturbed and undisturbed areas. The significance levels for differences are from a two-way ANOVA, i.e. they allow for differences between blocks, although these were not significant at the 5% level.
Univariate indices. The significance of differences between disturbed and undisturbed samples (treatments) was tested with two-way ANOVA (blocks/treatments), Table 12.1. For the nematodes, species richness, Shannon diversity and evenness were significantly reduced in disturbed as opposed to undisturbed areas, although total abundance was unaffected. For the copepods, however, there were no significant differences in any of these univariate measures.

Graphical/distributional plots. k-dominance curves (Fig. 12.2) also revealed significant differences in the relative species abundance distributions for nematodes (using both the ANOVA and ANOSIM-based tests referred to briefly at the end of Chapter 8, and detailed in Clarke, 1990). For the copepods, however, (plots given in Chapter 13, Fig. 13.4), k-dominance curves are intermingled and crossing, and there is no significant treatment effect.

Fig. 12.3. Tasmania, Eaglehawk Neck {T}. MDS configurations for nematode, copepod and ‘meiofauna’ (nematode + copepod) abundance (root-transformed). Different shapes represent the four blocks of samples. Open symbols = undisturbed, filled = disturbed (stress = 0.12, 0.09, 0.11 respectively).

Multivariate ordinations. MDS revealed significant differences in species composition for both nematodes and copepods: the effects of crab disturbance were similar within each block and similar for nematodes and copepods. Note the similarities in Fig. 12.3 between the nematode and copepod configurations: both disturbed samples within each block are above both of the undisturbed samples (except for one block for the
Chapter 12
page 12-4
copepods), and the blocks are arranged in the same sequence across the plot. For both nematodes and copepods, two-way ANOSIM shows a significant effect of both treatment (disturbance) and blocks, Table 12.2, but the differences are more marked for the nematodes (with higher values of the R statistic).

Conclusions. Univariate indices and graphical/distributional plots were only significantly affected by crab disturbance for the nematodes. Multivariate analysis revealed a similar response for nematodes and copepods (i.e. it seems to be a more sensitive measure of community change). In multivariate analyses, natural variations in species composition across the beach (i.e. between blocks) were about as great as those between treatments within blocks, and the disturbance effect would not have been clearly evidenced without this blocked sampling design.

FIELD EXPERIMENTS

Field manipulative experiments include, for example, caging experiments to exclude or include predators, controlled pollution of experimental plots, and big-bag experiments with plankton. Their use has been predominantly for (univariate) population rather than community studies, although multivariate analysis of manipulative experiments is becoming more widespread (see, for example, Anderson and Underwood, 1997, Morrisey et al, 1996, Gee and Somerfield, 1997, Austen and Thrush, 2001). The following example is one in which univariate, graphical and multivariate statistical analyses have been applied to meiobenthic communities.

Azoic sediment recolonisation experiment with predator exclusion {Z}

Olafsson and Moore (1992) studied meiofaunal colonisation of azoic sediment in a variety of cages designed to exclude epibenthic macrofauna to varying degrees: A - 1 mm mesh cages designed to exclude all macrofauna; B - 1 mm control cages with two ends left open; C - 10 mm mesh cages to exclude only larger macrofauna; D - 10 mm control cages with two ends left open; E - open unmeshed cages; F - uncaged background controls. Three replicates of each treatment were sampled after 1 month, 3 months and 8 months and analysed for nematode and harpacticoid copepod species composition.

Fig. 12.4. Azoic sediment recolonisation experiment {Z}. MDS configuration for harpacticoid copepods (4th root transformed abundances) after 1, 3 and 8 months, with 6 different treatments (A-F), see text (stress = 0.07).

Univariate indices. The presence of cages had a more pronounced impact on copepod diversity than nematode diversity. For example, after 8 months, H′ and J′ (but not S) for copepods had significantly higher values inside the exclusion cages than in the control cages with the ends left open, but for the nematodes, differences in H′ were of borderline significance (p = 5.3%).

Graphical/distributional plots. No significant treatment effect for either nematodes or copepods could be detected between k-dominance curves for all sampling dates, using the ANOSIM test referred to at the end of Chapter 8.

Multivariate analysis. For the harpacticoid copepods there was a clear successional pattern of change in community composition over time (Fig. 12.4), but no such pattern was obvious for the nematodes. Fig. 12.4 uses data from Table 2 in Olafsson and Moore’s paper, which are for the 15 most abundant harpacticoid species in all treatments and for the mean abundances of all replicates within a treatment on each sampling date. On the basis of these data, there is no significant treatment effect using the 2-way ANOSIM test for one replicate per cell¶ (see page 6-10), but the fuller replicated data may have been more revealing.

¶ This is the PRIMER ANOSIM2 test, which will be uninformative in the presence of sizeable treatment/time interactions, a likely possibility here.
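
The univariate indices S, H′ and J′ and the k-dominance curves referred to in these examples can be computed from a single sample of species counts as sketched here. This assumes natural logarithms for H′ and Pielou's form J′ = H′/log S for evenness, neither of which is stated in this section; the function names and the counts used are hypothetical.

```python
from math import log

def diversity(counts):
    """Return (S, H', J'): species richness, Shannon diversity (natural log,
    an assumed base) and Pielou evenness J' = H' / log(S)."""
    present = [c for c in counts if c > 0]
    total = sum(present)
    s = len(present)
    h = -sum((c / total) * log(c / total) for c in present)
    j = h / log(s) if s > 1 else float("nan")  # evenness undefined for one species
    return s, h, j

def k_dominance(counts):
    """Cumulative percentage abundance against species rank, most abundant first."""
    present = sorted((c for c in counts if c > 0), reverse=True)
    total = sum(present)
    curve, running = [], 0
    for c in present:
        running += c
        curve.append(100.0 * running / total)
    return curve

# Hypothetical counts for two samples
even_sample = [10, 10, 10, 10]
dominated_sample = [50, 30, 20, 0]
```

A more elevated k-dominance curve (one that approaches 100% after fewer species) indicates higher dominance and therefore lower diversity.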
Chapter 12
page 12-5
Fig. 12.5. Nutrient enrichment experiment {N}. k-dominance curves (x-axis: species rank) for nematodes, total copepods and copepods omitting the ‘weed’ species of Tisbe, for summed replicates of each treatment, C = control, L = low and H = high dose.

Fig. 12.6. Nutrient enrichment experiment {N}. MDS of √√-transformed abundances of nematodes, copepods and total meiofauna (nematodes + copepods). C = control, L = low dose, H = high dose (stress = 0.18, 0.09, 0.12).
assemblages in the enriched boxes was the presence, in highly variable numbers, of several species of the large epibenthic harpacticoid Tisbe, which are ‘weed’ species often found in old aquaria and associated with organic enrichment. If this genus is omitted from the analysis, a clear sequence of increasing elevation of the k-dominance curves is evident from control to high dose boxes.

Multivariate analysis. Fig. 12.6 shows that, in an MDS of √√-transformed species abundance data, there is no obvious discrimination between treatments for the nematodes. In the ANOSIM test (Table 12.4) the values of the R statistic in pairwise comparisons between treatments are low (0.2-0.3), but there is a significant difference between the low dose treatment and the control, at the 5% level. For the copepods, there is a clear separation of treatments on the MDS, the R statistic values are much higher (0.6-1.0), and there are significant differences in community structure between all treatments.

Table 12.4. Nutrient enrichment experiment {N}. Values of the R statistic from the ANOSIM test, in pairwise comparisons between treatments, together with significance levels. C = control, L = low dose, H = high dose.

Treatment     Statistic value (R)     % Sig. level
Nematodes
(L,C)         0.27                    2.9
(H,C)         0.22                    5.7
(H,L)         0.28                    8.6
Copepods
(L,C)         1.00                    2.9
(H,C)         0.97                    2.9
(H,L)         0.59                    2.9

Conclusions. The univariate and graphical/distributional techniques show lowered diversity with increasing dose for copepods, but no effect on nematodes. The multivariate techniques clearly discriminate between treatments for copepods, and still have some discriminating power for nematodes. Clearly the copepods have been much more strongly affected by the treatments in all these analyses, but changes in the nematode community may not have been detectable because of the great variability in abundance of nematodes in the high dose boxes. The responses observed in the mesocosm were similar to those sometimes observed in the field where organic enrichment occurs. For example, there was an increase in abundance of epibenthic copepods (particularly Tisbe spp.) resulting in a decrease in the nematode/copepod ratio. In this experiment, however, the causal link is closer to being established, though the possible constraints and artefacts inherent in any laboratory mesocosm study should always be borne in mind (see, for example, the discussion in Underwood and Peterson, 1988).
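
The significance levels in Table 12.4 (2.9%, 5.7%, 8.6%) are what an exhaustive permutation test would give with 35 distinct relabellings, i.e. 1/35, 2/35 and 3/35. That count arises if each pairwise comparison involves two groups of four replicates, which is an assumption here since the replicate number is not stated in this section. The sketch below uses the usual ANOSIM definition R = (mean between-group rank - mean within-group rank) / (M/2), with M = n(n-1)/2; the function names and the toy dissimilarities are hypothetical.

```python
from itertools import combinations
from math import comb

def average_ranks(values):
    """Ranks 1..n, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            out[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return out

def anosim_r(dissim, groups):
    """ANOSIM R from a square dissimilarity matrix and one group label per sample."""
    pairs = list(combinations(range(len(groups)), 2))
    rk = average_ranks([dissim[i][j] for i, j in pairs])
    within = [r for r, (i, j) in zip(rk, pairs) if groups[i] == groups[j]]
    between = [r for r, (i, j) in zip(rk, pairs) if groups[i] != groups[j]]
    return (sum(between) / len(between) - sum(within) / len(within)) / (len(pairs) / 2)

def pairwise_test(dissim, groups):
    """Exhaustive permutation test for two equal-sized groups.
    Every split is visited twice (labels swapped), which leaves the proportion unchanged."""
    n = len(groups)
    observed = anosim_r(dissim, groups)
    hits = total = 0
    for chosen in combinations(range(n), n // 2):
        relabelled = ["A" if i in chosen else "B" for i in range(n)]
        total += 1
        if anosim_r(dissim, relabelled) >= observed - 1e-12:
            hits += 1
    return observed, hits / total

# Toy example: two perfectly separated groups of 4 samples on a line
positions = [0, 1, 2, 3, 10, 11, 12, 13]
dissim = [[abs(a - b) for b in positions] for a in positions]
r_obs, p_value = pairwise_test(dissim, ["C"] * 4 + ["L"] * 4)

distinct_splits = comb(8, 4) // 2
attainable_levels = [round(100 * k / distinct_splits, 1) for k in (1, 2, 3)]
```

Under this assumption 2.9% is the smallest significance level attainable, so the copepod comparisons in Table 12.4 are as significant as the design allows.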
Chapter 13
page 13-1
PLANKTON

The advantages of plankton are that:

a) Long tows over relatively large distances result in community samples which reflect integrated ecological conditions over large areas. They are therefore useful in monitoring more global changes.

b) Identification of macro-planktonic organisms is moderately easy, because of the ready availability of appropriate literature.

The disadvantage of plankton is that, because the water masses in which they are suspended are continually mobile, they are not useful for monitoring the local effects of a particular pollutant source.
The disadvantages of fish are that:

a) Strictly quantitative sampling which is equally representative of all the species in the community is difficult. The overall catching efficiency of nets, traps etc. is often unknown, as are the differing abilities of species to evade capture or their susceptibility to be attracted to traps. Visual census methods are also not free from bias, since some species will be more conspicuous in colouration or behaviour than other dull secretive species.

b) Uncertainty about site fidelity is usually, but not always, a problem.

Example: Maldives coral reef-fish

For a study in the Maldive islands, Dawson-Shepherd et al (1992) used visual census methods to compare reef-fish assemblages at 23 coral reef-flat sites, 11 of which had been subjected to coral mining for the construction industry and 12 were non-mined controls. The MDS (Fig. 13.2) clearly distinguished mined from non-mined sites.

MACROBENTHOS

The advantages of soft-bottom macrobenthos are that:

a) They are relatively non-mobile and are therefore useful for studying the local effects of pollutants.

b) Their taxonomy is relatively easy.

c) Quantitative sampling is relatively easy.

d) There is an extensive research literature on the effects of pollution, particularly organic enrichment, on macrobenthic communities, against which specific case-histories can be evaluated.

This combination of advantages has resulted in the soft-bottom macrobenthos being probably the most widely used component of the marine biota in environmental impact studies. Despite this, they do have several disadvantages:

a) Relatively large-volume sediment samples must be collected, so that sampling requires relatively large research ships.

b) Because it is generally not practicable to bring large volumes of sediment back to the laboratory for processing, sieving must be done at sea and is rather labour intensive and time consuming (therefore expensive).

c) The potential response time of the macrobenthos to a pollution event is slow. Their generation times are measured in years, so that although losses of species due to pollution may take immediate effect, the colonisation of new species which may take advantage of the changed conditions is slow. Thus, the full establishment of a community characterising the new environmental conditions may take several years.

d) The macrobenthos are generally unsuitable for causality experiments in mesocosms, because such experiments can rarely be run long enough for fully representative community changes to occur, and recruitment of species to mesocosm systems is often a problem because of their planktonic larval stages (see Chapter 12).

Example: Amoco Cadiz oil-spill

The sensitivity of macrobenthic community structure to pollution events, when using multivariate methods of data analysis, is discussed in Chapter 14. The response of the macrobenthos in the Bay of Morlaix to the Amoco Cadiz oil-spill some 40 km away, described in Chapter 10, is a good example of this (Fig. 13.3).
MEIOBENTHOS
Fig. 13.5. Hamilton Harbour, Bermuda {H}. k-dominance curves (x-axis: species rank) for macrobenthos (left) and meiobenthic nematodes (right) at six stations (H2-H7).
Chapter 13
page 13-5
HARD-BOTTOM MOTILE FAUNA

The motile fauna living on rocky substrates and associated with algae, holdfasts, hydroids etc. has rarely been used in pollution impact studies because of its many disadvantages:

a) Remote sampling is difficult.

b) Quantitative extraction from the substrate, and comparative quantification of abundances between different substrate types, are difficult.

c) Responses to perturbation are largely unknown.

d) A suitable habitat (e.g. algae) is not always available. A solution to this problem, and also problem (b),

ATTRIBUTES

Species abundance data are by far the most commonly used in environmental impact studies at the community level. However, the abundance of a species is perhaps the least ecologically relevant measure of its relative importance in a community, and we have already seen in Chapter 10 that higher taxonomic levels than species may be sufficient for environmental impact analyses. So, when planning a survey, consideration should be given not only to the number of stations and number of replicates to be sampled, but also to the level of taxonomic discrimination which will be used, and which measure(s) of the relative importance of these taxa will be made.
Fig. 13.7. Isles of Scilly seaweed fauna {S}. MDS of standardised √√-transformed meiofauna and macrofauna species abundance data. The five seaweed species (Chondrus, Laurencia, Lomentaria, Cladophora, Polysiphonia) are indicated by different symbol and shading conventions (stress = 0.19, 0.18).
Chapter 13
page 13-6
[Fig. 13.8 panel titles: Abundance, Biomass]

for soft-sediment and water-column sampling

As a measure of the relative ecological importance of species, biomass is better than abundance, and production in turn is better than biomass. However, the determination of annual production of all species within a community over a number of sites or times would be so time consuming as to be completely impracticable.¶ We are therefore left with the alternatives of studying abundances, biomasses, or both. Abundances are marginally easier to measure, biomass may be a better reflection of ecological importance, and measurement of both abundance and biomass opens the possibility of comparing species-by-sites matrices based on these two different measures (e.g. by the ABC method discussed in Chapter 8).

In practice, multivariate analyses of abundance and biomass data often give remarkably similar results, despite the fact that the species mainly responsible for discriminating between stations are usually different. In Fig. 13.8, for example, the Frierfjord macrobenthos MDS configurations for abundance and biomass are very similar but it is small polychaete species which are mainly responsible for discriminating between sites on the basis of abundance, and species such as the large echinoid Echinocardium cordatum which discriminate the sites on the basis of biomass.

We have already seen in Chapter 10 that, in many pollution-impact studies, it has been found for both graphical and multivariate analyses that there is surprisingly little loss of information when the species data are aggregated to higher taxa, e.g. genera, families or sometimes even phyla. For the detection of pollution impact, initial collection of data at the level of higher taxa would result in a considerable saving of time (and cost) in the analysis of samples. Such a strategy would, of course, be quite inappropriate if the objective were to be differently defined, for example, the quantification of biodiversity properties.

RECOMMENDATIONS

It is difficult to give firm recommendations as to which components or attributes of the biota should be studied, since this depends on the problem in hand and the expertise and funds available. In general, however, the wider the variety of components and attributes studied, the easier the results will be to interpret. A broad approach at the level of higher taxa is often preferable to a painstakingly detailed analysis of species abundances. If only one component of the fauna is to be studied, then consideration should be given to working up a larger number of stations/replicates at the level of higher taxa in preference to a small number of stations at the species level. Of course, a large number of replicated stations at which both abundance and biomass are determined at the species level is always the ideal!

¶ Although relative “production” of species can be approximated using empirical relationships between biomass, abundance and production, and these “production” matrices subjected to multivariate analysis, see Chapter 15.
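
Aggregation of a species-by-samples matrix to a higher taxonomic level, as discussed in this chapter, amounts to summing the rows of all species that share the same higher taxon. The following is a minimal sketch of that step; the function name `aggregate`, the species and family labels, and the counts are all hypothetical placeholders rather than data from any study in this manual.

```python
from collections import defaultdict

def aggregate(abundance, taxonomy):
    """Sum the rows of a species-by-samples table within each higher taxon.

    abundance: {species: [count in sample 1, count in sample 2, ...]}
    taxonomy:  {species: name of its higher taxon (e.g. family)}
    """
    n_samples = len(next(iter(abundance.values())))
    totals = defaultdict(lambda: [0] * n_samples)
    for species, row in abundance.items():
        taxon = taxonomy[species]
        totals[taxon] = [a + b for a, b in zip(totals[taxon], row)]
    return dict(totals)

# Hypothetical species counts at three stations
abundance = {
    "species_1": [12, 0, 3],
    "species_2": [5, 7, 0],
    "species_3": [0, 2, 9],
}
taxonomy = {
    "species_1": "family_X",
    "species_2": "family_X",
    "species_3": "family_Y",
}
families = aggregate(abundance, taxonomy)
```

The aggregated table can then be passed to the same similarity and ordination steps as the species-level table, which is how the two levels of analysis are compared.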
Chapter 14
page 14-1
Fig. 14.2. Frierfjord macrobenthos {F}. ABC plots (x-axis: species rank) based on the totals from 4 replicates at each of the 6 sites (Sites A, B, C, D, E, G). Solid lines: abundances; dotted lines: biomass.
Univariate indices
Graphical/distributional plots
Multivariate analysis
Fig. 14.4. Ekofisk macrobenthos {E}. a) Map of sampling sites, represented by different symbol and shading conventions according to their distance from the 2/4K rig at the centre of drilling activity (distance groups: < 250m, 250m-1km, 1-3.5km, > 3.5km); b) Shannon diversity (mean and 95% confidence intervals) in these distance zones; c) mean k-dominance curves; d) MDS from root-transformed species abundances (stress = 0.12).
Univariate indices
Fig. 14.6. Indonesian reef-corals, Pari Island {I}. k-dominance curves for totals of all ten replicates in each year.

Multivariate analysis

EXAMPLE 4: Fish communities from coral reefs in the Maldives

Univariate indices

Using ANOVA, no significant differences in diversity (Fig. 14.8) were observed between mined and control sites, with no differences either between reef flats and slopes.

Graphical/distributional plots
[Figure: plot with categories mined flats, control flats, mined slopes and control slopes.]

Fig. 14.9. Maldive Islands, coral-reef fish {M}. Average k-dominance curves for abundance and biomass at mined and control reef-flat sites.

Conclusions

There were clear differences in community composition due to mining activity revealed by multivariate methods, even on the reef-slopes adjacent to the mined flats, but these were not detected at all by univariate or graphical/distributional techniques, even on the flats, where the separation in the MDS is so obvious.

Fig. 14.11. Isles of Scilly {S}. Map of the 8 sites from each of which 5 seaweed species were collected. (Islands labelled on the map: Bryher, Tresco, St Martin's, St Mary's, St Agnes; scale bar 5 km.)
Multivariate analysis
Fig. 14.13. Isles of Scilly seaweed fauna {S}. k-dominance curves for meiofauna (left) and macrofauna (right). Ch = Chondrus, La = Laurencia, Lo = Lomentaria, Cl = Cladophora, Po = Polysiphonia.
GENERAL CONCLUSIONS
Fig. 14.15. Tamar estuary meiobenthos {R}. k-dominance curves for amalgamated data from 6 replicate cores for nematode and copepod species abundances (panels: Nematodes, Copepods; x-axis: species rank). For clarity of presentation, some sites have been omitted.
Chapter 15

We have seen in Chapter 14 that multivariate methods of analysis are very sensitive for detecting differences in community structure between samples in space, or changes over time. Generally, however, these methods are used to detect differences between communities, and not in themselves as measures of community stress in the same sense that species-independent methods (e.g. diversity, ABC curves) are employed. Even using the relatively less-sensitive species-independent methods there may be problems of interpretation in this context. Diversity does not behave consistently or predictably in response to environmental stress. Both theory (Connell, 1978; Huston, 1979) and empirical observation (e.g. Dauvin, 1984; Widdicombe and Austen, 1998) suggest that increasing levels of disturbance may either decrease or increase diversity, or it may even remain the same. A monotonic response would be easier to interpret. False indications of disturbance using the ABC method may also arise when, as sometimes happens, the species responsible for elevated abundance curves are pollution sensitive rather than pollution tolerant species (e.g. small amphipods, Hydrobia etc). Knowledge of the actual identities of the species involved will therefore aid the interpretation of ABC curves, and the resulting conclusions will be derived from an informal hybrid of species-independent and species-dependent information (Warwick and Clarke, 1994). In this chapter we describe three possible approaches to the measurement of community stress using the fully species-dependent multivariate methods.

META-ANALYSIS OF MARINE MACROBENTHOS

This method was initially devised as a means of comparing the severity of community stress between various cases of both anthropogenic and natural disturbance. On initial consideration, measures of community degradation which are independent of the taxonomic identity of the species involved would be most appropriate for such comparative studies. Species composition varies so much from place to place depending on local environmental conditions that any general species-dependent response to stress would be masked by this variability. However, diversity measures are also sensitive to changes in natural environmental variables and an unperturbed community in one locality could easily have the same diversity as a perturbed community in another. Also, to obtain comparative data on species diversity requires a highly skilled and painstaking analysis of species and a high degree of standardisation with respect to the degree of taxonomic rigour applied to the sample analysis; e.g. it is not valid to compare diversity at one site where one taxon is designated as "nemertines" with another at which this taxon has been divided into species.

The problem of natural variability in species composition from place to place can be potentially overcome by working at taxonomic levels higher than species. The taxonomic composition of natural communities tends to become increasingly similar at these higher levels. Although two communities may have no species in common, they will almost certainly comprise the same phyla. For soft-bottom marine benthos, we have already seen in Chapter 10 that disturbance effects are detectable with multivariate methods often at the highest taxonomic levels, even in some instances where these effects are rather subtle and are not evidenced in univariate measures even at the species level, e.g. the Ekofisk {E} study.

Meta-analysis is a term widely used in biomedical statistics and refers to the combined analysis of a range of individual case-studies which in themselves are of limited value but in combination provide a more global insight into the problem under investigation. Warwick and Clarke (1993a) have combined macrobenthic data aggregated to phyla from a range of case studies {J} relating to varying types of disturbance, and also from sites which are regarded as unaffected by such perturbations. A choice was made of the most ecologically meaningful units in which to work, bearing in mind the fact that abundance is a rather poor measure of such relevance, biomass is better and production is perhaps the most relevant of all (Chapter 13). Of course, no studies have measured production (P) of all species within a community, but many studies provide both abundance (A) and biomass (B) data. Production was therefore approximated using the allometric equation:

$P = (B/A)^{0.73} \times A$    (15.1)

where B/A is simply the mean body-weight, and 0.73 is the average exponent of the regression of annual production on body-size for macrobenthic invertebrates. Since the data from each study are standardised (i.e. production of each phylum is expressed as a proportion of the total) the intercept of this regression is irrelevant. For each data set the abundance and biomass data were first aggregated to phyla, following the classification of Howson (1987); 14 phyla were encountered overall (see the later Table 15.1). Abundance and biomass were then combined to form a production matrix using the above formula. All data sets were then merged into a single production matrix and an MDS performed on the standardised, 4th root-transformed data using the Bray-Curtis similarity measure. All macrobenthic studies from a single region (the NE Atlantic shelf) for which both abundance and biomass data were available were used, as follows:

1) A transect of 12 stations sampled in 1983 on a west-east transect (Fig. 1.5) across a sewage sludge dump-ground at Garroch Head, Firth of Clyde, Scotland {G}. Stations in the middle of the transect show clear signs of gross pollution.

2) A time series of samples from 1963-1973 at two stations (sites 34 and 2, Fig. 1.3) in West Scottish sea-lochs, L. Linnhe and L. Eil {L}, covering the period of commissioning of a pulp-mill. The later years show increasing pollution effects on the macrofauna, except that in 1973 a recovery was noted in L. Linnhe following a decrease in pollution loading.

3) Samples collected at six stations in Frierfjord (Oslofjord), Norway {F}. The stations (Fig. 1.1) were ranked in order of increasing stress A-G-E-D-B-C, based on thirteen different criteria. The macrofauna at stations B, C and D were considered to be influenced by seasonal anoxia in the deeper basins of the fjord.

4) Amoco-Cadiz oil spill, Bay of Morlaix {A}. In view of the large number of observations, the 21 sampling occasions have been aggregated into five years for the meta-analysis: 1977 = pre-spill, 1978 = immediate post-spill and 1979-81 = recovery period.

5) Two stations in the Skagerrak at depths of 100 and 300m. The 300m station showed signs of disturbance attributable to the dominance of the sediment reworking bivalve Abra nitida.

6) An undisturbed station off the coast of Northumberland, NE England.

7) An undisturbed station in Carmarthen Bay, S Wales.

8) An undisturbed station in Kiel Bay; mean of 22 sets of samples.

In all, this gave a total of 50 samples, the disturbance status of which has been assessed by a variety of different methods including univariate indices, dominance plots, ABC curves, measured contaminant levels etc. The MDS for all samples (Fig. 15.1) takes the form of a wedge with the pointed end to the right and the wide end to the left. It is immediately apparent that the long axis of the configuration represents a scale of disturbance, with the most disturbed samples to the right and the undisturbed samples to the left. (The reason for the spread of sites on the vertical axis is less obvious). The relative positions of samples on the horizontal axis can thus be used as a measure of the relative severity of disturbance. Another gratifying feature of this plot is that in all cases increasing levels of disturbance result in a shift in the same direction, i.e. to the right. For visual clarity, the samples from individual case studies are plotted in Fig. 15.2, with the remaining samples represented as dots.

1) Garroch Head (Clyde) sludge dump-ground {G}. Samples taken along this transect span the full scale of the long axis of the configuration (Fig. 15.2a). Stations at the two extremities of the transect (1 and 12) are at the extreme left of the wedge, and stations close to the dump centre (6) are at the extreme right.
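The construction of the "production" matrix described above (equation 15.1, standardisation to proportions, 4th-root transformation, then Bray-Curtis similarity) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the toy data and the handling of empty phyla or empty samples are assumptions made here, not part of the manual or of PRIMER.

```python
def production(abundance, biomass, exponent=0.73):
    """Equation 15.1: P = (B/A)**0.73 * A for one phylum in one sample."""
    if abundance <= 0 or biomass <= 0:
        return 0.0  # assumption: an absent phylum contributes no production
    return (biomass / abundance) ** exponent * abundance


def standardise(row):
    """Express each phylum as a proportion of the sample total."""
    total = sum(row)
    return [value / total for value in row] if total > 0 else [0.0] * len(row)


def production_matrix(abundances, biomasses):
    """Rows are samples, columns are phyla; returns standardised, 4th-root values."""
    matrix = []
    for a_row, b_row in zip(abundances, biomasses):
        p_row = [production(a, b) for a, b in zip(a_row, b_row)]
        matrix.append([value ** 0.25 for value in standardise(p_row)])
    return matrix


def bray_curtis_similarity(x, y):
    """Bray-Curtis similarity (0 to 100) between two transformed samples."""
    total = sum(a + b for a, b in zip(x, y))
    if total == 0:
        return 100.0  # assumption: two empty samples are treated as identical
    return 100.0 * (1.0 - sum(abs(a - b) for a, b in zip(x, y)) / total)


# Hypothetical data: 2 samples x 3 phyla (not taken from the manual).
abundances = [[100.0, 10.0, 2.0], [400.0, 1.0, 0.0]]
biomasses = [[1.0, 5.0, 8.0], [2.0, 0.5, 0.0]]
matrix = production_matrix(abundances, biomasses)
similarity = bray_curtis_similarity(matrix[0], matrix[1])
```

The similarities between all pairs of samples would then be passed to an MDS routine; that step is not shown here.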
Fig. 15.3. Joint NE Atlantic shelf studies ("meta-analysis") {J}. As Fig. 15.1 but highlighting the role of specific phyla in shaping the MDS; symbol size represents % production in each sample from: a) echinoderms, b) crustaceans.

This multivariate approach to the comparative scaling of benthic community responses to environmental stress seems to be more satisfactory than taxon-independent methods, having both generality and consistency of behaviour. It is difficult to assess the sensitivity of the technique because data on abundance and biomass of phyla are not available for any really low-level or subtle perturbations. However, its ability to detect the deleterious effect of the Amoco-Cadiz oil spill, where diversity was not impaired, and to rank the Frierfjord samples correctly with respect to levels of stress which had been determined by a wide variety of more time-consuming species-level techniques, suggests that this approach may retain much of the sensitivity of multivariate methods. It certainly seems, at least, that there is a high signal/noise ratio in the sense that natural environmental variation does not affect the communities at this phyletic level to an extent which masks the response to perturbation. The fact that this meta-analysis "works" has a rather weak theoretical basis. Why should Mollusca as a phylum be more sensitive to perturbation than Annelida, for example? The answer to this is unlikely to be straightforward and would need to be addressed by considering a broad range of toxicological, physiological and ecological characteristics which are more consistent within than between phyla.

The application of these findings to the evaluation of data from new situations requires that both abundance and biomass data are available. The scale of perturbation is determined by the 50 samples present in the meta-analysis. These can be regarded as the training set against which the status of new samples can be judged. The best way to achieve this would be to merge the new data with the training set to generate a single production matrix for a re-run of the MDS analysis. The positions of the new data in the two dimensional configuration, especially their location on the principal axis, can then be noted. Of course the positions of the samples in the training set may then be altered relative to each other, though such re-adjustments would be expected to be small. It is also natural, at least in some cases, that each new data set should add to the body of knowledge represented in the meta-analysis, by becoming part of an expanded training set against which further data are assessed. This approach would preserve the theoretical superiority and practical robustness of applying MDS (Chapter 5) in preference to ordination methods such as PCA.

However, there are circumstances in which more approximate methods might be appropriate, such as when it is preferable to leave the training data set unmodified. Fortunately, because of the relatively low dimensionality of the multivariate space (14 phyla, of which only half are of significance), a two-dimensional PCA of the "production" data gives a plot which is rather close to the MDS solution. The eigenvectors for the first three principal components, which explain 72% of the total variation, are given in Table 15.1. The value of the PC1 score for any existing or new sample can then easily be calculated from the first column of this table, without the need to re-analyse the full data set. This score could, with certain caveats (see below), be interpreted as a disturbance index. This index is on a continuous scale but, on the basis of the training data set given here, samples with a score of >+1 can be regarded as grossly disturbed, those with a value between -0.2 and +1 as showing some evidence of disturbance and those with values <-0.2 as not signalling disturbance with this methodology. A more robust, though less incisive, interpretation would place less reliance on the absolute location of samples on the MDS or PCA plots and emphasise the movement (to the right) of putatively impacted samples relative to appropriate controls. For a new study, the spread of sample positions in the meta-analysis allows one to scale the importance of observed changes, in the context of differences between control and impacted samples for the training set.

It should be noted that the training data is unlikely to be fully representative of all types of perturbation that could be encountered. For example, in Fig. 15.1, all
was that this variability in itself may be an identifiable symptom of perturbed situations. The four examples examined were:

Table 15.1. Joint NE Atlantic shelf studies ("meta-analysis") {J}. Eigenvectors for the first three principal components from covariance-based PCA of standardised and 4th root-transformed phylum "production" (all samples).
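As a rough illustration of how a PC1 score might be computed from a column of eigenvector coefficients and then read against the thresholds quoted in the text (>+1, -0.2 to +1, <-0.2), a minimal Python sketch follows. The loadings, training means and sample values are hypothetical placeholders, not the Table 15.1 values, and centring on training-set means is assumed here as the usual PCA convention rather than taken from the manual. The treatment of scores exactly on a threshold is also an assumption.

```python
def pc1_score(transformed, loadings, training_means):
    """Project one sample onto PC1: sum of loading * (value - training mean)."""
    return sum(w * (x - m) for x, w, m in zip(transformed, loadings, training_means))


def disturbance_class(score):
    """Thresholds quoted in the text for the training set: > +1, -0.2 to +1, < -0.2."""
    if score > 1.0:
        return "grossly disturbed"
    if score >= -0.2:
        return "some evidence of disturbance"
    return "no disturbance signalled"


# Hypothetical loadings and means for three phyla; real values would come from
# the first column of Table 15.1 and from the training data.
loadings = [0.6, -0.5, 0.3]
training_means = [0.5, 0.4, 0.2]
new_sample = [0.9, 0.1, 0.3]  # standardised, 4th-root-transformed "production"
score = pc1_score(new_sample, loadings, training_means)
label = disturbance_class(score)
```

Because the training set is left untouched, a new sample can be scored without re-running the ordination, which is the practical advantage the text describes.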
It is possible to construct an index from the relative variability between impacted and control samples. One natural comparative measure of dispersion would be based on the difference in average distance among replicate samples for the two groups in the 2-d MDS configuration. However, this configuration is usually not an exact representation of the rank orders of similarities between samples in higher dimensional space. These rank orders are contained in the triangular similarity matrix which underlies any MDS. (The case for using this matrix rather than the distances is analogous to that given for the ANOSIM statistic in Chapter 6.) A possible comparative Index of Multivariate Dispersion (IMD) would therefore contrast the average rank of the similarities among impacted samples ($\bar{r}_t$) with the average rank among control samples ($\bar{r}_c$), having re-ranked the full triangular matrix ignoring all between-treatment similarities. Noting that high similarity corresponds to low rank similarity, a suitable statistic, appropriately standardised, is:

$\mathrm{IMD} = 2(\bar{r}_t - \bar{r}_c)/(N_t + N_c)$    (15.2)

where

$N_c = n_c(n_c - 1)/2, \quad N_t = n_t(n_t - 1)/2$    (15.3)

and $n_c$, $n_t$ are the number of samples in the control and treatment groups respectively. The chosen denominator ensures that IMD has maximum value of +1 when all similarities among impacted samples are lower than any similarities among control samples. The converse case gives a minimum for IMD of -1, and values near zero imply no difference between treatment groups.

Table 15.2. Variability study {N, E, I, M}. Index of Multivariate Dispersion (IMD) between all pairs of conditions.

Study          Conditions compared       IMD
Meiobenthos    High dose / Control       +1
               High dose / Low dose      +1
               Low dose / Control        -0.33
Macrobenthos   Group D / Group C         +0.77
               Group D / Group B         +0.80
               Group D / Group A         +0.60
               Group C / Group B         -0.02
               Group C / Group A         -0.50
               Group B / Group A         -0.59
Corals         1983 / 1981               +0.84
Reef-fish      Mined / Control reefs     +0.81

In Table 15.2, IMD values are compared between each pair of treatments or conditions for the four examples. For the mesocosm meiobenthos, comparisons between the high dose and control treatments and the high dose and low dose treatments give the most extreme IMD value of +1, whereas there is little difference between the low dose and controls. For the Ekofisk macrofauna, strongly positive values are found in comparisons between the group D (most impacted) stations and the
other three groups. It should be noted however that stations in groups C, B and A are increasingly more

Table 15.3. Variability study {N, E, I, M}. Relative dispersion of the groups (equation 15.4) in each of the four studies.
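The Index of Multivariate Dispersion of equations 15.2 and 15.3 can be sketched as follows. This is a minimal illustration, assuming a full square similarity matrix indexed by sample number and using average ranks for tied similarities; the function names and the toy matrix are hypothetical and are not PRIMER's implementation.

```python
from itertools import combinations


def ranks_high_similarity_first(values):
    """Rank so that the highest similarity gets rank 1; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for position in range(start, end + 1):
            ranks[order[position]] = (start + end) / 2.0 + 1.0
        start = end + 1
    return ranks


def imd(similarity, control, treatment):
    """IMD = 2 * (mean rank among treatment - mean rank among control) / (N_t + N_c)."""
    sims_c = [similarity[i][j] for i, j in combinations(control, 2)]
    sims_t = [similarity[i][j] for i, j in combinations(treatment, 2)]
    # Re-rank within-group similarities only; between-group entries are ignored.
    ranks = ranks_high_similarity_first(sims_c + sims_t)
    n_c, n_t = len(sims_c), len(sims_t)
    mean_rank_c = sum(ranks[:n_c]) / n_c
    mean_rank_t = sum(ranks[n_c:]) / n_t
    return 2.0 * (mean_rank_t - mean_rank_c) / (n_t + n_c)


# Hypothetical similarities: samples 0-2 are controls, 3-5 are impacted.
S = [
    [100, 90, 85, 50, 50, 50],
    [90, 100, 80, 50, 50, 50],
    [85, 80, 100, 50, 50, 50],
    [50, 50, 50, 100, 40, 30],
    [50, 50, 50, 40, 100, 20],
    [50, 50, 50, 30, 20, 100],
]
value = imd(S, control=[0, 1, 2], treatment=[3, 4, 5])
```

In this toy case every similarity among the impacted samples is lower than every similarity among the controls, so the index takes its maximum value.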
between the corresponding elements of two triangular matrices of rank "dissimilarities". The first is that of Bray-Curtis coefficients calculated for all pairs from the n coral community samples (n = 12 or 17 in this case). The second is formed from the inter-point distances of n points laid out, equally-spaced, along a line. If the community changes exactly match this linear sequence (for example, sample 1 is close in species composition to sample 2, samples 1 and 3 are less similar; 1 and 4 less similar still, up to 1 and 12 having the greatest dissimilarity) then the IMS takes the value 1. If, on the other hand, there is no discernible biotic pattern along the transect, or if the relationship between the community structure and distance offshore is very non-monotonic - with the composition being similar at opposite ends of the transect but very different in the middle - then the IMS will be close to zero. These near-zero values can be negative as well as positive but no particular significance attaches to this.

A statistical significance test would clearly be useful, to answer the question: when is the IMS sufficiently different from zero to reject the null hypothesis of a complete absence of seriation? Such a test can be derived by a permutation procedure. If the null hypothesis is true then the labelling of samples along the transect (1, 2, ..., n) is entirely arbitrary, and the spread of IMS values which are consistent with the null hypothesis can be determined by recomputing it for permutations of the sample labels in one of the two similarity matrices (holding the other fixed). For T randomly selected permutations of the sample labels, if only t of the T simulated IMS values are greater than or equal to the observed IMS, the null hypothesis can be rejected at a significance level of 100(t+1)/(T+1)%.^

In structure, the test is analogous to that considered at the end of Chapter 6 (implemented in the PRIMER routine ANOSIM2), and again referred to briefly in Chapter 11 in the context of the BIO-ENV procedure.

^ The calculations for the tests were carried out using the PRIMER RELATE routine. In exactly the same way, community change could be related to a temporal trend (equally-spaced points in time or around a circle, such as a seasonal cycle) or to the sampling positions in a 2-dimensional spatial layout. There are null hypothesis tests specifically for seriation and cyclicity in RELATE, and a general test for lack of relationship between any two supplied similarity matrices with the same label sets (independently derived). The relation to spatial layout can be tested by first calculating Euclidean distances on an "environmental" data matrix consisting simply of the co-ordinates of the sample positions, then feeding this and the biotic triangular matrix into the RELATE routine.
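The seriation statistic and its permutation test can be sketched in plain Python. The sketch below is illustrative only and is not the RELATE routine: it computes Bray-Curtis dissimilarities between samples, correlates them (Spearman, with average ranks for ties) with the distances between equally-spaced points on a line, and then permutes the sample positions. Function names, the toy data and the fixed random seed are assumptions made for the example.

```python
import random
from itertools import combinations


def average_ranks(values):
    """Ascending ranks; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for position in range(start, end + 1):
            ranks[order[position]] = (start + end) / 2.0 + 1.0
        start = end + 1
    return ranks


def spearman(x, y):
    """Pearson correlation of average ranks (one standard allowance for ties)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean = (len(rx) + 1) / 2.0
    sxy = sum((a - mean) * (b - mean) for a, b in zip(rx, ry))
    sxx = sum((a - mean) ** 2 for a in rx)
    syy = sum((b - mean) ** 2 for b in ry)
    return 0.0 if sxx == 0 or syy == 0 else sxy / (sxx * syy) ** 0.5


def bray_curtis_dissimilarity(x, y):
    total = sum(a + b for a, b in zip(x, y))
    return 0.0 if total == 0 else sum(abs(a - b) for a, b in zip(x, y)) / total


def seriation_test(samples, permutations=999, seed=0):
    """Return (IMS, % significance level) for samples listed in transect order."""
    n = len(samples)
    pairs = list(combinations(range(n), 2))
    biotic = [bray_curtis_dissimilarity(samples[i], samples[j]) for i, j in pairs]

    def rho(positions):
        model = [abs(positions[i] - positions[j]) for i, j in pairs]
        return spearman(biotic, model)

    positions = list(range(n))
    observed = rho(positions)
    rng = random.Random(seed)
    count = 0
    for _ in range(permutations):
        rng.shuffle(positions)  # relabel samples along the line
        if rho(positions) >= observed:
            count += 1
    return observed, 100.0 * (count + 1) / (permutations + 1)


# Hypothetical transect of 6 samples x 2 species with a perfect gradient.
transect = [[10, 0], [9, 1], [8, 2], [7, 3], [6, 4], [5, 5]]
ims, significance = seriation_test(transect)
```

Only the model matrix is permuted, with the biotic matrix held fixed, matching the description of permuting labels in one of the two matrices.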
One distinctive feature of the current test is that tied ranks will be much more prevalent, particularly in the similarities computed from the linear sequence, and it is advisable to make proper allowance for this in calculating the Spearman coefficients. Kendall (1970, eqt 3.7) gives an appropriate adjustment to ρ_s, and this form is used in the analysis below.

In 1983, before the dredging operations, MDS configurations (Fig. 15.6) indicate that the points along each transect conform rather closely to a linear sequence, and there are no obvious discontinuities in the sequence of community change (i.e. no discrete clusters separated by large gaps); the community change follows a quite gradual pattern. The values of the IMS are consequently high (Table 15.5), ranging from 0.62 (transect C) to 0.72 (transect B).

The correlation with a linear sequence is highly significant in all three cases. Note that in the 1983 MDS for transect A, the furthest inshore sample has been omitted; it had very little coral cover and was an outlier on the plot, resulting in an unhelpfully condensed display of the remaining points. (This is to be expected in MDS analyses where one sample has a higher dissimilarity to all other samples than any other dissimilarity in the matrix, and the MDS needs to be replotted with this point removed). There is no similar technical need, however, to remove this sample from the IMS calculation; this was not done in Table 15.5 though doing so would increase the ρ_s value from 0.65 to 0.74 (as indicated in Fig. 15.6).

Table 15.5. Ko Phuket corals {K}. Index of Multivariate Seriation (IMS) along the three transects, for four sampling occasions. Figures in parentheses are the % significance levels in a permutation test for absence of seriation (T = 999 simulations).

Year   Transect A     Transect B     Transect C
1983   0.65 (0        0.72 (0.1%)    0.62 (0.1%)
1986   0.26 (3.8%)    0.71 (0        -
1987   0.19 (6.4%)    0.32 (0.2%)    0.65 (0.1%)
1988   0.64 (0.1%)    0.80 (0.1%)    0.72 (0.1%)

On transect A, subjected to the highest sedimentation, visual inspection of the MDS gives a clear impression of the breakdown of the linear sequence for the next two sampling occasions. The IMS is dramatically reduced to 0.26 in 1986, when the dredging operations commenced, although the correlation with a linear sequence is still just significant (p=3.8%). By 1987 the IMS on this transect is further reduced to 0.19 and the correlation with a linear sequence is no longer significant. On transect B, further away from the dredging activity, the loss of seriation is not evident until 1987, when the sequencing of points on the MDS configuration breaks down and the IMS is reduced to 0.32, although the latter is still significant (p=0.2%). Note that the MDS plots of Fig. 15.6 may not tell the whole story; the stress values lie between 0.07 and 0.14, indicating that the 2-dimensional pictures are not perfect representations (though unlikely seriously to mislead, see Chapter 5). The largest stress is, in fact, that for transect B in 1987, so that the seriation that is still detectable by the test is only imperfectly seen in the 2-dimensional plot. It is also true that the increased number of points (17) on transect B, in comparison with A and C (12), will lead to a more sensitive test. On transect C there is no evidence of the breakdown of seriation at all, either from the IMS values or from inspection of the MDS plot. By 1988 transects A and B had completely recovered their seriation pattern, with IMS values equal to or higher than their 1983 values, highly significant correlations with a linear sequence (p<0.1%) and clear sequencing evident on the MDS plots. There was clearly a graded response, with a greater breakdown of seriation occurring earlier on the most impacted transect, some breakdown on the middle transect but no breakdown at all on the least impacted transect.

Overall, the breakdown in the pattern of seriation was due to the increase in distributional range of species which were previously confined to distinct sections of the shore. This is commensurate with the disruption of almost all the types of mechanism which have been invoked to explain patterns of seriation, and gives us no clue as to which of these is the likely cause.
Chapter 16
page 16-1
Fig. 16.1. Amoco-Cadiz oil spill {A}. MDS for 257 macrobenthic species in the Bay of Morlaix, for 21 sampling times (A, B, C, ..., U; see legend to Fig. 10.4 for precise dates). The ordination is based on Bray-Curtis similarities from fourth root-transformed abundances and the samples were taken at approximately quarterly intervals over 5 years, reflecting normal seasonal cycles and the perturbation of the oil spill (stress = 0.09).

Fig. 16.2. Schematic diagram of selection of a subset of species whose multivariate sample pattern matches that for the full set of species. The search is either over all subsets of species (generalised BIO-ENV routine) or, more practically, stepwise selection of species (BVSTEP routine), with the aim of finding the smallest subset of species giving rank correlation between the similarity matrices of ρ > 0.95. (Panels: all species and subset of species, each leading via Bray-Curtis similarities to an NMDS, with a rank correlation between the two similarity matrices.)

Stepwise procedure

One way round the problem is to search not over every possible combination but some more limited space, and the natural choice here is a stepwise algorithm which operates sequentially and involves both forward
and backward-stepping phases.¶ At each stage, a selection is made of the best single species to add to or drop from the existing selected set. Typically, the procedure will start with a null set, picking the best single variable (maximising ρ), then adding a second variable which gives the best combination with the first, then adding a third to the existing pair. The backward elimination phase then intervenes, to check whether the first selected variable can now be dropped, the combination of second and third selections alone not having been considered before. The forward selection phase returns and the algorithm proceeds in this fashion until no further improvement is possible by the addition of a single variable to the existing set or, more likely here, the stopping criterion is met (ρ exceeds 0.95). In order fully to clarify the alternation of forward and backward stepping phases, Table 16.1 describes a purely hypothetical (and unrealistically convoluted) search over 6 variables. Analogously to the MDS algorithm of Chapter 6, it is quite possible that such an iterative search procedure will get trapped in a local optimum and miss the true best solution; only a minute fraction of the vast search space is ever examined. Thus, it may be helpful to begin the search at several, different, random starting points, i.e. to start sequential addition or deletion from an existing, randomly selected set of 25% (say) of the species.†

EXAMPLE: Amoco-Cadiz oil spill

Applying this (BVSTEP) procedure to the 125-species set from the Bay of Morlaix, a smallest subset of only 9 species can be found, whose similarity matrix across the 21 samples correlates with that for the full species set, at ρ > 0.95. The MDS plot for the 21 samples based only on these 9 species is shown in Fig. 16.3b and is seen to be largely indistinguishable from 16.3a. The make-up of this influential species set is discussed but it is important to realise, as often with stepwise procedures, that this may be far from a unique solution. There are likely to be other sets of species, a little larger in number or giving a slightly lower ρ value, that would do a (nearly) equally good job of 'explaining' the full pattern.

One interesting way of seeing this is to discard the initial selection of 9 species, and search again for a further subset that produces a near-perfect match (ρ > 0.95) to the pattern for the full set of 125 species. Fig. 16.3c shows that a second such set can be found, this time of 11 species. If the two sets are discarded, a third (of 14 species), then a fourth (of 18 species) can also be identified, and Fig. 16.3d and e again show the high level of concordance with the full set, Fig. 16.3a. There are now 73 species left and a fifth set can just about be pulled out of them (Fig. 16.3f), though now the algorithm terminates at a genuine maximum of ρ; a match better than ρ = 0.91 cannot be found by the stepwise procedure, even after several attempts with different random starting positions. If these (27) species are also discarded, the ability of the remaining 46 species to reconstruct the initial pattern degrades slowly (Fig. 16.3g) then rapidly (Fig. 16.3h and i), i.e. little of the original 'signal' remains.

¶ This concept may be familiar from stepwise multiple regression in univariate statistics, which tackles a similar problem of selecting a subset of explanatory variables which account for as much as possible of the variance in a single response variable.

† The PRIMER BVSTEP routine carries out this stepwise approach on the active worksheet (the faunal data matrix), for a separately specified, fixed similarity matrix (Bray-Curtis on the faunal data matrix here). There are options always to exclude, or always to include, certain variables (species) in the selection, to start the algorithm either with none, all or a random set of variables in the initial selection, and to output results of the iteration at various levels of detail.
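The alternation of forward selection and backward elimination described above can be illustrated with a simplified Python sketch. It is not the BVSTEP routine: it starts from the null set only, offers no forced inclusion or exclusion and no random restarts, and the function names, the tie-handling in the Spearman coefficient and the toy data are all assumptions made for the example.

```python
from itertools import combinations


def average_ranks(values):
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for position in range(start, end + 1):
            ranks[order[position]] = (start + end) / 2.0 + 1.0
        start = end + 1
    return ranks


def spearman(x, y):
    rx, ry = average_ranks(x), average_ranks(y)
    mean = (len(rx) + 1) / 2.0
    sxy = sum((a - mean) * (b - mean) for a, b in zip(rx, ry))
    sxx = sum((a - mean) ** 2 for a in rx)
    syy = sum((b - mean) ** 2 for b in ry)
    return 0.0 if sxx == 0 or syy == 0 else sxy / (sxx * syy) ** 0.5


def dissimilarities(samples, species):
    """Bray-Curtis dissimilarities for all sample pairs, using only the given species columns."""
    result = []
    for i, j in combinations(range(len(samples)), 2):
        x = [samples[i][s] for s in species]
        y = [samples[j][s] for s in species]
        total = sum(a + b for a, b in zip(x, y))
        # assumption: a pair with no individuals at all is given dissimilarity 0
        result.append(0.0 if total == 0 else sum(abs(a - b) for a, b in zip(x, y)) / total)
    return result


def stepwise_subset(samples, threshold=0.95):
    """Return (selected species indices, rho) matching the full-set pattern."""
    n_species = len(samples[0])
    target = dissimilarities(samples, range(n_species))

    def rho(subset):
        return spearman(dissimilarities(samples, sorted(subset)), target)

    selected, best = set(), float("-inf")
    while best < threshold:
        # Forward step: add the single species giving the highest rho.
        candidates = [(rho(selected | {s}), s) for s in range(n_species) if s not in selected]
        if not candidates:
            break
        new_rho, chosen = max(candidates)
        if new_rho <= best:
            break  # no improvement possible by adding one species
        selected.add(chosen)
        best = new_rho
        # Backward steps: drop a species only if that increases rho.
        while len(selected) > 2:
            drop_rho, dropped = max((rho(selected - {s}), s) for s in selected)
            if drop_rho <= best:
                break
            selected.remove(dropped)
            best = drop_rho
    return sorted(selected), best


# Hypothetical matrix: 4 samples x 3 species.
data = [[1, 2, 5], [2, 4, 5], [4, 8, 5], [8, 16, 5]]
subset, rho_value = stepwise_subset(data)
```

Like any stepwise search, this can stop at a local optimum, which is why the text suggests repeating the search from several random starting sets.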
Table 16.1. Hypothetical illustration of stages in a stepwise algorithm (F: forward selection, B: backward elimination steps) to select a subset of species which match the multivariate sample pattern for a full set (here, 6 species). Bold underlined type indicates the subset with the highest ρ at each stage, and italics denote a backward elimination step that decreases ρ and is therefore ignored. The procedure ends when ρ attains a certain threshold (ρ > 0.95), or when forward selection does not increase ρ.
Clarke and Warwick (1998a) discuss the implication of these plots for concepts of structural redundancy in assemblages (and, arguably, for functional redundancy, or at least compensation capacity). They investigate whether the various sets of species 'peeled' out from the matrix have a similar taxonomic structure. For example, Table 16.2 displays the first and second 'peeled' species lists and defines a taxonomic mapping coefficient, used to measure the degree to which the first set has taxonomically closely-related counterparts in the second set, and vice-versa. (Note that such taxonomic-relatedness concepts are the basis of several biodiversity indices proposed in Chapter 17.) A permutation test can be constructed which leads to the conclusion that the peeled subsets are more taxonomically similar (i.e. have greater taxonomic coherence) than would be expected by chance. The number of such coherent subsets that can be 'peeled out' from the matrix is clearly some measure of redundancy of information content.
[Table 16.2 residue: Nassarius reticulatus, Gastrosaccus lobatus]

2) species subsets which best respond to (characterise)

SECOND-STAGE MDS

It is not normally a viable sampling strategy, for soft-sediment benthos at least, to use the BVSTEP procedure to identify a subset of species as the only ones whose abundance is recorded in future. Savings of monitoring effort at the identification analysis stage can sometimes be made, however, by working at a higher taxonomic

[Figure residue: MDS panels labelled Species and Families]

[Fig. 16.5 schematic: Set A (e.g. species, no transform), Set B (e.g. species, transformed) and Set C (e.g. family, no transform); Bray-Curtis similarities for each set; rank correlations ρAB, ρBC, ... between the similarity matrices; second-stage MDS]

Fig. 16.5. Schematic diagram of the stages in quantifying and displaying agreement, by second-stage MDS, of different multivariate analyses of a corresponding set of samples.
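The first two stages of the Fig. 16.5 schematic (a similarity matrix per analysis, then rank correlations between every pair of matrices) can be sketched as follows. This is a minimal illustration, not the PRIMER 2STAGE routine: the toy counts, the analysis labels and the function names are assumptions, and the final ordination step is left to any MDS implementation that accepts a correlation-based resemblance matrix.

```python
from itertools import combinations

def bray_curtis(rows):
    """Bray-Curtis dissimilarity for every pair of samples (assumes no two samples are both empty)."""
    return [sum(abs(p - q) for p, q in zip(a, b)) / sum(p + q for p, q in zip(a, b))
            for a, b in combinations(rows, 2)]

def ranks(values):
    """Ranks from 1..n, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(x, y):
    """Spearman rank correlation between two equal-length lists."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5 if sxx and syy else 0.0

def second_stage(analyses):
    """Spearman correlation between every pair of first-stage resemblance matrices."""
    names = list(analyses)
    return {(a, b): spearman(analyses[a], analyses[b]) for a in names for b in names}

# Hypothetical counts: 5 samples (rows) by 5 species (columns).
counts = [
    [120, 3, 0, 0, 1],
    [90, 5, 1, 0, 0],
    [10, 40, 2, 6, 0],
    [2, 35, 0, 9, 1],
    [0, 8, 30, 12, 0],
]
analyses = {
    "s0": bray_curtis(counts),                                            # no transform
    "s2": bray_curtis([[x ** 0.25 for x in row] for row in counts]),      # 4th root
    "s4": bray_curtis([[1 if x > 0 else 0 for x in row] for row in counts]),  # presence/absence
}
rho = second_stage(analyses)
for pair in combinations(analyses, 2):
    print(pair, round(rho[pair], 3))
```

The resulting correlation matrix plays the role of the similarity matrix that is input to the second-stage MDS; all analyses must refer to the same samples in the same order.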
Table 16.3. Amoco-Cadiz oil spill {A}. Spearman correlation matrix between every pair of similarity matrices underlying the 10 plots of Fig. 16.4, measuring the extent to which they 'tell the same story' about the 21 Morlaix samples. These correlations (rank ordered) are treated like a similarity matrix and input to a second-stage MDS. Key: s = species-level analysis, f = family-level; 0 = no transform, 1 = root, 2 = 4th root, 3 = log(1+x), 4 = presence/absence.

      s0    s1    s2    s3    s4    f0    f1    f2    f3
s1   .970
s2   .862  .949
s3   .852  .942  .995
s4   .736  .847  .961  .946
f0   .996  .965  .855  .845  .726
f1   .949  .993  .961  .958  .865  .947
f2   .791  .893  .972  .974  .953  .785  .924
f3   .760  .869  .962  .971  .946  .753  .904  .993
f4   .645  .756  .877  .870  .923  .639  .792  .946  .929

species and family level analyses largely forms the other (bottom to top) axis. Three important points are immediately clear:

1) Log and √√ transforms are virtually identical in their effect on the data, with differences between these transformations being much smaller than that between species and family-level analyses in that case.

2) With the exception of these two, the transformations generally have a much more marked effect on the outcome than the aggregation level (the relative distance apart on the MDS of the points representing different transformations, but the same taxonomic level, is much greater than the distance apart of species and family-level analyses, for the same transformation).

3) The effect of taxonomic aggregation becomes greater as the transformation becomes more severe, so that for presence/absence data the difference between species and family-level is much more important than it is for untransformed or mildly transformed counts. Whilst this is not unexpected, it does indicate the necessity to think about analysis choices in combination, when designing a study.

Other applications

The concept of a second-stage MDS used on rank correlations between similarity matrices - from different taxonomic aggregation levels (species, genus, family, trophic group) and, in the same analysis, different faunal groups (nematodes, macrofauna) recorded for the same set of sites - was introduced by Somerfield and Clarke (1995), for studies in Liverpool Bay and the Fal estuary, UK. Olsgard et al (1997, 1998) expanded the scope to include the effects of different transformation, simultaneously with differing aggregation levels, for data from N Sea oilfield studies. Other interesting applications include Kendall and Widdicombe (1999) who examined different body-size components of the fauna as well as different faunal groups, from a hierarchical spatial sampling design (spacings of 50cm, 5m, 50m, 500m) in Plymouth subtidal waters. They used a second-stage MDS to display the effects of different combinations of body-sizes, faunal groups and transformation. Olsgard and Somerfield (2000) introduced the pattern from environmental variables as an additional point on a second-stage MDS, together with biotic analyses from different faunal components (polychaetes, molluscs, crustacea, echinoderms) at another N Sea oilfield. The idea is that biotic subsets whose multivariate pattern links well to the environmental data will be represented by points on the second-stage MDS which lie close to the environmental point. The converse operation can also be envisaged, as a visual counterpart to the BIO-ENV procedure. For small numbers of environmental variables, the abiotic patterns from subsets of these can be represented as points on the second-stage MDS, in which the (fixed) biotic similarity matrix is also shown. The best environmental combinations should then 'converge' on the (single) biotic point.

Footnote (to the N Sea oilfield studies of Olsgard et al): They also carried out another interesting analysis, assessing BIO-ENV results in the light of analysis choices. It was hypothesised earlier (p9-4 and 10-2), that a contaminant impact may manifest itself more clearly in the assemblage pattern for intermediate transform and aggregation choices. Olsgard et al (1997) do indeed show, for the Valhall oilfield, that the BIO-ENV matching of sediment macrobenthos to the degree of disturbance from drilling muds disposal (measured by sediment THC, Ba concentrations etc), was optimised by intermediate transform (√) and aggregation level (family).

EXAMPLE: Phuket coral-reef time series

A rather different application of second-stage MDS is motivated by considering the two-way layout from a time-series of coral-reef assemblages, along an onshore-offshore transect in Ko Phuket, Thailand {K}. These data were previously met in Chapter 15, where only samples from the earlier years 1983, 86, 87, 88 were considered (as available to Clarke et al 1993). The time series has been subsequently expanded to the 13 years 1983-2000, omitting 1984, 85, 89, 90 and 96, on transect A (Brown et al in press). This transect consisted of 12 equally-spaced positions along the onshore-offshore gradient, and was subject to sedimentation disturbance from dredging for a new deep-water port in 1986 and 87. For 10 months during late 1997 and 98 there was also a wide scale sea-level depression in the Indian Ocean, leading to significantly greater irradiance exposures at mid-day low tides. Elevated sea temperatures were also observed (in 1991, 95, 97, 98), sometimes giving rise to coral bleaching events, but these generally resulted in only short-term partial mortalities.

Footnote (to the two applications of second-stage MDS): Both types of problem are catered for in the PRIMER 2STAGE routine, the inputs either being a series of similarity matrices (which can be taken from any source provided they refer to the same set of sample labels), or a single similarity matrix, from a 2-way layout with appropriate factors defined.

The two (crossed) factors here are the years and the positions down the transect (1-12, at the same spacing each year). Separate MDS plots of the onshore-offshore (seriation) pattern for each year show some visual differences which can be summarised in Spearman rank correlations (ρ) between their underlying similarity matrices. These correlations in turn are entered into the second-stage MDS to produce Fig. 16.7; it demonstrates both the sedimentation-based disruption to the gradient in 1986 and 87, and the negative sea-level anomaly of 1998. Interestingly, these are on opposite sides of the MDS plot, suggesting that the departures from the 'normal' onshore-offshore gradient are of a different type.

Note the subtlety of what this analysis tries to isolate: the compositions of the transect over the different years are not directly compared, as they would be if all were analysed together on the same (first-stage) MDS, for example. There may (and will) be natural year-to-year fluctuations in abundances which would separate the transects on a combined MDS plot but which do not disrupt the serial change in assemblage down the transect. The second-stage procedure will not be sensitive to such natural fluctuations. In fact, it eliminates them
Footnote: The idea has parallels with the ANOSIM2 procedure at the end of Chapter 6, which sets out to remove the main effects of factor A in a two-way layout, by concentrating only on the patterns of factor B, in separate MDS plots for each level of factor A.
Chapter 17
page 17-1
SPECIES RICHNESS DISADVANTAGES

Chapter 8 discussed a range of diversity indices based on species richness and the species abundance distribution. Richness (S) is widely used as the preferred measure of biological diversity (biodiversity) but it has some major drawbacks, many of which apply equally to other diversity indices such as H', H, J', etc.

1) Observed richness is heavily dependent on sample size/effort. In nearly all marine contexts, it is not possible to collect exhaustive census data. The assemblages are sampled using sediment cores, trawls etc, and the 'true' species richness of a station is rarely fully represented in such samples. For example, Gage and Coghill (1977) describe a set of contiguous core samples taken for macrobenthic species in a Scottish sea-loch. A species-area plot (or accumulation curve) which illustrates how the number of different species detected increases as the samples are accumulated, shows that, even after 64 replicate samples are taken at this single locality, the observed number of species is still rising.

[Figure: species-area (accumulation) plot; x-axis: No. of replicate samples (1 to 64)]

highly sensitive to sample size and totally non-comparable across studies involving unknown, uncontrolled or simply differing degrees of sampling effort. The same is true, to a lesser extent, of many other standard diversity indices. Fig. 17.1 shows the effect of increasing numbers of individuals on the values of some of the diversity indices defined in Chapter 8. This is a sub-sampling study, selecting different numbers of individuals at random from a single, large community sample. The only index to demonstrate a lack of bias in mean value is Simpson diversity, given here in the form 1-λ', see equation (8.4). Comparison of richness, Shannon, evenness, Brillouin etc values for differing sample sizes is clearly problematic.

2) Species richness does not directly reflect phylogenetic diversity. "A measure of biodiversity of a site ought ideally to say something about how different the inhabitants are from each other" (Harper and Hawksworth, 1994). It is clear that a sample consisting of 10 species from the same genus should be seen as much less biodiverse than another sample of 10 species, all of which are from different families: genetic, phylogenetic or, at least, taxonomic relatedness of the individuals in a sample is the key concept which is developed in this chapter, into practical indices which genuinely reflect biodiversity and are robust to sampling effort variations.

3) No statistical framework exists for departure of S from 'expectation'. Whilst observed species richness measures can be compared across sites (or times) which are subject to strictly controlled and equivalent sampling designs, there is no sense in which the values of S can be compared with some absolute standard, i.e. we cannot generally answer the question "what do we expect the richness to be at this site?", in the absence of anthropogenic impact, say.

[Fig. 17.1: six panels of diversity indices; x-axis: Number of individuals (N), log scale, 10 to 100000]

Fig. 17.1. Amoco-Cadiz oil spill {A}, pooled pre-impact data. Values of 6 standard diversity indices (y-axis, see Chapter 8 for definitions), for simulated samples of increasing numbers of individuals (x-axis, log scaled), drawn randomly without replacement from the full set of 140,344 macrobenthic organisms.
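A sub-sampling study of the kind shown in Fig. 17.1 is straightforward to simulate. The sketch below is illustrative only: the community counts are hypothetical (not the Amoco-Cadiz data), the function names are assumptions, and only three indices are computed (richness S, Shannon H' with natural logs, and Simpson in the form 1-λ', here taken as 1 - Σ x(x-1) / [N(N-1)]).

```python
import math
import random

def indices(counts):
    """Return (S, Shannon H', Simpson 1 - lambda') for a vector of species counts."""
    n = sum(counts)
    present = [c for c in counts if c > 0]
    shannon = -sum(c / n * math.log(c / n) for c in present)
    simpson = 1 - sum(c * (c - 1) for c in present) / (n * (n - 1))
    return len(present), shannon, simpson

def subsample_means(community, sizes, repeats=300, seed=1):
    """Mean index values over repeated random draws of n individuals, without replacement."""
    rng = random.Random(seed)
    # One entry per individual, labelled by its species index.
    individuals = [sp for sp, c in enumerate(community) for _ in range(c)]
    out = {}
    for n in sizes:
        totals = [0.0, 0.0, 0.0]
        for _ in range(repeats):
            counts = [0] * len(community)
            for sp in rng.sample(individuals, n):
                counts[sp] += 1
            for k, value in enumerate(indices(counts)):
                totals[k] += value
        out[n] = tuple(t / repeats for t in totals)
    return out

# Hypothetical community of 14 species and 898 individuals.
community = [500, 200, 100, 50, 20, 10, 5, 5, 2, 2, 1, 1, 1, 1]
means = subsample_means(community, sizes=[20, 100, 400])
full = indices(community)
for n, (s, h, d) in means.items():
    print(n, round(s, 2), round(h, 3), round(d, 3))
print("full", full[0], round(full[1], 3), round(full[2], 3))
```

In runs of this kind, mean richness and mean Shannon diversity climb with the number of individuals drawn, whereas the mean of 1-λ' stays close to the full-sample value, which is the contrast the figure is designed to show.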
5) Richness can vary markedly with differing habitat type. Again, the ideal would be a measure which is less sensitive to differences in natural environmental variables but is responsive to anthropogenic disturbance.

AVERAGE TAXONOMIC DIVERSITY AND DISTINCTNESS

tree (two species at greatest taxonomic distance apart) is set to ω = 100. Thus, for a sample consisting only of the 5 species shown, the path between individuals in species 3 and 4 is ω34 = 100, between species 1 and 2 is ω12 = 50, between two individuals of species 5 is ω55 = 0, etc.

[Fig. 17.2 residue: panels Taxonomic diversity (Δ), Taxonomic distinctness (Δ*) and AvTD (Δ+), pres/abs; x-axes Number of individuals (N) and Number of species (S)]
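Given path lengths ω between species and abundances x, the two quantitative indices can be computed directly. The sketch below assumes the usual pairwise forms: Δ as the average path length over all pairs of individuals (pairs from the same species contributing 0), and Δ* as the same sum averaged only over pairs of individuals from different species, as in equation (17.3). The three-species ω matrix and the counts are hypothetical.

```python
from itertools import combinations

def weighted_cross_sum(x, omega):
    """Sum over species pairs i<j of omega[i][j] * x[i] * x[j]."""
    return sum(omega[i][j] * x[i] * x[j] for i, j in combinations(range(len(x)), 2))

def taxonomic_diversity(x, omega):
    """Delta: mean path length between every pair of individuals in the sample."""
    n = sum(x)
    return weighted_cross_sum(x, omega) / (n * (n - 1) / 2)

def taxonomic_distinctness(x, omega):
    """Delta*: mean path length between pairs of individuals from different species."""
    pairs = sum(x[i] * x[j] for i, j in combinations(range(len(x)), 2))
    return weighted_cross_sum(x, omega) / pairs

# Hypothetical example: species 1 and 2 are close (omega = 50), species 3 is distant (100).
omega = [
    [0, 50, 100],
    [50, 0, 100],
    [100, 100, 0],
]
x = [2, 1, 1]  # abundances; N = 4

delta = taxonomic_diversity(x, omega)          # 400 / 6
delta_star = taxonomic_distinctness(x, omega)  # 400 / 5

# Single-level hierarchy: every between-species path is 100, so Delta / 100
# should equal the Simpson form (1 - sum p_i^2) / (1 - 1/N).
flat = [[0 if i == j else 100 for j in range(3)] for i in range(3)]
n = sum(x)
simpson_form = (1 - sum((c / n) ** 2 for c in x)) / (1 - 1 / n)
print(round(delta, 2), round(delta_star, 2), round(taxonomic_diversity(x, flat) / 100, 4), round(simpson_form, 4))
```

The last two printed values agree, which is the collapse to a Simpson-type index described for the single-level hierarchy.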
Note also that when the taxonomic tree collapses to a single-level hierarchy (all species in the same genus, say), Δ becomes

$$\Delta^{\circ} = \Big[\, 2 \sum\sum_{i<j} p_i p_j \Big] \Big/ \big(1 - N^{-1}\big), \quad \text{where } p_i = x_i / N$$

$$\Delta^{\circ} = \Big(1 - \sum_i p_i^2\Big) \Big/ \big(1 - N^{-1}\big) \qquad (17.2)$$

which is a form of Simpson diversity. The Simpson index is actually defined from the probability that any two individuals selected at random from a sample belong to the same species (Simpson, 1949). Δ is therefore seen to be a natural extension of Simpson, from the case where the path length between individuals is either 0 (same species) or 100 (different species) to a more refined scale of intervening relatedness values (0 = same species, 20 = different species in the same genera, 40 = different genera but same family, etc). It follows that Δ will often track Simpson diversity fairly closely. To remove the dominating effect of the species abundance distribution {x_i}, leaving a measure which is more nearly a pure reflection of the taxonomic hierarchy, Warwick and Clarke (1995a) proposed dividing Δ by the Simpson index Δ°, to give average taxonomic distinctness

$$\Delta^{*} = \Big[ \sum\sum_{i<j} \omega_{ij} x_i x_j \Big] \Big/ \Big[ \sum\sum_{i<j} x_i x_j \Big] \qquad (17.3)$$

Another way of thinking of this is as the expected taxonomic distance apart of any two individuals chosen at random from the sample, provided those two individuals are not from the same species.

Footnote (to the relatedness scale above): In addition, there is a relationship between Δ and Simpson indices computed at all higher taxonomic levels, as recently reported by Shimatani (2001).

A further form of the index, exploited greatly in what follows, takes the special case where quantitative data is not available and the sample consists simply of a species list (presence/absence data). Both Δ and Δ* reduce to the same coefficient

$$\Delta^{+} = \Big[ \sum\sum_{i<j} \omega_{ij} \Big] \Big/ \big[ S(S-1)/2 \big] \qquad (17.4)$$

where S, as usual, is the observed number of species in the sample and the double summation ranges over all pairs i and j of these species (i<j). Put simply, the average taxonomic distinctness (AvTD) Δ+ of a species list is the average taxonomic distance apart of all its pairs of species. This is a very intuitive definition of biodiversity, as average taxonomic breadth of a sample.

Sampling properties

For quantitative data, repeating the pairwise exercise (Fig. 17.1) of random subsampling of individuals from a single, large sample, Fig. 17.2a and b show that both
taxonomic diversity (Δ) and average taxonomic distinctness (Δ*) inherit the sample-size independence seen in the Simpson index, from which they are generalised. Clarke and Warwick (1998b) formalise this result by showing that, whatever the hierarchy or subsample size, Δ is exactly unbiased and Δ* is close to being so (except for very small subsamples). For non-quantitative data (a species list), the corresponding question is to ask what happens to the values of Δ+ for random subsamples of a fixed number of species drawn from the full list. Fig. 17.2c demonstrates that the mean value of Δ+ is unchanged, its exact unbiasedness in all cases again being demonstrated in Clarke and Warwick (1998b). This lack of dependence of Δ+ (in mean value) on the number of species in the sample has far-reaching consequences for its use in comparing historic data sets and other studies for which sampling effort is uncontrolled, unknown or unequal.

EXAMPLES: Ekofisk oil-field and Tees Bay soft-sediment macrobenthos

The earlier Fig. 14.4 demonstrated a change in the sediment macrofaunal communities around the Ekofisk oil-field {E}, out to a distance of about 3 km from the centre of drilling activity. This was only evident, however, from the multivariate (MDS and ANOSIM) analyses, not from univariate diversity measures such as Shannon H', where reduced diversity was only apparent up to a few hundred metres from the centre (Fig. 17.3a). The implication is that the observed community change resulted in no overall loss of diversity but this is not the conclusion that would have been drawn from calculating the quantitative average taxonomic distinctness index, Δ*. Fig. 17.3b shows a clear linear trend of increase in Δ* with (log) distance from the centre, the relationship only breaking down into a highly variable response for the strongly impacted sites, within 100m of the drilling activity.

A further example, from the coastal N Sea, is given by a time-series of macrobenthic samples, with data averaged over 6 locations in Tees Bay, UK, ({V}, Warwick et al, in press). Samples were taken in March and September for each of the years 1973 to 1996, and Fig. 17.4 shows the September inter-annual patterns for four (bio)diversity measures. Notable is the clear increase in Shannon diversity at around 1987/88 (Fig. 17.4b), coinciding with significant widescale changes in the N Sea planktonic system which have been reported elsewhere (e.g. Reid et al, in press). However, Shannon diversity is very influenced here by the high numbers of a single abundance dominant (Spiophanes bombyx), whose decline after 1987 led to greater equitability in the quantitative species diversity measures. A more far-reaching change, representative of what was happening to the community as a whole, is indicated by looking at the taxonomic relatedness statistics based only on presence/absence data. Use of simple species lists has the advantage here of ensuring that no one species can dominate the contributions to the index. Average taxonomic distinctness (Δ+) is seen to show a marked decline at about the time of this N Sea regime shift (Fig. 17.4c), indicating a biodiversity loss, a very different (and more robust) conclusion than that drawn from Shannon diversity.

[Fig. 17.4: four panels, a) Richness, S; b) Shannon, H'; c) AvTD, Δ+; d) VarTD, Λ+; x-axis: Year (Sept sampling each year)]

Fig. 17.4. Variations inter-annually in Tees Bay macrobenthos {V}. (Bio)diversity indices for Tees Bay areas combined, from sediment samples in September each year, over the period 1973-96, straddling a major regime-shift in N Sea ecosystems, about 1987. a) Richness, S; b) Shannon, H'; c) Average taxonomic distinctness, Δ+, based on presence/absence and reflecting the mean taxonomic breadth of the species lists; d) Variation in taxonomic distinctness, Λ+ (also pres/abs), reflecting unevenness in the taxonomic hierarchy.

OTHER RELATEDNESS MEASURES

The remainder of this chapter deals only with data in the form of a species list for a locality (presence/absence data). There is a substantial literature on measures incorporating, primarily, phylogenetic relationships amongst species (see references in the review-type papers of Faith, 1994, Humphries et al 1995). The context is conservation biology, with the motivation being the selection of individual species, or sets of species (or reserves), with the highest conservation priority, based on the unique evolutionary history they represent, or their complementarity to existing well-conserved species (or reserves). Warwick and Clarke (2001) draw a potentially useful distinction of terminology between this individual species-focused conservation context and the use, as in this chapter, of relatedness information to monitor differences in community-wide patterns in relation to changing environmental conditions. They suggest that the term taxonomic/phylogenetic distinctiveness (of a species) is reserved for weights assigned to individual species, reflecting their priority for conservation; whereas the taxonomic/phylogenetic distinctness (of a community) summarises features of the overall hierarchical structure of an assemblage (the spread, unevenness etc. of the classification tree).

turnover or morphological richness, it is an appealingly simple statistic. Unfortunately, Fig. 17.5 demonstrates some of the disadvantages of using these measures in a distinctness context. The figure compares only samples (lists) with the same number of species (7), at four hierarchical levels (say, species within genera within families, all in one order), so that each step length is set to 33.3. Fig. 17.5b and c have the same tree topology, yet we should not consider them to have the same average (or total) distinctness, since each species is more taxonomically similar to its neighbours in b than c (reflected in Δ+ values of 33.3 and 66.6 respectively). Similarly, contrasting Fig. 17.5d and e, the total PD is clearly identical, the sum of all the branch lengths being 333 in both cases, but this does not reflect the more equitable distribution of species amongst higher taxa in d than e (Δ+ does, however, capture this intuitive element of biodiversity, with respective values of 52 and 43).

[Fig. 17.5: six example classification trees, panels a) to f), each with 7 species; levels Species, Genus, Family, Order]

Fig. 17.5. a)-f) Example taxonomic hierarchies for presence/absence data on 7 species (i.e. of fixed species richness), with 4 levels and 3 step lengths (thus each of 33.3, though the third step only comes into play for plot f). Φ+: average phylogenetic diversity, Δ+: average taxonomic distinctness, Λ+: variation in TD. The plots show, inter alia: the expected 'biodiversity' decrease from a) to d) and e) to b) (in both Δ+ and Φ+), and from d) to e) (but only in Δ+, not in Φ+); unevenness of f) in relation to c), reflected in increased Λ+ though unchanged Δ+.
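For species lists, Δ+ (equation 17.4) and the variation in TD, Λ+ (the variance of the pairwise path lengths about Δ+), need only a classification table. The sketch below assumes equal step lengths scaled so that the longest possible path is 100, and represents each species by a tuple of its higher taxa from genus upwards; the taxon names and the three example lists are hypothetical, chosen so that the first two reproduce the Δ+ values of 33.3 and 66.6 quoted for Fig. 17.5b and c.

```python
from itertools import combinations

def path_length(a, b, step):
    """Path length between two species: `step` times the first level at which their taxa agree."""
    for level, (taxon_a, taxon_b) in enumerate(zip(a, b), start=1):
        if taxon_a == taxon_b:
            return level * step
    raise ValueError("species share no rank; add a common top level")

def pairwise_distances(taxa):
    """All omega_ij for i<j; taxon names are assumed unique across the whole tree."""
    step = 100 / len(taxa[0])  # equal steps, longest path scaled to 100
    return [path_length(a, b, step) for a, b in combinations(taxa, 2)]

def avtd(taxa):
    """Average taxonomic distinctness, Delta+."""
    d = pairwise_distances(taxa)
    return sum(d) / len(d)

def vartd(taxa):
    """Variation in taxonomic distinctness, Lambda+."""
    d = pairwise_distances(taxa)
    mean = sum(d) / len(d)
    return sum((w - mean) ** 2 for w in d) / len(d)

# Each entry is one species: (genus, family, order).
one_genus = [("G1", "F1", "O1")] * 7                        # 7 species in a single genus
one_family = [(f"G{k}", "F1", "O1") for k in range(1, 8)]   # 7 genera in a single family
uneven = [("G1", "F1", "O1"), ("G1", "F1", "O1"), ("G2", "F1", "O1"), ("G3", "F2", "O1")]

for name, taxa in [("one genus", one_genus), ("one family", one_family), ("uneven", uneven)]:
    print(name, round(avtd(taxa), 1), round(vartd(taxa), 1))
```

The first two lists have zero Λ+ because every pair of species is the same distance apart; the third mixes short and long paths, so its Λ+ is positive even though its Δ+ is unremarkable.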
a better equivalent to average taxonomic distinctness (AvTD, Δ+) would be average phylogenetic diversity (AvPD), defined as the ratio:

$$\Phi^{+} = PD / S \qquad (17.5)$$

This is a very intuitive summary of average distinctness, being the contribution that each species makes on average to the total tree length, but unfortunately it does not have the same lack of dependence on sampling effort that characterises Δ+. Fig. 17.2e (and the later Fig. 17.9b) show that its value decreases markedly as the number of species (S) increases, making it misleading to compare AvPD values across studies with differing levels of sampling effort.

'Total' versus 'average' measures

Note the distinction here between total and average distinctness measures. AvPD (Φ+) is the analogue of AvTD (Δ+), both being ways of measuring the average taxonomic breadth of an assemblage (a species list), for a given number of species. Δ+ will give the same value (on average) whatever that number of species; Φ+ will not. Total PD measures the total taxonomic breadth of the assemblage and has a direct analogue in total taxonomic distinctness:

$$TTD = S \cdot \Delta^{+} = \sum_i \Big[ \Big( \sum_{j \ne i} \omega_{ij} \Big) \Big/ (S - 1) \Big] \qquad (17.6)$$

Explained in words, this is the average taxonomic distance from species i to every other species, summed over all species, i = 1, 2, ..., S. (Taking an average rather than a sum gets you back to AvTD, Δ+.) TTD may well be a useful measure of total taxonomic breadth of an assemblage, as a modification of species richness which allows for the species inter-relatedness, so that it would be possible, for example, for an assemblage of 20 closely-related species to be deemed less 'rich' than one of 10 distantly-related species. In general, however, like total PD, total TD will tend to track species richness rather closely, and will only therefore be useful for tightly controlled designs in which effort is identical for the samples being compared, or sampling is sufficiently exhaustive for the asymptote of the species-area curve to have been reached (i.e. comparison of censuses rather than samples).

Variation in TD

Finally, a comparison of Fig. 17.5c and f shows that the scope for extracting meaningful biodiversity indices (unrelated to richness) from simple species lists has not yet been exhausted. Average taxonomic distinctness is the same in both cases (Δ+ = 66.6) but the tree constructions are very different, the former having consistent, intermediate taxonomic distances between pairs of species, in comparison with the latter's disparate range of small and large values. This can be conveniently summarised in a further statistic, the variance of the taxonomic distances {ω_ij} between each pair of species i and j, about their mean value Δ+:

$$\Lambda^{+} = \Big[ \sum\sum_{i<j} (\omega_{ij} - \Delta^{+})^2 \Big] \Big/ \big[ S(S-1)/2 \big] \qquad (17.7)$$

termed the variation in taxonomic distinctness (VarTD). Its behaviour in a practical application will be examined later in the chapter, but note for the moment that it, too, appears to have the desirable sampling property of (approximate) lack of dependence of its mean value on sampling effort (see Fig. 17.2f).

and reductions from this level, at one place or time, can potentially be interpreted as loss of biodiversity.

Testing framework
[Fig. 17.7: two histograms, a) Exe sands, m = 122 species, Δ+ = 79.1; b) Clyde sands, m = 112 species, Δ+ = 74.1; x-axis: Average Taxonomic Distinctness Δ+ (74 to 82)]

Fig. 17.7. UK regional study, free-living nematodes {U}. Histograms of simulated AvTD, from 999 sublists drawn randomly from a UK master list of 395 species. Sublist sizes of a) m=122, b) m=112, corresponding to the observed number of species in the Exe (ES) and Clyde (C1) surveys. True Δ+ also indicated: the Exe value is central but the null hypothesis that AvTD for the Clyde equates to that for the UK list as a whole is clearly rejected (p<0.001 or 0.1%).
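The simulation behind histograms such as Fig. 17.7 can be sketched as follows: draw many random sublists of the observed size m from a master list, compute Δ+ for each, and see where the observed value falls. The sketch is illustrative only. The master list is an artificial one (60 species in 10 families), the observed list is deliberately narrow, and the one-sided convention p = (number of simulated values ≤ observed + 1) / (number of simulations + 1) is an assumption rather than a statement of how PRIMER reports significance.

```python
import random
from itertools import combinations

def avtd(taxa):
    """Delta+ for a species list; each species is (genus, family, order), equal step lengths."""
    step = 100 / len(taxa[0])
    total, pairs = 0.0, 0
    for a, b in combinations(taxa, 2):
        # First level (1 = genus) at which the two species fall in the same taxon.
        level = next(k for k, (ta, tb) in enumerate(zip(a, b), start=1) if ta == tb)
        total += level * step
        pairs += 1
    return total / pairs

def simulate_avtd(master, m, n_sim=999, seed=1):
    """Delta+ for n_sim sublists of m species drawn at random, without replacement."""
    rng = random.Random(seed)
    return [avtd(rng.sample(master, m)) for _ in range(n_sim)]

def lower_tail_p(observed, simulated):
    """One-sided p-value for a reduced Delta+ (assumed convention, see lead-in)."""
    at_or_below = sum(1 for value in simulated if value <= observed)
    return (at_or_below + 1) / (len(simulated) + 1)

# Artificial master list: 60 species, 2 per genus, 6 per family, one order.
master = [(f"G{k // 2}", f"F{k // 6}", "O1") for k in range(60)]
observed_list = master[:12]  # hypothetical survey: 12 species from only two families

simulated = simulate_avtd(master, m=len(observed_list))
observed = avtd(observed_list)
p_value = lower_tail_p(observed, simulated)
mean_simulated = sum(simulated) / len(simulated)
print(round(observed, 1), round(avtd(master), 1), round(mean_simulated, 1), p_value)
```

Repeating `simulate_avtd` over a range of m and taking the 2.5 and 97.5 percentiles of each set of simulated values gives the limits of a funnel plot; the mean of the simulated values stays close to the master-list Δ+ whatever m is chosen, because Δ+ is unbiased under random sub-listing.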
[Fig. 17.8: funnel plot; x-axis: Number of species (m), 0 to 250; y-axis: Average Taxonomic Distinctness Δ+, 65 to 85; lines: Δ+ for the UK list, simulated 95% limits; points labelled by study code]

Fig. 17.8. UK regional study, free-living nematodes {U}. Funnel plot for simulated AvTD, as in Fig. 17.7, but for a range of sublist sizes m=10, 15, 20, ..., 250 (x-axis). Crosses, and thick lines, indicate limits within which 95% of simulated Δ+ values lie; the thin line indicates mean Δ+ (the AvTD for the master list), which is not a function of m. Points are the true AvTD (y-axis) for the 14 location/habitat studies (see Fig. 17.6 for codes), plotted against their sublist size (x-axis).
for low numbers of species. Superimposing the real Δ+ values for the 14 habitat/location combinations, five features are apparent:

1) The impacted areas of Clyde, Liverpool Bay, Fal and, to a lesser extent, Tamar, are all seen to have significantly reduced average distinctness, whereas pristine locations in the Exe and Scilly have Δ+ values close to that of the UK master list.

2) Unlike species richness (and in keeping with the 'desirability criteria' stated earlier), Δ+ does not appear to be strongly dependent on habitat type: Exe sand and mud habitats have very different numbers of species but rather centrally-placed distinctness; Scilly algal and sand habitats have near-identical Δ+ values. Warwick and Clarke (1998) also demonstrate a lack of habitat dependence in Δ+ from a survey of Chilean nematodes (data of W Wieser).

3) There is apparent monotonicity of response of the index to environmental degradation (also in keeping with another initial criterion). To date, there is no evidence of average taxonomic distinctness increasing in response to stress.

4) In spite of the widely differing lengths of their species lists, it is notable that the two Clyde studies (C1, C2) return rather similar (depressed) values for Δ+.

5) There is no evidence of any empirical relation in the (Δ+, S) scatter plot. We know from the sampling theory that the mechanics of calculating Δ+ does not lead to an intrinsic relationship between the two but that does not prevent there being an observed correlation; the latter would imply some genuine assemblage structuring which predisposed large communities to be more (or less) 'averagely distinct' than small communities. The lack of an intrinsic, mechanistic correlation greatly aids the search for such interesting observational relationships (see also the later discussion on AvTD, VarTD correlations). The same cannot be said for phylogenetic diversity, PD. Fig. 17.9a shows the expected near-linear relation between total PD and S for these meiofaunal studies (total TD and S would have given a similar picture) but, more significantly, Fig. 17.9b bears out the previous statements about the dependence also of average PD (Φ+) on S. This intrinsic relationship, shown by the declining curve for the expected value of Φ+ as a function of the number of species in the list, contrasts markedly with the constant mean line for Δ+ in Fig. 17.8. Nothing can therefore be read into an observed negative correlation of Φ+ and S in a practical study: such a relationship would be likely, as here, to be purely mechanistic, i.e. artefactual.

AvTD is therefore seen to possess many of the features listed at the beginning of the chapter as desirable in a biodiversity index - a function, in part, of its attractive mathematical sampling properties (for formal statistical results on unbiasedness and variance structure see Clarke and Warwick, 1998b, 2001). Many questions
[Fig. 17.9: a) total PD against Number of species; b) average PD (Φ+) against Number of species, with the simulated mean Φ+ shown as a curve; points labelled by study code]
There is a wealth of taxonomic detail to exploit in this case. The analysis uses a 14-level classification (Fig. 17.11), based on phylogenetic information, compiled by J.D. Reynolds (Univ E Anglia), primarily from Nelson (1994) and McEachran and Miyake (1990). The distinctness structure of this master list, and its AvTD of Δ+ = 80.1, for all groundfish species that could be reliably sampled and identified, becomes the standard against which the species lists from the various quarter-rectangles are assessed.

Funnel plot

Fig. 17.12 displays the resulting funnel plot of the range of Δ+ values expected from sublists of size 5 to 35, repeating the mean, lower and upper limits in sub-plots of observed Δ+ values for the 9 sea areas. Δ+ is clearly seen to be reduced in some areas, particularly 6, 8 and 9, whilst remaining at ‘expected’ levels in others. Rogers et al (1999) discuss possible explanations for this, noting the contribution made by the spatial pattern of elasmobranchs, a taxonomic group they argue may be particularly susceptible to disturbance by commercial trawling, because of their life history traits.

Weighting of step lengths

Many of the fine-scale phylogenetic groupings in Fig. 17.11 are utilised comparatively rarely (e.g. subgenera only within Raja, tribe only within the Pleuronectidae
[Figure: classification tree with levels labelled Class, Superorder, Order, Superfamily, Subfamily, Genus, Species.]
Fig. 17.11. Quarter ICES rectangles, groundfish surveys {Q}. 14-level classification (phylogenetically-based) used for the construction of taxonomic distances between 93 demersal fish species, those that could be reliably sampled and identified for the 277 rectangles in this N European study.
etc), and the standard assumption that all step lengths between taxonomic levels are given equal weight (7.69, in this case) may appear arbitrary. For example, if a new category is defined which is not actually used, then the resulting change in all the step lengths, in order to accommodate it, seems unwarranted. The natural alternative here is to make the step lengths proportional to the extent of group melding that takes place, larger steps corresponding to larger decreases in taxon richness. A null category would then add no additional step length. Table 17.1 shows the resulting taxonomic distances {ω(0)} between species connected at the differing levels, contrasted with the standard, equal-stepped, distances {ω}. Obviously, both are standardised so that the largest distance in the tree (between species in the different classes Chondrichthyes and Osteichthyes) is set to 100.

Fig. 17.13 demonstrates the minimal effect these revised weights have on the calculation of average taxonomic distinctness, Δ+. It is a scatter plot of Δ+(0) (revised weights) against Δ+ (standard, equal-stepped, distances) for the 277 quarter-rectangle species lists. The relation

Table 17.1. Quarter ICES rectangles, groundfish surveys {Q}. The 13 taxonomic/phylogenetic categories (k) used in the groundfish study, the standard taxonomic distances {ω_k} and an alternative formulation {ω_k(0)} based on taxon richness {s_k} at each level. ω_k (or ω_k(0)) is the path length between species from different taxon group k but the same group k+1.

   k   Taxon           s_k     ω_k   ω_k(0)
   1   Species          93     7.7      1.3
   2   Sub-genus        89    15.4      6.9
   3   Genus            72    23.1      8.9
   4   Tribe            67    30.8     12.5
   5   Sub-family       59    38.5     21.4
   6   Family           41    46.2     22.9
   7   Super-family     39    53.8     27.4
   8   Sub-order        33    61.5     44.4
   9   Order            14    69.2     54.9
  10   Series            9    76.9     61.4
  11   Super-order       7    84.6     65.6
  12   Sub-division      6    92.3     85.3
  13   Class             2   100.0    100.0
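
The two weighting schemes of Table 17.1 can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and the richness-based rule used here (each step proportional to the fractional drop in taxon richness, (s_k - s_{k+1})/s_k, with a single group assumed above the top level) is an inference that happens to reproduce the tabulated ω_k(0) values to one decimal place; the text itself states only that larger steps correspond to larger decreases in richness.

```python
def cumulative_distances(step_weights):
    """Accumulate per-level step weights into path lengths, scaled so the largest is 100."""
    total = sum(step_weights)
    distances, running = [], 0.0
    for w in step_weights:
        running += w
        distances.append(100.0 * running / total)
    return distances


def equal_steps(s_k):
    """Standard scheme: every taxonomic level contributes the same step length."""
    return cumulative_distances([1.0] * len(s_k))


def richness_steps(s_k):
    """Assumed alternative: step k is the fractional drop in richness from level k to k+1.

    A 'null' category (no drop in richness) therefore adds a zero step.
    A single all-embracing group (richness 1) is assumed above the top level.
    """
    above = list(s_k[1:]) + [1]
    return cumulative_distances([(lo - hi) / lo for lo, hi in zip(s_k, above)])


# Taxon richness s_k from Table 17.1 (k = 1 Species ... k = 13 Class).
S_K = [93, 89, 72, 67, 59, 41, 39, 33, 14, 9, 7, 6, 2]

if __name__ == "__main__":
    for k, (w, w0) in enumerate(zip(equal_steps(S_K), richness_steps(S_K)), start=1):
        print(f"{k:2d}  {w:6.1f}  {w0:6.1f}")
```

With equal steps each level adds 100/13 = 7.69, as quoted in the text; under the assumed richness rule the first step is small because only 4 of the 93 species merge at sub-genus level.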
For the UK nematode study, Fig. 17.14 displays the funnel plot for VarTD (Λ+) which is the companion to Fig. 17.8 (for AvTD, Δ+). It is constructed in the same way, by many random selections of sublists of a fixed size m from the UK master list of 395 nematode species, and recomputation of Λ+ for each subset. The resulting histograms are typically more symmetric than for Δ+, as seen by the 95% probability limits for ‘expected’ Λ+ values, across the full range of sublist sizes: m = 10, 15, 20, 25, ..., shown in Fig. 17.14. Three features are noteworthy:

1) The simulated mean Λ+ (thin line in Fig. 17.14) is again largely independent of sublist size, only declining slightly for very short lists (and the slight bias is dwarfed by the large uncertainty at these low sizes). Clarke and Warwick (2001) derive an exact formula for the sampling bias of Λ+ and show, generally, that it will be negligible. This again has important practical implications because it allows Λ+ to be meaningfully compared across (historic) studies in which sampling effort is uncontrolled.

2) The various UK habitat/location combinations all fall within ‘expected’ ranges, with the interesting exception of the Scilly data sets. These have significantly higher VarTD values, as discussed above.

3) Λ+ therefore appears to be extracting independent information, separately interpretable from Δ+, about the taxonomic structure of individual data sets. This assertion is testable by a bivariate approach.

JOINT (AvTD, VarTD) ANALYSES

The histogram and funnel plots of Figs. 17.7 and 17.8 are univariate analyses, concentrating on only one index at a time. Also possible is a bivariate approach in which (Δ+, Λ+) values are considered jointly, both in respect of the observed outcomes from real data sets and their expected values under subsampling from a master species inventory. Fig. 17.15 shows the results of a large number of random selections of 100 species from the 395 in the UK nematode list: each selection gives rise to an (AvTD, VarTD) pair and these are graphed in a scatter plot (Fig. 17.15a). Their spread defines the ‘expected region’ (rather than range) of distinctness behaviour, for a sublist of 100 species. Superimposed on the same plot are the observed (Δ+, Λ+) pairs for three of the studies with list sizes of about that order: all three (Clyde, Liverpool Bay and Scilly) are seen to fall outside the expected structure, though in different ways, as previously discussed.

‘Ellipse’ plots

It aids interpretation to construct the bivariate equivalent of the univariate 95% probability limits in the histogram or funnel plots, namely a 95% probability region within which (approximately) 95% of the simulated values fall. An adequate description here is provided by the ellipse from a fitted bivariate normal distribution to separately transformed scales for Δ+ and Λ+.
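
The random-sublist simulation that underlies the funnel plots and the (AvTD, VarTD) scatter can be sketched as below. This is a minimal illustration under stated assumptions, not the PRIMER routine: species are represented as tuples of taxon names from the highest level down to species, step lengths are equal, AvTD and VarTD are taken as the mean and variance of path lengths over all species pairs, and the 95% limits are simple empirical percentiles. All function names are hypothetical.

```python
import itertools
import random


def taxonomic_distance(a, b):
    """Path length between two species.

    Each species is a tuple of taxon names ordered from the highest level
    (e.g. class) down to species. Equal steps are assumed, scaled so that
    species differing at the top level are 100 apart.
    """
    levels = len(a)
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return 100.0 * (levels - i) / levels
    return 0.0


def avtd_vartd(species):
    """Mean (AvTD) and variance (VarTD) of path lengths over all species pairs."""
    d = [taxonomic_distance(a, b) for a, b in itertools.combinations(species, 2)]
    avtd = sum(d) / len(d)
    vartd = sum((x - avtd) ** 2 for x in d) / len(d)
    return avtd, vartd


def funnel_limits(master, m, index=0, n_sim=1000, seed=0):
    """Simulated mean and empirical 95% limits for sublists of size m.

    index=0 gives limits for AvTD, index=1 for VarTD.
    """
    rng = random.Random(seed)
    sims = sorted(avtd_vartd(rng.sample(master, m))[index] for _ in range(n_sim))
    lower = sims[int(0.025 * n_sim)]
    upper = sims[int(0.975 * n_sim) - 1]
    return sum(sims) / n_sim, lower, upper
```

Calling `funnel_limits` over a range of m values traces out the funnel; an observed (AvTD, VarTD) pair from `avtd_vartd` on a real species list can then be compared with the limits for its own list size.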
[Figure: panels a) and b), each with x-axis Average Taxonomic Distinctness Δ+ (74 to 82).]
Fig. 17.15. UK regional study, free-living nematodes {U}. a) Scatter plot of (AvTD, VarTD) pairs from random selections of m = 100 species from the UK nematode list of 395; also superimposed are three observed points: Clyde (C1), Liverpool Bay (L) and Scilly (S), all falling outside ‘expectation’. b) Probability contours (back-transformed ellipses) containing approximately 95, 90, 75 and 50% of the simulated values. Both plots are based on 1000 simulations though only 500 points are displayed, for clarity.
AvTD in particular needs a reverse power transform to eliminate the left-skewness though, as previously noted, any transformation of VarTD can be relatively mild, if needed at all. Clarke and Warwick (2001) discuss the fitting procedure in detail¶ and Fig. 17.15b shows its success in generating convincing probability contours, containing very close to the nominal levels of 50, 75, 90 and 95% of simulated data points. In the normal convention, the ‘expected region’ is taken as the outer (95%) contour, which is an ellipse on the transformed scales, though typically ‘egg-shaped’ when back-transformed to the original (Δ+, Λ+) plot.

¶ Accomplished by the PRIMER TAXDTEST routine, which automatically carries out the simulations and transformation/fitting of bivariate probability regions to obtain (transformed) ‘ellipse’ plots, for specified sublist sizes, on which real data pairs (Δ+, Λ+) may be superimposed.

A different region needs to be constructed for each sublist size or, in practice, for a range of values of m straddling the observed sizes. Though these could all be displayed on a single plot, it may improve clarity to separate them in groups of two or three, as in Fig. 17.16. The conclusions are largely unchanged from those drawn earlier, for the separate funnel plots, and it would be reasonable to query the advantage of a bivariate approach here. However, it has at least three merits:

1) The procedure automatically compensates for the repeated testing inherent in two separate, univariate tests.

2) The ‘failure to reject’ region of the null hypothesis, inside the simulated 95% probability contour, is not rectangular, as it would be for two separate tests. This opens the possibility for other faunal groups, where simulated Δ+ and Λ+ values may be negatively correlated (as appears to happen for components of the macrobenthos, Clarke and Warwick, 2001), that significance could follow from the combination of moderately low AvTD and VarTD values, where neither of them on their own would indicate rejection.

3) It aids interpretation of spatial biodiversity patterns to know whether there is any intrinsic, artefactual correlation to be expected between the two indices, resulting from the fact that they are both calculated from the same set of data. Here, Fig. 17.15 shows emphatically that no such internal correlation is to be expected (though, as just commented, the independence of Δ+ and Λ+ is not a universal result, and needs to be examined by simulation for each new master list). Yet the empirical correlation between Δ+ and Λ+ for the 14 studies is not zero but large and positive (Fig. 17.17). This implies a genuine correlation from location to location in these two assemblage features, which it is legitimate to interpret. The suggestion (Clarke and Warwick, 2001) is that pollution may be connected with a loss both of the normal wide spread of higher taxa (reduced Δ+), and that the higher taxa lost are those with a simple subsidiary structure, represented only by one or two species, genera or families, leaving a more balanced tree (reduced Λ+).
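
The idea of a bivariate 95% probability region can be sketched as follows. This is a simplified illustration, not the TAXDTEST procedure: it fits a bivariate normal directly to simulated (AvTD, VarTD) pairs and omits the separate power transforms of the two scales that the text describes, so the region here is a true ellipse on the original axes. It uses the standard result that, for a bivariate normal, the squared Mahalanobis distance follows a chi-squared distribution on 2 degrees of freedom, whose quantile for probability p is -2 ln(1 - p). The function names are hypothetical.

```python
import math


def fit_bivariate_normal(points):
    """Sample mean and covariance (sxx, sxy, syy) of a list of (x, y) pairs."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def inside_region(point, mean, cov, prob=0.95):
    """True if point lies inside the fitted ellipse containing a fraction prob."""
    (mx, my), (sxx, sxy, syy) = mean, cov
    dx, dy = point[0] - mx, point[1] - my
    det = sxx * syy - sxy ** 2
    # Squared Mahalanobis distance, using the explicit inverse of the 2x2 covariance.
    d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    # Chi-squared quantile on 2 degrees of freedom.
    return d2 <= -2.0 * math.log(1.0 - prob)
```

An observed pair falling outside the region for its sublist size departs from expectation at roughly the 5% level, in a single test rather than two separate univariate ones.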
[Figure: four panels with x-axis Average Taxonomic Distinctness Δ+; points are labelled by study code with list size in brackets. Legend entries recoverable from the plot: C2: Clyde 2; EM: Exe mud; FA: Fal; FO: Forth; N: Northumberland; SA: Scillies algae; SS: Scillies sand; TA: Tamar.]
Fig. 17.16. UK regional study, free-living nematodes {U}. ‘Ellipse’ plots of 95% probability regions for (AvTD, VarTD) pairs, as for Fig. 17.15 but for a range of sublist sizes: a) m = 40, 50; b) m = 60, 80; c) m = 100, 115; d) m = 120, 160. The observed (Δ+, Λ+) values for the 14 location/habitat studies are superimposed on the appropriate plot for their particular species list size (given in brackets). As seen in the separate funnel plots (Figs. 17.8 and 17.14), Clyde, Liverpool Bay, Fal (borderline) and all the Isles of Scilly data sets depart significantly from expectation.
oilfields, and Ward et al (submitted) for a latitudinal study of pelagic copepods. Two non-marine examples are D. Danielopol (pers. comm.) for groundwater copepods and Shimatani (2001) for forest stands. A bivariate biodiversity example is given by Warwick and Light (in press), who use ‘ellipse’ plots of expected (Δ+, Λ+) values, from live faunal records of the Isles of Scilly, to examine whether easily sampled bivalve and gastropod death assemblages could be considered representative of the taxonomic distinctness structure of the live fauna.

Caveats

In view of its limited application to date, too much should not be claimed for this methodology!¶ It is surprising that anything sensible can be said about diversity at all, for data consisting simply of species lists, and arising from unknown or uncontrolled sampling effort (which usually renders it impossible to read anything into the relative size of these lists). Yet, much of the later part of this chapter suggests that not only can we find one index (AvTD) which may be validly compared across such studies, and which captures an intuitive sense of what biodiversity means, but we can also find a second one (VarTD), with similarly attractive statistical properties, and which seems to capture (in some cases, at least) an entirely independent attribute of biodiversity structure.

¶ A quote from Dr Johnson seems apt: “It is not done well; but you are surprised to find it done at all”! (Boswell’s ‘Life of Johnson’, 1763)

Nonetheless, it is clear that controlled sampling designs, carried out in a strictly uniform way across different spatial, temporal or experimental conditions, must provide additional, meaningful, comparative diversity information (on richness, primarily) that Δ+ and Λ+ are designed to ignore. Even here, though, concepts of taxonomic relatedness can expand the relevance of richness indices: rather than use S, or one of its variants (see Chapter 8), total taxonomic distinctness (TTD) or total phylogenetic diversity (PD), see pages 17-5 and 17-6, capture the richness of an assemblage in terms of its number of species and whether they are closely or distantly related.

Sensitivity and robustness

Returning for a moment to the quantitative form of average distinctness, equation (17.3), the Ekofisk oil field study suggested that such relatedness measures may have a greater sensitivity to disturbance events than is seen with species-level richness or evenness indices (Warwick and Clarke, 1995). This suggestion was not borne out by subsequent oil-field studies (Somerfield et al, 1997), particularly where the impact was less sustained, the data collection at a less extensive level and hence the gradients more subtly entwined with natural variability. But it would be a mistake to claim sensitivity as a rationale for this approach: there is much empirical evidence that the best way of detecting subtle community shifts arising from environmental impacts is not through univariate indices at all, but by non-parametric multivariate display and testing (Chapter 14). The difficulty with the multivariate techniques is that, since they match precise species identities through the construction of similarity coefficients, they can be sensitive to wide scale differences in habitat type, geographic location (and thus species pool) etc.

Though independent of particular species identities, many of the traditional univariate indices have their own sensitivities, to habitat type, dominant species and sampling effort differences, as we have discussed. The general point here is that robustness (to sampling details) and sensitivity (to impact) are usually conflicting criteria. What is properly claimed for average taxonomic distinctness is not sensitivity but:

a) relevance - it is a genuine reflection of biodiversity loss, gain, or neither (rather than recording simply a change of assemblage composition), and one that appears to respond in a monotonic way to impact;

b) robustness - it can be meaningfully compared across studies from widely separated locations, with few (or even no) species in common, from different habitats, using data in presence/absence form (and thus not sensitive to dominant species), and with different levels of sampling effort. This makes its natural use the comparison of regional/global studies and/or historic data sets.

Taxonomic artefacts

A natural question is the extent to which relatedness indices are subject to taxonomic artefacts. Linnean hierarchies can be inconsistent in the way they define taxonomic units across different phyla, for example. This concern can be addressed on a number of levels. As suggested earlier, the concept of mutual distinctness of a set of species is not constrained to a Linnean classification. The natural metric may be one of genetic distance (e.g. Nei, 1996) or that from a soundly-based phylogeny combining molecular approaches with more traditional morphology. The Linnean classification clearly gives a discrete approximation to a more continuous distinctness measure, and this is why it is important to establish that the precise weightings given to the
step lengths between taxonomic levels are not critical to the relative values that the index takes, across the studies being compared. Nonetheless, it is a legitimate concern that a cross-phyletic distinctness analysis could represent a simple shift in the balance of two major phyla as a decrease in biodiversity, not because the phylum whose presences are increasing is genuinely less (phylogenetically) diverse but because its taxonomic sub-units have been arbitrarily set at a lower level. Such taxonomic artefacts could be readily examined by computing, for example, the (AvTD, VarTD) structure across different phyla in a standard species catalogue (e.g. Howson, 1987). The pragmatic approach, as illustrated here for the UK nematode study and the groundfish data, is to work within a well-characterised, reasonably taxonomically coherent group.

Master species list

Concerns about the precise definition of the master list (e.g. its biogeographic range or habitat specificity) also naturally arise. Note, however, that the existence of such a wide-scale inventory is not a central requirement, more of a secondary refinement. It is not used in constructing and contrasting the values of Δ+ for individual samples, and only features in two ways in these analyses:

1) In the funnel plots (Figs. 17.8, 17.12, 17.14), location of the points does not require a master species list, the latter being used only to display the background reference of the mean value and limits that would be expected for samples drawn at random from such an inventory. In Fig. 17.12, in fact, the limits are not even that relevant since they apply to single samples rather than, for example, to the mean of the tens of samples plotted for each sea area. The most useful plot for interpretation here is simply a standard means plot of the observed mean Δ+ and its 95% confidence interval, calculated from the replicates for each sea area (see Rogers et al, 1999 and Warwick and Clarke, 2001).

2) In Table 17.1 and Fig. 17.13, the master species list is employed to calculate step lengths in a revised form of Δ+, weighting by taxon richness at the different hierarchical levels. The existence of a master inventory makes this procedure more appealing, since if the taxon richness weighting was determined only by the samples to hand, the index would need to be adjusted as each new sample (containing further species) was added. The message of this chapter, however, is that the additional complication of adjusting the weights in Δ+ for differences in taxon richness is unnecessary. Constant step lengths appear to be adequate.

The inventory is therefore only used for setting a background context, the theoretical mean and funnel limits. Various lists could sensibly be employed: global, local geographic, biogeographic provinces, or simply the combined species list of all the studies being analysed. The addition of a small number of newly-discovered species to the master inventory is unlikely to have a detectable effect on the overall mean and funnel for Δ+. If these are located in the taxonomic tree at random with respect to the existing taxa (rather than all belonging to the same high or low order group) they will have little or no effect on the theoretical mean Δ+. This, of course, is one of the advantages of using an index of average rather than total taxonomic distinctness.

It also makes clear what the limitations are to the validity of Δ+ comparisons. Whilst many marine community studies seem to consist of the low-level (species or genera) identifications which are necessary for meaningful computation of Δ+, there are always some taxa that cannot be identified to this level. There is no real difficulty here, since Δ+ is always used in a relative manner, provided these taxa are treated in the same way in all samples (e.g. omitted). The ability to impose taxonomic consistency, by suitable omissions or regroupings, is clearly an important caveat on the use of taxonomic distinctness for historic or widely-sourced data sets. Where such conditions can be met, however, we believe that these (and possibly other) measures based on taxonomic relatedness have a promising future role in biodiversity assessment in relation to global change and at global scales.
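
The means plot suggested above for replicated samples (an observed mean Δ+ with a 95% confidence interval per sea area) rests on a calculation that can be sketched as follows. This is an illustration under stated assumptions: the function name is hypothetical, and a normal quantile is used for the interval, which is reasonable only for the ‘tens of samples’ per area mentioned in the text; with few replicates a t quantile would give a wider interval.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_and_ci(values, level=0.95):
    """Mean of replicate index values with a normal-approximation confidence interval."""
    n = len(values)
    m = mean(values)
    se = stdev(values) / math.sqrt(n)           # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for level=0.95
    return m, m - z * se, m + z * se
```

Applied to the replicate Δ+ values of each sea area in turn, this gives the points and error bars of the means plot; the master list is not needed for this step.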
Appendix 1
page A 1-1
The following is a list of all (real) data sets used as examples in the text, where they are referenced by their indexing letter (A-Z). The entries give all pages on which each set is analysed and also its source reference (see also Appendix 3). These are not always the appropriate references for the analyses of the text; the latter can generally be found in Appendix 2.

A - Amoco-Cadiz oil spill, Bay of Morlaix, France. Macrofauna. (Dauvin, 1984).
    p 10-4, 10-5, 13-2, 1… 16-6, 17-2, 17-3

B - Bristol Channel, England. Zooplankton. (Collins and Williams, 1982).
    p 3-5, 3-6, 7-2 to 7-4, …

C - Celtic Sea. Zooplankton. (Collins, pers. comm.).
    p 5-9

D - Dosing experiment, Solbergstrand mesocosm, Norway (GEEP Workshop). … (Warwick et al, 1988).
    p 4-8, 5-8, 5-9, 9-4

E - Ekofisk oil platform, N.Sea. … (Gray et al, 1990).
    p 8-5, 8-6, 10-5, 10-6, … 17-4

F - Frierfjord, Norway (GEEP Workshop). Macrofauna. (Gray et al, 1988).
    p 1-3, 1-4, 1-9, 1-10, … 8-10, 10-1, 10-2, 13-6, …

G - Garroch Head, sludge dump-ground, Scotland. Macrofauna. (Pearson and Blackstock, 1984).
    p 1-6, 1-8, 1-11, 1-12, … to 11-5, 11-9, 11-10, 15-2

H - Hamilton Harbour, Bermuda (GEEP Workshop). Macrofauna, nematodes. (Warwick et al, 1990c).
    p 8-3, 8-4, 8-12, 13-3, …

I - Indonesian reef corals, S. Pari and S. Tikus Islands. Coral % cover. (Warwick et al, 1990b).
    p 6-6, 6-7, 8-3, 8-4, … 13-5, 14-3, 14-4, 15-5

J - Joint NE Atlantic shelf studies (“meta-analysis”). Macrofauna “production”. (Warwick and Clarke, 1993a).
    p 15-2 to 15-5

K - Ko Phuket coral reefs, Thailand. … cover. (Clarke et al, 1993; Brown in press).
    p 15-8 to 15-10, 16-7, …

L - Loch Linnhe and Loch Eil, Scotland, pulp-mill effluent. Macrofauna. (Pearson, 1975).
    p 1-6, 1-7, 1-10, 4-5, … 10-4, 10-6, 10-7, 15-2, …

M - Maldive Islands mining. … (Dawson-Shepherd et al, 1992).
    p 13-2, 14-4, 14-5, 15-…

N - Nutrient-enrichment experiment, Solbergstrand mesocosm, Norway. Ne… (Gee et al, 1985).
    p 1-12, 1-13, 10-3, 1… 15-7

P - Plankton survey (Continuous Plankton Recorder), N.E. Atlantic. Zooplankton, … (Colebrook, 1986).
    p 13-1

Q - Quarter-ICES rectangles, beam-trawl surveys, N. Europe. Groundfish. (Rogers et al, 1998).
    p 17-10 to 17-12

R - Tamar estuary mud-flat, S.W. England. … copepods. (Austen and Warwick, 1989).
    p 14-6 to 14-8

S - Scilly Isles, UK. Seawe… (Gee and Warwick, 1994).
    p 13-5, 14-5, 14-6

T - Tasmania, Eaglehawk Neck. … (Warwick et al, 1990a).
    p 6-9, 12-2 to 12-4, 13…

U - UK regional studies. N… (Warwick and Clarke, 1998).
    p 17-7 to 17-10, 17-13

V - Variation inter-annually, Tees Bay, UK. …benthos. (Warwick et al, in press).
    p 15-7, 17-5

W - Westerschelde estuary cores, Netherlands; mesocosm experiment on food supply. … (Austen and Warwick, 1995).
    p 6-10, 6-12

X - Exe estuary, S.W. England. … (Warwick, 1971).
    p 5-2 to 5-4, 5-7, 6-12, … 11-9

Y - Clyde, Scotland. Nematodes. (Lambshead, 1986).
    p 6-7, … 6-8

Z - Azoic sediment recolonization experiment. …epods. (Olafsson and Moore, 1992).
    p 12-4
Appendix 2
page A2-1
This manual chiefly reflects an approach to multivariate and other community analyses that has been adopted and developed at the Plymouth Marine Laboratory (PML) for well over a decade, and has benefited from experience at numerous IOC/UNEP and commercial training courses. Methods papers from work at PML covered in this manual include: Field et al (1982), Warwick (1986), Clarke and Green (1988), Clarke (1990), Warwick and Clarke (1993a & b, 1995a), Clarke and Ainsworth (1993), Clarke et al (1993), Clarke and Warwick (1994, 1998a & b, 1999, 2001) and Somerfield and Clarke (1995, 1997). Clarke (1993, 1999), Warwick (1993) and Warwick and Clarke (2001) give general reviews, and a large number of papers from PML and authors worldwide exemplify their use via the PRIMER package (some are listed in Appendix 3 but there are currently over a thousand in the SCI).

Of course, the exposition here draws on a wider body of statistical techniques, and there follows a brief list of the main sources that can be consulted for more detail on the methods and analyses of each Chapter.

Chapter 1: Framework. The categorisation here is an extension of that given by Warwick (1988a). The Frierfjord macrofauna data and analyses (Tables 1.2 & 1.6 and Figs. 1.1, 1.2 & 1.7) are extracted and redrawn from Bayne et al (1988), Gray et al (1988) and Clarke and Green (1988), the Loch Linnhe macrofauna data (Table 1.4 and Fig. 1.3) from Pearson (1975), and the ABC curves (Fig. 1.4) from Warwick (1986). The species abundance distribution for Garroch Head macrofauna (Fig. 1.6) is first found in Pearson et al (1983), and the multivariate linking to environmental variables (Fig. 1.9) in Clarke and Ainsworth (1993). The mesocosm data and analysis (Table 1.7 and Fig. 1.10) are extracted and redrawn from Gee et al (1985).

Chapters 2 and 3: Similarity and Clustering. These methods originated in the 1950’s and 60’s (e.g. Florek et al, 1951; Sneath, 1957; Lance and Williams, 1967). The description here widens that of Field et al (1982), with some points taken from the general texts of Everitt (1980) and Cormack (1971). The dendrogram of Frierfjord macrofaunal samples (Fig. 3.1) is redrawn from Gray et al (1988), and the zooplankton example (Figs. 3.2 & 3.3) from Collins and Williams (1982).

Chapter 4: Ordination by PCA. This is a founding technique of multivariate statistics, see for example Chatfield and Collins (1980) and Everitt (1978). The final example (Fig. 4.2) is from Warwick (1988).

Chapter 5: Ordination by MDS. Non-metric MDS was introduced by Shepard (1962) and Kruskal (1964); two standard texts are Kruskal and Wish (1978) and Schiffman et al (1981). Here, the exposition parallels that in Field et al (1982) and Clarke (1993); the Exe nematode graphs (Figs. 5.1-5.4) are redrawn from the former. The dosing experiment (Fig. 5.5) is discussed in Warwick et al (1988).

Chapter 6: Testing. The basic permutation test and simulation of significance levels can be traced to Mantel (1967) and Hope (1968), respectively. In this context (e.g. Figs. 6.2 & 6.3 and eqt. 6.1) it is described by Clarke and Green (1988). A fuller discussion of the extension to 2-way nested and crossed ANOSIM tests (including Figs. 6.4 & 6.6) is in Clarke (1993) (with some asymptotic results in Clarke, 1988); the coral analysis (Fig. 6.5) is in Warwick et al (1990b), and the Tasmanian meiofaunal MDS (Fig. 6.7) in Warwick et al (1990a). The 2-way design without replication (Figs. 6.8-6.12) is tackled in Clarke and Warwick (1994); see also Austen and Warwick (1995).

Chapter 7: Species analyses. Clustering and ordination of species similarities is as illustrated in Field et al (1982), for the Exe nematode data (Figs 7.1 & 7.2, redrawn); see also Clifford and Stephenson (1975). The SIMPER (“similarity percentages”) procedure is described in Clarke (1993).

Chapter 8: Univariate/graphical analyses. Pielou (1975), Heip et al (1988) and Magurran (1991) are useful texts, summarising a large literature on a variety of diversity indices and ranked species abundance plots. The diversity examples here (Figs. 8.1 & 8.2) are discussed by Warwick et al (1990c, 19… respectively) and the Caswell V computations (Table 8.1) are from Warwick et al (1990c). The Garroch Head species abundance distributions (Fig. 8.4) are first found in Pearson et al (1983); Fig. 8.3 is redrawn from Pearson and Blackstock (1984). Warwick (1986) introduced Abundance-Biomass Comparison curves, and the Loch Linnhe and Garroch Head illustrations (Figs. 8.7 & 8.8) are redrawn from Warwick (1986) and Warwick et al (1987). The transformed scale and partial dominance curves of Figs. 8.9-8.11 were suggested by Clarke (1990), which paper also tackles issues of summary statistics (Fig. 8.12, equation 8.7, and as employed in Fig. 8.13) and significance tests for dominance curves.

Chapter 9: Transformations. This chapter is an expansion of the discussion in Clarke and Green (1988);
Fig. 9.1 is recomputed from Warwick (1988).

Chapter 10: Aggregation. This description of the effects of changing taxonomic level is based on Warwick (1988b), from which Figs. 10.2-10.4 and 10.7 are redrawn. Fig. 10.1 is discussed in Gray et al (1988), Fig. 10.5 and 10.8 in Warwick et al (1990b) and Fig. 10.6 in Gray et al (1990) (or Warwick and Clarke, 1993a, in this categorisation). More recent work on the effects on the analysis of choice of taxonomic level (and transform) can be found in Olsgard et al (1997, 1998) and Olsgard and Somerfield (2000).

Chapter 11: Linking to environment. For wider reading on this type of “canonical” problem, see Chapter 5 of Jongman et al (1987), including ter Braak’s (1986) method of canonical correspondence analysis. The approach here of performing environmental and biotic analyses separately, and then comparing them, combines that advocated by Field et al (1982: superimposing variables on the biotic MDS) and by Clarke and Ainsworth (1993: the BIO-ENV program). The data in Table 11.1 is from Pearson and Blackstock (1984). Fig 11.3 is redrawn from Collins and Williams (1982) and Fig. 11.6 from Field et al (1982); Figs. 11.7-11.8, 11.10 and Table 11.2 are from Clarke and Ainsworth (1993).

Chapter 12: Community experiments. Influential papers and books on field experiments, and causal interpretation from observational studies in general, include Connell (1974), Hurlbert (1984), Green (1979) and many papers by A J Underwood, M G Chapman and collaborators, in particular the Underwood (1997) book. Underwood and Peterson (1988) give some thoughts specifically on mesocosm experiments. Lab-based “microcosm” experiments on community structure, using this analysis approach, are typified by Austen and Somerfield (1997) and Schratzberger and Warwick (1998b). Figs. 12.2 and 12.3 are redrawn from Warwick et al (1990a) and Figs. 12.5, 12.6 from Gee et al (1985).

Chapter 13: Data requirements. The exposition parallels that in Warwick (1993) but with additional examples. Figs. 13.1-13.3 and 13.8 are redrawn from Warwick (1993), and earlier found in Colebrook (1986), Dawson-Shepherd et al (1992), Warwick (1988b) and Gray et al (1988) respectively. Fig. 13.4 is redrawn from Warwick et al (1990a), Fig. 13.5 from Warwick et al (1990c), Fig. 13.6 from Warwick et al (1990b) and Fig. 13.7 from Warwick and Clarke (1991).

Chapter 14: Relative sensitivities. This parallels the earlier sections of Warwick and Clarke (1991), from which all these figures (except Figs. 14.11 & 14.14) have been redrawn. Primary source versions of the figures can be found as follows: Figs. 14.1-14.3, Gray et al (1988); Figs. 14.5-14.7, Warwick et al (1990b); Figs 14.9-14.10, Dawson-Shepherd et al (1992); Figs. 14.11-14.12, Gee and Warwick (1994); Figs. 14.14-14.16, Austen and Warwick (1989).

Chapter 15: Multivariate measures of disturbance. This follows the format of Warwick and Clarke (1995) and is an amalgamation of ideas from three primary papers: Warwick and Clarke (1993a) on “meta-analysis” of NE Atlantic macrobenthic studies, Warwick and Clarke (1993b) on the increase in multivariate dispersion under disturbance, and Clarke et al (1993) on the breakdown of multivariate seriation patterns. Figs. 15.1-15.3 and Table 15.1 are redrawn and extracted from the first, Fig. 15.4 and Table 15.2 from the second and Figs. 15.5 & 15.6 and Table 15.5 from the third. The analysis in Table 15.4 is from Warwick et al (in press).

Chapter 16: Comparing multivariate patterns. The general extension of the BIO-ENV approach of Chapter 11, to combinations other than selecting environmental variables to match biotic patterns, is described in Clarke and Warwick (1998a). This details the forward/backward stepping search algorithm BVSTEP, and uses it to select subsets of “influential species” from a biotic matrix. Second-stage MDS is defined in Somerfield and Clarke (1995) and early examples of its use can be found in Olsgard et al (1997, 1998). F… 16-3, and Tables 16-1, 16-2, are extracted from Clarke and Warwick (1998a), and Fig. 16-5 from Somerfield and Clarke (1995).

Chapter 17: Taxonomic distinctness measures. Warwick and Clarke (1995a) first defined taxonomic diversity/distinctness. Earlier work, from a conservation perspective, and using different species relatedness properties (such as PD), can be found in, e.g. Faith (1992, 1994), Vane-Wright et al (1991) and Williams et al (1991). The superior sampling properties of average taxonomic distinctness (Δ+), and its testing structure in the case of simple species lists, are given in Clarke and Warwick (1998b), and applied to UK nematodes by Warwick and Clarke (1998) and Clarke and Warwick (1999). Variation in taxonomic distinctness (Λ+) was introduced, and its sampling properties examined, in Clarke and Warwick (2001), and a review of the area can be found in Warwick and Clarke (2001), from which Figs. 17.1, 17.2, 17.5, 17.11, 17.12 are redrawn. Fig. 17.3 is discussed in Warwick and Clarke (1995a), Fig. 17.4 in Warwick et al (in press), Figs. 17.6, 17.8, 17.9, 17.14, 17.17 in Clarke and Warwick (2001), Fig. 17.7 in Clarke and Warwick (1998b) and Figs. 17.10, 17.13 in Rogers et al (19…
Appendix 3
page A3-1
Addison, R.F., Clarke, K.R. (eds.) (1990). Biological effects of pollutants in a subtropical environment. J. exp. mar. Biol. Ecol. 138
Agard, J.B.R., Gobin, J., Warwick, R.M. (1993). Analysis of marine macrobenthic community structure in relation to pollution, natural oil seepage and seasonal disturbance in a tropical environment (Trinidad, West Indies). Mar. Ecol. Prog. Ser. 92: 233-243
Anderson, M.J. (2001a). Permutation tests for univariate or multivariate analysis of variance and regression. Can. J. Fish. Aquat. Sci. 58: 626-639
Anderson, M.J. (2001b). A new method for non-parametric multivariate analysis of variance. Austral. Ecol. 26: 32-46
Anderson, M.J., Underwood, A.J. (1997). Effects of gastropod grazers on recruitment and succession of an estuarine assemblage: a multivariate and univariate approach. Oecologia 109: 442-453
Austen, M.C., McEvoy, A.J. (1997). The use of offshore meiobenthic communities in laboratory microcosm experiments: response to heavy metal contamination. J. exp. mar. Biol. Ecol. 211: 247-261
Austen, M.C., Somerfield, P.J. (1997). A community level sediment bioassay applied to an estuarine heavy metal gradient. Mar. Envir. Res. 43: 315-328
Austen, M.C., Thrush, S.F. (2001). Experimental evidence suggesting slow or weak response of nematode community structure to a large suspension-feeder. J. Sea Res. 46: 69-84
Austen, M.C., Warwick, R.M. (1989). Comparison of univariate and multivariate aspects of estuarine meiobenthic community structure. Est. cstl. shelf Sci. 29: 23-42
Austen, M.C., Warwick, R.M. (1995). Effects of manipulation of food supply on estuarine meiobenthos. Hydrobiologia 311: 175-184
Austen, M.C., Widdicombe, S., Villano-Pitacco, N. (1998). Effects of biological disturbance on diversity and structure of meiobenthic nematode communities. Mar. Ecol. Prog. Ser. 174: 233-246
Bayne, B.L., Clarke, K.R., Gray, J.S. (eds) (1988). Biological effects of pollutants: results of a practical workshop. Mar. Ecol. Prog. Ser. 46
Beukema, J.J. (1988). An evaluation of the ABC-method (abundance/biomass comparison) as applied to macrozoobenthic communities living on tidal flats in the Dutch Wadden Sea. Mar. Biol. 99: 425-433
Box, G.E.P., Cox, D.R. (1964). An analysis of transformations. J. R. Statist. Soc. Ser. B 26: 211-243
Bray, J.R., Curtis, J.T. (1957). An ordination of the upland forest communities of Southern Wisconsin. Ecol. Monogr. 27: 325-349
Brown, B.E., Clarke, K.R., Warwick, R.M. (in press). Serial patterns of biodiversity change in corals across shallow reef flats in Ko Phuket, Thailand, due to the effects of local (sedimentation) and regional (climatic) perturbations. Mar. Biol.
Buchanan, J.B. (1993). Evidence of benthic pelagic coupling at a station off the Northumberland coast. J. exp. mar. Biol. Ecol. 172: 1-10
Caswell, H. (1976). Community structure: a neutral model analysis. Ecol. Monogr. 46: 327-354
Chapman, M.G., Underwood, A.J. (1999). Ecological patterns in multivariate assemblages: information and interpretation of negative values in ANOSIM tests. Mar. Ecol. Prog. Ser. 180: 257-265
Chatfield, C., Collins, A.J. (1980). Introduction to multivariate analysis. Chapman and Hall, London
Clarke, K.R. (1988). Detecting change in benthic community structure. Proceedings XIVth International Biometric Conference, Namur: Invited Papers. Société Adolphe Quetelet, Gembloux, Belgium
Clarke, K.R. (1990). Comparisons of dominance curves. J. exp. mar. Biol. Ecol. 138: 143-157
Clarke, K.R. (1993). Non-parametric multivariate analyses of changes in community structure. Aust. J. Ecol. 18: 117-143
Clarke, K.R. (1999). Non-metric multivariate analysis in community-level ecotoxicology. Environ. Toxicol. Chem. 18: 118-127
Clarke, K.R., Ainsworth, M. (1993). A method of linking multivariate community structure to environmental variables. Mar. Ecol. Prog. Ser. 92: 205-219
Clarke, K.R., Green, R.H. (1988). Statistical design and analysis for a ‘biological effects’ study. Mar. Ecol. Prog. Ser. 46: 213-226
Clarke, K.R., Warwick, R.M. (1994). Similarity-based testing for community pattern: the 2-way layout with no replication. Mar. Biol. 118: 167-176
Clarke, K.R., Warwick, R.M. (1998a). Quantifying structural redundancy in ecological communities. Oecologia 113: 278-289
Clarke, K.R., Warwick, R.M. (1998b). A taxonomic distinctness index and its statistical properties. J. appl. Ecol. 35: 523-531
Clarke, K.R., Warwick, R.M. (1999). The taxonomic distinctness measure of biodiversity: weighting of step lengths between hierarchical levels. Mar. Ecol. Prog. Ser. 184: 21-29
Clarke, K.R., Warwick, R.M. (2001). A further biodiversity index applicable to species lists: variation in taxonomic distinctness. Mar. Ecol. Prog. Ser. 216: 265-278
Clarke, K.R., Warwick, R.M., Brown, B.E. (1993). An index showing breakdown of seriation, related to disturbance, in a coral-reef assemblage. Mar. Ecol. Prog. Ser. 102: 153-160
Clifford, D.H.T., Stephenson, W. (1975). An introduction to numerical classification. Academic Press, New York
Colebrook, J.M. (1986). Environmental influences on long-term variability in marine plankton. Hydrobiologia 142: 309-325
Collins, N.R., Williams, R. (1982). Zooplankton communities in the Bristol Channel and Severn Estuary. Mar. Ecol. Prog. Ser. 9: 1-11
Connell, J.H. (1974). Field experiments in marine ecology. In: Mariscal, R. (ed.) Experimental marine biology. Academic Press, New York
Connell, J.H. (1978). Diversity in tropical rain forests and coral reefs. Science N.Y. 199: 1302-1310
Cormack, R.M. (1971). A review of classification. J. R. Statist. Soc. Ser. A 134: 321-367
Dauvin, J-C. (1984). Dynamique d’ecosystemes macrobenthiques des fonds sedimentaires de la Baie de Morlaix et leur perturbation par les hydrocarbures de l’Amoco-Cadiz. Doctoral thesis. Univ. Pierre et Marie-Curie, Paris
Dawson-Shepherd, A., Warwick, R.M., Clarke, K.R., Brown, B.E. (1992). An analysis of fish community responses to coral mining in the Maldives. Environ. Biol. Fish. 33: 367-380
Everitt, B. (1978). Graphical techniques for multivariate data. Heinemann, London
Everitt, B. (1980). Cluster analysis, 2nd edn. Heinemann, London
Faith, D.P. (1992). Conservation evaluation and phylogenetic diversity. Biol. Conserv. 61: 1-10
Faith, D.P. (1994). Phylogenetic pattern and the quantification of organismal biodiversity. Philos. Trans. R. Soc. Lond. Ser. B Biol. Sci. 345: 45-58
Faith, D.P., Minchin, P.R., Belbin, L. (1987). Compositional dissimilarity as a robust measure of ecological distance. Vegetatio 69: 57-68
Field, J.G., Clarke, K.R., Warwick, R.M. (1982). A practical strategy for analysing multispecies distribution patterns. Mar. Ecol. Prog. Ser. 8: 37-52
Fisher, R.A., Corbet, A.S., Williams, C.B. (1943). The relation between the number of species and the number of individuals in a random sample of an animal population. J. anim. Ecol. 12: 42-58
Florek, K., Lukaszewicz, J., Perkal, J., Steinhaus, H., Zubrzycki, S. (1951). Sur la liaison et la division des points d’un ensemble fini. Colloquium Math. 2: 282-285
Gage, J.D., Coghill, G.G. (1977). Studies on the dispersion patterns of Scottish sea-loch benthos from contiguous core transects. In: Coull, B. (ed) Ecology of marine benthos. University of South Carolina Press, Columbia
Gee, J.M., Somerfield, P.J. (1997). Do mangrove diversity and leaf litter decay promote meiofaunal diversity? J. exp. mar. Biol. Ecol. 218: 13-33
Gee, J.M., Warwick, R.M. (1994a). Metazoan community structure in relation to the fractal dimensions of marine microalgae. Mar. Ecol. Prog. Ser. 103: 141-150
Gee, J.M., Warwick, R.M. (1994b). Body-size distribution in a marine metazoan community and the fractal dimensions of macroalgae. J. exp. mar. Biol. Ecol. 178: 247-259
Gee, J.M., Warwick, R., Schaanning, M., Berge, J.A., Ambrose Jr, W.G. (1985). Effects of organic enrichment on meiofaunal abundance and community structure in sublittoral soft sediments. J. exp. mar. Biol. Ecol. 91: 247-262
Goldman, N., Lambshead, P.J.D. (1989). Optimization of the Ewens/Caswell neutral model program for community diversity analysis. Mar. Ecol. Prog. Ser. 50: 255-261
Gower, J.C. (1966). Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika 53: 325-328
Gower, J.C. (1971). Statistical methods of comparing different multivariate analyses of the same data. pp. 138-149 in F.R. Hodson, D.G. Kendall and P. Tautu (eds.), Mathematics in the archaeological and historical sciences. Edinburgh University Press, Edinburgh
Gower, J.C., Ross, G.J.S. (1969). Minimum spanning trees and single linkage cluster analysis. Appl. Statist. 18: 54-64
Gray, J.S., Aschan, M., Carr, M.R., Clarke, K.R., Green, R.H., Pearson, T.H., Rosenberg, R., Warwick, R.M. (1988). Analysis of community attributes of the benthic macrofauna of Frierfjord/Langesundfjord and in a mesocosm experiment. Mar. Ecol. Prog. Ser. 46: 151-165
Gray, J.S., Clarke, K.R., Warwick, R.M., Hobbs, G. (1990). Detection of initial effects of pollution on marine benthos: an example from the Ekofisk and Eldfisk oilfields, North Sea. Mar. Ecol. Prog. Ser. 66: 285-299
Gray, J.S., Pearson, T.H. (1982). Objective selection of sensitive species indicative of pollution-induced change in benthic communities. I. Comparative methodology. Mar. Ecol. Prog. Ser. 9: 111-119
Green, R.H. (1979). Sampling design and statistical methods for environmental biologists. Wiley, New York
Greenacre, M.J. (1984). Theory and applications of correspondence analysis. Academic Press, London
Hall, S.J., Greenstreet, S.P. (1998). Taxonomic distinctness and diversity measures: responses in marine fish communities. Mar. Ecol. Prog. Ser. 166: 227-229
Harper, J.L., Hawksworth, D.L. (1994). Biodiversity: measurement and estimation. Preface. Phil. Trans. Roy. Soc. Lond. Ser. B 345: 5-12
Heip, C., Herman, P.M.J., Soetaert, K. (1988). Data processing, evaluation, and analysis, pp. 197-231 in R.P. Higgins and H. Thiel (eds.), Introduction to the study of meiofauna. Smithsonian Institution, Washington DC
Hill, M.O. (1973a). Reciprocal averaging: an eigenvector method of ordination. J. Ecol. 61: 237-249
Hill, M.O. (1973b). Diversity and evenness: a unifying notation and its consequences. Ecology 54: 427-432
Hill, M.O. (1979a). DECORANA - A FORTRAN program for detrended correspondence analysis and reciprocal averaging. Cornell University, Ithaca, New York
Hill, M.O. (1979b). TWINSPAN - A FORTRAN program for arranging multivariate data in an ordered two-way table by classification of individuals and attributes. Cornell University, Ithaca, New York
Hill, M.O., Gauch, H.G. (1980). Detrended correspondence analysis, an improved ordination technique. Vegetatio 42: 47-48
Hope, A.C.A. (1968). A simplified Monte Carlo significance test procedure. J. R. Statist. Soc. Ser. B 30: 582-598
Howson, C.M. (ed.) (1987). Directory of the British marine fauna and flora. Marine Conservation Society, Ross-on-Wye, Hertfordshire
Humphries, C.J., Williams, P.H., Vane-Wright, R.I. (1995). Measuring biodiversity value for conservation. Ann. Rev. Ecol. Syst. 26: 93-111
Hurlbert, S.H. (1971). The nonconcept of species diversity: a critique and alternative parameters. Ecology 52: 577-586
Hurlbert, S.H. (1984). Pseudoreplication and the design of ecological field experiments. Ecol. Monogr. 54: 187-211
Huston, M. (1979). A general hypothesis of species diversity. Am. Nat. 113: 81-101
Ibanez, F., Dauvin, J.-C. (1988). Long-term changes (1977-1987) in a muddy fine sand Abra alba - Melinna palmata community from the Western English Channel: multivariate time-series analysis. Mar. Ecol. Prog. Ser. 49: 65-81
Izsak, C., Price, A.R.G. (2001). Measuring β-diversity using a taxonomic similarity index, and its relation to spatial scale. Mar. Ecol. Prog. Ser. 215: 69-77
Jayasree, K. (1976). Systematics and ecology of free-living marine nematodes from polluted intertidal sand in Scotland. Ph.D. thesis, University of Aberdeen
Jongman, R.H.G., ter Braak, C.F.J., van Tongeren, O.F.R. (1987). Data analysis in community and landscape ecology. Pudoc, Wageningen
Kendall, M.A., Widdicombe, S. (1999). Small scale patterns in the structure of macrofaunal assemblages of shallow soft sediments. J. exp. mar. Biol. Ecol. 237: 127-140
Kendall, M.G. (1970). Rank correlation methods. Griffin, London
Kenkel, N.C., Orloci, L. (1986). Applying metric and nonmetric multidimensional scaling to some ecological studies: some new results. Ecology 67: 919-928
Krzanowski, W.J. (in press). Multifactorial analysis of distance in studies of ecological community structure. J. Agric. Biol. Environ. Sci.
Kruskal, J.B. (1964). Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika 29: 1-27
Kruskal, J.B., Wish, M. (1978). Multidimensional scaling. Sage Publications, Beverley Hills, California
Kulczynski, S. (1928). Die Pflanzenassoziationen der Pieninen. Bull. Int. Acad. Pol. Sci. Lett. Cl. Sci. Math. Nat. Ser. B, Suppl II: 57-203
Lambshead, P.J.D. (1986). Sub-catastrophic sewage and industrial waste contamination as revealed by marine nematode faunal analysis. Mar. Ecol. Prog. Ser. 29: 247-260
Lambshead, P.J.D., Platt, H.M., Shaw, K.M. (1983). The detection of differences among assemblages of marine benthic species based on an assessment of dominance and diversity. J. nat. Hist. 17: 859-874
Lance, G.N., Williams, W.T. (1967). A general theory of classificatory sorting strategies: 1 Hierarchical Systems. Comp. J. 9: 373-380
Legendre, P., Anderson, M.J. (1999). Distance-based redundancy analysis: testing multispecies responses in multifactorial ecological experiments. Ecol. Monogr. 69: 1-24
Legendre, P., Legendre, L. (1998). Numerical ecology, 2nd Engl. edn. Elsevier, Amsterdam
Lorenzen, S. (1994). The phylogenetic systematics of free-living nematodes. Ray Society, London
Magurran, A.E. (1991). Ecological diversity and its measurement. Chapman and Hall, London
Mantel, N. (1967). The detection of disease clustering and a generalized regression approach. Cancer Res. 27: 209-220
Mardia, K.V., Kent, J.T., Bibby, J.M. (1979). Multivariate analysis. Academic Press, London
May, R.M. (1990). Taxonomy as destiny. Nature 347: 129-130
McArdle, B.H., Anderson, M.J. (2001). Fitting multivariate models to community data: a comment on distance-based redundancy analysis. Ecology 82: 290-297
McEachran, J.D., Miyake, T. (1990). Phylogenetic interrelationships of skates: a working hypothesis (Chondrichthyes, Rajoidea). In: Elasmobranchs as living resources: advances in the biology, ecology, systematics, and the status of the fisheries, H.L. Pratt et al (eds). NOAA Technical Report NMFS 90: 285-304
Morrisey, D.J., Underwood, A.J., Howitt, L. (1996). Effects of copper on the faunas of marine soft-sediments: an experimental field study. Mar. Biol. 125: 199-213
Nei, M. (1996). Phylogenetic analysis in molecular evolutionary genetics. Ann. Rev. Genet. 30: 371-403
Nelson, J.S. (1994). Fishes of the world, 3rd edn. Wiley, New York
Olafsson, E., Moore, C.G. (1992). Effects of macroepifauna on developing nematode and harpacticoid assemblages in a subtidal muddy habitat. Mar. Ecol. Prog. Ser. 84: 161-171
Olsgard, F., Somerfield, P.J. (2000). Surrogates in marine benthic investigations - which taxonomic unit to target? J. Aquat. Ecosyst. Stress Recov. 7: 25-42
Olsgard, F., Somerfield, P.J., Carr, M.R. (1997). Relationships between taxonomic resolution and data transformations in analyses of a macrobenthic community along an established pollution gradient. Mar. Ecol. Prog. Ser. 149: 173-181
Olsgard, F., Somerfield, P.J., Carr, M.R. (1998). Relationships between taxonomic resolution, macrobenthic community patterns and disturbance. Mar. Ecol. Prog. Ser. 172: 25-36
Pearson, T.H. (1975). The benthic ecology of Loch Linnhe and Loch Eil, a sea-loch system on the west coast of Scotland. IV. Changes in the benthic fauna attributable to organic enrichment. J. exp. mar. Biol. Ecol. 20: 1-41
Pearson, T.H., Blackstock, J. (1984). Garroch Head sludge dumping ground survey, final report. Dunstaffnage Marine Research Laboratory (unpublished)
Pearson, T.H., Gray, J.S., Johannessen, P.J. (1983). Objective selection of sensitive species indicative of pollution-induced change in benthic communities. 2. Data analyses. Mar. Ecol. Prog. Ser. 12: 237-255
Pielou, E.C. (1975). Ecological diversity. Wiley, New York
Pielou, E.C. (1984). The interpretation of ecological data. A primer on classification and ordination. Wiley, New York
Piepenburg, D., Voss, J., Gutt, J. (1997). Assemblages of sea stars (Echinodermata: Asteroidea) and brittle stars (Echinodermata: Ophiuroidea) in the Weddell Sea (Antarctica) and off Northeast Greenland (Arctic): a comparison of diversity and abundance. Polar Biol. 17: 305-322
Platt, H.M., Warwick, R.M. (1983). Freeliving marine nematodes. Part I. British Enoplids. Synopses of the British Fauna no 28. Cambridge University Press, Cambridge
Platt, H.M., Warwick, R.M. (1988). Freeliving marine nematodes. Part II. British chromadorids. E. J. Brill, Leiden
Platell, M.E., Potter, I.C., Clarke, K.R. (1998). Resource partitioning by four species of elasmobranchs (Batoidea: Urolophidae) in coastal waters of temperate Australia. Mar. Biol. 131: 719-734
Potter, I.C., Claridge, P.N., Hyndes, G.A., Clarke, K.R. (1997). Seasonal, annual and regional variations in ichthyofaunal composition in the inner Severn Estuary and inner Bristol Channel. J. mar. biol. Ass. U.K. 77: 507-525
Price, A.R.G., Keeling, M.J., O’Callaghan, C.J. (1999). Ocean-scale patterns of ‘biodiversity’ of Atlantic asteroids determined from taxonomic distinctness and other measures. Biol. J. Linn. Soc. 66: 187-203
Raffaelli, D., Mason, C.F. (1981). Pollution monitoring with meiofauna using the ratio of nematodes to copepods. Mar. Poll. Bull. 12: 158-163
Reid, P.C., Borges, M.dF., Svendsen, E. (in press). A regime shift in the North Sea circa 1988 linked to changes in the North Sea fishery. Fish. Res.
Rogers, S.I., Clarke, K.R., Reynolds, J.D. (1999). The taxonomic distinctness of coastal bottom-dwelling fish communities of the North-east Atlantic. J. Anim. Ecol. 68: 769-782
Sanders, H.L. (1968). Marine benthic diversity: a comparative study. Am. Nat. 102: 243-282
Scheffe, H. (1959). The analysis of variance. Wiley, New York
Schiffman, S.S., Reynolds, M.L., Young, F.W. (1981). Introduction to multi-dimensional scaling. Theory, methods and applications. Academic Press, London
Schratzberger, M., Warwick, R.M. (1998a). Effects of the intensity and frequency of organic enrichment on two estuarine nematode communities. Mar. Ecol. Prog. Ser. 164: 83-94
Schratzberger, M., Warwick, R.M. (1998b). Effects of physical disturbance on nematode communities in sand and mud: a microcosm experiment. Mar. Biol. 130: 643-650
Schratzberger, M., Warwick, R.M. (1999). Differential effects of various types of disturbances on the structure of nematode assemblages: an experimental approach. Mar. Ecol. Prog. Ser. 181: 227-236
Schwinghamer, P. (1981). Characteristic size distributions of integral benthic communities. Can. J. Fish. Aquat. Sci. 38: 1255-1263
Seber, G.A.F. (1984). Multivariate observations. Wiley, New York
Shimatani, K. (2001). On the measurement of species diversity incorporating species differences. Oikos 93: 135-147
Shepard, R.N. (1962). The analysis of proximities: multidimensional scaling with an unknown distance function. Psychometrika 27: 125-140
Simpson, E.H. (1949). Measurement of diversity. Nature 163: 688
Sneath, P.H.A. (1957). The application of computers to taxonomy. J. gen. Microbiol. 17: 201-226
Sneath, P.H.A., Sokal, R.R. (1973). Numerical taxonomy. Freeman, San Francisco
Sokal, R.R., Rohlf, F.J. (1981). Biometry. Freeman, San Francisco
Somerfield, P.J., Gage, J.D. (2000). Community structure of the benthos in Scottish sea-lochs. IV. Multivariate spatial pattern. Mar. Biol. 136: 1133-1145
Somerfield, P.J., Gee, J.M., Warwick, R.M. (1994a). Benthic community structure in relation to an instantaneous discharge of waste water from a tin mine. Mar. Pollut. Bull. 28: 363-369
Somerfield, P.J., Gee, J.M., Warwick, R.M. (1994b). Soft sediment meiofaunal community structure in relation to a long-term heavy metal gradient in the Fal estuary system. Mar. Ecol. Prog. Ser. 105: 79-88
Somerfield, P.J., Gee, J.M., Widdicombe, S. (1993). The use of meiobenthos in marine pollution monitoring programmes. Plymouth Marine Laboratory Miscellaneous Publication LIB-33A,B, Plymouth
Somerfield, P.J., Clarke, K.R. (1995). Taxonomic levels, in marine community studies, revisited. Mar. Ecol. Prog. Ser. 127: 113-119
Somerfield, P.J., Clarke, K.R. (1997). A comparison of some methods commonly used for the collection of sublittoral sediments and their associated fauna. Mar. Environ. Res. 43: 143-156
Somerfield, P.J., Olsgard, F., Carr, M.R. (1997). A further examination of two new taxonomic distinctness measures. Mar. Ecol. Prog. Ser. 154: 303-306
Somerfield, P.J., Rees, H.L., Warwick, R.M. (1995). Interrelationships in community structure between shallow-water marine meiofauna and macrofauna in relation to dredgings disposal. Mar. Ecol. Prog. Ser. 127: 103-112
ter Braak, C.F.J. (1986). Canonical correspondence analysis: a new eigenvector technique for multivariate direct gradient analysis. Ecology 67: 1167-1179
Underwood, A.J. (1981). Techniques of analysis of variance in experimental marine biology and ecology. Oceanogr. Mar. Biol. Ann. Rev. 19: 513-605
Underwood, A.J. (1992). Beyond BACI: the detection of environmental impact on populations in the real, but variable, world. J. exp. mar. Biol. Ecol. 161: 145-178
Underwood, A.J. (1997). Experiments in ecology: their logical design and interpretation using analysis of variance. Cambridge University Press, Cambridge
Underwood, A.J., Chapman, M.G. (1998). A method for analysing spatial scales of variation in composition of assemblages. Oecologia 117: 570-578
Underwood, A.J., Peterson, C.H. (1988). Towards an ecological framework for investigating pollution. Mar. Ecol. Prog. Ser. 46: 227-234
Vane-Wright, R.I., Humphries, C.J., Williams, P.H. (1991). What to protect? Systematics and the agony of choice. Biol. Conserv. 55: 235-254
Ward, P., Woodd-Walker, R., Clarke, A. (submitted). Seasonality rather than temperature regulates zooplankton diversity. Nature
Warwick, R.M. (1971). Nematode associations in the Exe estuary. J. mar. Biol. Ass. U.K. 51: 439-454
Warwick, R.M. (1984). Species size distributions in marine benthic communities. Oecologia (Berlin) 61: 32-41
Warwick, R.M. (1986). A new method for detecting pollution effects on marine macrobenthic communities. Mar. Biol. 92: 557-562
Warwick, R.M. (1988a). Effects on community structure of a pollutant gradient - summary. Mar. Ecol. Prog. Ser. 46: 207-211
Warwick, R.M. (1988b). The level of taxonomic discrimination required to detect pollution effects on marine benthic communities. Mar. Pollut. Bull. 19: 259-268
Warwick, R.M. (1993). Environmental impact studies on marine communities: pragmatical considerations. Aust. J. Ecol. 18: 63-80
Warwick, R.M., Ashman, C.M., Brown, A.R., Clarke, K.R., Dowell, B., Hart, B., Lewis, R.E., Shillabeer, N., Somerfield, P.J., Tapp, J.F. (in press). Inter-annual changes in the biodiversity and community structure of the macrobenthos in Tees Bay and the Tees estuary, UK, associated with local and regional environmental events. Mar. Ecol. Prog. Ser.
Warwick, R.M., Buchanan, J.B. (1970). The meiofauna off the coast of Northumberland. I: The structure of the nematode population. J. mar. biol. Assoc. UK 50: 129-146
Warwick, R.M., Carr, M.R., Clarke, K.R., Gee, J.M., Green, R.H. (1988). A mesocosm experiment on the effects of hydrocarbon and copper pollution on a sublittoral soft-sediment meiobenthic community. Mar. Ecol. Prog. Ser. 46: 181-191
Warwick, R.M., Clarke, K.R. (1991). A comparison of methods for analysing changes in benthic community structure. J. mar. Biol. Ass. U.K. 71: 225-244
Warwick, R.M., Clarke, K.R. (1993a). Comparing the severity of disturbance: a meta-analysis of marine macrobenthic community data. Mar. Ecol. Prog. Ser. 92: 221-231
Warwick, R.M., Clarke, K.R. (1993b). Increased variability as a symptom of stress in marine communities. J. exp. mar. Biol. Ecol. 172: 215-226
Warwick, R.M., Clarke, K.R. (1994). Relearning the ABC: taxonomic changes and abundance/biomass relationships in disturbed benthic communities. Mar. Biol. 118: 739-744
Warwick, R.M., Clarke, K.R. (1995a). New ‘biodiversity’ measures reveal a decrease in taxonomic distinctness with increasing stress. Mar. Ecol. Prog. Ser. 129: 301-305
Warwick, R.M., Clarke, K.R. (1995b). Multivariate measures of community stress and their application to marine pollution studies in the East Asian region. Phuket mar. biol. Cent. Res. Bull. 60: 99-113
Warwick, R.M., Clarke, K.R. (1998). Taxonomic distinctness and environmental assessment. J. appl. Ecol. 35: 532-543
Warwick, R.M., Clarke, K.R. (2001). Practical measures of marine biodiversity based on relatedness of species. Oceanogr. Mar. Biol. Ann. Rev. 39: 207-231
Warwick, R.M., Clarke, K.R., Gee, J.M. (1990a). The effects of disturbance by soldier crabs, Mictyris platycheles H. Milne Edwards, on meiobenthic community structure. J. exp. mar. Biol. Ecol. 135: 19-33
Warwick, R.M., Clarke, K.R., Suharsono. (1990b). A statistical analysis of coral community responses to the 1982-3 El Nino in the Thousand Islands, Indonesia. Coral Reefs 8: 171-179
Warwick, R.M., Coles, J.W. (1977). The marine flora and fauna of the Isles of Scilly. Free-living Nematoda. J. Nat. Hist. 11: 393-407
Warwick, R.M., Light, J. (in press). Death assemblages of molluscs on St. Martin’s Flats, Isles of Scilly: a surrogate for regional biodiversity? Biodivers. Conserv.
Warwick, R.M., Pearson, T.H., Ruswahyuni (1987). Detection of pollution effects on marine macrobenthos: further evaluation of the species abundance/biomass method. Mar. Biol. 95: 193-200
Warwick, R.M., Platt, H.M., Clarke, K.R., Agard, J., Gobin, J. (1990c). Analysis of macrobenthic and meiobenthic community structure in relation to pollution and disturbance in Hamilton Harbour, Bermuda. J. exp. mar. Biol. Ecol. 138: 119-142
Warwick, R.M., Platt, H.M., Somerfield, P.J. (1998). Freeliving marine nematodes. Part III. British Monhysterida. Synopses of the British Fauna no 53. Field Studies Council, Shrewsbury
Widdicombe, S., Austen, M.C. (1998). Experimental evidence for the role of Brissopsis lyrifera (Forbes, 1841) as a critical species in the maintenance of benthic diversity and the modification of sediment chemistry. J. exp. mar. Biol. Ecol. 228: 241-255
Widdicombe, S., Austen, M.C. (2001). The interaction between physical disturbance and organic enrichment: an important element in structuring benthic communities. Limnol. Oceanog. 46: 1720-1733
Wilkinson, D.M. (1999). The disturbing history of intermediate disturbance. Oikos 84: 145-147
Williams, P.H., Humphries, C.J., Vane-Wright, R.I. (1991). Measuring biodiversity: taxonomic relatedness for conservation priorities. Aust. Syst. Bot. 4: 665-679

ACKNOWLEDGEMENTS

We are grateful to a large number of individuals for helpful collaboration and discussions: Paul Somerfield, Martin Carr, Ray Gorley, Paul Dowland, Ben Knight, Marios Mitella, Jane Addy, John Hall, Martin Budge, Mike Ainsworth, Robert Pritchard, Charlie Green, John Bramley, Simon Frith, Roger Carter, Mike Gee, Mel Austen, John Gray, Frode Olsgard and many others who in various ways contributed to the analyses featured in this manual.

We also gratefully acknowledge institutional support from a diversity of sources, primarily the Plymouth Marine Laboratory of the Natural Environment Research Council, UK (which employed both of us for our entire research careers, until 2001), IOC, UNEP and FAO, who supported the earlier development work, and the UK Ministry of Agriculture, Fisheries and Food (now the Department of Environment, Food & Rural Affairs) who jointly funded, with the NERC, some of the more recent methods developments.

K R Clarke
R M Warwick
2001