
CHAPTER 6

Distance Measures

Background

The first step of most multivariate analyses is to calculate a matrix of distances or similarities among a set of items in a multidimensional space. This is analogous to constructing the triangular "mileage chart" provided with many road maps. But in our case, we need to build a matrix of distances in hyperspace, rather than the two-dimensional map space. Fortunately, it is just as easy to calculate distances in a multidimensional space as it is in a two-dimensional space.

This first step is extremely important. If information is ignored in this step, then it cannot be expressed in the results. Likewise, if noise or outliers are exaggerated by the distance measure, then these unwanted features of our data will have undue influence on the results, perhaps obscuring meaningful patterns.

Distance concepts

Distance measures are flexible:

• Resemblance can be measured either as a distance (dissimilarity) or a similarity.

• Most distance measures can readily be converted into similarities and vice-versa.

• All of the distance measures described below can be applied to either binary (presence-absence) or quantitative data.

• One can calculate distances among either the rows of your data matrix or the columns of your data matrix. With community data this means you can calculate distances among your sample units (SUs) in species space or among your species in sample space.

Figure 6.1 shows two species as points in sample space, corresponding to the tiny data set below (Table 6.1). We can also represent sample units as points in species space, as on the right side of Figure 6.1, using the same data set.

There are many distance measures. A selection of the most commonly used and most effective measures are described below. It is important to know the domain of acceptable data values for each distance measure (Table 6.2). Many distance measures are not compatible with negative numbers. Other distance measures assume that the data are proportions ranging between zero and one, inclusive.

Table 6.1. Example data set. Abundance of two species in two sample units.

                 Species
Sample unit      1     2
A                1     4
B                5     2

[Figure 6.1: two panels, "Sample space" (horizontal axis: Sample Unit A) and "Species space" (horizontal axis: Species 1, vertical axis: Species 2), with sample units A and B plotted as points in the second panel.]

Figure 6.1. Graphical representation of the data set in Table 6.1. The left-hand graph shows species as points in sample space. The right-hand graph shows sample units as points in species space.
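The choice between rows and columns can be made concrete with the Table 6.1 data. The following is a minimal sketch, assuming ordinary straight-line (Euclidean) distance, which this chapter defines later; the function names `euclidean` and `transpose` are illustrative and not part of any particular package.

```python
import math

# Table 6.1: rows are sample units (A, B), columns are species (1, 2).
data = [
    [1, 4],  # sample unit A
    [5, 2],  # sample unit B
]

def euclidean(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def transpose(matrix):
    """Swap rows and columns, so that species become the rows."""
    return [list(column) for column in zip(*matrix)]

# Distance between sample units A and B in species space (rows).
d_sample_units = euclidean(data[0], data[1])

# Distance between species 1 and 2 in sample space (columns).
species = transpose(data)
d_species = euclidean(species[0], species[1])

print(round(d_sample_units, 3))  # 4.472
print(round(d_species, 3))       # 4.243
```

The same function serves both questions; only the orientation of the matrix changes.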

Table 6.2. Reasonable and acceptable domains of input data, x, and ranges of distance measures, d = f(x).

Name (synonyms)                  Domain of x   Range of d = f(x)             Comments

Sorensen                         x ≥ 0         0 ≤ d ≤ 1                     proportion coefficient in city-block
 (Bray & Curtis; Czekanowski)                  (or 0 ≤ d ≤ 100%)             space; semimetric

Relative Sorensen                x ≥ 0         0 ≤ d ≤ 1                     proportion coefficient in city-block
 (Kulczyński; Quantitative                     (or 0 ≤ d ≤ 100%)             space; same as Sorensen but data points
 Symmetric)                                                                  relativized by sample unit totals;
                                                                             semimetric

Jaccard                          x ≥ 0         0 ≤ d ≤ 1                     proportion coefficient in city-block
                                               (or 0 ≤ d ≤ 100%)             space; metric

Euclidean (Pythagorean)          all           non-negative                  metric

Relative Euclidean               all           0 ≤ d ≤ √2 for quarter        Euclidean distance between points on
 (Chord distance,                              hypersphere; 0 ≤ d ≤ 2        unit hypersphere; metric
 standardized Euclidean)                       for full hypersphere

Correlation distance             all           0 ≤ d ≤ 1                     converted from correlation to distance;
                                                                             proportional to arc distance between
                                                                             points on unit hypersphere; cosine of
                                                                             angle from centroid to points; metric

Chi-square                       x ≥ 0         d ≥ 0                         Euclidean but doubly weighted by
                                                                             variable and sample unit totals; metric

Squared Euclidean                all           d ≥ 0                         semimetric

Mahalanobis                      all           d ≥ 0                         distance between groups weighted by
                                                                             within-group dispersion; metric
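The ranges in Table 6.2 can be checked on a pair of sample units that share no species. The sketch below is illustrative only: the two-species abundances are hypothetical, and the function names are assumptions rather than an established API. It contrasts the unbounded Euclidean distance with the fixed maxima of Sorensen distance (1) and relative Euclidean distance (√2 for non-negative data).

```python
import math

def euclidean(u, v):
    """Euclidean distance between two abundance vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def sorensen(u, v):
    """Sorensen (Bray-Curtis) distance as a proportion, for non-negative data."""
    return sum(abs(a - b) for a, b in zip(u, v)) / (sum(u) + sum(v))

def relative_euclidean(u, v):
    """Chord distance: Euclidean distance after scaling each unit to length one."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return euclidean([a / norm_u for a in u], [b / norm_v for b in v])

# Hypothetical pairs of sample units with no species in common,
# at two different abundance scales.
small = ([3, 0], [0, 4])
large = ([30, 0], [0, 40])

for u, v in (small, large):
    print(euclidean(u, v), sorensen(u, v), round(relative_euclidean(u, v), 4))
# 5.0 1.0 1.4142
# 50.0 1.0 1.4142
```

Only the Euclidean distance changes with the scale of the abundances; the other two sit at their stated upper bounds in both cases.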

Distance measures can be categorized as metric, semimetric, or nonmetric. A metric distance measure must satisfy the following rules:

1. The minimum value is zero when two items are identical.

2. When two items differ, the distance is positive (negative distances are not allowed).

3. Symmetry: the distance from object A to object B is the same as the distance from B to A.

4. Triangle inequality axiom: With three objects, the distance between two of these objects cannot be larger than the sum of the two other distances.

Semimetrics can violate the triangle inequality axiom; examples include the Sorensen (Bray-Curtis) and Kulczyński distances. Semimetrics are extremely useful in community ecology but obey a non-Euclidean geometry. Nonmetrics violate one or more of the other rules and are seldom used in ecology.

Distance measures

The equations use the following conventions: Our data matrix A has q rows, which are sample units, and p columns, which are species. Each element of the matrix, $a_{ij}$, is the abundance of species j in sample unit i. Most of the following distance measures can also be used on binary data (1 or 0 for presence or absence). In each of the following equations, we are calculating the distance between sample units i and h.
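Table 6.2 lists Sorensen distance as semimetric and Euclidean distance as metric. The sketch below checks rule 4 of the metric rules (the triangle inequality) on one hypothetical triple of sample units. It assumes the Sorensen dissimilarity in its sum-of-absolute-differences form, which the chapter gives later; the helper name `satisfies_triangle_inequality` is illustrative.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two abundance vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def sorensen(u, v):
    """Sorensen (Bray-Curtis) distance as a proportion."""
    return sum(abs(a - b) for a, b in zip(u, v)) / (sum(u) + sum(v))

def satisfies_triangle_inequality(dist, a, b, c, tol=1e-12):
    """True if no side of the triangle a-b-c exceeds the sum of the other two."""
    ab, ac, bc = dist(a, b), dist(a, c), dist(b, c)
    return ab <= ac + bc + tol and ac <= ab + bc + tol and bc <= ab + ac + tol

# Hypothetical sample units: a and b share no species, c contains both.
a = [1, 0]
b = [0, 1]
c = [1, 1]

print(sorensen(a, b), round(sorensen(a, c), 3), round(sorensen(b, c), 3))
# 1.0 0.333 0.333  -> the direct distance exceeds the detour through c
print(satisfies_triangle_inequality(sorensen, a, b, c))   # False
print(satisfies_triangle_inequality(euclidean, a, b, c))  # True
```

A single failing triple is enough to show that a measure is not metric; a passing triple, of course, proves nothing by itself.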

Euclidean distance

$$ED_{ih} = \sqrt{\sum_{j=1}^{p} (a_{ij} - a_{hj})^2}$$

This formula is simply the Pythagorean theorem applied to p dimensions rather than the usual two dimensions (Fig. 6.2).

City-block distance (= Manhattan distance)

$$CB_{ih} = \sum_{j=1}^{p} |a_{ij} - a_{hj}|$$

In city-block space you can only move along one dimension of the space at a time (Fig. 6.2). By analogy, in a city of rectangular blocks, you cannot cut diagonally through a block, but must walk along either of the two dimensions of the block. In the mathematical space, size of the blocks does not affect distances in the space. Note also that many equal-length paths exist between two points in city-block space.

Euclidean distance and city-block distance are special cases (different values of k) of the Minkowski metric. In two dimensions:

$$\text{Distance} = \sqrt[k]{x^k + y^k}$$

where x and y are distances in each of two dimensions. Generalizing this to p dimensions, and using the form of the equation for ED:

$$\text{Distance}_{ih} = \sqrt[k]{\sum_{j=1}^{p} |a_{ij} - a_{hj}|^k}$$

Note that k = 1 gives city-block distance, k = 2 gives Euclidean distance. As k increases, increasing emphasis is given to large differences in individual dimensions.

[Figure 6.2: three panels in species space (axes: Species 1, Species 2) showing Euclidean distance, city-block distance, and cos α measured from the centroid.]

Figure 6.2. Geometric representations of basic distance measures between two sample units (A and B) in species space. In the upper two graphs the axes meet at the origin; in the lowest graph, at the centroid.

Correlation

The correlation coefficient (r) is cosine α (third panel in Fig. 6.2) where the origin of the coordinate system is the mean species composition of a sample unit in species space (the "centroid"; see Fig. 6.2). If the data have not been transformed and the origin is at (0,0), then this is a noncentered correlation coefficient.

For example, if two sample units lie at 180° from each other relative to the centroid, then r = -1 = cos(180°). Two sample units lying on the same radius from the centroid have r = 1 = cos(0°). If two sample units form a right angle from the centroid then r = 0 = cos(90°).

The correlation coefficient can be rescaled to a distance measure of range 0-1 by

$$r_{\text{distance}} = (1 - r)/2$$

Proportion coefficients

Proportion coefficients are city-block distance measures expressed as proportions of the maximum distance possible. The Sorensen, Jaccard, and QSK coefficients described below are all proportion coefficients. One can represent proportion coefficients as the overlap between the area under curves. This is easiest to visualize with two curves of species abundance on an environmental gradient (Fig. 6.3). If A is the area under one curve, B is the area under the other,
and w is the overlap (intersection) of the two areas, then the Sorensen coefficient is 2w/(A + B). The Jaccard coefficient is w/(A + B - w).

Written in set notation:

$$\text{Sorensen similarity} = \frac{2(A \cap B)}{(A \cup B) + (A \cap B)}$$

$$\text{Jaccard similarity} = \frac{A \cap B}{A \cup B}$$

Proportion coefficients as distance measures are foreign to classical statistics, which are based on squared Euclidean distances. Ecologists latched onto proportion coefficients for their simple, intuitive appeal despite their falling outside of mainstream statistics. Nevertheless, Roberts (1986) showed how proportion coefficients can be derived from the mathematics of fuzzy sets, an increasingly important branch of mathematics. For example, when applied to quantitative data, Sorensen similarity is the intersection between two fuzzy sets.

[Figure 6.3: abundance curves of two species, A and B, along an Environmental Gradient, with the overlap between the curves marked w.]

Figure 6.3. Overlap between two species abundances along an environmental gradient. The abundance shared between species A and B is shown by w.

Sorensen similarity (also known as "BC" for Bray-Curtis coefficient) is thus shared abundance divided by total abundance. It is frequently known as "2w/(A + B)" for short. This logical-seeming measure was also proposed by Czekanowski (1913). Originally used for binary (0/1) data, it works equally well for quantitative data. It is often called the "Bray-Curtis coefficient" (Faith et al. 1987) when applied to quantitative data, as in Bray and Curtis (1957). Rewriting the equation as a dissimilarity (or distance) measure, dissimilarity between items i and h is:

$$D_{ih} = \frac{\sum_{j=1}^{p} |a_{ij} - a_{hj}|}{\sum_{j=1}^{p} a_{ij} + \sum_{j=1}^{p} a_{hj}}$$

Another way of writing this, where MIN is the smaller of two values, is:

$$D_{ih} = 1 - \frac{2 \sum_{j=1}^{p} \text{MIN}(a_{ij}, a_{hj})}{\sum_{j=1}^{p} a_{ij} + \sum_{j=1}^{p} a_{hj}}$$

One can convert this dissimilarity (or any of the following proportion coefficients) to a percentage dissimilarity (PD):

$$\text{Sorensen distance} = BC_{ih} = PD_{ih} = 100\, D_{ih}$$

Jaccard dissimilarity is the proportion of the combined abundance that is not shared, or 1 - w/(A + B - w) (Jaccard 1901):

$$JD_{ih} = \frac{2 \sum_{j=1}^{p} |a_{ij} - a_{hj}|}{\sum_{j=1}^{p} a_{ij} + \sum_{j=1}^{p} a_{hj} + \sum_{j=1}^{p} |a_{ij} - a_{hj}|}$$

Quantitative symmetric dissimilarity (also known as the Kulczyński or QSK coefficient; see Faith et al. 1987):

$$QSK_{ih} = 1 - \frac{1}{2} \left[ \frac{\sum_{j=1}^{p} \text{MIN}(a_{ij}, a_{hj})}{\sum_{j=1}^{p} a_{ij}} + \frac{\sum_{j=1}^{p} \text{MIN}(a_{ij}, a_{hj})}{\sum_{j=1}^{p} a_{hj}} \right]$$

Although Faith et al. (1987) stated that this measure has a "built-in standardization," it is not a standardized city-block distance in the same way that relative Euclidean distance is a standardized measure of Euclidean distance. In contrast with the "relative Sorensen distance" (below), the QSK coefficient gives
different results with raw data versus data standardized by SU totals (Ch. 9). After such relativization, however, QSK gives the same results as Sorensen, relative Sorensen, and city-block distance.

Relative Sorensen (also known as relativized Manhattan coefficient in Faith et al. 1987) is mathematically equivalent to the Bray-Curtis coefficient on data relativized by SU total. This distance measure builds in a standardization by sample unit totals, each sample unit contributing equally to the distance measure. Using this relativization shifts the emphasis of the analysis to proportions of species in a sample unit, rather than absolute abundances.

$$D_{ih} = 1 - \sum_{j=1}^{p} \text{MIN}\left( \frac{a_{ij}}{\sum_{j=1}^{p} a_{ij}},\; \frac{a_{hj}}{\sum_{j=1}^{p} a_{hj}} \right)$$

An alternate version, using an absolute value instead of the MIN function, is also mathematically equivalent to Bray-Curtis coefficient on data relativized by SU total:

$$D_{ih} = \frac{1}{2} \sum_{j=1}^{p} \left| \frac{a_{ij}}{\sum_{j=1}^{p} a_{ij}} - \frac{a_{hj}}{\sum_{j=1}^{p} a_{hj}} \right|$$

Note that with standardization or relativization of the data before application of the distance measure, many of the city-block distance measures become mathematically equivalent to each other. After relativization by sample unit totals, PD (Bray-Curtis) = CB = QSK = Relative Sorensen.

Relative Euclidean distance (RED)

RED is a variant of ED that eliminates differences in total abundance (actually totals of squared abundances) among sample units. RED ranges from 0 to the square root of 2 when all abundance values are nonnegative (i.e., a quarter hypersphere). Visualize the SUs as being placed on the surface of a quarter hypersphere with a radius of one (Fig. 6.4). RED is the chord distance between two points on this surface.

$$RED_{ih} = \sqrt{\sum_{j=1}^{p} \left( \frac{a_{ij}}{\sqrt{\sum_{j=1}^{p} a_{ij}^2}} - \frac{a_{hj}}{\sqrt{\sum_{j=1}^{p} a_{hj}^2}} \right)^2}$$

[Figure 6.4: quarter circle of radius 1.0 in species space (axes: Species 1, Species 2), showing the arc distance and the RED (chord) between two points on the arc.]

Figure 6.4. Relative Euclidean distance is the chord distance between two points on the surface of a unit hypersphere.

RED builds in a standardization. It puts differently scaled variables on the same footing, eliminating any signal other than relative abundance. Note that the correlation coefficient also accomplishes this standardization, but arccos(r) gives the arc distance on the quarter hypersphere, not the chord distance. Also, with the correlation coefficient the surface is a full hypersphere, not a quarter hypersphere, and the center of the hypersphere is the centroid of the cloud of points, not the origin.

Chi-square distance

The chi-square distance measure is used in correspondence analysis and related ordination techniques (Chardy et al. 1976, Minchin 1987a). Although Faith et al. (1987) found that this distance measure performed poorly, it is important to be aware of it, since it is the implicit distance metric in some of the more popular ordination techniques (detrended correspondence analysis and canonical correspondence analysis).

Let

$a_{h+}$ = total for sample unit h (i.e., $\sum_{j=1}^{p} a_{hj}$)
$a_{i+}$ = total for sample unit i (i.e., $\sum_{j=1}^{p} a_{ij}$)

$a_{+j}$ = total for species j (i.e., $\sum_{i=1}^{q} a_{ij}$)

then the chi-square distance (Chardy et al. 1976) is

$$\chi^2_{ih} = \sqrt{\sum_{j=1}^{p} \frac{1}{a_{+j}} \left[ \frac{a_{hj}}{a_{h+}} - \frac{a_{ij}}{a_{i+}} \right]^2}$$

Note that this distance measure is similar to Euclidean distance, but it is weighted by the inverse of the species totals. If the data are prerelativized by sample unit totals (i.e., $b_{ij} = a_{ij} / a_{i+}$), then the equation simplifies to:

$$\chi^2_{ih} = \sqrt{\sum_{j=1}^{p} \frac{(b_{hj} - b_{ij})^2}{a_{+j}}}$$

The numerator is the squared difference in relative abundance. It is expressed as a proportion of the species total (the denominator) and summed over all species.

Minchin (1987a) offered the following critique of this distance measure:

    The appropriateness of Chi-squared distance as a measure of compositional dissimilarity in ecology may be questioned (Faith et al. 1987). The measure accords high weight to species whose total abundance in the data is low. [Conversely, it de-emphasizes abundant species.] It thus tends to exaggerate the distinctiveness of samples containing several rare species. Unlike the Bray-Curtis coefficient and related measures, Chi-squared distance does not reach a constant, maximal value for sample pairs with no species in common, but fluctuates according to variations in the representation of species with high or low total abundances. These properties of Chi-squared distance may account for some of the distortions observed in DCA ordinations.

Mahalanobis distance ($D^2$)

Mahalanobis distance $D^2_{fh}$ is used as a distance measure between two groups (f and h). It is commonly used in discriminant analysis and in testing for outliers. If $a_{if}$ is the mean for the ith variable in group f, and $w_{ij}$ is an element from the inverse of the pooled within-groups covariance matrix, representing variables i and j, then

$$D^2_{fh} = (n - g) \sum_{i=1}^{p} \sum_{j=1}^{p} w_{ij} (a_{if} - a_{ih})(a_{jf} - a_{jh})$$

where n is the number of sample units, g is the number of groups, and i ≠ j. Note that differences are weighted more heavily by $w_{ij}$ when variables i and j are uncorrelated. Thus, Mahalanobis distance corrects for the correlation structure of the original variables (the dimensions of the space). The built-in standardization means that it is independent of the measurement units of the original variables.

[Figure 6.5: two panels, Case A and Case B, each showing the scatter of points in Group f and Group h.]

Figure 6.5. Illustration of the influence of within-group variance on Mahalanobis distance.

In which case in Figure 6.5 are groups f and h more distant? Because the Mahalanobis distance inversely weights the distance between centroids by the variance, the two groups are more distant in Case B, even though the centroids are equidistant in the two cases.

Note the conceptual similarity to an F-ratio of between- to within-group variance. Indeed, the Mahalanobis distance can be used to calculate an F-test for multivariate differences between groups. Similarly, it can be used to test for outliers by calculating the distance between each point and the cloud of remaining points.

Performance of distance measures

Loss of sensitivity with heterogeneity

Performance of distance measures can be evaluated by comparing the relationship between environmental distance (distance along an environmental gradient, such as elevation) vs. sociological distance (the difference in communities as reflected by the distance in species space). This method of
evaluating distance measures was used by Beals (1984), Faith et al. (1987), De'ath (1999a), and Boyce and Ellison (2001). If species respond noiselessly to environmental gradients and the environmental gradients are known, then we seek a perfect linear relationship between distances in species space and distances in environmental space. Any departure from that relationship represents a partial failure of our distance measures.

Two examples help clarify the variability in the relationship between distance in species space and environmental space. These examples are based on synthetic data sets with a known underlying structure and noiseless responses of species to two environmental gradients.

The first example is an "easy" data set, consisting of 25 sample units and 16 species. It is easy because the beta diversity is fairly low (average Sorensen distance among SUs = 0.59; 1.3 half changes), the sample unit totals are fairly even (coefficient of variation (CV) of SU totals = 17%), and the species are all similarly abundant (CV of species totals = 37%).

Despite this being an easy data set, all of the distance measures show a curvilinear relationship with environmental distance (Fig. 6.6). Specifically, we see the loss in sensitivity of our distance measures at large environmental distances. The problem is least apparent in the Sorensen, chi-square, and Jaccard distances. The problem is worst with the correlation distance, where the curve not only flattens at high environmental distances, but starts to decline at the highest distances. The drop in the curve for the correlation coefficient (actually (1 - r)/2, which converts r into a distance rather than a similarity measure) is due to interpreting shared zeros (0,0) as positive association.

The second example is a more difficult data set, consisting of 100 sample units and 25 species. The difficulty has nothing to do with the size of the data set. Rather its beta diversity is higher (average Sorensen distance among SUs = 0.79; 2.3 half changes), the sample unit totals vary more widely than in the previous example (CV of SU totals = 40%), and the species vary realistically in abundance (CV of species totals = 183%).

Again, all of the distance measures lose sensitivity with increasing environmental distance (Fig. 6.7). This loss is greatest for distance based on the correlation coefficient. Euclidean distance not only loses sensitivity at high distances, but introduces considerable error, even at moderate distances. Note also that Euclidean distance shows no fixed upper bound for sample units that have nothing in common.

Sorensen distance loses sensitivity over a distance about half the length of the environmental gradients. The flat top on the Sorensen scatterplot results because it has a fixed maximum for SUs having no species in common. Many ecologists consider this a desirable, intuitive property for species data. Chi-square distance performs reasonably well at small environmental distances but misinterprets many distant SUs as being close in species space.

Transformation of the data to binary (presence-absence) in both examples results in a more linear relationship with most distance measures. This was shown for Euclidean distance and Sorensen distance with real data from an elevation gradient (Beals 1984).

Given the apparently poor performance of all of the distance measures, it is remarkable that multivariate analysis is able to extract clear, sensible patterns (Fig. 6.8). We are rescued by the redundancy in the data: all ordination and classification techniques benefit from this redundancy. Where two species fail to be informative of a difference, another two species are informative. Some ordination techniques, nonmetric multidimensional scaling (NMS) in particular, are able to linearize the relationship between distance in species space and distance in a reduced ordination space. NMS has an advantage over other ordination techniques: it is based on ranked distances, which improves its ability to extract information from the nonlinear relationships illustrated in the two examples.
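The effect of shared zeros on the correlation distance can be seen in a small sketch. The two sample units below are hypothetical and have no species in common; padding both with species that are absent from each (shared zeros) should not make them more alike, yet (1 - r)/2 shrinks while Sorensen distance stays at its maximum. The function names are illustrative.

```python
import math

def correlation_distance(u, v):
    """Rescale Pearson's r to the 0-1 range as (1 - r) / 2."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    r = cov / math.sqrt(var_u * var_v)
    return (1 - r) / 2

def sorensen(u, v):
    """Sorensen (Bray-Curtis) distance as a proportion."""
    return sum(abs(a - b) for a, b in zip(u, v)) / (sum(u) + sum(v))

def pad(unit, extra_zeros):
    """Append species that are absent from the sample unit."""
    return unit + [0] * extra_zeros

# Hypothetical sample units with no species in common.
unit_a = [1, 1, 0, 0]
unit_b = [0, 0, 1, 1]

for extra in (0, 4, 12):
    a, b = pad(unit_a, extra), pad(unit_b, extra)
    print(extra, sorensen(a, b), round(correlation_distance(a, b), 3))
# 0 1.0 1.0
# 4 1.0 0.667
# 12 1.0 0.571
```

Each added pair of zeros counts as agreement for r, which is one way to read the decline of the correlation curve at the largest environmental distances.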
[Figure 6.6: six scatterplots, one per distance measure, of distance in species space against Environmental Distance.]

Figure 6.6. Relationship between distance in species space for an "easy" data set, using various distance measures, and environmental distance. The graphs above are based on a synthetic data set with noiseless species responses to two known underlying environmental gradients. The gradients were sampled with a 5 x 5 grid. This is an "easy" data set because the average distance is reasonably small (Sorensen distance = 0.59; 1.3 half changes), all species are similar in abundance (CV of species totals = 37%), and sample units have similar totals (CV of SU totals = 17%).
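The flattening of these curves can be reproduced with a toy gradient. This is only a sketch under stated assumptions, not the synthetic data sets used for the figures: it assumes a single gradient, evenly spaced species optima, and a hypothetical triangular response curve (`response`, with `half_width=2.0`) so that abundances reach exactly zero away from the optimum.

```python
def response(position, optimum, half_width=2.0):
    """Hypothetical noiseless triangular response of one species to the gradient."""
    return max(0.0, 1.0 - abs(position - optimum) / half_width)

# Species optima spaced one unit apart along a single gradient.
optima = range(0, 12)

def sample_unit(position):
    """Abundance of every species at one position on the gradient."""
    return [response(position, optimum) for optimum in optima]

def sorensen(u, v):
    """Sorensen (Bray-Curtis) distance as a proportion."""
    return sum(abs(a - b) for a, b in zip(u, v)) / (sum(u) + sum(v))

# Distance in species space from a reference unit to units 1 to 5 steps away.
reference = sample_unit(3)
curve = {step: sorensen(reference, sample_unit(3 + step)) for step in range(1, 6)}

print(curve)
# {1: 0.5, 2: 0.75, 3: 1.0, 4: 1.0, 5: 1.0}
```

Environmental distance keeps growing from 3 to 5 steps, but once the two units share no species the Sorensen distance is pinned at 1.0, which is the flat top described in the text.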
[Figure 6.7: four scatterplots, one per distance measure, of distance in species space against Environmental Distance (axis range 0 to 13).]

Figure 6.7. Relationship between distance in species space for a "more difficult" data set, using various distance measures, and environmental distance. The graphs above are based on a synthetic data set with noiseless species responses to two known underlying environmental gradients. The gradients were sampled with a 10 x 10 grid. This is a "more difficult" data set because the average distance is rather large (Sorensen distance = 0.79; 2.3 half changes), species vary in abundance (CV of species totals = 183%), and sample units have moderately variable totals (CV of SU totals = 40%).

[Figure 6.8: scatterplot of distance in the ordination against Environmental Distance (axis range 0 to 13).]

Figure 6.8. Distance in a 2-D nonmetric multidimensional scaling ordination (NMS) in relation to environmental distances, using the same data set as in Figure 6.7. Note how the ordination overcame the limitation of the Sorensen coefficient at expressing large distances.

Availability

The Sorensen (Bray-Curtis) index has repeatedly been shown to be one of the most effective measures of sample or species similarity, yet it is not widely available in general statistical software.

Compatibility

The primary disadvantage of city-block distance measures is that they are not compatible with many standard multivariate analyses (e.g., discriminant analysis, canonical correlation, and canonical correspondence analysis). This becomes less and less important, however, as non-Euclidean alternatives become increasingly available. Certain methods of classification and ordination are effective with ecological data (e.g., Bray-Curtis ordination and nonmetric multidimensional scaling), in part because they are amenable to distance measures that perform well with ecological data.

Theoretical basis

Beyond the choice of a proportion coefficient or not (Box 6.1) and the choice of a relativized distance measure or not, there is little basis in ecological theory for selecting one distance measure over another. The rationale for our choice is primarily empirical: we should select measures that have shown superior performance, based on the other criteria listed here. One important theoretical difference between Euclidean and city-block distance is, however, apparent. Long Euclidean distances in species space are measured through an uninhabitable portion of species space; in other words, the straight-line segments tend to pass through areas of impossibly species-rich and overly full communities. In contrast, city-block distances are measured along the edges of species space, exactly where sample units lie in the dust bunny distribution.

Intuitive criteria

Does Euclidean or city-block distance better match our intuition on how community distances should be measured? Consider the following examples.

In city-block space, the importance of a gradient is proportional to the number of species responding to it. For example, assume that 20 species respond only to gradient X and 4 species only to gradient Y. In city-block space, gradient X is 5 times as important as gradient Y. In Euclidean space, gradient X is square-root-of-5 times as important as gradient Y. Which space matches your intuition?

With Euclidean distance, large differences are weighted more heavily than several small differences (Box 6.2). This results in greater sensitivity to outliers with Euclidean distance than with city-block distance measures. For example, assume we have four species and three sample units, A, B, and C. The data and differences in abundance of each species for each pair of sample units are listed in Box 6.2.

Geodesic distance

Performance of all of the traditional distance measures declines as distances in species space increase (Figs. 6.7 and 6.8). An innovative solution to the problem of measuring long distances in nonlinear structures is the geodesic distance (Tenenbaum et al. 2000). This concept is similar to the "shortest path" adjustments to a distance matrix (Williamson 1978, 1983; Clymo 1980; Bradfield & Kenkel 1987; De'ath 1999a). Williamson summed distances between sample unit pairs representing the shortest path between two distant SUs, but only applied this to SUs with no species in common. Bradfield and Kenkel (1987) added flexibility by varying the threshold for the number of species in common. Bradfield and Kenkel found better results with a lower threshold; i.e., adjusting a larger proportion of the distance matrix. De'ath (1999a) further extended the method by using city-block distance measures, changing the threshold to a quantitative dissimilarity value, and allowing multiple-step paths between very distant SUs.

A geodesic distance between two points is measured by accumulating distances between nearby points. Tenenbaum et al. (2000) used Euclidean distances, but geodesic distances can be built from other distance measures. "Nearby" can be defined as a fixed radius or as the n nearest neighbors. Geodesic distances should be able to find effectively the curvature of compositional gradients in species space. Geodesic distances are one of the most promising new methodological developments. A key issue will be objectively defining "nearby" to optimize the recovery of patterns in ecological communities.

The difference between Tenenbaum's geodesic distance and the ecologists' shortest path (SP) methods can be visualized with an analogy to crossing a stream dotted with stepping stones. We want to find the shortest route from a particular point on one bank to a particular point on the opposite bank. The SP method must find a single stepping stone that gets us across the stream in the two shortest possible leaps (one to the stepping stone and one to the far bank). The geodesic method, however, defines a comfortably small step, then seeks the shortest series of steps without ever exceeding that small step length. The geodesic method thus considers the whole array of stepping stones, while the SP method can consider only one stone at a time.

A problem with the SP method is that if the stream is broader than two leaps, then no single stepping stone will work. This corresponds to two SUs so different that there is no third SU that shares species with both of them. De'ath (1999a) solved this problem by allowing multiple passes of the SP method, in essence allowing multiple stepping stones.

Despite the excellent ordinations in Bradfield and Kenkel (1987), Boyce and Ellison (2001), and De'ath (1999a), the geodesic distances and related methods have not been widely adopted, probably because they have not been included in popular software packages. Whether Tenenbaum et al.'s (2000) geodesic distances offer further improvements over the SP methods used by ecologists remains to be seen.
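The stepping-stone idea can be sketched in a few lines. This is a minimal illustration in the spirit of the geodesic approach, not a reproduction of any published implementation: it assumes "nearby" means a fixed radius, uses the Floyd-Warshall all-pairs shortest-path algorithm to accumulate steps, and runs on a hypothetical distance matrix (`saturating`) in which six sample units lie in order along a gradient and the raw distance stops growing beyond two steps.

```python
import math

def geodesic_distances(dist, radius):
    """Shortest accumulated path between all pairs, using only steps <= radius."""
    n = len(dist)
    # Keep only links between "nearby" units; all other pairs start as unreachable.
    geo = [[dist[i][j] if dist[i][j] <= radius else math.inf for j in range(n)]
           for i in range(n)]
    # Floyd-Warshall: allow each unit in turn to serve as a stepping stone.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via = geo[i][k] + geo[k][j]
                if via < geo[i][j]:
                    geo[i][j] = via
    return geo

def saturating(step):
    """Hypothetical raw distance that reaches its maximum of 1.0 after two steps."""
    return {0: 0.0, 1: 0.5, 2: 0.75}.get(step, 1.0)

# Six hypothetical sample units in order along a gradient.
n_units = 6
dist = [[saturating(abs(i - j)) for j in range(n_units)] for i in range(n_units)]
geo = geodesic_distances(dist, radius=0.5)

print(dist[0])  # [0.0, 0.5, 0.75, 1.0, 1.0, 1.0]
print(geo[0])   # [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
```

The raw distances from the first unit flatten at 1.0, while the accumulated distances keep increasing with position on the gradient. A radius that is too small would leave some pairs unreachable (infinite), which is the practical side of the open question about how to define "nearby".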

Box 6.1. Comparison of Euclidean distance with a proportion coefficient (Sorensen distance). Relative proportions of species 1 and 2 are the same between Plots 1 and 2 and Plots 3 and 4.

Data matrix containing abundances

            Sp1    Sp2
Plot 1        1      0
Plot 2        1      1
Plot 3       10      0
Plot 4       10     10

Example calculations of distance measures for Plots 3 and 4. ED = Euclidean distance. PD = Sorensen distance as percentage.

$$ED_{3,4} = \sqrt{(10 - 10)^2 + (10 - 0)^2} = 10$$

$$PD_{3,4} = \frac{100\,(|10 - 10| + |10 - 0|)}{10 + 20} = 33.3\%$$

Sorensen distance matrix (percentages)

            Plot 1   Plot 2   Plot 3   Plot 4
Plot 1        0
Plot 2       33.3      0
Plot 3       81.8     83.3      0
Plot 4       90.5     81.8     33.3      0

Euclidean distance matrix

            Plot 1   Plot 2   Plot 3   Plot 4
Plot 1        0
Plot 2        1.0      0
Plot 3        9.0      9.1      0
Plot 4       13.4     12.7     10.0      0

The Sorensen distance between Plots 1 and 2 is 0.333 (33.3%), as is the Sorensen distance between Plots 3 and 4, as illustrated below. In both cases the shared abundance is one third of the total abundance. In contrast, the Euclidean distance between Plots 1 and 2 is 1, while the Euclidean distance between Plots 3 and 4 is 10. Thus the Sorensen coefficient expresses the shared abundance as a proportion of the total abundance, while Euclidean distance is unconcerned with proportions.

[Figure in Box 6.1: Plots 1 to 4 plotted in species space (axes: Species 1, Species 2, each 0 to 10), with the Plot 1 to Plot 2 and Plot 3 to Plot 4 distances both labeled 33.3.]
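The two example calculations in Box 6.1 can be repeated for the other pair with the same relative proportions. This is a minimal sketch using the abundances from the box; the function names are illustrative.

```python
import math

# Abundances of species 1 and 2 in the four plots of Box 6.1.
plots = {
    "Plot 1": [1, 0],
    "Plot 2": [1, 1],
    "Plot 3": [10, 0],
    "Plot 4": [10, 10],
}

def euclidean(u, v):
    """Euclidean distance (ED) between two plots."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def percent_dissimilarity(u, v):
    """Sorensen distance expressed as a percentage (PD)."""
    return 100 * sum(abs(a - b) for a, b in zip(u, v)) / (sum(u) + sum(v))

for first, second in (("Plot 1", "Plot 2"), ("Plot 3", "Plot 4")):
    u, v = plots[first], plots[second]
    print(first, second, euclidean(u, v), round(percent_dissimilarity(u, v), 1))
# Plot 1 Plot 2 1.0 33.3
# Plot 3 Plot 4 10.0 33.3
```

Multiplying every abundance by ten multiplies ED by ten and leaves PD unchanged, which is the contrast the box is drawing.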

Box 6.2. Example data set comparing Euclidean and city-block distances, contrasting the effect of squaring differences versus not.

Hypothetical data: abundance of four species in three sample units (SU).

             Sp
SU      1    2    3    4
A       4    2    0    1
B       5    1    1   10
C       7    5    3    4

Sample units A,B: species differences d = 1, 1, 1, 9 for each of the four species.
Sample units A,C: species differences d = 3, 3, 3, 3

                  Distance measure
Pair of SUs     Euclidean    City-block
AB                9.165          12
AC                6              12

The simple sum of differences (city-block distance) is the same for AB and AC. Euclidean distance sums the squared differences, so that difference of 9 is given more emphasis with Euclidean distance than with city-block distance. Thus, the Euclidean distance between A and B is larger than the distance between A and C. The city-block distance between A and B is the same as that between A and C. Which distance measure matches your intuition?
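The numbers in Box 6.2 follow from the Minkowski form given earlier in the chapter, with k = 1 for city-block and k = 2 for Euclidean distance. The sketch below recomputes them and adds k = 4 as an extra, purely illustrative case to show the growing weight of the single large difference; the function name `minkowski` is an assumption.

```python
def minkowski(u, v, k):
    """Minkowski distance: k = 1 is city-block, k = 2 is Euclidean."""
    return sum(abs(a - b) ** k for a, b in zip(u, v)) ** (1 / k)

# Abundances of four species in the three sample units of Box 6.2.
units = {
    "A": [4, 2, 0, 1],
    "B": [5, 1, 1, 10],
    "C": [7, 5, 3, 4],
}

for k in (1, 2, 4):
    ab = minkowski(units["A"], units["B"], k)
    ac = minkowski(units["A"], units["C"], k)
    print(k, round(ab, 3), round(ac, 3))
# 1 12.0 12.0
# 2 9.165 6.0
# 4 9.001 4.243
```

As k increases, the AB distance approaches the single largest difference (9), while the AC distance, built from four equal differences, keeps shrinking.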
