Académique Documents
Professionnel Documents
Culture Documents
Patchworks
Sebastian Bocker 1 and Andreas W. M. Dress 2
1. INTRODUCTION
Given a set X, let P(X ) denote the set of all subsets of X; the set P(X )
is partially ordered by inclusion. We denote by A _* B the union A _ B of
disjoint sets A, B. We consider collections CP(X) of subsets of X; such
collections will also be called cluster systems, and the subsets in C will also
be called (C-)clusters. To emphasize the ground set X, we will sometimes
also refer to the pair (X, C) rather than to just C when discussing a cluster
system C.
From [3], we recall the following notations, definitions, assertions, and
basic observations:
1
Sebastian Bocker was supported by ``DFGGraduiertenkolleg Strukturbildungs-
prozesse,'' Forschungsschwerpunkt Mathematisierung, University of Bielefeld, Germany.
2
Andreas Dress was supported by ``Bundesministerium fur Bildung, Wissenschaft,
Forschung und Technologie,'' FKZ 0311642, Germany; presently Distinguished Visiting
Professor at City College, CUNY, Department of Chemical Engineering.
1
0001-870801 35.00
Copyright 2001 by Academic Press
All rights of reproduction in any form reserved.
2 BOCKER AND DRESS
. C := . C and , C := , C ;
C#C C#C
X
C :=C _ [<, X ] _ [[x] | x # X] ;
and
Clearly, we have
min*(C)=min(C"[<]); (1)
X
\+
1
:=[YX | *Y=1]=[[x] | x # X ].
Every such cluster system will henceforth be called a patchwork or, more
specifically, an X-patchwork.
Consult Figs. 1 and 2 for on the definition of patchworks. Note that
every patchwork is almost finitary and that a hierarchy is a patchwork if
and only if it is almost finitary. More generally, a cluster system C is a
patchwork if and only if it is almost finitary and satisfies the condition
(Patchwork) for all finite non-empty subsets C$Cor, equivalently, for
all subsets C$C with *C$=2.
Suppose that C satisfies these two conditions, and let C$ denote an
arbitrary non-empty subsystem of C with <{Z 0 := C$. Then there must
exist some C 0 # min(C$Z0 ), and we must have Z 0 C 0 & C$ # C and, hence,
C 0 & C$=C 0 for all C$ # C$. In turn, this implies Z 0 C 0 C$=Z 0 and,
hence, Z 0 =C 0 # C. The claim Z 1 := C$ # C follows analogously by
considering some cluster C 1 # max(CZ1 & C$Z0 ) and observing that
C 1 _ C$=C 1 must hold for all C$ # C$.
Furthermore, if a cluster system CP(X ) is almost finitary, an
X-patchwork, or an X-hierarchy, then C X is finitary, an X-patchwork, or
an X-hierarchy, respectively.
An X-patchwork C with X # C will be called a connected patchwork, and
it will be called relatively connected if its support C is a C-cluster.
Clearly, any two distinct maximal clusters in a not relatively connected
patchwork C must be disjoint; so the set of non-empty maximal clusters in
a patchwork C forms a partition of C, and C decomposes into the
``essentially disjoint'' union of the connected C-patchworks CC where C
runs over all non-empty maximal C-clusters.
Similarly, any two distinct clusters from min*(C) must have an empty
intersection and, hence, the clusters in min*(C) provide a partition of
min*(C).
For further reference, we collect some of the above observations in
F :=(C i ) i # I
C J =C F, J := [C i | i # J ]
6 BOCKER AND DRESS
IF Ä CY : J [ C J
(i) C is ample;
(ii) for every non-empty cluster C # C&min*(C), there exist either
two disjoint clusters A, B # C/C with A _* B=C, or a (necessarily infinite)
chain C$C/C with C$=C;
(iii) for every non-empty cluster C # C&min*(C), there exist either
two clusters A, B # C/C with A _ B=C, or a (necessarily infinite) chain
C$C/C with C$=C;
(iv) C= min*(C) holds and, for every proper trace (I, I) of
(X, C) with I< I =[<] _ ( 1I ) _ [I ], we have *I=2;
(v) C= min*(C) holds, and every proper trace (I, I) of (X, C)
is ample;
(vi) C= min*(C) holds and, for every proper C-family F of
(X, C), there exist either two distinct indices i, j # I F with [i, j ] # IF , or
there exists a non-empty sub-chain I$IF such that every element J # I$
has infinite cardinality and * I$1 holds.
(i) C is ample;
(ii) C contains an ample and almost finitary hierarchy C0 with
max(C0 )=max(C) and min*(C0 )=min*(C);
(iii) every maximal sub-hierarchy C0 of C is ample.
2. EXAMPLES
Examples. (1) Let T=(V, E ) denote a tree with vertex set V and
edge set E( V2 ). We define
{
CT := FE | the subgraph
\. F, F + is connected= , (3)
More specifically, we will say that such a sequence is an injective path from
p 0 to p n . A non-empty subset FE is called a G-patch if the subgraph
( F, F ) is connected and if, for every two distinct vertices u, v # F
and every injective path p from u to v, we have ( p) F. Clearly, if
G=(V, E ) is a tree, then FE is a G-patch if and only if F # CG ; cf. (3).
So, generalizing (3), we may define
for any graph G; one can check thatas in the case of treesCG is a
patchwork, that CG is connected if and only if G is connected, and that CG
is ample. Note also that FE is a minimal G-patch if and only if the sub-
graph ( F, F ) is a 2-connected component of G (provided ``bridges of G''
are included among the 2-connected components of G; see, for example,
[8] for definitions); in particular, one has CG =(CG ) E if and only if G is a
tree and Fig. 3 for an illustration.
10 BOCKER AND DRESS
and
r(F) :=*R(F ),
and for every e # E, we put r(e) :=r([e]). We suppose R(E )=V, and we
define a map *: P(E ) Ä R by
Note that
r(E)& e # E r(e)
*(<)= 0
*E&1
e # E r(e)&r(E) f # F r( f )&r(F )
(9)
*E&1 *F&1
holds for all FE with *F2 or, equivalently, if one has
e # E r(e)&r(E )
r*(F) : r*( f ) with r*(F ) :=r(F )& (10)
f #F
*E&1
for all FE with *F1 (where r*(e) :=r*([e]) for all e # E, of course).
Then, the set
{ }
CR :=[FE | *(F )0]= FE r*(F ) : r*( f )
f #F
= (11)
Indeed, we have R(F )= F, r(F )=* F and r(e)=2 for all e # E. From
r(E )=*V=*E+1, we infer
e # E r(e)&r(E ) 2 *E&*E&1
= =1.
*E&1 *E&1
that holds for all subsets FE with *F2. Finally, to establish CR =CT ,
we observe that
\ +
*(F)=*F+*? 0 . F, F &2 *F+(*F&1)=*? 0 . F, F &1
\ +
vanishes for some non-empty set FE if and only if ( F, F ) is connected.
12 BOCKER AND DRESS
e # E r(e)&r(E )
=n
*E&1
and
f # F r( f )&r(F) *F+n&r(F )
=n+
*F&1 *F&1
C* :=[C # C ( C I ) | I(C)=I ],
So, we must have *I$=2 which implies *max(IJ2 &J1 )=1 and hence, in
view of [i ] # I for each i # I,
{ = { =
max(IJ2 &J1 )= . max(IJ2 &J1 ) = . (IJ2 &J1 ) =[J 2 &J 1 ],
C* :=max(C0( /C 0 ))
14 BOCKER AND DRESS
5. APPLICATIONS
The following lemma deals with the case that a collection C of subsets
of X is ``nearly'' an ample patchwork. Note that, given an ample cluster
system CP(X ) and a cluster C 0 , the cluster system CC0 is ample, too.
The following simple observation will be used in [4]:
is ample; otherwise, it returns False. Given an oracle that, for any two dis-
joint clusters C 1 , C 2 # C, is able to tell us in constant time whether
C 1 _ C 2 # C holds, the algorithm is of order O((*X ) 2 ), provided that every
other command (like forming that union of those two disjoint sets) can be
carried out in constant time.
Proof. Clearly, at the beginning of each step of the While-loop (line 2),
the following holds: A & B=<, X Â B, C$ is an ample hierarchy with
( X1 )C$, A _ B coincides with the associated partition max(C$) of X, and
B _ B$ Â C holds for all distinct clusters B, B$ # B.
If the algorithm stops with A=<, then BC$ forms a partition of X.
Since B _ B$ Â C holds for all distinct clusters B, B$ # B=max(C$), we infer
that *B3 must hold. Hence, Theorem 3, (i) (vi) implies that C
cannot be ample.
If the algorithm stops with A=[X ] then it is clear thatwhether or
not C is a patchworkthe output C$ will always be a maximal hierarchy
contained in C.
To check the runtime behavior of the algorithm, note first that the
number 2 *A+*B is decreased by one in every step of the While-loop
(lines 214), so we need at most 2n&2 steps of the loop (where n :=*X )
because after 2n&2 steps, we have 2 *A+*B=2 and, hence, A=[X ]
must hold (because A=< would imply *B3). Furthermore, at each
step, at most *B sets are checked for membership in C (lines 4 and 9). Let
k :=2n&2 *A&*B+1 denote the step number of the While-loop, and
suppose that the loop was left at step K. We denote the cardinality of B at
the beginning of step k of the loop by b k for k=1, ..., K. Then
b k =2n+1&k&2 *A2n&1&k for A{< and b k k&1 holds, the
latter in view of b 1 =0 and b i+1 # [b i \1] for i=1, ..., K&1. Together, we
get
b k min(2n+1&k, k&1) for k=1, ..., K
and
K n 2n&2
: b k : (k&1)+ : (2n&1&k)
k=1 k=1 k=n+1
n(n&1) (n&1)(n&2)
= + =(n&1) 2
2 2
and, hence, at most (n&1) 2 &1=n(n&2) subsets of X need to be checked
for membership in C, since we already know X # C. K
Note that the algorithm described in Fig. 4 is ``best possible'' in the sense
that every algorithm computing a maximal hierarchy C$CP(X ) and
depending on our oracle needs at least n(n&2) calls of the oracle in the
PATCHWORKS 17
We will now show how our results can be established for arbitrary
patchworks C. First, we give some examples showing that the situation can
get more complicated for infinite cluster systems C:
C :=[[1, ..., n] | n # N]
C :=[[q # Q | q:] | : # R]
R
C :=[(&, :), (&, :] | : # R] _ [R, <] _
\1+
is an ample and finitary R-hierarchy with C R =C.
The main obstacle for just copying the proofs given above for the case
*C< is that, unfortunately, the assertions (Ample 12) do not hold
verbatim for ample and finitary cluster systems; see Example 7. Instead, we
have:
I
I(C) Â < I =[I ] _
\+
1
_ [<]
holds, we are done. To find such a cluster, choose a maximal chain C$ from
that must exist in view of Zorn's Lemma. First, suppose C* := C$ # C*.
Then, condition (iii) implies the existence of some non-empty subcluster
C # C /C* with I(C) Â < I, so we are done. Otherwise, we must have
I(C*) # ( 1I ) _ [<], that is, there must exist some i # I with C*C i .
Choose j # I&[i ] and, noting that C i & C{< and C j & C{< holds for
all C # C$, define
C$ :=[C # C0 | x # C % C 0 ].
REFERENCES
1. H.-J. Bandelt and A. Dress, Reconstructing the shape of a tree from observed dissimilarity
data, Adv. in Appl. Math. 7 (1986), 309343.
2. S. Bocker, D. Bryant, A. W. M. Dress, and M. Steel, Algorithmic aspects of tree amalgama-
tion, J. Algorithms 37, No. 2 (2000), 522537.
3. S. Bocker and A. W. M. Dress, A note on maximal hierarchies, Adv. Math., in press.
4. S. Bocker, A. W. M. Dress, and M. Steel, Most parsimonious quartet encodings of binary
trees, in preparation.
5. S. Bocker, A. W. M. Dress, and M. Steel, Patching up X-trees, Ann. Comb. 3 (1999), 112.
PATCHWORKS 21
6. D. Bryant, Hunting for trees in binary character sets: efficient algorithms for extraction,
enumeration, and optimization, J. Comp. Biol. 3 (1996), 275288.
7. H. Colonius and H.-H. Schulze, Tree structures for proximity data, British J. Math. Statist.
Psych. 34 (1981), 167180.
8. A. Dress, V. Moulton, and M. Steel, Trees, taxonomy, and strongly compatible multi-state
characters, Adv. in Appl. Math. 19 (1997), 130.