Mynard, Seal - Phenotype Spaces

J. Math. Biol. (2010) 60:247–266
DOI 10.1007/s00285-009-0265-8
Mathematical Biology

Phenotype spaces

Frédéric Mynard · Gavin J. Seal

Received: 18 February 2008 / Revised: 24 February 2009 / Published online: 29 March 2009
© Springer-Verlag 2009

Abstract The topological viewpoint on spaces of phenotypes presented in Stadler
et al. (J Theor Biol 213:241–274, 2001) is revisited, and a quantified version is
proposed. While the necessary probabilistic information can be encoded in a
topological-like fashion, it turns out that it is not reflected adequately by the
concept of continuity. We propose alternative models, but the behavior of maps
makes these models non-topological in fundamental ways.

Keywords Genotype · Phenotype · Topology · Pretopology · RNA folding

Mathematics Subject Classification (2000) 54A05 · 54A20 · 54H99 · 60B05 · 92D15 · 92D20

F. Mynard (✉) · G. J. Seal
Department of Mathematical Sciences, Georgia Southern University,
Statesboro, GA 30460-8093, USA
e-mail: fmynard@georgiasouthern.edu

G. J. Seal
e-mail: gseal@georgiasouthern.edu

1 Introduction

In Stadler et al. (2001), Fontana, Stadler, Stadler, and Wagner present various aspects
of a theory of evolutionary change, referred to here as the GP-map model (details are
recalled in the next section). The folding of RNA into secondary structures provides
the simplified model for evolutionary change that the authors focus on. The reader is
however warned that the point of view adopted is rather extreme: genotype and
phenotype are two aspects of the same molecule; environmental constraints are simple
thermodynamic parameters, so that in a fixed thermodynamical setting, selection is
essentially ignored. Nevertheless, before selection decides the fate of a specific new
phenotype, that phenotype must be produced. The internal dynamics that give access to
new phenotypes are the focal point of Stadler et al. (2001) and of the present paper,
and the simplified model is reasonable to describe this aspect of evolution.
Fontana, Stadler, Stadler, and Wagner further argue that continuous changes in the
RNA sequences can lead to apparent jumps at the phenotypic level, so that evolution
is not solely influenced by natural selection and genetic drift, but is also directed by
internal dynamics. In order to analyze the jumps in the shapes of RNA sequences,
the authors equip the genotype set with a pretopology (see Sect. 3.1 for more details),
which is then used to structure the phenotype set. This approach leads the authors to
describe the jumps as an essentially continuous process, interrupted at intervals by a
discontinuous event.
The intent of this work is to explore in more depth and develop the mathematical
framework introduced by Fontana, Stadler, Stadler, and Wagner. The present authors
were originally motivated to investigate the intuitive notion of
continuity/discontinuity underlying the evolutionary processes described by the
GP-map model, notably punctuated equilibrium. We refer to Stadler et al. (2001) for a
more thorough treatment of the biological motivations and ideas of the model, and
shall concentrate here on its mathematical aspects.
It should be noted that Stadler et al. (2001) focuses on the pretopological approach
for a didactic purpose only. Indeed, as observed in op. cit., Stadler and Stadler (2004)
and Wagner and Stadler (2003), a biologically realistic model taking into account
possibly unequal crossovers would make use of more general types of closure operators
(see Sects. 3.1, 6 for definitions of these closure operators). We begin our study by
considering the simplest version of the topological approach: only finite pretopological
spaces are considered. Since these are mathematically equivalent to directed graphs
(called digraphs from now on), the setting can be translated into classical
graph-theoretic terms, and the theory of random walks on weighted digraphs exploited
to analyze the dynamics of the GP-map model (Sect. 2). We begin our study of the model
along this path in order to give an alternate view of the dynamics that underlie the
approach proposed in Stadler et al. (2001). In the general case (described in Sect. 6)
however, such an interpretation in terms of Markov chains is no longer available.
As suggested in op. cit., the quantitative information extracted from the GP-map
model can be better represented by replacing the topological approach by a metric
one (see Sect. 4.2 below). Thus, we develop the mathematical model along these
lines, in order to clarify certain aspects of the topological interpretation initially
proposed. Unfortunately, even with these new structures at hand, the interpretation
of the system's evolution in terms of a satisfying notion of continuity is in no way
obvious. In fact, while the necessary probabilistic information can be encoded in a
topological-like manner, the inadequacy of the topological models lies in the nature
of their maps, which do not reflect the probabilistic information correctly.
Standard references for the mathematical themes that we use can be found in most
introductory probability books for random walks and Markov processes (for example
Grinstead and Snell 1997), and Dolecki (2008) for topological-like structures (filters,
neighborhoods, pretopologies, etc.) used herein. Nevertheless, we attempt to be as
self-contained as possible.
1.1 The GP-map model
Let us briefly recall the essentials of the genotype-phenotype-map model (or GP-map
model) that we will exploit in this work (for more details, we refer to Stadler
et al. 2001). In the example of RNA folding, the GP-map model is given by a map
$f : G \to P$, where the genotype set $G$ is a set whose elements are RNA sequences
of fixed length, referred to hereafter as sequences, and the phenotype set $P$ is a set
whose elements are the minimum free energy shapes, referred to simply as shapes, or
phenotypes; the GP-map $f$ then represents the folding process of a sequence into
its shape. Each sequence of the genotype set is a word of length $l$ on the alphabet
{a, c, g, u} (where a stands for adenine, c for cytosine, g for guanine and u for
uracil), which can mutate into any of its $3l$ one-error mutants, that is, into any of
the sequences that differ from it in exactly one coordinate. The genotype set becomes
the set of vertices of a Hamming graph (in which two sequences are adjacent if and only
if they differ in one coordinate), with a loop added at each vertex, so that a sequence
is allowed to avoid mutation. In the model proposed in Stadler et al. (2001), the loop
at each vertex is not added and non-mutation is not considered; however, following
this line of thought here would create inconsistencies in our model later on. Thus, we
choose to endow each edge (including the loop at each vertex) with the probability
$\frac{1}{3l+1}$, so that a random walk on this graph represents a series of mutations
occurring (or not) at regular time intervals.¹ For simplicity, we denote the resulting
weighted graph by $G$ again. The phenotype set is then the vertex set of the quotient
graph $P$ induced by $f$. To describe this graph, let us denote the probability that a
sequence $x \in G$ mutates into a sequence $y \in G$ by $\mathrm{Prb}(y \leftarrow x)$,
so one has

\[
\mathrm{Prb}(y \leftarrow x) =
\begin{cases}
\frac{1}{3l+1} & \text{if } d_G(x, y) \le 1,\\
0 & \text{else,}
\end{cases}
\tag{i}
\]

where $d_G : G \times G \to \mathbb{N}$ denotes the Hamming distance in $G$ between two
sequences (so that $d_G(x, y) \le 1$ iff $x = y$ or $y$ is a one-error mutant of $x$).
In (i), we make an explicit choice of a probability of transition, but only to
illustrate how such probabilities lead to the neighborhood structure. Indeed, while (i)
might reflect the process by which mutants are produced, the actual probability of
transition to a specific genotype at the individual and population levels is clearly
much more complex. Similarly, all the specific probabilities appearing later on are
introduced to facilitate the understanding of the mathematical constructs being
presented, rather than to provide meaningful quantitative information.

¹ As we mention further on, other weights can be given in order to better reflect the
biological reality (in which mutation should be much less probable than non-mutation),
but as long as the probabilities are different from zero, such a choice will not affect
the long-term behavior of the system.

Suppose now that $G$ is partitioned into disjoint subsets. We introduce the following
consistent probabilities of transition between the partitioning sets: the probability
$\mathrm{Prb}(B \leftarrow A)$ of transition from a set $A$ to a set $B$ by one-error
mutation is given by

\[
\mathrm{Prb}(B \leftarrow A) =
\frac{\sum_{y \in B,\, x \in A} \mathrm{Prb}(y \leftarrow x)}
     {\sum_{y \in G,\, x \in A} \mathrm{Prb}(y \leftarrow x)},
\tag{ii}
\]

which in the present context leads to

\[
\mathrm{Prb}(B \leftarrow A) = \frac{\mathrm{Edg}(B, A)}{(3l+1)\,|A|},
\]

where $|A|$ is the number of elements in the set $A$ and

\[
\mathrm{Edg}(B, A) = |\{(y, x) \mid y \in B,\ x \in A,\ d_G(y, x) \le 1\}|.
\]
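
As a minimal sketch of (i) and (ii), the following Python fragment enumerates all sequences of a small length and compares the quotient in (ii) with the closed form $\mathrm{Edg}(B, A)/((3l+1)|A|)$. The length `l = 2` and the two blocks `A` and `B` are hypothetical choices made only for illustration; they are not taken from the model.

```python
from fractions import Fraction
from itertools import product

ALPHABET = "acgu"

def hamming(x, y):
    """Hamming distance d_G between two sequences of equal length."""
    return sum(a != b for a, b in zip(x, y))

def prb(y, x, l):
    """Equation (i): probability that x becomes y in one step (loop included)."""
    return Fraction(1, 3 * l + 1) if hamming(x, y) <= 1 else Fraction(0)

def edg(B, A):
    """Number of pairs (y, x) with y in B, x in A and d_G(y, x) <= 1."""
    return sum(1 for y in B for x in A if hamming(y, x) <= 1)

def prb_sets(B, A, G, l):
    """Equation (ii): probability of moving from the set A into the set B."""
    num = sum(prb(y, x, l) for y in B for x in A)
    den = sum(prb(y, x, l) for y in G for x in A)
    return num / den

l = 2  # toy length, chosen only to keep the enumeration small
G = ["".join(w) for w in product(ALPHABET, repeat=l)]
A = [g for g in G if g[0] == "a"]   # hypothetical block of a partition of G
B = [g for g in G if g[0] != "a"]   # its complement

print(prb_sets(B, A, G, l))                        # value obtained from (ii)
print(Fraction(edg(B, A), (3 * l + 1) * len(A)))   # closed form
```

Exact rational arithmetic is used so that the two expressions can be compared without rounding.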
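The probability $\mathrm{Prb}(\psi \leftarrow \varphi)$ that a mutation in $G$ will lead
to a change from a shape $\varphi$ to a shape $\psi$ in $P$ is then given by

\[
\mathrm{Prb}(\psi \leftarrow \varphi) =
\frac{\mathrm{Edg}(f^{-1}(\psi), f^{-1}(\varphi))}{(3l+1)\,|f^{-1}(\varphi)|}.
\tag{iii}
\]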
This probability is essentially the occurrence frequency described in Fontana and
Schuster (1998). Note however that the frequency ratio $A(\psi \leftarrow \varphi)$
defined in Stadler et al. (2001) is not directly related to
$\mathrm{Prb}(\psi \leftarrow \varphi)$, since it only takes into account the behavior
of mutations near the border of a fibre $f^{-1}(\varphi)$. Due to the difficulty of
determining a realistic probability of transition between phenotypes, we rely on our
heuristic model to define it as follows: given probabilities of transition
$\mathrm{Prb}(y \leftarrow x)$ on the set $G$ of genotypes, probabilities of transition
$\mathrm{Prb}(\psi \leftarrow \varphi)$ on the set $P$ of phenotypes are induced via

\[
\mathrm{Prb}(\psi \leftarrow \varphi) = \mathrm{Prb}(f^{-1}(\psi) \leftarrow f^{-1}(\varphi)),
\tag{iv}
\]

where the right-hand side of the equation is defined via (ii). This approach reflects
the idea that the probability of transition from $\varphi$ to $\psi$ depends directly on
the probability of a mutation leading to a jump from the neutral network of $\varphi$
to that of $\psi$. However, $\mathrm{Prb}(y \leftarrow x)$ need not be given by (i).
Actually, even if each type of mutation is considered equiprobable, replication remains
more probable than mutation. To reflect this fact, one can introduce a relatively small
probability $m$ of mutation (or even an increasing function of $l$ of the form
$m : \mathbb{N} \to [0, 1]$), and this new setting yields the probabilities

\[
\mathrm{Prb}(y \leftarrow x) =
\begin{cases}
1 - m & \text{if } x = y,\\
\frac{m}{3l} & \text{if } d_G(x, y) = 1,\\
0 & \text{else,}
\end{cases}
\tag{v}
\]

and probabilities of transition on $P$ are induced via (iv). Of course, other
possibilities for $\mathrm{Prb}(y \leftarrow x)$ could be explored, but the model always
requires (iv) to describe the probabilities on $P$.
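
To make the passage from (v) to (iv) concrete, here is a small Python sketch under loudly stated assumptions: the function `fold` is a hypothetical toy folding map (it is not an RNA folding algorithm), and the values `l = 2` and `m = 1/10` are arbitrary. The sketch only illustrates how genotype probabilities induce phenotype probabilities, and that the induced values need not be symmetric.

```python
from fractions import Fraction
from itertools import product

def hamming(x, y):
    """Hamming distance between two sequences of equal length."""
    return sum(a != b for a, b in zip(x, y))

def prb_v(y, x, l, m):
    """Equation (v): stay put with probability 1 - m, else pick one of 3l mutants."""
    d = hamming(x, y)
    if d == 0:
        return 1 - m
    if d == 1:
        return m / (3 * l)
    return Fraction(0)

def fold(x):
    """Hypothetical toy folding map, NOT a real RNA folding algorithm."""
    return "hairpin" if x[0] == x[-1] else "open"

def phenotype_prb(psi, phi, G, l, m):
    """Equation (iv): apply (ii) to the fibres of phi (source) and psi (target)."""
    source = [x for x in G if fold(x) == phi]
    target = [y for y in G if fold(y) == psi]
    num = sum(prb_v(y, x, l, m) for y in target for x in source)
    den = sum(prb_v(y, x, l, m) for y in G for x in source)
    return num / den

l, m = 2, Fraction(1, 10)   # arbitrary illustrative values
G = ["".join(w) for w in product("acgu", repeat=l)]
shapes = sorted(set(fold(x) for x in G))
# table[(psi, phi)] is the induced probability of the transition phi -> psi
table = {(psi, phi): phenotype_prb(psi, phi, G, l, m)
         for phi in shapes for psi in shapes}
```

In this toy model the transition from "hairpin" to "open" is three times as likely as the reverse one, which illustrates the asymmetry of the induced probabilities.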
The phenotype set $P$ is endowed with a weighted graph structure by equipping
each pair of shapes $(\varphi, \psi)$ with the probability
$\mathrm{Prb}(\psi \leftarrow \varphi)$ (as mentioned before, this is the quotient
weighted digraph of $G$ by $f : G \to P$), and a random walk on $P$ represents the
change in shape induced by the mutations of the RNA sequences in $G$. This model puts
forth the idea that while mutations can occur regularly in a sequence $x$ of RNA, the
shape $\varphi$ that $x$ takes on can remain the same while the mutating sequence
moves in the same fibre $f^{-1}(\varphi)$, and then suddenly jumps into another shape
when $x$ changes fibre. The term neutral drift is used to describe a random walk in $G$
that remains in a constant fibre.
2 Dynamical aspects and Markov chains
We will see later on (at the end of Sect. 4.1) that the raw data required by Markov
chains can be equivalently viewed as a probabilistic version of the pretopological
spaces described in Stadler et al. (2001). Although the former give us an idea of the
behavior of the long-term dynamics of the system, we stress that the model being
studied is an extreme simplification of the biological processes of evolution. Thus,
the classical approach presented here only provides us with very partial information
on the dynamics of the complex processes underlying a real-life biological system.
As is the case for many other evolutionary models (see Stadler 2002), directed
graphs (or digraphs) provide a convenient conceptual framework in which to study
the evolution of the phenotype space equipped with the dynamics described by the
GP-map model. As suggested by the latter, an evolutionary trajectory can be visualized
by a random walk on the graph whose vertices are the elements of the phenotype
space, and whose directed edges carry a probability corresponding to the accessibility
ratio:

\[
a_{\psi\varphi} := \mathrm{Prb}(\psi \leftarrow \varphi),
\]

for all phenotype shapes $\varphi$ and $\psi$. These probabilities provide the entries
of the transition matrix of a Markov process with state space $P$:

\[
M = (a_{\psi\varphi})_{\varphi, \psi \in P}.
\]

Let us emphasize that this Markov process is not a direct consequence of the GP-map
model (as the Markov process on the latter is a priori not lumpable, see Kemeny and
Snell 1960); it is only considered here as a simple heuristic for the topological
setting presented further on.
The information pertaining to the probabilistic evolution of the phenotype space
is then contained in the successive powers $M, M^2, M^3, \ldots$ of the transition
matrix. In the context of the GP-map model, this matrix is strongly constrained: it is
preferably non-symmetric, a fact that reflects the asymmetry in the direction of
phenotypic mutations, which in turn provides a preferred direction for evolution (see
Stadler et al. 2001); it also has strictly positive diagonal elements, a fact that
allows the drift along the neutral network.
Let us analyze this matrix further. Although $M$ is not symmetric, if an entry
$a_{\psi\varphi}$ is distinct from 0, then so is $a_{\varphi\psi}$: indeed, according
to the GP-map model, the mutations in the genotype space follow a random walk on an
undirected graph whose edges all carry the same probability; thus, the mutation in the
genotype space that accesses $\psi$ from $\varphi$ is reversible, and the corresponding
probability $a_{\varphi\psi}$ is nonzero. By reducing the matrix into irreducible
components if necessary, one can also suppose that $M$ is irreducible, so that any
phenotype in the corresponding state space $P$ is accessible from any other. Better
yet, since $M$ is finite and the diagonal entries are all distinct from 0, the matrix
is regular: given a state $\varphi$, there is a random walk that reaches any state
$\psi$ and subsequently stays there for any prescribed number of steps (even though the
probability of this event is small, it is non-zero), so by finiteness of $P$, one can
take the maximum of the lengths of these walks to observe that the row vectors
$\big(a^{(n)}_{\psi\varphi}\big)_{\psi \in P}$ of $M^n$ must all have strictly positive
components for large enough $n$.
One can therefore conclude that the powers of the regular transition matrix $M$
converge to a limit matrix $L$ (see Grinstead and Snell 1997):

\[
\lim_{n \to \infty} M^n = L,
\]

with constant columns. The values in each column then determine the prevalence
landscape of the phenotype space, and pinpoint which phenotypes will be more likely
to prevail. The results described in Fontana and Schuster (1998) seem to indicate that
a row vector of $L$ will reveal a small number of almost absorbing states that occur
among a wide majority of transient states (the column indices whose constant value
in $L$ will be close to 0).
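
The convergence of the powers of a regular matrix can be observed numerically. The following Python sketch uses a hypothetical two-state transition matrix (rows indexed by the current shape, columns by the next shape, so that each row sums to one); the entries are illustrative and are not data from the model.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(M, n):
    """n-th power of M by repeated multiplication (fine for small n)."""
    size = len(M)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, M)
    return result

# Hypothetical 2-phenotype chain; row = current shape, column = next shape.
M = [[1 / 7, 6 / 7],
     [2 / 7, 5 / 7]]

L = mat_pow(M, 60)   # numerically indistinguishable from the limit matrix
for row in L:
    print(row)       # both rows agree, so every column of L is constant
```

For this matrix the common row is close to (1/4, 3/4): the second state is the more prevalent one in the long run.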
3 The Fontana–Stadler–Stadler–Wagner approach
3.1 Pretopologies
In Stadler et al. (2001), Fontana, Stadler, Stadler, and Wagner endow the set of
phenotypes $P$ with a pretopology defined in terms of the GP-map $f : G \to P$. One
way to define a pretopology is in terms of filters. A filter $\mathcal{F}$ on a set $X$
is a family of non-empty subsets of $X$ which is closed under finite intersection (that
is, $A \in \mathcal{F}, B \in \mathcal{F} \implies A \cap B \in \mathcal{F}$) and under
supersets (that is, $A \in \mathcal{F}, A \subseteq B \implies B \in \mathcal{F}$).
A pretopology $\pi$ on a set $X$ assigns to each point $x$ of $X$ a filter
$\mathcal{N}_\pi(x)$ such that $x \in \bigcap \mathcal{N}_\pi(x)$; this filter
$\mathcal{N}_\pi(x)$ is called the neighborhood filter of $x$. Unlike in the
topological case, pretopological neighborhood filters do not need to have a filter-base
composed of open sets (that is, sets which are neighborhoods of each of their points).
In particular, a pretopology cannot in general be described via an idempotent closure
operator (see below), but one can nonetheless use a generalization of the latter that
will play a similar role. Thus, a pretopology $\pi$ can alternatively be given in terms
of an adherence operator, that is, a map $\mathrm{adh}_\pi : 2^X \to 2^X$ (where $2^X$
denotes the set of all subsets of $X$) which is

(1) grounded: $\mathrm{adh}_\pi \emptyset = \emptyset$,
(2) expansive: $A \subseteq \mathrm{adh}_\pi A$,
(3) isotone: $A \subseteq B \implies \mathrm{adh}_\pi A \subseteq \mathrm{adh}_\pi B$,
(4) additive: $\mathrm{adh}_\pi (A \cup B) = \mathrm{adh}_\pi A \cup \mathrm{adh}_\pi B$.

For $\mathrm{adh}_\pi$ to describe a topological space, one also requires idempotency:

\[
\mathrm{adh}_\pi(\mathrm{adh}_\pi A) = \mathrm{adh}_\pi A.
\]
The passage from the adherence operator to the neighborhood presentation of a
pretopology and back is given by

\[
x \in \mathrm{adh}_\pi A \iff \forall N \in \mathcal{N}_\pi(x)\ (A \cap N \neq \emptyset).
\]

This correspondence is the same as the one used to switch between neighborhoods of
topological spaces and closure operators, so that pretopologies generalize topologies
to the same extent that adherences generalize closures. The purpose of considering
pretopologies on spaces of phenotypes is to introduce a notion of continuity for
evolutionary trajectories, since spaces of phenotypes do not possess enough structure
to yield a topology. To reproduce the notion of continuity for topological spaces, one
says that a map $h : (X, \pi) \to (Y, \tau)$ is continuous if

\[
h(\mathrm{adh}_\pi A) \subseteq \mathrm{adh}_\tau h(A)
\]

for each $A \subseteq X$, or equivalently, if

\[
h(\mathcal{N}_\pi(x)) \supseteq \mathcal{N}_\tau(h(x))
\]

for all $x \in X$. If $\pi$ and $\tau$ are two pretopologies on the same set $X$, then
$\pi$ is finer than $\tau$ (or $\tau$ is coarser than $\pi$), in symbols
$\pi \ge \tau$, if the identity map $\mathrm{id}_X : (X, \pi) \to (X, \tau)$ is
continuous, that is, if $\mathcal{N}_\pi(x) \supseteq \mathcal{N}_\tau(x)$ for every
$x \in X$.

3.2 Finite pretopologies and the GP-map model
In the context of the GP-map model, one defines a pretopology $\pi$ on $G$ via

\[
\mathrm{adh}_\pi A := \{x \in G \mid \exists y \in A\ (d_G(x, y) \le 1)\},
\]

for all $A \subseteq G$, where $d_G : G \times G \to \mathbb{N}$ denotes the Hamming
distance in $G$, as in Sect. 1.1. In other words, the adherence of a subset $A$ of $G$
is made up of all the sequences in $G$ that can be reached in one mutation or less from
a sequence in $A$. As mentioned in Sect. 3.1 above, such an adherence operator
$\mathrm{adh}_\pi$ uniquely determines a neighborhood filter $\mathcal{N}_\pi(x)$ for
every point $x$ in $G$. Since only finite sets are considered, every neighborhood
filter $\mathcal{N}_\pi(x)$ is principal, that is, has a smallest element, denoted with
a non-calligraphic $N_\pi(x)$, and we have

\[
\mathrm{adh}_\pi A = \bigcup_{x \in A} N_\pi(x) = N_\pi(A).
\]
That is, if $G$ is a finite set, then the adherence of a subset $A$ of $G$ is the set
$N_\pi(A)$, which generates the principal filter made up of all sets $B$ that contain
$N_\pi(A)$. The above formula is true for the structure $\pi$ induced by the Hamming
graph because the latter is symmetric. It is not true in general for finite
pretopological spaces. Indeed, a finite pretopological space $(X, \pi)$ defines a
directed graph whose set of vertices is $X$, and in which an edge from $x$ to $y$
exists if and only if $y \in N_\pi(x)$. Conversely, a directed graph with a loop at
each vertex defines a pretopological space $\pi$ whose underlying set is the set of its
vertices and in which $y \in N_\pi(x)$ if and only if there is an edge from $x$ to $y$.
Since the previous correspondences are inverse of one another, finite pretopological
spaces can be identified with directed graphs (with a loop at each vertex), and we will
denote by $N_G(x)$ (instead of $N_\pi(x)$) the smallest neighborhood of a point $x$ of
a digraph $G$ in the pretopology $\pi$ that it defines. In this context, a map
$h : G_1 \to G_2$ between two digraphs is continuous at $x \in G_1$ if

\[
h\big(N_{G_1}(x)\big) \subseteq N_{G_2}(h(x)).
\]
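
The continuity condition for digraphs is easy to test mechanically. In the following sketch, digraphs are represented by the smallest neighborhood of each vertex; the three graphs and the map `collapse` are hypothetical examples chosen for illustration.

```python
def is_continuous_at(h, x, N1, N2):
    """Check that the image of N1[x] under h is contained in N2[h[x]]."""
    return {h[z] for z in N1[x]} <= N2[h[x]]

def is_continuous(h, N1, N2):
    """Continuity at every vertex of the source digraph."""
    return all(is_continuous_at(h, x, N1, N2) for x in N1)

# Hypothetical digraphs given by smallest neighborhoods (loops included).
N1 = {0: {0, 1}, 1: {0, 1, 2}, 2: {1, 2}}      # path 0 - 1 - 2
N2 = {"u": {"u", "v"}, "v": {"u", "v"}}        # single edge u - v
N3 = {"u": {"u"}, "v": {"v"}}                  # two isolated vertices
collapse = {0: "u", 1: "v", 2: "u"}            # folds the path onto the edge

print(is_continuous(collapse, N1, N2))   # edges are sent to edges or loops
print(is_continuous(collapse, N1, N3))   # the edge 0 - 1 has no image in N3
```

The same map is continuous into the edge but not into the two isolated vertices, since the second target carries a strictly finer structure.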
As the genotype set $G$ can be equipped with the pretopology induced by the Hamming
graph structure, and the dynamics on $P$ is inherited from $G$ via the folding map
$f : G \to P$ (see Sect. 1.1), it is natural to consider the quotient pretopology on
$P$ induced by $f$, that is, the finest pretopology $\pi_a$ on $P$ making $f$
continuous. In Stadler et al. (2001), this structure is called the accessibility
pretopology on $P$. In this case, a phenotype $\psi$ is in the $\pi_a$-neighborhood of
$\varphi$ when $\psi = f(x)$, for a one-error mutant $x$ of a sequence $y$ such that
$f(y) = \varphi$. In other words, $N_a(\varphi)$ is the set of phenotypes that are
possibly accessible from $\varphi$ by a single mutation. However, as observed in
Stadler et al. (2001), accessibility is a weak condition to be a neighbor, which does
not distinguish between neighbors likely to be realized and those unlikely to be. In
particular, accessibility is symmetric, whereas the probability of transition is not,
as (iii) shows.
On the other side of the spectrum, the authors of Stadler et al. (2001) also consider
the shadow pretopology $\pi_s$, for which $\psi$ is in the $\pi_s$-neighborhood of
$\varphi$ if every sequence $y$ folding into $\varphi$ admits a one-error mutant $x$
folding into $\psi$:

\[
N_s(\varphi) = \bigcap_{y \in f^{-1}(\varphi)} f(N_G(y)).
\]
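
The difference between the accessibility and shadow neighborhoods can be seen on a very small example. The sketch below assumes a hypothetical toy folding map `fold` on sequences of length 2 (not an actual RNA folding procedure) and computes both kinds of neighborhoods by enumeration.

```python
from itertools import product

def hamming(x, y):
    """Hamming distance between two sequences of equal length."""
    return sum(a != b for a, b in zip(x, y))

def fold(x):
    """Hypothetical toy folding map on sequences of length 2."""
    if x[0] == x[1]:
        return "H"
    return "A" if "a" in x else "O"

G = ["".join(w) for w in product("acgu", repeat=2)]

def N_G(y):
    """Smallest neighborhood of y in the Hamming graph (loop included)."""
    return [x for x in G if hamming(x, y) <= 1]

def fibre(phi):
    """All sequences folding into the shape phi."""
    return [y for y in G if fold(y) == phi]

def N_a(phi):
    """Accessibility: shapes reachable from SOME sequence folding into phi."""
    return set().union(*({fold(x) for x in N_G(y)} for y in fibre(phi)))

def N_s(phi):
    """Shadow: shapes reachable from EVERY sequence folding into phi."""
    return set.intersection(*({fold(x) for x in N_G(y)} for y in fibre(phi)))

for phi in ["H", "A", "O"]:
    print(phi, sorted(N_a(phi)), sorted(N_s(phi)))
```

In this toy model the shape "O" is accessible from "H" and conversely, but "O" is not in the shadow neighborhood of "H" although "H" is in that of "O": the shadow condition is not symmetric.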
This is now too strong a requirement. A more meaningful structure should reflect the
likelihood of phenotypic change $\mathrm{Prb}(\psi \leftarrow \varphi)$ of (iii), which
is proportional to the number of genotypes with phenotype $\varphi$ that have
$G$-neighbors with phenotype $\psi$, or alternatively, should reflect the likelihood of
phenotypic change $\mathrm{Prb}(\psi \leftarrow \varphi)$ induced by (v). A pretopology
on $P$ defined in these terms would have to use a cut-off value
$\varepsilon \in \mathbb{R}_+$. More specifically, the structure $\pi_\varepsilon$
would be given by the sets

\[
N_\varepsilon(\varphi) = \{\psi \in P : \mathrm{Prb}(\psi \leftarrow \varphi) \ge \varepsilon\}.
\tag{vi}
\]
However, if (iii) is used and $\varepsilon$ is too large, this set
$N_\varepsilon(\varphi)$ will not contain $\varphi$ anymore, and can therefore hardly
be considered to be a neighborhood of $\varphi$! In other words, if

\[
\varepsilon > \min_{\varphi \in P} \mathrm{Prb}(\varphi \leftarrow \varphi),
\]

then, as $P$ is finite, there will be a $\varphi_0 \in P$ such that
$\varphi_0 \notin N_\varepsilon(\varphi_0)$, so the sets $N_\varepsilon(\varphi)$ (for
$\varphi \in P$) will not define a pretopology. Of course, one could add the element
$\varphi$ to each $N_\varepsilon(\varphi)$, but this would mean that the information
pertaining to $\mathrm{Prb}(\varphi \leftarrow \varphi)$ is explicitly ignored, a
feature that one would not wish for in a faithful model of the phenotype space. This
issue was not considered in Stadler et al. (2001), where the possibility of
non-mutation was not explicitly included in the model. But in that setting, a realistic
cut-off value $\varepsilon$ would have to be smaller than the minimal probability of
replication $\min_{\varphi \in P} \mathrm{Prb}(\varphi \leftarrow \varphi)$, because
the probability of mutation is small compared to that of replication, so that
$\mathrm{Prb}(\varphi \leftarrow \varphi)$ is always large compared to
$\mathrm{Prb}(\psi \leftarrow \varphi)$ for $\psi \neq \varphi$. However, this
biological justification could be built into the model by use of probabilities induced
by (v) instead of (iii). If we stick to (iii), the model can still be formally
corrected, as we outline in the next section. But in the latter case, the use of
structures more general than pretopologies constitutes a step further away from a
topological model.
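
The failure described above is easy to reproduce. In the following sketch the table `PRB` holds hypothetical transition probabilities for two shapes, keyed as `(psi, phi)` for the transition from `phi` to `psi`; the numbers are illustrative only.

```python
from fractions import Fraction

# Hypothetical probabilities Prb(psi <- phi), keyed as (psi, phi).
PRB = {
    ("h", "h"): Fraction(1, 7), ("o", "h"): Fraction(6, 7),
    ("h", "o"): Fraction(2, 7), ("o", "o"): Fraction(5, 7),
}
SHAPES = ["h", "o"]

def N_eps(phi, eps):
    """Equation (vi): shapes reached from phi with probability at least eps."""
    return {psi for psi in SHAPES if PRB[(psi, phi)] >= eps}

def defines_pretopology(eps):
    """Every shape must belong to its own smallest neighborhood."""
    return all(phi in N_eps(phi, eps) for phi in SHAPES)

# Largest cut-off for which (vi) still yields a pretopology.
threshold = min(PRB[(phi, phi)] for phi in SHAPES)

print(defines_pretopology(threshold))        # cut-off at the threshold
print(defines_pretopology(Fraction(1, 2)))   # cut-off above the threshold
print(N_eps("h", Fraction(1, 2)))            # a "neighborhood" of h without h
```

With the cut-off 1/2, the set attached to "h" consists of "o" alone, so it cannot serve as a neighborhood of "h".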
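Note also that contrary to the claim in Stadler et al. (2001), even if
$\varepsilon \le \min_{\varphi \in P} \mathrm{Prb}(\varphi \leftarrow \varphi)$, the
accessibility and shadow pretopologies are, at least formally, not the limiting cases
for pretopologies $\pi_\varepsilon$. More specifically:

Proposition 3.2.1 If $\pi_\varepsilon$ is defined by (vi) on $P$ then:
(1) $\pi_\varepsilon \ge \pi_a$ for every $\varepsilon > 0$;
(2) $\pi_\varepsilon = \pi_a$ for every
    $\varepsilon \in \big(0, \min_{\varphi \in P} \frac{1}{(3l+1)|f^{-1}(\varphi)|}\big]$;
(3) $\pi_s$ may not be comparable with $\pi_\varepsilon$.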
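Proof (1). If $\psi \in N_\varepsilon(\varphi)$ then
$\mathrm{Prb}(\psi \leftarrow \varphi) > 0$, so that, in view of (iii), we have
$f^{-1}(\psi) \cap N_G\big(f^{-1}(\varphi)\big) \neq \emptyset$. Hence,
$\psi \in f\big(N_G(f^{-1}(\varphi))\big) = N_a(\varphi)$.

(2). Assume that
$\varepsilon \le \min_{\varphi \in P} \frac{1}{(3l+1)|f^{-1}(\varphi)|}$ and that
$\psi \in N_a(\varphi)$. The latter condition means that
$f^{-1}(\psi) \cap N_G\big(f^{-1}(\varphi)\big) \neq \emptyset$. Therefore, in view of
(iii), we have
$\mathrm{Prb}(\psi \leftarrow \varphi) \ge \frac{1}{(3l+1)|f^{-1}(\varphi)|}$. Hence
$\mathrm{Prb}(\psi \leftarrow \varphi) \ge \varepsilon$, so that
$\psi \in N_\varepsilon(\varphi)$.

(3). Consider an example in which $f^{-1}(\psi) \cap N_G\big(f^{-1}(\varphi)\big)$ has
several elements, but with a proper subset $A$ with the property that
$N(y) \cap A \neq \emptyset$ for every $y \in f^{-1}(\varphi)$. Then
$\psi \in N_s(\varphi)$ but $\psi \notin N_\varepsilon(\varphi)$ if

\[
\varepsilon > \frac{|f^{-1}(\psi) \cap N_G\big(f^{-1}(\varphi)\big)|}{(3l+1)\,|f^{-1}(\varphi)|}.
\]

On the other hand, it is clear that we may have $\psi \in N_\varepsilon(\varphi)$
without every $y \in f^{-1}(\varphi)$ having a neighbor in $f^{-1}(\psi)$, that is,
such that $\psi \notin N_s(\varphi)$. □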
Hence, a biologically realistic cut-off $\varepsilon$ would satisfy

\[
\min_{\varphi \in P} \frac{1}{(3l+1)\,|f^{-1}(\varphi)|} < \varepsilon <
\min_{\varphi \in P} \mathrm{Prb}(\varphi \leftarrow \varphi),
\]

which is consistent with the approach in Stadler et al. (2001). However, it is unclear
to us whether the situation described in part (3) of Proposition 3.2.1 can be easily
excluded in a biologically realistic setting. In any case, it was pointed out in
Stadler et al. (2001) that the need for an ad hoc choice of a cut-off point is not
satisfying and calls for the introduction of a probabilistic or fuzzy version of a
pretopology on $P$. Such a probabilistic version of the model was briefly discussed in
Stadler and Stadler (2006). We will now explore this avenue in more detail.
4 Quantitative atopologies
4.1 Probabilistic atopologies
Since the sets $N_\varepsilon(\varphi)$ that play the role of neighborhoods need not
contain the element $\varphi$, we introduce the following terminology. An atopology
$\pi$ on a set $X$ is just the assignment of a filter $\mathcal{N}_\pi(x)$ to each
$x \in X$, with no additional constraint. Although we do not require anymore that
$x \in \bigcap \mathcal{N}_\pi(x)$, we continue to call $\mathcal{N}_\pi(x)$ the
neighborhood filter of $x$. One of the issues concerning the pretopologies
$\pi_\varepsilon$ considered in the previous section is to find a consistent way to
pick a specific $\varepsilon$; to remedy this, we now consider structures that have the
potential to encapsulate the information carried collectively by the family of
pretopologies $\pi_\varepsilon$. In the sequel, we will make heavy use of suprema and
infima. We denote these operations by $\bigvee$ and $\bigwedge$ respectively to make
formulas more readable. In particular, when applied to a family of structures on the
same underlying set, they denote the supremum and infimum of the family of structures
relative to the finer order relation (see the end of Sect. 3.1). When applied to a
family of filters on a fixed set $X$, they denote the supremum and infimum for the
order relation given by inclusion of families of subsets of $X$.
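Our terminology is based in part on probabilistic pretopologies of Richardson and
Kent (1996). A probabilistic atopology is a family
$\Pi = (\pi_\varepsilon)_{\varepsilon \in [0,1]}$ of atopologies on $X$ where $\pi_0$
is the indiscrete topology, and

\[
\delta \le \varepsilon \implies \pi_\delta \le \pi_\varepsilon.
\]

Note that if $(\pi_i)_{i \in I}$ is a family of atopologies on $X$, then the atopology
$\bigvee_{i \in I} \pi_i$ defined by

\[
\mathcal{N}_{\bigvee_{i \in I} \pi_i}(x) = \bigvee_{i \in I} \mathcal{N}_{\pi_i}(x)
\]

is the supremum of all the $\pi_i$, that is, the coarsest atopology finer than each
$\pi_i$. A probabilistic atopology $\Pi$ is called left-continuous if

\[
\pi_\varepsilon = \bigvee_{\delta < \varepsilon} \pi_\delta
\]

for each $\varepsilon \in (0, 1]$ (again, see Richardson and Kent 1996). As the
probabilistic atopologies that we will consider are all left-continuous, we assume from
now on that this property is part of the definition of a probabilistic atopology. A map
$h : (X, \Pi) \to (Y, T)$ between two probabilistic atopological spaces is continuous
if $h : (X, \pi_\varepsilon) \to (Y, \tau_\varepsilon)$ is continuous for every
$\varepsilon \in [0, 1]$:

\[
h(\mathcal{N}_{\pi_\varepsilon}(x)) \supseteq \mathcal{N}_{\tau_\varepsilon}(h(x))
\]

for all $x \in X$.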
We endow the phenotype space $P$ described in the previous section with the
probabilistic atopology $\Pi = (\pi_\varepsilon)_{\varepsilon \in [0,1]}$ where
$\pi_\varepsilon$ is defined by (vi) when $\varepsilon > 0$. Note that in view of
Proposition 3.2.1, the probabilistic atopology $\Pi$ contains a constant interval of
copies of the accessibility pretopology for
$\varepsilon \in \big(0, \min_{\varphi \in P} \frac{1}{(3l+1)|f^{-1}(\varphi)|}\big]$.
While the finite atopologies $\pi_\varepsilon$ define digraph structures on the set of
vertices $P$, the probabilistic atopology $\Pi$ puts different weights on each edge,
and can therefore be interpreted as a Markov chain on $P$. Indeed, given two phenotypes
$\varphi$ and $\psi$ in $P$, we define

\[
a_{\psi\varphi} := \bigvee \{\varepsilon \in [0, 1] : \psi \in N_\varepsilon(\varphi)\},
\]

so that

\[
\mathrm{Prb}(\psi \leftarrow \varphi) := \frac{a_{\psi\varphi}}{\sum_{\chi \in P} a_{\chi\varphi}}
\]

uniquely defines a Markov chain on $P$. Conversely, the Markov chain given by the
collection $\{\mathrm{Prb}(\psi \leftarrow \varphi) : \varphi, \psi \in P\}$ defines
$\Pi$, and these operations are inverse of each other if one starts with a Markov
chain. Thus, a Markov chain on a finite set defines a unique probabilistic atopology on
that set. Hence, the probabilistic atopologies can reproduce in a topological-like
setting the information conveyed by the Markov chain discussed in Sect. 2. The
advantage of the interpretation in terms of probabilistic atopologies is that it makes
a notion of continuity readily available. We discuss this aspect in the next section.
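
The round trip between a Markov chain and the family of level neighborhoods can be sketched as follows. The chain `CHAIN` is a hypothetical example, keyed as `(psi, phi)` for the transition from `phi` to `psi`. The supremum defining the weights is taken over the finitely many levels at which a neighborhood can change, an assumption that is harmless on a finite chain.

```python
from fractions import Fraction

SHAPES = ["h", "o"]
# Hypothetical chain, keyed as (psi, phi) for the transition phi -> psi.
CHAIN = {
    ("h", "h"): Fraction(1, 7), ("o", "h"): Fraction(6, 7),
    ("h", "o"): Fraction(2, 7), ("o", "o"): Fraction(5, 7),
}

def N_eps(phi, eps):
    """Level-eps neighborhood of phi in the atopology induced by CHAIN."""
    return {psi for psi in SHAPES if CHAIN[(psi, phi)] >= eps}

def a(psi, phi):
    """Largest level eps with psi in N_eps(phi); only the finitely many
    levels at which the neighborhoods can change need to be tested."""
    levels = {Fraction(0)} | set(CHAIN.values())
    return max(eps for eps in levels if psi in N_eps(phi, eps))

def recovered(psi, phi):
    """Normalise the weights a(., phi) so that they sum to one."""
    return a(psi, phi) / sum(a(chi, phi) for chi in SHAPES)

# Starting from a Markov chain, the construction returns the same chain.
print(all(recovered(psi, phi) == CHAIN[(psi, phi)]
          for phi in SHAPES for psi in SHAPES))
```

The function `a` only queries membership in the sets `N_eps`, so it uses nothing beyond the probabilistic atopology itself.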
4.2 A metric alternative
We consider now an alternative interpretation of probabilistic atopologies: roughly
put, the probabilistic neighborhoods described in Sect. 4.1 can equivalently be viewed
as metric neighborhoods thanks to a logarithmic transformation. However, it is also
advantageous to replace the concept of neighborhood by that of convergence. In
view of this, we briefly recall the details of the topological case, referring the
reader to Dolecki (2008) for more details and further references. Because the notion of
a topological space encompasses a large class of objects, the concept of convergence of
sequences is not sufficient to fully convey the generality of the former. However, by
replacing sequences by filters (see Sect. 3.1 above), one obtains a satisfying theory,
where the structures correspond via

\[
\mathcal{F} \text{ converges to } x \iff \mathcal{F} \text{ is finer than the neighborhood filter of } x
\]

(for a filter $\mathcal{F}$ on $X$ and a point $x$ in $X$). This correspondence can be
extended to the metric setting. Although we refer to Lowen (1997) for details, the
concepts that pertain directly to our setting are outlined below.
As observed in Brock and Kent (1997), probabilistic pretopological spaces with
continuous maps on one hand, and pre-approach spaces with contractions in the sense
of Lowen et al. (1997) on the other hand are two presentations of the same
mathematical concept (formally, one says that the two categories formed by these
objects and morphisms are equivalent). This equivalence can be extended to the case of
probabilistic atopological spaces, and certain weak approach spaces, a procedure that
provides an alternative description of the structures at hand. Given a set $X$ together
with a map $\lambda : \mathbb{F}X \to [0, \infty]^X$, where $\mathbb{F}X$ denotes the
set of all filters on $X$ and $[0, \infty]^X$ the set of all maps from $X$ to
$[0, \infty]$, we say that a pair $(X, \lambda)$ is an ametric space if $\lambda$
satisfies

\[
\lambda\Big(\bigwedge_{i \in I} \mathcal{F}_i\Big)(x) = \bigvee_{i \in I} \lambda(\mathcal{F}_i)(x),
\]

for every family $\{\mathcal{F}_i : i \in I\}$ of filters. Let us mention that for
$(X, \lambda)$ to form a pre-approach space (again, in the sense of Lowen et al. 1997),
the structure must also satisfy

\[
\lambda(\dot{x})(x) = 0,
\]

where $\dot{x}$ denotes the filter $\{A \subseteq X : x \in A\}$ generated by $\{x\}$.
Intuitively, the limit function $\lambda(\mathcal{F})$ measures the default of
convergence of the filter $\mathcal{F}$ at each point $x$. If
$\lambda(\mathcal{F})(x) = 0$, the filter fully converges to $x$, while
$\lambda(\mathcal{F})(x) = \infty$ means that $\mathcal{F}$ is as far as possible from
converging to $x$. Morphisms between ametric spaces are contractions: a map
$f : (X, \lambda_X) \to (Y, \lambda_Y)$ between two such spaces is a contraction if

\[
\lambda_Y(f(\mathcal{F}))(f(x)) \le \lambda_X(\mathcal{F})(x).
\]

Roughly put, a probabilistic atopology assigns a probability of convergence to each
pair of filter and point, while an ametric structure assigns to such a pair a measure
of the default of convergence in $[0, \infty]$. The map
$-\ln : [0, 1] \to [0, \infty]$ establishes a one-to-one correspondence between these
two measures that also transforms products into sums (although this last feature only
becomes relevant when one considers the diagonal condition used to define approach
spaces). There are several other equivalent descriptions of pre-approach spaces that
the interested reader may find for example in Lowen and Lowen (1989), Lowen et al.
(1997), Lowen (1997), or Mynard (2008).
In the finite case, these concepts take on a simpler description, as was the case for
pretopologies in Sect. 3.2. Indeed, an ametric structure on a finite set is necessarily
finitely generated (Lowen et al. 1997), and finite ametric spaces can simply be
interpreted as sets equipped with a map $d : X \times X \to [0, \infty]$. More
precisely, if $(X, \lambda)$ is a finite ametric space then

\[
d_\lambda(x, y) := \lambda(\dot{x})(y)
\]

defines such a map on $X$. On the other hand, every filter is principal so that each
map $d : X \times X \to [0, \infty]$ determines an ametric structure $\lambda_d$ via

\[
\lambda_d(A)(x) := \bigvee_{a \in A} d(a, x)
\]

(identifying a principal filter with its generating set $A$). Finally, it is easy to
see that

\[
\lambda_{d_\lambda} = \lambda \quad \text{and} \quad d_{\lambda_d} = d.
\]
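
The finite correspondence between limit functions and weights can be illustrated with a short sketch. A principal filter is represented here by its generating set, which is an encoding chosen for this example; the weights in `d` are hypothetical and are obtained from probabilities through the map $-\ln$.

```python
import math

def d_from_lambda(lam, X):
    """d_lambda(x, y) := lambda(principal filter generated by {x})(y)."""
    return {(x, y): lam(frozenset({x}), y) for x in X for y in X}

def lambda_from_d(d):
    """lambda_d(A)(x) := sup of d(a, x) over a in A (A non-empty and finite)."""
    return lambda A, x: max(d[(a, x)] for a in A)

def to_metric(p):
    """-ln sends a probability in [0, 1] to a default of convergence in [0, inf]."""
    return math.inf if p == 0 else -math.log(p)

X = ["h", "o"]
# Hypothetical weights d(x, y) in [0, inf], obtained from probabilities via -ln.
d = {("h", "h"): to_metric(1 / 7), ("h", "o"): to_metric(6 / 7),
     ("o", "h"): to_metric(2 / 7), ("o", "o"): to_metric(5 / 7)}

lam = lambda_from_d(d)
print(d_from_lambda(lam, X) == d)      # d is recovered from lambda_d
print(lam(frozenset(X), "h"))          # default of convergence of the whole set to h
```

Since $-\ln$ reverses the order, the larger the probability, the smaller the default of convergence; a probability of 0 corresponds to the value infinity.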
4.3 Ametric spaces via the unit interval
As mentioned in Sect. 4.2 above, the order-reversing map ln : [0, 1] [0, ]
yields a bijection between the probabilistic and metric views of the spaces that reect
the Markov structure of the phenotype space P. Because the link of the latter with the
probabilistic presentation is more direct, it is useful to translate the previous descrip-
tion of ametric spaces into their probabilistic counterparts. Thus, from now on, an
ametric space (X, ) will be a set X equipped with a map : FX [0, 1]
X
such
that
I
F
_
(x) =
I
(F
) (x),
for every family {F
: I } of lters. In this case, the limit function : FX

[0, 1]
X
measures the probability that a lter F converges to a point x X. A
contraction in this setting is a map f : (X,
X
) (Y,
Y
) between ametric spaces
satisfying
X
(F)(x)
Y
( f (F)) ( f (x)) .
In other words, a contraction increases the probability of convergence of lters to
points.
In this context, the finitely generated ametric spaces are those that can be described
by a map $d_\lambda : X \times X \to [0, 1]$, and a map $f : (X, d_X) \to (Y, d_Y)$ corresponds to a
contraction if and only if
\[ d_X(x, y) \le d_Y(f(x), f(y)) \]
for all $x, y \in X$. The relation with the weighted digraph structure on $P$ is immediate
via the adequate normalization at each point $\varphi \in P$, as described previously in
Sect. 4.2. In particular, one defines
\[ d_P(\varphi, \psi) := \mathrm{Prb}(\psi \leftarrow \varphi), \]
for all $\varphi, \psi \in P$, so that $d_P(\varphi, \psi)$ represents the weight of each directed edge of the
graph on $P$.
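For finite spaces, both the translation between the two settings and the contraction condition are easy to check mechanically. The sketch below is a minimal illustration, not the authors' construction: the helper names `to_metric`, `to_probability` and `is_contraction`, as well as the weights in `d_X`, `d_Y` and `d_Y_low`, are assumptions chosen for the example.

```python
import math

def to_metric(p):
    """Order-reversing map -ln from [0, 1] to [0, inf]."""
    return math.inf if p == 0 else -math.log(p)

def to_probability(t):
    """Inverse map exp(-t) from [0, inf] to [0, 1]."""
    return math.exp(-t)

def is_contraction(f, d_X, d_Y):
    """Check d_X(x, y) <= d_Y(f(x), f(y)) for all x, y (unit interval setting)."""
    return all(d_X[x][y] <= d_Y[f[x]][f[y]] for x in d_X for y in d_X[x])

# Hypothetical weights: f merges a and b into p and sends c to q.
f = {"a": "p", "b": "p", "c": "q"}
d_X = {
    "a": {"a": 0.5, "b": 0.3, "c": 0.2},
    "b": {"a": 0.1, "b": 0.6, "c": 0.3},
    "c": {"a": 0.0, "b": 0.2, "c": 0.8},
}
d_Y = {"p": {"p": 0.7, "q": 0.3}, "q": {"p": 0.2, "q": 0.8}}
d_Y_low = {"p": {"p": 0.7, "q": 0.25}, "q": {"p": 0.2, "q": 0.8}}

ok = is_contraction(f, d_X, d_Y)           # every weight is dominated
not_ok = is_contraction(f, d_X, d_Y_low)   # fails on the pair (b, c)
```

Note that the check only compares individual weights. Nothing in it involves the row sums of `d_X`, which is the point taken up in Sect. 5.1.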
5 Continuity, and evolutionary trajectories
5.1 Continuity of the GP-map
As pointed out before, the accessibility pretopology is the finest pretopology on $P$
making the GP-map everywhere continuous. Hence, the GP-map is not continuous
into the finer atopologies $\pi_\varepsilon$ as soon as
\[ \varepsilon > \min\{\mathrm{Prb}(\psi \leftarrow \varphi) \mid \varphi, \psi \in P : \mathrm{Prb}(\psi \leftarrow \varphi) > 0\}. \]
Although the weighted digraph structure on the set $P$ can faithfully be reproduced
by a probabilistic atopological space, this structure is not the probabilistic atopological
quotient structure on $P$ induced by the GP-map $f : G \to P$ (if $G$ is the metric space
obtained thanks to the Hamming distance $d_G$). This can easily be seen by considering
the quotient ametric structure on $P$:
\[ d_P(\varphi, \psi) := \bigwedge_{\substack{x \in f^{-1}(\varphi) \\ y \in f^{-1}(\psi)}} d_G(x, y), \]
for all $\varphi, \psi \in P$. In the $[0, 1]$ setting, this structure is described by
\[ d_P(\varphi, \psi) = \bigvee_{\substack{x \in f^{-1}(\varphi) \\ y \in f^{-1}(\psi)}} \exp(-d_G(x, y)). \]
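To see concretely what this quotient records, the following sketch computes it in the unit interval setting for a toy folding map on binary strings of length 2. The map `f` and the function names are hypothetical, chosen only for illustration; genotypes are compared with the Hamming distance.

```python
import math
from itertools import product

def hamming(x, y):
    """Number of positions at which two equal-length strings differ."""
    return sum(a != b for a, b in zip(x, y))

def quotient_unit_interval(f):
    """d_P(phi, psi) = max of exp(-d_G(x, y)) over x in f^-1(phi), y in f^-1(psi)."""
    phenotypes = sorted(set(f.values()))
    d_P = {p: {q: 0.0 for q in phenotypes} for p in phenotypes}
    for x, y in product(f, repeat=2):
        value = math.exp(-hamming(x, y))
        d_P[f[x]][f[y]] = max(d_P[f[x]][f[y]], value)
    return d_P

# Hypothetical folding map: three genotypes share phenotype A, one has phenotype B.
f = {"00": "A", "01": "A", "10": "A", "11": "B"}
d_P = quotient_unit_interval(f)
```

In this example the value from A to B is exp(-1) and so is the value from B to A, although only two of the three genotypes with phenotype A are adjacent to 11. The supremum keeps the closest pair and discards how many pairs realize it, so no transition frequency can be read off from it.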
The main obstacle to obtaining the required structure on $P$ is that a contraction is
in no way related to the sums used in a probabilistic distribution, as can be seen in
the definition of a contraction in the unit interval setting. Thus, although probabilistic
atopologies can completely reflect the initial data on a weighted digraph, the corresponding
continuous maps do not preserve their structure conveniently. In particular,
the quotient graph on $P$ does not appear as a quotient structure of an atopological one.
In the next subsection, we briefly investigate whether maps between phenotype spaces
can benefit from the point of view provided by probabilistic atopologies.
5.2 Evolutionary trajectories
The notion of continuity for evolutionary trajectories proposed in Stadler et al. (2001)
has the disadvantage of requiring an additional choice for the cut-off value on the space
of phenotypes. Using the probabilistic atopology on $P$ described in the previous section
should provide a useful alternative. However, the notion of trajectory needs to be
re-interpreted. In Stadler et al. (2001), an evolutionary trajectory is a map $\tau : T \to P$
where $T$ is a discrete time space (essentially, the natural numbers) and $P$ the space of
phenotypes. When $T$ is endowed with its natural pretopology $N(t) = \{t - 1, t\}$ (so
that $\dot{t}$ converges to $t + 1$), and $P$ with a pretopology (or an atopology), it is possible to
consider whether trajectories are continuous or not. Here it should be noted that each
increment in discrete time corresponds to a potential mutation. Hence, if $\tau$ represents
an evolutionary trajectory, then $\tau : T \to (P, \pi_a)$ should be continuous. Indeed, $\tau$
can be factored through $G$ as $\tau = f \circ h$ where $h : T \to G$ describes the state of
a sequence at time $t$ and $f : G \to P$ is the folding map. Here $h$ is continuous if
$h(t + 1)$ is obtained from $h(t)$ by a single mutation, which in our view is an assumption
to be made on an evolutionary trajectory in the context of a discrete time. Since
$f : G \to (P, \pi_a)$ is continuous, so is $\tau$.
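The factorization argument can be illustrated on a toy example. In the sketch below, a trajectory is a finite list of states, and continuity into the accessibility structure is read as: each phenotype is accessible from the previous one. The folding map `f`, the function names, and the convention that a genotype may stay unchanged from one time step to the next are assumptions made for this illustration.

```python
def hamming(x, y):
    """Number of positions at which two equal-length strings differ."""
    return sum(a != b for a, b in zip(x, y))

def accessible_pairs(f):
    """Pairs (phi, psi) with psi reachable from phi by at most one point mutation."""
    return {(f[x], f[y]) for x in f for y in f if hamming(x, y) <= 1}

def is_continuous(phenotype_path, acc):
    """True if every step of the phenotype path follows an accessibility edge."""
    return all(step in acc for step in zip(phenotype_path, phenotype_path[1:]))

# Hypothetical folding map on binary strings of length 2.
f = {"00": "A", "01": "B", "10": "B", "11": "C"}
acc = accessible_pairs(f)

h_single = ["00", "01", "11"]   # one mutation per time step
h_double = ["00", "11"]         # two mutations in a single time step

path_single = [f[x] for x in h_single]   # A, B, C
path_double = [f[x] for x in h_double]   # A, C
```

The single-mutation genotype path yields a continuous phenotype path, while the double mutation produces the step from A to C, which is not an accessibility edge in this example.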
In Stadler et al. (2001), the authors suggest the possibility of discontinuous genotypic
change $h$, but we see the possibility of several mutations between time $t$ and
$t + 1$ as inconsistent with the choice of a discrete space time: otherwise $T$ is merely a
discrete selection of instants in a continuous time space, and there is no reason not to
consider the continuous time space instead. In this context, we think that an evolutionary
trajectory should be defined as a continuous map $\tau : T \to (P, \pi_a)$. In contrast, for
other atopologies $\pi_\varepsilon$, there will be discontinuities, as observed in Stadler et al. (2001).
However, the biological interpretation of such discontinuities is delicate, because a
discontinuity for one $\varepsilon$ may not be one for a smaller $\varepsilon$. If one endows $P$ with a specific
atopology $\pi_{\varepsilon_0}$, the discontinuities depend on the choice of the threshold value $\varepsilon_0$. In
particular, small variations of the threshold value may significantly change the number
of discontinuities. This is problematic, since we would like discontinuities that reflect
a biological phenomenon to be detectable in a robust manner. Moreover, replacing
an atopology $\pi_\varepsilon$ by a probabilistic version does not lead to a meaningful notion
of continuity, because the time $T$ is not endowed with a probabilistic pretopology.
Indeed, time flows necessarily from $t$ to $t + 1$, so an atopology $\pi$ can be identified
with a probabilistic atopology $\Pi = (\pi_t)_{t \in I}$ where $\pi_t = \pi$ for every $t \in (0, 1]$, but
then a map $\tau : (T, \Pi) \to (P, \Pi')$ would only be continuous if all transitions in $P$ are
certain, too.
The main motivation for a topological approach to evolutionary change was to
interpret biological phenomena such as punctuated equilibrium as topological
discontinuities. It turns out that even if probabilistic atopologies can reflect all the
statistical information necessary to study the dynamics, the associated notion of continuity
appears essentially useless.
6 Mutation, recombination and general closure spaces
We have seen that the structure induced on the set of genotypes by simple mutation
is that of a directed graph, which can be interpreted as a pretopology. The structure
on the phenotype space that describes accessibility is then the quotient pretopology.
If recombination (crossover) is considered, the adjacency relation of the graph is
replaced by recombination sets $r(x, y)$ of all possible recombinants of two parents
$x$ and $y$, as discussed in, e.g. Wagner and Stadler (2003) and Stadler and Stadler (2004).
Minimal assumptions on recombination sets are that $x$ and $y$ are in $r(x, y)$, that is,
replication is possible; and that $r(x, y) = r(y, x)$, because the role of the two parents
is interchangeable. We may assume that $r(x, x) = \{x\}$ only if unequal crossover is
ruled out. In a general model, this last property may not be satisfied. Also, we may
include in $r(x, y)$ both recombinants and one-error mutants of $x$ and $y$, in which case
$r(x, x)$ would contain at least one-error mutants of $x$. The recombination sets induce
a closure operator on the set of genotypes via
\[ \mathrm{cl}\,A = \bigcup_{(x, y) \in A \times A} r(x, y). \]
By definition, this operator is grounded and isotone. Because replication is possible,
it is also expansive. However, it need not be additive or idempotent. Sets equipped
with such an operator have been considered under various names in the literature.
We will call them preclosure spaces. Hence a pretopological space is an additive
preclosure space and a topological space is a preclosure space that is both additive and
idempotent. Idempotent preclosure spaces are usually called closure spaces.
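The failure of additivity and idempotency is easy to observe on a small example. The sketch below takes one-point crossover of binary strings as the recombination set; this particular choice of r, and the names `recombinants` and `closure`, are assumptions made for illustration and are not prescribed by the text.

```python
from itertools import product

def recombinants(x, y):
    """One-point crossover products of x and y (both parents are included)."""
    out = set()
    for k in range(len(x) + 1):
        out.add(x[:k] + y[k:])
        out.add(y[:k] + x[k:])
    return out

def closure(A):
    """cl A = union of r(x, y) over all pairs (x, y) in A x A."""
    return set().union(*(recombinants(x, y) for x, y in product(A, repeat=2)))

A = {"000", "111"}
cl_A = closure(A)        # six strings; 010 and 101 are missing
cl_cl_A = closure(cl_A)  # all eight strings of length 3
union_of_parts = closure({"000"}) | closure({"111"})
```

Here `cl_A` contains `A` (expansive), differs from `union_of_parts` (not additive), and is strictly smaller than `cl_cl_A` (not idempotent).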
Hence, in a general theory of evolutionary accessibility, finite preclosure spaces
form a natural model of genotype spaces. The notion of continuity introduced for pretopological
spaces generalizes to preclosure spaces: a map $f : (X, \mathrm{cl}_X) \to (Y, \mathrm{cl}_Y)$
between two preclosure spaces is continuous if
\[ f(\mathrm{cl}_X A) \subseteq \mathrm{cl}_Y(f(A)), \]
for every $A \subseteq X$. Note that, as in the case of mutation, phenotypic accessibility is adequately
described in topological terms. Indeed, if the set of genotypes $G$ is endowed
with a preclosure $\mathrm{cl}_G$, the GP-map $f : (G, \mathrm{cl}_G) \to P$ induces on $P$ the quotient
preclosure $\mathrm{cl}_P$ in which
\[ \mathrm{cl}_P(A) = f\big(\mathrm{cl}_G(f^{-1}(A))\big), \]
for every $A \subseteq P$. This is the finest preclosure structure on $P$ making $f$ continuous.
Note that $\psi \in \mathrm{cl}_P(\varphi)$ if and only if $f^{-1}(\psi) \cap \mathrm{cl}_G(f^{-1}(\varphi)) \neq \emptyset$, that is, if the
potential offspring of genotypes with phenotype $\varphi$ include a sequence with phenotype
$\psi$. Hence, this structure describes phenotypic accessibility, as noticed for example in
Stadler and Stadler (2004). However, we have seen in the previous section that even in
the simplest case (mutation/pretopology), a topological model carrying the statistical
information needed to make meaningful predictions will not yield a useful notion of
continuity. In the next section, we propose an alternative model, which is intrinsically
non-topological. This makes any parallel with pre-existing closure operations difficult,
if not impossible.
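The quotient preclosure is straightforward to compute on finite data. The following sketch uses, as the preclosure on genotypes, the operator that adds all one-error mutants to a set of binary strings; this choice, the folding map `f`, and the function names are hypothetical and serve only to illustrate the formula.

```python
def mutation_closure(A):
    """A together with all one-error mutants of its elements (binary strings)."""
    flip = {"0": "1", "1": "0"}
    out = set(A)
    for x in A:
        for i, ch in enumerate(x):
            out.add(x[:i] + flip[ch] + x[i + 1:])
    return out

def quotient_closure(f, cl_G, B):
    """cl_P(B) = f(cl_G(f^-1(B))) for a set B of phenotypes."""
    preimage = {x for x in f if f[x] in B}
    return {f[x] for x in cl_G(preimage)}

# Hypothetical folding map on binary strings of length 2.
f = {"00": "A", "01": "B", "10": "B", "11": "C"}
from_A = quotient_closure(f, mutation_closure, {"A"})  # {"A", "B"}
from_B = quotient_closure(f, mutation_closure, {"B"})  # {"A", "B", "C"}
from_C = quotient_closure(f, mutation_closure, {"C"})  # {"B", "C"}
```

The result says which phenotypes are accessible from a given set, and nothing about how likely each of them is.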
7 A non-topological model
The discussions in Sect. 5 point to the fact that the continuity condition for probabilistic
pretopological spaces does not convey the pertinent information for the study
of the GP-map model. We propose here an alternative point of view from which the
model could be studied, but we insist on the fact that it is not topological in nature.
The emphasis in the GP-map model is on accessibility of a subset of $G$ by a
single mutation of an RNA sequence. Hence, an abstract structure on a finite set $X$
reflecting this notion is an operation $\mathrm{cl} : 2^X \to [0, 1]^X$ that measures the probability
$\mathrm{cl}\,A(x)$ that an element $x$ is accessed from a subset $A \subseteq X$. This operation should
therefore satisfy
(1) $\mathrm{cl}\,\emptyset = 0$;
(2) $\forall x \in A$, $\mathrm{cl}\,A(x) \neq 0$;
(3) $\sum_{x \in X} \mathrm{cl}\,A(x) = 1$ whenever $A \neq \emptyset$.
The last condition ensures that $\mathrm{cl}\,A$ is indeed a probability distribution for all nonempty $A \subseteq X$.
To be meaningful, such an operation should not be monotone ($A \subseteq B \Rightarrow \mathrm{cl}\,A \le \mathrm{cl}\,B$)
in order to avoid a contradiction in the probabilistic condition. This contributes to the
non-topological nature of the operation $\mathrm{cl}$. We will call $\mathrm{cl}$ a probabilistic closure
even though it differs in essential ways from probabilistic pretopologies or atopologies.
A map $f : (X, \mathrm{cl}_X) \to (Y, \mathrm{cl}_Y)$ between two such spaces is continuous if
\[ \sum_{x \in f^{-1}(y)} \mathrm{cl}_X A(x) \le \mathrm{cl}_Y[f(A)](y), \]
for all $y \in Y$, $A \in 2^X$. Note that continuity of a map between probabilistic pretopological
(or atopological) spaces can be described by a similar formula, in which the
sum is replaced by a supremum. This is, however, an essential difference: in the probabilistic
topological-like models, continuity means that the probability of accessing
a phenotype $\varphi$ from $f(A)$ is at least the largest probability of accessing a genotype
with phenotype $\varphi$ from $A$, but it does not reflect how many times such genotypes
are accessed from $A$. This drawback is corrected in this new model. Even better, the
GP-map induces a quotient structure which is exactly the desired one. Indeed, if $f$ is
a surjective map $f : (X, \mathrm{cl}_X) \to Y$, then the quotient structure on $Y$ is given by
\[ \mathrm{cl}_Y B(y) := \sum_{x \in f^{-1}(y)} \mathrm{cl}_X[f^{-1}(B)](x), \]
for all $B \in 2^Y$, $y \in Y$. For example, when the genotype set $G$ is equipped with its
looped Hamming graph structure, then we define
\[ \mathrm{cl}_G A(x) := \mathrm{Prb}(x \leftarrow A), \]
where $\mathrm{Prb}(x \leftarrow A)$ is given by (ii). Then the quotient structure on $P$ is precisely the
quotient graph structure in which
\[ \mathrm{cl}_P\{\varphi\}(\psi) = \mathrm{Prb}(\psi \leftarrow \varphi) \]
as in (iii).
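The difference between summing and taking a supremum over a fiber can be made concrete with a toy distribution. In the sketch below, `cl_A` plays the role of the distribution of genotypes accessed from some set A, and `f` is a hypothetical map sending three genotypes to phenotype p and one to phenotype q; all names and values are assumptions for illustration.

```python
def push_forward(f, dist, combine):
    """Group the values dist[x] by phenotype f[x] and combine each group."""
    groups = {}
    for x, p in dist.items():
        groups.setdefault(f[x], []).append(p)
    return {y: combine(ps) for y, ps in groups.items()}

# Hypothetical data: four equally likely genotypes, three of them with phenotype p.
f = {"x1": "p", "x2": "p", "x3": "p", "x4": "q"}
cl_A = {"x1": 0.25, "x2": 0.25, "x3": 0.25, "x4": 0.25}

by_sum = push_forward(f, cl_A, sum)  # {"p": 0.75, "q": 0.25}
by_sup = push_forward(f, cl_A, max)  # {"p": 0.25, "q": 0.25}
```

With the sum, phenotype p is three times as likely as q and the values still form a probability distribution. With the supremum, the two phenotypes become indistinguishable.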
More generally, the genetic operations acting on $G$ in the case of mutation or of
recombination can be seen as maps of the form $c : G \to \{0, 1\}^G$ in the first case
and $c : G \times G \to \{0, 1\}^G$ in the second. In the first case $c(x)(y) = 1$ if $y$ is a
one-error mutant of $x$. In the second case, $c(x, y)(z) = 1$ if $z$ is a potential offspring
of the pair $(x, y)$. We can identify $c : G \to \{0, 1\}^G$ with a map of the second kind
$\tilde{c} : G \times G \to \{0, 1\}^G$ by $\tilde{c}(x, y) \equiv 0$ if $x \neq y$ and $\tilde{c}(x, x)(z) = c(x)(z)$. Hence
we can treat both cases simultaneously, and associate to such a map $c$ a probabilistic
closure via
\[ \mathrm{cl}_c A(x) = \frac{\sum_{(a, b) \in A \times A} c(a, b)(x)}{\sum_{y \in G} \sum_{(a, b) \in A \times A} c(a, b)(y)}. \tag{vii} \]
Then $\mathrm{cl}_c A(x)$ represents the probability to access the genotype $x$ from the collection
of genotypes $A$. Realistically, mutations and the appearance of offspring are of course
not equidistributed. However, this can easily be taken care of in this model, because
the same formula (vii) applies if $c(x, y)$ is no longer valued in $\{0, 1\}$ but in $[0, 1]$ (that
is, $c : G \times G \to [0, 1]^G$) and carries some statistical information on the likelihood
to access a specific offspring, or mutant, among the potential offspring, or mutants
respectively. Once the genotype space is endowed with its probabilistic closure structure
induced by the genetic operator $c$, the GP-map $f : G \to P$ induces the quotient
probabilistic closure on $P$ (the finest making $f$ continuous) given by
\[ \mathrm{cl}_P B(\psi) = \sum_{x \in f^{-1}(\psi)} \mathrm{cl}_c[f^{-1}(B)](x)
= \sum_{x \in f^{-1}(\psi)} \frac{\sum_{(y, z) \in f^{-1}(B) \times f^{-1}(B)} c(y, z)(x)}{\sum_{t \in G} \sum_{(y, z) \in f^{-1}(B) \times f^{-1}(B)} c(y, z)(t)}. \]
This operation adequately reflects the likelihood of accessing the phenotype $\psi$ from
the collection of phenotypes $B$.
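Formula (vii) and the induced quotient can be evaluated directly on small examples. The sketch below treats the mutation case on binary strings of length 2. Several points are assumptions made for illustration: the function names, the folding map `f`, the convention that a genotype counts among its own accessible genotypes (looped graph), and the use of integer weights with `fractions.Fraction` so that the arithmetic is exact.

```python
from fractions import Fraction

def one_error_mutants(x):
    """Binary strings at Hamming distance exactly one from x."""
    flip = {"0": "1", "1": "0"}
    return {x[:i] + flip[ch] + x[i + 1:] for i, ch in enumerate(x)}

def c_mutation(a, b):
    """Mutation as a map of the second kind: identically zero when a != b.
    Assumption: a itself is accessible from a (looped graph)."""
    if a != b:
        return {}
    return {z: 1 for z in one_error_mutants(a) | {a}}

def prob_closure(c, genotypes, A):
    """Formula (vii): normalized total weight that pairs from A put on each x."""
    raw = {x: sum(c(a, b).get(x, 0) for a in A for b in A) for x in genotypes}
    total = sum(raw.values())
    if total == 0:  # empty A: the closure is identically zero
        return {x: Fraction(0) for x in genotypes}
    return {x: Fraction(w, total) for x, w in raw.items()}

def quotient_closure(c, f, B):
    """cl_P B(psi): sum of cl_c[f^-1(B)](x) over genotypes x with phenotype psi."""
    preimage = {x for x in f if f[x] in B}
    cl = prob_closure(c, list(f), preimage)
    out = {psi: Fraction(0) for psi in set(f.values())}
    for x, p in cl.items():
        out[f[x]] += p
    return out

# Hypothetical folding map on binary strings of length 2.
f = {"00": "A", "01": "B", "10": "B", "11": "C"}
cl_from_A = quotient_closure(c_mutation, f, {"A"})
cl_from_B = quotient_closure(c_mutation, f, {"B"})
```

In this example, starting from phenotype A, phenotype B is reached with probability 2/3 and A itself with probability 1/3, while C is not reached. Weights in [0, 1] rather than {0, 1} would go through the same code with floating-point division in place of `Fraction`.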
8 Concluding remarks
The main conclusion from our analysis is that a meaningful topological notion of
continuity solely based on a probabilistic approach seems out of reach. Indeed, if we
use a specific $\pi_\varepsilon$ for a threshold value $\varepsilon$, then continuity/discontinuity merely distinguishes
between likely and unlikely events, where $\varepsilon$ is the threshold probability value
separating what is likely from what is not. If instead we try to encode the entire statistical
information in a topological manner, like in Sect. 4.1, continuity is necessarily
continuity in every $\pi_\varepsilon$ and becomes far too stringent a condition to be meaningful.
The model proposed in Sect. 7 seems to us to be more adequate, but is fundamentally
non-topological. Let us try to clarify this point. Without getting into the details of
category theory, we would like to give a sense of the concept of a topological category
(in the sense of Adámek et al. 1990) which provides one essential measure of how
topological a model is. Among the fundamental features of topological spaces and
continuous maps is the fact that the product of an arbitrary family of topological spaces
is a well-defined, easy to construct, topological space. Actually, it seems to us that
the availability of product spaces was among the main motivations for the authors of
Stadler et al. (2001) to seek a topological theory. Indeed, a topological setting allows
for example the development of a theory of characters (Wagner and Stadler 2003),
which is based on the representation of a phenotype space or a region thereof, as a
product space where each factor represents a character. The existence of canonical
products is one of the consequences of the fact that the underlying category is topological.
In this sense, a model in which canonical products are not available is highly
non-topological. But for a probabilistic version of (pre)topological spaces to be (categorically)
topological, in particular for product spaces to be well-defined, it is essential
to define continuity the way it has been defined in Sect. 4.1. We have seen that this
notion does not seem to find a meaningful interpretation. In contrast, if the condition
on maps is modified from
\[ \bigvee_{x \in f^{-1}(y)} \mathrm{cl}_X A(x) \le \mathrm{cl}_Y[f(A)](y), \]
to
\[ \sum_{x \in f^{-1}(y)} \mathrm{cl}_X A(x) \le \mathrm{cl}_Y[f(A)](y), \]
as in Sect. 7, we obtain a more meaningful notion, but which does not allow a canonical
construction of product spaces. The latter model may well be worth investigating
further, but it will not lead to a theory of characters parallel to that of Wagner and
Stadler (2003).
It seems to us that to succeed in interpreting apparent discontinuities (for example,
punctuated equilibrium) as discontinuities in a topological sense, the topological
structure of phenotype spaces would have to reflect a notion of phenotypic proximity
that does not depend solely on the probabilities of transitions induced by the folding
map. How such a structure should be defined remains a challenge.
Acknowledgments The authors wish to thank the Georgia Southern Faculty Research Committee for
their generous support, as well as the referees for their many valuable remarks that led to substantial
improvements of the paper.
References
Adámek J, Herrlich H, Strecker E (1990) Abstract and concrete categories. Wiley, London
Brock P, Kent DC (1997) Approach spaces, limit tower spaces, and probabilistic convergence spaces. Appl
Categorical Struct 5(2):99–110
Dolecki S (2009) An initiation into convergence theory. Beyond Topology, AMS Contemp Math 486
Fontana W, Schuster P (1998) Shaping space: the possible and the attainable in RNA genotype–phenotype
mapping. J Theor Biol 194:491–515
Grinstead CM, Snell JL (1997) Introduction to probability. American Mathematical Society, Providence
Kemeny JG, Snell JL (1960) Finite Markov chains. Van Nostrand, Princeton
Lowen E, Lowen R (1989) Topological quasitopos hulls of categories containing topological and metric
objects. Cahiers de Topologie et Géométrie Différentielle Catégoriques 30:213–228
Lowen E, Lowen R, Verbeeck C (1997) Exponential objects in PRAP. Cahiers de Topologie et Géométrie
Différentielle Catégoriques 38:259–276
Lowen R (1997) Approach spaces: the missing link in topology-uniformity-metric triad. Oxford University
Press, New York
Mynard F (2008) Measures of compactness for filters in the approach setting. Quaest Math 31:189–201
Richardson GD, Kent DC (1996) Probabilistic convergence spaces. J Aust Math Soc (Ser A) 61:400–420
Stadler B, Stadler P (2004) The topology of evolutionary biology. In: Ciobanu G, Rozenberg G (eds)
Modeling in molecular biology. Springer, Heidelberg, pp 267–286
Stadler B, Stadler P, Wagner G, Fontana W (2001) The topology of the possible: formal spaces underlying
patterns of evolutionary change. J Theor Biol 213:241–274
Stadler P (2002) Spectral landscape theory. In: Crutchfield JP, Schuster P (eds) Evolutionary dynamics.
Exploring the interplay of selection, neutrality, accident and function. Oxford University Press,
New York, pp 231–272
Stadler P, Stadler B (2006) Genotype phenotype maps. Biol Theory 3:268–279
Wagner GP, Stadler PF (2003) Quasi-independence, homology and the unity of type: a topological theory
of characters. J Theor Biol 220(4):505–527
