
Article

A Systematic Ensemble Approach to Thermodynamic Modeling of Gene Expression from Sequence Data

Authors
Md. Abul Hassan Samee, Bomyi Lim, Núria Samper, ..., Gerardo Jiménez, Stanislav Y. Shvartsman, Saurabh Sinha

Correspondence
sinhas@illinois.edu

In Brief
A systematically constructed ensemble
of sequence-to-expression models
captures many distinct, biologically
plausible hypotheses. Systematic
analysis and filtering of the models,
followed by in vivo experiments, improve
our understanding of combinatorial
regulation of the Drosophila ind gene.

Highlights
• Conventional modeling of enhancer readouts may lead to incorrect conclusions
• A model ensemble can capture many distinct hypotheses plausible with data
• Filtering and analysis of a model ensemble can reveal new testable hypotheses
• Ensemble modeling improved current understanding of ind regulation in Drosophila

Samee et al., 2015, Cell Systems 1, 396–407
December 23, 2015 © 2015 Elsevier Inc.
http://dx.doi.org/10.1016/j.cels.2015.12.002
Md. Abul Hassan Samee,1,8 Bomyi Lim,2 Núria Samper,3 Hang Lu,4 Christine A. Rushlow,5 Gerardo Jiménez,3,6 Stanislav Y. Shvartsman,2 and Saurabh Sinha1,7,*
1Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
2Department of Chemical and Biological Engineering and Lewis–Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
3Department of Developmental Biology, Instituto de Biología Molecular de Barcelona, Consejo Superior de Investigaciones Científicas (CSIC), Barcelona 08208, Spain
4School of Chemical and Biomolecular Engineering and Parker H. Petit Institute for Bioengineering and Bioscience, Georgia Institute of Technology, Atlanta, GA 30332, USA
5Department of Biology, New York University, New York, NY 10003, USA
6Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona 08010, Spain
7Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
8Present address: The Gladstone Institutes, University of California, San Francisco, San Francisco, CA 94158, USA

*Correspondence: sinhas@illinois.edu

SUMMARY

To understand the relationship between an enhancer DNA sequence and quantitative gene expression, thermodynamics-driven mathematical models of transcription are often employed. These “sequence-to-expression” models can describe an incomplete or even incorrect set of regulatory relationships if the parameter space is not searched systematically. Here, we focus on an enhancer of the Drosophila gene ind and demonstrate how a systematic search of parameter space can reveal a more comprehensive picture of a gene’s regulatory mechanisms, resolve outstanding ambiguities, and suggest testable hypotheses. We describe an approach that generates an ensemble of ind models; all of these models are technically acceptable solutions to the sequence-to-expression problem in light of wild-type data, and some represent mechanistically distinct hypotheses about the regulation of ind. This ensemble can be restricted to biologically plausible models using requirements gleaned from in vivo perturbation experiments. Biologically plausible models make unique predictions about how specific ind enhancer sequences affect ind expression; we validate these predictions in vivo through site mutagenesis in transgenic Drosophila embryos.

INTRODUCTION

Transcription factors (TFs) work in concert with other DNA-binding molecules to regulate gene expression. These molecules act as inputs at enhancers, distinct genomic regions that contain binding sites for TFs and can regulate the transcription of target genes (Shlyueva et al., 2014). Maintaining a quantitative relationship between input and transcriptional output is key to the precise patterning of gene expression. Accordingly, as the levels of inputs vary across different cell types, the enhancer-controlled levels of gene expression (also termed the “readout” of the enhancer) also vary (Yáñez-Cuna et al., 2013). These relationships are a direct function of the enhancer’s DNA sequence. However, a detailed understanding of how enhancer sequence affects a gene’s expression level remains elusive (Yáñez-Cuna et al., 2013). Such understanding may be achieved by interrogating a mathematical model that explains the available experimental results about the gene both qualitatively and quantitatively, suggests experiments to improve upon the current model, and is capable of predicting the gene’s expression pattern upon cis or trans perturbations. Here, we refer to such models as “sequence-to-expression” models, and we show how they can form the basis of a systematic, unbiased enquiry into gene regulation by multiple TFs.

A common paradigm of sequence-to-expression modeling is based on equilibrium thermodynamics (Shea and Ackers, 1985). This approach models the rate of transcription initiation based on quantitative descriptions of variable site affinities (“motifs”) (Stormo, 2000) and expression levels of TFs. Because they can incorporate the DNA-sequence-dependent characteristics of TF binding, sequence-to-expression thermodynamic models of this genre are arguably more realistic than thermodynamic models where all TF-binding sites are assumed to have the same affinity (Cohen et al., 2014; Fakhouri et al., 2010; Papatsenko and Levine, 2008; Zinzen and Papatsenko, 2007) or only classified as “strong” versus “weak” (Bintu et al., 2005; Gertz et al., 2009; Parker et al., 2011; White et al., 2012). We previously reported one such sequence-to-expression model called GEMSTAT (Gene Expression Modeling based on Statistical Thermodynamics) and used it

for modeling 40 enhancers involved in anterior-posterior patterning during early embryonic development of Drosophila (He et al., 2010).

Sequence-to-expression models like GEMSTAT face a significant challenge. They are formulated using one to three free parameters that are specific to each TF and other parameters that represent TF-TF interactions and basal activity at the promoter. Because a typical enhancer is controlled by a handful of TFs, even simple models may have a considerable number of free parameters. Most of these parameters have not been determined experimentally. Instead, they are estimated by numerical algorithms that operate within a defined parameter space to identify parameter values whose predictions are optimal fits to the data. Importantly, these fits are not necessarily unique. Since the models are typically underdetermined given experimental data, a given model may make different predictions that are consistent with available data at distinct parameter settings.

It is entirely possible that there are multiple optima within the parameter space for a given enhancer. For example, several papers on multi-parameter models for systems biology and climatology (Gutenkunst et al., 2007; Tebaldi and Knutti, 2007) have demonstrated that different parameter sets can fit the same data. In such cases, the different optima may represent distinct, mutually exclusive hypotheses about underlying mechanisms. A systematic interrogation of a gene’s sequence-to-expression model must consider all such hypotheses and distinguish between them. Some related studies have indeed found widely different parameter assignments to fit a dataset (Dresch et al., 2010; Granek and Clarke, 2005; Zinzen and Papatsenko, 2007), yet the standard practice is to report one or at most a few best models that result after iterative improvements upon initial random guesses (He et al., 2010; Janssens et al., 2006; Kim et al., 2013; Parker et al., 2011; Segal et al., 2008).

This suggests that if one is to understand the relationship between enhancer sequence and quantitative gene expression in an unbiased way, one should go beyond the common practice of seeking the single best fitting model. Instead, one could explore the entire parameter space for models that agree with data.

Here, we take this approach and perform a systematic exploration of the parameter space of the GEMSTAT model for a specific developmental gene, intermediate neuroblasts defective (ind), in D. melanogaster. Beginning with a qualitative understanding of its likely regulators, we adopt an ensemble modeling approach (Brown et al., 2004; Kuepfer et al., 2007; Swigon, 2013; Toni et al., 2009; Villaverde et al., 2014; von Dassow et al., 2000) to learn all quantitative models consistent with wild-type expression data, then use a visualization tool that we introduce here to recognize the distinct mechanistic hypotheses they represent. Next, we ask how additional perturbation experiments reported in the literature refine the ensemble of models, and we eliminate mechanistic explanations that are inconsistent with these additional data. The surviving ensemble of models, despite representing distinct hypotheses about regulation, can make unambiguous predictions about ind’s response to specific perturbations. We verify these predictions experimentally. Using this approach iteratively, we can narrow down the ensemble of consistent models and refine our understanding of the gene’s cis-regulatory logic. In total, we outline a “strong inference” approach (Platt, 1964) that systematically eliminates various mechanistic explanations of the data, within the context of a predetermined qualitative model, and also suggests the additional experiments necessary to further refine our mechanistic understanding.

RESULTS

A Model of Transcriptional Regulation by Sequence-Specific Transcription Factors and Their Interplay with Signaling Molecules
We modified GEMSTAT (He et al., 2010), a sequence-to-expression model, to study how TFs bound to the ind enhancer may regulate the gene’s expression. We outline the main assumptions and components of the model here; see Experimental Procedures and Supporting Online Information for details of model formulation. GEMSTAT is founded on a theory of combinatorial gene regulation first proposed by Shea and Ackers (Shea and Ackers, 1985). The model considers the system of TF molecules and their cognate sites in the enhancer, as well as the basal transcriptional machinery (BTM) and its binding to the promoter, and uses a minimal set of parameters to model the interactions among TFs, BTM, and DNA (Figure 1A). All interactions are assumed to happen in thermodynamic equilibrium, which is assumed to be reached much more rapidly than the timescale at which the transcription machinery is activated and begins producing mRNA. Under these assumptions, the transcription initiation rate, and hence the equilibrium level of mRNA transcription, are proportional to the fractional occupancy of the BTM at the promoter. GEMSTAT computes this fractional occupancy by considering all possible configurations of DNA-bound TFs and BTM (Figure 1B) and summing the probability of configurations where the BTM is promoter-bound (Figure 1C). The equilibrium probability of each configuration is computed based on the Boltzmann distribution. As TF concentrations change across cell types, the probability of bound BTM configurations also changes, reflecting the variation of expression levels due to the change in regulator concentration (Figure 1D).

An important distinction of the GEMSTAT model from several other thermodynamics-based models, as we mentioned in the Introduction, is its ability to account for varying affinities of a TF’s binding sites, by relating mutations from the optimal or “consensus” site to corresponding changes in binding energy. For this, GEMSTAT implements Berg and von Hippel’s theory of protein/DNA interaction energetics (Berg and von Hippel, 1987), using the TF’s position weight matrix (Stormo, 2000) to predict the “mismatch energy” relative to the consensus site (Supporting Online Information).

Our modification of GEMSTAT in this work allows for modulation of a TF’s DNA-binding affinity depending on the concentration of some other molecular species, which in our case (next section) was the dual phosphorylated extracellular signal-regulated kinase (dpERK). We modeled a recently proposed “de-repression” mechanism whereby the kinase attenuates a repressor TF’s DNA-binding affinity, resulting in higher expression levels of the regulated gene at higher levels of the kinase (Figure 1E). This allows us to model how a non-DNA-binding

Figure 1. Overview of the GEMSTAT Model
(A–C) GEMSTAT models the expression readout of
an enhancer from the strength of TF-DNA and TF-
BTM interactions in thermodynamic equilibrium (A)
and considers all possible configurations of bound
TFs and the BTM (B) to compute the equilibrium
probability of BTM occupancy at promoter (C).
Shown is a hypothetical probability distribution for
the configurations shown in (B); probability of BTM
occupancy is computed from configurations c1–c4.
(D) GEMSTAT’s predictions change as TF concen-
trations change across different conditions. Shown
is the profile of mRNA levels (magenta) resulting
from a uniformly expressed activator (green) and a
graded repressor (red); horizontal axis shows
different spatial locations. Also shown is how the
equilibrium probability distributions change with
changes in TF concentrations.
(E) A molecular species (gray) that can attenuate the
DNA-binding affinity of a repressor may increase the
mRNA level of the gene shown in (D).

regulatory input may shape the expression pattern of a target gene by interacting with the gene’s enhancer.

A Model of Transcriptional Regulation of the ind Gene
We used GEMSTAT to study the details of regulation of ind, a dorsoventral (D/V) patterning gene in Drosophila. The neuroectodermal expression pattern of this gene and an enhancer driving this expression have been characterized previously (Stathopoulos and Levine, 2005; Weiss et al., 1998). Prior works have also revealed or suggested identities of its major regulatory inputs, and reported several genetic perturbations and the resulting changes in ind expression. Our main goals were to infer mechanistic details of the combinatorial action of these regulators, to test if these details are consistent with observations made under various perturbations of the system, and to make testable predictions about additional perturbations. We list below our assumptions about how these inputs regulate ind expression (Figures 2A–2C).

• Dl directly activates ind (Hong et al., 2008), while Sna and Vnd are its direct repressors (Cowden and Levine, 2003; McDonald et al., 1998; Weiss et al., 1998).
• Zld activates ind (Nien et al., 2011), but the mechanism may be either direct (similar to Dl) or indirect, or a combination of both. To model the indirect mechanism, we considered Zld and Dl to exhibit cooperative DNA binding at closely located binding sites, based on our observation of proximally located binding sites for the two TFs (Figure 2D; Supporting Online Information; Figure S1A), and on reports of a similar mechanism in the sog enhancer (Foo et al., 2014; Kanodia et al., 2012; Liberman and Stathopoulos, 2009). We note that Dl-Zld cooperativity can also act as a surrogate for chromatin-mediated effect of Zld on Dl activation, as suggested previously (Cheng et al., 2013; Foo et al., 2014; Sun et al., 2015).
• Cic is a repressor of ind. Mutation of Cic sites in the ind enhancer results in a dorsal expansion of ind expression (Ajuria et al., 2011; Lim et al., 2013). However, Cic has a spatially uniform nuclear concentration during the pre-gastrulation stage, suggesting that an additional input that localizes Cic’s activity domain must be considered when modeling Cic-mediated repression of ind. Spatially restricted signaling may provide this input, presumably
by relieving Cic-driven repression of ind. In particular, ERK
phosphorylates Cic (Astigarraga et al., 2007), and this has
been proposed to influence Cic activity by impeding its
DNA binding (Dissanayake et al., 2011; Lim et al., 2013),
leading to ind de-repression in a specific domain along
the D/V axis. This is the mechanism we chose to implement
here, though other mechanisms have also been proposed
(Grimm et al., 2012). We obtained a D/V profile of dual-
phosphorylated ERK (dpERK) from Lim et al. (2013) and
used it as a proxy for ERK activity. To model this effect,
we modified GEMSTAT so that the energy of Cic-DNA
binding is increased (binding affinity is reduced) to an
extent proportional to dpERK concentration (Experimental
Procedures).
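
The attenuation rule in the last assumption above can be written compactly. The sketch below is an illustration only, not the GEMSTAT implementation: it assumes that binding energies are expressed in units of kT, so that raising the Cic-DNA binding energy by an amount proportional to dpERK concentration multiplies the Boltzmann weight of a Cic-bound site by exp(-Cic_ATT × [dpERK]). The function name, argument names, and all numerical values are hypothetical.

```python
import math


def cic_site_weight(k_cic, cic_conc, dperk_conc, cic_att, mismatch_energy=0.0):
    """Statistical weight of a Cic-bound site under dpERK attenuation.

    Assumption (not from the source): energies are in units of kT, so an
    energy increase dE multiplies the statistical weight by exp(-dE).
    """
    # Energy penalty proportional to dpERK concentration, as the text describes.
    attenuation_energy = cic_att * dperk_conc
    # Unattenuated weight: binding parameter x TF concentration x mismatch factor.
    base_weight = k_cic * cic_conc * math.exp(-mismatch_energy)
    return base_weight * math.exp(-attenuation_energy)


# Hypothetical inputs: spatially uniform Cic, graded dpERK at three positions.
dperk_profile = [0.0, 0.5, 1.0]
weights = [cic_site_weight(10.0, 1.0, d, cic_att=2.0) for d in dperk_profile]
```

Where dpERK is high, the weight of Cic-bound configurations shrinks even though Cic concentration is uniform, which is how a signaling input can localize repression in a model of this kind.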

To capture the above qualitative features, GEMSTAT uses 13
free parameters: two per TF representing its DNA-binding and
activation/repression potency (denoted by K and α, respectively;
10 in total for the five TFs Dl, Zld, Sna, Vnd, and Cic),
one for Dl-Zld cooperativity (denoted by ω), one representing
basal transcriptional activity (denoted by q_BTM), and
one representing the attenuation of Cic’s DNA-binding energy
by dpERK (denoted by Cic_ATT). The free parameters of the model
were optimized to fit the wild-type D/V expression profile of ind,
and prediction from the trained model was found to be in excel-
lent agreement with this wild-type pattern and also to be sensi-
tive with respect to most of the parameters (Figure 2E), indicating
that the model is flexible enough to capture the combinatorial ef-
fect of the assumed regulators in driving ind expression. Notably,
model training failed completely when we did not incorporate
ERK-Cic interplay (data not shown), providing quantitative evi-
dence in favor of this mechanism. However, this exercise also
raised questions about the validity and utility of the trained
model, such as: Can the model correctly predict the effect of
cis and trans perturbations to the system? Does the trained
model provide a unique quantitative explanation of ind regulation
consistent with the data? If not, what are all possible quantitative
explanations of these data, what do they predict about various
perturbations, and how can we narrow them down to the true
underlying mechanism? We address these questions in the
following sections.
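
To make the roles of these parameters concrete, the following toy calculation enumerates every bound/unbound configuration of a short list of sites and computes the fractional occupancy of the BTM, in the spirit of the model overview. It is a sketch under stated assumptions, not the GEMSTAT code: the way the activation/repression potency scales the BTM weight, the restriction of cooperativity to adjacent sites, the function name, and all numerical values are assumptions made for illustration.

```python
from itertools import product
import math


def btm_occupancy(sites, conc, K, alpha, q_btm, omega=None):
    """Fractional BTM occupancy for a toy enhancer (illustrative only).

    sites : list of (tf_name, mismatch_energy) pairs, in enhancer order
    conc  : dict tf_name -> concentration
    K     : dict tf_name -> DNA-binding parameter
    alpha : dict tf_name -> potency (>1 activates, <1 represses; assumed form)
    q_btm : basal weight of the BTM-bound promoter
    omega : dict frozenset({tf_a, tf_b}) -> cooperativity for adjacent bound sites
    """
    omega = omega or {}
    z_off = 0.0  # total weight of configurations without BTM at the promoter
    z_on = 0.0   # total weight of configurations with BTM at the promoter
    for state in product((0, 1), repeat=len(sites)):
        weight = 1.0
        btm_factor = q_btm
        for bound, (tf, mismatch) in zip(state, sites):
            if bound:
                weight *= K[tf] * conc[tf] * math.exp(-mismatch)
                btm_factor *= alpha[tf]
        # Cooperativity between neighbouring bound sites (assumed adjacency rule).
        for i in range(len(sites) - 1):
            if state[i] and state[i + 1]:
                pair = frozenset((sites[i][0], sites[i + 1][0]))
                weight *= omega.get(pair, 1.0)
        z_off += weight
        z_on += weight * btm_factor
    return z_on / (z_on + z_off)


# Hypothetical enhancer: two activator sites and one repressor site.
sites = [("Dl", 0.0), ("Zld", 0.5), ("Sna", 0.0)]
K = {"Dl": 1.0, "Zld": 1.0, "Sna": 1.0}
alpha = {"Dl": 10.0, "Zld": 3.0, "Sna": 0.1}
omega = {frozenset(("Dl", "Zld")): 5.0}
no_repressor = btm_occupancy(
    sites, {"Dl": 1.0, "Zld": 1.0, "Sna": 0.0}, K, alpha, 0.05, omega)
with_repressor = btm_occupancy(
    sites, {"Dl": 1.0, "Zld": 1.0, "Sna": 5.0}, K, alpha, 0.05, omega)
```

Evaluating the function at each position along the D/V axis, with the local TF concentrations as input, would give a predicted expression profile that can then be scored against the observed pattern. Exhaustive enumeration is only practical for a handful of sites, since the number of configurations doubles with each site.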

Systematic Exploration of Parameter Space Provides an Ensemble of Distinct Mechanistic Hypotheses Consistent with the Wild-Type Data
Figure 2. Inputs to the Model and an Example of Predicted ind Expression from a Wild-Type Model
(A) Lateral (left) and D/V cross-sections (right) of blastoderm stage Drosophila embryos. Embryos were stained with ind mRNA (magenta) and its four non-uniform regulators: Dl (green), Vnd (blue), Sna (red), and dpERK (gray).
(B) Assumed relationships between ind and its regulators.
(C) Expression profiles of ind and its regulators along the D/V axis.
(D) PWMs of the regulators and locations of computationally identified sites for TF binding. Asterisks mark the pairs of closely located Dl-Zld sites.
(E) Predicted ind expression (red) from a model optimized on wild-type data (purple).
(F–H) Sensitivity plots for a model: panels show the RMSE scores of the model as the corresponding parameter’s value is varied within its range, keeping other parameters fixed at their optimized values. For brevity, the vertical axis is limited at RMSE = 0.10.

In Figure 2E, we presented the prediction of a single model, i.e., one particular setting of parameter values, that accurately fits wild-type ind expression. Any assignment of values to the 13 free parameters of the model corresponds to a predicted readout of the ind enhancer, which can be scored against the wild-type ind pattern using a “goodness-of-fit” function. Any high-scoring parameter assignment represents a plausible mechanistic description of ind regulation; it provides insights into the relative strengths of various regulatory inputs and makes predictions about the effects of different cis and trans perturbations.

Given any initialization of parameter values, the GEMSTAT program systematically and iteratively modifies those values and reports a locally optimal parameter setting that maximizes the goodness of fit. However, there may exist many other

parameter assignments that are as good or nearly as good in terms of their agreement with data, and examining the one optimal assignment reported by GEMSTAT may provide a skewed view of plausible models (Kirk et al., 2013). We therefore modified the GEMSTAT program to perform a comprehensive exploration of the multi-dimensional parameter space, with the goal of constructing a complete map of plausible quantitative models. To this end, we first generated a large number of 13-dimensional vectors (parameter assignments) as follows (Figure 3A; Supporting Online Information). We partitioned each parameter’s range into halves, which gave us 2^13 compartments of the parameter space. From each compartment, we sampled 1,000 parameter vectors and scored them for their goodness of fit to data. Next, we sorted these 1,000 × 2^13 parameter vectors based on their scores, and for each parameter vector with a score among the top 2% of unique scores in this sorted list, we optimized the GEMSTAT model using that vector as initial estimate of model parameters. The resulting collection of optimized models can predict ind expression accurately in the wild-type condition (Figure S1B), with little dispersion in their predictions (maximum difference with the mean is <0.07 at any position along the D/V axis). We call this collection of models (21,000 in total) the “wild-type ensemble.” We note that an alternative strategy to compute such an ensemble would be to sample from the parameter space using Monte Carlo-based techniques (Toni et al., 2009). However, there are major challenges associated with these methods (e.g., slow convergence to the stationary distribution and consequently less control over the coverage of the parameter space, the need for pre-computing an ensemble of models that covers the parameter space; Toni et al., 2009), which motivated us to choose the exhaustive strategy described above.

The 21,000 models of the wild-type ensemble spanned widely different compartments of the parameter space (652 out of 2^13 ≈ 8,000), and the marginal densities of individual parameters showed high variance and multimodality (Figure 3B). This suggested the existence of many distinct parameterizations that explain the wild-type data equally well, and we asked if this meant that many distinct mechanistic hypotheses explain the data on ind regulation. To this end, we summarized the models as motifs that visually depict how a particular model utilizes each TF to regulate ind in five key spatial domains along the D/V axis (Figure 3C, legend). Columns in a motif correspond to domains, symbols denote the regulator TFs, and the height of a symbol in a column represents the contribution of that TF in the corresponding region.

Hierarchical k-medoids clustering of the motifs for the wild-type ensemble revealed at least 36 distinct sets of mechanistic hypotheses (motifs) that are supported by the wild-type data (Figure 3D). As an example of how these hypotheses differ, we marked three motifs in the bottom row with asterisks, which suggest three distinct mechanisms of activating ind in its peak domain of expression (third column in the motif): (1) Zld is the dominant activator of ind, while Dl does not have an important role to that end (marked with a red asterisk); (2) Dl is the dominant activator of ind, while Zld does not play a strong role in activating ind (marked with a green asterisk); and (3) neither Dl nor Zld alone, but only their synergistic interaction, is the dominant input toward activating ind (marked with a blue asterisk). Clearly, a number of different quantitative models are consistent with wild-type data, raising the concern that some of these may yield incorrect predictions under perturbation conditions and prompting us to ask how additional experiments can refine the wild-type ensemble.

Data from Perturbation Experiments Narrow Down the Range of Plausible Models
Wild-type data may not have sufficient information to constrain a multi-parameter model such that it captures the precise extent of each TF’s effect on the target gene. To further constrain the values of model parameters, we examined how well models in the ensemble predict the effects of the following genetic perturbations for which we have data from the literature.

• Mutation of sna: the ind expression domain remains essentially unaltered in sna mutants. vnd expression is de-repressed in these embryos and expands ventrally such that ind stays repressed in the endogenous domain of sna expression (Figure S2A).
• Mutation of vnd: the ind expression domain expands ventrally, yet does not encroach into the mesoderm region, in vnd mutants (McDonald et al., 1998; Weiss et al., 1998).
• Mutation of Cic-binding sites in the ind enhancer: the readout of the ind enhancer expands dorsally, to an extent that matches the spatial domain of the Dl protein, upon mutating two particular Cic sites in the enhancer (Lim et al., 2013).

These are the only perturbations reported in the literature that manifest direct effects on ind expression. We used each model to predict ind expression upon knocking down a TF, by setting the DNA-binding parameter of the respective TF to zero. Additionally, when simulating sna knockdown, we replaced the spatial patterns of vnd and egfr with their altered patterns in sna mutants (Figures S2B and S2C). To predict the effect of mutating a site, we discarded the site from our set of annotated TF-binding sites in the ind enhancer (Experimental Procedures and Supplemental Experimental Procedures). In evaluating models on perturbation data, we focused on carefully selected domains along the D/V axis that, we reasoned, should provide adequate information about the accuracy of model predictions (Experimental Procedures).

For each of the 21,000 models in the wild-type ensemble, we evaluated its predictions on perturbation data and discarded every model that failed to correctly predict the known effects. We found 2,100 models whose predictions are accurate in both wild-type and the three perturbation conditions (Figure 4A). We call these models the “filtered ensemble.” Parameters of these models were found to be far more constrained than those of the initial ensemble, falling into 42 of the 2^13 compartments in the parameter space, whereas the wild-type ensemble models span 652 compartments. (Also compare the solid to the dotted curves in Figure 3B, which shows marginal distributions of parameters.) Considering perturbation data in this manner greatly narrowed down the possible explanations of ind regulation, with the only surviving mechanistic hypotheses being those shown as motifs in the first row of Figure 3D. Whereas models in the wild-type ensemble could explain ind regulation without one of our three assumed repressors (the three motifs marked

Figure 3. Construction and Visualization of Model Ensembles
(A) Left to right: sampling of parameter vectors, scoring, model optimization initiated from each parameter vector that scored above the threshold (the resulting set
of optimized models is the ‘‘wild-type ensemble’’), and filtering of wild-type models according to their accuracy in predicting the effects of various perturbations.
The remaining models (not crossed out) constitute the ‘‘filtered ensemble.’’
(B) Marginal densities of parameters of the wild-type and the filtered ensemble models (dashed and solid lines, respectively).
(C) The motif for a model shows how it utilizes each TF to regulate ind in five domains along the D/V axis. Each domain corresponds either to the peak expression
domain of ind (domain 3) or a TF (domains 1 and 2: peak expression domains of Sna and Vnd, respectively) or to a domain where the effect on ind expression is

(legend continued on next page)


with purple asterisks in the bottom row of Figure 3D), the filtered ensemble unambiguously supports the need for all three repressors: Sna and Vnd repress ind in the mesoderm and the neuroectoderm domains, and Cic plays a role in defining the dorsal border of ind. This filtering step also removed from the wild-type ensemble those models that implied a very weak activating input from Dl (low K_DL, α_DL; dotted lines in panels labeled “DL”; Figure 3B). Such models overestimate the activating role of Zld and thus overpredict the expression level of ind in the dorsal ectoderm upon Cic site mutagenesis, leading to their exclusion from the filtered ensemble.

Predicting the Effect of Mutating Activator-Binding Sites
An important aspect of the utility of any modeling approach is its ability to make testable predictions. Here, we show how we utilized our filtered ensemble to predict the effects of Dl and Zld in activating ind and how we validated those predictions experimentally.

ind expression is known to be abolished in Dl mutants (von Ohlen and Doe, 2000) and to become weaker in Zld mutants (Nien et al., 2011). However, both Dl and Zld are also implicated in regulating several direct regulators of ind, e.g., sna, vnd, rho, and egfr (Hong et al., 2008; Nien et al., 2011); hence, their genetic effects comprise a combination of direct and indirect influences. To accurately characterize the direct activating roles of Dl and Zld, one needs to mutate their binding sites in the ind enhancer. To this end, we focused here on the computationally identified binding sites for Dl and Zld in the ind enhancer (Figure 4B; Supplemental Experimental Procedures).

To our knowledge, the only prior experimental study that examines direct effects of Dl on ind is that of Garcia and Stathopoulos (Garcia and Stathopoulos, 2011), who mutated a Dl-binding site in the ind enhancer and found no significant change in ind expression, leading them to speculate that Dl only partially supports ind activation. Predictions from our filtered ensemble for the particular mutation performed by Garcia and Stathopoulos agree with their report (Figure 4C). The filtered ensemble also predicts, unambiguously, that removal of all Dl sites should abolish ind expression (Figure S2D), suggesting a dominant role of Dl in activating ind. (This also means that Zld alone cannot activate the gene.) We confirmed this prediction experimentally in transgenic embryos through mutating three evolutionarily conserved sites of Dl in the ind enhancer (Figure 4B; Figure S2E). In particular, we mutated the above three sites to a sequence which is both shorter than the known Dl-binding site and has a very low affinity score based on the position weight matrix of

(Supplemental Experimental Procedures). Our filtered ensemble predicts that ind expression will reduce by 60%–100% (mean 85%) of its wild-type level upon this perturbation (Figure 4D). The experiment showed 65% reduction in peak ind expression, which validates the model prediction and supports a dominant role of Dl in ind activation (Figures 4F–4H; Supplemental Information; Figure S3). This investigation illustrates that an ensemble of models can make unambiguous predictions despite large variations in individual parameters, as has been previously argued more generally for multi-parameter ordinary differential equation (ODE)-based models (Gutenkunst et al., 2007). Our investigation is also significant in that it correctly predicts a major effect of Dl site mutagenesis in one experiment and no effect in a different mutagenesis experiment (Garcia and Stathopoulos, 2011) for the same TF (Figures 4C and 4D). It is extremely difficult to make such nuanced predictions based on qualitative reasoning alone.

The filtered ensemble predicts that Zld-induced activation is necessary for wild-type ind expression and, specifically, that ind expression should reduce, on average, to 50% of its peak wild-type level upon mutating the four strongest Zld-binding sites (out of five sites) in the ind enhancer (Figure 4E). We tested this prediction and noted that ind expression is indeed reduced in transgenic embryos where Zld sites were mutated (Figures 4I–4K; Supplemental Information; Figure S3) to approximately half of the endogenous levels. The expression of ind in these embryos is considerably more variable than its endogenous expression, leading us to speculate whether the apparent reduction in expression level is due to increased noise (i.e., ind has bimodal expression, at a basal level and at a level comparable to its endogenous peak level) or due to an overall reduction in expression level within the nuclei where ind is expressed. We find further analyses support the latter proposition (Figure 4K).

Thus, the filtered ensemble reveals Dl as the dominant activator of ind, while also demonstrating an important activating role for Zld. There remain uncertainties in parameter values in the ensemble (Figure 3B), and these uncertainties can in some cases translate to ambiguous predictions for specific perturbations. We revisit this point in the Discussion, where we show how the modeling framework suggests the most informative experiments to perform in order to resolve such ambiguities.

Models in the Filtered Ensemble Explain ind Regulation in Other Drosophilids
One potential issue with the filtered ensemble models is that they might be “overfit” to the specific number and arrangement of TF-binding sites in the D. melanogaster ind enhancer, as opposed to
Dl. We also confirmed computationally that these mutations do capturing a more general logic. If this were the case, we would
not create any new site for the TFs considered in our model. expect the filtered ensemble of models to contain features that
The reporter sequences containing the mutated Dl sites were are specific to D. melanogaster and not conserved across Dro-
then integrated at the same chromosomal location, which al- sophilids. We asked whether our filtered ensemble models of
lowed us to make direct comparisons of their expression levels the D. melanogaster ind enhancer can explain features found in

known for a specific site-mutagenesis experiment (domains 4 and 5: results known for Cic site mutagenesis). Columns in a motif correspond to domains, symbols
denote the regulator TFs, and the height of a symbol in a column represents the contribution of that TF in the corresponding region; if a TF f is an activator, then its
height in a column represents the root-mean-square error (RMSE) between model-predicted ind expression profiles in the corresponding domain when there is
no activator and when f is the only activator in the model. Similarly, the height of a repressor f is computed from the conditions when there is no repressor and
when f is the only repressor in the model.
(D) Representative motifs of the 36 clusters computed from the wild-type ensemble models.
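The symbol heights described in this legend can be illustrated with a minimal Python sketch. It assumes that a model's predicted ind profile is available as a list of values along the D/V axis and that a domain is an index range; the function names, the domain encoding, and the numbers are hypothetical and are not taken from the authors' software.

```python
import math


def rmse(profile_a, profile_b):
    """Root-mean-square error between two equally long expression profiles."""
    if len(profile_a) != len(profile_b) or not profile_a:
        raise ValueError("profiles must be non-empty and of equal length")
    squared = [(a - b) ** 2 for a, b in zip(profile_a, profile_b)]
    return math.sqrt(sum(squared) / len(squared))


def motif_height(baseline_profile, single_tf_profile, domain):
    """Height of one TF symbol in one motif column.

    baseline_profile:  prediction with no activator (or no repressor)
    single_tf_profile: prediction when the TF is the only activator
                       (or the only repressor) in the model
    domain:            (start, stop) index range of the column (assumed encoding)
    """
    start, stop = domain
    return rmse(baseline_profile[start:stop], single_tf_profile[start:stop])


# Hypothetical predictions at six D/V positions (illustrative values only).
no_activator = [0.0, 0.0, 0.1, 0.1, 0.0, 0.0]
dl_only = [0.0, 0.3, 0.7, 0.5, 0.1, 0.0]
height = motif_height(no_activator, dl_only, (1, 4))
```

A taller symbol therefore means that adding that TF alone changes the predicted profile more strongly within the corresponding domain.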

402 Cell Systems 1, 396–407, December 23, 2015 ª2015 Elsevier Inc.
Figure 4. Predictions of the Filtered Ensemble and Their Experimental Validations
(A) Predictions of filtered ensemble under perturbed conditions; left to right: mutation of sna, vnd, and two Cic sites in the ind enhancer. Shown are the mean (red)
and the range (shaded red area around the curve) of ensemble predictions. Plots in (C)–(E) follow the same semantics.
(B) Computationally identified sites for Dl (green) and Zld (cyan) in ind enhancer. Yellow and gray sequences show mutations to disrupt Dl and Zld sites,
respectively.

the orthologous ind enhancers of ten other Drosophilid species. As shown in Figure 5, for orthologs in the species related closely to D. melanogaster, the filtered ensemble models predict readouts that are very similar to the ind expression pattern in D. melanogaster. For the orthologs from the more distantly related species, e.g., D. grimshawi, the filtered ensemble includes some models that are able to predict the expression domain with reasonable accuracy, and also models whose predictions deviate substantially from the ind pattern. (The exception to this was the ortholog in D. virilis, which shows very poor sequence conservation with the D. melanogaster ind enhancer; see Figure S1.) Our observation suggests that the filtered ensemble, despite being trained on several perturbation experiments in addition to wild-type data, has sufficient diversity of models to capture the logic of orthologous ind enhancers.

DISCUSSION

Recent technological advances allow the rapid generation of hypotheses about a biological system. Because biological problems are often under-constrained, the challenge becomes reconciling different hypotheses about the same system. Multi-parameter computational models are one way to unify diverse hypotheses into a comprehensive description of one system. However, with more parameters comes the burden of estimating those parameters and addressing parameter uncertainty (Gutenkunst et al., 2007). Ensemble modeling is a powerful solution to this problem, with demonstrated success in the context of signal transduction networks, protein folding, and climate change, among others.
A major contribution of our work is the rigorous demonstration of how ensemble modeling may benefit a complex, multi-parameter model of transcriptional cis regulation by refining its parameter estimates in a systematic and unbiased manner. Notably, integrating perturbation data into our analysis did not refine parameter estimates to precise points. However, it did lead to rejection of a large fraction of models from the wild-type ensemble, for example, models where Sna (or Vnd) alone can explain the ventral repression of ind. This illustrates why “learning” in biological systems should not be defined only as reduction of a mechanistic parameter to a point estimate; reducing the acceptable ranges of a mechanistic parameter, or of a combination of parameters, can be equally valuable. Moreover, such modeling can help us comprehend the disparate experimental evidence pertaining to regulation of the ind gene in Drosophila.
What insights does our filtered ensemble provide about ind regulation? First, the ensemble establishes a dominant, direct role of Dl in activating ind, and contradicts the previous observations that suggested a minor function of Dl in this context. We note that, since a direct activating role of Dl was an assumption of the model, the predictive value of our model lies more in its ability to identify that ind cannot be induced in the absence of Dl sites and that removal of a particular subset of Dl sites leads to strongly reduced ind expression. Our experimental data on Dl site mutagenesis therefore conform not only to our assumption but also to a quantitative prediction. Second, the ensemble emphasizes an important role of Zld in expressing ind. Recent work (Foo et al., 2014; Li et al., 2014; Sun et al., 2015) proposes that Zld functions primarily as a chromatin remodeler for these genes rather than imparting a direct activating input. However, such a direct role for Zld has been implicated previously (Hamm et al., 2015; ten Bosch et al., 2006). Our filtered ensemble models comprise two classes: one class of models suggests only an indirect activating role (possibly through chromatin remodeling) for Zld, while the other class suggests that an additional direct activating input from Zld is necessary. As such, our current modeling cannot explain unambiguously how Zld activates ind. However, the filtered ensemble suggests an experiment that can disambiguate the mechanism. For this, we considered model predictions in spatial domains where Zld is the only regulator of ind. A non-basal level of ind expression under such a condition will imply the existence of a direct activating input from Zld, whereas basal levels will imply a lack thereof and signify Zld’s role as a facilitator of the DNA binding of Dl (Supplemental Information). One such setup is available in the dorsal-ectoderm region upon mutating all Cic sites: this leaves Zld as the sole regulator in that region, and as summarized in Figure S4, predicted ind expression upon Cic site mutation shows large uncertainty in the dorsal-ectoderm region. Likewise, the filtered ensemble motifs (Figure 3D) exhibit maximum disagreement in the fifth column, i.e., in predicting the effect of Cic site mutation in the dorsal-ectoderm region. We therefore propose mutation of all Cic sites as an experiment that can disambiguate Zld’s mode of function. This is an illustration of how ensemble-based modeling can provide guidance about the most informative experiments to perform next.
An important assumption in this study was that a post-translational modification of Cic by ERK may inhibit DNA binding of Cic and de-repress ind in a manner that depends on ERK’s spatial distribution. Without such an assumption of a localized de-repression of ind from the repressive effect of Cic, we were unable to fit any model with reasonable accuracy. While ERK-mediated de-repression of ind from Cic has experimental evidence (Ajuria et al., 2011; Lim et al., 2015; Lim et al., 2013), our assumption may not be the only mechanistic explanation for this phenomenon. Alternative hypotheses about spatially localized modifications in the influence of Cic on transcription

(C and D) Filtered ensemble models do not predict any significant change in ind expression upon the mutations reported in Garcia and Stathopoulos (2011) (C) but predict that ind expression is nearly abolished upon removing the additional sites for Dl (D).
(E) Filtered ensemble models predict a 50% reduction in ind expression upon mutating Zld sites in the ind enhancer.
(F) The ind enhancer was used to drive expression that recapitulates the endogenous ind expression (ind1.4WT-lacZ) and Dl sites in the enhancer were mutated
(ind1.4Dlmut-lacZ). Embryos were co-stained with ind (red) and lacZ (yellow).
(G) Histograms of mean intensity values computed from bootstrapped profiles; asterisks mark bootstrapped mean values.
(H) Smoothed histograms from wild-type and mutant lacZ profiles (each created from 20 profiles [one per embryo] on 256 bins [one per intensity value]).
(I) Zld sites in the enhancer were mutated (ind1.4zldmut-lacZ). Embryos were co-stained with ind (red) and lacZ (white).
(J and K) Plots analogous to (G) and (H).
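The bootstrapped means in (G) and (J) can be illustrated with a short Python sketch. The legend does not give the resampling details, so this version assumes that embryos are resampled with replacement and that each replicate averages the per-embryo mean intensities; the function name, the replicate count, the seed, and the intensity values are all hypothetical.

```python
import random
import statistics


def bootstrap_means(profiles, n_replicates=1000, seed=0):
    """Bootstrap distribution of the mean intensity across embryos.

    profiles: one intensity profile (list of numbers) per embryo.
    Each replicate resamples embryos with replacement and averages
    the per-embryo mean intensities of the resampled set.
    """
    rng = random.Random(seed)
    per_embryo_mean = [statistics.fmean(p) for p in profiles]
    means = []
    for _ in range(n_replicates):
        resample = rng.choices(per_embryo_mean, k=len(per_embryo_mean))
        means.append(statistics.fmean(resample))
    return means


# Hypothetical lacZ intensities (0-255 scale) for a few embryos per genotype.
wild_type = [[200, 180, 220], [190, 210, 205], [215, 195, 185], [205, 200, 198]]
mutant = [[100, 90, 120], [110, 95, 105], [80, 130, 100], [115, 85, 98]]

wt_means = bootstrap_means(wild_type, n_replicates=200, seed=1)
mut_means = bootstrap_means(mutant, n_replicates=200, seed=1)
ratio = statistics.fmean(mut_means) / statistics.fmean(wt_means)
```

With real data, histograms of `wt_means` and `mut_means` would be compared; little overlap between them would indicate a difference in mean intensity that does not depend on any single embryo.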

Figure 5. Predictions of the Filtered Ensemble Models for the Orthologs of the D. melanogaster ind Enhancer in Ten Other Drosophilids
See the legend of Figure S1 for the details of how the orthologs were extracted. Semantics of the plots are the same as that of Figure 4A. We also show in
parentheses the number of TF-binding sites in each ortholog. Additionally, the dotted red curve for each ortholog shows the best prediction of a filtered ensemble
model for the corresponding ortholog. The goodness of a model prediction for a given ortholog is defined as the sum of the squared error between the model
prediction and ind expression data in D. melanogaster.
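The plot semantics in this legend (ensemble mean, ensemble range, and the best model by sum of squared error) can be sketched in a few lines of Python. The sketch assumes each model's prediction is a list of values at the same D/V positions as the expression data; the model names and numbers are hypothetical.

```python
def sum_squared_error(prediction, data):
    """Sum of squared differences between a prediction and the data."""
    if len(prediction) != len(data):
        raise ValueError("prediction and data must have the same length")
    return sum((p - d) ** 2 for p, d in zip(prediction, data))


def summarize_ensemble(predictions, data):
    """Return (mean, low, high, best_model) for an ensemble.

    predictions: dict mapping a model name to its predicted profile.
    mean/low/high are per-position summaries across all models;
    best_model is the name with the smallest sum of squared error.
    """
    profiles = list(predictions.values())
    positions = range(len(data))
    mean = [sum(p[i] for p in profiles) / len(profiles) for i in positions]
    low = [min(p[i] for p in profiles) for i in positions]
    high = [max(p[i] for p in profiles) for i in positions]
    best_model = min(
        predictions, key=lambda name: sum_squared_error(predictions[name], data)
    )
    return mean, low, high, best_model


# Hypothetical D. melanogaster ind profile and two ensemble members.
ind_data = [0.0, 0.5, 1.0, 0.5, 0.0]
ensemble = {
    "model_a": [0.0, 0.4, 0.9, 0.6, 0.1],
    "model_b": [0.2, 0.2, 0.5, 0.2, 0.2],
}
mean, low, high, best_model = summarize_ensemble(ensemble, ind_data)
```

In the terms of the figure, `mean` corresponds to the solid curve, `low` and `high` bound the shaded range, and the profile of `best_model` corresponds to the dotted curve.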

initiation or on activator recruitment are also plausible. Notably, these alternative hypotheses rely not on attenuation of Cic’s DNA binding but on the potency of Cic’s interaction with transcription initiation machinery or with other TFs. One can also imagine a scenario where ERK activates some yet-unknown non-repressive TF that competes with Cic to bind DNA. Ascertaining any such mechanism is a subject for future studies.
One might ask if modeling a larger dataset (i.e., multiple enhancers) could make the conventional approach of computing one or a few optimal models as effective as our ensemble approach in terms of eliminating incorrect mechanistic explanations. It is important to note that the conventional approach attempts to discover models consistent with the entire dataset. However, when relevant TFs and mechanisms of their functions are not well understood, a systematic ensemble approach for individual enhancers could provide insights that the one or few optimal models for the entire dataset would have missed. For example, when trying to fit the enhancers of ind and sog simultaneously (data not shown), we consistently discovered models that do not use a direct activating input from Zld, presumably because the assumed inputs do not include a dorsal repressor for sog. Given that our understanding of the regulators of sog and the mechanism of Zld-mediated gene activation is still unclear (Supplemental Information), simply rejecting a direct activating input from Zld, as one would if one were taking a more conventional approach, would produce a biased working hypothesis, and opportunities for follow-up experiments could be missed. A systematic ensemble approach can thus improve sequence-to-expression models by identifying mechanistic hypotheses that are consistent with the entire dataset and also those that conform to different subsets of the data, thereby specifying the experiments for follow-up.

EXPERIMENTAL PROCEDURES

The GEMSTAT Model
To estimate the probability that TFs bound to an enhancer regulate the expression of a target gene, GEMSTAT considers the “statistical weight” $Z_c$ of each configuration $c$ in the ensemble of all possible configurations of occupied and unoccupied TF-binding sites and the promoter. Detailed formulation of $Z_c$ and the algorithm to compute probability of mRNA expression from $Z_c$ are given in Supplemental Information; here, we mention the parameters that define $Z_c$. For a given site $S$, the binding of a TF $f$ at $S$ contributes a statistical weight of $q_{f,S} = K_{f,S}[f]$ to $Z_c$. Here $K_{f,S}$ is the equilibrium constant of the DNA-binding reaction between $f$ and $S$, and $[f]$ is the concentration of $f$. Let $S_f^{\max}$ denote the strongest binding site of $f$ and $K(S_f^{\max})$ denote the association

constant of TF-DNA binding between $f$ and $S_f^{\max}$. We rewrite $K_{f,S}$ as $K(S_f^{\max})\exp(-\beta \Delta E_{f,S})$, where $\beta$ is the Boltzmann constant and $\Delta E_{f,S}$ denotes the “mismatch energy” of the site $S$ relative to $S_f^{\max}$ for $f$. The concentration $[f]$ is in arbitrary units and can be rewritten as $v[f]_{rel}$, where $[f]_{rel}$ is the concentration of $f$ relative to some unknown reference value $v$. Therefore,

$$q_{f,S} = K(S_f^{\max})\, v\, [f]_{rel} \exp(-\beta \Delta E_{f,S}),$$

where $K(S_f^{\max})$ and $v$ are unknown quantities. We take their product $K(S_f^{\max})\,v$ as a free parameter and refer to it as the “DNA-binding parameter” for $f$. In addition to the $q_{f,S}$ terms, $Z_c$ includes the following multiplicative terms: (1) $u_{f_1,f_2}$ for each instance of two interacting TFs $f_1$ and $f_2$ bound in configuration $c$, (2) $a_f$ for each instance of a TF $f$ bound to one of its cognate sites and when $c$ is a BTM-bound configuration, and (3) $q_{BTM}$ whenever $c$ is a BTM-bound configuration.

Filtering Models Based on Perturbation Data
Qualitative effects on the ind expression pattern upon various perturbations were mentioned in Results; we discard every model that fails to meet the following quantitative criteria in predicting those effects.
(1) Upon Sna knockout, a model should predict the mean probability of ind expression in the ventral-most 20% of the D/V axis to be low (≤0.05 times the probability of ind expression at its peak domain).
(2) Upon Vnd knockout, a model should predict the mean probability of ind expression at locations of peak Vnd expression (at least 80% of its maximum concentration level) to be high (mean probability of expression ≥0.80).
(3) Upon mutation of two (out of four) Cic-binding sites, a model should predict ind expression to expand dorsally (mean probability of expression ≥0.80 at the location where wild-type ind expression is half-maximal at its dorsal boundary) and to remain low (≤0.05 times the probability of ind expression at its peak domain) in the dorsal-most 20% of the D/V axis.

Model Optimization
GEMSTAT optimizes the root-mean-square error (RMSE) function in the course of parameter estimation. At location $i$ along the D/V axis, let $D_i$ and $M_i$ denote the ind expression level and the model-predicted readout of the ind enhancer, respectively. Assuming there are $n$ data points, the RMSE for the predicted expression pattern is $\sqrt{(1/n)\sum_i (D_i - bM_i)^2}$, where $b$ is a free parameter as used in standard least-squares estimation. We constrain $b \le 2$, since allowing $b$ to assume arbitrary values may scale up and assign good RMSE scores to very low expression levels, even as low as one may expect from randomly generated sequences, a problematic issue for sequence-to-expression models (He et al., 2012). Several other sequence-to-expression models also constrained the value of $b$ in optimizing RMSE (Kazemian et al., 2010; Kim et al., 2013; Segal et al., 2008). GEMSTAT uses the Nelder-Mead simplex and the quasi-Newton gradient descent algorithms to optimize RMSE.

SUPPLEMENTAL INFORMATION

Supplemental Information includes Supplemental Experimental Procedures and four figures and can be found with this article online at http://dx.doi.org/10.1016/j.cels.2015.12.002.

AUTHOR CONTRIBUTIONS

Conceptualization, M.A.H.S., S.S., S.Y.S.; Methodology, M.A.H.S.; Software, M.A.H.S.; Formal Analysis, M.A.H.S.; Investigation, M.A.H.S., B.L., N.S.; Resources, M.A.H.S., B.L., N.S.; Data Curation, M.A.H.S., B.L., N.S.; Writing – Original Draft, M.A.H.S., S.S.; Writing – Review and Editing, All authors; Visualization, M.A.H.S.; Supervision, S.S., S.Y.S., G.J.; Funding Acquisition, S.S., S.Y.S., C.A.R., H.L., G.J.

ACKNOWLEDGMENTS

This work was supported by NSF grant EFRI 1136913, MICINN grant BFU2011-23611, NIH grant R01GM086537, and NIH grant R01 5R01GM114341.

Received: August 10, 2015
Revised: October 19, 2015
Accepted: December 2, 2015
Published: December 23, 2015

REFERENCES

Ajuria, L., Nieva, C., Winkler, C., Kuo, D., Samper, N., Andreu, M.J., Helman, A., González-Crespo, S., Paroush, Z., Courey, A.J., and Jiménez, G. (2011). Capicua DNA-binding sites are general response elements for RTK signaling in Drosophila. Development 138, 915–924.
Astigarraga, S., Grossman, R., Díaz-Delfín, J., Caelles, C., Paroush, Z., and Jiménez, G. (2007). A MAPK docking site is critical for downregulation of Capicua by Torso and EGFR RTK signaling. EMBO J. 26, 668–677.
Berg, O.G., and von Hippel, P.H. (1987). Selection of DNA binding sites by regulatory proteins. Statistical-mechanical theory and application to operators and promoters. J. Mol. Biol. 193, 723–750.
Bintu, L., Buchler, N.E., Garcia, H.G., Gerland, U., Hwa, T., Kondev, J., and Phillips, R. (2005). Transcriptional regulation by the numbers: models. Curr. Opin. Genet. Dev. 15, 116–124.
Brown, K.S., Hill, C.C., Calero, G.A., Myers, C.R., Lee, K.H., Sethna, J.P., and Cerione, R.A. (2004). The statistical mechanics of complex signaling networks: nerve growth factor signaling. Phys. Biol. 1, 184–195.
Cheng, Q., Kazemian, M., Pham, H., Blatti, C., Celniker, S.E., Wolfe, S.A., Brodsky, M.H., and Sinha, S. (2013). Computational identification of diverse mechanisms underlying transcription factor-DNA occupancy. PLoS Genet. 9, e1003571.
Cohen, M., Page, K.M., Perez-Carrasco, R., Barnes, C.P., and Briscoe, J. (2014). A theoretical framework for the regulation of Shh morphogen-controlled gene expression. Development 141, 3868–3878.
Cowden, J., and Levine, M. (2003). Ventral dominance governs sequential patterns of gene expression across the dorsal-ventral axis of the neuroectoderm in the Drosophila embryo. Dev. Biol. 262, 335–349.
Dissanayake, K., Toth, R., Blakey, J., Olsson, O., Campbell, D.G., Prescott, A.R., and MacKintosh, C. (2011). ERK/p90(RSK)/14-3-3 signalling has an impact on expression of PEA3 Ets transcription factors via the transcriptional repressor capicúa. Biochem. J. 433, 515–525.
Dresch, J.M., Liu, X., Arnosti, D.N., and Ay, A. (2010). Thermodynamic modeling of transcription: sensitivity analysis differentiates biological mechanism from mathematical model-induced effects. BMC Syst. Biol. 4, 142.
Fakhouri, W.D., Ay, A., Sayal, R., Dresch, J., Dayringer, E., and Arnosti, D.N. (2010). Deciphering a transcriptional regulatory code: modeling short-range repression in the Drosophila embryo. Mol. Syst. Biol. 6, 341.
Foo, S.M., Sun, Y., Lim, B., Ziukaite, R., O’Brien, K., Nien, C.Y., Kirov, N., Shvartsman, S.Y., and Rushlow, C.A. (2014). Zelda potentiates morphogen activity by increasing chromatin accessibility. Curr. Biol. 24, 1341–1346.
Garcia, M., and Stathopoulos, A. (2011). Lateral gene expression in Drosophila early embryos is supported by Grainyhead-mediated activation and tiers of dorsally-localized repression. PLoS ONE 6, e29172.
Gertz, J., Siggia, E.D., and Cohen, B.A. (2009). Analysis of combinatorial cis-regulation in synthetic and genomic promoters. Nature 457, 215–218.
Granek, J.A., and Clarke, N.D. (2005). Explicit equilibrium modeling of transcription-factor binding and gene regulation. Genome Biol. 6, R87.
Grimm, O., Sanchez Zini, V., Kim, Y., Casanova, J., Shvartsman, S.Y., and Wieschaus, E. (2012). Torso RTK controls Capicua degradation by changing its subcellular localization. Development 139, 3962–3968.
Gutenkunst, R.N., Waterfall, J.J., Casey, F.P., Brown, K.S., Myers, C.R., and Sethna, J.P. (2007). Universally sloppy parameter sensitivities in systems biology models. PLoS Comput. Biol. 3, 1871–1878.
Hamm, D.C., Bondra, E.R., and Harrison, M.M. (2015). Transcriptional activation is a conserved feature of the early embryonic factor Zelda that requires a cluster of four zinc fingers for DNA binding and a low-complexity activation domain. J. Biol. Chem. 290, 3508–3518.

He, X., Samee, M.A., Blatti, C., and Sinha, S. (2010). Thermodynamics-based models of transcriptional regulation by enhancers: the roles of synergistic activation, cooperative binding and short-range repression. PLoS Comput. Biol. 6, 6.
He, X., Duque, T.S., and Sinha, S. (2012). Evolutionary origins of transcription factor binding site clusters. Mol. Biol. Evol. 29, 1059–1070.
Hong, J.W., Hendrix, D.A., Papatsenko, D., and Levine, M.S. (2008). How the Dorsal gradient works: insights from postgenome technologies. Proc. Natl. Acad. Sci. USA 105, 20072–20076.
Janssens, H., Hou, S., Jaeger, J., Kim, A.R., Myasnikova, E., Sharp, D., and Reinitz, J. (2006). Quantitative and predictive model of transcriptional control of the Drosophila melanogaster even skipped gene. Nat. Genet. 38, 1159–1165.
Kanodia, J.S., Liang, H.L., Kim, Y., Lim, B., Zhan, M., Lu, H., Rushlow, C.A., and Shvartsman, S.Y. (2012). Pattern formation by graded and uniform signals in the early Drosophila embryo. Biophys. J. 102, 427–433.
Kazemian, M., Blatti, C., Richards, A., McCutchan, M., Wakabayashi-Ito, N., Hammonds, A.S., Celniker, S.E., Kumar, S., Wolfe, S.A., Brodsky, M.H., and Sinha, S. (2010). Quantitative analysis of the Drosophila segmentation regulatory network using pattern generating potentials. PLoS Biol. 8, e1000456.
Kim, A.R., Martinez, C., Ionides, J., Ramos, A.F., Ludwig, M.Z., Ogawa, N., Sharp, D.H., and Reinitz, J. (2013). Rearrangements of 2.5 kilobases of noncoding DNA from the Drosophila even-skipped locus define predictive rules of genomic cis-regulatory logic. PLoS Genet. 9, e1003243.
Kirk, P., Thorne, T., and Stumpf, M.P. (2013). Model selection in systems and synthetic biology. Curr. Opin. Biotechnol. 24, 767–774.
Kuepfer, L., Peter, M., Sauer, U., and Stelling, J. (2007). Ensemble modeling for analysis of cell signaling dynamics. Nat. Biotechnol. 25, 1001–1006.
Li, X.Y., Harrison, M.M., Villalta, J.E., Kaplan, T., and Eisen, M.B. (2014). Establishment of regions of genomic activity during the Drosophila maternal to zygotic transition. eLife 3, 3.
Liberman, L.M., and Stathopoulos, A. (2009). Design flexibility in cis-regulatory control of gene expression: synthetic and comparative evidence. Dev. Biol. 327, 578–589.
Lim, B., Samper, N., Lu, H., Rushlow, C., Jiménez, G., and Shvartsman, S.Y. (2013). Kinetics of gene derepression by ERK signaling. Proc. Natl. Acad. Sci. USA 110, 10330–10335.
Lim, B., Dsilva, C.J., Levario, T.J., Lu, H., Schupbach, T., Kevrekidis, I.G., and Shvartsman, S.Y. (2015). Dynamics of inductive ERK signaling in the Drosophila embryo. Curr. Biol. 25, 1784–1790.
McDonald, J.A., Holbrook, S., Isshiki, T., Weiss, J., Doe, C.Q., and Mellerick, D.M. (1998). Dorsoventral patterning in the Drosophila central nervous system: the vnd homeobox gene specifies ventral column identity. Genes Dev. 12, 3603–3612.
Nien, C.Y., Liang, H.L., Butcher, S., Sun, Y., Fu, S., Gocha, T., Kirov, N., Manak, J.R., and Rushlow, C. (2011). Temporal coordination of gene networks by Zelda in the early Drosophila embryo. PLoS Genet. 7, e1002339.
Papatsenko, D., and Levine, M.S. (2008). Dual regulation by the Hunchback gradient in the Drosophila embryo. Proc. Natl. Acad. Sci. USA 105, 2901–2906.
Parker, D.S., White, M.A., Ramos, A.I., Cohen, B.A., and Barolo, S. (2011). The cis-regulatory logic of Hedgehog gradient responses: key roles for gli binding affinity, competition, and cooperativity. Sci. Signal. 4, ra38.
Platt, J.R. (1964). Strong Inference: Certain systematic methods of scientific thinking may produce much more rapid progress than others. Science 146, 347–353.
Segal, E., Raveh-Sadka, T., Schroeder, M., Unnerstall, U., and Gaul, U. (2008). Predicting expression patterns from regulatory sequence in Drosophila segmentation. Nature 451, 535–540.
Shea, M.A., and Ackers, G.K. (1985). The OR control system of bacteriophage lambda. A physical-chemical model for gene regulation. J. Mol. Biol. 181, 211–230.
Shlyueva, D., Stampfel, G., and Stark, A. (2014). Transcriptional enhancers: from properties to genome-wide predictions. Nat. Rev. Genet. 15, 272–286.
Stathopoulos, A., and Levine, M. (2005). Localized repressors delineate the neurogenic ectoderm in the early Drosophila embryo. Dev. Biol. 280, 482–493.
Stormo, G.D. (2000). DNA binding sites: representation and discovery. Bioinformatics 16, 16–23.
Sun, Y., Nien, C.Y., Chen, K., Liu, H.Y., Johnston, J., Zeitlinger, J., and Rushlow, C. (2015). Zelda overcomes the high intrinsic nucleosome barrier at enhancers during Drosophila zygotic genome activation. Genome Res. 25, 1703–1714.
Swigon, D. (2013). Ensemble modeling of biological systems. In De Gruyter Series in Mathematics and Life Sciences 1, A. Antoniouk and R. Melnik, eds. (De Gruyter), pp. 19–42.
Tebaldi, C., and Knutti, R. (2007). The use of the multi-model ensemble in probabilistic climate projections. Philos. Trans. A Math Phys. Eng. Sci. 365, 2053–2075.
ten Bosch, J.R., Benavides, J.A., and Cline, T.W. (2006). The TAGteam DNA motif controls the timing of Drosophila pre-blastoderm transcription. Development 133, 1967–1977.
Toni, T., Welch, D., Strelkowa, N., Ipsen, A., and Stumpf, M.P. (2009). Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. J. R. Soc. Interface 6, 187–202.
Villaverde, A., Bongard, S., Mauch, K., Müller, D., Balsa-Canto, E., Schmid, J., and Banga, J. (2014). High-confidence predictions in systems biology dynamic models. In 8th International Conference on Practical Applications of Computational Biology & Bioinformatics (PACBB 2014), J. Saez-Rodriguez, M.P. Rocha, F. Fdez-Riverola, and J.F. De Paz Santana, eds. (Springer International Publishing), pp. 161–171.
von Dassow, G., Meir, E., Munro, E.M., and Odell, G.M. (2000). The segment polarity network is a robust developmental module. Nature 406, 188–192.
von Ohlen, T., and Doe, C.Q. (2000). Convergence of dorsal, dpp, and egfr signaling pathways subdivides the drosophila neuroectoderm into three dorsal-ventral columns. Dev. Biol. 224, 362–372.
Weiss, J.B., Von Ohlen, T., Mellerick, D.M., Dressler, G., Doe, C.Q., and Scott, M.P. (1998). Dorsoventral patterning in the Drosophila central nervous system: the intermediate neuroblasts defective homeobox gene specifies intermediate column identity. Genes Dev. 12, 3591–3602.
White, M.A., Parker, D.S., Barolo, S., and Cohen, B.A. (2012). A model of spatially restricted transcription in opposing gradients of activators and repressors. Mol. Syst. Biol. 8, 614.
Yáñez-Cuna, J.O., Kvon, E.Z., and Stark, A. (2013). Deciphering the transcriptional cis-regulatory code. Trends Genet. 29, 11–22.
Zinzen, R.P., and Papatsenko, D. (2007). Enhancer responses to similarly distributed antagonistic gradients in development. PLoS Comput. Biol. 3, e84.
