
Measuring Dynamic Phosphorylation of Spo0A in Bacillus subtilis

Arvind Thiagarajan, Joseph H. Levine, Michael B. Elowitz


Departments of Biology,
Biological Engineering, and Applied Physics
California Institute of Technology
(Dated: November 1, 2011)
Sporulation is an intricately controlled process in Bacillus subtilis, promising a potential wealth of
novel network motifs. It has previously been shown that the key regulator involved in the induction
of sporulation is the phosphorylated form of the transcription factor Spo0A. For this reason, we
developed an experimental paradigm to measure Spo0A~P levels dynamically in vivo. A strain of B.
subtilis was engineered in which the Spo0A gene had been replaced by a Spo0A-GFP fusion and
into which an array containing 256 copies of a Spo0A~P binding site had been inserted. Imaging of
this strain revealed clusters of localized fluorescence amidst diffuse background fluorescence,
corresponding respectively to bound Spo0A~P and to unbound Spo0A~P and Spo0A. In order to determine
relative levels of Spo0A~P and Spo0A from such images, we developed a machine learning algorithm
for identifying localized fluorescence and quantifying the corresponding intensities. Subsequently,
we constructed a probabilistic model to estimate intracellular deviations in cluster intensities and
consequently determine the ratio between observed fluorescence and number of molecules present.
A microfluidic chemostat was optimized for the study of Spo0A~P dynamics during sporulation in B.
subtilis, and to facilitate experiments in this device, we optimized a medium to induce sporulation.
I. INTRODUCTION
A. Background
Systems Biology is a burgeoning new field, encompassing and impinging on many aspects of other
biological sciences. The subject deals with the quantitative analysis of systems architectures in
biological networks that lead to emergent phenomena. Such analysis draws heavily on electrical
engineering, physics, and control theory, and consequently appeals to scientists from a variety of
fields. Systems Biology is most interesting, however, because biological systems are complex: a
qualitative description of parts does not lead to a unique qualitative description of the whole, and
consequently quantitative data must be used to determine the regimes between which a system has
qualitatively different behavior.
One particularly interesting subtopic in systems biology is the control of cellular differentiation
programs. In particular, while there are known network designs that facilitate controlled
differentiation, it would seem, a priori, that these networks, like all biological networks, ought to
operate on a time scale shorter than the length of the cell cycle. The constant dilution of chemical
concentrations accompanying cell growth superimposes a constitutive negative feedback on any system,
and thus makes it difficult for networks to operate over multiple cell cycles. It is particularly
intriguing, then, to study the differentiation of Bacillus subtilis into spores. B. subtilis responds
to a lack of nutrients by first initiating several rounds of growth, followed by differentiation into
spores. This delayed process is of great interest because, unlike similar processes mediated by
quorum sensing, this process has been shown to occur in a cell-autonomous fashion, independent of the
medium in which the cells are grown.
It is known that the master regulator for this differentiation process is the phosphorylated form of
the transcription factor Spo0A. There are multiple regulatory pathways that both control and are
controlled by the phosphorylation level of Spo0A, and the Elowitz Lab has analyzed many of these
using genetic knockdowns and knockouts of various key factors in the networks. However, any analysis
of such mutations and their effects must necessarily use as readout the level of phosphorylation of
Spo0A. To this end, the Elowitz Lab had devised a relatively simple readout system to probe Spo0A
phosphorylation dynamics. Under this readout system, a promoter sensitive to the phosphorylated
Spo0A drives production of yellow fluorescent protein (YFP), while a constitutive promoter drives
production of green fluorescent protein (GFP) to serve as a control for the level of fluorescence and
promoter activity. The promoter activated by phosphorylated Spo0A is bound more often as the
concentration of Spo0A~P increases, and this leads to increased production of the fluorescent
readout. Since the concentration of fluorescent protein in each cell is constantly being diluted due
to cell growth and division, a dynamic model must then be used to extract the relative Spo0A~P level
as a function of time based on the overall fluorescence level.
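The source does not spell out the dynamic model used for this extraction. As a minimal sketch, assume a stable reporter whose concentration F obeys dF/dt = P(t) - mu F, where mu is the growth (dilution) rate and P(t) is the production rate driven by Spo0A~P. Under that assumption P(t) can be recovered from a fluorescence trace by adding the dilution term back to the time derivative. The function name, the growth rate, and the synthetic trace below are all hypothetical.

```python
import numpy as np

def promoter_activity(t, fluor, growth_rate):
    """Estimate production rate P(t) = dF/dt + mu * F from a concentration trace.

    Assumes a stable reporter that is lost only through dilution by growth.
    """
    t = np.asarray(t, dtype=float)
    fluor = np.asarray(fluor, dtype=float)
    dF_dt = np.gradient(fluor, t)        # finite-difference time derivative
    return dF_dt + growth_rate * fluor   # add back what dilution removed

# Synthetic check: constant production P_true gives F(t) = (P/mu) * (1 - exp(-mu t)).
mu = 0.02        # hypothetical growth rate, per minute
P_true = 5.0     # hypothetical production rate, arbitrary units per minute
t = np.linspace(0.0, 300.0, 601)
fluor = (P_true / mu) * (1.0 - np.exp(-mu * t))
P_est = promoter_activity(t, fluor, mu)
```

In this sketch the recovered rate is only as good as the derivative estimate, so noisy traces would need smoothing first. That dependence on a slowly accumulating signal is one way to see why a reporter of this kind has limited time resolution.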
This readout allows for comparative analyses of Spo0A~P levels, and even estimates of ratios between
different levels, but it does not provide any information about the actual number of phosphorylated
Spo0A transcription factors. Furthermore, the readout system does not respond linearly to Spo0A~P
concentration, as the binding of transcription factor to promoter follows a saturation curve.
Finally, the system has low time resolution, due to the discrepancy in time scales between binding
kinetics and transcription/translation.
Using this readout system, it was found that Spo0A phosphorylation is pulsed once every cell cycle in
the natural course of sporulation. This discovery implicated several potential mechanisms by which
these pulsed dynamics could enable cells to robustly defer sporulation. However, the lab was unable
to experimentally distinguish between these different mechanisms because of the low time resolution
of the readout system.
I have conducted research this summer to construct and employ a new readout system, also based on
time lapse microscopy, which addresses these issues and, in particular, introduces higher time
resolution. Since only phosphorylated Spo0A can bind DNA, my project investigated whether direct
observation of Spo0A~P DNA binding dynamics can give a more precise readout of Spo0A~P dynamics in
individual cells.
B. Binding Array Based Readout System
Under the modified system that I employed this summer, the chromosomal copy of the Spo0A gene was
replaced with a fusion protein of Spo0A and the red fluorescent protein mCherry. It has been shown
that this fusion protein does not interfere with the structural properties of native Spo0A, and in
particular does not inhibit the binding of Spo0A~P to the appropriate binding site. Furthermore, a
series of short identical DNA sequences, each isolated from the P_Spo0F promoter and capable of
binding only Spo0A~P, were inserted into the B. subtilis genome. The sequences together form a
binding array for Spo0A~P molecules. Since any Spo0A~P-mCherry molecules bound to this array are
localized within a small area, their fluorescent intensity is much more concentrated spatially than
the intensity of unbound Spo0A diffusing around the cell. As such, this modified readout system would
present concentrated spots of fluorescence, the intensities of which would correspond to the number
of Spo0A~P present in the cell.
In order to employ this novel readout system, several different aspects of the research process were
carried out simultaneously. We developed an algorithm for identifying spots of localized fluorescence
and quantifying the relative intensities of these spots as well as the background fluorescence in
each cell. Furthermore, a probabilistic model was developed in which these determined intensities
were used to estimate the fluorescences of single bound and unbound Spo0A molecules, and consequently
to estimate intracellular concentrations of Spo0A~P and Spo0A. We also completed three particular
subprojects in an effort to create the ideal experimental system. First, we utilized a system in
which cells and intracellular fluorescent spots could be observed under chemostatic conditions over
long time scales. Second, we developed a medium that would induce cells to sporulate, so as to study
sporulation consistently across a population of cells. Finally, we performed experiments to determine
the level of Spo0A for which the dynamic range for localized fluorescence corresponding to Spo0A~P is
maximized. All of these components will be used by continuing members of the Elowitz Lab to study the
fine timecourse of Spo0A~P dynamics during sporulation in B. subtilis.
II. METHODS AND RESULTS
Prior to discussing the work conducted this summer, we would like to relate the general method by
which cells were imaged. Typically, cells were imaged on agarose pads. These pads were prepared as
follows: an appropriate weight of agarose powder was first mixed into SMS solution by heating. If the
cells were to be imaged for extended lengths of time, this solution was then mixed with the desired
growth media. The final agarose solution was then applied uniformly on the surface of a cover slip
and sealed from above with another cover slip. Once the sealed solution had cooled, the cover slips
were removed and the agarose slab in between was sliced into smaller square pads. Cells were then
applied on the surface of these pads and, after allowing fifteen minutes for excess liquid to
evaporate, the pads were placed face down in petri dishes and imaged.
A. Identification and Quantification of Localized
Fluorescence
The problem that had to be solved in order to determine the intensities of spots of localized
fluorescence was actually twofold. First, given an image with many cells, we had to be able to
identify, within a given cell, what regions did or did not constitute a spot. Then, given such a
spot, an intensity value had to be assigned to the spot in a meaningful way.
Prior to this summer, a validation strain of B. subtilis was created in which a TetR-GFP fusion
protein was expressed constitutively, and in which each cell possessed at least one binding array for
this fusion protein. This strain was cultured simply because it afforded greater clarity in dot
identification than did the experimental strain involving Spo0A~P binding arrays. This clarity would
ensure an identification algorithm with parameters less susceptible to noise. Furthermore, since all
the fusion proteins in the validation strain were capable of binding to the arrays, we expected the
ratio of calculated spot fluorescence to background cellular fluorescence to be nearly constant from
cell to cell. It is for these two reasons that this strain was used to optimize the algorithm by
which spots were identified and their intensities quantified.
Initially, we identified the best conditions, with respect to growth media and progress along the
growth curve, in which to image this validation strain of B. subtilis so as to observe clear and
distinct spots. We sampled three types of growth media, namely SMS media with glucose, CH media, and
LB media. We also sampled cells hourly until they reached stationary phase. It was eventually found
that the cells grown in SMS media, two hours into the exponential phase, presented the most desirable
spots for imaging. Consequently, we prepared the validation strain in SMS media and after two hours
of growth imaged the strain repeatedly, both in brightfield and under a 395 nm laser for GFP
excitation, collecting data from several hundreds of cells. The resulting images were then used to
optimize a computational algorithm for spot detection and quantification.
The underlying structure of the algorithm is fairly straightforward. The motivating theme is to
select potential candidates for spots by some initial sieve, and then to identify which candidates
are true spots using machine learning. Boundaries of cells are determined from the brightfield images
using standard edge detection algorithms. Each cell is then handled in an iterative fashion, using
the portion of fluorescence data contained within the newly determined boundaries of the cell. First,
the maximum intensity pixel in the cell is selected. If this pixel is below some pre-specified
intensity threshold, then it is determined that all potential spots in the cell have been identified.
Otherwise, the pixels around this maximum intensity pixel are examined in order to determine the
point at which the difference in intensity between adjacent pixels falls below a preset threshold of
noise, marking the boundary of the spot. Using the pixels within the spot, a number of informative
features are calculated for the spot.
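The iterative sieve described above can be sketched as follows. This is an illustrative reading of the prose, not the implementation used in the study: the function name, the ring-based growth of the spot radius, the `max_radius` and `max_spots` limits, and the synthetic test image are all assumptions. The cell mask is taken as given (the edge detection step is not shown), and replaced pixels are filled with the mean of unclaimed pixels as a running background estimate.

```python
import numpy as np

def find_candidate_spots(img, mask, intensity_thresh, noise_thresh,
                         max_radius=6, max_spots=10):
    """Iteratively pick the brightest unclaimed pixel and grow a spot around it."""
    work = np.asarray(img, dtype=float).copy()
    mask = np.asarray(mask, dtype=bool)
    claimed = np.zeros(mask.shape, dtype=bool)
    yy, xx = np.indices(work.shape)
    spots = []
    for _ in range(max_spots):
        free = mask & ~claimed
        if not free.any():
            break
        vals = np.where(free, work, -np.inf)
        cy, cx = np.unravel_index(np.argmax(vals), vals.shape)
        if vals[cy, cx] < intensity_thresh:
            break  # no remaining pixel is bright enough to seed a spot
        dist = np.hypot(yy - cy, xx - cx)
        radius, prev = 0, work[cy, cx]
        for r in range(1, max_radius + 1):
            ring = mask & (dist > r - 1) & (dist <= r)
            if not ring.any():
                break
            cur = work[ring].mean()
            if prev - cur < noise_thresh:
                break  # intensity has flattened out: spot boundary reached
            radius, prev = r, cur
        region = mask & (dist <= radius)
        spots.append({"center": (int(cy), int(cx)), "radius": radius,
                      "total_intensity": float(work[region].sum()),
                      "peak_intensity": float(work[cy, cx])})
        claimed |= region
        rest = mask & ~claimed
        if rest.any():
            work[region] = work[rest].mean()  # running background estimate
    return spots

# Synthetic cell: flat background of 10 with one Gaussian spot at (16, 16).
yy, xx = np.indices((32, 32))
img = 10.0 + 100.0 * np.exp(-((yy - 16) ** 2 + (xx - 16) ** 2) / (2 * 1.5 ** 2))
spots = find_candidate_spots(img, np.ones_like(img, dtype=bool), 50.0, 2.0)
```

Each returned dictionary holds the raw quantities from which per-spot features can then be computed. Both thresholds would need tuning on real images.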
We decided to use as features the total intensity of the spot (i.e. the sum of pixel intensities for
all pixels within the spot, and the final assignment of spot intensity), the intensity of the central
pixel, the characteristic radius of the spot (i.e. the distance between the central pixel and the
boundary), the standard deviation of pixel intensities within the spot, and a measure of the
correlation between the pixels in the spot and a Gaussian distribution. Following the calculation of
these features, the intensities of the pixels in the potential spot are replaced by a running
estimate of the background intensity of the cell, based on an average over pixels that have not yet
been identified as being within a potential spot. The entire process is then repeated. Finally, we
provided annotations (i.e. whether a potential spot is truly a spot or not) for approximately 100
potential spots, and using these labels and the features of the associated potential spots, we used
Linear Discriminant Analysis to compute a general model for identifying real spots among potential
spots. The model indicated that Gaussian correlation was by far the greatest indicator of true
localization, with ten times greater predictive value than the second best indicator, the
characteristic radius. Both of these features showed positive correlation with true localization of
fluorescence.
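To make the classification step concrete, here is a minimal two-class Fisher discriminant written directly in NumPy. It is a sketch under stated assumptions: the two features (Gaussian correlation and characteristic radius), their synthetic distributions, the equal-prior decision threshold, and the function names are all invented for illustration and are not the study's data or code.

```python
import numpy as np

def fit_lda(X, y):
    """Two-class Fisher LDA: returns weights w and offset b for the rule X @ w + b > 0."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=bool)
    X0, X1 = X[~y], X[y]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class covariance (both classes assumed to share it).
    pooled = ((len(X0) - 1) * np.cov(X0, rowvar=False)
              + (len(X1) - 1) * np.cov(X1, rowvar=False)) / (len(X) - 2)
    w = np.linalg.solve(pooled, mu1 - mu0)
    b = -0.5 * w @ (mu0 + mu1)   # equal priors: boundary midway between class means
    return w, b

def predict_lda(X, w, b):
    return np.asarray(X, dtype=float) @ w + b > 0

# Hypothetical annotated candidates: columns are [gaussian_correlation, radius].
rng = np.random.default_rng(0)
true_spots = rng.normal(loc=[0.9, 3.0], scale=[0.05, 0.5], size=(50, 2))
false_spots = rng.normal(loc=[0.4, 2.0], scale=[0.15, 0.8], size=(50, 2))
X = np.vstack([true_spots, false_spots])
y = np.array([True] * 50 + [False] * 50)

w, b = fit_lda(X, y)
accuracy = np.mean(predict_lda(X, w, b) == y)
```

Because the features are on different scales, the raw weights in `w` are not directly comparable. Standardizing each feature before fitting would be needed to rank features by predictive value.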
Cross validation tests with the annotated data set indicate that this computed model has a very low
false negative rate, and a false positive rate of approximately 0.05. However, this is somewhat
misleading. It is conceivable that while this model has very good predictive value among spots within
cells of the validation strain, the features that are most important over a larger population of
cells, or more specifically over the cells of the experimental strain, might not be the same features
identified from the validation strain alone. Furthermore, Linear Discriminant Analysis assumes a
linear dependence on the features being used, and for our choice of features this assumption is quite
likely to be erroneous. As a simple example, consider that spots with either abnormally small or
abnormally large characteristic radii might in fact be nothing more than slight perturbations from
the background fluorescence caused by stochastic diffusion of the fusion proteins. This is a highly
nonlinear effect, and it arises in a feature identified as being particularly informative under the
linear approximation. Thus, as we proceed it would be worthwhile to consider employing nonlinear
machine learning algorithms. Furthermore, it would also be useful to iteratively optimize the weights
determined for each feature by employing the algorithm over mixed populations of cells, using the
output of the previous iteration as the starting point for the next iteration, and eventually
determining feature weights solely over the cells of the experimental strain.
B. Probabilistic Model for Determining Molecule
Counts
The fluorescent intensities computed in the model generated by Linear Discriminant Analysis, while
certainly informative, leave much to be desired on a number of fronts. While a difference in
intensity between different spots necessarily corresponds to a difference of the same sign between
the occupancies of the two corresponding binding arrays, these differences do not share a linear
relation, or an easily discernible relation for that matter. Furthermore, in some cells of both the
validation and experimental strains, the original array insertion process was so successful that not
one, but two arrays were transformed, and in these cells the distribution of total localized
fluorescence among the two corresponding spots offers additional information.
To extract this information, in particular the conversion between intensities and molecule counts, a
probabilistic model was developed to describe the system at the level of chemical kinetics and
statistical mechanics. We begin with a simplified model, in which we consider only the distribution
of molecules between the two binding arrays, assuming a fixed number of total bound molecules. We
then proceed to consider both bound and unbound molecules, considering at this stage what information
might not be available to us at present. Finally, we discuss how to infer the fluorescence of a single
fusion protein from these models. We leave the mathematical details of these models for the
appendices, including only the results of our calculations here.
In considering the simplified model, let there be $N_B$ copies of the fusion protein X bound in total
and $M$ binding sites for X in each of two binding arrays. Note that $2M \geq N_B$. We assume no
cooperativity in the binding of different sites, and that each binding site is identical and has
equal probability of being bound. Label the two binding arrays 1 and 2, and let $N_i$ be the number of
X bound to array $i$. Since $N_2 = N_B - N_1$, only $N_1$ is needed to characterize the macrostate of
the system. From these assumptions, we calculated that (writing $N$ for $N_B$)

\[ \langle N_1 \rangle = \frac{N}{2} \]

\[ \langle N_1^2 \rangle = \frac{N}{2}\left[ 1 + \frac{(N-1)(M-1)}{2M-1} \right] \]

\[ \sqrt{\left\langle \left( \frac{N_1 - N_2}{2} \right)^2 \right\rangle} = \sigma_{N_1}
   = \sqrt{\langle N_1^2 \rangle - \langle N_1 \rangle^2}
   = \sqrt{\frac{N(2M-N)}{4(2M-1)}} \]
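These closed forms can be checked against a direct enumeration of the distribution of N_1, in which n of the N bound molecules sit on array 1 with probability C(M, n) C(M, N - n) / C(2M, N). The sketch below does this with exact integer binomials. The function names are illustrative, M = 256 is taken from the array size quoted in the abstract, and N = 100 is an arbitrary choice.

```python
from math import comb, isclose, sqrt

def n1_distribution(M, N):
    """P(N_1 = n) when N molecules occupy 2M equivalent sites, M per array."""
    total = comb(2 * M, N)
    lo, hi = max(0, N - M), min(M, N)
    return {n: comb(M, n) * comb(M, N - n) / total for n in range(lo, hi + 1)}

def n1_moments(M, N):
    """Mean, second moment and standard deviation of N_1 by enumeration."""
    p = n1_distribution(M, N)
    mean = sum(n * q for n, q in p.items())
    second = sum(n * n * q for n, q in p.items())
    return mean, second, sqrt(second - mean ** 2)

M, N = 256, 100   # N = 100 bound molecules is a hypothetical value
mean, second, sd = n1_moments(M, N)

# Closed-form expressions from the text.
mean_formula = N / 2
second_formula = (N / 2) * (1 + (N - 1) * (M - 1) / (2 * M - 1))
sd_formula = sqrt(N * (2 * M - N) / (4 * (2 * M - 1)))
```

The standard deviation is largest at N = M and vanishes at N = 0 and N = 2M, where every configuration places the same number of molecules on each array.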
Now we extend our analysis to unbound X. Let $N$ be the total number of copies of X present, $L$ the
number of free lattice sites available in the cytoplasm for X, $E$ the energy change due to a single
molecule of X binding to a binding site on either array, $T$ the temperature of the solution in
Kelvin, and $x$ the Boltzmann factor $e^{-E/kT}$. From these assumptions and statistical mechanics,
we calculated the averages and variances of $N_1$ and $N_B$ under this model. However, due to the lack
of information regarding the parameters $E$ and $L$ at present, the results cannot be used directly
for the inference of single molecule fluorescent intensity. Furthermore, neither one can individually
be inferred from the data, though a joint function of both parameters might be inferred.
Thus, in order to infer $\lambda$, the fluorescent intensity of a single X, we must rely solely on
the distribution of intensity between the two spots within cells, and not on the background
fluorescence levels. For some arbitrary numbering over the cells from which data was collected, let
the values of $N$, $N_B$, $N_1$, $N_2$ for cell $i$ be $N_i$, $N_{i,B}$, $N_{i,1}$, $N_{i,2}$.
Furthermore, let the actual fluorescence measurements from each of these cells be denoted by
replacing the $N$ in the corresponding molecule count with $Y$. Thus, for any set of subscripts $j$,
we have that $Y_j = \lambda N_j$. Furthermore, define $F = \lambda M$. Given these definitions and the
model we have described, we pick as our estimate of $\lambda$ the value of $\lambda$ for which
$p(\lambda \mid \{Y_{i,1}, Y_{i,2}\}_i)$ is maximized. This distribution can be determined using
Bayes' Law and a uniform but restricted prior $p(\lambda)$ over possible values of the
proportionality constant $\lambda$. Carrying out this calculation gives

\[ \lambda \approx 2F \left\langle \frac{(Y_{i,1} - Y_{i,2})^2}{Y_i \,(2F - Y_i)} \right\rangle \]
All that remains, then, is to determine the value of $F$ in this expression. A heuristic method which
we found to be surprisingly effective was to just note that $\sigma_{N_1}$ for fixed $N_B$ is
maximized when $N_B = M$, and thus select $F$ as the value of $Y_B$ for which $\sigma_{Y_1}$ is
maximized. We are currently in the process of solving the joint inference problem for $F$ and
$\lambda$ simultaneously.
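The two steps above can be illustrated on simulated two-array cells. The sketch below is not the analysis code used in the study. It writes the single-molecule intensity as `lam`, draws the split between arrays from the hypergeometric distribution of the simplified model, and ignores measurement noise. The parameter values, function names, bin count, and the use of the half-difference (Y_1 - Y_2)/2 as the per-bin spread are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
M, lam_true, n_cells = 256, 3.0, 50_000          # hypothetical values

n_bound = rng.integers(1, 2 * M, size=n_cells)   # N_B drawn from 1 .. 2M-1
n1 = rng.hypergeometric(M, M, n_bound)           # N_1 given N_B (simplified model)
y1 = lam_true * n1                               # noiseless spot intensities
y2 = lam_true * (n_bound - n1)

def estimate_lambda(y1, y2, F):
    """Average of 2F (Y1 - Y2)^2 / (Y (2F - Y)) over cells, with Y = Y1 + Y2."""
    y = y1 + y2
    return 2.0 * F * np.mean((y1 - y2) ** 2 / (y * (2.0 * F - y)))

def estimate_F(y1, y2, n_bins=16):
    """Heuristic: total intensity at which the spread between the two spots peaks."""
    y = y1 + y2
    edges = np.linspace(y.min(), y.max(), n_bins + 1)
    idx = np.clip(np.digitize(y, edges) - 1, 0, n_bins - 1)
    half_diff = 0.5 * (y1 - y2)                  # equals Y1 - Y/2 within each cell
    spread = np.array([half_diff[idx == k].std() for k in range(n_bins)])
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers[np.argmax(spread)]

F_true = lam_true * M
lam_hat = estimate_lambda(y1, y2, F_true)        # F assumed known for this step
F_hat = estimate_F(y1, y2)
```

Because the variance curve is flat near its maximum, the binned argmax locates F only coarsely. That coarseness is one motivation for inferring F and the single-molecule intensity jointly rather than in sequence.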
C. Experimental Optimization
It is crucial that the proper conditions are maintained for growth of B. subtilis during experiments.
In particular, we decided to conduct all experiments in a bacterial microfluidic chemostat. While
prior experiments had been performed in non-chemostatic environments, these experiments presented
many difficulties. Firstly, we had been unable to track the growth of the B. subtilis cells for more
than ten generations; indeed, while we are very interested in the dynamics of Spo0A phosphorylation
over long time intervals, the large number of progeny generated from ten generations of division
simply frustrated our attempts to measure these dynamics. Furthermore, it is effectively impossible
to vary environmental conditions in a controlled way without using a chemostat.
In light of these advantages, we opted to use a chemostatic device in which only one lineage of B.
subtilis cells is actively maintained. Such a device was recently developed by the Jun lab at
Harvard. Their work utilized a device in which a small 1 µm wide growth channel abuts a larger trench
through which a chemostatic nutrient solution is flowed. The growth channel initially contains a
single cell. As this cell divides, its progeny are pushed out from the growth channel. At the same
time, the mother cell, remaining in the growth channel, is exposed to a chemostatic solution via
diffusion of nutrients from the much larger adjacent trench. In this way, then, only one lineage of
the cells is retained, and this lineage is kept in a chemostatic environment which can be controlled
and altered at will. We utilized this device, termed the mother machine, to study Spo0A
phosphorylation dynamics in B. subtilis cells. We manufactured this device and optimized its design
parameters by making liberal use of the Kavli Nanoscience Institute's (KNI) cleanroom facilities. In
particular, the different mixtures of photoresist used, the speed at which the photoresist is spun,
and finally the amount of time for which the photoresist is exposed to ultraviolet light affect both
the height and the definition of the channels in the device. As such, by iterating over these
parameters in a systematic fashion and assuming a linear dependence of both height and definition on
these parameters, we optimized channel height and channel definition.
Concurrently, we worked to develop a conditioned media solution to induce sporulation in B. subtilis
cells as follows. First, wild type B. subtilis grown in nutrient rich media were isolated by
centrifugation and resuspended in nutrient deprived resuspension media. After some period of time,
the cells were filtered out of the resuspension media solution, and what remained was taken to be the
conditioned media. In an effort to test the efficacy of the conditioned media, we grew the cells,
either in liquid or solid phase, in four conditions, namely without glucose, with glucose, in
conditioned media without glucose, and in conditioned media with glucose. For each test, we expected
the first group to sporulate, albeit slowly, the second group not to sporulate, and the third group
to sporulate relatively quickly. Sporulation of the fourth group would have constituted conclusive
validation of the efficacy of the conditioned media, but a lack of sporulation would not have been
conclusive. We optimized over these different preparation protocols by varying the initial growth
media, the stage of growth at which cells were removed from this media, and the amount of time spent
in resuspension media before filtration. We found that the most effective conditioned media was
produced by taking cells grown to optical density 1 in CH media and culturing them in resuspension
media for 2 hours. This media was subsequently tested on cells within the mother machine, and
successfully induced sporulation in this setting.
Finally, we also attempted to optimize the dynamic range of Spo0A~P visualization by varying Spo0A
production. To do this, we replaced the Spo0A~P inducible promoter controlling production of Spo0A
with a xylose inducible promoter. We then proceeded to image cells on media pads treated with xylose,
under 587 nm laser excitation to visualize mCherry signal and determine the level of xylose induction
required to achieve the greatest ratio between localized fluorescence and background fluorescence.
After much experimentation with pre-imaging growth routines, we finally determined that the basal
rate of activity of the xylose inducible promoter produced the greatest dynamic range of Spo0A~P
visualization. Thus, as we proceed we will likely need to choose a more tightly regulated promoter to
control Spo0A production.
In conclusion, then, we were able to optimize a number of components required to analyze Spo0A~P
dynamics during sporulation in B. subtilis. We optimized the design for a microfluidic system for
analyzing individual cell lineages and produced the optimized device. We also developed a protocol
for producing conditioned media that induces sporulation in B. subtilis. Furthermore, we determined
the level of induction of Spo0A production required for maximum dynamic range of Spo0A~P
visualization. Finally, we developed an algorithm for identifying and quantifying spots of localized
fluorescence in cells, as well as a probabilistic model for determining actual molecule counts for
Spo0A and Spo0A~P from these quantified fluorescence levels. As we move forward, we intend to replace
the xylose inducible promoter with an IPTG inducible promoter for tighter regulation, and to
implement the dual inference of both binding array size and single molecule fluorescence intensity.
Finally, with all of these components now available to us, we wish to examine the sporulation system
and answer two particular questions. We will determine bounds on the time scale of Spo0A
phosphopulses, and we will determine whether the amplitude of these pulses has any effect on the
overall timescale of sporulation.
Acknowledgments
The author would like to thank Joseph H. Levine for
his patience and guidance throughout the research pro-
cess and to thank Maria Hernandez for constructing all
of the strains used in experiments this summer. Fur-
thermore, the author is indebted to Professor Michael
B. Elowitz for allowing him the privilege to perform re-
search in the Elowitz Lab at the California Institute of
Technology.
[1] Grossman, A.D., Genetic networks controlling the initia-
tion of sporulation and the development of genetic com-
petence in Bacillus subtilis. Annu Rev Genet, 1995. 29: p.
477-508.
[2] Molle, V., et al. The Spo0A regulon of Bacillus subtilis.
Mol Microbiol, 2003. 50(5): p. 1683-1701
[3] Newport, J. and M. Kirschner, A major developmental
transition in early Xenopus embryos: I. characterization
and timing of cellular changes at the midblastula stage.
Cell, 1982. 30(3): p. 675-86.
[4] Raff, M., Intracellular developmental timers. Cold Spring
Harb Symp Quant Biol, 2007. 72: p. 431-5.
[5] Sonenshein, A.L., Control of sporulation initiation in
Bacillus subtilis. Curr Opin Microbiol, 2000. 3(6): p. 561-6.
[6] Wang P., Robert L., Pelletier J., Dang W.L., Taddei F.,
Wright A., Jun S. (2010). Robust Growth of Escherichia
coli. Current Biology 20, 1099-1103.
[7] Waters, C.M. and B.L. Bassler, Quorum sensing: cell to
cell communication in bacteria. Annu Rev Cell Dev Biol,
2005. 21: p. 319-46.

Figure 1. Shown above is a schematic diagram of experimental system. Each image on the left depicts a
molecular scenario, while the corresponding image on the right illustrates the observable fluorescence
pattern associated with the molecular scenario. When the fusion proteins are bound, their fluorescence
is much more localized and intense.


Figure 2. Shown to the left is a
schematic of the mother machine
followed by a depiction of the way in
which the device maintains single cell
lineages. As can be observed, there is
one inlet channel and one outlet
channel connecting to the central
trench, which supplies nutrients to
and draws excess cells from the side
channels.



Figure 5. Gillespie Simulations were used to simulate the two array system, producing the following data
(labeled in blue) for some fixed value of the single-molecule fluorescence intensity $\lambda$. The values
labeled in red denote the standard deviation of all data points with $Y_T$ within $10^4$ of the denoted
$Y_T$ value. These depict the process by which $Y_M = F$ was determined as the value of $Y_T$ for which
this standard deviation is maximized. Finally, the black curve above depicts the expected value of the
standard deviation in single array fluorescence.
Figure 3. Induction of Sporulation
by Conditioned Media in the
Mother Machine
Figure 4. Visualization of Spo0A~P
localized fluorescence during xylose
induction of Spo0A production.
Appendix: Probabilistic Model
1 Binomial Treatment
The problem statement here will be as follows. Let there be $N_B$ copies of the fusion protein X bound in
total and $M$ binding sites for X in each of two binding arrays. Note that $2M \geq N_B$. We assume no
cooperativity in the binding of different sites, and that each binding site is identical and has equal
probability of being bound. Label the two binding arrays 1 and 2, and let $N_i$ be the number of X bound to
array $i$. Since $N_2 = N_B - N_1$, only $N_1$ is needed to characterize the macrostate of the system. Now,
there exist $2M$ binding sites and $N$ indistinguishable copies of X (here $N = N_B$), and so there exist
$\binom{2M}{N}$ total binding configurations. Of these, $\binom{M}{n}\binom{M}{N-n}$ configurations have
$N_1 = n$ copies of X bound to array 1. Thus, the probability that $N_1 = n$ is given by

\[ P(N_1 = n) = \frac{\binom{M}{n}\binom{M}{N-n}}{\binom{2M}{N}}. \]
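Let us consider now the following function $f(x, y) = (1 + xy)^M (1 + y)^M$. Expanding this expression gives

\[ f(x, y) = \left( \sum_n \binom{M}{n} x^n y^n \right) \left( \sum_n \binom{M}{n} y^n \right)
   = \sum_{N,n} \binom{M}{n} x^n y^n \binom{M}{N-n} y^{N-n}
   = \sum_{N,n} \binom{M}{n} \binom{M}{N-n} x^n y^N \]

The coefficient of $y^N$ in this polynomial (denoted by $[y^N](f(x, y))$) is $\binom{2M}{N}$ times the
generating function for our probability distribution. From this, we have that

\[ \binom{2M}{N} \langle N_1 \rangle = \sum_n n \binom{M}{n} \binom{M}{N-n}
   = [y^N] \left( \sum_n n \binom{M}{n} \binom{M}{N-n} x^n y^N \right)_{x=1}
   = [y^N] \left( x \frac{\partial}{\partial x} f(x, y) \right)_{x=1} \]
\[ = [y^N] \left( M x y (1 + xy)^{M-1} (1 + y)^M \right)_{x=1}
   = [y^{N-1}] \left( M (1 + y)^{2M-1} \right)
   = M \binom{2M-1}{N-1} \]

\[ \binom{2M}{N} \langle N_1^2 \rangle = \sum_n n^2 \binom{M}{n} \binom{M}{N-n}
   = [y^N] \left( \sum_n n^2 \binom{M}{n} \binom{M}{N-n} x^n y^N \right)_{x=1}
   = [y^N] \left( x \frac{\partial}{\partial x} \, x \frac{\partial}{\partial x} f(x, y) \right)_{x=1} \]
\[ = [y^N] \left( M x y (1 + xy)^{M-1} (1 + y)^M + M(M-1) x^2 y^2 (1 + xy)^{M-2} (1 + y)^M \right)_{x=1} \]
\[ = [y^{N-1}] \left( M (1 + y)^{2M-1} \right) + [y^{N-2}] \left( M(M-1) (1 + y)^{2M-2} \right)
   = M \binom{2M-1}{N-1} + M(M-1) \binom{2M-2}{N-2} \]

Dividing by $\binom{2M}{N}$ gives

\[ \langle N_1 \rangle = \frac{N}{2} \]

\[ \langle N_1^2 \rangle = \frac{N}{2} \left[ 1 + \frac{(N-1)(M-1)}{2M-1} \right] \]

\[ \sqrt{\left\langle \frac{(N_1 - N_2)^2}{4} \right\rangle} = \sigma_{N_1}
   = \sqrt{\langle N_1^2 \rangle - \langle N_1 \rangle^2}
   = \sqrt{\frac{N(2M-N)}{4(2M-1)}} \]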
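As a sanity check on these results, the smallest nontrivial case can be worked by hand. The choice $M = 2$, $N = 2$ below is purely illustrative: four sites in total, two of them occupied.

```latex
% Worked check for M = 2, N = 2 (illustrative values, not from the experiments)
\[
P(N_1 = n) = \frac{\binom{2}{n}\binom{2}{2-n}}{\binom{4}{2}}
\quad\Rightarrow\quad
P(0) = \tfrac{1}{6}, \qquad P(1) = \tfrac{4}{6}, \qquad P(2) = \tfrac{1}{6}
\]
\[
\langle N_1 \rangle = 0 \cdot \tfrac{1}{6} + 1 \cdot \tfrac{4}{6} + 2 \cdot \tfrac{1}{6} = 1 = \frac{N}{2},
\qquad
\langle N_1^2 \rangle = 1 \cdot \tfrac{4}{6} + 4 \cdot \tfrac{1}{6} = \tfrac{4}{3}
  = \frac{N}{2}\left[ 1 + \frac{(N-1)(M-1)}{2M-1} \right]
\]
\[
\sigma_{N_1}^2 = \tfrac{4}{3} - 1 = \tfrac{1}{3} = \frac{N(2M-N)}{4(2M-1)}
\]
```

All three closed forms agree with the direct enumeration in this case.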
2 Generalized Treatment
Consider now a more general system, in which two binding arrays for protein X and a number of copies of
protein X are present in the cytoplasm. Let N be the total number of copies of X present, N
B
the number of
X molecules bound to either array, M the number of binding sites in each array, N
1
and N
2
the numbers of
X molecules bound to the rst and second array respectively, L the number of free lattice sites available in
the cytoplasm for X, E the energy change due to a single molecule of X binding to a binding site on either
array, T the temperature of the solution in Kelvin, and x = e
E
kT
.
Now, for any given value of $N_B$, the treatment in the previous section gives the expected statistics for $N_1$. In particular, for any analytic function $g(N_1)$, the treatment in the previous section allows us to determine an analytic function $h(N_B) = \langle g(N_1)\rangle$, where the average is taken over all possible $N_1$ for fixed $N_B$. In the case of variable $N_B$, as we are discussing here, $\langle g(N_1)\rangle = \langle h(N_B)\rangle$, where both averages are taken over all possible configurations of the system. Thus, in this section we need only determine averages of the form $\langle h(N_B)\rangle$ for analytic $h$. Now, from statistical mechanics we have that the probability of obtaining a particular $N_B$ is given by
\[
\frac{1}{Z}\binom{L}{N-N_B}\binom{2M}{N_B}x^{N_B},
\]
where the normalization factor is
\[
Z = \sum_{N_B}\binom{L}{N-N_B}\binom{2M}{N_B}x^{N_B}.
\]
Employing the same argument as used in the first section, we obtain
\[
Z = [w^N]\left\{(1+w)^{L}(1+xw)^{2M}\right\}
= \binom{L}{N}\sum_{i=0}^{\infty}\frac{(2M)_i\,(N)_i\,x^i}{i!\,(L-N+1)^{(i)}},
\]
where $(a)_i = a(a-1)\cdots(a-i+1)$ and $a^{(i)} = a(a+1)\cdots(a+i-1)$ are the falling and rising factorials respectively. Here we introduce our first assumption, namely that $L > N$. The cytoplasm does not, of course, actually behave as a lattice with a finite number of sites, but even in a model such as ours, in which the cytoplasmic space is discretized, it is implausible that the number of copies of X present in any functioning cell would exceed the total number of cytoplasmic sites available. Given this assumption, every term in this series is well defined. Furthermore, the series must eventually terminate, as $(a)_i = 0$ for $i \ge a+1$.
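The equality between the direct sum over $N_B$ and the factorial series can be checked with a short Python sketch using exact rational arithmetic. The helper names (`falling`, `rising`, `z_direct`, `z_series`) and the parameter values are assumptions chosen for illustration; they are small enough to run instantly and are not physiological.

```python
from fractions import Fraction
from math import comb, factorial


def falling(a, i):
    """Falling factorial (a)_i = a (a-1) ... (a-i+1)."""
    out = 1
    for j in range(i):
        out *= a - j
    return out


def rising(a, i):
    """Rising factorial a^(i) = a (a+1) ... (a+i-1)."""
    out = 1
    for j in range(i):
        out *= a + j
    return out


def z_direct(L, M, N, x):
    """Z as the sum over N_B of C(L, N-N_B) C(2M, N_B) x^N_B."""
    return sum(comb(L, N - k) * comb(2 * M, k) * x ** k
               for k in range(min(N, 2 * M) + 1))


def z_series(L, M, N, x):
    """Z as C(L, N) times the factorial series; requires L > N.
    Terms with i > min(N, 2M) vanish, so the sum is truncated there."""
    series = sum(
        Fraction(falling(2 * M, i) * falling(N, i),
                 factorial(i) * rising(L - N + 1, i)) * x ** i
        for i in range(min(N, 2 * M) + 1)
    )
    return comb(L, N) * series


# Illustrative parameters only (assumption).
L, M, N, x = 50, 3, 4, Fraction(3, 2)
print(z_direct(L, M, N, x) == z_series(L, M, N, x))
```

Truncating at $i = \min(N, 2M)$ reflects the termination property noted above, since one of the two falling factorials is zero for every later term.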
Using exact methods this calculation can be taken no further, as this series does not yield a closed form (it is in fact a hypergeometric function). Thus, to proceed we must impose further assumptions and examine different regimes of behavior. We assume, then, that $L \gg 2M, N$. Again, this is a reasonable assumption because of the vast and continuous nature of the cytoplasm in comparison to the size of individual proteins. Under this assumption, we have that $(L-N+1)^{(i)} \approx L^{i}$. Finally, we shall consider two different regimes of behavior, namely $N \ll 2M$ and $N \gg 2M$. In the former case, our expression reduces to
\[
Z \approx \binom{L}{N}\left(1+\frac{2Mx}{L}\right)^{N},
\]
while in the latter case it reduces to
\[
Z \approx \binom{L}{N}\left(1+\frac{Nx}{L}\right)^{2M}.
\]
From these two expressions, we can calculate the quantities of interest in each of the two regimes. In particular, we now have $\langle h(N_B)\rangle = \frac{1}{Z}\,h\!\left(x\frac{d}{dx}\right)Z$. Applying this in conjunction with the expressions obtained in the first section gives
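\[
\langle N_B\rangle =
\begin{cases}
\dfrac{2MNx}{L+2Mx}, & \text{if } N \ll 2M \\[2ex]
\dfrac{2MNx}{L+Nx}, & \text{if } N \gg 2M
\end{cases}
\]
\[
\langle N_B^2\rangle =
\begin{cases}
\dfrac{2MNx}{L+2Mx} + \dfrac{(2Mx)^2\,N(N-1)}{(L+2Mx)^2}, & \text{if } N \ll 2M \\[2ex]
\dfrac{2MNx}{L+Nx} + \dfrac{(Nx)^2\,(2M)(2M-1)}{(L+Nx)^2}, & \text{if } N \gg 2M
\end{cases}
\]
\[
\sigma_{N_B}^2 =
\begin{cases}
\dfrac{2MNx}{L+2Mx} - \dfrac{(2Mx)^2\,N}{(L+2Mx)^2}, & \text{if } N \ll 2M \\[2ex]
\dfrac{2MNx}{L+Nx} - \dfrac{(Nx)^2\,(2M)}{(L+Nx)^2}, & \text{if } N \gg 2M
\end{cases}
\]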
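\[
\langle N_1\rangle =
\begin{cases}
\dfrac{MNx}{L+2Mx}, & \text{if } N \ll 2M \\[2ex]
\dfrac{MNx}{L+Nx}, & \text{if } N \gg 2M
\end{cases}
\]
\[
\langle N_1^2\rangle =
\begin{cases}
\dfrac{MNx}{L+2Mx} + \dfrac{2N(N-1)(Mx)^2}{(2M-1)(L+2Mx)^2}, & \text{if } N \ll 2M \\[2ex]
\dfrac{MNx}{L+Nx} + \dfrac{M(Nx)^2}{(L+Nx)^2}, & \text{if } N \gg 2M
\end{cases}
\]
\[
\sigma_{N_1}^2 = \left\langle\left(\frac{N_1-N_2}{2}\right)^{2}\right\rangle =
\begin{cases}
\dfrac{MNx}{L+2Mx} - \dfrac{(Mx)^2}{(L+2Mx)^2}\left(\dfrac{2M-3}{2M-1}N^2 + \dfrac{2}{2M-1}N\right), & \text{if } N \ll 2M \\[2ex]
\dfrac{MNx\,\bigl(L-(M-2)Nx\bigr)}{(L+Nx)^2}, & \text{if } N \gg 2M
\end{cases}
\]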
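To give a feel for how accurate the two limiting forms of $\langle N_B\rangle$ are, the Python sketch below compares them with the exact mean computed from the weights $\binom{L}{N-N_B}\binom{2M}{N_B}x^{N_B}$. The weights are built from the ratio of consecutive terms to avoid very large binomial coefficients. The function names, the crude rule used to select a regime, and all parameter values are assumptions made for illustration only.

```python
def mean_bound_exact(L, M, N, x):
    """Exact <N_B> under weights C(L, N-k) C(2M, k) x^k, k = 0..min(N, 2M)."""
    weights = [1.0]  # relative weight of k = 0
    for k in range(min(N, 2 * M)):
        # Ratio of the (k+1)-th weight to the k-th weight.
        ratio = (N - k) / (L - N + k + 1) * (2 * M - k) / (k + 1) * x
        weights.append(weights[-1] * ratio)
    z = sum(weights)
    return sum(k * w for k, w in enumerate(weights)) / z


def mean_bound_approx(L, M, N, x):
    """Limiting forms of <N_B>; picking the regime by N < 2M is a crude
    stand-in for N << 2M versus N >> 2M (assumption)."""
    if N < 2 * M:
        return 2 * M * N * x / (L + 2 * M * x)
    return 2 * M * N * x / (L + N * x)


# Illustrative parameter sets (assumptions), both with L >> 2M, N.
few_molecules = dict(L=10**6, M=128, N=5, x=2000.0)    # N << 2M
many_molecules = dict(L=10**6, M=4, N=2000, x=500.0)   # N >> 2M

for params in (few_molecules, many_molecules):
    exact = mean_bound_exact(**params)
    approx = mean_bound_approx(**params)
    print(params, exact, approx)
```

For parameter sets like these the two values agree to within a few percent; the agreement degrades as $N$ approaches $2M$, where neither limiting form applies.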
3 Inference of Proportionality Constants
We now consider a system in which each copy of protein X is replaced with a fusion protein, X-GFP. Whether bound to an array or floating free in the cytoplasm, each X molecule will produce a fluorescence signal with amplitude $\nu$. Only measurements of fluorescence can be taken on this system, but if $\nu$ were known, these fluorescence measurements could be converted into exact molecule counts. We will use the probabilistic variations, described by the quantities calculated in the previous two sections, to calculate $\nu$. However, since the parameter values for $L$ and $E$ in the generalized treatment are not known, we will only perform the inference using the binomial model.
3.1 Independent Dependencies
We would like to select the value of $\nu$ such that the probability $p(\nu \mid d) \propto p(d \mid \nu)$ is maximized for dataset $d$. Suppose each cell in our data set is labeled with an index $i$. Then for cell $i$, the data points $Y_{i,1}, Y_{i,2}$ are collected, corresponding to the fluorescence measurements from the first and second binding arrays respectively. Furthermore, $Y_{i,1} = \nu N_{i,1}$ and $Y_{i,2} = \nu N_{i,2}$ by definition, where $N_{i,1}, N_{i,2}$ are the numbers of X molecules present within cell $i$ and bound to array 1 and array 2 respectively. Now, in section 1 we calculated the first two moments of $N_1$ as a function of $N_B$. For a sufficiently large number of cells, the central limit theorem allows us to asymptotically determine that
\[
p(Y_{i,1} \mid Y_{i,2}, \nu) \propto
\sqrt{\frac{4F}{\nu\,(Y_{i,1}+Y_{i,2})\bigl(2F-(Y_{i,1}+Y_{i,2})\bigr)}}\;
\exp\!\left(-\frac{4F\bigl(Y_{i,1}-\tfrac{1}{2}(Y_{i,1}+Y_{i,2})\bigr)^{2}}{\nu\,(Y_{i,1}+Y_{i,2})\bigl(2F-(Y_{i,1}+Y_{i,2})\bigr)}\right)
\]
Furthermore, since each cell is independent, we have that $p(\{Y_{i,1}\}_i \mid \nu, \{Y_{i,2}\}_i) = \prod_i p(Y_{i,1} \mid \nu, Y_{i,2})$. Since maximizing this distribution with respect to $\nu$ is equivalent to maximizing its logarithm with respect to $\nu$, we differentiate $\ln p$ with respect to $\nu$ and solve for the $\nu$ at which the resulting expression is 0. This gives the result established in the main text, namely that
\[
\nu = 2F\left\langle\frac{(Y_{i,1}-Y_{i,2})^{2}}{Y_i\,(2F-Y_i)}\right\rangle,
\]
where $Y_i = Y_{i,1}+Y_{i,2}$ and the average is taken over cells.
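The behavior of this estimator can be illustrated on synthetic data. The Python sketch below draws per-cell occupancies from the binomial model of section 1, converts them to fluorescence with a known amplitude, and then applies the estimator. Several choices here are assumptions rather than statements from the text: $F$ is taken to be the fluorescence of one fully occupied array, $F = \nu M$, and is set from the simulated ground truth (how $F$ would be obtained experimentally is not specified in this section); the spread of occupancies per cell is arbitrary; and the names `simulate_cells` and `estimate_nu` are hypothetical.

```python
import random


def simulate_cells(num_cells, M, nu, rng):
    """Return a list of (Y1, Y2) pairs, one per simulated cell."""
    data = []
    for _ in range(num_cells):
        # Arbitrary spread of bound-molecule counts (assumption); it keeps
        # 0 < N_B < 2M so the estimator's denominator never vanishes.
        n_bound = rng.randint(20, 2 * M - 20)
        # Choose occupied sites uniformly; sites 0..M-1 belong to array 1.
        occupied = rng.sample(range(2 * M), n_bound)
        n1 = sum(1 for site in occupied if site < M)
        data.append((nu * n1, nu * (n_bound - n1)))
    return data


def estimate_nu(data, F):
    """Estimator nu = 2F <(Y1 - Y2)^2 / (Y (2F - Y))>, with Y = Y1 + Y2."""
    total = 0.0
    for y1, y2 in data:
        y = y1 + y2
        total += (y1 - y2) ** 2 / (y * (2 * F - y))
    return 2 * F * total / len(data)


rng = random.Random(0)   # fixed seed so the run is repeatable
M, nu_true = 128, 3.0    # illustrative values only (assumption)
cells = simulate_cells(5000, M, nu_true, rng)
nu_hat = estimate_nu(cells, F=nu_true * M)
print(nu_true, nu_hat)
```

Under these assumptions the estimate should land close to the simulated amplitude, with scatter that shrinks as the number of cells grows.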