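\[
\langle N_1^2 \rangle = \frac{N}{2}\left(1 + \frac{(N-1)(M-1)}{2M-1}\right)
\]
\[
\left\langle \left(\frac{N_1 - N_2}{2}\right)^2 \right\rangle = \sigma_{N_1}^2 = \langle N_1^2 \rangle - \langle N_1 \rangle^2 = \frac{N(2M-N)}{4(2M-1)}
\]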
Now we extend our analysis to unbound X. Let $N$ be the total number of copies of X present, $L$ the number of free lattice sites available in the cytoplasm for X, $E$ the energy change due to a single molecule of X binding to a binding site on either array, $T$ the temperature of the solution in Kelvin, and $x$ the Boltzmann factor $e^{-E/kT}$. From these assumptions and statistical mechanics, we calculated the averages and variances of $N_1$ and $N_B$ under this model. However, due to the lack of information regarding the parameters $E$ and $L$ at present, the results cannot be used directly for the inference of single molecule fluorescent intensity. Furthermore, neither parameter can individually be inferred from the data, though a joint function of both parameters might be inferred.
Thus, in order to infer $\nu$, the fluorescent intensity of a single X, we must rely solely on the distribution of intensity between the two spots within cells, and not on the background fluorescence levels. For some arbitrary numbering over the cells from which data were collected, let the values of $N$, $N_B$, $N_1$, $N_2$ for cell $i$ be $N_i$, $N_{i,B}$, $N_{i,1}$, $N_{i,2}$. Furthermore, let the actual fluorescence measurements from each of these cells be denoted by replacing the $N$ in the corresponding molecule count with $Y$. Thus, for any set of subscripts $j$, we have that $Y_j = \nu N_j$. Furthermore, define $F = \nu M$. Given these definitions and the model we have described, we pick as our estimate of $\nu$ the value for which $p(\nu \mid Y_{i,1}, Y_{i,2}\ \forall i)$ is maximized. This distribution can be determined using Bayes' law and a uniform but restricted prior $p(\nu)$ over possible values of the proportionality constant $\nu$. Carrying out this calculation gives
\[
\nu \approx 2F \left\langle \frac{(Y_{i,1} - Y_{i,2})^2}{Y_i\,(2F - Y_i)} \right\rangle
\]
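\[
\begin{aligned}
f(x,y) = (1+xy)^M (1+y)^M
&= \left(\sum_n \binom{M}{n} x^n y^n\right)\left(\sum_n \binom{M}{n} y^n\right) \\
&= \sum_N \sum_n \binom{M}{n} x^n y^n \binom{M}{N-n} y^{N-n} \\
&= \sum_N \sum_n \binom{M}{n}\binom{M}{N-n} x^n y^N
\end{aligned}
\]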
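The coefficient identity behind this expansion can be checked by brute force. The following is a minimal sketch, not part of the original derivation: it expands $(1+xy)^M(1+y)^M$ as a dictionary of coefficients and compares each coefficient of $x^n y^N$ with $\binom{M}{n}\binom{M}{N-n}$. The helper names `poly_mul`, `expand_f` and `matches_binomial_form` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from math import comb


def poly_mul(p, q):
    """Multiply two bivariate polynomials stored as {(x_power, y_power): coeff}."""
    out = defaultdict(int)
    for (a, b), c in p.items():
        for (d, e), g in q.items():
            out[(a + d, b + e)] += c * g
    return dict(out)


def expand_f(M):
    """Expand f(x, y) = (1 + x*y)**M * (1 + y)**M by repeated multiplication."""
    result = {(0, 0): 1}
    for _ in range(M):
        result = poly_mul(result, {(0, 0): 1, (1, 1): 1})  # factor (1 + xy)
    for _ in range(M):
        result = poly_mul(result, {(0, 0): 1, (0, 1): 1})  # factor (1 + y)
    return result


def matches_binomial_form(M):
    """Check that the coefficient of x^n y^N equals C(M, n) * C(M, N - n)."""
    coeffs = expand_f(M)
    for N in range(2 * M + 1):
        for n in range(N + 1):
            expected = comb(M, n) * comb(M, N - n)
            if coeffs.get((n, N), 0) != expected:
                return False
    return True


if __name__ == "__main__":
    print(matches_binomial_form(4))
```

Setting $x = 1$ and summing the coefficients of $y^N$ over $n$ recovers $\binom{2M}{N}$, which is the normalization used in the next step.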
The coefficient of $y^N$ in this polynomial (denoted by $[y^N](f(x,y))$) is $\binom{2M}{N}$ times the generating function for our probability distribution. From this, we have that
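\[
\begin{aligned}
\binom{2M}{N}\langle N_1 \rangle
&= \sum_n n \binom{M}{n}\binom{M}{N-n}
 = [y^N]\left(\sum_n n \binom{M}{n}\binom{M}{N-n} x^n y^N\right)_{x=1}
 = [y^N]\left(x\frac{\partial}{\partial x} f(x,y)\right)_{x=1} \\
&= [y^N]\left(Mxy(1+xy)^{M-1}(1+y)^M\right)_{x=1}
 = [y^{N-1}]\left(M(1+y)^{2M-1}\right)
 = M\binom{2M-1}{N-1}
\end{aligned}
\]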
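\[
\begin{aligned}
\binom{2M}{N}\langle N_1^2 \rangle
&= \sum_n n^2 \binom{M}{n}\binom{M}{N-n}
 = [y^N]\left(\sum_n n^2 \binom{M}{n}\binom{M}{N-n} x^n y^N\right)_{x=1}
 = [y^N]\left(x\frac{\partial}{\partial x}\, x\frac{\partial}{\partial x} f(x,y)\right)_{x=1} \\
&= [y^N]\left(Mxy(1+xy)^{M-1}(1+y)^M + M(M-1)x^2y^2(1+xy)^{M-2}(1+y)^M\right)_{x=1} \\
&= [y^{N-1}]\left(M(1+y)^{2M-1}\right) + [y^{N-2}]\left(M(M-1)(1+y)^{2M-2}\right) \\
&= M\binom{2M-1}{N-1} + M(M-1)\binom{2M-2}{N-2}
\end{aligned}
\]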
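\[
\langle N_1 \rangle = \frac{N}{2}, \qquad
\langle N_1^2 \rangle = \frac{N}{2}\left(1 + \frac{(N-1)(M-1)}{2M-1}\right)
\]
\[
\left\langle \frac{(N_1 - N_2)^2}{4} \right\rangle = \sigma_{N_1}^2 = \langle N_1^2 \rangle - \langle N_1 \rangle^2 = \frac{N(2M-N)}{4(2M-1)}
\]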
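These closed forms can be compared against direct enumeration. The sketch below is an illustrative check, assuming only that $N$ molecules occupy the $2M$ sites with weight $\binom{M}{n}\binom{M}{N-n}$ for $N_1 = n$, as in the generating function argument. The function names `n1_moments` and `closed_forms` are hypothetical.

```python
from fractions import Fraction
from math import comb


def n1_moments(M, N):
    """Exact <N_1> and <N_1^2> from the weights C(M, n) * C(M, N - n)."""
    total = comb(2 * M, N)
    weights = [comb(M, n) * comb(M, N - n) for n in range(N + 1)]
    mean = Fraction(sum(n * w for n, w in enumerate(weights)), total)
    second = Fraction(sum(n * n * w for n, w in enumerate(weights)), total)
    return mean, second


def closed_forms(M, N):
    """Closed-form mean, second moment and variance of N_1 from the text."""
    mean = Fraction(N, 2)
    second = Fraction(N, 2) * (1 + Fraction((N - 1) * (M - 1), 2 * M - 1))
    variance = Fraction(N * (2 * M - N), 4 * (2 * M - 1))
    return mean, second, variance


if __name__ == "__main__":
    M, N = 5, 4
    mean, second = n1_moments(M, N)
    print(mean, second, second - mean ** 2)
    print(closed_forms(M, N))
```

Exact rational arithmetic is used so that agreement is an equality rather than a floating-point comparison.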
2 Generalized Treatment
Consider now a more general system, in which two binding arrays for protein X and a number of copies of protein X are present in the cytoplasm. Let $N$ be the total number of copies of X present, $N_B$ the number of X molecules bound to either array, $M$ the number of binding sites in each array, $N_1$ and $N_2$ the numbers of X molecules bound to the first and second array respectively, $L$ the number of free lattice sites available in the cytoplasm for X, $E$ the energy change due to a single molecule of X binding to a binding site on either array, $T$ the temperature of the solution in Kelvin, and $x = e^{-E/kT}$.
Now, for any given value of $N_B$, the treatment in the previous section gives the expected statistics for $N_1$. In particular, for any analytic function $g(N_1)$, the treatment in the previous section allows us to determine an analytic function $h(N_B) = \langle g(N_1) \rangle$, where the average is taken over all possible $N_1$ for fixed $N_B$. In the case of variable $N_B$ then, as we are discussing here, $\langle g(N_1) \rangle = \langle h(N_B) \rangle$, where both averages are taken over all possible configurations of the system. Thus, we need only consider in this section determining averages of the nature $\langle h(N_B) \rangle$, for analytic $h$. Now, from statistical mechanics we have that the probability of obtaining a particular $N_B$ is given by
$\frac{1}{Z}\binom{L}{N-N_B}\binom{2M}{N_B}x^{N_B}$, where the normalization factor is $Z = \sum_{N_B}\binom{L}{N-N_B}\binom{2M}{N_B}x^{N_B}$.
Employing the same argument as used in the first section, we obtain that
\[
Z = [w^N]\left((1+w)^L (1+xw)^{2M}\right) = \binom{L}{N}\sum_{i=0}^{\infty} \frac{(2M)_i\,(N)_i\,x^i}{i!\,(L-N+1)^{(i)}},
\]
where $(a)_i = a(a-1)\cdots(a-i+1)$ and $a^{(i)} = a(a+1)\cdots(a+i-1)$ are the falling and rising factorials respectively. Here we introduce our first assumption, namely that $L > N$. We must note, of course, that the cytoplasm does not actually behave as a lattice with a finite number of sites, but even in such a model as ours for which the cytoplasmic space is discretized, it is ludicrous to suggest that the number of copies of X present in any functioning cell would outnumber the total number of cytoplasmic sites available. Given this assumption, then, it is clear that every term in this series is well defined. Furthermore, the series must eventually terminate, as $(a)_i = 0$ for $i \geq a + 1$.
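The equality between the direct sum for $Z$ and its factorial-series form can be verified exactly for small parameters. This is a minimal sketch under the assumption that each bound configuration carries the weight $x^{N_B}$; the helper names (`falling`, `rising`, `z_direct`, `z_series`) are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def falling(a, i):
    """Falling factorial (a)_i = a (a-1) ... (a-i+1)."""
    out = 1
    for k in range(i):
        out *= a - k
    return out


def rising(a, i):
    """Rising factorial a^(i) = a (a+1) ... (a+i-1)."""
    out = 1
    for k in range(i):
        out *= a + k
    return out


def z_direct(L, M, N, x):
    """Z as a sum over N_B of C(L, N - N_B) * C(2M, N_B) * x**N_B."""
    return sum(comb(L, N - k) * comb(2 * M, k) * x ** k
               for k in range(min(N, 2 * M) + 1))


def z_series(L, M, N, x):
    """Z as C(L, N) times the terminating factorial series (requires L >= N)."""
    terms = (Fraction(falling(2 * M, i) * falling(N, i),
                      factorial(i) * rising(L - N + 1, i)) * x ** i
             for i in range(min(N, 2 * M) + 1))
    return comb(L, N) * sum(terms)


if __name__ == "__main__":
    x = Fraction(3, 2)
    print(z_direct(30, 3, 4, x) == z_series(30, 3, 4, x))
```

The series is truncated at $\min(N, 2M)$ because the falling factorials vanish beyond that index, which is the termination property noted above.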
Using exact methods this calculation can be taken no further, as this series does not yield a closed form (it is in fact a hypergeometric function). Thus, to proceed we must enforce further assumptions and examine different regimes of behavior. We assume, then, that $L \gg 2M, N$. Again, this is a reasonable assumption because of the vast and continuous nature of the cytoplasm in comparison to the size of individual proteins. Under this assumption, we have that $(L-N+1)^{(i)} \approx L^i$. Finally, we shall consider two different regimes of behavior, namely when $N \ll 2M$ and when $N \gg 2M$. In the former case, our expression reduces to $Z \approx \binom{L}{N}\left(1 + \frac{2Mx}{L}\right)^{N}$, while in the latter case, our expression reduces to $Z \approx \binom{L}{N}\left(1 + \frac{Nx}{L}\right)^{2M}$. From these two expressions, we can calculate the quantities of interest in each of these two regimes. In particular, we have now that $\langle h(N_B) \rangle = \frac{1}{Z}\,h\!\left(x\frac{d}{dx}\right)Z$. Applying this in conjunction with the expressions obtained in the first section gives
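\[
\langle N_B \rangle =
\begin{cases}
\dfrac{2MNx}{L+2Mx}, & \text{if } N \ll 2M \\[2ex]
\dfrac{2MNx}{L+Nx}, & \text{if } N \gg 2M
\end{cases}
\]
\[
\langle N_B^2 \rangle =
\begin{cases}
\dfrac{2MNx}{L+2Mx} + \dfrac{(2Mx)^2 N(N-1)}{(L+2Mx)^2}, & \text{if } N \ll 2M \\[2ex]
\dfrac{2MNx}{L+Nx} + \dfrac{(Nx)^2 (2M)(2M-1)}{(L+Nx)^2}, & \text{if } N \gg 2M
\end{cases}
\]
\[
\sigma_{N_B}^2 =
\begin{cases}
\dfrac{2MNx}{L+2Mx} - \dfrac{(2Mx)^2 N}{(L+2Mx)^2}, & \text{if } N \ll 2M \\[2ex]
\dfrac{2MNx}{L+Nx} - \dfrac{(Nx)^2 (2M)}{(L+Nx)^2}, & \text{if } N \gg 2M
\end{cases}
\]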
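To get a feel for how good the two regime approximations are, the exact mean and variance of $N_B$ can be computed from the terminating series and compared with the closed forms. The sketch below is illustrative only: the parameter values are arbitrary choices satisfying $L \gg 2M, N$, and the function names are hypothetical.

```python
def exact_nb_moments(L, M, N, x):
    """Mean and variance of N_B from the exact weights, built via term ratios.

    The weight of N_B = k is proportional to C(L, N - k) * C(2M, k) * x**k,
    so consecutive weights differ by the ratio used below.
    """
    weights = [1.0]
    for k in range(min(N, 2 * M)):
        ratio = (2 * M - k) * (N - k) * x / ((k + 1) * (L - N + 1 + k))
        weights.append(weights[-1] * ratio)
    z = sum(weights)
    mean = sum(k * w for k, w in enumerate(weights)) / z
    second = sum(k * k * w for k, w in enumerate(weights)) / z
    return mean, second - mean ** 2


def approx_nb_moments(L, M, N, x):
    """Regime approximations for the mean and variance of N_B."""
    if N < 2 * M:  # treated here as the N << 2M regime
        mean = 2 * M * N * x / (L + 2 * M * x)
        var = mean - (2 * M * x) ** 2 * N / (L + 2 * M * x) ** 2
    else:          # treated here as the N >> 2M regime
        mean = 2 * M * N * x / (L + N * x)
        var = mean - (N * x) ** 2 * (2 * M) / (L + N * x) ** 2
    return mean, var


if __name__ == "__main__":
    for params in [(10**6, 500, 5, 50.0), (10**7, 2, 2000, 1000.0)]:
        print(params, exact_nb_moments(*params), approx_nb_moments(*params))
```

The simple `N < 2 * M` switch is only a convenience for the example; the approximations are expected to be accurate only when the two quantities differ by a large factor.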
\[
\langle N_1 \rangle =
\begin{cases}
\dfrac{MNx}{L+2Mx}, & \text{if } N \ll 2M \\[2ex]
\dfrac{MNx}{L+Nx}, & \text{if } N \gg 2M
\end{cases}
\]
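\[
\langle N_1^2 \rangle =
\begin{cases}
\dfrac{MNx}{L+2Mx} + \dfrac{2(M-1)N(N-1)(Mx)^2}{(2M-1)(L+2Mx)^2}, & \text{if } N \ll 2M \\[2ex]
\dfrac{MNx}{L+Nx} + \dfrac{M(M-1)(Nx)^2}{(L+Nx)^2}, & \text{if } N \gg 2M
\end{cases}
\]
\[
\sigma_{N_1}^2 = \langle N_1^2 \rangle - \langle N_1 \rangle^2 =
\begin{cases}
\dfrac{MNx}{L+2Mx} - \dfrac{(Mx)^2}{(L+2Mx)^2}\left(\dfrac{1}{2M-1}N^2 + \dfrac{2(M-1)}{2M-1}N\right), & \text{if } N \ll 2M \\[2ex]
\dfrac{MNLx}{(L+Nx)^2}, & \text{if } N \gg 2M
\end{cases}
\]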
3 Inference of Proportionality Constants
We now consider a system in which each copy of protein X is replaced with a fusion protein, X-GFP. Now, whether bound to an array or floating free in the cytoplasm, each X molecule will produce a fluorescence signal with amplitude $\nu$. Only measurements of fluorescence can be taken on this system, but if $\nu$ were known, these fluorescence measurements could be converted into exact molecule counts. We will use the probabilistic variations, described by the quantities calculated in the previous two sections, to calculate $\nu$. However, since the parameter values for $L$, $E$ in the generalized treatment are not known, we will only perform the inference using the binomial model.
3.1 Independent Dependencies
We would like to select the value of $\nu$ such that the probability $p(\nu \mid d) \propto p(d \mid \nu)$ is maximized for dataset $d$. Suppose each cell in our data set is labeled with an index $i$. Then for cell $i$, the data points $Y_{i,1}, Y_{i,2}$ are collected, corresponding to the fluorescence measurements from the first and second binding arrays respectively. Furthermore, $Y_{i,1} = \nu N_{i,1}$, $Y_{i,2} = \nu N_{i,2}$ by definition, where $N_{i,1}, N_{i,2}$ are the number of X molecules present within cell $i$ and bound to array 1 and array 2 respectively. Now, in section 1 we calculated the first two moments of $N_1$ as a function of $N_B$. For a sufficiently large number of cells, the central limit theorem allows us to asymptotically determine that
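\[
p(Y_{i,1} \mid Y_{i,2}, \nu) \approx
\sqrt{\frac{4F}{\pi\nu\,(Y_{i,1}+Y_{i,2})\,\bigl(2F-(Y_{i,1}+Y_{i,2})\bigr)}}\;
\exp\!\left(-\frac{4F\left(Y_{i,1}-\tfrac{1}{2}(Y_{i,1}+Y_{i,2})\right)^2}{\nu\,(Y_{i,1}+Y_{i,2})\,\bigl(2F-(Y_{i,1}+Y_{i,2})\bigr)}\right)
\]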
Furthermore, since each cell is independent, we have that $p(Y_{i,1}\ \forall i \mid \nu, Y_{i,2}\ \forall i) = \prod_i p(Y_{i,1} \mid \nu, Y_{i,2})$. Since maximizing this distribution with respect to $\nu$ is equivalent to maximizing the logarithm of this distribution with respect to $\nu$, we differentiate $\ln p$ with respect to $\nu$ and solve for $\nu$ such that the resulting expression is 0. This gives the result established in the main text, namely that
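\[
\nu \approx 2F \left\langle \frac{(Y_{i,1} - Y_{i,2})^2}{Y_i\,(2F - Y_i)} \right\rangle,
\]
where $Y_i = Y_{i,1} + Y_{i,2}$ and the average is taken over cells $i$.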
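The behavior of this estimator can be explored with a small simulation. The sketch below is not the authors' procedure: it assumes the binomial model of section 1 (a fixed number of bound molecules placed uniformly at random on the $2M$ sites), treats $F = \nu M$ as known, and uses arbitrary illustrative values for $\nu$, $M$ and the number of cells. The function names are hypothetical. It also evaluates the Gaussian log-likelihood so that the closed-form estimate can be compared with nearby values of $\nu$.

```python
import math
import random


def simulate_cells(nu, M, n_cells, rng):
    """Simulate (Y_1, Y_2) pairs: occupied sites are drawn uniformly from 2M."""
    cells = []
    for _ in range(n_cells):
        # Hypothetical spread of bound molecule counts, kept away from 0 and 2M.
        n_bound = rng.randint(M // 5, 2 * M - M // 5)
        occupied = rng.sample(range(2 * M), n_bound)
        n1 = sum(1 for site in occupied if site < M)  # sites on array 1
        cells.append((nu * n1, nu * (n_bound - n1)))
    return cells


def estimate_nu(cells, F):
    """Closed-form estimate: 2F times the cell average of (Y1-Y2)^2 / (Y (2F-Y))."""
    ratios = []
    for y1, y2 in cells:
        y = y1 + y2
        ratios.append((y1 - y2) ** 2 / (y * (2 * F - y)))
    return 2 * F * sum(ratios) / len(ratios)


def log_likelihood(nu, cells, F):
    """Sum over cells of the log of the Gaussian approximation to p(Y1 | Y2, nu)."""
    total = 0.0
    for y1, y2 in cells:
        y = y1 + y2
        scale = nu * y * (2 * F - y)
        total += 0.5 * math.log(4 * F / (math.pi * scale))
        total -= 4 * F * (y1 - 0.5 * y) ** 2 / scale
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    true_nu, M = 3.0, 50
    cells = simulate_cells(true_nu, M, 5000, rng)
    print(estimate_nu(cells, true_nu * M))
```

Because the likelihood replaces $2M - 1$ by $2M$, the estimate is expected to sit slightly above the true value, by a factor of roughly $2M/(2M-1)$, in addition to sampling noise.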