032227
INVESTIGATIONS
ABSTRACT Efficient and accurate protein synthesis is crucial for organismal survival in competitive
environments. Translation efficiency (the number of proteins translated from a single mRNA in a given
time period) is the combined result of differential translation initiation, elongation, and termination
rates. Previous research identified the Shine-Dalgarno (SD) sequence as a modulator of translation
initiation in bacterial genes, while codon usage biases are frequently implicated as a primary
determinant of elongation rate variation. Recent studies have suggested that SD sequences within coding
sequences may negatively affect translation elongation speed, but this claim remains controversial.
Here, we present a metric to quantify the prevalence of SD sequences in coding regions. We analyze
hundreds of bacterial genomes and find that the coding sequences of highly expressed genes
systematically contain fewer SD sequences than expected, yielding a robust correlation between the
normalized occurrence of SD sites and protein abundances across a range of bacterial taxa. We further
show that depletion of SD sequences within ribosomal protein genes is correlated with organismal growth
rates, supporting the hypothesis of strong selection against the presence of these sequences in coding
regions and suggesting their association with translation efficiency in bacteria.
KEYWORDS Translation initiation; Gene expression; Growth regulation
...
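[Figure: schematic of an SD site on the mRNA (5' XXXAGGAGGXXXXXXAUGXXX 3') pairing with the 3' anti-SD sequence (UCCUCC) of the 30S subunit. Free energy of 30S binding with the aSD, ΔG (kcal/mol), is evaluated along the mRNA: ΔG1 ... ΔG5 ... ΔGn.]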
[Figure residue: labels "Affinity", "R²adj", and "aSD binding score, S".]
[Figure 3 panels: histograms of Frequency versus z-score. (A) n = 175 below zero, n = 12 above. (B) n = 172 below zero, n = 15 above.]
Figure 3 Depletion of SD occurrence in genomes compared to expectation from 1000 randomly generated genomes using our codon-shuffled null model. (A) The canonical SD sequence AGGAGG is depleted within coding sequences in most genomes (175 of 187) and (B) the genome aSD binding score S_genome is lower for most organisms (172 of 187). Both distributions are centered significantly to the left of zero, showing that the majority of organisms avoid SD sequences according to both metrics.
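The z-score comparison against a codon-shuffled null, as described in the Figure 3 caption, can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the shuffling scheme here simply permutes codon order within each gene while holding the first and last codons fixed, which is an assumption, and the function names (`count_motif`, `shuffle_codons`, `sd_zscore`) are hypothetical.

```python
import math
import random
import statistics

SD_MOTIF = "AGGAGG"  # canonical SD sequence named in the Figure 3 caption


def count_motif(seq, motif=SD_MOTIF):
    """Count (possibly overlapping) occurrences of motif in seq."""
    return sum(seq.startswith(motif, i) for i in range(len(seq) - len(motif) + 1))


def shuffle_codons(cds, rng):
    """Permute the internal codons of one coding sequence.

    Assumption: codon order is shuffled within the gene, keeping the first
    and last codons in place. This preserves the gene's codon usage but not
    its amino acid sequence; the source may use a different scheme.
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    if len(codons) < 4:  # nothing meaningful to permute
        return cds
    inner = codons[1:-1]
    rng.shuffle(inner)
    return "".join([codons[0], *inner, codons[-1]])


def sd_zscore(genes, n_null=1000, seed=0):
    """z-score of the observed motif count against n_null shuffled genomes."""
    rng = random.Random(seed)
    observed = sum(count_motif(g) for g in genes)
    null = [
        sum(count_motif(shuffle_codons(g, rng)) for g in genes)
        for _ in range(n_null)
    ]
    sigma = statistics.pstdev(null)
    if sigma == 0:  # null distribution has no spread; z is undefined
        return math.nan
    return (observed - statistics.mean(null)) / sigma
```

Under this sketch, a negative return value from `sd_zscore` corresponds to a genome lying to the left of zero in the Figure 3 histograms, that is, one containing fewer AGGAGG sites than its shuffled counterparts.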
[Figure 5 panel: histogram of Frequency versus R²adj(NC' + S) - R²adj(NC'); n = 3 below zero, n = 23 above.]
Figure 5 Shine-Dalgarno sequence depletion is correlated with protein abundances in a diverse set of bacterial taxa. Distribution of differences between the R²adj for models which do and do not contain the S score. For 23 of the 26 organisms, inclusion of aSD binding score as an independent variable enhances predictive power. The full data table including organism names and values is available in Supplementary Table 2.
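The quantity plotted in Figure 5, the difference in adjusted R² between a model with and without the S score, can be illustrated with a small ordinary least-squares sketch. The exact regression specification is not given in this section, so the choice of a linear model with an intercept, the argument names (`abundance`, `nc_prime`, `s_score`), and the helper names are all assumptions for illustration.

```python
def ols_r2_adj(y, predictors):
    """Adjusted R^2 of a least-squares fit of y on the predictor columns.

    An intercept is always included. Collinear predictors are not handled
    (the elimination step would divide by zero).
    """
    n, p = len(y), len(predictors)
    k = p + 1  # number of coefficients, including the intercept
    X = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    # Augmented normal equations: (X^T X | X^T y)
    A = [
        [sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
        + [sum(X[r][i] * y[r] for r in range(n))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    y_hat = [sum(b * x for b, x in zip(beta, row)) for row in X]
    y_bar = sum(y) / n
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    r2 = 1.0 - ss_res / ss_tot
    # Standard adjustment for the number of predictors
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def delta_r2_adj(abundance, nc_prime, s_score):
    """R2adj(NC' + S) - R2adj(NC'): gain from adding the aSD binding score."""
    with_s = ols_r2_adj(abundance, [nc_prime, s_score])
    without_s = ols_r2_adj(abundance, [nc_prime])
    return with_s - without_s
```

A positive `delta_r2_adj` for an organism corresponds to the right-hand side of the Figure 5 histogram, where including the S score improves the fit even after the penalty for the extra predictor.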
[Figure residue: regions labeled "SD depleted in RPs" and "SD enriched in RPs"; E. coli marked.]
BSD