Académique Documents
Professionnel Documents
Culture Documents
DOI 10.1007/s12013-013-9638-0
ORIGINAL PAPER
The Author(s) 2013. This article is published with open access at Springerlink.com
Introduction
The origin of protein aggregation and amyloid formation is
poorly understood for intrinsically disordered proteins
(IDPs) that do not have a fixed three-dimensional structure
in physiological conditions. Some IDPs are resistant to
protein aggregation while others are directly involved in
amyloid formation [1]. Similarly, some IDPs can have a
fixed structure under some physiological conditions, for
example, when interacting with other molecules (folders)
while others are so-called nonfolders that do not fold into a
unique structure under any known conditions [2]. What
makes some IDPs foldable or aggregation prone is an open
123
Results
Defining Semi-disorder
SPINE-D was trained by a large database of 4,229
(4,157 ? 72) non-redundant proteins with 90 % ordered
residues and 10 % disordered residues [12]. This unbalanced dataset led to a threshold of 0.06 for predicted
probability when optimized for the highest accuracy. That
is, residues are assigned as ordered if the predicted probability is less than 0.06 and residues are assigned as disordered if the predicted disorder probability is greater than
0.06. However, for a two-state classification, a perfect
threshold should be at a probability of 0.5 when there is an
123
(A)
# of Residues
30000
20000
20000
DX4080
SL477
Control
10000
10000
(B)
0
0.04
0.08
0.12
0.16
# of Residues
30000
20000
DX4080
SL477
Control
10000
(C)
# of Segments
0
1000
800
600
DX4080
SL477
Control
400
200
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
Disorder Probability
Fig. 1 The distribution of disorder probability predicted by SPINE-D
at residue level before (a) and after scaling (b) and at long segment
level ([30 amino acid residues) (c) for three datasets (DX4080,
SL477, and Control703). The insert in (a) shows the fine detail around
the disorder probability of 0.06. The negative control set (stable
monomeric proteins) does not have a peak for fully disordered
residues or regions, indicating the usefulness of separating semidisorder from full disorder
0.8
0.6
0.4
0.2
Ordered
Semi-disordered
Fully Disordered
0
0.2
0.4
0.6
0.8
123
123
Disorder Probability
1meyG
20
40
60
80
20
40
60
80
20
40
60
80
1ohhH
1qqp4
1urqA
10
20
1ytvN
30
40
50
60
40
60
80
2pxbA
20
40
60
80
100
3f5hB
10
20
30
40
50
60
3k29A
50
100
150
Residue Index
that fold upon binding. The dataset was divided into long
(28 complexes) and short (46 complexes) according to the
size of disordered regions (30 residues). In this dataset,
each residue is annotated as either in binding (positive) or
non-binding (negative) regions. To examine if a residue in
a semi-disordered state is a potential binding residue, we
define true positive (TP) if an annotated binding residue is
predicted as semi-disorder, true negative (TN) if a nonbinding residues is predicted as non-semi-disorder, false
positive (FP) if a non-binding residue is predicted as semidisorder, and false negative (FN) if a binding residue is
predicted as non-semi-disorder. This allows us to calculate
sensitivity [TP/(TP ? FN)], specificity [TN/(TN ? FP)],
and Matthews correlation coefficient MCC TP TN
p
FP FN = TP FPTP FNTN FPTN FN
without any training. Here, we assess the performance on
the residue level, rather than on the region level to avoid
the difficulty of defining true/false negatives/positives at
the region level without introducing additional parameters.
Table 1 compares the results of SPINE-D with those
from ANCHOR [22] and MoRFpred [23], two recently
developed techniques that were trained to predict binding
in disordered regions. The accuracy of all three methods is
low with the average sensitivity and specificity (balanced
accuracy) between 56 and 72 %. MoRFpred, trained for the
short dataset, has the highest MCC value of 0.29 for the
short dataset while SPINE-D has the highest MCC value of
0.15 for the long dataset. The result confirms a weak but
positive association between a semi-disordered state and
the binding-induced folding region, for binding residues in
long disordered regions, in particular.
Semi-disorder and Protein Aggregation: Illustrative
Examples
The connection between semi-disorder and bindinginduced folding also suggests the potential role of semidisorder in protein aggregation because protein aggregation
can be viewed as folding coupled with self-association.
Here, we started with several known aggregation-prone
proteins to examine if there is a connection between semidisorder and aggregation.
Table 1 Predicting binding residues in short and long disordered regions (the short and long ANCHOR set) by ANCHOR, MoRFpred, and the
semi-disorder from SPINE-D
Short disordered region
Method
Sensitivity
Specificity
MCC
Sensitivity
Specificity
MCC
ANCHOR
0.64
0.71
0.14
0.45
0.66
0.06
MoRFpreda
0.50
0.94
0.29
0.17
0.94
0.10
SPINE-D
0.32
0.80
0.05
0.42
0.81
0.15
123
0.6
0.95
fQ
0.5
0.9
0.4
0.85
0.3
0.8
0.2
0.75
0.1
0.7
10
20
30
40
50
60
70
80
90
0.7
0.65
100
Number of Qs
Fig. 5 Transition of the polyglutamine tract of huntingtin from a fully
disordered to a partially semi-disordered state. Fraction of glutamines
(Qs) in a semi-disordered state (fQ in red) and the average disorder
probability (P, in blue) in the poly-Q region as a function of the number
of glutamines in the poly-Q tract of huntingtin (Color figure online)
123
Yeast Sup35
The overlap between amyloidogenic and semi-disordered
regions in alpha-synuclein is further observed for amyloidogenic yeast Sup35. In Fig. 6b, the disorder probability
profile for Sup35 predicted by SPINE-D is compared with
1
0.8
0.6
0.4
0.2
0
(A) alpha-synuclein
N-Terminal
0
1
0.8
0.6
0.4
0.2
0
20
1
0.8
0.6
0.4
0.2
0
NAC
40
60
80
100
120
140
(B) Sup35
0
1
0.8
0.6
0.4
0.2
0
C-Terminal
50
100
150
200
250
(C) SOD1
20
40
60
80
100
120
140
1
0.8
0.6
0.4
0.2
0
1
0.8
0.6
0.4
0.2
0
1
0.8
0.6
0.4
0.2
0
Disorder Probability
Huntingtin
20
40
60
80
100
120
Residue Index
0.8
Disorder Probability
Acylphosphate
0.6
0.4
0.2
Structured Region
0
50
100
Residue Index
Fig. 7 The disorder probability profile of acylphosphate from
hyperthermophilic archaeon Sulfolobus solfataricus (Sso AcP) predicted from SPINE-D (red line). The semi-disordered residues from 1
to 12 at the N-terminal after removing terminal effect agree with the
unstructured region for the first 12 residues from the NMR experiment
(PDB #1Y9O) (blue) (Color figure online)
123
the semi-disordered state (Crsd Crfd )/Crfd is highly correlated with amyloid aggregation propensity. As shown in
Fig. 8, the correlation coefficient is 0.86 without Pro and
four charged residues (Arg, Asp, Glu, and Lys) and 0.74 for
all residues. The highest enrichment of a residue in a semidisordered region over the fully disordered region is 185 %
for the strongest aggregation-prone residue Trp and more
than 100 % for the second and third strongest aggregationprone residues Phe and Cys. This strong positive correlation
supports the capability of semi-disordered regions to promote aggregation. Changing from the semi-disordered state
to the ordered one continues to enrich residues with high
amyloid aggregation propensity but with a much smaller
enrichment factor (36 % for Trp, 52 % for Phe, and 41 %
for Cys). The correlation coefficient is 0.79 for all 20 residue
types and 0.87 without Pro and charged residues. Thus, only
the fully disordered state is aggregation-resistant. Both
ordered and semi-disordered regions can participate in
aggregation as demonstrated in Figs. 6 and 7.
Discussion
Table 2 Predicting residues in aggregation regions for 12 proteins in the AmyPDB dataset and 4 proteins from Fig. 6
Method
Sensitivity
Specificity
MCC
Fold-amyloid
0.16
0.81
-0.03
-0.01
Aggrescan
0.23
0.76
0.14 (0.54)
0.38 (0.84)
0.93 (0.66)
0.73 (0.32)
0.01 (0.18)
0.11 (0.15)
Two options in Waltz server were used: best performance and high sensitivity (in parentheses)
Results from SPINE-D are obtained by employing predicted semi-disordered residues in aggregation regions. The numbers in parentheses are
resulted from assigning both ordered and semi-disordered residues (00.7) as aggregation prone
123
fd
(C -C )/C
sd
(C -C )/C
1.5
Trp
fd
sd
Cys
Ile
Phe
Tyr
Leu
0.5
Val
Asp
Arg
Pro
Glu Lys
Asn
Gln
Ser
Enrichment
Depletion
Thr
Ala
Gly
Met
His
-0.5
-12
-8
-4
123
# of Residues
(A) 15000
SPINE-D
Disopred2
IU-long
MD
MFDp
Dispro
10000
SPINE-D
5000
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
Disorder Probability
(B)
Disorder Probability
MFDp
Expt.
SPINE-D
0.8
IUlong
0.6
MD
0.4
Disopred2
0.2
Dispro
0
50
100
150
200
250
Residue Index
Fig. 9 (a) Distributions of predicted disorder probabilities for five online servers (Disopred2, IU-long, MD, MFDp, and Dispro) in addition
to SPINE-D as labeled. (b) Disorder probability of yeast Sup 35
predicted by the above five methods in addition to SPINE-D as labeled.
No methods except SPINE-D (in red) separated a collapsed N-terminal
region and an extended C-terminal region in agreement with the
experimental Cys accessibility data (in black) (Color figure online)
123
References
1. Tompa, P. (2002). Intrinsically unstructured proteins. Trends in
Biochemical Sciences, 27, 527533.
2. Rauscher, S., & Pomes, R. (2010). Molecular simulations of
protein disorder. Biochemistry and Cell Biology, 88, 269290.
123
123
43. Maurer-Stroh, S., et al. (2010). Exploring the sequence determinants of amyloid structure using position-specific scoring matrices. Nature Methods, 7, 855.
44. Conchillo-Sole, O., et al. (2007). AGGRESCAN: A server for the
prediction and evaluation of hot spots of aggregation in
polypeptides. BMC Bioinformatics, 8, 65.
45. Thompson, M. J., et al. (2006). The 3D profile method for
identifying fibril-forming segments of proteins. Proceedings of
the National Academy of Sciences United States of America, 103,
40744078.
46. Pawar, A. P., et al. (2005). Prediction of aggregation-prone and
aggregation-susceptible regions in proteins associated with
neurodegenerative diseases. Journal of Molecular Biology, 350,
379392.
47. Linding, R., Schymkowitz, J., Rousseau, F., Diella, F., & Serrano,
L. (2004). A comparative study of the relationship between
protein structure and beta-aggregation in globular and intrinsically disordered proteins. Journal of Molecular Biology, 342,
345353.
48. Mohan, A., et al. (2006). Analysis of molecular recognition features (MoRFS). Journal of Molecular Biology, 362, 10431059.
49. Oldfield, C. J., et al. (2005). Coupled folding and binding with
alpha-helix-forming molecular recognition elements. Biochemistry United States of America, 44, 1245412470.
50. Garner, E., Romero, P., Dunker, A. K., Brown, C., & Obradovic, Z.
(1999). Predicting binding regions within disordered proteins.
Genome Informatics Workshop on Genome Informatics, 10, 4150.
51. Zhou, Y. Q., Karplus, M., Wichert, J. M., & Hall, C. K. (1997).
Equilibrium thermodynamics of homopolymers and clusters:
Molecular dynamics and Monte Carlo simulations of systems
with square-well interactions. Journal of Chemical Physics, 107,
1069110708.
52. Sikirzhytski, V., et al. (2012). Fibrillation mechanism of a model
intrinsically disordered protein revealed by 2D correlation deep
UV resonance Raman spectroscopy. Biomacromolecules, 13,
15031509.
53. Chiti, F., & Dobson, C. M. (2009). Amyloid formation by globular proteins under native conditions. Nature Chemical Biology,
5, 1522.
54. Kuwata, K., Kamatari, Y. O., Akasaka, K., & James, T. L. (2004).
Slow conformational dynamics in the hamster prion protein.
Biochemistry United States of America, 43, 44394446.
55. Saiki, M., Hidaka, Y., Nara, M., & Morii, H. (2012). Stemforming regions that are essential for the amyloidogenesis of
prion proteins. Biochemistry United States of America, 51,
15661576.
56. Tycko, R., Savtchenko, R., Ostapchenko, V. G., Makarava, N., &
Baskakov, I. V. (2010). The alpha-helical C-terminal domain of
full-length recombinant PrP converts to an in-register parallel
beta-sheet structure in PrP fibrils: Evidence from solid state
nuclear magnetic resonance. Biochemistry United States of
America, 49, 94889497.
57. Goldschmidt, L., Teng, P. K., Riek, R., & Eisenberg, D. (2010).
Identifying the amylome, proteins capable of forming amyloidlike fibrils. Proceedings of the National Academy of Sciences
United States of America, 107, 34873492.
58. Chiti, F., Stefani, M., Taddei, N., Ramponi, G., & Dobson, C. M.
(2003). Rationalization of the effects of mutations on peptide and
protein aggregation rates. Nature, 424, 805808.
59. Tartaglia, G. G., Cavalli, A., Pellarin, R., & Caflisch, A. (2004).
The role of aromaticity, exposed surface, and dipole moment in
determining protein aggregation rates. Protein Science, 13,
19391941.
60. Fernandez-Escamilla, A. M., Rousseau, F., Schymkowitz, J., &
Serrano, L. (2004). Prediction of sequence-dependent and
61.
62.
63.
64.
65.
66. Dosztanyi, Z., Csizmok, V., Tompa, P., & Simon, I. (2005).
IUPred: Web server for the prediction of intrinsically unstructured regions of proteins based on estimated energy content.
Bioinformatics, 21, 34333434.
67. Hecker, J., Yang, J. Y., & Cheng, J. (2008). Protein disorder
prediction at multiple levels of sensitivity and specificity. BMC
Genomics, 9(Suppl 1), S9.
68. Ward, J. J., McGuffin, L. J., Bryson, K., Buxton, B. F., & Jones,
D. T. (2004). The DISOPRED server for the prediction of protein
disorder. Bioinformatics, 20, 21382139.
69. Schlessinger, A., Punta, M., Yachdav, G., Kajan, L., & Rost, B.
(2009). Improved disorder prediction by combination of orthogonal approaches. PLoS ONE, 4, e4433.
70. Mizianty, M. J., et al. (2010). Improved sequence-based prediction of disordered regions with multilayer fusion of multiple
information sources. Bioinformatics, 26, i489i496.
123