computer programs

Journal of Applied Crystallography
ISSN 0021-8898

CSSP (Consensus Secondary Structure Prediction): a web-based server for structural biologists

Ankit Gupta,a Avnish Deshpande,a Janardhan Kumar Amburi,a Radhakrishnan Sabarinathan,a Ramaswamy Senthilkumar a and Kanagaraj Sekar a,b*

a Bioinformatics Centre (Centre of Excellence in Structural Biology and Bio-computing), Indian Institute of Science, Bangalore 560 012, India, and b Supercomputer Education and Research Centre, Indian Institute of Science, Bangalore 560 012, India. Correspondence e-mail: sekar@serc.iisc.ernet.in

Received 1 October 2008
Accepted 24 December 2008

Sequence–structure correlation studies are important in deciphering the relationships between various structural aspects, which may shed light on the protein-folding problem. The first step of this process is the prediction of secondary structure for a protein sequence of unknown three-dimensional structure. To this end, a web server has been created to predict the consensus secondary structure using well known algorithms from the literature. Furthermore, the server allows users to see the occurrence of predicted secondary structural elements in other structure and sequence databases and to visualize predicted helices as a helical wheel plot. The web server is accessible at http://bioserver1.physics.iisc.ernet.in/cssp/.

© 2009 International Union of Crystallography
Printed in Singapore – all rights reserved

1. Introduction
A number of servers exist for the analysis of the secondary structural elements of proteins (Shanthi et al., 2003; Balamurugan et al., 2005), which are the building blocks of the three-dimensional protein structures. The secondary structures are, in turn, dependent on the sequence of their component amino acid residues. Hence, over the past two decades, a large section of the bioinformatics research community has been investigating various ways to bridge the gap between these two fundamental levels of protein structure, namely, the primary sequence and the secondary structure. A literature survey shows that the first sequence–structure correlation study was carried out by Blout et al. (1960), who assigned α-helix forming and breaking properties to seven types of amino acid residues based on experiments with homopolymers and polypeptides. Davies (1964) further applied these results to native globular proteins and established an anticorrelation between helix content as determined by optical rotatory dispersion measurements and the content of the residues. Over time, a number of methodologies were developed to predict the secondary structure of a protein (Schulz et al., 1979). However, with the availability of a wide range of secondary structure prediction methods with varying accuracy (from 64 to 77%), the problem lies in identifying the most efficient method from the pool of prediction methods. Higher levels of accuracy can be achieved by using the consensus secondary structure of different prediction methods. The end result of this venture would aid structural biologists in identifying the most appropriate protein model from the Protein Data Bank to solve unknown crystal structures.

2. Methodology and specifications
After a careful survey of the literature, a pool of six different available secondary structure prediction methods (Table 1) was selected, and their respective codes were downloaded and executed on our local server (Bioinformatics Centre, Indian Institute of Science, Bangalore, India). The methods chosen were DSC (King & Sternberg, 1996), SIMPA96 (Levin, 1997), GOR IV (Garnier et al., 1996), Predator (Frishman & Argos, 1996), PSIPRED (Jones, 1999) and PROFPHD (Rost et al., 1994). The secondary structure predicted by the largest number of methods for a particular residue in a given sequence is considered to be the consensus prediction for that residue. However, in situations where three methods predict an α-helix and the remaining methods predict a β-sheet for the residue, the secondary structural element is decided by using a simple numerical calculation. First, each method is assigned a weight based on its accuracy. For example, a method with an accuracy of 70% is assigned a weight of 0.7. Subsequently, the product of the weights for each set of methods predicting different secondary structural elements is calculated. Finally, the prediction made by the methods having a higher product value is incorporated into the consensus prediction, thus bringing out the best possible accuracy of prediction of the secondary structural elements. CSSP is written in C and implemented on a bioinformatics Linux server (a 3.4 GHz Pentium dual core processor equipped with 2 GB RAM and Fedora Core 7.0).

Table 1
The secondary structure prediction methods deployed in CSSP, with their percentage accuracy (as mentioned in the literature).

Method      Accuracy measure                 Accuracy (%)
DSC         Q3 state prediction accuracy     70
SIMPA96     Prediction accuracy              68.7
GOR IV      Q3 state accuracy                64.4
Predator    Prediction accuracy              68
PSIPRED     Prediction accuracy              77.5
PROFPHD     Q3 state prediction accuracy     72.1

3. Utilities
The output of CSSP contains the predictions made by the individual methods along with a consensus output. A distinctive feature of the server is that the location of the starting and end residues of each predicted secondary structural element along with the complete amino acid

336 doi:10.1107/S0021889808043847 J. Appl. Cryst. (2009). 42, 336–338


sequence is displayed. Furthermore, a link is provided for the users to see the helical wheel plot for the predicted α-helical fragments. In addition, there are links to search for the occurrence of the predicted secondary structural elements in various structural and sequence databases like the Protein Data Bank (PDB; Berman et al., 2002), GDB, SWISS-PROT (Bairoch & Apweiler, 1998) and the PIR (Barker et al., 1998). The databases employed in the search are updated periodically and hence users receive up-to-date information. The molecular visualization programs RASMOL (Sayle & Milner-White, 1995) and Jmol (http://www.jmol.org) have been interfaced with the server to enable the user to view the three-dimensional structure of the detected secondary structural element if it is present in the protein structures available in the PDB.

4. Case studies
The proposed server CSSP was used to predict the secondary structure for chain A of the protein (154 amino acid residues long) riboflavin synthase from Bacillus subtilis (Ritsert et al., 1995). The predicted result of the secondary structure assignment of the CSSP server is shown in Fig. 1. The server predicts both α-helices and β-strands. The prediction agrees well with the secondary structural elements observed in the actual three-dimensional structure (PDB code 1rvv) and the accuracy is 89% (Fig. 2). Fig. 3 shows the output of a search for a short peptide 'IDVAWV' (residues 48–53) in all protein structures in the PDB (code 1zis, chain G), where the query is displayed in cyan among the backbone trace of chain A (grey) of the protein. Secondly, a 153 amino acid sequence of sperm whale myoglobin N-butyl isocyanide (PDB code 105m; Smith, 1999) was given as input to the server. In about 30 s, the server predicted mostly α-helices (Fig. S1¹). A comparison of the predicted results of the proposed server CSSP with the secondary structural elements observed in the actual three-dimensional protein structure is shown in Fig. S2.¹ The results produced were very similar with an accuracy of 85%. Furthermore, a 109 amino acid sequence of mannose-specific agglutinin from Galanthus nivalis (PDB code 1jpc; Wright & Hester, 1996) was also given to CSSP for secondary structure prediction. The consensus prediction was produced in 25 s and the results are shown in Fig. S3.¹ In this case, the server predicted only β-strands and the results obtained were very similar to the actual secondary structures reported in the crystal structure (Fig. S4¹) with an accuracy of 85%. It is evident from the case studies that the predicted secondary structural elements of the server agree well with the actual secondary structural elements observed in the respective three-dimensional protein structures. On average, the server requires 6 min to predict the secondary structure for 1500 amino acids. As a result of the computational complexity, the maximum number of residues in a protein sequence is restricted to 4000.

Figure 1
Predicted secondary structural elements for the sequence of riboflavin synthase from Bacillus subtilis (PDB code 1rvv).

¹ Supplementary material has been deposited with the IUCr. This is available on the IUCr electronic archives (Reference: HE5421). Services for accessing these data are described at the back of the journal.

5. Conclusion
The proposed web server, CSSP, provides a reasonably accurate protein secondary structure prediction for structural biologists. Furthermore, the server allows users to view the percentage of various secondary structural elements predicted. Extensive options for search and visualization enable additional analysis and inspection of the predicted structural elements. It can be concluded that the

output of CSSP will be an aid to bridge the gap between an amino acid sequence and its secondary structure. Users of this server are requested to cite the URL and this article in their scientific reports and publications. Comments and suggestions can be sent to Dr K. Sekar.
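
As an illustration of the consensus rule described in Section 2, the following is a minimal Python sketch. It is not the authors' C implementation. It assumes that each method emits one state letter per residue (the labels H, E and C for helix, strand and coil are assumed here), takes the weights from the accuracies listed in Table 1, and applies the weight product only to break ties between states that receive the same number of votes. The function names are hypothetical.

```python
from collections import Counter
from math import prod

# Weights derived from the accuracies in Table 1 (accuracy / 100).
WEIGHTS = {
    "DSC": 0.700,
    "SIMPA96": 0.687,
    "GOR IV": 0.644,
    "Predator": 0.680,
    "PSIPRED": 0.775,
    "PROFPHD": 0.721,
}


def consensus_state(votes):
    """Return the consensus state for one residue.

    votes maps a method name to the state that method predicts.
    The state with the most votes wins; ties are broken by the
    product of the weights of the methods backing each tied state.
    """
    counts = Counter(votes.values())
    top = max(counts.values())
    tied = [state for state, n in counts.items() if n == top]
    if len(tied) == 1:
        return tied[0]

    def weight_product(state):
        return prod(WEIGHTS[m] for m, s in votes.items() if s == state)

    return max(tied, key=weight_product)


def consensus_sequence(per_method):
    """Apply consensus_state at every position of equal-length predictions."""
    lengths = {len(p) for p in per_method.values()}
    if len(lengths) != 1:
        raise ValueError("all predictions must have the same length")
    n = lengths.pop()
    return "".join(
        consensus_state({m: p[i] for m, p in per_method.items()})
        for i in range(n)
    )


# Hypothetical three-residue predictions (assumed data, for illustration only).
example = {
    "DSC": "HHC",
    "SIMPA96": "HHC",
    "GOR IV": "HEC",
    "Predator": "EEC",
    "PSIPRED": "EHE",
    "PROFPHD": "EHC",
}
print(consensus_sequence(example))  # prints "EHC"
```

At the first position three methods vote H and three vote E, so the weight products are compared: 0.700 × 0.687 × 0.644 ≈ 0.310 for H against 0.680 × 0.775 × 0.721 ≈ 0.380 for E, and E is chosen. Because every weight is below 1, a product is only a fair comparison between groups of equal size, which is why the sketch restricts it to tied states.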

The authors acknowledge the use of the Bioinformatics Centre and the Supercomputer Education and Research Centre. KS thanks the Department of Biotechnology, Government of India, for financial support in the form of a research grant.

Figure 2
A comparison of the predicted (CSSP) secondary structures with the structural elements observed (PDB) in the actual three-dimensional structure of riboflavin synthase from Bacillus subtilis (PDB code 1rvv).

Figure 3
Sample output produced by the web server for a search for the occurrence of a motif (IDVAWV) in all the protein structures. Inset: a visualization of the three-dimensional structure (Jmol).

References
Bairoch, A. & Apweiler, R. (1998). Nucleic Acids Res. 26, 38–42.
Balamurugan, B., Samaya Mohan, K., Ramesh, J., Roshan, M. N. A. Md., Sumathi, K. & Sekar, K. (2005). Acta Cryst. D61, 634–636.
Barker, W. C., Garavelli, J. S., Haft, D. H., Hunt, L. T., Marzec, C. R., Orcutt, B. C., Srinivasarao, G. Y., Yeh, L. S. L., Ledley, R. S., Mewes, H. W., Pfeiffer, F. & Tsugita, A. (1998). Nucleic Acids Res. 26, 27–32.
Berman, H. M. et al. (2002). Acta Cryst. D58, 899–907.
Blout, E. R., de Loze, C., Bloom, S. M. & Fasman, G. D. (1960). J. Am. Chem. Soc. 82, 3787–3789.
Davies, D. R. (1964). J. Mol. Biol. 9, 605–609.
Frishman, D. & Argos, P. (1996). Protein Eng. 9, 133–142.
Garnier, J., Gibrat, J. F. & Robson, B. I. (1996). Methods Enzymol. 266, 540–553.
Jones, D. T. (1999). J. Mol. Biol. 292, 195–202.
King, R. D. & Sternberg, M. J. (1996). Protein Sci. 5, 2298–2310.
Levin, J. (1997). Protein Eng. 7, 771–776.
Ritsert, K., Huber, R., Turk, D., Ladenstein, R., Schmidt-Base, K. & Bacher, A. (1995). J. Mol. Biol. 253, 151–167.
Rost, B., Sander, C. & Schneider, R. (1994). Comput. Appl. Biosci. 10, 53–60.
Sayle, R. A. & Milner-White, E. J. (1995). Trends Biochem. Sci. 20, 374–376.
Schulz, G. E., Schirmer, R. H. & Cantor, C. R. (1979). Principles of Protein Structure. New York: Springer-Verlag New York Inc.
Shanthi, V., Selvarani, P., Kumar, K. Ch., Mohire, C. S. & Sekar, K. (2003). Nucleic Acids Res. 31, 3404–3405.
Smith, R. D. (1999). PhD thesis, Rice University, Texas, USA.
Wright, C. S. & Hester, G. (1996). Structure, 4, 1339–1352.
