CSSP (Consensus Secondary Structure Prediction): a web-based server for structural biologists
Journal of Applied Crystallography
ISSN 0021-8898
1. Introduction
A number of servers exist for the analysis of the secondary structural elements of proteins (Shanthi et al., 2003; Balamurugan et al., 2005), which are the building blocks of the three-dimensional protein structures. The secondary structures are, in turn, dependent on the sequence of their component amino acid residues. Hence, over the past two decades, a large section of the bioinformatics research community has been investigating various ways to bridge the gap between these two fundamental levels of protein structure, namely, the primary sequence and the secondary structure. A literature survey shows that the first sequence–structure correlation study was carried out by Blout et al. (1960), who assigned α-helix forming and breaking properties to seven types of amino acid residues based on experiments with homopolymers and polypeptides. Davies (1964) further applied these results to native globular proteins and established an anticorrelation between helix content as determined by optical rotatory dispersion measurements and the content of the residues. Over time, a number of methodologies were developed to predict the secondary structure of a protein (Schulz et al., 1979). However, with the availability of a wide range of secondary structure prediction methods with varying accuracy (from 64 to 77%), the problem lies in identifying the most efficient method from the pool of prediction methods. Higher levels of accuracy can be achieved by using the consensus secondary structure of different prediction methods. The end result of this venture would aid structural biologists in identifying the most appropriate protein model from the Protein Data Bank to solve unknown crystal structures.

2. Methodology and specifications
After a careful survey of the literature, a pool of six different available secondary structure prediction methods (Table 1) was selected, and their respective codes were downloaded and executed on our local server (Bioinformatics Centre, Indian Institute of Science, Bangalore, India). The methods chosen were DSC (King & Sternberg, 1996), SIMPA96 (Levin, 1997), GOR IV (Garnier et al., 1996), Predator (Frishman & Argos, 1996), PSIPRED (Jones, 1999) and PROFPHD (Rost et al., 1994). The secondary structure predicted by the largest number of methods for a particular residue in a given sequence is considered to be the consensus prediction for that residue. However, in situations where three methods predict an α-helix and the remaining methods predict a β-sheet for the residue, the secondary structural element is decided by using a simple numerical calculation. First, each method is assigned a weight based on its accuracy. For example, a method with an accuracy of 70% is assigned a weight of 0.7. Subsequently, the product of the weights for each set of methods predicting different secondary structural elements is calculated. Finally, the prediction made by the methods having a higher product value is incorporated into the consensus prediction, thus bringing out the best possible accuracy of prediction of the secondary structural elements. CSSP is written using C and implemented on a bioinformatics Linux server (a 3.4 GHz Pentium dual core processor equipped with 2 GB RAM and Fedora Core 7.0).

3. Utilities
The output of CSSP contains the predictions made by the individual methods along with a consensus output. The server has the specialty that the location of the starting and end residues of each predicted secondary structural element along with the complete amino acid

Table 1
The secondary structure prediction methods deployed in CSSP, with their percentage accuracy (as mentioned in the literature).

Method      Accuracy measure                Accuracy (%)
DSC         Q3 state prediction accuracy    70
SIMPA96     Prediction accuracy             68.7
GOR IV      Q3 state accuracy               64.4
Predator    Prediction accuracy             68
PSIPRED     Prediction accuracy             77.5
PROFPHD     Q3 state prediction accuracy    72.1
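
The consensus rule described in Section 2 can be illustrated with a short Python sketch: a majority vote for each residue and, when the top-voted states are tied, a comparison of the products of the method weights. This is not the CSSP source code, which is written in C. The function names, the one-letter state codes (H for helix, E for strand, C for coil) and the treatment of ties other than three against three are assumptions made for this example. The weights are the accuracies from Table 1 divided by 100.

```python
from collections import Counter
from math import prod

# Weights derived from the accuracies listed in Table 1 (accuracy / 100).
WEIGHTS = {
    "DSC": 0.700,
    "SIMPA96": 0.687,
    "GOR IV": 0.644,
    "Predator": 0.680,
    "PSIPRED": 0.775,
    "PROFPHD": 0.721,
}


def consensus_state(votes, weights=WEIGHTS):
    """Return the consensus state for a single residue.

    votes maps a method name to its predicted state ('H', 'E' or 'C';
    these letter codes are an assumption of this sketch).
    """
    counts = Counter(votes.values())
    top = max(counts.values())
    tied = [state for state, n in counts.items() if n == top]
    if len(tied) == 1:
        # A single state has the most votes: plain majority.
        return tied[0]

    # Tie: multiply the weights of the methods backing each tied state
    # and keep the state with the larger product.
    def weight_product(state):
        return prod(weights[m] for m, s in votes.items() if s == state)

    return max(tied, key=weight_product)


def consensus_sequence(predictions, weights=WEIGHTS):
    """Combine per-method prediction strings into one consensus string."""
    lengths = {len(p) for p in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all predictions must have the same length")
    n = lengths.pop()
    return "".join(
        consensus_state({m: p[i] for m, p in predictions.items()}, weights)
        for i in range(n)
    )


if __name__ == "__main__":
    # Hypothetical three-residue example; the second residue is a 3-3 tie.
    example = {
        "DSC": "HHC",
        "SIMPA96": "HHC",
        "GOR IV": "HHC",
        "Predator": "HEC",
        "PSIPRED": "HEE",
        "PROFPHD": "HEE",
    }
    print(consensus_sequence(example))
```

In the hypothetical example, the second residue is predicted as helix by DSC, SIMPA96 and GOR IV (weight product about 0.310) and as strand by Predator, PSIPRED and PROFPHD (weight product about 0.380), so the sketch assigns strand to that residue.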
J. Appl. Cryst. (2009). 42, 336–338 Ankit Gupta et al. CSSP 337
computer programs
output of CSSP will be an aid to bridge the gap between an amino acid sequence and its secondary structure. Users of this server are requested to cite the URL and this article in their scientific reports and publications. Comments and suggestions can be sent to Dr K. Sekar.
Figure 2
A comparison of the predicted (CSSP) secondary structures with the structural elements observed (PDB) in the actual three-dimensional structure of riboflavin synthase from Bacillus subtilis (PDB code 1rvv).

Figure 3
Sample output produced by the web server for a search for the occurrence of a motif (IDVAWV) in all the protein structures. Inset: a visualization of the three-dimensional structure (Jmol).

References
Bairoch, A. & Apweiler, R. (1998). Nucleic Acids Res. 26, 38–42.
Balamurugan, B., Samaya Mohan, K., Ramesh, J., Roshan, M. N. A. Md., Sumathi, K. & Sekar, K. (2005). Acta Cryst. D61, 634–636.
Barker, W. C., Garavelli, J. S., Haft, D. H., Hunt, L. T., Marzec, C. R., Orcutt, B. C., Srinivasarao, G. Y., Yeh, L. S. L., Ledley, R. S., Mewes, H. W., Pfeiffer, F. & Tsugita, A. (1998). Nucleic Acids Res. 26, 27–32.
Berman, H. M. et al. (2002). Acta Cryst. D58, 899–907.
Blout, E. R., de Loze, C., Bloom, S. M. & Fasman, G. D. (1960). J. Am. Chem. Soc. 82, 3787–3789.
Davies, D. R. (1964). J. Mol. Biol. 9, 605–609.
Frishman, D. & Argos, P. (1996). Protein Eng. 9, 133–142.
Garnier, J., Gibrat, J. F. & Robson, B. I. (1996). Methods Enzymol. 266, 540–553.
Jones, D. T. (1999). J. Mol. Biol. 292, 195–202.
King, R. D. & Sternberg, M. J. (1996). Protein Sci. 5, 2298–2310.
Levin, J. (1997). Protein Eng. 7, 771–776.
Ritsert, K., Huber, R., Turk, D., Ladenstein, R., Schmidt-Base, K. & Bacher, A. (1995). J. Mol. Biol. 253, 151–167.
Rost, B., Sander, C. & Schneider, R. (1994). Comput. Appl. Biosci. 10, 53–60.
Sayle, R. A. & Milner-White, E. J. (1995). Trends Biochem. Sci. 20, 374–376.
Schulz, G. E., Schirmer, R. H. & Cantor, C. R. (1979). Principles of Protein Structure. New York: Springer-Verlag New York Inc.
Shanthi, V., Selvarani, P., Kumar, K. Ch., Mohire, C. S. & Sekar, K. (2003). Nucleic Acids Res. 31, 3404–3405.
Smith, R. D. (1999). PhD thesis, Rice University, Texas, USA.
Wright, C. S. & Hester, G. (1996). Structure, 4, 1339–1352.