Accurate and automated classification of protein secondary structure with PsiCSI
Open Access
- 1 February 2003
- journal article
- Published by Wiley in Protein Science
- Vol. 12 (2) , 288-295
- https://doi.org/10.1110/ps.0222303
Abstract
PsiCSI is a highly accurate and automated method of assigning secondary structure from NMR data, which is a useful intermediate step in the determination of tertiary structures. The method combines information from chemical shifts and protein sequence using three layers of neural networks. Training and testing were performed on a suite of 92 proteins (9437 residues) with known secondary and tertiary structure. Using a stringent cross-validation procedure in which the target and homologous proteins were removed from the databases used for training the neural networks, an average 89% Q3 accuracy (per residue) was observed. This is an increase of 6.2% and 5.5% (representing 36% and 33% fewer errors) over methods that use chemical shifts (CSI) or sequence information (Psipred) alone. In addition, PsiCSI improves upon the translation of chemical shift information to secondary structure (Q3 = 87.4%) and is able to use sequence information as an effective substitute for sparse NMR data (Q3 = 86.9% without ¹³C shifts and Q3 = 86.8% with only Hα shifts available). Finally, errors made by PsiCSI almost exclusively involve the interchange of helix or strand with coil and not helix with strand (<2.5 occurrences per 10000 residues). The automation, increased accuracy, absence of gross errors, and robustness with regard to sparse data make PsiCSI ideal for high-throughput applications, and the method should improve the effectiveness of hybrid NMR/de novo structure determination methods. A Web server is available for users to submit data and have the assignment returned.
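The accuracy figures in the abstract can be related to one another with a little arithmetic. The following is a minimal sketch, not the PsiCSI implementation: it assumes the usual three-state labels `H` (helix), `E` (strand), and `C` (coil), and the function names are hypothetical. It shows how a per-residue Q3 score is computed, how a gain in Q3 translates into a fraction of errors removed, and how helix/strand interchanges can be counted. Because the abstract reports a rounded 89% Q3, the derived percentages are approximate.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Per-residue Q3: fraction of residues whose three-state label matches."""
    if len(predicted) != len(observed):
        raise ValueError("label strings must have the same length")
    if not observed:
        raise ValueError("label strings must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


def relative_error_reduction(new_q3: float, gain: float) -> float:
    """Fraction of the baseline method's errors that the new method removes.

    new_q3 and gain are both in percentage points; the baseline accuracy
    is inferred as new_q3 - gain (an assumption based on the abstract's wording).
    """
    baseline_error = 100.0 - (new_q3 - gain)
    return gain / baseline_error


def helix_strand_swaps(predicted: str, observed: str) -> int:
    """Count residues where helix was assigned as strand or vice versa."""
    return sum({p, o} == {"H", "E"} for p, o in zip(predicted, observed))


# Toy label strings (illustrative only, not data from the paper).
print(q3_accuracy("HHHCCEEE", "HHHHCEEC"))

# Figures quoted in the abstract: 89% Q3, gains of 6.2 and 5.5 points.
print(round(relative_error_reduction(89.0, 6.2), 2))  # versus CSI
print(round(relative_error_reduction(89.0, 5.5), 2))  # versus Psipred
```

With the abstract's numbers, the implied baseline error rates are about 17.2% (CSI) and 16.5% (Psipred), so gains of 6.2 and 5.5 points correspond to roughly 36% and 33% fewer errors, matching the figures quoted.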
This publication has 34 references indexed in Scilit:
- Protein secondary structure prediction based on position-specific scoring matrices. Journal of Molecular Biology, 1999
- Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 1997
- CATH – a hierarchic classification of protein domain structures. Published by Elsevier, 1997
- Hidden Markov Models in Computational Biology. Journal of Molecular Biology, 1994
- The chemical shift index: a fast and simple method for the assignment of protein secondary structure through NMR spectroscopy. Biochemistry, 1992
- Relationship between nuclear magnetic resonance chemical shift and protein secondary structure. Journal of Molecular Biology, 1991
- A relational database for sequence-specific protein NMR data. Journal of Biomolecular NMR, 1991
- Dictionary of protein secondary structure: Pattern recognition of hydrogen‐bonded and geometrical features. Biopolymers, 1983
- Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins. Journal of Molecular Biology, 1978
- Algorithms for prediction of α-helical and β-structural regions in globular proteins. Journal of Molecular Biology, 1974