Accurate and automated classification of protein secondary structure with PsiCSI

Open Access

1 February 2003

journal article
Published by Wiley in Protein Science

Vol. 12 (2), 288-295
https://doi.org/10.1110/ps.0222303

Abstract

PsiCSI is a highly accurate, automated method for assigning protein secondary structure from NMR data. Such an assignment is a useful intermediate step in determining tertiary structure. The method combines information from chemical shifts and protein sequence using three layers of neural networks. Training and testing were performed on a suite of 92 proteins (9437 residues) with known secondary and tertiary structures. The evaluation used a stringent cross-validation procedure in which the target protein and its homologs were removed from the databases used to train the neural networks. Under this procedure PsiCSI reached an average per-residue Q3 accuracy of 89%. This is an improvement of 6.2 and 5.5 percentage points (36% and 33% fewer errors) over methods that use only chemical shifts (CSI) or only sequence information (Psipred), respectively. In addition, PsiCSI improves the translation of chemical shift information into secondary structure (Q3 = 87.4%). It can also use sequence information as an effective substitute for sparse NMR data (Q3 = 86.9% without ¹³C shifts and Q3 = 86.8% with only Hα shifts available). Finally, errors made by PsiCSI almost exclusively involve the interchange of helix or strand with coil, not helix with strand (fewer than 2.5 occurrences per 10,000 residues). Its automation, increased accuracy, absence of gross errors, and robustness to sparse data make PsiCSI ideal for high-throughput applications, and it should improve the effectiveness of hybrid NMR/de novo structure determination methods. A Web server is available where users can submit data and receive the assignment.
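The accuracy figures above rest on simple per-residue bookkeeping. The sketch below is illustrative only and is not the authors' evaluation code. It assumes both assignments are already reduced to three states written as H (helix), E (strand), and C (coil). The function names `q3`, `relative_error_reduction`, and `helix_strand_swaps_per_10k` are hypothetical. It also reads the reported 6.2% and 5.5% gains as percentage points, which is the reading that reproduces the quoted 36% and 33% error reductions. The baselines it derives from the rounded 89% figure are therefore approximate.

```python
"""Hypothetical helpers for the per-residue metrics quoted in the abstract.

Assumes three-state strings: H = helix, E = strand, C = coil.
"""


def q3(predicted: str, observed: str) -> float:
    """Per-residue three-state accuracy, in percent."""
    if not observed or len(predicted) != len(observed):
        raise ValueError("assignments must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def relative_error_reduction(q3_new: float, q3_baseline: float) -> float:
    """Percentage of the baseline's errors that the new method removes.

    The Q3 gain (in percentage points) is divided by the baseline error rate.
    """
    return 100.0 * (q3_new - q3_baseline) / (100.0 - q3_baseline)


def helix_strand_swaps_per_10k(predicted: str, observed: str) -> float:
    """Residues where helix and strand are interchanged, per 10,000 residues."""
    if not observed or len(predicted) != len(observed):
        raise ValueError("assignments must be non-empty and of equal length")
    # {p, o} == {"H", "E"} matches both H->E and E->H, but not H/C or E/C slips.
    swaps = sum({p, o} == {"H", "E"} for p, o in zip(predicted, observed))
    return 10_000 * swaps / len(observed)


if __name__ == "__main__":
    observed = "CCHHHHHHCCEEEECC"   # toy data, not from the paper
    predicted = "CCCHHHHHCCEEECCC"
    print(q3(predicted, observed))                            # 87.5
    print(helix_strand_swaps_per_10k(predicted, observed))    # 0.0
    # Baselines implied by the abstract: 89 - 6.2 (CSI) and 89 - 5.5 (Psipred).
    print(round(relative_error_reduction(89.0, 89.0 - 6.2)))  # 36
    print(round(relative_error_reduction(89.0, 89.0 - 5.5)))  # 33
```

Because Q3 counts only matches, it does not distinguish a helix/coil slip at a segment boundary from a helix/strand swap. The two toy errors above are both boundary slips, so the swap count stays at zero. A separate swap count of this kind is what backs the "absence of gross errors" claim.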

