Achieving 80% ten‐fold cross‐validated accuracy for secondary structure prediction by large‐scale training
- 13 February 2007
- journal article
- research article
- Published by Wiley in Proteins-Structure Function and Bioinformatics
- Vol. 66 (4), 838-845
- https://doi.org/10.1002/prot.21298
Abstract
An integrated system of neural networks, called SPINE, is established and optimized for predicting structural properties of proteins. SPINE is applied to three-state secondary-structure and residue-solvent-accessibility (RSA) prediction in this paper. The integrated neural networks are carefully trained with a large dataset of 2640 chains, sequence profiles generated from multiple sequence alignment, representative amino acid properties, a slow learning rate, overfitting protection, and an optimized sliding-window size. More than 200,000 weights in SPINE are optimized by maximizing the accuracy measured by Q3 (the percentage of correctly classified residues). SPINE yields a 10-fold cross-validated accuracy of 79.5% (80.0% for chains of length between 50 and 300) in secondary-structure prediction after one-month (CPU time) training on 22 processors. An accuracy of 87.5% is achieved for exposed residues (RSA >95%). The latter approaches the theoretical upper limit of 88–90% accuracy in assigning secondary structures. An accuracy of 73% for three-state solvent-accessibility prediction (25%/75% cutoff) and 79.3% for two-state prediction (25% cutoff) is also obtained. Proteins 2007.
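The Q3 measure named in the abstract can be illustrated with a minimal Python sketch. It is not SPINE's code: the function names `q3` and `rsa_two_state`, the single-letter state labels (H, E, C), and the handling of a residue exactly at the cutoff are assumptions made for illustration only.

```python
def q3(predicted: str, observed: str) -> float:
    """Return Q3: the percentage of residues whose three-state label matches.

    Labels are assumed to be single characters, e.g. H (helix),
    E (strand), C (coil); this encoding is an illustrative assumption.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def rsa_two_state(rsa_percent: float, cutoff: float = 25.0) -> str:
    """Classify a residue as buried or exposed at a given RSA cutoff.

    Treating a value exactly at the cutoff as buried is an assumption;
    the abstract only states the 25% cutoff itself.
    """
    return "exposed" if rsa_percent > cutoff else "buried"


if __name__ == "__main__":
    # Six of the eight positions agree, so Q3 is 75.0.
    print(q3("HHHECCCC", "HHHHCCCE"))
    print(rsa_two_state(30.0), rsa_two_state(10.0))
```

In a ten-fold cross-validation, a score like this would be computed on each held-out fold and then aggregated; how SPINE aggregates across folds is not stated in the abstract.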