Real‐SPINE: An integrated system of neural networks for real‐value prediction of protein structural properties
- 30 March 2007
- journal article
- research article
- Published by Wiley in Proteins-Structure Function and Bioinformatics
- Vol. 68 (1) , 76-81
- https://doi.org/10.1002/prot.21408
Abstract
Proteins can move freely in three-dimensional space. As a result, their structural properties, such as solvent-accessible surface area, backbone dihedral angles, and atomic distances, are continuous variables. However, these properties are often arbitrarily divided into a few classes to facilitate prediction by statistical learning techniques. In this work, we establish an integrated system of neural networks (called Real-SPINE) for real-value prediction and apply the method to predict residue solvent accessibility and backbone ψ dihedral angles of proteins based on information derived from sequences only. Real-SPINE is trained with a large data set of 2640 protein chains, sequence profiles generated from multiple sequence alignment, representative amino-acid properties, a slow learning rate, overfitting protection, and predicted secondary structures. The method optimizes more than 200,000 weights and yields a 10-fold cross-validated Pearson's correlation coefficient (PCC) of 0.74 between predicted and actual solvent-accessible surface areas and 0.62 between predicted and actual ψ angles. In particular, 90% of the 2640 proteins have a PCC value greater than 0.6 between predicted and actual solvent-accessible surface areas. These results can be compared with the best reported correlation coefficients of 0.64–0.67 for solvent-accessible surface areas and 0.47 for ψ angles. The Real-SPINE server, executable programs, and datasets are freely available at http://sparks.informatics.iupui.edu. Proteins 2007.
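The evaluation metric named in the abstract, Pearson's correlation coefficient between predicted and actual values, can be illustrated with a minimal sketch. This is not the authors' code: the function names `pearson` and `fraction_above` are hypothetical, the numbers in the usage example are made up, and the sketch only shows how a per-chain PCC and the share of chains above a threshold such as 0.6 might be computed.

```python
import math


def pearson(predicted, actual):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(predicted) != len(actual) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    if var_p == 0 or var_a == 0:
        raise ValueError("PCC is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_a)


def fraction_above(chains, threshold=0.6):
    """Share of chains whose per-chain PCC exceeds the threshold.

    `chains` is a list of (predicted, actual) pairs, one pair per chain.
    """
    pccs = [pearson(pred, act) for pred, act in chains]
    return sum(1 for r in pccs if r > threshold) / len(pccs)


if __name__ == "__main__":
    # Made-up per-residue values for two toy chains (illustration only).
    toy_chains = [
        ([10.0, 40.0, 80.0, 20.0], [12.0, 35.0, 90.0, 25.0]),
        ([5.0, 60.0, 30.0], [50.0, 10.0, 40.0]),
    ]
    for pred, act in toy_chains:
        print(round(pearson(pred, act), 3))
    print(fraction_above(toy_chains))
```

Computing the coefficient per chain, rather than over all residues pooled together, is what allows a statement of the form "a given share of chains has a PCC above 0.6".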