Real value prediction of solvent accessibility in proteins using multiple sequence alignment and secondary structure
- 16 August 2005
- journal article
- research article
- Published by Wiley in Proteins-Structure Function and Bioinformatics
- Vol. 61 (2) , 318-324
- https://doi.org/10.1002/prot.20630
Abstract
The present study is an attempt to develop a neural network‐based method for predicting the real value of solvent accessibility from the sequence using evolutionary information in the form of multiple sequence alignment. In this method, two feed‐forward networks with a single hidden layer have been trained with standard back‐propagation as a learning algorithm. The Pearson's correlation coefficient increases from 0.53 to 0.63, and mean absolute error decreases from 18.2% to 16% when multiple‐sequence alignment obtained from PSI‐BLAST is used as input instead of a single sequence. The performance of the method further improves from a correlation coefficient of 0.63 to 0.67 when secondary structure information predicted by PSIPRED is incorporated in the prediction. The final network yields a mean absolute error value of 15.2% between the experimental and predicted values, when tested on two different nonhomologous and nonredundant datasets of varying sizes. The method consists of two steps: (1) in the first step, a sequence‐to‐structure network is trained with the multiple alignment profiles in the form of PSI‐BLAST‐generated position‐specific scoring matrices, and (2) in the second step, the output obtained from the first network and PSIPRED‐predicted secondary structure information is used as an input to the second structure‐to‐structure network. Based on the present study, a server SARpred (http://www.imtech.res.in/raghava/sarpred/) has been developed that predicts the real value of solvent accessibility of residues for a given protein sequence. We have also evaluated the performance of SARpred on 47 proteins used in CASP6 and achieved a correlation coefficient of 0.68 and a MAE of 15.9% between predicted and observed values. Proteins 2005.
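The two figures of merit quoted in the abstract, Pearson's correlation coefficient and mean absolute error (MAE), can be illustrated with a minimal Python sketch. This is not the authors' code. The function names, the toy accessibility values, and the per-residue layout of the second-stage input (first-network output followed by three secondary-structure scores for helix, strand, and coil) are all assumptions made for illustration, since the abstract does not specify the encoding.

```python
import math


def pearson_r(observed, predicted):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_o * var_p)


def mean_absolute_error(observed, predicted):
    """Average absolute difference, in the same units as the inputs (here, %)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def second_stage_input(first_stage_output, ss_scores):
    """Hypothetical per-residue input to the second network.

    Concatenates the first network's accessibility estimate with three
    secondary-structure scores (helix, strand, coil). The real encoding used
    by the method is not given in the abstract; this layout is an assumption.
    """
    if len(ss_scores) != 3:
        raise ValueError("expected three secondary-structure scores")
    return [first_stage_output, *ss_scores]


if __name__ == "__main__":
    # Toy relative solvent accessibility values in percent (invented data).
    observed = [10.0, 40.0, 70.0]
    predicted = [20.0, 35.0, 80.0]
    print("r   =", round(pearson_r(observed, predicted), 3))
    print("MAE =", round(mean_absolute_error(observed, predicted), 2), "%")
    print(second_stage_input(0.35, (0.1, 0.2, 0.7)))
```

Reporting both values is useful because they measure different things: the correlation coefficient reflects how well the predictions track the trend of the observed values, while the MAE reflects the typical size of the per-residue error.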