A dynamic Bayesian network approach to protein secondary structure prediction
Open Access
- 25 January 2008
- journal article
- Published by Springer Nature in BMC Bioinformatics
- Vol. 9 (1), 49
- https://doi.org/10.1186/1471-2105-9-49
Abstract
Protein secondary structure prediction methods based on probabilistic models such as the hidden Markov model (HMM) appeal to many because they provide meaningful information about the sequence-structure relationship. However, at present, the prediction accuracy of pure HMM-type methods is much lower than that of machine learning-based methods such as neural networks (NN) or support vector machines (SVM). In this paper, we report a new probabilistic method for protein secondary structure prediction based on dynamic Bayesian networks (DBN). The new method models the PSI-BLAST profile of a protein sequence using a multivariate Gaussian distribution, and simultaneously takes into account the dependency between the profile and secondary structure and the dependency between profiles of neighboring residues. In addition, a segment length distribution is introduced for each secondary structure state. Tests show that the DBN method significantly improves accuracy compared with other pure HMM-type methods. Further improvement is achieved by combining the DBN with an NN, a method called DBNN, which shows better Q3 accuracy than many popular methods and is competitive with the current state of the art. The most interesting feature of DBN/DBNN is that a significant improvement in prediction accuracy is achieved when it is combined with other methods by a simple consensus. The DBN method, using a Gaussian distribution for the PSI-BLAST profile and a high-order dependency between profiles of neighboring residues, produces significantly better prediction accuracy than other HMM-type probabilistic methods. Owing to their different natures, the DBN and NN combine to form a more accurate method, DBNN. Future improvement may be achieved by combining DBNN with an SVM-type method.
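The abstract reports results as Q3 accuracy and notes that a simple consensus with other methods improves prediction. As a rough illustration only, the sketch below computes Q3 in its usual sense (the fraction of residues whose three-state label, H, E, or C, is predicted correctly) and forms a consensus by per-residue majority vote. The majority-vote rule, the tie-breaking rule, and the function names `q3` and `consensus` are assumptions made for this example; the abstract does not specify how the authors' consensus is computed.

```python
from collections import Counter


def q3(predicted: str, observed: str) -> float:
    """Fraction of residues whose three-state label (H, E, C) is correct."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(observed)


def consensus(predictions: list[str]) -> str:
    """Per-residue majority vote over several predictions.

    Assumption for this sketch: ties are broken in favor of the
    earliest-listed method among the tied states.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all predictions must have the same length")
    result = []
    for column in zip(*predictions):
        counts = Counter(column)
        best = max(counts.values())
        # First label in method order that reaches the top vote count.
        result.append(next(s for s in column if counts[s] == best))
    return "".join(result)


if __name__ == "__main__":
    # Toy, made-up strings purely for illustration (not data from the paper).
    observed = "HHHHCCEEEE"
    methods = ["HHHCCCEEEE", "HHHHCCCEEE", "CHHHCCEEEC"]
    for m in methods:
        print(m, q3(m, observed))
    combined = consensus(methods)
    print(combined, q3(combined, observed))
```

In this toy case each individual string makes errors at different positions, so the vote recovers the observed labels; real methods with correlated errors would gain less.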