PROTEIN STRUCTURE AND FOLD PREDICTION USING TREE-AUGMENTED NAÏVE BAYESIAN CLASSIFIER
- 1 August 2005
- journal article
- research article
- Published by World Scientific Pub Co Pte Ltd in Journal of Bioinformatics and Computational Biology
- Vol. 3 (4), 803-819
- https://doi.org/10.1142/s0219720005001302
Abstract
Due to the large volume of protein sequence data, computational methods to determine the structure class and the fold class of a protein sequence have become essential. Several techniques based on sequence similarity, Neural Networks, Support Vector Machines (SVMs), etc. have been applied. Since most of these methods build multi-class classification out of binary classifiers, as many as N(N-1)/2 classifiers (one per pair of classes) may be required for N classes. This paper presents a framework using Tree-Augmented Naïve Bayesian Networks (TAN), which performs multi-class classification directly, based on the theory of learning Bayesian Networks and using an improved feature vector representation of (Ding et al., 2001). To enhance TAN's performance, the data are pre-processed by feature discretization and the outputs are post-processed using a Mean Probability Voting (MPV) scheme. The advantage of the Bayesian approach over other learning methods is that the network structure is intuitive. In addition, one can read off the probabilities in the TAN structure to determine the significance of each feature (say, hydrophobicity) for each class, which helps to further understand the complexity of protein structure. Experiments on the datasets used in three prominent recent works show that our approach is more accurate than other discriminative methods. The framework is implemented on the BAYESPROT web server, where more detailed results are also available.
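To make two ideas from the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. The first function counts the pairwise binary classifiers a one-vs-one scheme needs for N classes. The second illustrates one plausible reading of Mean Probability Voting: the abstract does not define the scheme, so the sketch assumes that several models each output a class-probability vector, that the vectors are averaged, and that the class with the highest mean wins. The function names and the example numbers are hypothetical.

```python
from math import comb


def pairwise_classifier_count(n_classes: int) -> int:
    """Number of one-vs-one binary classifiers for n_classes classes: N(N-1)/2."""
    return comb(n_classes, 2)


def mean_probability_vote(posteriors):
    """Average class probabilities over several models and pick the best class.

    ASSUMPTION: this is one plausible interpretation of "Mean Probability
    Voting"; the abstract does not spell out the scheme.

    posteriors[m][c] is model m's probability for class c.
    Returns (index of winning class, list of mean probabilities).
    """
    if not posteriors:
        raise ValueError("need at least one probability vector")
    n_classes = len(posteriors[0])
    if any(len(p) != n_classes for p in posteriors):
        raise ValueError("all probability vectors must have the same length")

    n_models = len(posteriors)
    means = [sum(p[c] for p in posteriors) / n_models for c in range(n_classes)]
    best = max(range(n_classes), key=means.__getitem__)
    return best, means


if __name__ == "__main__":
    # Illustrative class counts only; they are not taken from the abstract.
    for n in (4, 27):
        print(n, "classes ->", pairwise_classifier_count(n), "pairwise classifiers")

    # Three hypothetical models scoring one sequence over three classes.
    votes = [
        [0.6, 0.3, 0.1],
        [0.2, 0.5, 0.3],
        [0.3, 0.4, 0.3],
    ]
    winner, means = mean_probability_vote(votes)
    print("winning class index:", winner)
```

The count grows quadratically with the number of classes, which is the cost a single multi-class model such as TAN avoids. Averaging probabilities rather than counting hard votes lets a confident model outweigh several weakly opposed ones.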
This publication has 10 references indexed in Scilit:
- Addressing the problems of Bayesian network classification of video using high-dimensional features. IEEE Transactions on Knowledge and Data Engineering, 2004
- Support Vector Machines for Protein Fold Class Prediction. Biometrical Journal, 2003
- Support Vector Machines for predicting protein structural class. BMC Bioinformatics, 2001
- Role and Results of statistical methods in protein fold class prediction. Mathematical and Computer Modelling, 2001
- Entropy and MDL discretization of continuous variables for Bayesian belief networks. International Journal of Intelligent Systems, 1999
- Recognition of a protein fold in the context of the SCOP classification. Proteins-Structure Function and Bioinformatics, 1999
- On the Optimality of the Simple Bayesian Classifier under Zero-One Loss. Machine Learning, 1997
- Threading thrills and threats. Structure, 1996
- Prediction of protein folding class using global description of amino acid sequence. Proceedings of the National Academy of Sciences, 1995
- A new approach to protein fold recognition. Nature, 1992