A novel hierarchical ensemble classifier for protein fold recognition
Open Access
- 27 August 2008
- journal article
- Published by Oxford University Press (OUP) in Protein Engineering, Design and Selection
- Vol. 21 (11) , 659-664
- https://doi.org/10.1093/protein/gzn045
Abstract
Ensemble classifiers play a critical role in protein fold recognition. This article presents a novel hierarchical ensemble classifier named GAOEC (Genetic-Algorithm Optimized Ensemble Classifier), which is constructed in the following steps. First, a novel optimized classifier named GAET-KNN (Genetic-Algorithm Evidence-Theoretic K Nearest Neighbors) is proposed as a component classifier. Second, six component classifiers in the first layer are used to obtain a potential class index for every query protein. Third, according to the results of the first layer, every component classifier in the second layer generates a 27-dimensional vector whose elements represent the confidence degrees of the 27 folds. Finally, a genetic algorithm is used to generate weights for the outputs of the second layer to obtain the final classification result. The standard percentage accuracy of GAOEC is 64.7% on a widely used benchmark dataset, where the proteins in the testing set have less than 35% identity with those in the training set.
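The final step described in the abstract, combining the second-layer confidence vectors with learned weights, can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `fuse_confidences`, the choice of six second-layer classifiers, the random confidence values, and the uniform weights are all assumptions made for illustration. In GAOEC the weights would come from a genetic algorithm, which is not reproduced here.

```python
import numpy as np

N_FOLDS = 27  # number of fold classes stated in the abstract


def fuse_confidences(confidences, weights):
    """Weighted sum of per-classifier confidence vectors (hypothetical helper).

    confidences: array of shape (n_classifiers, n_folds)
    weights:     array of shape (n_classifiers,)
    Returns the index of the winning fold and the combined score vector.
    """
    confidences = np.asarray(confidences, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if confidences.ndim != 2 or weights.ndim != 1:
        raise ValueError("expected a 2-D confidence matrix and a 1-D weight vector")
    if confidences.shape[0] != weights.shape[0]:
        raise ValueError("one weight is required per classifier")
    combined = weights @ confidences  # shape: (n_folds,)
    return int(np.argmax(combined)), combined


# Toy input: six classifiers (an assumption) with random confidence vectors.
rng = np.random.default_rng(0)
toy_confidences = rng.random((6, N_FOLDS))
toy_confidences /= toy_confidences.sum(axis=1, keepdims=True)

# Placeholder uniform weights; the paper optimizes these with a genetic algorithm.
toy_weights = np.full(6, 1.0 / 6.0)

predicted_fold, scores = fuse_confidences(toy_confidences, toy_weights)
print("predicted fold index:", predicted_fold)
```

A weighted sum is only one plausible way to combine the outputs; the abstract does not state the exact combination rule, so this should be read as a sketch under that assumption.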