Enhanced protein fold recognition using a structural alphabet
- 11 November 2008
- journal article
- research article
- Published by Wiley in Proteins-Structure Function and Bioinformatics
- Vol. 76 (1) , 129-137
- https://doi.org/10.1002/prot.22324
Abstract
Fold recognition from sequence can be an important step in protein structure and function prediction. Many methods have tackled this goal. Most of them, based on sequence alignment, fail for sequences of low similarity. Alignment-free approaches can provide an efficient alternative. For such approaches, the identification of efficient fold-discriminatory features is critical. We propose a new fold recognition approach that relies on the encoding of the local structure of proteins using a Hidden Markov Model Structural Alphabet. This encoding provides a 1D description of the conformation of complete protein structures, including loops. At the fold level, compared with the classical secondary structure helix, strand, and coil states, such encoding is expected to discriminate better between loop conformations, hence providing better fold identification. Compared with previous related approaches, this additional information results in significant improvement. When combining this information with supplementary information on secondary structure and residue burial, we obtain a fold recognition accuracy of 78% for 27 protein families, that is, 8% higher than the best available method so far, and of 68% for 60 families. Corresponding scores at the class level are 92% and 90%, indicating that mispredictions are mostly within structural classes. Proteins 2009.