Improving model construction of profile HMMs for remote homology detection through structural alignment
Open Access
- 9 November 2007
- journal article
- research article
- Published by Springer Nature in BMC Bioinformatics
- Vol. 8 (1) , 1-12
- https://doi.org/10.1186/1471-2105-8-435
Abstract
Remote homology detection is a challenging problem in Bioinformatics. Arguably, profile Hidden Markov Models (pHMMs) are one of the most successful approaches in addressing this important problem. pHMM packages present a relatively small computational cost, and perform particularly well at recognizing remote homologies. This raises the question of whether structural alignments could impact the performance of pHMMs trained from proteins in the Twilight Zone, as structural alignments are often more accurate than sequence alignments at identifying motifs and functional residues. Next, we assess the impact of using structural alignments in pHMM performance. We used the SCOP database to perform our experiments. Structural alignments were obtained using the 3DCOFFEE and MAMMOTH-mult tools; sequence alignments were obtained using CLUSTALW, TCOFFEE, MAFFT and PROBCONS. We performed leave-one-family-out cross-validation over super-families. Performance was evaluated through ROC curves and paired two tailed t-test. We observed that pHMMs derived from structural alignments performed significantly better than pHMMs derived from sequence alignment in low-identity regions, mainly below 20%. We believe this is because structural alignment tools are better at focusing on the important patterns that are more often conserved through evolution, resulting in higher quality pHMMs. On the other hand, sensitivity of these tools is still quite low for these low-identity regions. Our results suggest a number of possible directions for improvements in this area.Keywords
This publication has 59 references indexed in Scilit:
- MAFFT version 5: improvement in accuracy of multiple sequence alignmentNucleic Acids Research, 2005
- 3DCoffee: Combining Protein Sequences and Structures within Multiple Sequence AlignmentsJournal of Molecular Biology, 2004
- Improving Profile HMM Discrimination by Adapting Transition ProbabilitiesJournal of Molecular Biology, 2004
- The Pfam protein families databaseNucleic Acids Research, 2004
- Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structureJournal of Molecular Biology, 2001
- T-coffee: a novel method for fast and accurate multiple sequence alignment 1 1Edited by J. ThorntonJournal of Molecular Biology, 2000
- The Protein Data BankNucleic Acids Research, 2000
- CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choiceNucleic Acids Research, 1994
- Basic local alignment search toolJournal of Molecular Biology, 1990
- A tutorial on hidden Markov models and selected applications in speech recognitionProceedings of the IEEE, 1989