Hidden Markov model speed heuristic and iterative HMM search procedure
Open Access
- 18 August 2010
- journal article
- research article
- Published by Springer Nature in BMC Bioinformatics
- Vol. 11 (1), 1-8
- https://doi.org/10.1186/1471-2105-11-431
Abstract
Profile hidden Markov models (profile-HMMs) are sensitive tools for remote protein homology detection, but the main scoring algorithms, Viterbi or Forward, require considerable time to search large sequence databases. We have designed a series of database filtering steps, HMMERHEAD, that are applied prior to the scoring algorithms, as implemented in the HMMER package, in an effort to reduce search time. Using this heuristic, we obtain a 20-fold decrease in Forward and a 6-fold decrease in Viterbi search time with a minimal loss in sensitivity relative to the unfiltered approaches. We then implemented an iterative profile-HMM search method, JackHMMER, which employs the HMMERHEAD heuristic. Due to our search heuristic, we eliminated the subdatabase creation that is common in current iterative profile-HMM approaches. On our benchmark, JackHMMER detects 14% more remote protein homologs than SAM's iterative method T2K. Our search heuristic, HMMERHEAD, significantly reduces the time needed to score a profile-HMM against large sequence databases. This search heuristic allowed us to implement an iterative profile-HMM search method, JackHMMER, which detects significantly more remote protein homologs than SAM's T2K and NCBI's PSI-BLAST.
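The general idea of applying cheap filtering steps before an expensive scoring algorithm can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not describe the actual HMMERHEAD filters, so `shares_word` (a short exact-word match) and `toy_score` (a count of identical positions) are hypothetical stand-ins, and `filtered_search`, `QUERY`, and `DATABASE` are invented names that are not part of HMMER.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

# Hypothetical types: a filter is any cheap predicate over a sequence,
# and a scorer is any expensive function that returns a score.
Filter = Callable[[str], bool]
Scorer = Callable[[str], float]


def filtered_search(
    database: Iterable[str],
    filters: Sequence[Filter],
    scorer: Scorer,
    threshold: float,
) -> Tuple[List[Tuple[str, float]], int]:
    """Run cheap filters first and score only the sequences that survive.

    Returns the hits at or above `threshold`, plus the number of
    sequences that actually reached the expensive scorer.
    """
    hits: List[Tuple[str, float]] = []
    scored = 0
    for seq in database:
        # all() short-circuits, so later filters run only on survivors.
        if not all(f(seq) for f in filters):
            continue
        scored += 1
        score = scorer(seq)
        if score >= threshold:
            hits.append((seq, score))
    return hits, scored


# Toy stand-ins (assumptions, not the HMMERHEAD filters or HMM scoring).
QUERY = "ACDEFGHIK"


def shares_word(seq: str, k: int = 3) -> bool:
    """Cheap filter: does seq contain any length-k word of QUERY?"""
    words = {QUERY[i:i + k] for i in range(len(QUERY) - k + 1)}
    return any(seq[i:i + k] in words for i in range(len(seq) - k + 1))


def toy_score(seq: str) -> float:
    """Placeholder for an expensive scorer: count of identical positions."""
    return float(sum(a == b for a, b in zip(QUERY, seq)))


DATABASE = ["ACDEFGHIK", "ACDWWWWWW", "WWWWWWWWW", "YYYEFGYYY"]
hits, scored = filtered_search(DATABASE, [shares_word], toy_score, threshold=5.0)
```

In this toy run one of the four sequences is rejected by the filter before scoring, so the scorer is called three times rather than four. The saving in a real search depends on how selective and how cheap the filters are, and an overly strict filter can discard true homologs, which is the sensitivity trade-off the abstract refers to.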