Homology Detection via Family Pairwise Search
- 1 January 1998
- journal article
- research article
- Published by Mary Ann Liebert Inc in Journal of Computational Biology
- Vol. 5 (3), 479-491
- https://doi.org/10.1089/cmb.1998.5.479
Abstract
The function of an unknown biological sequence can often be accurately inferred by identifying sequences homologous to the original sequence. Given a query set of known homologs, there exist at least three general classes of techniques for finding additional homologs: pairwise sequence comparisons, motif analysis, and hidden Markov modeling. Pairwise sequence comparisons are typically employed when only a single query sequence is known. Hidden Markov models (HMMs), on the other hand, are usually trained with sets of more than 100 sequences. Motif-based methods fall in between these two extremes. The current work introduces a straightforward generalization of pairwise sequence comparison algorithms to the case when multiple query sequences are available. This algorithm, called Family Pairwise Search (FPS), combines pairwise sequence comparison scores from each query sequence. A BLAST implementation of FPS is compared to representative examples of hidden Markov modeling (HMMER) and motif modeling (MEME). The three techniques are compared across a wide range of protein families, using query sets of varying sizes. BLAST FPS significantly outperforms motif-based and HMM methods. Furthermore, FPS is much more efficient than the training algorithms for statistical models.
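The idea of combining pairwise scores from each query sequence can be illustrated with a minimal sketch. The abstract does not say how the scores are combined or how BLAST is invoked, so everything below is an assumption made for illustration: `local_alignment_score` is a toy Smith-Waterman scorer standing in for BLAST, `family_pairwise_search` is a hypothetical function name, and the `combine` rule (maximum by default, with the mean as an alternative) is a guess rather than the authors' stated method.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple


def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Toy Smith-Waterman local alignment score with a linear gap penalty.

    Stand-in for a real pairwise comparison tool such as BLAST.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def family_pairwise_search(
    queries: Sequence[str],
    database: Dict[str, str],
    score: Callable[[str, str], float] = local_alignment_score,
    combine: Callable[[List[float]], float] = max,
) -> List[Tuple[str, float]]:
    """Rank database sequences by a combined score over all query sequences.

    The combination rule is an assumption; swap in `mean` or another
    aggregate to experiment.
    """
    if not queries:
        raise ValueError("at least one query sequence is required")
    combined = {
        name: combine([score(q, target) for q in queries])
        for name, target in database.items()
    }
    # Highest combined score first.
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Made-up example data: two query "family members" and a tiny database.
queries = ["ACDEFGHIK", "ACDEYGHIK"]
database = {
    "homolog": "MMACDEFGHIKMM",
    "unrelated": "WWWWPPPPWWWW",
}
ranking = family_pairwise_search(queries, database)
ranking_mean = family_pairwise_search(queries, database, combine=mean)
```

In this sketch, no model is trained: each database sequence is scored once per query, which is consistent with the abstract's remark that FPS avoids the training step required by statistical models.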