Multiple spaced seeds for homology search
Open Access
- 5 September 2007
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 23 (22) , 2969-2977
- https://doi.org/10.1093/bioinformatics/btm422
Abstract
Motivation: Homology search finds similar segments between two biological sequences, such as DNA or protein sequences. The introduction of optimal spaced seeds in PatternHunter has increased both the sensitivity and the speed of homology search, and it has been adopted by many alignment programs such as BLAST. With the further improvement provided by multiple spaced seeds in PatternHunterII, Smith–Waterman sensitivity is approached at BLASTn speed. However, computing optimal multiple spaced seeds was proved to be NP-hard and current heuristic algorithms are all very slow (exponential). Results: We give a simple algorithm which computes good multiple seeds in polynomial time. Due to a completely different approach, the difference with respect to the previous methods is dramatic. The multiple spaced seed of PatternHunterII, with 16 weight 11 seeds, was computed in 12 days. It takes us 17 s to find a better one. Our approach changes the way of looking at multiple spaced seeds. Contact:ilie@csd.uwo.caKeywords
This publication has 25 references indexed in Scilit:
- Quick, Practical Selection of Effective Seeds for Homology SearchJournal of Computational Biology, 2005
- Identification of common molecular subsequencesPublished by Elsevier ,2004
- tPatternHunter: gapped, fast and sensitive translated homology searchBioinformatics, 2004
- Sensitivity analysis and efficient method for identifying optimal spaced seedsJournal of Computer and System Sciences, 2004
- A Greedy Algorithm for Aligning DNA SequencesJournal of Computational Biology, 2000
- Gapped BLAST and PSI-BLAST: a new generation of protein database search programsNucleic Acids Research, 1997
- Multiple filtration and approximate pattern matchingAlgorithmica, 1995
- Basic Local Alignment Search ToolJournal of Molecular Biology, 1990
- Basic local alignment search toolJournal of Molecular Biology, 1990
- A general method applicable to the search for similarities in the amino acid sequence of two proteinsJournal of Molecular Biology, 1970