miBLAST: scalable evaluation of a batch of nucleotide sequence queries with BLAST
Open Access
- 21 July 2005
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 33 (13), 4335-4344
- https://doi.org/10.1093/nar/gki739
Abstract
A common task in many modern bioinformatics applications is to match a set of nucleotide query sequences against a large sequence dataset. Existing tools, such as BLAST, are designed to evaluate a single query at a time and can be unacceptably slow when the number of sequences in the query set is large. In this paper, we present a new algorithm, called miBLAST, that evaluates such batch workloads efficiently. At its core, miBLAST employs q-gram filtering and an index join to detect similarity between the query sequences and the database sequences efficiently. This set-oriented technique, which indexes both the query and the database sets, results in substantial performance improvements over existing methods. Our results show that miBLAST is significantly faster than BLAST in many cases. For example, miBLAST aligned 247 965 oligonucleotide sequences in the Affymetrix probe set against the Human UniGene in 1.26 days, compared with 27.27 days with BLAST (an improvement by a factor of 22). The relative performance of miBLAST increases for larger word sizes; however, it decreases for longer queries. miBLAST employs the familiar BLAST statistical model and output format, guaranteeing the same accuracy as BLAST and facilitating a seamless transition for existing BLAST users.
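The set-oriented idea in the abstract (index both the query set and the database by q-grams, then join the two indexes) can be illustrated with a minimal Python sketch. This is not the miBLAST implementation: the function names `qgram_index` and `candidate_pairs`, the toy sequences, and the choice of exact shared q-grams as the filter condition are all assumptions made for illustration. In a real pipeline, the surviving candidate pairs would still go through BLAST-style extension and scoring.

```python
from collections import defaultdict


def qgram_index(seqs, q):
    """Map each q-gram to the set of ids of sequences that contain it.

    Hypothetical helper for illustration; not part of miBLAST or BLAST.
    """
    index = defaultdict(set)
    for seq_id, seq in seqs.items():
        for i in range(len(seq) - q + 1):
            index[seq[i:i + q]].add(seq_id)
    return index


def candidate_pairs(queries, database, q):
    """Join the query index and the database index on shared q-grams.

    Returns the set of (query_id, db_id) pairs that share at least one
    q-gram. Pairs that share none are filtered out without ever being
    compared directly.
    """
    query_index = qgram_index(queries, q)
    db_index = qgram_index(database, q)
    pairs = set()
    # Only q-grams present in both indexes can produce a candidate pair.
    for word in query_index.keys() & db_index.keys():
        for query_id in query_index[word]:
            for db_id in db_index[word]:
                pairs.add((query_id, db_id))
    return pairs


# Toy data (assumed, not from the paper).
queries = {"q1": "ACGTAC", "q2": "TTTTTT"}
database = {"d1": "GGACGTACGG", "d2": "CCCCCCCC"}

pairs = candidate_pairs(queries, database, q=4)

# Speedup implied by the abstract's timings: 27.27 days vs 1.26 days.
speedup = 27.27 / 1.26
```

With these toy inputs only `("q1", "d1")` survives the filter, because `q2` shares no 4-gram with either database sequence. The timing ratio evaluates to about 21.6, which the abstract rounds to a factor of 22.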