Hobbes: optimized gram-based methods for efficient read alignment
Open Access
- 22 December 2011
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 40 (6) , e41
- https://doi.org/10.1093/nar/gkr1246
Abstract
Recent advances in sequencing technology have enabled the rapid generation of billions of bases at relatively low cost. A crucial first step in many sequencing applications is to map the resulting reads to a reference genome. However, when the reference genome is large, finding accurate mappings poses a significant computational challenge, both because of the sheer number of reads and because many reads map to the reference sequence approximately but not exactly. We introduce Hobbes, a new gram-based program for aligning short reads, supporting Hamming and edit distance. Hobbes implements two novel techniques, which yield substantial performance improvements: an optimized gram-selection procedure for reads, and a cache-efficient filter for pruning candidate mappings. We systematically tested the performance of Hobbes on both real and simulated data with read lengths varying from 35 to 100 bp, and compared its performance with several state-of-the-art read-mapping programs, including Bowtie, BWA, mrsFast and RazerS. Hobbes is faster than all other read-mapping programs we have tested while maintaining high mapping quality. Hobbes is about five times faster than Bowtie and about 2-10 times faster than BWA, depending on read length and error rate, when asked to find all mapping locations of a read in the human genome within a given Hamming or edit distance, respectively. Hobbes supports the SAM output format and is publicly available at http://hobbes.ics.uci.edu.
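The general gram-based filter-and-verify idea that the abstract refers to can be illustrated with a minimal sketch for Hamming distance. This is not Hobbes's optimized gram-selection procedure or its cache-efficient filter, which the abstract does not describe in detail. It relies only on the pigeonhole principle: if a read matches a reference location with at most k mismatches, then at least one of any k + 1 non-overlapping grams of the read matches exactly at the corresponding offset. The function names, the fixed gram length `q`, and the choice of the first k + 1 consecutive grams are assumptions made for illustration.

```python
from collections import defaultdict


def build_gram_index(reference, q):
    """Map every length-q substring (gram) of the reference to its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - q + 1):
        index[reference[pos:pos + q]].append(pos)
    return index


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def map_read(read, reference, index, q, k):
    """Return all reference positions where the read aligns with at most k mismatches.

    Assumption for illustration: the first k + 1 non-overlapping grams of the
    read are used as the filter; a real mapper would pick grams more carefully.
    """
    if len(read) < q * (k + 1):
        raise ValueError("read too short for k + 1 non-overlapping grams")

    # Filter step: any true mapping must share at least one exact gram.
    candidates = set()
    for i in range(k + 1):
        offset = i * q
        gram = read[offset:offset + q]
        for pos in index.get(gram, ()):
            start = pos - offset
            if 0 <= start <= len(reference) - len(read):
                candidates.add(start)

    # Verification step: check each surviving candidate in full.
    return sorted(
        s for s in candidates
        if hamming(read, reference[s:s + len(read)]) <= k
    )


reference = "ACGTACGTTAGCCGATAGGCTA"
index = build_gram_index(reference, q=4)
hits = map_read("TAGCAGAT", reference, index, q=4, k=1)  # one mismatch vs. position 8
```

The filter step only touches index lists for k + 1 grams, so the cost of the expensive verification step depends on how frequent the chosen grams are in the reference, which is why gram selection matters for performance.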