SHRiMP: Accurate Mapping of Short Color-space Reads
Open Access
- 22 May 2009
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLoS Computational Biology
- Vol. 5 (5), e1000386
- https://doi.org/10.1371/journal.pcbi.1000386
Abstract
The development of Next Generation Sequencing technologies, capable of sequencing hundreds of millions of short reads (25–70 bp each) in a single run, is opening the door to population genomic studies of non-model species. In this paper we present SHRiMP, the SHort Read Mapping Package: a set of algorithms and methods to map short reads to a genome, even in the presence of a large amount of polymorphism. Our method is based upon a fast read mapping technique, separate thorough alignment methods for regular letter-space as well as AB SOLiD (color-space) reads, and a statistical model for false positive hits. We use SHRiMP to map reads from a newly sequenced Ciona savignyi individual to the reference genome. We demonstrate that SHRiMP can accurately map reads to this highly polymorphic genome, while confirming the high heterozygosity of C. savignyi in this second individual. SHRiMP is freely available at http://compbio.cs.toronto.edu/shrimp.

Next Generation Sequencing (NGS) technologies are revolutionizing the way biologists acquire and analyze genomic data. NGS machines, such as Illumina/Solexa and AB SOLiD, are able to sequence genomes roughly 200-fold more cheaply than previous methods. One of the main application areas of NGS technologies is the discovery of genomic variation within a given species. The first step in discovering this variation is the mapping of reads sequenced from a donor individual to a known (“reference”) genome. Differences between the reference and the reads indicate either polymorphisms or sequencing errors. Since the introduction of NGS technologies, many methods have been devised for mapping reads to reference genomes. However, these algorithms often sacrifice sensitivity for fast running time. While they are successful at mapping reads from organisms that exhibit low polymorphism rates, they do not perform well at mapping reads from highly polymorphic organisms.

We present a novel read mapping method, SHRiMP, that can handle much greater amounts of polymorphism. Using Ciona savignyi as our target organism, we demonstrate that our method discovers significantly more variation than other methods. Additionally, we develop color-space extensions to classical alignment algorithms, allowing us to map color-space, or “dibase”, reads generated by AB SOLiD sequencers.
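
To make the color-space ("dibase") representation concrete, the following is a minimal Python sketch of the widely documented SOLiD encoding, in which each color is determined by a pair of adjacent bases. It is not SHRiMP's implementation; the function names `encode` and `decode` and the example sequences are hypothetical and chosen only for illustration. The sketch shows why color-space reads need dedicated alignment methods: a true single-base polymorphism changes two adjacent colors, whereas a single miscalled color corrupts every base decoded after it.

```python
# Illustrative sketch of SOLiD-style color-space ("dibase") encoding.
# Assumption: the standard 2-bit mapping A=0, C=1, G=2, T=3, where the
# color of two adjacent bases is the XOR of their codes.
BASES = "ACGT"
CODE = {base: index for index, base in enumerate(BASES)}


def encode(seq):
    """Return one color (0-3) for each pair of adjacent bases in seq."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]


def decode(first_base, colors):
    """Rebuild the letter-space sequence from a known first base and colors."""
    out = [first_base]
    for color in colors:
        out.append(BASES[CODE[out[-1]] ^ color])
    return "".join(out)


reference = "ACGTTGCA"   # hypothetical reference fragment
snp_read = "ACGATGCA"    # same fragment with one substituted base (index 3)

ref_colors = encode(reference)
snp_colors = encode(snp_read)

# Color positions at which the SNP-carrying read differs from the reference.
snp_diff = [i for i, (r, s) in enumerate(zip(ref_colors, snp_colors)) if r != s]

# Simulate a sequencing error: miscall a single color in the reference read.
err_colors = list(ref_colors)
err_colors[3] ^= 2
err_decoded = decode(reference[0], err_colors)

print("reference colors:", ref_colors)
print("SNP read colors: ", snp_colors, "differs at", snp_diff)
print("decoded with one color error:", err_decoded)
```

Decoding naively into letter space and then aligning would turn one color error into a long run of apparent mismatches, which is the motivation for aligning directly in color space.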