Microindel detection in short-read sequence data
Open Access
- 9 February 2010
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 26 (6), 722-729
- https://doi.org/10.1093/bioinformatics/btq027
Abstract
Motivation: Several recent studies have demonstrated the effectiveness of resequencing and single nucleotide variant (SNV) detection by deep short-read sequencing platforms. While several reliable algorithms are available for automated SNV detection, the automated detection of microindels in deep short-read data presents a new bioinformatics challenge.
Results: We systematically analyzed how the short-read mapping tools MAQ, Bowtie, Burrows-Wheeler alignment tool (BWA), Novoalign and RazerS perform on simulated datasets that contain indels, and evaluated how indels affect error rates in SNV detection. We implemented a simple algorithm to compute the equivalent indel region (eir), which can be used to process the alignments produced by the mapping tools in order to perform indel calling. Using simulated data that contain indels, we demonstrate that indel detection works well on short-read data: the detection rate for microindels (90%. Our study provides insights into systematic errors in SNV detection that is based on ungapped short sequence read alignments. Gapped alignments of short sequence reads can be used to reduce this error and to detect microindels in simulated short-read data. A comparison with microindels automatically identified on the ABI Sanger and Roche 454 platforms indicates that microindel detection from short sequence reads identifies both overlapping and distinct indels.
Contact: peter.krawitz@googlemail.com; peter.robinson@charite.de
Supplementary information: Supplementary data are available at Bioinformatics online.
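The abstract names the equivalent indel region (eir) but does not spell out how it is computed. As a rough illustration only, the Python sketch below assumes that the eir of a deletion is the span of reference positions over which the deletion can be shifted left or right without changing the resulting sequence. The function name, the 0-based inclusive coordinates and the restriction to deletions are assumptions made for this example; this is not the authors' implementation.

```python
def equivalent_deletion_region(ref: str, start: int, length: int) -> tuple[int, int]:
    """Hypothetical helper: span of reference positions (0-based, inclusive)
    over which a deletion of `length` bases starting at `start` can be
    placed while producing the same edited sequence."""
    if length <= 0 or start < 0 or start + length > len(ref):
        raise ValueError("deletion must lie inside the reference")

    # Shift left: moving the deletion one base to the left gives the same
    # result when the base entering the window on the left equals the base
    # leaving it on the right.
    left = start
    while left > 0 and ref[left - 1] == ref[left + length - 1]:
        left -= 1

    # Shift right: the mirror-image condition.
    right = start
    while right + length < len(ref) and ref[right] == ref[right + length]:
        right += 1

    # Region runs from the leftmost start to the rightmost deleted base.
    return left, right + length - 1


def apply_deletion(ref: str, start: int, length: int) -> str:
    """Return the reference with `length` bases removed at `start`."""
    return ref[:start] + ref[start + length:]


if __name__ == "__main__":
    # Deleting any single T from the homopolymer run is indistinguishable.
    print(equivalent_deletion_region("GATTTTCA", 3, 1))
```

Two indel calls reported at different coordinates by different mapping tools could then be treated as the same event when their regions coincide, which is the kind of post-processing of alignments the abstract alludes to.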