De novo fragment assembly with short mate-paired reads: Does the read length matter?
- 3 December 2008
- journal article
- research article
- Published by Cold Spring Harbor Laboratory in Genome Research
- Vol. 19 (2) , 336-346
- https://doi.org/10.1101/gr.079053.108
Abstract
Increasing read length is currently viewed as the crucial condition for fragment assembly with next-generation sequencing technologies. However, introducing mate-paired reads (separated by a gap of length, GapLength) opens a possibility to transform short mate-pairs into long mate-reads of length ≈ GapLength, and thus raises the question as to whether the read length (as opposed to GapLength) even matters. We describe a new tool, EULER-USR, for assembling mate-paired short reads and use it to analyze the question of whether the read length matters. We further complement the ongoing experimental efforts to maximize read length by a new computational approach for increasing the effective read length. While the common practice is to trim the error-prone tails of the reads, we present an approach that substitutes trimming with error correction using repeat graphs. An important and counterintuitive implication of this result is that one may extend sequencing reactions that degrade with length “past their prime” to where the error rate grows above what is normally acceptable for fragment assembly.
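The idea of turning a mate-pair into a mate-read can be illustrated with a toy sketch. This is not EULER-USR's algorithm, which works on repeat graphs and handles sequencing errors. The sketch assumes error-free reads and a plain de Bruijn graph with a hypothetical k-mer size `k`. It also uses hypothetical bounds `min_len`/`max_len` on the total mate-read length, which stand in for GapLength plus the two read lengths and a tolerance. A mate-pair is converted into a mate-read only when exactly one graph walk of admissible length joins the two reads.

```python
from collections import defaultdict


def build_debruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read k-mer.
    Toy graph: assumes error-free reads (no error correction is done here)."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def mate_pair_to_mate_read(left, right, graph, k, min_len, max_len):
    """Return the unique string that starts with `left`, ends with `right`,
    is spelled by a walk in `graph`, and has length in [min_len, max_len].
    Returns None if no such string exists or if more than one exists.
    `min_len`/`max_len` are hypothetical stand-ins for GapLength +/- tolerance
    plus the lengths of the two reads."""
    found = set()
    stack = [left]  # depth-first search over partial mate-reads
    while stack:
        seq = stack.pop()
        if min_len <= len(seq) <= max_len and seq.endswith(right):
            found.add(seq)
        if len(seq) >= max_len:
            continue  # any further extension would exceed the length bound
        # Extend by one base along every outgoing edge of the current (k-1)-mer.
        for nxt in graph.get(seq[-(k - 1):], ()):
            stack.append(seq + nxt[-1])
    return found.pop() if len(found) == 1 else None


# Toy genome in which the 3-mer "CGT" occurs twice, so the graph branches.
genome = "ATGGCGTGCAATCCGTAGC"
reads = [genome[i:i + 8] for i in range(len(genome) - 7)]
graph = build_debruijn(reads, 4)
print(mate_pair_to_mate_read("ATGGC", "GTAGC", graph, 4, 17, 21))
```

Two walks join the mates here. One is the 10-base shortcut through the repeated `CGT` and the other is the full 19-base genome. The length constraint alone discards the shortcut. With a looser range such as `min_len=8`, both walks are admissible and the sketch declines to merge the pair. This mirrors, in a much simplified form, why the gap length rather than the read length can be what resolves repeats.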