Pebble and Rock Band: Heuristic Resolution of Repeats and Scaffolding in the Velvet Short-Read de Novo Assembler
Open Access
- 22 December 2009
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLOS ONE
- Vol. 4 (12) , e8407
- https://doi.org/10.1371/journal.pone.0008407
Abstract
Despite the short length of their reads, micro-read sequencing technologies have shown their usefulness for de novo sequencing. However, especially in eukaryotic genomes, complex repeat patterns are an obstacle to large assemblies. We present a novel heuristic algorithm, Pebble, which uses paired-end read information to resolve repeats and scaffold contigs to produce large-scale assemblies. In simulations, we can achieve weighted median scaffold lengths (N50) of above 1 Mbp in Bacteria and above 100 kbp in more complex organisms. Using real datasets we obtained a 96 kbp N50 in Pseudomonas syringae and a unique 147 kbp scaffold of a ferret BAC clone. We also present an efficient algorithm called Rock Band for the resolution of repeats in the case of mixed length assemblies, where different sequencing platforms are combined to obtain a cost-effective assembly. These algorithms extend the utility of short-read-only assemblies into large complex genomes. They have been implemented and made available within the open-source Velvet short-read de novo assembler.
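The abstract reports results as N50, a weighted median of scaffold lengths. As a minimal sketch of the conventional definition (the length L such that scaffolds of length L or longer contain at least half of the total assembled bases), the calculation might look like the following. The function name `n50` and the example lengths are illustrative assumptions, not taken from the paper or from Velvet, and the exact convention Velvet uses when reporting N50 may differ.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of scaffold (or contig) lengths.

    Hypothetical helper for illustration: it uses the conventional
    definition, i.e. the largest length L such that scaffolds of
    length >= L together cover at least half of the total length.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("n50() requires at least one length")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the running sum always reaches the total")


# Made-up scaffold lengths (in bp), for illustration only.
example_lengths = [80, 70, 50, 40, 30, 20]
example_n50 = n50(example_lengths)
```

With these made-up lengths the total is 290, and the two longest scaffolds (80 + 70 = 150) already cover more than half of it, so the N50 is 70. Because the statistic weights each scaffold by its length, a few long scaffolds raise it far more than many short ones, which is why it is commonly used to summarise assembly contiguity.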