Crystallizing short-read assemblies around seeds
Open Access
- 30 January 2009
- journal article
- Published by Springer Nature in BMC Bioinformatics
- Vol. 10 (S1), S16
- https://doi.org/10.1186/1471-2105-10-s1-s16
Abstract
New short-read sequencing technologies produce enormous volumes of 25–30 base paired-end reads. These reads have vastly different characteristics from those produced by Sanger sequencing, and require different approaches from those taken by the previous generation of sequence assemblers. In this paper, we present a short-read de novo assembler particularly targeted at the new ABI SOLiD sequencing technology, along with what we believe to be the first de novo sequence assembly results on real data from the emerging SOLiD platform, introduced by Applied Biosystems. Our assembler SHORTY augments short paired reads using a trivially small number (5–10) of seeds of length 300–500 bp. These seeds enable us to produce significant assemblies using short-read coverage of no more than 100×, which can be obtained in a single run of these high-capacity sequencers. SHORTY exploits two ideas which we believe to be of interest to the short-read assembly community: (1) using single seed reads to crystallize assemblies, and (2) estimating intercontig distances accurately from multiple spanning paired-end reads. We demonstrate effective assemblies (N50 contig sizes ~40 kb) of three different bacterial species using simulated SOLiD data. Sequencing artifacts limit our performance on real data; however, our results on these data are substantially better than those achieved by competing assemblers.
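Two quantities in the abstract can be illustrated with a minimal sketch: the N50 contig size used to report assembly quality, and a gap estimate between two contigs averaged over several spanning paired-end reads. The code below is not SHORTY's implementation. The function names, the input layout, and the simple averaging scheme are assumptions made for illustration; the abstract does not describe how SHORTY computes its distance estimates.

```python
import math


def n50(contig_lengths):
    """Return the N50: the largest length L such that contigs of
    length >= L together cover at least half of the total assembly."""
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("contig_lengths must not be empty")


def estimate_gap(spanning_pairs, insert_mean, insert_sd):
    """Estimate the gap between contigs A and B (illustrative only).

    Each spanning pair is (a, b), a hypothetical layout where
      a = bases from the start of the read in A to the end of A,
      b = bases from the start of B to the end of the mate in B.
    One pair implies gap = insert_mean - a - b. Averaging n
    independent pairs shrinks the standard error to insert_sd / sqrt(n).
    """
    if not spanning_pairs:
        raise ValueError("need at least one spanning pair")
    gaps = [insert_mean - a - b for a, b in spanning_pairs]
    mean_gap = sum(gaps) / len(gaps)
    std_error = insert_sd / math.sqrt(len(gaps))
    return mean_gap, std_error


if __name__ == "__main__":
    print(n50([80, 70, 50, 40, 30, 20]))
    print(estimate_gap([(400, 350), (300, 460), (500, 240)], 1000, 90))
```

The point of the second function is only that the uncertainty of a distance estimate falls with the square root of the number of spanning pairs, which is why multiple paired-end reads across the same gap give a tighter estimate than any single pair.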