FLASH: fast length adjustment of short reads to improve genome assemblies
Open Access
- 7 September 2011
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 27 (21), pp. 2957-2963
- https://doi.org/10.1093/bioinformatics/btr507
Abstract
Motivation: Next-generation sequencing technologies generate very large numbers of short reads. Even with very deep genome coverage, short read lengths cause problems in de novo assemblies. The use of paired-end libraries with a fragment size shorter than twice the read length provides an opportunity to generate much longer reads by overlapping and merging read pairs before assembling a genome.

Results: We present FLASH, a fast computational tool to extend the length of short reads by overlapping paired-end reads from fragment libraries that are sufficiently short. We tested the correctness of the tool on one million simulated read pairs, and we then applied it as a pre-processor for genome assemblies of Illumina reads from the bacterium Staphylococcus aureus and human chromosome 14. FLASH correctly extended and merged reads >99% of the time on simulated reads with an error rate of […]

Availability and Implementation: The FLASH system is implemented in C and is freely available as open-source code at http://www.cbcb.umd.edu/software/flash.

Contact: t.magoc@gmail.com
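The abstract's key condition is that a read pair can be merged only when the fragment is shorter than twice the read length, in which case two reads of length L from a fragment of length F share 2L - F bases. The sketch below illustrates one plausible overlap-and-merge step in Python. It is not FLASH's implementation (FLASH is written in C). The function names, the `min_overlap` and `max_mismatch_ratio` parameters and their defaults, the "lowest mismatch ratio wins" scoring, and the rule of keeping the higher-quality base at mismatching positions are all assumptions made for illustration.

```python
"""Minimal sketch of overlapping and merging one read pair (illustrative only).

Not FLASH's algorithm or code: names, parameters and defaults are hypothetical.
"""

from typing import Optional, Tuple

# Complement table for uppercase DNA; N maps to itself.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def expected_overlap(read_len: int, fragment_len: int) -> int:
    """Bases shared by two reads of length read_len taken from opposite ends
    of a fragment; positive only when fragment_len < 2 * read_len."""
    return 2 * read_len - fragment_len


def merge_pair(
    read1: str,
    qual1: str,
    read2: str,
    qual2: str,
    min_overlap: int = 10,             # hypothetical default
    max_mismatch_ratio: float = 0.25,  # hypothetical default
) -> Optional[Tuple[str, str]]:
    """Merge read1 with the reverse complement of read2 if they overlap.

    Returns (sequence, quality) of the merged read, or None if no overlap of
    at least min_overlap bases has a mismatch ratio <= max_mismatch_ratio.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    # Bring read2 onto read1's strand; its qualities are simply reversed.
    r2 = reverse_complement(read2)
    q2 = qual2[::-1]

    best: Optional[Tuple[float, int]] = None  # (mismatch ratio, overlap length)
    longest = min(len(read1), len(r2))
    # Scan from the longest overlap down so that, among equal ratios,
    # the longer overlap is kept.
    for ov in range(longest, min_overlap - 1, -1):
        suffix = read1[len(read1) - ov:]  # end of read1
        prefix = r2[:ov]                  # start of reverse-complemented read2
        mismatches = sum(1 for a, b in zip(suffix, prefix) if a != b)
        ratio = mismatches / ov
        if ratio <= max_mismatch_ratio and (best is None or ratio < best[0]):
            best = (ratio, ov)

    if best is None:
        return None

    ov = best[1]
    start = len(read1) - ov
    seq = list(read1[:start])
    qual = list(qual1[:start])
    # Inside the overlap, keep whichever base has the higher quality
    # (Phred+33 characters sort in the same order as their scores).
    for i in range(ov):
        if qual1[start + i] >= q2[i]:
            seq.append(read1[start + i])
            qual.append(qual1[start + i])
        else:
            seq.append(r2[i])
            qual.append(q2[i])
    # Append the part of read2 that extends past the end of read1.
    seq.extend(r2[ov:])
    qual.extend(q2[ov:])
    return "".join(seq), "".join(qual)


if __name__ == "__main__":
    fragment = "ACGTACCTGAGTTCAGGCATTGCAAGCTTA"  # 30 bp
    read1 = fragment[:20]
    read2 = reverse_complement(fragment[-20:])
    print(expected_overlap(20, len(fragment)))  # 10
    merged = merge_pair(read1, "I" * 20, read2, "I" * 20)
    print(merged is not None and merged[0] == fragment)  # True
```

For example, with 20-bp reads from a 30-bp fragment, `expected_overlap(20, 30)` gives 10, and `merge_pair` reconstructs the full 30-bp fragment as a single longer read. The sketch assumes Phred+33 quality strings and does not handle fragments shorter than a single read, where each read runs past the other's start.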