Opera: Reconstructing Optimal Genomic Scaffolds with High-Throughput Paired-End Sequences
- 1 November 2011
- journal article
- research article
- Published by Mary Ann Liebert Inc in Journal of Computational Biology
- Vol. 18 (11), 1681-1691
- https://doi.org/10.1089/cmb.2011.0170
Abstract
Scaffolding, the problem of ordering and orienting contigs, typically using paired-end reads, is a crucial step in the assembly of high-quality draft genomes. Even as sequencing technologies and mate-pair protocols have improved significantly, scaffolding programs still rely on heuristics, with no guarantees on the quality of the solution. In this work, we explored the feasibility of an exact solution for scaffolding and present a first tractable solution for this problem (Opera). We also describe a graph contraction procedure that allows the solution to scale to large scaffolding problems and demonstrate this by scaffolding several large real and synthetic datasets. In comparisons with existing scaffolders, Opera simultaneously produced longer and more accurate scaffolds demonstrating the utility of an exact approach. Opera also incorporates an exact quadratic programming formulation to precisely compute gap sizes (Availability: http://sourceforge.net/projects/operasf/).