Fast scaffolding with small independent mixed integer programs
Open Access
- 13 October 2011
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 27 (23) , 3259-3265
- https://doi.org/10.1093/bioinformatics/btr562
Abstract
Motivation: Assembling genomes from short read data has become increasingly popular, but the problem remains computationally challenging, especially for larger genomes. We study the scaffolding phase of sequence assembly, in which preassembled contigs are ordered based on mate pair data.
Results: We present MIP Scaffolder, which divides the scaffolding problem into smaller subproblems and solves these with mixed integer programming. The scaffolding problem can be represented as a graph, and the biconnected components of this graph can be solved independently. We present a technique for restricting the size of these subproblems so that they can be solved accurately with mixed integer programming. We compare MIP Scaffolder to two state-of-the-art methods, SOPRA and SSPACE. MIP Scaffolder is fast and, on large genomes, produces scaffolds that are as good as or better than those of its competitors.
Availability: The source code of MIP Scaffolder is freely available at http://www.cs.helsinki.fi/u/lmsalmel/mip-scaffolder/.
Contact: leena.salmela@cs.helsinki.fi
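The decomposition mentioned in the Results can be illustrated with a small sketch. It assumes a scaffold graph whose vertices are contigs and whose edges are mate pair links, and it splits that graph into biconnected components with the standard edge-stack depth-first search (Hopcroft and Tarjan). The contig names, the edge list and the function name `biconnected_components` are hypothetical, and the abstract does not say that MIP Scaffolder computes its components this way; this is only one common way to obtain them.

```python
from collections import defaultdict


def biconnected_components(edges):
    """Return the biconnected components of an undirected graph.

    `edges` is an iterable of (contig, contig) pairs; each component is
    returned as a set of contig names. Recursive, so it is meant for
    small illustrative graphs only.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    disc = {}          # discovery index of each visited vertex
    low = {}           # lowest discovery index reachable from the subtree
    edge_stack = []    # edges seen since the last emitted component
    components = []
    counter = [0]

    def dfs(u, parent):
        disc[u] = low[u] = counter[0]
        counter[0] += 1
        for v in adj[u]:
            if v == parent:
                continue
            if v not in disc:
                edge_stack.append((u, v))
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] >= disc[u]:
                    # u separates v's subtree from the rest of the graph:
                    # pop every edge up to and including (u, v).
                    comp = set()
                    while True:
                        a, b = edge_stack.pop()
                        comp.update((a, b))
                        if (a, b) == (u, v):
                            break
                    components.append(comp)
            elif disc[v] < disc[u]:
                # Back edge to an ancestor.
                edge_stack.append((u, v))
                low[u] = min(low[u], disc[v])

    for start in list(adj):
        if start not in disc:
            dfs(start, None)
    return components


# Hypothetical scaffold graph: two triangles of contigs joined by one link.
links = [("A", "B"), ("B", "C"), ("C", "A"),
         ("C", "D"),
         ("D", "E"), ("E", "F"), ("F", "D")]
print(sorted(sorted(c) for c in biconnected_components(links)))
# → [['A', 'B', 'C'], ['C', 'D'], ['D', 'E', 'F']]
```

Each component shares at most one contig with any other component, which is why the ordering inside one component can be worked out without looking at the rest of the graph. In this sketch every component would then be passed to a separate optimization step.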
This publication has 15 references indexed in Scilit:
- High-quality draft assemblies of mammalian genomes from massively parallel sequence data. Proceedings of the National Academy of Sciences, 2010
- Scaffolding pre-assembled contigs using SSPACE. Bioinformatics, 2010
- SOPRA: Scaffolding algorithm for paired reads via statistical optimization. BMC Bioinformatics, 2010
- Unified View of Backward Backtracking in Short Read Mapping. Published by Springer Nature, 2010
- De novo assembly of human genomes with massively parallel short read sequencing. Genome Research, 2009
- The Sequence Alignment/Map format and SAMtools. Bioinformatics, 2009
- SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics, 2009
- Genome assembly reborn: recent computational challenges. Briefings in Bioinformatics, 2009
- ALLPATHS: De novo assembly of whole-genome shotgun microreads. Genome Research, 2008
- Combinatorial algorithms for DNA sequence assembly. Algorithmica, 1995