Toward Simplifying and Accurately Formulating Fragment Assembly
- 1 January 1995
- journal article
- research article
- Published by Mary Ann Liebert Inc in Journal of Computational Biology
- Vol. 2 (2) , 275-290
- https://doi.org/10.1089/cmb.1995.2.275
Abstract
The fragment assembly problem is that of reconstructing a DNA sequence from a collection of randomly sampled fragments. Traditionally, the objective of this problem has been to produce the shortest string that contains all the fragments as substrings, but in the case of repetitive target sequences this objective produces answers that are overcompressed. In this paper, the problem is reformulated as one of finding a maximum-likelihood reconstruction with respect to the two-sided Kolmogorov–Smirnov statistic, and it is argued that this is a better formulation of the problem. Next the fragment assembly problem is recast in graph-theoretic terms as one of finding a noncyclic subgraph with certain properties and the objectives of being shortest or maximally likely are also recast in this framework. Finally, a series of graph reduction transformations are given that dramatically reduce the size of the graph to be explored in practical instances of the problem. This reduction is very important as the underlying problems are NP-hard. In practice, the transformed problems are so small that simple branch-and-bound algorithms successfully solve them, thus permitting auxiliary experimental information to be taken into account in the form of overlap, orientation, and distance constraints. Key words: shotgun DNA sequencing; fragment assembly; sequence rconstructionKeywords
This publication has 13 references indexed in Scilit:
- Combinatorial algorithms for DNA sequence assemblyAlgorithmica, 1995
- A contig assembly program based on sensitive detection of fragment overlapsGenomics, 1992
- Roles of repetitive sequencesComputers & Chemistry, 1992
- A greedy approximation algorithm for constructing shortest common superstringsTheoretical Computer Science, 1988
- Genomic mapping by fingerprinting random clones: A mathematical analysisGenomics, 1988
- SEQAID: a DNA sequence assembling program based on a mathematical modelNucleic Acids Research, 1984
- The theory and computation of evolutionary distances: Pattern recognitionJournal of Algorithms, 1980
- A new method for sequencing DNA.Proceedings of the National Academy of Sciences, 1977
- Numerical Tabulation of the Distribution of Kolmogorov's Statistic for Finite Sample SizeJournal of the American Statistical Association, 1952
- Some Properties of ConversionTransactions of the American Mathematical Society, 1936