Linear approximation of shortest superstrings
- 1 July 1994
- journal article
- Published by Association for Computing Machinery (ACM) in Journal of the ACM
- Vol. 41 (4) , 630-647
- https://doi.org/10.1145/179812.179818
Abstract
We consider the following problem: given a collection of strings s 1 ,…, s m , find the shortest string s such that each s i appears as a substring (a consecutive block) of s . Although this problem is known to be NP-hard, a simple greedy procedure appears to do quite well and is routinely used in DNA sequencing and data compression practice, namely: repeatedly merge the pair of (distinct) strings with maximum overlap until only one string remains. Let n denote the length of the optimal superstring. A common conjecture states that the above greedy procedure produces a superstring of length O(n) (in fact, 2 n ), yet the only previous nontrivial bound known for any polynomial-time algorithm is a recent O(n log n ) result. We show that the greedy algorithm does in fact achieve a constant factor approximation, proving an upper bound of 4 n . Furthermore, we present a simple modified version of the greedy algorithm that we show produces a superstring of length at most 3 n . We also show the superstring problem to be MAXSNP-hard, which implies that a polynomial-time approximation scheme for this problem is unlikely.Keywords
This publication has 6 references indexed in Scilit:
- The Traveling Salesman Problem with Distances One and TwoMathematics of Operations Research, 1993
- Approximation algorithms for the shortest common superstring problemInformation and Computation, 1989
- A greedy approximation algorithm for constructing shortest common superstringsTheoretical Computer Science, 1988
- A theory of the learnableCommunications of the ACM, 1984
- On finding minimal length superstringsJournal of Computer and System Sciences, 1980
- Uniqueness theorems for periodic functionsProceedings of the American Mathematical Society, 1965