Bit-parallel approach to approximate string matching in compressed texts
- 7 November 2002
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
Abstract
Addresses the problem of approximate string matching on compressed text. We consider this problem for a text string described in terms of a collage system, which is a formal system proposed by T. Kida et al. (1999) that captures various dictionary-based compression methods. We present an algorithm that exploits bit-parallelism, assuming that our problem fits in a single machine word, e.g. (m-k)(k+1)/spl les/L, where m is the pattern length, k is the number of allowed errors and L is the length, in bits, of the machine word. For a class of simple collage systems, the algorithm runs in O(k/sup 2/(/spl par//spl Dscr//spl par/+|/spl Sscr/|)+km) time using O(k/sup 2//spl par//spl Dscr//spl par/) space, where /spl par//spl Dscr//spl par/ is the size of dictionary /spl Dscr/ and |/spl Sscr/| is the number of tokens in the sequence /spl Sscr/. The LZ78 (Lempel-Ziv, 1978) and the LZW (Lempel-Ziv-Welch, 1984) compression methods are covered by this class. Since we can regard n=/spl par//spl Dscr//spl par/+|/spl Sscr/| as the compressed length, the time and space complexities are O(k/sup 2/n+km) and O(k/sup 2/n), respectively. For general k and m, they become O(k/sup 3/mn/L+km) and O(k/sup 3/mn/L). Thus, our algorithm is competitive to the algorithm proposed by J. Ka/spl uml/rkka/spl uml/inen, et al. (2000), which runs in O(km) time using O(kmn) space, when k=O(/spl radic/L).Keywords
This publication has 13 references indexed in Scilit:
- A unifying framework for compressed pattern matchingPublished by Institute of Electrical and Electronics Engineers (IEEE) ,2003
- Approximate String Matching over Ziv—Lempel Compressed TextPublished by Springer Nature ,2000
- A General Practical Approach to Pattern Matching over Ziv-Lempel Compressed TextPublished by Springer Nature ,1999
- Offline dictionary-based compressionPublished by Institute of Electrical and Electronics Engineers (IEEE) ,1999
- Fast text searchingCommunications of the ACM, 1992
- Finding approximate patterns in stringsJournal of Algorithms, 1985
- A Technique for High-Performance Data CompressionComputer, 1984
- The theory and computation of evolutionary distances: Pattern recognitionJournal of Algorithms, 1980
- Compression of individual sequences via variable-rate codingIEEE Transactions on Information Theory, 1978
- A universal algorithm for sequential data compressionIEEE Transactions on Information Theory, 1977