Faster approximate string matching over compressed text

13 November 2002

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

p. 459-468
https://doi.org/10.1109/dcc.2001.917177

Abstract

Approximate string matching on compressed text was an open problem for almost a decade. The two existing solutions are very new. Despite that they represent important complexity breakthroughs, in most practical cases they are not useful, in the sense that they are slower than uncompressing the text and then searching the uncompressed text. We present a different approach, which reduces the problem to multipattern searching of pattern pieces plus local decompression and direct verification of candidate text areas. We show experimentally that this solution is 10-30 times faster than previous work and up to three times faster than the trivial approach of uncompressing and searching, thus becoming the first practical solution to the problem.

Keywords

This publication has 17 references indexed in Scilit:

Multiple pattern matching in LZW compressed text
Published by Institute of Electrical and Electronics Engineers (IEEE) ,2002
Bit-parallel approach to approximate string matching in compressed texts
Published by Institute of Electrical and Electronics Engineers (IEEE) ,2002
Fast and flexible word searching on compressed text
ACM Transactions on Information Systems, 2000
Approximate String Matching over Ziv—Lempel Compressed Text
Published by Springer Nature ,2000
Very fast and simple approximate string matching
Information Processing Letters, 1999
A text compression scheme that allows fast searching directly in the compressed file
ACM Transactions on Information Systems, 1997
Combinatorial Pattern Matching
Published by Springer Nature ,1997
A fast string searching algorithm
Communications of the ACM, 1977
Fast Pattern Matching in Strings
SIAM Journal on Computing, 1977
Efficient string matching
Communications of the ACM, 1975