Deterministic Sampling–A New Technique for Fast Pattern Matching

1 February 1991

journal article
research article
Published by Society for Industrial & Applied Mathematics (SIAM) in SIAM Journal on Computing

Vol. 20 (1) , 22-40
https://doi.org/10.1137/0220002

Abstract

Consider the following three-stage strategy for recognizing patterns in larger scenes: Mimic randomization deterministically. Sample several positions of the pattern. Search for sample. Find all occurrences of the sample in the scene. Verify. For each occurrence of the sample, verify occurrence of the full pattern. This strategy has led to the core of the new idea given in this paper. Consider the string matching problem. Given the pattern, a sample of its positions is carefully selected whose size is at most logarithmic (the deterministic sample). Then, the sample is searched for. For nonperoidic patterns, the sample has the following perhaps surprising property. It is possible to disqualify all occurrences of the sample positions but one, within each "neighborhood" of locations in the text, without any further comparisons of characters. This provides sparse verification. This approach enables the text analysis (stages "search for sample" and "verify") to be performed in O(log* n) time and optimal speedup on a PRAM. This improves on the previous fastest optimal speedup result. It also leads to a new serial algorithm for string matching that runs in linear time including preprocessing. The approach is expected to be applicable for pragmatic pattern recognition problems. In some sense the algorithms are based on degenerate forms of computation, such as AND and OR of a large number of bits. However, traditional machine designs do not take advantage of such degeneracies, and usual complexity measures do not even enable them to be reflected. This leads to the conclusion of the paper with some speculative thoughts on desirable capabilities that would enhance computing machinery for some pattern recognition applications.

Keywords

This publication has 25 references indexed in Scilit:

An Optimal $O(\log\log n)$ Time Parallel String Matching Algorithm
SIAM Journal on Computing, 1990
Faster optimal parallel prefix sums and list ranking
Information and Computation, 1989
Relations between Concurrent-Write Models of Parallel Computation
SIAM Journal on Computing, 1988
Optimal parallel algorithms for string matching
Information and Control, 1985
Open Problems in Stringology
Published by Springer Nature ,1985
Time-space-optimal string matching
Journal of Computer and System Sciences, 1983
Fast probabilistic algorithms for hamiltonian circuits and matchings
Journal of Computer and System Sciences, 1979
Two theorems on random polynomial time
Published by Institute of Electrical and Electronics Engineers (IEEE) ,1978
A fast string searching algorithm
Communications of the ACM, 1977
Minimum-scan pattern recognition
IEEE Transactions on Information Theory, 1959