Deterministic Sampling–A New Technique for Fast Pattern Matching
- 1 February 1991
- journal article
- research article
- Published by Society for Industrial & Applied Mathematics (SIAM) in SIAM Journal on Computing
- Vol. 20 (1) , 22-40
- https://doi.org/10.1137/0220002
Abstract
Consider the following three-stage strategy for recognizing patterns in larger scenes: Mimic randomization deterministically. Sample several positions of the pattern. Search for sample. Find all occurrences of the sample in the scene. Verify. For each occurrence of the sample, verify occurrence of the full pattern. This strategy has led to the core of the new idea given in this paper. Consider the string matching problem. Given the pattern, a sample of its positions is carefully selected whose size is at most logarithmic (the deterministic sample). Then, the sample is searched for. For nonperoidic patterns, the sample has the following perhaps surprising property. It is possible to disqualify all occurrences of the sample positions but one, within each "neighborhood" of locations in the text, without any further comparisons of characters. This provides sparse verification. This approach enables the text analysis (stages "search for sample" and "verify") to be performed in O(log* n) time and optimal speedup on a PRAM. This improves on the previous fastest optimal speedup result. It also leads to a new serial algorithm for string matching that runs in linear time including preprocessing. The approach is expected to be applicable for pragmatic pattern recognition problems. In some sense the algorithms are based on degenerate forms of computation, such as AND and OR of a large number of bits. However, traditional machine designs do not take advantage of such degeneracies, and usual complexity measures do not even enable them to be reflected. This leads to the conclusion of the paper with some speculative thoughts on desirable capabilities that would enhance computing machinery for some pattern recognition applications.Keywords
This publication has 25 references indexed in Scilit:
- An Optimal $O(\log\log n)$ Time Parallel String Matching AlgorithmSIAM Journal on Computing, 1990
- Faster optimal parallel prefix sums and list rankingInformation and Computation, 1989
- Relations between Concurrent-Write Models of Parallel ComputationSIAM Journal on Computing, 1988
- Optimal parallel algorithms for string matchingInformation and Control, 1985
- Open Problems in StringologyPublished by Springer Nature ,1985
- Time-space-optimal string matchingJournal of Computer and System Sciences, 1983
- Fast probabilistic algorithms for hamiltonian circuits and matchingsJournal of Computer and System Sciences, 1979
- Two theorems on random polynomial timePublished by Institute of Electrical and Electronics Engineers (IEEE) ,1978
- A fast string searching algorithmCommunications of the ACM, 1977
- Minimum-scan pattern recognitionIEEE Transactions on Information Theory, 1959