The practicality of text signatures for accelerating string searching
- 1 January 1982
- journal article
- research article
- Published by Wiley in Software: Practice and Experience
- Vol. 12 (1) , 35-44
- https://doi.org/10.1002/spe.4380120104
Abstract
This paper studies the use of text signatures in string searching. Text signatures are a coded representation of a unit of text formed by hashing substrings into bit positions which are, in turn, set to one. Then instead of searching an entire line of text exhaustively, the text signature may be examined first to determine if complete processing is warranted. A hashing function which minimizes the number of collisions in a signature is described. Experimental results for two signature lengths with both a text file and a program file are given. Analyses of the results and the utility and application of the method conclude the discussion.Keywords
This publication has 5 references indexed in Scilit:
- Partial-match retrieval using indexed descriptor filesCommunications of the ACM, 1980
- Practical fast searching in stringsSoftware: Practice and Experience, 1980
- A fast string searching algorithmCommunications of the ACM, 1977
- Fast Pattern Matching in StringsSIAM Journal on Computing, 1977
- Implementation of the substring test by hashingCommunications of the ACM, 1971