A method for speeding up text retrieval

Abstract

A simple method is presented for speeding up the term detection phase of retrieval from a full-text document database. The method uses a surrogate database in which each document is represented as a sequence of hash signatures, one for each term occurring in the original document. The surrogate database is expected to be only 5-10% of the size of the original database. Most of the work involved in term detection can be done on this smaller database, either by scanning it or by inverting it and using it as an index. Term detection is sped up significantly, at the cost of increased processing when a document is filed.
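The following is a minimal Python sketch of the idea. It rests on several assumptions that the abstract does not state. Terms are taken to be lowercase alphanumeric runs. Each signature is a 16-bit BLAKE2b hash. A query matches the documents that contain all of its terms. Because distinct terms can share a signature, hits from the surrogate are treated only as candidates and are confirmed against the original text; this is one plausible reading of the work that remains after the surrogate has been used. The names `make_surrogate`, `scan`, `invert`, `lookup` and `verify` are hypothetical, chosen for illustration, and are not taken from the paper.

```python
import hashlib
import re
from array import array
from collections import defaultdict

# Assumption: 16-bit signatures; the abstract does not specify a width.
SIG_BYTES = 2


def tokenize(text: str) -> list[str]:
    """Split text into lowercase alphanumeric terms (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def signature(term: str) -> int:
    """Hash a term to a fixed-width signature (BLAKE2b chosen for illustration)."""
    digest = hashlib.blake2b(term.encode("utf-8"), digest_size=SIG_BYTES).digest()
    return int.from_bytes(digest, "big")


def make_surrogate(text: str) -> array:
    """At filing time, store the document as its sequence of term signatures.

    Typecode 'H' holds unsigned 16-bit values on common platforms, matching SIG_BYTES = 2.
    """
    return array("H", (signature(t) for t in tokenize(text)))


def scan(surrogates: dict[int, array], query: str) -> set[int]:
    """Scan every surrogate and keep documents containing all query signatures."""
    wanted = {signature(t) for t in tokenize(query)}
    if not wanted:
        return set()
    return {d for d, sigs in surrogates.items() if wanted <= set(sigs)}


def invert(surrogates: dict[int, array]) -> dict[int, set[int]]:
    """Invert the surrogates into an index mapping each signature to document ids."""
    index: dict[int, set[int]] = defaultdict(set)
    for d, sigs in surrogates.items():
        for s in sigs:
            index[s].add(d)
    return index


def lookup(index: dict[int, set[int]], query: str) -> set[int]:
    """Intersect the posting sets of the query signatures."""
    postings = [index.get(signature(t), set()) for t in tokenize(query)]
    return set.intersection(*postings) if postings else set()


def verify(documents: dict[int, str], candidates: set[int], query: str) -> set[int]:
    """Distinct terms may share a signature, so confirm candidates on the full text."""
    terms = set(tokenize(query))
    return {d for d in candidates if terms <= set(tokenize(documents[d]))}


docs = {
    1: "Text retrieval from a full-text document database",
    2: "Hash signatures speed up term detection",
    3: "Filing a document costs extra processing",
}
surrogates = {d: make_surrogate(text) for d, text in docs.items()}
index = invert(surrogates)

query = "document database"
print(verify(docs, scan(surrogates, query), query))  # {1}
print(verify(docs, lookup(index, query), query))     # {1}
```

Document 3 contains "document" but not "database", so it is excluded. Both paths only narrow the candidate set. Under these assumptions, the original text is read only for the few documents that survive, and that is where the saving would come from. The inverted form moves more work to filing time in exchange for faster lookups.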
