Abstract
A simple method for speeding up the term detection phase of retrieval from a full-text document database is presented. The method uses a surrogate database in which each document is represented as a sequence of hash signatures, each signature representing a term occurring in the original document. The surrogate database is expected to be only 5-10% of the size of the original database, and most of the work involved in term detection can be done on this smaller database, which can either be scanned or inverted and used as an index. The term detection phase is sped up significantly, at the cost of increased processing when a document is filed.
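A minimal sketch of the idea, assuming several details the abstract does not specify: a 16-bit signature width, a truncated CRC-32 as the hash function, a simple lowercase alphanumeric tokenizer, and a final verification pass against the original text to discard hash collisions. All function names are hypothetical. The sketch shows both uses of the surrogate database named in the abstract: scanning the signature sequences directly, and inverting them into a signature index.

```python
import re
import zlib
from collections import defaultdict

# Hypothetical parameter (not from the source): width of each hash signature in bits.
SIGNATURE_BITS = 16


def tokenize(text: str) -> list[str]:
    # Assumed tokenizer: lowercase runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def signature(term: str) -> int:
    # Map a term to a fixed-width signature by truncating CRC-32 (an assumption).
    return zlib.crc32(term.lower().encode("utf-8")) & ((1 << SIGNATURE_BITS) - 1)


def build_surrogate(text: str) -> list[int]:
    # Filing step: each term in the document becomes one signature, in order.
    return [signature(t) for t in tokenize(text)]


def scan_candidates(surrogates: dict[str, list[int]], term: str) -> list[str]:
    # Option 1: scan the small surrogate database sequentially.
    sig = signature(term)
    return [doc_id for doc_id, sigs in surrogates.items() if sig in sigs]


def invert(surrogates: dict[str, list[int]]) -> dict[int, set[str]]:
    # Option 2: invert the surrogates into a signature -> document-id index.
    index: dict[int, set[str]] = defaultdict(set)
    for doc_id, sigs in surrogates.items():
        for s in sigs:
            index[s].add(doc_id)
    return index


def verify(docs: dict[str, str], candidates, term: str) -> list[str]:
    # A signature hit may be a collision, so confirm the term in the original text.
    t = term.lower()
    return sorted(d for d in candidates if t in tokenize(docs[d]))


def detect_term(docs: dict[str, str], surrogates: dict[str, list[int]], term: str) -> list[str]:
    return verify(docs, scan_candidates(surrogates, term), term)


def detect_term_indexed(docs: dict[str, str], index: dict[int, set[str]], term: str) -> list[str]:
    return verify(docs, index.get(signature(term), set()), term)


if __name__ == "__main__":
    docs = {
        "d1": "Hash signatures speed up term detection.",
        "d2": "Full-text databases store documents.",
    }
    surrogates = {doc_id: build_surrogate(text) for doc_id, text in docs.items()}
    print(detect_term(docs, surrogates, "signatures"))
    print(detect_term_indexed(docs, invert(surrogates), "documents"))
```

Because distinct terms can share a signature, a hit in the surrogate database only marks a candidate document; the verification pass keeps false matches out, and only the candidate documents need to be read in full.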
