A method for speeding up text retrieval
- 1 January 1984
- journal article
- Published by Association for Computing Machinery (ACM) in ACM SIGMIS Database: the DATABASE for Advances in Information Systems
- Vol. 15 (2), 19-23
- https://doi.org/10.1145/1017712.1017717
Abstract
A simple method for speeding up the term detection phase of retrieval from a full-text document database is presented. The method makes use of a surrogate database, in which a document is represented as a sequence of hash signatures. Each signature represents a term occurring in the original document. The size of the surrogate database is expected to be only 5-10% of the original database. The major part of the work involved in term detection can be done utilizing this smaller database. It can either be scanned or inverted and used as an index. The term detection phase is speeded up significantly at a cost of increased processing when filing a document.
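The idea in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the hash function (BLAKE2b), the 8-bit signature width, the tokenizer, and every function name are assumptions chosen for the example, since the abstract specifies none of them. Because distinct terms can share a signature, the sketch also assumes that candidates found in the surrogate are verified against the original text.

```python
import hashlib
import re
from typing import Dict, List, Set

SIGNATURE_BITS = 8  # assumed width; the abstract does not state one


def signature(term: str, bits: int = SIGNATURE_BITS) -> int:
    """Map a term to a short hash signature (hypothetical choice of hash)."""
    digest = hashlib.blake2b(term.lower().encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


def tokenize(text: str) -> List[str]:
    """Split text into lowercase alphanumeric terms (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_surrogate(documents: Dict[str, str]) -> Dict[str, List[int]]:
    """Filing step: represent each document as a sequence of signatures."""
    return {doc_id: [signature(t) for t in tokenize(text)]
            for doc_id, text in documents.items()}


def invert(surrogate: Dict[str, List[int]]) -> Dict[int, Set[str]]:
    """Alternative to scanning: invert the surrogate into an index."""
    index: Dict[int, Set[str]] = {}
    for doc_id, sigs in surrogate.items():
        for sig in sigs:
            index.setdefault(sig, set()).add(doc_id)
    return index


def candidates_by_scan(surrogate: Dict[str, List[int]], term: str) -> Set[str]:
    """Scan the surrogate for documents containing the term's signature."""
    target = signature(term)
    return {doc_id for doc_id, sigs in surrogate.items() if target in sigs}


def detect_term(documents: Dict[str, str],
                surrogate: Dict[str, List[int]],
                term: str) -> Set[str]:
    """Check candidates against the original text to drop hash collisions."""
    wanted = term.lower()
    return {doc_id for doc_id in candidates_by_scan(surrogate, wanted)
            if wanted in tokenize(documents[doc_id])}
```

In this sketch the extra cost at filing time is the call to `build_surrogate` (and `invert`, if the index variant is used), while a query only touches the original text of the candidate documents.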
This publication has 6 references indexed in Scilit:
- A Hardware Hashing Scheme in the Design of a Multiterm String Comparator. IEEE Transactions on Computers, 1982
- Experiments with automatic text filing and retrieval in the office environment. ACM SIGIR Forum, 1982
- Text Retrieval Computers. Computer, 1979
- Text file inversion. Association for Computing Machinery (ACM), 1978
- A Dictionary for Minimum Redundancy Encoding. Journal of the ACM, 1963
- Length-frequency statistics for written English. Information and Control, 1958