Document ranking on weight-partitioned signature files
- 1 April 1996
- journal article
- Published by Association for Computing Machinery (ACM) in ACM Transactions on Information Systems
- Vol. 14 (2), 109-137
- https://doi.org/10.1145/226163.226164
Abstract
A signature file organization, called the weight-partitioned signature file, is proposed for supporting document ranking. It employs multiple signature files, each corresponding to one term frequency, to represent terms with different term frequencies. Words with the same term frequency in a document are grouped together and hashed into the signature file for that term frequency, which eliminates the need to record the term frequency explicitly for each word. We investigate the effect of false drops on retrieval effectiveness when they are not eliminated in the search process, and show that false drops cause insignificant degradation in precision and recall when the false-drop probability is below a certain threshold. This is an important result, since false-drop elimination could become the bottleneck in systems that use fast signature file search techniques. We perform an analytical study of the performance of the weight-partitioned signature file under different search strategies and configurations. For a fixed total storage overhead, an optimal formula is obtained that determines the storage allocated to each partition so as to minimize the effect of false drops on document ranks. Experiments on a document collection support the analytical results.
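The partitioning idea can be illustrated with a minimal Python sketch. It assumes plain superimposed coding (each word sets k hashed bits in an F-bit signature), one signature per term-frequency partition per document, and a simple sum-of-term-frequencies score with no false-drop elimination. The parameters `SIG_BITS`, `BITS_PER_WORD` and `MAX_TF`, the cap on high frequencies, and the rule of taking the highest matching partition when false drops produce several matches are all hypothetical choices, not the paper's. The false-drop estimate uses the textbook approximation for superimposed codes, not the paper's optimal storage-allocation formula.

```python
import hashlib
import math
from collections import Counter, defaultdict

# Hypothetical parameters, not taken from the paper.
SIG_BITS = 256      # F: bits per partition signature
BITS_PER_WORD = 3   # k: bits set by each word
MAX_TF = 4          # assumption: tf >= MAX_TF all share the last partition


def word_bits(word, width=SIG_BITS, k=BITS_PER_WORD):
    """Hash a word to k bit positions (superimposed coding)."""
    positions = []
    for i in range(k):
        digest = hashlib.sha256(f"{i}:{word}".encode()).digest()
        positions.append(int.from_bytes(digest[:4], "big") % width)
    return positions


def make_signature(words, width=SIG_BITS):
    """OR together the bit patterns of all words into one integer bitmap."""
    sig = 0
    for w in words:
        for b in word_bits(w, width):
            sig |= 1 << b
    return sig


def build_partitions(doc_words):
    """Group a document's words by term frequency; one signature per group.
    The tf is implied by the partition, so it is never stored per word."""
    tf = Counter(doc_words)
    groups = defaultdict(list)
    for word, f in tf.items():
        groups[min(f, MAX_TF)].append(word)
    return {f: make_signature(ws) for f, ws in groups.items()}


def estimated_tf(partitions, word):
    """Infer a word's tf from the partitions whose signature covers it.
    False drops can add spurious matches; taking the max is an assumed policy."""
    mask = make_signature([word])
    matches = [f for f, sig in partitions.items() if (sig & mask) == mask]
    return max(matches, default=0)


def rank(docs, query):
    """Score each document by summed estimated tf, without false-drop elimination."""
    index = {d: build_partitions(words) for d, words in docs.items()}
    scores = {d: sum(estimated_tf(p, q) for q in query) for d, p in index.items()}
    return sorted(scores, key=lambda d: (-scores[d], d))


def false_drop_probability(n_words, width=SIG_BITS, k=BITS_PER_WORD):
    """Textbook approximation (1 - e^{-kn/F})^k for one partition holding n words."""
    return (1.0 - math.exp(-k * n_words / width)) ** k


docs = {
    "d1": "signature file signature file ranking".split(),
    "d2": "protein similarity search".split(),
}
print(rank(docs, ["signature", "ranking"]))  # ['d1', 'd2']
```

Because superimposed coding never misses a word that is present, a false drop can only inflate an estimated term frequency, never hide a true one. This is why skipping the elimination step, as the abstract discusses, perturbs ranks only slightly when the false-drop probability per partition is kept low.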