Coding for compression in full-text retrieval systems
- 2 January 2003
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
Abstract
Witten, Bell and Nevill (see ibid., p.23, 1991) have described compression models for use in full-text retrieval systems. The authors discuss other coding methods for use with the same models, and give results showing that their scheme yields virtually identical compression while decoding more than forty times faster. One of the main features of their implementation is the complete absence of arithmetic coding; this is part of the reason for the high speed. The implementation is also particularly suited to slow devices such as CD-ROM, because answering a query requires one disk access for each term in the query and one disk access for each answer. All words and numbers are indexed, and there are no stop words. They have built two compressed databases.
Author(s)
- Moffat, A. (Dept. of Comput. Sci., Melbourne Univ., Parkville, Vic., Australia)
- Zobel, J.
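As an illustration only: the abstract does not say which codes take the place of arithmetic coding. The reference list includes work on run-length encodings and on codes for geometrically distributed integers, so the sketch below shows a Golomb code in Python as one example of a bit-level prefix code that needs no arithmetic coder. The function names, the bit-string representation, and the parameter `m` are assumptions chosen for this sketch and are not taken from the paper.

```python
def golomb_encode(n: int, m: int) -> str:
    """Encode a non-negative integer n with Golomb parameter m >= 1 as a bit string."""
    if n < 0 or m < 1:
        raise ValueError("need n >= 0 and m >= 1")
    q, r = divmod(n, m)
    bits = "1" * q + "0"              # quotient in unary, terminated by a 0
    if m == 1:
        return bits                   # no remainder bits are needed
    b = (m - 1).bit_length()          # ceil(log2(m))
    cutoff = (1 << b) - m             # how many remainders get the short (b-1)-bit code
    if r < cutoff:
        bits += format(r, f"0{b - 1}b")
    else:
        bits += format(r + cutoff, f"0{b}b")
    return bits


def golomb_decode(bits: str, m: int, pos: int = 0):
    """Decode one integer starting at bits[pos]; return (value, next position)."""
    q = 0
    while bits[pos] == "1":           # read the unary quotient
        q += 1
        pos += 1
    pos += 1                          # skip the terminating 0
    if m == 1:
        return q, pos
    b = (m - 1).bit_length()
    cutoff = (1 << b) - m
    r = int(bits[pos:pos + b - 1], 2) if b > 1 else 0
    pos += b - 1
    if r >= cutoff:                   # long codeword: consume one more bit
        r = ((r << 1) | int(bits[pos])) - cutoff
        pos += 1
    return q * m + r, pos
```

Decoding uses only comparisons, shifts and additions on whole bits, which is the general reason codes of this kind tend to decode faster than an arithmetic coder; the speed figure quoted in the abstract is the authors' own measurement and is not reproduced by this sketch.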
This publication has 10 references indexed in Scilit:
- Models for compression in full-text retrieval systems. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- Efficient decoding of prefix codes. Communications of the ACM, 1990
- Implementing the PPM data compression scheme. IEEE Transactions on Communications, 1990
- Word-based text compression. Software: Practice and Experience, 1989
- Arithmetic coding for data compression. Communications of the ACM, 1987
- Development of a Spelling List. IEEE Transactions on Communications, 1982
- Variations on a theme by Huffman. IEEE Transactions on Information Theory, 1978
- Optimal source codes for geometrically distributed integer alphabets (Corresp.). IEEE Transactions on Information Theory, 1975
- Run-length encodings (Corresp.). IEEE Transactions on Information Theory, 1966
- A Method for the Construction of Minimum-Redundancy Codes. Proceedings of the IRE, 1952