Coding for compression in full-text retrieval systems

2 January 2003

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

pp. 72-81
https://doi.org/10.1109/dcc.1992.227474

Authors: Moffat, A. (Dept. of Comput. Sci., Melbourne Univ., Parkville, Vic., Australia); Zobel, J.

Abstract

Witten, Bell and Nevill (see ibid., p. 23, 1991) have described compression models for use in full-text retrieval systems. The authors discuss other coding methods for use with the same models and give results showing that their scheme yields virtually identical compression while decoding more than forty times faster. One of the main features of their implementation is the complete absence of arithmetic coding, which is partly responsible for the high speed. The implementation is also particularly suited to slow devices such as CD-ROM, because answering a query requires one disk access for each term in the query and one disk access for each answer. All words and numbers are indexed, and there are no stop words. The authors have built two compressed databases.
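The abstract attributes much of the decoding speed to avoiding arithmetic coding but does not say which codes take its place. As a minimal sketch, assuming a word-based model coded with a canonical Huffman code (one common prefix-code alternative to arithmetic coding, in the spirit of the Huffman-related entries in the reference list), the Python below builds such a code and decodes it using only integer shifts and comparisons. All function names and the sample text are hypothetical illustrations, not the authors' implementation.

```python
from collections import Counter
import heapq


def code_lengths(freqs):
    """Huffman codeword length for each symbol (heap-based construction)."""
    if len(freqs) == 1:
        return {next(iter(freqs)): 1}
    # Heap entries: (weight, unique tiebreak, symbols in this subtree).
    heap = [(w, i, [s]) for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    lengths = {s: 0 for s in freqs}
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:  # every symbol under the merged node gets one bit deeper
            lengths[s] += 1
        heapq.heappush(heap, (w1 + w2, next_id, s1 + s2))
        next_id += 1
    return lengths


def canonical_codes(lengths):
    """Assign canonical codewords: shorter codes first, ties in symbol order."""
    ordered = sorted(lengths, key=lambda s: (lengths[s], s))
    codes, code, prev_len = {}, 0, lengths[ordered[0]]
    for s in ordered:
        code <<= lengths[s] - prev_len  # extend the code when the length grows
        codes[s] = (code, lengths[s])
        prev_len = lengths[s]
        code += 1
    return codes


def build_decoder(lengths):
    """Per-length tables: symbol count, first codeword, first symbol index."""
    ordered = sorted(lengths, key=lambda s: (lengths[s], s))
    max_len = max(lengths.values())
    count = [0] * (max_len + 1)
    for s in ordered:
        count[lengths[s]] += 1
    first_code = [0] * (max_len + 1)
    first_index = [0] * (max_len + 1)
    code = index = 0
    for length in range(1, max_len + 1):
        first_code[length], first_index[length] = code, index
        code = (code + count[length]) << 1
        index += count[length]
    return ordered, count, first_code, first_index


def encode(words, codes):
    """Concatenate the codewords of the given words as a list of bits."""
    bits = []
    for w in words:
        code, length = codes[w]
        bits.extend((code >> (length - 1 - i)) & 1 for i in range(length))
    return bits


def decode(bits, decoder):
    """Decode bit by bit; a codeword ends once its value lies in that length's range."""
    ordered, count, first_code, first_index = decoder
    out, code, length = [], 0, 0
    for b in bits:
        code = (code << 1) | b
        length += 1
        offset = code - first_code[length]
        if 0 <= offset < count[length]:
            out.append(ordered[first_index[length] + offset])
            code, length = 0, 0
    return out


text = "the cat sat on the mat the end".split()
lengths = code_lengths(Counter(text))
codes = canonical_codes(lengths)
bits = encode(text, codes)
decoded = decode(bits, build_decoder(lengths))
```

Decoding here advances one bit per loop iteration for clarity; a fast decoder would typically replace that inner loop with table lookups on several bits at a time. Because the code is canonical, only the codeword lengths (not the codewords) need to be stored alongside the model.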

References

Models for compression in full-text retrieval systems
Institute of Electrical and Electronics Engineers (IEEE), 2002
Efficient decoding of prefix codes
Communications of the ACM, 1990
Implementing the PPM data compression scheme
IEEE Transactions on Communications, 1990
Word-based text compression
Software: Practice and Experience, 1989
Arithmetic coding for data compression
Communications of the ACM, 1987
Development of a Spelling List
IEEE Transactions on Communications, 1982
Variations on a theme by Huffman
IEEE Transactions on Information Theory, 1978
Optimal source codes for geometrically distributed integer alphabets (Corresp.)
IEEE Transactions on Information Theory, 1975
Run-length encodings (Corresp.)
IEEE Transactions on Information Theory, 1966
A Method for the Construction of Minimum-Redundancy Codes
Proceedings of the IRE, 1952