Adding compression to a full‐text retrieval system

1 August 1995

journal article
research article
Published by Wiley in Software: Practice and Experience

Vol. 25 (8), 891-903
https://doi.org/10.1002/spe.4380250804

Abstract

We describe the implementation of a data compression scheme as an integral and transparent layer within a full‐text retrieval system. Using a semi‐static word‐based compression model, the space needed to store the text is under 30 per cent of the original requirement. The model is used in conjunction with canonical Huffman coding, and together these two paradigms provide fast decompression. Experiments with 500 Mb of newspaper articles show that in full‐text retrieval environments compression not only saves space but can also yield faster query processing: a win‐win situation.
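
The combination named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a single codebook shared by words and the separators between them, codewords held as strings of '0' and '1' for readability, and a codeword numbering in which shorter codes receive the numerically smaller values. All function names are hypothetical. "Semi-static" here means two passes: the first gathers token frequencies, the second encodes the text with a model that stays fixed.

```python
import heapq
import re
from collections import Counter
from itertools import count


def tokenize(text):
    # Alternating word / non-word tokens, so the text can be rebuilt exactly.
    return re.findall(r"\w+|\W+", text)


def code_lengths(freqs):
    # Huffman construction that tracks only the codeword length of each token.
    if len(freqs) == 1:
        return {sym: 1 for sym in freqs}
    tiebreak = count()  # unique counter so heap entries never compare lists
    heap = [(f, next(tiebreak), [sym]) for sym, f in freqs.items()]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freqs, 0)
    while len(heap) > 1:
        f1, _, s1 = heapq.heappop(heap)
        f2, _, s2 = heapq.heappop(heap)
        for sym in s1 + s2:
            lengths[sym] += 1  # every merge pushes these tokens one level deeper
        heapq.heappush(heap, (f1 + f2, next(tiebreak), s1 + s2))
    return lengths


def canonical_codes(lengths):
    # Canonical assignment: sort by (length, token) and hand out consecutive
    # integers, shifting left whenever the codeword length increases.
    codes, code, prev_len = {}, 0, 0
    for sym, length in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= length - prev_len
        codes[sym] = format(code, "0{}b".format(length))
        code += 1
        prev_len = length
    return codes


def compress(text):
    tokens = tokenize(text)
    lengths = code_lengths(Counter(tokens))    # pass 1: build the model
    codes = canonical_codes(lengths)
    bits = "".join(codes[t] for t in tokens)   # pass 2: encode with it
    return lengths, bits


def decompress(lengths, bits):
    # The codebook is rebuilt from the lengths alone; a dict lookup stands in
    # for the table-driven decoding a production system would use.
    decode = {code: sym for sym, code in canonical_codes(lengths).items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in decode:
            out.append(decode[current])
            current = ""
    return "".join(out)
```

Because a canonical code is fully determined by the codeword lengths, only those lengths (plus the lexicon) need to be stored with the compressed text, which is one reason the scheme suits a retrieval system that decodes small pieces of text on demand.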
