Adding compression to a full‐text retrieval system

Abstract
We describe the implementation of a data compression scheme as an integral and transparent layer within a full‐text retrieval system. Using a semi‐static word‐based compression model, the space needed to store the text is under 30 per cent of the original requirement. The model is used in conjunction with canonical Huffman coding, and together the two provide fast decompression. Experiments with 500 Mb of newspaper articles show that in full‐text retrieval environments compression not only saves space but can also yield faster query processing, a win‐win situation.
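
To make the combination of a semi-static word-based model and canonical Huffman coding concrete, the following is a minimal Python sketch, not the system's actual implementation. It assumes a single shared model for words and the separators between them, represents the output as a string of '0' and '1' characters rather than packed bits, and decodes with a plain dictionary lookup instead of the table-driven method that makes canonical codes fast in practice. All function names are hypothetical.

```python
import heapq
import re
from collections import Counter
from itertools import count


def tokenize(text):
    # Split into alternating runs of word and non-word characters.
    # Concatenating the tokens restores the original text exactly.
    return re.findall(r"\w+|\W+", text)


def code_lengths(freqs):
    # Huffman construction that tracks only each symbol's code length.
    if len(freqs) == 1:
        return {sym: 1 for sym in freqs}
    tie = count()  # tie-breaker so the heap never compares symbol lists
    heap = [(f, next(tie), [sym]) for sym, f in freqs.items()]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freqs, 0)
    while len(heap) > 1:
        f1, _, s1 = heapq.heappop(heap)
        f2, _, s2 = heapq.heappop(heap)
        for sym in s1 + s2:
            lengths[sym] += 1  # every merged symbol moves one level deeper
        heapq.heappush(heap, (f1 + f2, next(tie), s1 + s2))
    return lengths


def canonical_codes(lengths):
    # Canonical assignment: order symbols by (length, symbol) and hand out
    # consecutive code values, shifting left whenever the length grows.
    codes = {}
    code = 0
    prev_len = 0
    for sym, length in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= length - prev_len
        codes[sym] = format(code, "0{}b".format(length))
        code += 1
        prev_len = length
    return codes


def compress(text):
    tokens = tokenize(text)
    lengths = code_lengths(Counter(tokens))  # first pass: gather statistics
    codes = canonical_codes(lengths)
    bits = "".join(codes[t] for t in tokens)  # second pass: encode
    return bits, lengths


def decompress(bits, lengths):
    # The code lengths alone are enough to rebuild the canonical codes.
    inverse = {c: s for s, c in canonical_codes(lengths).items()}
    out = []
    current = ""
    for bit in bits:
        current += bit
        if current in inverse:  # prefix-free, so the first match is correct
            out.append(inverse[current])
            current = ""
    return "".join(out)
```

The model is semi-static in the sense that the statistics are gathered in one pass over the text and then held fixed while a second pass encodes it. Because the codes are canonical, the decoder needs only the symbol list and the code lengths, not the tree itself.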
