Adding compression to a full‐text retrieval system
- 1 August 1995
- journal article
- research article
- Published by Wiley in Software: Practice and Experience
- Vol. 25 (8) , 891-903
- https://doi.org/10.1002/spe.4380250804
Abstract
We describe the implementation of a data compression scheme as an integral and transparent layer within a full‐text retrieval system. Using a semi‐static word‐based compression model, the space needed to store the text is under 30 per cent of the original requirement. The model is used in conjunction with canonical Huffman coding and together these two paradigms provide fast decompression. Experiments with 500 Mb of newspaper articles show that in full‐text retrieval environments compression not only saves space, it can also yield faster query processing ‐ a win‐win situation.
This publication has 17 references indexed in Scilit:
- Bounding the Depth of Search Trees. The Computer Journal, 1993
- Indexing and compressing full-text databases for CD-ROM. Journal of Information Science, 1991
- Efficient decoding of prefix codes. Communications of the ACM, 1990
- Implementing the PPM data compression scheme. IEEE Transactions on Communications, 1990
- Storing text retrieval systems on CD-ROM: compression and encryption considerations. ACM Transactions on Information Systems, 1989
- A locally adaptive data compression scheme. Communications of the ACM, 1986
- Data compression on a database system. Communications of the ACM, 1985
- A practitioner's guide to data base compression tutorial. Information Systems, 1983
- Universal modeling and coding. IEEE Transactions on Information Theory, 1981
- Universal codeword sets and representations of the integers. IEEE Transactions on Information Theory, 1975