Offline dictionary-based compression
- 1 January 1999
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- No. 10680314,p. 296-305
- https://doi.org/10.1109/dcc.1999.755679
Abstract
Dictionary-based modelling is the mechanism used in many practical compression schemes. We use the full message (or a large block of it) to infer a complete dictionary in advance, and include an explicit representation of the dictionary as part of the compressed message. Intuitively, the advantage of this offline approach is that with the benefit of having access to all of the message, it should be possible to optimize the choice of phrases so as to maximize compression performance. Indeed, we demonstrate that very good compression can be attained by an offline method without compromising the fast decoding that is a distinguishing characteristic of dictionary-based techniques. Several nontrivial sources of overhead, in terms of both computation resources required to perform the compression, and bits generated into the compressed message, have to be carefully managed as part of the offline process. To meet this challenge, we have developed a novel phrase derivation method and a compact dictionary encoding. In combination these two techniques produce the compression scheme RE-PAIR, which is highly efficient, particularly in decompression.Keywords
This publication has 10 references indexed in Scilit:
- Exploiting clustering in inverted file compressionPublished by Institute of Electrical and Electronics Engineers (IEEE) ,2002
- A hybrid approach to text compressionPublished by Institute of Electrical and Electronics Engineers (IEEE) ,2002
- Housekeeping for prefix codingIEEE Transactions on Communications, 2000
- A text compression scheme that allows fast searching directly in the compressed fileACM Transactions on Information Systems, 1997
- Compression and Explanation using Hierarchical GrammarsThe Computer Journal, 1997
- On the implementation of minimum redundancy prefix codesIEEE Transactions on Communications, 1997
- The relationship between greedy parsing and symbolwise text compressionJournal of the ACM, 1994
- Variations on a theme by Ziv and LempelPublished by Springer Nature ,1985
- A note on the Ziv - Lempel model for compressing individual sequences (Corresp.)IEEE Transactions on Information Theory, 1983
- Compression of individual sequences via variable-rate codingIEEE Transactions on Information Theory, 1978