Unbounded length contexts for PPM
- 19 November 2002
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- No. 10680314,p. 52-61
- https://doi.org/10.1109/dcc.1995.515495
Abstract
The prediction by partial matching (PPM) data compression scheme has set the performance standard in lossless compression of text throughout the past decade. The original algorithm was first published in 1984 by Cleary and Witten, and a series of improvements was described by Moffat (1990), culminating in a careful implementation, called PPMC, which has become the benchmark version. This still achieves results superior to virtually all other compression methods, despite many attempts to better it. PPM, is a finite-context statistical modeling technique that can be viewed as blending together several fixed-order context models to predict the next character in the input sequence. Prediction probabilities for each context in the model are calculated from frequency counts which are updated adaptively; and the symbol that actually occurs is encoded relative to its predicted distribution using arithmetic coding. The paper describes a new algorithm, PPM*, which exploits contexts of unbounded length. It reliably achieves compression superior to PPMC, although our current implementation uses considerably greater computational resources (both time and space). The basic PPM compression scheme is described, showing the use of contexts of unbounded length, and how it can be implemented using a tree data structure. Some results are given that demonstrate an improvement of about 6% over the old method.Keywords
This publication has 5 references indexed in Scilit:
- The zero-frequency problem: estimating the probabilities of novel events in adaptive text compressionIEEE Transactions on Information Theory, 1991
- Implementing the PPM data compression schemeIEEE Transactions on Communications, 1990
- Data compression with finite windowsCommunications of the ACM, 1989
- Data Compression Using Adaptive Coding and Partial String MatchingIEEE Transactions on Communications, 1984
- Compression of individual sequences via variable-rate codingIEEE Transactions on Information Theory, 1978