Semantically Motivated Improvements for PPM Variants

1 January 1997

journal article
Published by Oxford University Press (OUP) in The Computer Journal

Vol. 40 (2 and 3) , 76-93
https://doi.org/10.1093/comjnl/40.2_and_3.76

Abstract

The on-line sequence modelling algorithm ‘Prediction by Partial Matching’ (PPM) has set the performance standard in lossless data compression research since Moffat's 1990 implementation, PPMC. Despite intense research activity, only Howard's 1993 escape-count update mechanism ‘D’ has provided any consistent, order-independent performance improvement to PPMC (about 1%). Most notably, the recently introduced PPM variant, PPM*, which eliminates PPM's order bound, fails to offer compression results superior to those of PPMC with Markov order greater than four. This paper explains how to significantly improve the compression performance of any PPM variant (by 5–12%) by combining PPM's probability estimator, ‘blending’, with information-theoretic state selection. Hazards inherent to this combination are overcome by identifying the distinct semantics of the two approaches and resolving the differences using a dual-frequency update mechanism. We present and apply our percolating state selector, plus an enhancement to blending, both of which we have recently shown to independently outperform all competing techniques from the literature. We also give a minimal linear-space suffix-tree implementation of PPM and PPM*. Performance is measured in experiments run on the Calgary Corpus using our reimplementation of the original algorithms in an executable cross-product of independent model components, which permits precise control of all modelling algorithm features.

Keywords

This publication has 0 references indexed in Scilit: