Application of a Finite-State Model to Text Compression

Abstract
The bit-oriented finite-state model applied in Dynamic Markov Compression (DMC) [5]) is here generalized to a larger alphabet. The finite-state machine is built adaptively during compression, by applying two types of modifications to the machine structure: state cloning and shortcut creation. The machine size is kept tolerable by an escape transition mechanism. Similar to DMC, the new method is combined with arithmetic coding, based on the maintained transition frequencies. The experiments show that the new approach produces notably better compression gains for different sorts of texts in natural and formal languages. In some cases the results are better than for any compression technique found in the literature.

This publication has 0 references indexed in Scilit: