A note on the DMC data compression scheme

Abstract
Until recently, the best schemes for text compression have used variable-order Markov models, i.e. each symbol is predicted using some finite number of directly preceding symbols as a context. The recent Dynamic Markov Compression (DMC) scheme models data with Finite State Automata, which are capable of representing more complex contexts than simple Markov models. The DMC scheme builds models by ‘cloning’ states which are visited often. Because English contains patterns that can be recognised by a Finite State Automaton but not by a variable-order Markov model, FSA models are potentially more powerful. A class of Finite State Automata, called Finite Context Automata, is defined, which are equivalent to variable-order Markov models. For the initial models proposed, the DMC scheme is shown to have no more power than variable-order Markov models by showing that it generates only Finite Context Automata. This is verified in practice, where experiments show that the compression performance of DMC is comparable to that of existing variable-order Markov model schemes. Consequently, more sophisticated models than Finite Context Automata still need to be explored in order to achieve better text compression.
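The cloning mechanism referred to above can be illustrated with a minimal sketch (not the authors' implementation; the thresholds and the single self-looping start state are assumptions for illustration). Each state predicts the next bit from transition counts, and when the edge into a state carries enough traffic of its own, the target state is cloned and its counts are split in proportion to that traffic:

```python
# Illustrative sketch of DMC-style state cloning on a bit stream.
# MIN_CNT1/MIN_CNT2 are hypothetical cloning thresholds.
MIN_CNT1 = 2   # required traffic on the u->v edge itself
MIN_CNT2 = 2   # required traffic into v from other edges

class State:
    def __init__(self):
        self.next = [None, None]   # successor state for bit 0 / bit 1
        self.count = [1.0, 1.0]    # transition counts (start at 1 to avoid zeros)

def predict(state, bit):
    """Probability the model assigns to `bit` in `state`."""
    return state.count[bit] / (state.count[0] + state.count[1])

def update(state, bit):
    """Record the observed bit; clone the target state if the
    edge used carries enough traffic of its own."""
    state.count[bit] += 1
    v = state.next[bit]
    edge = state.count[bit]                      # traffic via this edge
    other = (v.count[0] + v.count[1]) - edge     # approx. traffic via other edges
    if edge > MIN_CNT1 and other > MIN_CNT2:
        clone = State()
        clone.next = list(v.next)                # clone inherits v's successors
        ratio = edge / (edge + other)
        for b in (0, 1):                         # split v's counts proportionally
            clone.count[b] = v.count[b] * ratio
            v.count[b] -= clone.count[b]
        state.next[bit] = clone                  # redirect this edge to the clone
    return state.next[bit]
```

Starting from a one-state model whose transitions loop back to itself, repeated updates eventually trigger a clone, growing the automaton; this growth is what the paper shows never escapes the class of Finite Context Automata for the initial models proposed.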
