Variable length Markov chains
Open Access
- 1 April 1999
- journal article
- Published by Institute of Mathematical Statistics in The Annals of Statistics
- Vol. 27 (2), 480-513
- https://doi.org/10.1214/aos/1018031204
Abstract
We study estimation in the class of stationary variable length Markov chains (VLMC) on a finite space. The processes in this class are still Markovian of high order, but with memory of variable length, yielding a much bigger and structurally richer class of models than ordinary high-order Markov chains. From an algorithmic view, the VLMC model class has attracted interest in information theory and machine learning, but statistical properties have not yet been explored. Provided that good estimation is available, the additional structural richness of the model class enhances predictive power by finding a better tradeoff between model bias and variance and by allowing a better structural description, which can be of specific interest. The latter is exemplified with some DNA data. A version of the tree-structured context algorithm, proposed by Rissanen in an information theoretical setup, is shown to have new good asymptotic properties for estimation in the class of VLMCs. This remains true even when the underlying model increases in dimensionality. Furthermore, consistent estimation of minimal state spaces and mixing properties of fitted models are given. We also propose a new bootstrap scheme based on fitted VLMCs. We show its validity for quite general stationary categorical time series and for a broad range of statistical procedures.
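The key structural idea in the abstract, a memory whose length varies with the observed past and is encoded by a context tree, can be made concrete with a short sketch. The following Python snippet is an illustrative assumption, not the paper's context algorithm or its estimator: for a fixed toy context tree over the DNA alphabet, it shows how the relevant context is the longest suffix of the past that appears in the tree, so different pasts lead to memories of different lengths.

```python
# Minimal sketch (illustrative only, not the paper's method) of variable-length
# memory in a VLMC: the predictive context is the longest suffix of the past
# that belongs to a given context tree. The toy tree and names are assumptions.


def relevant_context(past, contexts):
    """Return the longest suffix of `past` that is a context in the tree.

    `past` lists symbols oldest-first; `contexts` is a set of tuples written
    most-recent-symbol-first. The empty tuple (the root) always matches,
    so the lookup always terminates.
    """
    suffix = ()
    for symbol in reversed(past):
        candidate = suffix + (symbol,)
        if candidate not in contexts:
            break
        suffix = candidate
    return suffix


# Toy context tree on the alphabet {a, c, g, t}; memory depth varies by branch.
contexts = {(), ("a",), ("c",), ("g",), ("t",), ("a", "t"), ("a", "g")}

print(relevant_context(list("ccgta"), contexts))  # ('a', 't'): memory length 2
print(relevant_context(list("ccgtc"), contexts))  # ('c',): memory length 1
```

Rissanen's context algorithm, whose asymptotic properties the paper analyzes, estimates such a tree from data by growing and pruning candidate contexts; the sketch above covers only the lookup step, taking the tree as given.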
This publication has 21 references indexed in Scilit:
- Statistical methods for DNA sequence segmentation. Statistical Science, 1998
- A universal finite memory source. IEEE Transactions on Information Theory, 1995
- Estimation and Modelling Repeated Patterns in High Order Markov Chains with the Mixture Transition Distribution Model. Journal of the Royal Statistical Society Series C: Applied Statistics, 1994
- A sequential algorithm for the universal coding of finite memory sources. IEEE Transactions on Information Theory, 1992
- Achieving Information Bounds in Non and Semiparametric Models. The Annals of Statistics, 1990
- The Jackknife and the Bootstrap for General Stationary Observations. The Annals of Statistics, 1989
- Complexity of strings in the class of Markov sources. IEEE Transactions on Information Theory, 1986
- A universal data compression system. IEEE Transactions on Information Theory, 1983
- Central Limit Theorems for dependent variables. I. Probability Theory and Related Fields, 1981
- Bootstrap Methods: Another Look at the Jackknife. The Annals of Statistics, 1979