Nonparametric entropy estimation for stationary processes and random fields, with applications to English text
- 1 May 1998
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Information Theory
- Vol. 44 (3), 1319-1327
- https://doi.org/10.1109/18.669425
Abstract
We discuss a family of estimators for the entropy rate of a stationary ergodic process and prove their pointwise and mean consistency under a Doeblin-type mixing condition. The estimators are Cesàro averages of longest match-lengths, and their consistency follows from a generalized ergodic theorem due to Maker (1940). We provide examples of their performance on English text, and we generalize our results to countable alphabet processes and to random fields.
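As an illustration of what a Cesàro average of longest match-lengths can look like in practice, here is a minimal Python sketch. The abstract does not state the estimator's formula, so the sketch assumes one form common in the match-length literature: an increasing window, where each match length is divided by the logarithm of the amount of past data it was searched in, and the reciprocal of the average is taken. The function names match_length and entropy_rate_estimate are hypothetical, and the brute-force search is chosen for clarity, not speed.

```python
import math


def match_length(seq, i, window):
    """Length of the shortest substring starting at position i that does not
    also start somewhere in the previous `window` positions.
    This equals (longest match into the past) + 1."""
    start = max(0, i - window)
    longest = 0
    for j in range(start, i):
        k = 0
        # Matches may run past position i, i.e. overlap the present.
        while i + k < len(seq) and seq[j + k] == seq[i + k]:
            k += 1
        longest = max(longest, k)
    return longest + 1


def entropy_rate_estimate(seq, n):
    """Assumed increasing-window estimate, in bits per symbol.

    Averages match_length / log2(i) over positions i = 2..n, where position i
    is matched against all i symbols before it, then returns the reciprocal.
    `seq` should be somewhat longer than n so that matches near position n
    are not cut off by the end of the data.
    """
    if n < 2 or len(seq) <= n:
        raise ValueError("need n >= 2 and len(seq) > n")
    total = 0.0
    for i in range(2, n + 1):
        total += match_length(seq, i, i) / math.log2(i)
    average = total / (n - 1)
    return 1.0 / average
```

On a highly repetitive sequence the match lengths are long and the estimate is close to zero; on fair coin flips it approaches one bit per symbol, though convergence is slow and the estimate is biased for short inputs.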
This publication has 24 references indexed in Scilit:
- Using difficulty of prediction to decrease computation: fast sort, priority queue and convex hull on entropy bounded inputs. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- Stationary entropy estimation via string matching. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- Prefixes and the entropy rate for long-range sources. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- Entropy and recurrence rates for stationary random fields. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- Entropy and data compression schemes. IEEE Transactions on Information Theory, 1993
- Ergodic Theorems. Published by Walter de Gruyter GmbH, 1985
- Asymptotical Growth of a Class of Random Trees. The Annals of Probability, 1985
- Compression of individual sequences via variable-rate coding. IEEE Transactions on Information Theory, 1978
- A universal algorithm for sequential data compression. IEEE Transactions on Information Theory, 1977
- The ergodic theorem for a sequence of functions. Duke Mathematical Journal, 1940