Estimating the information content of symbol sequences and efficient codes
- 1 May 1989
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Information Theory
- Vol. 35 (3), 669-675
- https://doi.org/10.1109/18.30993
Abstract
Several variants of an algorithm for estimating Shannon entropies of symbol sequences are presented. They are all related to the Lempel-Ziv algorithm (1976, 1977) and to recent algorithms for estimating Hausdorff dimensions. The average storage and running times increase as N and N log N, respectively, with the sequence length N. These algorithms proceed basically by constructing efficient codes. They seem to be the optimal algorithms for sequences with strong long-range correlations, e.g., natural languages. An application to written English illustrates their use.
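For orientation, the cited Lempel-Ziv work underlying these variants yields a classic phrase-counting entropy estimator: parse the first N symbols into c(N) distinct phrases incrementally, and estimate the entropy rate as ĥ(N) = c(N) · log₂ c(N) / N bits per symbol. The sketch below illustrates that baseline estimator only; it is not the paper's own (faster, string-matching-based) algorithm, and the function names are ours.

```python
import math

def lz78_phrase_count(s: str) -> int:
    """Count distinct phrases in an LZ78-style incremental parsing of s."""
    seen = set()
    phrase = ""
    count = 0
    for ch in s:
        phrase += ch
        if phrase not in seen:   # new phrase: record it and start a fresh one
            seen.add(phrase)
            count += 1
            phrase = ""
    if phrase:                   # an unfinished final phrase still counts
        count += 1
    return count

def lz_entropy_estimate(s: str) -> float:
    """Entropy-rate estimate h(N) = c(N) * log2 c(N) / N in bits/symbol.

    This is the classic Lempel-Ziv estimator, which converges to the
    true entropy rate for stationary ergodic sources as N grows.
    """
    n = len(s)
    c = lz78_phrase_count(s)
    return c * math.log2(c) / n if n and c > 1 else 0.0

if __name__ == "__main__":
    import random
    random.seed(0)
    iid = "".join(random.choice("01") for _ in range(100_000))
    print(f"{lz_entropy_estimate(iid):.3f}")          # converges slowly toward 1 bit/symbol
    print(f"{lz_entropy_estimate('ab' * 50_000):.3f}")  # near 0 for a periodic sequence
```

The two test cases match the coding intuition in the abstract: a fair-coin sequence admits no compression (estimate tends to 1 bit/symbol, though convergence is slow), while a periodic sequence is highly compressible (estimate near 0). Slow convergence on long-range-correlated data such as natural language is precisely the regime the paper's improved variants target.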
References
- Long-range effects in an elementary cellular automaton. Journal of Statistical Physics, 1986.
- A universal data compression system. IEEE Transactions on Information Theory, 1983.
- Linear Algorithm for Data Compression via String Matching. Journal of the ACM, 1981.
- Compression of individual sequences via variable-rate coding. IEEE Transactions on Information Theory, 1978.
- A convergent gambling estimate of the entropy of English. IEEE Transactions on Information Theory, 1978.
- A class of balanced binary sequences with optimal autocorrelation properties. IEEE Transactions on Information Theory, 1977.
- On the Complexity of Finite Sequences. IEEE Transactions on Information Theory, 1976.
- On the Length of Programs for Computing Finite Binary Sequences. Journal of the ACM, 1966.
- A formal theory of inductive inference. Part II. Information and Control, 1964.
- Prediction and Entropy of Printed English. Bell System Technical Journal, 1951.