Always Good Turing: Asymptotically Optimal Probability Estimation
- 17 October 2003
- journal article
- Published by American Association for the Advancement of Science (AAAS) in Science
- Vol. 302 (5644) , 427-431
- https://doi.org/10.1126/science.1088284
Abstract
While deciphering the Enigma code, Good and Turing derived an unintuitive, yet effective, formula for estimating a probability distribution from a sample of data. We define the attenuation of a probability estimator as the largest possible ratio between the per-symbol probability assigned to an arbitrarily long sequence by any distribution, and the corresponding probability assigned by the estimator. We show that some common estimators have infinite attenuation and that the attenuation of the Good-Turing estimator is low, yet greater than 1. We then derive an estimator whose attenuation is 1; that is, asymptotically it does not underestimate the probability of any sequence.
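For readers unfamiliar with the formula the abstract refers to, the following is a minimal Python sketch of the classical Good-Turing estimate. It is not the attenuation-1 estimator derived in the article. The function name `good_turing`, the fallback to the raw count when no symbol occurs r + 1 times, and the final renormalization are assumptions of this sketch; practical implementations usually smooth the frequency-of-frequency counts instead.

```python
from collections import Counter


def good_turing(sample):
    """Classical Good-Turing estimate (illustrative sketch).

    Returns (p_unseen, probs): the total probability reserved for symbols
    not in the sample, and a dict of estimates for the observed symbols.
    """
    n = len(sample)
    counts = Counter(sample)                 # r for each observed symbol
    freq_of_freq = Counter(counts.values())  # N_r: number of symbols seen r times

    # Good-Turing missing mass: N_1 / n.
    p_unseen = freq_of_freq.get(1, 0) / n

    # Adjusted count r* = (r + 1) * N_{r+1} / N_r.
    # Assumption of this sketch: keep the raw count r when N_{r+1} is 0.
    adjusted = {}
    for sym, r in counts.items():
        n_next = freq_of_freq.get(r + 1, 0)
        adjusted[sym] = (r + 1) * n_next / freq_of_freq[r] if n_next else r

    # Renormalize so that observed symbols share the remaining mass.
    total = sum(adjusted.values())
    probs = {sym: (1 - p_unseen) * a / total for sym, a in adjusted.items()}
    return p_unseen, probs


p_unseen, probs = good_turing(list("abracadabra"))
print(p_unseen, probs)
```

In the sample `"abracadabra"`, two of the eleven symbols (`c` and `d`) occur exactly once, so the sketch reserves 2/11 of the probability mass for symbols that were never observed.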
This publication has 12 references indexed in Scilit:
- Turing’s anticipation of empirical Bayes in connection with the cryptanalysis of the naval Enigma. Journal of Statistical Computation and Simulation, 2000
- Universal prediction. IEEE Transactions on Information Theory, 1998
- Fisher information and stochastic complexity. IEEE Transactions on Information Theory, 1996
- Redundancy rates for renewal and other processes. IEEE Transactions on Information Theory, 1996
- Probability scoring for spelling correction. Statistics and Computing, 1991
- The zero-frequency problem: estimating the probabilities of novel events in adaptive text compression. IEEE Transactions on Information Theory, 1991
- On Turing's formula for word probabilities. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1985
- The performance of universal encoding. IEEE Transactions on Information Theory, 1981
- Universal noiseless coding. IEEE Transactions on Information Theory, 1973
- The population frequencies of species and the estimation of population parameters. Biometrika, 1953