Zipf’s law, the central limit theorem, and the random division of the unit interval
- 1 July 1996
- journal article
- research article
- Published by American Physical Society (APS) in Physical Review E
- Vol. 54 (1), 220-223
- https://doi.org/10.1103/physreve.54.220
Abstract
It is shown that a version of Mandelbrot’s monkey-at-the-typewriter model of Zipf’s inverse power law is directly related to two classical areas in probability theory: the central limit theorem and the “broken stick” problem, i.e., the random division of the unit interval. The connection to the central limit theorem is proved using a theorem on randomly indexed sums of random variables [A. Gut, Stopped Random Walks: Limit Theorems and Applications (Springer, New York, 1987)]. This reveals an underlying log-normal structure of pseudoword probabilities with an inverse power upper tail that clarifies a point of confusion in Mandelbrot’s work. An explicit asymptotic formula for the slope of the log-linear rank-size law in the upper tail of this distribution is also obtained. This formula relates to known asymptotic results concerning the random division of the unit interval that imply a slope value approaching -1 under quite general conditions. The role of size-biased sampling in obscuring the bottom part of the distribution is explained and connections to related work are noted. © 1996 The American Physical Society.
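The rank-size behaviour described in the abstract can be explored numerically. The following is a minimal sketch, not the paper's own procedure: it assumes independent keystrokes with fixed letter probabilities and a fixed space probability, enumerates every pseudoword up to a cutoff length, and fits a least-squares line to log probability against log rank over the top ranks. The function names, the example probabilities, and the length cutoff are illustrative assumptions. The empty word (a space typed immediately) is left out.

```python
import math


def pseudoword_probabilities(letter_probs, space_prob, max_len):
    """Probabilities of all pseudowords of length 1..max_len, largest first.

    Assumes independent keystrokes: a word's probability is the product of
    its letter probabilities times the probability of the terminating space.
    """
    if abs(sum(letter_probs) + space_prob - 1.0) > 1e-9:
        raise ValueError("letter and space probabilities must sum to 1")
    probs = []
    level = [1.0]  # probabilities of all letter strings of the current length
    for _ in range(max_len):
        level = [p * q for p in level for q in letter_probs]
        probs.extend(p * space_prob for p in level)
    probs.sort(reverse=True)
    return probs


def rank_size_slope(sorted_probs, n_top):
    """Least-squares slope of log(probability) on log(rank) for the top ranks."""
    xs = [math.log(r) for r in range(1, n_top + 1)]
    ys = [math.log(p) for p in sorted_probs[:n_top]]
    mean_x = sum(xs) / n_top
    mean_y = sum(ys) / n_top
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


if __name__ == "__main__":
    # Hypothetical four-letter alphabet with unequal letter probabilities.
    probs = pseudoword_probabilities([0.4, 0.2, 0.1, 0.1], 0.2, max_len=9)
    print("words enumerated:", len(probs))
    print("fitted upper-tail slope:", rank_size_slope(probs, n_top=1000))
```

Because the enumeration stops at a fixed word length, the lower ranks are distorted by the cutoff, so only the upper ranks are used in the fit. The fitted value is an empirical estimate for one choice of letter probabilities and should not be read as the asymptotic formula derived in the article.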
This publication has 9 references indexed in Scilit:
- Linguistic Features of Noncoding DNA Sequences. Physical Review Letters, 1994
- Random texts exhibit Zipf's-law-like word frequency distribution. IEEE Transactions on Information Theory, 1992
- Maximum entropy formalism, fractals, scaling phenomena, and 1/f noise: A tale of tails. Journal of Statistical Physics, 1983
- On 1/f noise and other distributions with long tails. Proceedings of the National Academy of Sciences, 1982
- Logarithms of Sample Spacings. SIAM Journal on Applied Mathematics, 1968
- On the theory of word frequencies and on related Markovian models of discourse. Proceedings of Symposia in Applied Mathematics, 1961
- Some Effects of Intermittent Silence. The American Journal of Psychology, 1957
- On a Test for Homogeneity and Extreme Values. The Annals of Mathematical Statistics, 1952