The meta book and size-dependent properties of written language
Open Access
- 1 December 2009
- journal article
- Published by IOP Publishing in New Journal of Physics
- Vol. 11 (12) , 123015
- https://doi.org/10.1088/1367-2630/11/12/123015
Abstract
Evidence is presented for a systematic text-length dependence of the power-law index γ of a single book. The estimated γ values are consistent with a monotonic decrease from 2 to 1 with increasing text length. A direct connection to an extended Heaps' law is explored. The infinite book limit is, as a consequence, proposed to be given by γ=1 instead of the value γ=2 expected if Zipf's law is universally applicable. In addition, we explore the idea that the systematic text-length dependence can be described by a meta book concept, which is an abstract representation reflecting the word-frequency structure of a text. According to this concept the word-frequency distribution of a text, with a certain length written by a single author, has the same characteristics as a text of the same length extracted from an imaginary complete infinite corpus written by the same author.
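The two empirical quantities the abstract relies on can be made concrete with a short sketch. It is illustrative only and is not the authors' procedure: the function names `frequency_spectrum` and `vocabulary_growth` are hypothetical, and tokenization by whitespace splitting is an assumption. The first function counts how many distinct words occur exactly k times, which is the distribution whose power-law tail has index γ. The second records the number of distinct words N as a function of text length M, which is the curve described by Heaps' law.

```python
from collections import Counter


def frequency_spectrum(words):
    """Map k -> number of distinct words that occur exactly k times.

    The tail of this distribution is what a power law with index
    gamma would be fitted to (the fitting itself is not shown here).
    """
    word_counts = Counter(words)            # word -> occurrences
    return dict(Counter(word_counts.values()))


def vocabulary_growth(words, step):
    """Return (M, N) pairs: text length M and distinct-word count N.

    A point is recorded every `step` words and at the end of the text,
    giving the curve that Heaps' law describes.
    """
    seen = set()
    points = []
    for position, word in enumerate(words, start=1):
        seen.add(word)
        if position % step == 0 or position == len(words):
            points.append((position, len(seen)))
    return points


# Toy input; a real analysis would use the full text of a book.
toy_words = "a b a c a b d".split()
print(sorted(frequency_spectrum(toy_words).items()))  # [(1, 2), (2, 1), (3, 1)]
print(vocabulary_growth(toy_words, step=2))           # [(2, 2), (4, 3), (6, 3), (7, 4)]
```

Applying both functions to prefixes of increasing length from the same book is one way to observe the text-length dependence the abstract describes, under the stated assumptions.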