On the origin of long-range correlations in texts
- 2 July 2012
- journal article
- Published in Proceedings of the National Academy of Sciences
- Vol. 109 (29), 11582-11587
- https://doi.org/10.1073/pnas.1117723109
Abstract
The complexity of human interactions with social and natural phenomena is mirrored in the way we describe our experiences through natural language. In order to retain and convey such high-dimensional information, the statistical properties of our linguistic output have to be highly correlated in time. An example is the robust observation, still largely not understood, of correlations on arbitrarily long scales in literary texts. In this paper we explain how long-range correlations flow from highly structured linguistic levels down to the building blocks of a text (words, letters, etc.). By combining calculations and data analysis we show that correlations take the form of a bursty sequence of events once we approach the semantically relevant topics of the text. The mechanisms we identify are fairly general and can be equally applied to other hierarchical settings.