Markov Processes: Linguistics and Zipf's Law

Abstract
It is shown that a 2-parameter random Markov process constructed with $N$ states and biased random transitions gives rise to a stationary distribution where the probabilities of occurrence of the states, $P(k)$, $k = 1, \ldots, N$, exhibit the following three universal behaviors which characterize biological sequences and texts in natural languages: (a) the rank-ordered frequencies of occurrence of words are given by Zipf's law $P(k) \sim 1/k^{\rho}$, where $\rho(k)$ is slowly increasing for small $k$; (b) the frequencies of occurrence of letters are given by $P(k) = A - D \ln(k)$; and (c) long-range correlations are observed over long but finite intervals, as a result of the quasiergodicity of the Markov process.
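
As a rough illustration of the quantities named in the abstract, the following Python sketch builds a sparse random transition matrix, computes its stationary distribution, and rank-orders the state probabilities. The abstract does not specify the construction, so everything here is an assumption made for illustration only: the function names `random_biased_chain` and `stationary_distribution`, the two parameters `n_links` (successors per state) and `bias` (probability of the favored successor), and the use of power iteration on a lazy chain. It is not claimed to reproduce the model or the results of the paper.

```python
import numpy as np


def random_biased_chain(n_states, n_links, bias, seed=0):
    """Build a row-stochastic matrix with sparse, biased random transitions.

    Assumed construction (not taken from the paper): every state gets
    n_links >= 2 distinct random successors; one successor receives
    probability `bias`, the others share 1 - bias equally.
    """
    rng = np.random.default_rng(seed)
    P = np.zeros((n_states, n_states))
    for i in range(n_states):
        targets = rng.choice(n_states, size=n_links, replace=False)
        weights = np.full(n_links, (1.0 - bias) / (n_links - 1))
        weights[0] = bias
        P[i, targets] = weights
    return P


def stationary_distribution(P, n_iter=5000):
    """Approximate pi satisfying pi = pi P by power iteration.

    The lazy chain (P + I) / 2 has the same stationary distributions as P
    but is aperiodic, so the iteration cannot oscillate.
    """
    n = P.shape[0]
    lazy = 0.5 * (P + np.eye(n))
    pi = np.full(n, 1.0 / n)
    for _ in range(n_iter):
        pi = pi @ lazy
    return pi / pi.sum()


P = random_biased_chain(n_states=200, n_links=3, bias=0.7, seed=1)
pi = stationary_distribution(P)

# Rank-ordered probabilities P(k), k = 1..N, largest first.
ranked = np.sort(pi)[::-1]

# Least-squares slope of log P(k) against log k, restricted to states with
# non-negligible probability; minus the slope is a crude estimate of rho.
k = np.arange(1, ranked.size + 1)
mask = ranked > 1e-12
slope, intercept = np.polyfit(np.log(k[mask]), np.log(ranked[mask]), 1)
print("estimated exponent:", -slope)
```

Plotting `ranked` against `k` on log-log axes would test a form like (a), and on log-linear axes a form like (b). Whether either form appears depends on the transition rule, which in this sketch is a stand-in.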