On the relation between types and tokens in literary text
- 1 June 1972
- journal article
- Published by Cambridge University Press (CUP) in Journal of Applied Probability
- Vol. 9 (3), 507-518
- https://doi.org/10.2307/3212322
Abstract
The ratio of the number Xn of different words (types) in a text of length n words (tokens) to n has received considerable attention in the literature of statistical linguistics. The present note contains two stochastic models for Xn based on an inhomogeneous discrete Markov process of the pure birth type where the transition probabilities take certain forms depending only upon n. These models are then tested against data obtained from the plays of William Shakespeare.
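As a rough illustration of the quantities named in the abstract, the sketch below counts types against tokens in a word sequence and simulates a generic inhomogeneous pure birth chain in which the probability of a new type depends only on n. The abstract does not state the two transition-probability forms studied in the article, so `example_birth_prob` is a purely hypothetical placeholder, and the function names `type_counts` and `simulate_pure_birth` are illustrative assumptions rather than anything from the paper.

```python
import random
from typing import Callable, List, Optional, Sequence


def type_counts(tokens: Sequence[str]) -> List[int]:
    """Return [X_1, ..., X_n]: distinct words (types) among the first k tokens."""
    seen = set()
    counts = []
    for tok in tokens:
        seen.add(tok)
        counts.append(len(seen))
    return counts


def simulate_pure_birth(
    n_tokens: int,
    birth_prob: Callable[[int], float],
    seed: Optional[int] = None,
) -> List[int]:
    """Simulate X_1..X_n for a pure birth chain.

    After n tokens have been read, the next token is a new type with
    probability birth_prob(n); otherwise the type count is unchanged.
    """
    rng = random.Random(seed)
    x = 0
    path = []
    for n in range(n_tokens):
        if rng.random() < birth_prob(n):
            x += 1  # a birth: one more type
        path.append(x)
    return path


def example_birth_prob(n: int) -> float:
    # Hypothetical form chosen only for illustration, NOT taken from the paper:
    # the first word is always new, later words are new less and less often.
    return 1.0 / (1.0 + 0.05 * n)


if __name__ == "__main__":
    observed = type_counts("to be or not to be".split())
    print("observed X_n:", observed)
    simulated = simulate_pure_birth(1000, example_birth_prob, seed=0)
    print("simulated type-token ratio at n=1000:", simulated[-1] / 1000)
```

Dividing each entry of either list by its position gives the type-token ratio Xn/n discussed in the abstract; comparing an observed path with simulated ones is one informal way to see how a candidate form for the transition probabilities behaves.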
This publication has 4 references indexed in Scilit:
- The Advanced Theory of Language as Choice and Chance. Springer Nature, 1966
- Some further notes on a class of skew distribution functions. Information and Control, 1960
- On a class of skew distribution functions. Biometrika, 1955
- The population frequencies of species and the estimation of population parameters. Biometrika, 1953