On the relation between types and tokens in literary text
- 1 March 1972
- journal article
- research article
- Published by Cambridge University Press (CUP) in Journal of Applied Probability
- Vol. 9 (3), 507-518
- https://doi.org/10.1017/s002190020003583x
Abstract
The ratio of the number Xn of different words (types) in a text of length n words (tokens) to n has received considerable attention in the literature of statistical linguistics. The present note contains two stochastic models for Xn based on an inhomogeneous discrete Markov process of the pure birth type where the transition probabilities take certain forms depending only upon n. These models are then tested against data obtained from the plays of William Shakespeare.
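As a rough illustration of the kind of process the abstract describes, the sketch below simulates a pure birth chain in which the type count Xn either stays the same or grows by one at each new token, with a birth probability that depends only on n. The abstract does not state the two transition forms used in the paper, so the function `example_birth_prob` and its parameter `a` are hypothetical placeholders chosen only to make the sketch runnable; the names `simulate_types` and `expected_types` are likewise illustrative.

```python
import random
from typing import Callable, List


def example_birth_prob(n: int, a: float = 50.0) -> float:
    """Hypothetical birth probability a / (a + n); NOT taken from the paper."""
    return a / (a + n)


def simulate_types(n_tokens: int,
                   birth_prob: Callable[[int], float],
                   seed: int = 0) -> List[int]:
    """Simulate X_1, ..., X_n for an inhomogeneous pure birth chain.

    At step n the chain moves from X_n to X_n + 1 (a new type appears)
    with probability birth_prob(n), and otherwise stays at X_n.
    """
    rng = random.Random(seed)
    x = 1  # the first token is necessarily a new type
    path = [x]
    for n in range(1, n_tokens):
        if rng.random() < birth_prob(n):
            x += 1
        path.append(x)
    return path


def expected_types(n_tokens: int, birth_prob: Callable[[int], float]) -> float:
    """E[X_n] when the birth probability depends only on n:
    E[X_n] = 1 + sum of birth_prob(k) for k = 1, ..., n - 1.
    """
    return 1.0 + sum(birth_prob(k) for k in range(1, n_tokens))


if __name__ == "__main__":
    n = 10_000
    path = simulate_types(n, example_birth_prob)
    print("simulated type-token ratio:", path[-1] / n)
    print("expected type-token ratio: ", expected_types(n, example_birth_prob) / n)
```

Because the birth probability here does not depend on the current state, the expected number of types is simply one plus the sum of the step probabilities, which gives a quick check on any simulated path. Fitting such a model to real text would require the specific transition forms given in the paper itself.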