Estimating an Author's Vocabulary
- 1 March 1973
- journal article
- research article
- Published by JSTOR in Journal of the American Statistical Association
- Vol. 68 (341), 92
- https://doi.org/10.2307/2284147
Abstract
The problem of estimating an author's vocabulary, given a sample of the author's writings, is considered. It is assumed that the vocabulary is fixed and finite, and that the author writes a composition by successively drawing words from this collection, independently of the previous configuration. Attention is focussed on the random variable X(n), the total number of different words used in a sample of n. It is shown that under fairly general conditions, the distribution of X(n), suitably normalized and scaled, is asymptotically Gaussian, and this result may be used to obtain a large sample estimator of vocabulary size.
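The sampling model in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the paper's estimator: it draws n words independently from a fixed finite vocabulary and counts the distinct words, X(n), then compares the simulated mean with the standard identity E[X(n)] = sum over words of 1 - (1 - p_i)^n. The function names, the 1/rank word probabilities, and the vocabulary and sample sizes are all assumptions chosen for illustration; none of them come from the article.

```python
import random


def expected_distinct(probs, n):
    """Expected number of distinct words in n independent draws.

    Word i is seen at least once with probability 1 - (1 - p_i)^n,
    so E[X(n)] is the sum of these probabilities.
    """
    return sum(1.0 - (1.0 - p) ** n for p in probs)


def simulate_distinct(probs, n, rng):
    """Draw n words independently and return X(n), the distinct count."""
    words = rng.choices(range(len(probs)), weights=probs, k=n)
    return len(set(words))


def rank_probs(vocab_size):
    """Hypothetical word probabilities proportional to 1/rank (assumption)."""
    weights = [1.0 / rank for rank in range(1, vocab_size + 1)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    rng = random.Random(0)   # fixed seed so runs are repeatable
    probs = rank_probs(200)  # illustrative vocabulary of 200 words
    n = 500                  # illustrative sample size
    draws = [simulate_distinct(probs, n, rng) for _ in range(200)]
    mean = sum(draws) / len(draws)
    print("simulated mean of X(n):", round(mean, 2))
    print("expected value of X(n):", round(expected_distinct(probs, n), 2))
```

Plotting a histogram of `draws` for increasing n is one informal way to see the approximately Gaussian shape the abstract refers to, although the conditions under which that limit holds are stated in the article itself rather than here.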