Estimating an Author's Vocabulary

1 March 1973

journal article
research article
Published by JSTOR in Journal of the American Statistical Association

Vol. 68 (341), 92
https://doi.org/10.2307/2284147

Abstract

The problem of estimating an author's vocabulary, given a sample of the author's writings, is considered. It is assumed that the vocabulary is fixed and finite, and that the author writes a composition by successively drawing words from this collection, independently of the previous configuration. Attention is focussed on the random variable X(n), the total number of different words used in a sample of n. It is shown that under fairly general conditions, the distribution of X(n), suitably normalized and scaled, is asymptotically Gaussian, and this result may be used to obtain a large sample estimator of vocabulary size.
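
The sampling model in the abstract can be illustrated with a short simulation. The sketch below is not the paper's estimator. It only assumes the stated model (a fixed, finite vocabulary with words drawn independently), under which linearity of expectation gives E[X(n)] = sum over words i of 1 - (1 - p_i)^n. The function names, the parameter `upper`, and the equal-probability assumption used to invert the mean are hypothetical choices made here for illustration.

```python
import math
import random


def expected_distinct(probs, n):
    """Exact E[X(n)] = sum_i (1 - (1 - p_i)^n) for n independent draws."""
    return sum(1.0 - (1.0 - p) ** n for p in probs)


def simulate_distinct(probs, n, rng):
    """Draw n words independently with the given probabilities; return X(n)."""
    draws = rng.choices(range(len(probs)), weights=probs, k=n)
    return len(set(draws))


def _mean_equiprobable(size, n):
    """E[X(n)] when all `size` words are equally likely (size may be fractional)."""
    if size <= 1.0:
        return 1.0
    # size * (1 - (1 - 1/size)^n), written with expm1/log1p for numerical stability
    return -size * math.expm1(n * math.log1p(-1.0 / size))


def estimate_size_equiprobable(x, n, upper=1e9, iters=200):
    """Solve E[X(n)] = x for the vocabulary size by bisection.

    Illustrative assumption (not from the paper): every word is equally likely.
    If the true solution exceeds `upper`, the result saturates near `upper`.
    """
    if not 1 <= x < n:
        raise ValueError("need 1 <= x < n for a finite solution")
    lo, hi = float(x), float(upper)  # the mean never exceeds the size, so size >= x
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if _mean_equiprobable(mid, n) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    rng = random.Random(0)
    vocab = [1.0 / 500] * 500  # hypothetical 500-word vocabulary, uniform usage
    x = simulate_distinct(vocab, 1000, rng)
    print("observed X(n):", x)
    print("expected X(n):", expected_distinct(vocab, 1000))
    print("size estimate:", estimate_size_equiprobable(x, 1000))
```

Inverting the mean under equal probabilities is the simplest moment-style estimate. Real word frequencies are far from uniform, so this inversion would tend to understate the vocabulary of an actual author, and it should be read only as a demonstration of how X(n) carries information about vocabulary size.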
