Nonparametric Estimation of the Probability of Discovering a New Species

Abstract
A random sample is taken from a population consisting of an unknown number of distinct species. A quantity of interest is the probability of discovering a new species when an additional draw from the population is made. An estimator of this quantity was introduced by Starr (1979). We prove a conjecture of Starr's that the estimator is uniformly minimum variance unbiased and give various asymptotic properties of the estimator. A nonparametric maximum likelihood estimator that has similar asymptotic properties is introduced. A Monte Carlo study that suggests guidelines for choosing an estimator under various circumstances is given.
To amplify, suppose that when a sample of size 1 is taken from the population, the probability of drawing a representative of the $i$th species is $p_i$. If $n$ draws are made with replacement, the (unconditional) probability that species $i$ will be observed for the first time on the $(n + 1)$st draw is simply the term $p_i q_i^n$ (with $q_i = 1 - p_i$) from a geometric distribution. Consequently, $\theta_n = \sum_i p_i q_i^n$ represents the (unconditional) probability that some new species will be drawn for the first time on the $(n + 1)$st draw. No unbiased estimator of $\theta_n$ can be derived from a sample of size $n$, but if one additional observation is allowed then such an estimator arises naturally. Denoted by $V_1$, this estimator is simply the number of species observed exactly once divided by $n + 1$, the number of draws. This estimator was proposed by Robbins (1968). An analogous unbiased estimator, $V_m$, was proposed by Starr (1979) for the case in which $m$ additional draws are made. Starr conjectured that $V_m$ is a minimum variance unbiased estimator. We prove Starr's conjecture, using the theory of U statistics. This theory can also be used to show that $V_m$ is asymptotically normally distributed as $m \to \infty$ and that the rate of convergence is faster if the $p_i$'s are all equal.
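The definitions above can be made concrete with a minimal Python sketch. It assumes species are drawn independently with replacement from known probabilities; the function names `theta`, `draw_sample`, and `v1` are illustrative and do not come from the paper. It computes $\theta_n = \sum_i p_i q_i^n$ exactly and evaluates $V_1$ as the number of species seen exactly once divided by the $n + 1$ draws.

```python
import random
from collections import Counter


def theta(p, n):
    """Exact theta_n = sum_i p_i * (1 - p_i)**n for known probabilities p."""
    return sum(pi * (1.0 - pi) ** n for pi in p)


def draw_sample(p, n, rng):
    """Draw n + 1 species labels with replacement (labels are 0..len(p)-1)."""
    return rng.choices(range(len(p)), weights=p, k=n + 1)


def v1(sample):
    """V_1: number of species observed exactly once, divided by the number of draws."""
    counts = Counter(sample)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(sample)


if __name__ == "__main__":
    # Hypothetical three-species population, chosen only for illustration.
    p = [0.5, 0.3, 0.2]
    n = 4
    rng = random.Random(0)
    reps = 20000
    mean_v1 = sum(v1(draw_sample(p, n, rng)) for _ in range(reps)) / reps
    print("theta_n:", theta(p, n))
    print("Monte Carlo mean of V_1:", mean_v1)
```

Averaged over many simulated samples, the mean of `v1` should sit close to `theta(p, n)`, which is what unbiasedness of $V_1$ for $\theta_n$ predicts.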
As an alternative to estimating $\theta_n$ by $V_1$, we consider estimating each $p_i$ by the fraction of times species $i$ is observed in $n + 1$ draws and then estimating $\theta_n$ by substituting these fractions into its definition. We also construct a similar estimator when $m$ additional draws are made. Although this nonparametric maximum likelihood estimator is biased for small samples, we show that it has asymptotic properties similar to those of $V_m$: it is asymptotically unbiased and has the same asymptotic distribution. Moreover, by way of simulations and special cases, we show that it can dominate $V_m$ in terms of mean squared error.
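A mean squared error comparison of this kind can be sketched in Python as follows. The exact form of the paper's maximum likelihood estimator is not recoverable from this abstract, so `plug_in` is an assumption: it replaces each $p_i$ in $\sum_i p_i (1 - p_i)^n$ with the observed fraction from $n + 1$ draws. The helper `simulated_mse` and the example population are likewise hypothetical, and this is not the paper's Monte Carlo design.

```python
import random
from collections import Counter


def theta(p, n):
    """Exact theta_n = sum_i p_i * (1 - p_i)**n for known probabilities p."""
    return sum(pi * (1.0 - pi) ** n for pi in p)


def v1(sample):
    """V_1: number of species observed exactly once, divided by the number of draws."""
    counts = Counter(sample)
    return sum(1 for c in counts.values() if c == 1) / len(sample)


def plug_in(sample):
    """Assumed plug-in estimator: observed fractions substituted into theta_n."""
    total = len(sample)  # n + 1 draws
    n = total - 1
    fractions = [c / total for c in Counter(sample).values()]
    return sum(f * (1.0 - f) ** n for f in fractions)


def simulated_mse(estimator, p, n, reps=5000, seed=0):
    """Monte Carlo estimate of E[(estimator - theta_n)^2] over samples of size n + 1."""
    rng = random.Random(seed)
    target = theta(p, n)
    species = range(len(p))
    total = 0.0
    for _ in range(reps):
        sample = rng.choices(species, weights=p, k=n + 1)
        total += (estimator(sample) - target) ** 2
    return total / reps


if __name__ == "__main__":
    # Hypothetical population; vary p and n to explore different regimes.
    p = [0.5, 0.3, 0.2]
    print("MSE of V_1:    ", simulated_mse(v1, p, n=9))
    print("MSE of plug-in:", simulated_mse(plug_in, p, n=9))
```

Both estimators are evaluated on the same simulated samples because the seed is shared, which keeps the comparison paired. Which estimator has the smaller error depends on the species probabilities and the sample size.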
