Population size estimation based upon ratios of recapture probabilities
Open Access
- 1 June 2011
- journal article
- Published by Institute of Mathematical Statistics in The Annals of Applied Statistics
- Vol. 5 (2B), 1512–1533
- https://doi.org/10.1214/10-aoas436
Abstract
Estimating the size of an elusive target population is of prominent interest in many areas of the life and social sciences. Our aim is to provide an efficient and workable method to estimate the unknown population size, given the frequency distribution of counts of repeated identifications of units of the population of interest. This counting variable is necessarily zero-truncated, since units that have never been identified are not in the sample. We consider several applications: clinical medicine, where interest is in estimating the number of patients with adenomatous polyps that have been overlooked by the diagnostic procedure; drug user studies, where interest is in estimating the number of hidden drug users who have not been identified; veterinary surveillance of scrapie in the UK, where interest is in estimating the hidden amount of scrapie; and entomology and microbial ecology, where interest is in estimating the number of unobserved species of organisms. In all these examples, simple models such as the homogeneous Poisson are not appropriate, since they do not account for present and latent heterogeneity. The Poisson–Gamma (negative binomial) model provides a flexible alternative and often leads to well-fitting models. It has a long history and was recently used in the development of the Chao–Bunge estimator. Here we use a different property of the Poisson–Gamma model: ratios of neighboring Poisson–Gamma probabilities are linearly related to the counts of repeated identifications. Moreover, these ratios are identical for the truncated and untruncated distributions, because the truncation constant cancels. In this paper we propose a weighted logarithmic regression model to estimate the frequency of zero counts, assuming a Poisson–Gamma distribution for the counts. A detailed explanation of the chosen weights and a goodness-of-fit index are presented, along with extensions to other distributions.
To evaluate the proposed estimator, we applied it to the benchmark examples mentioned above and compared the results with those obtained from the Chao–Bunge and other estimators. The major benefit of the proposed estimator is that it is defined under mild conditions, whereas the Chao–Bunge estimator fails to be well defined in several of the examples presented; where the Chao–Bunge estimator is defined, a simulation study shows that its bias and mean squared error (MSE) are comparable to those of the proposed estimator. Furthermore, the proposed estimator is relatively insensitive to the inclusion or exclusion of large outlying frequencies, whereas most other methods are sensitive to such outliers. The implications and limitations of such methods are discussed.
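The ratio idea in the abstract can be sketched in Python. Assume the Poisson–Gamma probabilities are parametrized as p_x = Γ(x+k)/(Γ(k) x!) (1−q)^k q^x with shape k > 0 and 0 < q < 1. Then p_{x+1}/p_x = q(x+k)/(x+1), so r_x = (x+1) p_{x+1}/p_x = qk + qx is linear in x. Dividing every p_x by 1 − p_0 (zero truncation) leaves r_x unchanged for x ≥ 1. Because r_0 = f_1/f_0, a fitted intercept a yields an estimated zero frequency f_1/a. This is a minimal illustration, not the authors' estimator. It fits the linear relation on the ratio scale by weighted least squares instead of the paper's weighted logarithmic regression, whose exact form is not reproduced here. The delta-method weights are an assumption of this sketch, and the names `nb_pmf`, `zt_pmf`, `ratio`, and `ratio_regression_n_hat` are hypothetical.

```python
import math


def nb_pmf(x: int, k: float, q: float) -> float:
    """Poisson-Gamma (negative binomial) P(X = x), assumed parametrization:
    Gamma(x + k) / (Gamma(k) x!) * (1 - q)**k * q**x."""
    log_p = (math.lgamma(x + k) - math.lgamma(k) - math.lgamma(x + 1)
             + k * math.log(1 - q) + x * math.log(q))
    return math.exp(log_p)


def zt_pmf(x: int, k: float, q: float) -> float:
    """Zero-truncated version: P(X = x | X >= 1) for x >= 1."""
    if x < 1:
        raise ValueError("zero-truncated pmf is defined for x >= 1 only")
    return nb_pmf(x, k, q) / (1 - nb_pmf(0, k, q))


def ratio(p_next: float, p_x: float, x: int) -> float:
    """Neighbouring-probability ratio r_x = (x + 1) * p_{x+1} / p_x."""
    return (x + 1) * p_next / p_x


def ratio_regression_n_hat(freqs: dict[int, float]) -> tuple[float, float, float]:
    """Hypothetical ratio-regression sketch of the population size N.

    freqs maps each count x >= 1 to f_x, the number of units seen exactly x times.
    Fits r_x = a + b * x by weighted least squares on the observed ratios, then
    extrapolates to x = 0 (r_0 = f_1 / f_0) to estimate the unseen frequency f_0.
    Returns (a, b, N_hat).
    """
    if freqs.get(1, 0) <= 0:
        raise ValueError("f_1 must be positive to extrapolate f_0")
    xs, rs, ws = [], [], []
    for x in sorted(freqs):
        fx, f_next = freqs[x], freqs.get(x + 1, 0)
        if fx > 0 and f_next > 0:  # use only pairs where both counts are observed
            r = (x + 1) * f_next / fx
            var = r * r * (1 / fx + 1 / f_next)  # assumed delta-method variance of r_x
            xs.append(x)
            rs.append(r)
            ws.append(1 / var)
    if len(xs) < 2:
        raise ValueError("need at least two usable ratios")
    # Closed-form weighted least squares for a straight line.
    sw = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sw
    r_bar = sum(w * r for w, r in zip(ws, rs)) / sw
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxr = sum(w * (x - x_bar) * (r - r_bar) for w, x, r in zip(ws, xs, rs))
    b = sxr / sxx
    a = r_bar - b * x_bar
    if a <= 0:
        raise ValueError("fitted intercept is not positive; f_0 is undefined")
    f0_hat = freqs[1] / a  # from r_0 = 1 * f_1 / f_0
    n_observed = sum(freqs.values())
    return a, b, n_observed + f0_hat


if __name__ == "__main__":
    # Expected (noise-free) frequencies from a Poisson-Gamma population of size 1000.
    k, q, N = 2.0, 0.5, 1000.0
    freqs = {x: N * nb_pmf(x, k, q) for x in range(1, 61)}
    a, b, n_hat = ratio_regression_n_hat(freqs)
    print(f"a={a:.4f}  b={b:.4f}  N_hat={n_hat:.2f}")
```

With noise-free expected frequencies the observed ratios lie exactly on the line r_x = qk + qx, so the fit recovers a ≈ qk = 1, b ≈ q = 0.5, and N_hat ≈ 1000. Real count data are noisy, which is why the choice of weights (discussed in the paper) matters.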