Predicting the Conditional Probability of Discovering a New Class
- 1 December 2004
- journal article
- Published by Taylor & Francis in Journal of the American Statistical Association
- Vol. 99 (468), 1108-1118
- https://doi.org/10.1198/016214504000001709
Abstract
Consider a population comprising disjoint classes. An important problem arising from various fields is prediction of the random conditional probability of discovering a new class. The asymptotic normality of the discovery probability is established in a Poisson model, where the number of individuals from each class is a Poisson process with a class-specific rate. A new derivation is presented for the well-known Good–Toulmin predictor as a moment-based estimator for the asymptotic limit of the discovery probability. The Good–Toulmin predictor is also shown to be a nonparametric empirical Bayes estimator for the expectation of the discovery probability given the rates of the Poisson processes and an approximation to an unbiased estimator only for the identifiable part of the expectation of the discovery probability in a multinomial model. The properties of the moment-based estimator are investigated so that confidence and prediction intervals can be constructed. The Good–Toulmin predictor and the discovery probability are shown to have a nonnegative correlation. A conditional nonparametric maximum likelihood estimator is developed as an alternative to the moment-based estimator. As an application, the methods are used to predict the probability of discovering a new gene from expressed sequence tags in a genomic sequencing experiment.
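The moment-based idea in the abstract can be illustrated with a minimal sketch. It is an assumption-laden illustration, not the article's own estimator or notation. Assume the Poisson model described above, observed over one unit of time, and write n_k for the number of classes seen exactly k times and n for the total number of individuals. Under that model the expected value of n_k equals the sum over classes of rate^k * exp(-rate) / k!, so the expected (unnormalized) discovery probability after extending sampling by a further fraction s can be matched term by term by the alternating sum of k * (-s)^(k-1) * n_k. Dividing by n gives the estimate below; at s = 0 it reduces to the familiar ratio n_1 / n. The function names and the parameter s are hypothetical, chosen only for this example.

```python
from collections import Counter


def frequency_counts(labels):
    """Return a mapping {k: n_k}: how many classes were observed exactly k times."""
    class_sizes = Counter(labels)            # class label -> number of individuals
    return Counter(class_sizes.values())     # k -> number of classes with k individuals


def discovery_probability_estimate(labels, s=0.0):
    """Moment-based estimate of the probability that the next individual
    comes from a previously unseen class, after sampling is extended by a
    further fraction s of the original effort (s = 0 means "right now").

    Assumption: Poisson model with class-specific rates, observed for one
    unit of time. This is a sketch, not the article's exact formulation.
    """
    n = len(labels)
    if n == 0:
        raise ValueError("at least one observed individual is required")
    freq = frequency_counts(labels)
    # Alternating sum: sum over k of k * (-s)^(k-1) * n_k
    numerator = sum(k * (-s) ** (k - 1) * n_k for k, n_k in freq.items())
    return numerator / n


if __name__ == "__main__":
    # Toy sample: classes a, b, c, d observed 2, 1, 3, 1 times respectively.
    sample = ["a", "a", "b", "c", "c", "c", "d"]
    print(discovery_probability_estimate(sample))         # n_1 / n
    print(discovery_probability_estimate(sample, s=0.5))  # after 50% more sampling
```

Because the sum alternates in sign, the estimate is not guaranteed to lie between 0 and 1, and sums of this kind are known to become unstable as s grows, which is one reason interval estimates and alternatives such as the conditional nonparametric maximum likelihood estimator mentioned in the abstract are of interest.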