The Nested Chinese Restaurant Process and Bayesian Nonparametric Inference of Topic Hierarchies
- 1 January 2010
- journal article
- research article
- Published by Association for Computing Machinery (ACM) in Journal of the ACM
- Vol. 57 (2), 1-30
- https://doi.org/10.1145/1667053.1667056
Abstract
We present the nested Chinese restaurant process (nCRP), a stochastic process that assigns probability distributions to ensembles of infinitely deep, infinitely branching trees. We show how this stochastic process can be used as a prior distribution in a Bayesian nonparametric model of document collections. Specifically, we present an application to information retrieval in which documents are modeled as paths down a random tree, and the preferential attachment dynamics of the nCRP lead to clustering of documents according to sharing of topics at multiple levels of abstraction. Given a corpus of documents, a posterior inference algorithm finds an approximation to a posterior distribution over trees, topics and allocations of words to levels of the tree. We demonstrate this algorithm on collections of scientific abstracts from several journals. This model exemplifies a recent trend in statistical machine learning: the use of Bayesian nonparametric methods to infer distributions on flexible data structures.
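The path-sampling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the tree is truncated to a fixed `depth` (the process itself is infinitely deep), and the function name `sample_ncrp_paths`, the concentration parameter `gamma`, and the tuple encoding of nodes are all illustrative choices. At each node, a document follows an existing child with probability proportional to the number of earlier documents that took it, or opens a new child with probability proportional to `gamma`.

```python
import random
from collections import defaultdict


def sample_ncrp_paths(num_docs, depth, gamma, seed=0):
    """Draw one root-to-leaf path per document from a depth-truncated nCRP prior.

    Nodes are encoded as tuples of child indices; the root is the empty tuple.
    This samples from the prior only; it performs no posterior inference.
    """
    rng = random.Random(seed)
    counts = defaultdict(int)     # node -> number of documents passing through it
    children = defaultdict(list)  # node -> list of child nodes created so far
    root = ()
    paths = []
    for _ in range(num_docs):
        node = root
        counts[node] += 1
        path = [node]
        for _level in range(1, depth):
            kids = children[node]
            # Existing children are weighted by popularity (preferential
            # attachment); the final slot, weighted by gamma, is a new child.
            weights = [counts[k] for k in kids] + [gamma]
            choice = rng.choices(range(len(kids) + 1), weights=weights)[0]
            if choice == len(kids):
                child = node + (len(kids),)
                kids.append(child)
            else:
                child = kids[choice]
            counts[child] += 1
            node = child
            path.append(node)
        paths.append(path)
    return paths, counts


if __name__ == "__main__":
    paths, counts = sample_ncrp_paths(num_docs=20, depth=3, gamma=1.0)
    leaves = {p[-1] for p in paths}
    print(f"{len(leaves)} distinct leaves for {len(paths)} documents")
```

Documents that share a prefix of their path share the nodes (topics, in the model described above) along that prefix, which is how the clustering at multiple levels of abstraction arises. Smaller values of `gamma` tend to give fewer, more heavily shared branches.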
Funding Information
- Office of Naval Research (175-6343)
- Division of Behavioral and Cognitive Sciences (BCS-0631518)
- National Science Foundation (745520)