Structure Learning in Conditional Probability Models via an Entropic Prior and Parameter Extinction
- 1 July 1999
- journal article
- Published by MIT Press in Neural Computation
- Vol. 11 (5) , 1155-1182
- https://doi.org/10.1162/089976699300016395
Abstract
We introduce an entropic prior for multinomial parameter estimation problems and solve for its maximum a posteriori (MAP) estimator The prior is a bias for maximally structured and minimally ambiguous models In conditional probability models with hidden state, iterative MAP estimation drives weakly supported parameters toward ex - tinction, effectively turning them off Thus structure discovery is folded into parameter estimation We then establish criteria for simplifying a probabilistic model's graphical structure by trimming parameters and states, with a guarantee that any such deletion will increase the posterior probability of the model Trimming accelerates learning by sparsifying the model All operations monotonically and maximally increase the posterior probability, yielding structure - learning algorithms only slightly slower than pa - rameter estimation via expectation - maximization (EM), and orders of magnitude faster than search - based structure induction When applied to hidden Markov model (HMM) training, the resulting models show superior generalization to held - out test data In many cases the resulting models are so sparse and concise that they are interpretable , with hidden states that strongly correlate with meaningful categoriesKeywords
This publication has 12 references indexed in Scilit:
- Approximate Statistical Tests for Comparing Supervised Classification Learning AlgorithmsNeural Computation, 1998
- Statistical Inference, Occam's Razor, and Statistical Mechanics on the Space of Probability DistributionsNeural Computation, 1997
- HMM topology design using maximum likelihood successive state splittingComputer Speech & Language, 1997
- On the LambertW functionAdvances in Computational Mathematics, 1996
- The birth of the giant componentRandom Structures & Algorithms, 1993
- Gaussian limiting distributions for the number of components in combinatorial structuresJournal of Combinatorial Theory, Series A, 1990
- A tutorial on hidden Markov models and selected applications in speech recognitionProceedings of the IEEE, 1989
- Dynamic programming algorithm optimization for spoken word recognitionIEEE Transactions on Acoustics, Speech, and Signal Processing, 1978
- On some problems of random netsBulletin of Mathematical Biology, 1952
- Connectivity of random netsBulletin of Mathematical Biology, 1951