Inducing Probabilistic Grammars by Bayesian Model Merging
Preprint, 13 September 1994
Abstract
We describe a framework for inducing probabilistic grammars from corpora of positive samples. First, samples are {\em incorporated} by adding ad hoc rules to a working grammar; subsequently, elements of the model (such as states or nonterminals) are {\em merged} to achieve generalization and a more compact representation. The choice of what to merge and when to stop is governed by the Bayesian posterior probability of the grammar given the data, which formalizes a trade-off between a close fit to the data and a default preference for simpler models (`Occam's Razor'). The general scheme is illustrated using three types of probabilistic grammars: hidden Markov models, class-based $n$-grams, and stochastic context-free grammars.
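To make the two-step procedure concrete, the following is a minimal sketch of incorporation followed by greedy, posterior-guided merging for an HMM-like automaton, scoring candidates by $\log P(M \mid D) = \log P(M) + \log P(D \mid M)$ up to a constant. It is an illustration under simplifying assumptions, not the paper's implementation: the state-count prior (weight \texttt{lam}), the fixed-path likelihood approximation, and all function names are stand-ins for the more elaborate priors and search strategies the paper develops.

\begin{verbatim}
import math
from collections import defaultdict

# Toy probabilistic automaton: trans[state][(symbol, next_state)] = count.
START, END = "S", "E"

def incorporate(samples):
    """Incorporation: one fresh state chain per sample, so the
    initial model reproduces the training data exactly."""
    trans = defaultdict(lambda: defaultdict(int))
    fresh = 0
    for sample in samples:
        prev = START
        for sym in sample:
            fresh += 1
            trans[prev][(sym, fresh)] += 1
            prev = fresh
        trans[prev][("#", END)] += 1  # '#' marks end of string
    return trans

def log_likelihood(trans):
    """Score transition counts under maximum-likelihood transition
    probabilities (samples' paths are kept fixed through merges,
    an approximation rather than a full re-parse of the data)."""
    ll = 0.0
    for out in trans.values():
        total = sum(out.values())
        for count in out.values():
            ll += count * math.log(count / total)
    return ll

def log_prior(trans, lam):
    # Crude description-length prior: fewer states => higher prior.
    return -lam * len(trans)

def merge(trans, a, b):
    """Merge state b into state a, redirecting arcs and summing counts."""
    merged = defaultdict(lambda: defaultdict(int))
    for state, out in trans.items():
        s = a if state == b else state
        for (sym, nxt), count in out.items():
            merged[s][(sym, a if nxt == b else nxt)] += count
    return merged

def model_merging(samples, lam=1.0):
    model = incorporate(samples)
    best = log_prior(model, lam) + log_likelihood(model)
    while True:
        states = [s for s in model if s != START]
        candidates = [merge(model, a, b)
                      for i, a in enumerate(states) for b in states[i + 1:]]
        if not candidates:
            break
        scored = max(candidates,
                     key=lambda m: log_prior(m, lam) + log_likelihood(m))
        score = log_prior(scored, lam) + log_likelihood(scored)
        if score <= best:
            break  # no merge improves the posterior: stop
        model, best = scored, score
    return model

if __name__ == "__main__":
    grammar = model_merging(["ab", "abab", "ababab"])
    print(len(grammar), "states after merging")
\end{verbatim}

Because merging only ever shrinks the model, the prior term rewards each merge while the likelihood term penalizes merges that conflate distinguishable contexts; the loop stops exactly when the two pressures balance, which is the Occam's Razor trade-off the abstract describes.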