Abstract
The Lincoln robust HMM (hidden Markov model) recognizer has been converted from a single Gaussian or Gaussian mixture PDF per state to tied mixtures, in which a single set of Gaussians is shared among all states. Initial difficulties caused by the use of mixture pruning were cured by using observation pruning instead. Fixed-weight smoothing of the mixture weights allowed the use of word-boundary-context-dependent triphone models for both speaker-dependent (SD) and speaker-independent (SI) recognition. A second-differential observation stream further improved SI performance but not SD performance. A novel form of phonetic context model, the semiphone, is also introduced. This model significantly reduces the number of states required to model a vocabulary and unifies triphone and diphone modeling.
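
To make the tied-mixture idea concrete, the following is a minimal sketch and not the recognizer's actual implementation. It assumes diagonal-covariance Gaussians, and it assumes that fixed-weight smoothing can be approximated as a linear interpolation between a context-specific weight vector and a more general one using a fixed constant `lam`. The names `gaussian_pdf`, `smooth_weights`, `state_likelihood`, `codebook`, and `lam` are all hypothetical and chosen only for illustration.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a diagonal-covariance Gaussian at observation vector x."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return math.exp(log_p)

def smooth_weights(specific, general, lam):
    """Assumed smoothing: interpolate two weight vectors with a fixed lam."""
    return [lam * s + (1.0 - lam) * g for s, g in zip(specific, general)]

def state_likelihood(weights, shared_densities):
    """Tied mixture: a state owns only weights; the Gaussians are shared."""
    return sum(w * d for w, d in zip(weights, shared_densities))

# One shared set of Gaussians (mean, variance) used by every state.
codebook = [([0.0, 0.0], [1.0, 1.0]),
            ([2.0, 2.0], [1.0, 1.0])]

x = [0.0, 0.0]  # one observation frame

# The shared densities are evaluated once per frame, then reused by all states.
densities = [gaussian_pdf(x, m, v) for m, v in codebook]

# Hypothetical weight vectors for a sparsely trained triphone state and
# its better-trained context-independent counterpart.
triphone_w = [1.0, 0.0]
monophone_w = [0.5, 0.5]
w = smooth_weights(triphone_w, monophone_w, lam=0.8)

likelihood = state_likelihood(w, densities)
```

Because every state indexes the same `densities` list, the per-frame Gaussian evaluations are independent of the number of states; only the weighted sum is computed per state.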
