Abstract
Good HMM-based speech recognition performance requires that any inaccuracies introduced by the HMM conditional independence assumptions be minimal. In this work, those conditional independence assumptions are relaxed in a principled way. For each hidden state value, additional dependencies are added between observation elements to increase both accuracy and discriminability. These dependencies are chosen according to natural statistical dependencies extant in the training data that are not well modeled by an HMM. The result is called a buried Markov model (BMM) because the underlying Markov chain in an HMM is further hidden (buried) by specific cross-observation dependencies. Gaussian mixture HMMs are extended to represent BMM dependencies, and new EM update equations are derived. In preliminary experiments on a large-vocabulary isolated-word speech database using a single state per monophone, BMMs achieve an 11% improvement in word-error rate (WER) with only a 9.5% increase in the number of parameters.
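To make the dependency structure concrete, one plausible parameterization (a sketch under assumed notation, not necessarily the exact form used in the paper) extends each state-conditional Gaussian mixture so that its component means depend linearly on a vector z_{q,t} collecting the observation elements selected as dependencies for state q; the matrix B_{qm} here is an illustrative per-component dependency weight:

p(x_t \mid q_t = q, z_{q,t}) = \sum_{m=1}^{M} c_{qm}\, \mathcal{N}\!\left(x_t;\ \mu_{qm} + B_{qm} z_{q,t},\ \Sigma_{qm}\right)

Under this form, setting B_{qm} = 0 for all components recovers the standard Gaussian mixture HMM, which is consistent with the abstract's description of BMMs as an extension whose EM updates generalize those of the ordinary model.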
