Abstract
Hidden Markov model (HMM) decomposition is used to recognize speech in the presence of an interfering background speaker. The foreground speech is modeled by a set of left-to-right isolated-word HMMs trained on a small isolated-word database, and the background speech is modeled by a parallel ergodic HMM trained on a subset of TIMIT. The standard output approximation (OA) method of estimating the output probability distributions is used and compared with a simple model combination (MC) technique. Recent work in this area has shown the effectiveness of vocabulary-specific background speech models, so such a model is used as the baseline. The results show that the general ergodic background model is as effective as a vocabulary-specific model. The MC technique, however, is not effective.
