High performance connected digit recognition using maximum mutual information estimation

Abstract
The authors describe the latest development by the speech research group at CRIM (Centre de Recherche Informatique de Montreal) in speaker-independent connected digit recognition, using hidden Markov models (HMMs) trained with maximum mutual information estimation in conjunction with connectionist models. All experiments were done on the complete adult portion of the 10 kHz speaker-independent TI/NIST connected digit database. The baseline system, using discrete HMMs and maximum likelihood estimation, has a 98.6% word recognition rate and a 96.1% string recognition rate. The authors describe techniques that greatly improved on the baseline recognition rates. A 99.3% word recognition rate and a 98.0% string recognition rate were obtained with a single model per unit using discrete HMMs and recurrent neural networks. Using semi-continuous HMMs with two models per digit (one for male and one for female speakers), a 99.5% word recognition rate and a 98.4% string recognition rate were achieved.
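
For orientation, the two training criteria named in the abstract can be contrasted in their usual textbook form. This is a sketch only: the symbols (O_r, w_r, lambda, R) are assumed for illustration and are not taken from the paper, whose exact formulation may differ.

```latex
% Standard textbook forms; all symbols are illustrative assumptions,
% not the authors' notation.
% O_r     : observation sequence of training utterance r (r = 1, ..., R)
% w_r     : correct transcription of utterance r
% \lambda : HMM parameters; P(w) : prior probability of transcription w
\begin{align}
F_{\mathrm{ML}}(\lambda)  &= \sum_{r=1}^{R} \log p_\lambda(O_r \mid w_r) \\
F_{\mathrm{MMI}}(\lambda) &= \sum_{r=1}^{R} \log
  \frac{p_\lambda(O_r \mid w_r)\, P(w_r)}
       {\sum_{w} p_\lambda(O_r \mid w)\, P(w)}
\end{align}
```

In this form, maximum likelihood raises only the likelihood of the correct transcription, whereas maximum mutual information also lowers the likelihood of competing transcriptions through the denominator.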
