Continuous speech recognition based on high plausibility regions

Abstract
The authors propose an approach to phoneme-based continuous speech recognition when a time function of the plausibility of observing each phoneme (spotting result) is given. They introduce a criterion for the best sentence, based on the sum of plausibilities of individual symbols composing the sentence. Based on the idea of making use of high plausibility regions to reduce the computational load while maintaining optimality, the method finds the most plausible sentences relating to the input speech. Two optimization procedures are defined to deal with the following embedded search processes: (1) finding the best path connecting peaks of the plausibility functions of two successive symbols, and (2) finding the best time transition slot index for two given peaks. Experimental results show that the method gives better recognition precision while requiring about 1/20 of the computing time of the traditional DP-based methods. The experimental system obtained a 95% sentence recognition rate on a multispeaker test.
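As an illustration of search process (2), here is a minimal sketch of choosing a transition frame between two given peaks. It is not the authors' procedure, which the abstract does not specify. It assumes each plausibility function is a list of per-frame scores and that the quantity to maximize is the summed plausibility of the frames assigned to each symbol. The function name `best_transition` and its parameters are hypothetical.

```python
from itertools import accumulate


def best_transition(p_a, p_b, peak_a, peak_b):
    """Pick the frame at which symbol a hands over to symbol b.

    Assumption (not from the abstract): frames peak_a..t-1 are scored with
    p_a and frames t..peak_b with p_b, and the best t maximizes the sum.
    Returns (t, score).
    """
    if not 0 <= peak_a < peak_b < min(len(p_a), len(p_b)):
        raise ValueError("need 0 <= peak_a < peak_b inside both functions")

    # prefix_x[k] is the sum of the first k frames starting at peak_a.
    prefix_a = [0.0] + list(accumulate(p_a[peak_a:peak_b + 1]))
    prefix_b = [0.0] + list(accumulate(p_b[peak_a:peak_b + 1]))
    total_b = prefix_b[-1]

    best_t, best_score = None, float("-inf")
    # Each symbol keeps at least its own peak frame.
    for t in range(peak_a + 1, peak_b + 1):
        k = t - peak_a
        score = prefix_a[k] + (total_b - prefix_b[k])
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score


if __name__ == "__main__":
    # Toy plausibility curves, invented for illustration only.
    p_a = [0.1, 0.9, 0.8, 0.3, 0.1, 0.0]
    p_b = [0.0, 0.1, 0.2, 0.7, 0.9, 0.2]
    print(best_transition(p_a, p_b, peak_a=1, peak_b=4))
```

With the toy curves above, the boundary falls at frame 3, where the second curve overtakes the first. Restricting the search to the frames between two peaks is one way to read how focusing on high plausibility regions could cut the work relative to a full DP pass over every frame.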
