Complexity reduction in a large vocabulary speech recognizer

Abstract
The authors describe in detail the implementation of a large-vocabulary, speaker-independent, continuous speech recognizer used as a tool for developing recognition algorithms based on hidden Markov models (HMMs) and Viterbi decoding. The complexity of HMM recognizers increases greatly when detailed context-dependent units are introduced to represent interword coarticulation. A vectorized representation of the data structures involved in the decoding process, together with compilation of the connection information among temporally consecutive words and an efficient implementation of beam search pruning, has sped up the algorithm by about one order of magnitude. A guided search can be used during a tuning phase to obtain a speedup of more than a factor of three. An average recognition time of about 25 s per sentence, although far from real time, makes it possible to run a series of training experiments and to tune the recognition system parameters in order to obtain high word accuracy on complex recognition tasks such as the DARPA resource management task.
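
As an illustration of the beam search pruning mentioned above, the following is a minimal sketch of Viterbi decoding over a small HMM in which, at every frame, states scoring more than a fixed margin below the frame's best state are deactivated. It is not the authors' implementation: the function names (viterbi_beam, _prune), the flat per-state score arrays, and the toy model in the usage note are all assumptions made for illustration, and the sketch does not cover context-dependent units, word-connection compilation, or guided search.

```python
NEG_INF = float("-inf")


def _prune(scores, beam):
    """Deactivate states more than `beam` below the best score (in place).

    Returns the list of state indices that stay active.
    """
    best = max(scores)
    active = []
    for s, value in enumerate(scores):
        if value >= best - beam:
            active.append(s)
        else:
            scores[s] = NEG_INF  # pruned states can no longer be extended
    return active


def viterbi_beam(log_init, log_trans, log_emit, beam):
    """Viterbi decoding with beam pruning (hypothetical helper).

    log_init[s]      log probability of starting in state s
    log_trans[p][s]  log probability of the transition p -> s
    log_emit[t][s]   log likelihood of frame t in state s
    beam             pruning margin in the log domain

    Returns (best_path, best_log_score, active_counts), where
    active_counts[t] is the number of states kept at frame t.
    """
    n_states = len(log_init)
    scores = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    active = _prune(scores, beam)
    active_counts = [len(active)]
    backptrs = []

    for frame in log_emit[1:]:
        new_scores = [NEG_INF] * n_states
        ptr = [-1] * n_states
        # Only active predecessors are expanded; this is where pruning
        # saves work compared with a full Viterbi update.
        for p in active:
            for s in range(n_states):
                cand = scores[p] + log_trans[p][s]
                if cand > new_scores[s]:
                    new_scores[s] = cand
                    ptr[s] = p
        scores = [new_scores[s] + frame[s] for s in range(n_states)]
        backptrs.append(ptr)
        active = _prune(scores, beam)
        active_counts.append(len(active))

    # Backtrace from the best final state.
    last = max(range(n_states), key=lambda s: scores[s])
    best_score = scores[last]
    path = [last]
    for ptr in reversed(backptrs):
        last = ptr[last]
        path.append(last)
    path.reverse()
    return path, best_score, active_counts
```

With an infinite beam the function reduces to ordinary Viterbi decoding, so comparing the returned path and active_counts for a wide and a narrow beam gives a rough view of the trade-off between search effort and the risk of pruning the best path. A narrower beam expands fewer predecessors per frame but may discard the globally best hypothesis.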
