The 1994 HTK large vocabulary speech recognition system

Abstract
This paper describes recent work on the HTK large vocabulary speech recognition system. The system uses tied-state cross-word context-dependent mixture Gaussian HMMs and a dynamic network decoder that can operate in a single pass. In the last year the decoder has been extended to produce word lattices to allow flexible and efficient system development, as well as multi-pass operation for use with computationally expensive acoustic and/or language models. The system vocabulary can now be up to 65 k words, the final acoustic models have been extended to be sensitive to more acoustic context (quinphones), a 4-gram language model has been used and unsupervised incremental speaker adaptation incorporated. The resulting system gave the lowest error rates on both the H1-P0 and H1-C1 hub tasks in the November 1994 ARPA CSR evaluation.

This publication has 5 references indexed in Scilit: