An LVQ based reference model for speaker-adaptive speech recognition

Abstract
A novel type of hierarchical phoneme model for speaker adaptation, based on both hidden Markov models (HMM) and learning vector quantization (LVQ) networks, is presented. Low-level tied LVQ phoneme models are trained both speaker-dependently and speaker-independently, yielding a pool of speaker-biased phoneme models which can be mixed into high-level speaker-adaptive phoneme models. Rapid speaker adaptation is performed by finding an optimal mixture of these models at recognition time, given only a small amount of speech data; subsequently, the models are fine-tuned to the new speaker's voice by further parameter reestimation. In preliminary experiments on a continuous speech task using 40 context-free phoneme models at task perplexity 111, the authors achieved 82% word accuracy in speaker-dependent recognition and 73% in speaker-adaptive mode.
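The codebook training behind such LVQ networks follows Kohonen's LVQ1 rule: the prototype nearest to a training vector is pulled toward it when their class labels agree and pushed away when they disagree. As a minimal sketch (not the authors' exact training procedure; all function and variable names here are illustrative):

```python
def lvq1_train(samples, labels, prototypes, proto_labels, lr=0.1, epochs=20):
    """LVQ1: for each sample, move the nearest prototype toward the
    sample if the class labels match, away from it otherwise."""
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            # winner = prototype with smallest squared Euclidean distance
            i = min(range(len(prototypes)),
                    key=lambda k: sum((w - a) ** 2
                                      for w, a in zip(prototypes[k], x)))
            sign = 1.0 if proto_labels[i] == y else -1.0
            prototypes[i] = [w + sign * lr * (a - w)
                             for w, a in zip(prototypes[i], x)]
    return prototypes

# Toy 1-D example: two classes, one prototype per class.
protos = lvq1_train(samples=[[0.0], [0.1], [0.9], [1.0]],
                    labels=[0, 0, 1, 1],
                    prototypes=[[0.4], [0.6]],
                    proto_labels=[0, 1])
```

After training, each prototype settles near the cluster of its own class, which is the discriminative placement of codebook vectors that distinguishes LVQ from plain vector quantization.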
