Fast speaker adaptation combined with soft vector quantization in an HMM speech recognition system

1 January 1992

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

Vol. 1 (ISSN 1520-6149), pp. 461-464
https://doi.org/10.1109/icassp.1992.225872

Abstract

The authors describe a method for combining speaker adaptation by feature vector transformation with semi-continuous hidden Markov modeling (SCHMM). Because the SCHMM system represents the reference speaker's voice by multidimensional Gaussian distributions, it is these distributions, rather than the feature vectors, that must be transformed. The performance of hard-decision vector quantization (HVQ), soft-decision VQ (SVQ), and SCHMM is compared, as is that of the speaker-adaptive and speaker-independent systems. In addition, the influence of dynamic features is investigated. The definition of subword units is optimized, and the SCHMM system is optimized with respect to covariance type (full or diagonal) and codebook size. Model initialization and distribution reestimation during training are introduced. Significant improvements are obtained over previously reported HVQ-based systems: the mean recognition rate under difficult conditions rises from 71.6% to 84.6% (speaker-independent) and from 80.4% to 87.4% (speaker-adaptive).
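
The abstract does not state the form of the feature vector transformation. As a minimal sketch, assume it is affine, y = Ax + b: a Gaussian N(mu, Sigma) then maps to N(A mu + b, A Sigma A^T), so the codebook distributions can be transformed directly instead of the individual feature vectors. The sketch also contrasts a soft decision (normalized codebook likelihoods, equal priors assumed) with a hard decision (index of the best entry). All function names and the two-dimensional toy codebook are hypothetical and are not taken from the paper.

```python
import math

def mat_vec(A, x):
    """Matrix-vector product for lists of lists."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def mat_mul(A, B):
    """Matrix-matrix product for lists of lists."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def transform_gaussian(mean, cov, A, b):
    """Map N(mean, cov) through the assumed affine transform y = A x + b."""
    new_mean = [m + bi for m, bi in zip(mat_vec(A, mean), b)]
    new_cov = mat_mul(mat_mul(A, cov), transpose(A))
    return new_mean, new_cov

def log_gauss_2d(x, mean, cov):
    """Log density of a 2-D Gaussian with a full 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form using the closed-form inverse of a 2x2 matrix.
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return -0.5 * (quad + math.log(det) + 2.0 * math.log(2.0 * math.pi))

def soft_vq(x, codebook):
    """Soft decision: normalized likelihood of each codebook Gaussian."""
    logs = [log_gauss_2d(x, mean, cov) for mean, cov in codebook]
    top = max(logs)  # subtract the maximum for numerical stability
    weights = [math.exp(value - top) for value in logs]
    total = sum(weights)
    return [w / total for w in weights]

def hard_vq(x, codebook):
    """Hard decision: index of the single most likely codebook entry."""
    weights = soft_vq(x, codebook)
    return max(range(len(weights)), key=weights.__getitem__)

# Toy reference-speaker codebook: two unit-covariance Gaussians.
identity = [[1, 0], [0, 1]]
codebook = [([0, 0], identity), ([2, 0], identity)]

# Hypothetical adaptation transform: stretch the first axis, then shift it.
A = [[2, 0], [0, 1]]
b = [1, 0]
adapted = [transform_gaussian(mean, cov, A, b) for mean, cov in codebook]
```

With the adapted codebook, `soft_vq(x, adapted)` yields the weight vector a soft-decision or semi-continuous system would pass on to the HMM, whereas `hard_vq(x, adapted)` keeps only a single index and discards how close the competing entries were.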
