Phonemic HMM constrained by statistical VQ-code transition

Abstract
A hidden Markov modeling technique that uses statistical modeling of vector quantization (VQ) code transitions is proposed. A bigram-constrained HMM is obtained by combining a VQ-code bigram with the conventional speaker-independent HMM. The proposed model reduces the overlap of feature distributions between different phonemes by restricting the local VQ-code transitions. The output probabilities in the model are conditioned on the VQ code of the previous frame; therefore, the output probability distribution changes depending on the previous frame even within the same state. A speaker-dependent bigram-constrained HMM is obtained using a VQ-code bigram calculated from utterances of the input speaker, and a speaker-independent bigram-constrained HMM is obtained using a VQ-code bigram calculated from utterances of many speakers. The model was evaluated in an 18-Japanese-consonant recognition experiment using 5240 words. The speaker-independent bigram-constrained HMM achieved an average recognition accuracy of 76.3%, which is 5.5 points higher than the conventional speaker-independent HMM.
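The core idea, conditioning a state's output distribution on the previous frame's VQ code, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the codebook size, state count, and the function names (`constrained_output_prob`, `normalize`) are assumptions, and the distributions are random placeholders standing in for trained HMM output probabilities and a bigram estimated from training frames.

```python
import numpy as np

rng = np.random.default_rng(0)
N_STATES = 3  # toy HMM state count (assumption)
N_CODES = 4   # toy VQ codebook size (assumption)

def normalize(p, axis=-1):
    """Scale probabilities so they sum to 1 along the given axis."""
    return p / p.sum(axis=axis, keepdims=True)

# Conventional per-state output distribution over VQ codes:
# b[j, k] = P(code k | state j)  (random placeholder for trained values)
b = normalize(rng.random((N_STATES, N_CODES)))

# VQ-code bigram: bigram[k_prev, k] = P(code k | previous code k_prev)
# (random placeholder; in the paper this is estimated from utterances)
bigram = normalize(rng.random((N_CODES, N_CODES)))

def constrained_output_prob(state, prev_code):
    """Bigram-constrained output distribution: the state's output
    probabilities reweighted by the bigram row of the previous frame's
    VQ code, then renormalized."""
    return normalize(b[state] * bigram[prev_code])

# Even in the same state, the output distribution now depends on the
# previous frame's VQ code:
p0 = constrained_output_prob(state=1, prev_code=0)
p1 = constrained_output_prob(state=1, prev_code=2)
print(p0)
print(p1)
```

Because the bigram suppresses VQ-code transitions that rarely occur locally, codes that overlap between phonemes in the unconstrained model receive different weights depending on context, which is the mechanism the abstract credits for the accuracy gain.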