Phoneme HMMs constrained by frame correlations

Abstract
Phoneme HMMs (hidden Markov models) that use correlations between two frames are proposed. The proposed technique constrains the output probability distributions of speaker-independent HMMs so that they suit the input speaker. Speaker-dependent and speaker-independent BC (bigram-constrained)-HMMs are generated from conventional speaker-independent HMMs by combining them with either a VQ (vector quantization)-code bigram (discrete and tied-mixture cases) or a conditional Gaussian density function (continuous case). The new models were evaluated on 23-phoneme recognition in continuous speech. Among the speaker-dependent BC-HMMs, which use a speaker-dependent bigram created from 50 additional sentences by the test speaker, the best recognition accuracy, 74.8%, was obtained by the tied-mixture type. Among the speaker-independent BC-HMMs, the best recognition accuracy, 67.5%, was obtained by the continuous type.
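
As a rough illustration of the discrete case only, the sketch below shows one plausible way a state's output distribution over VQ codes could be constrained by a VQ-code bigram: multiply the two distributions code by code and renormalize. The abstract does not give the combination rule, so this product-and-renormalize form, the function name `bigram_constrained_output`, and the example numbers are all assumptions made for illustration, not the authors' formulation.

```python
def bigram_constrained_output(b_state, bigram_row):
    """Constrain a discrete HMM output distribution with a VQ-code bigram.

    b_state[k]    : P(code k | state), taken from a speaker-independent HMM
    bigram_row[k] : P(code k | previous code), taken from a VQ-code bigram
    Returns the code-wise product of the two, renormalized to sum to 1.
    (Assumed combination rule; the abstract does not specify one.)
    """
    if len(b_state) != len(bigram_row):
        raise ValueError("both distributions must cover the same VQ codebook")
    products = [b * p for b, p in zip(b_state, bigram_row)]
    total = sum(products)
    if total == 0.0:
        # The product vanishes everywhere: fall back to the unconstrained HMM.
        return list(b_state)
    return [x / total for x in products]


# Hypothetical 3-code codebook: the state alone prefers code 0, but the
# bigram row for the previous frame's code strongly favors code 1.
b_state = [0.5, 0.3, 0.2]
bigram_row = [0.1, 0.6, 0.3]
constrained = bigram_constrained_output(b_state, bigram_row)
```

With these made-up numbers the constrained distribution shifts its mass toward code 1, which is the intended effect of letting the previous frame's code inform the current frame's output probability.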
