A maximum-likelihood approach to stochastic matching for robust speech recognition
- 1 May 1996
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Speech and Audio Processing
- Vol. 4 (3), pp. 190-202
- https://doi.org/10.1109/89.496215
Abstract
Presents a maximum-likelihood (ML) stochastic matching approach to decrease the acoustic mismatch between a test utterance and a given set of speech models, so as to reduce the recognition performance degradation caused by distortions in the test utterance and/or the model set. We assume that the speech signal is modeled by a set of subword hidden Markov models (HMMs) Λ_X. The mismatch between the observed test utterance Y and the models Λ_X can be reduced in two ways: 1) by an inverse distortion function F_ν(·) that maps Y into an utterance X that matches the models Λ_X better, and 2) by a model transformation function G_η(·) that maps Λ_X to a transformed model Λ_Y that matches the utterance Y better. We assume the functional form of the transformation F_ν(·) or G_η(·) and estimate the parameters ν or η in an ML manner using the expectation-maximization (EM) algorithm. The choice of the form of F_ν(·) or G_η(·) is based on prior knowledge of the nature of the acoustic mismatch. The stochastic matching algorithm operates only on the given test utterance and the given set of speech models; no additional training data is required to estimate the mismatch prior to actual testing. Experimental results are presented to study the properties of the proposed algorithm and to verify the efficacy of the approach in improving the performance of an HMM-based continuous speech recognition system in the presence of mismatch due to different transducers and transmission channels.
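To make the EM estimation of ν concrete, here is a minimal sketch of the simplest feature-space case, under loudly stated assumptions: F_ν is taken to be an additive bias removal x_t = y_t − b (so ν = b), and the HMMs Λ_X are replaced by a single diagonal-covariance Gaussian mixture (state transitions ignored). Under those assumptions the M-step has a closed form, b_d = Σ_t Σ_j γ_t(j)(y_{t,d} − μ_{j,d})/σ²_{j,d} divided by Σ_t Σ_j γ_t(j)/σ²_{j,d}, where γ_t(j) is the posterior of component j for frame t. The function names (`estimate_bias`, `simulate_frames`, `log_gauss_diag`) are hypothetical and the data are synthetic; this is an illustration of the idea, not the paper's implementation.

```python
import math
import random


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def estimate_bias(frames, weights, means, variances, n_iter=10):
    """EM/ML estimate of an additive bias b, assuming x_t = y_t - b.

    Assumption: the clean-speech model is a diagonal-covariance Gaussian
    mixture standing in for HMM state output densities (no transitions).
    """
    dim = len(frames[0])
    bias = [0.0] * dim  # initial guess: no mismatch
    for _ in range(n_iter):
        num = [0.0] * dim
        den = [0.0] * dim
        for y in frames:
            x = [yi - bi for yi, bi in zip(y, bias)]  # apply F_nu with current b
            # E-step: component posteriors for the compensated frame.
            logp = [math.log(w) + log_gauss_diag(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            peak = max(logp)
            post = [math.exp(lp - peak) for lp in logp]
            total = sum(post)
            # Accumulate statistics for the closed-form M-step.
            for p, m, v in zip(post, means, variances):
                g = p / total
                for d in range(dim):
                    num[d] += g * (y[d] - m[d]) / v[d]
                    den[d] += g / v[d]
        # M-step: variance-weighted average of (y_t - mu_j).
        bias = [nu / de for nu, de in zip(num, den)]
    return bias


def simulate_frames(n, weights, means, variances, bias, seed=0):
    """Draw n frames from the mixture and add a fixed bias (synthetic mismatch)."""
    rng = random.Random(seed)
    frames = []
    for _ in range(n):
        k = rng.choices(range(len(weights)), weights=weights)[0]
        frames.append([rng.gauss(m, math.sqrt(v)) + b
                       for m, v, b in zip(means[k], variances[k], bias)])
    return frames


if __name__ == "__main__":
    weights = [0.5, 0.5]
    means = [[0.0, 0.0], [5.0, 5.0]]
    variances = [[1.0, 1.0], [1.0, 1.0]]
    frames = simulate_frames(2000, weights, means, variances, bias=[1.5, -0.5])
    print(estimate_bias(frames, weights, means, variances))
```

Because log N(y − b; μ, σ²) equals log N(y; μ + b, σ²), the same estimate of b also serves the model-space view G_η in this simple case: shifting every mean μ_j by b yields a transformed model that matches Y as well as the compensated features match Λ_X.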