Iterative normalization for speaker-adaptive training in continuous speech recognition
- 13 January 2003
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
Abstract
The authors present several techniques that improve an algorithm, presented last year, for speaker-adaptive training in continuous speech recognition. The previous method uses a transformation matrix to modify the hidden Markov model (HMM) parameters of a preselected prototype speaker so that they model a target speaker. To estimate the transformation matrix, it uses dynamic time warping to align a set of utterances from the target speaker with the same utterances spoken by the prototype speaker. The authors focus on improving two aspects of the previous method: the modeling of the spectral differences between the two speakers, and the accuracy of the alignment. To improve the modeling of the spectral differences, they implemented a phoneme-dependent mapping procedure that transforms the prototype HMMs into the estimated target HMMs using a set of phoneme-dependent matrices. To improve the alignment, they developed silence modeling, a linear duration normalization, and an iterative normalization procedure. They tested the new methods on the standard DARPA database with a grammar of perplexity 60. The new methods reduce word errors by 30% compared with the previous algorithm.
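The abstract does not specify the acoustic features, the frame distance, or the form of the transform applied during alignment, so the following is only a minimal sketch of how the linear duration normalization and the iterative normalization could fit together around dynamic time warping. It assumes frames are short real-valued feature vectors, a squared Euclidean frame distance, nearest-frame linear resampling, and a per-dimension affine map standing in for the full transformation matrix (a deliberate simplification). Silence modeling is not shown, and every function name is hypothetical. Each pass transforms the prototype frames with the current estimate, re-runs dynamic time warping against the target utterance, and refits the transform on the new alignment.

```python
from typing import List, Sequence, Tuple

Frame = Tuple[float, ...]


def linear_duration_normalize(frames: Sequence[Frame], target_len: int) -> List[Frame]:
    """Hypothetical helper: linearly resample frames to target_len (nearest frame)."""
    n = len(frames)
    if target_len == 1 or n == 1:
        return [frames[0]] * target_len
    step = (n - 1) / (target_len - 1)
    return [frames[int(i * step + 0.5)] for i in range(target_len)]


def dtw_path(a: Sequence[Frame], b: Sequence[Frame]) -> List[Tuple[int, int]]:
    """Minimum-cost DTW alignment path using squared Euclidean frame distance."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1]))
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the final cell, always stepping to the cheapest predecessor.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    return path[::-1]


def fit_affine(pairs: List[Tuple[Frame, Frame]], dim: int) -> Tuple[List[float], List[float]]:
    """Least-squares fit of target ~= scale * proto + offset, one dimension at a time."""
    scale, offset = [], []
    for k in range(dim):
        xs = [p[k] for p, _ in pairs]
        ys = [t[k] for _, t in pairs]
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        s = sxy / sxx if sxx > 0 else 1.0
        scale.append(s)
        offset.append(my - s * mx)
    return scale, offset


def iterative_normalization(proto: Sequence[Frame], target: Sequence[Frame], iterations: int = 3):
    """Alternate DTW alignment and transform re-estimation (hypothetical sketch)."""
    proto = linear_duration_normalize(proto, len(target))
    dim = len(target[0])
    scale, offset = [1.0] * dim, [0.0] * dim  # start from the identity map
    path: List[Tuple[int, int]] = []
    for _ in range(iterations):
        # Align the *transformed* prototype, then refit on the original prototype frames.
        mapped = [tuple(s * x + o for s, x, o in zip(scale, f, offset)) for f in proto]
        path = dtw_path(mapped, target)
        scale, offset = fit_affine([(proto[i], target[j]) for i, j in path], dim)
    return scale, offset, path
```

Refitting on the original prototype frames, rather than on the already-mapped ones, keeps each pass's estimate a direct prototype-to-target transform; only the alignment it is fitted on changes from pass to pass.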
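The phoneme-dependent mapping can be illustrated under one further assumption that the abstract does not confirm: that the HMMs use discrete output distributions over a vector-quantization codebook, so that a transformation matrix can hold conditional probabilities of a target codeword given a prototype codeword. In this minimal sketch, aligned (prototype codeword, target codeword) pairs are labeled with the phoneme of the prototype frame. Co-occurrences are counted separately per phoneme and row-normalized into one matrix T_p per phoneme p, and a prototype output distribution b belonging to phoneme p is mapped as b'(k) = sum over j of b(j) * T_p[j][k]. The function names, the input format, and the additive count floor used for smoothing are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Matrix = List[List[float]]


def estimate_phoneme_matrices(
    aligned: Iterable[Tuple[str, int, int]], codebook_size: int, floor: float = 1e-3
) -> Dict[str, Matrix]:
    """Estimate P(target codeword | prototype codeword) separately for each phoneme.

    aligned holds hypothetical (phoneme, prototype_code, target_code) triples taken
    from a DTW alignment; floor is added to every count so unseen pairs keep mass.
    """
    if floor <= 0:
        raise ValueError("floor must be positive so every row can be normalized")
    counts: Dict[str, Matrix] = defaultdict(
        lambda: [[0.0] * codebook_size for _ in range(codebook_size)]
    )
    for phone, j, k in aligned:
        counts[phone][j][k] += 1.0
    matrices: Dict[str, Matrix] = {}
    for phone, table in counts.items():
        rows = []
        for row in table:
            smoothed = [c + floor for c in row]
            total = sum(smoothed)
            rows.append([c / total for c in smoothed])  # each row sums to 1
        matrices[phone] = rows
    return matrices


def map_output_distribution(b_proto: List[float], matrix: Matrix) -> List[float]:
    """Map a prototype discrete output distribution: b'(k) = sum_j b(j) * T[j][k]."""
    size = len(matrix[0])
    return [sum(b_proto[j] * matrix[j][k] for j in range(len(b_proto))) for k in range(size)]


if __name__ == "__main__":
    aligned = [("aa", 0, 1), ("aa", 0, 1), ("aa", 1, 1), ("iy", 0, 0)]
    T = estimate_phoneme_matrices(aligned, codebook_size=2, floor=0.5)
    print(map_output_distribution([0.5, 0.5], T["aa"]))  # roughly [0.2083, 0.7917]
```

Because every row of T_p is a probability distribution, the mapped output distribution still sums to 1. Keeping one matrix per phoneme lets the same prototype codeword map differently depending on the phonetic context in which it was aligned, which is the kind of distinction a single global matrix cannot make.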
References:
- Rapid speaker adaptation using a probabilistic spectral mapping. Institute of Electrical and Electronics Engineers (IEEE), 2005.
- The DARPA 1000-word resource management database for continuous speech recognition. Institute of Electrical and Electronics Engineers (IEEE), 2003.
- Improved speaker adaption using text dependent spectral mappings. Institute of Electrical and Electronics Engineers (IEEE), 2003.
- Continuous speech recognition results of the BYBLOS system on the DARPA 1000-word resource management database. Institute of Electrical and Electronics Engineers (IEEE), 2003.
- Dynamic programming algorithm optimization for spoken word recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1978.