Abstract
A speaker adaptation technique based on the separation of speech spectra variation sources is developed for improving speaker-independent continuous speech recognition. The variation sources include speaker acoustic characteristics, phonologic characteristics, and contextual dependency of allophones. Statistical methods are formulated to normalize speech spectra based on speaker acoustic characteristics and then adapt mixture Gaussian density phone models based on speaker phonologic characteristics. Adaptation experiments using short calibration speech (5 s/speaker) have shown substantial performance improvement over the baseline recognition system. On a TIMIT test set, where the task vocabulary size is 853 and the test set perplexity is 104, the recognition word accuracy has been improved from 86.9% to 90.6% (28.2% error reduction). On a separate test set which contains an additional variation source of recording channel mismatch and with the test set perplexity of 101, the recognition word accuracy has been improved from 65.4% to 85.5% (58.1% error reduction).<>

This publication has 12 references indexed in Scilit: