Sources of degradation of speech recognition in the telephone network

Abstract
In this paper we compare speech recognition accuracy for high-quality speech recorded under controlled conditions with accuracy for speech as it appears over long-distance telephone lines. In addition to comparing recognition accuracy, we use telephone-channel simulation to identify the sources of degradation of speech over telephone lines that have the greatest impact on speech recognition accuracy. We first compare the performance of the CMU SPHINX-I system on the TIMIT and NTIMIT databases (3,8). We find that factors beyond a mere decrease in bandwidth cause the observed degradation in recognition accuracy, and that the environmental compensation algorithms RASTA (6) and CDCN (1) fail to compensate completely for degradations introduced by the telephone network. In the second part of this paper we attempt to identify the most problematic telephone-channel impairments using a commercial telephone-channel simulator and the SPHINX-II system. Of the various effects considered, additive noise and linear filtering appear to have the greatest impact on recognition accuracy. Finally, we examine the performance of three cepstral compensation algorithms in the presence of the most damaging conditions. We find the compensation algorithms to be effective except for the worst 1% of the telephone channels.
