Sources of degradation of speech recognition in the telephone network
- 17 December 2002
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
Abstract
In this paper we compare speech recognition accuracy for high-quality speech recorded under controlled conditions with speech as it appears over long-distance telephone lines. In addition to comparing recognition accuracy, we use telephone-channel simulation to identify the sources of degradation of speech over telephone lines that have the greatest impact on speech recognition accuracy. We first compare the performance of the CMU SPHINX-I system on the TIMIT and NTIMIT databases (3,8). We found that other factors beyond a mere decrease in bandwidth cause the observed degradation in recognition accuracy, and that the environmental compensation algorithms RASTA (6) and CDCN (1) fail to compensate completely for degradations introduced by the telephone network. In the second part of this paper we attempt to identify the most problematic telephone-channel impairments using a commercial telephone channel simulator and the SPHINX-II system. Of the various effects considered, additive noise and linear filtering appear to have the greatest impact on recognition accuracy. Finally, we examined the performance of three cepstral compensation algorithms in the presence of the most damaging conditions. We found the compensation algorithms to be effective except for the worst 1% of the telephone channels.
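The two impairments the abstract singles out, linear filtering and additive noise, can be illustrated with a minimal sketch. This is not the commercial channel simulator used in the paper: the 8 kHz sampling rate, the three filter taps, the 20 dB signal-to-noise ratio, and all function names are assumptions chosen only for illustration.

```python
import math
import random


def fir_filter(signal, taps):
    """Apply a causal FIR filter (a linear channel) to the signal.

    The output has the same length as the input.
    """
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out


def add_noise(signal, snr_db, rng):
    """Add white Gaussian noise scaled to the requested SNR in dB."""
    power = sum(x * x for x in signal) / len(signal)
    noise_power = power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measure_snr_db(reference, degraded):
    """Estimate the SNR in dB of `degraded` relative to `reference`."""
    sig = sum(x * x for x in reference)
    err = sum((y - x) ** 2 for x, y in zip(reference, degraded))
    return 10 * math.log10(sig / err)


rng = random.Random(0)                      # fixed seed for repeatability
sample_rate = 8000                          # assumed telephone-band rate
clean = [math.sin(2 * math.pi * 440 * n / sample_rate)
         for n in range(sample_rate)]       # one second of a 440 Hz tone
taps = [0.5, 0.3, 0.2]                      # hypothetical channel response

filtered = fir_filter(clean, taps)          # linear filtering
degraded = add_noise(filtered, 20.0, rng)   # additive noise at 20 dB SNR
measured = measure_snr_db(filtered, degraded)
```

Here `measured` should land close to the requested 20 dB. A real speech signal and a measured channel response would replace the tone and the hypothetical taps.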
This publication has 7 references indexed in Scilit:
- NTIMIT: a phonetically balanced, continuous speech, telephone bandwidth speech database. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- The SPHINX-II speech recognition system: an overview. Computer Speech & Language, 1993
- Reduced channel dependence for speech recognition. Published by Association for Computational Linguistics (ACL), 1992
- Phonetic classification on wide-band and telephone quality speech. Published by Association for Computational Linguistics (ACL), 1992
- Efficient joint compensation of speech for the effects of additive noise and linear filtering. Published by Institute of Electrical and Electronics Engineers (IEEE), 1992
- Compensation for the effect of the communication channel in auditory-like analysis of speech (RASTA-PLP). Published by International Speech Communication Association, 1991
- Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1980