A comparison of signal processing front ends for automatic word recognition

1 July 1995

journal article
Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Speech and Audio Processing

Vol. 3 (4), 286-293
https://doi.org/10.1109/89.397093

Abstract

This paper compares the word error rate of a speech recognizer using several signal processing front ends based on auditory properties. Front ends were compared with a control mel filter bank (MFB) based cepstral front end in clean speech and with speech degraded by noise and spectral variability, using the TI-105 isolated word database. MFB recognition error rates ranged from 0.5 to 26.9% in noise, depending on the SNR, and auditory models provided error rates as much as four percentage points lower. With speech degraded by linear filtering, MFB error rates ranged from 0.5 to 3.1%, and the reduction in error rates provided by auditory models was less than 0.5 percentage points. Some earlier studies that demonstrated considerably more improvement with auditory models used linear predictive coding (LPC) based control front ends. This paper shows that MFB cepstra significantly outperform LPC cepstra under noisy conditions. Techniques using an optimal linear combination of features for data reduction were also evaluated.
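The abstract reports error rates as a function of SNR but does not say how the noisy test material was produced or scored. As a rough illustration only, the sketch below shows one common way to mix noise into a waveform at a target SNR and to score isolated-word recognition as a percentage error rate. The function names (`mix_at_snr`, `measured_snr_db`, `word_error_rate`) and the use of average power over the whole utterance are assumptions made for this example, not details taken from the paper.

```python
import math


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that speech power / noise power matches `snr_db`, then add it.

    Assumption for this sketch: power is averaged over the whole utterance.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_noise == 0.0:
        raise ValueError("noise must have non-zero power")
    # Target noise power is p_speech / 10^(snr_db / 10); the gain is an amplitude factor.
    gain = math.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, noisy):
    """Recover the SNR in dB from the clean signal and the mixture."""
    p_speech = sum(s * s for s in speech)
    p_residual = sum((y - s) ** 2 for s, y in zip(speech, noisy))
    return 10.0 * math.log10(p_speech / p_residual)


def word_error_rate(reference, hypothesis):
    """Percentage of isolated-word tokens that were recognized incorrectly."""
    if len(reference) != len(hypothesis):
        raise ValueError("one hypothesis is expected per reference token")
    errors = sum(r != h for r, h in zip(reference, hypothesis))
    return 100.0 * errors / len(reference)
```

Because each isolated-word token yields exactly one hypothesis, the error rate here is a simple mismatch count and needs no alignment step. A difference between two such rates is a difference in percentage points, which is the unit the abstract uses when comparing front ends.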
