Speaker-independent vowel recognition: spectrograms versus cochleagrams
- Conference paper, ICASSP 1990
- Published by the Institute of Electrical and Electronics Engineers (IEEE)
- Vol. 16 (ISSN 1520-6149), pp. 533-536
- https://doi.org/10.1109/icassp.1990.115767
Abstract
The ability of multilayer perceptrons (MLPs) trained with backpropagation to classify vowels excised from natural continuous speech is examined. Two spectral representations are compared: spectrograms and cochleagrams. The features used to train the MLPs are either discrete Fourier transform (DFT) or cochleagram coefficients, taken from a single frame in the middle of the vowel or from each third of the vowel. The effects of adding estimates of pitch, duration, and the relative amplitude of the vowel were also investigated. The experiments show that with coefficients alone, the cochleagram yields better classification performance than the spectrogram under all experimental conditions. With the three additional features, however, the results are comparable. Perceptual experiments with trained human listeners on the same data revealed that MLPs perform much better than humans on vowels excised from context.
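The feature construction described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: spectral coefficients are averaged over each third of the vowel segment and optionally concatenated with pitch, duration, and relative-amplitude estimates. All function names and dimensions are hypothetical.

```python
def third_features(frames):
    """Average the coefficient vectors over each third of the vowel.

    frames: list of per-frame coefficient lists (DFT or cochleagram
    channels), all of the same dimension.
    """
    n = len(frames)
    thirds = [frames[: n // 3], frames[n // 3 : 2 * n // 3], frames[2 * n // 3 :]]
    feats = []
    for chunk in thirds:
        dim = len(chunk[0])
        # Mean of each coefficient across the frames in this third
        feats.extend(sum(f[i] for f in chunk) / len(chunk) for i in range(dim))
    return feats

def vowel_features(frames, pitch=None, duration=None, rel_amplitude=None):
    """Spectral features per third, plus optional prosodic estimates."""
    feats = third_features(frames)
    for extra in (pitch, duration, rel_amplitude):
        if extra is not None:
            feats.append(extra)
    return feats

# Example: 9 frames of 4 spectral coefficients each (toy values)
frames = [[float(t + c) for c in range(4)] for t in range(9)]
x = vowel_features(frames, pitch=120.0, duration=0.08, rel_amplitude=0.5)
print(len(x))  # 4 coeffs x 3 thirds + 3 extra features = 15
```

The resulting vector would then be fed to an MLP classifier; the single-frame condition in the abstract corresponds to using just the middle frame's coefficients instead of the three third-averages.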