Speaker-independent vowel recognition: spectrograms versus cochleagrams

Abstract
The ability of multilayer perceptrons (MLPs) trained with backpropagation to classify vowels excised from natural continuous speech is examined. Two spectral representations are compared: spectrograms and cochleagrams. The features used to train the MLPs are either discrete Fourier transform (DFT) or cochleagram coefficients, taken from a single frame in the middle of the vowel or from each third of the vowel. The effects of adding estimates of pitch, duration, and the relative amplitude of the vowel are also investigated. The experiments show that with spectral coefficients alone, the cochleagram yields better classification performance than the spectrogram under all experimental conditions. With the three additional features, however, the results are comparable. Perceptual experiments with trained human listeners on the same data reveal that the MLPs perform much better than humans on vowels excised from context.
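The abstract does not give the network topology or feature dimensions, so the following is only a minimal sketch of the described setup: a single-hidden-layer MLP trained with plain backpropagation on concatenated per-third spectral coefficients plus optional pitch, duration, and amplitude estimates. All sizes (N_COEFFS, N_HIDDEN, N_VOWELS, learning rate) and the synthetic stand-in data are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Illustrative dimensions (assumptions, not the paper's configuration).
N_COEFFS = 64          # spectral coefficients per frame (DFT or cochleagram)
N_FRAMES = 3           # one frame per third of the vowel
N_VOWELS = 10          # number of vowel classes (assumed)
N_HIDDEN = 32          # hidden-layer size (assumed)

rng = np.random.default_rng(0)

def make_features(coeffs, pitch=None, duration=None, amplitude=None):
    """Concatenate per-third spectral coefficients with optional
    pitch/duration/relative-amplitude estimates, as the abstract describes."""
    x = coeffs.reshape(-1)  # (N_FRAMES * N_COEFFS,)
    extras = [v for v in (pitch, duration, amplitude) if v is not None]
    return np.concatenate([x, np.asarray(extras)]) if extras else x

class MLP:
    """One hidden layer, trained with plain backpropagation."""
    def __init__(self, n_in, n_hidden, n_out, lr=0.1):
        self.W1 = rng.normal(0, 0.1, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0, 0.1, (n_hidden, n_out))
        self.b2 = np.zeros(n_out)
        self.lr = lr

    def forward(self, X):
        self.h = np.tanh(X @ self.W1 + self.b1)
        logits = self.h @ self.W2 + self.b2
        e = np.exp(logits - logits.max(axis=1, keepdims=True))
        self.p = e / e.sum(axis=1, keepdims=True)  # softmax posteriors
        return self.p

    def backward(self, X, y_onehot):
        n = X.shape[0]
        d2 = (self.p - y_onehot) / n                # softmax + cross-entropy gradient
        dW2 = self.h.T @ d2
        d1 = (d2 @ self.W2.T) * (1 - self.h ** 2)   # tanh derivative
        dW1 = X.T @ d1
        self.W2 -= self.lr * dW2; self.b2 -= self.lr * d2.sum(0)
        self.W1 -= self.lr * dW1; self.b1 -= self.lr * d1.sum(0)

# Synthetic stand-in data; real inputs would be spectrogram or cochleagram
# coefficients computed from hand-excised vowel tokens.
n_in = N_FRAMES * N_COEFFS + 3
X = np.stack([
    make_features(rng.normal(size=(N_FRAMES, N_COEFFS)),
                  pitch=rng.uniform(80, 300),       # Hz (assumed range)
                  duration=rng.uniform(0.05, 0.3),  # seconds (assumed range)
                  amplitude=rng.uniform(0, 1))      # relative amplitude
    for _ in range(200)
])
y = rng.integers(0, N_VOWELS, 200)
Y = np.eye(N_VOWELS)[y]

net = MLP(n_in, N_HIDDEN, N_VOWELS)
for epoch in range(500):
    net.forward(X)
    net.backward(X, Y)

pred = net.forward(X).argmax(axis=1)
print("training accuracy:", (pred == y).mean())
```

Swapping the spectral front end (DFT coefficients versus cochleagram channels) changes only the contents of `coeffs`; the classifier itself is unchanged, which is what makes the two representations directly comparable.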
