Transform representation of the spectra of acoustic speech segments with applications. I. General approach and application to speech recognition

1 April 1993

journal article
research article
Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Speech and Audio Processing

Vol. 1 (2), 180-195
https://doi.org/10.1109/89.222877

Abstract

We present in this series of two papers a new approach for modeling and capturing the time-varying structure of the spectral envelope of speech. In this approach, we use an acoustic subword decomposition and the Karhunen-Loeve transform (KLT) to extract and efficiently represent the highly correlated structure of the spectral envelope. Integration of the KLT with acoustic subword modeling is a novel approach that concisely represents both steady-state and dynamic features of the spectra in a unified framework that very effectively captures acoustic-phonetic patterns. The organization of these two papers is as follows: the first paper, Part I, presents the physiological and perceptual basis for the approach, the frame-based and acoustic-subword-based spectral representation, and applications to speaker-dependent recognition. The performance of the recognition algorithm based on this approach compares favorably to other existing techniques. Part II will present a frequency-domain coding technique by analysis/synthesis. This application of the new method produces good quality speech at low bit rates.
