An Improved Word-Detection Algorithm for Telephone-Quality Speech Incorporating Both Syntactic and Semantic Constraints

Abstract

Accurate location of the endpoints of spoken words and phrases is important for reliable and robust speech recognition. The endpoint detection problem is fairly straightforward for high-level speech signals in low-level stationary noise environments (e.g., signal-to-noise ratios greater than 30 dB rms). However, the problem becomes considerably more difficult when either the speech signals are too low in level (relative to the background noise) or the background noise becomes highly nonstationary. Such conditions are often encountered in the switched telephone network when use is no longer limited to local dialed-up lines. In such cases the background noise is often highly variable in both level and spectral content because of transmission line characteristics, transients and tones from the line and/or from signal generators, etc. Conventional speech endpoint detectors have been shown to perform very poorly (on the order of 50-percent word detection) under these conditions. In this paper we present an improved word-detection algorithm, which can incorporate both vocabulary (syntactic) and task (semantic) information, leading to word-detection accuracies close to 100 percent for isolated digit detection over a wide range of telephone transmission conditions.
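For orientation, the following is a minimal sketch of a generic energy-threshold endpoint detector of the kind that works adequately at high signal-to-noise ratios. It is not the algorithm presented in this paper, and it makes no use of syntactic or semantic constraints. The function names (`frame_energies_db`, `detect_endpoints`), the frame length, the number of leading noise frames, and the 10 dB margin are all illustrative assumptions, not values taken from the paper.

```python
import math


def frame_energies_db(samples, frame_len):
    """Return the log energy (in dB) of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        mean_sq = sum(x * x for x in frame) / frame_len
        # Small constant avoids log10(0) on all-zero frames.
        energies.append(10.0 * math.log10(mean_sq + 1e-12))
    return energies


def detect_endpoints(samples, frame_len=80, noise_frames=5, margin_db=10.0):
    """Illustrative energy-threshold endpoint detection.

    Assumes (hypothetically) that the first `noise_frames` frames hold
    only stationary background noise. Returns the indices of the first
    and last frames whose energy exceeds the noise floor by more than
    `margin_db`, or None if no frame does.
    """
    energies = frame_energies_db(samples, frame_len)
    if len(energies) <= noise_frames:
        return None
    noise_floor = sum(energies[:noise_frames]) / noise_frames
    threshold = noise_floor + margin_db
    active = [i for i, e in enumerate(energies) if e > threshold]
    if not active:
        return None
    return active[0], active[-1]


# Synthetic example: low-level stationary "noise" around a high-level burst.
quiet = [0.001 if i % 2 == 0 else -0.001 for i in range(800)]  # 10 frames
loud = [0.5 if i % 2 == 0 else -0.5 for i in range(400)]       # 5 frames
endpoints = detect_endpoints(quiet + loud + quiet)
```

Under these assumptions the burst is well separated from the noise floor, so a fixed margin suffices. A tone or transient that exceeds the same margin would be labeled as speech by this scheme, which is the kind of failure under nonstationary noise that the abstract describes.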
