A review of large-vocabulary continuous-speech recognition

1 September 1996

journal article
review article
Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Signal Processing Magazine

Vol. 13 (5), 45
https://doi.org/10.1109/79.536824

Abstract

Considerable progress has been made in speech-recognition technology over the last few years and nowhere has this progress been more evident than in the area of large-vocabulary recognition (LVR). Current laboratory systems are capable of transcribing continuous speech from any speaker with average word-error rates between 5% and 10%. If speaker adaptation is allowed, then after 2 or 3 minutes of speech, the error rate will drop well below 5% for most speakers. LVR systems had been limited to dictation applications since the systems were speaker dependent and required words to be spoken with a short pause between them. However, the capability to recognize natural continuous-speech input from any speaker opens up many more applications. As a result, LVR technology appears to be on the brink of widespread deployment across a range of information technology (IT) systems. This article discusses the principles and architecture of current LVR systems and identifies the key issues affecting their future deployment. To illustrate the various points raised, the Cambridge University HTK system is described. This system is a modern design that gives state-of-the-art performance, and it is typical of the current generation of recognition systems.
