A review of large-vocabulary continuous-speech recognition
- 1 September 1996
- journal article
- review article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Signal Processing Magazine
- Vol. 13 (5), 45
- https://doi.org/10.1109/79.536824
Abstract
Considerable progress has been made in speech-recognition technology over the last few years and nowhere has this progress been more evident than in the area of large-vocabulary recognition (LVR). Current laboratory systems are capable of transcribing continuous speech from any speaker with average word-error rates between 5% and 10%. If speaker adaptation is allowed, then after 2 or 3 minutes of speech, the error rate will drop well below 5% for most speakers. LVR systems had been limited to dictation applications since the systems were speaker dependent and required words to be spoken with a short pause between them. However, the capability to recognize natural continuous-speech input from any speaker opens up many more applications. As a result, LVR technology appears to be on the brink of widespread deployment across a range of information technology (IT) systems. This article discusses the principles and architecture of current LVR systems and identifies the key issues affecting their future deployment. To illustrate the various points raised, the Cambridge University HTK system is described. This system is a modern design that gives state-of-the-art performance, and it is typical of the current generation of recognition systems.