An investigation on the use of acoustic sub-word units for automatic speech recognition
- 24 March 2005
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- Vol. 12, 821-824
- https://doi.org/10.1109/icassp.1987.1169589
Abstract
An approach to automatic speech recognition is described which attempts to link ideas from pattern recognition, such as dynamic time warping and hidden Markov modeling, with ideas from linguistically motivated approaches. In this approach, the basic sub-word units are defined acoustically, but not necessarily phonetically. An algorithm was developed which automatically decomposes speech into multiple sub-word segments, based solely upon strict acoustic criteria, without any reference to linguistic content. By repeating this procedure on a large corpus of speech data we obtained an extensive pool of unlabeled sub-word speech segments. Then, using well-defined clustering techniques, a small set of representative acoustic sub-word units (i.e., an inventory of units) was created. This process is fast, easy to use, and requires no human intervention. The linguistic interpretation of these sub-word units in the context of word decoding is an important issue which must be addressed for them to be useful in a large-vocabulary system. We have not yet addressed this issue; instead, two simple experiments were performed to determine whether these acoustic sub-word units have any potential value for speech recognition. For these experiments we used a connected digits database from a single female talker. A codebook of 25 acoustic sub-word units was created from about 1600 segments drawn from 100 connected digit strings. A simple isolated digit recognition system, designed using the statistics of the codewords in the acoustic sub-word unit codebook, had a recognition accuracy of 100%. In another experiment, a connected digit recognition system was built with representative digit templates created by concatenating the sub-word units in an appropriate manner. This system had a string recognition accuracy of 96%.
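The segment-then-cluster pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not state the acoustic segmentation criterion or the clustering algorithm, so the sketch assumes a simple threshold on frame-to-frame change and plain k-means as stand-ins. All function names (`segment`, `kmeans`, `quantize`), the threshold, and the toy feature vectors are hypothetical and are not taken from the paper.

```python
import random

def distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def segment(frames, threshold):
    """Assumed criterion: start a new segment wherever two consecutive
    frames differ by more than `threshold`. No linguistic labels are used."""
    segments, current = [], [frames[0]]
    for prev, cur in zip(frames, frames[1:]):
        if distance(prev, cur) > threshold:
            segments.append(current)
            current = []
        current.append(cur)
    segments.append(current)
    return segments

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means (a stand-in for the paper's clustering step).
    Returns k centroids, which play the role of the unit codebook."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

def quantize(segments, codebook):
    """Map each segment to the index of its nearest codebook unit."""
    return [min(range(len(codebook)),
                key=lambda i: distance(mean(s), codebook[i]))
            for s in segments]

# Toy two-dimensional "spectral" frames; real input would be feature
# vectors computed from speech.
frames = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [0.0, 0.1]]
segments = segment(frames, threshold=1.0)
codebook = kmeans([mean(s) for s in segments], k=2)
labels = quantize(segments, codebook)
```

In this toy run, the first and last segments map to the same codebook unit, while the middle segment maps to the other one. A word template in the spirit of the second experiment would then be a sequence of such unit indices.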