An investigation on the use of acoustic sub-word units for automatic speech recognition

24 March 2005

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

Vol. 12, 821-824
https://doi.org/10.1109/icassp.1987.1169589

Abstract

An approach to automatic speech recognition is described that attempts to link ideas from pattern recognition, such as dynamic time warping and hidden Markov modeling, with ideas from linguistically motivated approaches. In this approach, the basic sub-word units are defined acoustically, but not necessarily phonetically. An algorithm was developed that automatically decomposes speech into multiple sub-word segments based solely on strict acoustic criteria, without any reference to linguistic content. By repeating this procedure on a large corpus of speech data, we obtained an extensive pool of unlabeled sub-word speech segments. Then, using well-defined clustering techniques, a small set of representative acoustic sub-word units (i.e., an inventory of units) was created. This process is fast, easy to use, and requires no human intervention. The linguistic interpretation of these sub-word units in the context of word decoding is an important issue that must be addressed for them to be useful in a large-vocabulary system. We have not yet addressed this issue; instead, two simple experiments were performed to determine whether these acoustic sub-word units had any potential value for speech recognition. For these experiments we used a connected-digit database from a single female talker. A codebook of 25 acoustic sub-word units was created from about 1600 segments drawn from 100 connected-digit strings. A simple isolated-digit recognition system, designed using the statistics of the codewords in the acoustic sub-word unit codebook, had a recognition accuracy of 100%. In the second experiment, a connected-digit recognition system was built with representative digit templates formed by concatenating the sub-word units in an appropriate manner. This system had a string recognition accuracy of 96%.
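
The abstract does not state which acoustic criterion drives the segmentation or which clustering technique builds the codebook. As a rough illustration only, the sketch below assumes a simple frame-to-frame distance threshold for segmentation and plain k-means for clustering. All function names, the Euclidean distance, the threshold value, and the toy feature vectors are hypothetical and are not taken from the paper.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two feature vectors (assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def segment(frames, threshold):
    """Start a new segment wherever consecutive frames differ by more than
    `threshold`. No linguistic labels are used, only the acoustic distance."""
    segments = [[frames[0]]]
    for prev, cur in zip(frames, frames[1:]):
        if euclidean(prev, cur) > threshold:
            segments.append([])
        segments[-1].append(cur)
    return segments


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means (an assumption; the paper's clustering may differ)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: euclidean(p, centroids[i]))
            clusters[idx].append(p)
        # Keep the old centroid if a cluster received no points.
        centroids = [mean_vector(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids


def quantize(segment_means, codebook):
    """Map each segment to the index of its nearest codebook entry."""
    return [min(range(len(codebook)), key=lambda i: euclidean(m, codebook[i]))
            for m in segment_means]


# Toy 2-D "frames": two similar quiet regions around one distinct region.
frames = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [0.0, 0.1], [0.1, 0.1]]
segments = segment(frames, threshold=1.0)
segment_means = [mean_vector(s) for s in segments]
codebook = kmeans(segment_means, k=2)
labels = quantize(segment_means, codebook)
print(len(segments), labels)
```

In the setting the abstract reports, the pool would hold about 1600 segments and the codebook 25 units, which in this sketch would correspond to calling `kmeans(segment_means, k=25)` on real acoustic features. A digit could then be represented as a sequence of codebook indices.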
