Improving connected letter recognition by lipreading
- 1 January 1993
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- Vol. 1, pp. 557-560
- https://doi.org/10.1109/icassp.1993.319179
Abstract
The authors show how recognition performance in automated speech perception can be significantly improved by additional lipreading, so-called speech-reading. They demonstrate this on an extension of a state-of-the-art speech recognition system, a modular multistage time delay neural network architecture (MS-TDNN). The acoustic and visual speech data are preclassified in two separate front-end phoneme TDNNs and combined into acoustic-visual hypotheses for the dynamic time warping algorithm. The approach is evaluated on a connected word recognition problem, the notoriously difficult letter spelling task. With speech-reading, the error rate could be reduced by up to half relative to purely acoustic recognition.
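The pipeline described in the abstract (two front-end phoneme classifiers whose outputs are merged and then aligned by dynamic time warping) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the weighted-sum fusion rule, the weight value, the toy score matrices, the two-word lexicon, and the function names `fuse_scores`, `alignment_score`, and `recognize`. The alignment is a simplified score-maximizing variant of dynamic time warping, not the MS-TDNN itself.

```python
# Hypothetical sketch: fuse per-frame phoneme scores from an acoustic and a
# visual classifier, then pick the word whose phoneme sequence aligns best.
# The fusion rule and all data below are assumptions, not the paper's method.

def fuse_scores(acoustic, visual, weight=0.7):
    """Weighted sum of two frames-by-phonemes score tables (assumed rule)."""
    return [
        [weight * a + (1.0 - weight) * v for a, v in zip(a_row, v_row)]
        for a_row, v_row in zip(acoustic, visual)
    ]

def alignment_score(frame_scores, phoneme_sequence):
    """Best monotonic alignment of frames to a phoneme sequence.

    Each frame is assigned to one phoneme of the sequence; from one frame to
    the next the path either stays on the same phoneme or advances by one.
    Scores are summed and maximized (higher is better).
    """
    neg_inf = float("-inf")
    n_states = len(phoneme_sequence)
    best = [neg_inf] * n_states
    best[0] = frame_scores[0][phoneme_sequence[0]]
    for frame in frame_scores[1:]:
        new_best = [neg_inf] * n_states
        for s, phoneme in enumerate(phoneme_sequence):
            prev = best[s] if s == 0 else max(best[s], best[s - 1])
            if prev > neg_inf:
                new_best[s] = prev + frame[phoneme]
        best = new_best
    return best[-1]  # the path must end on the last phoneme

def recognize(acoustic, visual, lexicon, weight=0.7):
    """Return the lexicon entry with the highest fused alignment score."""
    fused = fuse_scores(acoustic, visual, weight)
    return max(lexicon, key=lambda word: alignment_score(fused, lexicon[word]))

# Toy data: 4 frames, 3 phoneme classes. The acoustic scores cannot separate
# classes 1 and 2, while the visual scores favor class 2.
acoustic = [[0.8, 0.1, 0.1], [0.6, 0.2, 0.2], [0.1, 0.45, 0.45], [0.1, 0.45, 0.45]]
visual = [[0.6, 0.2, 0.2], [0.5, 0.2, 0.3], [0.1, 0.2, 0.7], [0.1, 0.1, 0.8]]
lexicon = {"A": [0, 1], "B": [0, 2]}  # hypothetical word-to-phoneme mapping

result = recognize(acoustic, visual, lexicon)
print(result)
```

In this toy case the visual scores break an acoustic tie between two candidate words, which is the kind of complementary evidence the abstract attributes to speech-reading.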
This publication has 8 references indexed in Scilit:
- A novel objective function for improved phoneme recognition using time-delay neural networks. IEEE Transactions on Neural Networks, 1990
- Integration of acoustic and visual speech signals using neural networks. IEEE Communications Magazine, 1989
- Phoneme recognition using time-delay neural networks. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1989
- Lip Reading: Automatic Visual Recognition of Spoken Words. Published by Optica Publishing Group, 1989
- An improved automatic lipreading system to enhance speech recognition. Published by Association for Computing Machinery (ACM), 1988
- Parallel Distributed Processing. Published by MIT Press, 1986
- The use of a one-stage dynamic programming algorithm for connected word recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1984
- Confusions Among Visually Perceived Consonants. Journal of Speech and Hearing Research, 1968