Improving connected letter recognition by lipreading

1 January 1993

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

Vol. 1, pp. 557-560
https://doi.org/10.1109/icassp.1993.319179

Abstract

The authors show that the performance of automatic speech recognition can be significantly improved by adding lipreading, also called speechreading. They demonstrate this with an extension of a state-of-the-art speech recognition system, a modular multi-state time-delay neural network architecture (MS-TDNN). The acoustic and visual speech data are first classified by two separate front-end phoneme TDNNs, and their outputs are combined into acoustic-visual hypotheses for the dynamic time warping (DTW) algorithm. The approach is evaluated on a connected word recognition problem, the notoriously difficult letter-spelling task. With speechreading, the error rate was reduced by up to half compared with purely acoustic recognition.
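The pipeline described in the abstract (two front-end phoneme networks whose outputs are merged into acoustic-visual hypotheses and then aligned by dynamic time warping) can be sketched in a few lines. This is a minimal illustration under assumptions not taken from the paper: the TDNN outputs are given as per-frame phoneme activations, the two streams are already frame-synchronous, they are merged by a fixed weighted sum (`visual_weight` is a hypothetical parameter), and each word is scored separately by a simple DTW over its phoneme-state sequence rather than by a full connected-letter search.

```python
from typing import Dict, List, Sequence

Frames = List[List[float]]  # frames[t][p] = activation of phoneme p at frame t


def fuse(acoustic: Frames, visual: Frames, visual_weight: float = 0.3) -> Frames:
    """Combine the two front-end outputs frame by frame.
    Assumption: a fixed linear weighting; the paper's combination rule may differ."""
    if len(acoustic) != len(visual):
        raise ValueError("acoustic and visual streams must be frame-aligned")
    w = visual_weight
    return [[(1 - w) * a + w * v for a, v in zip(fa, fv)]
            for fa, fv in zip(acoustic, visual)]


def dtw_score(frames: Frames, states: Sequence[int]) -> float:
    """Best monotonic alignment of a word's phoneme states to the frames.
    Each frame is assigned to exactly one state, states are visited in order,
    and every state covers at least one frame. Returns the mean activation
    along the best path, or -inf if the word cannot fit in the utterance."""
    T, S = len(frames), len(states)
    neg = float("-inf")
    if S > T:
        return neg
    D = [[neg] * S for _ in range(T)]
    D[0][0] = frames[0][states[0]]
    for t in range(1, T):
        for s in range(S):
            prev = D[t - 1][s]                      # stay in the same state
            if s > 0:
                prev = max(prev, D[t - 1][s - 1])   # advance to the next state
            if prev > neg:
                D[t][s] = prev + frames[t][states[s]]
    return D[T - 1][S - 1] / T


def recognize(frames: Frames, lexicon: Dict[str, Sequence[int]]) -> str:
    """Return the lexicon entry whose phoneme-state sequence aligns best."""
    return max(lexicon, key=lambda word: dtw_score(frames, lexicon[word]))
```

As a toy usage example, take hypothetical phoneme indices 0 = /b/, 1 = /iy/, 2 = /d/ and the letters "B" = [0, 1] and "D" = [2, 1]. The acoustic front end slightly prefers /d/ in the first frame, while the visual front end sees the lip closure of /b/:

```python
acoustic = [[0.45, 0.1, 0.55], [0.1, 0.9, 0.1], [0.1, 0.9, 0.1]]
visual = [[0.9, 0.1, 0.1], [0.2, 0.6, 0.2], [0.2, 0.6, 0.2]]
lexicon = {"B": [0, 1], "D": [2, 1]}

recognize(acoustic, lexicon)                  # "D" (acoustic only)
recognize(fuse(acoustic, visual), lexicon)    # "B" (acoustic + visual)
```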


References (8):

A novel objective function for improved phoneme recognition using time-delay neural networks
IEEE Transactions on Neural Networks, 1990
Integration of acoustic and visual speech signals using neural networks
IEEE Communications Magazine, 1989
Phoneme recognition using time-delay neural networks
IEEE Transactions on Acoustics, Speech, and Signal Processing, 1989
Lip Reading: Automatic Visual Recognition of Spoken Words
Optica Publishing Group, 1989
An improved automatic lipreading system to enhance speech recognition
Association for Computing Machinery (ACM), 1988
Parallel Distributed Processing
MIT Press, 1986
The use of a one-stage dynamic programming algorithm for connected word recognition
IEEE Transactions on Acoustics, Speech, and Signal Processing, 1984
Confusions Among Visually Perceived Consonants
Journal of Speech and Hearing Research, 1968