Locating and tracking facial speech features
- 1 January 1996
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- Vol. 1 (ISSN 1051-4651), pp. 652-656
- https://doi.org/10.1109/icpr.1996.546105
Abstract
This paper describes a robust method for extracting visual speech information from the shape of the lips, for use in an automatic speechreading (lipreading) system. Lip deformation is modelled by a statistically based deformable contour model that learns typical lip deformations from a training set. The main difficulty in locating and tracking lips is finding dominant image features to represent the lip contours. We describe a statistical profile model that learns dominant image features from a training set. The model captures global intensity variation due to differing illumination and skin reflectance, as well as intensity changes at the inner lip contour caused by mouth opening and the visibility of teeth and tongue. The method is validated for locating and tracking lip movements on a database covering a broad variety of speakers.
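The statistically based deformable contour model described in the abstract learns typical lip deformations from a training set and then constrains candidate contours to plausible shapes. The paper does not give implementation details, but this style of model is commonly realised as a PCA-based point distribution model: the sketch below, with illustrative function names and parameters of my own choosing, shows how such a shape model might be trained and how a candidate contour could be constrained to lie within the learned deformation space.

```python
import numpy as np

def train_shape_model(shapes, var_kept=0.95):
    """Learn a linear shape model from aligned training contours.

    shapes: (n_samples, 2*n_points) array, each row an aligned lip
    contour flattened as (x1, y1, x2, y2, ...).
    Returns the mean shape, the retained eigenvectors (modes of
    variation), and their eigenvalues.
    """
    mean = shapes.mean(axis=0)
    centered = shapes - mean
    cov = centered.T @ centered / (len(shapes) - 1)
    # Eigen-decompose the covariance; eigh returns ascending order.
    evals, evecs = np.linalg.eigh(cov)
    order = np.argsort(evals)[::-1]
    evals, evecs = np.maximum(evals[order], 0.0), evecs[:, order]
    # Keep enough modes to explain var_kept of the total variance.
    k = int(np.searchsorted(np.cumsum(evals) / evals.sum(), var_kept)) + 1
    return mean, evecs[:, :k], evals[:k]

def constrain_shape(shape, mean, modes, evals, n_sigma=3.0):
    """Project a candidate contour into the model and clip each mode's
    coefficient to +/- n_sigma standard deviations, so the result is a
    'typical' deformation seen in training."""
    b = modes.T @ (shape - mean)
    limit = n_sigma * np.sqrt(evals)
    b = np.clip(b, -limit, limit)
    return mean + modes @ b
```

During tracking, a search step would propose point positions (e.g. from the intensity profile model along normals to the contour), and `constrain_shape` would then pull the proposal back onto the learned shape manifold; the `n_sigma` bound is the usual heuristic for rejecting implausible deformations.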