Speech-to-image media conversion based on VQ and neural network

Abstract
Automatic media conversion schemes from speech to facial images, together with the construction of a real-time image synthesis system, are presented. The purpose of this research is to realize an intelligent human-machine interface or intelligent communication system using synthesized human face images. A human face image is reconstructed on a terminal display using a 3-D surface model and a texture mapping technique, and facial motion images are synthesized by transforming the 3-D model. With the motion-driving method, which is based on vector quantization and a neural network, the synthesized head image can appear to speak given words and phrases naturally, in synchronization with the voice signals of a speaker.
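The sketch below illustrates the general idea described in the abstract, not the authors' actual system: speech feature frames are vector-quantized against a codebook, and each code is mapped to mouth-shape parameters that could drive a 3-D face model in sync with the audio. The feature dimension, codebook size, parameter set, and the lookup table standing in for the trained neural network are all illustrative assumptions.

```python
# Minimal sketch of speech-driven facial motion via vector quantization (VQ).
# All dimensions and names are assumed for illustration; the mapping that the
# paper learns with a neural network is replaced here by a random lookup table.

import numpy as np

rng = np.random.default_rng(0)

FEATURE_DIM = 12       # assumed size of a speech feature frame (e.g., cepstra)
CODEBOOK_SIZE = 16     # assumed number of VQ codewords
MOUTH_PARAM_DIM = 4    # assumed mouth-shape parameters (jaw, width, height, protrusion)


def train_vq_codebook(features, k, iters=20):
    """Train a VQ codebook with simple k-means iterations."""
    codebook = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(iters):
        # Assign each frame to its nearest codeword
        dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update each codeword as the mean of its assigned frames
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:
                codebook[j] = members.mean(axis=0)
    return codebook


def quantize(frame, codebook):
    """Return the index of the nearest codeword for one speech frame."""
    return int(np.linalg.norm(codebook - frame, axis=1).argmin())


# Stand-in for the trained network: code index -> mouth-shape parameters.
code_to_mouth_params = rng.uniform(0.0, 1.0, size=(CODEBOOK_SIZE, MOUTH_PARAM_DIM))


def speech_frame_to_mouth_params(frame, codebook):
    """Convert one speech feature frame into parameters for a 3-D face model."""
    return code_to_mouth_params[quantize(frame, codebook)]


if __name__ == "__main__":
    # Synthetic frames standing in for analyzed voice signals
    speech_features = rng.normal(size=(500, FEATURE_DIM))
    codebook = train_vq_codebook(speech_features, CODEBOOK_SIZE)

    # Drive the (hypothetical) face model frame by frame, in sync with the audio
    for frame in speech_features[:5]:
        params = speech_frame_to_mouth_params(frame, codebook)
        print("mouth parameters:", np.round(params, 3))
```

In the scheme the abstract describes, the neural network would be trained on paired speech and facial-motion data so that each quantized speech frame selects a plausible mouth shape, which is then applied as a transformation of the 3-D surface model.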
