Multimedia sensor fusion for intelligent camera control

Abstract
A multisensor-based control system for an active pan/tilt/zoom camera is presented. Acoustic and visual information from multimedia sensors is used to locate the person currently speaking and to track people moving about a room. Pixel-level fusion of skin color with an image produced from interaural sound delay provides a simple means of detecting the face of the current speaker. For wider-scale surveillance tasks, moving targets are detected using color image differencing. Target data is fed to a behavior-based fuzzy control system, which uses expert rules to aim the camera. Applications include video conferencing, security, surveillance, and human-computer interaction. The system has been implemented on a multimedia PC equipped with a wide-angle camera, a Canon VC-C1 pan/tilt/zoom camera, and two microphones.
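
The pixel-level fusion step can be illustrated with a minimal sketch. The abstract does not specify the fusion operator or how the sound image is built, so the following are assumptions made only for illustration: the skin-color image is a per-pixel score in [0, 1], the sound image assigns every pixel in a column the same Gaussian weight centered on the column implied by the interaural delay, and fusion is an elementwise product. All function and variable names are hypothetical.

```python
import math


def sound_image(width, height, source_col, sigma):
    """Build an assumed sound image: a Gaussian over columns, constant down rows.

    source_col is the image column that the interaural delay is assumed to
    map to; that mapping is not given in the abstract and is left out here.
    """
    row = [math.exp(-0.5 * ((x - source_col) / sigma) ** 2) for x in range(width)]
    return [row[:] for _ in range(height)]


def fuse(skin, sound):
    """Fuse two equally sized images pixel by pixel (assumed operator: product)."""
    return [[s * a for s, a in zip(skin_row, sound_row)]
            for skin_row, sound_row in zip(skin, sound)]


def peak(image):
    """Return (row, col) of the largest pixel value."""
    _, r, c = max((v, r, c)
                  for r, row in enumerate(image)
                  for c, v in enumerate(row))
    return r, c


# Toy 4x8 skin-score image with two face-like blobs (made-up values).
WIDTH, HEIGHT = 8, 4
skin = [[0.0] * WIDTH for _ in range(HEIGHT)]
skin[1][1] = 0.9   # silent person, stronger skin response
skin[2][6] = 0.8   # current speaker, weaker skin response

# Sound is assumed to arrive from the direction of column 6.
fused = fuse(skin, sound_image(WIDTH, HEIGHT, source_col=6, sigma=1.0))

print("strongest skin pixel:", peak(skin))
print("strongest fused pixel:", peak(fused))
```

In this toy case the skin image alone favors the silent person, while the fused image peaks at the face that lies in the direction of the sound, which is the behavior the abstract describes.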
