Identification of story units in audio-visual sequences by joint audio and video processing
- 1 January 1998
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- Vol. 1, 363-367
- https://doi.org/10.1109/icip.1998.723500
Abstract
A novel technique that uses joint audio-visual analysis for scene identification and characterization is proposed. The paper defines four scene types: dialogues, stories, actions, and generic scenes. It then explains how any audio-visual material can be decomposed into a series of scenes obeying this classification by properly analyzing and then combining the underlying audio and visual information. A rule-based procedure is defined for this purpose. Before the rule-based decision can take place, a series of low-level pre-processing tasks is suggested to adequately measure audio and visual correlations. As far as visual information is concerned, it is proposed to measure the similarities between non-consecutive shots using a learning vector quantization approach. An outlook on a possible implementation strategy for the overall scene identification task is suggested and validated through a series of experimental simulations on real audio-visual data.
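The abstract mentions a learning vector quantization (LVQ) approach for measuring similarity between non-consecutive shots, but does not spell out the features or training rule. As a rough illustration only, the following is a minimal LVQ1 sketch in Python: the shot feature vectors, class labels, and prototype initializations are all made up here, not taken from the paper.

```python
def lvq1_train(samples, labels, prototypes, proto_labels, lr=0.1, epochs=20):
    """LVQ1 rule: pull the nearest prototype toward a sample of the same
    class, push it away from a sample of a different class."""
    protos = [list(p) for p in prototypes]
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            # Index of the nearest prototype by squared Euclidean distance.
            i = min(range(len(protos)),
                    key=lambda k: sum((a - b) ** 2 for a, b in zip(protos[k], x)))
            sign = 1.0 if proto_labels[i] == y else -1.0
            protos[i] = [p + sign * lr * (a - p) for p, a in zip(protos[i], x)]
    return protos

def classify(x, protos, proto_labels):
    """Assign x the label of its nearest prototype."""
    i = min(range(len(protos)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(protos[k], x)))
    return proto_labels[i]

# Hypothetical 2-D "shot feature" vectors grouped into two similarity classes.
samples = [(0.0, 0.0), (0.1, 0.1), (1.0, 1.0), (0.9, 1.1)]
labels = ["A", "A", "B", "B"]
protos = lvq1_train(samples, labels, [(0.2, 0.0), (0.8, 1.0)], ["A", "B"])
```

In a shot-similarity setting one would replace the toy vectors with per-shot descriptors (e.g. color or motion features) and use the trained codebook to decide whether two non-consecutive shots fall in the same visual class; those specifics are assumptions here, not details from the paper.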