Combined audio and visual streams analysis for video sequence segmentation

22 November 2002

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

Vol. 4 (ISSN 1520-6149), 2665-2668
https://doi.org/10.1109/icassp.1997.595337

Abstract

We present a new approach to segmenting a video sequence into individual shots. Unlike previous approaches, our technique combines two streams of information extracted from the visual track with segmentation information from the audio track. The visual streams are computed from the coarse data in a 3-D wavelet decomposition of the video track. They consist of (i) information derived from temporal edges detected along the time evolution of each pixel's intensity in temporally sub-sampled, spatially filtered coarse frames, and (ii) information derived from the coarse spatio-temporal evolution of intra-frame edges in the spatially filtered coarse frames. Our approach is particularly well matched to progressively transmitted video.
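
As a rough illustration of visual stream (i) only, the following sketch flags a shot boundary when a large fraction of coarse pixels shows a temporal intensity jump. It is not the paper's method: plain block averaging stands in for the coarse band of the 3-D wavelet decomposition, a simple frame-to-frame difference stands in for temporal edge detection, and the audio stream and stream (ii) are omitted. All function names, parameters, and threshold values are hypothetical.

```python
from typing import List

Frame = List[List[float]]  # grayscale frame: rows of pixel intensities


def coarse_frame(frame: Frame, block: int = 2) -> Frame:
    """Average non-overlapping block x block tiles.

    Assumption: this Haar-style low-pass is only a stand-in for the
    spatially filtered coarse frames of a wavelet decomposition.
    """
    rows = len(frame) // block
    cols = len(frame[0]) // block
    out: Frame = []
    for r in range(rows):
        row = []
        for c in range(cols):
            total = 0.0
            for dr in range(block):
                for dc in range(block):
                    total += frame[r * block + dr][c * block + dc]
            row.append(total / (block * block))
        out.append(row)
    return out


def temporal_edge_fraction(prev: Frame, curr: Frame, pixel_thresh: float) -> float:
    """Fraction of coarse pixels whose intensity changes by more than pixel_thresh."""
    changed = 0
    total = 0
    for row_prev, row_curr in zip(prev, curr):
        for p, c in zip(row_prev, row_curr):
            total += 1
            if abs(c - p) > pixel_thresh:
                changed += 1
    return changed / total if total else 0.0


def detect_cuts(
    frames: List[Frame],
    temporal_step: int = 2,      # hypothetical temporal sub-sampling factor
    block: int = 2,              # hypothetical spatial coarsening factor
    pixel_thresh: float = 30.0,  # hypothetical per-pixel jump threshold
    frame_thresh: float = 0.5,   # hypothetical fraction of pixels that must jump
) -> List[int]:
    """Return original-frame indices of sub-sampled frames where a cut is flagged."""
    kept = list(range(0, len(frames), temporal_step))
    coarse = [coarse_frame(frames[i], block) for i in kept]
    cuts = []
    for k in range(1, len(coarse)):
        fraction = temporal_edge_fraction(coarse[k - 1], coarse[k], pixel_thresh)
        if fraction >= frame_thresh:
            cuts.append(kept[k])
    return cuts
```

Because the sketch works only on coarse, sub-sampled data, it could in principle run before the finer detail of a progressively transmitted video arrives, which is the property the abstract highlights.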
