Multichannel video segmentation

1 November 1996

proceedings article
Published by SPIE-Intl Soc Optical Eng

pp. 252-264
https://doi.org/10.1117/12.257295

Abstract

A video is a multimedia document structured into scenes and shots. A scene is a sequence of consecutive shots that share common visual and audio features. A shot is a set of consecutive frames delimited by cuts, which existing techniques can recognize easily. Segmenting a video into scenes is a new and open problem. It is needed for scene retrieval, especially in authoring and interactive video applications. We propose a new approach to segmenting video into scenes that draws on several media and takes film syntax into account. We characterize a scene by the similarity between the color histograms of the current shot and those of one of the most recent previous shots. Similarity between a frame of the current shot and a frame of a previous shot may indicate alternate shots, which belong to the same scene. Other techniques, based on projective geometry, are presented in a companion paper; these techniques make it possible to detect camera movement. We recognize the speakers in a scene with autoregressive (AR) vector model techniques, such as the one some of the authors proposed in the Orphee system, implemented at Laforia. However, speaker recognition is much more difficult when applied to CD-I video, because of the many types of transitions and noise it contains. We present experimental results based on this approach: detection of alternate shots is efficient, but speaker recognition needs improvement.
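
The histogram-based grouping rule described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes one key frame per shot, a joint RGB histogram with 4 bins per channel, histogram intersection (Swain and Ballard) as the similarity measure, and hypothetical parameters `window=3` and `threshold=0.7`. If shot k resembles one of the last few shots j, all shots from j to k are placed in the same scene, which captures alternate shots such as A B A B.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255


def color_histogram(pixels: Sequence[Pixel], bins_per_channel: int = 4) -> List[float]:
    """Normalized joint RGB histogram with bins_per_channel**3 bins (assumed layout)."""
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Map each channel value 0..255 to a bin index 0..bins_per_channel-1.
        qr, qg, qb = (c * bins_per_channel // 256 for c in (r, g, b))
        hist[(qr * bins_per_channel + qg) * bins_per_channel + qb] += 1.0
    total = float(len(pixels)) or 1.0
    return [count / total for count in hist]


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Similarity in [0, 1] for normalized histograms; 1.0 means identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def segment_scenes(shot_hists: Sequence[Sequence[float]],
                   window: int = 3, threshold: float = 0.7) -> List[List[int]]:
    """Group shot indices into scenes (hypothetical window/threshold values)."""
    n = len(shot_hists)
    if n == 0:
        return []
    # linked[i] is True when there is no scene boundary between shot i-1 and shot i.
    linked = [False] * n
    for k in range(1, n):
        # Compare shot k with the most recent previous shots only.
        for j in range(max(0, k - window), k):
            if histogram_intersection(shot_hists[j], shot_hists[k]) >= threshold:
                # Shots j..k form one scene, including any shots between them.
                for i in range(j + 1, k + 1):
                    linked[i] = True
    scenes = [[0]]
    for i in range(1, n):
        if linked[i]:
            scenes[-1].append(i)
        else:
            scenes.append([i])
    return scenes


if __name__ == "__main__":
    # Synthetic key frames: a dialogue alternating between two framings, then two unrelated shots.
    red, blue = [(255, 0, 0)] * 100, [(0, 0, 255)] * 100
    green, yellow = [(0, 255, 0)] * 100, [(255, 255, 0)] * 100
    hists = [color_histogram(f) for f in (red, blue, red, blue, green, yellow)]
    print(segment_scenes(hists))  # [[0, 1, 2, 3], [4], [5]]
```

Restricting the comparison to a short window of recent shots keeps unrelated scenes that happen to share a color palette from being merged; the window size and threshold would need tuning on real material.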
