A real-time system for high-level video representation: application to video surveillance

Abstract
The steadily increasing need for video content accessibility calls for reliable systems that represent video sequences by their high-level (semantic) content. The core of such systems is the automatic extraction of video content. In this paper, a layered computational framework that effectively extracts multiple high-level features of a video shot is presented. The objective of this framework is to extract rich high-level descriptions of real-world scenes. In our framework, high-level descriptions relate to moving objects, which are represented by their spatio-temporal low-level features. These high-level features are expressed as generic high-level object features, such as events. To achieve broader applicability, descriptions are extracted independently of the video context. Our framework is based on four interacting video processing layers: enhancement, to estimate and reduce noise; stabilization, to compensate for global changes; analysis, to extract meaningful objects; and interpretation, to extract context-independent semantic features. The effectiveness and real-time response of our framework are demonstrated by extensive experimentation on indoor and outdoor video shots in the presence of multi-object occlusion, noise, and artifacts.
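The four-layer organization (enhancement, stabilization, analysis, interpretation) can be illustrated as a simple processing chain. The following is a minimal, hypothetical sketch in Python, not the authors' implementation: all function names, the noise estimator (median absolute difference of neighbouring pixels), the median-based brightness compensation standing in for stabilization, the background-subtraction object extraction, and the count-based "appear"/"disappear" events are assumptions chosen for brevity. Frames are plain grayscale lists of lists.

```python
from __future__ import annotations
from dataclasses import dataclass
from statistics import median

# Hypothetical sketch: frames are grayscale lists of rows of floats (width >= 2).


def estimate_noise(frame: list[list[float]]) -> float:
    # Enhancement (estimate): for i.i.d. Gaussian noise a neighbour difference has
    # std sigma*sqrt(2); 1.4826 * median(|d|) approximates that std.
    diffs = [abs(row[x + 1] - row[x]) for row in frame for x in range(len(row) - 1)]
    return 1.4826 * median(diffs) / 2 ** 0.5


def mean_filter(frame: list[list[float]]) -> list[list[float]]:
    # 3x3 box filter; the window is clipped at the image border.
    h, w = len(frame), len(frame[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [frame[j][i] for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(sum(window) / len(window))
        out.append(row)
    return out


def enhance(frame: list[list[float]], noise_threshold: float = 2.0) -> list[list[float]]:
    # Enhancement (reduce): smooth only when the estimated noise is significant.
    return mean_filter(frame) if estimate_noise(frame) > noise_threshold else frame


def stabilize(frame: list[list[float]], reference: list[list[float]]) -> list[list[float]]:
    # Stabilization (assumed: global brightness only); the median ignores small objects.
    offset = median(p for row in reference for p in row) - median(p for row in frame for p in row)
    return [[p + offset for p in row] for row in frame]


@dataclass
class VideoObject:
    pixels: int                      # region size in pixels
    bbox: tuple[int, int, int, int]  # (min_y, min_x, max_y, max_x)


def analyze(frame, background, threshold: float = 50.0) -> list[VideoObject]:
    # Analysis: background subtraction, then 4-connected component labelling.
    h, w = len(frame), len(frame[0])
    mask = [[abs(frame[y][x] - background[y][x]) > threshold for x in range(w)] for y in range(h)]
    seen = [[False] * w for _ in range(h)]
    objects = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            stack, ys, xs = [(y, x)], [], []
            while stack:
                cy, cx = stack.pop()
                ys.append(cy)
                xs.append(cx)
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            objects.append(VideoObject(len(ys), (min(ys), min(xs), max(ys), max(xs))))
    return objects


def interpret(objects_per_frame: list[list[VideoObject]]) -> list[tuple[int, str, int]]:
    # Interpretation: context-independent events from changes in object count.
    events = []
    for t in range(1, len(objects_per_frame)):
        delta = len(objects_per_frame[t]) - len(objects_per_frame[t - 1])
        if delta > 0:
            events.append((t, "appear", delta))
        elif delta < 0:
            events.append((t, "disappear", -delta))
    return events


def process_shot(frames, background):
    # Chain the four layers frame by frame, then interpret the whole shot.
    per_frame = [analyze(stabilize(enhance(f), background), background) for f in frames]
    return per_frame, interpret(per_frame)


# Usage: empty frame, frame with a 2x2 bright object, uniformly brighter empty frame.
background = [[10.0] * 6 for _ in range(6)]
with_object = [row[:] for row in background]
for y in (2, 3):
    for x in (2, 3):
        with_object[y][x] = 200.0
brighter = [[p + 30.0 for p in row] for row in background]
objects, events = process_shot([background, with_object, brighter], background)
print(events)  # [(1, 'appear', 1), (2, 'disappear', 1)]
```

In this toy run the global brightness change in the third frame is absorbed by the stabilization step, so it is not mistaken for an object. Keeping each layer a separate function means one stage (for instance, a motion-based stabilizer) could be swapped without touching the others, which loosely mirrors the layered design described above.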
