Efficient summarization of stereoscopic video sequences
- 1 June 2000
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Circuits and Systems for Video Technology
- Vol. 10 (4), 501-517
- https://doi.org/10.1109/76.844996
Abstract
An efficient technique for summarization of stereoscopic video sequences is presented, which extracts a small but meaningful set of video frames using a content-based sampling algorithm. The proposed video-content representation provides the capability of browsing digital stereoscopic video sequences and performing more efficient content-based queries and indexing. Each stereoscopic video sequence is first partitioned into shots by applying a shot-cut detection algorithm so that frames (or stereo pairs) of similar visual characteristics are gathered together. Each shot is then analyzed using stereo-imaging techniques, and the disparity field, occluded areas, and depth map are estimated. A multiresolution implementation of the recursive shortest spanning tree (RSST) algorithm is applied for color and depth segmentation, while fusion of color and depth segments is employed for reliable video object extraction. In particular, color segments are projected onto depth segments so that video objects on the same depth plane are retained, while at the same time accurate object boundaries are extracted. Feature vectors are then constructed using multidimensional fuzzy classification of segment features including size, location, color, and depth. Shot selection is accomplished by clustering similar shots based on the generalized Lloyd-Max algorithm, while for a given shot, key frames are extracted using an optimization method for locating frames of minimally correlated feature vectors. For efficient implementation of the latter method, a genetic algorithm is used. Experimental results are presented, which indicate the reliable performance of the proposed scheme on real-life stereoscopic video sequences.
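The key-frame step in the abstract (picking frames whose feature vectors are minimally correlated, searched with a genetic algorithm) can be illustrated with a small sketch. The abstract does not give the exact cost function, chromosome encoding, or genetic operators, so everything below is an assumption made for illustration: the cost is taken to be the sum of squared pairwise Pearson correlations, a chromosome is a sorted tuple of frame indices, and the names `correlation`, `cost`, and `select_key_frames` are hypothetical rather than taken from the paper.

```python
import math
import random
from itertools import combinations


def correlation(u, v):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    if su == 0 or sv == 0:
        return 0.0  # assumption: a constant vector counts as uncorrelated
    return cov / (su * sv)


def cost(indices, features):
    """Assumed cost: sum of squared pairwise correlations of the chosen frames."""
    return sum(correlation(features[i], features[j]) ** 2
               for i, j in combinations(indices, 2))


def select_key_frames(features, k, pop_size=30, generations=60,
                      mutation_rate=0.2, seed=0):
    """Search for k frame indices with low mutual correlation (toy GA)."""
    n = len(features)
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= number of frames")
    if pop_size < 4:
        raise ValueError("pop_size must be at least 4")
    rng = random.Random(seed)

    # Each chromosome is a sorted tuple of k distinct frame indices.
    population = [tuple(sorted(rng.sample(range(n), k)))
                  for _ in range(pop_size)]

    for _ in range(generations):
        # Elitist selection: keep the better half unchanged.
        population.sort(key=lambda c: cost(c, features))
        elite = population[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            # Crossover: draw k indices from the union of both parents.
            child = rng.sample(sorted(set(a) | set(b)), k)
            # Mutation: swap one chosen index for an unused one.
            if rng.random() < mutation_rate:
                unused = [i for i in range(n) if i not in child]
                if unused:
                    child[rng.randrange(k)] = rng.choice(unused)
            children.append(tuple(sorted(child)))
        population = elite + children

    return min(population, key=lambda c: cost(c, features))
```

Calling `select_key_frames(features, k=3)` on a list of per-frame feature vectors returns a sorted tuple of three frame indices. Because the better half of the population is carried over unchanged, the best cost never increases from one generation to the next, but the result is a heuristic and is not guaranteed to be the global minimum.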