Speaker identification and video analysis for hierarchical video shot classification

23 November 2002

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

Vol. 2, pp. 550-553
https://doi.org/10.1109/icip.1997.638830

Abstract

We present a new video shot classification and clustering technique to support content-based indexing, browsing and retrieval in video databases. The proposed method is based on the analysis of both the audio and visual data tracks. The visual stream is analyzed using a 3-D wavelet transform and segmented into shot units which are matched and clustered by visual content. Simultaneously, speaker changes are detected by tracking voiced phonemes in the audio signal. The clues obtained from the video and speech data are combined to classify and group the isolated video shots. This integrated approach also allows effective indexing of the audio-visual objects in multimedia databases.
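The visual segmentation step can be illustrated with a heavily simplified sketch. It is not the paper's implementation: instead of a full 3-D wavelet transform, it applies only an undecimated Haar high-pass filter along the time axis and flags a shot boundary wherever the energy of the temporal detail coefficients is unusually large. The function names, the `(T, H, W)` grayscale frame layout, and the median-based threshold with factor `k` are all assumptions made for illustration.

```python
import numpy as np

def temporal_haar_energy(frames):
    """Mean energy of Haar temporal detail coefficients per frame transition.

    frames: array of shape (T, H, W) holding grayscale frames (assumed layout).
    Returns an array of length T - 1; entry t covers frames t and t + 1.
    """
    frames = np.asarray(frames, dtype=np.float64)
    # Undecimated Haar high-pass along time: (f[t+1] - f[t]) / sqrt(2).
    detail = (frames[1:] - frames[:-1]) / np.sqrt(2.0)
    return np.mean(detail ** 2, axis=(1, 2))

def detect_shot_boundaries(frames, k=5.0):
    """Return indices of frames that start a new shot.

    A transition is flagged when its detail energy exceeds
    median + k * MAD of all transition energies (hypothetical rule).
    """
    energy = temporal_haar_energy(frames)
    med = np.median(energy)
    mad = np.median(np.abs(energy - med))
    threshold = med + k * mad + 1e-12  # small floor for perfectly static video
    return [int(t) + 1 for t in np.nonzero(energy > threshold)[0]]

if __name__ == "__main__":
    # Synthetic clip: ten dark frames followed by ten bright frames.
    clip = np.concatenate([np.full((10, 8, 8), 0.2), np.full((10, 8, 8), 0.8)])
    print(detect_shot_boundaries(clip))
```

A fuller treatment would also decompose each frame spatially and would need to handle gradual transitions such as fades, which a single-scale temporal difference tends to miss.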
