Speaker identification and video analysis for hierarchical video shot classification
- 23 November 2002
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- Vol. 2, pp. 550-553
- https://doi.org/10.1109/icip.1997.638830
Abstract
We present a new video shot classification and clustering technique to support content-based indexing, browsing and retrieval in video databases. The proposed method is based on the analysis of both the audio and visual data tracks. The visual stream is analyzed using a 3-D wavelet transform and segmented into shot units which are matched and clustered by visual content. Simultaneously, speaker changes are detected by tracking voiced phonemes in the audio signal. The clues obtained from the video and speech data are combined to classify and group the isolated video shots. This integrated approach also allows effective indexing of the audio-visual objects in multimedia databases.
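The visual segmentation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the wavelet family, the number of decomposition levels, or the decision rule, so the code below assumes a single-level Haar transform (spatial low-pass band followed by an undecimated temporal detail band, which amounts to scaled frame differencing on the smoothed frames) and a median-based threshold. The function names and the `factor` parameter are hypothetical.

```python
import numpy as np


def haar_spatial_lowpass(frames):
    """One-level 2-D Haar approximation band: each 2x2 block is summed and halved."""
    t, h, w = frames.shape
    h2, w2 = h - h % 2, w - w % 2          # crop to even dimensions
    blocks = frames[:, :h2, :w2].reshape(t, h2 // 2, 2, w2 // 2, 2)
    return blocks.sum(axis=(2, 4)) / 2.0


def temporal_detail_energy(frames):
    """Mean squared Haar detail coefficient along time, one value per frame transition."""
    low = haar_spatial_lowpass(frames.astype(np.float64))
    detail = (low[1:] - low[:-1]) / np.sqrt(2.0)   # undecimated temporal Haar detail
    return (detail ** 2).mean(axis=(1, 2))


def detect_shot_boundaries(frames, factor=10.0):
    """Return indices of frames that start a new shot.

    Assumption: a cut produces temporal detail energy far above the clip's
    median energy; `factor` is an illustrative threshold, not a published value.
    """
    energy = temporal_detail_energy(frames)
    threshold = factor * (np.median(energy) + 1e-12)
    return [int(i) + 1 for i in np.nonzero(energy > threshold)[0]]


if __name__ == "__main__":
    # Synthetic clip: three "shots" of different brightness with mild noise.
    rng = np.random.default_rng(0)
    shots = [np.full((n, 16, 16), level) for n, level in [(10, 0.2), (15, 0.8), (12, 0.5)]]
    clip = np.concatenate(shots) + rng.normal(0.0, 0.01, size=(37, 16, 16))
    print(detect_shot_boundaries(clip))
```

The input is assumed to be a grayscale clip of shape (frames, height, width). Gradual transitions such as fades and the audio-side speaker change detection are outside the scope of this sketch.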