Multifeature audio segmentation for browsing and annotation
- 20 January 2003
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
Abstract
Indexing and content-based retrieval are necessary to handle the large amounts of audio and multimedia data that are becoming available on the Web and elsewhere. Since manual indexing using existing audio editors is extremely time-consuming, a number of automatic content analysis systems have been proposed. Most of these systems rely on speech recognition techniques to create text indices. On the other hand, very few systems have been proposed for automatic indexing of music and general audio. Typically these systems rely on classification and similarity-retrieval techniques and work in restricted audio domains. A somewhat different, more general approach for fast indexing of arbitrary audio data is the use of segmentation based on multiple temporal features combined with automatic or semi-automatic annotation. In this paper, a general methodology for audio segmentation is proposed. A number of experiments were performed to evaluate the proposed methodology and compare different segmentation schemes. Finally, a prototype audio browsing and annotation tool based on segmentation combined with existing classification techniques was implemented.