Audio scene segmentation using multiple features, models and time scales
- 7 November 2002
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- Vol. 4, pp. 2441-2444
- https://doi.org/10.1109/icassp.2000.859335
Abstract
We present an algorithm for audio scene segmentation. An audio scene is a semantically consistent sound segment characterized by a few dominant sources of sound. A scene change occurs when a majority of the sources present in the data change. Our segmentation framework has three parts: a definition of an audio scene; multiple feature models that characterize the dominant sources; and a simple, causal listener model, which mimics human audition using multiple time-scales. We define a correlation function that measures how strongly the current data correlate with past data, and we use it to determine segmentation boundaries. The algorithm was tested on a difficult data set, a 1-hour audio segment of a film, and achieved an audio scene change detection accuracy of 97%.
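The idea of correlating recent audio with past data over two time scales can be illustrated with a minimal sketch. It is not the paper's method: the abstract does not specify the correlation function, the features, or the window lengths, so the cosine similarity of mean feature vectors, the names `coherence` and `detect_boundaries`, and the `memory`, `attention` and `threshold` parameters are all assumptions made for illustration. The sketch is causal in the sense that the score at frame `t` uses only frames before `t`.

```python
import math


def mean_vector(frames):
    """Component-wise mean of a non-empty list of equal-length feature vectors."""
    dim = len(frames[0])
    return [sum(f[i] for f in frames) / len(frames) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def coherence(frames, t, memory, attention):
    """Similarity between the last `attention` frames before t and the
    older part of a `memory`-frame buffer (assumed stand-in for the
    paper's correlation function). Only past frames are used."""
    recent = frames[t - attention:t]
    older = frames[t - memory:t - attention]
    return cosine(mean_vector(recent), mean_vector(older))


def detect_boundaries(frames, memory=10, attention=4, threshold=0.5):
    """Report one boundary per dip of the coherence score below `threshold`.
    The boundary is placed at the start of the attention window at the
    point where the score is lowest within the dip."""
    boundaries = []
    best_t, best_score = None, None
    for t in range(memory, len(frames) + 1):
        score = coherence(frames, t, memory, attention)
        if score < threshold:
            if best_score is None or score < best_score:
                best_t, best_score = t, score
        elif best_t is not None:
            boundaries.append(best_t - attention)
            best_t, best_score = None, None
    if best_t is not None:  # dip still open at end of data
        boundaries.append(best_t - attention)
    return boundaries


# Toy input: 20 frames dominated by one source, then 20 by another.
toy = [[1.0, 0.0]] * 20 + [[0.0, 1.0]] * 20
print(detect_boundaries(toy))  # → [20]
```

In this toy setting the score stays at 1 while recent and remembered frames agree and drops to 0 once the attention window holds only the new source, which places the boundary at frame 20. Real audio would need actual features per frame and tuned window lengths, neither of which the abstract provides.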
This publication has 8 references indexed in Scilit:
- A hardware accelerator for smart information systems. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- Real-time discrimination of broadcast speech/music. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- Video scene segmentation via continuous video coherence. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- Construction and evaluation of a robust multifeature speech/music discriminator. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- Heuristic approach for generic audio data segmentation and annotation. Published by Association for Computing Machinery (ACM), 1999
- Video Manga. Published by Association for Computing Machinery (ACM), 1999
- Automatic audio content analysis. Published by Association for Computing Machinery (ACM), 1996
- Auditory Scene Analysis. Published by MIT Press, 1990