Audio scene segmentation using multiple features, models and time scales

7 November 2002

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

Vol. 4, pp. 2441-2444
https://doi.org/10.1109/icassp.2000.859335

Abstract

We present an algorithm for audio scene segmentation. An audio scene is a semantically consistent sound segment that is characterized by a few dominant sources of sound. A scene change occurs when a majority of the sources present in the data change. Our segmentation framework has three parts: a definition of an audio scene; multiple feature models that characterize the dominant sources; and a simple, causal listener model, which mimics human audition using multiple time scales. We define a correlation function that measures how strongly the current data correlate with past data, and we use it to locate segmentation boundaries. The algorithm was tested on a difficult data set, a 1-hour audio segment of a film, on which it achieved an audio scene change detection accuracy of 97%.
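The abstract does not give the form of the correlation function, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each audio frame has already been reduced to a feature vector, uses cosine similarity between window means as a stand-in for the correlation measure, and treats a short "attention" window and a longer "memory" window as the two time scales. All function names, parameters, and the threshold are hypothetical.

```python
from math import inf, sqrt
from typing import List, Sequence

Frame = Sequence[float]


def mean_vector(frames: Sequence[Frame]) -> List[float]:
    """Average a non-empty run of feature vectors, dimension by dimension."""
    n = len(frames)
    return [sum(f[d] for f in frames) / n for d in range(len(frames[0]))]


def cosine(a: Frame, b: Frame) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return 0.0 if na == 0.0 or nb == 0.0 else dot / (na * nb)


def coherence(frames: Sequence[Frame], t: int, attention: int, memory: int) -> float:
    """Similarity of the `attention` frames starting at t to the `memory`
    frames just before t. A causal listener can compute this once frame
    t + attention - 1 has arrived, i.e. with a delay of `attention` frames."""
    recent = frames[t:t + attention]
    past = frames[max(0, t - memory):t]
    return cosine(mean_vector(recent), mean_vector(past))


def detect_boundaries(frames: Sequence[Frame], attention: int = 4,
                      memory: int = 8, threshold: float = 0.5) -> List[int]:
    """Return frame indices where coherence has a local minimum below
    `threshold` (assumed criterion for a scene change)."""
    times = range(memory, len(frames) - attention + 1)
    scores = [(t, coherence(frames, t, attention, memory)) for t in times]
    boundaries = []
    for i, (t, s) in enumerate(scores):
        prev_s = scores[i - 1][1] if i > 0 else inf
        next_s = scores[i + 1][1] if i + 1 < len(scores) else inf
        if s < threshold and s <= prev_s and s < next_s:
            boundaries.append(t)
    return boundaries


# Toy input: 20 frames dominated by one source, then 20 by another.
toy_frames = [[1.0, 0.0]] * 20 + [[0.0, 1.0]] * 20
print(detect_boundaries(toy_frames))  # → [20]
```

On the toy input, coherence stays near 1 within each run and drops to 0 exactly where the dominant source switches. A real system would need feature extraction and a tuned threshold, neither of which is specified here.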
