Speech/music discrimination for multimedia applications

7 November 2002

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

Vol. 4 (ISSN 1520-6149), pp. 2445-2448
https://doi.org/10.1109/icassp.2000.859336

Abstract

Automatic discrimination of speech and music is an important tool in many multimedia applications. Previous work has focused on long-term features such as differential parameters, variances, and time-averages of spectral parameters. Classifiers built on these features estimate them over windows of 0.5-5 seconds and are relatively complex. We present results from combining line spectral frequencies (LSFs) and zero-crossing-based features for frame-level narrowband speech/music discrimination. Classification results for different types of music and speech show that these features discriminate well. Our classification algorithms operate with a frame delay of only 20 ms, making them suitable for real-time multimedia applications.
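
To make the frame-level features concrete, the sketch below computes LSFs and a zero-crossing rate for each 20 ms frame. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the sampling rate, LPC order, window, the exact zero-crossing-based features, or the classifier. The 8 kHz rate, order 10, Hamming window, and all function names here are hypothetical choices, and the plain zero-crossing rate stands in for whatever zero-crossing features the paper actually uses.

```python
import numpy as np

FS = 8000      # assumed narrowband sampling rate in Hz
FRAME = 160    # 20 ms at 8 kHz
ORDER = 10     # assumed LPC order

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.signbit(frame)
    return float(np.mean(signs[1:] != signs[:-1]))

def lpc(frame, order=ORDER):
    """Coefficients of A(z) = 1 + a1 z^-1 + ... + ap z^-p (Levinson-Durbin)."""
    w = frame * np.hamming(len(frame))
    # Biased autocorrelation at lags 0..order
    r = np.array([np.dot(w[:len(w) - k], w[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    if r[0] <= 0.0:          # silent frame: fall back to A(z) = 1
        return a
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                        # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]   # update a1..a(i-1)
        a[i] = k
        err *= 1.0 - k * k                    # prediction error energy
    return a

def lsf(a):
    """Line spectral frequencies in radians, sorted, strictly inside (0, pi)."""
    ext = np.concatenate([a, [0.0]])
    p_poly = ext + ext[::-1]   # symmetric polynomial P(z)
    q_poly = ext - ext[::-1]   # antisymmetric polynomial Q(z)
    roots = np.concatenate([np.roots(p_poly), np.roots(q_poly)])
    angles = np.angle(roots)
    eps = 1e-6                 # drops the trivial roots at z = 1 and z = -1
    return np.sort(angles[(angles > eps) & (angles < np.pi - eps)])

def frame_features(signal):
    """One row per 20 ms frame: ORDER LSFs followed by the zero-crossing rate."""
    n = len(signal) // FRAME
    frames = np.asarray(signal[:n * FRAME], dtype=float).reshape(n, FRAME)
    return np.array([np.append(lsf(lpc(f)), zero_crossing_rate(f)) for f in frames])
```

Each row of `frame_features` depends only on its own 20 ms frame, which is what keeps the decision delay to a single frame; a per-frame classifier (not shown, and not specified in the abstract) would consume these rows.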
