Speech/music discrimination for multimedia applications
- 7 November 2002
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- Vol. 4 (ISSN 1520-6149), 2445-2448
- https://doi.org/10.1109/icassp.2000.859336
Abstract
Automatic discrimination of speech and music is an important tool in many multimedia applications. Previous work has focused on using long-term features such as differential parameters, variances and time-averages of spectral parameters. These classifiers use features estimated over windows of 0.5-5 seconds, and are relatively complex. We present our results of combining the line spectral frequencies (LSFs) and zero crossing-based features for frame-level narrowband speech/music discrimination. Our classification results for different types of music and speech show the good discriminating power of these features. Our classification algorithms operate using only a frame delay of 20 ms, making them suitable for real-time multimedia applications.
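As a rough illustration of the frame-level features the abstract names, the following sketch computes LSFs and a zero-crossing rate for each 20 ms frame. It is not the authors' implementation: the 8 kHz sampling rate, the LPC order of 10, the Hamming window, the non-overlapping framing, and the plain zero-crossing rate are all assumptions chosen for the example, and the function names are hypothetical. The paper's classifier is not reproduced here.

```python
import numpy as np

SAMPLE_RATE = 8000   # assumed narrowband sampling rate in Hz (not given in the abstract)
FRAME_MS = 20        # frame length stated in the abstract
LPC_ORDER = 10       # assumed LPC order (not given in the abstract)


def split_frames(signal, sample_rate=SAMPLE_RATE, frame_ms=FRAME_MS):
    """Cut a 1-D signal into non-overlapping frames, dropping any partial tail."""
    frame_len = sample_rate * frame_ms // 1000
    n_frames = len(signal) // frame_len
    return np.reshape(signal[: n_frames * frame_len], (n_frames, frame_len))


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.signbit(frame)
    return np.mean(signs[1:] != signs[:-1])


def lpc(frame, order=LPC_ORDER):
    """LPC polynomial [1, a1, ..., ap] via autocorrelation and Levinson-Durbin."""
    n = len(frame)
    windowed = frame * np.hamming(n)
    # Autocorrelation at lags 0..order
    r = np.correlate(windowed, windowed, mode="full")[n - 1 : n + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0] + 1e-9  # small floor so a silent frame does not divide by zero
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])) / err
        a[1:i] = a[1:i] + k * a[i - 1 : 0 : -1]
        a[i] = k
        err *= 1.0 - k * k
    return a


def lsf(a):
    """Line spectral frequencies in radians, ascending, from LPC polynomial a."""
    ext = np.concatenate([a, [0.0]])
    p_poly = ext + ext[::-1]  # symmetric polynomial P(z)
    q_poly = ext - ext[::-1]  # antisymmetric polynomial Q(z)
    roots = np.concatenate([np.roots(p_poly), np.roots(q_poly)])
    angles = np.angle(roots)
    # Keep one root per conjugate pair; drop the trivial roots at 0 and pi
    return np.sort(angles[(angles > 1e-6) & (angles < np.pi - 1e-6)])


def frame_features(frame):
    """Per-frame feature vector: LSFs followed by the zero-crossing rate."""
    return np.concatenate([lsf(lpc(frame)), [zero_crossing_rate(frame)]])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    audio = rng.standard_normal(SAMPLE_RATE)  # one second of synthetic noise
    features = np.array([frame_features(f) for f in split_frames(audio)])
    print(features.shape)
```

Each frame is processed independently, so the only look-ahead needed is the frame itself, which is what makes a 20 ms decision delay possible. A classifier would then consume one feature vector per frame.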