Detection of target speakers in audio databases

1 January 1999

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

Vol. 2, pp. 821-824
https://doi.org/10.1109/icassp.1999.759797

Abstract

This paper addresses speaker detection in audio databases. Gaussian mixture models are used to build target speaker and background models, and a detection algorithm based on a likelihood ratio calculation estimates the target speaker segments. Evaluation procedures for this task are defined in detail. Results are given for different subsets of the HUB4 broadcast news database. For one target speaker, with the data restricted to high-quality speech segments, the segment miss rate is approximately 7%; for unrestricted data, it is approximately 27%. In both cases the segment false alarm rate is 4 or 5 per hour. For two target speakers with unrestricted data, the segment miss rate is approximately 63%, with about 27 segment false alarms per hour. The decrease in performance for two target speakers is largely associated with short speech segments in the two-target-speaker test data, which are undetectable in the current configuration of the detection algorithm.
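
The likelihood ratio detection described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the diagonal-covariance form of the mixtures, the sliding-window averaging, and the `window` and `threshold` values are all assumptions made for illustration, and the models here are written out by hand rather than trained.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_loglik(x, gmm):
    """Log-likelihood of x under a GMM given as (weight, mean, var) triples."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def detect_segments(frames, target_gmm, background_gmm, window=3, threshold=0.0):
    """Return (start, end) frame ranges, end exclusive, flagged as target speech.

    Assumed decision rule: the per-frame log-likelihood ratio is averaged
    over a centred window and compared with a fixed threshold.
    """
    llr = [gmm_loglik(x, target_gmm) - gmm_loglik(x, background_gmm)
           for x in frames]
    half = window // 2
    segments, start = [], None
    for i in range(len(llr)):
        lo, hi = max(0, i - half), min(len(llr), i + half + 1)
        score = sum(llr[lo:hi]) / (hi - lo)
        if score > threshold and start is None:
            start = i                      # segment opens
        elif score <= threshold and start is not None:
            segments.append((start, i))    # segment closes
            start = None
    if start is not None:
        segments.append((start, len(llr)))
    return segments

# Toy one-dimensional data: hypothetical single-component models.
target = [(1.0, [2.0], [1.0])]
background = [(1.0, [0.0], [1.0])]
frames = [[0.0]] * 3 + [[2.0]] * 4 + [[0.0]] * 3
print(detect_segments(frames, target, background))  # [(3, 7)]
```

In this sketch, a wider window smooths the score but also makes short target segments harder to detect, which is the kind of trade-off the abstract points to for the two-speaker results.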
