Segmentation of speech using speaker identification

17 December 2002

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

Vol. i (15206149), I/161-I/164
https://doi.org/10.1109/icassp.1994.389330

Abstract

This paper describes techniques for segmentation of conversational speech based on speaker identity. Speaker segmentation is performed using Viterbi decoding on a hidden Markov model network consisting of interconnected speaker sub-networks. Speaker sub-networks are initialized using Baum-Welch training on data labeled by speaker, and are iteratively retrained based on the previous segmentation. If data labeled by speaker is not available, agglomerative clustering is used to approximately segment the conversational speech according to speaker prior to Baum-Welch training. The distance measure for the clustering is a likelihood ratio in which speakers are modeled by Gaussian distributions. The distance between merged segments is recomputed at each stage of the clustering, and a duration model is used to bias the likelihood ratio. Segmentation accuracy with agglomerative clustering initialization matches the accuracy obtained with initialization from speaker-labeled data.
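The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes one common form of a Gaussian likelihood ratio distance (each segment modeled by a single full-covariance Gaussian fitted by maximum likelihood), it assumes the number of speakers is known, and it omits the duration-model bias mentioned in the abstract. The names `glr_distance` and `cluster_segments` are hypothetical, and NumPy is an assumed dependency.

```python
import numpy as np


def _n_logdet(frames, ridge=1e-6):
    """Return n * log|Sigma| for the ML covariance of `frames` (shape n x d).

    A small ridge (an assumption of this sketch) keeps the covariance
    invertible for short segments.
    """
    n, d = frames.shape
    cov = np.atleast_2d(np.cov(frames, rowvar=False, bias=True)) + ridge * np.eye(d)
    _, logdet = np.linalg.slogdet(cov)
    return n * logdet


def glr_distance(x, y):
    """Log likelihood ratio between 'two Gaussians' and 'one shared Gaussian'.

    With ML estimates the mean and constant terms cancel, leaving
    0.5 * (n_z log|S_z| - n_x log|S_x| - n_y log|S_y|), where z is x and y pooled.
    Larger values suggest that x and y come from different speakers.
    """
    z = np.vstack([x, y])
    return 0.5 * (_n_logdet(z) - _n_logdet(x) - _n_logdet(y))


def cluster_segments(segments, n_speakers):
    """Greedy agglomerative clustering of feature segments.

    At every stage the closest pair of clusters is merged and all distances
    are recomputed from the pooled frames of the merged cluster.
    Returns one integer cluster label per input segment.
    """
    # Each cluster holds its pooled frames and the indices of its member segments.
    clusters = [(seg, [i]) for i, seg in enumerate(segments)]
    while len(clusters) > n_speakers:
        _, i, j = min(
            (glr_distance(a[0], b[0]), i, j)
            for i, a in enumerate(clusters)
            for j, b in enumerate(clusters)
            if i < j
        )
        merged = (
            np.vstack([clusters[i][0], clusters[j][0]]),
            clusters[i][1] + clusters[j][1],
        )
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]

    labels = np.empty(len(segments), dtype=int)
    for label, (_, members) in enumerate(clusters):
        labels[members] = label
    return labels
```

Calling `cluster_segments(segments, n_speakers=2)` on a list of per-segment feature matrices (frames by dimensions) returns a rough speaker label for each segment. In the pipeline the abstract describes, such labels would then seed Baum-Welch training of the speaker sub-networks before Viterbi resegmentation.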
