Text-independent speaker identification using automatic acoustic segmentation

Abstract
An acoustic-class-dependent technique for text-independent speaker identification on very short utterances is described. The technique is based on maximum-likelihood estimation of a Gaussian mixture model (GMM) representation of speaker identity. Gaussian mixtures are noted for their robustness as a parametric model and for their ability to form smooth estimates of fairly arbitrary underlying densities. Speaker model parameters are estimated using a special case of the iterative expectation-maximization (EM) algorithm, and a number of techniques for improving model robustness are investigated. The system is evaluated on a 12-speaker reference population drawn from a conversational speech database, and it achieves an average text-independent speaker identification accuracy of 80% for 1-s test utterances.
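The GMM approach summarized above can be illustrated with a minimal sketch: each speaker is modeled by a mixture density p(x | speaker) = sum_k w_k N(x; mu_k, Sigma_k), fitted to that speaker's feature frames with EM, and a test utterance is assigned to the speaker whose model gives the highest average log-likelihood over its frames. Several details here are assumptions, not the paper's confirmed method: diagonal covariances, initialization from randomly chosen frames, and a variance floor (`var_floor`) as one example of a robustness measure. Feature extraction and the acoustic-class segmentation are not modeled. The function names `train_gmm`, `avg_log_likelihood`, and `identify` are hypothetical.

```python
import math
import random


def _log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def _logsumexp(vals):
    """Numerically stable log(sum(exp(vals)))."""
    top = max(vals)
    return top + math.log(sum(math.exp(v - top) for v in vals))


def train_gmm(frames, n_comp, n_iter=20, var_floor=1e-3, seed=0):
    """Fit a diagonal-covariance GMM to feature frames with EM.

    var_floor (hypothetical) keeps variances from collapsing toward zero.
    """
    rng = random.Random(seed)
    dim = len(frames[0])
    # Initialize: means from random frames, unit variances, uniform weights.
    means = [list(f) for f in rng.sample(frames, n_comp)]
    variances = [[1.0] * dim for _ in range(n_comp)]
    weights = [1.0 / n_comp] * n_comp
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each frame.
        post = []
        for x in frames:
            logs = [math.log(w) + _log_gauss_diag(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            norm = _logsumexp(logs)
            post.append([math.exp(lg - norm) for lg in logs])
        # M-step: re-estimate weights, means and (floored) variances.
        for k in range(n_comp):
            nk = sum(p[k] for p in post) + 1e-10  # avoid a zero-weight component
            weights[k] = nk / len(frames)
            means[k] = [sum(p[k] * x[d] for p, x in zip(post, frames)) / nk
                        for d in range(dim)]
            variances[k] = [
                max(var_floor,
                    sum(p[k] * (x[d] - means[k][d]) ** 2
                        for p, x in zip(post, frames)) / nk)
                for d in range(dim)
            ]
    return weights, means, variances


def avg_log_likelihood(frames, model):
    """Average per-frame log-likelihood of the frames under one GMM."""
    weights, means, variances = model
    total = sum(
        _logsumexp([math.log(w) + _log_gauss_diag(x, m, v)
                    for w, m, v in zip(weights, means, variances)])
        for x in frames
    )
    return total / len(frames)


def identify(frames, models):
    """Return the speaker label whose GMM best explains the test frames."""
    return max(models, key=lambda spk: avg_log_likelihood(frames, models[spk]))


# Usage on synthetic 2-D "features" for two well-separated speakers.
rng = random.Random(1)
spk_a = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
spk_b = [[rng.gauss(5, 1), rng.gauss(5, 1)] for _ in range(200)]
models = {"A": train_gmm(spk_a, 2), "B": train_gmm(spk_b, 2)}
test = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
print(identify(test, models))  # A
```

Averaging the log-likelihood over frames, rather than summing, keeps scores comparable across utterances of different lengths; for a fixed test utterance both choices select the same speaker.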
