An approach to text-independent speaker recognition with short utterances

24 March 2005

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

Vol. 8, 555-558
https://doi.org/10.1109/icassp.1983.1172258

Abstract

A new technique for text-independent speaker recognition is proposed which uses a statistical model of the speaker's vector quantized speech. The technique retains text-independent properties while allowing considerably shorter test utterances than comparable speaker recognition systems. The frequently occurring vectors or characters form a model of multiple points in the n-dimensional speech space instead of the usual single-point models. Speaker recognition depends on the statistical distribution of the distances between the speech frames from the unknown speaker and the closest points in the model. Models were generated with 100 seconds of conversational training speech for each of 11 male speakers. The system was able to identify 11 speakers with 96%, 87%, and 79% accuracy from sections of unknown speech of durations of 10, 5, and 3 seconds, respectively. Accurate recognition was also obtained even when there were variations in the channels over which the training and testing data were obtained. A real-time demonstration system has been implemented, including both training and recognition processes.
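
The matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each speaker model is already available as a small set of points, uses Euclidean distance, and summarizes the distance distribution by its mean; the abstract does not specify any of these choices. The function names and the toy 2-dimensional data are hypothetical.

```python
import math

def nearest_distance(frame, model):
    """Distance from one speech frame to the closest point in a speaker model.

    Euclidean distance is an assumption; the abstract does not name the measure.
    """
    return min(math.dist(frame, point) for point in model)

def score(frames, model):
    """Mean nearest-point distance over all test frames (lower means a closer match).

    The mean is a stand-in for the statistical treatment of the distance
    distribution mentioned in the abstract.
    """
    return sum(nearest_distance(f, model) for f in frames) / len(frames)

def identify(frames, models):
    """Return the name of the speaker whose multi-point model best fits the frames."""
    return min(models, key=lambda name: score(frames, models[name]))

# Toy 2-dimensional example; a real system would use spectral feature vectors.
models = {
    "speaker_a": [(0.0, 0.0), (1.0, 1.0)],
    "speaker_b": [(5.0, 5.0), (6.0, 4.0)],
}
test_frames = [(0.1, -0.1), (0.9, 1.2), (0.2, 0.1)]
best = identify(test_frames, models)
```

Because every frame is compared only with its nearest model point, the score does not depend on the order of the frames or on which sounds were spoken, which is one way to read the text-independent property claimed above.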
