The Meta-Pi network: connectionist rapid adaptation for high-performance multi-speaker phoneme recognition

4 December 2002

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

p. 165-168 vol.1
https://doi.org/10.1109/icassp.1990.115564

Abstract

A multinetwork time-delay-neural-network (TDNN)-based connectionist architecture that allows multispeaker phoneme discrimination (/b,d,g/) to be performed at the speaker-dependent recognition rate of 98.4% is presented. The overall network gates the phonemic decisions of modules trained on individual speakers to form its overall classification decision. By dynamically adapting to the input speech and focusing on a combination of speaker-specific modules, the network outperforms a single TDNN trained on the speech of all six speakers (95.9%). To train this network, a form of multiplicative connection called the Meta-Pi connection is developed. It is illustrated how the Meta-Pi paradigm implements a dynamically adaptive Bayesian MAP classifier. It learns, without supervision, to recognize the speech of one particular speaker (99.8%) using a dynamic combination of internal models of other speakers exclusively. The Meta-Pi model is a viable basis for a connectionist speech recognition system that can rapidly adapt to new speakers and varying speaker dialects.
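
The gating idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the softmax normalization of the gate scores, the function names `softmax` and `gated_decision`, and all numeric values are assumptions made for the example. Each speaker-specific module produces activations for /b,d,g/, and an input-dependent gate weights those outputs before the final decision is taken.

```python
import math

def softmax(scores):
    """Normalize raw gate scores into non-negative weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gated_decision(module_outputs, gate_scores, labels=("b", "d", "g")):
    """Combine per-speaker module outputs with input-dependent gate weights.

    module_outputs: one list of class activations per speaker module.
    gate_scores: one raw score per module (hypothetical gating-network output).
    """
    weights = softmax(gate_scores)
    # Weighted sum over modules, computed separately for each phoneme class.
    combined = [
        sum(w * out[c] for w, out in zip(weights, module_outputs))
        for c in range(len(labels))
    ]
    best = max(range(len(labels)), key=lambda c: combined[c])
    return labels[best], combined, weights

# Illustrative numbers only, not data from the paper.
module_outputs = [
    [0.9, 0.05, 0.05],  # module trained on speaker 1
    [0.2, 0.7, 0.1],    # module trained on speaker 2
    [0.3, 0.3, 0.4],    # module trained on speaker 3
]
gate_scores = [2.0, 0.0, 0.0]  # gate favors the first module for this input
label, combined, weights = gated_decision(module_outputs, gate_scores)
print(label, combined, weights)
```

If the gate weights are read as estimates of how likely each speaker model is given the input, and each module output as a class posterior under that speaker model, the weighted sum is a mixture of posteriors and the argmax is a MAP-style decision. That reading is one plausible interpretation of the abstract's Bayesian claim, stated here as an assumption.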

This publication has 3 references indexed in Scilit:

Consonant recognition by modular construction of large phonemic time-delay neural networks
Published by Institute of Electrical and Electronics Engineers (IEEE), 2003
Phoneme recognition using time-delay neural networks
IEEE Transactions on Acoustics, Speech, and Signal Processing, 1989
Learning representations by back-propagating errors
Nature, 1986