Phoneme recognition: neural networks vs. hidden Markov models

6 January 2003

Conference paper
Published by the Institute of Electrical and Electronics Engineers (IEEE)

Vol. 1, pp. 107-110
https://doi.org/10.1109/icassp.1988.196523

Abstract

A time-delay neural network (TDNN) for phoneme recognition is discussed. Because it uses two hidden layers in addition to its input and output layers, it can represent complex nonlinear decision surfaces. Three important properties of the TDNN have been observed. First, it was able to develop meaningful linguistic abstractions in time and frequency, such as formant tracking and segmentation, without human intervention. Second, it learned to form alternative representations that link different acoustic events to the same higher-level concept. In this way it can implement trading relations between lower-level acoustic events, which leads to robust recognition despite considerable variability in the input speech. Third, the network is translation-invariant and does not rely on precise alignment or segmentation of the input. The TDNN's performance is compared with that of the best hidden Markov models (HMMs) on a speaker-dependent phoneme-recognition task. The TDNN achieved a recognition rate of 98.5%, compared to 93.7% for the HMM (1.5% vs. 6.3% error), i.e., a fourfold reduction in error.
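The time-delay idea behind the translation-invariance claim can be sketched in a few lines of Python. This is a minimal, untrained sketch, not the network described in the paper: the frame size (16 coefficients), the layer widths (8 and 3 units), the delays (2 and 4 frames), the random weights, and the choice to sum the top layer's activations over time are all illustrative assumptions, and `TimeDelayLayer`, `tdnn_scores` and `silence` are hypothetical names. Each hidden unit sees a short window of consecutive frames and reuses the same weights at every time step.

```python
import math
import random
from typing import List

Frame = List[float]       # one spectral frame (a vector of coefficients)
Sequence = List[Frame]    # frames ordered in time


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class TimeDelayLayer:
    """Each output frame is computed from `delay + 1` consecutive input frames.

    The same weights are reused at every time step (weight sharing), so a
    pattern that appears later in the input produces the same activations,
    just later in the output.
    """

    def __init__(self, n_in: int, n_out: int, delay: int, seed: int = 0):
        rng = random.Random(seed)
        self.delay = delay
        # weights[j][d][i]: coefficient i of the frame at offset d, feeding unit j
        # (random, untrained values: this sketch only illustrates the structure)
        self.weights = [[[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                         for _ in range(delay + 1)]
                        for _ in range(n_out)]
        self.bias = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]

    def forward(self, x: Sequence) -> Sequence:
        out = []
        # Slide the window over every start time where it fits completely.
        for t in range(len(x) - self.delay):
            frame = []
            for w_j, b_j in zip(self.weights, self.bias):
                z = b_j
                for d, w_jd in enumerate(w_j):
                    z += sum(w * v for w, v in zip(w_jd, x[t + d]))
                frame.append(sigmoid(z))
            out.append(frame)
        return out


def tdnn_scores(x: Sequence, layers: List[TimeDelayLayer]) -> List[float]:
    """Run the stacked layers, then sum each unit's activation over time."""
    h = x
    for layer in layers:
        h = layer.forward(h)
    return [sum(frame[j] for frame in h) for j in range(len(h[0]))]


def silence(n_frames: int, n_coeffs: int) -> Sequence:
    """All-zero frames used as padding around a token (hypothetical helper)."""
    return [[0.0] * n_coeffs for _ in range(n_frames)]


if __name__ == "__main__":
    rng = random.Random(42)
    token = [[rng.uniform(-1.0, 1.0) for _ in range(16)] for _ in range(15)]
    layers = [TimeDelayLayer(16, 8, delay=2, seed=1),   # hidden layer 1
              TimeDelayLayer(8, 3, delay=4, seed=2)]    # hidden layer 2

    # The same token placed at two different positions inside 31 frames of input.
    early = silence(6, 16) + token + silence(10, 16)
    late = silence(10, 16) + token + silence(6, 16)

    s_early = tdnn_scores(early, layers)
    s_late = tdnn_scores(late, layers)
    print(all(math.isclose(a, b) for a, b in zip(s_early, s_late)))  # True
```

Because the weights are shared across time, shifting the token by four frames only reorders the intermediate activations, so the summed scores agree up to floating-point rounding. In this sketch the agreement holds only while the token stays at least 6 frames (the sum of the two delays) away from either edge of the sequence; closer to the edges, windows that straddle the boundary change the result.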
