Abstract
The past decade has witnessed substantial progress toward the goal of constructing a machine capable of understanding colloquial discourse. Central to this progress has been the development and application of mathematical methods that permit modeling the speech signal as a complex code with several coexisting levels of structure. The most successful of these methods are "template matching," stochastic modeling, and probabilistic parsing. The manifestation of common themes such as dynamic programming and finite-state descriptions accentuates a superficial likeness among the methods that is often mistaken for the deeper similarity arising from their shared Bayesian foundation. In this paper, we outline the mathematical bases of these methods: invariant metrics, hidden Markov chains, and formal grammars, respectively. We then recount and briefly interpret the results of experiments in speech recognition to which the various methods were applied. Since these mathematical principles seem to bear little resemblance to traditional linguistic characterizations of speech, the success of the experiments is occasionally attributed, even by their authors, merely to excellent engineering. We conclude by speculating that, quite to the contrary, these methods actually constitute a powerful theory of speech that can be reconciled with and elucidate conventional linguistic theories while being used to build truly competent mechanical speech recognizers.
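
To make the shared dynamic-programming theme concrete, the sketch below illustrates "template matching" via a dynamic time warping recurrence. It is a minimal illustration, not the paper's formulation: the frame dimensionality, the Euclidean local distance (standing in for an invariant metric), and the function name `dtw_distance` are assumptions chosen for clarity.

```python
# Illustrative sketch of dynamic time warping (DTW), the dynamic-programming
# recurrence underlying "template matching". The Euclidean frame distance and
# feature dimensions here are assumptions, not the paper's exact setup.
import numpy as np

def dtw_distance(template: np.ndarray, utterance: np.ndarray) -> float:
    """Accumulated warping distance between two sequences of feature frames.

    template:  (T, d) array of reference frames
    utterance: (U, d) array of input frames
    """
    T, U = len(template), len(utterance)
    D = np.full((T + 1, U + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, T + 1):
        for j in range(1, U + 1):
            # Local distance between a template frame and an input frame.
            cost = np.linalg.norm(template[i - 1] - utterance[j - 1])
            # Bellman recurrence over the three permitted warping moves.
            D[i, j] = cost + min(D[i - 1, j],      # skip a template frame
                                 D[i, j - 1],      # skip an input frame
                                 D[i - 1, j - 1])  # align the two frames
    return float(D[T, U])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.normal(size=(20, 12))  # e.g., 12 cepstral coefficients per frame
    hyp = rng.normal(size=(25, 12))
    print(dtw_distance(ref, hyp))
```

The same recurrence shape, a minimization (or maximization) over a small set of local predecessors at each lattice point, reappears in Viterbi decoding of hidden Markov chains, which is precisely the superficial likeness among the methods that the abstract notes.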
