Properties of large lexicons: Implications for advanced isolated word recognition systems

24 March 2005

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

Vol. 7, pp. 546-549
https://doi.org/10.1109/icassp.1982.1171902

Abstract

As part of our goal to design large-vocabulary, phonetically based isolated word recognition systems, we investigated the statistical properties and constraints of the phonemic structure of English words. Our database consisted of five lexicons ranging in size from 1,250 to 20,000 words. In addition to a phonemic transcription for each word, the lexicons included the word's frequency of occurrence as determined from the Brown Corpus. We studied the distributions of the phonemes, both individually and by class, within the lexicon and within the corpus. Distributions of consonant clusters were also obtained. Finally, we investigated the distribution of words in terms of patterns derived from a broad categorization of the phonemes. This paper summarizes the results of these studies and discusses their implications for phonetically based isolated word recognition strategies.
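The abstract distinguishes statistics taken "within the lexicon" (each entry counted once) from statistics taken "within the corpus" (weighted by word frequency), and mentions grouping words by broad-class patterns. The following is a minimal Python sketch of those three kinds of tallies, not the authors' procedure. It assumes a hypothetical toy lexicon with invented ARPAbet-like transcriptions and invented frequencies (not Brown Corpus counts), and a hypothetical six-way broad phonetic classification (`V`, `ST`, `N`, `SF`, `WF`, `LG`); the paper's actual phoneme inventory and class definitions are not reproduced here.

```python
from collections import Counter, defaultdict

# Hypothetical toy lexicon: word -> (phonemic transcription, corpus frequency).
# Symbols are ARPAbet-like; the frequencies are invented, NOT Brown Corpus counts.
LEXICON = {
    "cat": (["k", "ae", "t"], 23),
    "bat": (["b", "ae", "t"], 18),
    "mat": (["m", "ae", "t"], 10),
    "sit": (["s", "ih", "t"], 45),
    "stop": (["s", "t", "aa", "p"], 30),
    "man": (["m", "ae", "n"], 120),
}

# Hypothetical six-way broad classification (an assumption for illustration):
# vowels, stops, nasals, strong fricatives, weak fricatives, liquids/glides.
BROAD_CLASS = {
    **{p: "V" for p in ("aa", "ae", "ah", "eh", "ih", "iy", "uw")},
    **{p: "ST" for p in ("p", "t", "k", "b", "d", "g")},
    **{p: "N" for p in ("m", "n", "ng")},
    **{p: "SF" for p in ("s", "z", "sh", "zh")},
    **{p: "WF" for p in ("f", "v", "th", "dh", "hh")},
    **{p: "LG" for p in ("l", "r", "w", "y")},
}


def phoneme_distributions(lexicon):
    """Tally phoneme occurrences over lexicon entries (unweighted)
    and over the corpus (each occurrence weighted by word frequency)."""
    in_lexicon, in_corpus = Counter(), Counter()
    for phones, freq in lexicon.values():
        for p in phones:
            in_lexicon[p] += 1
            in_corpus[p] += freq
    return in_lexicon, in_corpus


def initial_clusters(lexicon):
    """Count word-initial consonant clusters: two or more consonants before the first vowel."""
    clusters = Counter()
    for phones, _ in lexicon.values():
        onset = []
        for p in phones:
            if BROAD_CLASS[p] == "V":
                break
            onset.append(p)
        if len(onset) >= 2:
            clusters[tuple(onset)] += 1
    return clusters


def broad_pattern(phones):
    """Map a phonemic transcription to its sequence of broad-class labels."""
    return tuple(BROAD_CLASS[p] for p in phones)


def equivalence_classes(lexicon):
    """Group words that share the same broad-class pattern."""
    classes = defaultdict(list)
    for word, (phones, _) in lexicon.items():
        classes[broad_pattern(phones)].append(word)
    return dict(classes)


if __name__ == "__main__":
    lex, corp = phoneme_distributions(LEXICON)
    print(lex["ae"], corp["ae"])  # 4 171
    print(initial_clusters(LEXICON))  # Counter({('s', 't'): 1})
    classes = equivalence_classes(LEXICON)
    print(len(classes), max(len(words) for words in classes.values()))  # 5 2
```

In this toy data, "cat" and "bat" share the pattern `ST V ST` and so fall into the same equivalence class, while the other words are unique. The size of the class a word falls into indicates how many candidates would remain if only its broad-class pattern were known, which is the kind of constraint the abstract alludes to; the paper's actual class sizes are not reproduced here.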
