Very large vocabulary isolated utterance recognition: a comparison between one pass and two pass strategies

Abstract

A two-pass system for recognizing isolated utterances drawn from a very large vocabulary is presented. The first pass, hypothesization, selects a subset of word candidates starting from a segmentation of the speech signal into six broad phonetic classes. This module is implemented as a dynamic programming algorithm working in a three-dimensional space, with the search performed on a tree that represents a coarse description of the lexicon. The second pass searches for the best N candidates according to a maximum-likelihood criterion. Each word candidate is represented by a graph of subword hidden Markov models, and a tree structure covering the whole word subset is built online so that the Viterbi algorithm can be implemented efficiently. A comparison with a direct (one-pass) approach that omits the hypothesization module shows that the two-pass approach achieves the same performance with an 80% reduction in computational complexity.
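The two-pass idea can be illustrated with a minimal sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the class labels (P, V, N, F), the toy lexicon, the function names, and the use of a plain two-dimensional edit distance in place of the three-dimensional dynamic programming described in the abstract. The second pass is reduced to reranking with a caller-supplied score that stands in for an HMM Viterbi log-likelihood.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of the coarse lexicon tree (hypothetical structure)."""
    children: dict = field(default_factory=dict)  # class label -> Node
    words: list = field(default_factory=list)     # words ending at this node


def build_tree(lexicon):
    """Build a prefix tree from a {word: broad-class string} mapping."""
    root = Node()
    for word, classes in lexicon.items():
        node = root
        for label in classes:
            node = node.children.setdefault(label, Node())
        node.words.append(word)
    return root


def preselect(root, observed, max_cost):
    """First pass: return {word: cost} for every word whose coarse
    description is within max_cost edits of the observed class string.
    One edit-distance row is computed per tree node, so words sharing
    a prefix share the work."""
    hits = {}

    def visit(node, row):
        if node.words and row[-1] <= max_cost:
            for word in node.words:
                hits[word] = row[-1]
        for label, child in node.children.items():
            new_row = [row[0] + 1]
            for j in range(1, len(observed) + 1):
                substitute = row[j - 1] + (observed[j - 1] != label)
                new_row.append(min(substitute, row[j] + 1, new_row[j - 1] + 1))
            # Prune: no extension of this branch can get below min(new_row).
            if min(new_row) <= max_cost:
                visit(child, new_row)

    visit(root, list(range(len(observed) + 1)))
    return hits


def two_pass(root, observed, detailed_score, max_cost=1, n_best=2):
    """Second pass: rerank the preselected words with a detailed score
    (higher is better) and keep the n_best."""
    candidates = preselect(root, observed, max_cost)
    return sorted(candidates, key=detailed_score, reverse=True)[:n_best]
```

With a toy lexicon such as `{"cat": "PVP", "cap": "PVP", "map": "NVP", "man": "NVN"}` and the observed string `"PVP"`, the first pass keeps only the words within one edit, so the detailed scorer is called on three words rather than four. The saving in this sketch comes from the same place as in the abstract: the expensive second pass runs on a reduced word subset.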
