Abstract
An investigation of speech recognition and language processing is described. The speech recognition part consists of large phonemic time-delay neural networks (TDNNs), which can automatically spot all 24 Japanese phonemes by simply scanning the input speech. The language processing part is a predictive LR parser, which predicts subsequent phonemes based on the currently proposed phonemes. The combined TDNN-LR system provides large-vocabulary continuous speech recognition. Recognition experiments on ATR's conference registration task were performed using the TDNN-LR method. Speaker-dependent phrase recognition rates of 65.1% for the first choice and 88.8% within the top five choices were attained. The efficiency of adaptive incremental training using a small number of training tokens extracted from continuous speech was also confirmed for the TDNN-LR system.
