On duration and smoothing rules in a demisyllable-based isolated-word recognition system

Abstract
In a recently proposed approach to isolated‐word recognition, word reference templates are constructed from a universal set of demisyllable units by concatenating the appropriate demisyllables for each vocabulary item. A dynamic time warping (DTW) algorithm is used to align test and reference patterns optimally. Nevertheless, some form of syllable duration preadjustment is necessary because of the large potential difference in duration between isolated and in‐context syllables. We have found that a simple rule that reduces the length of rhyme (final) demisyllables in nonword‐final stressed syllables to approximately half their isolated‐syllable duration provides recognition accuracy as high as that attained through use of complex, highly context‐sensitive rules. In addition to its practical application, this result can be regarded as a further demonstration of the power of DTW. We have also investigated the requirements for parameter smoothing at demisyllable boundaries. We find that the optimal window duration for smoothing is about 60–90 ms, but that failure to smooth reduces recognition accuracy by only about 2% on a 1109‐word test set; that linear and parabolic smoothing are equally effective; and that restricting smoothing to certain phonetic contexts does not appear to improve recognition accuracy. Taken together, these results can be viewed as confirming the suitability of the demisyllable as the basic unit in recognition systems.
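The duration rule and the boundary smoothing described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 10 ms frame step, the tuple layout of each unit, the use of linear resampling to shorten a demisyllable, and the reading of "linear smoothing" as straight-line interpolation across the window are all assumptions made for illustration.

```python
def resample_frames(frames, new_len):
    """Linearly resample a list of feature vectors to new_len frames (assumed method)."""
    n = len(frames)
    if new_len == 1:
        return [list(frames[n // 2])]
    if n == 1:
        return [list(frames[0]) for _ in range(new_len)]
    out = []
    for i in range(new_len):
        pos = i * (n - 1) / (new_len - 1)   # fractional index into the source frames
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        w = pos - lo
        out.append([(1 - w) * a + w * b for a, b in zip(frames[lo], frames[hi])])
    return out


def build_template(units, frame_ms=10.0, window_ms=60.0, shorten=0.5):
    """Concatenate demisyllables into a word template.

    units: list of (frames, is_rhyme, stressed, word_final) tuples (hypothetical layout).
    Rhyme demisyllables of stressed, nonword-final syllables are shortened to
    about `shorten` of their isolated duration; each boundary is then smoothed
    by linear interpolation over a window of `window_ms`.
    """
    template, boundaries = [], []
    for frames, is_rhyme, stressed, word_final in units:
        if is_rhyme and stressed and not word_final:
            frames = resample_frames(frames, max(1, round(len(frames) * shorten)))
        if template:
            boundaries.append(len(template))  # index of first frame of the new unit
        template.extend(list(f) for f in frames)

    half = int(window_ms / frame_ms) // 2      # frames on each side of a boundary
    for b in boundaries:
        lo = max(0, b - half)
        hi = min(len(template) - 1, b + half - 1)
        if hi <= lo:
            continue                           # window too short to smooth
        start, end = list(template[lo]), list(template[hi])
        for i in range(lo, hi + 1):
            w = (i - lo) / (hi - lo)
            template[i] = [(1 - w) * s + w * e for s, e in zip(start, end)]
    return template
```

With a 10 ms frame step, a 60 ms window covers three frames on each side of a boundary; setting `window_ms=0` disables smoothing, which corresponds to the unsmoothed condition that the abstract reports as costing only about 2% in accuracy.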
