Use of syllable-scale timing to discriminate words

Abstract
Assuming that an automatic speech segmenter can achieve only primitive, imperfect phonetic labeling of speech, words from several small vocabularies were identified using discriminant analysis, in which the locations of prominent acoustic boundaries were combined in an optimal linear fashion. A set of two-syllable words was selected that shared the same spelling in a weak alphabet (using categories such as stop closure, fricative, and vowel) but differed in features such as which syllable was stressed, the tenseness of the vowel, and the identity of particular segments. Discriminant analysis on the vector of six segmental boundaries achieved recognition accuracy 6.3 times better than chance. The words were produced by six talkers at two speaking tempos and measured by hand from sound spectrograms. Testing was performed on productions different from those used in training. When more confusable words (sharing the same stress pattern) were employed, performance was still five times better than chance. In a third experiment, the first two word sets were combined, and a subset of four variables common to both sets was used for identification. Even with this reduced variable set, recognition based on temporal information still discriminated the words reasonably well. The results suggest that considerable information is available in segmental timing for identifying words even when the phonetic labeling of speech is weak.
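The general approach described above can be illustrated with a minimal sketch. The abstract does not specify the exact discriminant procedure, so the code below assumes a standard linear discriminant with equal priors and a pooled within-class covariance. All data are synthetic: the four-word vocabulary, the segment durations, the tempo range, and the noise level are hypothetical values chosen only for illustration and are not the study's measurements. Each token is a vector of six boundary times, training and test tokens are separate productions, and accuracy is expressed as a multiple of chance (1 divided by the number of words).

```python
import numpy as np


def fit_lda(X, y):
    """Fit a linear discriminant using a pooled within-class covariance."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # Subtract each token's own class mean, then pool the scatter.
    centered = X - means[np.searchsorted(classes, y)]
    cov = centered.T @ centered / (len(X) - len(classes))
    return classes, means, np.linalg.pinv(cov)


def predict_lda(model, X):
    """Assign each token to the class with the highest linear score."""
    classes, means, cov_inv = model
    proj = means @ cov_inv  # one weight vector per class
    # Equal-prior linear score: x' S^-1 m - 0.5 * m' S^-1 m
    scores = X @ proj.T - 0.5 * np.sum(proj * means, axis=1)
    return classes[np.argmax(scores, axis=1)]


def times_better_than_chance(accuracy, n_words):
    """Express accuracy as a multiple of chance, where chance = 1 / n_words."""
    return accuracy * n_words


def make_tokens(rng, segment_durations, n_tokens):
    """Synthetic tokens: tempo-scaled, noisy durations turned into boundary times."""
    tempo = rng.uniform(0.8, 1.2, size=(n_tokens, 1))      # hypothetical tempo range
    noise = rng.normal(0.0, 10.0, size=(n_tokens, len(segment_durations)))
    return np.cumsum(tempo * segment_durations + noise, axis=1)


rng = np.random.default_rng(0)
n_words, n_boundaries, n_tokens = 4, 6, 24                  # hypothetical sizes
# Hypothetical mean segment durations (ms) for each word.
word_durations = rng.uniform(50.0, 200.0, size=(n_words, n_boundaries))

X_train = np.vstack([make_tokens(rng, d, n_tokens) for d in word_durations])
X_test = np.vstack([make_tokens(rng, d, n_tokens) for d in word_durations])
y_train = np.repeat(np.arange(n_words), n_tokens)
y_test = np.repeat(np.arange(n_words), n_tokens)

model = fit_lda(X_train, y_train)
predictions = predict_lda(model, X_test)
accuracy = float(np.mean(predictions == y_test))
ratio = times_better_than_chance(accuracy, n_words)
print(f"accuracy = {accuracy:.2f}, {ratio:.1f} times chance")
```

Pooling the covariance across words lets the classifier discount variation that all words share, such as overall tempo, which is one plausible reason a linear combination of boundary times can separate words even when individual boundaries overlap.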
