Segmentation in isolated word recognition using vector quantization

Abstract
Two types of isolated digit recognition systems based on vector quantization were tested in a speaker-independent task. In both types of systems, a digit was modelled as a sequence of codebooks generated from segments of training data. In systems of the first type, the training and unknown utterances were simply partitioned into 1, 2 or 3 equal-length segments. Recognition involved computing the distortion incurred when the input spectra were vector quantized using each digit's codebook sequence. These systems are closely related to recognizers proposed by Burton et al. [1]. In systems of the second type, training segments corresponded to acoustic-phonetic units and were obtained from hand-marked data. Recognition involved generating a minimum-distortion segmentation of the unknown utterance by dynamic programming. Accuracies approaching 96-97% were achieved by both types of systems.
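The two scoring schemes described above can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: squared Euclidean distance stands in for the spectral distortion measure, scores are normalized as average distortion per frame, codebook training (e.g. by clustering) is omitted, and the function names `equal_segment_distortion`, `dp_segment_distortion` and `recognize` are hypothetical. The dynamic-programming variant assumes each codebook covers one contiguous, non-empty run of frames, in order.

```python
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]
Codebook = List[Vector]  # one codebook per segment of a digit model


def sq_dist(a: Vector, b: Vector) -> float:
    # Assumed distortion measure: squared Euclidean distance.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vq_distortion(frame: Vector, codebook: Codebook) -> float:
    # Distortion of quantizing one frame to its nearest codeword.
    return min(sq_dist(frame, c) for c in codebook)


def equal_segment_distortion(frames: List[Vector], codebooks: List[Codebook]) -> float:
    """First type: split the input into len(codebooks) near-equal segments
    and quantize segment k with codebook k."""
    n, k = len(frames), len(codebooks)
    if n < k:
        raise ValueError("need at least one frame per segment")
    total = 0.0
    for i, frame in enumerate(frames):
        seg = i * k // n  # index of the equal-length segment containing frame i
        total += vq_distortion(frame, codebooks[seg])
    return total / n


def dp_segment_distortion(frames: List[Vector], codebooks: List[Codebook]) -> float:
    """Second type: find the minimum-distortion segmentation by dynamic
    programming, with codebooks used in order over contiguous frame runs."""
    n, k = len(frames), len(codebooks)
    if n < k:
        raise ValueError("need at least one frame per segment")
    inf = float("inf")
    # d[i][j]: distortion of frame i under codebook j
    d = [[vq_distortion(f, cb) for cb in codebooks] for f in frames]
    # D[i][j]: least total distortion of frames 0..i with frame i in segment j
    D = [[inf] * k for _ in range(n)]
    D[0][0] = d[0][0]
    for i in range(1, n):
        for j in range(k):
            stay = D[i - 1][j]                            # remain in segment j
            advance = D[i - 1][j - 1] if j > 0 else inf   # enter segment j
            best = min(stay, advance)
            if best < inf:
                D[i][j] = best + d[i][j]
    return D[n - 1][k - 1] / n


def recognize(frames: List[Vector],
              models: Dict[str, List[Codebook]],
              scorer: Callable[[List[Vector], List[Codebook]], float]) -> str:
    # Pick the digit model whose codebook sequence gives the lowest distortion.
    return min(models, key=lambda digit: scorer(frames, models[digit]))
```

A toy usage example with hypothetical one-dimensional "spectra" and two-segment models shows why the segmentation matters: when the acoustic boundary falls early, equal-length splitting charges some frames to the wrong codebook, while the dynamic-programming search moves the boundary to where it fits.

```python
models = {
    "one": [[(0.0,), (1.0,)], [(10.0,), (11.0,)]],
    "two": [[(10.0,), (11.0,)], [(0.0,), (1.0,)]],
}
skewed = [(0.0,), (10.0,), (10.0,), (10.0,)]
equal_segment_distortion(skewed, models["one"])  # 20.25
dp_segment_distortion(skewed, models["one"])     # 0.0
recognize(skewed, models, dp_segment_distortion) # "one"
```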
