Abstract
Some form of time normalization is necessary for reliable word recognition. Recently, attention has been given to nonlinear time warping using dynamic programming. Implementing full dynamic programming time warping in real time, however, requires substantial computational power. This paper describes a real-time speech recognizer that employs linear time normalization followed by a nonlinear time warp between the linearly normalized input and reference samples. A symmetrical time warp function with a maximum deviation of 12.5% from the main diagonal is used. This combination of linear and nonlinear time warping was compared with linear time normalization using three shift values. In a test with one experienced user speaking natural alphabet data, recognition performance improved by 2.1%, to 96.3% correct. In a second test with an experienced user speaking one- and two-digit pairs, the improvement was 0.7%, to 99.4%. In tests with nine mostly inexperienced users speaking digits and the phonetic alphabet, the improvement was 0.6%, to 98.0%.
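A minimal Python sketch of the two-stage matching described above, under several assumptions the abstract does not state: frames are plain feature vectors compared with a city-block (L1) distance; linear normalization is done by nearest-frame selection; the 12.5% limit is read as a band |i - j| <= 0.125 * N around the main diagonal of the N x N grid; and the symmetric step pattern weights diagonal moves by 2 so the total path weight is always 2N. The baseline assumes the three shift values are -1, 0 and +1 frames. The helper names (`linear_normalize`, `banded_dtw`, `linear_shift_distance`) are hypothetical; this illustrates the idea and is not the recognizer's actual implementation.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]


def linear_normalize(frames: Sequence[Frame], n: int) -> List[Frame]:
    """Resample a frame sequence to exactly n frames by nearest-frame selection (assumed method)."""
    if not frames or n < 1:
        raise ValueError("need at least one frame and n >= 1")
    if n == 1:
        return [frames[0]]
    last = len(frames) - 1
    return [frames[round(i * last / (n - 1))] for i in range(n)]


def frame_dist(a: Frame, b: Frame) -> float:
    """City-block (L1) distance between two feature vectors (assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def banded_dtw(x: Sequence[Frame], y: Sequence[Frame], max_dev: float = 0.125) -> float:
    """Symmetric DP warp of two equal-length sequences, confined to |i - j| <= r."""
    n = len(x)
    if len(y) != n:
        raise ValueError("sequences must be linearly normalized to the same length")
    r = max(1, int(max_dev * n))  # band half-width in frames (assumed reading of 12.5%)
    g = [[math.inf] * n for _ in range(n)]  # cells outside the band stay infinite
    g[0][0] = 2 * frame_dist(x[0], y[0])
    for i in range(n):
        for j in range(max(0, i - r), min(n, i + r + 1)):
            if i == 0 and j == 0:
                continue
            d = frame_dist(x[i], y[j])
            best = math.inf
            if i > 0:
                best = min(best, g[i - 1][j] + d)  # vertical step, weight 1
            if j > 0:
                best = min(best, g[i][j - 1] + d)  # horizontal step, weight 1
            if i > 0 and j > 0:
                best = min(best, g[i - 1][j - 1] + 2 * d)  # diagonal step, weight 2
            g[i][j] = best
    return g[n - 1][n - 1] / (2 * n)  # every path has total weight 2n


def linear_shift_distance(x: Sequence[Frame], y: Sequence[Frame], shifts=(-1, 0, 1)) -> float:
    """Baseline: frame-by-frame comparison after shifting y by a few frames; keep the best."""
    n = len(x)
    best = math.inf
    for s in shifts:
        pairs = [(x[i], y[i + s]) for i in range(n) if 0 <= i + s < n]
        if pairs:
            best = min(best, sum(frame_dist(a, b) for a, b in pairs) / len(pairs))
    return best


if __name__ == "__main__":
    ref = [[v] for v in (0, 0, 1, 1, 0, 0, 0, 0)]
    # Hypothetical utterance: same shape, spoken later and slower (10 frames).
    utt = [[v] for v in (0, 0, 0, 0, 1, 1, 1, 0, 0, 0)]
    x = linear_normalize(utt, len(ref))
    print("warp:", banded_dtw(x, ref), "shift baseline:", linear_shift_distance(x, ref))
```

Because both sequences are first normalized to the same length N, the band only has to absorb local timing differences, so in this sketch the DP evaluates about (2r + 1) * N cells rather than N * N. Since the straight diagonal is always inside the band and has the same normalized cost as the zero-shift comparison, the warped distance can never exceed the zero-shift baseline.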