Abstract
Traditionally, the FFT (a fast implementation of the discrete Fourier transform, DFT) has been used in speech recognition algorithms. Other discrete orthogonal transforms, such as the Walsh-Hadamard transform (WHT) and the rapid transform (RT), can play equally important roles in the recognition process because they have advantages in implementation and hardware realization. The capability of these transforms to recognize phonemes, based on training matrices and a mean square error (mse) criterion, is investigated. The speech database consists of ten sentences spoken by ten different speakers (all male). For recognition, the speech is sampled at 20 kHz and sectioned into 10 ms intervals. Training matrices are developed for all three transforms. Test matrices in the transform domain are compared with the prototypes using the mse criterion, which leads to the classification decision. The WHT and RT appear to be promising alternatives to the FFT: they are easier to implement and yield recognition results comparable to those of the FFT. Other distance measures and recognition schemes are proposed for improving the classification accuracy.
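
The decision process described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each 10 ms interval at 20 kHz (200 samples) is zero-padded to 256 samples so that a radix-2 WHT applies, that the WHT is unnormalized and in natural (Hadamard) order, and that each phoneme prototype is a single transform-domain vector. The names `fwht`, `pad_to_power_of_two`, `mse`, and `classify` are hypothetical.

```python
SAMPLE_RATE_HZ = 20000   # sampling rate stated in the abstract
FRAME_MS = 10            # interval length stated in the abstract
FRAME_LEN = SAMPLE_RATE_HZ * FRAME_MS // 1000   # samples per interval


def fwht(frame):
    """Unnormalized fast Walsh-Hadamard transform, natural (Hadamard) order."""
    data = list(frame)
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("frame length must be a power of two")
    h = 1
    while h < n:
        # Butterfly stage: combine elements that are h apart.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = data[i], data[i + h]
                data[i], data[i + h] = a + b, a - b
        h *= 2
    return data


def pad_to_power_of_two(frame):
    """Zero-pad a frame to the next power of two (an assumption of this sketch)."""
    n = 1
    while n < len(frame):
        n *= 2
    return list(frame) + [0.0] * (n - len(frame))


def mse(u, v):
    """Mean square error between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have equal length")
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)


def classify(frame, prototypes):
    """Return the label of the prototype with the smallest mse in the WHT domain.

    `prototypes` maps a label to a transform-domain vector; how the training
    matrices are reduced to such vectors is an assumption here.
    """
    coeffs = fwht(pad_to_power_of_two(frame))
    return min(prototypes, key=lambda label: mse(coeffs, prototypes[label]))


# Usage with two synthetic prototypes (illustrative data, not speech).
prototypes = {
    "a": fwht(pad_to_power_of_two([1.0] * FRAME_LEN)),
    "b": fwht(pad_to_power_of_two([-1.0] * FRAME_LEN)),
}
label = classify([0.9] * FRAME_LEN, prototypes)
```

Swapping `fwht` for an FFT magnitude spectrum would give the baseline the abstract compares against; the mse comparison itself would stay the same.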
