Abstract
A supervised spectral mapping method for speaker adaptation, based on a piecewise linear transformation of cepstral vectors, is proposed. In this method, an input vector is mapped onto the target spectral space by a weighted sum of linearly transformed vectors, using a set of mapping matrices which are associated with fuzzy-partitioned spaces. These matrices are estimated so as to minimize the total mean square error between the mapped and target spectra. The method was compared with the difference interpolation mapping (D-map) method, which is an extension of the codebook mapping methods. Through 16 phoneme recognition tests using a single Gaussian distribution hidden Markov model (HMM), it was found that the proposed method with 16 fuzzy-partitioned spaces improved recognition performance by 4% over the usual linear mapping method when using 100 training words, and also achieved a recognition rate 3% higher on average than the D-map method.
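
The mapping described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say how the fuzzy membership weights are computed, so the sketch assumes fuzzy c-means style memberships around given centroids, adds a bias term to each linear transform, and estimates all matrices jointly by ordinary least squares. The function names, the fuzziness parameter `m`, and the centroids are hypothetical.

```python
import numpy as np

def fuzzy_memberships(X, centroids, m=2.0, eps=1e-12):
    """Assumed fuzzy c-means style weights; each row sums to 1."""
    # Squared Euclidean distance from every vector to every centroid: (N, K)
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2) + eps
    inv = d2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def design_matrix(X, W):
    """Stack the membership-weighted inputs [x, 1] for all K partitions."""
    n = X.shape[0]
    X1 = np.hstack([X, np.ones((n, 1))])               # (N, D+1), bias column
    return (W[:, :, None] * X1[:, None, :]).reshape(n, -1)  # (N, K*(D+1))

def fit_mapping(X, Y, centroids, m=2.0):
    """Estimate all mapping matrices by minimizing the total squared error."""
    Phi = design_matrix(X, fuzzy_memberships(X, centroids, m))
    theta, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
    return theta                                        # (K*(D+1), D_out)

def apply_mapping(X, theta, centroids, m=2.0):
    """Map inputs as a weighted sum of linearly transformed vectors."""
    return design_matrix(X, fuzzy_memberships(X, centroids, m)) @ theta
```

With a single partition the weights are all 1 and the sketch reduces to an ordinary linear (affine) mapping. Because the weights sum to 1, that linear mapping is a special case of the piecewise one, so on the training data the piecewise fit cannot have a larger squared error.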
