Abstract
A codeword-dependent neural network (CDNN) is presented for the study of speaker adaptation. The CDNN serves as a nonlinear mapping function that transforms speech data from one speaker to another. The mapping function has three important properties. First, using an assembly of mapping functions enhances overall mapping quality. Second, multiple input vectors are used simultaneously in the transformation, which not only makes full use of dynamic information but also alleviates possible errors in the supervision data. Finally, the mapping function is derived from training data, so its quality depends on the amount of training data available. In an evaluation based on speaker-dependent models, speaker normalization reduced the error rate from 41.9% to 5.0%.
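As a rough illustration of the codeword-dependent idea, the sketch below selects one small network per codebook entry and feeds it a window of neighbouring frames. It is not the authors' implementation: the codebook, the network shape, the context width, the nearest-codeword selection rule, and the untrained random weights are all assumptions made for this example, and the names `nearest_codeword`, `stack_context`, `TinyMLP`, and `map_utterance` are hypothetical.

```python
import math
import random


def nearest_codeword(frame, codebook):
    """Return the index of the codeword closest to frame (squared Euclidean)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((f - c) ** 2 for f, c in zip(frame, codebook[k])),
    )


def stack_context(frames, t, width):
    """Concatenate frames t-width..t+width, clamping indices at the edges."""
    stacked = []
    for i in range(t - width, t + width + 1):
        j = min(max(i, 0), len(frames) - 1)
        stacked.extend(frames[j])
    return stacked


class TinyMLP:
    """One-hidden-layer network with random, untrained weights (assumption)."""

    def __init__(self, n_in, n_hidden, n_out, rng):
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
                   for _ in range(n_out)]

    def __call__(self, x):
        hidden = [math.tanh(sum(w * v for w, v in zip(row, x)))
                  for row in self.w1]
        return [sum(w * v for w, v in zip(row, hidden)) for row in self.w2]


def map_utterance(frames, codebook, networks, width=1):
    """Map each frame with the network tied to its nearest codeword."""
    mapped = []
    for t, frame in enumerate(frames):
        k = nearest_codeword(frame, codebook)  # codeword-dependent choice
        mapped.append(networks[k](stack_context(frames, t, width)))
    return mapped


# Toy usage: 2-dimensional frames, 2 codewords, context of 3 frames.
rng = random.Random(0)
codebook = [[0.0, 0.0], [1.0, 1.0]]
networks = [TinyMLP(n_in=6, n_hidden=4, n_out=2, rng=rng) for _ in codebook]
frames = [[0.1, 0.0], [0.9, 1.1], [1.0, 0.8]]
mapped = map_utterance(frames, codebook, networks, width=1)
print(len(mapped), len(mapped[0]))
```

In a real system each network would be trained on paired data from the two speakers; here the weights are random, so the outputs only demonstrate the data flow.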
