Comparison of discriminative training criteria

Abstract
A formally unifying approach to a class of discriminative training criteria, including the maximum mutual information (MMI) and minimum classification error (MCE) criteria, is presented, together with the optimization methods gradient descent (GD) and the extended Baum-Welch (EB) algorithm. Comparisons between the MMI and the MCE criterion are discussed, including the determination of the sets of word sequence hypotheses for discrimination using word graphs. Experiments were carried out on the SieTill corpus of telephone-line recorded German continuous digit strings. Across several approaches to acoustic modeling, the word error rates obtained by MMI training with single densities were always better than those obtained by maximum likelihood (ML) training with mixture densities. Finally, the results obtained with corrective training (CT), i.e. using only the best recognized word sequence in addition to the spoken word sequence, could not be improved by word graph based discriminative training.
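To make the contrast between the two criteria concrete, the following is a minimal sketch of the per-utterance MMI objective and a smoothed MCE loss, computed from hypothesis log-scores. The function names, the log-sum-exp denominator, and the sigmoid smoothing parameter `alpha` are illustrative assumptions, not the paper's exact formulation (which operates on word graphs):

```python
import math

def mmi_criterion(log_p_spoken, log_p_hyps):
    """MMI objective for one utterance: log posterior of the spoken
    word sequence against the full hypothesis set (log domain).

    log_p_hyps must include the spoken sequence's score; a higher
    (less negative) value means better discrimination.
    """
    # Denominator: log-sum-exp over all hypotheses, stabilized by
    # subtracting the maximum score before exponentiating.
    m = max(log_p_hyps)
    log_denom = m + math.log(sum(math.exp(s - m) for s in log_p_hyps))
    return log_p_spoken - log_denom

def mce_criterion(log_p_spoken, log_p_competing, alpha=1.0):
    """Smoothed MCE loss: a sigmoid of the misclassification measure
    d = (best competing score) - (spoken score), so the loss is near 0
    when the spoken sequence wins clearly and near 1 when it loses.
    alpha controls the steepness of the smoothing (an assumption here).
    """
    d = max(log_p_competing) - log_p_spoken
    return 1.0 / (1.0 + math.exp(-alpha * d))
```

With only the spoken hypothesis present, the MMI criterion is exactly 0 (posterior 1); adding competing hypotheses drives it negative, which is what discriminative training then maximizes, while MCE directly smooths the classification error decision.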
