Cinematic techniques for speech processing: temporal decomposition and multivariate linear prediction

Abstract
Two models of the spectral evolution of speech signals, temporal decomposition and multivariate linear prediction, are presented; both can handle some aspects of speech variability. A series of speaker-dependent acoustic-phonetic decoding experiments that use the spectral targets produced by temporal decomposition gives good results compared with a reference system (70% vs. 60% for the first choice). A series of text-independent speaker recognition experiments, based on an original method developed by Laforia that uses long-term multivariate autoregressive modeling, gives first-rate results (a 98.4% recognition rate for 420 speakers) using no more than one sentence. Given how the models can be interpreted, these results show the value of cinematic models for obtaining a speech signal representation with reduced variability.
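
To make the idea of multivariate linear prediction concrete, the following is a minimal sketch, not the method developed by Laforia. It assumes the speech signal has already been converted to a sequence of spectral vectors (one row per frame) and fits a generic vector autoregressive model by ordinary least squares, so that each frame is predicted from a linear combination of the preceding frames. The function name `fit_var` and the parameter `order` are hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_var(frames, order=2):
    """Least-squares fit of x[t] ~ sum_k A[k] @ x[t-k-1], k = 0..order-1.

    frames: array of shape (n_frames, dim), one spectral vector per row.
    Returns (A, residual) where A has shape (order, dim, dim) and
    residual has shape (n_frames - order, dim).
    Illustrative sketch only; not the estimator used in the paper.
    """
    frames = np.asarray(frames, dtype=float)
    n, d = frames.shape
    if n <= order:
        raise ValueError("need more frames than the model order")

    # Row for time t holds the past frames [x[t-1], x[t-2], ..., x[t-order]].
    past = np.hstack(
        [frames[order - k - 1 : n - k - 1] for k in range(order)]
    )
    target = frames[order:]

    # Solve target ~ past @ coef in the least-squares sense.
    coef, *_ = np.linalg.lstsq(past, target, rcond=None)

    # coef has shape (order * d, d); rearrange into one (d, d) matrix per lag
    # so that the prediction is sum_k A[k] @ x[t-k-1].
    A = coef.T.reshape(d, order, d).transpose(1, 0, 2)

    residual = target - past @ coef
    return A, residual
```

Under these assumptions, the fitted matrices `A` summarize how the spectrum evolves from frame to frame, and the residual measures how well a given model predicts a given utterance. A speaker recognition system could in principle compare such models or residuals across speakers, but the specific distance measure used in the paper is not stated in the abstract.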
