On the role of amplitude and phase in the synthesis of male and female voices

Abstract
A pitch-synchronous segmentation, which was shown to be perceptually close to a deconvolution [C. Hamon et al., Proc. IEEE ICASSP'89, 238–241 (1989)], was used to obtain a short-time Fourier representation of the LPC residual. After selected amplitude and phase manipulations of voiced segments, a residual was reconstructed and used to drive the LPC synthesis filter. Twenty utterances (ten male, ten female) were investigated under two amplitude conditions (original/flat) and two phase conditions (original/zero), yielding four versions of each utterance. The quality of these versions was judged by 12 subjects in a paired-comparison experiment. Original amplitude information was consistently preferred over original phase information. For female voices, there were significant quality differences between any two of the four versions. For male voices, however, the original amplitude information alone proved sufficient to make the synthetic speech almost indistinguishable from natural speech. [Work was supported in part by the Dutch SPIN-ASSP program.]
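The four amplitude/phase conditions can be illustrated with a minimal sketch. It assumes that pitch marks are already available (the hypothetical `pitch_marks` list), treats each span between consecutive marks as one rectangular, unwindowed voiced period, and defines "flat" amplitude as a constant spectral magnitude that preserves the period's energy. These choices, the helper names, and the all-pole sign convention of `lpc_synthesis` are assumptions for illustration, not the segmentation of Hamon et al. or the authors' exact procedure.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform; adequate for one pitch period."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT; keeps the real part, since the spectra used here are Hermitian."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def manipulate_period(segment, flat_amplitude=False, zero_phase=False):
    """Apply one of the four amplitude/phase conditions to a single residual period."""
    spectrum = dft(segment)
    n = len(spectrum)
    mags = [abs(c) for c in spectrum]
    phases = [cmath.phase(c) for c in spectrum]
    if flat_amplitude:
        # Assumption: "flat" = one constant magnitude chosen so that the
        # period's energy is unchanged (Parseval: sum|X|^2 = n * sum x^2).
        level = math.sqrt(sum(m * m for m in mags) / n)
        mags = [level] * n
    if zero_phase:
        # Zero phase for every bin; the period becomes time-symmetric.
        phases = [0.0] * n
    return idft([m * cmath.exp(1j * p) for m, p in zip(mags, phases)])


def rebuild_residual(residual, pitch_marks, flat_amplitude=False, zero_phase=False):
    """Manipulate each period between consecutive (hypothetical) pitch marks.

    Samples outside the marked spans are copied unchanged.
    """
    out = list(residual)
    for start, end in zip(pitch_marks, pitch_marks[1:]):
        out[start:end] = manipulate_period(residual[start:end],
                                           flat_amplitude, zero_phase)
    return out


def lpc_synthesis(excitation, a):
    """All-pole filter 1/A(z) with A(z) = 1 - sum_k a[k-1] z^-k (assumed convention)."""
    y = []
    for i, e in enumerate(excitation):
        acc = e
        for k, ak in enumerate(a, start=1):
            if i - k >= 0:
                acc += ak * y[i - k]
        y.append(acc)
    return y


if __name__ == "__main__":
    # Toy residual with two 8-sample "periods" and a hypothetical predictor.
    residual = [1.0, 0.3, -0.2, 0.1, 0.0, -0.1, 0.05, 0.0] * 2
    marks = [0, 8, 16]
    a = [0.9]
    for flat in (False, True):
        for zero in (False, True):
            excitation = rebuild_residual(residual, marks, flat, zero)
            speech = lpc_synthesis(excitation, a)
            print(f"flat={flat} zero={zero}", [round(v, 3) for v in speech[:4]])
```

With both flat amplitude and zero phase, each period collapses to a single pulse at its first sample, which is the pulse-train excitation of a classic LPC vocoder; the other two manipulated conditions keep exactly one of the two kinds of spectral information.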
