Text-to-speech algorithms based on FFT synthesis

Abstract
The authors present FFT synthesis algorithms for a French text-to-speech system based on diphone concatenation. FFT synthesis techniques are capable of producing high-quality prosodic modifications of natural speech. Several approaches are presented to reduce the distortions due to diphone concatenation. They are based on appropriate manipulations of the phase spectrum, either by phase equalization across all the diphones, or by phase smoothing between successive diphones. The resulting speech is of significantly better quality than that produced by conventional LPC synthesis. An experiment to reduce the computational cost by performing all the FFTs off-line is described. The resulting speech is slightly degraded with respect to 'full' FFT-synthesized speech, but it remains more natural than the LPC speech.
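
As a rough illustration of what manipulating the phase spectrum can mean, the sketch below keeps the magnitude spectrum of a short frame and replaces its phase with zero, so that frames with the same magnitude spectrum become identical. The abstract does not specify the equalization target, so the zero-phase choice, the function names (`dft`, `idft`, `equalize_phase`) and the toy frame are all assumptions made for this example, not the authors' method. A plain O(N^2) DFT is used to keep the example dependency-free; a real system would use an FFT.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def equalize_phase(frame):
    """Keep the magnitude spectrum of `frame` and set every phase to zero.

    Assumption: zero phase is only one possible common target; it is used
    here purely to illustrate the idea of phase equalization.
    """
    spectrum = dft(frame)
    zero_phase = [complex(abs(c), 0.0) for c in spectrum]
    # For a real input frame the magnitude spectrum is symmetric, so the
    # inverse transform is real up to rounding error; keep the real part.
    return [v.real for v in idft(zero_phase)]


if __name__ == "__main__":
    # Hypothetical toy frame; a circular shift changes only the phase.
    frame = [0.0, 1.0, 0.5, -0.25, 0.0, 0.0, 0.3, -0.1]
    shifted = frame[3:] + frame[:3]
    a = equalize_phase(frame)
    b = equalize_phase(shifted)
    print(max(abs(u - v) for u, v in zip(a, b)))  # difference is at rounding level
```

Because the two toy frames differ only in phase, they map to the same equalized frame, which is the property that makes concatenation boundaries easier to handle in this simplified setting.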
