Abstract
A capacity to carry out reliable automatic time alignment of synthetic speech to naturally produced speech offers potential benefits in speech recognition and speaker recognition, as well as in synthesis itself. Phrase alignment experiments are described which indicate that alignment to synthetic speech is more difficult than alignment of speech from two natural speakers. An artificial speech recognition experiment is introduced as a convenient means of assessing alignment accuracy. By this measure, alignment accuracy is found to be improved considerably by applying certain speaker adaptation transformations to the synthetic speech, by modifying the spectrum similarity metric, and by generating the synthetic spectra directly from the control parameters using simplified excitation spectra. The improvements appear to plateau, however, at a level below that found between natural speakers. It is conjectured that further improvement requires modifications to the synthesis rules themselves.
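The abstract does not spell out the alignment procedure, but time alignment of one spectral frame sequence to another under a spectrum similarity metric is conventionally done with dynamic time warping. The sketch below is a minimal, hypothetical illustration of that technique, not the paper's actual method; the function name `dtw_align` and the Euclidean frame distance (standing in for whatever spectrum similarity metric was used) are assumptions.

```python
import numpy as np

def dtw_align(ref, syn, dist=lambda a, b: np.linalg.norm(a - b)):
    """Dynamic time warping: align two sequences of spectral frames.

    Returns the total alignment cost and the frame-to-frame path.
    The distance function is a placeholder for the paper's
    spectrum similarity metric.
    """
    n, m = len(ref), len(syn)
    # D[i, j] = minimal cumulative cost of aligning ref[:i] with syn[:j]
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = dist(ref[i - 1], syn[j - 1])
            D[i, j] = c + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # Backtrack from the end to recover the warping path
    path = [(n - 1, m - 1)]
    i, j = n, m
    while (i, j) != (1, 1):
        steps = [(i - 1, j), (i, j - 1), (i - 1, j - 1)]
        i, j = min(steps, key=lambda s: D[s])
        path.append((i - 1, j - 1))
    return D[n, m], path[::-1]
```

Aligning a sequence to itself yields zero cost and a diagonal path; aligning synthetic to natural frames yields a path that maps each synthetic frame to its best-matching natural frame under the chosen metric.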
