Time alignment of natural speech to synthetic speech
- 24 March 2005
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- Vol. 9, 65-68
- https://doi.org/10.1109/icassp.1984.1172424
Abstract
A capacity to carry out reliable automatic time alignment of synthetic speech to naturally produced speech offers potential benefits in speech recognition and speaker recognition as well as in synthesis itself. Phrase alignment experiments are described that indicate that alignment to synthetic speech is more difficult than alignment of speech from two natural speakers. An artificial speech recognition experiment is introduced as a convenient means of assessing alignment accuracy. By this measure, alignment accuracy is found to be improved considerably by applying certain speaker adaptation transformations to the synthetic speech, by modifying the spectrum similarity metric, and by generating the synthetic spectra directly from the control parameters using simplified excitation spectra. The improvements seem to reach a limit, however, at a level below that found between natural speakers. It is conjectured that further improvement requires modifications to the synthesis rules themselves.
This publication has 13 references indexed in Scilit:
- Isolated word recognition using a two-pass pattern recognition approach. Published by Institute of Electrical and Electronics Engineers (IEEE), 2005
- ZIP: A dynamic programming algorithm for time-aligning two indefinitely long utterances. Published by Institute of Electrical and Electronics Engineers (IEEE), 2005
- Speech recognition performance assessments and available databases. Published by Institute of Electrical and Electronics Engineers (IEEE), 2005
- Speaker recognition using a feature weighting technique. Published by Institute of Electrical and Electronics Engineers (IEEE), 2005
- Word verification in a speech understanding system. Published by Institute of Electrical and Electronics Engineers (IEEE), 2005
- On talker-independent word recognition in continuous speech. Published by Institute of Electrical and Electronics Engineers (IEEE), 2005
- The discriminative network: A mechanism for focusing recognition in whole-word pattern matching. Published by Institute of Electrical and Electronics Engineers (IEEE), 2005
- On temporal alignment of sentences of natural and synthetic speech. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1983
- Speaker adaptation for word-based speech recognition systems. The Journal of the Acoustical Society of America, 1981
- Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1980