Automatic segmentation and labeling of speech

1 January 1991

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

No. 15206149, pp. 473-476, vol. 1
https://doi.org/10.1109/icassp.1991.150379

Abstract

The authors investigate automatic approaches to two tasks: segmenting speech for which a phone labeling is already given, and jointly labeling and segmenting speech when only its orthographic transcription is available. The technique builds on a phone recognition system that combines a trigram phonotactic model, gamma-distribution phone duration models, and a spectral model using five different phone model structures of varying contextual dependency. Aligning speech with a given phone sequence is performed as a tightly constrained phone recognition task in which the phonotactic model is derived only from the given phone sequence. When only the orthographic transcription is provided, a classification-tree-based prediction of the most likely phone realizations serves as the input network for the phone recognizer. The maximum-likelihood phone sequence is then treated as the true phone sequence, and its segment boundaries are compared with the reference boundaries.
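
The idea of alignment as a constrained recognition task can be illustrated with a minimal sketch. It is not the authors' system: it leaves out the trigram phonotactic model, the gamma duration models, and the context-dependent spectral models, and it assumes a Viterbi-style dynamic program, which the abstract does not specify. The function name `force_align` and the input layout (one dictionary of per-phone log-likelihoods per frame) are hypothetical, chosen only for illustration. Given a fixed phone sequence, the sketch finds the segment boundaries that maximize the total frame log-likelihood, with every phone occupying at least one frame.

```python
import math


def force_align(frame_loglik, phone_seq):
    """Align frames to a fixed phone sequence (hypothetical helper).

    frame_loglik[t][p] is the log-likelihood of frame t under phone p.
    Returns a list of (phone, start_frame, end_frame_exclusive).
    """
    n_frames, n_phones = len(frame_loglik), len(phone_seq)
    if n_phones == 0 or n_frames < n_phones:
        raise ValueError("need at least one frame per phone")

    neg_inf = -math.inf
    # score[t][i]: best total log-likelihood with frame t assigned to phone i
    score = [[neg_inf] * n_phones for _ in range(n_frames)]
    # entered[t][i]: True if frame t is the first frame of phone i on the best path
    entered = [[False] * n_phones for _ in range(n_frames)]

    score[0][0] = frame_loglik[0][phone_seq[0]]
    entered[0][0] = True

    for t in range(1, n_frames):
        for i in range(n_phones):
            stay = score[t - 1][i]                              # remain in phone i
            advance = score[t - 1][i - 1] if i > 0 else neg_inf  # move on from phone i-1
            best = max(stay, advance)
            if best == neg_inf:
                continue  # phone i is not reachable at frame t
            score[t][i] = best + frame_loglik[t][phone_seq[i]]
            entered[t][i] = advance > stay

    # Trace back from the last frame of the last phone to recover boundaries.
    segments = []
    i, end = n_phones - 1, n_frames
    for t in range(n_frames - 1, -1, -1):
        if entered[t][i]:
            segments.append((phone_seq[i], t, end))
            end = t
            i -= 1
            if i < 0:
                break
    segments.reverse()
    return segments


if __name__ == "__main__":
    # Toy scores (made up for illustration): two frames favor "s", three favor "iy".
    frames = [
        {"s": -1.0, "iy": -5.0},
        {"s": -1.0, "iy": -5.0},
        {"s": -5.0, "iy": -1.0},
        {"s": -5.0, "iy": -1.0},
        {"s": -5.0, "iy": -1.0},
    ]
    print(force_align(frames, ["s", "iy"]))
```

In a fuller system along the lines the abstract describes, the per-frame scores would come from the spectral model and a duration term would be added for each segment; the sketch keeps only the constraint that the phone order is fixed.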
