Automatic segmentation and labeling of speech
- 1 January 1991
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- No. 15206149, pp. 473-476, vol. 1
- https://doi.org/10.1109/icassp.1991.150379
Abstract
The authors investigate an automatic approach to segmentation of labeled speech and labeling and segmentation of speech when only the orthographic transcription of speech is available. The technique is based on a phone recognition system based on a trigram phonotactic model, gamma distribution phone duration models, and a spectral model based on five different structures for phone models of varying contextual dependencies. The alignment of speech with a given phone sequence is performed as a very constrained phone recognition task with the phonotactic model based only on the given phone sequence. When only orthographic transcription is provided, a classification-tree-based prediction of most likely phone realizations is used as an input network for the phone recognizer. The maximum likelihood phone sequence is then treated as the true phone sequence and its segment boundaries are compared with the reference boundaries.
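The idea of aligning speech to a known phone sequence by running a heavily constrained recognizer can be illustrated with a small dynamic program. The sketch below is not the authors' system: it assumes a hypothetical matrix `loglik[t][k]` holding the log-likelihood of frame `t` under the `k`-th phone of the given sequence, and it leaves out the trigram phonotactic model, the gamma duration models, and the context-dependent spectral models described in the abstract. The function name `force_align` is likewise an illustrative choice.

```python
def force_align(loglik):
    """Viterbi-style alignment of T frames to a fixed sequence of K phones.

    loglik[t][k] is an assumed per-frame log-likelihood of frame t under
    the k-th phone in the given sequence. Each phone must cover at least
    one frame and phones must appear in order. Returns the start frame of
    every phone and the total log-likelihood of the best path.
    """
    T = len(loglik)
    K = len(loglik[0])
    if T < K:
        raise ValueError("need at least one frame per phone")

    NEG = float("-inf")
    score = [[NEG] * K for _ in range(T)]
    back = [[0] * K for _ in range(T)]
    score[0][0] = loglik[0][0]  # the path must start in the first phone

    for t in range(1, T):
        for k in range(K):
            stay = score[t - 1][k]                       # remain in phone k
            advance = score[t - 1][k - 1] if k > 0 else NEG  # enter from k-1
            if advance > stay:
                best, back[t][k] = advance, k - 1
            else:
                best, back[t][k] = stay, k
            if best > NEG:
                score[t][k] = best + loglik[t][k]

    # Trace back from the last phone at the last frame.
    k = K - 1
    path = [k]
    for t in range(T - 1, 0, -1):
        k = back[t][k]
        path.append(k)
    path.reverse()

    # A boundary is any frame where the phone index changes.
    starts = [0] * K
    for t in range(1, T):
        if path[t] != path[t - 1]:
            starts[path[t]] = t
    return starts, score[T - 1][K - 1]


# Toy input: 6 frames, 3 phones; 0.0 where the phone fits the frame, -5.0 otherwise.
toy = [
    [0.0, -5.0, -5.0],
    [0.0, -5.0, -5.0],
    [-5.0, 0.0, -5.0],
    [-5.0, 0.0, -5.0],
    [-5.0, 0.0, -5.0],
    [-5.0, -5.0, 0.0],
]
starts, total = force_align(toy)
```

In a fuller system, the boundaries in `starts` would be the quantities compared against reference boundaries, and duration or phonotactic scores would enter as extra terms on the stay and advance transitions.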