Speaker identification based text to audio alignment for an audio retrieval system
- 22 November 2002
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- Vol. 2 (ISSN 1520-6149), pp. 1099-1102
- https://doi.org/10.1109/icassp.1997.596133
Abstract
We report on an audio retrieval system which lets Internet users efficiently access a large audio database containing recordings of the proceedings of the United States House of Representatives. The audio has been temporally aligned to text transcripts of the proceedings (which are manually generated by the US Government) using a novel method based on speaker identification. Speaker sequence and approximate timing information is extracted from the text transcript and used to constrain a Viterbi alignment of speaker models to the observed audio. Speakers are modeled by computing Gaussian statistics of cepstral coefficients extracted from samples of each person's speech. The speaker identification is used to locate speaker transition points in the audio which are then linked to corresponding speaker transitions in the text transcript. The alignment system has been successfully integrated into a World Wide Web based search and browse system as an experimental service on the Internet.
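The alignment idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes feature vectors (for example cepstral coefficients) have already been extracted per frame, models each speaker with a single diagonal-covariance Gaussian, and uses only the speaker order from the transcript; the approximate timing constraint mentioned in the abstract is left out. The function names `fit_gaussian`, `log_likelihood`, and `align_speaker_turns` are hypothetical.

```python
import numpy as np


def fit_gaussian(frames):
    """Estimate a diagonal Gaussian (mean, variance) from training frames.

    Assumption: one diagonal-covariance Gaussian per speaker.
    """
    mean = frames.mean(axis=0)
    var = frames.var(axis=0) + 1e-6  # small floor avoids division by zero
    return mean, var


def log_likelihood(frames, model):
    """Per-frame log-likelihood of the frames under a diagonal Gaussian."""
    mean, var = model
    diff = frames - mean
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + diff ** 2 / var, axis=1)


def align_speaker_turns(frames, speaker_sequence, models):
    """Viterbi alignment constrained to a known speaker order.

    State k means "turn k of the transcript is active". At each frame the
    path either stays in the current turn or advances to the next one.
    Returns the start frame of every turn.
    """
    n_frames, n_turns = len(frames), len(speaker_sequence)
    if n_frames < n_turns:
        raise ValueError("need at least one frame per speaker turn")

    # scores[t, k]: log-likelihood of frame t under the speaker of turn k
    scores = np.stack(
        [log_likelihood(frames, models[s]) for s in speaker_sequence], axis=1
    )

    best = np.full((n_frames, n_turns), -np.inf)
    advanced = np.zeros((n_frames, n_turns), dtype=bool)
    best[0, 0] = scores[0, 0]  # the path must begin in the first turn

    for t in range(1, n_frames):
        for k in range(n_turns):
            stay = best[t - 1, k]
            move = best[t - 1, k - 1] if k > 0 else -np.inf
            advanced[t, k] = move > stay
            best[t, k] = max(stay, move) + scores[t, k]

    # Backtrack from the last turn at the final frame.
    starts = [0] * n_turns
    k = n_turns - 1
    for t in range(n_frames - 1, 0, -1):
        if advanced[t, k]:
            starts[k] = t
            k -= 1
    return starts
```

Given per-speaker training frames, `models = {name: fit_gaussian(x) for name, x in training.items()}` builds the speaker models, and `align_speaker_turns(frames, ["A", "B", "A"], models)` returns one start frame per turn, which can then be linked to the matching speaker change in the transcript.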