The LIMSI continuous speech dictation system: evaluation on the ARPA Wall Street Journal task
- 17 December 2002
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
Abstract
We report progress made at LIMSI in speaker-independent large-vocabulary speech dictation using the ARPA Wall Street Journal-based CSR corpus. The recognizer uses continuous-density HMMs with Gaussian mixtures for acoustic modeling and n-gram statistics estimated on newspaper texts for language modeling. It uses a time-synchronous graph-search strategy, which is shown to remain viable with vocabularies of up to 20K words when used with bigram back-off language models. A second forward pass, which makes use of a word graph generated with the bigram, incorporates a trigram language model. Acoustic modeling uses cepstrum-based features, context-dependent phone models (intra- and interword), phone duration models, and sex-dependent models. The recognizer has been evaluated in the Nov92 and Nov93 ARPA tests for vocabularies of up to 20,000 words.
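As an illustration of the bigram back-off language modeling mentioned in the abstract, the following is a minimal sketch in Python. It is not the LIMSI implementation: the abstract does not state which discounting scheme was used, so absolute discounting with a fixed `discount` is assumed here purely for simplicity, and the function name `train_backoff_bigram` and the `<s>`/`</s>` boundary tokens are hypothetical choices. Seen bigrams keep a discounted relative frequency; the probability mass freed by discounting is redistributed over unseen successors in proportion to their unigram probabilities.

```python
from collections import Counter


def train_backoff_bigram(sentences, discount=0.5):
    """Return (prob, vocab) for a back-off bigram model.

    Assumption: absolute discounting with a fixed discount, chosen for
    illustration only. prob(w1, w2) estimates P(w2 | w1).
    """
    unigrams = Counter()
    bigrams = Counter()
    for words in sentences:
        tokens = ["<s>"] + list(words) + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    total = sum(unigrams.values())
    history = Counter()   # how often each word occurs as a bigram history
    followers = {}        # set of successors observed after each history
    for (w1, w2), count in bigrams.items():
        history[w1] += count
        followers.setdefault(w1, set()).add(w2)

    def p_unigram(word):
        return unigrams[word] / total

    def prob(w1, w2):
        if history[w1] == 0:
            # History never observed: fall back to the unigram estimate.
            return p_unigram(w2)
        if (w1, w2) in bigrams:
            # Seen bigram: discounted relative frequency.
            return (bigrams[(w1, w2)] - discount) / history[w1]
        # Unseen bigram: share the mass freed by discounting among the
        # unseen successors, in proportion to their unigram probability.
        seen = followers[w1]
        reserved = discount * len(seen) / history[w1]
        unseen_mass = 1.0 - sum(p_unigram(w) for w in seen)
        if unseen_mass <= 0.0:
            return 0.0
        return reserved * p_unigram(w2) / unseen_mass

    return prob, set(unigrams)


# Toy corpus, invented for this example.
corpus = [
    ["the", "market", "fell"],
    ["the", "market", "rose"],
    ["stocks", "fell"],
]
prob, vocab = train_backoff_bigram(corpus)
seen_p = prob("market", "fell")      # bigram observed in the corpus
unseen_p = prob("market", "stocks")  # bigram never observed: backed off
```

With this scheme the conditional probabilities for a given history sum to one over the vocabulary, which is the property that lets the decoder assign a nonzero score to word pairs never observed in the training texts.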