Abstract
This paper describes and evaluates a new technique for assessing confidence in word strings produced by a speech recognition system. It detects misrecognized and out-of-vocabulary words in spontaneous spoken dialogs. The system uses multiple, diverse knowledge sources, including acoustics, semantics, pragmatics, and discourse, to determine whether a word string is misrecognized. When likely misrecognitions are detected, a series of tests distinguishes out-of-vocabulary words from other error sources. The work is part of a larger effort to automatically recognize and understand new words when they are spoken in spontaneous dialog. The newly developed acoustic confidence metrics output an independent probability that each word is recognized correctly, along with a measure of how reliably an error can be detected. At p < .05, the acoustic methods detect 65% of the errors. The semantic/discourse module detects 98% of the errors that are semantically or contextually inappropriate, but it cannot detect contextually consistent misrecognitions. We therefore merged the two methods and ran them on a single test set to determine whether the semantic/pragmatic/discourse component detects input not reliably rejected acoustically, and how many of the semantically consistent errors the acoustic normalization methods can detect.
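To make the merged decision procedure concrete, the following is a minimal sketch (not the authors' implementation) of how a per-word acoustic confidence score can be combined with a semantic/discourse consistency check so that a word is flagged if either knowledge source rejects it. All names, the 0.5 threshold, and the toy vocabulary-membership consistency test are illustrative assumptions, not details from the paper.

    # Hypothetical sketch: flag a word as a likely misrecognition if it is
    # rejected acoustically OR semantically. Threshold and consistency test
    # are assumptions for illustration only.
    from dataclasses import dataclass

    @dataclass
    class WordHypothesis:
        word: str
        acoustic_confidence: float  # assumed probability the word is correct

    def semantically_consistent(word: str, context: set) -> bool:
        # Toy stand-in for the semantic/pragmatic/discourse module:
        # a word is "consistent" if it belongs to the expected domain vocabulary.
        return word in context

    def flag_misrecognitions(hyps, context, threshold=0.5):
        # A word is flagged when either knowledge source rejects it.
        flagged = []
        for h in hyps:
            acoustic_reject = h.acoustic_confidence < threshold
            semantic_reject = not semantically_consistent(h.word, context)
            if acoustic_reject or semantic_reject:
                flagged.append((h.word, acoustic_reject, semantic_reject))
        return flagged

    if __name__ == "__main__":
        domain_vocab = {"book", "flight", "boston", "tomorrow"}
        hyps = [
            WordHypothesis("book", 0.92),
            WordHypothesis("flight", 0.41),   # acoustically weak
            WordHypothesis("bosnia", 0.88),   # semantically inconsistent
        ]
        for word, a, s in flag_misrecognitions(hyps, domain_vocab):
            print(word, "acoustic_reject:", a, "semantic_reject:", s)

Under this disjunctive combination, the semantic check can catch confident-but-inappropriate recognitions, while the acoustic score can catch contextually plausible errors, which is the complementarity the merged evaluation is designed to measure.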
