Efficient search using posterior phone probability estimates

Abstract
We present a novel, efficient search strategy for large vocabulary continuous speech recognition (LVCSR). The search algorithm, based on stack decoding, uses posterior phone probability estimates to substantially increase its efficiency with minimal effect on accuracy. In particular, the search space is dramatically reduced by phone deactivation pruning where phones with a small local posterior probability are deactivated. This approach is particularly well-suited to hybrid connectionist/hidden Markov model systems because posterior phone probabilities are directly computed by the acoustic model. On large vocabulary tasks, using a trigram language model, this increased the search speed by an order of magnitude, with 2% or less relative search error. Results from a hybrid system are presented using the Wall Street Journal LVCSR database for a 20,000 word task using a backed-off trigram language model. For this task, our single-pass decoder took around 15x realtime on an HP735 workstation. At a cost of 7% relative search error, the decoding time can be speeded up to approximately realtime.

This publication has 12 references indexed in Scilit: