Speech recognition based on top‐down and bottom‐up phoneme recognition

Abstract
This paper describes a speech recognition system that integrates top‐down and bottom‐up phoneme recognition. The system is based on phoneme recognition, and the top‐down and bottom‐up processes are combined through a table called a blackboard. In top‐down processing, segmentation and scoring are performed for each phoneme over the entire speech interval; in bottom‐up processing, they are performed only for intervals in which phoneme segmentation can be performed with certainty. Under this scheme, the two recognition processes cooperate while remaining independent. In the proposed system, linguistic processing and acoustic processing are structured hierarchically. The two parts are connected through the blackboard, which avoids duplicated processing in the same environment. To evaluate the system, a spoken word recognition experiment with word dictionaries of 100 or 643 city names and a continuous speech recognition experiment on 235 minimal phrases uttered by two speakers were performed. The results show that the recognition performance of traditional top‐down processing is almost maintained, while processing time is reduced to one‐half or one‐third in word recognition and to less than one‐fourth in minimal phrase recognition.
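
The blackboard idea summarized in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the class name Blackboard, the methods post and score, the key layout (phoneme, start frame, end frame), and the toy scoring function are all assumptions made for illustration. The sketch shows only one property, namely that a score posted by bottom‐up processing for a reliably segmented interval, or computed once on demand by top‐down processing, is reused instead of being recomputed.

```python
from typing import Callable, Dict, Tuple

# Hypothetical key layout: (phoneme label, start frame, end frame).
Key = Tuple[str, int, int]


class Blackboard:
    """Illustrative shared score table; not the table format used in the paper."""

    def __init__(self, scorer: Callable[[str, int, int], float]) -> None:
        self._scorer = scorer            # acoustic scoring function (assumed)
        self._entries: Dict[Key, float] = {}
        self.scorer_calls = 0            # counts on-demand acoustic evaluations

    def post(self, phoneme: str, start: int, end: int, score: float) -> None:
        """Bottom-up side: record a score for a reliably segmented interval."""
        self._entries[(phoneme, start, end)] = score

    def score(self, phoneme: str, start: int, end: int) -> float:
        """Top-down side: look up a score, computing it only if it is absent."""
        key = (phoneme, start, end)
        if key not in self._entries:
            self.scorer_calls += 1
            self._entries[key] = self._scorer(phoneme, start, end)
        return self._entries[key]


def toy_scorer(phoneme: str, start: int, end: int) -> float:
    """Placeholder acoustic score: the negated interval length in frames."""
    return -float(end - start)


board = Blackboard(toy_scorer)
board.post("a", 0, 5, -1.0)        # bottom-up result for a confident segment
first = board.score("a", 0, 5)     # reuses the posted entry, no scorer call
second = board.score("k", 5, 8)    # absent, so computed once and stored
third = board.score("k", 5, 8)     # reuses the stored entry
```

In this sketch the acoustic scorer runs once for three top‐down requests, which is the kind of duplicated processing the shared table is meant to avoid.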
