Abstract
Any speech recognition system intended for use in environments where other sounds are present must cope with those extraneous sounds. A hierarchical separation and recognition system is proposed as a computational model of how the auditory system processes sounds. The GRASP system (Grouping Research on Auditory Sound Processing) is one part of that overall framework for sound separation: a computational model of the data-driven aspects of auditory separation. It uses physical cues present in the acoustic signal, such as pitch and onsets, to decide how many sounds are present and what each sound consists of. Its initial application is separating digits spoken simultaneously by different speakers.
