Sensor-fusion for robust identification of persons: a field test

Abstract
We previously presented an approach that combines optical lip-motion analysis with acoustic voice analysis to identify a speaker. Because the two data sources are independent, the results were more reliable than those of optical lip reading alone. A system prototype has emerged from this proposal. This improved setup, which we demonstrate, has shown promising results: on a sample of 101 persons, the false recognition rate was 0% for both surnames and entry words, with rejection rates of 8% and 14%, respectively. For recognition, the person to be identified speaks a single word, which can be either specific to that person (e.g. the surname) or a single entry word shared by all persons. Meanwhile, a field test has been started at the entrance of our Institute and will run for a few months; we present its first results. We propose that combining motion and voice analysis offers a way to realize robust access control systems.
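
The abstract does not say how the two modalities are combined or how rejections are decided. As an illustration only, the following minimal sketch assumes that each modality yields a per-person score in [0, 1], fuses the scores with a product rule (a common choice when sources are treated as independent), and rejects the attempt when the best fused score is too low or too close to the runner-up. The function name `fuse_and_decide`, the thresholds, and the example names are hypothetical and are not taken from the described system.

```python
from typing import Dict, Optional


def fuse_and_decide(
    lip_scores: Dict[str, float],
    voice_scores: Dict[str, float],
    accept_threshold: float = 0.5,  # hypothetical value, not from the paper
    margin: float = 0.1,            # hypothetical value, not from the paper
) -> Optional[str]:
    """Return the identified person, or None to reject the attempt."""
    # Only persons scored by both modalities can be fused.
    common = set(lip_scores) & set(voice_scores)
    if not common:
        return None

    # Product rule: assumes the two score sources are independent.
    fused = {name: lip_scores[name] * voice_scores[name] for name in common}

    # Rank candidates by fused score, best first.
    ranked = sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
    best_name, best_score = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0

    # Reject rather than risk a false recognition when the evidence is
    # weak or ambiguous.
    if best_score < accept_threshold or best_score - runner_up < margin:
        return None
    return best_name


# Both modalities agree clearly on one person: accepted.
clear = fuse_and_decide(
    {"meier": 0.9, "schmidt": 0.2},
    {"meier": 0.8, "schmidt": 0.3},
)

# Weak, ambiguous evidence: rejected (None) instead of guessed.
ambiguous = fuse_and_decide(
    {"meier": 0.6, "schmidt": 0.6},
    {"meier": 0.6, "schmidt": 0.55},
)
```

A decision rule of this kind trades rejections for false recognitions: tightening the thresholds lowers the chance of a wrong identification but raises the rejection rate, which is the trade-off reflected in the reported figures.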
