Towards a multimodal meeting record
- 7 November 2002
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- Vol. 3, pp. 1593-1596
- https://doi.org/10.1109/icme.2000.871074
Abstract
Face-to-face meetings usually encompass several modalities, including speech, gesture, handwriting, and person identification. Recognition and integration of each of these modalities are important for creating an accurate record of a meeting. However, each modality presents its own recognition difficulties. Speech recognition must be speaker and domain independent, have low word error rates, and run close to real time to be useful. Gesture and handwriting recognition must be writer independent and support a wide variety of writing styles. Person identification has difficulty with segmentation in a crowded room. Furthermore, in order to produce the record automatically, we have to solve the assignment problem (who is saying what), which involves person identification and speech recognition. This paper examines a multimodal meeting room system under development at Carnegie Mellon University that enables us to track, capture, and integrate the important aspects of a meeting, from person identification to meeting transcription. Once a multimedia meeting record is created, it can be archived for later retrieval.
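To make the assignment problem concrete, here is a minimal sketch of one common way to attribute utterances to speakers: align time-stamped speech-recognizer output with time-stamped person-identification tracks by temporal overlap. This is an illustration only, not the method used in the CMU system described by the paper; the data structures, names, and the overlap heuristic are all assumptions introduced for this example.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start: float  # seconds from meeting start
    end: float
    label: str    # speaker name (ID track) or transcribed text (utterance)

def overlap(a: Segment, b: Segment) -> float:
    """Duration in seconds that two time segments overlap (0 if disjoint)."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))

def attribute_utterances(utterances, id_tracks):
    """Assign each transcribed utterance to the identified person whose
    presence track overlaps it the most; 'unknown' if no track overlaps."""
    record = []
    for utt in utterances:
        best = max(id_tracks, key=lambda t: overlap(utt, t), default=None)
        speaker = best.label if best and overlap(utt, best) > 0 else "unknown"
        record.append((utt.start, utt.end, speaker, utt.label))
    return record

# Hypothetical recognizer outputs for a two-person exchange:
utterances = [Segment(0.0, 2.5, "let's review the agenda"),
              Segment(2.7, 5.0, "I added two items")]
id_tracks = [Segment(0.0, 3.0, "Alice"), Segment(2.6, 6.0, "Bob")]

for start, end, who, text in attribute_utterances(utterances, id_tracks):
    print(f"[{start:.1f}-{end:.1f}] {who}: {text}")
```

A greedy maximum-overlap rule like this breaks down when speakers talk over one another; a real system would also need to fuse audio-based speaker cues with the visual identification the paper describes.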