Grounding language in perception

Abstract
We describe an implemented computer program that recognizes the occurrence of simple spatial motion events in simulated video input. The program receives an animated line-drawing as input and produces as output a semantic representation of the events occurring in that movie. We suggest that the notions of support, contact, and attachment are crucial to specifying many simple spatial motion event types and present a logical notation for describing classes of events that incorporates such notions as primitives. We then suggest that the truth values of such primitives can be recovered from perceptual input by a process of counterfactual simulation, predicting the effect of hypothetical changes to the world on the immediate future. Finally, we suggest that such counterfactual simulation is performed using knowledge of naive physical constraints such as substantiality, continuity, gravity, and ground plane. We describe the algorithms that incorporate these ideas in the program and illustrate the operation of the program on sample input.
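To make the counterfactual-simulation idea concrete, here is a minimal sketch of how a SUPPORT test could work in a toy 2-D block world: to ask whether object A supports object B, remove A, run a few steps of naive physics (gravity toward a ground plane plus a simple contact test), and report support if B moves in the counterfactual world. The Box representation, function names, and simulation details are illustrative assumptions, not the paper's actual representation or algorithm.

```python
# Hypothetical sketch of counterfactual simulation for the SUPPORT primitive.
# Naive physics used here: gravity (unsupported boxes fall) and a ground-plane
# constraint (nothing falls below GROUND_Y); contact is a simple rests-on test.

from dataclasses import dataclass, replace

GROUND_Y = 0.0          # ground-plane constraint
GRAVITY_STEP = 1.0      # how far an unsupported box falls per step


@dataclass(frozen=True)
class Box:
    """Axis-aligned box standing in for a figure in the line drawing."""
    name: str
    x: float   # left edge
    y: float   # bottom edge
    w: float   # width
    h: float   # height


def rests_on(upper: Box, lower: Box, eps: float = 1e-6) -> bool:
    """Contact test: upper's bottom touches lower's top with horizontal overlap."""
    vertical_contact = abs(upper.y - (lower.y + lower.h)) < eps
    horizontal_overlap = upper.x < lower.x + lower.w and lower.x < upper.x + upper.w
    return vertical_contact and horizontal_overlap


def simulate_step(world: list[Box]) -> list[Box]:
    """One step of naive physics: unsupported boxes fall toward the ground plane."""
    new_world = []
    for box in world:
        on_ground = abs(box.y - GROUND_Y) < 1e-6
        supported = on_ground or any(
            rests_on(box, other) for other in world if other.name != box.name
        )
        if supported:
            new_world.append(box)
        else:
            new_world.append(replace(box, y=max(GROUND_Y, box.y - GRAVITY_STEP)))
    return new_world


def supports(world: list[Box], supporter: str, supportee: str, steps: int = 10) -> bool:
    """Counterfactual test: does removing `supporter` cause `supportee` to move?"""
    counterfactual = [b for b in world if b.name != supporter]
    before = next(b for b in counterfactual if b.name == supportee)
    for _ in range(steps):
        counterfactual = simulate_step(counterfactual)
    after = next(b for b in counterfactual if b.name == supportee)
    return after != before


if __name__ == "__main__":
    table = Box("table", x=0.0, y=0.0, w=4.0, h=2.0)
    cup = Box("cup", x=1.0, y=2.0, w=1.0, h=1.0)
    world = [table, cup]
    print(supports(world, "table", "cup"))   # True: without the table, the cup falls
    print(supports(world, "cup", "table"))   # False: the table stands on the ground
```

The same pattern extends to the other primitives the abstract mentions: contact can be read directly off the geometry, and attachment can be tested by counterfactually perturbing one object and checking whether the other is dragged along.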
