Abstract
A conceptual framework is provided in which to think of the relationships between the three-dimensional structure of physical space and the geometric properties of a set of cameras that provide pictures from which measurements can be made. We usually think of physical space as being embedded in a three-dimensional Euclidean space, in which measurements of lengths and angles do make sense. It turns out that for artificial systems, such as robots, this is not a mandatory viewpoint and that it is sometimes sufficient to think of physical space as being embedded in an affine or even a projective space. The question then arises of how to relate these models to image measurements and to geometric properties of sets of cameras. It is shown that, in the case of two cameras, a stereo rig, the projective structure of the world can be recovered as soon as the epipolar geometry of the stereo rig is known and that this geometry is summarized by a single 3 × 3 matrix, which is called the fundamental matrix. The affine structure can then be recovered if to this information is added a projective transformation between the two images that is induced by the plane at infinity. Finally, the Euclidean structure (up to a similitude) can be recovered if to these two elements is added the knowledge of two conics (one for each camera) that are the images of the absolute conic, a circle of radius -1 in the plane at infinity. In all three cases it is shown how the three-dimensional information can be recovered directly from the images without explicit reconstruction of the scene structure. This defines a natural hierarchy of geometric structures, a set of three strata that is overlaid upon the physical world and that is shown to be recoverable by simple procedures that rely on two items, the physical space itself together with possibly, but not necessarily, some a priori information about it, and some voluntary motions of the set of cameras.

This publication has 7 references indexed in Scilit: