Stratification of three-dimensional vision: projective, affine, and metric representations

1 March 1995

journal article
Published by Optica Publishing Group in Journal of the Optical Society of America A

Vol. 12 (3), 465-484
https://doi.org/10.1364/josaa.12.000465

Abstract

A conceptual framework is provided in which to think of the relationships between the three-dimensional structure of physical space and the geometric properties of a set of cameras that provide pictures from which measurements can be made. We usually think of physical space as being embedded in a three-dimensional Euclidean space, in which measurements of lengths and angles do make sense. It turns out that for artificial systems, such as robots, this is not a mandatory viewpoint and that it is sometimes sufficient to think of physical space as being embedded in an affine or even a projective space. The question then arises of how to relate these models to image measurements and to geometric properties of sets of cameras. It is shown that, in the case of two cameras, a stereo rig, the projective structure of the world can be recovered as soon as the epipolar geometry of the stereo rig is known and that this geometry is summarized by a single 3 × 3 matrix, which is called the fundamental matrix. The affine structure can then be recovered if to this information is added a projective transformation between the two images that is induced by the plane at infinity. Finally, the Euclidean structure (up to a similitude) can be recovered if to these two elements is added the knowledge of two conics (one for each camera) that are the images of the absolute conic, a circle of radius $\sqrt{-1}$ in the plane at infinity. In all three cases it is shown how the three-dimensional information can be recovered directly from the images without explicit reconstruction of the scene structure. This defines a natural hierarchy of geometric structures, a set of three strata that is overlaid upon the physical world and that is shown to be recoverable by simple procedures that rely on two items, the physical space itself together with possibly, but not necessarily, some a priori information about it, and some voluntary motions of the set of cameras.
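
The three ingredients named in the abstract (the fundamental matrix, the homography induced by the plane at infinity, and the images of the absolute conic) can be illustrated numerically. The following is a minimal sketch, not the paper's procedure: it assumes a synthetic stereo rig with known pinhole cameras, and the intrinsics `K1`, `K2`, the rotation `R`, the translation `t`, and the helper names `skew` and `project` are all hypothetical choices made for the illustration. It uses the standard textbook relations F = [e2]x H_inf, H_inf = K2 R K1^-1, and omega = (K K^T)^-1, and only checks that they behave as the abstract describes.

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]_x, so that skew(v) @ u equals np.cross(v, u)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def project(P, X):
    """Project Nx3 world points with a 3x4 camera; rows are homogeneous pixels with last coordinate 1."""
    Xh = np.hstack([X, np.ones((X.shape[0], 1))])
    x = (P @ Xh.T).T
    return x / x[:, 2:3]

# Hypothetical stereo rig (all values are assumptions for this sketch).
K1 = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
K2 = np.array([[750.0, 0.0, 300.0], [0.0, 760.0, 250.0], [0.0, 0.0, 1.0]])
c, s = np.cos(0.1), np.sin(0.1)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
t = np.array([-0.5, 0.02, 0.05])
P1 = K1 @ np.hstack([np.eye(3), np.zeros((3, 1))])   # P1 = K1 [I | 0]
P2 = K2 @ np.hstack([R, t.reshape(3, 1)])            # P2 = K2 [R | t]

# Affine stratum: homography between the images induced by the plane at infinity.
H_inf = K2 @ R @ np.linalg.inv(K1)

# Projective stratum: fundamental matrix from the second epipole and H_inf.
e2 = K2 @ t
F = skew(e2) @ H_inf
F /= np.linalg.norm(F)  # F is only defined up to scale

# Epipolar constraint x2^T F x1 = 0 for projections of the same 3-D points.
rng = np.random.default_rng(0)
X = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], size=(20, 3))
x1, x2 = project(P1, X), project(P2, X)
epipolar_residuals = np.einsum("ij,jk,ik->i", x2, F, x1)

# Points at infinity (directions d) are transferred between images by H_inf.
d = np.array([0.2, -0.1, 1.0])
x1_inf, x2_inf = K1 @ d, K2 @ R @ d
transfer_error = np.linalg.norm(np.cross(x2_inf, H_inf @ x1_inf))

# Metric stratum: images of the absolute conic, omega = K^-T K^-1 = (K K^T)^-1.
omega1 = np.linalg.inv(K1 @ K1.T)
omega2 = np.linalg.inv(K2 @ K2.T)
H_inv = np.linalg.inv(H_inf)
omega2_transferred = H_inv.T @ omega1 @ H_inv  # H_inf maps omega1 onto omega2

# The angle between two viewing rays is measurable from image 1 alone via omega1.
d1, d2 = np.array([0.1, -0.2, 1.0]), np.array([-0.3, 0.1, 1.0])
a, b = K1 @ d1, K1 @ d2
cos_from_image = (a @ omega1 @ b) / np.sqrt((a @ omega1 @ a) * (b @ omega1 @ b))
cos_true = (d1 @ d2) / (np.linalg.norm(d1) * np.linalg.norm(d2))

print("max |x2^T F x1|:", np.max(np.abs(epipolar_residuals)))
print("H_inf transfer error:", transfer_error)
print("angle cosine (image vs. true):", cos_from_image, cos_true)
```

In this sketch each stratum adds one piece of information to the previous one: `F` alone supports projective reasoning, `H_inf` adds the affine notion of points at infinity, and `omega1`, `omega2` add angle measurements, which is the sense in which Euclidean structure is obtained up to a similitude.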

This publication has 7 references indexed in Scilit:

A theory of self-calibration of a moving camera. International Journal of Computer Vision, 1992.
Affine structure from motion. Journal of the Optical Society of America A, 1991.
Properties of essential matrices. International Journal of Imaging Systems and Technology, 1990.
Motion from point matches: Multiplicity of solutions. International Journal of Computer Vision, 1990.
Some properties of the E matrix in two-view motion estimation. Published by Institute of Electrical and Electronics Engineers (IEEE), 1989.
Adaptive changes in perceptual responses and visuomanual coordination during exposure to visual metrical distortion. Vision Research, 1985.
A computer algorithm for reconstructing a scene from two projections. Nature, 1981.