Abstract
An important problem in robot vision is the highly accurate segmentation of a television image into regions that correspond to homogeneous three-dimensional surfaces in the scene. We model an image of a homogeneous surface as a polynomial plus additive white noise. Highly accurate segmentation requires knowing the polynomials that represent the image surfaces. We present a maximum likelihood estimation approach to the unsupervised learning of these polynomials. 3-D objects of interest are assumed to be composed of patches of smooth surfaces. An image of an object is decomposed into square windows, each assumed to view a piece of one 3-D surface or, occasionally, pieces of two such surfaces. Each window is divided into small square blocks, and the data in each block are approximated by a polynomial using least squares estimation. Blocks recognized as viewing the same 3-D surface are clustered together, and a single polynomial is fit to the data in each cluster. The choice of clustering and the fitting of a polynomial to each cluster are carried out simultaneously, in such a way as to maximize the likelihood of the data. The clustering is agglomerative: each small block is initially treated as a cluster, and clustering proceeds through a sequence of stages, with a pair of clusters merged into a larger cluster at each stage. Statistical tests for the homogeneity of the data in the clusters are proposed for deciding when to stop the clustering. This maximum likelihood clustering can also be applied to polynomial model estimation for 3-D range data and to Markov random field model estimation for textured images.
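The block-fitting and merging procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions not stated in the abstract: the noise is Gaussian with one common variance, so that maximizing the likelihood reduces to minimizing the total residual sum of squares; the pair merged at each stage is the one whose merge increases that total the least; and merging stops at a fixed cluster count (`n_clusters`) rather than by the statistical homogeneity tests the paper proposes. All function and parameter names are hypothetical.

```python
import numpy as np

def design_matrix(xs, ys, degree=1):
    """Columns are the monomials x^i * y^j with i + j <= degree."""
    cols = [xs**i * ys**j
            for i in range(degree + 1)
            for j in range(degree + 1 - i)]
    return np.stack(cols, axis=1)

def rss(points, degree=1):
    """Residual sum of squares of a least-squares polynomial fit.

    `points` has one row per pixel: (x, y, intensity).
    """
    A = design_matrix(points[:, 0], points[:, 1], degree)
    coef, *_ = np.linalg.lstsq(A, points[:, 2], rcond=None)
    resid = points[:, 2] - A @ coef
    return float(resid @ resid)

def cluster_blocks(window, block=4, degree=1, n_clusters=2):
    """Greedy agglomerative clustering of the square blocks of one window."""
    h, w = window.shape
    yy, xx = np.mgrid[0:h, 0:w]
    clusters = []
    # Each small block starts as its own cluster.
    for by in range(0, h, block):
        for bx in range(0, w, block):
            sl = (slice(by, by + block), slice(bx, bx + block))
            pts = np.stack([xx[sl].ravel(), yy[sl].ravel(),
                            window[sl].ravel()], axis=1).astype(float)
            clusters.append(pts)
    costs = [rss(c, degree) for c in clusters]

    # Assumed stopping rule: a fixed number of clusters.
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                merged = np.vstack([clusters[i], clusters[j]])
                cost = rss(merged, degree)
                increase = cost - costs[i] - costs[j]
                if best is None or increase < best[0]:
                    best = (increase, i, j, merged, cost)
        _, i, j, merged, cost = best
        # Merge cluster j into cluster i (j > i, so index i stays valid).
        clusters[i], costs[i] = merged, cost
        del clusters[j], costs[j]
    return clusters, costs
```

Calling `cluster_blocks(window)` on a 2-D array of intensities returns the pixel rows of each final cluster together with the residual sum of squares of the polynomial fit to that cluster. The exhaustive pair search refits a polynomial for every candidate merge, which is acceptable for the handful of blocks in one window but would need incremental updates at larger scale.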
