Probabilistic web image gathering

10 November 2005

conference paper
Published by Association for Computing Machinery (ACM)

p. 57-64
https://doi.org/10.1145/1101826.1101838

Abstract

We propose a new method for automated large-scale gathering of Web images relevant to specified concepts. Our main goal is to build a knowledge base associated with as many concepts as possible for large-scale object recognition studies. A second goal is to support the building of more accurate text-based indexes for Web images. In our method, good-quality candidate sets of images for each keyword are gathered based on analysis of the surrounding HTML text. The gathered images are then segmented into regions, and a model for the probability distribution of regions for the concept is computed using an iterative algorithm based on previous work on statistical image annotation. The learned model is then applied to identify which images are visually relevant to the concept implied by the keyword. Implicitly, which regions of the images are relevant is also determined. Our experiments reveal that the new method performs much better than Google Image Search and a simple method based on more standard content-based image retrieval methods.
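The iterative step described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: it assumes a single diagonal Gaussian as a stand-in for the region model, a second Gaussian fitted to all candidate regions as a background model, and a handful of seed images presumed to have been ranked highly by the HTML text analysis. All function names, the `min_var` floor, and the one-dimensional toy features are hypothetical and chosen only for illustration.

```python
import math

def fit_gaussian(points, min_var):
    """Fit a diagonal Gaussian (mean, variance per dimension) to feature vectors."""
    n, dims = len(points), len(points[0])
    mean = [sum(p[d] for p in points) / n for d in range(dims)]
    var = [max(sum((p[d] - mean[d]) ** 2 for p in points) / n, min_var)
           for d in range(dims)]
    return mean, var

def log_density(x, model):
    """Log-density of one feature vector under a diagonal Gaussian."""
    mean, var = model
    return sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
               for xi, m, v in zip(x, mean, var))

def select_relevant(images, seed_ids, rounds=5, min_var=0.05):
    """Alternate between fitting a concept model and re-selecting images.

    images   : list of images, each a non-empty list of region feature vectors
    seed_ids : indices of images trusted from the text analysis (assumption)
    Returns (indices judged relevant, per-image scores from the last round).
    """
    # Background model: every region of every candidate image.
    background = fit_gaussian([r for img in images for r in img], min_var)
    # First concept model: all regions of the seed images, noise included.
    training = [r for i in seed_ids for r in images[i]]
    selected, scores = list(seed_ids), []
    for _ in range(rounds):
        concept = fit_gaussian(training, min_var)

        def ratio(region):
            # Log-likelihood ratio: concept model versus background model.
            return log_density(region, concept) - log_density(region, background)

        # An image is scored by its single most concept-like region.
        best = [max(img, key=ratio) for img in images]
        scores = [ratio(b) for b in best]
        selected = [i for i, s in enumerate(scores) if s > 0]
        if not selected:
            break
        # Refit on the best region of each selected image only.
        training = [best[i] for i in selected]
    return selected, scores

# Toy data: concept regions lie near 1.0, unrelated regions between 5 and 8.
TOY_IMAGES = [
    [[1.0], [6.0]],   # seed, contains a concept region
    [[0.9], [5.0]],   # seed, contains a concept region
    [[1.1], [7.0]],   # not a seed, but contains a concept region
    [[5.5], [7.5]],   # unrelated
    [[6.5], [8.0]],   # unrelated
]
selected, scores = select_relevant(TOY_IMAGES, seed_ids=[0, 1])
print(selected)  # [0, 1, 2] on this toy data
```

Because each image is scored by its best region, the sketch also records which region made an image relevant, mirroring the abstract's remark that relevant regions are determined implicitly. A fuller treatment would use a mixture model over real region features rather than a single Gaussian.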
