Learning Visual Representations using Images with Captions
- 1 June 2007
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- No. 10636919, p. 1-8
- https://doi.org/10.1109/cvpr.2007.383173
Abstract
Current methods for learning visual categories work well when a large amount of labeled data is available, but can run into severe difficulties when the number of labeled examples is small. When labeled data is scarce, it may be beneficial to use unlabeled data to learn an image representation that is low-dimensional but nevertheless captures the information required to discriminate between image categories. This paper describes a method for learning representations from large quantities of unlabeled images that have associated captions; the goal is to improve learning in future image classification problems. Experiments show that our method significantly outperforms (1) a fully supervised baseline model, (2) a model that ignores the captions and learns a visual representation by performing PCA on the unlabeled images alone, and (3) a model that uses the output of word classifiers trained using captions and unlabeled data. Our current work concentrates on captions as the source of meta-data, but more generally other types of meta-data could be used.
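To make comparison model (2) in the abstract concrete, the following is a minimal sketch of a PCA-only pipeline: fit PCA on unlabeled data, then classify in the low-dimensional space using only a handful of labeled examples. It is not the paper's caption-based method and does not reproduce its experiments. The synthetic 5-dimensional "images", the power-iteration PCA, the nearest-centroid classifier, and all function names (fit_pca, project, nearest_centroid, sample) are assumptions made for illustration.

```python
import random


def fit_pca(X, k, iters=100, seed=0):
    """Return (mean, top-k principal directions) of the rows of X.

    Illustrative implementation: power iteration with deflation on the
    sample covariance matrix (an assumption, chosen to avoid dependencies).
    """
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in X]
    cov = [[sum(r[i] * r[j] for r in centered) / n for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        v = [rng.gauss(0, 1) for _ in range(d)]
        start_norm = sum(x * x for x in v) ** 0.5
        v = [x / start_norm for x in v]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0:  # no variance left to explain
                break
            v = [x / norm for x in w]
        eigval = sum(v[i] * cov[i][j] * v[j]
                     for i in range(d) for j in range(d))
        components.append(v)
        # Deflate: remove the direction just found before the next pass.
        for i in range(d):
            for j in range(d):
                cov[i][j] -= eigval * v[i] * v[j]
    return mean, components


def project(x, mean, components):
    """Map a raw vector to its low-dimensional PCA coordinates."""
    return [sum((x[j] - mean[j]) * c[j] for j in range(len(x)))
            for c in components]


def nearest_centroid(train, z):
    """Predict the label whose class centroid (in PCA space) is closest to z."""
    groups = {}
    for features, label in train:
        groups.setdefault(label, []).append(features)
    centroids = {label: [sum(col) / len(fs) for col in zip(*fs)]
                 for label, fs in groups.items()}
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(z, centroids[y])))


# Synthetic stand-in for image features: two classes separated along one
# direction of a 5-dimensional space, plus small noise in every coordinate.
rng = random.Random(0)


def sample(label):
    t = (3.0 if label else -3.0) + rng.gauss(0, 0.5)
    return ([t + rng.gauss(0, 0.1), t + rng.gauss(0, 0.1)]
            + [rng.gauss(0, 0.1) for _ in range(3)])


# Step 1: learn the representation from plentiful unlabeled data.
unlabeled = [sample(i % 2) for i in range(200)]
mean, components = fit_pca(unlabeled, k=1)

# Step 2: train on only two labeled examples per class, in PCA space.
labeled = [(sample(y), y) for y in (0, 0, 1, 1)]
train = [(project(x, mean, components), y) for x, y in labeled]

# Step 3: classify new points using the learned representation.
queries = [[-3.0, -3.0, 0.0, 0.0, 0.0], [3.0, 3.0, 0.0, 0.0, 0.0]]
predictions = [nearest_centroid(train, project(q, mean, components))
               for q in queries]
print(predictions)
```

The sketch separates the two stages the abstract describes: the representation is fitted without labels, and the scarce labeled examples are used only by the classifier that operates on it. According to the abstract, the method in the paper differs in that it also exploits the captions attached to the unlabeled images.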