Aggregating local descriptors into a compact image representation
- 1 June 2010
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- No. 10636919, pp. 3304-3311
- https://doi.org/10.1109/cvpr.2010.5540039
Abstract
We address the problem of image search on a very large scale, where three constraints have to be considered jointly: the accuracy of the search, its efficiency, and the memory usage of the representation. We first propose a simple yet efficient way of aggregating local image descriptors into a vector of limited dimension, which can be viewed as a simplification of the Fisher kernel representation. We then show how to jointly optimize the dimension reduction and the indexing algorithm, so that it best preserves the quality of vector comparison. The evaluation shows that our approach significantly outperforms the state of the art: the search accuracy is comparable to the bag-of-features approach for an image representation that fits in 20 bytes. Searching a 10 million image dataset takes about 50 ms.
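The aggregation step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the residual-accumulation scheme commonly associated with this paper (VLAD): each local descriptor is assigned to its nearest centroid, the differences between descriptors and their centroid are summed per centroid, and the concatenated result is L2-normalized. The function names `nearest_centroid` and `aggregate_residuals`, the toy data, and the plain-list representation are illustrative assumptions, not the authors' implementation; the dimension reduction and indexing stages are not shown.

```python
import math


def nearest_centroid(descriptor, centroids):
    """Return the index of the centroid closest to `descriptor` (squared L2)."""
    best_index, best_dist = 0, float("inf")
    for i, centroid in enumerate(centroids):
        dist = sum((x - c) ** 2 for x, c in zip(descriptor, centroid))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


def aggregate_residuals(descriptors, centroids):
    """Aggregate local descriptors into one vector of size k * d.

    Assumption: residuals (descriptor minus assigned centroid) are summed
    per centroid, concatenated, and L2-normalized.
    """
    k = len(centroids)
    d = len(centroids[0])
    accum = [[0.0] * d for _ in range(k)]

    for desc in descriptors:
        i = nearest_centroid(desc, centroids)
        for j in range(d):
            accum[i][j] += desc[j] - centroids[i][j]

    # Concatenate the per-centroid sums, then normalize to unit length.
    flat = [value for row in accum for value in row]
    norm = math.sqrt(sum(value * value for value in flat))
    return [value / norm for value in flat] if norm > 0 else flat


# Toy example: 2 centroids, 2-dimensional descriptors (hypothetical data).
centroids = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 0.0], [0.0, 1.0], [9.0, 10.0]]
image_vector = aggregate_residuals(descriptors, centroids)
print(len(image_vector), image_vector)
```

The output length is the number of centroids times the descriptor dimension, regardless of how many local descriptors the image has. For scale, at the 20 bytes per image quoted in the abstract, 10 million images occupy about 200 MB.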
This publication has 23 references indexed in Scilit:
- Product Quantization for Nearest Neighbor Search. Published by Institute of Electrical and Electronics Engineers (IEEE), 2010
- Improving Bag-of-Features for Large Scale Image Search. International Journal of Computer Vision, 2009
- Visual Word Ambiguity. Published by Institute of Electrical and Electronics Engineers (IEEE), 2009
- Fisher Kernels on Visual Vocabularies for Image Categorization. Published by Institute of Electrical and Electronics Engineers (IEEE), 2007
- Learning Local Image Descriptors. Published by Institute of Electrical and Electronics Engineers (IEEE), 2007
- Object retrieval with large vocabularies and fast spatial matching. Published by Institute of Electrical and Electronics Engineers (IEEE), 2007
- Scalable Recognition with a Vocabulary Tree. Published by Institute of Electrical and Electronics Engineers (IEEE), 2006
- A Comparison of Affine Region Detectors. International Journal of Computer Vision, 2005
- Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision, 2004
- Video Google: a text retrieval approach to object matching in videos. Published by Institute of Electrical and Electronics Engineers (IEEE), 2003