Aggregating Local Image Descriptors into Compact Codes


13 December 2011

journal article
research article
Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Pattern Analysis and Machine Intelligence

Vol. 34 (9), 1704-1716
https://doi.org/10.1109/tpami.2011.235

Abstract

This paper addresses the problem of large-scale image search. Three constraints have to be taken into account: search accuracy, efficiency, and memory usage. We first present and evaluate different ways of aggregating local image descriptors into a vector and show that the Fisher kernel achieves better performance than the reference bag-of-visual-words approach for any given vector dimension. We then jointly optimize dimensionality reduction and indexing in order to obtain a precise vector comparison as well as a compact representation. The evaluation shows that the image representation can be reduced to a few dozen bytes while preserving high accuracy. Searching a data set of 100 million images takes about 250 ms on one processor core.
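
The idea of aggregating local descriptors into a single vector can be illustrated with a small sketch. The code below is not the paper's implementation: it uses toy random data, a hypothetical 4-word codebook, and a simplified first-order residual aggregation as a stand-in for the Fisher kernel (a full Fisher vector would use a Gaussian mixture model with soft assignments and variance terms). The 32 bytes per image in the memory estimate is an assumed value chosen to match "a few dozen bytes", not a figure taken from the abstract.

```python
import random


def nearest_centroid(descriptor, centroids):
    """Return the index of the centroid closest to the descriptor (squared L2)."""
    best_index, best_dist = 0, float("inf")
    for i, centroid in enumerate(centroids):
        dist = sum((d - c) ** 2 for d, c in zip(descriptor, centroid))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


def bag_of_visual_words(descriptors, centroids):
    """Reference approach: count how many descriptors fall on each visual word."""
    histogram = [0] * len(centroids)
    for descriptor in descriptors:
        histogram[nearest_centroid(descriptor, centroids)] += 1
    return histogram


def aggregate_residuals(descriptors, centroids):
    """Simplified stand-in for Fisher-style aggregation (an assumption of this
    sketch): sum the residuals (descriptor - centroid) per centroid,
    concatenate the sums, and L2-normalize the result."""
    dim = len(centroids[0])
    vector = [0.0] * (len(centroids) * dim)
    for descriptor in descriptors:
        k = nearest_centroid(descriptor, centroids)
        for j in range(dim):
            vector[k * dim + j] += descriptor[j] - centroids[k][j]
    norm = sum(v * v for v in vector) ** 0.5
    return [v / norm for v in vector] if norm > 0 else vector


def index_memory_bytes(num_images, bytes_per_image):
    """Raw memory needed to hold one compact code per image."""
    return num_images * bytes_per_image


# Toy data: 50 random 8-dimensional "local descriptors" and 4 centroids.
rng = random.Random(0)
centroids = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(4)]
descriptors = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(50)]

bow = bag_of_visual_words(descriptors, centroids)   # length 4
agg = aggregate_residuals(descriptors, centroids)   # length 4 * 8 = 32

# Assumed 32 bytes per image for a data set of 100 million images.
footprint = index_memory_bytes(100_000_000, 32)
print(len(bow), len(agg), footprint)
```

The bag-of-visual-words vector grows only with the codebook size, while the residual-based vector also grows with the descriptor dimension, so it carries more information per visual word. Under the assumed 32 bytes per image, the codes for 100 million images occupy 3.2 GB, which suggests why such an index can be held in the memory of a single machine.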
