Aggregating Local Image Descriptors into Compact Codes
- 13 December 2011
- journal article
- research article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Pattern Analysis and Machine Intelligence
- Vol. 34 (9), 1704-1716
- https://doi.org/10.1109/tpami.2011.235
Abstract
This paper addresses the problem of large-scale image search. Three constraints have to be taken into account: search accuracy, efficiency, and memory usage. We first present and evaluate different ways of aggregating local image descriptors into a vector and show that the Fisher kernel achieves better performance than the reference bag-of-visual-words approach for any given vector dimension. We then jointly optimize dimensionality reduction and indexing in order to obtain a precise vector comparison as well as a compact representation. The evaluation shows that the image representation can be reduced to a few dozen bytes while preserving high accuracy. Searching a 100-million-image data set takes about 250 ms on one processor core.
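The pipeline summarized above (aggregate local descriptors into one vector, then compress it for fast search) can be illustrated with a small sketch. It assumes, for illustration only, a VLAD-style aggregation (summing each local descriptor's residual to its nearest visual word, often described as a simplified, non-probabilistic relative of the Fisher vector) compared against a bag-of-visual-words histogram, and product quantization searched with asymmetric distance computation as the compact index. The PCA step is omitted, the codebooks are hand-picked rather than learned, and every name (`bovw`, `vlad`, `pq_encode`, `adc_search`) and all toy data are hypothetical; this is not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def sqdist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(x: Sequence[float], centroids: Sequence[Sequence[float]]) -> int:
    """Index of the centroid closest to x."""
    return min(range(len(centroids)), key=lambda i: sqdist(x, centroids[i]))


def l2_normalize(v: Vector) -> Vector:
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v] if n > 0 else list(v)


def bovw(descriptors: Sequence[Vector], centroids: Sequence[Vector]) -> Vector:
    """Bag-of-visual-words baseline: k-dim histogram of word counts, L2-normalized."""
    hist = [0.0] * len(centroids)
    for x in descriptors:
        hist[nearest(x, centroids)] += 1.0
    return l2_normalize(hist)


def vlad(descriptors: Sequence[Vector], centroids: Sequence[Vector]) -> Vector:
    """VLAD-style aggregation (assumed): per-word sum of residuals x - c_i, k*d dims."""
    k, d = len(centroids), len(centroids[0])
    v = [0.0] * (k * d)
    for x in descriptors:
        i = nearest(x, centroids)
        for j in range(d):
            v[i * d + j] += x[j] - centroids[i][j]
    # Signed square-root (power) normalization, then L2 normalization.
    v = [math.copysign(math.sqrt(abs(a)), a) for a in v]
    return l2_normalize(v)


def pq_encode(v: Vector, codebooks: Sequence[Sequence[Vector]]) -> List[int]:
    """Product quantization: split v into m sub-vectors, keep one index per part."""
    ds = len(v) // len(codebooks)
    return [nearest(v[s * ds:(s + 1) * ds], cb) for s, cb in enumerate(codebooks)]


def adc_search(query: Vector, codes: Sequence[List[int]],
               codebooks: Sequence[Sequence[Vector]], top: int = 1) -> List[int]:
    """Asymmetric distance computation: the query itself is never quantized."""
    ds = len(query) // len(codebooks)
    # One lookup table per sub-space: squared distance from the query
    # sub-vector to every sub-centroid, computed once per query.
    tables = [[sqdist(query[s * ds:(s + 1) * ds], c) for c in cb]
              for s, cb in enumerate(codebooks)]
    # Each database distance is then m table lookups and additions.
    scored: List[Tuple[float, int]] = [
        (sum(tables[s][c] for s, c in enumerate(code)), n)
        for n, code in enumerate(codes)
    ]
    return [n for _, n in sorted(scored)[:top]]


# Hypothetical toy data: 2-D "descriptors", k = 2 visual words.
centroids = [[0.0, 0.0], [10.0, 10.0]]
image_a = [[1.0, 0.0], [0.0, 1.0], [11.0, 10.0]]
image_b = [[-1.0, 0.0], [9.0, 10.0], [10.0, 9.0]]
query_img = [[2.0, 0.0], [0.0, 1.0], [11.0, 10.0]]

# Hand-picked sub-codebooks (normally learned by k-means), m = 2 sub-spaces.
sub = [[0.0, 0.0], [0.6, 0.6], [0.6, 0.0], [-0.6, 0.0], [-0.6, -0.6]]
codebooks = [sub, sub]

print(len(bovw(image_a, centroids)), len(vlad(image_a, centroids)))  # 2 4
codes = [pq_encode(vlad(img, centroids), codebooks) for img in (image_a, image_b)]
print(codes)  # [[1, 2], [3, 4]]
print(adc_search(vlad(query_img, centroids), codes, codebooks, top=2))  # [0, 1]
```

With k visual words and d-dimensional descriptors, the histogram has k dimensions while the residual vector has k·d, so the two should be compared at equal output dimension, as the abstract does. Each database vector is stored only as m small integer indices, and because the lookup tables are built once per query, scoring every database item costs m additions.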