Regularizing translation models for better automatic image annotation
- 13 November 2004
- proceedings article
- Published by Association for Computing Machinery (ACM)
- p. 350-359
- https://doi.org/10.1145/1031171.1031242
Abstract
The goal of automatic image annotation is to generate annotations that describe the content of images. Statistical machine translation models have previously been applied successfully to the automatic image annotation task [8]. This approach views annotating an image as translating its content from a 'visual language' into textual words. One problem with the existing translation models is that common words are usually associated with too many different image regions. As a result, uncommon words have little chance of being used to annotate images. Uncommon words are important for automatic image annotation because they are often used in queries. In this paper, we propose two modified translation models for automatic image annotation, namely the normalized translation model and the regularized translation model, that specifically address the problem of common annotated words. The basic idea is to raise the number of blobs that are associated with uncommon words. The normalized translation model realizes this by scaling the translation probabilities of different words with different factors. The regularized translation model achieves the same goal by introducing a special Dirichlet prior. An empirical study with the Corel dataset has shown that both modified translation models substantially outperform the original translation model and several existing approaches to automatic image annotation.
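The scaling idea behind the normalized translation model can be illustrated with a small sketch. The abstract does not state which scaling factors the paper uses, so the code below makes an assumption: each word's probabilities are divided by that word's total probability mass over all blobs, and each blob's column is then renormalized. The toy table, the word names, and the functions `blobs_won` and `scale_by_word` are hypothetical and are not taken from the paper.

```python
# Toy table of P(word | blob): one list entry per blob, columns sum to 1.
# Hypothetical numbers; "sky" plays the common word, "tiger" the uncommon one.
toy_table = {
    "sky":   [0.70, 0.60, 0.55],
    "tiger": [0.30, 0.40, 0.45],
}


def blobs_won(table):
    """Count, for each word, the blobs where it has the highest probability."""
    words = list(table)
    n_blobs = len(table[words[0]])
    counts = {w: 0 for w in words}
    for b in range(n_blobs):
        best = max(words, key=lambda w: table[w][b])
        counts[best] += 1
    return counts


def scale_by_word(table):
    """Scale each word by its own factor, then renormalize every blob column.

    Assumption: the factor for a word is its total probability mass over all
    blobs. The paper's actual factors are not given in the abstract.
    """
    words = list(table)
    n_blobs = len(table[words[0]])
    factors = {w: sum(table[w]) for w in words}
    scaled = {w: [p / factors[w] for p in table[w]] for w in words}
    col_sums = [sum(scaled[w][b] for w in words) for b in range(n_blobs)]
    return {w: [scaled[w][b] / col_sums[b] for b in range(n_blobs)] for w in words}


before = blobs_won(toy_table)
after = blobs_won(scale_by_word(toy_table))
print("blobs won before scaling:", before)
print("blobs won after scaling: ", after)
```

In this toy table the common word wins every blob before scaling, while the per-word scaling lets the uncommon word win some of them, which is the effect the abstract describes as raising the number of blobs associated with uncommon words.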
This publication has 7 references indexed in Scilit:
- On image auto-annotation with latent space models. Published by Association for Computing Machinery (ACM), 2003
- Automatic linguistic indexing of pictures by a statistical modeling approach. Published by Institute of Electrical and Electronics Engineers (IEEE), 2003
- Modeling annotated data. Published by Association for Computing Machinery (ACM), 2003
- Automatic image annotation and retrieval using cross-media relevance models. Published by Association for Computing Machinery (ACM), 2003
- CBSA: content-based soft annotation for multimodal image retrieval using Bayes point machines. IEEE Transactions on Circuits and Systems for Video Technology, 2003
- Title language model for information retrieval. Published by Association for Computing Machinery (ACM), 2002
- Relevance based language models. Published by Association for Computing Machinery (ACM), 2001