Multi-modal Clustering for Multimedia Collections
- 1 June 2007
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
Abstract
Most online multimedia collections, such as picture galleries or video archives, are categorized in a fully manual process, which is very expensive and may soon be infeasible given the rapid growth of multimedia repositories. In this paper, we present an effective method for automating this process within the unsupervised learning framework. We exploit the truly multi-modal nature of multimedia collections - they have multiple views, or modalities, each of which contributes its own perspective to the collection's organization. For example, in picture galleries, image captions are often provided, and these form a separate view on the collection. Color histograms (or any other set of global features) form another view. Additional views are blobs, interest points, and other sets of local features. Our model, called Comraf* (pronounced Comraf-Star), efficiently incorporates various views in multi-modal clustering, which allows great modeling flexibility. Comraf* is a light-weight version of the recently introduced combinatorial Markov random field (Comraf). We show how to translate an arbitrary Comraf into a series of Comraf* models, and give empirical evidence that the two are comparably effective. Comraf* demonstrates excellent results on two real-world image galleries: it obtains 2.5-3 times higher accuracy than uni-modal k-means.
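As a rough illustration of what a "view" means here, the sketch below derives two independent representations of the same image: a global color histogram and a bag of caption words. It is not the Comraf* model and is not taken from the paper; the function names, the quantization scheme, and the toy data are all assumptions made for illustration.

```python
from collections import Counter

def color_histogram(pixels, bins_per_channel=2):
    """Global color view: normalized histogram over quantized RGB cells.

    `pixels` is a list of (r, g, b) tuples with values in 0..255.
    The quantization scheme is an illustrative assumption.
    """
    def cell(value):
        # Map 0..255 onto bin indices 0..bins_per_channel-1.
        return value * bins_per_channel // 256

    counts = Counter((cell(r), cell(g), cell(b)) for r, g, b in pixels)
    total = len(pixels)
    return {key: n / total for key, n in counts.items()}

def caption_terms(caption):
    """Text view: lowercase bag-of-words counts for an image caption."""
    return Counter(caption.lower().split())

# Hypothetical toy image: four pixels plus a caption.
image = {
    "pixels": [(250, 10, 10), (240, 20, 5), (10, 10, 250), (255, 0, 0)],
    "caption": "Red sunset over the red sea",
}

# Each view describes the same image from a different perspective.
views = {
    "color": color_histogram(image["pixels"]),
    "caption": caption_terms(image["caption"]),
}
```

A uni-modal method such as k-means would cluster on only one of these views; a multi-modal method, as described in the abstract, uses several of them together.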