Optimal multimodal fusion for multimedia data analysis
- 10 October 2004
- proceedings article
- Published by Association for Computing Machinery (ACM)
- pp. 572–579
- https://doi.org/10.1145/1027527.1027665
Abstract
Considerable research has been devoted to utilizing multimodal features to better understand multimedia data. However, two core research issues have not yet been adequately addressed. First, given a set of features extracted from multiple media sources (e.g., from the visual, audio, and caption tracks of videos), how do we determine the best modalities? Second, once a set of modalities has been identified, how do we best fuse them to map to semantics? In this paper, we propose a two-step approach. The first step finds statistically independent modalities from raw features. In the second step, we use super-kernel fusion to determine the optimal combination of individual modalities. We carefully analyze the tradeoffs between three design factors that affect fusion performance: modality independence, curse of dimensionality, and fusion-model complexity. Through analytical and empirical studies, we demonstrate that our two-step approach, which achieves a careful balance of the three design factors, can improve class-prediction accuracy over traditional techniques.
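The abstract describes the two steps only at a high level. The sketch below illustrates the general shape of such a pipeline under our own assumptions; it is not the authors' implementation. Symmetric FastICA with a tanh nonlinearity stands in for step 1; the grouping of independent components into modalities is supplied by hand through a hypothetical `groups` argument (the paper derives the modalities itself); and kernel ridge regression replaces the SVMs, with a second kernel model trained on the per-modality scores standing in for super-kernel fusion. All function and class names (`fast_ica`, `KernelRidge`, `fuse_modalities`) are illustrative.

```python
import numpy as np


def _sym_decorrelate(W):
    """Return (W W^T)^(-1/2) W, which makes the rows of W orthonormal."""
    s, u = np.linalg.eigh(W @ W.T)
    return (u / np.sqrt(s)) @ u.T @ W


def fast_ica(X, n_components, max_iter=200, tol=1e-6, seed=0):
    """Step 1 stand-in (assumption): symmetric FastICA, tanh nonlinearity.

    X has shape (n_samples, n_features); returns estimated independent
    components of shape (n_samples, n_components) with unit covariance.
    """
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)
    # Whiten: project onto the top principal axes and rescale to unit variance.
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    top = np.argsort(eigvals)[::-1][:n_components]
    Z = Xc @ (eigvecs[:, top] / np.sqrt(eigvals[top]))
    W = _sym_decorrelate(rng.standard_normal((n_components, n_components)))
    for _ in range(max_iter):
        g = np.tanh(Z @ W.T)
        # Fixed-point update E[z g(w^T z)] - E[g'(w^T z)] w, applied to every row w.
        W_new = _sym_decorrelate(g.T @ Z / len(Z) - (1 - g**2).mean(axis=0)[:, None] * W)
        # Converged when each row keeps its direction (up to sign).
        converged = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1)) < tol
        W = W_new
        if converged:
            break
    return Z @ W.T


def rbf_kernel(A, B, gamma):
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


class KernelRidge:
    """RBF kernel ridge regressor on +/-1 labels (stands in for an SVM here)."""

    def __init__(self, gamma=0.5, lam=1e-2):
        self.gamma, self.lam = gamma, lam

    def fit(self, X, y):
        self.X = X
        K = rbf_kernel(X, X, self.gamma)
        self.alpha = np.linalg.solve(K + self.lam * np.eye(len(X)), y)
        return self

    def decision(self, X):
        return rbf_kernel(X, self.X, self.gamma) @ self.alpha


def fuse_modalities(S_train, y_train, S_test, groups):
    """Step 2 stand-in: one model per modality, then a fusion model trained
    on the per-modality scores. `groups` is a hypothetical, caller-supplied
    list of component indices for each modality."""
    half = len(y_train) // 2
    # Fit the per-modality models on the first half; score the held-out second half.
    experts = [KernelRidge().fit(S_train[:half, g], y_train[:half]) for g in groups]
    scores_fit = np.column_stack([m.decision(S_train[half:, g]) for m, g in zip(experts, groups)])
    scores_test = np.column_stack([m.decision(S_test[:, g]) for m, g in zip(experts, groups)])
    fusion = KernelRidge().fit(scores_fit, y_train[half:])
    return np.where(fusion.decision(scores_test) >= 0, 1, -1)


# Synthetic usage example: four non-Gaussian sources, linearly mixed.
rng = np.random.default_rng(1)
sources = rng.laplace(size=(300, 4))
y = np.where(sources[:, 0] + sources[:, 2] >= 0, 1, -1)
X = sources @ rng.standard_normal((4, 4))      # observed "raw features"
S = fast_ica(X, n_components=4)                # unsupervised: uses no labels
pred = fuse_modalities(S[:200], y[:200], S[200:], groups=[[0, 1], [2, 3]])
print("test accuracy:", (pred == y[200:]).mean())
```

Training the fusion model on scores from a held-out half, rather than on the samples the per-modality models were fit on, keeps it from learning the experts' overfit training scores. The number and size of the groups is where the abstract's tradeoffs show up: fewer, larger groups raise per-model dimensionality, while more groups make the fusion model larger.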