MEGA---the maximizing expected generalization algorithm for learning complex query concepts
- 1 October 2003
- journal article
- Published by Association for Computing Machinery (ACM) in ACM Transactions on Information Systems
- Vol. 21 (4) , 347-382
- https://doi.org/10.1145/944012.944014
Abstract
Specifying exact query concepts has become increasingly challenging to end-users. This is because many query concepts (e.g., those for looking up a multimedia object) can be hard to articulate, and articulation can be subjective. In this study, we propose a query-concept learner that learns query criteria through an intelligent sampling process. Our concept learner aims to fulfill two primary design objectives: (1) it has to be expressive in order to model most practical query concepts and (2) it must learn a concept quickly and with a small number of labeled data since online users tend to be too impatient to provide much feedback. To fulfill the first goal, we model query concepts in k -CNF, which can express almost all practical query concepts. To fulfill the second design goal, we propose our maximizing expected generalization algorithm (MEGA), which converges to target concepts quickly by its two complementary steps: sample selection and concept refinement. We also propose a divide-and-conquer method that divides the concept-learning task into G subtasks to achieve speedup. We notice that a task must be divided carefully, or search accuracy may suffer. Through analysis and mining results, we observe that organizing image features in a multiresolution manner, and minimizing intragroup feature correlation, can speed up query-concept learning substantially while maintaining high search accuracy. Through examples, analysis, experiments, and a prototype implementation, we show that MEGA converges to query concepts significantly faster than traditional methods.Keywords
This publication has 23 references indexed in Scilit:
- Clustering for approximate similarity search in high-dimensional spacesIEEE Transactions on Knowledge and Data Engineering, 2002
- A comparison between neural networks and decision trees based on data from industrial radiographic testingPattern Recognition Letters, 2001
- The Bayesian image retrieval system, PicHunter: theory, implementation, and psychophysical experimentsIEEE Transactions on Image Processing, 2000
- Arcing classifier (with discussion and a rejoinder by the author)The Annals of Statistics, 1998
- Supporting ranked Boolean similarity queries in MARSIEEE Transactions on Knowledge and Data Engineering, 1998
- Unsupervised texture classification using vector quantization and deterministic relaxation neural networkIEEE Transactions on Image Processing, 1997
- Learning Boolean formulasJournal of the ACM, 1994
- Efficient and effective Querying by Image ContentJournal of Intelligent Information Systems, 1994
- INTRODUCTIONPublished by Elsevier ,1990
- Fuzzy setsInformation and Control, 1965