Narrowing the semantic gap - improved text-based web document retrieval using visual features
- 7 August 2002
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Multimedia
- Vol. 4 (2) , 189-200
- https://doi.org/10.1109/tmm.2002.1017733
Abstract
We present the results of our work that seeks to negotiate the gap between low-level features and high-level concepts in the domain of web document retrieval. This work concerns a technique called latent semantic indexing (LSI), which has been used for textual information retrieval for many years. In this environment, LSI determines clusters of co-occurring keywords so that a query which uses a particular keyword can then retrieve documents that may not contain this keyword but do contain other keywords from the same cluster. In this paper, we examine the use of this technique for content-based web document retrieval, using both keywords and image features to represent the documents. Two different approaches to image feature representation, namely color histograms and color anglograms, are adopted and evaluated. Experimental results show that LSI, together with both textual and visual features, is able to extract the underlying semantic structure of web documents, thus helping to improve the retrieval performance significantly, even when querying is done using only keywords.
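The retrieval idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feature names, the toy matrix values, the two "histogram" rows standing in for visual features, the choice of k = 2, and the helper name lsi_similarities are all assumptions made for illustration. The sketch stacks keyword counts and image features into one feature-by-document matrix, truncates its singular value decomposition, and ranks documents against a keyword-only query in the reduced space.

```python
import numpy as np

# Rows are features, columns are documents d0..d3.
# All values are toy data chosen for illustration, not taken from the paper.
features = ["car", "automobile", "flower", "hist_red", "hist_green"]
A = np.array([
    [1.0, 1.0, 0.0, 0.0],   # keyword "car"
    [0.0, 1.0, 1.0, 0.0],   # keyword "automobile"
    [0.0, 0.0, 0.0, 1.0],   # keyword "flower"
    [0.9, 0.8, 0.9, 0.1],   # hypothetical color-histogram bin (red)
    [0.1, 0.2, 0.1, 0.9],   # hypothetical color-histogram bin (green)
])


def lsi_similarities(matrix, query, k):
    """Cosine similarity between a query and every document in a rank-k LSI space.

    Hypothetical helper: both the query and the document columns are
    projected onto the top-k left singular vectors before comparison.
    """
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    U_k = U[:, :k]                      # top-k latent directions
    doc_coords = U_k.T @ matrix         # shape (k, n_docs)
    query_coords = U_k.T @ query        # shape (k,)
    numerator = query_coords @ doc_coords
    denominator = np.linalg.norm(query_coords) * np.linalg.norm(doc_coords, axis=0)
    return numerator / denominator


# Keyword-only query: just the term "car", with no visual features set.
query = np.zeros(len(features))
query[features.index("car")] = 1.0

sims = lsi_similarities(A, query, k=2)
for j, score in enumerate(sims):
    print(f"d{j}: {score:.3f}")
```

In this toy setup, document d2 never contains the keyword "car", yet it shares the keyword "automobile" with d1 and has a color profile similar to that of d0 and d1, so it should score well above the unrelated document d3 for the query "car".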