A survey in indexing and searching XML documents
- 1 January 2002
- journal article
- research article
- Published by Wiley in Journal of the American Society for Information Science and Technology
- Vol. 53 (6), 415-437
- https://doi.org/10.1002/asi.10056
Abstract
XML holds the promise to yield (1) a more precise search by providing additional information in the elements, (2) a better integrated search of documents from heterogeneous sources, (3) a powerful search paradigm using structural as well as content specifications, and (4) data and information exchange to share resources and to support cooperative search. We survey several indexing techniques for XML documents, grouping them into flat‐file, semistructured, and structured indexing paradigms. Searching techniques and supporting techniques for searching are reviewed, including full text search and multistage search. Because searching XML documents can be very flexible, various search result presentations are discussed, as well as database and information retrieval system integration and XML query languages. We also survey various retrieval models, examining how they would be used or extended for retrieving XML documents. To conclude the article, we discuss various open issues that XML poses with respect to information retrieval and database research.
This publication has 93 references indexed in Scilit:
- XML and information retrieval. ACM SIGMOD Record, 2001
- Comparative analysis of five XML query languages. ACM SIGMOD Record, 2000
- On views and XML. ACM SIGMOD Record, 1999
- Quasi-cubes. ACM SIGMOD Record, 1997
- Querying documents in object databases. International Journal on Digital Libraries, 1997
- Integrating contents and structure in text retrieval. ACM SIGMOD Record, 1996
- An Algebra for Structured Text Search and a Framework for its Implementation. The Computer Journal, 1995
- Extending the database relational model to capture more meaning. ACM Transactions on Database Systems, 1979
- Multidimensional Binary Search Trees in Database Applications. IEEE Transactions on Software Engineering, 1979
- Analysis and performance of inverted data base structures. Communications of the ACM, 1975