Articulating information needs in XML query languages
- 1 October 2006
- journal article
- Published by Association for Computing Machinery (ACM) in ACM Transactions on Information Systems
- Vol. 24 (4), 407-436
- https://doi.org/10.1145/1185877.1185879
Abstract
Document-centric XML is a mixture of text and structure. With the increased availability of document-centric XML documents comes a need for query facilities in which both structural constraints and constraints on the content of the documents can be expressed. How does the expressiveness of languages for querying XML documents help users to express their information needs? We address this question from both an experimental and a theoretical point of view. Our experimental analysis compares a structure-ignorant with a structure-aware retrieval approach using the test suite of the INEX XML Retrieval Evaluation Initiative. Theoretically, we create two mathematical models of users' knowledge of a set of documents and define query languages which exactly fit these models. One of these languages corresponds to an XML version of fielded search, the other to the INEX query language.
Our main experimental findings are: First, while structure is used in varying degrees of complexity, two-thirds of the queries can be expressed in a fielded-search-like format which does not use the hierarchical structure of the documents. Second, three-quarters of the queries use constraints on the context of the elements to be returned; these contextual constraints cannot be captured by ordinary keyword queries. Third, structure is used as a search hint, and not as a strict requirement, when judged against the underlying information need. Fourth, the use of structure in queries functions as a precision enhancing device.