Colorful XML
- 13 June 2004
- proceedings article
- Published by Association for Computing Machinery (ACM)
- p. 251-262
- https://doi.org/10.1145/1007568.1007598
Abstract
XML has a tree-structured data model, which is used to uniformly represent structured as well as semi-structured data, and which also enables concise query specification in XQuery, via the use of its XPath (twig) patterns. This in turn can leverage the recently developed technology of structural join algorithms to evaluate the query efficiently. In this paper, we identify a fundamental tension in XML data modeling: (i) data represented as deep trees (which can make effective use of twig patterns) are often un-normalized, leading to update anomalies, while (ii) normalized data tends to be shallow, resulting in heavy use of expensive value-based joins in queries.
Our solution to this data modeling problem is a novel multi-colored trees (MCT) logical data model, which is an evolutionary extension of the XML data model, and permits trees with multi-colored nodes to signify their participation in multiple hierarchies. This adds significant semantic structure to individual data nodes. We extend XQuery expressions to navigate between structurally related nodes, taking color into account, and also to create new colored trees as restructurings of an MCT database. While MCT serves as a significant evolutionary extension to XML as a logical data model, one of the key roles of XML is for information exchange. To enable exchange of MCT information, we develop algorithms for optimally serializing an MCT database as XML. We discuss alternative physical representations for MCT databases, using relational and native XML databases, and describe an implementation on top of the Timber native XML database. Experimental evaluation, using our prototype implementation, shows that not only are MCT queries/updates more succinct and easier to express than equivalent shallow tree XML queries, but they can also be significantly more efficient to evaluate than equivalent deep and shallow tree XML queries/updates.
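The idea of a node participating in several hierarchies can be illustrated with a minimal Python sketch. It is not the paper's implementation or its XQuery extension: the class `MCTNode`, the helpers `add_child`, `ancestors` and `descendants`, the colors "red" and "green", and the movie data are all hypothetical names chosen for illustration. The sketch assumes each node has at most one parent per color, so that every color induces an ordinary tree.

```python
from collections import defaultdict


class MCTNode:
    """A data node that may belong to several colored hierarchies (hypothetical)."""

    def __init__(self, label, value=None):
        self.label = label
        self.value = value
        self.parent = {}                    # color -> parent node in that hierarchy
        self.children = defaultdict(list)   # color -> ordered children in that hierarchy

    def colors(self):
        """Colors of all hierarchies this node takes part in."""
        used = set(self.parent)
        used.update(c for c, kids in self.children.items() if kids)
        return used


def add_child(parent, child, color):
    """Attach child under parent in the hierarchy of the given color."""
    if color in child.parent:
        raise ValueError(f"node already has a {color} parent")
    child.parent[color] = parent
    parent.children[color].append(child)


def ancestors(node, color):
    """Yield ancestors of node, nearest first, following only one color."""
    current = node.parent.get(color)
    while current is not None:
        yield current
        current = current.parent.get(color)


def descendants(node, color):
    """Yield descendants of node in document order, following only one color."""
    for child in node.children.get(color, []):
        yield child
        yield from descendants(child, color)


# Illustrative data: one movie node shared by a genre tree and an award tree.
genres = MCTNode("genres")
comedy = MCTNode("genre", "Comedy")
awards = MCTNode("awards")
best_picture = MCTNode("award", "Best Picture")
movie = MCTNode("movie", "Example Movie")

add_child(genres, comedy, "red")
add_child(comedy, movie, "red")
add_child(awards, best_picture, "green")
add_child(best_picture, movie, "green")

# Cross-color navigation: awards won by comedies, found structurally
# (descend in red, then ascend in green) instead of joining on values.
awards_for_comedies = [
    a.value
    for m in descendants(comedy, "red") if m.label == "movie"
    for a in ancestors(m, "green") if a.label == "award"
]
```

Because the movie is stored once and linked into both hierarchies, an update touches a single node, while each colored tree can still be deep enough for path-style navigation.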