Archiving scientific data
- 3 June 2002
- proceedings article
- Published by Association for Computing Machinery (ACM)
Abstract
We present an archiving technique for hierarchical data with key structure. Our approach is based on the notion of timestamps whereby an element appearing in multiple versions of the database is stored only once along with a compact description of versions in which it appears. The basic idea of timestamping was discovered by Driscoll et al. in the context of persistent data structures where one wishes to track the sequences of changes made to a data structure. We extend this idea to develop an archiving tool for XML data that is capable of providing meaningful change descriptions and can also efficiently support a variety of basic functions concerning the evolution of data such as retrieval of any specific version from the archive and querying the temporal history of any element. This is in contrast to diff-based approaches where such operations may require undoing a large number of changes or significant reasoning with the deltas. Surprisingly, our archiving technique does not incur any significant space overhead when contrasted with other approaches. Our experimental results support this and also show that the compacted archive file interacts well with other compression techniques. Finally, another useful property of our approach is that the resulting archive is also in XML and hence can directly leverage existing XML tools.
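The timestamping idea above can be illustrated with a small sketch. It assumes a hypothetical version format (nested Python dicts in which each element is addressed by its key, with strings as leaf text) and assumes that the "compact description of versions" is a sorted list of closed version-number intervals. `Node`, `merge`, `retrieve`, and `history` are illustrative names, not the interface of the authors' XML archiving tool. Each version is merged into a single tree, so an element that persists across versions is stored once and only its interval list grows.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Union

# Hypothetical version format: nested dicts keyed by each element's key;
# a string is the text content of a leaf element.
Version = dict[str, Union["Version", str]]
Intervals = list[list[int]]  # sorted, disjoint [start, end] version ranges (assumed encoding)


def add_time(intervals: Intervals, t: int) -> None:
    """Record version t; assumes versions are merged in increasing order."""
    if intervals and intervals[-1][1] >= t - 1:
        intervals[-1][1] = max(intervals[-1][1], t)  # extend the last range
    else:
        intervals.append([t, t])  # gap since the last range: start a new one


def contains(intervals: Intervals, t: int) -> bool:
    return any(start <= t <= end for start, end in intervals)


@dataclass
class Node:
    times: Intervals = field(default_factory=list)  # versions containing this element
    children: dict[str, Node] = field(default_factory=dict)  # keyed subelements
    values: dict[str, Intervals] = field(default_factory=dict)  # leaf text -> versions


def merge(node: Node, data: Version | str, t: int) -> None:
    """Merge version t into the archive, storing each keyed element only once."""
    add_time(node.times, t)
    if isinstance(data, str):
        add_time(node.values.setdefault(data, []), t)
        return
    for key, child in data.items():
        merge(node.children.setdefault(key, Node()), child, t)


def retrieve(node: Node, t: int) -> Version | str:
    """Rebuild version t by keeping only nodes whose timestamps contain t."""
    for value, intervals in node.values.items():
        if contains(intervals, t):
            return value
    return {k: retrieve(c, t) for k, c in node.children.items() if contains(c.times, t)}


def history(root: Node, path: list[str]) -> Intervals:
    """Temporal history of the element reached by following a key path."""
    node = root
    for key in path:
        node = node.children[key]
    return node.times


# Usage: three versions of a toy keyed dataset.
versions = [
    {"P1": {"name": "kinase", "seq": "MKT"}},
    {"P1": {"name": "kinase", "seq": "MKV"}, "P2": {"name": "lyase"}},
    {"P1": {"name": "kinase", "seq": "MKV"}},
]
archive = Node()
for t, version in enumerate(versions, start=1):
    merge(archive, version, t)

print(history(archive, ["P2"]))  # [[2, 2]]
print(archive.children["P1"].children["seq"].values)  # {'MKT': [[1, 1]], 'MKV': [[2, 3]]}
print(retrieve(archive, 1) == versions[0])  # True
```

In this sketch, retrieving version t is a single traversal that keeps the nodes whose intervals contain t, rather than a replay of a chain of deltas. It ignores how keys are specified for real XML data and how the archive itself is serialized as XML.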
References (10 indexed in Scilit):
- Keys for XML. Association for Computing Machinery (ACM), 2001.
- XMill. Association for Computing Machinery (ACM), 2000.
- The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Research, 2000.
- Meaningful change detection in structured data. Association for Computing Machinery (ACM), 1997.
- Change detection in hierarchically structured information. Association for Computing Machinery (ACM), 1996.
- Fast algorithms for the unit cost editing distance between trees. Journal of Algorithms, 1990.
- Simple Fast Algorithms for the Editing Distance between Trees and Related Problems. SIAM Journal on Computing, 1989.
- Making data structures persistent. Journal of Computer and System Sciences, 1989.
- An O(ND) difference algorithm and its variations. Algorithmica, 1986.
- A file comparison program. Software: Practice and Experience, 1985.