A database interface for file update
- 22 May 1995
- journal article
- Published by Association for Computing Machinery (ACM) in ACM SIGMOD Record
- Vol. 24 (2), 386-397
- https://doi.org/10.1145/568271.223854
Abstract
Database systems are concerned with structured data. Unfortunately, data is still often available in an unstructured manner (e.g., in files) even when it does have a strong internal structure (e.g., electronic documents or programs). In a previous paper [2], we focused on the use of high-level query languages to access such files and developed optimization techniques to do so. In this paper, we consider how structured data stored in files can be updated using database update languages.
The interest of using database languages to manipulate files is twofold. First, it opens database systems to external data. This concerns data residing in files or data transiting on communication channels and possibly coming from other databases [2]. Second, it provides high-level query/update facilities to systems that usually rely on very primitive linguistic support. (See [6] for recent work in this direction.) Similar motivations appear in [4, 5, 7, 8, 11, 12, 13, 14, 15, 17, 19, 20, 21].
In a previous paper, we introduced the notion of structuring schemas as a means of providing a database view on structured data residing in a file. A structuring schema consists of a grammar together with semantic actions (in a database language). We also showed how queries on files expressed in a high-level query language (O2-SQL [3]) could be evaluated efficiently using variations of standard database optimization techniques. The problem of update was mentioned there but remained largely unexplored. This is the topic of the present paper.
We argue that updates on files can be expressed conveniently using high-level database update languages that work on the database view of the file. The key problem is how to propagate an update specified on the database (here a view) to the file (here the physical storage).
As a first step, we propose a naive way of update propagation: the database view of the file is materialized; the update is performed on the database; the database is "unparsed" to produce an updated file. For this, we develop an unparsing technique. The problems that we meet while developing this technique are related to the well-known view update problem. (See, for instance, [9, 10, 16, 23].) The technique relies on the existence of an inverse mapping from the database to the file. We show that the existence of such an inverse mapping results from the use of restricted structuring schemas.
The naive technique presents two major drawbacks. It is inefficient: it entails intense data construction and unparsing, most of which deals with data not involved in the update. It may result in information loss: information in the file that is not recorded in the database may be lost in the process.
The major contribution of this paper is a combination of techniques that makes it possible to minimize both the data construction and the unparsing work. First, we briefly show how optimization techniques from [2] can be used to focus on the relevant portion of the database and to avoid constructing the entire database. Then we show that for a class of structuring schemas satisfying a locality condition, it is possible to carefully circumscribe the unparsing.
Some of the results in the paper are negative. They should not come as a surprise since we are dealing with complex theoretical foundations: language theory (for parsing and unparsing), and first-order logic (for database languages). However, we do present positive results for particular classes of structuring schemas. We believe that the restrictions imposed on these schemas are very acceptable in practice. (For instance, all "real" examples of structuring schemas that we examined are local.)
The paper is organized as follows. In Section 2, we present the update problem and the structuring schemas; in Section 3, a naive technique for update propagation and the unparsing technique. Section 4 introduces a locality condition, and presents a more efficient technique for propagating updates in local structuring schemas. The last section is a conclusion.
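The naive propagation technique described in the abstract (materialize the view, update it, unparse it) can be illustrated with a minimal Python sketch. This is not the paper's method or notation: the `name: phone` file format, the `#` comment convention, and the function names `parse`, `update_phone`, `unparse`, and `naive_propagate` are all hypothetical, chosen only to make the three steps and the information-loss drawback concrete.

```python
from typing import Dict, List

Record = Dict[str, str]


def parse(text: str) -> List[Record]:
    """Materialize a database view: one record per 'name: phone' line.

    Stands in for a structuring schema (grammar plus semantic actions).
    Comment lines are skipped, so they are NOT recorded in the view.
    """
    view: List[Record] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, phone = line.split(":", 1)
        view.append({"name": name.strip(), "phone": phone.strip()})
    return view


def update_phone(view: List[Record], name: str, new_phone: str) -> None:
    """Perform the update on the database view, not on the file."""
    for rec in view:
        if rec["name"] == name:
            rec["phone"] = new_phone


def unparse(view: List[Record]) -> str:
    """Inverse mapping from the view back to file text."""
    return "".join(f"{rec['name']}: {rec['phone']}\n" for rec in view)


def naive_propagate(text: str, name: str, new_phone: str) -> str:
    """Parse the whole file, update the view, unparse the whole view."""
    view = parse(text)
    update_phone(view, name, new_phone)
    return unparse(view)


original = "# office directory\nalice: 100\nbob: 200\n"
updated = naive_propagate(original, "bob", "250")
print(updated)
```

In this toy setting both drawbacks named in the abstract show up: every record is parsed and unparsed even though only one is changed, and the comment line, which the view does not record, is missing from the regenerated file.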
This publication has 8 references indexed in Scilit:
- Optimizing queries on files. Published by Association for Computing Machinery (ACM), 1994
- From structured documents to novel query facilities. Published by Association for Computing Machinery (ACM), 1994
- Internet resource discovery at the University of Colorado. Computer, 1993
- Using collaborative filtering to weave an information tapestry. Communications of the ACM, 1992
- The Datacycle architecture. Communications of the ACM, 1992
- Algorithms for translating view updates to database updates for views involving selections, projections, and joins. Published by Association for Computing Machinery (ACM), 1985
- Updates of Relational Views. Journal of the ACM, 1984
- On the correct translation of update operations on relational views. ACM Transactions on Database Systems, 1982