Extending RDBMSs To Support Sparse Datasets Using An Interpreted Attribute Storage Format
- 1 January 2006
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- No. 10636382, p. 58
- https://doi.org/10.1109/icde.2006.67
Abstract
"Sparse" data, in which relations have many attributes that are null for most tuples, presents a challenge for relational database management systems. If one uses the normal "horizontal" schema to store such data sets in any of the three leading commercial RDBMS, the result is tables that occupy vast amounts of storage, most of which is devoted to nulls. If one attempts to avoid this storage blowup by using a "vertical" schema, the storage utilization is indeed better, but query performance is orders of magnitude slower for certain classes of queries. In this paper, we argue that the proper way to handle sparse data is not to use a vertical schema, but rather to extend the RDBMS tuple storage format to allow the representation of sparse attributes as interpreted fields. The addition of interpreted storage allows for efficient and transparent querying of sparse data, uniform access to all attributes, and schema scalability. We show, through an implementation in PostgreSQL, that the interpreted storage approach dominates in query efficiency and ease-of-use over the current horizontal storage and vertical schema approaches over a wide range of queries and sparse data sets.
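The three layouts contrasted in the abstract can be illustrated with a small in-memory sketch. This is only an illustration under stated assumptions: the attribute names, the sample rows, and the helper functions are hypothetical, and the abstract does not describe the paper's actual on-disk format or its PostgreSQL implementation. Here an "interpreted" record is modeled simply as a per-tuple list of (attribute id, value) pairs holding only the non-null attributes.

```python
# Hypothetical sparse relation: six attributes, most of them null in any tuple.
ATTRIBUTES = ["name", "color", "voltage", "isbn", "screen_size", "weight"]

# Hypothetical sample data; each dict lists only the non-null attributes.
ROWS = [
    {"name": "lamp", "voltage": 120},
    {"name": "novel", "isbn": "0-00-000000-0"},
    {"name": "monitor", "screen_size": 27, "weight": 5.2},
]


def horizontal(rows):
    """Horizontal schema: one slot per attribute per tuple, nulls included."""
    return [[row.get(attr) for attr in ATTRIBUTES] for row in rows]


def vertical(rows):
    """Vertical schema: one (oid, attribute, value) triple per non-null value."""
    return [(oid, attr, value)
            for oid, row in enumerate(rows)
            for attr, value in row.items()]


def interpreted(rows):
    """Interpreted sketch: one record per tuple, storing only
    (attribute_id, value) pairs for the non-null attributes."""
    return [[(ATTRIBUTES.index(attr), value) for attr, value in row.items()]
            for row in rows]


def get_interpreted(record, attribute):
    """Scan a single interpreted record; an absent attribute reads as null."""
    target = ATTRIBUTES.index(attribute)
    for attr_id, value in record:
        if attr_id == target:
            return value
    return None


if __name__ == "__main__":
    h = horizontal(ROWS)
    null_slots = sum(value is None for row in h for value in row)
    print("horizontal slots:", sum(len(row) for row in h), "nulls:", null_slots)
    print("vertical triples:", len(vertical(ROWS)))
    records = interpreted(ROWS)
    print("interpreted pairs:", sum(len(r) for r in records))
    print("voltage of tuple 0:", get_interpreted(records[0], "voltage"))
```

In this sketch the horizontal layout spends most of its slots on nulls, while the vertical layout stores only non-null values but scatters each tuple across several triples, so reassembling a tuple requires grouping by object id. The interpreted record keeps the non-null values of one tuple together, which is the property the abstract credits for uniform access to all attributes.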