Sort vs. hash revisited
- 1 January 1994
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Knowledge and Data Engineering
- Vol. 6 (6) , 934-944
- https://doi.org/10.1109/69.334883
Abstract
Efficient algorithms for processing large volumes of data are very important both for relational and new object-oriented database systems. Many query-processing operations can be implemented using sort- or hash-based algorithms, e.g. intersections, joins, and duplicate elimination. In the early relational database systems, only sort-based algorithms were employed. In the last decade, hash-based algorithms have gained acceptance and popularity, and are often considered generally superior to sort-based algorithms such as merge-join. In this article, we compare the concepts behind sort- and hash-based query-processing algorithms and conclude that (1) many dualities exist between the two types of algorithms, (2) their costs differ mostly by percentages rather than by factors, (3) several special cases exist that favor one or the other choice, and (4) there is a strong reason why both hash- and sort-based algorithms should be available in a query-processing system. Our conclusions are supported by experiments performed using the Volcano query execution engineKeywords
This publication has 26 references indexed in Scilit:
- Volcano-an extensible and parallel query evaluation systemIEEE Transactions on Knowledge and Data Engineering, 1994
- Tuning a parallel database algorithm on a shared‐memory multiprocessorSoftware: Practice and Experience, 1992
- The input/output complexity of sorting and related problemsCommunications of the ACM, 1988
- Sorting large files on a backend multiprocessorIEEE Transactions on Computers, 1988
- Join processing in database systems with large main memoriesACM Transactions on Database Systems, 1986
- Design and implementation of the wisconsin storage systemSoftware: Practice and Experience, 1985
- Duplicate record elimination in large data filesACM Transactions on Database Systems, 1983
- Extendible hashing—a fast access method for dynamic filesACM Transactions on Database Systems, 1979
- The design and implementation of INGRESACM Transactions on Database Systems, 1976
- System RACM Transactions on Database Systems, 1976