Sort vs. hash revisited

1 January 1994

journal article
Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Knowledge and Data Engineering

Vol. 6 (6) , 934-944
https://doi.org/10.1109/69.334883

Abstract

Efficient algorithms for processing large volumes of data are very important both for relational and new object-oriented database systems. Many query-processing operations can be implemented using sort- or hash-based algorithms, e.g. intersections, joins, and duplicate elimination. In the early relational database systems, only sort-based algorithms were employed. In the last decade, hash-based algorithms have gained acceptance and popularity, and are often considered generally superior to sort-based algorithms such as merge-join. In this article, we compare the concepts behind sort- and hash-based query-processing algorithms and conclude that (1) many dualities exist between the two types of algorithms, (2) their costs differ mostly by percentages rather than by factors, (3) several special cases exist that favor one or the other choice, and (4) there is a strong reason why both hash- and sort-based algorithms should be available in a query-processing system. Our conclusions are supported by experiments performed using the Volcano query execution engine

Keywords

This publication has 26 references indexed in Scilit:

Volcano-an extensible and parallel query evaluation system
IEEE Transactions on Knowledge and Data Engineering, 1994
Tuning a parallel database algorithm on a shared‐memory multiprocessor
Software: Practice and Experience, 1992
The input/output complexity of sorting and related problems
Communications of the ACM, 1988
Sorting large files on a backend multiprocessor
IEEE Transactions on Computers, 1988
Join processing in database systems with large main memories
ACM Transactions on Database Systems, 1986
Design and implementation of the wisconsin storage system
Software: Practice and Experience, 1985
Duplicate record elimination in large data files
ACM Transactions on Database Systems, 1983
Extendible hashing—a fast access method for dynamic files
ACM Transactions on Database Systems, 1979
The design and implementation of INGRES
ACM Transactions on Database Systems, 1976
System R
ACM Transactions on Database Systems, 1976