Effectiveness of parallel joins
- 1 December 1990
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Knowledge and Data Engineering
- Vol. 2 (4) , 410-424
- https://doi.org/10.1109/69.63253
Abstract
The effectiveness of parallel processing of relational join operations is examined. The skew in the distribution of join attribute values and the stochastic nature of the task processing times are identified as the major factors that can affect the effective exploitation of parallelism. Expressions for the execution time of parallel hash join and semijoin are derived and their effectiveness analyzed. When many small processors are used in the parallel architecture, the skew can result in some processors becoming sources of bottleneck while other processors are being underutilized. Even in the absence of skew, the variations in the processing times of the parallel tasks belonging to a query can lead to high task synchronization delay and impact the maximum speedup achievable through parallel execution. For example, when the task processing time on each processor is exponential with the same mean, the speedup is proportional to P/ln(P) where P is the number of processors. Other factors such as memory size, communication bandwidth, etc., can lead to even lower speedup. These are quantified using analytical models.<>Keywords
This publication has 18 references indexed in Scilit:
- A performance comparison of multimicro and mainframe database architecturesIEEE Transactions on Software Engineering, 1988
- Tradeoffs between coupling small and large processors for transaction processingIEEE Transactions on Computers, 1988
- Join processing in database systems with large main memoriesACM Transactions on Database Systems, 1986
- A simple construction of an upper bound for the mean of the maximum of n identically distributed random variablesJournal of Applied Probability, 1985
- Performance Evaluation of a Database System in Multiple Backend ConfigurationsPublished by Springer Nature ,1985
- Parallel Operation of Magnetic Disk Storage Devices: Synchronized Disk InterleavingPublished by Springer Nature ,1985
- The Genesis of a Database ComputerComputer, 1984
- Join and Semijoin Algorithms for a Multiprocessor Database MachineACM Transactions on Database Systems, 1984
- Performance Modeling of the DBMAC ArchitecturePublished by Springer Nature ,1983
- Access path selection in a relational database management systemPublished by Association for Computing Machinery (ACM) ,1979