Data placement in Bubba
- 1 June 1988
- journal article
- Published by Association for Computing Machinery (ACM) in ACM SIGMOD Record
- Vol. 17 (3), 99-108
- https://doi.org/10.1145/971701.50213
Abstract
This paper examines the problem of data placement in Bubba, a highly-parallel system for data-intensive applications being developed at MCC. “Highly-parallel” implies that load balancing is a critical performance issue. “Data-intensive” means data is so large that operations should be executed where the data resides. As a result, data placement becomes a critical performance issue. In general, determining the optimal placement of data across processing nodes for performance is a difficult problem. We describe our heuristic approach to solving the data placement problem in Bubba. We then present experimental results using a specific workload to provide insight into the problem. Several researchers have argued the benefits of declustering (i.e., spreading each base relation over many nodes). We show that as declustering is increased, load balancing continues to improve. However, for transactions involving complex joins, further declustering reduces throughput because of communications, startup and termination overhead. We argue that data placement, especially declustering, in a highly-parallel system must be considered early in the design, so that mechanisms can be included for supporting variable declustering, for minimizing the most significant overheads associated with large-scale declustering, and for gathering the required statistics.
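The load-balancing half of the declustering argument can be illustrated with a toy model. The sketch below is not Bubba's placement heuristic. It assumes each relation has a single access-frequency number (called `heat` here), that a relation's heat divides evenly over the nodes holding it, and that relations are laid out on consecutive nodes in round-robin order. The names `place` and `imbalance` and the sample heats are hypothetical.

```python
def place(heats, num_nodes, degree):
    """Spread each relation's heat evenly over `degree` consecutive nodes.

    Illustrative assumption: relations are laid out one after another,
    wrapping around the node list. This is not the paper's heuristic.
    """
    if not 1 <= degree <= num_nodes:
        raise ValueError("degree must be between 1 and num_nodes")
    load = [0.0] * num_nodes
    start = 0
    for heat in heats:
        for i in range(degree):
            load[(start + i) % num_nodes] += heat / degree
        start = (start + degree) % num_nodes
    return load


def imbalance(load):
    """Ratio of the busiest node's load to the mean load (1.0 is perfect)."""
    mean = sum(load) / len(load)
    return max(load) / mean


if __name__ == "__main__":
    heats = [8, 4, 2, 2]  # hypothetical access frequencies for four relations
    for degree in (1, 2, 4):
        print(degree, imbalance(place(heats, num_nodes=4, degree=degree)))
```

In this toy setting the imbalance ratio falls as the degree of declustering rises. The model deliberately ignores the communication, startup and termination overheads that the abstract identifies as the cost of further declustering, so it says nothing about throughput.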