Data placement in Bubba

1 June 1988

journal article
Published by Association for Computing Machinery (ACM) in ACM SIGMOD Record

Vol. 17 (3), 99-108
https://doi.org/10.1145/971701.50213

Abstract

This paper examines the problem of data placement in Bubba, a highly-parallel system for data-intensive applications being developed at MCC. “Highly-parallel” implies that load balancing is a critical performance issue. “Data-intensive” means data is so large that operations should be executed where the data resides. As a result, data placement becomes a critical performance issue. In general, determining the optimal placement of data across processing nodes for performance is a difficult problem. We describe our heuristic approach to solving the data placement problem in Bubba. We then present experimental results using a specific workload to provide insight into the problem. Several researchers have argued the benefits of declustering (i.e., spreading each base relation over many nodes). We show that as declustering is increased, load balancing continues to improve. However, for transactions involving complex joins, further declustering reduces throughput because of communications, startup and termination overhead. We argue that data placement, especially declustering, in a highly-parallel system must be considered early in the design, so that mechanisms can be included for supporting variable declustering, for minimizing the most significant overheads associated with large-scale declustering, and for gathering the required statistics.
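The trade-off the abstract describes can be illustrated with a deliberately simple toy model. The sketch below is not the paper's heuristic or its experimental setup. It assumes, purely for illustration, that the cost of one transaction is its work divided evenly over the nodes holding the relation, plus a fixed communication, startup and termination overhead per participating node. The names `transaction_cost`, `best_degree`, `work`, `degree` and `per_node_overhead` are all hypothetical.

```python
def transaction_cost(work, degree, per_node_overhead):
    """Toy cost of one transaction (an assumed model, not taken from the paper).

    work / degree              : the work is split evenly over `degree` nodes
    per_node_overhead * degree : fixed communication, startup and termination
                                 cost paid once per participating node
    """
    return work / degree + per_node_overhead * degree


def best_degree(work, per_node_overhead, max_nodes):
    """Return the degree of declustering in 1..max_nodes with the lowest toy cost."""
    return min(
        range(1, max_nodes + 1),
        key=lambda degree: transaction_cost(work, degree, per_node_overhead),
    )


if __name__ == "__main__":
    # Hypothetical numbers chosen only to make the shape of the curve visible.
    for degree in (1, 5, 10, 20, 64):
        print(degree, transaction_cost(1000, degree, 10))
    print("best degree:", best_degree(1000, 10, 64))
```

Under these assumptions the cost falls as declustering increases, reaches a minimum, and then rises again as per-node overhead dominates. With zero overhead the model always prefers the maximum number of nodes, which is why the size of that overhead decides how far declustering should go.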
