Resource-Aware Scientific Computation on a Heterogeneous Cluster

Abstract
Although researchers can develop software on small, local clusters and move it later to larger clusters and supercomputers, the software must run efficiently in both environments. Two efforts aim to improve the efficiency of scientific computation on clusters through resource-aware dynamic load balancing. The popularity of cost-effective clusters built from commodity hardware has opened up a new platform for the execution of software originally designed for tightly coupled supercomputers. Because these clusters can be built to include any number of processors ranging from fewer than 10 to thousands, researchers in high-performance scientific computation at smaller institutions or in smaller departments can maintain local parallel computing resources to support software development and testing, then move the software to larger clusters and supercomputers. As promising as this ability is, it has also led to the need for local expertise and resources to set up and maintain these clusters. The software must execute efficiently both on smaller local clusters and on larger ones. These computing environments vary in the number of processors, speed of processing and communication resources, and size and speed of memory throughout the memory hierarchy as well as in the availability of support tools and preferred programming paradigms. Software developed and optimized using a particular computing environment might not be as efficient when it's moved to another one. In this article, we describe a small cluster along with two efforts to improve the efficiency of parallel scientific computation on that cluster. Both approaches modify the dynamic load-balancing step of an adaptive solution procedure to tailor the distribution of data across the cooperating processes. This modification helps account for the heterogeneity and hierarchy in various computing environments.