Resource-Aware Scientific Computation on a Heterogeneous Cluster

Abstract

Researchers can develop software on small, local clusters and later move it to larger clusters and supercomputers, but the software must run efficiently in both environments. Two efforts aim to improve the efficiency of scientific computation on clusters through resource-aware dynamic load balancing.

The popularity of cost-effective clusters built from commodity hardware has opened up a new platform for software originally designed for tightly coupled supercomputers. Because these clusters can include anywhere from fewer than 10 to thousands of processors, researchers in high-performance scientific computation at smaller institutions or in smaller departments can maintain local parallel computing resources to support software development and testing, then move the software to larger clusters and supercomputers. As promising as this ability is, it also creates a need for local expertise and resources to set up and maintain these clusters.

The software must execute efficiently both on smaller local clusters and on larger ones. These computing environments vary in the number of processors, the speed of processing and communication resources, and the size and speed of memory throughout the memory hierarchy, as well as in the availability of support tools and preferred programming paradigms. Software developed and optimized in one computing environment might not be as efficient when it is moved to another.

In this article, we describe a small cluster along with two efforts to improve the efficiency of parallel scientific computation on that cluster. Both approaches modify the dynamic load-balancing step of an adaptive solution procedure to tailor the distribution of data across the cooperating processes. This modification helps account for the heterogeneity and hierarchy in various computing environments.
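
To make the idea of tailoring the data distribution concrete, the following is a minimal sketch, not the article's method. It assumes each process has a single known relative speed and that work is a count of equally expensive items. The function name `weighted_partition_sizes` and the speed values are hypothetical. It only computes target partition sizes in proportion to speed. A real load balancer would also account for communication cost and memory hierarchy.

```python
from typing import List


def weighted_partition_sizes(total_items: int, speeds: List[float]) -> List[int]:
    """Split total_items across processes in proportion to relative speed.

    speeds[i] is an assumed relative processing speed for process i
    (hypothetical input; a real system would measure or forecast it).
    """
    if total_items < 0 or not speeds or any(s <= 0 for s in speeds):
        raise ValueError("need a non-negative item count and positive speeds")

    total_speed = sum(speeds)
    # Ideal (fractional) share for each process.
    ideal = [total_items * s / total_speed for s in speeds]
    # Start from the floor of each ideal share.
    sizes = [int(x) for x in ideal]
    leftover = total_items - sum(sizes)

    # Give the remaining items to the processes with the largest
    # fractional parts (largest-remainder rounding).
    order = sorted(range(len(speeds)),
                   key=lambda i: ideal[i] - sizes[i],
                   reverse=True)
    for i in order[:leftover]:
        sizes[i] += 1
    return sizes


if __name__ == "__main__":
    # Two slower nodes and one node assumed to be twice as fast.
    print(weighted_partition_sizes(1000, [1.0, 1.0, 2.0]))
```

With a homogeneous cluster (all speeds equal) this reduces to an even split, so the same balancing step would serve both kinds of environment.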
