High throughput grid computing with an IBM Blue Gene/L

1 January 2007

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

No. 15525244, p. 357-364
https://doi.org/10.1109/clustr.2007.4629250

Abstract

While much high-performance computing is performed using massively parallel MPI applications, many workflows execute jobs with a mix of processor counts. At the extreme end of the scale, some workloads consist of large quantities of single-processor jobs. These types of workflows lead to inefficient usage of massively parallel architectures such as the IBM Blue Gene/L (BG/L) because of allocation constraints forced by its unique system design. Recently, IBM introduced the ability to schedule individual processors on BG/L, a feature named high throughput computing (HTC), creating an opportunity to exploit the system's power efficiency for other classes of computing. In this paper, we present a Grid-enabled interface supporting HTC on BG/L. This interface accepts single-processor tasks using Globus GRAM, aggregates HTC tasks into BG/L partitions, and requests partition execution using the underlying system scheduler. By separating HTC task aggregation from scheduling, we provide the ability for workflows constructed using standard Grid middleware to run both parallel and serial jobs on the BG/L. We examine the startup latency and performance of running large quantities of HTC jobs. Finally, we deploy Daymet, a component of a coupled climate model, on a BG/L system using our HTC interface.
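
The aggregation step described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the `Task` type, the `aggregate_tasks` and `utilization` helpers, the `./run_tile` command, and the partition size of 32 are all hypothetical, chosen only to show how single-processor tasks might be grouped into fixed-size batches before a scheduler is asked to run each batch. No Globus GRAM or BG/L scheduler calls are modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    """A single-processor task (hypothetical structure for illustration)."""
    task_id: int
    command: str


def aggregate_tasks(tasks: List[Task], partition_size: int) -> List[List[Task]]:
    """Group tasks into batches of at most partition_size tasks each.

    Each batch stands in for one partition request; the last batch
    may be only partly filled.
    """
    if partition_size <= 0:
        raise ValueError("partition_size must be positive")
    return [
        tasks[i:i + partition_size]
        for i in range(0, len(tasks), partition_size)
    ]


def utilization(batches: List[List[Task]], partition_size: int) -> float:
    """Fraction of allocated processors that actually hold a task."""
    if not batches:
        return 0.0
    used = sum(len(batch) for batch in batches)
    return used / (len(batches) * partition_size)


# Assumed workload: 70 independent tasks, 32 processors per partition.
tasks = [Task(i, f"./run_tile {i}") for i in range(70)]
batches = aggregate_tasks(tasks, partition_size=32)
batch_sizes = [len(batch) for batch in batches]
fill = utilization(batches, partition_size=32)
```

Keeping the grouping separate from the submission step mirrors the separation of aggregation from scheduling that the abstract describes: a batch can be handed to any scheduler without changing how tasks are collected.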
