Automatically Tuned Collective Communications

Abstract

The performance of MPI collective communications is critical in most MPI-based applications. A general algorithm for a given collective communication operation may not perform well on all systems because of differences in architectures, network parameters, and the storage capacity of the underlying MPI implementation. In this paper we discuss an approach in which the collective communications are tuned for a given system by conducting a series of experiments on that system. We also discuss a dynamic topology method that keeps the tuned static topology shape but re-orders the logical addresses to compensate for run-time variations. A series of experiments was conducted comparing our tuned collective communication operations with various native vendor MPI implementations. The tuned collective communications improved performance by about 30 percent to 650 percent over the native MPI implementations.
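The tuning idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the candidate broadcast topologies, the function names, the 64 KiB segment size, and the latency/bandwidth values are all assumptions chosen for illustration, and a simple latency-plus-bandwidth cost model stands in for the timing experiments that would be run on a real system.

```python
import math

# Hypothetical cost models for three broadcast topologies.
# p = number of ranks, m = message size in bytes,
# alpha = per-message latency (s), beta = per-byte transfer time (s/byte).

def flat_tree_cost(p, m, alpha, beta):
    # Root sends the full message to each of the p-1 other ranks in turn.
    return (p - 1) * (alpha + beta * m)

def binomial_tree_cost(p, m, alpha, beta):
    # ceil(log2 p) rounds, each forwarding the full message.
    return math.ceil(math.log2(p)) * (alpha + beta * m)

def pipeline_cost(p, m, alpha, beta, segment=65536):
    # Chain of p ranks; the message is split into segments that
    # flow down the chain one after another.
    nseg = max(1, math.ceil(m / segment))
    seg = m / nseg
    return (p - 2 + nseg) * (alpha + beta * seg)

CANDIDATES = {
    "flat": flat_tree_cost,
    "binomial": binomial_tree_cost,
    "pipeline": pipeline_cost,
}

def modeled_measure(cost_fn, p, m, alpha=50e-6, beta=1e-9):
    # Stand-in for a real experiment: on an actual system this would
    # run the collective several times and return the measured time.
    return cost_fn(p, m, alpha, beta)

def tune(p, sizes, measure=modeled_measure):
    # For each message size, keep the candidate with the lowest "measured" time.
    return {
        m: min(CANDIDATES, key=lambda name: measure(CANDIDATES[name], p, m))
        for m in sizes
    }

if __name__ == "__main__":
    table = tune(p=16, sizes=[1024, 8 * 1024 * 1024])
    for size, choice in table.items():
        print(size, choice)
```

The resulting table maps each message size to the topology that performed best, which is the kind of per-system decision table an experiment-driven tuner would consult at run time. Replacing `modeled_measure` with a function that times real MPI calls would turn the sketch into an empirical tuner.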
