Program transformation and runtime support for threaded MPI execution on shared-memory machines

1 July 2000

journal article
Published by Association for Computing Machinery (ACM) in ACM Transactions on Programming Languages and Systems

Vol. 22 (4), 673-700
https://doi.org/10.1145/363911.363920

Abstract

Parallel programs written in MPI have been widely used for developing high-performance applications on various platforms. Because of a restriction of the MPI computation model, conventional MPI implementations on shared-memory machines map each MPI node to an OS process, which can suffer serious performance degradation in the presence of multiprogramming. This paper studies compile-time and runtime techniques for enhancing performance portability of MPI code running on multiprogrammed shared-memory machines. The proposed techniques allow MPI nodes to be executed safely and efficiently as threads. Compile-time transformation eliminates global and static variables in C code using node-specific data. The runtime support includes an efficient and provably correct communication protocol that uses lock-free data structures and takes advantage of address space sharing among threads. Experiments on an SGI Origin 2000 show that TMPI, our MPI prototype built on the proposed techniques, is competitive with SGI's native MPI implementation in a dedicated environment, and that it has significant performance advantages in a multiprogrammed environment.
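
To illustrate the idea of replacing a global or static variable with node-specific data, the following is a minimal sketch that assumes POSIX thread-specific data as the underlying mechanism. It is not taken from the paper: the names counter_key, counter_ptr, node_main, and run_nodes are hypothetical, and the actual TMPI transformation may differ in how it allocates and accesses per-node storage.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Before the transformation the program might have contained
 *     static int counter = 0;
 * which is unsafe once every MPI node is a thread in one address space.
 * Assumption: each node (thread) instead keeps a private copy behind a
 * POSIX thread-specific data key. All names here are hypothetical. */
static pthread_key_t counter_key;
static pthread_once_t counter_once = PTHREAD_ONCE_INIT;

static void make_counter_key(void) {
    /* free() is called on each thread's copy when that thread exits. */
    int rc = pthread_key_create(&counter_key, free);
    assert(rc == 0);
    (void)rc;
}

/* Returns the calling thread's private copy, allocating it on first use. */
static int *counter_ptr(void) {
    pthread_once(&counter_once, make_counter_key);
    int *p = pthread_getspecific(counter_key);
    if (p == NULL) {
        p = calloc(1, sizeof *p);          /* zero-initialized, like a static */
        assert(p != NULL);
        pthread_setspecific(counter_key, p);
    }
    return p;
}

/* Body of one "MPI node": every former use of `counter` goes through
 * counter_ptr(). The final value is written back through arg. */
static void *node_main(void *arg) {
    int rank = *(int *)arg;
    for (int i = 0; i < 1000; i++)
        *counter_ptr() += rank;
    *(int *)arg = *counter_ptr();
    return NULL;
}

/* Runs n nodes as threads. Returns 1 if every node observed only its own
 * updates (rank * 1000), and 0 on bad input or thread creation failure. */
int run_nodes(int n) {
    enum { MAX_NODES = 16 };
    pthread_t tid[MAX_NODES];
    int slot[MAX_NODES];
    int created = 0, ok = 1;

    if (n < 1 || n > MAX_NODES)
        return 0;
    for (int r = 0; r < n; r++) {
        slot[r] = r;
        if (pthread_create(&tid[r], NULL, node_main, &slot[r]) != 0) {
            ok = 0;
            break;
        }
        created++;
    }
    for (int r = 0; r < created; r++) {
        pthread_join(tid[r], NULL);
        if (slot[r] != r * 1000)
            ok = 0;
    }
    return ok;
}
```

Calling run_nodes(4) starts four threads that each update what used to be a single shared variable. Because every access goes through counter_ptr(), no locking is needed and no node sees another node's value.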
