Program transformation and runtime support for threaded MPI execution on shared-memory machines
- 1 July 2000
- journal article
- Published by Association for Computing Machinery (ACM) in ACM Transactions on Programming Languages and Systems
- Vol. 22 (4), 673-700
- https://doi.org/10.1145/363911.363920
Abstract
Parallel programs written in MPI have been widely used for developing high-performance applications on various platforms. Because of a restriction of the MPI computation model, conventional MPI implementations on shared-memory machines map each MPI node to an OS process, which can suffer serious performance degradation in the presence of multiprogramming. This paper studies compile-time and runtime techniques for enhancing the performance portability of MPI code running on multiprogrammed shared-memory machines. The proposed techniques allow MPI nodes to be executed safely and efficiently as threads. Compile-time transformation eliminates global and static variables in C code using node-specific data. The runtime support includes an efficient and provably correct communication protocol that uses lock-free data structures and takes advantage of address space sharing among threads. Experiments on an SGI Origin 2000 show that TMPI, our MPI prototype built on the proposed techniques, is competitive with SGI's native MPI implementation in a dedicated environment, and that it has significant performance advantages in a multiprogrammed environment.
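To illustrate why global and static variables are a problem once MPI nodes become threads, the following is a minimal sketch in C, not the paper's actual transformation. It assumes POSIX threads and uses `pthread_key_t` thread-specific data as a stand-in for the "node-specific data" the abstract mentions. The names `node_globals`, `my_globals`, `node_main`, and `run_nodes` are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define NUM_NODES 4

/* Before the transformation, the program might declare
 *     static int iterations;
 * With one OS process per MPI node, every node has a private copy.
 * With one thread per node, all nodes would share (and race on) it.
 * Hypothetical fix: gather former globals into a per-node block. */
typedef struct {
    int iterations;
} node_globals;

static pthread_key_t globals_key;
static pthread_once_t globals_once = PTHREAD_ONCE_INIT;

/* Create the key once; free() releases each block when its thread exits. */
static void make_key(void) {
    pthread_key_create(&globals_key, free);
}

/* Return the calling thread's private block, allocating it on first use. */
static node_globals *my_globals(void) {
    pthread_once(&globals_once, make_key);
    node_globals *g = pthread_getspecific(globals_key);
    if (g == NULL) {
        g = calloc(1, sizeof *g);
        assert(g != NULL);
        pthread_setspecific(globals_key, g);
    }
    return g;
}

/* Stand-in for an MPI node's main routine. The original statement
 *     iterations += rank + 1;
 * is rewritten to go through the per-node block. */
static void *node_main(void *arg) {
    int rank = *(int *)arg;
    for (int i = 0; i < 1000; i++)
        my_globals()->iterations += rank + 1;
    *(int *)arg = my_globals()->iterations; /* report this node's total */
    return NULL;
}

/* Run NUM_NODES threads; results[r] receives node r's final count.
 * Returns 0 on success, -1 if a thread could not be created. */
int run_nodes(int results[NUM_NODES]) {
    pthread_t tid[NUM_NODES];
    int started = 0, status = 0;
    for (int r = 0; r < NUM_NODES; r++) {
        results[r] = r; /* pass the rank in, get the total back */
        if (pthread_create(&tid[r], NULL, node_main, &results[r]) != 0) {
            status = -1;
            break;
        }
        started++;
    }
    for (int r = 0; r < started; r++)
        pthread_join(tid[r], NULL);
    return status;
}
```

Calling `run_nodes` with an array of `NUM_NODES` integers leaves each node's own total in its slot, because no thread ever touches another thread's block. A compile-time approach such as the one the abstract describes would perform this kind of rewrite automatically; the lookup cost of `pthread_getspecific` on every access is a property of this sketch and says nothing about how the prototype handles it.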