Design and Implementation of Open MPI over Quadrics/Elan4
- 19 April 2005
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
Abstract
Open MPI is a recently initiated project to provide a fault-tolerant, multi-network capable implementation of MPI-2 (1997), based on experience gained from the FT-MPI (G. E. Fagg et al., 2003), LA-MPI (R. L. Graham et al., 2003), LAM/MPI (J. Squyres et al., 2003), and MVAPICH projects. Its initial communication architecture is layered on top of TCP/IP. In this paper, we design and implement the Open MPI point-to-point layer on top of a high-end interconnect, Quadrics/Elan4. The restriction of the Quadrics static process model has been overcome to accommodate the requirements of MPI-2 dynamic process management. Quadrics queue-based direct memory access (QDMA) and remote direct memory access (RDMA) mechanisms have been integrated to form a low-overhead, high-performance transport layer. Lightweight asynchronous progress is made possible by combining the Quadrics chained event and QDMA mechanisms. Experimental results indicate that the resulting point-to-point transport layer achieves performance comparable to that of the native Quadrics QDMA operations from which it is derived. Our implementation provides an MPI-2 compliant message-passing library over Quadrics/Elan4 with performance comparable to MPICH-Quadrics.
This publication has 8 references indexed in Scilit:
- Open MPI’s TEG Point-to-Point Communications Methodology: Comparison to Existing Implementations. Published by Springer Nature, 2004
- A Network-Failure-Tolerant Message-Passing System for Terascale Clusters. International Journal of Parallel Programming, 2003
- Dynamic process management in an MPI setting. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- The Quadrics network: high-performance clustering technology. IEEE Micro, 2002
- User-level network interface protocols. Computer, 1998
- Globus: a Metacomputing Infrastructure Toolkit. The International Journal of Supercomputer Applications and High Performance Computing, 1997
- Myrinet: a gigabit-per-second local area network. IEEE Micro, 1995
- An analysis of TCP processing overhead. IEEE Communications Magazine, 1989