Efficient collective communication on heterogeneous networks of workstations
- 27 November 2002
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- No. 01903918, p. 460-467
- https://doi.org/10.1109/icpp.1998.708518
Abstract
Networks of Workstations (NOW) have become an attractive alternative platform for high performance computing. Due to the commodity nature of workstations and interconnects, and due to the multiplicity of vendors and platforms, NOW environments are gradually being redefined as Heterogeneous Networks of Workstations (HNOW) environments. This paper presents a new framework for efficiently implementing collective communication operations (as defined by the Message Passing Interface (MPI) standard) for the emerging HNOW environments. We first classify the different types of heterogeneity in HNOW and then focus on one important characteristic: the communication capabilities of workstations. Taking this characteristic into account, we propose two new approaches, Speed-Partitioned Ordered Chain (SPOC) and Fastest-Node First (FNF), to implement collective communication operations with reduced latency. We also investigate methods for deriving optimal trees for broadcast and multicast operations. Generating such trees is shown to be computationally intensive. It is shown that the FNF approach, in spite of its simplicity, can deliver performance within 1% of the performance of the optimal trees. Finally, these new approaches are compared with the approach used in the MPICH implementation on experimental as well as simulated testbeds. On an existing 24-node HNOW environment with SGI workstations and ATM interconnection, our approaches reduce the latency of broadcast and multicast operations by a factor of up to 3.5 compared to the approach used in the existing MPICH implementation. On a 64-node simulated testbed, our approaches can reduce the latency of broadcast and multicast operations by a factor of up to 4.5. These results demonstrate that there is significant potential for our approaches to be applied towards designing scalable collective communication libraries for current and future generation HNOW environments.
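The abstract names the Fastest-Node First (FNF) approach without spelling out its steps, so the following is only a minimal sketch of how a greedy fastest-first broadcast schedule might look, not the authors' implementation. It assumes a sender-dominated cost model in which each node `i` needs a fixed time `send_cost[i]` to deliver one message to any receiver; that model, the function name `fnf_broadcast_schedule`, and the tie-breaking by node index are all assumptions made for illustration.

```python
import heapq


def fnf_broadcast_schedule(send_cost, root):
    """Greedy fastest-node-first broadcast schedule (illustrative sketch).

    send_cost[i] is the ASSUMED time node i needs to deliver one message.
    Returns (schedule, completion_time), where schedule is a list of
    (sender, receiver, arrival_time) tuples.
    """
    # Nodes still waiting for the message, fastest first
    # (ties broken by node index).
    pending = sorted((c, i) for i, c in enumerate(send_cost) if i != root)
    # Min-heap of (time at which the node's next send would complete, node).
    ready = [(send_cost[root], root)]
    schedule = []
    finish = 0.0
    for _, receiver in pending:
        # The sender that can complete a send earliest serves the
        # fastest node that has not yet received the message.
        arrival, sender = heapq.heappop(ready)
        schedule.append((sender, receiver, arrival))
        finish = max(finish, arrival)
        # The sender can send again; the receiver now becomes a sender too.
        heapq.heappush(ready, (arrival + send_cost[sender], sender))
        heapq.heappush(ready, (arrival + send_cost[receiver], receiver))
    return schedule, finish
```

Under these assumptions, calling `fnf_broadcast_schedule([1, 1, 4, 4], root=0)` reaches the other fast node first and lets it forward the message, so the broadcast completes at time 2. Informing fast nodes early is what lets them take part in the remaining sends.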