Effects of communication latency, overhead, and bandwidth in a cluster architecture
- 1 May 1997
- journal article
- Published by Association for Computing Machinery (ACM) in ACM SIGARCH Computer Architecture News
- Vol. 25 (2), 85-97
- https://doi.org/10.1145/384286.264146
Abstract
This work provides a systematic study of the impact of communication performance on parallel applications in a high-performance network of workstations. We develop an experimental system in which the communication latency, overhead, and bandwidth can be independently varied to observe the effects on a wide range of applications. Our results indicate that current efforts to improve cluster communication performance to that of tightly integrated parallel machines result in significantly improved application performance. We show that applications demonstrate strong sensitivity to overhead, slowing down by a factor of 60 on 32 processors when overhead is increased from 3 to 103 µs. Applications in this study are also sensitive to per-message bandwidth, but are surprisingly tolerant of increased latency and lower per-byte bandwidth. Finally, most applications demonstrate a highly linear dependence on both overhead and per-message bandwidth, indicating that further improvements in communication performance will continue to improve application performance.
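The linear dependence on overhead described in the abstract can be illustrated with a small sketch. It assumes a simple model, not taken from this record, in which each message pays the added overhead once on the sending side and once on the receiving side. The function names and the example workload (base runtime, message count) are hypothetical; only the 3 to 103 µs overhead range comes from the abstract.

```python
def predicted_runtime(base_runtime_s, messages_per_proc, added_overhead_us):
    """Assumed linear model: runtime grows by 2 * messages * added overhead.

    The factor 2 reflects the assumption that overhead is paid on both
    the send side and the receive side of every message.
    """
    added_s = 2 * messages_per_proc * added_overhead_us / 1_000_000
    return base_runtime_s + added_s


def slowdown(base_runtime_s, messages_per_proc, added_overhead_us):
    """Ratio of predicted runtime to the baseline runtime."""
    return predicted_runtime(
        base_runtime_s, messages_per_proc, added_overhead_us
    ) / base_runtime_s


if __name__ == "__main__":
    # Hypothetical workload: 10 s baseline, 1,000,000 messages per processor.
    # Raising overhead from 3 to 103 microseconds adds 100 microseconds.
    print(slowdown(10.0, 1_000_000, 103.0 - 3.0))
```

Under this assumed model, slowdown grows in proportion to the message count per processor, so message-intensive applications would be the most sensitive to overhead.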