A comparison of data-parallel collective communication performance and its application

Abstract
Collective communications such as broadcast and reduction are commonly used in data-parallel programs. Understanding the performance of these primitive operations is important both for characterizing parallel systems and for analyzing the performance of parallel applications running on them. We measured the performance of collective communication operations on several multiprocessor systems. In this paper, we report experimental results for collective communication performance on distributed-memory systems. We also describe how the performance of these primitives can be used to predict the performance of data-parallel programs.
