Communication optimizations for fine-grained UPC applications
- 1 January 2005
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- ISSN 1089-795X, pp. 267-278
- https://doi.org/10.1109/pact.2005.13
Abstract
Global address space languages like UPC exhibit high performance and portability on a broad class of shared and distributed memory parallel architectures. The most scalable applications use bulk memory copies rather than individual reads and writes to the shared space, but finer-grained sharing can be useful for scenarios such as dynamic load balancing, event signaling, and distributed hash tables. In this paper we present three optimization techniques for global address space programs with fine-grained communication: redundancy elimination, use of split-phase communication, and communication coalescing. Parallel UPC programs are analyzed using static single assignment form and a dataflow graph, which are extended to handle the various shared and private pointer types that are available in UPC. The optimizations also take advantage of UPC's relaxed memory consistency model, which reduces the need for cross-thread analysis. We demonstrate the effectiveness of the analysis and optimizations using several benchmarks, which were chosen to reflect the kinds of fine-grained, communication-intensive phases that exist in some larger applications. The optimizations show speedups of up to 70% on three parallel systems, which represent three different types of cluster network technologies.