Early experiences with large-scale Cray XMT systems
- 1 May 2009
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
Abstract
Several 64-processor XMT systems have now been shipped to customers, and 128-processor, 256-processor, and 512-processor systems have been tested in Cray's development lab. We describe some techniques we have used for tuning performance in the hope that applications will continue to scale on these larger systems. We discuss how the programmer must work with the XMT compiler to extract maximum parallelism and performance, especially from multiply nested loops, and how the performance tools provide vital information about whether and how the compiler has parallelized loops and where performance bottlenecks may be occurring. We also show data indicating that the maximum performance of a given application on an XMT system of a given size is limited by memory or network bandwidth, in a way that is somewhat independent of the number of processors used.