Parallel program performance prediction using deterministic task graph analysis
- 1 February 2004
- journal article
- Published by Association for Computing Machinery (ACM) in ACM Transactions on Computer Systems
- Vol. 22 (1) , 94-136
- https://doi.org/10.1145/966785.966788
Abstract
In this article, we consider analytical techniques for predicting detailed performance characteristics of a single shared memory parallel program for a particular input. Analytical models for parallel programs have been successful at providing simple qualitative insights and bounds on program scalability, but have been less successful in practice for providing detailed insights and metrics for program performance (leaving these to measurement or simulation). We develop a conceptually simple modeling technique called deterministic task graph analysis that provides detailed performance prediction for shared-memory programs with arbitrary task graphs, a wide variety of task scheduling policies, and significant communication and resource contention. Unlike many previous models that are stochastic models, our model assumes deterministic task execution times (while retaining the use of stochastic models for communication and resource contention). This assumption is supported by a previous study of the influence of nondeterministic delays in parallel programs.We evaluate our model in three ways. First, an experimental evaluation shows that our analysis technique is accurate and efficient for a variety of shared-memory programs, including programs with large and/or complex task graphs, sophisticated task scheduling, highly nonuniform task times, and significant communication and resource contention. The results also show that the deterministic assumption is crucial to permit accurate and yet efficient analysis of these programs. Second, we use three example programs to illustrate the predictive capabilities of the model. In two cases, broad insights and detailed metrics from the model are used to suggest improvements in load-balancing and the model quickly and accurately predicts the impact of these changes. In the third case, the model provides novel insights into the impact of program design changes that improve communication locality as well as load-balancing, via new (but general-purpose) metrics. Finally, we present results from a comparison of our model and representative stochastic models, and use these to characterize the conditions under which a deterministic model or stochastic models would be appropriate.Keywords
This publication has 27 references indexed in Scilit:
- POEMS: end-to-end performance design of large parallel adaptive computational systemsIEEE Transactions on Software Engineering, 2000
- Visual programming and debugging for parallel computingIEEE Parallel & Distributed Technology: Systems & Applications, 1995
- Performance analysis of mesh interconnection networks with deterministic routingIEEE Transactions on Parallel and Distributed Systems, 1994
- Performance of parallel processorsParallel Computing, 1989
- On the promise of general-purpose parallel computingParallel Computing, 1989
- Speedup versus efficiency in parallel systemsIEEE Transactions on Computers, 1989
- Reevaluating Amdahl's lawCommunications of the ACM, 1988
- The Effects of Problem Partitioning, Allocation, and Granularity on the Performance of Multiple-Processor SystemsIEEE Transactions on Computers, 1987
- Analytic Queueing Models for Programs with Internal ConcurrencyIEEE Transactions on Computers, 1983
- Performance of Synchronized Iterative Processes in Multiprocessor SystemsIEEE Transactions on Software Engineering, 1982