Performance analysis for teraflop computers: a distributed automatic approach

Abstract
Performance analysis for applications on teraflop computers requires a new combination of concepts: online processing, automation, and distribution. The article presents the design of a new analysis system that performs an automatic search for performance problems. This search is guided by a specification of performance properties based on the APART Specification Language. The system is being implemented as a network of analysis agents that are arranged in a hierarchy. Higher level agents search for global performance problems while lower level agents search local performance problems. Leaf agents request and receive performance data from the monitoring library linked to the application. Our online analysis also takes into account design patterns for parallel applications. These patterns make the analysis more effective and the output more application-related. The analysis is currently being implemented for the Hitachi SR8000 teraflop computer at the Leibniz-Rechenzentrum in Munich within the Peridot project.

This publication has 5 references indexed in Scilit: