Identifying performance bottlenecks on modern microarchitectures using an adaptable probe
- 10 June 2004
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
Abstract
Summary form only given. The gap between peak and delivered performance for scientific applications running on microprocessor-based systems has grown considerably in recent years. The inability to achieve the desired performance even on a single processor is often attributed to an inadequate memory system, but without identification or quantification of a specific bottleneck. In this work, we use an adaptable synthetic benchmark to isolate application characteristics that cause a significant drop in performance, giving application programmers and architects information about possible optimizations. Our adaptable probe, called sqmat, uses only four parameters to capture key characteristics of scientific workloads: working-set size, computational intensity, indirection, and irregularity. This paper describes the implementation of sqmat and uses its tunable parameters to evaluate four leading 64-bit microprocessors that are popular building blocks for current high performance systems: Intel Itanium2, AMD Opteron, IBM Power3, and IBM Power4.
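To make the four characteristics concrete, the following is a minimal sketch of a probe in the spirit of sqmat. It is not the authors' implementation: the abstract does not give the kernel or the parameter names, so the mapping used here is an assumption. The sketch repeatedly squares a set of small matrices, with `count` and `n` setting the working-set size, `squarings` setting the computational intensity, `indirect` selecting access through an index table, and `run_length` (the number of contiguous matrices visited before a jump) controlling irregularity. All function and parameter names are hypothetical.

```python
import random
import time


def square(m, n):
    """Return m*m for an n x n matrix stored as a flat row-major list."""
    out = [0.0] * (n * n)
    for i in range(n):
        for k in range(n):
            a = m[i * n + k]
            for j in range(n):
                out[i * n + j] += a * m[k * n + j]
    return out


def build_order(count, run_length, indirect, seed=0):
    """Return the order in which the `count` matrices are visited.

    Direct access walks the matrices sequentially. Indirect access goes
    through an index table: matrices are grouped into runs of `run_length`
    contiguous entries and the runs are shuffled, so a smaller run_length
    gives a more irregular access pattern. (Assumed scheme, for illustration.)
    """
    if not indirect:
        return list(range(count))
    starts = list(range(0, count, run_length))
    random.Random(seed).shuffle(starts)
    order = []
    for s in starts:
        order.extend(range(s, min(s + run_length, count)))
    return order


def probe(count, n, squarings, indirect=False, run_length=1, seed=0):
    """Square each matrix `squarings` times; return (seconds, checksum)."""
    rng = random.Random(seed)
    # Entries are scaled by 1/n so repeated squaring stays bounded.
    mats = [[rng.random() / n for _ in range(n * n)] for _ in range(count)]
    order = build_order(count, run_length, indirect, seed)
    start = time.perf_counter()
    checksum = 0.0
    for idx in order:
        m = mats[idx]
        for _ in range(squarings):
            m = square(m, n)
        checksum += m[0]  # keep the result live so the work is not skipped
    return time.perf_counter() - start, checksum


if __name__ == "__main__":
    for indirect, run_length in [(False, 1), (True, 64), (True, 1)]:
        seconds, _ = probe(2048, 3, 4, indirect=indirect, run_length=run_length)
        print(f"indirect={indirect} run_length={run_length}: {seconds:.4f} s")
```

Sweeping one parameter while holding the others fixed shows which characteristic a platform is sensitive to. In an interpreted language the timings mostly reflect interpreter overhead, so a probe intended for real measurements would be written in a compiled language; the sketch only illustrates how the four parameters interact.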