Tuning memory performance of sequential and parallel programs
- 1 April 1995
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in Computer
- Vol. 28 (4), 32-40
- https://doi.org/10.1109/2.375175
Abstract
To improve program memory performance, programmers and compiler writers can transform the application so that its memory-referencing behavior better exploits the memory hierarchy. The challenge in achieving these program transformations is overcoming the difficulty of statically analyzing or reasoning about an application's referencing behavior and interactions. In addition, many performance-monitoring tools collect high-level information that is inadequately detailed to analyze specific memory performance bugs. We describe MemSpy, a performance-monitoring tool we designed to help programmers discern where and why memory bottlenecks occur. MemSpy guides programmers toward program transformations that improve memory performance through detailed statistics on cache-miss causes and frequency. Because of the natural link between data-reference patterns and memory performance, MemSpy helps programmers comprehend data structure and code segment interactions by displaying statistics in terms of both the program's data and code structures, rather than for code structures alone.
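To make the idea of reporting misses by both data and code structure concrete, the following is a minimal sketch and not MemSpy's implementation. It assumes a toy direct-mapped cache and a synthetic reference trace; the names `simulate`, `matrix_trace`, `LINE_SIZE`, and `NUM_SETS`, along with the cache parameters, are hypothetical choices made for illustration only.

```python
from collections import Counter

LINE_SIZE = 64   # bytes per cache line (assumed for illustration)
NUM_SETS = 256   # direct-mapped cache with 256 lines, 16 KB total (assumed)


def simulate(trace):
    """Replay (address, data_object, code_segment) references through a
    toy direct-mapped cache and count references and misses per
    (data_object, code_segment) pair."""
    tags = [None] * NUM_SETS          # line number currently held by each set
    refs = Counter()
    misses = Counter()
    for addr, obj, seg in trace:
        line = addr // LINE_SIZE      # which memory line the address falls in
        index = line % NUM_SETS       # which cache set that line maps to
        refs[(obj, seg)] += 1
        if tags[index] != line:       # cold or conflict miss: fetch the line
            misses[(obj, seg)] += 1
            tags[index] = line
    return refs, misses


def matrix_trace(n, row_major, base=0, elem=8):
    """Yield references for one sweep over an n x n matrix stored in
    row-major order, visited either row by row or column by column."""
    seg = "row_sweep" if row_major else "col_sweep"
    for i in range(n):
        for j in range(n):
            r, c = (i, j) if row_major else (j, i)
            yield base + (r * n + c) * elem, "matrix", seg


if __name__ == "__main__":
    for row_major in (True, False):
        refs, misses = simulate(matrix_trace(256, row_major))
        for key in refs:
            obj, seg = key
            rate = misses[key] / refs[key]
            print(f"{obj:8s} {seg:10s} refs={refs[key]:6d} "
                  f"misses={misses[key]:6d} rate={rate:.3f}")
```

Under these assumed parameters, the row-by-row sweep misses once per cache line (one in eight references), while the column-by-column sweep of the same matrix misses on every reference. Binning the counts by data object and code segment points at the specific loop and data structure that a transformation such as loop interchange would target.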