Informing memory operations
- 1 May 1998
- journal article
- Published by Association for Computing Machinery (ACM) in ACM Transactions on Computer Systems
- Vol. 16 (2), 170-205
- https://doi.org/10.1145/279227.279230
Abstract
Memory latency is an important bottleneck in system performance that cannot be adequately solved by hardware alone. Several promising software techniques have been shown to address this problem successfully in specific situations. However, the generality of these software approaches has been limited because current architectures do not provide a fine-grained, low-overhead mechanism for observing and reacting to memory behavior directly. To fill this need, this article proposes a new class of memory operations called informing memory operations. An informing memory operation essentially consists of a memory operation combined (either implicitly or explicitly) with a conditional branch-and-link operation that is taken only if the reference suffers a cache miss. This article describes two different implementations of informing memory operations: one is based on a cache-outcome condition code, and the other is based on low-overhead traps. We find that modern in-order-issue and out-of-order-issue superscalar processors already contain the bulk of the necessary hardware support. We describe how a number of software-based memory optimizations can exploit informing memory operations to enhance performance, and we look at cache coherence with fine-grained access control as a case study. Our performance results demonstrate that the runtime overhead of invoking the informing mechanism on the Alpha 21164 and MIPS R10000 processors is generally small enough to provide considerable flexibility to hardware and software designers, and that the cache coherence application has improved performance compared to other current solutions. We believe that the inclusion of informing memory operations in future processors may spur even more innovative performance optimizations.
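To make the two implementation styles concrete, here is a minimal Python simulation. It is a sketch under stated assumptions, not the article's hardware design. It assumes a toy direct-mapped cache with hypothetical parameters (64 lines of 32 bytes), and the names `DirectMappedCache`, `InformingMemory`, `load_cc`, `load_trap`, and `miss_handler` are illustrative inventions.

In `load_cc`, the load records its hit/miss outcome in a flag that the program tests with an explicit branch, analogous to a cache-outcome condition code. In `load_trap`, a miss implicitly invokes a registered software handler and then resumes, analogous to a low-overhead trap or branch-and-link.

```python
from typing import Callable, Dict, List, Optional


class DirectMappedCache:
    """Toy direct-mapped cache (hypothetical geometry): tracks the tag per line."""

    def __init__(self, num_lines: int = 64, block_size: int = 32) -> None:
        self.num_lines = num_lines
        self.block_size = block_size
        self.tags: List[Optional[int]] = [None] * num_lines

    def access(self, addr: int) -> bool:
        """Return True on a hit; on a miss, fill the line and return False."""
        block = addr // self.block_size
        index = block % self.num_lines
        tag = block // self.num_lines
        if self.tags[index] == tag:
            return True
        self.tags[index] = tag  # replace whatever block occupied this line
        return False


class InformingMemory:
    """Software stand-in for informing loads (illustrative, not a real ISA)."""

    def __init__(self, cache: DirectMappedCache) -> None:
        self.cache = cache
        self.data: Dict[int, int] = {}
        self.cache_miss = False  # models a cache-outcome condition code
        self.miss_handler: Optional[Callable[[int], None]] = None

    def load_cc(self, addr: int) -> int:
        # Condition-code style: set the flag; later code branches on it explicitly.
        self.cache_miss = not self.cache.access(addr)
        return self.data.get(addr, 0)

    def load_trap(self, addr: int) -> int:
        # Trap style: on a miss, implicitly "branch and link" to the handler.
        if not self.cache.access(addr) and self.miss_handler is not None:
            self.miss_handler(addr)
        return self.data.get(addr, 0)


mem = InformingMemory(DirectMappedCache())
misses: List[int] = []
mem.miss_handler = misses.append  # e.g., a software miss profiler

for a in [0, 4, 32, 0, 2048, 0]:  # 2048 maps to line 0 and evicts address 0
    mem.load_trap(a)

mem.load_cc(4096)  # condition-code style: the program tests the flag itself
if mem.cache_miss:
    misses.append(4096)

print(misses)  # [0, 32, 2048, 0, 4096]
```

The handler here only records miss addresses. The same hook point is where software optimizations of the kind the abstract mentions, such as profiling or fine-grained access-control checks, would run. In real hardware, the cost of reaching that hook is what the article measures on the Alpha 21164 and MIPS R10000.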