Informing memory operations

1 May 1998

journal article
Published by Association for Computing Machinery (ACM) in ACM Transactions on Computer Systems

Vol. 16 (2), 170-205
https://doi.org/10.1145/279227.279230

Abstract

Memory latency is an important bottleneck in system performance that cannot be adequately solved by hardware alone. Several promising software techniques have been shown to address this problem successfully in specific situations. However, the generality of these software approaches has been limited because current architectures do not provide a fine-grained, low-overhead mechanism for observing and reacting to memory behavior directly. To fill this need, this article proposes a new class of memory operations called informing memory operations, which essentially consist of a memory operation combined (either implicitly or explicitly) with a conditional branch-and-link operation that is taken only if the reference suffers a cache miss. This article describes two different implementations of informing memory operations. One is based on a cache-outcome condition code, and the other is based on low-overhead traps. We find that modern in-order-issue and out-of-order-issue superscalar processors already contain the bulk of the necessary hardware support. We describe how a number of software-based memory optimizations can exploit informing memory operations to enhance performance, and we look at cache coherence with fine-grained access control as a case study. Our performance results demonstrate that the runtime overhead of invoking the informing mechanism on the Alpha 21164 and MIPS R10000 processors is generally small enough to provide considerable flexibility to hardware and software designers, and that the cache coherence application has improved performance compared to other current solutions. We believe that the inclusion of informing memory operations in future processors may spur even more innovative performance optimizations.
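The core idea in the abstract, a memory reference that transfers control to a handler only when it misses in the cache, can be illustrated with a small software model. The sketch below is not the article's hardware mechanism: it simulates a toy direct-mapped cache in Python, and the names `DirectMappedCache`, `informing_load`, and `count_misses`, as well as the cache geometry (4 sets, 16-byte lines), are assumptions made purely for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple


class DirectMappedCache:
    """Toy direct-mapped cache that tracks only which tag occupies each set."""

    def __init__(self, num_sets: int = 4, line_bytes: int = 16) -> None:
        self.num_sets = num_sets
        self.line_bytes = line_bytes
        self.tags: List[Optional[int]] = [None] * num_sets

    def access(self, addr: int) -> bool:
        """Return True on a hit, False on a miss; a miss fills the line."""
        line = addr // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        hit = self.tags[index] == tag
        self.tags[index] = tag
        return hit


def informing_load(
    memory: Dict[int, int],
    cache: DirectMappedCache,
    addr: int,
    miss_handler: Callable[[int], None],
) -> int:
    """Load memory[addr]; call miss_handler(addr) only if the reference misses."""
    if not cache.access(addr):
        # Models the conditional branch-and-link that is taken only on a miss.
        miss_handler(addr)
    return memory[addr]


def count_misses(addresses: List[int]) -> Tuple[List[int], List[int]]:
    """Run a sequence of informing loads and record which addresses missed."""
    memory = {a: a * 2 for a in addresses}  # hypothetical memory contents
    cache = DirectMappedCache()
    misses: List[int] = []
    values = [informing_load(memory, cache, a, misses.append) for a in addresses]
    return values, misses
```

Here the miss handler simply records the missing address, which loosely corresponds to a profiling use of the mechanism; a different handler could be passed in to model other software responses to a miss. On a hit the handler is never called, so the common case pays only for the load itself in this model.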

This publication has 22 references indexed in Scilit:

Continuous profiling
ACM Transactions on Computer Systems, 1997
The Mips R10000 superscalar microprocessor
IEEE Micro, 1996
Tuning memory performance of sequential and parallel programs
Computer, 1995
Fine-grain access control for distributed shared memory
Published by Association for Computing Machinery (ACM), 1994
Avoiding conflict misses dynamically in large direct-mapped caches
Published by Association for Computing Machinery (ACM), 1994
A methodology for procedure cloning
Computer Languages, 1993
Mtool: an integrated system for performance debugging shared memory multiprocessor applications
IEEE Transactions on Parallel and Distributed Systems, 1993
SPLASH
ACM SIGARCH Computer Architecture News, 1992
A tool to aid in the design, implementation, and understanding of matrix algorithms for parallel processors
Journal of Parallel and Distributed Computing, 1990
Performance-measurement tools in a multiprocessor environment
IEEE Transactions on Computers, 1989