Hiding Memory Latency using Dynamic Scheduling in Shared-Memory Multiprocessors

24 August 2005

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

p. 22-33
https://doi.org/10.1109/isca.1992.753301

Abstract

The large latency of memory accesses is a major impediment to achieving high performance in large-scale shared-memory multiprocessors. Relaxing the memory consistency model is an attractive technique for hiding this latency by allowing the overlap of memory accesses with other computation and memory accesses. Previous studies on relaxed models have shown that the latency of write accesses can be hidden by buffering writes and allowing reads to bypass pending writes. Hiding the latency of reads by exploiting the overlap allowed by relaxed models is inherently more difficult, however, simply because the processor depends on the return value for its future computation. This paper explores the use of dynamically scheduled processors to exploit the overlap allowed by relaxed models for hiding the latency of reads. Our results are based on detailed simulation studies of several parallel applications. The results show that a substantial fraction of the read latency can be hidden using this technique. However, the major improvements in performance are achieved only at large instruction window sizes.
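
As a rough illustration of why the instruction window size matters, the toy model below counts cycles for a short instruction trace under different window sizes. It is not the simulator used in the paper. The function name `simulate`, the trace encoding ("r" for a read that misses, "c" for a one-cycle compute operation), the one-instruction-per-cycle issue rate, and in-order retirement are all assumptions made for this sketch. It also assumes every instruction is independent of the others, which is the best case: the dependences on read return values that the abstract identifies as the hard part are deliberately left out.

```python
def simulate(trace, window, read_latency):
    """Toy cycle count for a trace of independent instructions.

    trace        -- list of "r" (read miss) or "c" (1-cycle compute); hypothetical encoding
    window       -- max instructions in flight (window=1 behaves like blocking reads)
    read_latency -- cycles for a read to return (assumed constant)
    """
    if not trace:
        return 0
    done_at = []   # completion cycle of each issued instruction
    cycle = 0      # next cycle in which an instruction may issue
    head = 0       # oldest instruction not yet retired (in-order retirement)
    for i, op in enumerate(trace):
        # Window full: stall until the oldest instruction completes and retires.
        if i - head >= window:
            cycle = max(cycle, done_at[head])
            head += 1
        latency = read_latency if op == "r" else 1
        done_at.append(cycle + latency)  # issued at `cycle`, done `latency` later
        cycle += 1                       # at most one issue per cycle
    return max(done_at)


if __name__ == "__main__":
    reads = ["r"] * 4
    for w in (1, 2, 4):
        print(w, simulate(reads, window=w, read_latency=10))
```

With four independent reads of 10 cycles each, a window of 1 serializes the reads, while a window of 4 overlaps nearly all of the latency. In real programs, dependences and branches limit how many independent reads fall inside the window, which is consistent with the abstract's remark that large gains require large windows.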
