Hiding Memory Latency using Dynamic Scheduling in Shared-Memory Multiprocessors
- 24 August 2005
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
Abstract
The large latency of memory accesses is a major impediment to achieving high performance in large scale shared-memory multiprocessors. Relaxing the memory consistency model is an attractive technique for hiding this latency by allowing the overlap of memory accesses with other computation and memory accesses. Previous studies on relaxed models have shown that the latency of write accesses can be hidden by buffering writes and allowing reads to bypass pending writes. Hiding the latency of reads by exploiting the overlap allowed by relaxed models is inherently more difficult, however, simply because the processor depends on the return value for its future computation. This paper explores the use of dynamically scheduled processors to exploit the overlap allowed by relaxed models for hiding the latency of reads. Our results are based on detailed simulation studies of several parallel applications. The results show that a substantial fraction of the read latency can be hidden using this technique. However, the major improvements in performance are achieved only at large instruction window sizes.
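To give a rough feel for why the instruction window size matters, the following is a toy timing model, not the simulator used in the paper. It rests on several simplifying assumptions that do not come from the abstract: all instructions are independent, one instruction issues per cycle, instructions retire in order, and an instruction holds a window slot from issue until retirement. The function name `run_cycles`, the miss latency of 50 cycles, and the one-miss-in-ten instruction mix are hypothetical choices for illustration only.

```python
def run_cycles(latencies, window):
    """Cycles needed to retire every instruction in a toy out-of-order core.

    Assumptions (illustrative only): instructions are independent, at most
    one instruction issues per cycle, retirement is in program order, and
    at most `window` issued-but-unretired instructions exist at any time.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    retire = []        # retirement time of each instruction, in program order
    prev_issue = -1    # issue time of the previous instruction
    for i, lat in enumerate(latencies):
        issue = prev_issue + 1                     # one issue per cycle
        if i >= window:
            # A window slot frees up only when instruction i - window retires.
            issue = max(issue, retire[i - window])
        done = issue + lat                         # result becomes available
        # In-order retirement: cannot retire before the previous instruction.
        retire.append(max(done, retire[-1]) if retire else done)
        prev_issue = issue
    return retire[-1] if retire else 0


if __name__ == "__main__":
    MISS, HIT = 50, 1                      # hypothetical latencies in cycles
    program = ([MISS] + [HIT] * 9) * 20    # one read miss per ten instructions
    for w in (1, 4, 16, 64, 256):
        print(f"window={w:3d}  cycles={run_cycles(program, w)}")
```

With a window of one entry the model degenerates to fully serialized execution, so every miss is paid in full. A window that covers only a few instructions stalls shortly after each miss because the unretired miss blocks its slot, which is one simple way to see why small windows recover little of the read latency. Real programs also have data dependences on the loaded values, which this sketch ignores.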