Speculation techniques for improving load related instruction scheduling

1 May 1999

journal article
Published by Association for Computing Machinery (ACM) in ACM SIGARCH Computer Architecture News

Vol. 27 (2), 42-53
https://doi.org/10.1145/307338.300983

Abstract

State-of-the-art microprocessors achieve high performance by executing multiple instructions per cycle. In an out-of-order engine, the instruction scheduler is responsible for dispatching instructions to execution units based on dependencies, latencies, and resource availability. Most existing instruction schedulers do a less than optimal job of scheduling memory accesses and instructions dependent on them, for the following reasons:

• Memory dependencies cannot be resolved prior to execution, so loads are not advanced ahead of preceding stores.
• The dynamic latencies of load instructions are unknown, so scheduling dependent instructions is based on either optimistic load-use delay (may cause re-scheduling and re-execution) or pessimistic delay (creating unnecessary delays).
• Memory pipelines are more expensive than other execution units, and as such, are a scarce resource. Currently, an increase in the memory execution bandwidth is usually achieved through multi-banked caches where bank conflicts limit efficiency.

In this paper we present three techniques to address these scheduler limitations. One is to improve the scheduling of load instructions by using a simple memory disambiguation mechanism. The second is to improve the scheduling of load dependent instructions by employing a Data Cache Hit-Miss Predictor to predict the dynamic load latencies. And the third is to improve the efficiency of load scheduling in a multi-banked cache through Cache-Bank Prediction.
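To make the second technique concrete, the following is a minimal sketch of how a hit-miss predictor could feed the scheduler's load-use delay. The abstract does not describe the predictor's organization, so everything here is an assumption for illustration: the table of 2-bit saturating counters indexed by load address (PC), the class name `HitMissPredictor`, the function `load_use_delay`, and the latency values of 3 and 12 cycles.

```python
class HitMissPredictor:
    """Hypothetical hit-miss predictor: 2-bit saturating counters indexed by load PC.

    This organization is an assumption, not the design from the paper.
    """

    def __init__(self, entries=1024):
        # A power-of-two table size lets us index with a simple bit mask.
        assert entries > 0 and entries & (entries - 1) == 0
        self.mask = entries - 1
        # Start every counter at 3 (strongly "hit"), assuming most loads hit.
        self.counters = [3] * entries

    def _index(self, pc):
        # Drop the two low bits (assumes 4-byte aligned instructions).
        return (pc >> 2) & self.mask

    def predict_hit(self, pc):
        # Counter values 2 and 3 predict a hit; 0 and 1 predict a miss.
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, was_hit):
        # Train the counter with the actual cache outcome of the load.
        i = self._index(pc)
        if was_hit:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def load_use_delay(predictor, pc, hit_latency=3, miss_latency=12):
    """Delay (in cycles) the scheduler assumes before waking dependents.

    The latency values are illustrative assumptions only.
    """
    return hit_latency if predictor.predict_hit(pc) else miss_latency


if __name__ == "__main__":
    p = HitMissPredictor(entries=16)
    pc = 0x40
    print(load_use_delay(p, pc))   # optimistic delay while the load is predicted to hit
    p.update(pc, was_hit=False)
    p.update(pc, was_hit=False)
    print(load_use_delay(p, pc))   # pessimistic delay after two observed misses
```

In this sketch, a load predicted to hit lets its dependents be scheduled with the short delay, while a load predicted to miss holds them back. That is the trade-off between the optimistic and pessimistic policies described in the second bullet above.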
