Load execution latency reduction
- 13 July 1998
- proceedings article
- Published by Association for Computing Machinery (ACM)
Abstract
In order to achieve high performance, contemporary microprocessors must effectively process the four major instruction types: ALU, branch, load, and store instructions. This paper focuses on the reduction of load instruction execution latency. Load execution latency is dependent on memory access latency, pipeline depth, and data dependencies. Through load effective address prediction both data dependencies and deep pipeline effects can potentially be removed from the overall execution time. If a load effective address is correctly predicted, the data cache can be speculatively accessed prior to execution, thus effectively reducing the latency of load execution. A hybrid load effective address prediction technique is proposed, using three basic predictors: Last Address Predictor (LAP), Stride Predictor (SP), and Global Dynamic Predictor (GDP). In addition to improving load address prediction, this work explores the balance of data ports in the cache memory hierarchy, and the effects of load and store aliasing in wide superscalar machines. Results: Using a realistic hybrid load address predictor, load address prediction rates range from 32% to 77% at an average of 51% for SPECint95 and 60% to 96% at an average of 87% for SPECfp95. For a wide superscalar machine with a significant number of execution resources, this prediction rate increases IPC by 12% and 19% for SPECint95 and SPECfp95 respectively. It is also shown that load/store aliasing decreases the average IPC by 33% for SPECint95 and 24% for SPECfp95.
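The abstract names the component predictors without describing their internals. The following minimal Python sketch illustrates the general idea behind two of them as they are commonly understood: a last-address predictor that guesses a load will reuse its previous effective address, and a stride predictor that guesses the previous address plus the last observed stride. The class names, the unbounded per-PC dictionaries, the `prediction_rate` helper, and the toy `trace` are all assumptions made for illustration and are not taken from the paper. The Global Dynamic Predictor and the hybrid selection logic are not sketched, because the abstract gives no detail about them.

```python
class LastAddressPredictor:
    """Illustrative last-address predictor: predict the address seen last time."""

    def __init__(self):
        self.last = {}  # load PC -> last effective address (unbounded table, an assumption)

    def predict(self, pc):
        return self.last.get(pc)  # None when this load has not been seen yet

    def update(self, pc, addr):
        self.last[pc] = addr


class StridePredictor:
    """Illustrative stride predictor: predict last address + last observed stride."""

    def __init__(self):
        self.table = {}  # load PC -> (last effective address, stride)

    def predict(self, pc):
        entry = self.table.get(pc)
        if entry is None:
            return None
        last, stride = entry
        return last + stride

    def update(self, pc, addr):
        entry = self.table.get(pc)
        stride = addr - entry[0] if entry else 0  # first sighting: assume stride 0
        self.table[pc] = (addr, stride)


def prediction_rate(predictor, trace):
    """Fraction of loads whose effective address was predicted before being resolved."""
    correct = 0
    for pc, addr in trace:
        if predictor.predict(pc) == addr:
            correct += 1
        predictor.update(pc, addr)  # train with the resolved address
    return correct / len(trace)


# Hypothetical trace: the load at PC 0x40 walks an array in steps of 8 bytes,
# while the load at PC 0x44 always reads the same location.
trace = []
for i in range(5):
    trace.append((0x40, 1000 + 8 * i))
    trace.append((0x44, 2000))

lap_rate = prediction_rate(LastAddressPredictor(), trace)
sp_rate = prediction_rate(StridePredictor(), trace)
print(f"last-address: {lap_rate:.0%}, stride: {sp_rate:.0%}")
```

On this toy trace the last-address scheme only captures the load that repeats its address, while the stride scheme also captures the array walk once it has seen two accesses. A real design would use finite, tagged tables and some confidence mechanism before issuing a speculative cache access; those details are left out here.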