Performance Optimization of Pipelined Primary Caches
- 24 August 2005
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- p. 181-190
- https://doi.org/10.1109/isca.1992.753315
Abstract
The CPU cycle time of a high-performance processor is usually determined by the access time of the primary cache. As processor speeds increase, designers will have to increase the number of pipeline stages used to fetch data from the cache in order to reduce the dependence of CPU cycle time on cache access time. This paper studies the performance advantages of a pipelined cache for a GaAs implementation of the MIPS-based architecture using a design methodology that includes long traces of multiprogrammed applications and detailed timing analysis. The study evaluates instruction and data caches with various pipeline depths, cache sizes, block sizes, and refill penalties. The impact of these alternatives on CPU cycle time is also factored into the evaluation. Hardware-based and software-based strategies are considered for hiding the branch and load delays that may be required to avoid pipeline hazards. The results show that software-based methods for mitigating the penalty of branch delays can be as successful as the hardware-based branch-target buffer approach, despite the code expansion inherent in the software methods. The situation is similar for load delays: while hardware-based dynamic methods hide more delay cycles than static approaches do, they may give up the advantage by extending the cycle time. Because these methods are quite successful at hiding small numbers of branch and load delays, and because processors with pipelined caches also have shorter CPU cycle times and larger caches, a significant performance advantage is gained by using two to three pipeline stages to fetch data from the cache.
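The trade-off described in the abstract, where deeper cache pipelining shortens the cycle time but adds branch and load delay cycles that must be hidden, can be illustrated with a first-order model. The sketch below is not the paper's methodology, which relies on long traces and detailed timing analysis. The class `CacheDesign`, the function `time_per_instruction_ns`, and every parameter are hypothetical, and the model assumes each extra cache pipeline stage adds one delay cycle per branch and per load.

```python
from dataclasses import dataclass


@dataclass
class CacheDesign:
    """Hypothetical design point; the fields are illustrative, not from the paper."""
    pipeline_stages: int        # stages used to fetch data from the cache
    cycle_time_ns: float        # CPU cycle time assumed achievable at this depth
    miss_rate: float            # cache misses per instruction
    refill_penalty_cycles: int  # cycles lost per miss


def time_per_instruction_ns(design: CacheDesign,
                            branch_freq: float,
                            load_freq: float,
                            hidden_fraction: float) -> float:
    """First-order estimate: time per instruction = CPI * cycle time.

    Assumption: each cache pipeline stage beyond the first adds one delay
    cycle to every branch and every load, and `hidden_fraction` of those
    cycles is filled with useful work by hardware or software methods.
    """
    delay_cycles = design.pipeline_stages - 1
    exposed = delay_cycles * (1.0 - hidden_fraction)
    cpi = (1.0
           + (branch_freq + load_freq) * exposed
           + design.miss_rate * design.refill_penalty_cycles)
    return cpi * design.cycle_time_ns


# Illustrative comparison with made-up numbers.
unpipelined = CacheDesign(1, 4.0, 0.05, 10)
two_stage = CacheDesign(2, 3.0, 0.05, 13)
t1 = time_per_instruction_ns(unpipelined, 0.2, 0.2, 0.5)
t2 = time_per_instruction_ns(two_stage, 0.2, 0.2, 0.5)
```

With these invented inputs, the two-stage design pays extra exposed delay cycles and a larger refill penalty in cycles, yet the shorter cycle time can still yield a lower time per instruction. Whether that holds for a real design depends on measured delay-hiding rates and timing, which is what the paper evaluates.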