Performance Optimization of Pipelined Primary Caches

24 August 2005

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

p. 181-190
https://doi.org/10.1109/isca.1992.753315

Abstract

The CPU cycle time of a high-performance processor is usually determined by the access time of the primary cache. As processor speeds increase, designers will have to increase the number of pipeline stages used to fetch data from the cache in order to reduce the dependence of CPU cycle time on cache access time. This paper studies the performance advantages of a pipelined cache for a GaAs implementation of the MIPS-based architecture using a design methodology that includes long traces of multiprogrammed applications and detailed timing analysis. The study evaluates instruction and data caches with various pipeline depths, cache sizes, block sizes, and refill penalties. The impact of these alternatives on CPU cycle time is also factored into the evaluation. Hardware-based and software-based strategies are considered for hiding the branch and load delays that may be required to avoid pipeline hazards. The results show that software-based methods for mitigating the penalty of branch delays can be as successful as the hardware-based branch-target buffer approach, despite the code expansion inherent in the software methods. The situation is similar for load delays: while hardware-based dynamic methods hide more delay cycles than static approaches do, they may give up that advantage by extending the cycle time. Because these methods are quite successful at hiding small numbers of branch and load delays, and because processors with pipelined caches also have shorter CPU cycle times and larger caches, a significant performance advantage is gained by using two to three pipeline stages to fetch data from the cache.
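The trade-off described in the abstract (deeper cache pipelining shortens the cycle but adds branch and load delay cycles, some of which can be hidden) can be sketched as a simple time-per-instruction calculation. This is an illustrative first-order model only, not the methodology or data of the paper: the `Design` fields, the formula, and every number passed to it are assumptions made for the sake of the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Design:
    """Hypothetical design point; all fields are illustrative assumptions."""
    cycle_ns: float            # CPU cycle time in nanoseconds
    base_cpi: float            # CPI with no delay cycles and no cache misses
    load_freq: float           # fraction of instructions that are loads
    branch_freq: float         # fraction of instructions that are branches
    load_delay_cycles: int     # delay cycles per load due to cache pipelining
    branch_delay_cycles: int   # delay cycles per branch due to cache pipelining
    hidden_fraction: float     # share of delay cycles hidden (static or dynamic)
    misses_per_instr: float    # cache misses per instruction
    refill_ns: float           # refill penalty in nanoseconds


def time_per_instruction(d: Design) -> float:
    """Return average time per instruction in ns under a first-order model."""
    # Delay cycles that scheduling or prediction failed to hide.
    exposed = (1.0 - d.hidden_fraction) * (
        d.load_freq * d.load_delay_cycles
        + d.branch_freq * d.branch_delay_cycles
    )
    # A fixed refill time costs more cycles when the cycle is shorter.
    refill_cycles = math.ceil(d.refill_ns / d.cycle_ns)
    cpi = d.base_cpi + exposed + d.misses_per_instr * refill_cycles
    return cpi * d.cycle_ns


if __name__ == "__main__":
    # Made-up numbers: an unpipelined cache with a long cycle versus a
    # pipelined cache with a shorter cycle and extra delay cycles.
    unpipelined = Design(4.0, 1.0, 0.25, 0.125, 0, 0, 0.0, 0.02, 100.0)
    pipelined = Design(2.0, 1.0, 0.25, 0.125, 2, 2, 0.5, 0.02, 100.0)
    for name, d in (("unpipelined", unpipelined), ("pipelined", pipelined)):
        print(f"{name}: {time_per_instruction(d):.2f} ns per instruction")
```

In this sketch the pipelined design wins only when the shorter cycle outweighs the exposed delay cycles and the larger refill cost in cycles, which is the balance the study evaluates with traces and timing analysis rather than with fixed parameters like these.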
