Reducing branch delay to zero in pipelined processors
- 1 March 1993
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Computers
- Vol. 42 (3), 363-371
- https://doi.org/10.1109/12.210179
Abstract
A mechanism to reduce the cost of branches in pipelined processors is described and evaluated. It is based on the use of multiple prefetch, early computation of the target address, delayed branch, and parallel execution of branches. The implementation of this mechanism using a branch target instruction memory is described. An analytical model of the performance of this implementation makes it possible to measure the efficiency of the mechanism with a very low computational cost. The model is used to determine the size of cache lines that maximizes the processor performance, to compare the performance of the mechanism with that of other schemes, and to analyze the performance of the mechanism with two alternative cache organizations.
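The abstract does not reproduce the paper's analytical model. As a rough illustration of what "branch cost" means in a pipeline, the sketch below uses the generic textbook relation (average cycles per instruction = base CPI + branch fraction x penalty cycles per branch). This is not the authors' model; the function names and the sample numbers are assumptions chosen only for illustration.

```python
def effective_cpi(base_cpi: float, branch_fraction: float, penalty_cycles: float) -> float:
    """Average cycles per instruction when every branch costs a fixed penalty.

    Generic textbook relation, NOT the analytical model from the paper.
    """
    return base_cpi + branch_fraction * penalty_cycles


def slowdown_vs_zero_delay(base_cpi: float, branch_fraction: float, penalty_cycles: float) -> float:
    """Ratio of the CPI with a branch penalty to the CPI with zero branch delay."""
    return effective_cpi(base_cpi, branch_fraction, penalty_cycles) / base_cpi


if __name__ == "__main__":
    # Hypothetical workload: 20% branches, 2 penalty cycles per branch.
    print(effective_cpi(1.0, 0.2, 2.0))
    print(slowdown_vs_zero_delay(1.0, 0.2, 2.0))
```

Under these assumed numbers, a scheme that drives the per-branch penalty to zero would bring the average CPI back to the base value, which is the sense in which the title speaks of reducing branch delay to zero.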
This publication has 12 references indexed in Scilit:
- 80960-next generation. Published by Institute of Electrical and Electronics Engineers (IEEE), 2003
- The 68040 processor. I. Design and implementation. IEEE Micro, 1990
- Instruction fetch unit for parallel execution of branch instructions. Published by Association for Computing Machinery (ACM), 1989
- System design using the MIPS R3000/3010 RISC chipset. Published by Institute of Electrical and Electronics Engineers (IEEE), 1989
- A mechanism for reducing the cost of branches in RISC architectures. Microprocessing and Microprogramming, 1988
- Reducing the branch penalty in pipelined processors. Computer, 1988
- The scalable processor architecture (SPARC). Published by Institute of Electrical and Electronics Engineers (IEEE), 1988
- System Considerations in the Design of the Am29000. IEEE Micro, 1987
- Reducing the cost of branches. ACM SIGARCH Computer Architecture News, 1986
- Optimizing delayed branches. ACM SIGMICRO Newsletter, 1982