Reducing branch delay to zero in pipelined processors

1 March 1993

journal article
Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Computers

Vol. 42 (3), 363-371
https://doi.org/10.1109/12.210179

Abstract

A mechanism to reduce the cost of branches in pipelined processors is described and evaluated. It is based on the use of multiple prefetch, early computation of the target address, delayed branch, and parallel execution of branches. The implementation of this mechanism using a branch target instruction memory is described. An analytical model of the performance of this implementation makes it possible to measure the efficiency of the mechanism with a very low computational cost. The model is used to determine the size of cache lines that maximizes the processor performance, to compare the performance of the mechanism with that of other schemes, and to analyze the performance of the mechanism with two alternative cache organizations.
