Superscalar execution with dynamic data forwarding

27 November 2002

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

pp. 130-135
https://doi.org/10.1109/pact.1998.727183

Abstract

We empirically demonstrate that, to take advantage of increasing issue widths, superscalar processors require quadratically growing instruction window sizes. Because a conventional central window design aims to provide full data fan-out to every instruction in the window, building large instruction windows with conventional techniques is not feasible. We show that full data fan-out is not necessary for achieving high performance when a novel approach is used to distribute the values. We implement the instruction window with direct matching in a small on-chip memory called the wait memory, and bring only a small subset of instructions that are likely to become ready into a match unit, where instruction selection and operand matching are performed. We show that the match unit needs to grow only linearly with the issue width. Using the SPEC95 benchmarks, we demonstrate that, at a given instruction window size, our algorithm provides over 90 percent of the IPC obtainable by a central window implementation that provides full data fan-out.
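The division of labor between the wait memory and the match unit can be illustrated with a toy scheduling model. The Python sketch below is not the paper's design. The function name `simulate`, the rule of parking an instruction under its first missing operand tag, the single-cycle latency, and the unique destination tags are all assumptions made for illustration. It shows only the general idea: instructions whose operands are not yet produced sit in a tag-indexed memory, and a small match unit holds the candidates from which up to `issue_width` instructions are selected each cycle.

```python
from collections import defaultdict, deque


def simulate(program, issue_width, match_unit_size):
    """Toy model: returns a list of per-cycle lists of issued instruction indices.

    program: list of (dest_tag, source_tags) in program order.
    Assumptions (hypothetical, not from the paper): every dest_tag is unique,
    sources refer only to earlier instructions or to live-in values, and every
    instruction completes in one cycle.
    """
    defined = {dest for dest, _ in program}  # tags produced inside the program
    available = set()                        # tags whose values have been produced
    wait_memory = defaultdict(list)          # tag -> instructions parked on that tag
    incoming = deque(range(len(program)))    # instructions awaiting placement
    match_unit = []                          # small pool of ready candidates
    schedule = []
    remaining = len(program)

    while remaining:
        # Placement: ready instructions enter the match unit, others are
        # parked in the wait memory under their first missing operand tag.
        while incoming and len(match_unit) < match_unit_size:
            i = incoming.popleft()
            missing = [s for s in program[i][1]
                       if s in defined and s not in available]
            if missing:
                wait_memory[missing[0]].append(i)
            else:
                match_unit.append(i)

        # Selection: issue the oldest candidates, up to the issue width.
        issued = match_unit[:issue_width]
        match_unit = match_unit[issue_width:]
        if not issued:
            raise ValueError("no ready instruction: malformed program")
        schedule.append(issued)
        remaining -= len(issued)

        # Forwarding: each result wakes only the instructions parked on its
        # tag, rather than being broadcast to the whole window.
        for i in issued:
            tag = program[i][0]
            available.add(tag)
            incoming.extendleft(reversed(wait_memory.pop(tag, [])))

    return schedule


# Hypothetical five-instruction example: d depends on b and c, which depend on a.
example = [("a", ()), ("b", ("a",)), ("c", ("a",)), ("d", ("b", "c")), ("e", ())]
schedule = simulate(example, issue_width=2, match_unit_size=2)
ipc = len(example) / len(schedule)
```

In this model a woken instruction re-checks its operands and may be parked again under a second missing tag, so each result is delivered only to the instructions indexed by its tag. Whether the paper's wait memory behaves this way is not stated in the abstract.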
