Abstract
We empirically demonstrate that, to take advantage of increasing issue widths, superscalar processors require instruction window sizes that grow quadratically with the issue width. Because conventional central window designs aim to provide full data fan-out to every instruction in the window, building large instruction windows with conventional techniques is not feasible. We show that full data fan-out is not necessary for high performance when a novel approach is used to distribute values. We implement the instruction window with direct matching in a small on-chip memory called the wait memory, and bring only a small subset of instructions that are likely to become ready into a match unit, where instruction selection and operand matching are performed. We show that the match unit needs to grow only linearly with the issue width. Using the SPEC95 benchmarks, we demonstrate that, at a given instruction window size, our algorithm provides over 90 percent of the IPC obtainable with a central window implementation that provides full data fan-out.
