Improving CISC instruction decoding performance using a fill unit

Abstract
Current superscalar processors, both RISC and CISC, require substantial instruction fetch and decode bandwidth to keep multiple functional units utilized. While CISC instructions can sometimes provide reduced fetch bandwidth requirements, they are correspondingly more difficult to decode. A hardware assist, called a fill unit, can dynamically collect decoded microoperations into a decoded instruction cache. Future code fetches to those locations can be satisfied out of this cache and thus bypass the decoding logic. This approach is investigated using the Intel x86 architecture, and a speedup of approximately a factor of two over a P6-like decoding structure is seen for the three SPEC benchmarks investigated. This design is accompanied by a microengine-register allocation and renaming scheme that prevents the increased supply of microoperations from placing excessive demands on the normal register renaming hardware.
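The caching idea in the abstract can be illustrated with a minimal sketch. This is not the paper's design: the class name `DecodedInstructionCache`, the `toy_decode` stand-in decoder, the address-keyed lookup, and the unbounded capacity are all assumptions made for illustration. It shows only the basic behavior that a fetch which hits in the decoded cache skips the decoder, while a miss decodes once and lets the fill step store the resulting microoperations.

```python
from typing import Callable, Dict, Tuple

MicroOps = Tuple[str, ...]


class DecodedInstructionCache:
    """Toy model: maps a fetch address to already-decoded microoperations."""

    def __init__(self, decode: Callable[[int], MicroOps]) -> None:
        self._decode = decode                  # stand-in for the slow decoder
        self._lines: Dict[int, MicroOps] = {}  # assumed unbounded, no eviction
        self.hits = 0
        self.misses = 0

    def fetch(self, address: int) -> MicroOps:
        cached = self._lines.get(address)
        if cached is not None:
            # Hit: the decoding logic is bypassed entirely.
            self.hits += 1
            return cached
        # Miss: decode once, then the fill step records the result.
        self.misses += 1
        micro_ops = self._decode(address)
        self._lines[address] = micro_ops
        return micro_ops


decode_calls = []


def toy_decode(address: int) -> MicroOps:
    """Hypothetical decoder: returns two placeholder microoperations."""
    decode_calls.append(address)
    return (f"load@{address:#x}", f"add@{address:#x}")


cache = DecodedInstructionCache(toy_decode)
for address in [0x100, 0x104, 0x100, 0x104, 0x100]:
    cache.fetch(address)

print(cache.hits, cache.misses, len(decode_calls))
```

In this toy loop the decoder runs only on the first visit to each address; repeated fetches are served from the cache, which is the effect the abstract attributes to the fill unit.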
