Decoupling local variable accesses in a wide-issue superscalar processor

1 May 1999

journal article
Published by Association for Computing Machinery (ACM) in ACM SIGARCH Computer Architecture News

Vol. 27 (2), 100-110
https://doi.org/10.1145/307338.300988

Abstract

Providing adequate data bandwidth is extremely important for a wide-issue superscalar processor to achieve its full performance potential. Adding a large number of ports to a data cache, however, becomes increasingly inefficient and can significantly increase hardware complexity. This paper takes an alternative or complementary approach to providing more data bandwidth, called the data-decoupled architecture. With support from the compiler and/or hardware, the approach partitions the memory stream into two independent streams early in the processor pipeline and feeds each stream to a separate memory access queue and cache. Under this model, the paper studies the potential of decoupling memory accesses to a program's local variables, which are allocated on the run-time stack. Using a set of integer and floating-point programs from the SPEC95 benchmark suite, it is shown that local variable accesses constitute a large portion of all memory references, while their reference space is very small, averaging around 7 words per (static) procedure. To service local variable accesses quickly, two optimizations, fast data forwarding and access combining, are proposed and studied. Important design parameters, such as the cache size, the number of cache ports, and the degree of access combining, are studied through simulation. The potential performance of the proposed scheme is measured under various configurations, and it is concluded that the scheme can be a viable alternative to building a single multi-ported data cache.
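
The partitioning and access-combining ideas named in the abstract can be illustrated with a small sketch. It is not the paper's mechanism: the abstract does not say how local variable accesses are identified or how combining is implemented. The sketch assumes a reference is "local" when its base register is a stack or frame pointer (the names `sp` and `fp`, the `MemRef` type, and the functions `partition` and `combine` are all hypothetical), and that the base register is aligned to a cache line so that offsets alone decide which line a reference falls in.

```python
from dataclasses import dataclass

# Assumption: stack accesses are recognized by their base register.
STACK_BASE_REGS = {"sp", "fp"}


@dataclass(frozen=True)
class MemRef:
    op: str        # "load" or "store"
    base_reg: str  # base register of the address expression
    offset: int    # byte offset from the base register


def partition(refs):
    """Split one memory stream into (local, other) streams, preserving order."""
    local, other = [], []
    for ref in refs:
        (local if ref.base_reg in STACK_BASE_REGS else other).append(ref)
    return local, other


def combine(local_refs, line_bytes=16, degree=2):
    """Group up to `degree` consecutive references of the same kind that fall
    in the same cache line, so each group needs a single cache access.
    Assumes the base register is aligned to `line_bytes`."""
    groups = []
    for ref in local_refs:
        last = groups[-1] if groups else None
        if (last and len(last) < degree
                and last[0].op == ref.op
                and last[0].base_reg == ref.base_reg
                and last[0].offset // line_bytes == ref.offset // line_bytes):
            last.append(ref)
        else:
            groups.append([ref])
    return groups


refs = [
    MemRef("load", "sp", 0),
    MemRef("load", "sp", 4),
    MemRef("load", "r3", 8),    # not stack-relative: goes to the other stream
    MemRef("store", "sp", 32),
    MemRef("load", "sp", 8),
]
local, other = partition(refs)
groups = combine(local)
```

Here the two leading stack loads share one access, while the store and the final load each stand alone because combining only looks at consecutive references of the same kind. The `degree` parameter loosely mirrors the "degree of access combining" the abstract lists as a design parameter.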
