Decoupling local variable accesses in a wide-issue superscalar processor
- 1 May 1999
- journal article
- Published by Association for Computing Machinery (ACM) in ACM SIGARCH Computer Architecture News
- Vol. 27 (2), 100-110
- https://doi.org/10.1145/307338.300988
Abstract
Providing adequate data bandwidth is extremely important for a wide-issue superscalar processor to achieve its full performance potential. Adding a large number of ports to a data cache, however, becomes increasingly inefficient and can significantly increase hardware complexity. This paper takes an alternative or complementary approach to providing more data bandwidth, called the data-decoupled architecture. The approach, with support from the compiler and/or hardware, partitions the memory stream into two independent streams early in the processor pipeline and feeds each stream to a separate memory access queue and cache. Under this model, the paper studies the potential of decoupling memory accesses to a program's local variables, which are allocated on the run-time stack. Using a set of integer and floating-point programs from the SPEC95 benchmark suite, it is shown that local variable accesses constitute a large portion of all memory references, while their reference space is very small, averaging around 7 words per (static) procedure. To service local variable accesses quickly, two optimizations, fast data forwarding and access combining, are proposed and studied. Important design parameters, such as the cache size, the number of cache ports, and the degree of access combining, are studied through simulation. The potential performance of the proposed scheme is measured under various configurations, and it is concluded that the scheme can become a viable alternative to building a single multi-ported data cache.
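The partitioning and access-combining ideas in the abstract can be illustrated with a minimal sketch. It is not the paper's mechanism: the abstract does not say how an access is recognized as local or how combining is implemented. The sketch assumes an access is local when its base register is the stack or frame pointer, and that combining merges consecutive same-kind accesses that fall in one cache line, up to a given degree. All names (`MemAccess`, `partition`, `combine`, `STACK_BASE_REGS`, `LINE_BYTES`) and the 32-byte line size are hypothetical.

```python
from dataclasses import dataclass

# Assumption: an access counts as "local" when its base register is the
# stack or frame pointer; the paper's actual classification may differ.
STACK_BASE_REGS = {"sp", "fp"}
LINE_BYTES = 32  # hypothetical cache line size


@dataclass(frozen=True)
class MemAccess:
    kind: str      # "load" or "store"
    base_reg: str  # register used to form the address
    addr: int      # effective byte address


def partition(stream):
    """Split one memory stream into (local, other) queues, preserving order."""
    local_q, other_q = [], []
    for acc in stream:
        (local_q if acc.base_reg in STACK_BASE_REGS else other_q).append(acc)
    return local_q, other_q


def combine(queue, degree):
    """Group up to `degree` consecutive same-kind accesses to one cache line."""
    groups = []
    for acc in queue:
        last = groups[-1] if groups else None
        if (last and len(last) < degree
                and last[0].kind == acc.kind
                and last[0].addr // LINE_BYTES == acc.addr // LINE_BYTES):
            last.append(acc)      # served together in one cache access
        else:
            groups.append([acc])  # starts a new cache access
    return groups


stream = [
    MemAccess("load", "sp", 0x7FF0),
    MemAccess("load", "r4", 0x1000),
    MemAccess("load", "sp", 0x7FF4),
    MemAccess("store", "fp", 0x7FF8),
    MemAccess("load", "r5", 0x2000),
]
local_q, other_q = partition(stream)
local_groups = combine(local_q, degree=2)
```

In this toy stream, the three stack-relative accesses go to the local queue, and the two loads that share a line are grouped into a single cache access. That grouping is the sense in which a higher degree of combining can reduce the number of cache ports needed.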