Scalar operand networks: on-chip interconnect for ILP in partitioned architectures

27 August 2003

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

p. 341-353
https://doi.org/10.1109/hpca.2003.1183551

Abstract

The bypass paths and multiported register files in microprocessors serve as an implicit interconnect to communicate operand values among pipeline stages and multiple ALUs. Previous superscalar designs implemented this interconnect using centralized structures that do not scale with increasing ILP demands. In search of scalability, recent microprocessor designs in industry and academia exhibit a trend towards distributed resources such as partitioned register files, banked caches, multiple independent compute pipelines, and even multiple program counters. Some of these partitioned microprocessor designs have begun to implement bypassing and operand transport using point-to-point interconnects rather than centralized networks. We call interconnects optimized for scalar data transport, whether centralized or distributed, scalar operand networks. Although these networks share many of the challenges of multiprocessor networks such as scalability and deadlock avoidance, they have many unique requirements, including ultra-low latencies (a few cycles versus tens of cycles) and ultra-fast operation-operand matching. This paper discusses the unique properties of scalar operand networks, examines alternative ways of implementing them, and describes in detail the implementation of one such network in the Raw microprocessor. The paper analyzes the performance of these networks for ILP workloads and the sensitivity of overall ILP performance to network properties.
