Layout-Accurate Design and Implementation of a High-Throughput Interconnection Network for Single-Chip Parallel Processing

1 August 2007

Conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

pp. 21-28
https://doi.org/10.1109/hoti.2007.11

Abstract

A mesh of trees (MoT) on-chip interconnection network has recently been proposed to provide high throughput between memory units and processors for single-chip parallel processing (Balkan et al., 2006). In this paper, we report our findings in bringing this concept to silicon. Specifically, we conduct cycle-accurate Verilog simulations to verify the analytical results claimed in (Balkan et al., 2006). We synthesize MoT interconnection networks of various sizes and obtain their layouts. To further improve throughput, we investigate different arbitration primitives for handling loads and stores, the two most common memory operations. We also study the use of pipeline registers in large networks that contain long wires. Simulations based on the full network layout demonstrate that significant throughput improvement can be achieved over the originally proposed MoT interconnection network. The importance of this work lies in validating the performance features of the MoT interconnection network, which had previously been shown to be competitive with traditional network solutions. The MoT network is currently used in an eXplicit Multi-Threading (XMT) on-chip parallel processor, which is engineered to support parallel programming. In that context, a 32-terminal MoT network could support up to 512 on-chip XMT processors. Our 8-terminal network, which could serve 8 processor clusters (128 processors in total), was also recently accepted for fabrication.
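The abstract does not describe how requests move through the network. The following is a minimal, hypothetical sketch of an idealized MoT and rests on several assumptions that are not taken from this paper:

- Each of n source terminals owns a private fan-out tree, so requests never conflict on the way out.
- Each of n memory modules has a fan-in tree, modeled here as a single arbiter that grants one request per cycle.
- Traffic is uniform random, and losing requests are simply dropped, with no retries or buffering.

The names `mot_cycle`, `simulate` and `expected_throughput` are illustrative only. The fixed-priority arbiter is a placeholder; it does not represent the load/store arbitration primitives the paper investigates.

```python
import random


def mot_cycle(requests, n):
    """One cycle through a hypothetical, idealized n-terminal MoT.

    requests[i] is the destination memory module of source i, or None.
    Assumption: fan-out trees never conflict, so contention occurs only
    in each module's fan-in tree, modeled as one arbiter granting one
    request per module per cycle. Returns granted source indices.
    """
    granted = {}
    for src, dst in enumerate(requests):
        if dst is None:
            continue
        if not 0 <= dst < n:
            raise ValueError(f"destination {dst} out of range")
        # Placeholder fixed-priority arbitration: lowest source index wins.
        granted.setdefault(dst, src)
    return sorted(granted.values())


def simulate(n, cycles, seed=0):
    """Average grants per cycle under uniform random traffic (no retries)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(cycles):
        reqs = [rng.randrange(n) for _ in range(n)]
        total += len(mot_cycle(reqs, n))
    return total / cycles


def expected_throughput(n):
    """Closed form for the same toy model: expected number of distinct
    modules hit when n sources each pick a module uniformly at random."""
    return n * (1 - (1 - 1 / n) ** n)


if __name__ == "__main__":
    n = 8  # matches the 8-terminal network mentioned above
    print(f"simulated: {simulate(n, 10_000):.3f}")
    print(f"closed form: {expected_throughput(n):.3f}")
```

In this toy model, an 8-terminal network delivers roughly 5.25 requests per cycle under uniform traffic. Every lost grant comes from fan-in contention at a memory module, which is the point where the choice of arbitration primitive matters. Real throughput also depends on buffering, retries, pipelining and wire delay, none of which this sketch models.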
