Layout-Accurate Design and Implementation of a High-Throughput Interconnection Network for Single-Chip Parallel Processing
- 1 August 2007
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
Abstract
A mesh of trees (MoT) on-chip interconnection network has been proposed recently to provide high throughput between memory units and processors for single-chip parallel processing (Balkan et al., 2006). In this paper, we report our findings in bringing this concept to silicon. Specifically, we conduct cycle-accurate Verilog simulations to verify the analytical results claimed in (Balkan et al., 2006). We synthesize and obtain the layout of MoT interconnection networks of various sizes. To further improve throughput, we investigate different arbitration primitives to handle load and store, the two most common memory operations. We also study the use of pipeline registers in large networks when there are long wires. Simulation based on full network layout demonstrates that significant throughput improvement can be achieved over the originally proposed MoT interconnection network. The importance of this work lies in its validation of performance features of the MoT interconnection network, as they were previously shown to be competitive with traditional network solutions. The MoT network is currently used in an eXplicit multi-threading (XMT) on-chip parallel processor, which is engineered to support parallel programming. In that context, a 32-terminal MoT network could support up to 512 on-chip XMT processors. Our 8-terminal network, which could serve 8 processor clusters (or 128 total processors), was also recently accepted for fabrication.
This publication has 12 references indexed in Scilit:
- PRAM-on-chip. Published by Association for Computing Machinery (ACM), 2007
- A scalable communication-centric SoC interconnect architecture. Published by Institute of Electrical and Electronics Engineers (IEEE), 2004
- Concurrent flip-flop and repeater insertion for high performance integrated circuits. Published by Institute of Electrical and Electronics Engineers (IEEE), 2003
- A methodology for correct-by-construction latency insensitive design. Published by Institute of Electrical and Electronics Engineers (IEEE), 2003
- Building the 4 processor SB-PRAM prototype. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- On the area of hypercube layouts. Information Processing Letters, 2002
- Real PRAM Programming. Published by Springer Nature, 2002
- The Tera computer system. Published by Association for Computing Machinery (ACM), 1990
- Randomized and deterministic simulations of PRAMs by parallel machines with restricted granularity of parallel memories. Acta Informatica, 1984
- The NYU Ultracomputer: Designing an MIMD Shared Memory Parallel Computer. IEEE Transactions on Computers, 1983