Processor coupling
- 1 April 1992
- journal article
- Published by Association for Computing Machinery (ACM) in ACM SIGARCH Computer Architecture News
- Vol. 20 (2), 202-213
- https://doi.org/10.1145/146628.139728
Abstract
The technology to implement a single-chip node composed of 4 high-performance floating-point ALUs will be available by 1995. This paper presents processor coupling, a mechanism for controlling multiple ALUs to exploit both instruction-level and inter-thread parallelism by using compile-time and runtime scheduling. The compiler statically schedules individual threads to discover available intra-thread instruction-level parallelism. The runtime scheduling mechanism interleaves threads, exploiting inter-thread parallelism to maintain high ALU utilization. ALUs are assigned to threads on a cycle-by-cycle basis, and several threads can be active concurrently. We provide simulation results demonstrating that, on four simple numerical benchmarks, processor coupling achieves better performance than purely statically scheduled or multiprocessor machine organizations. We examine how performance is affected by restricted communication between ALUs and by long memory latencies. We also present an implementation and feasibility study of a processor-coupled node.
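The cycle-by-cycle assignment of ALUs to threads described in the abstract can be illustrated with a toy model. This is only a sketch under stated assumptions, not the paper's simulator or arbitration policy: each thread is assumed to be a list of statically scheduled instruction words, each word is the set of ALU indices it needs, a fixed thread priority resolves conflicts, and the operations of one word are allowed to issue in different cycles. The function name `run_coupled` and the example threads are hypothetical.

```python
from typing import List, Set


def run_coupled(threads: List[List[Set[int]]], num_alus: int = 4) -> int:
    """Return the number of cycles needed to issue every operation.

    Assumption: each thread is a list of instruction words produced by a
    static (compile-time) scheduler; each word is the set of ALU indices
    its operations were assigned to.
    """
    for thread in threads:
        for word in thread:
            if any(alu < 0 or alu >= num_alus for alu in word):
                raise ValueError("instruction word names a nonexistent ALU")

    pc = [0] * len(threads)  # index of each thread's current word
    pending = [set(t[0]) if t else set() for t in threads]
    cycles = 0

    while any(pc[i] < len(t) for i, t in enumerate(threads)):
        free = set(range(num_alus))  # every ALU is available each cycle
        # Runtime arbitration (assumed policy: lower thread index wins).
        for i, t in enumerate(threads):
            if pc[i] >= len(t):
                continue
            granted = pending[i] & free
            free -= granted
            pending[i] -= granted
        # A thread moves to its next word once the current one has
        # issued all of its operations.
        for i, t in enumerate(threads):
            if pc[i] < len(t) and not pending[i]:
                pc[i] += 1
                if pc[i] < len(t):
                    pending[i] = set(t[pc[i]])
        cycles += 1
    return cycles


# Two hypothetical threads, each already scheduled onto specific ALUs.
thread_a = [{0, 1}, {2}, {0}]
thread_b = [{0}, {1, 3}, {2}]

interleaved = run_coupled([thread_a, thread_b])
one_at_a_time = run_coupled([thread_a]) + run_coupled([thread_b])
```

With these example threads, interleaving finishes in 4 cycles, while running the two threads one after another takes 3 + 3 = 6 cycles, because the second thread fills ALUs the first leaves idle. The model ignores memory latency and restricted inter-ALU communication, both of which the paper examines.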