Processor coupling

1 April 1992

journal article
Published by Association for Computing Machinery (ACM) in ACM SIGARCH Computer Architecture News

Vol. 20 (2), 202-213
https://doi.org/10.1145/146628.139728

Abstract

The technology to implement a single-chip node composed of four high-performance floating-point ALUs will be available by 1995. This paper presents processor coupling, a mechanism for controlling multiple ALUs to exploit both instruction-level and inter-thread parallelism by using compile-time and runtime scheduling. The compiler statically schedules individual threads to discover available intra-thread instruction-level parallelism. The runtime scheduling mechanism interleaves threads, exploiting inter-thread parallelism to maintain high ALU utilization. ALUs are assigned to threads on a cycle-by-cycle basis, and several threads can be active concurrently. We provide simulation results demonstrating that, on four simple numerical benchmarks, processor coupling achieves better performance than purely statically scheduled or multiprocessor machine organizations. We examine how performance is affected by restricted communication between ALUs and by long memory latencies. We also present an implementation and feasibility study of a processor-coupled node.
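The cycle-by-cycle assignment of ALUs to threads described in the abstract can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the paper's simulator: the function name `run_coupled`, the fixed-priority arbitration (lower thread index wins), the one-cycle operation latency, and the representation of an instruction word as a set of ALU indices are all hypothetical choices made for illustration.

```python
def run_coupled(threads, num_alus=4):
    """Toy model of runtime ALU arbitration (illustrative assumptions only).

    threads: list of programs, one per thread. Each program is a list of
             statically scheduled instruction words; each word is a set of
             ALU indices (0 .. num_alus-1) that the word needs.
    Returns a trace: one list per cycle, giving for each ALU the index of
    the thread it was granted to, or None if the ALU was idle.
    """
    pc = [0] * len(threads)  # index of the current word in each thread
    # Operations of the current word that have not issued yet.
    pending = [set(prog[0]) if prog else set() for prog in threads]
    trace = []

    while any(pc[t] < len(threads[t]) for t in range(len(threads))):
        grant = [None] * num_alus
        # Assumed policy: fixed priority, lower thread index wins an ALU.
        for t in range(len(threads)):
            if pc[t] >= len(threads[t]):
                continue  # thread has finished
            for alu in sorted(pending[t]):
                if grant[alu] is None:
                    grant[alu] = t
        # Operations that won an ALU issue this cycle.
        for alu, t in enumerate(grant):
            if t is not None:
                pending[t].discard(alu)
        # A thread moves to its next word once every operation has issued.
        for t in range(len(threads)):
            if pc[t] < len(threads[t]) and not pending[t]:
                pc[t] += 1
                if pc[t] < len(threads[t]):
                    pending[t] = set(threads[t][pc[t]])
        trace.append(grant)
    return trace


# Two hypothetical threads, each two instruction words long.
thread_a = [{0, 1}, {2}]
thread_b = [{0}, {1, 3}]
for cycle, grant in enumerate(run_coupled([thread_a, thread_b])):
    print(cycle, grant)
```

In this example the two threads finish in three cycles, whereas running them back to back would take four, because the second thread fills ALU slots that the first leaves idle. The model ignores memory latency and inter-ALU communication limits, both of which the paper studies.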
