Exploiting fine-grain thread level parallelism on the MIT multi-ALU processor
- 27 November 2002
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- No. 10636897, p. 306-317
- https://doi.org/10.1109/isca.1998.694790
Abstract
Much of the improvement in computer performance over the last twenty years has come from faster transistors and architectural advances that increase parallelism. Historically, parallelism has been exploited either at the instruction level with a grain-size of a single instruction or by partitioning applications into coarse threads with grain-sizes of thousands of instructions. Fine-grain threads fill the parallelism gap between these extremes by enabling tasks with run lengths as small as 20 cycles. As this fine-grain parallelism is orthogonal to ILP and coarse threads, it complements both methods and provides an opportunity for greater speedup. This paper describes the efficient communication and synchronization mechanisms implemented in the Multi-ALU Processor (MAP) chip, including a thread creation instruction, register communication, and a hardware barrier. These register-based mechanisms provide 10 times faster communication and 60 times faster synchronization than mechanisms that operate via a shared on-chip cache. With a three-processor implementation of the MAP, fine-grain speedups of 1.2-2.1 are demonstrated on a suite of applications.