Fast barrier synchronization hardware

Abstract
Many recent studies have examined the impact of barrier synchronization overhead on parallel loop performance, especially on large-scale parallel machines. This paper describes a hardware scheme for supporting fast barrier synchronization. It allows a barrier synchronization to complete within a single instruction cycle on moderately sized systems, and it scales with only a logarithmic increase in synchronization time. It supports a large number of concurrent barriers and can also be used to support a number of different barrier synchronization schemes. Simulation results show that, under reasonable assumptions, this hardware can significantly decrease parallel loop execution time, especially for statically scheduled loops.
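
The abstract does not describe the circuit itself. As a rough illustration of why a logarithmic increase in synchronization time is plausible, the sketch below assumes that each processor raises a "ready" bit and that these bits are combined by a tree of fixed fan-in AND gates. The tree structure, the fan-in parameter, and the names `and_tree_levels` and `barrier_reached` are assumptions made for illustration, not details taken from the paper.

```python
def and_tree_levels(num_processors: int, fan_in: int = 2) -> int:
    """Count the gate levels needed to AND together one ready bit per processor.

    Assumption: a balanced tree of AND gates with the given fan-in.
    The level count grows as ceil(log_fan_in(num_processors)).
    """
    if num_processors < 1 or fan_in < 2:
        raise ValueError("need at least one processor and fan_in >= 2")
    levels = 0
    width = num_processors
    while width > 1:
        # Each level reduces groups of `fan_in` signals to a single signal.
        width = (width + fan_in - 1) // fan_in
        levels += 1
    return levels


def barrier_reached(ready_bits, fan_in: int = 2) -> bool:
    """Software model of the AND tree: True only if every processor is ready."""
    level = list(ready_bits)
    if not level:
        raise ValueError("need at least one ready bit")
    while len(level) > 1:
        # Combine adjacent groups of `fan_in` signals, level by level.
        level = [all(level[i:i + fan_in]) for i in range(0, len(level), fan_in)]
    return bool(level[0])


if __name__ == "__main__":
    for n in (8, 64, 1024):
        print(n, "processors:", and_tree_levels(n, fan_in=4), "gate levels")
    print(barrier_reached([True] * 8))
    print(barrier_reached([True, False, True]))
```

Under this assumed model, quadrupling the processor count with fan-in 4 adds a single gate level, which is the kind of growth a logarithmically scalable barrier would exhibit. Whether the whole tree settles within one instruction cycle depends on gate and wire delays that the model does not capture.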
