Using processor affinity in loop scheduling on shared-memory multiprocessors

1 April 1994

journal article
Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Parallel and Distributed Systems

Vol. 5 (4) , 379-400
https://doi.org/10.1109/71.273046

Abstract

Loops are the single largest source of parallelism in many applications. One way to exploit this parallelism is to execute loop iterations in parallel on different processors. Previous approaches to loop scheduling attempted to achieve the minimum completion time by distributing the workload as evenly as possible while minimizing the number of synchronization operations required. The authors consider a third dimension to the problem of loop scheduling on shared-memory multiprocessors: communication overhead caused by accesses to nonlocal data. They show that traditional algorithms for loop scheduling, which ignore the location of data when assigning iterations to processors, incur a significant performance penalty on modern shared-memory multiprocessors. They propose a new loop scheduling algorithm that attempts to simultaneously balance the workload, minimize synchronization, and co-locate loop iterations with the necessary data. They compare the performance of this new algorithm to other known algorithms by using five representative kernel programs on a Silicon Graphics multiprocessor workstation, a BBN Butterfly, a Sequent Symmetry, and a KSR-1, and show that the new algorithm offers substantial performance improvements, up to a factor of 4 in some cases. The authors conclude that loop scheduling algorithms for shared-memory multiprocessors cannot afford to ignore the location of data, particularly in light of the increasing disparity between processor and memory speeds.<>

Keywords

This publication has 22 references indexed in Scilit:

Trapezoid self-scheduling: a practical scheduling scheme for parallel compilers
IEEE Transactions on Parallel and Distributed Systems, 1993
Factoring: a method for scheduling parallel loops
Communications of the ACM, 1992
Synchronization and communication costs of loop partitioning on shared-memory multiprocessor systems
IEEE Transactions on Parallel and Distributed Systems, 1992
A dynamic scheduling method for irregular parallel programs
Published by Association for Computing Machinery (ACM) ,1992
Experimental comparison of memory management policies for NUMA multiprocessors
ACM Transactions on Computer Systems, 1991
The implications of cache affinity on processor scheduling for multiprogrammed, shared memory multiprocessors
Published by Association for Computing Machinery (ACM) ,1991
The impact of operating system scheduling policies and synchronization methods of performance of parallel applications
Published by Association for Computing Machinery (ACM) ,1991
Process control and scheduling issues for multiprogrammed shared-memory multiprocessors
Published by Association for Computing Machinery (ACM) ,1989
Assignment Problems in Parallel and Distributed Computing
Published by Springer Nature ,1987
Allocating Independent Subtasks on Parallel Processors
IEEE Transactions on Software Engineering, 1985