An integrated approach to parallel scheduling using gang-scheduling, backfilling, and migration
- 26 March 2003
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Parallel and Distributed Systems
- Vol. 14 (3), 236-247
- https://doi.org/10.1109/tpds.2003.1189582
Abstract
Effective scheduling strategies to improve response times, throughput, and utilization are an important consideration in large supercomputing environments. Parallel machines in these environments have traditionally used space-sharing strategies to accommodate multiple jobs at the same time by dedicating the nodes to a single job until it completes. This approach, however, can result in low system utilization and large job wait times. This paper discusses three techniques that can be used beyond simple space-sharing to improve the performance of large parallel systems. The first technique we analyze is backfilling, the second is gang-scheduling, and the third is migration. The main contribution of this paper is an analysis of the effects of combining the above techniques. Using extensive simulations based on detailed models of realistic workloads, the benefits of combining the various techniques are shown over a spectrum of performance criteria.
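As an illustration of the first technique named in the abstract, the following is a minimal sketch of a single backfilling decision on a space-shared machine. It assumes the common EASY-style (aggressive) variant, in which only the job at the head of the queue holds a reservation; the abstract does not say which variant the paper evaluates. The names `Job` and `easy_backfill`, and the use of user-estimated run times, are hypothetical choices made for this example and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int    # number of nodes requested
    runtime: int  # user-estimated run time (assumed available to the scheduler)


def easy_backfill(now, free, running, queue):
    """Return the names of queued jobs that may start at time `now`.

    free:    number of idle nodes at `now`.
    running: list of (estimated_end_time, nodes) for jobs already executing.
    queue:   list of Job in arrival (first-come-first-served) order.
    """
    started = []
    queue = list(queue)
    running = list(running)

    # Plain space-sharing step: start jobs in FCFS order while the head fits.
    while queue and queue[0].nodes <= free:
        job = queue.pop(0)
        free -= job.nodes
        running.append((now + job.runtime, job.nodes))
        started.append(job.name)
    if not queue:
        return started

    # The head job is blocked. Find its "shadow time": the earliest time at
    # which enough nodes will have been released for it to start.
    head = queue[0]
    avail = free
    shadow = None
    for end, nodes in sorted(running):
        avail += nodes
        if avail >= head.nodes:
            shadow = end
            break
    if shadow is None:
        return started  # head asks for more nodes than will ever be free
    extra = avail - head.nodes  # nodes the head will not need at shadow time

    # Backfill: a later job may jump ahead only if it cannot delay the head.
    for job in queue[1:]:
        if job.nodes > free:
            continue
        if now + job.runtime <= shadow:
            free -= job.nodes          # finishes before the head needs nodes
            started.append(job.name)
        elif job.nodes <= extra:
            free -= job.nodes          # runs past shadow on spare nodes only
            extra -= job.nodes
            started.append(job.name)
    return started
```

For example, on a 10-node machine with 6 nodes busy until time 10, a blocked 8-node job at the head of the queue would leave 4 nodes idle under plain space-sharing. With this sketch, a 4-node job estimated to finish by time 10 can use those nodes without delaying the head job.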