Predicting bounds on queuing delay for batch-scheduled parallel machines
- 29 March 2006
- proceedings article
- Published by Association for Computing Machinery (ACM)
- pp. 110-118
- https://doi.org/10.1145/1122971.1122989
Abstract
Most space-sharing parallel computers presently operated by high-performance computing centers use batch-queuing systems to manage processor allocation. In many cases, users wishing to use these batch-queued resources have accounts at multiple sites and have the option of choosing at which site or sites to submit a parallel job. In such a situation, the amount of time a user's job will wait in any one batch queue can significantly impact the overall time a user waits from job submission to job completion. In this work, we explore a new method for providing end-users with predictions for the bounds on the queuing delay individual jobs will experience. We evaluate this method using batch scheduler logs for distributed-memory parallel machines that cover a 9-year period at 7 large HPC centers. Our results show that it is possible to predict delay bounds reliably for jobs in different queues, and for jobs requesting different ranges of processor counts. Using this information, scientific application developers can intelligently decide where to submit their parallel codes in order to minimize overall turnaround time.