Quelling queue storms
- 24 January 2004
- proceedings article
- Published by Institute of Electrical and Electronics Engineers (IEEE)
Abstract
This paper characterizes "queue storms" in supercomputer systems and discusses methods for quelling them. Queue storms are episodes of anomalously large queue lengths; they depend on the job size mix, the queuing system, the machine size, and correlations and dependencies between job submissions. We use synthetic data generated from actual job logs of the ASCI Blue Mountain supercomputer, combined with different degrees of long-range dependence. We show the distribution of the time until the first storm occurs. This is, in a sense, the time at which the machine becomes obsolete, because it marks the point when the machine first fails to provide satisfactory turnaround. To overcome queue storms, more resources are needed, even if they appear superfluous most of the time. We present two methods, including a grid-based solution, for reducing these correlations and their resulting effect on the size and frequency of queue storms.
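To make the central quantity concrete, here is a minimal sketch, not the paper's method, of measuring the time until the first queue storm under long-range-dependent demand. It assumes a hypothetical toy discrete-time model: demand per step is `load + sd * z_t` (clipped at zero), where `z_t` is fractional Gaussian noise (fGn) with Hurst parameter `hurst`; the machine serves `capacity` units per step; and a "storm" is any step whose backlog exceeds `threshold`. The fGn is generated with Hosking's exact Durbin-Levinson recursion (O(n²), adequate for short traces). All function names and parameter values are illustrative assumptions.

```python
import math
import random
from typing import Optional


def fgn_autocov(k: int, hurst: float) -> float:
    """Autocovariance at lag k of unit-variance fractional Gaussian noise."""
    h2 = 2.0 * hurst
    return 0.5 * (abs(k + 1) ** h2 - 2.0 * abs(k) ** h2 + abs(k - 1) ** h2)


def fgn_hosking(n: int, hurst: float, rng: random.Random) -> list[float]:
    """Exact fGn sample of length n via Hosking's (Durbin-Levinson) recursion."""
    gamma = [fgn_autocov(k, hurst) for k in range(n)]
    x = [0.0] * n
    phi: list[float] = []  # phi[j] weights x[t-1-j] in the one-step predictor
    v = gamma[0]  # one-step prediction variance
    x[0] = rng.gauss(0.0, math.sqrt(v))
    for t in range(1, n):
        num = gamma[t] - sum(phi[j] * gamma[t - 1 - j] for j in range(t - 1))
        kappa = num / v  # partial autocorrelation at lag t
        phi = [phi[j] - kappa * phi[t - 2 - j] for j in range(t - 1)] + [kappa]
        v *= 1.0 - kappa * kappa
        mean = sum(phi[j] * x[t - 1 - j] for j in range(t))
        x[t] = rng.gauss(mean, math.sqrt(v))
    return x


def queue_lengths(demand: list[float], capacity: float) -> list[float]:
    """Lindley recursion: backlog after each step, starting from an empty queue."""
    q, out = 0.0, []
    for a in demand:
        q = max(0.0, q + a - capacity)
        out.append(q)
    return out


def first_storm(lengths: list[float], threshold: float) -> Optional[int]:
    """Index of the first step whose backlog exceeds threshold, or None."""
    for t, q in enumerate(lengths):
        if q > threshold:
            return t
    return None


def time_to_first_storm(hurst: float, seed: int, n: int = 1000,
                        load: float = 0.9, sd: float = 0.5,
                        capacity: float = 1.0,
                        threshold: float = 20.0) -> Optional[int]:
    """Hypothetical toy model: first step at which the backlog exceeds threshold."""
    noise = fgn_hosking(n, hurst, random.Random(seed))
    demand = [max(0.0, load + sd * z) for z in noise]  # clip negative demand
    return first_storm(queue_lengths(demand, capacity), threshold)


if __name__ == "__main__":
    # Empirical first-storm times over a few seeds, independent vs. correlated demand.
    for h in (0.5, 0.9):
        times = [time_to_first_storm(h, s) for s in range(5)]
        print(f"H={h}: first-storm steps {times}")
```

Repeating `time_to_first_storm` over many seeds yields an empirical distribution of first-storm times. With these parameters the independent case (`hurst=0.5`) should usually not storm within the trace (returning `None`), while strongly correlated demand (`hurst=0.9`) at the same mean load typically does. Raising `capacity` delays storms in both cases, at the cost of capacity that sits idle most of the time.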