The Influence of Operating Systems on the Performance of Collective Operations at Extreme Scale
- 1 January 2006
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- Vol. 127 (15525244), 1-12
- https://doi.org/10.1109/clustr.2006.311846
Abstract
We investigate operating system noise, which we identify as one of the main reasons for a lack of synchronicity in parallel applications. Using a microbenchmark, we measure the noise on several contemporary platforms and find that, even with a general-purpose operating system, noise can be limited if certain precautions are taken. We then inject artificially generated noise into a massively parallel system and measure its influence on the performance of collective operations. Our experiments indicate that on extreme-scale platforms, the performance is correlated with the largest interruption to the application, even if the probability of such an interruption is extremely small. We demonstrate that synchronizing the noise can significantly reduce its negative influence.
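The scaling argument in the abstract can be illustrated with a small back-of-the-envelope model. This is not the paper's benchmark or its measured data. It is a minimal sketch under assumptions that are ours alone: each process is interrupted independently with probability `p` during one step, every interruption has the same length, and a collective operation waits for the slowest process. The function names and the example values are hypothetical.

```python
def prob_any_interrupted(p: float, n_procs: int) -> float:
    """Probability that at least one of n_procs processes is interrupted
    in a step, assuming independent interruptions with probability p each."""
    return 1.0 - (1.0 - p) ** n_procs


def expected_step_delay(p: float, n_procs: int, interruption: float,
                        synchronized: bool = False) -> float:
    """Expected extra time per step for a collective that waits for the
    slowest process (toy model, not the paper's measurement method).

    Unsynchronized: a single interrupted process delays everyone.
    Synchronized (assumed idealized): all processes are interrupted at the
    same moments, so the step is delayed with probability p only.
    """
    hit = p if synchronized else prob_any_interrupted(p, n_procs)
    return interruption * hit


if __name__ == "__main__":
    p = 1e-5  # assumed per-process, per-step interruption probability
    for n in (1, 1_000, 100_000):
        print(n, prob_any_interrupted(p, n))
```

Under these assumptions, an interruption that is rare on any single process becomes likely to hit at least one process once the process count is large, which is consistent with the abstract's observation that the largest interruption dominates at extreme scale. In the idealized synchronized case the expected delay no longer grows with the number of processes.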
This publication has 10 references indexed in Scilit:
- Operating system issues for petascale systems. ACM SIGOPS Operating Systems Review, 2006
- System noise, OS clock ticks, and fine-grained parallel applications. Published by Association for Computing Machinery (ACM), 2005
- Analysis of microbenchmarks for performance tuning of clusters. Published by Institute of Electrical and Electronics Engineers (IEEE), 2005
- Blue Gene/L programming and operating environment. IBM Journal of Research and Development, 2005
- The Impact of Noise on the Scaling of Collectives: A Theoretical Approach. Published by Springer Nature, 2005
- The Case of the Missing Supercomputer Performance. Published by Association for Computing Machinery (ACM), 2003
- Improving the Scalability of Parallel Jobs by adding Parallel Awareness to the Operating System. Published by Association for Computing Machinery (ACM), 2003
- A performance comparison of Linux and a lightweight kernel. Published by Institute of Electrical and Electronics Engineers (IEEE), 2003
- Application-bypass reduction for large-scale clusters. Published by Institute of Electrical and Electronics Engineers (IEEE), 2003
- Paging tradeoffs in distributed-shared-memory multiprocessors. The Journal of Supercomputing, 1996