Automated learning of load-balancing strategies in multiprogrammed distributed systems

1 July 1997

journal article
research article
Published by Taylor & Francis in International Journal of Systems Science

Vol. 28 (11), 1077-1099
https://doi.org/10.1080/00207729708929470

Abstract

Dynamic load-balancing strategies for distributed systems seek to improve the average completion time of independent tasks by migrating each incoming task to the site where it is expected to finish the fastest, usually the site having the smallest load index. SMALL is an offline learning system for developing configuration-specific load-balancing strategies; it both learns new load indices and tunes the parameters of given migration policies. Using a dynamic workload generator, a number of typical systemwide load patterns are first recorded; the completion times of several benchmark jobs are then measured at each site, under each of the recorded load patterns. These measurements are used to train comparator neural networks simultaneously, one per site. The comparators collectively model a set of perfect load indices in that they seek to rank, at arrival time, the possible destinations for an incoming task by their (not yet known) respective completion times. The numerous parameters of the decentralized dynamic load-balancing policy are then tuned using a genetic algorithm. We present experimental results for a mix of scientific and interactive workloads on Sun workstations connected by Ethernet. The policies tuned by SMALL are shown to exploit idle resources intelligently and effectively.
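
As a rough illustration of the dispatch rule the abstract describes (send an incoming task to the site with the smallest load index), the following minimal sketch may help. It is not the SMALL system or its policy: the function name `choose_destination`, the dictionary of load indices, and the `threshold` parameter are hypothetical, with `threshold` standing in for the kind of migration-policy parameter that the abstract says is tuned.

```python
from typing import Dict


def choose_destination(origin: str,
                       load_index: Dict[str, float],
                       threshold: float = 0.0) -> str:
    """Pick the site that should run a task arriving at `origin`.

    load_index maps each site name to its current load index
    (smaller means the task is expected to finish sooner).
    threshold is a hypothetical tunable: the task migrates only if
    the best site's index is lower than the origin's by more than
    this amount, which avoids migrating for negligible gains.
    """
    # Site with the smallest load index (first one wins on ties).
    best = min(load_index, key=load_index.get)
    if load_index[origin] - load_index[best] > threshold:
        return best
    return origin


if __name__ == "__main__":
    loads = {"a": 3.0, "b": 1.0, "c": 2.0}
    print(choose_destination("a", loads))                 # migrate to the least-loaded site
    print(choose_destination("a", loads, threshold=2.5))  # gain too small, stay at origin
```

In this sketch the load indices are given as plain numbers; in the approach the abstract outlines, the ranking of destinations would instead come from the learned comparators.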
