Wide area cluster monitoring with Ganglia

1 January 2003

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

p. 289-298
https://doi.org/10.1109/clustr.2003.1253327

Abstract

In this paper, we present a structure for monitoring a large set of computational clusters. We illustrate methods for scaling a monitor network composed of many clusters while keeping processing requirements low. A design for presenting high-level Web-based summaries of the monitor network is provided, along with a generalization to a distributed, multiple-resolution monitoring tree. Emphasis is placed on scalability, fast query response, fault tolerance, and grid compatibility. Experimental evidence is presented that demonstrates the performance of our design.

This publication has 6 references indexed in Scilit:

Resource-aware stream management with the customizable dproc distributed monitoring mechanisms
Published by Institute of Electrical and Electronics Engineers (IEEE), 2004
Leveraging standard core technologies to programmatically build Linux cluster appliances
Published by Institute of Electrical and Electronics Engineers (IEEE), 2003
Efficient reactive monitoring
Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
Toward efficient monitoring
IEEE Journal on Selected Areas in Communications, 2000
The network weather service: a distributed resource performance forecasting service for metacomputing
Future Generation Computer Systems, 1999
Knowledge and common knowledge in a distributed environment
Published by Association for Computing Machinery (ACM), 1984