Wide area cluster monitoring with Ganglia

Abstract
In this paper, we present a structure for monitoring a large set of computational clusters. We illustrate methods for scaling a monitor network comprised of many clusters while keeping processing requirements low. A design for presenting high-level Web-based summaries of the monitor network is provided, along with a generalization to a distributed, multiple-resolution monitoring tree. Emphasis is placed on scalability, fast query response, fault tolerance, and grid compatibility. Experimental evidence is presented that demonstrates the performance of our design.

This publication has 6 references indexed in Scilit: