Abstract
I/O-intensive applications pose great challenges to computational scientists. A major problem is that, in a conventional computing environment, users must sacrifice performance in order to meet storage capacity requirements. Further performance improvement is impeded by the physical nature of the storage media, even when state-of-the-art I/O optimizations are employed. In this paper, we present a distributed multi-storage resource architecture that can satisfy both performance and capacity requirements by employing multiple storage resources. Compared with the traditional single-storage resource architecture, our architecture provides a more flexible and reliable computing environment. It opens new opportunities for high-performance computing while inheriting state-of-the-art I/O optimization approaches that have already been developed. We also develop an application programming interface (API) that provides transparent management of, and access to, the various storage resources in our computing environment. Because I/O usually dominates performance in I/O-intensive applications, we establish an I/O performance prediction mechanism, consisting of a performance database and a prediction algorithm, to help users better evaluate and schedule their applications. We also develop a tool that automatically generates the performance database. Experiments show that our multi-storage resource architecture is a promising platform for high-performance distributed computing.
