Improving parallel data transfer times using predicted variances in shared networks

1 January 2005

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

Vol. 2, pp. 734-742
https://doi.org/10.1109/ccgrid.2005.1558636

Abstract

It is increasingly common to use multiple distributed storage systems as a single data store within which large datasets may be replicated. We therefore face the problem of how to access replicated data efficiently. Multiple-source parallel transfers can reduce access times by transferring data from several replicas in parallel. However, we then face the problem of deciding which data to fetch from which replicas. We propose a Tuned Conservative scheduling technique that uses predicted means and variances for network performance to make data selection decisions. This stochastic scheduling technique adjusts the amount of data fetched on a link according to not only the link's predicted performance but also the expected variance in that performance. We incorporate our technique into the striped GridFTP server from the Globus Toolkit, and demonstrate that the technique can produce data transfer times that are significantly faster and less variable than those of other techniques.
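The abstract does not give the exact Tuned Conservative formula, so the sketch below only illustrates the general idea of variance-aware data allocation under stated assumptions. It assumes each link's "effective" bandwidth is its predicted mean minus `k` times its predicted standard deviation, with `k` a hypothetical tuning factor, and that data is split across replicas in proportion to effective bandwidth. The names `LinkForecast`, `effective_bandwidth`, and `allocate` are illustrative only; they are not part of the paper or of the GridFTP API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class LinkForecast:
    """Hypothetical per-replica forecast of link bandwidth."""
    name: str
    mean_bw: float  # predicted mean bandwidth (e.g. MB/s)
    std_bw: float   # predicted standard deviation of that bandwidth


def effective_bandwidth(link: LinkForecast, k: float) -> float:
    # Assumed rule: penalize variable links by k standard deviations.
    # Clamp to a tiny positive floor so a very noisy link gets almost no data.
    return max(link.mean_bw - k * link.std_bw, 1e-9)


def allocate(total_bytes: int, links: list[LinkForecast], k: float = 1.0) -> dict[str, int]:
    """Split total_bytes across links in proportion to effective bandwidth."""
    eff = {link.name: effective_bandwidth(link, k) for link in links}
    total_eff = sum(eff.values())
    shares = {name: int(total_bytes * e / total_eff) for name, e in eff.items()}
    # Integer truncation can leave a few bytes unassigned; give them to the
    # link with the highest effective bandwidth so every byte is fetched.
    leftover = total_bytes - sum(shares.values())
    best = max(eff, key=eff.get)
    shares[best] += leftover
    return shares
```

For example, two links with the same predicted mean of 100 MB/s but standard deviations of 5 and 40 yield `allocate(1000, [LinkForecast("a", 100, 5), LinkForecast("b", 100, 40)])` equal to `{"a": 613, "b": 387}`: the steadier link receives more data even though both means are identical. With `k=0` the variance is ignored and the split is even, which corresponds to a purely mean-based scheduler.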

