Improving parallel data transfer times using predicted variances in shared networks
- 1 January 2005
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- Vol. 2, 734-742
- https://doi.org/10.1109/ccgrid.2005.1558636
Abstract
It is increasingly common to use multiple distributed storage systems as a single data store within which large datasets may be replicated. Thus, we face the problem of how to access replicated data efficiently. Multiple-source parallel transfers can reduce access times by transferring data from several replicas in parallel. However, we then face the problem of deciding which data to fetch from which replicas. We propose a Tuned Conservative scheduling technique that uses predicted means and variances for network performance to make data selection decisions. This stochastic scheduling technique adjusts the amount of data fetched on a link according to not only the link performance but also the expected variance in that performance. We incorporate our technique into the striped GridFTP server from the Globus Toolkit, and demonstrate that the technique can produce data transfer times that are significantly faster and less variable than those of other techniques.
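The idea of sizing each link's share by both its predicted mean and its predicted variance can be illustrated with a minimal sketch. This is not the paper's algorithm: the penalty form `mean - k * sd`, the tuning constant `k`, the bandwidth floor, and all names (`LinkForecast`, `conservative_split`) are assumptions made here for illustration only.

```python
from dataclasses import dataclass


@dataclass
class LinkForecast:
    """Hypothetical per-replica forecast: predicted mean and standard deviation."""
    name: str
    mean_mbps: float
    sd_mbps: float


def conservative_split(total_bytes, links, k=1.0, floor_mbps=0.1):
    """Split total_bytes across links in proportion to a variance-penalized bandwidth.

    Assumption (not taken from the paper): effective bandwidth is
    mean - k * sd, clamped to a small positive floor so every link
    still receives a nonzero weight.
    """
    effective = [max(l.mean_mbps - k * l.sd_mbps, floor_mbps) for l in links]
    total_eff = sum(effective)

    # Proportional share per link, rounded down to whole bytes.
    shares = [int(total_bytes * e / total_eff) for e in effective]

    # Hand any rounding remainder to the link with the best effective bandwidth.
    best = max(range(len(links)), key=lambda i: effective[i])
    shares[best] += total_bytes - sum(shares)

    return {l.name: s for l, s in zip(links, shares)}


links = [
    LinkForecast("replica_a", mean_mbps=100.0, sd_mbps=0.0),   # steady link
    LinkForecast("replica_b", mean_mbps=100.0, sd_mbps=50.0),  # same mean, more variable
]
plan = conservative_split(1500, links, k=1.0)
```

With `k = 0` the split depends on the predicted means alone; raising `k` shifts data away from links whose performance is expected to fluctuate, which is the behavior the abstract describes in qualitative terms.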