Improving parallel data transfer times using predicted variances in shared networks

Abstract
It is increasingly common to use multiple distributed storage systems as a single data store within which large datasets may be replicated. Thus, we face the problem of how to access replicated data efficiently. Multiple-source parallel transfers can reduce access times by transferring data from several replicas in parallel. However, we then face the problem of deciding which data to fetch from which replicas. We propose a Tuned Conservative scheduling technique that uses predicted means and variances for network performance to make data selection decisions. This stochastic scheduling technique adjusts the amount of data fetched on a link according to not only the predicted link performance but also the expected variance in that performance. We incorporate our technique into the striped GridFTP server from the Globus Toolkit, and demonstrate that the technique can produce data transfer times that are significantly shorter and less variable than those of other techniques.
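The idea of sizing each link's share by both predicted mean and predicted variance can be illustrated with a minimal sketch. This is not the paper's Tuned Conservative formula, which the abstract does not state; it assumes a simple stand-in rule in which each link's predicted mean bandwidth is discounted by `k` standard deviations and data is split in proportion to the result. The names `effective_bandwidth`, `allocate`, and the parameter `k` are hypothetical.

```python
from math import sqrt


def effective_bandwidth(mean, variance, k=1.0, floor=1e-9):
    """Discount the predicted mean by k standard deviations.

    Assumed illustrative rule, not the published tuning formula.
    The floor keeps the value positive for very noisy links.
    """
    return max(mean - k * sqrt(variance), floor)


def allocate(total_bytes, links, k=1.0):
    """Split total_bytes across replicas in proportion to effective bandwidth.

    links maps a replica name to (predicted mean, predicted variance).
    Returns a dict of replica name -> whole bytes, summing to total_bytes.
    """
    eff = {name: effective_bandwidth(m, v, k) for name, (m, v) in links.items()}
    total_eff = sum(eff.values())
    shares = {name: int(total_bytes * e / total_eff) for name, e in eff.items()}
    # Assign any rounding remainder to the link with the highest effective bandwidth.
    best = max(eff, key=eff.get)
    shares[best] += total_bytes - sum(shares.values())
    return shares


if __name__ == "__main__":
    # Two replicas with the same predicted mean; "b" is far less stable.
    predictions = {"a": (10.0, 0.0), "b": (10.0, 25.0)}
    print(allocate(1500, predictions, k=1.0))
```

With `k=0` the split depends on the predicted means alone; raising `k` shifts data away from links whose performance is predicted to vary more, which is the qualitative behaviour the abstract describes.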
