A Fresh Look at the Reliability of Long-term Digital Storage

Abstract
Many emerging Web services, such as email, photo sharing, and web site archives, need to preserve large amounts of quickly-accessible data indefinitely into the future. In this paper, we make the case that these applications' demands on large scale storage systems over long time horizons require us to re-evaluate traditional storage system designs. We examine threats to long-lived data from an end-to-end perspective, taking into account not just hardware and software faults but also faults due to humans and organizations. We present a simple model of long-term storage failures that helps us reason about the various strategies for addressing these threats in a cost-effective manner. Using this model we show that the most important strategies for increasing the reliability of long-term storage are detecting latent faults quickly, automating fault repair to make it faster and cheaper, and increasing the independence of data replicas.
All Related Versions