The Hadoop Distributed File System
- 1 May 2010
- Conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- ISSN 2160-195X, pp. 1-10
- https://doi.org/10.1109/msst.2010.5496972
Abstract
The Hadoop Distributed File System (HDFS) is designed to store very large data sets reliably, and to stream those data sets at high bandwidth to user applications. In a large cluster, thousands of servers both host directly attached storage and execute user application tasks. By distributing storage and computation across many servers, the resource can grow with demand while remaining economical at every size. We describe the architecture of HDFS and report on experience using HDFS to manage 25 petabytes of enterprise data at Yahoo!.
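The abstract's core idea of distributing storage across many servers can be illustrated with a small model: HDFS splits each file into fixed-size blocks and stores several replicas of each block on different datanodes. The sketch below is not the HDFS implementation; it is a simplified stand-in using common defaults of the era (64 MB blocks, 3 replicas), a naive round-robin placement, and hypothetical datanode names.

```python
# Illustrative model (not the actual HDFS code): splitting a file into
# fixed-size blocks and replicating each block across datanodes.
# Block size and replication factor mirror common HDFS defaults of the
# time (64 MB blocks, 3 replicas); datanode names are hypothetical.

BLOCK_SIZE = 64 * 1024 * 1024   # 64 MB per block
REPLICATION = 3                 # replicas per block

def plan_blocks(file_size, datanodes):
    """Return a placement plan: one entry per block, each listing the
    datanodes that would hold a replica (simple round-robin placement)."""
    if len(datanodes) < REPLICATION:
        raise ValueError("need at least REPLICATION datanodes")
    num_blocks = (file_size + BLOCK_SIZE - 1) // BLOCK_SIZE  # ceiling division
    plan = []
    for b in range(num_blocks):
        replicas = [datanodes[(b + r) % len(datanodes)]
                    for r in range(REPLICATION)]
        plan.append(replicas)
    return plan

nodes = ["dn1", "dn2", "dn3", "dn4"]
plan = plan_blocks(200 * 1024 * 1024, nodes)  # a 200 MB file -> 4 blocks
```

Real HDFS placement is more involved than this round-robin sketch: the namenode tracks block locations centrally and places replicas rack-aware, so that a single rack failure cannot destroy all copies of a block.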