The Hadoop Distributed File System
- 1 May 2010
- Conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- ISSN 2160-195X, pp. 1-10
- https://doi.org/10.1109/msst.2010.5496972
Abstract
The Hadoop Distributed File System (HDFS) is designed to store very large data sets reliably, and to stream those data sets at high bandwidth to user applications. In a large cluster, thousands of servers both host directly attached storage and execute user application tasks. By distributing storage and computation across many servers, the resource can grow with demand while remaining economical at every size. We describe the architecture of HDFS and report on experience using HDFS to manage 25 petabytes of enterprise data at Yahoo!.
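The abstract's core idea of distributing storage across many servers can be illustrated with a small model: HDFS splits each file into fixed-size blocks and stores several replicas of each block on different datanodes. The sketch below is not the HDFS implementation; it is a simplified stand-in using common defaults of the era (64 MB blocks, 3 replicas), a naive round-robin placement, and hypothetical datanode names.

```python
# Illustrative model (not the actual HDFS code): splitting a file into
# fixed-size blocks and replicating each block across datanodes.
# Block size and replication factor mirror common HDFS defaults of the
# time (64 MB blocks, 3 replicas); datanode names are hypothetical.

BLOCK_SIZE = 64 * 1024 * 1024   # 64 MB per block
REPLICATION = 3                 # replicas per block

def plan_blocks(file_size, datanodes):
    """Return a placement plan: one entry per block, each listing the
    datanodes that would hold a replica (simple round-robin placement)."""
    if len(datanodes) < REPLICATION:
        raise ValueError("need at least REPLICATION datanodes")
    num_blocks = (file_size + BLOCK_SIZE - 1) // BLOCK_SIZE  # ceiling division
    plan = []
    for b in range(num_blocks):
        replicas = [datanodes[(b + r) % len(datanodes)]
                    for r in range(REPLICATION)]
        plan.append(replicas)
    return plan

nodes = ["dn1", "dn2", "dn3", "dn4"]
plan = plan_blocks(200 * 1024 * 1024, nodes)  # a 200 MB file -> 4 blocks
```

Real HDFS placement is more involved than this round-robin sketch: the namenode tracks block locations centrally and places replicas rack-aware, so that a single rack failure cannot destroy all copies of a block.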