Data-Intensive Text Processing with MapReduce
- 1 January 2010
- journal article
- Published by Springer Nature in Synthesis Lectures on Human Language Technologies
- Vol. 3 (1), 1-177
- https://doi.org/10.2200/s00274ed1v01y201006hlt007
Abstract
Our world is being revolutionized by data-driven methods: access to large amounts of data has generated new insights and opened exciting new opportunities in commerce, science, and computing applications. Processing the enormous quantities of data necessary for these advances requires large clusters, making distributed computing paradigms more crucial than ever. MapReduce is a programming model for expressing distributed computations on massive datasets and an execution framework for large-scale data processing on clusters of commodity servers. The programming model provides an easy-to-understand abstraction for designing scalable algorithms, while the execution framework transparently handles many system-level details, ranging from scheduling to synchronization to fault tolerance. This book focuses on MapReduce algorithm design, with an emphasis on text processing algorithms common in natural language processing, information retrieval, and machine learning. We introduce the notion of MapReduce d...
This publication has 104 references indexed in Scilit:
- Training Phrase-Based Machine Translation Models on the Cloud: Open Source Machine Translation Toolkit Chaski. The Prague Bulletin of Mathematical Linguistics, 2010
- GFS: Evolution on Fast-forward. Queue, 2009
- Web page classification. ACM Computing Surveys, 2009
- A break in the clouds. ACM SIGCOMM Computer Communication Review, 2008
- Dryad. ACM SIGOPS Operating Systems Review, 2007
- Statistical mechanics of complex networks. Reviews of Modern Physics, 2002
- A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 1989
- Scale and performance in a distributed file system. ACM Transactions on Computer Systems, 1988
- Implementation techniques for main memory database systems. ACM SIGMOD Record, 1984
- The Strength of Weak Ties: A Network Theory Revisited. Sociological Theory, 1983