An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics

Open Access

21 December 2010

journal article
Published by Springer Nature in BMC Bioinformatics

Vol. 11 (S12), S1
https://doi.org/10.1186/1471-2105-11-s12-s1

Abstract

Bioinformatics researchers now face the analysis of ultra-large-scale data sets, a problem that is set to grow at an alarming rate in the coming years. Recent developments in open source software, namely the Hadoop project and associated software, provide a foundation for scaling to petabyte-scale data warehouses on Linux clusters and for fault-tolerant, parallelized analysis of such data using a programming style named MapReduce.
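
To make the MapReduce programming style concrete, the following is a minimal in-memory sketch in Python. It does not use Hadoop and is not taken from the paper. The k-mer counting task and the function names (map_phase, shuffle, reduce_phase, count_kmers) are assumptions chosen for illustration. On a real cluster the framework would distribute the map and reduce calls across nodes and handle the grouping step and fault tolerance itself.

```python
from collections import defaultdict
from itertools import chain


def map_phase(read, k=3):
    """Map step: emit a (k-mer, 1) pair for every k-mer in one read."""
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle/sort."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: sum the counts collected for one k-mer."""
    return key, sum(values)


def count_kmers(reads, k=3):
    """Run map, shuffle, and reduce sequentially on a list of reads."""
    mapped = chain.from_iterable(map_phase(read, k) for read in reads)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

Because each map call sees only one read and each reduce call sees only one key, the calls are independent of one another, which is the property that lets a framework such as Hadoop run them in parallel and rerun failed ones.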
