An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics
Open Access
- 21 December 2010
- journal article
- Published by Springer Nature in BMC Bioinformatics
- Vol. 11 (S12), S1
- https://doi.org/10.1186/1471-2105-11-s12-s1
Abstract
Bioinformatics researchers are now confronted with analysis of ultra large-scale data sets, a problem that will only increase at an alarming rate in coming years. Recent developments in open source software, that is, the Hadoop project and associated software, provide a foundation for scaling to petabyte scale data warehouses on Linux clusters, providing fault-tolerant parallelized analysis on such data using a programming style named MapReduce.