The Ensembl Computing Architecture
Open Access
- 3 May 2004
- journal article
- Published by Cold Spring Harbor Laboratory in Genome Research
- Vol. 14 (5), 971-975
- https://doi.org/10.1101/gr.1866304
Abstract
Ensembl is a software project to automatically annotate large eukaryotic genomes and release them freely into the public domain. The project currently automatically annotates 10 complete genomes. This makes very large demands on compute resources, due to the vast number of sequence comparisons that need to be executed. To circumvent the financial outlay often associated with classical supercomputing environments, farms of multiple, lower-cost machines have now become the norm and have been deployed successfully with this project. The architecture and design of farms containing hundreds of compute nodes is complex and nontrivial to implement. This study will define and explain some of the essential elements to consider when designing such systems. Server architecture and network infrastructure are discussed with a particular emphasis on solutions that worked and those that did not (often with fairly spectacular consequences). The aim of the study is to give the reader, who may be implementing a large-scale biocompute project, an insight into some of the pitfalls that may be waiting ahead.