The Ensembl Computing Architecture

Journal article, Open Access
Genome Research, Vol. 14 (5), 971-975. Published 3 May 2004 by Cold Spring Harbor Laboratory.
https://doi.org/10.1101/gr.1866304

Abstract

Ensembl is a software project to automatically annotate large eukaryotic genomes and release them freely into the public domain. At present, the project automatically annotates 10 complete genomes. This places very large demands on compute resources because of the vast number of sequence comparisons that must be executed. To avoid the financial outlay often associated with classical supercomputing environments, farms of many lower-cost machines have become the norm, and this project has deployed them successfully. Designing and implementing farms containing hundreds of compute nodes is complex and nontrivial. This study defines and explains some of the essential elements to consider when designing such systems. Server architecture and network infrastructure are discussed, with particular emphasis on solutions that worked and those that did not (often with fairly spectacular consequences). The aim is to give readers who may be implementing a large-scale biocompute project insight into some of the pitfalls that may lie ahead.
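Farms of cheap machines suit this workload because each sequence comparison is independent of the others: the all-against-all set of comparisons can be cut into batches (jobs) and handed to whichever node is free, with the results merged afterwards. The sketch below illustrates that idea only; it is not the Ensembl pipeline. The names `shared_kmers`, `make_batches`, `run_batch` and `run_farm` are hypothetical, the shared-k-mer score is a toy stand-in for a real alignment search, and a thread pool stands in for compute nodes (a real farm would submit each batch to a batch scheduler on separate machines).

```python
"""Hypothetical sketch: farming out independent sequence comparisons.

Threads stand in for compute nodes; on a real farm each batch would be
submitted as a separate job to a scheduler on a different machine.
"""
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def shared_kmers(a: str, b: str, k: int = 3) -> int:
    """Toy similarity score (an assumption, not Ensembl's method):
    the number of distinct length-k words found in both sequences."""
    words_a = {a[i:i + k] for i in range(len(a) - k + 1)}
    words_b = {b[i:i + k] for i in range(len(b) - k + 1)}
    return len(words_a & words_b)


def make_batches(names: list[str], batch_size: int) -> list[list[tuple[str, str]]]:
    """Split every unordered pair of sequence names into fixed-size batches (jobs)."""
    pairs = list(combinations(names, 2))
    return [pairs[i:i + batch_size] for i in range(0, len(pairs), batch_size)]


def run_batch(batch: list[tuple[str, str]], seqs: dict[str, str]) -> list[tuple[str, str, int]]:
    """One job: score every pair in the batch and return the results."""
    return [(x, y, shared_kmers(seqs[x], seqs[y])) for x, y in batch]


def run_farm(seqs: dict[str, str], batch_size: int = 2, nodes: int = 4) -> list[tuple[str, str, int]]:
    """Dispatch batches to a pool of workers and merge results in job order."""
    batches = make_batches(sorted(seqs), batch_size)
    results: list[tuple[str, str, int]] = []
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        # pool.map returns batch results in submission order, so the
        # merged output does not depend on which worker finished first.
        for batch_result in pool.map(lambda b: run_batch(b, seqs), batches):
            results.extend(batch_result)
    return results


if __name__ == "__main__":
    demo = {"s1": "ACGTACGT", "s2": "ACGTTTTT", "s3": "GGGGCCCC"}
    for x, y, score in run_farm(demo):
        print(x, y, score)
```

Batch size is the main tuning knob in this kind of design: larger batches cut per-job scheduling overhead, while smaller ones spread the load more evenly across nodes.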
