The Ensembl Analysis Pipeline
Open Access
- 3 May 2004
- journal article
- Published by Cold Spring Harbor Laboratory in Genome Research
- Vol. 14 (5) , 934-941
- https://doi.org/10.1101/gr.1859804
Abstract
The Ensembl pipeline is an extension to the Ensembl system which allows automated annotation of genomic sequence. The software comprises two parts. First, there is a set of Perl modules (“Runnables” and “RunnableDBs”) which are “wrappers” for a variety of commonly used analysis tools. These retrieve sequence data from a relational database, run the analysis, and write the results back to the database. They inherit from a common interface, which simplifies the writing of new wrapper modules. On top of this sits a job submission system (the “RuleManager”) which allows efficient and reliable submission of large numbers of jobs to a compute farm. Here we describe the fundamental software components of the pipeline, and we also highlight some features of the Sanger installation which were necessary to enable the pipeline to scale to whole-genome analysis.
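The two-part design described in the abstract can be illustrated with a minimal sketch. It is written in Python for brevity and is not the pipeline's actual Perl API. The class names `InMemoryDB`, `GCContent`, and `ATRich`, the function `run_pipeline`, and the rule format are all hypothetical. A dictionary stands in for the relational database, and jobs run in-process rather than on a compute farm. Sequences are assumed to be non-empty.

```python
from abc import ABC, abstractmethod


class InMemoryDB:
    """Hypothetical stand-in for the relational database."""

    def __init__(self, sequences):
        self.sequences = dict(sequences)  # input_id -> sequence string
        self.results = {}                 # (analysis name, input_id) -> result

    def fetch_sequence(self, input_id):
        return self.sequences[input_id]

    def store(self, analysis, input_id, result):
        self.results[(analysis, input_id)] = result

    def is_done(self, analysis, input_id):
        return (analysis, input_id) in self.results


class RunnableDB(ABC):
    """Common interface: fetch input, run the analysis, write the output."""

    name = "abstract"

    def __init__(self, db, input_id):
        self.db = db
        self.input_id = input_id
        self.sequence = None
        self.output = None

    def fetch_input(self):
        self.sequence = self.db.fetch_sequence(self.input_id)

    @abstractmethod
    def run(self):
        """Each wrapper only has to supply the analysis itself."""

    def write_output(self):
        self.db.store(self.name, self.input_id, self.output)


class GCContent(RunnableDB):
    """Toy analysis: fraction of G and C bases (assumes a non-empty sequence)."""

    name = "gc_content"

    def run(self):
        seq = self.sequence.upper()
        self.output = (seq.count("G") + seq.count("C")) / len(seq)


class ATRich(RunnableDB):
    """Toy analysis that depends on a stored gc_content result."""

    name = "at_rich"

    def run(self):
        self.output = self.db.results[("gc_content", self.input_id)] < 0.5


def run_pipeline(db, runnables, rules):
    """Toy rule-driven manager: run each analysis once its prerequisites exist.

    `rules` maps an analysis name to the analyses that must finish first.
    Returns the (analysis, input_id) pairs in the order they were run.
    """
    submitted = []
    progress = True
    while progress:
        progress = False
        for input_id in db.sequences:
            for name, cls in runnables.items():
                if db.is_done(name, input_id):
                    continue
                if all(db.is_done(dep, input_id) for dep in rules.get(name, [])):
                    job = cls(db, input_id)
                    job.fetch_input()
                    job.run()
                    job.write_output()
                    submitted.append((name, input_id))
                    progress = True
    return submitted


db = InMemoryDB({"contig1": "GGCC", "contig2": "ATAT"})
order = run_pipeline(
    db,
    runnables={"at_rich": ATRich, "gc_content": GCContent},
    rules={"at_rich": ["gc_content"]},
)
```

In this sketch a new analysis needs only a `name` and a `run` method, which mirrors the abstract's point that a shared interface simplifies writing new wrappers. Because completion is tracked in the database, re-running `run_pipeline` skips work that has already been stored.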