A computational genomics pipeline for prokaryotic sequencing projects

Open Access

2 June 2010

journal article
genome analysis
Published by Oxford University Press (OUP) in Bioinformatics

Vol. 26 (15), 1819-1826
https://doi.org/10.1093/bioinformatics/btq284

Abstract

Motivation: New sequencing technologies have accelerated research on prokaryotic genomes and have made genome sequencing routine outside major genome sequencing centers. However, no off-the-shelf solution exists for the combined assembly, gene prediction, genome annotation and data presentation needed to interpret sequencing data. The resulting need to invest significant resources in custom informatics support for genome sequencing projects remains a major impediment to the accessibility of high-throughput sequence data.

Results: We present a self-contained, automated, high-throughput, open-source genome sequencing and computational genomics pipeline suitable for prokaryotic sequencing projects. The pipeline has been used at the Georgia Institute of Technology and the Centers for Disease Control and Prevention to analyze Neisseria meningitidis and Bordetella bronchiseptica genomes. It performs enhanced or manually assisted reference-based assembly using multiple assemblers and modes, combines the output of multiple gene predictors, and functionally annotates genes and gene products. Because every component of the pipeline runs on a local machine without accessing resources over the Internet, the pipeline is suitable for projects of a sensitive nature. Its annotation of virulence-related features makes it particularly useful for projects working with pathogenic prokaryotes.

Availability and implementation: The pipeline is licensed under the open-source GNU General Public License and is available at the Georgia Tech Neisseria Base (http://nbase.biology.gatech.edu/). It is implemented in a combination of Perl, Bourne Shell and MySQL and is compatible with Linux and other Unix systems.

Contact: king.jordan@biology.gatech.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
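The abstract mentions combining the output of several gene predictors but does not describe how the predictions are merged. Purely as an illustration, the sketch below shows one common approach: a majority vote keyed on the stop codon, since predictors that agree on a gene's 3' end usually disagree only about its start site. This is not the pipeline's own algorithm (the pipeline is written in Perl and Bourne Shell). The `Gene` type, the `combine_predictions` function, the `min_votes` threshold, the coordinate convention (1-based, inclusive, `start` < `end` on both strands) and the tie-breaking rule are all hypothetical choices made for this example.

```python
from collections import Counter, defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Gene(NamedTuple):
    # Hypothetical record: 1-based inclusive coordinates, start < end on both strands.
    start: int
    end: int
    strand: str  # "+" or "-"


def three_prime_end(g: Gene) -> int:
    # The stop codon sits at the high coordinate on "+" and at the low coordinate on "-".
    return g.end if g.strand == "+" else g.start


def combine_predictions(calls: Dict[str, List[Gene]], min_votes: int = 2) -> List[Gene]:
    """Keep genes whose stop codon is predicted by at least `min_votes` predictors.

    Assumed scheme (illustrative only): group calls by (strand, stop position),
    count one vote per predictor, then pick the most frequently predicted start,
    breaking ties in favour of the longest gene.
    """
    groups: Dict[Tuple[str, int], Dict[str, Gene]] = defaultdict(dict)
    for predictor, genes in calls.items():
        for g in genes:
            # setdefault ensures each predictor votes at most once per stop codon.
            groups[(g.strand, three_prime_end(g))].setdefault(predictor, g)

    consensus: List[Gene] = []
    for (strand, stop), support in groups.items():
        if len(support) < min_votes:
            continue
        # The 5' end is the low coordinate on "+" and the high coordinate on "-".
        five_primes = [g.start if strand == "+" else g.end for g in support.values()]
        counts = Counter(five_primes)
        best = max(counts.values())
        tied = [pos for pos, n in counts.items() if n == best]
        # Tie-break: the most upstream start, i.e. the longest open reading frame.
        five = min(tied) if strand == "+" else max(tied)
        if strand == "+":
            consensus.append(Gene(five, stop, strand))
        else:
            consensus.append(Gene(stop, five, strand))
    return sorted(consensus)


if __name__ == "__main__":
    # Illustrative calls from three hypothetical predictors.
    calls = {
        "predictor_a": [Gene(100, 400, "+"), Gene(900, 1200, "-")],
        "predictor_b": [Gene(130, 400, "+"), Gene(900, 1200, "-"), Gene(2000, 2300, "+")],
        "predictor_c": [Gene(130, 400, "+")],
    }
    print(combine_predictions(calls))
```

With the default `min_votes=2`, the gene at 2000-2300 is dropped because only one predictor reports it, and the forward-strand gene ending at 400 takes the start at 130 that two of three predictors agree on. Raising `min_votes` trades sensitivity for specificity; a real pipeline might instead weight predictors or use evidence such as homology searches to choose start sites.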
