Expansion of the BioCyc collection of pathway/genome databases to 160 genomes
Open Access
- 1 January 2005
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 33 (19), 6083-6089
- https://doi.org/10.1093/nar/gki892
Abstract
The BioCyc database collection is a set of 160 pathway/genome databases (PGDBs) for most eukaryotic and prokaryotic species whose genomes have been completely sequenced to date. Each PGDB in the BioCyc collection describes the genome and predicted metabolic network of a single organism, inferred from the MetaCyc database, which is a reference source on metabolic pathways from multiple organisms. In addition, each bacterial PGDB includes predicted operons for the corresponding species. The BioCyc collection provides a unique resource for computational systems biology, namely global and comparative analyses of genomes and metabolic networks, and a supplement to the BioCyc resource of curated PGDBs. The Omics viewer available through the BioCyc website allows scientists to visualize combinations of gene expression, proteomics and metabolomics data on the metabolic maps of these organisms. This paper discusses the computational methodology by which the BioCyc collection has been expanded, and presents an aggregate analysis of the collection that includes the range of the number of pathways present in these organisms and the most frequently observed pathways. We seek scientists to adopt and curate individual PGDBs within the BioCyc collection. Only by harnessing the expertise of many scientists can we hope to produce biological databases that accurately reflect the depth and breadth of knowledge that the biomedical research community is producing.