Gene3D: merging structure and function for a Thousand genomes

Open Access

11 November 2009

journal article
research article
Published by Oxford University Press (OUP) in Nucleic Acids Research

Vol. 38 (suppl_1), D296-D300
https://doi.org/10.1093/nar/gkp987

Abstract

Over the last 2 years the Gene3D resource has been significantly improved. It is now more accurate and offers a much richer interactive display via the Gene3D website (http://gene3d.biochem.ucl.ac.uk/). Gene3D provides accurate structural domain family assignments for over 1100 genomes and nearly 10 000 000 proteins. A hidden Markov model library, constructed from the manually curated CATH structural domain hierarchy, is used to search UniProt, RefSeq and Ensembl protein sequences. The resulting matches are refined into simple multi-domain architectures using a recently developed in-house algorithm, DomainFinder 3 (available at: ftp://ftp.biochem.ucl.ac.uk/pub/gene3d_data/DomainFinder3/). The domain assignments are integrated with multiple external protein function descriptions (e.g. Gene Ontology and KEGG), structural annotations (e.g. coiled coils, disordered regions and sequence polymorphisms) and family resources (e.g. Pfam and eggNOG), and displayed on the Gene3D website. The website allows users to view descriptions both for single proteins and genes and for large protein sets, such as superfamilies or genomes. Subsets can then be selected for detailed investigation, or associated functions and interactions can be used to extend the exploration to new proteins. Gene3D also provides a set of services, including an interactive genome coverage graph visualizer, DAS annotation resources, sequence search facilities and SOAP services.
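The abstract does not say how DomainFinder 3 turns overlapping HMM matches into a multi-domain architecture. The Python sketch below only illustrates the general problem. It uses a swapped-in textbook technique, weighted interval scheduling, which picks the set of non-overlapping hits with the highest total score. It is not DomainFinder 3's algorithm. The `DomainHit` record, its fields, and the use of a single additive score per hit are all hypothetical assumptions. The real tool may, for example, tolerate small overlaps or weigh hits differently.

```python
from bisect import bisect_right
from typing import NamedTuple


class DomainHit(NamedTuple):
    """One HMM match on a protein (hypothetical record layout)."""
    family: str   # e.g. a CATH superfamily identifier (assumed)
    start: int    # residue position, 1-based, inclusive
    end: int      # residue position, 1-based, inclusive
    score: float  # higher is better, e.g. an HMM bit score (assumed)


def resolve_architecture(hits: list[DomainHit]) -> list[DomainHit]:
    """Return the non-overlapping subset of hits with the largest total score.

    Weighted interval scheduling solved by dynamic programming in
    O(n log n). This is a generic stand-in, not DomainFinder 3.
    """
    ordered = sorted(hits, key=lambda h: h.end)
    ends = [h.end for h in ordered]
    n = len(ordered)
    # prev[i]: how many earlier hits (sorted by end) finish before hit i starts;
    # all of them are compatible with hit i because the ends are sorted.
    prev = [bisect_right(ends, h.start - 1, 0, i) for i, h in enumerate(ordered)]
    best = [0.0] * (n + 1)  # best[i]: optimal total score using the first i hits
    for i, h in enumerate(ordered):
        best[i + 1] = max(best[i], best[prev[i]] + h.score)
    # Trace back through the table to recover the chosen hits.
    chosen = []
    i = n
    while i > 0:
        h = ordered[i - 1]
        if best[prev[i - 1]] + h.score >= best[i - 1]:
            chosen.append(h)  # taking hit i-1 achieves best[i]
            i = prev[i - 1]
        else:
            i -= 1  # skipping hit i-1 achieves best[i]
    return sorted(chosen, key=lambda h: h.start)


example = [
    DomainHit("A", 1, 120, 80.0),
    DomainHit("B", 100, 250, 60.0),
    DomainHit("C", 130, 240, 50.0),
    DomainHit("D", 260, 400, 90.0),
]
print([h.family for h in resolve_architecture(example)])  # → ['A', 'C', 'D']
```

In this toy input, hit B overlaps both A and C. B alone scores less than A and C together, so the architecture A, C, D (total 220) wins.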
