Gene3D: merging structure and function for a Thousand genomes
Open Access
- 11 November 2009
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 38 (suppl_1) , D296-D300
- https://doi.org/10.1093/nar/gkp987
Abstract
Over the last 2 years the Gene3D resource has been significantly improved; it is now more accurate and offers a much richer interactive display via the Gene3D website (http://gene3d.biochem.ucl.ac.uk/). Gene3D provides accurate structural domain family assignments for over 1100 genomes and nearly 10 000 000 proteins. A hidden Markov model library, constructed from the manually curated CATH structural domain hierarchy, is used to search UniProt, RefSeq and Ensembl protein sequences. The resulting matches are refined into simple multi-domain architectures using a recently developed in-house algorithm, DomainFinder 3 (available at: ftp://ftp.biochem.ucl.ac.uk/pub/gene3d_data/DomainFinder3/). The domain assignments are integrated with multiple external protein function descriptions (e.g. Gene Ontology and KEGG), structural annotations (e.g. coiled coils, disordered regions and sequence polymorphisms) and family resources (e.g. Pfam and eggNOG) and displayed on the Gene3D website. The website allows users to view descriptions both for single proteins and genes and for large protein sets, such as superfamilies or genomes. Subsets can then be selected for detailed investigation, or associated functions and interactions can be used to expand explorations to new proteins. Gene3D also provides a set of services, including an interactive genome coverage graph visualizer, DAS annotation resources, sequence search facilities and SOAP services.
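The abstract does not say how DomainFinder 3 refines overlapping matches into a multi-domain architecture. As a rough illustration of the general problem only, the sketch below uses a swapped-in textbook technique (weighted interval scheduling) to pick the highest-scoring set of non-overlapping matches on one protein. It is not the DomainFinder 3 algorithm. The `Hit` record, its fields, the `resolve_architecture` function, the family labels and the scores are all hypothetical.

```python
from bisect import bisect_right
from typing import NamedTuple


class Hit(NamedTuple):
    """One domain match on a protein (hypothetical record, not a Gene3D format)."""
    family: str   # placeholder family label
    start: int    # first matched residue, 1-based inclusive
    end: int      # last matched residue, 1-based inclusive
    score: float  # assumed match score, higher is better


def resolve_architecture(hits):
    """Return the non-overlapping subset of hits with the highest total score.

    Weighted interval scheduling by dynamic programming; an illustrative
    stand-in, not the method used by DomainFinder 3.
    """
    ordered = sorted(hits, key=lambda h: h.end)
    ends = [h.end for h in ordered]
    # best[i] holds (total score, chosen hits) using only the first i hits.
    best = [(0.0, [])]
    for i, hit in enumerate(ordered):
        # Count the earlier hits that end strictly before this hit starts.
        j = bisect_right(ends, hit.start - 1, 0, i)
        take = (best[j][0] + hit.score, best[j][1] + [hit])
        skip = best[i]
        best.append(take if take[0] > skip[0] else skip)
    # Report the chosen domains in sequence order.
    return sorted(best[-1][1], key=lambda h: h.start)


if __name__ == "__main__":
    example = [
        Hit("fam_A", 1, 100, 50.0),
        Hit("fam_B", 90, 200, 60.0),   # overlaps fam_A at residues 90-100
        Hit("fam_C", 101, 200, 40.0),
        Hit("fam_D", 210, 300, 30.0),
    ]
    for hit in resolve_architecture(example):
        print(hit.family, hit.start, hit.end)
```

In this toy input, `fam_B` is the single best match in its region, but it is dropped because `fam_A` and `fam_C` together score higher without overlapping. This formulation treats every domain as one contiguous segment, so it cannot represent discontinuous domains.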