Abstract
The Gene3D release 4 database and web portal (http://cathwww.biochem.ucl.ac.uk:8080/Gene3D) provide a combined structural, functional and evolutionary view of the protein world. Gene3D is focussed on providing structural annotation for protein sequences without structural representatives, including the complete proteome sets of over 240 different species. The protein sequences have also been clustered into whole-chain families to aid functional prediction. The structural annotation is generated using hidden Markov models (HMMs) based on the CATH domain families; CATH is a repository for manually deduced protein domains. Changes since the last publication include the addition of over 100 genomes and the UniProt sequence database, domain data from Pfam, metabolic pathway and functional data from COGs, KEGG and GO, and protein-protein interaction data from MINT and BIND. The website has been rebuilt to allow more sophisticated querying, and the data returned is presented in a clearer format with greater functionality. Furthermore, all data can be downloaded in a simple XML format, allowing users to carry out complex investigations on their own computers.