The CATH extended protein‐family database: Providing structural annotations for genome sequences
Open Access
- 1 February 2002
- journal article
- Published by Wiley in Protein Science
- Vol. 11 (2) , 233-244
- https://doi.org/10.1110/ps.16802
Abstract
An automatic sequence search and analysis protocol (DomainFinder) based on PSI-BLAST and IMPALA, and using conservative thresholds, has been developed for reliably integrating gene sequences from GenBank into their respective structural families within the CATH domain database (http://www.biochem.ucl.ac.uk/bsm/cath_new). DomainFinder assigns a new gene sequence to a CATH homologous superfamily provided that PSI-BLAST identifies a clear relationship to at least one other Protein Data Bank sequence within that superfamily. This has resulted in an expansion of the CATH protein family database (CATH-PFDB v1.6) from 19,563 domain structures to 176,597 domain sequences. A further 50,000 putative homologous relationships can be identified using less stringent cut-offs and these relationships are maintained within neighbour tables in the CATH Oracle database, pending further evidence of their suggested evolutionary relationship. Analysis of the CATH-PFDB has shown that only 15% of the sequence families are close enough to a known structure for reliable homology modeling. IMPALA/PSI-BLAST profiles have been generated for each of the sequence families in the expanded CATH-PFDB and a web server has been provided so that new sequences may be scanned against the profile library and be assigned to a structure and homologous superfamily.
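The assignment rule described in the abstract can be illustrated with a minimal sketch. This is not the DomainFinder code: the abstract does not state the actual cut-offs, so the two E-value thresholds below are hypothetical placeholders, and the `Hit` type, the `assign_superfamily` function, and the domain identifiers in the example are invented for illustration. The sketch also simplifies by reporting only the single best-scoring superfamily. A query is assigned when at least one hit to a PDB sequence in that superfamily passes the strict threshold; weaker hits are only recorded as putative neighbours.

```python
from dataclasses import dataclass

# Hypothetical thresholds: placeholders, not values taken from the paper.
STRICT_EVALUE = 1e-3   # "conservative" cut-off for a confident assignment
LENIENT_EVALUE = 1e-1  # "less stringent" cut-off for putative neighbours


@dataclass(frozen=True)
class Hit:
    """One search hit from a query sequence to a PDB domain sequence."""
    pdb_domain: str
    evalue: float


def assign_superfamily(hits, domain_to_superfamily,
                       strict=STRICT_EVALUE, lenient=LENIENT_EVALUE):
    """Return (assigned superfamily or None, sorted putative neighbours)."""
    best = {}  # superfamily -> lowest E-value among its hits
    for hit in hits:
        superfamily = domain_to_superfamily.get(hit.pdb_domain)
        if superfamily is None:
            continue  # hit is not to a classified domain; ignore it
        if superfamily not in best or hit.evalue < best[superfamily]:
            best[superfamily] = hit.evalue

    # Confident assignment: at least one hit at or below the strict cut-off.
    confident = {sf: e for sf, e in best.items() if e <= strict}
    assigned = min(confident, key=confident.get) if confident else None

    # Weaker relationships are kept aside, pending further evidence.
    neighbours = sorted(sf for sf, e in best.items() if strict < e <= lenient)
    return assigned, neighbours


# Made-up domain identifiers mapped to superfamily codes, for illustration only.
mapping = {"1abcA01": "3.40.50.300", "2xyzB02": "1.10.10.10"}
hits = [Hit("1abcA01", 1e-20), Hit("2xyzB02", 0.05), Hit("9zzzA00", 1e-30)]
result = assign_superfamily(hits, mapping)
print(result)  # ('3.40.50.300', ['1.10.10.10'])
```

Keeping the two thresholds as separate parameters mirrors the distinction the abstract draws between confident assignments and the putative relationships held in neighbour tables.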