SUPERFAMILY: HMMs representing all proteins of known structure. SCOP sequence searches, alignments and genome assignments
- 1 January 2002
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 30 (1), 268-272
- https://doi.org/10.1093/nar/30.1.268
Abstract
The SUPERFAMILY database contains a library of hidden Markov models representing all proteins of known structure. The database is based on the SCOP 'superfamily' level of protein domain classification which groups together the most distantly related proteins which have a common evolutionary ancestor. There is a public server at http://supfam.org which provides three services: sequence searching, multiple alignments to sequences of known structure, and structural assignments to all complete genomes. Given an amino acid or nucleotide query sequence, the server will return the domain architecture and SCOP classification. The server produces alignments of the query sequences with sequences of known structure, and includes multiple alignments of genome and PDB sequences. The structural assignments are carried out on all complete genomes (currently 59) covering approximately half of the soluble protein domains. The assignments, superfamily breakdown and statistics on them are available from the server. The database is currently used by this group and others for genome annotation, structural genomics, gene prediction and domain-based genomic studies.
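As an illustration of what "returning the domain architecture" of a query can mean in practice, the sketch below turns a set of scored domain hits into an ordered, non-overlapping list of superfamilies. This is a minimal sketch under stated assumptions, not the SUPERFAMILY server's actual procedure: the `DomainHit` record, the `domain_architecture` function, the greedy overlap rule, the E-value cutoff of 0.01 and the superfamily labels are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainHit:
    """One hypothetical model-to-sequence match (not the server's real output format)."""
    superfamily: str  # placeholder superfamily label
    start: int        # 1-based inclusive start on the query sequence
    end: int          # 1-based inclusive end on the query sequence
    evalue: float     # smaller means more significant


def domain_architecture(hits, max_evalue=1e-2):
    """Greedily keep the most significant non-overlapping hits.

    Assumption: overlapping hits compete and the lower E-value wins;
    the cutoff value is illustrative only.
    """
    kept = []
    for hit in sorted(hits, key=lambda h: h.evalue):
        if hit.evalue > max_evalue:
            continue  # not significant enough to assign
        # Keep the hit only if it lies entirely before or after every kept hit.
        if all(hit.end < k.start or hit.start > k.end for k in kept):
            kept.append(hit)
    # Report superfamilies in N- to C-terminal order.
    return [h.superfamily for h in sorted(kept, key=lambda h: h.start)]


# Made-up hits on a single query sequence.
hits = [
    DomainHit("sf_A", 10, 180, 1e-30),
    DomainHit("sf_B", 200, 270, 1e-8),
    DomainHit("sf_C", 150, 260, 1e-5),  # overlaps both stronger hits
    DomainHit("sf_D", 300, 350, 0.5),   # fails the E-value cutoff
]
architecture = domain_architecture(hits)
print(architecture)
```

With these made-up hits, `sf_C` is dropped because it overlaps a more significant hit and `sf_D` is dropped by the cutoff, leaving a two-domain architecture. A real assignment pipeline would likely tolerate small overlaps and handle discontinuous domains, which this sketch ignores.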