Protein folds and families: sequence and structure alignments
Open Access
- 1 January 1999
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 27 (1) , 244-247
- https://doi.org/10.1093/nar/27.1.244
Abstract
Dali and HSSP are derived databases organizing protein space in the structurally known regions. We use an automatic structure alignment program (Dali) for the classification of all known 3D structures based on all-against-all comparison of 3D structures in the Protein Data Bank. The HSSP database associates 1D sequences with known 3D structures using a position-weighted dynamic programming method for sequence profile alignment (MaxHom). As a result, the HSSP database not only provides aligned sequence families, but also implies secondary and tertiary structures covering 36% of all sequences in Swiss-Prot. The structure classification by Dali and the sequence families in HSSP can be browsed jointly from a web interface providing a rich network of links between neighbours in fold space, between domains and proteins, and between structures and sequences. In particular, this results in a database of explicit multiple alignments of protein families in the twilight zone of sequence similarity. The organization of protein structures and families provides a map of the currently known regions of the protein universe that is useful for the analysis of folding principles, for the evolutionary unification of protein families and for maximizing the information return from experimental structure determination. The databases are available from http://www.embl-ebi.ac.uk/dali/