Protein folds and families: sequence and structure alignments

Open Access

1 January 1999

journal article
research article
Published by Oxford University Press (OUP) in Nucleic Acids Research

Vol. 27 (1) , 244-247
https://doi.org/10.1093/nar/27.1.244

Abstract

Dali and HSSP are derived databases organizing protein space in the structurally known regions. We use an automatic structure alignment program (Dali) for the classification of all known 3D structures based on all-against-all comparison of 3D structures in the Protein Data Bank. The HSSP database associates 1D sequences with known 3D structures using a position-weighted dynamic programming method for sequence profile alignment (MaxHom). As a result, the HSSP database not only provides aligned sequence families, but also implies secondary and tertiary structures covering 36% of all sequences in Swiss-Prot. The structure classification by Dali and the sequence families in HSSP can be browsed jointly from a web interface providing a rich network of links between neighbours in fold space, between domains and proteins, and between structures and sequences. In particular, this results in a database of explicit multiple alignments of protein families in the twilight zone of sequence similarity. The organization of protein structures and families provides a map of the currently known regions of the protein universe that is useful for the analysis of folding principles, for the evolutionary unification of protein families and for maximizing the information return from experimental structure determination. The databases are available from http://www.embl-ebi.ac.uk/dali/
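
As an illustration of the kind of dynamic programming the abstract refers to, the following is a minimal sketch of global sequence-to-profile alignment with position-specific scores. It is not the MaxHom implementation: the function name `align_to_profile`, the linear gap penalty, the `default` score for residues absent from a column, and the toy profile format are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical profile format: one residue -> score table per profile column.
Profile = List[Dict[str, float]]


def align_to_profile(seq: str, profile: Profile, gap: float = -2.0,
                     default: float = -1.0) -> Tuple[float, List[Tuple[int, int]]]:
    """Globally align seq to profile; return (score, matched (seq, column) pairs)."""
    n, m = len(seq), len(profile)
    # score[i][j] holds the best score for aligning seq[:i] with profile[:j].
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # The match score depends on the profile column, not only on the residue.
            s = profile[j - 1].get(seq[i - 1], default)
            score[i][j] = max(score[i - 1][j - 1] + s,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover the matched positions.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        s = profile[j - 1].get(seq[i - 1], default)
        if score[i][j] == score[i - 1][j - 1] + s:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return score[n][m], pairs
```

For example, with a toy three-column profile favouring A, C and D in turn, `align_to_profile("AD", profile)` matches A to the first column and D to the third, skipping the middle column with a gap. A production method would typically use affine, position-dependent gap penalties and profiles derived from real alignments.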
