Continuous and discontinuous domains: An algorithm for the automatic generation of reliable protein domain definitions

Open Access

1 May 1995

journal article
research article
Published by Wiley in Protein Science

Vol. 4 (5) , 872-884
https://doi.org/10.1002/pro.5560040507

Abstract

An algorithm is presented for the fast and accurate definition of protein structural domains from coordinate data without prior knowledge of the number or type of domains. The algorithm explicitly locates domains that comprise one or two continuous segments of protein chain. Domains that include more than two segments are also located. The algorithm was applied to a nonredundant database of 230 protein structures, and the results were compared to domain definitions obtained from the literature or by inspection of the coordinates on molecular graphics. For 70% of the proteins, the derived domains agree with the reference definitions; 18% show minor differences, and only 12% (28 proteins) show very different definitions. Three screens were applied to identify the derived domains least likely to agree with the subjective definition set. These screens revealed a set of 173 proteins, 97% of which agree well with the subjective definitions. The algorithm represents a practical domain identification tool that can be run routinely on the entire structural database. Adjustment of parameters also allows smaller compact units to be identified in proteins.
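
The abstract does not spell out the scoring scheme, so the following is only an illustrative sketch of how a single continuous cut point might be chosen from coordinates. It assumes one representative point per residue, a plain distance cutoff for "contact", and a score that rewards many contacts inside each half and few between them. The function names, the 8.0 cutoff, the minimum segment size, and the score itself are all hypothetical choices made for this example, not values or formulas taken from the paper. The two-segment (discontinuous) case is not covered.

```python
import math
from itertools import combinations


def contact_pairs(coords, cutoff=8.0):
    """Return residue index pairs (i, j), i < j, whose points lie within cutoff.

    Assumption: one representative point per residue and a simple
    distance threshold; both are illustrative choices.
    """
    return {
        (i, j)
        for i, j in combinations(range(len(coords)), 2)
        if math.dist(coords[i], coords[j]) <= cutoff
    }


def split_score(pairs, cut):
    """Score splitting the chain into residues [0, cut) and [cut, n).

    Hypothetical score: (internal contacts of A / contacts between A and B)
    times (internal contacts of B / contacts between A and B).
    """
    int_a = sum(1 for i, j in pairs if j < cut)    # both residues in A
    int_b = sum(1 for i, j in pairs if i >= cut)   # both residues in B
    ext = len(pairs) - int_a - int_b               # one residue in each half
    ext = max(ext, 1)                              # avoid division by zero
    return (int_a / ext) * (int_b / ext)


def best_single_cut(coords, cutoff=8.0, min_size=2):
    """Try every allowed cut position and return (cut, score) for the best one."""
    pairs = contact_pairs(coords, cutoff)
    n = len(coords)
    candidates = range(min_size, n - min_size + 1)
    return max(((c, split_score(pairs, c)) for c in candidates),
               key=lambda item: item[1])


# Toy chain: two compact five-residue clusters placed 20 units apart.
cluster = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0), (0, 0, 1)]
toy = cluster + [(x + 20, y, z) for x, y, z in cluster]

cut, score = best_single_cut(toy)
print(cut, score)
```

On the toy chain the best cut falls between the two clusters, at residue index 5. A real structure would need a more careful contact definition and some threshold on the score before a split is accepted.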
