Domain assignment for protein structures using a consensus approach: Characterization and analysis
Open Access
- 1 February 1998
- journal article
- research article
- Published by Wiley in Protein Science
- Vol. 7 (2) , 233-242
- https://doi.org/10.1002/pro.5560070202
Abstract
A consensus approach for the assignment of structural domains in proteins is presented. The approach combines a number of previously published algorithms and takes advantage of the elevated accuracy obtained when assignments from the individual algorithms are in agreement. The consensus approach is tested on a data set of 55 protein chains, for which domain assignments from four automated methods were known and for which crystallographers' assignments had been reported in the literature. Accuracy was found to increase in this test from 72% using individual algorithms to 100% when all four methods were in agreement. However, a consensus prediction using all four methods was only possible for 52% of the data set. The consensus approach, using three publicly available domain assignment algorithms (PUU, DETECTIVE, DOMAK), was then used to make domain assignments for a data set of 787 protein chains from the Protein Data Bank. Analysis showed that 55.7% of assignments could be made automatically and, of these, 13.5% were multi‐domain proteins. Of the remaining 44.3% that could not be assigned by the consensus procedure, 90.4% had their domain boundaries assigned correctly by at least one of the algorithms. Once identified, these domains were analyzed for trends in their size and secondary structure class. In addition, the discontinuity of each domain along the protein chain was considered.
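The consensus idea described above can be illustrated with a minimal sketch. The abstract does not state the exact agreement criterion, so the rule used here is an assumption: two assignments agree when they have the same number of domains and at least 85% of residues fall in matching domains. The function names, the `min_overlap` parameter, and the data layout (a list of residue-number sets per method) are hypothetical and are not taken from the paper or from PUU, DETECTIVE, or DOMAK.

```python
from itertools import combinations, permutations


def overlap_fraction(a, b):
    """Fraction of residues placed in matching domains by assignments a and b.

    Each assignment is a list of sets of residue numbers, one set per domain.
    Domain order is arbitrary, so every pairing of domains is tried and the
    best one is kept. Assumed rule: differing domain counts score 0.0.
    """
    if len(a) != len(b):
        return 0.0
    residues = set().union(*a, *b)
    if not residues:
        return 0.0
    best = max(
        sum(len(da & db) for da, db in zip(a, perm))
        for perm in permutations(b)
    )
    return best / len(residues)


def consensus(assignments, min_overlap=0.85):
    """Return one agreed assignment, or None when the methods disagree.

    `assignments` maps a method name to its domain assignment. The 0.85
    threshold is an illustrative assumption, not a value from the abstract.
    """
    methods = list(assignments.values())
    for a, b in combinations(methods, 2):
        if overlap_fraction(a, b) < min_overlap:
            return None  # no consensus: the chain needs manual inspection
    return methods[0] if methods else None


# Toy two-domain chain of 200 residues (invented data, for illustration only).
toy_chain = {
    "PUU": [set(range(1, 101)), set(range(101, 201))],
    "DETECTIVE": [set(range(1, 96)), set(range(96, 201))],
    "DOMAK": [set(range(101, 201)), set(range(1, 101))],
}
agreed = consensus(toy_chain)
```

In this sketch a chain with no consensus returns `None`, which corresponds to the share of chains in the abstract that could not be assigned automatically. Trying every domain pairing is only practical because chains rarely have more than a handful of domains.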