Assessment of the probabilities for evolutionary structural changes in protein folds
Open Access
- 4 February 2007
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 23 (7) , 832-841
- https://doi.org/10.1093/bioinformatics/btm022
Abstract
Motivation: The evolution of protein sequences can be described by a stepwise process, where each step involves changes of a few amino acids. In a similar manner, the evolution of protein folds can be at least partially described by an analogous process, where each step involves comparatively simple changes affecting few secondary structure elements. A number of such evolution steps, justified by biologically confirmed examples, have previously been proposed by other researchers. However, unlike the situation with sequences, as far as we know there have been no attempts to estimate the comparative probabilities for different kinds of such structural changes.

Results: We have tried to assess the comparative probabilities for a number of known structural changes, and to relate the probabilities of such changes to the distance between protein sequences. We have formalized these structural changes using a topological representation of structures (TOPS), and have developed an algorithm for measuring structural distances that involve few evolutionary steps. The probabilities of structural changes were then estimated on the basis of all-against-all comparisons of the sequence and structure of protein domains from the CATH-95 representative set. The results obtained are reasonably consistent for a number of different data subsets and permit the identification of several ‘most popular’ types of evolutionary changes in protein structure. The results also suggest that alterations in protein structure are more likely to occur when the sequence similarity is >10% (the average similarity being ∼6% for the data sets employed in this study), and that the distribution of probabilities of structural changes is fairly uniform within the interval of 15–50% sequence similarity.

Availability: The algorithms have been implemented on the Windows operating system in C++ and using the Borland Visual Component Library. The source code is available on request from the first author. The data sets used for this study (representative sets of protein domains, matrices of sequence similarities and structural distances) are available at http://bioinf.mii.lu.lv/epsrc_project/struct_ev.html.

Contact: juris.viksna@mii.lu.lv
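
The estimation step described under Results (relating the frequency of each kind of structural change to sequence similarity across all domain pairs) can be illustrated with a minimal sketch. This is not the authors' C++ implementation. The function name `change_frequencies`, the change labels, the bin edges and the toy data are all hypothetical, and the sketch assumes that each domain pair has already been assigned a sequence similarity and, where one applies, a single-step structural change.

```python
from bisect import bisect_right
from collections import Counter, defaultdict


def change_frequencies(pairs, bin_edges):
    """Return {bin_index: {change_type: fraction of pairs in that bin}}.

    pairs     -- iterable of (similarity_percent, change_type or None);
                 None means no single-step change relates the two domains
    bin_edges -- ascending interior bin edges, e.g. [10, 15, 50]
    """
    totals = Counter()              # number of pairs per similarity bin
    counts = defaultdict(Counter)   # per-bin counts of each change type
    for similarity, change in pairs:
        b = bisect_right(bin_edges, similarity)
        totals[b] += 1
        if change is not None:
            counts[b][change] += 1
    return {
        b: {change: n / totals[b] for change, n in counts[b].items()}
        for b in totals
    }


# Hypothetical toy data: (sequence similarity in %, observed change type).
pairs = [
    (5.0, None),
    (6.0, None),
    (12.0, "strand_insertion"),
    (20.0, "strand_insertion"),
    (30.0, "helix_to_strand"),
    (40.0, None),
    (45.0, "strand_insertion"),
    (60.0, None),
]
freq = change_frequencies(pairs, bin_edges=[10, 15, 50])
```

Here bin 0 covers similarities below 10%, bin 1 covers 10% up to 15%, bin 2 covers 15% up to 50%, and bin 3 covers 50% and above. The relative frequencies within a bin serve only as a simple stand-in for the comparative probabilities discussed in the abstract.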