Detecting Coevolution in and among Protein Domains
Open Access
- 2 November 2007
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLoS Computational Biology
- Vol. 3 (11), e211-2134
- https://doi.org/10.1371/journal.pcbi.0030211
Abstract
Correlated changes of nucleic or amino acids have provided strong information about the structures and interactions of molecules. Despite the rich literature in coevolutionary sequence analysis, previous methods often have to trade off among generality, simplicity, phylogenetic information, and specific knowledge about interactions. Furthermore, despite the evidence of coevolution in selected protein families, a comprehensive screening of coevolution among all protein domains is still lacking. We propose an augmented continuous-time Markov process model for sequence coevolution. The model can handle different types of interactions, incorporates phylogenetic information and sequence substitution, has only one extra free parameter, and requires no knowledge about interaction rules. We apply this model in large-scale screenings of the entire protein domain database (Pfam). Strikingly, with 0.1 trillion tests executed, the majority of the inferred coevolving protein domains are functionally related, and the coevolving amino acid residues are spatially coupled. Moreover, many of the coevolving positions are located at functionally important sites of proteins/protein complexes, such as the subunit linkers of superoxide dismutase, the tRNA binding sites of ribosomes, the DNA binding region of RNA polymerase, and the active and ligand binding sites of various enzymes. The results suggest that sequence coevolution manifests structural and functional constraints of proteins. The intricate relations between sequence coevolution and various selective constraints are worth pursuing at a deeper level.

The sequences of different components within and across genes often undergo coordinated changes in order to maintain the structures or functions of the genes. Identifying the coordinated changes (the "coevolution") of those components in the context of evolution is important in predicting the structures, interactions, and functions of genes. The authors perform a large-scale screening of all the known protein sequences and build a compendium of the coevolving relations among all protein domains, which are subunits of proteins. The majority of the coevolving protein domains either belong to the same proteins, appear in the same protein complexes, or share the same functional annotations. Furthermore, coevolving positions in the same proteins or protein complexes are spatially coupled, as they tend to be closer than random positions in the 3-D structures of the proteins/protein complexes. More strikingly, many coevolving positions are located at functionally important sites of the molecules. The results provide useful insights about the relations between sequence evolution and protein structures and functions.
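The claim that coevolving positions "tend to be closer than random positions" can be illustrated with a simple resampling check. The sketch below is an assumption for illustration only, not the statistical procedure used in the paper: the function name `closer_than_random`, the toy coordinates, and the choice of the mean distance as the test statistic are all hypothetical. In practice the coordinates would come from a solved structure.

```python
import math
import random


def closer_than_random(coords, pairs, n_samples=1000, seed=0):
    """Compare the mean 3-D distance of candidate position pairs with
    the mean distance of equally many randomly drawn position pairs.

    coords : list of (x, y, z) tuples, one per residue position
    pairs  : list of (i, j) index pairs flagged as coevolving (hypothetical input)
    Returns (observed_mean, empirical_p), where empirical_p estimates how
    often random pair sets are at least as close as the observed ones.
    """
    rng = random.Random(seed)
    n = len(coords)
    observed = sum(math.dist(coords[i], coords[j]) for i, j in pairs) / len(pairs)

    hits = 0
    for _ in range(n_samples):
        total = 0.0
        for _ in pairs:
            # Draw two distinct positions uniformly at random.
            i, j = rng.sample(range(n), 2)
            total += math.dist(coords[i], coords[j])
        if total / len(pairs) <= observed:
            hits += 1

    # Add-one correction keeps the empirical p-value strictly positive.
    return observed, (hits + 1) / (n_samples + 1)


# Toy example: 50 positions on a straight line, 1 unit apart.
toy_coords = [(float(k), 0.0, 0.0) for k in range(50)]
toy_pairs = [(0, 1), (10, 11), (20, 21), (30, 31)]  # adjacent positions
observed_mean, p_value = closer_than_random(toy_coords, toy_pairs)
```

A small empirical p-value here only says that the flagged pairs are closer than random pairs in this toy setting; a real analysis would also need to account for sequence separation and the choice of atoms used to measure distance.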