Protein and DNA Sequence Determinants of Thermophilic Adaptation

Abstract
There have been considerable attempts in the past to relate a phenotypic trait, the habitat temperature of organisms, to their genotypes, most importantly the compositions of their genomes and proteomes. However, despite the accumulation of anecdotal evidence, an exact and conclusive relationship between the two has been elusive. We present an exhaustive study of the relationship between the amino acid composition of proteomes, the nucleotide composition of DNA, and the optimal growth temperature (OGT) of prokaryotes. Based on 204 complete proteomes of archaea and bacteria spanning the temperature range from −10 °C to 110 °C, we enumerated all possible sets of amino acids and found a set whose total fraction in a proteome is correlated, to a remarkable extent, with the OGT. The universal set is Ile, Val, Tyr, Trp, Arg, Glu, Leu (IVYWREL), and the correlation coefficient is as high as 0.93. We also found that the G + C content in 204 complete genomes does not exhibit a significant correlation with OGT (R = −0.10). On the other hand, the fraction of A + G in coding DNA is correlated with temperature, largely because of the codon patterns of the IVYWREL amino acids. Further, we found a strong and independent correlation between OGT and the frequency with which pairs of A and G nucleotides appear as nearest neighbors in genome sequences. This adaptation is achieved via codon bias. These findings present a direct link between the principles of protein structure and stability and the evolutionary mechanisms of thermophilic adaptation. On the nucleotide level, the analysis provides an example of how nature utilizes codon bias for evolutionary adaptation to extreme conditions. Together these results provide a complete picture of how the compositions of proteomes and genomes in prokaryotes adjust to the extreme conditions of the environment.
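The enumeration of amino acid sets described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names (`composition`, `pearson`, `best_subset`) and the `alphabet` parameter are hypothetical, and it assumes that each candidate set is scored by the Pearson correlation between its total fraction in a proteome and the OGT. Over the full 20-letter alphabet there are roughly a million sets, so this pure-Python version is only practical on small alphabets or toy data.

```python
from itertools import combinations
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(proteome: str) -> dict[str, float]:
    """Fraction of each standard amino acid in a concatenated proteome."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for residue in proteome:
        if residue in counts:  # skip ambiguity codes such as X
            counts[residue] += 1
    total = sum(counts.values())
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def best_subset(compositions, ogts, alphabet=AMINO_ACIDS):
    """Try every proper, non-empty subset of `alphabet` (assumed scoring:
    Pearson r between the subset's total fraction and OGT) and return the
    subset with the highest r. The full alphabet is skipped because its
    total fraction is the same in every proteome."""
    best_r, best_set = -2.0, ""
    for size in range(1, len(alphabet)):
        for subset in combinations(alphabet, size):
            fractions = [sum(comp[aa] for aa in subset) for comp in compositions]
            r = pearson(fractions, ogts)
            if r > best_r:
                best_r, best_set = r, "".join(subset)
    return best_set, best_r
```

Given one composition dictionary per organism and the matching list of OGT values, `best_subset(compositions, ogts)` returns the best-scoring set and its correlation coefficient. A practical run on 204 proteomes would likely vectorize the inner loop, for example with a bitmask over precomputed fractions.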
Prokaryotes living at extreme environmental temperatures exhibit pronounced signatures in the amino acid composition of their proteins and the nucleotide composition of their genomes, reflecting adaptation to their thermal environments. However, despite significant efforts, a definitive answer as to what the genomic and proteomic compositional determinants of the optimal growth temperature (OGT) of prokaryotic organisms are has remained elusive. Here we performed a comprehensive analysis of the amino acid and nucleotide compositional signatures of thermophilic adaptation by exhaustively evaluating all combinations of amino acids and nucleotides as possible determinants of OGT for all prokaryotic organisms with fully sequenced genomes. We discovered that the total concentration of seven amino acids in proteomes (IVYWREL) serves as a universal proteomic predictor of OGT in prokaryotes. Resolving a long-standing controversy, we determined that the variation in nucleotide composition (the increase of purine load, or A + G content, with temperature) is largely a consequence of the thermal adaptation of proteins. However, the frequency with which A and G nucleotides appear as nearest neighbors in genome sequences is strongly and independently correlated with OGT as a result of codon bias in the corresponding genomes. Together these results provide a complete picture of the proteomic and genomic determinants of thermophilic adaptation.
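The two nucleotide-level quantities mentioned above, purine load (A + G content) and the frequency of A and G as nearest neighbors, can be computed along the following lines. This is a hedged sketch rather than the authors' procedure: the function names are hypothetical, and the default definition of a "pair of A and G nucleotides" as any purine-purine neighbor (AA, AG, GA, GG) is an assumption, so the pair set is left as a parameter.

```python
PURINES = frozenset("AG")
BASES = frozenset("ACGT")


def purine_fraction(seq: str) -> float:
    """A + G content: fraction of unambiguous bases that are purines."""
    bases = [b for b in seq.upper() if b in BASES]
    if not bases:
        return 0.0
    return sum(b in PURINES for b in bases) / len(bases)


def neighbor_pair_fraction(seq: str, pairs=("AA", "AG", "GA", "GG")) -> float:
    """Fraction of adjacent positions whose dinucleotide is in `pairs`.

    Assumption: by default every purine-purine neighbor counts; pass
    pairs=("AG",) to count only A immediately followed by G.
    Dinucleotides containing ambiguous bases (e.g. N) are skipped.
    """
    seq = seq.upper()
    wanted = set(pairs)
    total = hits = 0
    for i in range(len(seq) - 1):
        dinuc = seq[i:i + 2]
        if dinuc[0] in BASES and dinuc[1] in BASES:
            total += 1
            hits += dinuc in wanted
    return hits / total if total else 0.0
```

Computing these values for each genome (or for its coding regions only) and correlating them with OGT would give numbers of the kind discussed in the text. Whether both strands or only the coding strand are counted is a further choice that this sketch leaves open.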