Statistical correlation between protein secondary structure and messenger RNA stem‐loop structure

Abstract
A new integrated sequence‐structure database, called IADE (Integrated ASTRAL‐DSSP‐EMBL), has been constructed; it incorporates matching mRNA sequences, amino acid sequences, and protein secondary structure data for 648 protein domains. Using the IADE database, we studied the relation between mRNA stem‐loop frequencies and protein secondary structure. We found that α‐helices and β‐strands in proteins tend to be “coded” preferentially by mRNA stem regions, whereas coils tend to be “coded” preferentially by mRNA loop regions. These tendencies are more pronounced when structural words (SWs) are considered. An SW is defined as a four‐amino‐acid fragment that shows a pronounced secondary structural (α‐helix or β‐strand) propensity. We demonstrate that the deduced correlation between protein and mRNA structure can hardly be explained as an effect of stochastic fluctuation. © 2003 Wiley Periodicals, Inc. Biopolymers 73: 16–26, 2004
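
The kind of association described above (helix/strand residues versus coil residues, cross-tabulated against mRNA stem versus loop regions) can be illustrated with a 2x2 contingency test. The following is a minimal sketch only: the abstract does not state which statistical test the authors used, so the Pearson chi-square test here is an assumption, and the function name `chi_square_2x2` and the counts in `example_table` are hypothetical placeholders, not values from the IADE database.

```python
from math import erfc, sqrt


def chi_square_2x2(table):
    """Return (statistic, p_value) for a Pearson chi-square test on a 2x2 table.

    `table` is [[a, b], [c, d]]; no continuity correction is applied.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected

    # With 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts for illustration only (NOT data from the paper).
# Rows: protein structure (helix or strand, coil); columns: mRNA region (stem, loop).
example_table = [[60, 40],
                 [40, 60]]

stat, p_value = chi_square_2x2(example_table)
print(f"chi-square = {stat:.2f}, p = {p_value:.4f}")
```

A small p-value in such a test would indicate that the observed stem/loop preference is unlikely under independence, which is the sense in which a correlation "can hardly be explained" by stochastic fluctuation; the original analysis may differ in its details.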