Protein simple sequence conservation
- 23 January 2004
- journal article
- research article
- Published by Wiley in Proteins: Structure, Function, and Bioinformatics
- Vol. 54 (4), 629–638
- https://doi.org/10.1002/prot.10623
Abstract
Protein simple sequences, a subset of low-complexity sequences, are regions of sequence highly enriched in one or a few residue types. Simple sequences are exceedingly common, averaging more than one per protein sequence. Despite being so common, such sequences have not been well studied. The simple sequences that have been subjected to detailed study are often found to possess important functions. Here we present a survey of protein simple sequences, generally enriched in a single residue type, with the aim of studying their conservation. We find that the majority of such simple sequences are not conserved. However, conserved protein simple sequences are relatively common, with ∼11% of the surveyed protein families possessing a conserved simple sequence. The data obtained in this study support the idea that simple sequences are conserved for functional reasons. Such functions range from substrate binding to mediating protein-protein interactions and maintaining structural integrity. A perhaps surprising finding is that the residue enriching a conserved simple sequence is itself not necessarily conserved. Nor is the length of many of the highly conserved simple sequences. In the few cases where structural and functional data are available, the conserved simple sequences are found to be consistent with both local structure and function. The data presented support the idea that protein simple sequences can be conserved and have important roles in protein structure and function. Proteins 2004;54:629–638.
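The abstract defines a simple sequence as a region highly enriched in one residue type but does not describe how such regions were detected in the survey. The sketch below is therefore not the authors' method: it is a minimal illustration of that definition, assuming a sliding window in which the single most frequent residue must reach a threshold fraction. The function names `find_simple_sequences` and `_trim` and the default `window` and `min_fraction` values are hypothetical choices made for this example.

```python
from collections import Counter


def _trim(seq, start, end, residue):
    """Shrink [start, end) so the region begins and ends with `residue`."""
    while seq[start] != residue:
        start += 1
    while seq[end - 1] != residue:
        end -= 1
    return start, end


def find_simple_sequences(seq, window=20, min_fraction=0.5):
    """Flag regions where a single residue makes up at least `min_fraction`
    of a sliding window. Returns (start, end, residue) tuples, 0-based and
    end-exclusive. `window` and `min_fraction` are hypothetical defaults,
    not values taken from the study."""
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - window + 1):
        # Most frequent residue in this window and how often it occurs.
        residue, count = Counter(seq[start:start + window]).most_common(1)[0]
        if count / window < min_fraction:
            continue
        end = start + window
        # Merge overlapping windows that are enriched in the same residue.
        if hits and hits[-1][2] == residue and start <= hits[-1][1]:
            hits[-1] = (hits[-1][0], end, residue)
        else:
            hits.append((start, end, residue))
    # Trim flanking residues picked up at the window edges.
    return [_trim(seq, s, e, r) + (r,) for s, e, r in hits]
```

For example, a polyglutamine stretch flanked by unrelated residues is reported as a single region: `find_simple_sequences("ACDEFGHIKL" + "Q" * 15 + "MNPRSTVWY", window=10)` returns `[(10, 25, 'Q')]`. A fixed-fraction window is only one way to flag compositional bias; entropy-based tools such as SEG measure local complexity instead. Assessing conservation, the focus of the study, would additionally require comparing such regions across aligned members of a protein family, which this sketch does not attempt.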
Funding Information
- National Science Foundation (MCB0110720)