The relationship of protein conservation and sequence length
Open Access
- 1 November 2002
- journal article
- research article
- Published by Springer Nature in BMC Ecology and Evolution
- Vol. 2 (1), 1-10
- https://doi.org/10.1186/1471-2148-2-20
Abstract
In general, the length of a protein sequence is determined by its function and the wide variance in the lengths of an organism's proteins reflects the diversity of specific functional roles for these proteins. However, additional evolutionary forces that affect the length of a protein may be revealed by studying the length distributions of proteins evolving under weaker functional constraints. We performed sequence comparisons to distinguish highly conserved and poorly conserved proteins from the bacterium Escherichia coli, the archaeon Archaeoglobus fulgidus, and the eukaryotes Saccharomyces cerevisiae, Drosophila melanogaster, and Homo sapiens. For all organisms studied, the conserved and nonconserved proteins have strikingly different length distributions. The conserved proteins are, on average, longer than the poorly conserved ones, and the length distributions for the poorly conserved proteins have a relatively narrow peak, in contrast to the conserved proteins whose lengths spread over a wider range of values. For the two prokaryotes studied, the poorly conserved proteins approximate the minimal length distribution expected for a diverse range of structural folds. There is a relationship between protein conservation and sequence length. For all the organisms studied, there seems to be a significant evolutionary trend favoring shorter proteins in the absence of other, more specific functional constraints.
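The comparison described in the abstract (split proteins into conserved and poorly conserved groups, then compare their length distributions) can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the function names, the boolean conservation labels, and the toy lengths are hypothetical and are not the authors' data, criteria, or method. In practice the labels would come from a sequence-similarity search, which is not reproduced here.

```python
from statistics import mean, median, pstdev


def split_lengths(lengths, is_conserved):
    """Split protein lengths into (conserved, poorly conserved) lists.

    `lengths` maps a protein name to its length in residues;
    `is_conserved` maps the same names to a boolean label (hypothetical input).
    """
    conserved = [n for name, n in lengths.items() if is_conserved[name]]
    poorly = [n for name, n in lengths.items() if not is_conserved[name]]
    return conserved, poorly


def summarize(values):
    """Return basic descriptors of a length distribution."""
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "spread": pstdev(values),  # population standard deviation
    }


# Toy, made-up values chosen only to exercise the functions.
toy_lengths = {"a": 250, "b": 400, "c": 900, "d": 100, "e": 120, "f": 140}
toy_labels = {"a": True, "b": True, "c": True, "d": False, "e": False, "f": False}

conserved, poorly = split_lengths(toy_lengths, toy_labels)
conserved_stats = summarize(conserved)
poorly_stats = summarize(poorly)

print("conserved:", conserved_stats)
print("poorly conserved:", poorly_stats)
```

A comparison of the `mean` and `spread` fields of the two summaries corresponds to the two observations reported in the abstract: longer conserved proteins on average, and a narrower peak for the poorly conserved ones.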