Evolutionary Fingerprinting of Genes
Open Access
- 28 October 2009
- journal article
- research article
- Published by Oxford University Press (OUP) in Molecular Biology and Evolution
- Vol. 27 (3), 520-536
- https://doi.org/10.1093/molbev/msp260
Abstract
Over time, natural selection molds every gene into a unique mosaic of sites evolving rapidly or resisting change, an “evolutionary fingerprint” of the gene. Aspects of this evolutionary fingerprint, such as the site-specific ratio of nonsynonymous to synonymous substitution rates (dN/dS), are commonly used to identify genetic features of potential biological interest; however, no framework exists for comparing evolutionary fingerprints between genes. We hypothesize that protein-coding genes with similar protein structure and/or function tend to have similar evolutionary fingerprints and that comparing evolutionary fingerprints can be useful for discovering similarities between genes in a way that is analogous to, but independent of, discovery of similarity via sequence-based comparison tools such as Blast. To test this hypothesis, we develop a novel model of coding sequence evolution that uses a general bivariate discrete parameterization of the evolutionary rates. We show that this approach provides a better fit to the data using a smaller number of parameters than existing models. Next, we use the model to represent evolutionary fingerprints as probability distributions and present a methodology for comparing these distributions in a way that is robust against variations in data set size and divergence. Finally, using sequences of three rapidly evolving RNA viruses (HIV-1, hepatitis C virus, and influenza A virus), we demonstrate that genes within the same functional group tend to have similar evolutionary fingerprints. Our framework provides a sound statistical foundation for efficient inference and comparison of evolutionary rate patterns in arbitrary collections of gene alignments, clustering homologous and nonhomologous genes, and investigation of biological and functional correlates of evolutionary rates.
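As a rough illustration of what a "bivariate discrete" rate distribution could look like in practice, the sketch below stores a fingerprint as a small set of rate classes, each pairing a synonymous rate with a nonsynonymous rate and a site proportion. The class name `RateClass`, the helper functions, and all numeric values are hypothetical and chosen only for illustration; this is not the authors' model-fitting procedure, and it does not reproduce their method for comparing fingerprints.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class RateClass:
    """One class of sites in a discrete bivariate rate distribution (hypothetical)."""
    syn: float     # synonymous substitution rate for this class
    nonsyn: float  # nonsynonymous substitution rate for this class
    weight: float  # proportion of sites assigned to this class


def check_fingerprint(classes: List[RateClass]) -> None:
    """Raise ValueError unless the weights form a probability distribution."""
    if any(c.weight < 0 for c in classes):
        raise ValueError("weights must be non-negative")
    if abs(sum(c.weight for c in classes) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")


def mean_rates(classes: List[RateClass]) -> Tuple[float, float]:
    """Weighted mean synonymous and nonsynonymous rates across classes."""
    mean_syn = sum(c.weight * c.syn for c in classes)
    mean_nonsyn = sum(c.weight * c.nonsyn for c in classes)
    return mean_syn, mean_nonsyn


def positive_selection_mass(classes: List[RateClass]) -> float:
    """Total weight of classes where the nonsynonymous rate exceeds the synonymous rate."""
    return sum(c.weight for c in classes if c.nonsyn > c.syn)


# Made-up three-class fingerprint: mostly conserved sites, some near-neutral
# sites, and a small fraction of rapidly evolving sites.
example_gene = [
    RateClass(syn=0.5, nonsyn=0.1, weight=0.6),
    RateClass(syn=1.0, nonsyn=0.8, weight=0.3),
    RateClass(syn=2.25, nonsyn=4.0, weight=0.1),
]

check_fingerprint(example_gene)
example_means = mean_rates(example_gene)
example_positive = positive_selection_mass(example_gene)
```

Treating the synonymous and nonsynonymous rates as a joint pair per class, rather than fixing the synonymous rate, is what makes the distribution bivariate; the summaries computed here are only simple readouts of such a distribution.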