Estimating the tempo and mode of gene family evolution from comparative genomic data
Open Access
- 2 August 2005
- journal article
- Published by Cold Spring Harbor Laboratory in Genome Research
- Vol. 15 (8) , 1153-1160
- https://doi.org/10.1101/gr.3567505
Abstract
Comparison of whole genomes has revealed that changes in the size of gene families among organisms are quite common. However, there are as yet no models of gene family evolution that make it possible to estimate ancestral states or to infer upon which lineages gene families have contracted or expanded. In addition, large differences in family size have generally been attributed to the effects of natural selection, without a strong statistical basis for these conclusions. Here we use a model of stochastic birth and death for gene family evolution and show that it can be efficiently applied to multispecies genome comparisons. This model takes into account the lengths of branches on phylogenetic trees, as well as duplication and deletion rates, and hence provides expectations for divergence in gene family size among lineages. The model offers both the opportunity to identify large-scale patterns in genome evolution and the ability to make stronger inferences regarding the role of natural selection in gene family expansion or contraction. We apply our method to data from the genomes of five yeast species to show its applicability.
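To illustrate how a stochastic birth and death process ties branch length and duplication and deletion rates to expected divergence in family size, here is a minimal simulation sketch. It is not the authors' implementation: the abstract does not give the exact form of the model, so the sketch assumes a linear process in which each gene copy duplicates and is deleted at constant per-gene rates. The function name `simulate_family_size` and all parameter names and values are hypothetical.

```python
import random


def simulate_family_size(n0, birth_rate, death_rate, branch_length, rng):
    """Simulate gene family size at the end of one branch.

    Assumption (not stated in the abstract): a linear birth-death process
    in which every gene copy duplicates at `birth_rate` and is deleted at
    `death_rate` per unit of branch length. Uses the Gillespie algorithm.
    """
    per_gene_rate = birth_rate + death_rate
    if per_gene_rate == 0:
        return n0  # no events can occur
    n, t = n0, 0.0
    while n > 0:  # a family of size 0 is extinct and cannot recover
        # Waiting time to the next event is exponential with rate n * per_gene_rate.
        t += rng.expovariate(n * per_gene_rate)
        if t > branch_length:
            break  # next event would fall beyond the end of the branch
        # Choose duplication or deletion in proportion to their rates.
        if rng.random() < birth_rate / per_gene_rate:
            n += 1
        else:
            n -= 1
    return n


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical values: 10 copies at the ancestor, equal rates, two branch lengths.
    for length in (0.5, 5.0):
        sizes = [simulate_family_size(10, 0.1, 0.1, length, rng) for _ in range(2000)]
        mean = sum(sizes) / len(sizes)
        var = sum((s - mean) ** 2 for s in sizes) / len(sizes)
        print(f"branch length {length}: mean size {mean:.2f}, variance {var:.2f}")
```

With equal duplication and deletion rates the mean family size stays near the ancestral value while the spread grows with branch length, which is why branch lengths matter when judging whether an observed difference in family size is unusually large.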