Reconciliation with Non-Binary Species Trees
- 1 October 2008
- journal article
- research article
- Published by Mary Ann Liebert Inc in Journal of Computational Biology
- Vol. 15 (8), 981–1006
- https://doi.org/10.1089/cmb.2008.0092
Abstract
Reconciliation extracts information from the topological incongruence between gene and species trees to infer duplications and losses in the history of a gene family. The inferred duplication-loss histories provide valuable information for a broad range of biological applications, including ortholog identification, estimating gene duplication times, and rooting and correcting gene trees. While reconciliation for binary trees is a tractable and well-studied problem, there are no algorithms for reconciliation with non-binary species trees. Yet a striking proportion of species trees are non-binary. For example, 64% of branch points in the NCBI taxonomy have three or more children. When applied to non-binary species trees, current algorithms overestimate the number of duplications because they cannot distinguish between duplication and incomplete lineage sorting. We present the first algorithms for reconciling binary gene trees with non-binary species trees under a duplication-loss parsimony model. Our algorithms use an efficient mapping from gene to species trees to infer the minimum number of duplications in O(|V_G| · (k_S + h_S)) time, where |V_G| is the number of nodes in the gene tree, h_S is the height of the species tree, and k_S is the size of its largest polytomy. We present a dynamic programming algorithm that also minimizes the total number of losses. Although this algorithm is exponential in the size of the largest polytomy, it performs well in practice for polytomies with outdegree of 12 or less. We also present a heuristic that estimates the minimal number of losses in polynomial time. In empirical tests, this heuristic finds an optimal loss history 99% of the time. Our algorithms have been implemented in NOTUNG, a robust, production-quality tree-fitting program that provides a graphical user interface for exploratory analysis and also supports automated, high-throughput analysis of large data sets.
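The following is a minimal Python sketch of inferring duplications through a gene-to-species mapping. It is based on an assumed criterion that the abstract does not spell out. Each gene node g is mapped to the lowest common ancestor M(g) of its species. The sketch then flags g as a duplication only when its two subtrees reach a common child of M(g), so that a polytomy can absorb incomplete lineage sorting. For comparison, it also applies the classic binary test (M(g) equals the mapping of one of its children), which over-counts on polytomies.

The names `Node` and `count_duplications`, and the convention that gene-leaf labels are species names, are all hypothetical. This is not NOTUNG's code. Because it carries explicit species sets, it also does not attain the O(|V_G| · (k_S + h_S)) bound.

```python
from typing import Dict, List, Optional, Set, Tuple


class Node:
    """Rooted tree node; gene leaves are labelled with their species name (assumed)."""

    def __init__(self, name: str = "", children: Optional[List["Node"]] = None):
        self.name = name
        self.children: List["Node"] = children or []
        self.parent: Optional["Node"] = None
        for child in self.children:
            child.parent = self


def index_species_tree(root: Node) -> Tuple[Dict[Node, int], Dict[str, Node]]:
    """Record each species node's depth and map leaf labels to leaf nodes."""
    depth: Dict[Node, int] = {}
    leaves: Dict[str, Node] = {}
    stack = [(root, 0)]
    while stack:
        node, d = stack.pop()
        depth[node] = d
        if not node.children:
            leaves[node.name] = node
        stack.extend((child, d + 1) for child in node.children)
    return depth, leaves


def lca(a: Node, b: Node, depth: Dict[Node, int]) -> Node:
    """Lowest common ancestor by walking parent pointers (O(h_S) per query)."""
    while depth[a] > depth[b]:
        a = a.parent
    while depth[b] > depth[a]:
        b = b.parent
    while a is not b:
        a, b = a.parent, b.parent
    return a


def child_toward(anc: Node, node: Node) -> Node:
    """Child of `anc` on the path down to `node`, a proper descendant of `anc`."""
    while node.parent is not anc:
        node = node.parent
    return node


def count_duplications(gene_root: Node, species_root: Node) -> Tuple[int, int]:
    """Return (duplications under the assumed criterion, classic binary-test count).

    The gene tree is assumed to be binary; the species tree may have polytomies.
    """
    depth, leaves = index_species_tree(species_root)
    required = naive = 0

    def visit(g: Node) -> Tuple[Node, Set[Node]]:
        nonlocal required, naive
        if not g.children:  # gene leaf maps to the species leaf of the same name
            s = leaves[g.name]
            return s, {s}
        (ma, sa), (mb, sb) = visit(g.children[0]), visit(g.children[1])
        m = lca(ma, mb, depth)  # M(g)
        if m is ma or m is mb:  # classic test, exact only for binary species trees
            naive += 1
        if not m.children:  # both subtrees sit in one species: duplication
            required += 1
        else:
            # Children of the (possibly polytomous) node M(g) reached by each subtree
            ca = {child_toward(m, s) for s in sa}
            cb = {child_toward(m, s) for s in sb}
            if ca & cb:
                required += 1
        return m, sa | sb

    visit(gene_root)
    return required, naive


# Star species tree: A, B and C hang off a single polytomy.
star = Node("root", [Node("A"), Node("B"), Node("C")])
g1 = Node(children=[Node(children=[Node("A"), Node("B")]), Node("C")])
print(count_duplications(g1, star))  # (0, 1): the binary test reports a spurious duplication
```

On a binary species tree the two counts coincide. If M(g) equals a child's mapping, that child's species span both children of M(g), so the child-sets must intersect. On the star tree, the gene tree ((A,B),C) is fully explained by lineage sorting within the polytomy, yet the binary test still reports one duplication.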