Gene Family Evolution by Duplication, Speciation, and Loss

Abstract
We consider two algorithmic questions related to the evolution of gene families. First, given a gene tree for a gene family, can the evolutionary history of this family be explained with only speciation and duplication events? Such gene trees are called DS-trees. We show that this question can be answered in linear time, and that a DS-tree induces a single species tree. We then study a natural extension of this problem: what is the minimum number of gene losses involved in an evolutionary history leading to an observed gene tree or set of gene trees? Based on our characterization of DS-trees, we propose a heuristic for this problem, and evaluate it on a dataset of plants gene families and on simulated data.