Probability models for genome rearrangement and linear invariants for phylogenetic inference

Abstract
We review the combinatorial optimization problems in calculating edit distances between genomes and phylogenetic inference based on minimizing gene order changes. With a view to avoiding the computational cost and the "long branches attract" artifact of some tree-building methods, we explore the probabilization of genome rearrangement models prior to developing a methodology based on branch-length invariants. We characterize probabilistically the evolution of the structure of the gene adjacency set for inversions on unsigned circular genomes and, using a non-trivial recurrence relation, inversions on signed genomes. Concepts from the theory of invariants developed for the phylogenetics of homologous gene sequences can be used to derive a complete set of linear invariants for unsigned inversions, as well as for a mixed rearrangement model for signed genomes, though not for pure transposition nor pure signed inversion models. The invariants are based on an extended Jukes-Cantor semigroup. We illustrate the use of these invariants to relate mitochondrial genomes from a number of invertebrate animals.