Likelihood Analysis of Phylogenetic Networks Using Directed Graphical Models

Open Access

1 June 2000

journal article
research article
Published by Oxford University Press (OUP) in Molecular Biology and Evolution

Vol. 17 (6) , 875-881
https://doi.org/10.1093/oxfordjournals.molbev.a026367

Abstract

A method is presented for computing the likelihood of a set of sequences under a phylogenetic network taken as the evolutionary hypothesis. The approach applies directed graphical models to sequence evolution on networks and is a natural generalization of earlier work by Felsenstein on evolutionary trees, which it includes as a special case. The likelihood computation involves several steps. First, the phylogenetic network is rooted to form a directed acyclic graph (DAG). Then, by applying standard models of nucleotide or amino acid substitution, the DAG is converted into a Bayesian network from which the joint probability distribution over all nodes of the network can be read directly. The joint probability depends explicitly on branch lengths and on recombination parameters (the prior probability of a parent sequence). The likelihood of the data, assuming no knowledge of the states at hidden nodes, is obtained by marginalization, i.e., by summing over all combinations of unknown states. Because the number of terms increases exponentially with the number of hidden nodes, a Markov chain Monte Carlo procedure (Gibbs sampling) is used to accurately approximate the likelihood by summing over the most important states only. Investigating a human T-cell lymphotropic virus (HTLV) data set and optimizing both branch lengths and recombination parameters, we find that the likelihood of a corresponding phylogenetic network exceeds that of each of a set of competing evolutionary trees. In general, except in the case of a tree, the likelihood of a network depends on the choice of root, even if a reversible model of substitution is applied. Thus, the method also provides a way to root a phylogenetic network: choose the node that yields the most likely network.
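
The marginalization step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions that do not come from the abstract: the Jukes-Cantor model stands in for the "standard" substitution model, the root state has a uniform prior, and a recombinant node with two parents is modeled as a mixture of the two parent-to-child transition probabilities, weighted by the prior probability of each parent. All function and node names (`jc_prob`, `site_likelihood`, `log_likelihood`, `R`, `X`, `Y`, `A`, `B`, `C`) are hypothetical. The sketch sums exhaustively over hidden states, which is feasible only for very small networks; the Gibbs sampling approximation mentioned in the abstract is not shown.

```python
import itertools
import math

STATES = "ACGT"


def jc_prob(child, parent, t):
    """Jukes-Cantor probability of `child` given `parent` over branch length t
    (expected substitutions per site). Assumed model, not taken from the paper."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if child == parent else 0.25 - 0.25 * e


def site_likelihood(network, observed):
    """Likelihood of one alignment column.

    network:  dict mapping node -> list of (parent, branch_length, weight);
              an empty list marks the root. Weights of a node's parents
              should sum to 1 (a single parent has weight 1.0).
    observed: dict mapping each observed node -> its nucleotide.
    """
    hidden = [n for n in network if n not in observed]
    total = 0.0
    # Marginalize: sum the joint probability over every hidden-state combination.
    for combo in itertools.product(STATES, repeat=len(hidden)):
        state = dict(observed)
        state.update(zip(hidden, combo))
        joint = 1.0
        for node, parents in network.items():
            if not parents:
                joint *= 0.25  # assumed uniform prior on the root state
            else:
                # Assumed mixture over parents for recombinant nodes.
                joint *= sum(w * jc_prob(state[node], state[p], t)
                             for p, t, w in parents)
        total += joint
    return total


def log_likelihood(network, alignment):
    """Sum of per-site log-likelihoods, treating sites as independent."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        math.log(site_likelihood(network, {k: s[i] for k, s in alignment.items()}))
        for i in range(n_sites)
    )


# Hypothetical rooted network: hidden root R with hidden children X and Y;
# leaf A descends from X, leaf C from Y, and leaf B is a recombinant of X and Y.
r = 0.7  # assumed prior probability that B's parent is X
network = {
    "R": [],
    "X": [("R", 0.1, 1.0)],
    "Y": [("R", 0.2, 1.0)],
    "A": [("X", 0.05, 1.0)],
    "B": [("X", 0.05, r), ("Y", 0.05, 1.0 - r)],
    "C": [("Y", 0.05, 1.0)],
}
alignment = {"A": "ACGT", "B": "ACGA", "C": "ATGA"}  # made-up toy data
print(log_likelihood(network, alignment))
```

Setting the weight of one parent to 1 and the other to 0 reduces the recombinant node to an ordinary tree node, which mirrors the abstract's remark that the tree likelihood is a special case. The number of terms in the sum is 4 raised to the number of hidden nodes, which is why the abstract turns to sampling for larger networks.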
