Learning Multiple Evolutionary Pathways from Cross-Sectional Data

We introduce a mixture model of trees to describe evolutionary processes that are characterized by the ordered accumulation of permanent genetic changes. The basic building block of the model is a directed weighted tree that generates a probability distribution on the set of all patterns of genetic events. We present an EM-like algorithm for learning a mixture model of K trees and show how to determine K with a maximum likelihood approach. As a case study, we consider the accumulation of mutations in the HIV-1 reverse transcriptase that are associated with drug resistance. The fitted model is statistically validated as a density estimator, and the stability of the model topology is analyzed. We obtain a generative probabilistic model for the development of drug resistance in HIV that agrees with biological knowledge. Further applications and extensions of the model are discussed.
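The following minimal Python sketch makes the building block concrete. It assumes standard oncogenetic-tree semantics, which the abstract does not spell out:

- Each event hangs off a parent. Node 0 is a root standing for "no event" and is always present.
- An event can occur only if its parent has occurred.
- If the parent has occurred, the event occurs with probability equal to its edge weight.

Under these assumptions, a tree gives zero probability to any pattern that violates its ordering. A mixture of K trees scores a pattern as the weighted sum of the per-tree probabilities. The names `tree_prob` and `mixture_prob`, the dictionary representation, and the example weights are all hypothetical. The sketch covers only the likelihood, not the EM-like fitting procedure or the choice of K.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: each event 1..n maps to (parent, edge weight);
# parent 0 is the root ("no event"), which is always present.
Tree = Dict[int, Tuple[int, float]]


def tree_prob(tree: Tree, pattern: Sequence[int]) -> float:
    """Probability of a 0/1 event pattern under one tree (assumed semantics)."""
    # pattern[i] == 1 means event i + 1 was observed.
    present = {0} | {i + 1 for i, x in enumerate(pattern) if x}
    prob = 1.0
    for event, (parent, weight) in tree.items():
        occurred = event in present
        if parent in present:
            # Parent occurred: the event occurs with probability `weight`.
            prob *= weight if occurred else (1.0 - weight)
        elif occurred:
            # Event observed without its parent: impossible under this tree.
            return 0.0
        # Parent absent and event absent contributes a factor of 1.
    return prob


def mixture_prob(trees: List[Tree], alphas: List[float], pattern: Sequence[int]) -> float:
    """Mixture likelihood: sum_k alpha_k * P_k(pattern)."""
    return sum(a * tree_prob(t, pattern) for t, a in zip(trees, alphas))


# Illustrative (made-up) parameters: a tree in which event 2 requires event 1,
# plus a star-shaped tree in which every event hangs directly off the root.
chain = {1: (0, 0.6), 2: (1, 0.5), 3: (0, 0.2)}
star = {1: (0, 0.3), 2: (0, 0.3), 3: (0, 0.3)}
trees, alphas = [chain, star], [0.7, 0.3]

total = sum(mixture_prob(trees, alphas, x) for x in product((0, 1), repeat=3))
print(round(total, 10))  # sums to 1 over all 2^3 patterns
```

Summing over all 2^3 patterns gives 1 for each tree and therefore for the mixture. This is a quick sanity check that each component defines a proper distribution. Pattern (0, 1, 0) gets zero probability under `chain`, because event 2 appears without event 1. It stays possible only through the star component, which is one way a mixture can accommodate patterns that a single tree forbids.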
