Nearly tight bounds on the learnability of evolution

22 November 2002

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

No. 02725428, p. 524-533
https://doi.org/10.1109/sfcs.1997.646141

Abstract

Evolution is often modeled as a stochastic process that modifies DNA. One of the most popular and successful such models is the Cavender-Farris (CF) tree, which is represented as an edge-weighted tree. The Phylogeny Construction Problem is the following: given κ samples drawn from a CF tree, output a CF tree that is close to the original. Each CF tree naturally defines a random variable, and the gold standard for reconstructing such trees is the maximum likelihood estimator of this variable. This approach is notoriously computationally expensive. We show that a very simple algorithm, a variant of one of the most popular algorithms used by practitioners, converges on the true tree at a rate that differs from the optimum by a constant. We do this by analyzing upper and lower bounds on the convergence rate of learning very simple CF trees, and then showing that the learnability of each CF tree is sandwiched between two such simpler trees. Our results rely on the fact that, if the right metric is used, the likelihood space of CF trees is smooth.
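
To make the sampling setup concrete, the following is a minimal sketch and not the algorithm analyzed in the paper. It assumes the usual two-state reading of the CF model, in which each edge independently flips a binary character with some probability; the tree shape, the flip probabilities, and all function names are hypothetical choices made for illustration. It draws samples at the leaves and estimates pairwise leaf distances with the standard correction d = -(1/2) ln(1 - 2p), where p is the observed fraction of samples on which two leaves disagree.

```python
import math
import random

# Hypothetical 4-leaf tree (an assumption for illustration only).
# Each entry maps a node to (parent, flip probability on the edge above it).
# Parents are listed before their children.
TREE = {
    "u": ("root", 0.10),
    "v": ("root", 0.10),
    "a": ("u", 0.05),
    "b": ("u", 0.05),
    "c": ("v", 0.05),
    "d": ("v", 0.05),
}
LEAVES = ["a", "b", "c", "d"]


def sample_site(rng):
    """Draw one sample: a binary state at every leaf."""
    states = {"root": rng.randint(0, 1)}
    for child, (parent, p) in TREE.items():
        flipped = rng.random() < p
        states[child] = states[parent] ^ int(flipped)
    return {leaf: states[leaf] for leaf in LEAVES}


def estimate_distance(samples, x, y):
    """Estimate the additive distance between leaves x and y from samples."""
    disagree = sum(s[x] != s[y] for s in samples) / len(samples)
    if disagree >= 0.5:
        return math.inf  # the correction is undefined at or above 1/2
    return -0.5 * math.log(1 - 2 * disagree)


def edges_to_root(node):
    """Return {child: flip probability} for every edge from node up to the root."""
    edges = {}
    while node != "root":
        parent, p = TREE[node]
        edges[node] = p
        node = parent
    return edges


def true_distance(x, y):
    """Sum the edge lengths -(1/2) ln(1 - 2p) along the path between x and y."""
    ex, ey = edges_to_root(x), edges_to_root(y)
    merged = {**ex, **ey}
    path = set(ex) ^ set(ey)  # edges on exactly one of the two root paths
    return sum(-0.5 * math.log(1 - 2 * merged[e]) for e in path)


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [sample_site(rng) for _ in range(20000)]
    for x, y in [("a", "b"), ("a", "c")]:
        print(x, y, estimate_distance(samples, x, y), true_distance(x, y))
```

As the number of samples grows, the estimated distances approach the true path lengths; how many samples are needed for a reconstruction to be reliable is the kind of question the abstract's convergence rates address.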

This publication has 4 references indexed in Scilit:

On the Approximability of Numerical Taxonomy (Fitting Distances by Tree Metrics)
SIAM Journal on Computing, 1998
Efficient algorithms for inverting evolution
Published by Association for Computing Machinery (ACM), 1996
Recovering a tree from the leaf colourations it generates under a Markov model
Applied Mathematics Letters, 1994
Dating of the human-ape splitting by a molecular clock of mitochondrial DNA
Journal of Molecular Evolution, 1985