The geometry of statistical models for biological sequences

9 November 2003
Abstract
One of the major successes in computational biology has been the unification, using the graphical model formalism, of a multitude of algorithms for annotating and comparing biological sequences. The graphical models that have been applied to these problems include homogeneous hidden Markov models (HMMs) for annotation, tree models for phylogenetics, and pair (or multi) hidden Markov models for alignment. A single algorithm, the sum-product algorithm, solves many of the inference problems associated with these different models. This paper presents a unified mathematical framework for the underlying statistical models, building on the observation that graphical models are algebraic varieties. From this geometric viewpoint, the probabilities of specific observed sequences generated by a model are coordinates of a point in the variety, and the sum-product algorithm is an efficient tool for evaluating specific coordinates. The question addressed here is how the solutions to various inference problems depend on the model parameters. The proposed answer is expressed in terms of tropical algebraic geometry.
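To make the abstract's two central ideas concrete, here is a minimal sketch assuming a hypothetical two-state, two-symbol HMM with made-up parameters (`INIT`, `TRANS`, `EMIT`); the helper names `forward`, `expand_paths` and `tropicalize` are illustrative and do not come from the paper. The forward recursion is one instance of the sum-product algorithm. Written over a generic semiring, the same code evaluates the coordinate p(obs) with (+, ×). It also shows that this coordinate equals a polynomial in the parameters with one monomial per hidden path. Switching to (min, +) on weights −log p gives the Viterbi score, which is the tropical counterpart of that polynomial.

```python
import itertools
import math
import operator
from functools import reduce

# Hypothetical toy HMM (illustrative parameters only).
INIT = [0.5, 0.5]                 # P(first hidden state)
TRANS = [[0.9, 0.1], [0.2, 0.8]]  # TRANS[i][j] = P(next state j | state i)
EMIT = [[0.7, 0.3], [0.1, 0.9]]   # EMIT[i][k] = P(symbol k | state i)


def forward(obs, add, mul, init, trans, emit):
    """Sum-product (forward) recursion over the semiring given by (add, mul)."""
    n = len(init)
    alpha = [mul(init[i], emit[i][obs[0]]) for i in range(n)]
    for k in obs[1:]:
        alpha = [
            mul(reduce(add, (mul(alpha[i], trans[i][j]) for i in range(n))), emit[j][k])
            for j in range(n)
        ]
    return reduce(add, alpha)


def expand_paths(obs, add, mul, init, trans, emit):
    """Evaluate the same coordinate naively: one monomial per hidden path."""
    n = len(init)
    terms = []
    for path in itertools.product(range(n), repeat=len(obs)):
        term = mul(init[path[0]], emit[path[0]][obs[0]])
        for t in range(1, len(obs)):
            term = mul(term, mul(trans[path[t - 1]][path[t]], emit[path[t]][obs[t]]))
        terms.append(term)
    return reduce(add, terms)


def tropicalize(x):
    """Replace every parameter p by the weight -log(p), keeping the nesting."""
    return [tropicalize(v) for v in x] if isinstance(x, list) else -math.log(x)


obs = [0, 1, 1, 0]
# Ordinary semiring (+, *): the probability of the observed sequence.
prob = forward(obs, operator.add, operator.mul, INIT, TRANS, EMIT)
# Tropical semiring (min, +) on -log weights: the Viterbi (best-path) score.
W = [tropicalize(m) for m in (INIT, TRANS, EMIT)]
score = forward(obs, min, operator.add, *W)
print(prob, math.exp(-score))  # the best single path never exceeds the total
```

The design choice worth noting is that `forward` never mentions probabilities or logarithms: only the pair `(add, mul)` changes between the two runs. This mirrors, in a simplified form, the idea that the same polynomial can be evaluated classically or tropically.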
