Bayesian Restoration of a Hidden Markov Chain with Applications to DNA Sequencing
- 1 January 1999
- journal article
- Published by Mary Ann Liebert Inc in Journal of Computational Biology
- Vol. 6 (2) , 261-277
- https://doi.org/10.1089/cmb.1999.6.261
Abstract
Hidden Markov models (HMMs) are a class of stochastic models that have proven to be powerful tools for the analysis of molecular sequence data. A hidden Markov model can be viewed as a black box that generates sequences of observations. The unobservable internal state of the box is stochastic and is determined by a finite-state Markov chain. The observable output is stochastic with distribution determined by the state of the hidden Markov chain. We present a Bayesian solution to the problem of restoring the sequence of states visited by the hidden Markov chain from a given sequence of observed outputs. Our approach is based on a Monte Carlo Markov chain algorithm that allows us to draw samples from the full posterior distribution of the hidden Markov chain paths. The problem of estimating the probability of individual paths and the associated Monte Carlo error of these estimates is addressed. The method is illustrated by considering a problem of DNA sequence multiple alignment. The special structure for the hidden Markov model used in the sequence alignment problem is considered in detail. In conclusion, we discuss certain interesting aspects of biological sequence alignments that become accessible through the Bayesian approach to HMM restoration.
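As a rough illustration of what restoring hidden paths from a posterior distribution can look like, the sketch below draws state paths for a toy two-state HMM and estimates the probability of each path together with a Monte Carlo standard error. It is not the authors' Monte Carlo Markov chain sampler: it assumes the model parameters are known, which allows exact, independent draws by forward filtering and backward sampling. The names `INIT`, `TRANS`, `EMIT`, and every numeric value are hypothetical choices made for this example only.

```python
import random
from collections import Counter
from math import sqrt

# Hypothetical two-state toy model; all numbers are illustrative assumptions.
SYMBOLS = "ACGT"
INIT = [0.5, 0.5]                    # P(first hidden state)
TRANS = [[0.9, 0.1],                 # TRANS[i][j] = P(next state j | state i)
         [0.2, 0.8]]
EMIT = [[0.7, 0.1, 0.1, 0.1],        # EMIT[i][k] = P(symbol k | state i)
        [0.1, 0.1, 0.1, 0.7]]
N_STATES = len(INIT)


def forward(obs):
    """Return normalized forward (filtering) probabilities per position."""
    alphas = []
    prev = None
    for t, sym in enumerate(obs):
        k = SYMBOLS.index(sym)
        if t == 0:
            a = [INIT[j] * EMIT[j][k] for j in range(N_STATES)]
        else:
            a = [
                sum(prev[i] * TRANS[i][j] for i in range(N_STATES)) * EMIT[j][k]
                for j in range(N_STATES)
            ]
        total = sum(a)
        prev = [x / total for x in a]  # normalize to avoid underflow
        alphas.append(prev)
    return alphas


def sample_path(obs, alphas, rng):
    """Draw one hidden path from P(path | obs) by backward sampling."""
    path = [0] * len(obs)
    path[-1] = rng.choices(range(N_STATES), weights=alphas[-1])[0]
    for t in range(len(obs) - 2, -1, -1):
        # P(state_t = i | state_{t+1}, obs) is proportional to alpha_t(i) * TRANS[i][state_{t+1}]
        weights = [alphas[t][i] * TRANS[i][path[t + 1]] for i in range(N_STATES)]
        path[t] = rng.choices(range(N_STATES), weights=weights)[0]
    return tuple(path)


def estimate_path_probabilities(obs, n_draws=2000, seed=0):
    """Map each sampled path to (relative frequency, binomial standard error)."""
    rng = random.Random(seed)
    alphas = forward(obs)
    counts = Counter(sample_path(obs, alphas, rng) for _ in range(n_draws))
    estimates = {}
    for path, count in counts.items():
        p_hat = count / n_draws
        estimates[path] = (p_hat, sqrt(p_hat * (1.0 - p_hat) / n_draws))
    return estimates


if __name__ == "__main__":
    results = estimate_path_probabilities("AAATTT")
    for path, (p_hat, se) in sorted(results.items(), key=lambda kv: -kv[1][0])[:3]:
        print(path, round(p_hat, 3), "+/-", round(se, 3))
```

The binomial standard error used here is only valid because the draws are independent. Samples produced by a Markov chain sampler are generally correlated, so an error estimate for such output would need to account for that dependence.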