A linear memory algorithm for Baum-Welch training

Open Access

19 September 2005

journal article
research article
Published by Springer Nature in BMC Bioinformatics

Vol. 6 (1) , 231
https://doi.org/10.1186/1471-2105-6-231

Abstract

Background: Baum-Welch training is an expectation-maximisation algorithm for training the emission and transition probabilities of hidden Markov models in a fully automated way. It can be employed as long as a training set of annotated sequences is known, and provides a rigorous way to derive parameter values which are guaranteed to be at least locally optimal. For complex hidden Markov models such as pair hidden Markov models and very long training sequences, even the most efficient algorithms for Baum-Welch training are currently too memory-consuming. This has so far effectively prevented the automatic parameter training of hidden Markov models that are currently used for biological sequence analyses.Results: We introduce the first linear space algorithm for Baum-Welch training. For a hidden Markov model withMstates,Tfree transition andEfree emission parameters, and an input sequence of lengthL, our new algorithm requiresO(M) memory andO(LMT_max(T + E)) time for one Baum-Welch iteration, whereT_maxis the maximum number of states that any state is connected to. The most memory efficient algorithm until now was the checkpointing algorithm withO(log(L)M) memory andO(log(L)LMT_max) time requirement. Our novel algorithm thus renders the memory requirement completely independent of the length of the training sequences. More generally, for an n-hidden Markov model and n input sequences of lengthL, the memory requirement ofO(log(L)L^n-1M) is reduced toO(L^n-1M) memory while the running time is changed fromO(log(L)LⁿMT_max+Lⁿ(T+E)) toO(LⁿMT_max(T+E)).An added advantage of our new algorithm is that a reduced time requirement can be traded for an increased memory requirement andvice versa, such that for anyc∈ {1, ..., (T+E)}, a time requirement ofLⁿMT_maxcincurs a memory requirement ofL^n-1M(T+E-c).Conclusion: For the large class of hidden Markov models used for example in gene prediction, whose number of states does not scale with the length of the input sequence, our novel algorithm can thus be both faster and more memory-efficient than any of the existing algorithms.

Keywords

All Related Versions

This publication has 14 references indexed in Scilit:

Gene structure conservation aids similarity based gene prediction
Nucleic Acids Research, 2004
Comparative ab initio prediction of gene structures using pair HMMs
Bioinformatics, 2002
A memory-efficient dynamic programming algorithm for optimal alignment of a sequence to an RNA secondary structure
BMC Bioinformatics, 2002
Initial sequencing and analysis of the human genome
Nature, 2001
Optimal Scaling of Discrete Approximations to Langevin Diffusions
Journal of the Royal Statistical Society Series B: Statistical Methodology, 1998
Reduced space sequence alignment
Bioinformatics, 1997
Hidden Markov Models in Computational Biology
Journal of Molecular Biology, 1994
Optimal alignments in linear space
Bioinformatics, 1988
Optimization by Simulated Annealing
Science, 1983
A linear space algorithm for computing maximal common subsequences
Communications of the ACM, 1975