A linear memory algorithm for Baum-Welch training

Abstract
Background: Baum-Welch training is an expectation-maximisation algorithm for training the emission and transition probabilities of hidden Markov models. Until now, no memory-efficient algorithm for it has been available.

Methods and results: We introduce a linear-space algorithm for Baum-Welch training. For a hidden Markov model with M states, an input sequence of length L, and |T| transition and |E| emission parameters, it requires O(M) memory and O(L(|T|+|E|)|T|) time rather than O(LM) memory and O(L|T|+L(|T|+|E|)) time. For a pair hidden Markov model with input sequences of lengths L_x and L_y, the requirement of O(L_xL_yM) memory and O(L_xL_y|T|+L_xL_y(|T|+|E|)) time is reduced to O(min{L_x,L_y}M) memory and O(L_xL_y(|T|+|E|)|T|) time. For hidden Markov models used on long sequences, for example in gene finding, this is a large memory saving over the best currently available algorithms, which need O(sqrt(L)M) (hidden Markov models) or O(sqrt(L_x)L_yM) (pair hidden Markov models) memory. In addition, our new algorithm is simple in the sense that its implementation requires no sophisticated programming techniques such as checkpointing or recursive functions.

Conclusions: Using our algorithm, Baum-Welch training can now be performed on standard personal computers, even for sophisticated pair hidden Markov models and very long biological sequences.
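The general idea behind linear-memory Baum-Welch can be sketched as follows: instead of storing the full L-by-M table of forward variables and combining it with backward variables, one propagates, alongside the O(M) forward vector, one O(M) vector of expected usage counts per tracked parameter, so that no quantity of size proportional to L is ever kept in memory. The sketch below (our own illustration, not the authors' published code; the toy 2-state HMM, the function name `forward_with_counts`, and all variable names are assumptions) demonstrates this for the transition parameters of a simple hidden Markov model.

```python
def forward_with_counts(A, E, pi, x):
    """Propagate forward variables f[j] = P(x_1..x_t, state_t = j)
    together with cnt[(k, l)][j], the probability-weighted expected
    number of uses of transition k->l on paths ending in state j.
    Memory is O(M) per tracked parameter, independent of len(x)."""
    M = len(pi)
    f = [pi[j] * E[j][x[0]] for j in range(M)]
    # one O(M) count vector per transition parameter; no transition
    # has been used yet at t = 1, so all counts start at zero
    cnt = {(k, l): [0.0] * M for k in range(M) for l in range(M)}
    for t in range(1, len(x)):
        f_new = [E[j][x[t]] * sum(f[i] * A[i][j] for i in range(M))
                 for j in range(M)]
        for (k, l), c in cnt.items():
            # counts accumulated on path prefixes are carried forward...
            c_new = [E[j][x[t]] * sum(c[i] * A[i][j] for i in range(M))
                     for j in range(M)]
            # ...and the transition k->l is used once more at step t
            # exactly on paths that were in k at t-1 and move to l
            c_new[l] += f[k] * A[k][l] * E[l][x[t]]
            cnt[(k, l)] = c_new
        f = f_new
    px = sum(f)  # total sequence likelihood P(x)
    expected = {kl: sum(c) / px for kl, c in cnt.items()}
    return px, expected
```

The expected counts returned at the end are exactly the quantities needed for the re-estimation step of Baum-Welch; emission parameters can be tracked with analogous O(M) count vectors. Note that this illustration favours clarity over speed and omits the scaling needed to avoid numerical underflow on long sequences.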
