Applications of Generalized Pair Hidden Markov Models to Alignment and Gene Finding Problems
- 1 April 2002
- journal article
- research article
- Published by Mary Ann Liebert Inc in Journal of Computational Biology
- Vol. 9 (2) , 389-399
- https://doi.org/10.1089/10665270252935520
Abstract
Hidden Markov models (HMMs) have been successfully applied to a variety of problems in molecular biology, ranging from alignment problems to gene finding and annotation. Alignment problems can be solved with pair HMMs, while gene finding programs rely on generalized HMMs in order to model exon lengths. In this paper, we introduce the generalized pair HMM (GPHMM), which is an extension of both pair and generalized HMMs. We show how GPHMMs, in conjunction with approximate alignments, can be used for cross-species gene finding and describe applications to DNA–cDNA and DNA–protein alignment. GPHMMs provide a unifying and probabilistically sound theory for modeling these problems.
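The abstract's remark that generalized HMMs are needed to model exon lengths can be illustrated with a minimal sketch. In a plain HMM, a state that loops back to itself with a fixed probability implies a geometric length distribution, which always peaks at length 1. A generalized HMM state instead draws the segment length from an explicit distribution. The function names and all numeric values below are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import random


def geometric_length_pmf(stay_prob, max_len):
    """Length distribution implied by a plain HMM state with a self-loop.

    P(L = k) = stay_prob**(k - 1) * (1 - stay_prob) for k = 1..max_len.
    """
    return [stay_prob ** (k - 1) * (1.0 - stay_prob) for k in range(1, max_len + 1)]


def sample_segment_length(length_pmf, rng):
    """Generalized HMM state: draw the segment length from an explicit pmf.

    length_pmf[i] is the probability of a segment of length i + 1.
    """
    lengths = list(range(1, len(length_pmf) + 1))
    return rng.choices(lengths, weights=length_pmf, k=1)[0]


# Plain HMM state: the implied length distribution is geometric.
implied = geometric_length_pmf(stay_prob=0.9, max_len=50)

# Generalized state: an explicit, made-up length distribution peaking at 4.
explicit = [0.0, 0.05, 0.15, 0.40, 0.25, 0.15]

rng = random.Random(0)  # fixed seed so the draw is reproducible
draw = sample_segment_length(explicit, rng)
```

The geometric distribution cannot place its mode anywhere other than length 1, whereas the explicit distribution can take any shape, which is the property that makes generalized states suitable for features with a characteristic length.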