Finding Genes in DNA with a Hidden Markov Model
- 1 January 1997
- journal article
- research article
- Published by Mary Ann Liebert Inc in Journal of Computational Biology
- Vol. 4 (2), 127-141
- https://doi.org/10.1089/cmb.1997.4.127
Abstract
This study describes a new Hidden Markov Model (HMM) system for segmenting uncharacterized genomic DNA sequences into exons, introns, and intergenic regions. Separate HMM modules were designed and trained for specific regions of DNA: exons, introns, intergenic regions, and splice sites. The models were then tied together to form a biologically feasible topology. The integrated HMM was trained further on a set of eukaryotic DNA sequences and tested by using it to segment a separate set of sequences. The resulting HMM system, which is called VEIL (Viterbi Exon-Intron Locator), obtains an overall accuracy on test data of 92% of total bases correctly labelled, with a correlation coefficient of 0.73. Using the more stringent test of exact exon prediction, VEIL correctly located both ends of 53% of the coding exons, and 49% of the exons it predicts are exactly correct. These results compare favorably to the best previous results for gene structure prediction and demonstrate the benefits of using HMMs for this problem.
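As an illustration of the segmentation step named in the abstract, the following is a minimal sketch of Viterbi decoding over a toy three-state HMM. It is not the VEIL model: the state set, the probabilities in `START`, `TRANS`, and `EMIT`, and the function name `viterbi` are all hypothetical values chosen for the example, and splice-site modules are omitted entirely.

```python
import math

# Hypothetical toy model: three states, hand-picked probabilities.
# The real VEIL topology (with splice-site modules) is not reproduced here.
STATES = ("intergenic", "exon", "intron")

START = {"intergenic": 0.8, "exon": 0.1, "intron": 0.1}
TRANS = {
    "intergenic": {"intergenic": 0.90, "exon": 0.10, "intron": 0.00},
    "exon":       {"intergenic": 0.05, "exon": 0.90, "intron": 0.05},
    "intron":     {"intergenic": 0.00, "exon": 0.10, "intron": 0.90},
}
EMIT = {
    "intergenic": {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30},
    "exon":       {"A": 0.20, "C": 0.30, "G": 0.30, "T": 0.20},
    "intron":     {"A": 0.25, "C": 0.15, "G": 0.15, "T": 0.45},
}


def _log(p):
    """Log of a probability, mapping 0 to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(seq):
    """Return the most probable state label for each base in seq."""
    if not seq:
        return []
    # score[s]: best log-probability of any path ending in state s
    score = {s: _log(START[s]) + _log(EMIT[s][seq[0]]) for s in STATES}
    back = []  # one dict of back-pointers per position after the first
    for base in seq[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            # Best predecessor for state s at this position
            prev = max(STATES, key=lambda p: score[p] + _log(TRANS[p][s]))
            new_score[s] = score[prev] + _log(TRANS[prev][s]) + _log(EMIT[s][base])
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

Calling `viterbi("ACGTTTTTGCGC")` yields one label per base. Working in log space avoids numerical underflow on long sequences, and the zero-probability transitions in `TRANS` stand in for the kind of topology constraint the abstract calls "biologically feasible".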