Using Database Matches with HMMGene for Automated Gene Detection in Drosophila
Open Access
- 1 April 2000
- journal article
- research article
- Published by Cold Spring Harbor Laboratory in Genome Research
- Vol. 10 (4), 523-528
- https://doi.org/10.1101/gr.10.4.523
Abstract
The application of the gene finder HMMGene to the Adh region of Drosophila melanogaster is described, and the prediction results are analyzed. HMMGene is based on a probabilistic model called a hidden Markov model, and the probabilistic framework facilitates the inclusion of database matches of varying degrees of certainty. It is shown that database matches clearly improve the performance of the gene finder. For instance, the sensitivity for coding exons predicted with both ends correct grows from 62% to 70% on a high-quality test set when matches to proteins, cDNAs, repeats, and transposons are included. When ESTs are used, the specificity drops more than the sensitivity increases. This is due to the high noise level in EST matches; the reasons for this noise and possible ways to reduce it are discussed in more detail.
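The exon-level figures quoted in the abstract can be illustrated with a minimal sketch. It assumes the convention common in gene-finding evaluation, where sensitivity is the fraction of annotated exons predicted with both ends correct and specificity is the fraction of predicted exons that match an annotated exon exactly. The function name, the (start, end) tuple representation, and the coordinates are hypothetical and are not taken from the paper; strand and reading frame are ignored for brevity.

```python
from typing import Iterable, Tuple

# Hypothetical representation: an exon is a (start, end) coordinate pair.
Exon = Tuple[int, int]


def exon_level_accuracy(annotated: Iterable[Exon],
                        predicted: Iterable[Exon]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) at the exact-exon level.

    An exon counts as correct only if both its start and its end
    coincide with an annotated exon.
    """
    annotated_set = set(annotated)
    predicted_set = set(predicted)
    exact = annotated_set & predicted_set  # both ends correct

    # Guard against empty inputs to avoid division by zero.
    sensitivity = len(exact) / len(annotated_set) if annotated_set else 0.0
    specificity = len(exact) / len(predicted_set) if predicted_set else 0.0
    return sensitivity, specificity


# Made-up coordinates, for illustration only.
annotated = [(100, 200), (300, 450), (600, 720), (900, 1000)]
predicted = [(100, 200), (300, 440), (600, 720), (1100, 1200), (900, 1000)]

sens, spec = exon_level_accuracy(annotated, predicted)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

In this toy case three of the four annotated exons are recovered exactly, while two of the five predictions are wrong. Adding noisy evidence tends to produce this pattern: extra predictions can raise sensitivity slightly while lowering specificity by a larger amount, which is the trade-off the abstract reports for EST matches.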