A Dictionary-Based Approach for Gene Annotation
- 1 October 1999
- journal article
- research article
- Published by Mary Ann Liebert Inc in Journal of Computational Biology
- Vol. 6 (3-4) , 419-430
- https://doi.org/10.1089/106652799318364
Abstract
This paper describes a fast and fully automated dictionary-based approach to gene annotation and exon prediction. Two dictionaries are constructed, one from the nonredundant protein OWL database and the other from the dbEST database. These dictionaries are used to obtain O(1) time lookups of tuples in the dictionaries (4-tuples for the OWL database and 11-tuples for the dbEST database). These tuples can be used to rapidly find, at every position in an input sequence, the longest match to the database sequences. Such matches provide useful information for locating segments shared between exons, identifying alternative splice sites, and collecting frequency data on long tuples for statistical purposes. These dictionaries also provide the basis for both homology determination and statistical approaches to exon prediction.
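The lookup-and-extend idea in the abstract can be illustrated with a minimal sketch. It assumes that the 4 and 11 in the abstract denote tuple lengths, uses an ordinary Python hash map as the dictionary, and extends each hit by direct character comparison. The function names `build_dictionary` and `longest_matches` and the toy sequences are hypothetical; the paper's actual data structures and extension procedure are not given in the abstract and may differ.

```python
from collections import defaultdict


def build_dictionary(sequences, k):
    """Map every k-tuple to the (sequence index, offset) pairs where it occurs."""
    index = defaultdict(list)
    for seq_id, seq in enumerate(sequences):
        for offset in range(len(seq) - k + 1):
            index[seq[offset:offset + k]].append((seq_id, offset))
    return index


def longest_matches(query, sequences, index, k):
    """For each query position, return the length of the longest exact match
    (at least k long) that starts there, or 0 if its k-tuple is not indexed."""
    result = []
    for pos in range(len(query)):
        best = 0
        # Expected O(1) hash lookup of the k-tuple starting at this position.
        # Near the end of the query the slice is shorter than k and never matches.
        for seq_id, offset in index.get(query[pos:pos + k], ()):
            seq = sequences[seq_id]
            length = k
            # Extend the seed to the right while both sequences agree.
            while (pos + length < len(query)
                   and offset + length < len(seq)
                   and query[pos + length] == seq[offset + length]):
                length += 1
            best = max(best, length)
        result.append(best)
    return result


# Toy nucleotide "database" and query (illustrative data only).
database = ["ACGTACGTTGCA", "TTGCAAGGCT"]
k = 4  # the abstract uses 4 for protein (OWL) and 11 for EST (dbEST) data
index = build_dictionary(database, k)
match_lengths = longest_matches("GTTGCAAG", database, index, k)
print(match_lengths)
```

Each seed lookup costs expected constant time, while the extension step is proportional to the match length and the number of hits, so a production implementation would likely bound or share that work across positions.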