A Dictionary-Based Approach for Gene Annotation

1 October 1999

journal article
research article
Published by Mary Ann Liebert, Inc. in Journal of Computational Biology

Vol. 6 (3-4), 419-430
https://doi.org/10.1089/106652799318364

Abstract

This paper describes a fast, fully automated, dictionary-based approach to gene annotation and exon prediction. Two dictionaries are constructed, one from the nonredundant OWL protein database and the other from the dbEST database. The dictionaries support O(1)-time lookups of fixed-length tuples: 4-tuples for the OWL database and 11-tuples for the dbEST database. With these lookups, the longest match between each position of an input sequence and the database sequences can be found rapidly. Such matches help locate segments shared between exons and alternative splice sites, and they yield frequency data on long tuples for statistical analysis. The dictionaries also provide the basis for both homology determination and statistical approaches to exon prediction.
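The tuple-dictionary lookup and longest-match search described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `build_dictionary` and `longest_matches` are hypothetical, a Python `dict` stands in for the paper's dictionary (so lookups are expected, not worst-case, O(1)), and each tuple hit is extended by a simple left-to-right scan. The toy DNA sequences and k = 4 are chosen only for readability; the paper uses 4-tuples over OWL protein sequences and 11-tuples over dbEST sequences.

```python
from collections import defaultdict


def build_dictionary(database, k):
    """Hypothetical helper: map every k-tuple to the (sequence index, offset)
    pairs where it occurs in the database sequences."""
    index = defaultdict(list)
    for seq_id, seq in enumerate(database):
        for offset in range(len(seq) - k + 1):
            index[seq[offset:offset + k]].append((seq_id, offset))
    return index


def longest_matches(query, database, index, k):
    """Hypothetical helper: for each query position that starts a k-tuple,
    return (length, seq_id, offset) of the longest exact match beginning
    there, or None if that k-tuple never occurs in the database."""
    results = []
    for i in range(len(query) - k + 1):
        best = None
        # .get() avoids inserting empty lists into the defaultdict
        for seq_id, offset in index.get(query[i:i + k], []):
            target = database[seq_id]
            length = k  # the seed tuple itself already matches
            # Extend the match rightwards while both sequences agree
            while (i + length < len(query) and offset + length < len(target)
                   and query[i + length] == target[offset + length]):
                length += 1
            if best is None or length > best[0]:
                best = (length, seq_id, offset)
        results.append(best)
    return results


if __name__ == "__main__":
    database = ["ACGTACGTTT", "GGACGTACGA"]  # toy sequences, not real data
    k = 4
    index = build_dictionary(database, k)
    print(longest_matches("ACGTACGA", database, index, k))
    # prints [(8, 1, 2), (7, 1, 3), (6, 1, 4), (5, 1, 5), (4, 1, 6)]
```

Because every candidate match is seeded by a dictionary hit, a query position can only report a match of length at least k, and positions whose tuple is absent report `None`. At database scale, frequent tuples would produce long occurrence lists, so this exhaustive extension is suitable only for illustration.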
