Using background knowledge to improve inductive learning of DNA sequences

17 December 2002

proceedings article
Published by Institute of Electrical and Electronics Engineers (IEEE)

p. 351-357
https://doi.org/10.1109/caia.1994.323654

Abstract

Successful inductive learning requires that training data be expressed in a form where underlying regularities can be recognized by the learning system. Unfortunately, many applications of inductive learning, especially in the domain of molecular biology, have assumed that data are provided in a form already suitable for learning, whether or not such an assumption is actually justified. This paper describes the use of background knowledge of molecular biology to re-express data into a form more appropriate for learning. Our results show dramatic improvements in classification accuracy for two very different classes of DNA sequences using traditional “off-the-shelf” decision-tree and neural-network inductive-learning methods.
