Using background knowledge to improve inductive learning of DNA sequences

Abstract
Successful inductive learning requires that training data be expressed in a form in which underlying regularities can be recognized by the learning system. Unfortunately, many applications of inductive learning, especially in the domain of molecular biology, have assumed that data are provided in a form already suitable for learning, whether or not such an assumption is actually justified. This paper describes the use of background knowledge of molecular biology to re-express data in a form more appropriate for learning. Our results show dramatic improvements in classification accuracy for two very different classes of DNA sequences using traditional “off-the-shelf” decision-tree and neural-network inductive-learning methods.