Rule-Based Neural Networks for Classification and Probability Estimation

1 November 1992

journal article
Published by MIT Press in Neural Computation

Vol. 4 (6), 781-804
https://doi.org/10.1162/neco.1992.4.6.781

Abstract

In this paper we propose a network architecture that combines a rule-based approach with the neural network paradigm. Our primary motivation is to ensure that the knowledge embodied in the network is explicitly encoded in the form of understandable rules. This enables the network's decision to be understood, and provides an audit trail of how that decision was arrived at. We utilize an information theoretic approach to learning a model of the domain knowledge from examples. This model takes the form of a set of probabilistic conjunctive rules between discrete input evidence variables and output class variables. These rules are then mapped onto the weights and nodes of a feedforward neural network, resulting in a directly specified architecture. The network acts as a parallel Bayesian classifier, but more importantly, can also output posterior probability estimates of the class variables. Empirical tests on a number of data sets show that the rule-based classifier performs comparably with standard neural network classifiers, while possessing unique advantages in terms of knowledge representation and probability estimation.
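The abstract does not give the exact mapping from rules to weights, so the following is only a minimal illustrative sketch of one way such a network could be specified, not the authors' implementation. It assumes each class unit starts from a log-prior bias and that each fired rule adds a weight-of-evidence term, log(p(class | conditions) / p(class)), to its class unit, with the outputs normalised to give posterior estimates. The priors, rules, variable names (`clump`, `nuclei`), and function names are all hypothetical.

```python
import math

# Hypothetical class priors and rules, invented purely for illustration.
# Each rule: (conjunctive conditions, class label, assumed p(class | conditions)).
PRIORS = {"benign": 0.6, "malignant": 0.4}
RULES = [
    ({"clump": "thick"}, "malignant", 0.8),
    ({"clump": "thin", "nuclei": "normal"}, "benign", 0.9),
]


def rule_weight(cls, p_cls_given_conditions):
    """Assumed weight of evidence: log of rule posterior over class prior."""
    return math.log(p_cls_given_conditions / PRIORS[cls])


def fires(conditions, example):
    """A conjunctive rule fires only if every condition matches the example."""
    return all(example.get(var) == val for var, val in conditions.items())


def posteriors(example):
    """Sum log-prior bias and fired-rule weights per class, then normalise."""
    scores = {c: math.log(p) for c, p in PRIORS.items()}
    for conditions, cls, p in RULES:
        if fires(conditions, example):
            scores[cls] += rule_weight(cls, p)
    z = sum(math.exp(s) for s in scores.values())
    return {c: math.exp(s) / z for c, s in scores.items()}


if __name__ == "__main__":
    print(posteriors({"clump": "thick"}))
```

In this sketch, the set of rules that fired for a given input is the audit trail: each contributes a visible, additive term to one class score, and an input that fires no rule simply returns the priors.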
