Rule-Based Neural Networks for Classification and Probability Estimation
- 1 November 1992
- journal article
- Published by MIT Press in Neural Computation
- Vol. 4 (6), 781-804
- https://doi.org/10.1162/neco.1992.4.6.781
Abstract
In this paper we propose a network architecture that combines a rule-based approach with the neural network paradigm. Our primary motivation is to ensure that the knowledge embodied in the network is explicitly encoded in the form of understandable rules. This enables the network's decision to be understood, and provides an audit trail of how that decision was reached. We use an information-theoretic approach to learn a model of the domain knowledge from examples. This model takes the form of a set of probabilistic conjunctive rules between discrete input evidence variables and output class variables. These rules are then mapped onto the weights and nodes of a feedforward neural network, resulting in a directly specified architecture. The network acts as a parallel Bayesian classifier but, more importantly, can also output posterior probability estimates of the class variables. Empirical tests on a number of data sets show that the rule-based classifier performs comparably with standard neural network classifiers, while possessing unique advantages in terms of knowledge representation and probability estimation.
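The idea of mapping probabilistic conjunctive rules onto network weights can be illustrated with a minimal sketch. It is not taken from the paper: it assumes a single binary class, assumes that rule firings are conditionally independent given the class, and uses a log-odds weight per rule, which may differ from the authors' exact formulation. The names `PRIOR`, `RULES`, `rule_weight`, and `posterior`, the attribute names, and all probabilities are hypothetical.

```python
import math

# Hypothetical prior p(class = malignant); the number is invented for illustration.
PRIOR = 0.35

# Hypothetical conjunctive rules: (conditions, p(malignant | conditions)).
RULES = [
    ({"clump": "thick"}, 0.80),
    ({"clump": "thin", "nuclei": "regular"}, 0.05),
    ({"nuclei": "irregular"}, 0.70),
]


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def rule_weight(p_given_conditions, prior=PRIOR):
    """Assumed link weight: the shift in log-odds when the rule fires."""
    return logit(p_given_conditions) - logit(prior)


def posterior(evidence, rules=RULES, prior=PRIOR):
    """Return (posterior estimate, list of fired rules with their weights).

    The output node starts from a bias equal to the prior log-odds and adds
    the weight of every rule whose conditions all match the evidence.
    """
    activation = logit(prior)
    fired = []
    for conditions, p in rules:
        if all(evidence.get(k) == v for k, v in conditions.items()):
            w = rule_weight(p, prior)
            activation += w
            fired.append((conditions, w))
    # A sigmoid turns the summed log-odds back into a probability.
    return 1.0 / (1.0 + math.exp(-activation)), fired


if __name__ == "__main__":
    p, trail = posterior({"clump": "thick", "nuclei": "irregular"})
    print(f"estimated p(malignant) = {p:.3f}")
    for conditions, w in trail:
        print(f"  fired {conditions} with weight {w:+.3f}")
```

In this sketch the list of fired rules plays the role of the audit trail: each contribution to the output can be traced to a named rule. With no rule firing the estimate falls back to the prior, and with a single rule firing it equals that rule's conditional probability.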