Neural Network Classification of Mutagens Using Structural Fragment Data

Abstract
A neural network was applied to a large, structurally heterogeneous data set of mutagens and nonmutagens to investigate structure-property relationships. Substructural data comprising a total of 1280 fragments were used as inputs. Training of the back-propagation networks was directed by an algorithm that selected an optimal subset of fragments in order to maximize their discriminating power and to produce a network with good predictive ability. The system comprised three programs: the first used a keyfile of 100 fragments to generate training and test files; the second comprised the network itself and a procedure for ranking the effectiveness of these fragments; and the third randomly replaced the lowest-ranked fragments. This cycle was then repeated. After running on a 386/33 PC, several networks produced approximately 11% failures in the test set and 6% in the training set. By simplifying the output of the hidden layer, it was possible to describe the hidden-layer states in terms of clusters of mutagens and nonmutagens. Some of these clusters were structurally homogeneous and contained known mutagenic and nonmutagenic structural classes. This analysis provided a useful means of demonstrating how the network was classifying the data.
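
The select, rank, and replace cycle described above can be illustrated with a minimal Python sketch. It is not the original implementation: the function names (rank_fragments, replace_lowest, select_fragments), the number of fragments replaced per cycle, and the number of cycles are all hypothetical. In particular, the ranking step here is a simple stand-in that scores each fragment by the difference in its occurrence frequency between mutagens and nonmutagens, whereas the system described above derived its ranking from the trained back-propagation network.

```python
import random


def rank_fragments(keyfile, X, y):
    """Score each keyfile fragment (higher = more discriminating).

    ASSUMPTION: stand-in for the network-based ranking. The score is
    |P(fragment | mutagen) - P(fragment | nonmutagen)| over the rows of X,
    where X[i][f] is 1 if compound i contains fragment f and y[i] is 1
    for a mutagen, 0 for a nonmutagen.
    """
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    scores = {}
    for f in keyfile:
        p = sum(row[f] for row in pos) / max(len(pos), 1)
        q = sum(row[f] for row in neg) / max(len(neg), 1)
        scores[f] = abs(p - q)
    return scores


def replace_lowest(keyfile, scores, n_total, n_replace, rng):
    """Keep the best-ranked fragments; swap the rest for random unused ones."""
    ranked = sorted(keyfile, key=lambda f: scores[f], reverse=True)
    kept = ranked[:len(ranked) - n_replace]
    in_use = set(keyfile)
    unused = [f for f in range(n_total) if f not in in_use]
    return kept + rng.sample(unused, n_replace)


def select_fragments(X, y, key_size=100, n_replace=10, n_cycles=20, seed=0):
    """Repeat the rank/replace cycle; return the final keyfile and its scores.

    ASSUMPTION: n_replace and n_cycles are illustrative values only.
    """
    rng = random.Random(seed)
    n_total = len(X[0])  # size of the full fragment pool (1280 in the study)
    keyfile = rng.sample(range(n_total), key_size)
    for cycle in range(n_cycles):
        scores = rank_fragments(keyfile, X, y)
        if cycle < n_cycles - 1:  # leave the final, ranked keyfile untouched
            keyfile = replace_lowest(keyfile, scores, n_total, n_replace, rng)
    return keyfile, scores
```

In a fuller version, rank_fragments would train a network on the training file generated from the current keyfile and derive the fragment ranking from it, and the test-set failure rate would be monitored across cycles.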