Multivariate QSAR Analysis of a Skin Sensitization Database

Abstract
There is a regulatory requirement that the potential of a new chemical to cause skin sensitization be assessed. This requirement is presently fulfilled by the use of animal tests. In this study, a database of heterogeneous organic compounds tested in the guinea pig maximization test was subjected to multivariate QSAR analysis. The compounds were described both by whole-molecule parameters and by structural features associated with likely sites of reactivity. Principal component analysis was applied to the data set; although it works reasonably well for reducing the dimensionality of a large data matrix, it proved only moderately useful as a predictive tool when the descriptors were chosen rationally. Stepwise discriminant analysis produced a fourteen-parameter model, twelve of whose parameters were structural features associated with reactivity. However, this model correctly predicts only 82.6% of the compounds after cross-validation. The linear discriminant analysis model tends to predict compounds as non-sensitizers, suggesting that the parameters it incorporates are not wholly suitable for discriminating between the two classes. A further criticism of linear discriminant analysis is that it may be unable to cope with the likely embedded structure of the data. With this in mind, the structural alerts may be better employed in an expert system to identify potential hazard, where they will not suffer from the limitations of a statistical model.
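To make the cross-validated classification rate concrete, the following is a minimal sketch, assuming a two-class Fisher linear discriminant with a pooled within-class covariance, equal prior probabilities (so the decision threshold is the midpoint between the class means), and leave-one-out cross-validation. It is not the study's procedure: it omits the stepwise variable selection, and the two-descriptor data set `X_DEMO`/`Y_DEMO` (0 = non-sensitizer, 1 = sensitizer) is synthetic and hypothetical, not taken from the guinea pig maximization test database. The per-class rates are reported separately because an overall figure can hide a bias toward one class, such as the tendency to predict non-sensitizers noted above.

```python
"""Two-class linear discriminant analysis with leave-one-out cross-validation.

Illustrative sketch only: equal priors are assumed, no stepwise selection is
performed, and X_DEMO / Y_DEMO are synthetic, hypothetical data.
"""
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def mean(rows: Sequence[Vector]) -> Vector:
    """Column-wise mean of a list of descriptor vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def solve(a: List[Vector], b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def fit_lda(x: List[Vector], y: List[int]) -> Tuple[Vector, float]:
    """Return weights w and threshold t; class 1 is predicted when w.x > t."""
    g0 = [xi for xi, yi in zip(x, y) if yi == 0]
    g1 = [xi for xi, yi in zip(x, y) if yi == 1]
    m0, m1 = mean(g0), mean(g1)
    p = len(m0)
    pooled = [[0.0] * p for _ in range(p)]
    for group, mu in ((g0, m0), (g1, m1)):  # within-class scatter
        for row in group:
            d = [row[i] - mu[i] for i in range(p)]
            for i in range(p):
                for j in range(p):
                    pooled[i][j] += d[i] * d[j]
    dof = len(x) - 2
    pooled = [[v / dof for v in row] for row in pooled]
    w = solve(pooled, [m1[i] - m0[i] for i in range(p)])  # w = S^-1 (m1 - m0)
    t = sum(w[i] * (m0[i] + m1[i]) / 2 for i in range(p))  # midpoint threshold
    return w, t


def predict(w: Vector, t: float, row: Vector) -> int:
    return 1 if sum(wi * xi for wi, xi in zip(w, row)) > t else 0


def loo_report(x: List[Vector], y: List[int]) -> Tuple[float, Dict[int, float]]:
    """Refit without compound k, classify compound k; return % correct."""
    preds = []
    for k in range(len(x)):
        w, t = fit_lda(x[:k] + x[k + 1:], y[:k] + y[k + 1:])
        preds.append(predict(w, t, x[k]))
    overall = 100.0 * sum(p == yi for p, yi in zip(preds, y)) / len(y)
    per_class = {}
    for c in (0, 1):
        idx = [i for i, yi in enumerate(y) if yi == c]
        per_class[c] = 100.0 * sum(preds[i] == c for i in idx) / len(idx)
    return overall, per_class


# Hypothetical, well-separated two-descriptor data (not from the study).
X_DEMO = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0], [0.1, 0.2], [-0.2, -0.2],
          [3.0, 3.1], [3.2, 2.9], [2.9, 3.0], [3.1, 3.2], [2.8, 2.8]]
Y_DEMO = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

if __name__ == "__main__":
    overall, per_class = loo_report(X_DEMO, Y_DEMO)
    print(f"LOO correct: {overall:.1f}%  per class: {per_class}")
```

On real descriptor sets, especially binary structural-alert indicators, the pooled covariance can be singular, in which case this plain solver fails and a regularized or library implementation would be needed.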