Abstract
Clustering makes large databases easier to manage. Clustering by structural similarity distinguished the several chemical classes present in our training set. All clusters showed a correlation of log WS with log Kow and melting point, except EINECS-cluster 1. This cluster contains only chemicals with melting points below room temperature, so its relationship reduces to one between log WS and log Kow. The weak correlation observed for this cluster is probably due to the insufficient number of available screens: such a limited number of screens allows rather dissimilar chemicals to share the same cluster. Using statistical criteria, our approach resulted in three QSARs with reasonably good predictive capabilities, originating from clusters 1639, 3472, and 5830. The models resulting from the smaller clusters 6873, 8154, and 16424 are characterised by high correlation coefficients and describe the cluster itself very well but, judged by our stringent bootstrap criterion, they are close to random. Clusters 6815 and 18083 showed rather low correlations. The models originating from clusters 1639, 3472, and 5830 proved their usefulness in external validation: the log WS values calculated with our QSARs agreed to within 1 log unit with those reported in the literature.
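As a reading aid, the two kinds of relationship mentioned above can be sketched in a generic form. The linear form and the symbols a, b, c, a', c' are assumptions made here for illustration; they are placeholder regression coefficients, not values reported in this work, and MP stands for the melting point.

```latex
% Generic per-cluster model (assumed linear form; coefficients are placeholders)
\log WS = a \,\log K_{OW} + b \, MP + c

% Cluster containing only chemicals that melt below room temperature:
% the melting-point term is dropped (assumed form)
\log WS = a' \,\log K_{OW} + c'
```

Under this sketch, each cluster would have its own set of fitted coefficients.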