Pattern recognition study of QSAR substituent descriptors

Abstract
Values of 74 descriptors used in QSAR studies were compiled for 59 common substituents. The resulting data matrix was analysed by a variety of multivariate techniques. Linear regression confirmed that lipophilicity can be factorized into two terms, one related to molecular bulk and the other to polarity. Principal component analysis (PCA) of the parameters revealed 5 significant principal components and a grouping of lipophilic, steric and electronic parameters. The loadings of the parameters on these 5 principal components were also explored. The classification of substituents by cluster analysis (CA) proved rather disappointing. In contrast, the SIMCA method classified substituents of increasing bulk into 5 groups of increasing polarity.