Cluster Analysis as Selection and Dereplication Tool for the Identification of New Natural Compounds from Large Sample Sets

Abstract
Cluster analysis of gas-chromatographic (GC) data from ca. 500 bacterial isolates was used as an aid in the detection and identification of new natural compounds. This approach reduces the number of GC/MS analyses required (dereplication) and at the same time improves the selection of samples with a high probability of containing unknown natural products. Lipophilic bacterial extracts were derivatized and analyzed by GC under standardized conditions. A program was developed to convert the chromatographic data into a two-dimensional matrix. Based on the results of hierarchical cluster analysis, samples were selected for further investigation by GC/MS and NMR. This approach avoided unnecessary analysis of similar samples. By this method, the unusual oligoprenylsesquiterpenes 1 and 2 as well as the new aromatic amides 7 and 8 were identified.
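
The workflow summarized above (chromatograms converted to a two-dimensional matrix, followed by hierarchical clustering and sample selection) can be illustrated with a minimal sketch. It is not the program described in the abstract. The binning of peaks by retention time, the normalization to relative peak areas, the Euclidean distance, the average linkage, the distance threshold, and all sample names and peak values are assumptions chosen only for illustration.

```python
from itertools import combinations


def peaks_to_matrix(samples, rt_min, rt_max, bin_width):
    """Turn per-sample peak lists [(retention_time, area), ...] into a
    samples x retention-time-bins matrix of relative peak areas.
    Binning by retention time is an assumption of this sketch."""
    n_bins = int((rt_max - rt_min) / bin_width) + 1
    names = sorted(samples)
    matrix = []
    for name in names:
        row = [0.0] * n_bins
        for rt, area in samples[name]:
            if rt_min <= rt <= rt_max:
                row[int((rt - rt_min) / bin_width)] += area
        total = sum(row) or 1.0  # avoid division by zero for empty samples
        matrix.append([value / total for value in row])
    return names, matrix


def euclidean(a, b):
    """Euclidean distance between two matrix rows (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def hierarchical_clusters(matrix, threshold):
    """Average-linkage agglomerative clustering. Merging stops once the
    closest pair of clusters is farther apart than `threshold`."""
    clusters = [[i] for i in range(len(matrix))]

    def linkage(c1, c2):
        pair_sum = sum(euclidean(matrix[i], matrix[j]) for i in c1 for j in c2)
        return pair_sum / (len(c1) * len(c2))

    while len(clusters) > 1:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        if linkage(clusters[a], clusters[b]) > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because a < b
    return clusters


# Hypothetical peak lists: retention time in minutes, peak area in
# arbitrary units. These values are invented for the example.
samples = {
    "isolate_A": [(10.2, 50.0), (15.7, 50.0)],
    "isolate_B": [(10.3, 48.0), (15.6, 52.0)],
    "isolate_C": [(22.1, 100.0)],
}

names, matrix = peaks_to_matrix(samples, rt_min=0.0, rt_max=30.0, bin_width=0.5)
groups = [[names[i] for i in cluster]
          for cluster in hierarchical_clusters(matrix, threshold=0.2)]

# One representative per cluster would be forwarded to GC/MS; the
# remaining members of the cluster are treated as replicates.
representatives = [group[0] for group in groups]
print(groups)
print(representatives)
```

In this toy data set, isolates A and B have nearly identical peak patterns and fall into one cluster, so only one of them would need a GC/MS run, while isolate C forms its own cluster and is selected separately.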