Abstract
This paper describes a method for selecting a small, highly diverse subset from a large pool of molecules. The method has been employed in the design of combinatorial synthetic libraries for use in high-throughput screening for pharmaceutical lead generation. It computes diversity in terms of the main factors relevant to ligand-protein binding: the three-dimensional arrangement of steric bulk, the three-dimensional arrangement of polar functionalities, and molecular entropy. The method was used to select a set of 20 carboxylates suitable for use as side-chain precursors in a polyamine-based library. Because the method depends on estimates of various physical-chemical parameters involved in ligand-protein binding, experiments examined its sensitivity to these parameters. A comparison of the diversity of randomly and rationally selected side-chain sets suggests that careful design of synthetic combinatorial libraries may increase their effectiveness several-fold.