Abstract
The application of Cheminformatics to High-Throughput Screening (HTS) data requires the use of robust modeling methods. Robust models must be able to accommodate false positive and false negative data yet retain good explanatory and predictive power. Recursive Partitioning has been shown to accommodate false positive and false negative data in the model building phase but suffers from a high false positive rate in the prediction phase, especially with sparse data sets such as HTS data. Here, we introduce Consensus Selection as a procedure to decrease the false positive rate of Recursive Partitioning-based models. Consensus Selection by Multiple Recursion Trees can increase the hit rate of a High-Throughput Screen in excess of 30-fold while significantly reducing the false positive rate relative to single Recursion Tree models.