K. C. Brogle, T. Gund and D. J. Kyle. Pages 103-113 (11)
A process has been developed whereby compound libraries for lead optimization can be synthesized and screened more efficiently using computational tools. In this method, analogues of a lead chemical structure are considered in the form of a virtual library. Fewer than one third of the library members are selected as a training set by clustering the compounds and choosing the centroid of each cluster. A model is then generated from this training set by PLS regression of the experimental assay values on 1D/2D descriptors. The model is applied to the remaining compounds (the test set), whose assay values are predicted and used to establish a rank ordering. As an example, the method was applied to a set of 169 PDE4 inhibitors. A predictive model was obtained using a training set of 52 compounds. Applied to the remaining 117 compounds, this model allowed them to be rank-ordered for synthesis and testing. Selecting the top 33 compounds of the test set recovers 78% of the compounds with the desired activity (hits) while synthesizing only 50% of the library, including the training set. Selecting the top 59 compounds of the test set recovers 97% of the hits from only 67% of the library. The process succeeds by avoiding two principal weaknesses of 2D descriptors: lack of interpretability and lack of extrapolation. Two principal assumptions of QSAR are shown to be unnecessary: removing descriptor redundancy does not improve fit, and a predictive r² greater than 0.5 is not necessary if only rank ordering is desired.
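The modelling and ranking steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a one-component PLS1 model (a single NIPALS step), assumes that a higher predicted value means a more desirable compound, and uses hypothetical compound names and descriptor values. The clustering step that picks the training set is omitted, since the abstract does not specify the clustering method. The helper `library_fraction` simply reproduces the bookkeeping behind the quoted library percentages.

```python
from math import sqrt


def fit_pls1(X, y):
    """Fit a one-component PLS1 model on mean-centered data (assumed setup)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: direction of maximum covariance between descriptors and activity.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Scores of the training compounds on the single latent variable.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    # Regress the centered activity on the scores.
    q = sum(t[i] * yc[i] for i in range(n)) / sum(v * v for v in t)
    return x_mean, y_mean, w, q


def predict(model, x):
    """Predict the assay value of one compound from its descriptor vector."""
    x_mean, y_mean, w, q = model
    score = sum((x[j] - x_mean[j]) * w[j] for j in range(len(x)))
    return y_mean + q * score


def rank_test_set(model, test):
    """Return test-set names ordered from highest to lowest predicted value."""
    return sorted(test, key=lambda name: predict(model, test[name]), reverse=True)


def library_fraction(n_train, n_selected, n_total):
    """Fraction of the library synthesized: training set plus selected test compounds."""
    return (n_train + n_selected) / n_total


# Hypothetical training data: two descriptors per compound, measured activity.
train_X = [[1.0, 0.0], [2.0, 1.0], [3.0, 1.0], [4.0, 3.0]]
train_y = [5.1, 6.0, 6.8, 8.2]
# Hypothetical test set: compounds not yet synthesized.
test = {"cpd_A": [1.5, 0.5], "cpd_B": [3.5, 2.0], "cpd_C": [2.5, 1.0]}

model = fit_pls1(train_X, train_y)
ranking = rank_test_set(model, test)
print(ranking)  # ['cpd_B', 'cpd_C', 'cpd_A']

# Bookkeeping for the PDE4 example: 52 training compounds plus the top 33 of 169.
print(round(library_fraction(52, 33, 169), 2))  # 0.5
```

Only the order of the predictions is used here, not their absolute values, which is why a model with modest predictive accuracy can still be useful for prioritizing synthesis. With the top 59 test compounds, the same bookkeeping gives roughly two thirds of the library.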
1D/2D descriptors, QSAR, Scripting, dithiothreitol, binary fingerprints
Purdue Pharma, L.P., Department of Computational, Combinatorial and Medicinal Chemistry, 6 Cedar Brook Drive, Cranbury, NJ 08512, USA.