Precision Medicine & Drug Discovery
Author: Eden Romm
Coauthor(s): Igor F. Tsigelny, Valentina L. Kouznetsova
Status: Work In Progress
Abstract Winner - Precision Medicine & Drug Discovery
Machine-learning models for selection of drug candidates for treatment of Alzheimer’s disease
We analyzed a set of more than one hundred known beta-secretase inhibitors that have been studied as possible drugs for the treatment of Alzheimer’s disease. Using the PaDEL descriptor-calculation program, we computed a set of molecular descriptors representing the structural, physical, and chemical characteristics of these inhibitors. From the descriptors of these known inhibitors, we developed machine-learning models for predicting novel beta-secretase inhibitors. Three machine-learning algorithms, Multilayer Perceptron (MLP), LogitBoost (LB), and Decision Table (DT), were applied to the molecular descriptors of active and inactive compounds and evaluated by 5-fold cross-validation. The models were then tested on a separate data set of active and inactive molecules that had not been used for training. In 5-fold cross-validation, the highest accuracy, 93.75%, was achieved by LB; the best MLP and DT models built under the same conditions reached 91.48% and 86.36%, respectively. When the first data set was used for training and the new data set for testing, the highest accuracy, 88.89%, was achieved by MLP, while DT and LB peaked at 77.78% and 85.86%, respectively. These results suggest that MLP models, although slightly less accurate in the original 5-fold cross-validation, transfer better to new data sets. This consistency is most likely due to the complex, nonlinear decision boundaries that neural networks such as MLP can learn. The molecules were then clustered by two separate functions to group the different classes of beta-secretase inhibitors. The MLP model’s accuracy for each cluster was assessed by examining how often each molecule was misclassified by the four most accurate models.
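The evaluation protocol above (5-fold cross-validation on one data set, then training on that whole set and scoring a separate set) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the descriptor vectors and labels are toy values, and `nearest_centroid_fit` is a hypothetical stand-in classifier used only to keep the example dependency-free; it is not MLP, LogitBoost, or Decision Table. Cross-validation accuracy here is pooled over all folds (total correct held-out predictions divided by the number of molecules).

```python
import random
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]
Predictor = Callable[[Vector], int]  # returns 1 (active) or 0 (inactive)


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> List[List[int]]:
    """Shuffle indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def nearest_centroid_fit(X: List[Vector], y: List[int]) -> Predictor:
    """Hypothetical stand-in learner: predict the label of the closest class mean."""
    centroids: Dict[int, List[float]] = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x: Vector) -> int:
        # Squared Euclidean distance to each class centroid; pick the nearest.
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )

    return predict


def accuracy(predict: Predictor, X: List[Vector], y: List[int]) -> float:
    """Fraction of molecules whose predicted label matches the true label."""
    return sum(predict(x) == t for x, t in zip(X, y)) / len(y)


def cross_val_accuracy(fit, X: List[Vector], y: List[int], k: int = 5, seed: int = 0) -> float:
    """k-fold cross-validation accuracy, pooled over all held-out predictions."""
    correct = 0
    for fold in k_fold_indices(len(X), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        # Train only on the other k-1 folds, then score the held-out fold.
        predict = fit([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(X[i]) == y[i] for i in fold)
    return correct / len(X)


# Toy data only (two made-up descriptors per molecule); 1 = active, 0 = inactive.
X_train = [[0.1 * i, 0.0] for i in range(5)] + [[10 + 0.1 * i, 10.0] for i in range(5)]
y_train = [1] * 5 + [0] * 5
X_external = [[0.5, 0.5], [9.5, 9.5]]
y_external = [1, 0]

cv_acc = cross_val_accuracy(nearest_centroid_fit, X_train, y_train, k=5)
external_acc = accuracy(nearest_centroid_fit(X_train, y_train), X_external, y_external)
print(f"5-fold CV accuracy: {cv_acc:.2%}, external-set accuracy: {external_acc:.2%}")
```

Swapping in a real learner only requires a `fit` function that returns a `predict` callable. Reporting both numbers side by side is what reveals the kind of gap the abstract describes, where a model that leads in cross-validation is not necessarily the one that generalizes best to an independent set.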
Molecules in the most accurate clusters assigned by the first function, which together contained 90% of the molecules, were classified correctly 87.04% of the time by MLP. The most accurate clusters assigned by the second function accounted for 81% of the molecules, which were classified correctly 93.51% of the time by MLP. The model developed with these methods is currently being used to select possible drug candidates from compound databases.
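The per-cluster analysis can likewise be sketched. Assuming each molecule already carries a cluster label (from either clustering function) and a flag recording whether a given model, such as MLP, classified it correctly, the code below computes each cluster's accuracy, keeps the clusters at or above an accuracy cutoff, and reports the fraction of molecules those clusters cover together with the pooled accuracy within them. The `threshold` cutoff and the function names are hypothetical; the abstract does not state how the "most accurate clusters" were selected, and the data are made up.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple


def per_cluster_accuracy(clusters: List[Hashable], correct: List[bool]) -> Dict[Hashable, float]:
    """Fraction of correctly classified molecules within each cluster."""
    totals: Dict[Hashable, int] = defaultdict(int)
    hits: Dict[Hashable, int] = defaultdict(int)
    for cluster, ok in zip(clusters, correct):
        totals[cluster] += 1
        hits[cluster] += int(ok)
    return {c: hits[c] / totals[c] for c in totals}


def accurate_cluster_summary(
    clusters: List[Hashable], correct: List[bool], threshold: float
) -> Tuple[float, float]:
    """Return (coverage, pooled accuracy) for clusters whose accuracy >= threshold."""
    acc = per_cluster_accuracy(clusters, correct)
    keep = {c for c, a in acc.items() if a >= threshold}
    # Correctness flags of every molecule that falls in a kept cluster.
    selected = [ok for c, ok in zip(clusters, correct) if c in keep]
    coverage = len(selected) / len(clusters)
    pooled = sum(selected) / len(selected) if selected else 0.0
    return coverage, pooled


# Toy example: three clusters with made-up correctness flags.
clusters = ["A"] * 4 + ["B"] * 4 + ["C"] * 2
correct = [True, True, True, False] + [True] * 4 + [False, False]
coverage, pooled = accurate_cluster_summary(clusters, correct, threshold=0.5)
print(f"coverage {coverage:.0%}, accuracy within kept clusters {pooled:.1%}")
```

Raising `threshold` trades coverage for accuracy, which mirrors the trade-off reported for the two clustering functions: 90% of molecules at 87.04% accuracy versus 81% of molecules at 93.51%.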