The aim of the study is to develop a methodology for the automatic evaluation of ligand-receptor complexes using machine learning methods. Docking, one of the most important tasks in chem- and bioinformatics, requires further analysis of its results, because the values of existing scoring functions correlate poorly with the results of in vitro experiments. Such analysis can be performed by visual inspection; however, this is very time-consuming and subjective, especially for large numbers of compounds. The Project therefore responds to the need to automate the analysis of docking results while ensuring the high quality of the obtained results, that is, the highest possible efficiency in predicting the results of biological experiments.
The work will be divided into two main parts: (1) preparation and execution of the docking procedure, and (2) analysis of the docking results using machine learning and a specially prepared representation of the obtained ligand-receptor complexes. All experiments will be performed for the serotonin receptors 5-HT6 and 5-HT7; however, the tool will be designed so that it can be applied to any target. Owing to the lack of crystallographic data on the 5-HT6 and 5-HT7 receptors, the docking procedure will require the construction of homology models of these receptors. Eight different GPCRs will be used as templates for homology modelling. For each template, the best model will be selected according to its enrichment factor (EF), and the 5 models with the highest EF will be chosen for further studies. Compounds with known activity towards the 5-HT6 and 5-HT7 receptors (molecules with experimentally confirmed activity or inactivity towards these proteins, fetched from the ChEMBL database) will be docked into the binding sites of these receptors, together with assumed inactives selected from the ZINC database according to the DUD (Directory of Useful Decoys) approach.
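The model-selection step can be illustrated with a short sketch of the enrichment factor, EF = (actives in the top fraction / compounds in the top fraction) / (all actives / all compounds). The text does not state the fraction of the ranked list at which EF is measured, so `fraction=0.01` (the top 1%) is an assumption, as is the convention that a lower docking score means a better pose. The function names `enrichment_factor` and `select_models` and the input layout are hypothetical.

```python
def enrichment_factor(scores, labels, fraction=0.01):
    """Enrichment factor in the top `fraction` of a ranked screening list.

    EF = (actives in top fraction / compounds in top fraction)
         / (all actives / all compounds)
    Assumption: lower docking score = better pose.
    labels: 1 = active, 0 = inactive or decoy.
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and of equal length")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n_total = len(scores)
    n_actives = sum(labels)
    if n_actives == 0:
        raise ValueError("at least one active compound is required")
    # Always inspect at least one compound, even for very small lists.
    n_sampled = max(1, round(fraction * n_total))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0])
    hits = sum(label for _, label in ranked[:n_sampled])
    return (hits / n_sampled) / (n_actives / n_total)


def select_models(models_by_template, top_n=5, fraction=0.01):
    """Keep the best model per template, then the top_n of those by EF.

    models_by_template: {template: {model_id: (scores, labels)}}
    Returns a list of (template, model_id, ef) sorted by decreasing EF.
    """
    best = []
    for template, models in models_by_template.items():
        efs = {
            model_id: enrichment_factor(scores, labs, fraction)
            for model_id, (scores, labs) in models.items()
        }
        best_model = max(efs, key=efs.get)
        best.append((template, best_model, efs[best_model]))
    best.sort(key=lambda item: item[2], reverse=True)
    return best[:top_n]
```

With eight templates and `top_n=5`, this keeps one model per template and then the five templates whose best model enriches actives most strongly.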
The docking results will be described by means of Structural Interaction Fingerprints (SIFts) and Spectrophores. SIFts encode the interactions between the ligand and each amino acid of the receptor, whereas Spectrophores provide information about the conformation of the docked compound, as they consist of atomic property values calculated in a way that depends on the actual spatial orientation of the molecule.
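The idea behind a SIFt can be sketched as a binary vector with one fixed-length block of bits per binding-site residue. The set of interaction types below is an illustrative assumption modelled on common SIFt variants; the exact set used in the project is not given here, and the residue labels in the example are hypothetical. Detecting the interactions from 3D coordinates is outside the scope of this sketch, so the detected interactions are passed in directly.

```python
# Assumed interaction types per residue (illustrative, not the project's set).
INTERACTION_TYPES = (
    "any", "backbone", "sidechain", "polar", "hydrophobic",
    "hb_acceptor", "hb_donor", "aromatic", "charged",
)


def sift(residues, contacts):
    """Binary SIFt: one block of len(INTERACTION_TYPES) bits per residue.

    residues: ordered residue labels, identical for every complex docked into
              the same receptor model, so bit positions are comparable
    contacts: {residue: set of interaction types detected for this complex};
              residues not listed in `residues` are ignored
    """
    bits = []
    for residue in residues:
        found = set(contacts.get(residue, ()))
        unknown = found - set(INTERACTION_TYPES)
        if unknown:
            raise ValueError(f"unknown interaction types for {residue}: {unknown}")
        if found:
            found.add("any")  # first bit flags any contact with this residue
        bits.extend(1 if kind in found else 0 for kind in INTERACTION_TYPES)
    return bits
```

For example, `sift(["3.32", "5.42"], {"3.32": {"polar", "hb_donor"}})` yields 18 bits, of which only the "any", "polar" and "hb_donor" bits of the first residue are set. Keeping the residue order fixed is what allows fingerprints of different complexes to be fed into the same classifier.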
The representation of the docking results prepared in this way will be the input for machine learning experiments with 5 different classification algorithms, followed by a multi-step analysis of the results. In the first step, each ligand-protein complex will be considered a separate instance, and the evaluation parameters calculated for this experiment will be used to assess each machine learning algorithm. In the next step, for each instance docked into the binding site of a constructed homology model, a consensus of all learning algorithms will be generated as a weighted average, with weights given by the performance of the machine learning methods in the previous step (the better an algorithm performs, the higher its weight). Then, another weighted average will be calculated so that a single final answer is produced for a particular ligand docked into a given receptor model, with the weights being the scoring function values provided by the docking program. The final step is a consensus over the results obtained for receptor models built on different templates, again computed as a weighted average, with the weights being the enrichment factors calculated during homology model generation. After each stage, evaluation parameters (recall, precision and the Matthews correlation coefficient, MCC) will be calculated to verify the effectiveness of the developed methodology.
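The three consensus levels and the evaluation parameters described above can be sketched as follows. This is a minimal illustration under several assumptions not stated in the text: each algorithm outputs a probability of activity in [0, 1]; algorithm weights are non-negative (a negative MCC would have to be clipped to zero before use); docking scores follow a lower-is-better convention, so they are negated to obtain weights and non-negative scores receive zero weight. All function names, keys and algorithm names are hypothetical.

```python
import math
from collections import defaultdict


def weighted_average(values, weights):
    """Weighted mean; weights must be non-negative with a positive sum."""
    values, weights = list(values), list(weights)
    if len(values) != len(weights):
        raise ValueError("values and weights differ in length")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def consensus(predictions, algo_weights, dock_scores, model_ef):
    """Three-level weighted consensus, returning one score per ligand.

    predictions:  {(ligand, model, pose): {algorithm: probability of 'active'}}
    algo_weights: {algorithm: non-negative performance weight from step 1}
    dock_scores:  {(ligand, model, pose): docking score, lower = better (assumed)}
    model_ef:     {model: enrichment factor of that homology model}
    """
    # Level 1: per complex, average over algorithms weighted by performance.
    per_complex = {
        key: weighted_average(
            (preds[alg] for alg in algo_weights), algo_weights.values()
        )
        for key, preds in predictions.items()
    }

    # Level 2: per (ligand, model), average over complexes weighted by the
    # negated docking score (more favourable score -> larger weight).
    by_model = defaultdict(list)
    for (ligand, model, pose), value in per_complex.items():
        weight = max(-dock_scores[(ligand, model, pose)], 0.0)
        by_model[(ligand, model)].append((value, weight))
    per_model = {
        key: weighted_average([v for v, _ in items], [w for _, w in items])
        for key, items in by_model.items()
    }

    # Level 3: per ligand, average over receptor models weighted by their EF.
    by_ligand = defaultdict(list)
    for (ligand, model), value in per_model.items():
        by_ligand[ligand].append((value, model_ef[model]))
    return {
        ligand: weighted_average([v for v, _ in items], [w for _, w in items])
        for ligand, items in by_ligand.items()
    }


def evaluate(y_true, y_pred):
    """Recall, precision and MCC for binary labels (1 = active, 0 = inactive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"recall": recall, "precision": precision, "mcc": mcc}
```

The final per-ligand scores can be turned into active/inactive labels with a threshold (0.5 would be an arbitrary choice, e.g. `{lig: int(s >= 0.5) for lig, s in scores.items()}`) and passed to `evaluate`; the same function can be applied after each level to track how the consensus changes performance.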
Expected impact of the project on the development of science, civilization and society
As docking is an extremely important part of virtual screening campaigns, the developed methodology will provide significant help in selecting compounds for synthesis, purchase or further research. Moreover, as it combines structure- and ligand-based approaches and applies a new kind of representation of chemical compounds (describing not only their structure and physicochemical properties, but also the way they interact with the biological target), the proposed approach might open a new path for chem- and bioinformatic experiments related to docking. However, it should be noted that the described methodology will be neither a new scoring function nor a new way of representing molecules. The evaluation provided by the machine learning algorithms will be a binary (active/inactive) classification. It is intended to help identify, and reject from further research, molecules that dock successfully into the receptor binding site and obtain a favourable scoring function value, but interact with the target protein in a way that makes their probability of being active in vitro or in vivo low enough to exclude them from further consideration.
Sabina Podlewska (Smusz), MSc
phone: +4812 66 23 301
Rafał Kafel, MSc
phone: +4812 66 23 301
1. Smusz, S.; Mordalski, S.; Witek, J.; Rataj, K.; Kafel, R.; Bojarski, A.J. Multi-Step Protocol for Automatic Evaluation of Docking Results Based on Machine Learning Methods-A Case Study of Serotonin Receptors 5-HT6 and 5-HT7. J. Chem. Inf. Model. 2015, 55, 823-832. (http://www.ncbi.nlm.nih.gov/pubmed/25806997)
2. Mordalski, S.; Witek, J.; Smusz, S.; Rataj, K.; Bojarski, A.J. Multiple conformational states in retrospective virtual screening – homology models vs. crystal structures: beta-2 adrenergic receptor case study. J. Cheminform. 2015, 7, 13. (http://www.ncbi.nlm.nih.gov/pubmed/25949744)
3. Czarnecki, W.M.; Podlewska, S.; Bojarski, A.J. Extremely Randomized Machine Learning Methods for Compound Activity Prediction. Molecules 2015, 20, 20107-20117. (http://www.ncbi.nlm.nih.gov/pubmed/26569196)
b) participation in conferences:
1. Witek, J.; Smusz, S.; Rataj, K.; Mordalski, S.; Bojarski, A.J. Combination of structural interaction profiles as a method for optimization of its application in docking results analysis. VIth Conversatory on Medicinal Chemistry, 18-20.09.2014, Lublin, Poland, Book of Abstracts, p. 157.
2. Smusz, S.; Mordalski, S.; Witek, J.; Rataj, K.; Bojarski, A.J. A machine learning-based for docking results analysis. The 10th International Conference on Chemical Structures, 1-5.06.2014, Noordwijkerhout, the Netherlands, Book of Abstracts, p. 143.
3. Smusz, S.; Witek, J.; Rataj, K.; Mordalski, S.; Bojarski, A.J. Structural interaction profiles combination as a method for optimization of its application in docking results analysis – beta-2 adrenergic receptor case study. The GLISTEN Budapest 2014 Conference, 02-04.10.2014, Budapest, Hungary, Book of Abstracts, P412.
4. Podlewska, S.; Lacivita, E.; Leopoldo, M.; Bojarski, A.J. Narzędzia do oceny stabilności metabolicznej in silico [Tools for in silico evaluation of metabolic stability]. Zeszyty Naukowe Towarzystwa Doktorantów Uniwersytetu Jagiellońskiego, No. 10 (1/2015), Bartłomiej Jałocha, Ed., e-ISSN 2082-3827, p-ISSN 2084-977X, p. 112.
5. Podlewska, S.; Lacivita, E.; Leopoldo, M.; Bojarski, A.J. Tools for in silico evaluation of cytochrome P450-mediated compounds metabolism. V Meeting of the Paul Ehrlich MedChem Euro-PhD Network, 03-05.07.2015, Kraków, Poland, Book of Abstracts, p. 85.
The study is supported by a grant PRELUDIUM UMO-2013/09/N/NZ2/01917 financed by the National Science Centre