Chapter 23: Molecule Ideation Using Matched Molecular Pairs
Reading notes on "Artificial Intelligence in Drug Design"
Contents
- 1. Introduction
- 2. MMP Algorithms
- 3. BioDig: The GSK Transform Database
- 4. Large Scale Molecule Ideation Using MMPs
- 5. Quantifying the Value of an MMP-Based Knowledge Base
- 6. The Ever-Growing Tail of New Transforms
- 7. The Subset of Useful MedChem Transforms
- 8. Assessing MMPs as a Molecule Generation Tool
- 9. First Test - Human Inclusion
- 10. Second Test - Human Imitation
- 11. Third Test - Legacy Projects
- 12. Conclusion
1. Introduction
- Matched Molecular Pair (MMP) analysis is one of many ways medicinal chemists can interpret structure-activity relationship (SAR) data. Its appeal lies in how intuitively it relates a structural change to the resulting change in a relevant property.
2. MMP Algorithms
- Several implementations of the MMP algorithm exist in the literature. One of the most widely used, since adapted by many institutions, was originally published by Hussain and Rea.
- The algorithm fragments each molecule by cutting acyclic single bonds. The shared core fragment is termed the context (typically >50% of the molecule by heavy-atom count), and two molecules with the same context form an MMP. The variable part that differs between the two molecules is termed the transform; it encodes a change from fragment X to fragment Y and is typically represented as a SMIRKS reaction.
- The same procedure has been extended to MMPs with a change in the chemical core. In this case multiple cuts (fragmentation operations) are applied to each molecule; when the terminal groups are all the same but the core differs, an MMP is defined with the core (scaffold) change encoded as the transform. Figure 1 shows a pictorial demonstration of the MMP algorithm.
- Deriving MMPs across a large set of molecules with associated physicochemical properties or assay readouts allows the transforms to be generalized across the dataset: when two or more compound pairs share the same transform, their data can be aggregated. For each transform, statistics are derived that express the change in a chosen endpoint as a mean change with an associated standard deviation or related statistic.
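The indexing and aggregation steps described above can be sketched in Python. This is a minimal sketch, not the Hussain and Rea implementation: it assumes each molecule has already been fragmented upstream into (context, fragment) SMILES strings with `[*:1]` marking the cut, and it skips the context-size rule and SMILES canonicalization. The function names `find_pairs` and `aggregate_transforms` and the toy property values are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean, stdev


def find_pairs(fragmentations):
    """Index fragments by context and emit every MMP in both directions.

    fragmentations maps a molecule id to a list of (context, fragment)
    SMILES strings, with "[*:1]" marking the attachment point. Producing
    these records (bond cutting, the context-size rule, canonical SMILES)
    is assumed to happen upstream.
    """
    by_context = defaultdict(list)
    for mol_id, records in fragmentations.items():
        for context, fragment in records:
            by_context[context].append((mol_id, fragment))

    pairs = []
    for context, members in by_context.items():
        for (a, frag_a), (b, frag_b) in combinations(members, 2):
            if a == b or frag_a == frag_b:
                continue  # same molecule, or no actual change
            pairs.append((a, b, context, f"{frag_a}>>{frag_b}"))
            pairs.append((b, a, context, f"{frag_b}>>{frag_a}"))
    return pairs


def aggregate_transforms(pairs, prop):
    """Summarize the property change prop[to] - prop[from] per transform."""
    deltas = defaultdict(list)
    for src, dst, _context, transform in pairs:
        deltas[transform].append(prop[dst] - prop[src])
    return {
        t: {"n": len(d), "mean": mean(d), "sd": stdev(d) if len(d) > 1 else 0.0}
        for t, d in deltas.items()
    }


# Toy example: hypothetical primary -> secondary amide pairs on two contexts.
frags = {
    "m1": [("c1ccccc1[*:1]", "[*:1]C(N)=O")],
    "m2": [("c1ccccc1[*:1]", "[*:1]C(=O)NC")],
    "m3": [("c1ccncc1[*:1]", "[*:1]C(N)=O")],
    "m4": [("c1ccncc1[*:1]", "[*:1]C(=O)NC")],
}
logs = {"m1": -1.0, "m2": -1.5, "m3": -0.5, "m4": -1.25}  # made-up values
stats = aggregate_transforms(find_pairs(frags), logs)
print(stats["[*:1]C(N)=O>>[*:1]C(=O)NC"])
```

Storing each pair in both directions means the reverse transform (secondary to primary amide here) receives the mirrored mean change without a separate pass.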
3. BioDig: The GSK Transform Database
- A dataset of 300K compounds yields approximately 2.3 million MMPs, which calls for bulk storage and fast query reporting. These requirements, together with the need to index transforms, lend themselves to a relational database; at GSK this database is named BioDig.
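To illustrate why a relational layout suits this workload, here is a hypothetical minimal schema built with Python's standard `sqlite3` module: each distinct transform is stored once, pairs reference it by id, and an index on the transform column keeps per-transform reports fast. BioDig's actual schema is not described in these notes; the table names, columns, and functions below are assumptions.

```python
import sqlite3

# Hypothetical minimal schema, not BioDig's real one.
SCHEMA = """
CREATE TABLE transform (
    id     INTEGER PRIMARY KEY,
    smirks TEXT UNIQUE NOT NULL
);
CREATE TABLE mmp (
    from_id      TEXT NOT NULL,
    to_id        TEXT NOT NULL,
    transform_id INTEGER NOT NULL REFERENCES transform(id),
    delta        REAL  -- property change, to minus from
);
CREATE INDEX mmp_by_transform ON mmp(transform_id);
"""


def build_db(path=":memory:"):
    con = sqlite3.connect(path)
    con.executescript(SCHEMA)
    return con


def add_pair(con, from_id, to_id, smirks, delta):
    # Each distinct transform is inserted once and then referenced by id.
    con.execute("INSERT OR IGNORE INTO transform (smirks) VALUES (?)", (smirks,))
    (tid,) = con.execute(
        "SELECT id FROM transform WHERE smirks = ?", (smirks,)
    ).fetchone()
    con.execute("INSERT INTO mmp VALUES (?, ?, ?, ?)", (from_id, to_id, tid, delta))


def transform_report(con, min_n=2):
    """Transforms seen at least min_n times, most frequent first."""
    return con.execute(
        """
        SELECT t.smirks, COUNT(*) AS n, AVG(m.delta) AS mean_delta
        FROM mmp AS m JOIN transform AS t ON t.id = m.transform_id
        GROUP BY t.id
        HAVING COUNT(*) >= ?
        ORDER BY n DESC
        """,
        (min_n,),
    ).fetchall()
```

Keeping transforms in their own table means the same SMIRKS string is not repeated across millions of pair rows, and aggregate queries only need the integer index.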
4. Large Scale Molecule Ideation Using MMPs
- MMPs have historically been used to interrogate the effect of a chemical transform on physicochemical and ADME properties such as LogD, clearance, and membrane permeability.
- At GSK their applicability has been extended to molecule library generation.
- The effect of a transform can depend on the context in which it is applied. For example, the effect on solubility of replacing a primary amide with a secondary amide differs between an aliphatic and an aromatic context (Fig. 3).
- To capture this, SMARTS patterns can be generalized with aliphatic and aromatic flags instead of full atom-type information. This extends a single transform into six related forms, as shown in Fig. 4.
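The context dependence above can be built into ideation by keying each transform's statistics on the environment of the attachment atom. The sketch below assumes a knowledge base of precomputed statistics and a query molecule already fragmented into (context, fragment, environment) records; `TransformStat`, `ideate`, the `env` labels, and the default thresholds are hypothetical, and it does not reproduce the six-form generalization of Fig. 4.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransformStat:
    lhs: str           # fragment removed, e.g. "[*:1]C(N)=O"
    rhs: str           # fragment added
    env: str           # hypothetical flag: "aliphatic" or "aromatic" attachment atom
    mean_delta: float  # mean property change observed for this transform
    n: int             # number of supporting pairs


def ideate(query_records, knowledge_base, min_n=3, min_delta=0.0):
    """Propose (context, new_fragment, expected_delta) ideas for one molecule.

    query_records lists (context, fragment, env) tuples from fragmenting the
    query molecule. A transform is applied only if its left-hand side matches
    the fragment, it was observed in the same environment, it is supported by
    at least min_n pairs, and its mean change exceeds min_delta.
    """
    ideas = []
    for context, fragment, env in query_records:
        for t in knowledge_base:
            if (t.lhs == fragment and t.env == env
                    and t.n >= min_n and t.mean_delta > min_delta):
                ideas.append((context, t.rhs, t.mean_delta))
    # Largest expected improvement first. Joining context and new fragment
    # into a full molecule is left to a cheminformatics toolkit.
    return sorted(ideas, key=lambda idea: idea[2], reverse=True)
```

Because the environment is part of the lookup, the same amide change can be proposed on an aromatic context and suppressed on an aliphatic one when their recorded statistics differ.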
5. Quantifying the Value of an MMP-Based Knowledge Base
- A key aspect of applying an MMP-based knowledge base is quantifying its usefulness in a medicinal chemistry design scenario. Ideally, the database should be comprehensive enough to cover the full range of transforms that could be used, and each transform should be derived from enough data to be statistically valid.
- To test both requirements, the transforms in the Eli Lilly ADME/Tox knowledge database were compared with those in a larger 2.1-million-compound diversity set. A second comparison was made between the same knowledge database and a subset of transforms seen in historical small-molecule discovery projects.
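One way to make "enough data to be statistically valid" concrete is to require a minimum number of pairs and a confidence interval on the mean change that excludes zero. This is a hypothetical criterion for illustration, not the one used in the chapter's comparison; `transform_support` and its defaults are assumptions, and the normal approximation it uses is optimistic for small n.

```python
from math import sqrt
from statistics import mean, stdev


def transform_support(deltas, min_n=3, z=1.96):
    """Judge whether one transform's observed property changes are reliable.

    Uses a normal-approximation interval mean +/- z * SE; for small n a
    t-interval would be more appropriate. min_n and z are illustrative
    assumptions, not thresholds from the chapter.
    """
    n = len(deltas)
    if n < 2:
        return {"n": n, "mean": mean(deltas) if deltas else None,
                "ci": None, "supported": False}
    m = mean(deltas)
    se = stdev(deltas) / sqrt(n)
    low, high = m - z * se, m + z * se
    # Supported: enough pairs and an interval that excludes zero change.
    supported = n >= min_n and (low > 0 or high < 0)
    return {"n": n, "mean": m, "ci": (low, high), "supported": supported}
```

A transform whose pairs point consistently in one direction passes, while one with the same count but scattered signs does not, which is the distinction a raw pair count alone cannot make.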
6. The Ever-Growing Tail of New Transforms
- The numbers of derived matched pairs and transforms both grew linearly with the number of molecules in the dataset (Table 1 and Fig. 5): adding molecules keeps producing new transforms rather than saturating.
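The linear trend can be checked with an ordinary least-squares fit of transform count against dataset size; the slope then estimates roughly how many new transforms each added molecule contributes. `fit_line` is a hypothetical helper, and its inputs should be the Table 1 values, which are not reproduced in these notes.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept, with R^2."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy * sxy / (sxx * syy)  # 1.0 means perfectly linear
    return slope, intercept, r_squared
```

Called as `fit_line(n_molecules, n_transforms)` with the two columns of Table 1, an R² close to 1 corresponds to the linear relationship reported above.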
7. The Subset of Useful MedChem Transforms
- The knowledge database was analyzed to determine how many of the top 100, 500, 1000, 2500, 5K, 10K, 25K, 50K, and 100K MedChem project transforms it contains. The results are given in Table 2.
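The Table 2 analysis can be expressed as a top-N coverage calculation. This sketch assumes that "top" means ranked by how often a transform occurs in the project data and that transforms are compared as identical SMIRKS strings; `top_n_coverage` is a hypothetical name.

```python
from collections import Counter


def top_n_coverage(project_transforms, kb_transforms, cutoffs):
    """Fraction of the N most frequent project transforms present in the knowledge base.

    project_transforms: one SMIRKS string per occurrence in project data.
    kb_transforms: transforms stored in the knowledge base.
    If a cutoff exceeds the number of distinct transforms, all are used.
    """
    ranked = [t for t, _count in Counter(project_transforms).most_common()]
    kb = set(kb_transforms)
    coverage = {}
    for n in cutoffs:
        top = ranked[:n]
        coverage[n] = sum(t in kb for t in top) / len(top) if top else 0.0
    return coverage
```

The cutoffs from the text would be passed as `[100, 500, 1000, 2500, 5_000, 10_000, 25_000, 50_000, 100_000]`.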
8. Assessing MMPs as a Molecule Generation Tool
- Three tests were used to assess the performance of three in-house molecule generators used at GSK (Fig. 6), including an MMP-based one:
BioDig: the matched molecular pair-based algorithm described earlier in this chapter.
BRICS: a fragment replacement-based algorithm.
RG2Smi: a machine learning algorithm based on language processing that translates a reduced-graph input into a SMILES output.
The first test explored the ability of the algorithms to reproduce ideas generated by a team of medicinal chemists.
The second test explored whether the additional ~10³ molecules generated by the algorithms were considered good ideas by the medicinal chemists.
Finally, the algorithms were assessed on their ability to generate the molecules of legacy drug discovery programs, starting from a single molecule in each series.
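The first test can be scored as a recall: the fraction of the chemists' ideas that each generator also produced. This minimal sketch assumes all molecules are given as canonical SMILES so that identical structures compare equal as strings; the chapter's actual scoring is not given in these notes, and `inclusion_recall`, `novel_ideas`, and `compare_generators` are hypothetical helpers. The molecules left after removing the chemists' ideas are the extra ideas of the kind the second test asks chemists to judge.

```python
def inclusion_recall(chemist_ideas, generated):
    """Fraction of chemist-proposed molecules that a generator also produced."""
    ideas = set(chemist_ideas)
    if not ideas:
        return 0.0
    return len(ideas & set(generated)) / len(ideas)


def novel_ideas(chemist_ideas, generated):
    """Generated molecules the chemists did not propose (candidates for review)."""
    return set(generated) - set(chemist_ideas)


def compare_generators(chemist_ideas, outputs):
    """Score each named generator's output against the same chemist set."""
    return {
        name: {
            "recall": inclusion_recall(chemist_ideas, mols),
            "n_novel": len(novel_ideas(chemist_ideas, mols)),
        }
        for name, mols in outputs.items()
    }


# Toy usage with made-up SMILES; real outputs would come from each generator.
chemists = ["CCO", "CCN", "c1ccccc1O"]
outputs = {
    "BioDig": ["CCO", "CCN", "CCC"],
    "BRICS": ["CCO", "CCCl"],
    "RG2Smi": ["CC(C)O"],
}
print(compare_generators(chemists, outputs))
```

The same recall would apply to the third test, with the molecules of a legacy series in place of the chemists' ideas.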
9. First Test - Human Inclusion
10. Second Test - Human Imitation
11. Third Test - Legacy Projects
12. Conclusion
- MMP analysis has emerged as a key method in the medicinal chemistry toolbox, and there are many examples of publicly available algorithms and applications. Many companies have worked to summarize MMPs into databases of transforms.