Reading notes on《Artificial Intelligence in Drug Design》

Table of Contents

1.Introduction
2.MMP Algorithms
3.BioDig: The GSK Transform Database
4.Large Scale Molecule Ideation Using MMPs
5.Quantifying the Value of an MMP-Based Knowledge Base
6.The Ever-Growing Tail of New Transforms
7.The Subset of Useful MedChem Transforms
8.Assessing MMPs as a Molecule Generation Tool
9.First Test - Human Inclusion
10.Second Test - Human Imitation
11.Third Test - Legacy Projects
12.Conclusion

1.Introduction

Matched Molecular Pair (MMP) analysis is one of the many ways medicinal chemists can understand structure-activity relationship (SAR) data. The attraction of MMP analysis lies in its ability to intuitively relate structural changes to changes in a relevant property.

2.MMP Algorithms

There are several implementations of the MMP algorithm in the literature. One of the most widely used MMP generation algorithms, since adapted by many institutions, was originally published by Hussain and Rea.
The common core fragment is termed the context (typically >50% of the molecule by heavy atom count). Two molecules with the same context are termed an MMP. The variable part between the molecule pair is termed the transform and encodes a change from fragment X to fragment Y. The transform is typically represented as a SMIRKS reaction.
A similar procedure has been extended for MMPs with a chemical core change. In this case multiple cuts or fragmentation operations are applied to the molecules. Where the terminal groups are all the same, but the core is different, an MMP is defined with a core or scaffold change encoded. Figure 1 shows a pictorial demonstration of the MMP algorithm.
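The single-cut indexing step described above can be sketched in a few lines. This is a minimal illustration, not the Hussain and Rea implementation: it assumes the fragmentation has already been done by a cheminformatics toolkit, and the molecule ids, context strings, and fragment strings below are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical single-cut fragmentations: (molecule id, context, variable fragment).
# A real pipeline would generate these with a cheminformatics toolkit.
FRAGMENTATIONS = [
    ("mol_A", "[*:1]c1ccc(C(=O)O)cc1", "[*:1]F"),
    ("mol_B", "[*:1]c1ccc(C(=O)O)cc1", "[*:1]Cl"),
    ("mol_C", "[*:1]c1ccc(C(=O)O)cc1", "[*:1]OC"),
    ("mol_D", "[*:1]c1ccncc1", "[*:1]F"),
]


def find_mmps(fragmentations):
    """Index fragmentations by context, then pair molecules that share a context."""
    by_context = defaultdict(list)
    for mol_id, context, fragment in fragmentations:
        by_context[context].append((mol_id, fragment))

    pairs = []
    for context, members in by_context.items():
        for (id_x, frag_x), (id_y, frag_y) in combinations(members, 2):
            if frag_x != frag_y:
                # Transform written reaction-style: fragment X >> fragment Y.
                pairs.append((id_x, id_y, context, f"{frag_x}>>{frag_y}"))
    return pairs


mmps = find_mmps(FRAGMENTATIONS)
for pair in mmps:
    print(pair)
```

Here mol_A, mol_B, and mol_C share one context and so form three pairs, while mol_D has no partner. The sketch omits the heavy-atom size check on the context and the multi-cut case used for core changes.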
Deriving MMPs across a large set of molecules with associated physicochemical properties or assay readouts allows the transforms to be generalized across the dataset. If two or more compound pairs share the same transform, the data can be aggregated. For each transform, statistics are derived to express the change for a chosen endpoint as a mean change with an associated standard deviation or related statistics.
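The aggregation step might look like the sketch below. The observations are made-up placeholder values, and the `min_pairs` threshold of 2 simply mirrors the "two or more compound pairs" wording; neither comes from the chapter's data.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical observations: (transform, change in the chosen endpoint), one per pair.
OBSERVATIONS = [
    ("[*:1]F>>[*:1]Cl", 0.5),
    ("[*:1]F>>[*:1]Cl", 0.3),
    ("[*:1]F>>[*:1]Cl", 0.4),
    ("[*:1]F>>[*:1]OC", -0.2),
]


def summarize(observations, min_pairs=2):
    """Aggregate per-pair deltas into per-transform count, mean, and sample SD."""
    deltas = defaultdict(list)
    for transform, delta in observations:
        deltas[transform].append(delta)

    summary = {}
    for transform, values in deltas.items():
        # Transforms seen in fewer than min_pairs pairs are not aggregated.
        if len(values) >= min_pairs:
            summary[transform] = {
                "n": len(values),
                "mean": mean(values),
                "sd": stdev(values),
            }
    return summary


stats = summarize(OBSERVATIONS)
print(stats)
```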

3.BioDig: The GSK Transform Database

For a dataset of 300K compounds approximately 2.3 million MMPs can be extracted. This necessitates a solution for bulk storage and fast query reporting. These requirements along with the process of indexing transforms lend themselves to a relational database. This database is named BioDig at GSK.
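To make the relational layout concrete, here is a small sketch using Python's built-in `sqlite3`. The table and column names are assumptions chosen for illustration; the actual BioDig schema is not described in these notes.

```python
import sqlite3

# In-memory database; the schema below is hypothetical, not the BioDig schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE transform (
        transform_id INTEGER PRIMARY KEY,
        smirks       TEXT NOT NULL UNIQUE
    );
    CREATE TABLE mmp (
        mmp_id       INTEGER PRIMARY KEY,
        transform_id INTEGER NOT NULL REFERENCES transform(transform_id),
        mol_from     TEXT NOT NULL,
        mol_to       TEXT NOT NULL,
        context      TEXT NOT NULL
    );
    -- Index so that all pairs for one transform can be fetched quickly.
    CREATE INDEX idx_mmp_transform ON mmp(transform_id);
    """
)

conn.execute("INSERT INTO transform (smirks) VALUES (?)", ("[*:1]F>>[*:1]Cl",))
conn.executemany(
    "INSERT INTO mmp (transform_id, mol_from, mol_to, context) VALUES (?, ?, ?, ?)",
    [
        (1, "mol_A", "mol_B", "[*:1]c1ccc(C(=O)O)cc1"),
        (1, "mol_E", "mol_F", "[*:1]c1ccncc1"),
    ],
)

# Number of supporting pairs per transform.
pair_counts = conn.execute(
    """
    SELECT t.smirks, COUNT(*)
    FROM mmp AS m
    JOIN transform AS t ON t.transform_id = m.transform_id
    GROUP BY t.smirks
    """
).fetchall()
print(pair_counts)
```

Storing each transform once and referencing it from the pair table keeps the millions of pairs compact and makes per-transform queries an indexed lookup.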

4.Large Scale Molecule Ideation Using MMPs

MMPs have been historically used to interrogate the effect of a chemical transform on physicochemical properties such as LogD, clearance, and membrane permeability.
At GSK we have extended their applicability as a molecule library generation tool.
The effect of a transform can depend on the chemical context in which it is applied. For example, the effect on solubility when a primary amide is replaced by a secondary amide is different for an aliphatic and an aromatic context (see Fig. 3).
To capture this, SMARTS patterns can be generalized with aliphatic and aromatic flags as opposed to full atom type information. This extends a single transform into 6 related forms, as shown in Fig. 4.
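One way to picture the aliphatic/aromatic split is to key the statistics on both the transform and the environment of the attachment atom. The sketch below is an assumption-heavy illustration: the environment is guessed with a crude string heuristic rather than real SMARTS matching, the transform string is hypothetical, and the delta values are placeholders, not solubility data.

```python
from collections import defaultdict
from statistics import mean


def attachment_environment(context):
    """Classify the atom bonded to the attachment point as aromatic or aliphatic.

    Crude heuristic (an assumption of this sketch): the context must start with
    "[*:1]", and a lowercase SMILES atom symbol is taken to mean aromatic.
    """
    prefix = "[*:1]"
    if not context.startswith(prefix):
        raise ValueError("context must start with [*:1]")
    first_atom = next(ch for ch in context[len(prefix):] if ch.isalpha())
    return "aromatic" if first_atom.islower() else "aliphatic"


# Hypothetical primary amide -> secondary amide transform with placeholder deltas.
TRANSFORM = "[*:1]C(N)=O>>[*:1]C(=O)NC"
RECORDS = [
    (TRANSFORM, "[*:1]c1ccccc1", 1.0),
    (TRANSFORM, "[*:1]c1ccncc1", 3.0),
    (TRANSFORM, "[*:1]CC1CCCCC1", -1.0),
    (TRANSFORM, "[*:1]C1CCNCC1", -3.0),
]

# Group deltas by (transform, environment) instead of by transform alone.
grouped = defaultdict(list)
for transform, context, delta in RECORDS:
    grouped[(transform, attachment_environment(context))].append(delta)

mean_delta = {key: mean(values) for key, values in grouped.items()}
print(mean_delta)
```

Pooling all four records under the bare transform would average the two environments together and hide the difference that the split makes visible.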

5.Quantifying the Value of an MMP-Based Knowledge Base

A key aspect in the application of an MMP-based knowledge base is quantifying its usefulness in a medicinal chemistry design scenario. Ideally, the database must be comprehensive enough to cover the full range of transforms that could be used. Each transform in the database must also be derived from enough data to make it statistically valid.
To help answer these questions, transforms in the Eli Lilly ADME/Tox knowledge database were compared with those in a larger 2.1 million compound diversity set. A second comparison was made between transforms in the same knowledge database and a subset of transforms seen in historical small molecule discovery projects.

6.The Ever-Growing Tail of New Transforms

A linear relationship was seen between the number of molecules in the dataset and the final number of derived matched pairs and transforms. This is seen in Table 1 and Fig. 5.

7.The Subset of Useful MedChem Transforms

The knowledge database was analyzed to assess how many of the Top 100, 500, 1000, 2500, 5K, 10K, 25K, 50K, and 100K MedChem project transforms were contained in the database. The results are given in Table 2.
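The coverage analysis amounts to a ranked-list lookup, sketched below. The function name, the transform labels, and the cutoffs are hypothetical; the real figures are those in Table 2.

```python
def coverage_at_top_n(ranked_project_transforms, knowledge_base, cutoffs):
    """Fraction of the N most frequent project transforms found in the knowledge base.

    ranked_project_transforms: transforms ordered from most to least frequent.
    knowledge_base: set of transforms present in the database.
    """
    results = {}
    for n in cutoffs:
        top = ranked_project_transforms[:n]
        if not top:
            continue
        hits = sum(1 for transform in top if transform in knowledge_base)
        results[n] = hits / len(top)
    return results


# Toy inputs with placeholder transform labels.
ranked = ["T1", "T2", "T3", "T4"]
database = {"T1", "T2", "T4"}
coverage = coverage_at_top_n(ranked, database, cutoffs=(2, 4))
print(coverage)
```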

8.Assessing MMPs as a Molecule Generation Tool

Three tests were used to assess the performance of molecule generators used at GSK, including an MMP-based molecule generator. The tests compared three in-house molecule generators (Fig. 6):
- BioDig: a matched molecular pair-based algorithm described earlier in this chapter.
- BRICS: a fragment replacement-based algorithm.
- RG2Smi: a language processing machine learning algorithm that translates a reduced graph input to a SMILES output.
The three tests were as follows:
- The first explored the ability of the algorithms to reproduce ideas generated by a team of medicinal chemists.
- The second explored whether the additional ~10^3 molecules generated by the algorithms were considered good ideas by the medicinal chemists.
- Finally, the algorithms were assessed for their ability to generate molecules in legacy drug discovery programs from a single starting molecule in the series.
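The first test can be read as a recall-style measurement: what fraction of the chemists' ideas does each generator reproduce? The sketch below is one possible way to score it, not the chapter's protocol. It assumes every molecule is already a canonical SMILES string so that plain string equality is meaningful, and the generator names and molecules are placeholders.

```python
def inclusion_rate(chemist_ideas, generated):
    """Fraction of chemist-proposed molecules that also appear in the generated set."""
    chemist = set(chemist_ideas)
    if not chemist:
        raise ValueError("need at least one chemist idea")
    return len(chemist & set(generated)) / len(chemist)


# Placeholder data; real inputs would be canonicalized structures.
chemist_ideas = ["CCO", "CCN", "c1ccccc1", "CC(=O)O"]
generated_sets = {
    "generator_1": ["CCO", "CCN", "CCC"],
    "generator_2": ["CCO", "CCN", "c1ccccc1", "CC(=O)O", "CCCl"],
}

rates = {
    name: inclusion_rate(chemist_ideas, molecules)
    for name, molecules in generated_sets.items()
}
print(rates)
```

The same function covers the legacy-project test if the chemist ideas are replaced by the molecules actually made in the series.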

9.First Test - Human Inclusion

10.Second Test - Human Imitation

11.Third Test - Legacy Projects

12.Conclusion

MMP analysis has emerged as a key method in the medicinal chemistry toolbox, and there are many examples of publicly available algorithms and applications. Many companies have worked to summarize MMPs into databases of transforms.

Source: Chapter 23, Molecule Ideation Using Matched Molecular Pairs
