Reading notes on *Artificial Intelligence in Drug Design*, Chapter 19: Deep Learning in Therapeutic Antibody Development


Contents

  • 1.Introduction
  • 2.Supervised Learning in Antibody Development
    • 2.1.Biophysical Properties
    • 2.2.Product Quality Attributes
    • 2.3.Process Behavior
  • 3.Unsupervised Learning in Antibody Development
    • 3.1.Transfer Learning of Unsupervised and Self-Supervised Models
  • 4.Conclusion

1.Introduction

  • Within the scope of developing therapeutic monoclonal antibodies (mAbs), there are many steps that all contribute to the overall cost and time required to bring a biotherapeutic drug to a patient.
  • Antibody library-based discovery is frequently done in phage or yeast cell platforms. In vivo discovery platforms such as animal immunizations or even human B-cell panning utilize transient production methods to create material for testing which normally does not replicate the therapeutic development process.
  • The bulk of commercial therapeutic antibody production utilizes Chinese hamster ovary (CHO) cells and there are significant differences between the cells and production methods.
  • While expression levels can be estimated from large yeast datasets, these estimations are only vaguely “directional” for expected expression in mammalian cell lines. The individual cellular mechanisms of a mammalian cell and a yeast cell are too different.
  • Even if one can train reliable machine-learned predictors for a behavior like solubility in a specific solvent or extent of molecule–molecule interactions, the bigger challenge is mapping these behaviors to in vitro process development tasks for an antibody or to its in vivo behavior.
  • This final challenge is where deep learning may provide the most benefit: designing data that spans the complicated spaces of the following.
      1. Germ lines.
      2. CDR diversity.
      3. Antibody formats (e.g., scFv, full length, Fab, Fc-fusion, multispecifics).
      4. Specific sequence liabilities (e.g., deamidation, isomerization, glycosylation sites).
      5. In vivo immunogenicity and clearance likelihood.

2.Supervised Learning in Antibody Development

  • There are two predominant pathways to prediction of behavior from molecular features:

    • The most frequently attempted approach is to use an intermediate representation of the antibody structure generated from molecular modeling, with a (frequently hand-picked) set of derived features as inputs.
    • The second approach, which is gaining more traction from deep learning efforts, is prediction straight from the amino acid sequence, frequently encoded in a one-hot-encoded (OHE) form for each residue.
  • The real key to antibody behavior predictions is more likely buried in small scale distances and interactions—a domain in which the AlphaFold models simply cannot yet contribute.

  • On the other hand, unlike the generalized protein problem being handled by the AlphaFold models, a significant portion of the antibody sequence and structure is so conserved that homology modeling (using preexisting known sequences and structures as a starting point) provides a very reasonable estimate of base structure. This high level of conservation also permits the use of structure-based residue alignment methods which greatly reduce the complexity of the latent space that must be inferred from sequence.
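The one-hot sequence encoding mentioned above can be sketched in a few lines. This is an illustrative minimal version (real pipelines also handle gaps, unknown residues, and fixed-length padding or structure-based alignment positions):

```python
# Minimal sketch of one-hot encoding (OHE) an amino acid sequence.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot_encode(sequence):
    """Return one 20-dimensional one-hot vector per residue."""
    encoded = []
    for residue in sequence.upper():
        vec = [0] * len(AMINO_ACIDS)
        vec[AA_INDEX[residue]] = 1
        encoded.append(vec)
    return encoded

# Hypothetical CDR-like fragment, used here only as a demo input.
matrix = one_hot_encode("GYTFTSYW")  # 8 residues -> 8 x 20 matrix
```

The resulting residue-by-residue matrix is the typical input shape for the CNN- and attention-based sequence models discussed below.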

2.1.Biophysical Properties

  • The availability of large protein solubility datasets (over 100k protein sequences) has recently opened the door for deep learning solubility prediction.
  • The DeepSol algorithm uses a convolutional neural network (CNN) which takes as input an amino acid sequence and outputs a probability that the associated protein is soluble. The SKADE algorithm uses an attention-based deep learning model on the same task.
  • While these models are not immediately applicable to antibody engineering—the soluble vs. insoluble classification dataset does not likely encode the subtler patterns associated with small solubility changes for a small number of mutations—they demonstrate that solubility is predictable from primary sequence.
  • A predictive model has also been reported for antibody hydrophobicity, trained on hydrophobic interaction chromatography retention time (HIC RT) measurements for over 5000 antibody antigen-binding fragments (Fabs). Jain et al. used this dataset to create two traditional machine learning predictors: (1) solvent accessible surface area (SASA) from engineered sequence features and (2) HIC RT class from SASA.
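The engineered features used by Jain et al. are not reproduced in these notes, but a classical sequence-derived hydrophobicity descriptor, the Kyte–Doolittle GRAVY score, illustrates the kind of simple input such hydrophobicity models build on:

```python
# Kyte-Doolittle hydropathy scale (standard published per-residue values).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def gravy(sequence):
    """Grand average of hydropathy: mean Kyte-Doolittle score per residue.
    Positive values indicate a more hydrophobic sequence."""
    seq = sequence.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)
```

A descriptor like this captures only bulk hydrophobicity; the HIC RT work's value came from combining structure-aware features (SASA) with the chromatographic measurements.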

2.2.Product Quality Attributes

  • Product quality attributes (PQAs)—especially posttranslational modifications such as deamidation, isomerization, and glycosylation—are intriguing targets for predictive modeling.
  • A recent publication on machine learning for deamidation prediction provides an illustrative example for the current state of supervised learning in PQAs.
  • There have also been examples reported of using machine learning to predict mAb glycoform distributions in CHO cells, most recently using artificial neural networks.
  • While promising, the results also need to be aligned to the type of cells, transfection method, media composition, and even production mode (batch vs. continuous perfusion) as these all can have an impact on PQA.
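Before any learned model, deamidation liabilities are usually flagged by scanning for the well-known Asn hotspot motifs (NG, NS, NT). A minimal motif scan, shown here as a crude baseline rather than a predictor (real models also weigh local structure and backbone flexibility):

```python
import re

def deamidation_sites(sequence):
    """Return 0-based positions of Asn residues in classic NG/NS/NT
    deamidation hotspot motifs. A crude sequence-liability scan only."""
    return [m.start() for m in re.finditer(r"N(?=[GST])", sequence.upper())]

# Hypothetical heavy-chain fragment, used only as a demo input.
sites = deamidation_sites("ARNGDYWNSQV")  # N at 2 (NG) and 7 (NS)
```

Supervised PQA models effectively learn which of these motif hits actually deamidate under given process conditions.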

2.3.Process Behavior

  • Creative use of high-throughput “scale down models”—lab methods that run in multiwell plates at the scale of dozens, hundreds, or even thousands of conditions—and hybrid in silico modeling approaches offer a glimmer of hope for process behavior predictive modeling.
  • Two promising areas in process behavior prediction are mL-scale bioreactors for collecting productivity data and small-scale purification experiments.

3.Unsupervised Learning in Antibody Development

  • The goal of these generative models is to create diverse, hyperrealistic synthetic candidates given an example dataset of true samples. Curated human repertoire datasets, like the Observed Antibody Space (OAS), provide a rich data source of true human antibody sequences.
  • There have also been applications of models to generate libraries of binders to a particular target/antigen. Variational Autoencoders have been used in coordination with Gaussian Mixture Models to allow for latent space clustering of antibody CDRs for specific targets. The model allows the users to navigate within the clusters of the latent space to generate novel binders to a given target. This approach can be seen as a means of performing CDR affinity maturation in silico, given a set of hits to an antigen, postlibrary screening.
  • Masked Language Models (MLMs) may be particularly useful in the antibody space due to the antibody’s comparatively long protein sequence and complex structure, where long-range context matters significantly.
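The MLM training setup mentioned above can be sketched with a simple masking step: a fraction of residues is hidden and the model is trained to recover them from context. This is an illustrative BERT-style sketch, not any specific published antibody model:

```python
import random

MASK = "<mask>"

def mask_sequence(sequence, mask_frac=0.15, seed=0):
    """Randomly replace a fraction of residues with a mask token.
    Returns (masked token list, dict of position -> original residue),
    the (input, target) pair used in MLM-style training."""
    rng = random.Random(seed)
    tokens = list(sequence)
    n_mask = max(1, int(len(tokens) * mask_frac))
    targets = {}
    for i in rng.sample(range(len(tokens)), n_mask):
        targets[i] = tokens[i]
        tokens[i] = MASK
    return tokens, targets

# Hypothetical framework-region fragment, used only as a demo input.
masked, targets = mask_sequence("EVQLVESGGGLVQPGG")
```

A transformer trained on millions of such (masked, target) pairs from repertoire data learns the long-range residue dependencies that make these models useful for both generation and quality assessment.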

3.1.Transfer Learning of Unsupervised and Self-Supervised Models

  • While GAN and MLM models are powerful generative and qualitative assessment tools, the ability to use transfer learning to further adapt these models may be the true transformative power of these approaches. With a trained model that has captured the larger domain of antibody sequence relationships, we can apply transfer learning to focus these models down to subsets of antibody types.
  • The path of transfer learning these models also opens the door to generating highly diverse training data for supervised learning applications, thereby further refining the models’ predictive abilities and our understanding of the underlying biophysical behaviors.

4.Conclusion

  • Each of these intermediate successes in deep learning is useful, but the path is long.
