Reading notes on *Artificial Intelligence in Drug Design*, Chapter 19: Deep Learning in Therapeutic Antibody Development


Contents

  • 1.Introduction
  • 2.Supervised Learning in Antibody Development
    • 2.1.Biophysical Properties
    • 2.2.Product Quality Attributes
    • 2.3.Process Behavior
  • 3.Unsupervised Learning in Antibody Development
    • 3.1.Transfer Learning of Unsupervised and Self-Supervised Models
  • 4.Conclusion

1.Introduction

  • Within the scope of developing therapeutic monoclonal antibodies (mAbs), there are many steps that all contribute to the overall cost and time required to bring a biotherapeutic drug to a patient.
  • Antibody library-based discovery is frequently done in phage or yeast cell platforms. In vivo discovery platforms such as animal immunizations or even human B-cell panning utilize transient production methods to create material for testing which normally does not replicate the therapeutic development process.
  • The bulk of commercial therapeutic antibody production utilizes Chinese hamster ovary (CHO) cells and there are significant differences between the cells and production methods.
  • While expression levels can be estimated from large yeast datasets, these estimations are only vaguely “directional” for expected expression in mammalian cell lines. The individual cellular mechanisms of a mammalian cell and a yeast cell are too different.
  • Even if one can train reliable machine-learned predictors for a behavior like solubility in a specific solvent or extent of molecule–molecule interactions, the bigger challenge is mapping these behaviors to in vitro process development tasks for an antibody or to its in vivo behavior.
  • This final challenge is where deep learning may provide the most benefit: designing data that spans the complicated spaces of the following.
      1. Germ lines.
      2. CDR diversity.
      3. Antibody formats (e.g., scFv, full length, Fab, Fc-fusion, multispecifics).
      4. Specific sequence liabilities (e.g., deamidation, isomerization, glycosylation sites).
      5. In vivo immunogenicity and clearance likelihood.

2.Supervised Learning in Antibody Development

  • There are two predominant pathways to prediction of behavior from molecular features:

    • The most frequently attempted approach is to use an intermediate representation of the antibody structure generated from molecular modeling, with a (frequently hand-picked) set of derived features as inputs.
    • The second approach, which is gaining more traction from deep learning efforts, is prediction straight from the amino acid sequence, frequently encoded in a one-hot-encoded (OHE) form for each residue.
  • The real key to antibody behavior predictions is more likely buried in small scale distances and interactions—a domain in which the AlphaFold models simply cannot yet contribute.

  • On the other hand, unlike the generalized protein problem being handled by the AlphaFold models, a significant portion of the antibody sequence and structure is so conserved that homology modeling (using preexisting known sequences and structures as a starting point) provides a very reasonable estimate of base structure. This high level of conservation also permits the use of structure-based residue alignment methods which greatly reduce the complexity of the latent space that must be inferred from sequence.
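The one-hot sequence encoding mentioned above can be sketched in a few lines. This is an illustrative minimal version (real pipelines also handle gaps, unknown residues, and fixed-length padding or structure-based alignment positions):

```python
# Minimal sketch of one-hot encoding (OHE) an amino acid sequence.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot_encode(sequence):
    """Return one 20-dimensional one-hot vector per residue."""
    encoded = []
    for residue in sequence.upper():
        vec = [0] * len(AMINO_ACIDS)
        vec[AA_INDEX[residue]] = 1
        encoded.append(vec)
    return encoded

# Hypothetical CDR-like fragment, used here only as a demo input.
matrix = one_hot_encode("GYTFTSYW")  # 8 residues -> 8 x 20 matrix
```

The resulting residue-by-residue matrix is the typical input shape for the CNN- and attention-based sequence models discussed below.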

2.1.Biophysical Properties

  • The availability of large protein solubility datasets (over 100k protein sequences) has recently opened the door for deep learning solubility prediction.
  • The DeepSol algorithm uses a convolutional neural network (CNN) which takes as input an amino acid sequence and outputs a probability that the associated protein is soluble. The SKADE algorithm uses an attention-based deep learning model on the same task.
  • While these models are not immediately applicable to antibody engineering—the soluble vs. insoluble classification dataset does not likely encode the subtler patterns associated with small solubility changes for a small number of mutations—they demonstrate that solubility is predictable from primary sequence.
  • A predictive model has also been reported for antibody hydrophobicity, trained on hydrophobic interaction chromatography retention time (HIC RT) measurements for over 5000 antibody antigen-binding fragments (Fabs). Jain et al. used this dataset to create two traditional machine learning predictors: (1) solvent accessible surface area (SASA) from engineered sequence features and (2) HIC RT class from SASA.
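The engineered features used by Jain et al. are not reproduced in these notes, but a classical sequence-derived hydrophobicity descriptor, the Kyte–Doolittle GRAVY score, illustrates the kind of simple input such hydrophobicity models build on:

```python
# Kyte-Doolittle hydropathy scale (standard published per-residue values).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def gravy(sequence):
    """Grand average of hydropathy: mean Kyte-Doolittle score per residue.
    Positive values indicate a more hydrophobic sequence."""
    seq = sequence.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)
```

A descriptor like this captures only bulk hydrophobicity; the HIC RT work's value came from combining structure-aware features (SASA) with the chromatographic measurements.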

2.2.Product Quality Attributes

  • Product quality attributes (PQAs)—especially posttranslational modifications such as deamidation, isomerization, and glycosylation—are intriguing targets for predictive modeling.
  • A recent publication on machine learning for deamidation prediction provides an illustrative example for the current state of supervised learning in PQAs.
  • There have also been examples reported of using machine learning to predict mAb glycoform distributions in CHO cells, most recently using artificial neural networks.
  • While promising, the results also need to be aligned to the type of cells, transfection method, media composition, and even production mode (batch vs. continuous perfusion) as these all can have an impact on PQA.
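Before any learned model, deamidation liabilities are usually flagged by scanning for the well-known Asn hotspot motifs (NG, NS, NT). A minimal motif scan, shown here as a crude baseline rather than a predictor (real models also weigh local structure and backbone flexibility):

```python
import re

def deamidation_sites(sequence):
    """Return 0-based positions of Asn residues in classic NG/NS/NT
    deamidation hotspot motifs. A crude sequence-liability scan only."""
    return [m.start() for m in re.finditer(r"N(?=[GST])", sequence.upper())]

# Hypothetical heavy-chain fragment, used only as a demo input.
sites = deamidation_sites("ARNGDYWNSQV")  # N at 2 (NG) and 7 (NS)
```

Supervised PQA models effectively learn which of these motif hits actually deamidate under given process conditions.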

2.3.Process Behavior

  • Creative use of high-throughput “scale down models”—lab methods that run in multiwell plates at the scale of dozens, hundreds, or even thousands of conditions—and hybrid in silico modeling approaches offer a glimmer of hope for process behavior predictive modeling.
  • Two promising areas in process behavior prediction are mL-scale bioreactors for collecting productivity data and small-scale purification experiments.

3.Unsupervised Learning in Antibody Development

  • The goal of these generative models is to create diverse, hyperrealistic synthetic candidates given an example dataset of true samples. Curated human repertoire datasets, like the Observed Antibody Space (OAS), provide a rich data source of true human antibody sequences.
  • There have also been applications of models to generate libraries of binders to a particular target/antigen. Variational Autoencoders have been used in coordination with Gaussian Mixture Models to allow for latent space clustering of antibody CDRs for specific targets. The model allows the users to navigate within the clusters of the latent space to generate novel binders to a given target. This approach can be seen as a means of performing CDR affinity maturation in silico, given a set of hits to an antigen, postlibrary screening.
  • Masked Language Models (MLMs) may be particularly useful in the antibody space due to the antibody’s comparatively long protein sequence and complex structure, where long-range context matters significantly.
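The MLM training setup mentioned above can be sketched with a simple masking step: a fraction of residues is hidden and the model is trained to recover them from context. This is an illustrative BERT-style sketch, not any specific published antibody model:

```python
import random

MASK = "<mask>"

def mask_sequence(sequence, mask_frac=0.15, seed=0):
    """Randomly replace a fraction of residues with a mask token.
    Returns (masked token list, dict of position -> original residue),
    the (input, target) pair used in MLM-style training."""
    rng = random.Random(seed)
    tokens = list(sequence)
    n_mask = max(1, int(len(tokens) * mask_frac))
    targets = {}
    for i in rng.sample(range(len(tokens)), n_mask):
        targets[i] = tokens[i]
        tokens[i] = MASK
    return tokens, targets

# Hypothetical framework-region fragment, used only as a demo input.
masked, targets = mask_sequence("EVQLVESGGGLVQPGG")
```

A transformer trained on millions of such (masked, target) pairs from repertoire data learns the long-range residue dependencies that make these models useful for both generation and quality assessment.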

3.1.Transfer Learning of Unsupervised and Self-Supervised Models

  • While GAN and MLM models are powerful generative and qualitative assessment tools, the ability to use transfer learning to further adapt these models may be the true transformative power of these approaches. With a trained model that has captured the larger domain of antibody sequence relationships, we can apply transfer learning to focus these models down to subsets of antibody types.
  • The path of transfer learning these models also opens the door to generating highly diverse training data for supervised learning applications, thereby further refining the models’ predictive abilities and our understanding of the underlying biophysical behaviors.

4.Conclusion

  • Each of these intermediate successes in deep learning is useful, but the path is long.
