CV - Computer Vision | ML - Machine Learning | RL - Reinforcement Learning | NLP - Natural Language Processing


Subjects: cs.AI, cs.CV

1.Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring


Authors: Ruyang Liu, Jingjia Huang, Ge Li, Jiashi Feng, Xinglong Wu, Thomas H. Li





Image-text pretrained models, e.g., CLIP, have shown impressive general multi-modal knowledge learned from large-scale image-text data pairs, thus attracting increasing attention for their potential to improve visual representation learning in the video domain. In this paper, based on the CLIP model, we revisit temporal modeling in the context of image-to-video knowledge transferring, which is the key point for extending image-text pretrained models to the video domain. We find that current temporal modeling mechanisms are tailored to either high-level semantic-dominant tasks (e.g., retrieval) or low-level visual pattern-dominant tasks (e.g., recognition), and fail to work on the two cases simultaneously. The key difficulty lies in modeling temporal dependency while taking advantage of both high-level and low-level knowledge in CLIP model. To tackle this problem, we present Spatial-Temporal Auxiliary Network (STAN) -- a simple and effective temporal modeling mechanism extending CLIP model to diverse video tasks. Specifically, to realize both low-level and high-level knowledge transferring, STAN adopts a branch structure with decomposed spatial-temporal modules that enable multi-level CLIP features to be spatial-temporally contextualized. We evaluate our method on two representative video tasks: Video-Text Retrieval and Video Recognition. Extensive experiments demonstrate the superiority of our model over the state-of-the-art methods on various datasets, including MSR-VTT, DiDeMo, LSMDC, MSVD, Kinetics-400, and Something-Something-V2. Codes will be available at

2.The Projection-Enhancement Network (PEN)


Authors: Christopher Z. Eddy, Austin Naylor, Bo Sun



Contemporary approaches to instance segmentation in cell science use 2D or 3D convolutional networks depending on the experiment and data structures. However, limitations in microscopy systems or efforts to prevent phototoxicity commonly require recording sub-optimally sampled data regimes that greatly reduce the utility of such 3D data, especially in crowded environments with significant axial overlap between objects. In such regimes, 2D segmentations are both more reliable for cell morphology and easier to annotate. In this work, we propose the Projection Enhancement Network (PEN), a novel convolutional module which processes the sub-sampled 3D data and produces a 2D RGB semantic compression, and is trained in conjunction with an instance segmentation network of choice to produce 2D segmentations. Our approach combines augmentation to increase cell density using a low-density cell image dataset to train PEN, and curated datasets to evaluate PEN. We show that with PEN, the learned semantic representation in CellPose encodes depth and greatly improves segmentation performance in comparison to maximum intensity projection images as input, but does not similarly aid segmentation in region-based networks like Mask-RCNN. Finally, we dissect the segmentation strength against cell density of PEN with CellPose on disseminated cells from side-by-side spheroids. We present PEN as a data-driven solution to form compressed representations of 3D data that improve 2D segmentations from instance segmentation networks.
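
The abstract compares PEN against maximum intensity projection (MIP) images as input. For readers unfamiliar with that baseline, here is a minimal sketch of MIP only, not of PEN itself: it collapses a z-stack to a single 2D image by keeping the brightest value at each pixel across slices. The function name `max_intensity_projection`, the nested-list data layout, and the toy values are illustrative assumptions and do not come from the paper.

```python
def max_intensity_projection(stack):
    """Collapse a z-stack (a list of equally sized 2D slices) into one 2D image.

    For every (row, col) position, keep the maximum value found across
    all z-slices. Depth information is discarded by construction.
    """
    if not stack:
        raise ValueError("stack must contain at least one z-slice")
    rows, cols = len(stack[0]), len(stack[0][0])
    return [
        [max(z_slice[r][c] for z_slice in stack) for c in range(cols)]
        for r in range(rows)
    ]


# Hypothetical toy z-stack: 3 slices of 2x3 pixels (values are made up).
toy_stack = [
    [[0, 5, 1], [2, 0, 0]],
    [[3, 1, 1], [0, 9, 0]],
    [[1, 1, 7], [0, 0, 4]],
]

mip = max_intensity_projection(toy_stack)
print(mip)  # [[3, 5, 7], [2, 9, 4]]
```

Because the projection keeps only the per-pixel maximum, two objects that overlap along the z-axis merge into one bright region. This is the kind of depth information the abstract says the PEN representation encodes and a plain MIP does not.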

Subjects: cs.AI, cs.LG, cs.CE, cs.CL

1.Molecular Language Model as Multi-task Generator


Authors: Yin Fang, Ningyu Zhang, Zhuo Chen, Xiaohui Fan, Huajun Chen




Molecule generation with desired properties has grown immensely in popularity by disruptively changing the way scientists design molecular structures and providing support for chemical and materials design. However, despite the promising outcome, previous machine learning-based deep generative models suffer from a reliance on complex, task-specific fine-tuning, limited dimensional latent spaces, or the quality of expert rules. In this work, we propose MolGen, a pre-trained molecular language model that effectively learns and shares knowledge across multiple generation tasks and domains. Specifically, we pre-train MolGen with the chemical language SELFIES on more than 100 million unlabelled molecules. We further propose multi-task molecular prefix tuning across several molecular generation tasks and different molecular domains (synthetic & natural products) with a self-feedback mechanism. Extensive experiments show that MolGen can obtain superior performances on well-known molecular generation benchmark datasets. The further analysis illustrates that MolGen can accurately capture the distribution of molecules, implicitly learn their structural characteristics, and efficiently explore the chemical space with the guidance of multi-task molecular prefix tuning. Codes, datasets, and the pre-trained model will be available in this https

