Video Cloze Procedure for Self-Supervised Spatio-Temporal Learning

(AAAI 2020)

Dezhao Luo, Chang Liu, Yu Zhou, Dongbao Yang, Can Ma, Qixiang Ye, Weiping Wang

Notes

Paper: Video Cloze Procedure for Self-Supervised Spatio-Temporal Learning

Code: https://github.com/BestJuly/VCP

Category

Video Cloze Procedure for Self-Supervised Spatio-Temporal Learning

Contributions

Method

1. Representation Learning

2. Model Assessment

Result

Contributions

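1. The paper proposes VCP (Video Cloze Procedure), a self-supervised method for video. Its novelty: each input consists of m video clips, exactly one of which has one of 5 operations applied to it, and the model must predict which operation was applied.

[Because only one video clip is modified, the other m-1 unmodified clips serve as a reference.]

[Asking the model to predict which operation was applied, rather than to solve one specific task, lowers the learning difficulty. For example, with temporal shuffling, some self-supervised methods require the model to predict the shuffled order, whereas VCP only asks whether the clip underwent temporal shuffling or some other operation.]

2. The paper proposes using VCP as an evaluation protocol for other self-supervised models. (A creative idea, though parts of the experimental setup seem slightly unreasonable to me. This is only my own tentative opinion, and corrections are welcome.)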

Method

1. Representation Learning

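1. Video clip sampling. According to the paper, a k-frame clip is sampled every l frames, and m such consecutive clips are taken from one raw video. One of these m clips is then chosen at random and has an operation applied to it, while the others are left unchanged.

The concrete setting in the code and the paper: 3 consecutive video clips, each 16 frames long, sampled at an interval of 8 frames. One of the five operations is applied to the middle clip.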
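The sampling step can be sketched as follows. This is a minimal illustration rather than the repository's code: the function name `sample_clip_indices` is hypothetical, and it assumes that the interval of 8 frames is the gap between the end of one clip and the start of the next.

```python
import random

def sample_clip_indices(num_frames, num_clips=3, clip_len=16, interval=8, rng=random):
    """Return one list of frame indices per clip for `num_clips` consecutive clips.

    Assumption: `interval` is the gap (in frames) between the end of one
    clip and the start of the next.
    """
    span = num_clips * clip_len + (num_clips - 1) * interval
    if num_frames < span:
        raise ValueError(f"video needs at least {span} frames, got {num_frames}")
    start = rng.randint(0, num_frames - span)  # randint is inclusive on both ends
    clips = []
    for i in range(num_clips):
        first = start + i * (clip_len + interval)
        clips.append(list(range(first, first + clip_len)))
    return clips

# Under these assumptions, 3 clips of 16 frames with gaps of 8 span 64 frames.
clips = sample_clip_indices(num_frames=100)
middle_clip = clips[1]  # the clip that would receive one of the five operations
```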

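2. The five operations

original: the clip is left unchanged

spatial rotation: rotate the video clip by 90° / 180° / 270°

spatial permutation: split the video clip into 4 blocks along the spatial dimensions and swap any two of them, giving $C_{4}^{2} = 6$ variants

temporal remote shuffling: replace the video clip outright with another clip located far before or after it in time

temporal adjacent shuffling: split the video clip into 4 blocks along the temporal dimension and swap any two of them, giving $C_{4}^{2} = 6$ variants

Spatial permutation and temporal adjacent shuffling follow the same idea as "apply an operation to only one of the 3 video clips": only two of the four blocks are swapped, which lowers the difficulty while leaving the other two blocks as a reference.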
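Three of these operations can be sketched in NumPy as below. This is an illustrative sketch, not the repository's implementation: the function names, the `(T, H, W, C)` clip layout, and the 2x2 tiling for the spatial split are assumptions. Temporal remote shuffling is omitted because it only swaps in a different clip from the same video.

```python
import itertools
import numpy as np

# The C(4,2) = 6 possible pairs of blocks that can be swapped.
SWAPS = list(itertools.combinations(range(4), 2))

def spatial_rotation(clip, k):
    """Rotate every frame by k * 90 degrees; clip has shape (T, H, W, C).

    For odd k the output swaps H and W, so square frames are assumed.
    """
    return np.rot90(clip, k=k, axes=(1, 2)).copy()

def spatial_permutation(clip, pair):
    """Split each frame into a 2x2 grid of tiles and swap the two tiles in `pair`."""
    _, h, w, _ = clip.shape
    hh, hw = h // 2, w // 2
    tiles = [
        (slice(0, hh), slice(0, hw)), (slice(0, hh), slice(hw, 2 * hw)),
        (slice(hh, 2 * hh), slice(0, hw)), (slice(hh, 2 * hh), slice(hw, 2 * hw)),
    ]
    (ra, ca), (rb, cb) = tiles[pair[0]], tiles[pair[1]]
    out = clip.copy()
    out[:, ra, ca] = clip[:, rb, cb]
    out[:, rb, cb] = clip[:, ra, ca]
    return out

def temporal_adjacent_shuffle(clip, pair):
    """Split the clip into 4 temporal chunks and swap the two chunks in `pair`."""
    chunks = np.array_split(np.arange(clip.shape[0]), 4)
    chunks[pair[0]], chunks[pair[1]] = chunks[pair[1]], chunks[pair[0]]
    return clip[np.concatenate(chunks)]

clip = np.random.rand(16, 8, 8, 3)          # toy 16-frame clip
rotated = spatial_rotation(clip, k=1)       # 90 degrees
permuted = spatial_permutation(clip, SWAPS[0])
shuffled = temporal_adjacent_shuffle(clip, SWAPS[0])
```

Each of the two swap-based operations draws one of the six pairs in `SWAPS`, which is where the count $C_{4}^{2} = 6$ comes from.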

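3. Training

The 3 video clips (the middle one having had an operation applied) are fed one by one into a 3D-CNN, and each clip yields a fixed-dimensional representation. The representations are concatenated and passed to a final 5-way linear classifier, which predicts which of the five operations was applied.

Note that 3D-CNN here does not denote one specific network but a family of 3D convolutional networks, e.g. C3D, R3D-18, R(2+1)D-18.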
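The data flow of this step (shared encoder, concatenation, 5-way linear classifier) can be sketched with NumPy. The encoder below is only a hypothetical stand-in (global average pooling plus a fixed random projection) so that the shapes can be followed; the feature size of 512 and the toy clip size are also assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
FEAT_DIM, NUM_CLIPS, NUM_OPS = 512, 3, 5   # FEAT_DIM is an assumed value

# Hypothetical stand-in for the 3D-CNN backbone. A real backbone would be
# C3D, R3D-18 or R(2+1)D-18; this one just pools and projects.
W_enc = rng.standard_normal((3, FEAT_DIM))

def encode(clip):
    """Map one clip of shape (T, H, W, 3) to a FEAT_DIM-dimensional vector."""
    return clip.mean(axis=(0, 1, 2)) @ W_enc

# Linear classifier over the concatenated clip features.
W_cls = rng.standard_normal((NUM_CLIPS * FEAT_DIM, NUM_OPS)) * 0.01
b_cls = np.zeros(NUM_OPS)

def classify(clips):
    """Encode each clip with the same encoder, concatenate, and score the 5 operations."""
    feats = np.concatenate([encode(c) for c in clips])  # shape (3 * FEAT_DIM,)
    return feats @ W_cls + b_cls                         # shape (5,), one logit per operation

clips = [rng.random((16, 32, 32, 3)) for _ in range(NUM_CLIPS)]
logits = classify(clips)
predicted_op = int(np.argmax(logits))
```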

2. Model Assessment

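Procedure:

After another self-supervised model has been trained, freeze its backbone parameters and replace its original head with VCP's 5-way linear classifier. Then feed in data prepared the VCP way (generate 3 video clips and apply one of the five operations to one of them).

In effect, the encoder trained by the other self-supervised method is used to represent the three video clips, and VCP's 5-way linear classifier measures the quality of that representation.

As the figure shows, VCOP is a self-supervised method based on recognizing the temporal order of video clips, so its classification accuracy is higher on the temporal operations and lower on the spatial ones.

What I don't quite understand here: if VCP serves as the evaluation model for other self-supervised methods, how can it also be compared against its own method?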
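The protocol amounts to a linear probe on frozen features, which can be sketched as below. Everything here is a toy illustration under stated assumptions: the features are synthetic Gaussian clusters standing in for the frozen backbone's concatenated clip features, and the helper names, learning rate, and epoch count are hypothetical. Freezing is implicit, since the features are computed once and only the classifier weights are updated.

```python
import numpy as np

def train_linear_probe(feats, labels, num_classes=5, lr=0.01, epochs=200):
    """Fit a softmax linear classifier on fixed features with plain gradient descent."""
    n, d = feats.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(epochs):
        logits = feats @ W + b
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n                   # gradient of mean cross-entropy w.r.t. logits
        W -= lr * feats.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

def per_class_accuracy(feats, labels, W, b, num_classes=5):
    """Accuracy for each of the five operation classes separately."""
    pred = (feats @ W + b).argmax(axis=1)
    return [float((pred[labels == c] == c).mean()) for c in range(num_classes)]

# Synthetic stand-in for frozen-backbone features: one cluster per operation.
rng = np.random.default_rng(0)
labels = rng.integers(0, 5, size=500)
centers = rng.standard_normal((5, 32)) * 3.0
feats = centers[labels] + rng.standard_normal((500, 32))

W, b = train_linear_probe(feats, labels)
acc = per_class_accuracy(feats, labels, W, b)
```

Reporting accuracy per operation, as `per_class_accuracy` does, is what makes it possible to see whether a backbone represents spatial or temporal changes better.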

Result

(The video retrieval pipeline here is essentially the same as in ReID!)
