CVPR2020 Harmonizing Transferability and Discriminability for Adapting Object Detector
Code: https://github.com/chaoqichen/HTCN
Abstract
The paper designs three modules that align the source and target domains while keeping the detector able to discriminate features from both domains.
(1) Importance Weighted Adversarial Training with input Interpolation (IWAT-I), which strengthens the global discriminability by re-weighting the interpolated image-level features; (2) Context-aware Instance-Level Alignment (CILA) module, which enhances the local discriminability by capturing the underlying complementary effect between the instance-level feature and the global context information for the instance-level feature alignment; (3) local feature masks that calibrate the local transferability to provide semantic guidance for the following discriminative pattern alignment.
Introduction
Because of the domain shift between the source and target domains, an object detector trained on the source domain does not generalize well to the target domain.
This hinders the deployment of models in real-world situations where data distributions typically vary from one domain to another. Unsupervised Domain Adaptation (UDA) serves as a promising solution to solve this problem by transferring knowledge from a labeled source domain to a fully unlabeled target domain.
Previous methods for domain-adaptive object detection incorporate adversarial learning into conventional detection frameworks and align image features at multiple levels to make the model transferable. The authors argue that transferability learned in this adversarial manner comes at the cost of reduced discriminability. The paper defines transferability and discriminability as follows:
Note that, in this paper, the transferability refers to the invariance of the learned representations across domains, and discriminability refers to the ability of the detector to localize and distinguish different instances.
Related Work
Unsupervised Domain Adaptation
Typically, UDA methods propose to bridge different domains by matching the high-order statistics of source and target feature distributions in the latent space.
With insights from the practice of Generative Adversarial Nets (GAN), a tremendous amount of work has leveraged the two-player game to achieve domain confusion, using a Gradient Reversal Layer (GRL) for feature alignment.
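As a reminder of how GRL-based domain confusion works, here is a minimal framework-free sketch; the class name and structure are illustrative, not from any particular library:

```python
class GradReverse:
    """Gradient Reversal Layer: identity in the forward pass, gradient
    negated (and scaled by lam) in the backward pass, so training the
    domain classifier pushes the feature extractor toward domain confusion."""

    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        # pass features through unchanged
        return x

    def backward(self, grad):
        # reverse the gradient flowing back to the feature extractor
        return -self.lam * grad
```

In PyTorch this is typically implemented as a custom `torch.autograd.Function` whose `backward` returns the negated incoming gradient.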
In addition, other GAN-based works aim to achieve pixel-level adaptation by virtue of image-to-image translation techniques.
UDA for Object Detection
Nevertheless, these UDA methods do not properly handle the potential contradiction between transferability and discriminability when adapting object detectors in the context of adversarial adaptation.
Hierarchical Transferability Calibration Network (HTCN)
Because transferability and discriminability are partly in conflict, the authors attack the problem from two directions: (1) calibrating transferability by matching features at multiple levels; (2) propagating information across feature levels to achieve cross-domain alignment.
Importance Weighted Adversarial Training with Input Interpolation
CycleGAN is first used to interpolate between source- and target-domain images. Since different input samples carry different amounts of transferable information, the uncertainty of each sample is computed and converted into a re-weighting coefficient that characterizes its transferability.
In the IWAT-I module, the output of the domain discriminator $D_2$ is $d_i = D_2(G_1 \circ G_2(x_i))$, which predicts the domain label of the input sample $x_i$. Its uncertainty is computed as

$v_i = H(d_i) = -d_i \cdot \log(d_i) - (1-d_i) \cdot \log(1-d_i) \qquad (1)$

and the weight of each sample is $1 + v_i$.
Images with high uncertainty (hard to distinguish by $D_2$) should be up-weighted, and vice versa.
The re-weighted feature of the sample is

$g_i = f_i \times (1 + v_i) \qquad (2)$
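The re-weighting in Eqs. (1)–(2) can be sketched in NumPy as follows; the helper name `importance_weight` and the feature shapes are illustrative, not from the released code:

```python
import numpy as np

def importance_weight(d):
    """Per-image weight 1 + v_i, where v_i is the entropy of the domain
    discriminator output d_i in (0, 1) (Eqs. 1-2)."""
    d = np.clip(d, 1e-7, 1 - 1e-7)   # guard against log(0)
    v = -d * np.log(d) - (1 - d) * np.log(1 - d)
    return 1.0 + v

# ambiguous images (d close to 0.5) get the largest weight
w = importance_weight(np.array([0.5, 0.9, 0.1]))
g = np.ones((3, 8)) * w[:, None]     # g_i = f_i * (1 + v_i) on dummy features
```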
The adversarial loss for $D_3$ is then

$\mathcal{L}_{ga} = \mathbb{E}[\log(D_3(G_3(g_i^s)))] + \mathbb{E}[\log(1 - D_3(G_3(g_i^t)))] \qquad (3)$
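The global adversarial term in Eq. (3) can be sketched directly on discriminator outputs; `global_adv_loss` is a hypothetical helper operating on probabilities in (0, 1), assuming the standard GAN log form:

```python
import numpy as np

def global_adv_loss(d_src, d_tgt, eps=1e-7):
    """Eq. (3): E[log D3(g^s)] + E[log(1 - D3(g^t))] over batches of
    discriminator outputs for the re-weighted source/target features."""
    d_src = np.clip(d_src, eps, 1 - eps)
    d_tgt = np.clip(d_tgt, eps, 1 - eps)
    return np.mean(np.log(d_src)) + np.mean(np.log(1 - d_tgt))
```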
Context-Aware Instance-Level Alignment
Previous instance-level alignment aligns local features after ROI-Pooling, mainly matching object scale, viewpoint, deformation, and appearance across domains. The problem is that these features only capture the local region around each object and ignore the overall context. Instance features therefore still differ across domains, whereas context vectors, fused from low-level features, tend to be domain-invariant; the authors fuse the two so that they complement each other.
Context vectors $f_c^i\,(i=1,2,3)$ are obtained from different levels of the backbone, and $f_{ins}^{i,j}$ denotes the feature of the $j$-th region of the $i$-th image after ROI-Pooling. Concatenating them as $[f_c^1, f_c^2, f_c^3, f_{ins}]$ fuses the instance features with the context vectors, but the context vectors and instance-level features remain independent in this form, so the intended complementarity is not achieved. Because the two features are asymmetric, multiplying $f_c^i\,(i=1,2,3)$ with $f_{ins}$ as below causes a dimension explosion:

$\boldsymbol{f}_{fus} = [f_c^1, f_c^2, f_c^3] \otimes f_{ins} \qquad (4)$
The authors instead use randomized methods as an unbiased estimator of the tensor product:

$\boldsymbol{f}_{fus} = \frac{1}{\sqrt{d}}(\boldsymbol{R}_1 \boldsymbol{f}_c) \odot (\boldsymbol{R}_2 \boldsymbol{f}_{ins}) \qquad (5)$

where $\odot$ denotes the Hadamard product.
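Eq. (5) can be sketched as follows; the dimensions and the fixed Gaussian sampling of $\boldsymbol{R}_1, \boldsymbol{R}_2$ are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d_c, d_ins, d = 48, 32, 64   # context dim, instance dim, output dim (illustrative)

# R1, R2 are sampled once and kept fixed (not learned)
R1 = rng.standard_normal((d, d_c))
R2 = rng.standard_normal((d, d_ins))

def randomized_fusion(f_c, f_ins):
    """Randomized unbiased estimate of the tensor product (Eq. 5):
    f_fus = (R1 f_c) ⊙ (R2 f_ins) / sqrt(d)."""
    return (R1 @ f_c) * (R2 @ f_ins) / np.sqrt(d)

f_fus = randomized_fusion(rng.standard_normal(d_c), rng.standard_normal(d_ins))
```

The fused feature stays $d$-dimensional, instead of the $d_c \times d_{ins}$ entries a full tensor product would require.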
The CA-ILA loss is

$\mathcal{L}_{ins} = -\frac{1}{N_s}\sum_{i=1}^{N_s}\sum_{i,j}\log(D_{ins}(\boldsymbol{f}_{fus}^{i,j})_s) - \frac{1}{N_t}\sum_{i=1}^{N_t}\sum_{i,j}\log(1 - D_{ins}(\boldsymbol{f}_{fus}^{i,j})_t) \qquad (6)$
Local Feature Mask for Semantic Consistency
Although images from different domains differ in scene layout, object co-occurrence, and background, objects of the same class should share the same sketch across domains. The authors assume that some image regions are more descriptive and dominant than others, so they compute a mask over local features on top of the shallow features to guide the subsequent semantic consistency.
The pixel-wise adversarial loss for $G_1$ is

$\mathcal{L}_{la} = \frac{1}{N_s \cdot HW}\sum_{i=1}^{N_s}\sum_{k=1}^{HW} D_1(G_1(x_i^s))_k^2 + \frac{1}{N_t \cdot HW}\sum_{i=1}^{N_t}\sum_{k=1}^{HW} \left(1 - D_1(G_1(x_i^t))_k\right)^2 \qquad (7)$
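In its least-squares form, this pixel-wise loss reduces to a one-liner over the discriminator maps; this is a sketch where `d_src`/`d_tgt` hold per-location outputs of $D_1$ in (0, 1):

```python
import numpy as np

def pixelwise_ls_loss(d_src, d_tgt):
    """Eq. (7): mean squared D1 response over source locations plus
    mean squared (1 - D1) response over target locations."""
    return np.mean(d_src ** 2) + np.mean((1.0 - d_tgt) ** 2)

# the loss is zero when D1 outputs 0 at every source location
# and 1 at every target location
loss = pixelwise_ls_loss(np.zeros((2, 16, 16)), np.ones((2, 16, 16)))
```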
The feature masks $m_f^s$ and $m_f^t$ for the source and target domains are computed from the uncertainty of $D_1$. With $r_i^k = (G_1(x_i))_k$ denoting the feature extracted by $G_1$ and $d_i^k = D_1(r_i^k)$ the output of $D_1$, the uncertainty of each region is $v(r_i^k) = H(d_i^k)$, computed as in Eq. (1). The feature mask is then $m_f^k = 2 - v(r_i^k)$, and the re-weighted feature is $\widetilde{r}_i^k \leftarrow r_i^k \cdot m_f^k$.
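The mask computation and re-weighting can be sketched as follows; the array shapes and channel-wise broadcasting are illustrative assumptions:

```python
import numpy as np

def local_feature_mask(d_map, eps=1e-7):
    """m_f^k = 2 - v(r^k), where v is the entropy of the per-location
    D1 output d^k: confidently classified locations get weights near 2,
    maximally uncertain ones near 2 - log 2."""
    d = np.clip(d_map, eps, 1 - eps)
    v = -d * np.log(d) - (1 - d) * np.log(1 - d)
    return 2.0 - v

r = np.ones((64, 32, 32))                       # C x H x W features from G1 (illustrative)
m = local_feature_mask(np.full((32, 32), 0.5))  # one D1 output per spatial location
r_tilde = r * m                                 # r~^k <- r^k · m_f^k, broadcast over channels
```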
The adversarial loss for $D_2$ is

$\mathcal{L}_{ma} = \mathbb{E}[\log(D_2(G_2(\widetilde{f}_i^s)))] + \mathbb{E}[\log(1 - D_2(G_2(\widetilde{f}_i^t)))] \qquad (8)$
Training Loss
With $\mathcal{L}_{cls}$ and $\mathcal{L}_{reg}$ denoting the detection losses, the overall objective is

$\max\limits_{D_1,D_2,D_3}\min\limits_{G_1,G_2,G_3} \mathcal{L}_{cls} + \mathcal{L}_{reg} - \lambda(\mathcal{L}_{la} + \mathcal{L}_{ma} + \mathcal{L}_{ga} + \mathcal{L}_{ins}) \qquad (9)$
Experiments
Cityscapes to Foggy-Cityscapes
PASCAL VOC to Clipart
Sim10K to Cityscapes
Ablation Study