Original paper: Taking A Closer Look at Domain Shift: Category-level Adversaries for Semantics Consistent Domain Adaptation

Authors: Yawei Luo, Liang Zheng, Tao Guan, Junqing Yu, Yi Yang









Total loss = L_adv + L_seg



1) Definition: a form of multi-view learning that builds two different classifiers and uses a small labeled corpus to label a large unlabeled corpus


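3) Algorithm: assume the data has two feature representations, for example image features (X1, Y1) and text features (X2, Y2). The unlabeled data has the same two views. The algorithm is:

  1. On the labeled set, train two classifiers C1 and C2 from (X1, Y1) and (X2, Y2) respectively
  2. Use C1 and C2 separately to predict the unlabeled data
  3. Add the K samples that C1 predicts with the highest confidence to the training set of C2
  4. Add the K samples that C2 predicts with the highest confidence to the training set of C1
  5. Go back to step 1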
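
The five steps above can be sketched in a few lines of Python. This is only an illustration of the loop, not code from the paper: the nearest-centroid classifier, the margin-based confidence, and the names `train_centroids`, `predict`, and `co_train` are all assumptions chosen to keep the example self-contained.

```python
import math

def train_centroids(X, y):
    """Toy classifier (assumption): one mean vector per class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}

def predict(centroids, x):
    """Return (label, confidence); confidence = gap between the two nearest centroids."""
    dists = sorted((math.dist(x, mu), c) for c, mu in centroids.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else 0.0
    return dists[0][1], margin

def co_train(L1, L2, y, U1, U2, k=1, rounds=3):
    """L1/L2: labeled samples in view 1/2; U1/U2: the same unlabeled samples in view 1/2."""
    X1, y1 = list(L1), list(y)    # training set of C1
    X2, y2 = list(L2), list(y)    # training set of C2
    pool = list(range(len(U1)))   # indices of samples that are still unlabeled
    for _ in range(rounds):
        c1 = train_centroids(X1, y1)                      # step 1
        c2 = train_centroids(X2, y2)
        if not pool:
            break
        p1 = {i: predict(c1, U1[i]) for i in pool}        # step 2
        p2 = {i: predict(c2, U2[i]) for i in pool}
        top1 = sorted(pool, key=lambda i: -p1[i][1])[:k]  # most confident for C1
        top2 = sorted(pool, key=lambda i: -p2[i][1])[:k]  # most confident for C2
        for i in top1:                                    # step 3: C1 teaches C2
            X2.append(U2[i])
            y2.append(p1[i][0])
        for i in top2:                                    # step 4: C2 teaches C1
            X1.append(U1[i])
            y1.append(p2[i][0])
        used = set(top1) | set(top2)
        pool = [i for i in pool if i not in used]         # step 5: repeat
    return train_centroids(X1, y1), train_centroids(X2, y2)
```

Each classifier only ever sees pseudo-labels produced by the other one, which is what lets the two views correct each other.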


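  1. Ways to obtain the two views: dropout [33], consensus regularization [34], or parameter diversity
  2. Confidence: at each iteration, a probability distribution of the labeled data can be estimated from the training set (it can be assumed to be normal) and used to judge how closely the unlabeled data fits that distribution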

Role: should capture the essential aspect of a pixel across the source and target domains.



1) Generator G: can be any FCN-based segmentation network

Components: a feature extractor E and two classifiers C1 and C2

Backbones: ResNet-101, VGG-16 based FCN8s

2) Discriminator D: a CNN-based binary classifier with a fully-convolutional output

It consists of 5 convolution layers with 4 × 4 kernels, channel numbers {64, 128, 256, 512, 1}, and a stride of 2.

Each convolution layer except the last is followed by a Leaky-ReLU parameterized by 0.2.
An up-sampling layer is added after the last layer to rescale the output to the size of the input map.
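
To see why the final up-sampling layer is needed, the sketch below traces the spatial size through the five stride-2 convolutions. The padding of 1, the 19 input channels (one per class), and the 512 × 1024 input size are assumptions made for illustration; the notes above do not state them. The helper names `conv_out` and `discriminator_shapes` are hypothetical.

```python
def conv_out(size, kernel=4, stride=2, padding=1):
    """Output length of a convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1

def discriminator_shapes(height, width, in_channels, padding=1):
    """List (in_ch, out_ch, out_h, out_w, activation) for each of the 5 layers."""
    channels = [64, 128, 256, 512, 1]   # channel numbers given in the notes
    layers = []
    h, w, c_in = height, width, in_channels
    for idx, c_out in enumerate(channels):
        h = conv_out(h, padding=padding)
        w = conv_out(w, padding=padding)
        # Every layer except the last is followed by Leaky-ReLU(0.2).
        act = "LeakyReLU(0.2)" if idx < len(channels) - 1 else "none"
        layers.append((c_in, c_out, h, w, act))
        c_in = c_out
    return layers

# Assumed input: a 19-class prediction map of size 512 x 1024.
for layer in discriminator_shapes(512, 1024, in_channels=19):
    print(layer)
```

Under these assumptions each layer halves the resolution, so the single-channel output is 32 times smaller than the input along each axis and has to be up-sampled back to give one domain score per input pixel.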


There are three losses in total: 1) the segmentation loss, 2) the weight discrepancy loss, 3) the self-adaptive adversarial loss

1) Segmentation loss (multi-class cross-entropy loss)

p_ic denotes the predicted probability of class c on pixel i.
y_ic denotes the ground-truth probability of class c on pixel i.
If pixel i belongs to class c, y_ic = 1; otherwise y_ic = 0.
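
The equation itself did not survive in these notes. From the definitions above, the standard multi-class cross-entropy would read as follows; the image size H × W and the class count C are symbols assumed here, and the normalization (sum versus mean over pixels) may differ from the paper.

```latex
L_{seg} = \sum_{i=1}^{H \times W} \sum_{c=1}^{C} - y_{ic} \log p_{ic}
```

Because y_ic is one-hot, only the log-probability of the true class contributes at each pixel.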

2) The weight discrepancy loss: minimizes the cosine similarity of the weights

w1 and w2 are obtained by flattening and concatenating the weights of the convolution filters of C1 and C2

Purpose: keep C1 and C2 diverse, so that they provide two different views
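
A minimal sketch of this loss, assuming it is exactly the cosine similarity between the two flattened weight vectors. The nested-list representation and the names `flatten` and `weight_discrepancy_loss` are hypothetical.

```python
import math

def flatten(weights):
    """Flatten arbitrarily nested lists of numbers into one flat list."""
    if isinstance(weights, (int, float)):
        return [float(weights)]
    flat = []
    for item in weights:
        flat.extend(flatten(item))
    return flat

def weight_discrepancy_loss(filters_c1, filters_c2):
    """Cosine similarity of the flattened, concatenated filter weights of C1 and C2."""
    w1, w2 = flatten(filters_c1), flatten(filters_c2)
    dot = sum(a * b for a, b in zip(w1, w2))
    norm = math.sqrt(sum(a * a for a in w1)) * math.sqrt(sum(b * b for b in w2))
    return dot / norm
```

Minimizing this value pushes the two weight vectors away from each other in direction: identical filters give a loss of 1, orthogonal filters give 0.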

3) The self-adaptive adversarial loss


p(1) and p(2) are the predictions made by C1 and C2.
M(·, ·) denotes the cosine distance: it is used to assess the consistency of the prediction maps produced by C1 and C2, and is computed pixel by pixel.
λ_local controls the adaptive weight for the adversarial loss.
A small number: added to stabilize the training process.
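
The sketch below shows how such a per-pixel weight could be computed. It assumes the weight has the form λ_local · M(p(1), p(2)) + ε, where ε stands for the small stabilizing number (its symbol is missing in these notes), and the values 10.0 and 0.4 in the usage lines are illustrative only. The function names are hypothetical.

```python
import math

def cosine_distance(p, q):
    """1 - cosine similarity between two class-probability vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / norm

def adaptive_weight_map(pred1, pred2, lambda_local, eps):
    """pred1/pred2: H x W x C nested lists of probabilities from C1 and C2.

    Returns an H x W map of weights (assumed form: lambda_local * M + eps).
    """
    return [
        [lambda_local * cosine_distance(p, q) + eps for p, q in zip(row1, row2)]
        for row1, row2 in zip(pred1, pred2)
    ]

# Two pixels: the classifiers agree on the first and disagree on the second.
pred1 = [[[0.9, 0.1], [0.5, 0.5]]]
pred2 = [[[0.9, 0.1], [0.1, 0.9]]]
weights = adaptive_weight_map(pred1, pred2, lambda_local=10.0, eps=0.4)
```

Pixels where C1 and C2 agree keep only the small baseline weight, while pixels where they disagree receive a larger adversarial weight.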





Cityscapes: a real-world dataset with 5,000 street scenes
GTA5: 24,966 high-resolution images compatible with the Cityscapes annotated classes. 
SYNTHIA: 9,400 synthetic images

Transfer tasks: SYNTHIA → Cityscapes and GTA5 → Cityscapes


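Source only: no adaptation; the model is trained on the source domain and then used to predict on the target domain

Several source-only rows: one for each backbone used by the methods below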





