See Better Before Looking Closer: Weakly Supervised Data Augmentation Network for Fine-Grained Visual Classification
Paper PDF

Table of Contents

  • Abstract
  • Innovation
  • Pipeline
    • Weakly Supervised Attention Learning
      • Spatial Representation
      • Bilinear Attention Pooling (BAP)
      • Attention Regularization
    • Attention-guided Data Augmentation
      • Augmentation Map
      • Attention Cropping
      • Attention Dropping
    • Object Localization and Refinement
  • Experiments
    • Ablation:
    • Comparison with random data augmentation
    • Comparison with State-of-the-Art Methods

Abstract

In practice, random data augmentation, such as random image cropping, is inefficient and may introduce uncontrolled background noise. In this paper, the authors propose the Weakly Supervised Data Augmentation Network (WS-DAN) to explore the potential of data augmentation. Specifically, for each training image they first generate attention maps that represent the object’s discriminative parts via weakly supervised learning. Next, they augment the image guided by these attention maps, including attention cropping and attention dropping. WS-DAN improves classification accuracy in two ways. In the first stage, the model can “see better”, since features from more discriminative parts are extracted. In the second stage, the attention regions provide an accurate location of the object, which lets the model look closer at the object and further improves performance.

In summary, the main contributions of this work are:

  1. They propose Weakly Supervised Attention Learning to generate attention maps that represent the spatial distribution of discriminative object parts, and use a BAP module to obtain the whole-object feature by accumulating the part features.
  2. Based on the attention maps, they propose attention-guided data augmentation to improve the efficiency of data augmentation, including attention cropping and attention dropping. Attention cropping randomly crops and resizes one attention part to enhance the local feature representation. Attention dropping randomly erases one attention region from the image to encourage the model to extract features from multiple discriminative parts.
  3. They utilize attention maps to accurately locate the whole object and enlarge it to further improve the classification accuracy.

Innovation

  1. Bilinear Attention Pooling (BAP)
  2. Attention Regularization
  3. Attention-guided Data Augmentation

Pipeline


The training process can be divided into two parts: Weakly Supervised Attention Learning and Attention-guided Data Augmentation:

Weakly Supervised Attention Learning

Spatial Representation

Attention maps $A$ are obtained from the feature maps $F$ by a convolutional function $f(\cdot)$, as in Equ 1. Each attention map $A_k$ represents one object part or visual pattern, such as the head of a bird, the wheel of a car, or the wing of an aircraft. The attention maps will later be used to augment the training data.

$$A = f(F) = \bigcup_{k=1}^{M} A_{k} \tag{1}$$

Bilinear Attention Pooling (BAP)


They propose Bilinear Attention Pooling (BAP) to extract features from the parts represented by the attention maps. The feature maps $F$ are element-wise multiplied by each attention map $A_k$ to generate $M$ part feature maps $F_k$, as shown in Equ 2.

$$F_{k} = A_{k} \odot F \quad (k = 1, 2, \ldots, M) \tag{2}$$

Then, they further extract a discriminative local feature by an additional feature extraction function $g(\cdot)$, such as Global Average Pooling (GAP), Global Maximum Pooling (GMP), or convolutions, to obtain the $k$-th attention feature $f_k$.

$$f_{k} = g(F_{k}) \tag{3}$$

The object feature is represented by the part feature matrix $P$, which stacks these part features $f_k$.

$$P = \begin{pmatrix} g(A_{1} \odot F) \\ g(A_{2} \odot F) \\ \vdots \\ g(A_{M} \odot F) \end{pmatrix} = \begin{pmatrix} f_{1} \\ f_{2} \\ \vdots \\ f_{M} \end{pmatrix} \tag{4}$$
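The pooling above can be sketched in NumPy with GAP as $g(\cdot)$; the tensor shapes (768-channel 26×26 feature maps, $M = 32$ attention maps) are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

def bilinear_attention_pooling(F, A):
    """Bilinear Attention Pooling with GAP as g(.).

    F: feature maps, shape (C, H, W)
    A: attention maps, shape (M, H, W)
    Returns the part-feature matrix P of shape (M, C).
    """
    parts = []
    for k in range(A.shape[0]):
        Fk = A[k][None, :, :] * F      # Equ 2: F_k = A_k element-wise F
        fk = Fk.mean(axis=(1, 2))      # Equ 3: g(.) = Global Average Pooling
        parts.append(fk)
    return np.stack(parts)             # Equ 4: stack f_1 ... f_M into P

F = np.random.rand(768, 26, 26)        # assumed backbone feature shape
A = np.random.rand(32, 26, 26)         # assumed M = 32 attention maps
P = bilinear_attention_pooling(F, A)
print(P.shape)  # (32, 768)
```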

Attention Regularization

For each fine-grained category, they expect attention map $A_k$ to consistently represent the same $k$-th object part. They penalize the variance of features that belong to the same object part, which means that part feature $f_k$ is pulled toward a global feature center $c_k$ and attention map $A_k$ is activated on the same $k$-th object part. The loss function $L_A$ is given in Equ 5.

$$L_{A} = \sum_{k=1}^{M} \left\| f_{k} - c_{k} \right\|_{2}^{2} \tag{5}$$

$c_k$ is initialized to zero and updated by Equ 6.

$$c_{k} \leftarrow c_{k} + \beta(f_{k} - c_{k}) \tag{6}$$
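The center loss and moving-average update can be sketched as follows; the value of `beta` here is a placeholder assumption (the paper treats it as a hyperparameter):

```python
import numpy as np

def attention_regularization(parts, centers, beta=0.5):
    """Center loss of Equ 5 with the moving-average update of Equ 6.

    parts:   part features f_k for one image, shape (M, C)
    centers: global feature centers c_k, shape (M, C), initialized to zero
    beta:    update rate (placeholder value)
    """
    loss = np.sum((parts - centers) ** 2)          # Equ 5: sum_k ||f_k - c_k||^2
    centers = centers + beta * (parts - centers)   # Equ 6: c_k <- c_k + beta(f_k - c_k)
    return loss, centers

parts = np.ones((4, 8))
centers = np.zeros((4, 8))
loss, centers = attention_regularization(parts, centers)
print(loss)  # 32.0
```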

Attention-guided Data Augmentation

Random image cropping is inefficient: a high percentage of the crops contain background noise, which can lower training efficiency, degrade the quality of the extracted features, and cancel out the benefits of augmentation. Using attention as a guide, the cropped images can focus more on the target.

Augmentation Map

For each training image, they randomly choose one of its attention maps $A_k$ to guide the data augmentation process, and normalize it into the $k$-th augmentation map $A_k^{*}$.

$$A_{k}^{*} = \frac{A_{k} - \min(A_{k})}{\max(A_{k}) - \min(A_{k})} \tag{7}$$
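A direct sketch of Equ 7; the small `eps` guard against a constant attention map is an added assumption, not part of the paper:

```python
import numpy as np

def augmentation_map(A_k, eps=1e-12):
    """Min-max normalize one attention map to [0, 1] (Equ 7)."""
    return (A_k - A_k.min()) / (A_k.max() - A_k.min() + eps)

A_k = np.array([[0.0, 2.0],
                [4.0, 8.0]])
A_star = augmentation_map(A_k)
print(A_star.min(), A_star.max())  # ~0.0, ~1.0
```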

Attention Cropping

The crop mask $C_k$ is obtained from $A_k^{*}$ by setting each element $A_k^{*}(i,j)$ greater than threshold $\theta_c$ to 1, and the others to 0, as represented in Equ 8.

$$C_{k}(i,j) = \begin{cases} 1, & \text{if } A_{k}^{*}(i,j) > \theta_{c} \\ 0, & \text{otherwise} \end{cases} \tag{8}$$

They then find a bounding box $B_k$ that covers the whole positive region of $C_k$, crop this region from the raw image, and enlarge it as the augmented input data.
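A sketch of the cropping step, under the simplifying assumption that $A_k^{*}$ has already been upsampled to the image resolution; the $\theta_c$ value is a placeholder:

```python
import numpy as np

def attention_crop(image, A_star, theta_c=0.5):
    """Equ 8: threshold the augmentation map into crop mask C_k, take the
    bounding box B_k of its positive region, and crop it from the image.
    (Resizing the crop back to the input size is omitted here.)
    """
    C_k = A_star > theta_c
    ys, xs = np.nonzero(C_k)
    y0, y1 = ys.min(), ys.max() + 1   # bounding box B_k covering the mask
    x0, x1 = xs.min(), xs.max() + 1
    return image[y0:y1, x0:x1]

image = np.zeros((8, 8, 3))
A_star = np.zeros((8, 8))
A_star[2:5, 3:7] = 1.0                # fake discriminative region
crop = attention_crop(image, A_star)
print(crop.shape)  # (3, 4, 3)
```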

Attention Dropping

To encourage the attention maps to represent multiple discriminative object parts, they propose attention dropping. Specifically, the drop mask $D_k$ is obtained by setting each element $A_k^{*}(i,j)$ greater than threshold $\theta_d$ to 0, and the others to 1, as shown in Equ 9.

$$D_{k}(i,j) = \begin{cases} 0, & \text{if } A_{k}^{*}(i,j) > \theta_{d} \\ 1, & \text{otherwise} \end{cases} \tag{9}$$
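Attention dropping can be sketched the same way; the threshold value and the zero fill are assumptions (again assuming $A_k^{*}$ is already at image resolution):

```python
import numpy as np

def attention_drop(image, A_star, theta_d=0.5):
    """Equ 9: zero out the region where the augmentation map exceeds
    theta_d, so the network must rely on other discriminative parts."""
    D_k = (A_star <= theta_d).astype(image.dtype)   # drop mask
    return image * D_k[:, :, None]                  # broadcast over channels

image = np.ones((4, 4, 3))
A_star = np.zeros((4, 4))
A_star[1:3, 1:3] = 0.9                # fake attention peak to be erased
dropped = attention_drop(image, A_star)
print(dropped.sum())  # 36.0 (the 2x2 peak is zeroed in all 3 channels)
```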

Object Localization and Refinement

In the testing process, after the model outputs the coarse-stage classification result and the corresponding attention maps for the raw image, the whole object region can be predicted, enlarged, and fed through the same network to produce the fine-grained result. The object map $A_m$, which indicates the location of the object, is calculated by Equ 10.

$$A_{m} = \frac{1}{M} \sum_{k=1}^{M} A_{k} \tag{10}$$

The final classification result is the average of the coarse-grained and fine-grained predictions. The detailed coarse-to-fine prediction procedure is described in the algorithm below:
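The coarse-to-fine testing procedure can be sketched as follows, with a hypothetical `model` callable that returns class probabilities plus image-sized attention maps; the localization threshold is an assumption:

```python
import numpy as np

def coarse_to_fine_predict(model, image, theta=0.5):
    """Coarse-to-fine testing: average the attention maps into the object
    map A_m (Equ 10), crop the object region, re-classify it, and average
    the two predictions."""
    p_coarse, A = model(image)
    A_m = A.mean(axis=0)                      # Equ 10: object map
    ys, xs = np.nonzero(A_m > theta * A_m.max())
    region = image[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    p_fine, _ = model(region)                 # "look closer" at the object
    return (p_coarse + p_fine) / 2

def dummy_model(img):                          # stand-in for the real network
    return np.array([0.6, 0.4]), np.ones((2,) + img.shape[:2])

p = coarse_to_fine_predict(dummy_model, np.zeros((8, 8, 3)))
print(p)  # [0.6 0.4]
```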

Experiments

Ablation:

Comparison with random data augmentation


Comparison with State-of-the-Art Methods

