Learning Transferable Architectures for Scalable Image Recognition: A Brief Summary

This is a 2018 paper from Google on searching for optimal neural network architectures.

Its core idea is to refine and improve on an earlier paper, Neural Architecture Search With Reinforcement Learning.

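In deep learning tasks such as image recognition and image classification, the model architecture is critical, as the designs of VGG, ResNet, DenseNet and others show. But is a hand-designed architecture necessarily the best one? Not necessarily. The goal, therefore, is to let a neural network learn the best architecture by following some search strategy.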

First, a brief word about the paper Neural Architecture Search With Reinforcement Learning.

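In hindsight, that paper's approach is fairly crude: it uses reinforcement learning to learn an optimal network architecture. In short, a controller samples an architecture (the child network) from a search space; the child network is trained on a dataset to obtain its accuracy; that accuracy is fed back to the controller; and the updated controller proposes another architecture. This repeats until the best result is found.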
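The loop just described can be sketched as a small Python function. This is a minimal illustration, not the paper's code: `architecture_search`, `sample`, `evaluate` and `update` are hypothetical names, and `evaluate` stands in for the expensive step of training a child network and measuring its validation accuracy.

```python
from typing import Callable, Optional, Tuple

Arch = Tuple[str, ...]  # hypothetical encoding: a sequence of architecture tokens


def architecture_search(
    sample: Callable[[], Arch],
    evaluate: Callable[[Arch], float],
    update: Callable[[Arch, float], None],
    num_iterations: int,
) -> Tuple[Optional[Arch], float]:
    """Generic NAS outer loop: sample a child, score it, feed the reward back."""
    best_arch: Optional[Arch] = None
    best_reward = float("-inf")
    for _ in range(num_iterations):
        arch = sample()           # controller proposes a child network
        reward = evaluate(arch)   # stand-in for "train child, measure accuracy"
        update(arch, reward)      # controller learns from the reward
        if reward > best_reward:
            best_arch, best_reward = arch, reward
    return best_arch, best_reward
```

In a real system `evaluate` dominates the cost, since every call trains a network, which is why the original method needed so much compute.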

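The crux, clearly, is how the controller produces an architecture and how the controller itself is trained. To produce child architectures, the controller is an RNN. Why an RNN? The authors observed that the structure and connectivity of a neural network can usually be described by a variable-length string, so a recurrent neural network can be used to generate such a string, as shown in the figure.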
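To make the "architecture as a string" idea concrete, here is a toy sketch of an RNN controller that emits one token per decision and feeds each sampled token back as the next input. Assumptions: the per-layer decisions (filter height and width, stride height and width, number of filters) are loosely modeled on the CNN case of the original paper, the option lists are illustrative, the RNN is a bare tanh recurrence with random untrained weights, and `TinyRNNController` and `decode` are hypothetical names.

```python
import math
import random
from typing import Dict, List

# Illustrative per-layer decisions, emitted in this order (5 tokens per layer).
DECISIONS = [
    ("filter_height", [1, 3, 5, 7]),
    ("filter_width", [1, 3, 5, 7]),
    ("stride_height", [1, 2, 3]),
    ("stride_width", [1, 2, 3]),
    ("num_filters", [24, 36, 48, 64]),
]


def _rand_vec(rng: random.Random, n: int) -> List[float]:
    return [rng.uniform(-0.5, 0.5) for _ in range(n)]


class TinyRNNController:
    """Toy, untrained RNN controller: one softmax-sampled token per time step."""

    def __init__(self, hidden: int = 8, seed: int = 0):
        self.rng = random.Random(seed)
        self.hidden = hidden
        self.w_hh = [_rand_vec(self.rng, hidden) for _ in range(hidden)]
        # One output head (logit weights per option) and one embedding per option.
        self.heads = [[_rand_vec(self.rng, hidden) for _ in opts] for _, opts in DECISIONS]
        self.embeds = [[_rand_vec(self.rng, hidden) for _ in opts] for _, opts in DECISIONS]

    def sample(self, num_layers: int) -> List[int]:
        h = [0.0] * self.hidden
        x = [0.0] * self.hidden  # no previous token at the first step
        tokens = []
        for _ in range(num_layers):
            for d, (_, options) in enumerate(DECISIONS):
                # h_t = tanh(W_hh h_{t-1} + x_t)
                h = [math.tanh(sum(w * hj for w, hj in zip(row, h)) + xi)
                     for row, xi in zip(self.w_hh, x)]
                logits = [sum(w * hj for w, hj in zip(head, h)) for head in self.heads[d]]
                m = max(logits)
                weights = [math.exp(v - m) for v in logits]  # unnormalized softmax
                idx = self.rng.choices(range(len(options)), weights=weights)[0]
                tokens.append(options[idx])
                x = self.embeds[d][idx]  # feed the sampled token back in
        return tokens


def decode(tokens: List[int]) -> List[Dict[str, int]]:
    """Split the flat token string back into one dict per layer."""
    names = [name for name, _ in DECISIONS]
    n = len(names)
    return [dict(zip(names, tokens[i:i + n])) for i in range(0, len(tokens), n)]


if __name__ == "__main__":
    controller = TinyRNNController(seed=1)
    for layer in decode(controller.sample(num_layers=2)):
        print(layer)
```

Because each token is conditioned on the hidden state, later decisions can depend on earlier ones, which is the property that motivated using an RNN.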

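How is it trained? This is where reinforcement learning comes in. Each time a child network is sampled, its accuracy is measured; this accuracy is the reward, the search space is the action space, and the controller is the agent. With that framing, the controller can be trained.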
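The reward-driven update can be illustrated with REINFORCE (the policy gradient method of the original NAS paper) on a drastically simplified controller: independent softmax distributions over each decision instead of an RNN. `toy_reward` is a made-up stand-in for child-network accuracy, the two-decision search space is hypothetical, and the exponential-moving-average baseline is a standard variance-reduction choice assumed here.

```python
import math
import random
from typing import Dict, List

SEARCH_SPACE: Dict[str, List[str]] = {
    "op": ["identity", "conv3x3", "conv5x5", "maxpool3x3"],
    "combine": ["add", "concat"],
}


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_reward(arch: Dict[str, str]) -> float:
    """Hypothetical stand-in for the child network's validation accuracy."""
    return 0.5 * (arch["op"] == "conv5x5") + 0.5 * (arch["combine"] == "concat")


def reinforce(num_steps: int = 2000, lr: float = 0.1,
              baseline_decay: float = 0.9, seed: int = 0) -> Dict[str, List[float]]:
    rng = random.Random(seed)
    logits = {k: [0.0] * len(v) for k, v in SEARCH_SPACE.items()}  # policy parameters
    baseline = 0.0
    for _ in range(num_steps):
        probs = {k: softmax(v) for k, v in logits.items()}
        # Action: one sampled choice per decision.
        idx = {k: rng.choices(range(len(p)), weights=p)[0] for k, p in probs.items()}
        arch = {k: SEARCH_SPACE[k][i] for k, i in idx.items()}
        reward = toy_reward(arch)
        advantage = reward - baseline
        # Gradient ascent on advantage * log pi(a); for a softmax,
        # d log p_a / d logit_j = 1[j == a] - p_j.
        for k, p in probs.items():
            for j, pj in enumerate(p):
                logits[k][j] += lr * advantage * ((1.0 if j == idx[k] else 0.0) - pj)
        # Exponential moving average of past rewards serves as the baseline.
        baseline = baseline_decay * baseline + (1.0 - baseline_decay) * reward
    return {k: softmax(v) for k, v in logits.items()}
```

After training, `reinforce()["op"]` should put most of its probability on `"conv5x5"`, the option the toy reward favors.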

Now to the main topic.

This paper is an improvement on that method.

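First, the controller's parameters are updated with Proximal Policy Optimization (PPO) instead of the policy gradient method (REINFORCE) used in the earlier paper. Second, it borrows the idea of repeatedly stacking a building block from successful architectures such as ResNet and GoogLeNet. Unlike the earlier paper, the search here is over operations (ops) inside a cell, as explained in detail below.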
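For reference, PPO's clipped surrogate objective, which limits how far a single update can move the policy away from the policy that collected the samples, fits in a few lines. This is a generic PPO sketch, not NASNet's training code; `ppo_clip_objective` is a hypothetical name, and the clipping range `clip_eps = 0.2` is a commonly used default assumed here, not a value from this post.

```python
import math
from typing import Sequence


def ppo_clip_objective(logp_new: Sequence[float], logp_old: Sequence[float],
                       advantages: Sequence[float], clip_eps: float = 0.2) -> float:
    """Mean clipped surrogate objective from PPO (to be maximized)."""
    total = 0.0
    for new, old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(new - old)  # pi_new(a) / pi_old(a)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)  # take the pessimistic term
    return total / len(advantages)
```

In the controller setting, each sampled architecture contributes the log-probability of its decisions, and the advantage is the child network's accuracy minus a baseline. For example, with a ratio of 1.5 and advantage 1.0 the term is clipped to 1.2, so the update gains nothing from pushing the ratio further.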

The architecture in this paper is therefore organized as follows:

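Following the stacking idea of ResNet and GoogLeNet, the smallest stacking unit in this paper is the convolution cell, which comes in two kinds: the Normal Cell, whose convolutions keep the input feature map size unchanged, and the Reduction Cell, whose convolutions halve the height and width of the input feature map (both shown in the figure below). The full network interleaves Normal Cells and Reduction Cells. Once the overall network layout is fixed as in the figure, the controller's job is to predict the internal structure of the Normal Cell and the Reduction Cell.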
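How interleaving the two cell types affects feature-map shapes can be traced with a short sketch. Assumptions: the layout is N Normal Cells followed by a Reduction Cell, repeated, and ending with N Normal Cells; the number of filters doubles at each Reduction Cell (my reading of the paper's design, worth checking); the stem and any extra ImageNet reduction cells in the figure are omitted; `stack_cells` is a hypothetical helper.

```python
from typing import List, Tuple


def stack_cells(height: int, width: int, channels: int,
                num_normal: int, num_reductions: int) -> List[Tuple[str, int, int, int]]:
    """Trace (cell_type, H, W, C) through [Normal x N, Reduction] repeated, then Normal x N."""
    layout = []
    for stage in range(num_reductions + 1):
        for _ in range(num_normal):
            layout.append(("normal", height, width, channels))  # shape preserved
        if stage < num_reductions:
            height, width = height // 2, width // 2  # stride-2 ops halve H and W
            channels *= 2                            # assumed: filters double on reduction
            layout.append(("reduction", height, width, channels))
    return layout
```

For a 32x32 input with 32 channels, `stack_cells(32, 32, 32, num_normal=2, num_reductions=2)` ends at 8x8 with 128 channels. Only the two cell types have to be searched; N and the starting channel count can then be scaled up or down.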

Each cell consists of B blocks (the paper uses B = 5), and each block is built by 5 prediction steps:

Step 1. Select a hidden state from h_i, h_{i-1} or from the set of hidden states created in previous blocks.

Step 2. Select a second hidden state from the same options as in Step 1.

Step 3. Select an operation to apply to the hidden state selected in Step 1.

Step 4. Select an operation to apply to the hidden state selected in Step 2.

Step 5. Select a method to combine the outputs of Step 3 and 4 to create a new hidden state.
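The five steps above can be sketched as sampling one decision per step, with every finished block adding its output to the pool of hidden states that later blocks may select. Here uniform random choices stand in for the trained controller, `sample_cell` and `Block` are hypothetical names, and the op list is my reconstruction of the paper's search space, so check it against the original.

```python
import random
from dataclasses import dataclass
from typing import List

# Candidate ops for Steps 3 and 4 (reconstructed; verify against the paper).
OPS = [
    "identity", "1x3 then 3x1 conv", "1x7 then 7x1 conv", "3x3 dilated conv",
    "3x3 avg pool", "3x3 max pool", "5x5 max pool", "7x7 max pool",
    "1x1 conv", "3x3 conv", "3x3 sep conv", "5x5 sep conv", "7x7 sep conv",
]
COMBINE = ["add", "concat"]  # Step 5 choices


@dataclass
class Block:
    input_a: int   # Step 1: index into the hidden-state pool
    input_b: int   # Step 2: index into the same pool
    op_a: str      # Step 3: op applied to input_a
    op_b: str      # Step 4: op applied to input_b
    combine: str   # Step 5: how the two results are merged


def sample_cell(num_blocks: int = 5, seed: int = 0) -> List[Block]:
    rng = random.Random(seed)
    # The pool starts with the cell's two inputs, h_i and h_{i-1}.
    pool = ["h_i", "h_{i-1}"]
    blocks = []
    for b in range(num_blocks):
        blk = Block(
            input_a=rng.randrange(len(pool)),
            input_b=rng.randrange(len(pool)),
            op_a=rng.choice(OPS),
            op_b=rng.choice(OPS),
            combine=rng.choice(COMBINE),
        )
        blocks.append(blk)
        pool.append(f"block_{b}")  # new hidden state, selectable by later blocks
    return blocks
```

Block b can choose among 2 + b hidden states, so later blocks can build on earlier ones; that growing pool is what Steps 1 and 2 select from.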

The first two steps select hidden states; the figure below makes this clear.

For Steps 3 and 4, the candidate ops are the following:

For the last step there are only two choices: add and concat.
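The practical difference between the two combination methods shows up in the output shape: add needs two inputs of identical shape and keeps that shape, while concat (along the filter, i.e. channel, dimension) only needs matching height and width and sums the channel counts. Below is a pure-Python sketch with feature maps stored as nested lists `[channels][height][width]`; all helper names are hypothetical.

```python
from typing import List, Tuple

FeatureMap = List[List[List[float]]]  # [channels][height][width]


def full(channels: int, height: int, width: int, value: float) -> FeatureMap:
    """Build a constant feature map of the given shape."""
    return [[[value] * width for _ in range(height)] for _ in range(channels)]


def spatial_shape(fm: FeatureMap) -> Tuple[int, int]:
    return (len(fm[0]), len(fm[0][0]))


def combine_add(a: FeatureMap, b: FeatureMap) -> FeatureMap:
    """Element-wise sum; shapes must match and the output keeps that shape."""
    if len(a) != len(b) or spatial_shape(a) != spatial_shape(b):
        raise ValueError("add needs identical shapes")
    return [[[x + y for x, y in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(a, b)]


def combine_concat(a: FeatureMap, b: FeatureMap) -> FeatureMap:
    """Stack along the channel axis; only height and width must match."""
    if spatial_shape(a) != spatial_shape(b):
        raise ValueError("concat needs identical height and width")
    return a + b  # channel count becomes len(a) + len(b)
```

For example, concatenating a 2-channel and a 4-channel 3x3 map gives 6 channels, whereas adding them raises an error.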

The authors give an example of a cell structure found by the search, shown in the figure.

Finally, the experiments show that NASNet's distinguishing feature is that it maintains high accuracy with relatively few parameters.

If you spot any errors, corrections are welcome!
