Original source: the WeChat public account 小样本学习与智能前沿 (Few-Shot Learning and Frontiers of Intelligence). Reply "200527" to the account to receive the electronic lecture slides.

文章目录

  • Few-shot image classification
    • Three regimes of image classification
    • Problem formulation
    • A flavor of current few-shot algorithms
    • How well does few-shot learning work today?
    • The key idea
    • Transductive Learning
    • An example
    • Results on benchmark datasets
    • The ImageNet-21k dataset
    • A proposal for systematic evaluation
  • A thermodynamical view of representation learning
    • Transfer learning
    • Information Bottleneck Principle
    • The key idea
    • An auto-encoder
    • Rate-Distortion curve
    • Rate-Distortion-Classification (RDC) surface
    • Equilibrium surface of optimal free-energy
    • An iso-classification loss process
  • Summary

Few-shot image classification

Three regimes of image classification

Problem formulation

The training set consists of labeled samples from lots of "tasks", e.g., classifying cars, cats, dogs, planes, and so on.
Data from the new task, e.g., classifying strawberries, has only a few labeled samples: a support set D_s with s labeled examples, together with unlabeled test data.

The few-shot setting considers the case when s is small.
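To fix notation, here is a standard formalization of an s-shot, w-way task; the symbols below follow common few-shot convention (e.g., Dhillon et al., 2019) and are an assumption rather than a quote from the slides:

```latex
% Support set: s labeled samples for each of w new classes
D_s = \{(x_i, y_i)\}_{i=1}^{s\,w}, \qquad y_i \in \{1, \dots, w\}
% Query set: unlabeled samples from the same w classes, to be classified
D_q = \{x_j\}_{j=1}^{q}
```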

A flavor of current few-shot algorithms

Meta-learning forms the basis for almost all current algorithms. Here’s one successful instantiation.

Prototypical Networks [Snell et al., 2017]

  • Collect a meta-training set; this consists of a large number of related tasks
  • Train one model on all these tasks so that clustering the features of this model correctly classifies each task
  • If the test task comes from the same distribution as the meta-training tasks, we can use the clustering on the new task to classify its new classes (a minimal sketch follows this list)
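As a concrete illustration of the clustering idea above, here is a minimal sketch of the prototypical-networks classification rule of Snell et al. (2017). The embedding network `embed` is assumed to be already meta-trained; the function name and tensor shapes are illustrative, not from the original slides.

```python
import torch

def prototypical_predict(embed, support_x, support_y, query_x, n_way):
    """Classify query samples by distance to class prototypes.

    embed:     meta-trained embedding network (illustrative assumption)
    support_x: [n_way * n_shot, ...] labeled support images
    support_y: [n_way * n_shot] integer labels in {0, ..., n_way - 1}
    query_x:   [n_query, ...] unlabeled query images
    """
    z_support = embed(support_x)              # [n_way * n_shot, d]
    z_query = embed(query_x)                  # [n_query, d]
    # Prototype = mean embedding of each class's support samples
    prototypes = torch.stack([
        z_support[support_y == c].mean(dim=0) for c in range(n_way)
    ])                                        # [n_way, d]
    # Negative squared Euclidean distance acts as the logit
    dists = torch.cdist(z_query, prototypes) ** 2   # [n_query, n_way]
    return (-dists).argmax(dim=1)             # predicted class per query
```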

How well does few-shot learning work today?

The key idea

A classifier trained on a dataset D_s is a function F that classifies data x using

ŷ = F(x; θ∗).

The parameters θ∗ = θ(D_s) of the classifier are a statistic of the dataset D_s obtained after training. Maintaining this statistic avoids having to search over functions F at inference time.

We cannot learn a good (sufficient) statistic from only a few samples, so we will instead search over functions more explicitly at test time.

Transductive Learning

A very simple baseline

  1. Train a large deep network on the meta-training dataset with the standard classification loss
  2. Initialize a new “classifier head” on top of the logits to handle new classes
  3. Fine-tune with the few labeled data from the new task
  4. Perform transductive learning using the unlabeled test data

with a few practical tricks such as cosine annealing of step-sizes, mixup regularization, 16-bit training, very heavy data augmentation, and label-smoothed cross-entropy. A minimal sketch of steps 2-4 follows.
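The sketch below assumes a pre-trained `backbone` and a freshly initialized `head`, following the transductive fine-tuning recipe of Dhillon et al. (2019): cross-entropy on the labeled support samples plus the Shannon entropy of the predictions on the unlabeled query samples. Variable names, the optimizer, and the hyperparameter values are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def transductive_finetune(backbone, head, support_x, support_y, query_x,
                          steps=25, lr=5e-5):
    """Fine-tune on the support set while minimizing prediction
    entropy on the unlabeled query set (the transductive term)."""
    params = list(backbone.parameters()) + list(head.parameters())
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # Supervised cross-entropy on the few labeled samples
        ce = F.cross_entropy(head(backbone(support_x)), support_y)
        # Shannon entropy of predictions on the unlabeled test data
        probs = F.softmax(head(backbone(query_x)), dim=1)
        ent = -(probs * torch.log(probs + 1e-8)).sum(dim=1).mean()
        (ce + ent).backward()
        opt.step()
    return head(backbone(query_x)).argmax(dim=1)
```

The entropy term is what makes the procedure transductive: it uses the unlabeled test inputs themselves, pushing their predicted class distributions toward confident, well-separated answers.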

An example

Results on benchmark datasets

The ImageNet-21k dataset


1-shot, 5-way accuracies are as high as 89%; 1-shot, 20-way accuracies are about 70%.

A proposal for systematic evaluation

A thermodynamical view of representation learning


Transfer learning

Let’s take an example from computer vision

(Zamir, A. R., Sax, A., Shen, W., Guibas, L. J., Malik, J., and Savarese, S. Taskonomy: Disentangling task transfer learning. CVPR 2018.)

Information Bottleneck Principle

A generalization of rate-distortion theory for learning relevant representations of data [Tishby et al., 2000]

Z is a representation of the data X. We want

  • Z to be sufficient to predict the target Y, and
  • Z to be small, e.g., to use as few bits as possible.
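These two desiderata are traded off by the IB Lagrangian of Tishby et al. (2000), which in its standard form reads:

```latex
% Information Bottleneck: compress X into Z while keeping the
% information in Z that is relevant for predicting Y.
% The multiplier \beta \ge 0 trades compression against sufficiency.
\min_{p(z \mid x)} \; I(X; Z) - \beta \, I(Z; Y)
```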


Doing well on one task requires throwing away nuisance information [Achille & Soatto, 2017].

The key idea

The IB Lagrangian simply minimizes I(X;Z); it does not let us measure what was thrown away.
Choose a canonical task to measure the discarded information. Setting

Y = X,

i.e., reconstruction of the data, gives a special task. It is the superset of all tasks and forces the model to learn lossless representations.

The architecture we will focus on is an auto-encoder, described next.

An auto-encoder

The Shannon entropy H measures the complexity of the data.

The distortion D measures the quality of the reconstruction.

The rate R measures the average excess bits used to encode the representation.
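With an encoder e(z|x), a decoder d(x|z), and a marginal m(z) over the representation, these three quantities take the standard variational forms used by Alemi et al. (2017):

```latex
% Complexity of the data under the true distribution p(x):
H = -\int \mathrm{d}x \; p(x) \log p(x)
% Distortion: reconstruction log-loss through encoder and decoder:
D = -\int \mathrm{d}x \; p(x) \int \mathrm{d}z \; e(z \mid x) \log d(x \mid z)
% Rate: average excess bits paid for coding z with m(z):
R = \int \mathrm{d}x \; p(x) \int \mathrm{d}z \; e(z \mid x) \log \frac{e(z \mid x)}{m(z)}
```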

Rate-Distortion curve

We know that [Alemi et al., 2017]

H − D ≤ R;    (1)

equivalently, −(R + D) ≤ −H, and the left-hand side is the well-known ELBO (evidence lower bound). Let

F(λ) = min_{e,d} (R + λD).

This is a Lagrange relaxation of the fact that, given a variational family and data, there is an optimal value R = func(D) that best sandwiches (1).

Rate-Distortion-Classification (RDC) surface

Let us extend the Lagrangian to

F(λ, γ) = min_{e,d,c} (R + λD + γC),

where the classification loss of a classifier c(y|z) is

C = −∫ dx dy p(x, y) ∫ dz e(z|x) log c(y|z).

We can also include other quantities, such as the entropy S of the model parameters.


The existence of a convex surface func(R, D, C, S) = 0 tying together these functionals allows a formal connection to thermodynamics [Alemi and Fischer, 2018].

Just as energy is conserved in physical processes, information is conserved in the model: it is either in the encoder-classifier pair or in the decoder.
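As a concrete (and deliberately small) sketch of how R, D, and C can be optimized jointly, here is a Gaussian-latent auto-encoder with a classifier head; minimizing R + λD + γC for fixed multipliers reaches one point on the surface, and sweeping (λ, γ) traces it out. The architecture and layer sizes are illustrative assumptions, not the models used in the talk.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RDCAutoEncoder(nn.Module):
    """Encoder e(z|x), decoder d(x|z), classifier c(y|z) with a
    standard-normal marginal m(z). Sizes are illustrative."""
    def __init__(self, x_dim=784, z_dim=32, n_classes=10):
        super().__init__()
        self.enc = nn.Linear(x_dim, 2 * z_dim)   # outputs (mu, log_var)
        self.dec = nn.Linear(z_dim, x_dim)
        self.cls = nn.Linear(z_dim, n_classes)

    def losses(self, x, y):
        mu, log_var = self.enc(x).chunk(2, dim=1)
        z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()
        # Rate: KL(e(z|x) || m(z)) with m(z) = N(0, I), in nats
        rate = 0.5 * (mu**2 + log_var.exp() - 1 - log_var).sum(1).mean()
        # Distortion: reconstruction log-loss, here a Bernoulli d(x|z)
        distortion = F.binary_cross_entropy_with_logits(
            self.dec(z), x, reduction="none").sum(1).mean()
        # Classification loss: -log c(y|z)
        classification = F.cross_entropy(self.cls(z), y)
        return rate, distortion, classification

# One optimization step of the free energy R + lam * D + gam * C;
# training to convergence lands on the equilibrium surface.
def free_energy_step(model, opt, x, y, lam=1.0, gam=1.0):
    opt.zero_grad()
    R, D, C = model.losses(x, y)
    (R + lam * D + gam * C).backward()
    opt.step()
    return R.item(), D.item(), C.item()
```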

Equilibrium surface of optimal free-energy

The RDC surface determines all possible representations that can be learnt from the given data. Solving the variational problem for F(λ, γ) gives

D = ∂F/∂λ,    C = ∂F/∂γ,

and

R = F − λ ∂F/∂λ − γ ∂F/∂γ.

This is called the "equilibrium surface" because training converges to some point on this surface. We now construct ways to travel on the surface.
Note that the surface depends on the data p(x, y).
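These identities are the envelope theorem applied to F(λ, γ) = min_{e,d,c} (R + λD + γC). A short derivation:

```latex
% F(\lambda, \gamma) = \min_{e,d,c} \; R + \lambda D + \gamma C.
% Envelope theorem: at the minimizer, only the explicit dependence
% on the multipliers survives differentiation:
\frac{\partial F}{\partial \lambda} = D, \qquad
\frac{\partial F}{\partial \gamma} = C.
% Substituting back into the definition of F recovers the rate:
R = F - \lambda \frac{\partial F}{\partial \lambda}
      - \gamma \frac{\partial F}{\partial \gamma}.
```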

An iso-classification loss process

A quasi-static process happens slowly enough for the system to remain in equilibrium with its surroundings, e.g., reversible expansion of an ideal gas.
We will create a quasi-static process to travel on the RDC surface: change the multipliers λ(t) and γ(t) slowly enough that the model remains on the equilibrium surface. The constraint is

dC/dt = 0;

e.g., if we want the classification loss to be constant in time, then using C = ∂F/∂γ we need

(∂²F/∂λ∂γ) λ̇ + (∂²F/∂γ²) γ̇ = 0,

which determines how γ must change as λ is varied (see the sketch below).
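As a minimal numerical sketch of this condition, suppose we could evaluate the free energy F(λ, γ) at nearby multiplier values (a hypothetical and expensive oracle: each evaluation means training a model to convergence). Then the second derivatives can be estimated with finite differences and the constraint solved for γ̇:

```python
def iso_classification_direction(F, lam, gam, lam_dot, h=1e-3):
    """Return gam_dot such that dC/dt = F_lg * lam_dot + F_gg * gam_dot = 0,
    where C = dF/dgamma. F is a callable F(lam, gam) -> float
    (a hypothetical oracle for the optimal free energy)."""
    # Central finite differences for the second derivatives of F
    F_lg = (F(lam + h, gam + h) - F(lam + h, gam - h)
            - F(lam - h, gam + h) + F(lam - h, gam - h)) / (4 * h * h)
    F_gg = (F(lam, gam + h) - 2 * F(lam, gam) + F(lam, gam - h)) / (h * h)
    return -F_lg / F_gg * lam_dot
```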

……
For more, see the resource file: follow the WeChat public account 小样本学习与智能前沿 and reply "200527" to get the file "Learning with Few Labeled Data.pdf".

Summary

Simple methods such as transductive fine-tuning work extremely well for few-shot learning. This is largely because of powerful function approximators such as deep neural networks.
The RDC surface is a fundamental quantity and enables principled methods for transfer learning. It also opens new paths to understanding regularization and the properties of neural architectures in classical supervised learning.
We did well in the era of big data without understanding much about the data; this is unlikely to work in the age of little data.

Email questions to pratikac@seas.upenn.edu. Read more at:

  1. Dhillon, G., Chaudhari, P., Ravichandran, A., and Soatto, S. (2019). A baseline for few-shot image classification. arXiv:1909.02729. ICLR 2020.
  2. Li, H., Chaudhari, P., Yang, H., Lam, M., Ravichandran, A., Bhotika, R., and Soatto, S. (2020). Rethinking the Hyperparameters for Fine-tuning. arXiv:2002.11770. ICLR 2020.
  3. Fakoor, R., Chaudhari, P., Soatto, S., and Smola, A. J. (2019). Meta-Q-Learning. arXiv:1910.00125. ICLR 2020.
  4. Gao, Y., and Chaudhari, P. (2020). A free-energy principle for representation learning. arXiv:2002.12406.
