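Contents

Introduction
1 Active learning framework
- 1.1 Notation and definitions
  - 1.1.1 Single-label / multi-label instances
  - 1.1.2 Related measures
  - 1.1.3 General framework
- 1.2 Algorithm categories
  - 1.2.1 How to select unlabeled instances for labeling
  - 1.2.2 How to evaluate selected unlabeled instances
  - 1.2.3 A combined view
2 Active learning based on uncertainty of IID instances
- 2.1 How to select unlabeled instances
  - 2.1.1 Uncertainty sampling
  - 2.1.2 Expected gradient length
  - 2.1.3 Variance reduction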

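Introduction

Paper: https://link.springer.com/article/10.1007/s10115-012-0507-8
Main content: the paper surveys instance selection for active learning and divides the methods into two categories:
1) methods based on the uncertainty of independent and identically distributed (IID) instances;
2) methods that take the correlation between instances into account.
All figures come from the original paper; no infringement is intended.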

1 Active learning framework

1.1 Notation and definitions

Table 1: Partial list of symbols

Symbol	Meaning
$\mathcal{D} = \{ e_1, \cdots, e_n \}, e_i \in \mathcal{X} \times \mathcal{Y}$	Instance set
$D = D^L \cup D^U$	Typical representation of the instance set in active learning
$D^L = \{ (x_1, y_1), \cdots, (x_n, y_n) \}$	Labeled instance subset
$D^U = \{ (x_1, ?), \cdots, (x_u, ?) \}$	Unlabeled instance subset
$\mathcal{X}$	$q$-dimensional feature space
$\mathcal{Y}$	$l$-dimensional label space
$x_i \in \mathcal{X}$	Sample

1.1.1 Single-label / multi-label instances

Depending on how many labels an instance carries, instances are divided into single-label and multi-label instances:

Definition 1 (single-label instance): $e_i = \{ x_i, y_i \}$ with $x_i = \{ f_{i1}, \cdots, f_{i,q} \}$, where $y_i$ is the label of the instance;
Definition 2 (multi-label instance): $e_i = \{ x_i, y_{i1}, \cdots, y_{i,l} \}$ with $x_i = \{ f_{i1}, \cdots, f_{i,q} \}$.

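1.1.2 Related measures

In active learning, the instance set $D$ usually contains a small number of labeled instances and a large number of unlabeled instances, i.e. $D = D^L \cup D^U$. To train an accurate model, instances in $D^U$ have to be labeled while keeping the labeling cost and time in check. Active learning therefore uses evaluation measures to select the instances with the highest utility for labeling. Two important measures, uncertainty and correlation, are described below.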

Definition 3 (uncertainty measure): a function $f_u$ that maps $D^U$ or $D^U \times \mathcal{Y}$ to $\mathbb{R}$, for example entropy, margin, or diversity:

$$f_u: \begin{cases} D^U \mapsto \mathbb{R}, & \text{feature view};\\ D^U \times \mathcal{Y} \mapsto \mathbb{R}, & \text{feature-label view}. \end{cases} \tag{1}$$

Here the feature view considers only the features of a sample, while the feature-label view also takes the labels into account.

Definition 4 (correlation measure): a function $q_c$ that measures the correlation between two instances $x_i$ and $x_j$:

$$q_c: \begin{cases} D^U \times D^U \mapsto \mathbb{R}, & \text{feature view};\\ \mathcal{Y} \times \mathcal{Y} \mapsto \mathbb{R}, & \text{label view};\\ (D^U, \mathcal{Y}) \times (D^U, \mathcal{Y}) \mapsto \mathbb{R}, & \text{both views}. \end{cases} \tag{2}$$

Based on Eq. (2), the correlation between an instance $x_i$ and all other instances in $D^U$ is defined as

$$Q_c (x_i) = \frac{1}{\mid D^U \mid} \sum_{x_j \in D^U \setminus \{x_i\}} q_c (x_i, x_j). \tag{3}$$

Definition 5 (utility measure): a function $u$ that evaluates how valuable it is to label an unlabeled instance:

$$u = \begin{cases} f_u, & \text{if } q_c \text{ is not specified};\\ f_u \times q_c, & \text{if } q_c \text{ is specified}. \end{cases} \tag{4}$$

Intuitively, the larger $u$ is, the higher the utility of the instance.

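Definition 6 (query strategy): given a chosen utility measure, a query strategy evaluates the unlabeled instances using the current model's predictions (to compute uncertainty) and/or the data distribution (to compute correlation), and selects the best instance for labeling.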
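The utility measure of Definition 5 and the averaged correlation of Eq. (3) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the uncertainty scores `f_u` are taken as already computed by some uncertainty measure, cosine similarity stands in for $q_c$ (the notes do not fix a particular choice), and the names `average_correlation` and `utility` are hypothetical.

```python
import numpy as np

def average_correlation(X):
    """Q_c(x_i) as in Eq. (3): sum of q_c(x_i, x_j) over j != i, divided by |D^U|.

    Assumption: q_c is cosine similarity in the feature view.
    """
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    unit = X / np.maximum(norms, 1e-12)   # guard against zero vectors
    sim = unit @ unit.T                   # pairwise cosine similarities
    np.fill_diagonal(sim, 0.0)            # exclude x_i itself from its own sum
    return sim.sum(axis=1) / X.shape[0]

def utility(f_u, X=None):
    """u = f_u when no correlation is specified, otherwise f_u * Q_c."""
    f_u = np.asarray(f_u, dtype=float)
    if X is None:
        return f_u
    return f_u * average_correlation(X)

# Toy pool: instances 0 and 2 are equally uncertain, but instance 0 lies in a
# denser region (it is very similar to instance 1), so it gets the higher utility.
X_pool = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
f_u = np.array([0.69, 0.33, 0.69])        # hypothetical uncertainty scores
u = utility(f_u, X_pool)
best = int(np.argmax(u))                  # instance the query strategy would pick
```

With uncertainty alone, instances 0 and 2 are tied; multiplying by the correlation term breaks the tie in favor of the instance that is more representative of the pool.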

1.1.3 General framework

The general framework of active learning is as follows:

Algorithm 1 General framework of active learning
Input:
Labeled instance set $D^L$, unlabeled instance set $D^U$, maximum training set size $m$
Output:
Model $\Theta$
1: while current training set size $\leq m$ do
2:   $\Theta \leftarrow$ train on $D^L$;
3:   $D^U \leftarrow D \setminus D^L$;
4:   for $x_i \in D^U$ do
5:     $u_i \leftarrow u (x_i, \Theta)$;
6:   end for
7:   $x^* \leftarrow \arg \max_i (u_i)$;
8:   $D^L \leftarrow D^L \cup \{x^*\}$;
9:   $D^U \leftarrow D^U \setminus \{x^*\}$;
10: end while
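Algorithm 1 can be sketched as a small pool-based loop. This is a minimal sketch, not the paper's implementation: `oracle`, `train`, and `utility` are hypothetical callables supplied by the caller, the label query is made explicit (Algorithm 1 leaves it implicit), and the loop stops at `< m` and on an empty pool so that the training set never grows beyond $m$.

```python
import numpy as np

def active_learning_loop(X_lab, y_lab, X_pool, oracle, train, utility, m):
    """Pool-based active learning in the spirit of Algorithm 1.

    oracle(x) -> label, train(X, y) -> model, utility(model, X) -> scores.
    """
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_pool)
    model = train(np.array(X_lab), np.array(y_lab))
    while len(X_lab) < m and pool:
        scores = utility(model, np.array(pool))   # u_i for every x_i in D^U
        best = int(np.argmax(scores))             # index of x*
        x_star = pool.pop(best)                   # D^U <- D^U \ {x*}
        X_lab.append(x_star)                      # D^L <- D^L U {x*}
        y_lab.append(oracle(x_star))              # ask the annotator for the label
        model = train(np.array(X_lab), np.array(y_lab))
    return model, np.array(X_lab), np.array(y_lab)

# Toy 1-D problem (all of it is illustrative): the "model" is a threshold
# halfway between the two classes, and utility is closeness to that threshold.
def train_threshold(X, y):
    return (X[y == 0].max() + X[y == 1].min()) / 2.0

def closeness_to_boundary(threshold, X):
    return -np.abs(X[:, 0] - threshold)

def oracle(x):
    return int(x[0] > 0.5)

X_lab0 = np.array([[0.0], [1.0]])
y_lab0 = np.array([0, 1])
X_pool0 = np.array([[0.1], [0.45], [0.6], [0.9]])
model, X_out, y_out = active_learning_loop(
    X_lab0, y_lab0, X_pool0, oracle, train_threshold, closeness_to_boundary, m=4
)
```

In this toy run the loop queries the two pool points nearest the current decision threshold and leaves the two easy points unlabeled.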

1.2 Algorithm categories

This section gives only a brief overview and can be skipped.

1.2.1 How to select unlabeled instances for labeling

As the following table shows, the two main query strategies are uncertainty and diversity:

1.2.2 How to evaluate selected unlabeled instances

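The utility of an instance is determined by the predictions of the trained model: the most "ambiguous" instance is also the most uncertain one. Alternatively, the label of an instance is decided by a voting mechanism, in which case the most informative instance is the one on which the predictions disagree the most ¹.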
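The voting view can be illustrated with vote entropy, one common way to quantify how much a committee of models disagrees. The notes do not say which disagreement measure the paper uses, so the choice of vote entropy, the function name `vote_entropy`, and the toy votes below are all assumptions.

```python
import numpy as np

def vote_entropy(votes, n_classes):
    """Entropy of the committee's vote distribution for each instance.

    votes: integer array of shape (n_members, n_instances), one predicted
    label per committee member and instance.
    """
    n_members = votes.shape[0]
    scores = np.zeros(votes.shape[1])
    for c in range(n_classes):
        frac = np.sum(votes == c, axis=0) / n_members   # share of votes for class c
        nz = frac > 0                                   # skip log(0) terms
        scores[nz] -= frac[nz] * np.log(frac[nz])
    return scores

# Four committee members vote on three instances.
votes = np.array([
    [0, 0, 1],
    [0, 1, 1],
    [0, 1, 0],
    [0, 0, 1],
])
ve = vote_entropy(votes, n_classes=2)
most_disputed = int(np.argmax(ve))   # instance with the least consistent votes
```

Instance 0 gets a score of zero because all members agree, while instance 1, with an even two-to-two split, scores highest.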

1.2.3 A combined view

Combining "how to select unlabeled instances for labeling" with "how to evaluate selected unlabeled instances" yields the hierarchical structure of active learning shown below:

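2 Active learning based on uncertainty of IID instances

Definition 7 (active learning based on uncertainty of IID instances): given an unlabeled set $D^U$, a labeled set $D^L$, and a utility measure $u (\cdot) = f_u (\cdot)$, the goal is to label the most informative instances in $D^U$ according to $u$, so as to build the training set $D^L$.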

2.1 How to select unlabeled instances

2.1.1 Uncertainty sampling

The following table summarizes the main query strategies and their optimization objectives:

Least confidence (LC) ² has the following objective:

$$x_{\rm LC}^* = \arg\max_x \; 1 - P_{\Theta} (\hat{y} \mid x), \tag{5}$$

where $\hat{y}$ is the class label with the highest posterior probability under the model $\Theta$.
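A minimal NumPy sketch of the least-confidence score follows. It assumes the model's predicted class probabilities for the pool are available as a matrix `probs` (one row per instance); the matrix values and the function name are made up for illustration.

```python
import numpy as np

def least_confidence(probs):
    """1 - P(y_hat | x) for each row, where y_hat is the most probable class."""
    return 1.0 - np.max(probs, axis=1)

# Hypothetical predicted probabilities for three unlabeled instances.
probs = np.array([
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.60, 0.30, 0.10],
])
lc_scores = least_confidence(probs)
x_lc = int(np.argmax(lc_scores))   # instance whose top prediction is least confident
```

The second instance is selected because its most probable class only reaches a probability of 0.40.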

Sample margin ³ has the following objective:

$$x_{M}^{\star} = \arg\min_x \; P_{\Theta}\left(\hat{y}_1 \mid x\right) - P_{\Theta}\left(\hat{y}_2 \mid x\right),$$

where $\hat{y}_1$ and $\hat{y}_2$ are the most probable and second most probable class labels. The instance with the smallest margin is queried, because a small gap between the two top classes means the model can barely tell them apart.
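Margin sampling queries the instance with the smallest gap between its two most probable classes. A short sketch, again assuming a hypothetical probability matrix `probs` with one row per unlabeled instance:

```python
import numpy as np

def margin(probs):
    """P(y1_hat | x) - P(y2_hat | x): gap between the two most probable classes."""
    ordered = np.sort(probs, axis=1)          # ascending within each row
    return ordered[:, -1] - ordered[:, -2]

# Hypothetical predicted probabilities for three unlabeled instances.
probs = np.array([
    [0.50, 0.45, 0.05],
    [0.40, 0.30, 0.30],
    [0.90, 0.06, 0.04],
])
margins = margin(probs)
x_margin = int(np.argmin(margins))   # smallest margin = hardest to separate
```

On this matrix margin sampling picks the first instance, whereas least confidence would pick the second (its top probability, 0.40, is the lowest), which shows that the two criteria can disagree when there are more than two classes.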

Entropy has the following objective:

$$x_{E}^{\star} = \arg\max_x \; -\sum_{i} P_{\Theta}\left(\hat{y}_{i} \mid x\right) \log P_{\Theta}\left(\hat{y}_{i} \mid x\right),$$

where $P_{\Theta}\left(\hat{y}_{i} \mid x\right)$ is the posterior probability that instance $x$ belongs to the $i$-th class.
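The entropy criterion uses the whole predicted distribution rather than only the top one or two classes. A small sketch under the same assumption as before (a hypothetical matrix `probs` of predicted class probabilities); clipping is added only to avoid `log(0)`:

```python
import numpy as np

def predictive_entropy(probs):
    """-sum_i P(y_i | x) log P(y_i | x) for each row of class probabilities."""
    p = np.clip(probs, 1e-12, 1.0)   # avoid log(0) for zero-probability classes
    return -np.sum(p * np.log(p), axis=1)

# Hypothetical predicted probabilities for three unlabeled instances.
probs = np.array([
    [1 / 3, 1 / 3, 1 / 3],
    [0.50, 0.50, 0.00],
    [0.98, 0.01, 0.01],
])
entropies = predictive_entropy(probs)
x_entropy = int(np.argmax(entropies))   # instance with the flattest distribution
```

The uniform row has the maximum possible entropy for three classes, $\log 3$, so it is the one selected.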

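2.1.2 Expected gradient length

The queried instance is the one that, if added to the training set, would cause the largest change in the gradient of the objective function ⁴:

$$x_{\mathrm{EGL}}^{*} = \arg\max_x \; \sum_{y_{i}} P\left(y_{i} \mid x ; \Theta\right)\left\|\nabla \partial\left(L^{+\langle x, y_{i}\rangle} ; \Theta\right)\right\|,$$

where $\nabla \partial\left(L^{+\langle x, y_{i}\rangle} ; \Theta\right)$ is the gradient of the log-likelihood objective $\partial$ with respect to the model parameters $\Theta$ after the instance $\langle x, y_i \rangle$ has been added to the training set $L$. Because the true label is unknown, the gradient norm is averaged over all candidate labels $y_i$, weighted by their predicted probabilities.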
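For a concrete case, expected gradient length can be sketched for binary logistic regression. Two assumptions go beyond the notes: the model is $P(y = 1 \mid x; \Theta) = \sigma(\Theta^\top x)$, and the gradient over the enlarged training set is approximated by the gradient of the new instance alone, which is reasonable when the model has already converged on the labeled set (the approximation described in Settles' survey). The parameter vector and pool below are made up.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def expected_gradient_length(theta, X):
    """EGL for binary logistic regression with parameters theta.

    The log-likelihood gradient of one labeled pair (x, y) is (y - p1) * x,
    so its Euclidean norm is |y - p1| * ||x||.
    """
    p1 = sigmoid(X @ theta)                # P(y = 1 | x; theta)
    x_norm = np.linalg.norm(X, axis=1)
    grad_norm_y1 = (1.0 - p1) * x_norm     # gradient norm if the label were 1
    grad_norm_y0 = p1 * x_norm             # gradient norm if the label were 0
    # Expectation over the unknown label under the current model.
    return p1 * grad_norm_y1 + (1.0 - p1) * grad_norm_y0

theta = np.array([1.0, -1.0])              # hypothetical trained parameters
X_pool = np.array([[2.0, 2.0], [3.0, 0.0], [0.1, 0.1]])
egl = expected_gradient_length(theta, X_pool)
x_egl = int(np.argmax(egl))                # instance expected to change the model most
```

Instances 0 and 2 are equally uncertain (both lie on the decision boundary), yet instance 0 scores far higher because its feature vector is larger. This illustrates that EGL depends on feature magnitude as well as on uncertainty.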

2.1.3 Variance reduction

¹ Seung HS, Opper M, Sompolinsky H (1992) Query by committee. In: Proceedings of the 5th annual workshop on computational learning theory (COLT 1992), Pittsburgh, pp 287–294
² Culotta A, McCallum A (2005) Reducing labeling effort for structured prediction tasks. In: Proceedings of the 20th national conference on artificial intelligence (AAAI 2005), pp 746–751
³ Bottou L (1991) Une approche théorique de l'apprentissage connexionniste: applications à la reconnaissance de la parole. Doctoral dissertation, Université de Paris XI
⁴ Settles B (2010) Active learning literature survey. Technical report 1648, University of Wisconsin, Madison

Paper reading (9): A survey on instance selection for active learning (2012)