Introduction
Poincaré Embeddings
- The Limitations of Euclidean Space for Hierarchical Data
- Embedding Hierarchies in Hyperbolic Space
Evaluation
References

Introduction

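Representation learning has become increasingly important (e.g. word embeddings, embeddings of graphs, embeddings of multi-relational data), and many complex datasets exhibit latent hierarchical structures. However, the ability of embeddings optimized in Euclidean space to model complex relations is limited by the embedding dimension.
To increase the capacity of embeddings to represent complex relations among objects, the authors propose embedding hierarchically related objects into a hyperbolic space, namely the $n$-dimensional Poincaré ball. The embeddings are optimized inside the Poincaré ball with Riemannian optimization. Experiments show that, when encoding hierarchical data, Poincaré embeddings outperform Euclidean embeddings in both representation capacity and generalization ability, especially in low dimensions.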

Poincaré Embeddings

The Limitations of Euclidean Space for Hierarchical Data

Euclidean space is unable to obtain comparably low distortion for trees, even using an unbounded number of dimensions.

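For hierarchical data, if the branching factor of the tree is fixed, the number of nodes grows exponentially with depth. To produce an embedding for every node, the distances between nodes should respect the tree structure: a child should lie close to its parent, while leaves on different branches should lie far apart.
The figure below shows a tree with branching factor 4 embedded in the two-dimensional Euclidean plane. The plane is already running out of room, and with more levels the leaves of different branches would be pushed even closer together: leaves that are far apart in the tree end up close together in Euclidean space. Euclidean space is well suited to grid-structured data. To represent deeper trees faithfully, a higher-dimensional Euclidean space is required, which can increase time and memory costs and may even cause overfitting.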

Embedding Hierarchies in Hyperbolic Space

Trees can be embedded with arbitrarily low distortion into the Poincaré disk.

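Hyperbolic space models hierarchical data well. Prior work shows that “any finite tree can be embedded into a finite hyperbolic space such that distances are preserved approximately”.
The figure below shows a tree embedded in the Poincaré disk. Because distances grow exponentially toward the boundary of the unit circle, the distance between each pair of adjacent nodes in the figure is actually the same. Although the leaves look crowded, they are in fact very far apart, and the geodesic between leaves of different subtrees approximately follows the shortest path between them in the tree, which captures the hierarchy well.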
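The growth of distances toward the boundary can be checked numerically. The sketch below assumes the standard closed form $d(0,x)=2\operatorname{artanh}(\|x\|)$ for the Poincaré distance from the origin (an identity that follows from the distance function, not something spelled out in this post). The function names are hypothetical.

```python
import numpy as np

def dist_from_origin(r):
    """Hyperbolic distance from the origin to a point of Euclidean norm r < 1."""
    return 2.0 * np.arctanh(r)

def radius_for_depth(depth, edge_length=1.0):
    """Euclidean norm of a node that lies `depth` edges of equal hyperbolic
    length `edge_length` away from the origin along a straight ray."""
    return np.tanh(depth * edge_length / 2.0)

# Points that look almost identical in Euclidean terms are far apart hyperbolically.
for r in (0.5, 0.9, 0.99, 0.999):
    print(f"Euclidean norm {r}: hyperbolic distance {dist_from_origin(r):.3f}")

# Equal hyperbolic steps shrink in Euclidean terms as the depth grows.
for depth in range(1, 6):
    print(f"depth {depth}: Euclidean norm {radius_for_depth(depth):.4f}")
```

Each extra tree level costs the same hyperbolic distance but an ever smaller slice of Euclidean radius, which is why the leaves appear crowded near the boundary.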

Two supplementary results illustrate how well hyperbolic space and trees fit together (cf. Section 2 in Ganea et al.):
(1) A $\delta$-hyperbolic metric space can be embedded into a finite weighted tree.
(2) Any tree can be embedded with arbitrarily low distortion into the Poincaré disk (with only 2 dimensions), whereas this is not true for Euclidean spaces even when an unbounded number of dimensions is allowed. (Sarkar, 2011; De Sa et al., 2018)

Poincaré Embeddings

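The authors assume that the data has a latent hierarchical tree structure (the tree itself is not observed directly) and embed the data into the Poincaré ball via unsupervised learning, so that distances between embeddings reflect semantic similarity. Introducing the tree hierarchy as a structural prior helps reduce the time and memory cost of the model and improves its generalization.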

Why Poincaré ball model instead of a simple Poincaré disk model?
(1) First, in many datasets such as text corpora, multiple latent hierarchies can co-exist, which can not always be modeled in two dimensions.
(2) Second, a larger embedding dimension can decrease the difficulty for an optimization method to find a good embedding (also for single hierarchies) as it allows for more degrees of freedom during the optimization process.

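To make gradient-based optimization convenient, the authors use the Poincaré ball model (its distance function is differentiable and it has a relatively simple constraint on the representations). The Poincaré ball model corresponds to the Riemannian manifold $(\mathcal B^d,g_x)$, where $\mathcal B^d=\{x\in\R^d\mid\|x\|<1\}$ is the open $d$-dimensional unit ball and $g_x$ is the Riemannian metric tensor
$g_x=\left(\frac{2}{1-\|x\|^2}\right)^2 g^E$
where $x\in\mathcal B^d$ and $g^E=I_d$ is the Euclidean metric tensor. The distance between two points $u,v\in\mathcal B^d$ is
$d(u,v)=\operatorname{arcosh}\left(1+2\frac{\|u-v\|^2}{(1-\|u\|^2)(1-\|v\|^2)}\right)$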
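A minimal NumPy sketch of the Poincaré distance $d(u,v)=\operatorname{arcosh}\left(1+2\frac{\|u-v\|^2}{(1-\|u\|^2)(1-\|v\|^2)}\right)$ follows. The function name and the clamping constant `eps` are assumptions made for illustration; they are not taken from the paper or its reference code.

```python
import numpy as np

def poincare_distance(u, v, eps=1e-12):
    """Poincaré distance between two points inside the open unit ball."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    sq_diff = np.sum((u - v) ** 2)
    # Product of the two boundary terms; clamped so that points sitting
    # numerically on the boundary do not cause a division by zero.
    denom = max((1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2)), eps)
    arg = 1.0 + 2.0 * sq_diff / denom
    # arccosh is only defined for arguments >= 1; guard against rounding.
    return float(np.arccosh(max(arg, 1.0)))

origin = np.zeros(2)
print(poincare_distance(origin, [0.5, 0.0]))
print(poincare_distance([0.0, 0.95], [0.95, 0.0]))
```

The second call illustrates that two points near the boundary are much farther apart than their Euclidean separation suggests.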

Optimization

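Given data with hierarchical structure $\mathcal S=\{x_i\}_{i=1}^n$, we want to find embeddings $\Theta=\{\theta_i\}_{i=1}^n$ with $\theta_i\in\mathcal B^d$ such that the Poincaré distance between embeddings reflects the semantic similarity of the corresponding objects. To obtain the embeddings, we solve the following optimization problem:
$\Theta'\leftarrow\arg\min_\Theta\mathcal L(\Theta)\quad\text{s.t. }\forall\theta_i\in\Theta:\|\theta_i\|<1$
(The loss function is defined in the “Evaluation/Embedding Taxonomies” section.)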
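Because the Poincaré ball is a Riemannian manifold, the problem can be solved with stochastic Riemannian optimization methods (RSGD, RSVRG, …). Let $\mathcal T_\theta\mathcal B$ be the tangent space at $\theta\in\mathcal B^d$, $\nabla_R\in\mathcal T_\theta\mathcal B$ the Riemannian gradient of $\mathcal L(\theta)$, and $\nabla_E$ its Euclidean gradient. The RSGD parameter update is
$\theta_{t+1}=\theta_t-\eta_t\nabla_R\mathcal L(\theta_t)$
Because the Poincaré ball is a conformal model of hyperbolic space, the angle between adjacent vectors in the Poincaré ball equals their angle in Euclidean space (the model is angle-preserving), but vector lengths differ between the two spaces. To derive the Riemannian gradient from the Euclidean gradient, $\nabla_E$ is therefore rescaled by the inverse of the Poincaré ball metric tensor, $g_\theta^{-1}$:
$\nabla_R=\frac{(1-\|\theta_t\|^2)^2}{4}\nabla_E$
In addition, the embeddings must be constrained to stay inside the unit ball during optimization, which is done with the projection
$\mathrm{proj}(\theta)=\begin{cases}\theta/\|\theta\|-\varepsilon & \text{if }\|\theta\|\ge 1\\ \theta & \text{otherwise}\end{cases}$
where $\varepsilon=10^{-5}$. The final parameter update is
$\theta_{t+1}\leftarrow\mathrm{proj}\left(\theta_t-\eta_t\frac{(1-\|\theta_t\|^2)^2}{4}\nabla_E\right)$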
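The update rule can be sketched in a few lines of NumPy. The function names are hypothetical, and the projection is interpreted here as rescaling an out-of-ball point to norm $1-\varepsilon$, which is one reading of $\theta/\|\theta\|-\varepsilon$ and an assumption of this sketch. The Euclidean gradient is taken as given (in practice it would come from the loss, e.g. via automatic differentiation).

```python
import numpy as np

EPS = 1e-5  # margin that keeps embeddings strictly inside the unit ball

def project(theta, eps=EPS):
    """Pull a point that left the open unit ball back to norm 1 - eps."""
    norm = np.linalg.norm(theta)
    if norm >= 1.0:
        return theta / norm * (1.0 - eps)
    return theta

def rsgd_step(theta, euclid_grad, lr):
    """One RSGD step for a single embedding vector in the Poincaré ball."""
    # Inverse metric tensor: rescales the Euclidean gradient to the Riemannian one.
    scale = (1.0 - np.dot(theta, theta)) ** 2 / 4.0
    return project(theta - lr * scale * euclid_grad)

theta = np.array([0.6, 0.0])
grad = np.array([1.0, 0.0])
print(rsgd_step(theta, grad, lr=0.1))
```

Because the scaling factor shrinks toward zero near the boundary, points close to the boundary move very little per step in Euclidean terms, even for large Euclidean gradients.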

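On conformality:
(1) A metric $\tilde g$ is said to be conformal to another metric $g$ if it defines the same angles, i.e.
$\frac{\tilde g_x(u,v)}{\sqrt{\tilde g_x(u,u)}\sqrt{\tilde g_x(v,v)}}=\frac{g_x(u,v)}{\sqrt{g_x(u,u)}\sqrt{g_x(v,v)}}$
for all $x\in M$, $u,v\in T_xM\setminus\{0\}$.
(2) The metric tensor $g_x^{\mathbb D}$ of the Poincaré ball is
$g_x^{\mathbb D}=\lambda_x^2 g^E,\quad\lambda_x=\frac{2}{1-\|x\|^2}$
where $g^E=I_n$ is the Euclidean metric tensor. The two metrics satisfy
$\frac{g_x^{\mathbb D}(u,v)}{\sqrt{g_x^{\mathbb D}(u,u)}\sqrt{g_x^{\mathbb D}(v,v)}}=\frac{\langle u,v\rangle}{\|u\|\|v\|}$
because the factor $\lambda_x^2$ cancels between numerator and denominator. The Poincaré ball model is therefore conformal to Euclidean space.
(3) Equivalently, there exists a smooth function $\lambda:M\to\R$, the conformal factor, such that $\tilde g_x=\lambda_x^2 g_x$ for all $x\in M$.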

Training Details

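The authors also use a few tricks to improve performance: (1) Embeddings are initialized randomly from the uniform distribution $\mathcal U(-0.001,0.001)$, which places all embeddings close to the origin of $\mathcal B^d$ at initialization. (2) To obtain a good initial angular layout, training starts with a “burn-in” phase of 10 epochs that uses a reduced learning rate of $\eta/10$. Combined with the initialization near the origin, this improves the angular layout without moving the embeddings too close to the boundary.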
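The two tricks can be sketched as follows. The helper names, the fixed random seed, and the step-function schedule are assumptions for illustration; only the range $\mathcal U(-0.001,0.001)$, the factor 10, and the 10 burn-in epochs come from the text.

```python
import numpy as np

def init_embeddings(num_items, dim, scale=0.001, seed=0):
    """Draw every coordinate from U(-scale, scale), i.e. close to the origin."""
    rng = np.random.default_rng(seed)
    return rng.uniform(-scale, scale, size=(num_items, dim))

def learning_rate(epoch, base_lr, burn_in_epochs=10, burn_in_factor=10.0):
    """Return base_lr / burn_in_factor during burn-in, base_lr afterwards."""
    if epoch < burn_in_epochs:
        return base_lr / burn_in_factor
    return base_lr

theta = init_embeddings(num_items=5, dim=2)
print(np.linalg.norm(theta, axis=1))  # all norms are far below 1
print([learning_rate(epoch, base_lr=1.0) for epoch in (0, 9, 10, 11)])
```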

Evaluation

Embedding Taxonomies

The authors run experiments on the transitive closure of the WORDNET noun hierarchy to test how well Poincaré embeddings embed data with a clear latent hierarchical structure. The dataset $\mathcal D=\{(u,v)\}$ contains 743,241 hypernymy relations among 82,115 nouns. The loss function is
$\mathcal{L}(\Theta) = -\sum_{(u,v) \in \mathcal{D}} \log \frac{e^{-d(u,v)}}{e^{-d(u,v)} + \sum_{v'\in \mathcal{N}(u)} e^{-d(u, v')}}$
where $\mathcal N(u)=\{v'\mid(u,v')\notin\mathcal D\}\cup\{u\}$ is the set of negative examples for $u$. During training, 10 negatives are sampled at random for each positive example. The whole procedure closely resembles Word2vec’s Skip-Gram loss with negative sampling.
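The per-pair term of this loss can be sketched in NumPy as below. `sample_negatives` and `pair_loss` are hypothetical helpers, the toy embeddings are made up, and uniform sampling with replacement is an assumption of the sketch rather than the authors' sampling scheme.

```python
import numpy as np

def poincare_distance(u, v):
    """Poincaré distance between two points strictly inside the unit ball."""
    sq_diff = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return np.arccosh(1.0 + 2.0 * sq_diff / denom)

def sample_negatives(u, num_items, pairs, k=10, rng=None):
    """Draw k indices v' such that (u, v') is not an observed pair."""
    rng = np.random.default_rng() if rng is None else rng
    candidates = [i for i in range(num_items) if (u, i) not in pairs]
    return rng.choice(candidates, size=k, replace=True)

def pair_loss(theta, u, v, negatives):
    """-log softmax of -d(u, v) against the sampled negatives."""
    d_pos = poincare_distance(theta[u], theta[v])
    d_neg = [poincare_distance(theta[u], theta[n]) for n in negatives]
    logits = -np.array([d_pos, *d_neg])
    shift = logits.max()  # subtract the max for numerical stability
    log_norm = shift + np.log(np.sum(np.exp(logits - shift)))
    return float(log_norm - logits[0])

theta = np.array([[0.0, 0.0], [0.5, 0.0], [-0.5, 0.0], [0.0, 0.9]])
pairs = {(0, 1)}
negs = sample_negatives(0, len(theta), pairs, k=3, rng=np.random.default_rng(0))
print(pair_loss(theta, 0, 1, negs))
```

The loss is small when the positive pair is much closer than every sampled negative, and grows as negatives approach or overtake it.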
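Reconstruction. To evaluate representation quality directly, the authors reconstruct the data from the embeddings: the embeddings are used to score every noun as a candidate for a given relation, the candidates are ranked by score, and the rank of the ground truth serves as the metric. The authors report the mean rank of the ground truth over all examples together with the mean average precision (mAP).
Link Prediction. To evaluate generalization, the authors split the dataset into training, validation, and test sets and perform link prediction: the distance $d(u, v)$ of each positive pair is ranked among the distances of all negative pairs $\{d(u,v')\mid(u,v')\notin\mathcal D\}$. The authors report the mean rank of the positive pairs together with their mAP.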
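One common way to compute these two metrics for a single query node $u$ is sketched below. It assumes the distances from $u$ to all candidates and a boolean mask of the true relations are already available; the function name and the tie-breaking (positives win ties against negatives) are choices of this sketch, not details confirmed by the paper. Mean rank and mAP are then averages over all query nodes.

```python
import numpy as np

def ranks_and_average_precision(dists, is_positive):
    """Rank each positive among the negatives and compute average precision."""
    dists = np.asarray(dists, dtype=float)
    is_positive = np.asarray(is_positive, dtype=bool)
    neg = np.sort(dists[~is_positive])
    pos = np.sort(dists[is_positive])
    # Rank of a positive = 1 + number of negatives that are strictly closer.
    ranks = 1 + np.searchsorted(neg, pos, side="left")
    # Precision at the i-th closest positive: i positives retrieved among
    # i positives plus the negatives that are strictly closer.
    hits = np.arange(1, len(pos) + 1)
    precision = hits / (hits + ranks - 1)
    return ranks, float(precision.mean())

dists = [0.1, 0.5, 0.3, 0.9]
is_positive = [True, False, False, True]
ranks, ap = ranks_and_average_precision(dists, is_positive)
print(ranks.mean(), ap)
```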

The Euclidean and translational baselines use the following distance functions:
Euclidean: $d(u, v) = \|u - v\|^2$
Translational: $d(u, v) = \|u - v + r\|^2$. For this score function, we also learn the global translation vector $r$ during training.

The figure below visualizes the two-dimensional Poincaré embeddings of the mammals subtree; blue edges are ground-truth “is-a” relations. A Poincaré embedding with $d = 5$ achieves mean rank 1.26 and MAP 0.927 on this subtree.

Network Embeddings

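The authors run link prediction experiments on 4 social network datasets. The probability that an edge exists is computed with the Fermi-Dirac distribution:
$P((u,v)=1\mid\Theta)=\frac{1}{e^{(d(u,v)-r)/t}+1}$
where $r, t > 0$ are hyperparameters.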
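A small sketch of the edge probability $P((u,v)=1\mid\Theta)=\frac{1}{e^{(d(u,v)-r)/t}+1}$ follows. The function name and the example values of $r$ and $t$ are assumptions; the rewrite through `np.logaddexp` is only a numerical-stability choice and computes the same quantity.

```python
import numpy as np

def edge_probability(dist, r, t):
    """Fermi-Dirac probability of an edge given the embedding distance."""
    x = (np.asarray(dist, dtype=float) - r) / t
    # 1 / (exp(x) + 1) written as exp(-log(1 + exp(x))) to avoid overflow.
    return np.exp(-np.logaddexp(0.0, x))

# r acts as a distance threshold, t controls how sharp the transition is.
print(edge_probability([0.5, 2.0, 5.0], r=2.0, t=1.0))
print(edge_probability([0.5, 2.0, 5.0], r=2.0, t=0.1))
```

Pairs closer than $r$ receive a probability above 0.5, and a smaller $t$ makes the transition around $r$ sharper.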

Lexical Entailment

References

paper: Nickel, Maximillian, and Douwe Kiela. “Poincaré embeddings for learning hierarchical representations.” Advances in neural information processing systems 30 (2017).
code: https://github.com/facebookresearch/poincare-embeddings
Implementation by Gensim: https://radimrehurek.com/gensim/models/poincare.html and a jupyter notebook tutorial: https://nbviewer.org/github/RaRe-Technologies/gensim/blob/develop/docs/notebooks/Poincare%20Tutorial.ipynb
Implemented by “Hyperbolic Entailment Cones for Learning Hierarchical Embeddings”: https://github.com/dalab/hyperbolic_cones
Hyperbolic Geometry and Poincaré Embeddings
Implementing Poincaré Embeddings

[NeurIPS 2017] Poincaré Embeddings for Learning Hierarchical Representations
