Loss functions: KLDivLoss

KL divergence

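KL divergence, also called relative entropy, measures how much one probability distribution differs from another. It applies to both discrete and continuous distributions. It is often described as a distance between distributions, but it is not a true distance metric because it is not symmetric: in general $D_{KL}(p \| q) \neq D_{KL}(q \| p)$.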

Let $p(x)$ and $q(x)$ be two probability distributions of a discrete random variable $X$. The KL divergence of $p$ relative to $q$ is:

$$D_{KL}(p \| q)=E_{p(x)}\left[\log \frac{p(x)}{q(x)}\right]=\sum_{i=1}^{N} p(x_i) \cdot \left(\log p(x_i)-\log q(x_i)\right)$$
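The following is a minimal pure-Python sketch of the discrete formula above. The function name kl_divergence is only for illustration and is not a library API. It assumes $q(x_i) > 0$ wherever $p(x_i) > 0$ and uses the convention $0 \cdot \log 0 = 0$. Swapping the two arguments shows that the result is not symmetric.

```python
import math


def kl_divergence(p, q):
    """D_KL(p || q) for two discrete distributions given as equal-length lists.

    Terms with p_i == 0 contribute 0 (convention 0 * log 0 = 0).
    Assumes q_i > 0 wherever p_i > 0.
    """
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0:
            total += p_i * (math.log(p_i) - math.log(q_i))
    return total


p = [0.8, 0.1, 0.1]
q = [0.4, 0.3, 0.3]
print(kl_divergence(p, q))  # ≈ 0.3348
print(kl_divergence(q, p))  # ≈ 0.3819, different: KL is not symmetric
print(kl_divergence(p, p))  # 0.0
```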

KLDivLoss

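Consider a batch $D(x, y)$ of $N$ samples. Here $x$ is the network output after normalization and taking the logarithm, i.e. log-probabilities. $y$ is the ground-truth target, interpreted as probabilities by default, and has the same shape as $x$.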
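A common way to turn raw network scores into these log-probabilities is a log-softmax. In PyTorch one would normally call F.log_softmax(logits, dim=1) on an (N, C) batch. The sketch below shows what that step computes for a single sample. log_softmax here is a hypothetical helper working on a plain Python list, not the PyTorch function.

```python
import math


def log_softmax(scores):
    """Turn raw scores (logits) into log-probabilities: normalize, then take the log.

    Subtracting the max first keeps exp() from overflowing. The result is
    unchanged because log-softmax is invariant to adding a constant to all scores.
    """
    m = max(scores)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_sum for s in scores]


logits = [2.0, 0.5, -1.0]  # hypothetical network outputs for one sample
x = log_softmax(logits)
print(x)
print(sum(math.exp(v) for v in x))  # ≈ 1.0, so exp(x) is a probability distribution
```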

The loss $l_n$ of the $n$-th sample is computed element-wise as:

$$l_n = y_n \cdot \left(\log y_n - x_n\right)$$
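The formula uses $x_n$ directly rather than $\log x_n$ because $x$ is already a logarithm. If $x = \log q$ for some model distribution $q$, then $y \cdot (\log y - x)$ is exactly the term inside the discrete KL sum. Summing $l$ over one sample's elements therefore gives $D_{KL}(y \| q)$. The check below is illustrative: kl_div_pointwise is a hypothetical name, and zero targets are treated as 0 by the $0 \cdot \log 0 = 0$ convention.

```python
import math


def kl_div_pointwise(x, y):
    """Element-wise l = y * (log y - x) for one sample.

    x: log-probabilities (e.g. output of a log-softmax); y: target probabilities.
    Elements with y == 0 are taken as 0 (convention 0 * log 0 = 0).
    """
    return [y_i * (math.log(y_i) - x_i) if y_i > 0 else 0.0
            for x_i, y_i in zip(x, y)]


y = [0.8, 0.1, 0.1]              # target distribution
q = [0.4, 0.3, 0.3]              # hypothetical model distribution
x = [math.log(v) for v in q]     # what the network would pass in: log q

l = kl_div_pointwise(x, y)
print(l)
print(sum(l))  # equals D_KL(y || q) ≈ 0.3348
```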

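# Excerpt from PyTorch's torch/nn/modules/loss.py (older release; newer
# releases also accept a log_target argument).
import torch.nn.functional as F
from torch.nn.modules.loss import _Loss


class KLDivLoss(_Loss):
    __constants__ = ['reduction']

    def __init__(self, size_average=None, reduce=None, reduction='mean'):
        # size_average and reduce are deprecated; reduction selects the output
        super(KLDivLoss, self).__init__(size_average, reduce, reduction)

    def forward(self, input, target):
        # input: log-probabilities x; target: probabilities y
        return F.kl_div(input, target, reduction=self.reduction)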

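PyTorch implements this loss as the torch.nn.KLDivLoss class. As the code above shows, its forward method simply calls F.kl_div, which can also be called directly. The size_average and reduce arguments are deprecated. The reduction argument takes one of four values, 'mean', 'batchmean', 'sum' or 'none', which determine what $\ell(x, y)$ returns. The default is 'mean'. With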

$$L=\left\{l_1, \ldots, l_N\right\}$$

$$\ell(x, y)=\begin{cases} L, & \text{if reduction} = \text{'none'} \\ \operatorname{mean}(L), & \text{if reduction} = \text{'mean'} \\ \operatorname{sum}(L)/N, & \text{if reduction} = \text{'batchmean'} \\ \operatorname{sum}(L), & \text{if reduction} = \text{'sum'} \end{cases}$$

Here $\operatorname{mean}(L)$ averages over every element of $L$, that is, all $N$ samples times all classes. 'batchmean' divides the total by the batch size $N$ only. The PyTorch documentation notes that only 'batchmean' matches the mathematical definition of KL divergence, and therefore recommends it over 'mean'.
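The four reduction modes can be sketched in plain Python as follows. This is an illustrative re-implementation, not PyTorch's F.kl_div. It assumes inputs are nested lists of shape (N, C) and that zero targets contribute 0; kl_div_loss is a hypothetical name. Run on the input and target values from the example below, it should give mean ≈ 0.6209 and batchmean ≈ 1.8626, in line with the printed PyTorch output.

```python
import math


def kl_div_loss(x, y, reduction="mean"):
    """Pure-Python sketch of the 'none' / 'mean' / 'batchmean' / 'sum' reductions.

    x, y: lists of N rows, each a list of C numbers;
    x holds log-probabilities, y holds target probabilities.
    """
    losses = [[y_ij * (math.log(y_ij) - x_ij) if y_ij > 0 else 0.0
               for x_ij, y_ij in zip(row_x, row_y)]
              for row_x, row_y in zip(x, y)]
    total = sum(sum(row) for row in losses)
    n_elements = sum(len(row) for row in losses)
    if reduction == "none":
        return losses
    if reduction == "sum":
        return total
    if reduction == "mean":
        return total / n_elements   # average over all N * C elements
    if reduction == "batchmean":
        return total / len(losses)  # divide by batch size N only
    raise ValueError(f"unknown reduction: {reduction}")


x = [[-2, -6, -8], [-7, -1, -2], [-1, -9, -2.3], [-1.9, -2.8, -5.4]]
y = [[0.8, 0.1, 0.1], [0.1, 0.7, 0.2], [0.5, 0.2, 0.3], [0.4, 0.3, 0.3]]
for mode in ("mean", "batchmean", "sum"):
    print(mode, round(kl_div_loss(x, y, mode), 4))
```

Every sample here has C = 3 elements, so batchmean is exactly 3 times mean.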

Example:

import math

import torch
import torch.nn as nn


def validate_loss(output, target):
    # Manually compute mean(y * (log y - x)) over all elements,
    # to check nn.KLDivLoss with the default reduction='mean'.
    val = 0
    for li_x, li_y in zip(output, target):
        for x, y in zip(li_x, li_y):
            loss_val = y * (math.log(y) - x)
            val += loss_val
    return val / output.nelement()


torch.manual_seed(20)  # not strictly needed: all tensors below are fixed
loss = nn.KLDivLoss()  # default reduction='mean'
input = torch.Tensor([[-2, -6, -8], [-7, -1, -2], [-1, -9, -2.3], [-1.9, -2.8, -5.4]])
target = torch.Tensor([[0.8, 0.1, 0.1], [0.1, 0.7, 0.2], [0.5, 0.2, 0.3], [0.4, 0.3, 0.3]])
output = loss(input, target)
print("default loss:", output)

output = validate_loss(input, target)
print("validate loss:", output)

loss = nn.KLDivLoss(reduction="batchmean")
output = loss(input, target)
print("batchmean loss:", output)

loss = nn.KLDivLoss(reduction="mean")
output = loss(input, target)
print("mean loss:", output)

loss = nn.KLDivLoss(reduction="none")
output = loss(input, target)
print("none loss:", output)

Output:

default loss: tensor(0.6209)
validate loss: tensor(0.6209)
batchmean loss: tensor(1.8626)
mean loss: tensor(0.6209)
none loss: tensor([[1.4215, 0.3697, 0.5697],
        [0.4697, 0.4503, 0.0781],
        [0.1534, 1.4781, 0.3288],
        [0.3935, 0.4788, 1.2588]])
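A hand check of the printed values follows, using the natural logarithm and 4 samples × 3 classes = 12 elements. The input rows in this example are not normalized log-probabilities: for instance, $e^{-2}+e^{-6}+e^{-8}\approx 0.14$. The numbers therefore illustrate the formula rather than a true KL divergence between two distributions.

$$
\begin{aligned}
l_{1,1} &= 0.8 \cdot (\log 0.8 + 2) \approx 0.8 \times 1.7769 \approx 1.4215 \\
\operatorname{sum}(L) &\approx 7.4506 \\
\operatorname{mean}(L) &= \operatorname{sum}(L)/12 \approx 0.6209 \\
\operatorname{sum}(L)/N &= \operatorname{sum}(L)/4 \approx 1.8626
\end{aligned}
$$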