Code Implementations of Common NLP Loss Functions

  Commonly used loss functions in NLP mainly include multi-class classification (SoftMax + CrossEntropy), contrastive learning (Contrastive Loss), triplet loss (Triplet Loss), and sentence similarity (Sentence Similarity). Classification and sentence similarity are the two most widely used objectives, while contrastive learning and triplet loss are more recent self-supervised objectives.

  This article is not a theoretical treatment of these loss functions; it simply implements the four of them so that a loss module can be quickly plugged into model experiments. To make the execution process and results easy to inspect, each demo is built on a HuggingFace BERT model (with no training loop). Readers can embed the corresponding loss function directly into their own model frameworks.


1. Classification Loss: SoftMax + CrossEntropy

  The classification loss takes a single sentence (or a sentence pair) as input and performs multi-class classification on it. For a sentence pair, the two sentence representations are first combined (concatenation by default); the objective itself is the standard softmax cross-entropy, written out below, followed by the code:
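
  Concretely, with logits z produced by a linear classifier over the (combined) sentence representation and gold label y among C classes, the per-example loss is

    \mathcal{L}_{CE} = -\log \frac{\exp(z_y)}{\sum_{j=1}^{C} \exp(z_j)}

which is what nn.CrossEntropyLoss computes over the classifier output in the code below (here z, y, and C are just the notation used in this article, not variables in the code).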

# -*- coding: utf-8 -*-
# @Time    : 2022/03/23 16:25
# @Author  : Jianing Wang
# @Email   : lygwjn@gmail.com
# @File    : SoftmaxLayerWithLoss.py
#!/usr/bin/env python
# coding=utf-8
import torch
from torch import nn, Tensor
from transformers.models.bert.modeling_bert import BertModel
from transformers import BertTokenizer, BertConfig


class SoftmaxLayerWithLoss(nn.Module):
    """
    This loss calculates softmax cross-entropy over input sentences (or sentence pairs) with labels.

    @:param hidden_dim: The hidden dimension
    @:param num_labels: The number of labels
    @:param is_sentence_pair: (bool) Whether a sentence pair is fed
    @:param combine_type: How the two sentence representations are combined:
        - cat: rep = torch.cat([rep_a, rep_b], -1)
        - diff: rep = rep_a - rep_b
        - mul: rep = rep_a * rep_b
        - avg: rep = (rep_a + rep_b) / 2.0
        - sum: rep = rep_a + rep_b
    """
    def __init__(self,
                 hidden_dim: int,
                 num_labels: int,
                 is_sentence_pair=False,
                 combine_type='cat',  # cat / diff / mul / avg / sum
                 ):
        super(SoftmaxLayerWithLoss, self).__init__()
        self.hidden_dim = hidden_dim
        self.num_labels = num_labels
        self.is_sentence_pair = is_sentence_pair
        self.combine_type = combine_type
        assert self.combine_type in ['cat', 'diff', 'mul', 'avg', 'sum']
        if self.combine_type == 'cat':
            # concatenation doubles the input dimension of the classifier
            self.hidden_dim = self.hidden_dim * 2
        self.classifier = nn.Linear(self.hidden_dim, num_labels)

    def forward(self, rep_a, rep_b=None, label: Tensor = None):
        # rep_a: [batch_size, hidden_dim]
        # rep_b: [batch_size, hidden_dim]
        rep = None
        if self.combine_type == 'cat':
            rep = torch.cat([rep_a, rep_b], -1)
        if self.combine_type == 'diff':
            rep = rep_a - rep_b
        if self.combine_type == 'mul':
            rep = rep_a * rep_b
        if self.combine_type == 'avg':
            rep = (rep_a + rep_b) / 2
        if self.combine_type == 'sum':
            rep = rep_a + rep_b
        output = self.classifier(rep)
        loss_fct = nn.CrossEntropyLoss()
        if label is not None:
            loss = loss_fct(output, label.view(-1))
            return loss
        else:
            return rep, output


if __name__ == "__main__":
    # configuration for the huggingface pre-trained language model
    config = BertConfig.from_pretrained('bert-base-cased')
    # tokenizer for the huggingface pre-trained language model
    tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
    # pytorch_model.bin of the huggingface pre-trained language model
    model = BertModel.from_pretrained('bert-base-cased')
    # obtain two batches of examples; each corresponding example forms a pair
    examples1 = ['This is the book.', 'Disney film is well seeing for us.']
    examples2 = ['I love to read it.', 'I don\'t want to have a try due to the hardness.']
    label = [1, 0]
    # convert each example to features
    # {'input_ids': xxx, 'attention_mask': xxx, 'token_type_ids': xxx}
    features1 = tokenizer(examples1, add_special_tokens=True, padding=True)
    features2 = tokenizer(examples2, add_special_tokens=True, padding=True)
    # pad to a fixed length and convert to feature batches
    max_seq_len = 16
    features1 = {key: torch.Tensor([value + [0] * (max_seq_len - len(value)) for value in values]).long() for key, values in features1.items()}
    features2 = {key: torch.Tensor([value + [0] * (max_seq_len - len(value)) for value in values]).long() for key, values in features2.items()}
    label = torch.Tensor(label).long()
    # obtain sentence embeddings by average pooling over the token dimension (dim=1)
    rep_a = model(**features1)[0]  # [batch_size, max_seq_len, hidden_dim]
    rep_b = model(**features2)[0]  # [batch_size, max_seq_len, hidden_dim]
    rep_a = torch.mean(rep_a, 1)  # [batch_size, hidden_dim]
    rep_b = torch.mean(rep_b, 1)  # [batch_size, hidden_dim]
    # obtain classification loss
    loss_fn = SoftmaxLayerWithLoss(hidden_dim=rep_a.shape[-1], num_labels=2, is_sentence_pair=True, combine_type='cat')
    loss = loss_fn(rep_a=rep_a, rep_b=rep_b, label=label)
    print(loss)  # e.g. tensor(0.6986, grad_fn=<SumBackward0>)

2. Sentence Similarity Loss

  The sentence similarity loss computes the cosine similarity between two sentences. The cosine similarity is treated as the predicted score, and the loss is the MSE between this score and the label, as written out below, followed by the code:
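
  With sentence embeddings u and v and a float similarity label y, the default objective per pair is

    \mathcal{L}_{sim} = \left( y - \cos(u, v) \right)^2

averaged over the batch, which is what the CosineSimilarityLoss class below computes via nn.MSELoss (u, v, and y are just notation for the embeddings and label handled in the code).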

# -*- coding: utf-8 -*-
# @Time    : 2022/03/23 16:55
# @Author  : Jianing Wang
# @Email   : lygwjn@gmail.com
# @File    : SimilarityLoss.py
#!/usr/bin/env python
# coding=utf-8
import torch
from torch import nn, Tensor
from transformers.models.bert.modeling_bert import BertModel
from transformers import BertTokenizer, BertConfig


class CosineSimilarityLoss(nn.Module):
    """
    CosineSimilarityLoss expects that each input example consists of two texts and a float label.
    It computes the vectors u = model(input_text[0]) and v = model(input_text[1]) and measures the cosine similarity between the two.
    By default, it minimizes the following loss: ||input_label - cos_score_transformation(cosine_sim(u, v))||_2.

    :param loss_fct: Which PyTorch loss function should be used to compare cosine_similarity(u, v) with the input_label? By default, MSE: ||input_label - cosine_sim(u, v)||_2
    :param cos_score_transformation: The cos_score_transformation function is applied on top of cosine_similarity. By default, the identity function is used (i.e. no change).
    """
    def __init__(self, loss_fct=nn.MSELoss(), cos_score_transformation=nn.Identity()):
        super(CosineSimilarityLoss, self).__init__()
        self.loss_fct = loss_fct
        self.cos_score_transformation = cos_score_transformation

    def forward(self, rep_a, rep_b, label: Tensor):
        # rep_a: [batch_size, hidden_dim]
        # rep_b: [batch_size, hidden_dim]
        output = self.cos_score_transformation(torch.cosine_similarity(rep_a, rep_b))
        # print(output)  # e.g. tensor([0.9925, 0.5846], grad_fn=<DivBackward0>)
        return self.loss_fct(output, label.view(-1))


if __name__ == "__main__":
    # configuration for the huggingface pre-trained language model
    config = BertConfig.from_pretrained('bert-base-cased')
    # tokenizer for the huggingface pre-trained language model
    tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
    # pytorch_model.bin of the huggingface pre-trained language model
    model = BertModel.from_pretrained('bert-base-cased')
    # obtain two batches of examples; each corresponding example forms a pair
    examples1 = ['Beijing is one of the biggest city in China.', 'Disney film is well seeing for us.']
    examples2 = ['Shanghai is the largest city in east of China.', 'ACL 2021 will be held in line due to COVID-19.']
    label = [1, 0]
    # convert each example to features
    # {'input_ids': xxx, 'attention_mask': xxx, 'token_type_ids': xxx}
    features1 = tokenizer(examples1, add_special_tokens=True, padding=True)
    features2 = tokenizer(examples2, add_special_tokens=True, padding=True)
    # pad to a fixed length and convert to feature batches
    max_seq_len = 24
    features1 = {key: torch.Tensor([value + [0] * (max_seq_len - len(value)) for value in values]).long() for key, values in features1.items()}
    features2 = {key: torch.Tensor([value + [0] * (max_seq_len - len(value)) for value in values]).long() for key, values in features2.items()}
    label = torch.Tensor(label).float()  # float labels for the MSE loss
    # obtain sentence embeddings by average pooling over the token dimension (dim=1)
    rep_a = model(**features1)[0]  # [batch_size, max_seq_len, hidden_dim]
    rep_b = model(**features2)[0]  # [batch_size, max_seq_len, hidden_dim]
    rep_a = torch.mean(rep_a, 1)  # [batch_size, hidden_dim]
    rep_b = torch.mean(rep_b, 1)  # [batch_size, hidden_dim]
    # obtain similarity loss
    loss_fn = CosineSimilarityLoss()
    loss = loss_fn(rep_a=rep_a, rep_b=rep_b, label=label)
    print(loss)  # e.g. tensor(0.1709, grad_fn=<MseLossBackward0>)

3. Contrastive Loss

  Contrastive learning works with an anchor and several candidates. The anchor is a fixed feature vector, or a vector produced by a neural encoder (e.g. BERT); the candidates include a positive (same class as the anchor) and several negatives (different class from the anchor). The goal is to make same-class pairs as similar as possible and different-class pairs as dissimilar as possible. The pairwise form of the loss is written out below, followed by the code and an example:
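
  Formally, for an anchor - candidate pair with label y ∈ {0, 1} (y = 1 for a positive, y = 0 for a negative), distance d between the two embeddings, and margin m, the pairwise contrastive loss implemented below is

    \mathcal{L}_{con} = \frac{1}{2}\left[ y \cdot d^2 + (1 - y) \cdot \max(0, m - d)^2 \right]

so positives are pulled together, while negatives are only pushed apart until they are at least m away (cosine distance and m = 0.5 by default in the code).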

# -*- coding: utf-8 -*-
# @Time    : 2022/03/23 14:50
# @Author  : Jianing Wang
# @Email   : lygwjn@gmail.com
# @File    : ContrastiveLoss.py
#!/usr/bin/env python
# coding=utf-8
from enum import Enum
import torch
import torch.nn.functional as F
from torch import nn, Tensor
from transformers.models.bert.modeling_bert import BertModel
from transformers import BertTokenizer, BertConfig


class SiameseDistanceMetric(Enum):
    """The metric for the contrastive loss"""
    EUCLIDEAN = lambda x, y: F.pairwise_distance(x, y, p=2)
    MANHATTAN = lambda x, y: F.pairwise_distance(x, y, p=1)
    COSINE_DISTANCE = lambda x, y: 1 - F.cosine_similarity(x, y)


class ContrastiveLoss(nn.Module):
    """
    Contrastive loss. Expects as input two texts and a label of either 0 or 1. If the label == 1, then the distance between the
    two embeddings is reduced. If the label == 0, then the distance between the embeddings is increased.

    @:param distance_metric: The distance metric function
    @:param margin: (float) The margin distance
    @:param size_average: (bool) Whether to average the loss over the batch

    Input example of the forward function:
        rep_anchor: [[0.2, -0.1, ..., 0.6], [0.2, -0.1, ..., 0.6], ..., [0.2, -0.1, ..., 0.6]]
        rep_candidate: [[0.3, 0.1, ..., -0.3], [-0.8, 1.2, ..., 0.7], ..., [-0.9, 0.1, ..., 0.4]]
        label: [0, 1, ..., 1]

    Return example of the forward function:
        0.015 (averaged)
        2.672 (sum)
    """
    def __init__(self, distance_metric=SiameseDistanceMetric.COSINE_DISTANCE, margin: float = 0.5, size_average: bool = False):
        super(ContrastiveLoss, self).__init__()
        self.distance_metric = distance_metric
        self.margin = margin
        self.size_average = size_average

    def forward(self, rep_anchor, rep_candidate, label: Tensor):
        # rep_anchor: [batch_size, hidden_dim] denotes the representations of anchors
        # rep_candidate: [batch_size, hidden_dim] denotes the representations of positives / negatives
        # label: [batch_size] denotes the label of each anchor - candidate pair
        distances = self.distance_metric(rep_anchor, rep_candidate)
        losses = 0.5 * (label.float() * distances.pow(2) + (1 - label).float() * F.relu(self.margin - distances).pow(2))
        return losses.mean() if self.size_average else losses.sum()


if __name__ == "__main__":
    # configuration for the huggingface pre-trained language model
    config = BertConfig.from_pretrained('bert-base-cased')
    # tokenizer for the huggingface pre-trained language model
    tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
    # pytorch_model.bin of the huggingface pre-trained language model
    model = BertModel.from_pretrained('bert-base-cased')
    # obtain two batches of examples; each corresponding example forms an anchor - candidate pair
    examples1 = ['This is the sentence anchor 1.', 'It is the second sentence in this article named Section D.']
    examples2 = ['It is the same as anchor 1.', 'I think it is different with Section D.']
    label = [1, 0]
    # convert each example to features
    # {'input_ids': xxx, 'attention_mask': xxx, 'token_type_ids': xxx}
    features1 = tokenizer(examples1, add_special_tokens=True, padding=True)
    features2 = tokenizer(examples2, add_special_tokens=True, padding=True)
    # pad to a fixed length and convert to feature batches
    max_seq_len = 16
    features1 = {key: torch.Tensor([value + [0] * (max_seq_len - len(value)) for value in values]).long() for key, values in features1.items()}
    features2 = {key: torch.Tensor([value + [0] * (max_seq_len - len(value)) for value in values]).long() for key, values in features2.items()}
    label = torch.Tensor(label).long()
    # obtain sentence embeddings by average pooling over the token dimension (dim=1)
    rep_anchor = model(**features1)[0]  # [batch_size, max_seq_len, hidden_dim]
    rep_candidate = model(**features2)[0]  # [batch_size, max_seq_len, hidden_dim]
    rep_anchor = torch.mean(rep_anchor, 1)  # [batch_size, hidden_dim]
    rep_candidate = torch.mean(rep_candidate, 1)  # [batch_size, hidden_dim]
    # obtain contrastive loss
    loss_fn = ContrastiveLoss()
    loss = loss_fn(rep_anchor=rep_anchor, rep_candidate=rep_candidate, label=label)
    print(loss)  # e.g. tensor(0.0869, grad_fn=<SumBackward0>)

4. Triplet Loss

  Triplet loss is quite similar to contrastive learning: it pulls the anchor towards the positive and pushes it away from the negative. The difference is that triplet loss constrains the gap between the anchor-positive and anchor-negative distances with a margin, so the objective is a margin loss, written out below and implemented in the following code:
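
  For an anchor a, positive p, negative n, distance metric d, and margin m, the triplet loss is

    \mathcal{L}_{tri} = \max\left( d(a, p) - d(a, n) + m, 0 \right)

i.e. the loss becomes zero once the negative is at least m farther from the anchor than the positive (Euclidean distance and m = 0.5 by default in the code).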

# -*- coding: utf-8 -*-
# @Time    : 2022/03/23 15:25
# @Author  : Jianing Wang
# @Email   : lygwjn@gmail.com
# @File    : TripletLoss.py
#!/usr/bin/env python
# coding=utf-8
from enum import Enum
import torch
from torch import nn, Tensor
import torch.nn.functional as F
from transformers.models.bert.modeling_bert import BertModel
from transformers import BertTokenizer, BertConfig


class TripletDistanceMetric(Enum):
    """The metric for the triplet loss"""
    COSINE = lambda x, y: 1 - F.cosine_similarity(x, y)
    EUCLIDEAN = lambda x, y: F.pairwise_distance(x, y, p=2)
    MANHATTAN = lambda x, y: F.pairwise_distance(x, y, p=1)


class TripletLoss(nn.Module):
    """
    This class implements triplet loss. Given a triplet of (anchor, positive, negative),
    the loss minimizes the distance between anchor and positive while it maximizes the distance
    between anchor and negative. It computes the following loss function:

        loss = max(||anchor - positive|| - ||anchor - negative|| + margin, 0).

    Margin is an important hyperparameter and needs to be tuned respectively.

    @:param distance_metric: The distance metric function
    @:param triplet_margin: (float) The margin distance

    Input example of the forward function:
        rep_anchor: [[0.2, -0.1, ..., 0.6], [0.2, -0.1, ..., 0.6], ..., [0.2, -0.1, ..., 0.6]]
        rep_positive / rep_negative: [[0.3, 0.1, ..., -0.3], [-0.8, 1.2, ..., 0.7], ..., [-0.9, 0.1, ..., 0.4]]

    Return example of the forward function:
        0.015 (averaged over the batch)
    """
    def __init__(self, distance_metric=TripletDistanceMetric.EUCLIDEAN, triplet_margin: float = 0.5):
        super(TripletLoss, self).__init__()
        self.distance_metric = distance_metric
        self.triplet_margin = triplet_margin

    def forward(self, rep_anchor, rep_positive, rep_negative):
        # rep_anchor: [batch_size, hidden_dim] denotes the representations of anchors
        # rep_positive: [batch_size, hidden_dim] denotes the representations of positives (sometimes obtained by dropout of the anchor)
        # rep_negative: [batch_size, hidden_dim] denotes the representations of negatives
        distance_pos = self.distance_metric(rep_anchor, rep_positive)
        distance_neg = self.distance_metric(rep_anchor, rep_negative)
        losses = F.relu(distance_pos - distance_neg + self.triplet_margin)
        return losses.mean()


if __name__ == "__main__":
    # configuration for the huggingface pre-trained language model
    config = BertConfig.from_pretrained('bert-base-cased')
    # tokenizer for the huggingface pre-trained language model
    tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
    # pytorch_model.bin of the huggingface pre-trained language model
    model = BertModel.from_pretrained('bert-base-cased')
    # obtain one anchor and batches of positive / negative examples
    anchor_example = ['I am an anchor, which is the source example sampled from corpora.']  # anchor sentence
    positive_example = ['I am an anchor, which is the source example.', 'I am the source example sampled from corpora.']  # positives, e.g. obtained by random dropout of or noise on the anchor
    negative_example = ['It is different with the anchor.', 'My name is Jianing Wang, please give me some stars, thank you!']  # negatives, randomly sampled from corpora
    # convert each example to features
    # {'input_ids': xxx, 'attention_mask': xxx, 'token_type_ids': xxx}
    anchor_feature = tokenizer(anchor_example, add_special_tokens=True, padding=True)
    positive_feature = tokenizer(positive_example, add_special_tokens=True, padding=True)
    negative_feature = tokenizer(negative_example, add_special_tokens=True, padding=True)
    # pad to a fixed length and convert to feature batches
    max_seq_len = 24
    anchor_feature = {key: torch.Tensor([value + [0] * (max_seq_len - len(value)) for value in values]).long() for key, values in anchor_feature.items()}
    positive_feature = {key: torch.Tensor([value + [0] * (max_seq_len - len(value)) for value in values]).long() for key, values in positive_feature.items()}
    negative_feature = {key: torch.Tensor([value + [0] * (max_seq_len - len(value)) for value in values]).long() for key, values in negative_feature.items()}
    # obtain sentence embeddings by average pooling over the token dimension (dim=1)
    rep_anchor = model(**anchor_feature)[0]  # [1, max_seq_len, hidden_dim]
    rep_positive = model(**positive_feature)[0]  # [batch_size, max_seq_len, hidden_dim]
    rep_negative = model(**negative_feature)[0]  # [batch_size, max_seq_len, hidden_dim]
    # the single anchor is broadcast against the batch of positives / negatives
    rep_anchor = torch.mean(rep_anchor, 1)  # [1, hidden_dim]
    rep_positive = torch.mean(rep_positive, 1)  # [batch_size, hidden_dim]
    rep_negative = torch.mean(rep_negative, 1)  # [batch_size, hidden_dim]
    # obtain triplet loss
    loss_fn = TripletLoss()
    loss = loss_fn(rep_anchor=rep_anchor, rep_positive=rep_positive, rep_negative=rep_negative)
    print(loss)  # e.g. tensor(0.5001, grad_fn=<MeanBackward0>)
