1 Common NLP Text Classification Models

1.1 TextCNN

Paper: "Convolutional Neural Networks for Sentence Classification"

Paper link: https://arxiv.org/pdf/1408.5882.pdf

[Figure: TextCNN model architecture]

It is worth mentioning that in 2016, the paper "A Sensitivity Analysis of (and Practitioners' Guide to) Convolutional Neural Networks for Sentence Classification" ran extensive experiments on TextCNN's hyperparameter choices and distilled them into practical recommendations (paper link: https://arxiv.org/pdf/1510.03820.pdf). [Figure: the paper's well-known architecture diagram]
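To make the hyperparameters that have to be chosen for TextCNN concrete (filter region sizes, number of feature maps, dropout, embedding dimension), here is a minimal configuration sketch. The values are illustrative assumptions, not the paper's prescriptions; the field names match the TextCNN implementation in Section 2.

from argparse import Namespace

# Illustrative TextCNN hyperparameters (assumed values, not official recommendations).
textcnn_args = Namespace(
    class_num=2,              # number of target classes
    filter_sizes=[3, 4, 5],   # filter region sizes (window heights)
    filter_num=100,           # feature maps per filter size
    embedding_dim=128,        # word embedding dimension
    vocabulary_size=50000,    # vocabulary size
    dropout=0.5,              # dropout on the pooled features
    static=False,             # True: use frozen pretrained vectors
    non_static=True,          # True: fine-tune pretrained vectors
    multichannel=False,       # True: add a second (static) embedding channel
    vectors=None,             # pretrained vectors, required if static/multichannel
)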

1.2 TextRNN

TextRNN refers to using a recurrent neural network (RNN) to solve text classification.

Paper: "Recurrent Neural Network for Text Classification with Multi-Task Learning"

Paper link: https://www.ijcai.org/Proceedings/16/Papers/408.pdf

[Figure: TextRNN model architecture]

1.3 TextRCNN

Paper: "Recurrent Convolutional Neural Networks for Text Classification"

Paper link: TextRCNN paper (Lai et al., AAAI 2015)

[Figure: TextRCNN model architecture]

1.4 FastText

Paper: "Bag of Tricks for Efficient Text Classification"

Paper link: https://arxiv.org/pdf/1607.01759v2.pdf

[Figure: FastText model architecture]

1.5 HAN

Paper: "Hierarchical Attention Networks for Document Classification"

Paper link: https://aclanthology.org/N16-1174.pdf

[Figure: HAN model architecture]

1.6 CharCNN

Paper: "Character-level Convolutional Networks for Text Classification"

Paper link: https://arxiv.org/abs/1509.01626

[Figure: CharCNN model architecture]

1.7 Transformer

Paper: "Attention Is All You Need"

Paper link: https://arxiv.org/pdf/1706.03762.pdf

[Figure: Transformer model architecture]

2 Code Implementation

import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import math
import copy


# TextCNN
class TextCNN(nn.Module):
    def __init__(self, args):
        super(TextCNN, self).__init__()
        self.args = args
        class_num = args.class_num
        chanel_num = 1
        filter_num = args.filter_num
        filter_sizes = args.filter_sizes
        vocabulary_size = args.vocabulary_size
        embedding_dimension = args.embedding_dim
        self.embedding = nn.Embedding(vocabulary_size, embedding_dimension)
        if args.static:
            self.embedding = self.embedding.from_pretrained(args.vectors, freeze=not args.non_static)
        if args.multichannel:
            self.embedding2 = nn.Embedding(vocabulary_size, embedding_dimension).from_pretrained(args.vectors)
            chanel_num += 1
        else:
            self.embedding2 = None
        # one Conv2d per filter region size, each spanning the full embedding dimension
        self.convs = nn.ModuleList(
            [nn.Conv2d(chanel_num, filter_num, (size, embedding_dimension)) for size in filter_sizes])
        self.dropout = nn.Dropout(args.dropout)
        self.fc = nn.Linear(len(filter_sizes) * filter_num, class_num)

    def forward(self, x):
        if self.embedding2:
            x = torch.stack([self.embedding(x), self.embedding2(x)], dim=1)
        else:
            x = self.embedding(x)
            x = x.unsqueeze(1)
        x = [F.relu(conv(x)).squeeze(3) for conv in self.convs]
        # 1-max pooling over time for each feature map
        x = [F.max_pool1d(item, int(item.size(2))).squeeze(2) for item in x]
        x = torch.cat(x, 1)
        x = self.dropout(x)
        logits = self.fc(x)
        return logits
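A minimal smoke test for the TextCNN class above, reusing a configuration like the one sketched in Section 1.1 (all values are assumptions for illustration):

from argparse import Namespace

args = Namespace(class_num=2, filter_sizes=[3, 4, 5], filter_num=100,
                 embedding_dim=128, vocabulary_size=50000, dropout=0.5,
                 static=False, non_static=True, multichannel=False, vectors=None)

model = TextCNN(args)
tokens = torch.randint(0, args.vocabulary_size, (8, 40))  # batch of 8 sequences, 40 token ids each
logits = model(tokens)
print(logits.shape)  # torch.Size([8, 2])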
# TextRNN
class LSTM(torch.nn.Module):
    def __init__(self, args):
        super(LSTM, self).__init__()
        self.embed_size = args.embedding_dim
        self.label_num = args.class_num
        self.embed_dropout = 0.1
        self.fc_dropout = 0.1
        self.hidden_num = 1
        self.hidden_size = 50
        self.hidden_dropout = 0
        self.bidirectional = True
        vocabulary_size = args.vocabulary_size
        embedding_dimension = args.embedding_dim
        self.embeddings = nn.Embedding(vocabulary_size, embedding_dimension)
        # self.embeddings.weight.data.copy_(torch.from_numpy(vocabulary_size))
        self.embeddings.weight.requires_grad = False
        self.lstm = nn.LSTM(
            self.embed_size,
            self.hidden_size,
            dropout=self.hidden_dropout,
            num_layers=self.hidden_num,
            batch_first=True,
            bidirectional=True)
        self.embed_dropout = nn.Dropout(self.embed_dropout)
        self.fc_dropout = nn.Dropout(self.fc_dropout)
        self.linear1 = nn.Linear(self.hidden_size * 2, self.label_num)
        self.softmax = nn.Softmax(dim=1)

    def forward(self, input):
        x = self.embeddings(input)
        x = self.embed_dropout(x)
        batch_size = len(input)
        # use the final hidden states of the bidirectional LSTM as the sentence representation
        _, (lstm_out, _) = self.lstm(x)
        lstm_out = lstm_out.permute(1, 0, 2)
        lstm_out = lstm_out.contiguous().view(batch_size, -1)
        out = self.linear1(lstm_out)
        out = self.fc_dropout(out)
        out = self.softmax(out)
        return out
# TextRCNN
class BiLSTM(nn.Module):
    def __init__(self, args):
        super(BiLSTM, self).__init__()
        self.embed_size = args.embedding_dim
        self.label_num = args.class_num
        self.embed_dropout = 0.1
        self.fc_dropout = 0.1
        self.hidden_num = 2
        self.hidden_size = 50
        self.hidden_dropout = 0
        self.bidirectional = True
        vocabulary_size = args.vocabulary_size
        embedding_dimension = args.embedding_dim
        self.embeddings = nn.Embedding(vocabulary_size, embedding_dimension)
        # self.embeddings.weight.data.copy_(torch.from_numpy(word_embeddings))
        self.embeddings.weight.requires_grad = False
        self.lstm = nn.LSTM(
            self.embed_size,
            self.hidden_size,
            dropout=self.hidden_dropout,
            num_layers=self.hidden_num,
            batch_first=True,
            bidirectional=self.bidirectional)
        self.embed_dropout = nn.Dropout(self.embed_dropout)
        self.fc_dropout = nn.Dropout(self.fc_dropout)
        self.linear1 = nn.Linear(self.hidden_size * 2, self.hidden_size // 2)
        self.linear2 = nn.Linear(self.hidden_size // 2, self.label_num)

    def forward(self, input):
        out = self.embeddings(input)
        out = self.embed_dropout(out)
        out, _ = self.lstm(out)
        out = torch.transpose(out, 1, 2)
        out = torch.tanh(out)
        out = F.max_pool1d(out, out.size(2))
        out = out.squeeze(2)
        out = self.fc_dropout(out)
        out = self.linear1(F.relu(out))
        output = self.linear2(F.relu(out))
        return output
# FastText
class FastText(nn.Module):
    def __init__(self, args):
        super().__init__()
        self.output_dim = args.class_num
        vocabulary_size = args.vocabulary_size
        embedding_dimension = args.embedding_dim
        self.embeddings = nn.Embedding(vocabulary_size, embedding_dimension)
        self.fc = nn.Linear(embedding_dimension, self.output_dim)

    def forward(self, text):
        # text = [batch size, sent len]
        text = text.permute(1, 0)
        embedded = self.embeddings(text)
        # embedded = [sent len, batch size, emb dim]
        embedded = embedded.permute(1, 0, 2)
        # embedded = [batch size, sent len, emb dim]
        pooled = F.avg_pool2d(embedded, (embedded.shape[1], 1)).squeeze(1)
        # pooled = [batch size, embedding_dim]
        return self.fc(pooled)
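Note that the class above only covers the averaging-plus-linear part of fastText; the paper's bag of n-gram features with the hashing trick is not included. As a rough illustration of how bigram hashing could be added on the preprocessing side (the bucket count and helper function are assumptions, not part of the implementation above):

# Hypothetical helper: map each bigram to an extra id via the hashing trick,
# so bigram "tokens" can be appended to the unigram ids fed to FastText.
N_BUCKETS = 200000  # assumed number of hash buckets for bigrams

def add_bigram_hashes(token_ids, vocab_size, n_buckets=N_BUCKETS):
    bigram_ids = [
        vocab_size + (hash((a, b)) % n_buckets)
        for a, b in zip(token_ids[:-1], token_ids[1:])
    ]
    return token_ids + bigram_ids

# The embedding table would then need vocab_size + n_buckets rows.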
# HAN
class SelfAttention(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(SelfAttention, self).__init__()
        self.W = nn.Linear(input_size, hidden_size, True)
        self.u = nn.Linear(hidden_size, 1)

    def forward(self, x):
        u = torch.tanh(self.W(x))
        a = F.softmax(self.u(u), dim=1)
        x = a.mul(x).sum(1)
        return x


class HAN(nn.Module):
    def __init__(self, args):
        super(HAN, self).__init__()
        hidden_size_gru = 50
        hidden_size_att = 100
        num_classes = args.class_num
        vocabulary_size = args.vocabulary_size
        embedding_dimension = args.embedding_dim
        self.num_words = 64  # word padding size
        self.embed = nn.Embedding(vocabulary_size, embedding_dimension)
        self.gru1 = nn.GRU(embedding_dimension, hidden_size_gru, bidirectional=True, batch_first=True)
        self.att1 = SelfAttention(hidden_size_gru * 2, hidden_size_att)
        self.gru2 = nn.GRU(hidden_size_att, hidden_size_gru, bidirectional=True, batch_first=True)
        self.att2 = SelfAttention(hidden_size_gru * 2, hidden_size_att)
        # the final fc layer has few parameters, so no dropout is applied here
        self.fc = nn.Linear(hidden_size_att, num_classes, True)

    def forward(self, x):
        # x: [batch size, seq len]; the sequence is reshaped into self.num_words chunks for the word-level GRU
        x = x.view(x.size(0) * self.num_words, -1).contiguous()
        x = self.embed(x)
        x, _ = self.gru1(x)
        x = self.att1(x)
        x = x.view(x.size(0) // self.num_words, self.num_words, -1).contiguous()
        x, _ = self.gru2(x)
        x = self.att2(x)
        x = self.fc(x)
        x = F.log_softmax(x, dim=1)  # log-softmax over classes
        return x
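The HAN forward pass above expects each document as one id sequence whose length is num_words times the tokens per chunk; it reshapes the sequence into num_words chunks for the word-level GRU, then runs the second GRU over the chunk representations. A minimal shape check (sizes are illustrative assumptions):

from argparse import Namespace

args = Namespace(class_num=2, vocabulary_size=50000, embedding_dim=128)
model = HAN(args)
# 64 chunks (model.num_words) of 8 tokens each -> a sequence of 512 ids per document
docs = torch.randint(0, args.vocabulary_size, (4, 64 * 8))
out = model(docs)
print(out.shape)  # torch.Size([4, 2])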
# CharCNN
class CharCNN(nn.Module):
    def __init__(self, args):
        super(CharCNN, self).__init__()
        self.num_chars = 64
        self.features = [128, 128, 128, 128, 128, 128]
        self.kernel_sizes = [7, 7, 3, 3, 3, 3]
        self.dropout = args.dropout
        self.num_labels = args.class_num
        vocabulary_size = args.vocabulary_size
        embedding_dimension = args.embedding_dim
        # Embedding layer
        self.embeddings = nn.Embedding(vocabulary_size, embedding_dimension)
        self.embeddings.weight.requires_grad = False
        self.in_features = [self.num_chars] + self.features[:-1]
        self.out_features = self.features
        self.conv1d_1 = nn.Sequential(
            nn.Conv1d(self.in_features[0], self.out_features[0], self.kernel_sizes[0], stride=1),
            nn.BatchNorm1d(self.out_features[0]),
            nn.ReLU(),
            nn.MaxPool1d(kernel_size=3, stride=3))
        self.conv1d_2 = nn.Sequential(
            nn.Conv1d(self.in_features[1], self.out_features[1], self.kernel_sizes[1], stride=1),
            nn.BatchNorm1d(self.out_features[1]),
            nn.ReLU(),
            nn.MaxPool1d(kernel_size=3, stride=3))
        self.conv1d_3 = nn.Sequential(
            nn.Conv1d(self.in_features[2], self.out_features[2], self.kernel_sizes[2], stride=1),
            nn.BatchNorm1d(self.out_features[2]),
            nn.ReLU())
        self.conv1d_4 = nn.Sequential(
            nn.Conv1d(self.in_features[3], self.out_features[3], self.kernel_sizes[3], stride=1),
            nn.BatchNorm1d(self.out_features[3]),
            nn.ReLU())
        self.conv1d_5 = nn.Sequential(
            nn.Conv1d(self.in_features[4], self.out_features[4], self.kernel_sizes[4], stride=1),
            nn.BatchNorm1d(self.out_features[4]),
            nn.ReLU())
        self.conv1d_6 = nn.Sequential(
            nn.Conv1d(self.in_features[5], self.out_features[5], self.kernel_sizes[5], stride=1),
            nn.BatchNorm1d(self.out_features[5]),
            nn.ReLU(),
            nn.MaxPool1d(kernel_size=3, stride=3))
        self.fc1 = nn.Sequential(
            nn.Linear(128, 128),
            nn.ReLU(),
            nn.Dropout(self.dropout))
        self.fc2 = nn.Sequential(
            nn.Linear(128, 128),
            nn.ReLU(),
            nn.Dropout(self.dropout))
        self.fc3 = nn.Linear(128, self.num_labels)

    def forward(self, x):
        # x: [batch_size, num_chars] character ids; after embedding: [batch_size, num_chars, embedding_dim]
        # x = torch.Tensor(x).long()
        x = self.embeddings(x)
        # x = x.permute(0, 2, 1)
        x = self.conv1d_1(x)
        x = self.conv1d_2(x)
        x = self.conv1d_3(x)
        x = self.conv1d_4(x)
        x = self.conv1d_5(x)
        x = self.conv1d_6(x)
        x = x.view(x.size(0), -1)  # flatten the convolutional features
        out = self.fc1(x)
        out = self.fc2(out)
        out = self.fc3(out)
        return out
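One thing worth spelling out for the CharCNN variant above: with kernel sizes [7, 7, 3, 3, 3, 3] and three max-pooling layers, an embedding_dim of 128 shrinks to a single position after conv1d_6, so the flattened feature vector has exactly 128 values and matches fc1. A quick check under that assumption (argument values are illustrative):

from argparse import Namespace

args = Namespace(class_num=2, vocabulary_size=5000, embedding_dim=128, dropout=0.5)
model = CharCNN(args)
chars = torch.randint(0, args.vocabulary_size, (4, 64))  # 64 = model.num_chars character ids per sample
out = model(chars)
print(out.shape)  # torch.Size([4, 2])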
# Transformer
class Transformer_Config(object):
    """Configuration parameters"""
    def __init__(self, args):
        self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')  # device
        self.dropout = 0.5                       # dropout rate
        self.require_improvement = 2000          # stop training early if no improvement after this many batches
        self.num_classes = args.class_num        # number of classes
        self.n_vocab = args.vocabulary_size      # vocabulary size
        self.num_epochs = args.epochs            # number of epochs
        self.batch_size = args.batch_size        # mini-batch size
        self.pad_size = 64                       # sequence length (shorter sentences padded, longer ones truncated)
        self.learning_rate = 5e-4                # learning rate
        self.embedding_pretrained = None         # pretrained embeddings (None = random initialization)
        self.embed = 128                         # token embedding dimension
        self.dim_model = args.embedding_dim
        self.hidden = 1024
        self.last_hidden = 512
        self.num_head = 2
        self.num_encoder = 2
'''Attention Is All You Need'''
class Transformer(nn.Module):
    def __init__(self, config):
        super(Transformer, self).__init__()
        if config.embedding_pretrained is not None:
            self.embedding = nn.Embedding.from_pretrained(config.embedding_pretrained, freeze=False)
        else:
            self.embedding = nn.Embedding(config.n_vocab, config.embed)
        self.postion_embedding = Positional_Encoding(config.embed, config.pad_size, config.dropout, config.device)
        self.encoder = Encoder(config.dim_model, config.num_head, config.hidden, config.dropout)
        self.encoders = nn.ModuleList([
            copy.deepcopy(self.encoder)
            # Encoder(config.dim_model, config.num_head, config.hidden, config.dropout)
            for _ in range(config.num_encoder)])
        self.fc1 = nn.Linear(config.pad_size * config.dim_model, config.num_classes)
        # self.fc2 = nn.Linear(config.last_hidden, config.num_classes)
        # self.fc1 = nn.Linear(config.dim_model, config.num_classes)

    def forward(self, x):
        out = self.embedding(x)
        out = self.postion_embedding(out)
        for encoder in self.encoders:
            out = encoder(out)
        out = out.view(out.size(0), -1)
        # out = torch.mean(out, 1)
        out = self.fc1(out)
        return out
class Encoder(nn.Module):
    def __init__(self, dim_model, num_head, hidden, dropout):
        super(Encoder, self).__init__()
        self.attention = Multi_Head_Attention(dim_model, num_head, dropout)
        self.feed_forward = Position_wise_Feed_Forward(dim_model, hidden, dropout)

    def forward(self, x):
        out = self.attention(x)
        out = self.feed_forward(out)
        return out
class Positional_Encoding(nn.Module):
    def __init__(self, embed, pad_size, dropout, device):
        super(Positional_Encoding, self).__init__()
        self.device = device
        self.pe = torch.tensor(
            [[pos / (10000.0 ** (i // 2 * 2.0 / embed)) for i in range(embed)] for pos in range(pad_size)])
        self.pe[:, 0::2] = np.sin(self.pe[:, 0::2])
        self.pe[:, 1::2] = np.cos(self.pe[:, 1::2])
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        out = x + nn.Parameter(self.pe, requires_grad=False).to(self.device)
        out = self.dropout(out)
        return out
class Scaled_Dot_Product_Attention(nn.Module):
    '''Scaled Dot-Product Attention'''
    def __init__(self):
        super(Scaled_Dot_Product_Attention, self).__init__()

    def forward(self, Q, K, V, scale=None):
        '''
        Args:
            Q: [batch_size, len_Q, dim_Q]
            K: [batch_size, len_K, dim_K]
            V: [batch_size, len_V, dim_V]
            scale: scaling factor; the paper uses 1/sqrt(dim_K)
        Return:
            the context tensor produced by self-attention
        '''
        attention = torch.matmul(Q, K.permute(0, 2, 1))
        if scale:
            attention = attention * scale
        # if mask:  # TODO change this
        #     attention = attention.masked_fill_(mask == 0, -1e9)
        attention = F.softmax(attention, dim=-1)
        context = torch.matmul(attention, V)
        return context
class Multi_Head_Attention(nn.Module):
    def __init__(self, dim_model, num_head, dropout=0.0):
        super(Multi_Head_Attention, self).__init__()
        self.num_head = num_head
        assert dim_model % num_head == 0
        self.dim_head = dim_model // self.num_head
        self.fc_Q = nn.Linear(dim_model, num_head * self.dim_head)
        self.fc_K = nn.Linear(dim_model, num_head * self.dim_head)
        self.fc_V = nn.Linear(dim_model, num_head * self.dim_head)
        self.attention = Scaled_Dot_Product_Attention()
        self.fc = nn.Linear(num_head * self.dim_head, dim_model)
        self.dropout = nn.Dropout(dropout)
        self.layer_norm = nn.LayerNorm(dim_model)

    def forward(self, x):
        batch_size = x.size(0)
        Q = self.fc_Q(x)
        K = self.fc_K(x)
        V = self.fc_V(x)
        Q = Q.view(batch_size * self.num_head, -1, self.dim_head)
        K = K.view(batch_size * self.num_head, -1, self.dim_head)
        V = V.view(batch_size * self.num_head, -1, self.dim_head)
        # if mask:  # TODO
        #     mask = mask.repeat(self.num_head, 1, 1)  # TODO change this
        scale = K.size(-1) ** -0.5  # scaling factor
        context = self.attention(Q, K, V, scale)
        context = context.view(batch_size, -1, self.dim_head * self.num_head)
        out = self.fc(context)
        out = self.dropout(out)
        out = out + x  # residual connection
        out = self.layer_norm(out)
        return out
class Position_wise_Feed_Forward(nn.Module):
    def __init__(self, dim_model, hidden, dropout=0.0):
        super(Position_wise_Feed_Forward, self).__init__()
        self.fc1 = nn.Linear(dim_model, hidden)
        self.fc2 = nn.Linear(hidden, dim_model)
        self.dropout = nn.Dropout(dropout)
        self.layer_norm = nn.LayerNorm(dim_model)

    def forward(self, x):
        out = self.fc1(x)
        out = F.relu(out)
        out = self.fc2(out)
        out = self.dropout(out)
        out = out + x  # residual connection
        out = self.layer_norm(out)
        return out
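To wire the Transformer pieces together, the configuration's embed and dim_model should agree (both 128 here, since the encoder blocks and residual connections operate directly on the embedding output), and inputs must be padded or truncated to pad_size. A minimal sketch with assumed argument values:

from argparse import Namespace

args = Namespace(class_num=2, vocabulary_size=50000, embedding_dim=128,
                 epochs=5, batch_size=32)
config = Transformer_Config(args)          # config.embed == config.dim_model == 128
model = Transformer(config).to(config.device)

x = torch.randint(0, config.n_vocab, (4, config.pad_size)).to(config.device)
logits = model(x)
print(logits.shape)  # torch.Size([4, 2])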

3 Results and Discussion

The models were trained on a binary sentiment classification task, using a dataset with 56,700 training examples and 7,000 evaluation examples. The test results are shown in the table below.

Because the dataset used here is fairly small, the smaller models actually achieve better results; the Transformer in particular is overkill for this task and has little room to show its strengths. Questions and discussion are welcome.
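For reference, here is a minimal sketch of the kind of training loop used for these experiments. It assumes a model that returns raw logits (e.g. TextCNN, BiLSTM, FastText, CharCNN or Transformer, not the LSTM/HAN variants above, which already apply a softmax/log-softmax inside forward) and a hypothetical train_loader that yields (token_ids, labels) batches.

import torch
import torch.nn as nn

def train(model, train_loader, device, epochs=5, lr=1e-3):
    model = model.to(device)
    criterion = nn.CrossEntropyLoss()          # expects raw logits
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for epoch in range(epochs):
        model.train()
        total_loss = 0.0
        for token_ids, labels in train_loader:  # train_loader is assumed, not defined above
            token_ids, labels = token_ids.to(device), labels.to(device)
            optimizer.zero_grad()
            logits = model(token_ids)
            loss = criterion(logits, labels)
            loss.backward()
            optimizer.step()
            total_loss += loss.item()
        print(f"epoch {epoch + 1}: mean loss = {total_loss / len(train_loader):.4f}")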
