Paper link: https://arxiv.org/pdf/1906.01213v1.pdf

Code link:

Source: ACL 2019

Author: Jasminexjf

Time: 2019-06-25

ACL is a top conference in natural language processing (NLP), covering research directions such as language analysis, information extraction, information retrieval, question answering, sentiment analysis and opinion mining, summarization and text generation, text classification and mining, NLP for Web 2.0, machine translation, and spoken language processing. ACL is rated as a Class A conference in the China Computer Federation (CCF) list of recommended international academic conferences.

Paper information: Title: Progressive Self-Supervised Attention Learning for Aspect-Level Sentiment Analysis (Authors: Jialong Tang, Ziyao Lu, Jinsong Su, Yubin Ge, Linfeng Song, Le Sun and Jiebo Luo). The co-first authors are Jialong Tang, a Ph.D. student (class of 2018) at the Institute of Software, Chinese Academy of Sciences, and Ziyao Lu, a master's student (class of 2018) at the School of Software, Xiamen University; the corresponding author is Associate Professor Jinsong Su.

Overview 1:

This paper addresses the problem that neural networks tend to over-learn strong (frequent) patterns while under-learning weak (infrequent) patterns during training, and proposes a progressive self-supervised attention learning algorithm that effectively alleviates this issue. The method is based on the idea of erasure: it allows the model to progressively mine the information in the text that deserves attention and to balance the degree to which strong and weak patterns are learned. Experiments on three public aspect-level sentiment analysis datasets and two classic base models show that the proposed method achieves solid performance.

Overview 2:

In aspect-level sentiment classification, it has become common practice in recent years to use attention mechanisms to capture the information in the context that is most relevant to the given aspect. However, attention mechanisms tend to focus excessively on a small number of high-frequency words with strong sentiment polarity, while ignoring lower-frequency words.

This paper proposes a progressive self-supervised attention learning algorithm that automatically and progressively mines important supervision information from the text, which is then used to constrain the learning of the attention mechanism during model training. The team iteratively erases, on each training instance, the context word that has an active/misleading effect on the sentiment prediction. The erased words are recorded and replaced by a special token in the next training round. Finally, the team designs different supervision signals for different cases and uses them as a regularization term in the final training objective to constrain the learning of the attention mechanism.

Experimental results on SemEval-14 REST and LAPTOP, as well as the colloquial TWITTER dataset, show that the proposed progressive attention mechanism achieves significant improvements over several state-of-the-art base models.

The basic architecture is as follows (figure from the paper, not reproduced here):


1. Abstract:

In aspect-level sentiment classification (ASC), it is prevalent to equip dominant neural models with attention mechanisms, for the sake of acquiring the importance of each context word on the given aspect. However, such a mechanism tends to excessively focus on a few frequent words with sentiment polarities, while ignoring infrequent ones. In this paper, we propose a progressive self-supervised attention learning approach for neural ASC models, which automatically mines useful attention supervision information from a training corpus to refine attention mechanisms. Specifically, we iteratively conduct sentiment predictions on all training instances. Particularly, at each iteration, the context word with the maximum attention weight is extracted as the one with active/misleading influence on the correct/incorrect prediction of every instance, and then the word itself is masked for subsequent iterations. Finally, we augment the conventional training objective with a regularization term, which enables ASC models to continue equally focusing on the extracted active context words while decreasing weights of those misleading ones.

2. Introduction

Aspect-level sentiment classification (ASC), as an indispensable task in sentiment analysis, aims at inferring the sentiment polarity of an input sentence in a certain aspect.
However, the existing attention mechanism in ASC suffers from a major drawback. Specifically, it is prone to overly focus on a few frequent words with sentiment polarities, while little attention is paid to low-frequency ones. As a result, the performance of attentional neural ASC models is still far from satisfactory. We speculate that this is because there widely exist "apparent patterns" and "inapparent patterns" in the training data. Here, "apparent patterns" are interpreted as high-frequency words with strong sentiment polarities, and "inapparent patterns" refer to low-frequency words. As mentioned above, NNs are easily affected by these two modes: "apparent patterns" tend to be overly learned, while "inapparent patterns" often cannot be fully learned.


Example:

In the first three training sentences, given that the context word "small" frequently occurs with negative sentiment, the attention mechanism pays more attention to it and directly associates sentences containing it with negative sentiment. This inevitably causes another informative context word, "crowded", to be partially neglected even though it also carries negative sentiment. Consequently, a neural ASC model incorrectly predicts the sentiment of the last two test sentences: in the first test sentence, the model fails to capture the negative sentiment expressed by "crowded"; in the second test sentence, the attention mechanism directly focuses on "small" although it is not related to the given aspect.


Therefore, the authors propose a novel progressive self-supervised attention learning approach for neural ASC models.

The contributions are three-fold:

(1) Through in-depth analysis, we point out the existing drawback of the attention mechanism for ASC.

(2) We propose a novel incremental approach to automatically extract attention supervision information for neural ASC models. To the best of our knowledge, our work is the first attempt to explore automatic attention supervision information mining for ASC.

(3) We apply our approach to two dominant neural ASC models: Memory Network (MN) (Tang et al., 2016b; Wang et al., 2018) and Transformation Network (TNet) (Li et al., 2018). Experimental results on several benchmark datasets demonstrate the effectiveness of our approach.


3. Background

3.1 Memory Network (MN)

The final vector representation v(t) of the aspect t is defined as the averaged embedding of its words;

and the aspect-related sentence representation o is computed with attention over the context memory:

o = \sum_{i} \mathrm{Softmax}\left(v(t)^{\top} M m_{i}\right) h_{i}
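As a quick illustration of this attention step, here is a minimal numpy sketch (dimensions and inputs are made up for the example; this is not the authors' implementation):

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Hypothetical sizes: N context words, hidden size d.
N, d = 6, 8
rng = np.random.default_rng(0)

m = rng.normal(size=(N, d))   # memory slices m_i (context word representations)
h = rng.normal(size=(N, d))   # hidden states h_i of the context words
v_t = rng.normal(size=d)      # aspect vector v(t): averaged aspect word embeddings
M = rng.normal(size=(d, d))   # bilinear attention parameter M

scores = m @ M.T @ v_t        # v(t)^T M m_i for every context word i
alpha = softmax(scores)       # attention weights over the context words
o = alpha @ h                 # o = sum_i alpha_i * h_i, the aspect-related sentence vector
print(alpha, o.shape)
```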

3.2 Transformation Network (TNet/TNet-ATT)

(1) The bottom layer is a Bi-LSTM that transforms the input x into the contextualized word representations h^{(0)}(x) = (h_1^{(0)}, h_2^{(0)}, \cdots, h_N^{(0)}) (i.e., the hidden states of the Bi-LSTM).

(2) The middle part, as the core of the whole model, contains L layers of Context-Preserving Transformation (CPT), where word representations are updated as h^{(l+1)}(x) = CPT(h^{(l)}(x)). The key operation of CPT layers is the Target-Specific Transformation: it contains another Bi-LSTM for generating v(t) via an attention mechanism, and then incorporates v(t) into the word representations. Besides, CPT layers are also equipped with a Context-Preserving Mechanism (CPM) to preserve the context information and learn more abstract word-level features. In the end, we obtain the word-level semantic representations h(x) = (h_1, h_2, \cdots, h_N), with h_i = h_i^{(L)}.

(3) The topmost part is a CNN layer used to produce the aspect-related sentence representation o for the sentiment classification.

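To make the three-part structure concrete, here is a heavily simplified PyTorch-style skeleton (a sketch only: the layer sizes are arbitrary, the CPT/CPM blocks are reduced to a single attention-plus-residual step, and this is not the TNet implementation of Li et al. (2018)):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyTNet(nn.Module):
    """Toy three-part structure: Bi-LSTM -> L simplified CPT layers -> CNN + pooling."""
    def __init__(self, emb_dim=50, hidden=50, num_cpt_layers=2, num_classes=3):
        super().__init__()
        self.ctx_lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.asp_lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        d = 2 * hidden
        self.L = num_cpt_layers
        self.fuse = nn.Linear(2 * d, d)          # fuses a word rep with its aspect vector (simplified TST)
        self.conv = nn.Conv1d(d, 50, kernel_size=3, padding=1)
        self.out = nn.Linear(50, num_classes)

    def forward(self, x_emb, t_emb):
        # (1) bottom Bi-LSTM: contextualized word representations h^(0)(x)
        h, _ = self.ctx_lstm(x_emb)              # (B, N, d)
        ht, _ = self.asp_lstm(t_emb)             # (B, M, d): aspect word states
        # (2) L simplified CPT layers
        for _ in range(self.L):
            # attention of each context word over the aspect states -> per-word aspect vector v(t)
            attn = torch.softmax(h @ ht.transpose(1, 2), dim=-1)   # (B, N, M)
            v_t = attn @ ht                                        # (B, N, d)
            new_h = torch.tanh(self.fuse(torch.cat([h, v_t], dim=-1)))
            h = new_h + h                                          # residual step standing in for the CPM
        # (3) top CNN + max pooling -> aspect-related sentence representation o
        c = torch.relu(self.conv(h.transpose(1, 2)))               # (B, 50, N)
        o = c.max(dim=2).values                                    # (B, 50)
        return F.log_softmax(self.out(o), dim=-1)

# usage with random tensors: batch of 2, 7 context words, 3 aspect words
model = TinyTNet()
log_probs = model(torch.randn(2, 7, 50), torch.randn(2, 3, 50))
print(log_probs.shape)   # torch.Size([2, 3])
```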

3.3 Training Objective (NLL)
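Both MN and TNet are trained with the standard negative log-likelihood (cross-entropy) objective over the training corpus D; written out in the usual notation (my notation, following the paper's setup):

J(D;\theta) = -\sum_{(x,t,y)\in D} \log p(y \mid x, t; \theta)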

4. Model

We first use the initial training corpus D to conduct model training and obtain the initial model parameters θ(0) (Line 1). Then, we continue training the model for K iterations, during which the influential context words of all training instances are iteratively extracted (Lines 6-25). During this process, for each training instance (x,t,y), we introduce two word sets initialized as ∅ (Lines 2-5) to record its extracted context words: (1) s_a(x) consists of context words with active effects on the sentiment prediction of x; each word of s_a(x) will be encouraged to remain attended to in the refined model training. (2) s_m(x) contains context words with misleading effects, whose attention weights are expected to be decreased. Specifically, at the k-th training iteration, we adopt the following steps to deal with (x,t,y):


Step 1: lines 9 to 11

Step 2: line 12

Step 3: lines 13 to 20

Step 4: lines 21 to 24 (for details, please see the paper)

where E(\alpha(x')) = -\sum_{i=1}^{N} \alpha(x_{i}') \log \alpha(x_{i}')

Through K iterations of the above steps, we manage to extract influential context words of all training instances. Table 2 illustrates the context word mining process of the first sentence shown in Table 1. In this example, we iteratively extract three context words in turn: “small”, “crowded” and “quick”. The former two words are included in s_a(x), while the last one is contained in s_m(x). Finally, the extracted context words of each training instance will be included into D, forming a final training corpus Ds with attention supervision information (Lines 26-29), which will be used to carry out the last model training (Line 30).
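A schematic Python sketch of this erase-and-extract loop (my paraphrase of the description above, not the authors' code; `predict_with_attention`, the `MASK` token, and the entropy threshold `eps` are hypothetical placeholders):

```python
import math

MASK = "<mask>"   # special token that replaces erased words

def entropy(alpha):
    """E(alpha) = -sum_i alpha_i * log(alpha_i), entropy of an attention distribution."""
    return -sum(a * math.log(a) for a in alpha if a > 0)

def extract_influential_words(data, predict_with_attention, K=5, eps=1.0):
    """data: list of (x, t, y) with x a list of context words, t the aspect, y the gold label.
    predict_with_attention(x, t) -> (predicted_label, attention_weights) is assumed to be
    a trained ASC model (hypothetical interface). Returns per-instance word sets s_a / s_m."""
    s_a = {i: set() for i in range(len(data))}   # words with active effects
    s_m = {i: set() for i in range(len(data))}   # words with misleading effects
    masked = [list(x) for (x, t, y) in data]     # working copies that get progressively erased
    for _ in range(K):
        for i, (x, t, y) in enumerate(data):
            y_hat, alpha = predict_with_attention(masked[i], t)
            if entropy(alpha) > eps:             # attention already flat: no word dominates
                continue                         # (exact use of the threshold is my guess)
            j = max(range(len(alpha)), key=lambda k: alpha[k])   # most-attended context word
            word = masked[i][j]
            if word == MASK:
                continue
            if y_hat == y:
                s_a[i].add(word)                 # helped a correct prediction
            else:
                s_m[i].add(word)                 # misled an incorrect prediction
            masked[i][j] = MASK                  # erase it for the next iteration
    return s_a, s_m
```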

5. Experiments

5.3 Parameter Settings

The dimension of the GloVe word embeddings is 300; OOV words are initialized from U[-0.25, 0.25]; other parameters are initialized from U[-0.01, 0.01]; dropout is applied; the optimizer is Adam with a learning rate of 0.001.

The number of iterations is K = 5; the regularization coefficient \gamma per dataset:

LAPTOP: 0.1
REST: 0.5
TWITTER: 0.1

Evaluation metrics: accuracy and macro-F1.
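For reference, both metrics can be computed with scikit-learn (a generic snippet with made-up labels, unrelated to the paper's actual results):

```python
from sklearn.metrics import accuracy_score, f1_score

# toy gold / predicted labels over the three classes (positive / negative / neutral)
y_true = ["pos", "neg", "neu", "neg", "pos", "neu"]
y_pred = ["pos", "neg", "neg", "neg", "pos", "pos"]

print("accuracy:", accuracy_score(y_true, y_pred))
print("macro-F1:", f1_score(y_true, y_pred, average="macro"))
```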

dataset split: 80% for training and 20% for testing.

5.4 Results

The paper also explores how performance changes with the threshold \varepsilon_\alpha.

Case study:

6. Conclusion and Future Work
