Incorporating Linguistic Knowledge into Neural Machine Translation

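Blog post: http://blog.csdn.net/wangxinginnlp/article/details/56488921

I am preparing a talk for my group on incorporating linguistic knowledge into neural machine translation (syntax + NMT). As a first step, here is a list of the relevant papers:

**Everything below is my personal opinion.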

1. Linguistic Input Features Improve Neural Machine Translation (WMT2016)

http://www.statmt.org/wmt16/pdf/W16-2209.pdf

*Proposes enriching the encoder's input with various linguistic features.

*Designs a scheme (a very simple one) for feeding linguistic features into the encoder.
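
One simple way to feed features into an encoder is to give each feature its own embedding and concatenate the results per token. The following is a minimal NumPy sketch of that idea only; the feature names, vocabulary sizes, and embedding dimensions are hypothetical values chosen for illustration, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature inventories: name -> (vocabulary size, embedding dimension).
FEATURES = {"word": (1000, 8), "pos": (20, 2), "dep": (40, 2)}

# One randomly initialised embedding matrix per feature.
tables = {
    name: rng.normal(size=(vocab, dim)) for name, (vocab, dim) in FEATURES.items()
}


def embed_token(feature_ids):
    """Concatenate the embeddings of all features of one source token."""
    return np.concatenate([tables[name][feature_ids[name]] for name in FEATURES])


def embed_sentence(tokens):
    """Stack per-token vectors into a (sentence length, total dim) encoder input."""
    return np.stack([embed_token(token) for token in tokens])


# Two tokens, each described by a word id plus POS and dependency-label ids.
sentence = [
    {"word": 17, "pos": 3, "dep": 5},
    {"word": 256, "pos": 7, "dep": 1},
]
encoder_input = embed_sentence(sentence)
```

The encoder itself is unchanged in this sketch; only the width of its input grows with the number of features.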

2. Tree-to-Sequence Attentional Neural Machine Translation (ACL2016)

http://www.aclweb.org/anthology/P/P16/P16-1078.pdf

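*Designs a tree-based encoder that explicitly represents the internal nodes of the tree. However, the tree-based encoder appears to use only the binary composition structure, not the node category labels (such as NP or VP).

*The attention mechanism attends separately over the outputs of the tree-based encoder and the sequence encoder; the two resulting context vectors are then simply summed to form the final context vector used by the decoder.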
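
The combination described in the note above can be sketched as follows. This follows the description in this post rather than the paper's exact equations; the dot-product scoring function and all dimensions are assumptions made for illustration.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def attend(query, states):
    """Dot-product attention (an assumed scoring function): weighted sum of states."""
    weights = softmax(states @ query)
    return weights @ states


def combined_context(query, seq_states, tree_states):
    """Attend over each encoder's outputs separately, then sum the two contexts."""
    return attend(query, seq_states) + attend(query, tree_states)


rng = np.random.default_rng(0)
seq_states = rng.normal(size=(5, 4))   # one state per source word
tree_states = rng.normal(size=(4, 4))  # one state per internal node of a binary tree over 5 leaves
query = rng.normal(size=4)             # current decoder hidden state

context = combined_context(query, seq_states, tree_states)
```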

3. Multi-task Sequence to Sequence Learning (ICLR 2016)

https://arxiv.org/pdf/1511.06114.pdf

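*In the one-to-many setting there is one encoder and several decoders, used for translation, parsing, and auto-encoding respectively.

*Source-side syntax is exploited neatly: training the parsing task on source-side syntax sends its supervision signal back into the shared encoder, which yields a better source sentence representation and thus better translation, and vice versa.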
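
To make the shared-encoder effect concrete, here is a toy sketch with linear layers and hand-written gradients standing in for the real recurrent encoder and decoders. Everything in it (the linear model, the squared-error loss, the sizes, the names) is an assumption for illustration; it only shows that a parsing update changes the shared encoder while leaving the translation decoder untouched.

```python
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_HID, D_OUT = 6, 4, 3

# One shared encoder matrix and one decoder ("head") matrix per task.
params = {
    "encoder": rng.normal(scale=0.1, size=(D_HID, D_IN)),
    "translation": rng.normal(scale=0.1, size=(D_OUT, D_HID)),
    "parsing": rng.normal(scale=0.1, size=(D_OUT, D_HID)),
}


def sgd_step(task, x, y, lr=0.1):
    """One SGD step on 0.5 * ||head @ encoder @ x - y||^2 for the given task."""
    hidden = params["encoder"] @ x
    residual = params[task] @ hidden - y
    grad_head = np.outer(residual, hidden)
    grad_encoder = np.outer(params[task].T @ residual, x)
    params[task] -= lr * grad_head          # task-specific decoder update
    params["encoder"] -= lr * grad_encoder  # shared encoder update


x = rng.normal(size=D_IN)
y_parse = rng.normal(size=D_OUT)

encoder_before = params["encoder"].copy()
translation_before = params["translation"].copy()
sgd_step("parsing", x, y_parse)
```

After the parsing step, `params["encoder"]` differs from `encoder_before`, so the next translation step starts from an encoder shaped by the parsing signal.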

4. Factored Neural Machine Translation

https://arxiv.org/pdf/1609.04621.pdf

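*Mainly targets the limited-vocabulary problem, specifically the target-side vocabulary.

*Uses a morphological analyser to obtain a factored representation of each word (lemma, part-of-speech tag, tense, person, gender, and number).

*The decoder outputs two sequences, a lemma sequence and a factor sequence; the paper also proposes a way to constrain the two sequences to the same length.

*Finally, the original word is recovered from lemma + factors.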
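
The last step above, recovering surface words from the two aligned output sequences, can be sketched with a lookup table. The table contents, the factor string format, and the fall-back-to-lemma rule are all hypothetical; a real system would use a morphological generator instead.

```python
# Hypothetical toy table: (lemma, factors) -> surface form.
SURFACE_FORMS = {
    ("le", "det|masc|sg"): "le",
    ("le", "det|fem|sg"): "la",
    ("chat", "noun|masc|pl"): "chats",
    ("dormir", "verb|pres|3|sg"): "dort",
}


def recombine(lemmas, factors):
    """Map aligned lemma and factor sequences back to surface words."""
    if len(lemmas) != len(factors):
        raise ValueError("lemma and factor sequences must have the same length")
    # Assumed fallback: emit the lemma itself when no surface form is known.
    return [
        SURFACE_FORMS.get((lemma, factor), lemma)
        for lemma, factor in zip(lemmas, factors)
    ]


words = recombine(
    ["le", "chat", "dormir"],
    ["det|masc|sg", "noun|masc|sg", "verb|pres|3|sg"],
)
```

The length check is why the equal-length constraint on the two decoder outputs matters: without it the pairing of lemmas and factors is undefined.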

5. Learning to Parse and Translate Improves Neural Machine Translation

https://arxiv.org/pdf/1702.03525.pdf

*Uses recurrent neural network grammars (see reference 1: http://www.aclweb.org/anthology/N/N16/N16-1024.pdf).

*Uses target-side parse information to help encode the source sentence better.

*In reference 1, the output of a recurrent neural network grammar is an action sequence, and at every step the decoder uses the hidden states of the Stack, Buffer, and Action components to predict the next action. This work replaces the Buffer hidden state with the NMT decoder hidden state, and a word is output only when a shift action is output. The new decoder output is therefore a mixed sequence of actions and target words.

*Similarly, the action sequence here carries target-side syntax information, which helps produce a better source sentence representation.

*Problem: at test time, the decoded action sequence may be incomplete.
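
The mixed output sequence and the incompleteness problem can be illustrated with a small sketch. The action names (SHIFT, REDUCE-L, REDUCE-R) and the arc-standard style completeness check are assumptions made here for illustration; the helper names are hypothetical.

```python
REDUCE_ACTIONS = {"REDUCE-L", "REDUCE-R"}


def split_mixed(mixed):
    """Split a mixed sequence into (actions, words); a word follows every SHIFT."""
    actions, words = [], []
    i = 0
    while i < len(mixed):
        actions.append(mixed[i])
        if mixed[i] == "SHIFT":
            if i + 1 >= len(mixed):
                raise ValueError("SHIFT without a following word")
            words.append(mixed[i + 1])
            i += 2
        else:
            i += 1
    return actions, words


def is_complete(actions):
    """True if the actions reduce all shifted words into a single tree."""
    stack_size = 0
    for action in actions:
        if action == "SHIFT":
            stack_size += 1
        elif action in REDUCE_ACTIONS:
            if stack_size < 2:
                return False  # nothing to attach
            stack_size -= 1   # two stack items merge into one
        else:
            return False      # unknown action
    return stack_size == 1


mixed = ["SHIFT", "the", "SHIFT", "cat", "REDUCE-L", "SHIFT", "sleeps", "REDUCE-L"]
actions, words = split_mixed(mixed)
complete = is_complete(actions)

# Decoding that stops one action early leaves two items on the stack.
truncated_actions, _ = split_mixed(mixed[:-1])
truncated_complete = is_complete(truncated_actions)
```

Note that the translation itself (`words`) is still usable when the action sequence is incomplete; only the parse is malformed.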

6. Syntax-aware Neural Machine Translation Using CCG

https://arxiv.org/pdf/1702.01147.pdf

*Uses syntactic information on both the source side and the target side.

*Source-side syntactic information is used with the method proposed in paper [1]; the focus is on how to use target-side syntactic information.

*Three approaches: 1> Serializing: CCG tags and words are placed in a single sequence and decoded together.

2> Multitasking (1), shared encoder: one encoder and two decoders, one decoding CCG tags and the other decoding words.

3> Multitasking (2), distinct softmax: one encoder and one decoder, but words and CCG tags are each predicted from the same decoder hidden state at each time step.

The authors candidly point out: The serializing approach increases the length of the target sequence which might lead to loss of information learned at lexical level. For the multitasking (1) approach there is no explicit way to constrain the number of predicted words and tags to match. The multitasking (2) approach does not condition the prediction of target words on the syntactic context.
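
The serializing approach and its length cost can be seen in a few lines. In this sketch the tag-before-word ordering is an assumption, the example tags are illustrative, and the function names are hypothetical.

```python
def serialize(words, tags):
    """Interleave CCG tags and words into one target sequence (tag first, assumed)."""
    if len(words) != len(tags):
        raise ValueError("need exactly one tag per word")
    sequence = []
    for tag, word in zip(tags, words):
        sequence.extend([tag, word])
    return sequence


def deserialize(sequence):
    """Recover (words, tags) from an interleaved sequence."""
    return sequence[1::2], sequence[0::2]


words = ["Obama", "receives", "Netanyahu"]
tags = ["NP", "((S\\NP)/NP)", "NP"]

target = serialize(words, tags)
recovered_words, recovered_tags = deserialize(target)
```

The serialized target is exactly twice as long as the word sequence, which is the length increase the authors warn about.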

7. Neural Machine Translation with Source-Side Latent Graph Parsing

https://arxiv.org/pdf/1702.02265.pdf

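*The paper is not clearly written; I did not understand parts of it.

*Builds a multi-layer encoder. The first-layer hidden states are used for POS tagging and the second-layer hidden states for dependency parsing. In the dependency parsing layer, each hidden state is scored against every other hidden state to compute the strength of the relation between them; these strengths are then used as weights in a weighted sum over the other hidden states to form a new hidden state (which captures long-distance relations, since it has interacted directly and explicitly with distant second-layer hidden states). At decoding time, the decoder also computes a context vector over the new hidden state sequence (this vector is taken to encode long-distance relations within the source sentence), and this new context vector also takes part in target word prediction.

*The highlight is that relations are computed between the hidden states of the source-side dependency parsing layer, which helps capture long-distance dependencies.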
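
As I understand the description above, the core computation is a soft, pairwise weighting between hidden states. A minimal NumPy sketch follows; the bilinear scoring function, the masking of self-relations, and all sizes are assumptions for illustration and may differ from the paper.

```python
import numpy as np


def latent_graph_states(hidden, bilinear):
    """Return new states as relation-weighted sums over the other hidden states."""
    # scores[i, j]: strength of the relation between token i and token j.
    scores = hidden @ bilinear @ hidden.T
    np.fill_diagonal(scores, -np.inf)  # a token is not related to itself
    scores -= scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ hidden, weights


rng = np.random.default_rng(0)
hidden = rng.normal(size=(6, 4))    # second-layer states for a 6-token sentence
bilinear = rng.normal(size=(4, 4))  # assumed bilinear scoring matrix

new_states, weights = latent_graph_states(hidden, bilinear)
```

Because every row of `weights` spans the whole sentence, a token's new state can draw directly on a distant token regardless of how far apart they are.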

Related papers (considering tree information in the encoder-decoder architecture):

101. Language to Logical Form with Neural Attention (ACL 2016)

http://www.aclweb.org/anthology/P16-1004

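*The task, semantic parsing, maps natural language input utterances to logical forms; the authors tackle it with the popular encoder-decoder architecture.

*The authors propose a hierarchical tree decoder, which they argue explicitly captures the compositional structure of logical forms; in other words, it performs sequence-to-tree conversion. The idea is intuitive.

*The hierarchical tree decoder generates the target tree breadth-first and introduces a parent-feeding connection for generating the next level's subtrees. (A next-level subtree is generated only after the level above is finished, so in time it is closest to the last terminal node of the level above, but in structure it is more closely related to the corresponding nonterminal node of that level. See the explanation of Figure 4.)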

102. Tree-structured decoding with doubly-recurrent neural networks (ICLR 2017)

https://openreview.net/pdf?id=HkYhZDqxg

*Like paper 101, it generates the target tree breadth-first (see Figure 1).

*Differences from paper 101: 1) Node generation. Two RNNs separately track the parent node and the previous sibling node, giving what the authors call ancestral and fraternal hidden states. 2) Tree structure prediction. After each node is generated, two predictions are made, (1) whether it has children and (2) whether it has a following sibling, and these determine the tree structure. Paper 101, in contrast, inserts nonterminals directly at the corresponding positions in each level and appends an end token after each subtree.

*Why is the tree structure not generated as in paper 101? The authors justify their choice in terms of training complexity and probability estimation: These ideas has been adopted by most tree decoders (Dong & Lapata, 2016). There are two important downsides of using a padding strategy for topology prediction in trees. First, the size of the tree can grow considerably. While in the sequence framework only one stopping token is needed, a tree with n nodes might need up to O(n) padding nodes to be added. This can have important effects in training speed. The second reason is that a single stopping token selected competitively with other tokens requires one to continually update the associated parameters in response to any changes in the distribution over ordinary tokens so as to maintain topological control.
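
The two per-node topology predictions are enough to rebuild a tree without any padding tokens. The sketch below assumes the predictions arrive as (label, has_child, has_sibling) triples in breadth-first generation order; the dict-based node representation and the function name are hypothetical, and in the real model these decisions come from the hidden states rather than from a fixed list.

```python
from collections import deque


def build_tree(predictions):
    """Rebuild a tree from (label, has_child, has_sibling) triples in breadth-first order."""
    preds = iter(predictions)
    label, has_child, _ = next(preds)
    root = {"label": label, "children": []}
    # Nodes that were predicted to have children and still await them.
    queue = deque([root] if has_child else [])
    while queue:
        parent = queue.popleft()
        has_sibling = True
        while has_sibling:
            label, has_child, has_sibling = next(preds)
            node = {"label": label, "children": []}
            parent["children"].append(node)
            if has_child:
                queue.append(node)
    return root


# Tree A(B(D), C): four nodes, four predictions, no padding or stop tokens.
tree = build_tree([
    ("A", True, False),
    ("B", True, True),
    ("C", False, False),
    ("D", False, False),
])
```

Here the number of generation steps equals the number of nodes, whereas a padding strategy would add an extra end token for each group of siblings.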

103. Does String-Based Neural MT Learn Source Syntax? (EMNLP 2016)

http://www.aclweb.org/anthology/D/D16/D16-1159.pdf

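*Very interesting. The authors predict different levels of syntax information from the hidden states of different encoder layers and reach some interesting conclusions: We find that both local and global syntactic information about source sentences is captured by the encoder. Different types of syntax is stored in different layers, with different concentration degrees. In short, what the encoder learns contains (implicit) syntax information to some degree, and the encoder's different levels of abstraction over the sentence correspond, more or less, to syntax information.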

104. When Are Tree Structures Necessary for Deep Learning of Representations? (EMNLP 2015)

http://www.aclweb.org/anthology/D/D15/

*Like paper 103, a very interesting read.

105. Tree Memory Networks for Modelling Long-term Temporal Dependencies

https://arxiv.org/pdf/1703.04706.pdf

The title looks interesting; I will read it another day.

Summary:

1) Enrich the information in the input units

2) Tasks reinforce one another: multi-task learning

3) Structural change: sequence-to-tree or tree-to-sequence

4) Slides: http://hlt.suda.edu.cn/~xwang/slides/syntax_NMT.pdf
