32 common NLP tasks, with the evaluation dataset, evaluation metrics, current SOTA result, and corresponding paper for each

Task	Description	Corpus/Dataset	Metric	SOTA result	Paper
Chunking	Chunk analysis	Penn Treebank	F1	95.77	A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks
Common sense reasoning	Commonsense reasoning	Event2Mind	Cross-entropy	4.22	Event2Mind: Commonsense Inference on Events, Intents, and Reactions
Parsing	Syntactic parsing	Penn Treebank	F1	95.13	Constituency Parsing with a Self-Attentive Encoder
Coreference resolution	Coreference resolution	CoNLL 2012	Average F1	73	Higher-order Coreference Resolution with Coarse-to-fine Inference
Dependency parsing	Dependency syntactic parsing	Penn Treebank	POS, UAS, LAS	97.3, 95.44, 93.76	Deep Biaffine Attention for Neural Dependency Parsing
Task-Oriented Dialogue/Intent Detection	Task-oriented dialogue / intent recognition	ATIS, Snips	Accuracy	94.1, 97.0	Slot-Gated Modeling for Joint Slot Filling and Intent Prediction
Task-Oriented Dialogue/Slot Filling	Task-oriented dialogue / slot filling	ATIS, Snips	F1	95.2, 88.8	Slot-Gated Modeling for Joint Slot Filling and Intent Prediction
Task-Oriented Dialogue/Dialogue State Tracking	Task-oriented dialogue / state tracking	DSTC2	Area, Food, Price, Joint	90, 84, 92, 72	Dialogue Learning with Human Teaching and Feedback in End-to-End Trainable Task-Oriented Dialogue Systems
Domain adaptation	Domain adaptation	Multi-Domain Sentiment Dataset	Average accuracy	79.15	Strong Baselines for Neural Semi-supervised Learning under Domain Shift
Entity Linking	Entity linking	AIDA CoNLL-YAGO	Micro-F1-strong, Macro-F1-strong	86.6, 89.4	End-to-End Neural Entity Linking
Information Extraction	Information extraction	ReVerb45K	Macro F1, Micro F1, Pairwise F1	62.7, 84.4, 81.9	CESI: Canonicalizing Open Knowledge Bases using Embeddings and Side Information
Grammatical Error Correction	Grammatical error correction	JFLEG	GLEU	61.5	Near Human-Level Performance in Grammatical Error Correction with Hybrid Machine Translation
Language modeling	Language modeling	Penn Treebank	Validation perplexity, Test perplexity	48.33, 47.69	Breaking the Softmax Bottleneck: A High-Rank RNN Language Model
Lexical Normalization	Lexical normalization	LexNorm2015	F1, Precision, Recall	86.39, 93.53, 80.26	MoNoise: Modeling Noise Using a Modular Normalization System
Machine translation	Machine translation	WMT 2014 EN-DE	BLEU	35.0	Understanding Back-Translation at Scale
Multimodal Emotion Recognition	Multimodal emotion recognition	IEMOCAP	Accuracy	76.5	Multimodal Sentiment Analysis using Hierarchical Fusion with Context Modeling
Multimodal Metaphor Recognition	Multimodal metaphor recognition	Verb-noun pairs, adjective-noun pairs	F1	0.75, 0.79	Black Holes and White Rabbits: Metaphor Identification with Visual Features
Multimodal Sentiment Analysis	Multimodal sentiment analysis	MOSI	Accuracy	80.3	Context-Dependent Sentiment Analysis in User-Generated Videos
Named entity recognition	Named entity recognition	CoNLL 2003	F1	93.09	Contextual String Embeddings for Sequence Labeling
Natural language inference	Natural language inference	SciTail	Accuracy	88.3	Improving Language Understanding by Generative Pre-Training
Part-of-speech tagging	Part-of-speech tagging	Penn Treebank	Accuracy	97.96	Morphosyntactic Tagging with a Meta-BiLSTM Model over Context Sensitive Token Encodings
Question answering	Question answering	CliCR	F1	33.9	CliCR: A Dataset of Clinical Case Reports for Machine Reading Comprehension
Word segmentation	Word segmentation	VLSP 2013	F1	97.90	A Fast and Accurate Vietnamese Word Segmenter
Word Sense Disambiguation	Word sense disambiguation	SemEval 2015	F1	67.1	Word Sense Disambiguation: A Unified Evaluation Framework and Empirical Comparison
Text classification	Text classification	AG News	Error rate	5.01	Universal Language Model Fine-tuning for Text Classification
Summarization	Summarization	Gigaword	ROUGE-1, ROUGE-2, ROUGE-L	37.04, 19.03, 34.46	Retrieve, Rerank and Rewrite: Soft Template Based Neural Summarization
Sentiment analysis	Sentiment analysis	IMDb	Accuracy	95.4	Universal Language Model Fine-tuning for Text Classification
Semantic role labeling	Semantic role labeling	OntoNotes	F1	85.5	Jointly Predicting Predicates and Arguments in Neural Semantic Role Labeling
Semantic parsing	Semantic parsing	LDC2014T12	F1 (Newswire), F1 (Full)	0.71, 0.66	AMR Parsing with an Incremental Joint Model
Semantic textual similarity	Semantic textual similarity	SentEval	MRPC, SICK-R, SICK-E, STS	78.6/84.4, 0.888, 87.8, 78.9/78.6	Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning
Relationship Extraction	Relation extraction	New York Times Corpus	P@10%, P@30%	73.6, 59.5	RESIDE: Improving Distantly-Supervised Neural Relation Extraction using Side Information
Relation Prediction	Relation prediction	WN18RR	H@10, H@1, MRR	59.02, 45.37, 49.83	Predicting Semantic Relations using Global Graph Properties
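
Several metrics in the table recur across tasks (F1, perplexity, MRR, H@k). As a rough reminder of what they measure, here is a minimal Python sketch of their textbook definitions. The function names are made up for illustration, the ranks are toy values, and the individual papers may compute their scores with task-specific variants (for example span-level or micro/macro-averaged F1), so treat this as an approximation rather than the evaluation code behind any row.

```python
import math
from typing import Sequence


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same scale as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def perplexity(cross_entropy_nats: float) -> float:
    """Perplexity for an average per-token cross-entropy measured in nats."""
    return math.exp(cross_entropy_nats)


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Mean of 1/rank, where rank is the 1-based position of the correct answer."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of queries whose correct answer is ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Precision and recall taken from the LexNorm2015 row of the table.
print(round(f1_score(93.53, 80.26), 2))  # 86.39, matching the F1 in that row

# Toy ranks, invented purely for illustration (not from any dataset).
toy_ranks = [1, 2, 10]
print(mean_reciprocal_rank(toy_ranks), hits_at_k(toy_ranks, 1), hits_at_k(toy_ranks, 10))
```

The LexNorm2015 row is a convenient sanity check because it reports precision, recall, and F1 together, so the harmonic-mean relation can be verified directly against the published numbers.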
