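My most recent project was keyword extraction from short texts (Twitter and LinkedIn posts), mainly using two algorithms: TextRank and RAKE. The two take very different approaches, but for short-text keyword extraction RAKE worked noticeably better.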

Introduction to TextRank

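  TextRank is a graph-based ranking algorithm for text. Its basic idea comes from Google's PageRank: the text is split into units (words or sentences), a graph is built over them, and a voting mechanism ranks the important components. It needs only the information in a single document to extract keywords or produce a summary. Unlike models such as LDA and HMM, TextRank requires no prior training on a collection of documents, and its simplicity and effectiveness have made it widely used.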
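  In general, TextRank can be modeled as a directed weighted graph G = (V, E) with vertex set V and edge set E, where E ⊆ V × V. The weight of the edge between two vertices $V_i$ and $V_j$ is $w_{ji}$. For a given vertex $V_i$, $In(V_i)$ is the set of vertices pointing to it and $Out(V_i)$ is the set of vertices it points to. The score of $V_i$ is defined as:

$$WS(V_i) = (1 - d) + d \sum_{V_j \in In(V_i)} \frac{w_{ji}}{\sum_{V_k \in Out(V_j)} w_{jk}} \, WS(V_j)$$
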
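  Here $d$ is the damping factor, ranging from 0 to 1, which represents the probability of jumping from a given vertex to another vertex in the graph; it is usually set to 0.85. When computing vertex scores with TextRank, every vertex is first given an arbitrary initial value, and the computation is iterated until convergence, i.e. until the error of every vertex falls below a given threshold, typically 0.0001.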
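A minimal Python sketch of the weighted score iteration, for illustration only. Assumptions not taken from the original: the graph is a dict of dicts where `weights[j][i]` holds $w_{ji}$ (all weights positive), every vertex starts at 1.0, and the "error" is taken to be the absolute change of a vertex's score between two iterations. `textrank_scores` is a hypothetical helper name.

```python
def textrank_scores(weights, d=0.85, tol=1e-4, max_iter=100):
    """Iterate WS(V_i) = (1 - d) + d * sum_j [w_ji / sum_k w_jk] * WS(V_j).

    weights[j][i] holds w_ji, the (positive) weight of edge V_j -> V_i.
    """
    # Collect every vertex, including ones that only receive edges
    nodes = set(weights)
    for targets in weights.values():
        nodes.update(targets)

    # Denominator of the formula: total outgoing weight of each vertex V_j
    out_weight = {v: sum(weights.get(v, {}).values()) for v in nodes}

    # In(V_i) as a list of (V_j, w_ji) pairs
    incoming = {v: [] for v in nodes}
    for j, targets in weights.items():
        for i, w in targets.items():
            incoming[i].append((j, w))

    scores = {v: 1.0 for v in nodes}  # arbitrary initial value
    for _ in range(max_iter):
        new_scores = {
            i: (1 - d) + d * sum(w / out_weight[j] * scores[j] for j, w in incoming[i])
            for i in nodes
        }
        # Converged once no vertex's score moves by tol or more
        converged = all(abs(new_scores[v] - scores[v]) < tol for v in nodes)
        scores = new_scores
        if converged:
            break
    return scores


# Undirected chain a - b - c, each edge stored in both directions with weight 1
chain = {"a": {"b": 1.0}, "b": {"a": 1.0, "c": 1.0}, "c": {"b": 1.0}}
scores = textrank_scores(chain)
print(max(scores, key=scores.get))  # → b
```

An undirected graph, as used for keyword co-occurrence, is simply stored with each edge in both directions; the middle vertex of the chain ends up with the highest score because it receives votes from both ends.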

  TextRank-based keyword extraction proceeds as follows:

  1. Split the given text T into complete sentences, i.e. $T = [S_1, S_2, \dots, S_m]$.

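  2. For each sentence, perform tokenization and part-of-speech tagging, filter out stopwords, and keep only words with specified parts of speech such as nouns, verbs and adjectives, i.e. $S_i = [t_{i,1}, t_{i,2}, \dots, t_{i,n}]$, where $t_{i,j}$ are the retained candidate keywords.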

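  3. Build the candidate keyword graph G = (V, E), where the vertex set V consists of the candidate keywords produced in step 2. Edges are built from co-occurrence: two vertices are connected only if their words co-occur within a window of length K, where K is the window size (i.e. at most K words are considered together).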

  4. Iteratively propagate the vertex weights using the formula above until convergence.

  5. Sort the vertices by weight in descending order and take the T most important words as candidate keywords.

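  6. Mark the T most important words from step 5 in the original text; if they form adjacent word groups, combine them into multi-word keywords. For example, if the text contains the sentence "Matlab code for plotting ambiguity function" and both "Matlab" and "code" are candidate keywords, they are combined into "Matlab code" and added to the keyword list.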
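The six steps can be sketched in Python roughly as follows. This is a simplified illustration under stated assumptions, not the implementation used in this project: there is no POS tagger, so step 2 keeps every token not in a small hypothetical stopword list; sentences are split on `.`, `!` and `?`; the co-occurrence window slides over each sentence's filtered candidates; and every edge weight is 1. `textrank_keywords` and `STOPWORDS` are hypothetical names.

```python
import re

# Hypothetical, deliberately tiny stopword list. The original step 2 uses a
# POS tagger to keep only nouns, verbs and adjectives; that is omitted here.
STOPWORDS = {"a", "an", "and", "are", "by", "for", "in", "is", "it",
             "of", "on", "the", "this", "to", "with"}


def tokenize(sentence):
    return re.findall(r"[a-z0-9]+", sentence.lower())


def textrank_keywords(text, window=2, top_t=5, d=0.85, tol=1e-4, max_iter=100):
    # Step 1: split the text into sentences
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]

    # Step 2: tokenize and drop stopwords -> candidate keywords per sentence
    candidates = [[w for w in tokenize(s) if w not in STOPWORDS] for s in sentences]

    # Step 3: undirected co-occurrence graph; words at most window-1 positions
    # apart in a sentence's candidate list share an edge
    graph = {}
    for words in candidates:
        for i, w in enumerate(words):
            graph.setdefault(w, set())
            for other in words[i + 1:i + window]:
                if other != w:
                    graph[w].add(other)
                    graph.setdefault(other, set()).add(w)

    # Step 4: iterate the TextRank formula with every edge weight equal to 1
    scores = {w: 1.0 for w in graph}
    for _ in range(max_iter):
        new = {w: (1 - d) + d * sum(scores[v] / len(graph[v]) for v in graph[w])
               for w in graph}
        done = max((abs(new[w] - scores[w]) for w in graph), default=0.0) < tol
        scores = new
        if done:
            break

    # Step 5: keep the T highest-scoring words
    top = set(sorted(scores, key=scores.get, reverse=True)[:top_t])

    # Step 6: merge runs of adjacent top words in the original sentences
    phrases = []
    for s in sentences:
        run = []
        for w in tokenize(s) + [None]:  # None flushes the last run
            if w in top:
                run.append(w)
                continue
            if run and " ".join(run) not in phrases:
                phrases.append(" ".join(run))
            run = []
    return phrases


print(textrank_keywords(
    "Matlab code for plotting. Matlab code for radar. Matlab code for sonar.",
    window=3, top_t=2))  # → ['matlab code']
```

In the usage example "matlab" and "code" are linked to every other candidate, so they take the top two places and step 6 joins them into "matlab code", mirroring the example in step 6.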


Introduction to RAKE (Rapid Automatic Keyword Extraction)
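
  RAKE first splits the text into candidate phrases, using stopwords and punctuation as delimiters, and then scores each word from its degree and frequency within those phrases:
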
  • wordScore(w) = wordDegree(w) / wordFrequency(w)



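  Note that in this score, wordDegree(w) equals the number of times w co-occurs with the other words in each candidate phrase, plus the frequency of w. The score of a candidate phrase is the sum of the scores of its words. For the full algorithm, see the paper "Automatic Keyword Extraction from Individual Documents".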
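A minimal Python sketch of the RAKE scoring described above, assuming a tiny hypothetical stopword list (real RAKE setups use a much longer one) and splitting candidate phrases at punctuation and stopwords. Each word's degree is accumulated as the length of every candidate phrase containing it, which equals its co-occurrence count with the other words plus its frequency. `rake`, `candidate_phrases` and `STOPWORDS` are illustrative names, not a library API.

```python
import re
from collections import defaultdict

# Hypothetical minimal stopword list; real RAKE setups use a far longer list.
STOPWORDS = {"a", "an", "and", "are", "by", "for", "here", "in", "is",
             "of", "on", "the", "to", "with"}


def candidate_phrases(text):
    """Split text at punctuation and stopwords into lists of content words."""
    phrases = []
    for chunk in re.split(r"[^\w\s-]+", text.lower()):
        current = []
        for word in chunk.split():
            if word in STOPWORDS:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    return phrases


def rake(text):
    phrases = candidate_phrases(text)
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            # Adds the co-occurrences with the other words in this phrase plus 1
            # for the word itself, so degree = co-occurrence count + frequency
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}
    # A candidate phrase scores the sum of its member word scores
    scores = {" ".join(p): sum(word_score[w] for w in p) for p in phrases}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

With this stoplist, `rake("Yesterday, Desktop Metal CEO Ric Fulop joined Bloomberg Radio to discuss the future of metal 3D printing. Listen to the interview here")[:2]` returns `[('desktop metal ceo ric fulop joined bloomberg radio', 61.5), ('metal 3d printing', 11.5)]`, the same top two scores shown below for that post. Because every word in a long phrase has a high degree, RAKE naturally favors long candidate phrases. Unlike the output below, this sketch does not give pure numbers such as "2017" a score of 0.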
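
Below are RAKE results on a few real posts: each input text is followed by the extracted (phrase, score) pairs.
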
“Great interview by Gerry Dick with Ball State University’s new president, Geoffrey Mearns, who recognizes the need to offer curriculum that meets students’ needs. Aidex would welcome the opportunity to introduce our latest learning technologies, including Desktop Metal, metal 3D printing; SynDaver Labs and its lifelike human cadavers; and FANUC America Corporation robotics and CNC technology. These technologies elevate the educational experience and prepare students for fantastic careers. We hope to visit Muncie soon to present these and other STEM technologies.”


[('fanuc america corporation robotics', 16.0), ('ball state university', 9.0), ('lifelike human cadavers', 9.0), ('including desktop metal', 9.0), ('metal 3d printing', 9.0), ('latest learning technologies', 8.333333333333334), ('stem technologies', 4.333333333333334), ('technologies elevate', 4.333333333333334), ('educational experience', 4.0), ('geoffrey mearns', 4.0), ('syndaver labs', 4.0), ('great interview', 4.0), ('prepare students', 4.0), ('visit muncie', 4.0), ('meets students', 4.0), ('cnc technology', 4.0), ('offer curriculum', 4.0), ('gerry dick', 4.0), ('fantastic careers', 4.0), ('aidex', 1.0), ('recognizes', 1.0), ('introduce', 1.0), ('president', 1.0), ('opportunity', 1.0), ('present', 1.0), ('hope', 1.0)]
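As a hand check of the scoring rule, a few of the scores above can be reproduced from word degrees and frequencies, assuming the candidate phrases are exactly the ones in the list (so "technologies" occurs in three of them, in phrases of length 3, 2 and 2):

$$
\begin{aligned}
&\text{fanuc, america, corporation, robotics: } \mathrm{freq}=1,\ \deg = 4 \;\Rightarrow\; \text{each word scores } 4/1 = 4\\
&\text{score}(\text{fanuc america corporation robotics}) = 4+4+4+4 = 16\\
&\deg(\text{technologies}) = 3 + 2 + 2 = 7,\quad \mathrm{freq}(\text{technologies}) = 3 \;\Rightarrow\; 7/3\\
&\text{score}(\text{latest learning technologies}) = 3 + 3 + 7/3 = 25/3 \approx 8.333\\
&\text{score}(\text{stem technologies}) = 2 + 7/3 = 13/3 \approx 4.333
\end{aligned}
$$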



“Yesterday, Desktop Metal CEO Ric Fulop joined Bloomberg Radio to discuss the future of metal 3D printing. Listen to the interview here”


[('desktop metal ceo ric fulop joined bloomberg radio', 61.5), ('metal 3d printing', 11.5), ('yesterday', 1.0), ('interview', 1.0), ('future', 1.0), ('discuss', 1.0), ('listen', 1.0)]


“3D printing metal on a desktop FDM printer, exclusive interview with The Virtual Foundry founder : Is 2017 going to be the year for 3D printing metal? Recently 3D Printing Industry reported announcements from Markforged about their forthcoming Metal X 3D “


[('recently 3d printing industry reported announcements', 31.25), ('3d printing metal', 9.916666666666666), ('virtual foundry founder', 9.0), ('desktop fdm printer', 9.0), ('forthcoming metal', 4.666666666666666), ('exclusive interview', 4.0), ('3d', 3.25), ('year', 1.0), ('markforged', 1.0), ('2017', 0)]



  • Desktop Metal is proud to welcome Morris Group, Inc.. as an authorized reseller of its metal 3D printing systems in 30 states. With the addition of Desktop Metal’s Studio System™ to its existing lineup of CNC machine tools, Morris Group’s extensive distributor network provides an end-to-end suite of advanced solutions to manufacturers of precision metal parts.

  • In the latest episode of podcast, The Digital Factory, Desktop Metal CEO Ric Fulop shares his thoughts on the state of the metal 3D printing industry.

  • We’re excited to announce our Series D Funding with support from our strategic partners NEA, GV, GE Ventures, among others.

  • Register now for ‘s Metal 3D Printing webinar featuring Desktop Metal and the Studio System, the world’s first office-friendly metal 3D printing system.

  • The latest issue of examines how recent advances make 3D printing a powerful competitor to conventional mass production. Read the full article here, including commentary from Desktop Metal CEO Ric Fulop.

  • We’re honored to be recognized as one of ‘s 50 Smartest Companies of 2017.

  • Desktop Metal is honored to join the prestigious roster of recipients of the World Economic Forum Technology Pioneers program. For the press release, please visit: .

  • See the full list of Technology Pioneers 2017 here: .

  • At RAPID+TCT last month, Desktop Metal CTO Jonah Myerberg spoke with about leveraging metal 3D printing for the full product life cycle, from prototyping to mass production.

  • At RAPID+TCT, Desktop Metal CTO Jonah Myerberg talked to TechCrunch about our metal 3D printing solutions. Check out the video here:

  • This past weekend, Desktop Metal was honored to be recognized as Startup of the Year by the 3D Printing Industry awards. Thank you to all who voted!

  • Yesterday, Desktop Metal CEO Ric Fulop joined Bloomberg Radio to discuss the future of metal 3D printing. Listen to the interview here:

  • Today in the Wall Street Journal: 3D printing is transforming manufacturing, from prototyping to mass production.

  • Desktop Metal CEO Ric Flop joined CNBC’s Squawk Box to discuss the latest in metal 3D printing–from prototyping to mass production.


phrase score
desktop metal ceo ric fulop joined bloomberg radio 52.1515151515
desktop metal ceo ric flop joined cnbc 43.8181818182
metal 3d printing webinar featuring desktop metal 36.1090909091
desktop metal ceo ric fulop shares 34.6515151515
desktop metal cto jonah myerberg spoke 33.3181818182
desktop metal cto jonah myerberg talked 33.3181818182
world economic forum technology pioneers program 29.5
desktop metal ceo ric fulop 28.6515151515
recent advances make 3d printing 23.2909090909
office-friendly metal 3d printing system 20.7909090909
metal 3d printing industry 16.7909090909
metal 3d printing systems 16.7909090909
leveraging metal 3d printing 16.7909090909
3d printing industry awards 16.2909090909
metal 3d printing solutions 15.7909090909
full product life cycle 14.6666666667
metal 3d printing 12.7909090909
metal 3d printing– 11.5909090909
precision metal parts 10.5
desktop metal 9.31818181818
desktop metal’ 9.31818181818
strategic partners nea 9.0
extensive distributor network 9.0
wall street journal 9.0
cnc machine tools 9.0
3d printing 8.29090909091
technology pioneers 2017 8.0
conventional mass production 7.5
studio system 5.0
advanced solutions 5.0
studio system™ 5.0
full list 4.66666666667
full article 4.66666666667
mass production 4.5
50 smartest companies 4.0
transforming manufacturing 4.0
including commentary 4.0
authorized reseller 4.0
ge ventures 4.0
squawk box 4.0
end-to-end suite 4.0
prestigious roster 4.0
digital factory 4.0
morris group 4.0
past weekend 4.0
press release 4.0
existing lineup 4.0
morris group’ 4.0
powerful competitor 4.0
latest episode 3.66666666667
latest issue 3.66666666667
world 3.5
latest 1.66666666667
gv 1.0
month 1.0
voted 1.0
announce 1.0
techcrunch 1.0
recipients 1.0
read 1.0
discuss 1.0
honored 1.0
series 1.0
startup 1.0
prototyping 1.0
year 1.0
funding 1.0
state 1.0
rapid+tct 1.0
recognized 1.0
visit 1.0
addition 1.0
support 1.0
today 1.0
listen 1.0
manufacturers 1.0
30 states 1.0
podcast 1.0
join 1.0
excited 1.0
future 1.0
video 1.0
proud 1.0
examines 1.0
check 1.0
interview 1.0
yesterday 1.0
thoughts 1.0
register 1.0
2017 0


str str str str
desktop metal d printing production
product join joined morris
myerberg latest advances advanced solutions
fulop distributor network ric machine
partners ge pioneers economic


phrase mean score
desktop metal desktop metal 49.5083333333
desktop metal 49.5083333333
morris 42.1666666667
metal 3d printing tool 38.9375
morris group 37.0
metal 3d printing solutions 36.8333333333
make metal 3d printing 35.1583333333
represent desktop metal 34.1166666667
desktop metal offers 34.1166666667
desktop metal products 34.1166666667
metal additive manufacturing 33.3388888889
office-friendly metal 3d printing system 32.7966666667
end-to-end metal 3d printing solutions 30.8066666667
innovative metal 3d printing systems 29.9566666667
metal cutting manufacturers 29.6722222222
precision metal parts 29.0611111111
morris group distributor 28.3055555556
morris company 27.25
bound metal deposition 26.8388888889
3d printing process 23.9
morris group distribution network 23.125
local morris group distributor 22.2916666667
groundbreaking 3d printing technology 21.5083333333
studio system 21.3


