Wordnet with NLTK

Synonym and antonym functions for English words
# -*- coding: utf-8 -*-
"""
Synonym and antonym functions for English words, built on NLTK's WordNet.
Requires the WordNet corpus: run nltk.download('wordnet') once beforehand.
"""
from nltk.corpus import wordnet

syns = wordnet.synsets('program')
# syns ->
# [Synset('plan.n.01'), Synset('program.n.02'), Synset('broadcast.n.02'),
#  Synset('platform.n.02'), Synset('program.n.05'), Synset('course_of_study.n.01'),
#  Synset('program.n.07'), Synset('program.n.08'), Synset('program.v.01'),
#  Synset('program.v.02')]

print(syns[0].name())
# plan.n.01

# Just the word: the name of the synset's first lemma
print(syns[0].lemmas()[0].name())
# plan

# Example sentences that use the word
print(syns[0].examples())
# ['they drew up a six-step plan', 'they discussed plans for a new bond issue']


def Word_synonyms_and_antonyms(word):
    """Return (synonyms, antonyms) of a word as two sets."""
    synonyms = []
    antonyms = []
    for syn in wordnet.synsets(word):
        for l in syn.lemmas():
            synonyms.append(l.name())
            # Only the first antonym of each lemma is collected
            if l.antonyms():
                antonyms.append(l.antonyms()[0].name())
    return (set(synonyms), set(antonyms))


def Word_synonyms(word):
    """Return the set of synonyms of a word."""
    return Word_synonyms_and_antonyms(word)[0]


def Word_antonyms(word):
    """Return the set of antonyms of a word."""
    return Word_synonyms_and_antonyms(word)[1]


print(Word_synonyms("evil"))
# {'evil', 'evilness', 'immorality', 'iniquity', 'malefic',
#  'malevolent', 'malign', 'vicious', 'wickedness'}  (set order may vary)
print(Word_antonyms("evil"))
# {'good', 'goodness'}

WordNet is a lexical database for the English language, created at Princeton University, and is part of the NLTK corpus.

You can use WordNet alongside the NLTK module to find the meanings of words, synonyms, antonyms, and more. Let's cover some examples.

First, you're going to need to import wordnet:

from nltk.corpus import wordnet

Then, we're going to use the term "program" to find synsets (sets of synonyms) like so:

syns = wordnet.synsets("program")

An example of a synset:

print(syns[0].name())

plan.n.01
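A synset name such as plan.n.01 packs three fields: a word form, a part-of-speech code (n noun, v verb, a adjective, s satellite adjective, r adverb) and a sense number. The hypothetical helper `parse_synset_name` below only splits that string to make the format explicit; with NLTK itself you would normally call methods such as `syns[0].pos()` or `syns[0].lemmas()` instead.

```python
# Part-of-speech codes used in WordNet synset names
POS_NAMES = {
    "n": "noun",
    "v": "verb",
    "a": "adjective",
    "s": "satellite adjective",
    "r": "adverb",
}


def parse_synset_name(name):
    """Hypothetical helper: split 'plan.n.01' into (word, pos, sense number).

    rsplit with maxsplit=2 keeps any dots inside the word itself intact.
    """
    word, pos, sense = name.rsplit(".", 2)
    return word, POS_NAMES[pos], int(sense)


print(parse_synset_name("plan.n.01"))             # ('plan', 'noun', 1)
print(parse_synset_name("course_of_study.n.01"))  # ('course_of_study', 'noun', 1)
print(parse_synset_name("program.v.02"))          # ('program', 'verb', 2)
```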

Just the word:

print(syns[0].lemmas()[0].name())

plan

Definition of that first synset:

print(syns[0].definition())

a series of steps to be carried out or goals to be accomplished

Examples of the word in use:

print(syns[0].examples())

['they drew up a six-step plan', 'they discussed plans for a new bond issue']

Next, how might we discern synonyms and antonyms of a word? The lemmas will be synonyms, and then you can use .antonyms() to find the antonyms of the lemmas. As such, we can populate some lists like:

synonyms = []
antonyms = []

for syn in wordnet.synsets("good"):
    for l in syn.lemmas():
        synonyms.append(l.name())
        if l.antonyms():
            antonyms.append(l.antonyms()[0].name())

print(set(synonyms))
print(set(antonyms))

{'beneficial', 'just', 'upright', 'thoroughly', 'in_force', 'well', 'skilful', 'skillful', 'sound', 'unspoiled', 'expert', 'proficient', 'in_effect', 'honorable', 'adept', 'secure', 'commodity', 'estimable', 'soundly', 'right', 'respectable', 'good', 'serious', 'ripe', 'salutary', 'dear', 'practiced', 'goodness', 'safe', 'effective', 'unspoilt', 'dependable', 'undecomposed', 'honest', 'full', 'near', 'trade_good'}
{'evil', 'evilness', 'bad', 'badness', 'ill'}

As you can see, we got many more synonyms than antonyms: most lemmas of "good" have no antonym listed, and for those that do we kept only the first one. You could balance this by also running the exact same process for the term "bad."

Comparing word similarity

Next, we can also easily use WordNet to compare the similarity of two words, or more precisely of two specific senses (synsets) of them, using the Wu and Palmer method for semantic relatedness.
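In its textbook form, the Wu-Palmer score is 2 * depth(lcs) / (depth(c1) + depth(c2)), where lcs is the least common subsumer (the deepest ancestor the two concepts share) and the root has depth 1. Concepts that only branch apart deep in the hierarchy therefore score close to 1. The sketch below applies that formula to a small made-up taxonomy; the tree is an assumption for illustration, not WordNet's real hypernym hierarchy, so its numbers will not match NLTK's `wup_similarity`.

```python
# Hypothetical toy taxonomy (child -> parent); NOT WordNet's real hierarchy
PARENT = {
    "object": "entity",
    "vehicle": "object",
    "organism": "object",
    "craft": "vehicle",
    "wheeled_vehicle": "vehicle",
    "vessel": "craft",
    "ship": "vessel",
    "boat": "vessel",
    "car": "wheeled_vehicle",
    "animal": "organism",
    "cat": "animal",
}


def path_to_root(node):
    """Return [node, parent, ..., root]."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def depth(node):
    # The root ("entity") has depth 1
    return len(path_to_root(node))


def lcs(a, b):
    """Least common subsumer: walking up from a, the first node that is also an ancestor of b."""
    ancestors_b = set(path_to_root(b))
    for node in path_to_root(a):
        if node in ancestors_b:
            return node
    raise ValueError("no common ancestor")


def wup(a, b):
    # wup(a, b) = 2 * depth(lcs(a, b)) / (depth(a) + depth(b))
    return 2 * depth(lcs(a, b)) / (depth(a) + depth(b))


for other in ("boat", "car", "cat"):
    print("ship", other, round(wup("ship", other), 4))
```

With this toy tree the three scores come out as 0.8333, 0.5455 and 0.3636: different values from WordNet's, but the same ordering (boat > car > cat) as the WordNet scores for ship.n.01 in the comparisons that follow.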

Let's compare the noun senses of "ship" and "boat":

w1 = wordnet.synset('ship.n.01')
w2 = wordnet.synset('boat.n.01')
print(w1.wup_similarity(w2))

0.9090909090909091

w1 = wordnet.synset('ship.n.01')
w2 = wordnet.synset('car.n.01')
print(w1.wup_similarity(w2))

0.6956521739130435

w1 = wordnet.synset('ship.n.01')
w2 = wordnet.synset('cat.n.01')
print(w1.wup_similarity(w2))

0.38095238095238093

Next, we're going to pick things up a bit and begin to cover the topic of Text Classification.

Reposted from: https://www.cnblogs.com/webRobot/p/6080208.html
