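Chinese translations of the Python 2 edition of this book are easy to find online, but there is almost nothing for the later Python 3 edition. So I have to read the English e-book on the official website. I'll treat it as practice anyway.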

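The official site explicitly lists a few changes about NLTK 3.0. That suggests these points are probably important or commonly used, much like print is for Python.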

NLTK also includes some pervasive changes:

  • many types are initialised from strings using a fromstring() method
  • many functions now return iterators instead of lists
  • ContextFreeGrammar is now called CFG and WeightedGrammar is now called PCFG
  • batch_tokenize() is now called tokenize_sents(); there are corresponding changes for batch taggers, parsers, and classifiers
  • some implementations have been removed in favour of external packages, or because they could not be maintained adequately

Details: https://github.com/nltk/nltk/wiki/Porting-your-code-to-NLTK-3.0
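The change "many functions now return iterators instead of lists" is the one most likely to bite when following the book. Below is a minimal stdlib-only sketch of what it means in practice. `pairs` is a hypothetical helper I made up, not an NLTK function. It simply returns a lazy iterator the way many NLTK 3 functions do. The Python 3 edition of the book wraps `bigrams()` in `list()`, presumably for this reason.

```python
def pairs(tokens):
    """Hypothetical helper: adjacent pairs returned lazily, like many NLTK 3 functions."""
    return zip(tokens, tokens[1:])   # zip() is itself an iterator in Python 3

p = pairs(['more', 'is', 'said', 'than', 'done'])
print(p)             # an iterator object (e.g. <zip object at 0x...>), not the pairs
first = list(p)      # materialize it once
second = list(p)     # the same iterator is now exhausted
print(first)         # [('more', 'is'), ('is', 'said'), ('said', 'than'), ('than', 'done')]
print(second)        # []
```

If you need to walk the result more than once, call `list()` on it first.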

Chapter 1 doesn't have much new content. The one addition is the concordance method:

>>> text5.concordance('lol')
Displaying 25 of 25 matches:
ast PART 24 / m boo . 26 / m and sexy lol U115 boo . JOIN PART he drew a girl w
ope he didnt draw a penis PART ewwwww lol & a head between her legs JOIN JOIN s
a bowl i got a blunt an a bong ...... lol JOIN well , glad it worked out my cha
e " PART Hi U121 in ny . ACTION would lol @ U121 . . . but appearently she does
30 make sure u buy a nice ring for U6 lol U7 Hi U115 . ACTION isnt falling for
 didnt ya hear !!!! PART JOIN geeshhh lol U6 PART hes deaf ppl here dont get it
es nobody here i wanna misbeahve with lol JOIN so read it . thanks U7 .. Im hap
ies want to chat can i talk to him !! lol U121 !!! forwards too lol JOIN ALL PE
k to him !! lol U121 !!! forwards too lol JOIN ALL PErvs ... redirect to U121
'loves ME the most i love myself JOIN lol U44 how do u know that what ? jerkett
ng wrong ... i can see it in his eyes lol U20 = fiance Jerketts lmao wtf yah I
cooler by the minute what 'd I miss ? lol noo there too much work ! why not ??
 that mean I want you ? U6 hello room lol U83 and this .. has been the grammar
 the rule he 's in PM land now though lol ah ok i wont bug em then someone wann
flight to hell :) lmao bbl maybe PART LOL lol U7 it was me , U83 hahah U83 ! 80
ht to hell :) lmao bbl maybe PART LOL lol U7 it was me , U83 hahah U83 ! 808265
082653953 K-Fed got his ass kicked .. Lol . ACTION laughs . i got a first class
 . i got a first class ticket to hell lol U7 JOIN any texas girls in here ? any
 . whats up U155 i was only kidding . lol he 's a douchebag . Poor U121 i 'm bo
 ??? sits with U30 Cum to my shower . lol U121 . ACTION U1370 watches his nads
 ur nad with a stick . ca u U23 ewwww lol *sniffs* ewwwwww PART U115 ! owww spl
ACTION is resisting . ur female right lol U115 beeeeehave Remember the LAst tim
pm's me . charge that is 1.99 / min . lol @ innocent hahah lol .... yeah LOLOLO
 is 1.99 / min . lol @ innocent hahah lol .... yeah LOLOLOLLL U12 thats not nic
s . lmao no U115 Check my record . :) Lol lick em U7 U23 how old r u lol Way to
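The output above is a keyword-in-context (KWIC) display. Every match of "lol", in any case (note the "LOL" and "Lol" hits), sits in the same column with a fixed number of characters on each side. Here is a simplified sketch of that idea. `kwic` and its `context_words` parameter are my own hypothetical names, and this is not NLTK's source. NLTK's real `concordance()` takes `width` and `lines` arguments.

```python
def kwic(tokens, keyword, width=79, context_words=10):
    """Keyword-in-context lines, matched case-insensitively (a sketch, not NLTK's code)."""
    half = (width - len(keyword) - 2) // 2   # characters of context on each side
    target = keyword.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() != target:
            continue
        left = " ".join(tokens[max(0, i - context_words):i])
        right = " ".join(tokens[i + 1:i + 1 + context_words])
        # Trim long contexts and pad short ones so the keyword starts in one column.
        left = left[-half:].rjust(half)
        right = right[:half]
        lines.append(f"{left} {tok} {right}")
    return lines

tokens = "i got a first class ticket to hell lol U7 JOIN any texas girls LOL".split()
for line in kwic(tokens, "lol", width=40):
    print(line)
```

The fixed-width padding is what makes the matches line up vertically. That alignment is the whole point of a concordance view.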


Experimenting shows that dispersion_plot is case-sensitive. That is a small hint that letter case needs careful attention in NLP processing.
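A dispersion plot marks the positions at which each word occurs in the text. Below is a minimal sketch of computing those positions, assuming matching is done by exact string comparison, which fits the case-sensitivity observed above. `offsets` is a hypothetical helper, not part of NLTK, and the token list is made up.

```python
def offsets(tokens, word):
    """Positions where `word` appears as an exact, case-sensitive token."""
    return [i for i, tok in enumerate(tokens) if tok == word]

tokens = "America is big and america is a typo of America".split()
print(offsets(tokens, "America"))                        # [0, 9]
print(offsets(tokens, "america"))                        # [4]
# Lowercasing both sides first makes the lookup case-insensitive.
print(offsets([t.lower() for t in tokens], "america"))   # [0, 4, 9]
```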
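As for the generate function, the discussion at https://github.com/nltk/nltk/issues/736 shows it is still unresolved. Surprisingly, the most recent reply there is from the 18th, and the many other replies don't offer a real answer either. They mostly boil down to "there's no fix, just leave it." I tried a few different approaches myself and didn't get good results either… So I'm shelving it for now. The book says it will come back in Chapter 3, so we'll pick it up again in the third installment.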
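While generate() is shelved, here is a minimal sketch of the general idea behind random text generation, as a stopgap. It builds a bigram table and draws each next word from the words that followed the current one in the source text. This is a technique I swapped in, not NLTK's generate(). `random_text` and its parameters are hypothetical names.

```python
import random
from collections import defaultdict

def random_text(tokens, length=20, seed=None):
    """Bigram random walk: a hypothetical stand-in, not NLTK's Text.generate()."""
    rng = random.Random(seed)                 # seedable for reproducible output
    followers = defaultdict(list)
    for prev, nxt in zip(tokens, tokens[1:]):
        followers[prev].append(nxt)           # duplicates preserve frequency information
    word = rng.choice(tokens)
    out = [word]
    while len(out) < length:
        options = followers.get(word)         # .get() does not create empty entries
        # Dead end (word only appears last): jump to a random token instead.
        word = rng.choice(options) if options else rng.choice(tokens)
        out.append(word)
    return " ".join(out)

saying = ['After', 'all', 'is', 'said', 'and', 'done',
          'more', 'is', 'said', 'than', 'done']
print(random_text(saying, length=12, seed=1))
```

Passing a fixed `seed` makes the output reproducible, which helps when comparing attempts.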

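"Token" is rendered in Chinese as 标识符 (never mind how its second character is supposed to be pronounced). Apparently a combination of punctuation marks, such as the emoticon :), also counts as a token, which is interesting.
"Word type" is rendered as 词类型. Because the vocabulary also contains punctuation symbols, the book generally calls these unique items just "types" rather than "word types". In other words, only a list made purely of words would consist of word types.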
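To make the token/type distinction concrete, here is a small sketch. It uses a hypothetical regex tokenizer of my own, not `nltk.word_tokenize`. The tokenizer keeps ":)" whole and otherwise splits text into word runs and single punctuation marks. The sentence is the book's `saying` with punctuation and an emoticon added for illustration.

```python
import re

# Hypothetical tokenizer: keeps the emoticon ":)" whole, otherwise splits
# into runs of word characters and single punctuation marks.
TOKEN_RE = re.compile(r":\)|\w+|[^\w\s]")

text = "After all is said and done, more is said than done :)"
tokens = TOKEN_RE.findall(text)
types = set(tokens)                                  # every distinct item, punctuation included
word_types = {t for t in tokens if t.isalpha()}      # only purely alphabetic items
print(len(tokens), len(types), len(word_types))      # 13 10 8
```

The gap between 10 and 8 comes from "," and ":)": they count as types but not as word types.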

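Section 1.3 starts using this saying list out of nowhere, and the listing has a string of dots in the middle… (those dots are most likely just the interactive interpreter's continuation prompt, since the list spans two lines). Joined into one line, it runs like this: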

>>> saying = ['After', 'all', 'is', 'said', 'and', 'done', 'more', 'is', 'said', 'than', 'done']
>>> tokens = set(saying)
>>> tokens = sorted(tokens)
>>> tokens[-2:]
['said', 'than']

"Taken on its own."

When you use the hapaxes method, IDLE may freeze briefly, but it recovers if you wait a moment. After all, there are more than 9,000 words.
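Hapaxes are the words that occur exactly once in a text. In NLTK they come from `FreqDist.hapaxes()`. Below is a minimal stdlib sketch of the same idea using `collections.Counter`, with a hypothetical `hapaxes` function of my own. Counting is a single pass over the text, so the pause in IDLE is probably spent displaying the long list of 9,000+ words in the shell rather than computing it. That is my guess, not something I measured.

```python
from collections import Counter

def hapaxes(tokens):
    """Tokens that occur exactly once, in order of first appearance."""
    counts = Counter(tokens)          # one pass over the text
    return [w for w, c in counts.items() if c == 1]

saying = ['After', 'all', 'is', 'said', 'and', 'done',
          'more', 'is', 'said', 'than', 'done']
print(hapaxes(saying))   # ['After', 'all', 'and', 'more', 'than']
```

To avoid flooding IDLE, print only a slice, e.g. `hapaxes(tokens)[:50]`.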

"Collocations" is translated as 搭配, which seems fine.
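A collocation is a sequence of words that occur together unusually often, such as "red wine". The crudest approximation is to count adjacent word pairs (bigrams), shown in the sketch below. `frequent_bigrams` is a hypothetical helper. NLTK's own `collocations()` goes further: it ranks candidate pairs with a statistical association measure and filters out stopwords, so its results will differ.

```python
from collections import Counter

def frequent_bigrams(tokens, n=3):
    """Most frequent adjacent word pairs: a crude stand-in for collocations."""
    return Counter(zip(tokens, tokens[1:])).most_common(n)

tokens = "red wine is good and red wine is cheap".split()
print(frequent_bigrams(tokens, 2))  # [(('red', 'wine'), 2), (('wine', 'is'), 2)]
```

The second result, ('wine', 'is'), shows why raw counts are not enough. Frequent pairs built from common words are not real collocations.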

Counting only lowercased words is surely problematic: think of country names, place names, and so on…
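A quick illustration of the problem is below. The token list is made up for the example. Lowercasing before counting merges proper nouns with ordinary words that happen to share their spelling.

```python
tokens = ["The", "US", "sent", "us", "a", "letter", "from", "Turkey", "about", "turkey"]
distinct = set(tokens)
distinct_lower = {t.lower() for t in tokens}
# "US" (the country) merges with "us" (the pronoun), and "Turkey" with "turkey".
print(len(distinct), len(distinct_lower))   # 10 8
```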

The babelize_shell() function is no longer used. The online edition of the book gives the following explanation instead:

Today, practical translation systems exist for particular pairs of languages, and some are integrated into web search engines. However, these systems have some serious shortcomings, which are starkly revealed by translating a sentence back and forth between a pair of languages until equilibrium is reached, e.g.:
0> how long before the next flight to Alice Springs?
1> wie lang vor dem folgenden Flug zu Alice Springs?
2> how long before the following flight to Alice jump?
3> wie lang vor dem folgenden Flug zu Alice springen Sie?
4> how long before the following flight to Alice do you jump?
5> wie lang, bevor der folgende Flug zu Alice tun, Sie springen?
6> how long, before the following flight to Alice does, do you jump?
7> wie lang bevor der folgende Flug zu Alice tut, tun Sie springen?
8> how long before the following flight to Alice does, do you jump?
9> wie lang, bevor der folgende Flug zu Alice tut, tun Sie springen?
10> how long, before the following flight does to Alice, do do you jump?
11> wie lang bevor der folgende Flug zu Alice tut, Sie tun Sprung?
12> how long before the following flight does leap to Alice, does you?
Observe that the system correctly translates Alice Springs from English to German (in the line starting 1>), but on the way back to English, this ends up as Alice jump (line 2). The preposition before is initially translated into the corresponding German preposition vor, but later into the conjunction bevor (line 5). After line 5 the sentences become nonsensical (but notice the various phrasings indicated by the commas, and the change from jump to leap). The translation system did not recognize when a word was part of a proper name, and it misinterpreted the grammatical structure.

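As the discussion above shows, the output of many current translation systems drifts. A sentence translated into another language and back again does not come out the same as the original. This may well be another open problem NLP faces today.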
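The "Alice jump" failure can be reproduced with a toy word-by-word translator that does not know "Springs" is part of a proper name. The sketch below assumes two tiny hypothetical dictionaries. Real translation systems are vastly more complex, but the failure mode is the same one the book describes.

```python
# Hypothetical toy word-by-word dictionaries, keyed by lowercase word.
EN_DE = {"flight": "Flug", "to": "zu", "springs": "springen"}
DE_EN = {"flug": "flight", "zu": "to", "springen": "jump"}

def translate(sentence, table, keep=frozenset()):
    """Look each word up in `table`; words in `keep` or unknown words pass through."""
    out = []
    for word in sentence.split():
        if word in keep:
            out.append(word)                      # treat as a protected proper name
        else:
            out.append(table.get(word.lower(), word))
    return " ".join(out)

src = "flight to Alice Springs"
there = translate(src, EN_DE)        # 'Flug zu Alice springen'
back = translate(there, DE_EN)       # 'flight to Alice jump'
print(there, "/", back)

# Protecting the proper name makes the round trip stable.
names = {"Alice", "Springs"}
print(translate(translate(src, EN_DE, names), DE_EN, names))  # 'flight to Alice Springs'
```

The fix here only works because the names were listed by hand. Recognizing proper names automatically is the hard part.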
