Chinese and English Tokenization with NLTK


I. NLTK Environment Setup

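1. Install the nltk package (if the installation starts normally and then suddenly fails with red error messages, simply run it a few more times)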

pip install nltk

2. In the Python console, run:

# 1. Import the package
import nltk
# 2. Download the basic data (with no arguments this opens the interactive NLTK Downloader)
nltk.download()

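Note: a single resource can also be downloaded by name, e.g. nltk.download('punkt') for the Punkt sentence tokenizer (exact resource names differ between NLTK versions; newer releases use 'punkt_tab'). If the online download fails, download the data manually from the official NLTK website and place it in one of the directories NLTK searches (listed in nltk.data.path).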

II. English Tokenization with NLTK

Anaconda is used as the Python interpreter here.

1. Splitting a paragraph into sentences

import nltk

# Get a piece of text
text = "In the coming new term, there will be many challenging exams. Firstly, in June, there is a College English Test Band Four. In May, Certificate of Accounting Professional is around the corner. Without sufficient preparations, I can hardly expect to pass those exams. So I have to plan more time to take enough preparation."

# (1) Split the paragraph into sentences
# language is a keyword argument and defaults to 'english'
# sent_tokenize splits the text into sentences using NLTK's pre-trained Punkt model,
# which mainly ends sentences at '.', '?' and '!'
tokenize = nltk.sent_tokenize(text, language='english')
print(tokenize)

Output:

['In the coming new term, there will be many challenging exams.', 'Firstly, in June, there is a College English Test Band Four.', 'In May, Certificate of Accounting Professional is around the corner.', 'Without sufficient preparations, I can hardly expect to pass those exams.', 'So I have to plan more time to take enough preparation.']
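The regex splitter below gives a rough idea of what sentence splitting does. It is a simplified illustrative stand-in and does not reproduce NLTK's Punkt algorithm. It assumes a sentence ends at '.', '?' or '!' followed by whitespace. The function name naive_sent_split is hypothetical.

```python
import re
from typing import List


def naive_sent_split(text: str) -> List[str]:
    """Split text after '.', '?' or '!' when followed by whitespace.

    Hypothetical helper, not an NLTK API. Unlike Punkt it knows nothing
    about abbreviations, so "Dr. Smith" is split right after "Dr.".
    """
    parts = re.split(r"(?<=[.?!])\s+", text.strip())
    return [part for part in parts if part]


text = ("In the coming new term, there will be many challenging exams. "
        "Firstly, in June, there is a College English Test Band Four. "
        "In May, Certificate of Accounting Professional is around the corner. "
        "Without sufficient preparations, I can hardly expect to pass those exams. "
        "So I have to plan more time to take enough preparation.")

print(naive_sent_split(text))
print(naive_sent_split("Dr. Smith arrived. He sat down."))
```

On the example paragraph, this naive rule yields the same five sentences as sent_tokenize. The second call shows where it breaks down on an abbreviation. Punkt is designed to learn and handle abbreviations for exactly this reason.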

2. Word tokenization

# (2) Word tokenization
# word_tokenize splits each sentence into word and punctuation tokens;
# append adds each sentence's token list to the words list
words = []
for sentence in tokenize:
    words.append(nltk.word_tokenize(sentence))
print(words)

Output:

[['In', 'the', 'coming', 'new', 'term', ',', 'there', 'will', 'be', 'many', 'challenging', 'exams', '.'], ['Firstly', ',', 'in', 'June', ',', 'there', 'is', 'a', 'College', 'English', 'Test', 'Band', 'Four', '.'], ['In', 'May', ',', 'Certificate', 'of', 'Accounting', 'Professional', 'is', 'around', 'the', 'corner', '.'], ['Without', 'sufficient', 'preparations', ',', 'I', 'can', 'hardly', 'expect', 'to', 'pass', 'those', 'exams', '.'], ['So', 'I', 'have', 'to', 'plan', 'more', 'time', 'to', 'take', 'enough', 'preparation', '.']]
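A single regular expression can approximate how word_tokenize separates punctuation from words in this text. This is a simplified sketch and is not NLTK's Treebank-based tokenizer. The name naive_word_tokenize is hypothetical. Contractions come out differently: NLTK splits "don't" into "do" and "n't", while this regex does not.

```python
import re
from typing import List


def naive_word_tokenize(sentence: str) -> List[str]:
    """Hypothetical helper: runs of word characters, plus every other
    non-space character as its own token."""
    return re.findall(r"\w+|[^\w\s]", sentence)


print(naive_word_tokenize("In the coming new term, there will be many challenging exams."))
print(naive_word_tokenize("I don't know."))
print(naive_word_tokenize("保持学习，保持饥饿"))
```

For the first sentence this gives the same token list as word_tokenize printed above. The last call shows why Chinese needs special treatment. Without spaces, a whole run of Chinese characters counts as one "word" and only the full-width comma is split off.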

3. Part-of-speech tagging

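# (3) Part-of-speech tagging
# pos_tag labels each token of a sentence with a Penn Treebank part-of-speech tag
# (cixing is pinyin for "part of speech"; here it holds one sentence's tokens)
wordtagging = []
for cixing in words:
    wordtagging.append(nltk.pos_tag(cixing))
print(wordtagging)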

Output:

[[('In', 'IN'), ('the', 'DT'), ('coming', 'VBG'), ('new', 'JJ'), ('term', 'NN'), (',', ','), ('there', 'EX'), ('will', 'MD'), ('be', 'VB'), ('many', 'JJ'), ('challenging', 'VBG'), ('exams', 'NNS'), ('.', '.')], [('Firstly', 'RB'), (',', ','), ('in', 'IN'), ('June', 'NNP'), (',', ','), ('there', 'EX'), ('is', 'VBZ'), ('a', 'DT'), ('College', 'NNP'), ('English', 'NNP'), ('Test', 'NNP'), ('Band', 'NNP'), ('Four', 'NNP'), ('.', '.')], [('In', 'IN'), ('May', 'NNP'), (',', ','), ('Certificate', 'NNP'), ('of', 'IN'), ('Accounting', 'NNP'), ('Professional', 'NNP'), ('is', 'VBZ'), ('around', 'IN'), ('the', 'DT'), ('corner', 'NN'), ('.', '.')], [('Without', 'IN'), ('sufficient', 'JJ'), ('preparations', 'NNS'), (',', ','), ('I', 'PRP'), ('can', 'MD'), ('hardly', 'RB'), ('expect', 'VB'), ('to', 'TO'), ('pass', 'VB'), ('those', 'DT'), ('exams', 'NNS'), ('.', '.')], [('So', 'RB'), ('I', 'PRP'), ('have', 'VBP'), ('to', 'TO'), ('plan', 'VB'), ('more', 'JJR'), ('time', 'NN'), ('to', 'TO'), ('take', 'VB'), ('enough', 'JJ'), ('preparation', 'NN'), ('.', '.')]]
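For each sentence, pos_tag returns a list of (token, tag) tuples, so plain Python can post-process the result. The small sketch below uses the first tagged sentence, copied from the output above. It counts how often each tag appears and keeps only the nouns, meaning the Penn Treebank tags that begin with 'NN'.

```python
from collections import Counter

# First sentence of the pos_tag output shown above
tagged = [('In', 'IN'), ('the', 'DT'), ('coming', 'VBG'), ('new', 'JJ'),
          ('term', 'NN'), (',', ','), ('there', 'EX'), ('will', 'MD'),
          ('be', 'VB'), ('many', 'JJ'), ('challenging', 'VBG'),
          ('exams', 'NNS'), ('.', '.')]

# Count how often each tag occurs
tag_counts = Counter(tag for _, tag in tagged)

# Keep nouns: NN, NNS, NNP and NNPS all start with "NN"
nouns = [token for token, tag in tagged if tag.startswith('NN')]

print(tag_counts)
print(nouns)
```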

III. Chinese Tokenization with NLTK

1. Splitting a paragraph into sentences

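import nltk

# Parsing Chinese text
# For sentence splitting, the separator must be an ASCII "." followed by a space;
# NLTK's Punkt model does not treat the Chinese full stop "。" as a sentence boundary
text1 = '同是风华正茂，怎敢甘拜下风 . 保持学习，保持饥饿'
Juzi_chinese = nltk.sent_tokenize(text1)
print(Juzi_chinese)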

Result:

['同是风华正茂，怎敢甘拜下风 .', '保持学习，保持饥饿']
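By default Punkt recognizes only the ASCII terminators '.', '?' and '!'. As a result, Chinese text whose sentences end in '。', '！' or '？' is not split. The regex-based sketch below is a hypothetical helper and not part of NLTK. It treats those full-width marks as sentence boundaries and keeps the ASCII rule of a terminator followed by whitespace.

```python
import re
from typing import List

# Boundary after a full-width terminator (。！？), or after an ASCII one (.!?) followed by whitespace
_BOUNDARY = re.compile(r"(?<=[。！？])\s*|(?<=[.!?])\s+")


def split_chinese_sentences(text: str) -> List[str]:
    """Split text into sentences; hypothetical helper, not an NLTK API."""
    return [part.strip() for part in _BOUNDARY.split(text) if part.strip()]


print(split_chinese_sentences('同是风华正茂，怎敢甘拜下风 . 保持学习，保持饥饿'))
print(split_chinese_sentences('同是风华正茂，怎敢甘拜下风。保持学习，保持饥饿！'))
```

The first call reproduces the split shown above. The second handles native Chinese punctuation with no spaces at all.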

2. Word tokenization

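# Tokenize the whole text, not the list of sentences
# word_tokenize has no Chinese segmentation model: it splits on whitespace and the
# punctuation it recognizes, so Chinese words must already be separated by spaces
tokens = nltk.word_tokenize(text1)
print(tokens)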

Output:

['同是', '风华', '正茂，怎敢', '甘拜', '下风', '.', '保持', '学习，保持', '饥饿']
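word_tokenize cannot segment Chinese by itself, so dedicated tools such as jieba are normally used for that job. The sketch below shows dictionary-based segmentation in its simplest form, forward maximum matching (FMM). At each position it takes the longest dictionary word that matches. If nothing matches, it falls back to a single character. This is not jieba's algorithm. The function name fmm_segment and the tiny dictionary were made up for this example; real segmenters combine large dictionaries with statistical models.

```python
from typing import Iterable, List


def fmm_segment(text: str, dictionary: Iterable[str]) -> List[str]:
    """Forward maximum matching; hypothetical helper for illustration."""
    words = set(dictionary)
    max_len = max((len(w) for w in words), default=1)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        if text[i].isspace():  # skip whitespace between words
            i += 1
            continue
        # Try the longest window first; a single character is always accepted as a fallback
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in words:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Tiny hypothetical dictionary, for illustration only
vocab = {'同是', '风华正茂', '怎敢', '甘拜下风', '保持', '学习', '饥饿'}
print(fmm_segment('同是风华正茂，怎敢甘拜下风', vocab))
print(fmm_segment('保持学习，保持饥饿', vocab))
```

FMM is greedy and can mis-segment ambiguous strings. That weakness is one reason production segmenters add statistical models on top of the dictionary.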

Stay hungry, keep learning. (Jackson_MVP)
