Jieba and the Three Kingdoms? Counting word frequencies in Romance of the Three Kingdoms (《三国演义》) with jieba

Word frequency statistics for Romance of the Three Kingdoms

(Original article; please credit the source when reposting.)

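This post uses Jieba to segment the text of Romance of the Three Kingdoms, counts how often each word occurs, and finally renders the result as a word cloud.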

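The main features are:

1. Two custom dictionaries were prepared (the character names in Romance of the Three Kingdoms, and a list of its official titles)
2. A stop-word dictionary is used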
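The post does not show what the custom dictionary files contain. A jieba user dictionary is a plain-text file with one entry per line, written as the word followed by an optional frequency and an optional part-of-speech tag, separated by spaces. The sketch below shows one way such a file could be produced from a list of names. The helper names, the sample names, the file name `names_demo.txt`, and the choice of the tag `nr` (commonly used for person names) are assumptions for illustration, not the author's actual files.

```python
import tempfile
from pathlib import Path


def format_userdict_line(word, freq=None, tag=None):
    """Build one user-dictionary line: word, optional frequency, optional tag."""
    parts = [word]
    if freq is not None:
        parts.append(str(freq))
    if tag is not None:
        parts.append(tag)
    return " ".join(parts)


def write_userdict(path, words, tag=None):
    """Write one entry per line, dropping blanks and duplicates (first-seen order)."""
    unique = dict.fromkeys(w.strip() for w in words if w.strip())
    lines = [format_userdict_line(w, tag=tag) for w in unique]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines


# Hypothetical sample input: a duplicate and a padded entry are included on purpose.
sample_names = ["诸葛亮", "司马懿", "诸葛亮", " 夏侯惇 "]

with tempfile.TemporaryDirectory() as tmp:
    dict_path = Path(tmp) / "names_demo.txt"   # hypothetical file name
    lines = write_userdict(dict_path, sample_names, tag="nr")
    written = dict_path.read_text(encoding="utf-8")

print(written)
```

A file written this way can then be passed to `jieba.load_userdict(path)`, which is the call the script uses for each custom dictionary.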

import re
import csv
from collections import Counter

import jieba
from pyecharts import options as opts
from pyecharts.charts import WordCloud


def ReadText(filename):
    """Read the text file and return its contents."""
    with open(filename, 'r', encoding='utf-8') as f:
        text = f.read()
    return text


def CutWords(text, *filelist):
    """Segment the text (filelist holds the custom dictionary files)."""
    # Remove annotations made of Chinese characters inside full-width parentheses
    text = re.sub('（[\u4e00-\u9fa5]+）', '', text)
    for file in filelist:
        jieba.load_userdict(file)                        # load each custom dictionary
    words = jieba.lcut(text)                             # segment into a list of tokens
    words = [word for word in words if len(word) > 1]    # drop single-character tokens
    return words


def StopWords(words, stopfile):
    """Remove stop words."""
    with open(stopfile, 'r', encoding='utf-8') as f:     # file with one stop word per line
        stopset = {line.strip() for line in f}           # a set gives fast membership tests
    words = [word for word in words if word not in stopset]
    return words


def WriteCSV(filename, freqdict, num=0):
    """Write the word frequency results to a CSV file."""
    with open(filename, 'w', encoding='utf-8', newline='') as f:
        if num == 0:
            num = len(freqdict)                          # default: write every word
        freqlist = freqdict.most_common(num)             # list of (word, count) tuples
        writer = csv.writer(f)
        writer.writerow(('词汇', '词频'))                 # header: word, frequency
        for word, count in freqlist:
            writer.writerow((word, count))


if __name__ == '__main__':
    text = ReadText('三国演义.txt')
    text = re.sub('曰', '', text)                         # manual cleanup: drop the character 曰
    filelist = ['三国演义人物名.txt', '三国演义官职一览表.txt']   # custom dictionaries
    words = CutWords(text, *filelist)
    newwords = StopWords(words, stopfile='stop_words.txt')
    wordfreq = Counter(newwords)                         # count occurrences
    WriteCSV('三国演义词频统计.csv', wordfreq, num=50)      # Top 50

    # Generate the word cloud
    wordcloud = WordCloud()
    wordcloud.add('', wordfreq.most_common(50), word_size_range=[20, 100])
    wordcloud.set_global_opts(title_opts=opts.TitleOpts(title='三国演义词云Top50'))
    wordcloud.render('三国演义词云图Top50.html')
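To check the filtering and counting logic without the novel or the dictionary files, the same steps can be tried on a small token list that is already segmented. This is only a minimal sketch: the function name `count_words`, the sample tokens, and the stop word are made up for illustration, and no jieba call is involved.

```python
from collections import Counter


def count_words(tokens, stopwords, min_len=2):
    """Drop short tokens and stop words, then count what remains."""
    stopset = set(stopwords)
    kept = [t for t in tokens if len(t) >= min_len and t not in stopset]
    return Counter(kept)


# Hypothetical pre-segmented tokens, standing in for the output of jieba.lcut.
tokens = ["曹操", "引兵", "至", "曹操", "不能", "刘备", "曹操", "刘备", "不能"]

freq = count_words(tokens, stopwords=["不能"])
top = freq.most_common(2)   # the two most frequent (word, count) pairs
print(top)
```

The single-character token 至 and the stop word 不能 are both filtered out before counting, which mirrors the two cleanup steps the script applies in `CutWords` and `StopWords`.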
