[Python] Converting EPUB e-books between Traditional and Simplified Chinese

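I found an e-book online that I really liked, but it was in Traditional Chinese. Most of the conversion tools I found online either charged a fee or broke the formatting after conversion. Later I came across a piece of Python code that is very simple to use and leaves the formatting intact. It works great!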

import opencc
import os
import shutil
import zipfile


def test_convert():
    """
    s2t.json  Simplified Chinese to Traditional Chinese
    t2s.json  Traditional Chinese to Simplified Chinese
    """
    converter = opencc.OpenCC('s2t')
    print(converter.convert('汉字acb'))  # 漢字acb
    converter = opencc.OpenCC('t2s')
    print(converter.convert('漢字123'))  # 汉字123


def unzip_dir(zipfilename, unzipdirname):
    """Extract a zip file into a directory."""
    fullzipfilename = os.path.abspath(zipfilename)
    fullunzipdirname = os.path.abspath(unzipdirname)
    print("Start to unzip file %s to folder %s ..." % (zipfilename, unzipdirname))
    # Check input ...
    if not os.path.exists(fullzipfilename):
        print("Dir/File %s does not exist, press Enter to quit..." % fullzipfilename)
        input()
        return
    if not os.path.exists(fullunzipdirname):
        os.mkdir(fullunzipdirname)
    elif os.path.isfile(fullunzipdirname):
        print("File %s exists, are you sure you want to delete it first? [Y/N]" % fullunzipdirname)
        while True:
            inputStr = input()
            if inputStr in ("N", "n"):
                return
            if inputStr in ("Y", "y"):
                os.remove(fullunzipdirname)
                os.mkdir(fullunzipdirname)
                print("Continue to unzip files ...")
                break
    # Start extracting files ...
    with zipfile.ZipFile(fullzipfilename, "r") as srcZip:
        for eachfile in srcZip.namelist():
            eachfilename = os.path.normpath(os.path.join(fullunzipdirname, eachfile))
            if eachfile.endswith('/'):
                # Entry is a directory
                print('Unzip directory %s ...' % eachfile)
                os.makedirs(eachfilename, exist_ok=True)
                continue
            print("Unzip file %s ..." % eachfile)
            # Make sure the parent directory exists before writing the file
            os.makedirs(os.path.dirname(eachfilename), exist_ok=True)
            with open(eachfilename, "wb") as fd:
                fd.write(srcZip.read(eachfile))
    print("Unzip file succeed!")


def zip_dir(dirname, zipfilename):
    """Pack a file or a directory tree into a zip archive."""
    filelist = []
    if os.path.isfile(dirname):
        filelist.append(dirname)
    else:
        for root, dirs, files in os.walk(dirname):
            for dir in dirs:
                filelist.append(os.path.join(root, dir))
            for name in files:
                filelist.append(os.path.join(root, name))
    with zipfile.ZipFile(zipfilename, "w", zipfile.ZIP_DEFLATED) as zf:
        for tar in filelist:
            # Archive name is the path relative to dirname
            arcname = tar[len(dirname):]
            zf.write(tar, arcname)


def convert_file_to_chinese(file_path):
    """
    Read the file line by line, convert each line from Traditional to
    Simplified Chinese, and write the result back to the same file.
    """
    converter = opencc.OpenCC('t2s')
    with open(file_path, mode='r', encoding='utf-8') as f:
        file_lines = [converter.convert(line) for line in f]
    with open(file_path, mode='w', encoding='utf-8') as f:
        f.writelines(file_lines)


def convert_epub_simplified(file_path):
    """Convert an EPUB file from Traditional to Simplified Chinese."""
    if not os.path.isfile(file_path):
        raise Exception("Please check the file path: {}".format(file_path))
    dir_name, file_name = os.path.split(file_path)
    # Temporary extraction folder next to the EPUB file
    unzip_dir_path = os.path.join(dir_name, "unzip")
    unzip_dir(file_path, unzip_dir_path)
    files = find_content_files(unzip_dir_path)
    for file in files:
        convert_file_to_chinese(file)
    # Output name: original name plus "-简体" ("Simplified")
    new_file_name = file_name[0:file_name.rindex(".")] + "-简体.epub"
    new_epub_file_path = os.path.join(dir_name, new_file_name)
    zip_dir(unzip_dir_path, new_epub_file_path)
    # Remove the temporary extraction folder
    shutil.rmtree(unzip_dir_path)


def find_content_files(folder_path):
    """
    Return the paths of all files in the folder that need to be converted.
    Only the EPUB content is converted: files whose names end in 'html'
    (this matches both .html and .xhtml).
    """
    result_files = []
    for root, dirs, files in os.walk(folder_path):
        for name in files:
            if name.endswith('html'):
                result_files.append(os.path.join(root, name))
    return result_files


if __name__ == '__main__':
    # Test Traditional/Simplified conversion
    # test_convert()
    epub_file_path = "path/to/your/book.epub"  # replace with the path to your e-book
    convert_epub_simplified(epub_file_path)
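One thing worth knowing about the repacking step: the EPUB container format expects the `mimetype` file to be the first entry in the archive and to be stored uncompressed. `zip_dir` writes files in whatever order `os.walk` returns them and compresses everything, which many readers tolerate but stricter readers and validators may reject. Below is a minimal sketch of an alternative packer; the name `zip_epub_dir` is hypothetical (it is not part of the original code), and it assumes the output file is written outside the folder being packed.

```python
import os
import zipfile


def zip_epub_dir(dirname, zipfilename):
    """Pack an unpacked EPUB folder, writing 'mimetype' first and uncompressed.

    Hypothetical helper, not part of the original script.
    """
    with zipfile.ZipFile(zipfilename, "w", zipfile.ZIP_DEFLATED) as zf:
        # Write the mimetype entry first, with no compression.
        mimetype_path = os.path.join(dirname, "mimetype")
        if os.path.isfile(mimetype_path):
            zf.write(mimetype_path, "mimetype", compress_type=zipfile.ZIP_STORED)
        # Then add every other file, compressed, with paths relative to dirname.
        for root, dirs, files in os.walk(dirname):
            for name in files:
                full_path = os.path.join(root, name)
                arcname = os.path.relpath(full_path, dirname)
                if arcname == "mimetype":
                    continue  # already written above
                zf.write(full_path, arcname)
```

If a converted book fails to open in a particular reader, swapping the `zip_dir(unzip_dir_path, new_epub_file_path)` call in `convert_epub_simplified` for `zip_epub_dir(unzip_dir_path, new_epub_file_path)` is one thing to try.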
