A Novel Splitter and Converter in python

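A while ago I wrote a simple [url=http://greatghoul.iteye.com/blog/610134]TXT2HTML novel converter (HTA version)[/url]. Here it is again in python: it automatically splits a novel by chapter into multiple HTML files and builds a table of contents for easier reading.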

[b]Screenshot:[/b]
[img]http://dl.iteye.com/upload/attachment/216180/11783431-03e0-323d-9a93-41ccd9c9f6bc.jpg[/img]

[b]Script:[/b]

# encoding: gbk
#
# Split a TXT novel into multiple HTML files
#
# @author : GreatGhoul
# @email  : greatghoul@gmail.com
# @blog   : http://greatghoul.iteye.com

import re
import os

# regex for the section title
# sec_re = re.compile(r'第.+卷\s+.+\s+第.+章\s+.+')

# txt book's path.
source_path = 'f:\\佣兵天下.txt'

path_pieces = os.path.split(source_path)
novel_title = re.sub(r'(\..*$)|($)', '', path_pieces[1])
# os.path.join adds the separator that plain concatenation loses
# when the novel is not in a drive's root folder
target_path = os.path.join(path_pieces[0], '%s_html' % novel_title)
section_re = re.compile(r'^\s*第.+卷\s+.*$')
section_head = '''
    <html>
        <head>
            <meta http-equiv="Content-Type" content="text/html; charset=GBK"/>
            <title>%s</title>
        </head>
        <body style="font-family:楷体,宋体;font-size:16px; margin:0;
            padding: 20px; background:#FAFAD2;color:#2B4B86;text-align:center;">
            <h2>%s</h2><a href="#bottom">去页尾</a><hr/>'''

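# escape xml/html
def escape_xml(code):
    text = code
    # escape '&' first so the entities produced below are not escaped again
    text = re.sub(r'&', '&amp;', text)
    text = re.sub(r'<', '&lt;', text)
    text = re.sub(r'>', '&gt;', text)
    text = re.sub(r'\t', '&nbsp;&nbsp;&nbsp;&nbsp;', text)
    text = re.sub(r'\s', '&nbsp;', text)
    return text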

# entry of the script
def main():
    # create the output folder
    if not os.path.exists(target_path):
        os.mkdir(target_path)

    # open the source file
    input = open(source_path, 'r')

    sec_count = 0
    sec_cache = []
    idx_cache = []

    output = open('%s\\%d.html' % (target_path, sec_count), 'w')
    preface_title = '%s 前言' % novel_title
    output.writelines([section_head % (preface_title, preface_title)])
    idx_cache.append('<li><a href="%d.html">%s</a></li>'
                     % (sec_count, novel_title))

    for line in input:
        # is a chapter's title?
        if line.strip() == '':
            pass
        elif re.match(section_re, line):
            line = re.sub(r'\s+', ' ', line)
            print 'converting %s...' % line

            # write the section footer
            sec_cache.append('<hr/><p>')
            if sec_count == 0:
                sec_cache.append('<a href="index.html">目录</a> | ')
                sec_cache.append('<a href="%d.html">下一篇</a> | '
                                 % (sec_count + 1))
            else:
                sec_cache.append('<a href="%d.html">上一篇</a> | '
                                 % (sec_count - 1))
                sec_cache.append('<a href="index.html">目录</a> | ')
                sec_cache.append('<a href="%d.html">下一篇</a> | '
                                 % (sec_count + 1))
            sec_cache.append('<a name="bottom" href="#">回页首</a></p>')
            sec_cache.append('</body></html>')
            output.writelines(sec_cache)
            output.flush()
            output.close()
            sec_cache = []
            sec_count += 1

            # create a new section
            output = open('%s\\%d.html' % (target_path, sec_count), 'w')
            output.writelines([section_head % (line, line)])
            idx_cache.append('<li><a href="%d.html">%s</a></li>'
                             % (sec_count, line))
        else:
            sec_cache.append('<p style="text-align:left;">%s</p>'
                             % escape_xml(line))

    input.close()

    # write rest lines
    sec_cache.append('<hr/><p>')
    # the last page has no next page, so it links back to the previous one
    if sec_count > 0:
        sec_cache.append('<a href="%d.html">上一篇</a> | '
                         % (sec_count - 1))
    sec_cache.append('<a href="index.html">目录</a> | ')
    sec_cache.append('<a name="bottom" href="#">回页首</a></p></body></html>')
    output.writelines(sec_cache)
    output.flush()
    output.close()
    sec_cache = []

    # write the menu
    output = open('%s\\index.html' % (target_path), 'w')
    menu_head = '%s 目录' % novel_title
    output.writelines([section_head % (menu_head, menu_head), '<ul style="text-align:left">'])
    output.writelines(idx_cache)
    output.writelines(['</ul></body></html>'])
    output.flush()
    output.close()
    idx_cache = []

    print 'completed. %d chapter(s) in total.' % sec_count

if __name__ == '__main__':
    main()
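A note on escape_xml: the order of the substitutions matters. If '&' is replaced after '<' and '>', the ampersands inside the entities that were just produced get escaped a second time. The sketch below shows the difference. It is written for Python 3 (the script itself targets Python 2), and escape_amp_last and escape_amp_first are hypothetical helper names used only for this illustration; html.escape is the standard-library function that does the same job.

```python
import html


def escape_amp_last(text):
    # Escapes '&' after '<' and '>', so the new entities get mangled.
    text = text.replace('<', '&lt;')
    text = text.replace('>', '&gt;')
    text = text.replace('&', '&amp;')
    return text


def escape_amp_first(text):
    # Escapes '&' first, so the later replacements are left untouched.
    text = text.replace('&', '&amp;')
    text = text.replace('<', '&lt;')
    text = text.replace('>', '&gt;')
    return text


sample = 'a < b & c'
print(escape_amp_last(sample))
print(escape_amp_first(sample))
# The standard library gives the same result as the '&'-first version.
print(html.escape(sample, quote=False))
```

Only the '&'-first version and html.escape leave the text readable in a browser.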

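Change [quote]source_path = 'f:\\佣兵天下.txt'[/quote] to the path of your TXT novel, then adjust the chapter-title regex [quote]section_re = re.compile(r'^\s*第.+卷\s+.*$')[/quote] to suit the book's headings. The script creates a folder named "filename_html" in the novel's directory to hold the split files.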
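Before running the script on a whole book, it can help to try the regex on a few title lines and to check where the output will go. The following is a minimal sketch for Python 3, not part of the script: the sample lines, chapter_re, matching_titles and output_folder are all made up for illustration, and ntpath is used so that the Windows-style path is handled the same way on any platform.

```python
import ntpath
import re

# Same pattern as in the script.
section_re = re.compile(r'^\s*第.+卷\s+.*$')
# Hypothetical alternative for books that only use "第N章" headings.
chapter_re = re.compile(r'^\s*第.+章\s+.*$')

# Sample lines invented for illustration; replace them with real lines.
samples = [
    '第一卷 初出茅庐 第一章 少年\n',
    '  第二卷  风云\n',
    '第三章 风起\n',
    '这是一行普通的正文。\n',
]


def matching_titles(pattern, lines):
    """Return the lines that the pattern would treat as section titles."""
    return [line.strip() for line in lines if pattern.match(line)]


def output_folder(source_path):
    """Return the "<name>_html" folder next to a Windows-style source path."""
    folder, filename = ntpath.split(source_path)
    title = ntpath.splitext(filename)[0]
    return ntpath.join(folder, title + '_html')


print(matching_titles(section_re, samples))
print(matching_titles(chapter_re, samples))
print(output_folder('f:\\佣兵天下.txt'))
```

If a heading is missing from the first list, or an ordinary line of text shows up in it, the pattern needs adjusting before the real run.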

I have only just started with python and the code feels far from concise, so suggestions for improving it are welcome.
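One possible direction, offered only as a sketch: separate the parsing from the HTML writing, so that the loop first groups lines into sections and the files are written afterwards. The code below is for Python 3 and is not the script above; split_sections and the sample lines are hypothetical and exist only for this illustration.

```python
import re

# Same pattern as in the script.
section_re = re.compile(r'^\s*第.+卷\s+.*$')


def split_sections(lines, preface_title, pattern=section_re):
    """Group lines into (title, paragraphs) pairs, skipping blank lines."""
    sections = [(preface_title, [])]
    for line in lines:
        text = line.strip()
        if not text:
            continue
        if pattern.match(line):
            # A title line starts a new section.
            sections.append((re.sub(r'\s+', ' ', text), []))
        else:
            # Anything else belongs to the section currently open.
            sections[-1][1].append(text)
    return sections


# Sample input invented for illustration.
sample = [
    '序言内容\n',
    '\n',
    '第一卷 开端\n',
    '第一段\n',
    '第二段\n',
    '第二卷 转折\n',
    '第三段\n',
]
for number, (title, paragraphs) in enumerate(split_sections(sample, '前言')):
    print(number, title, len(paragraphs))
```

With the sections in a list, the index from enumerate can serve as the file name, and the previous and next links follow from the position in the list rather than from a counter updated inside the loop.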
