Python web scraping

Scraping the characters of Jin Yong's novels with Python

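'''
Fetch the characters in Jin Yong's novels from http://www.jinyongwang.com/data/renwu/
The output looks like this:
Novel 1
Character 1 Character 2 Character 3 …
Novel 2
Character 1 Character 2 Character 3 …
…
Without further ado, here is the code.
'''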

import re

import requests
from bs4 import BeautifulSoup


# Fetch the raw page content
def get_html(url):
    page = requests.get(url)
    return page.content


# Extract each novel and its characters, and write them to a txt file
def analyse_html(html):
    # First time using BeautifulSoup, and it feels perfect
    soup = BeautifulSoup(html, 'html.parser')
    body = soup.body
    main = body.find('div', attrs={'class': 'main'})
    booklist = main.find('div', attrs={'class': 'booklist'})
    # Output file; change the path to suit your machine
    file_path = r'E:\names.txt'
    with open(file_path, 'a', encoding='utf-8') as file:
        # A regex on the tag name finds both the 'h2' tags that hold novel titles
        # and the 'div' tags that hold character names
        for dataname in booklist.find_all(re.compile('h2|div')):
            # tag.get('class') returns the list of class names, or None when the
            # tag has no class attribute (tag['class'] would raise KeyError there)
            classes = dataname.get('class') or ['']
            if classes[0] == 'dataname':
                book_name = dataname.find('span').get_text()
                print(book_name + '\n')
                file.write('\n' + book_name + '\n')
            elif classes[0] == 'datapice':
                for a in dataname('a'):
                    # Anchors with and without a character image differ slightly
                    if a.find('i') is None:
                        # <a href="/data/2752.html"><img alt="郑旦" src="/public/uploads/baike/2015-08-15/95771439622810_120.jpg"/>郑旦</a>
                        # With an image, get_text() returns the name directly
                        role_name = a.get_text().replace(' ', '')
                    else:
                        # <a href="/data/2767.html"><i class="icon"></i>卓天雄</a>
                        # Without an image, get_text() returns one extra leading
                        # character; for lack of a better way, slice it off
                        role_name = a.get_text(strip=True).replace(' ', '')[1:]
                    file.write(role_name + ' ')


if __name__ == '__main__':
    url = 'http://www.jinyongwang.com/data/renwu/'
    html = get_html(url)
    analyse_html(html)
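
The script's comments note that anchors containing an `<i class="icon">` element yield one extra leading character from `get_text()`, which is then sliced off. A minimal sketch of one possible alternative follows, assuming the extra character comes from text nested inside the `<i>` tag and that the character name is always a direct text node of the anchor. The sample markup is hypothetical, modeled on the two anchor snippets quoted in the comments, and the helper names `role_name_from_anchor` and `extract_role_names` are illustrative, not part of the original script.

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical markup modeled on the two anchor shapes quoted in the script's comments.
# The "x" inside <i> stands in for the stray character; the real page may differ.
SAMPLE_HTML = """
<div class="datapice">
  <a href="/data/2752.html"><img alt="郑旦" src="/public/uploads/baike/2015-08-15/95771439622810_120.jpg"/>郑旦</a>
  <a href="/data/2767.html"><i class="icon">x</i>卓天雄</a>
</div>
"""


def role_name_from_anchor(a):
    """Join only the anchor's direct text nodes, skipping text nested in child tags."""
    parts = [child for child in a.children if isinstance(child, NavigableString)]
    return "".join(parts).strip()


def extract_role_names(html):
    """Return the character name from every anchor in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return [role_name_from_anchor(a) for a in soup.find_all("a")]


if __name__ == "__main__":
    print(extract_role_names(SAMPLE_HTML))
```

Run on the sample above, this prints `['郑旦', '卓天雄']` for both anchor shapes without a fixed `[1:]` slice. Whether it holds on the live page depends on where the extra character actually sits in the markup, so it is worth checking against the real HTML first.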

This is my first time writing something on CSDN, and it feels pretty good~~
