[Python] Web Crawler (Static Website) Example

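Features of this crawler:

1. Target: a static website

2. Depth: two levels (an index page, then the chapter pages it links to)

3. Threading: single-threaded (no synchronization is implemented, so a single thread is used to keep the chapters in order)

4. Result: crawls a web novel and merges its separate chapters into one txt file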
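The two-level, single-threaded design listed above can be sketched roughly as follows. This is a minimal illustration rather than the crawler itself: `crawl_in_order`, the injected `fetch` and `extract_links` callables, and the in-memory `pages` dictionary are all hypothetical names, chosen so the example runs without a network.

```python
from typing import Callable, List


def crawl_in_order(index_url: str,
                   fetch: Callable[[str], str],
                   extract_links: Callable[[str], List[str]]) -> str:
    """Level 1: read the index page. Level 2: read each chapter in list order."""
    index_html = fetch(index_url)
    parts = []
    # A plain loop visits chapters one at a time, so the merged text keeps
    # the same order as the links on the index page.
    for link in extract_links(index_html):
        parts.append(fetch(link))
    return "\n".join(parts)


# Hypothetical stand-in for a website: URL -> page content.
pages = {
    "index": "ch1,ch2,ch3",
    "ch1": "Chapter 1 text",
    "ch2": "Chapter 2 text",
    "ch3": "Chapter 3 text",
}

book = crawl_in_order("index", pages.__getitem__, lambda html: html.split(","))
print(book)
```

Because each chapter is fetched only after the previous one has been appended, no locking or reordering step is needed; the cost is that downloads never overlap.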

Page-fetching template:

def get_url(url):
    try:
        response = requests.get(url)
        print(response.encoding)
        print(response.apparent_encoding)
        response.encoding = response.apparent_encoding
        if response.status_code == 200:
            return response.text
        else:
            print("url Error:", url)
    except RequestException:
        print("URL RequestException Error:", url)
    return None
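The assignment `response.encoding = response.apparent_encoding` matters because a page decoded with the wrong charset turns into unreadable text instead of raising an error. The sketch below reproduces that effect with the standard library only. The GBK sample bytes are an assumption chosen for illustration, not data captured from the target site.

```python
# Sample text encoded as GBK, a charset common on Chinese-language sites
# (assumed here for illustration).
raw = "凡人修仙传".encode("gbk")

# Decoding with the wrong charset does not fail; it silently yields mojibake.
wrong = raw.decode("iso-8859-1")

# Decoding with the right charset recovers the original text.
right = raw.decode("gbk")

# With errors="replace", undecodable bytes become U+FFFD, the replacement
# character that the crawler later swaps for "*".
lossy = raw.decode("utf-8", errors="replace")

print(repr(wrong))
print(right)
print("\ufffd" in lossy)
```

This is also why the crawler prints both `encoding` and `apparent_encoding`: comparing the two shows whether the declared charset and the detected one disagree.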

Parsing and saving function:

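def parse_url(html):
    count = 0
    essay = ""
    # Each chapter link on the index page looks like
    # <td class="L"><a href="...">title</a></td>.
    pattern = re.compile('<td class="L"><a href="(.*?)">(.*?)</a></td>', re.S)
    items = re.findall(pattern, html)
    # The og:url meta tag holds the index page URL; chapter hrefs are relative to it.
    pattern_page = re.compile('<meta property="og:url" content="(.*?)"/>', re.S)
    item_page = re.findall(pattern_page, html)
    print(items)
    print(len(items))
    for item in items:
        count += 1
        # Resume point: skip the chapters saved by an earlier run.
        if count <= 2416:
            continue
        this_url = item_page[0] + item[0]
        this_title = item[1]
        essay = get_book(this_url, this_title).replace("\ufffd", "*")
        try:
            # Append and close per chapter, so a run that resumes mid-batch
            # never writes through a handle that was not opened.
            with open(os.path.join(sys.path[0], "凡人修仙传.txt"), "a", encoding="utf-8") as file:
                file.write(essay)
            if count % 100 == 0 or count == len(items):
                print("Saved the first " + str(count) + " chapters!")
            print("Downloaded up to chapter " + str(count), item, count / len(items) * 100, "%")
        except OSError:
            # print("Error", item)
            print(essay)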
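To see what the two regular expressions capture, they can be run against a small hand-written page. The HTML fragment and the example.com URL below are invented for illustration and only mimic the structure the patterns expect. `urljoin` from the standard library is shown as an alternative to plain string concatenation, not as what the original code does.

```python
import re
from urllib.parse import urljoin

# Hypothetical index page shaped like the markup the patterns target.
sample_html = '''
<meta property="og:url" content="https://example.com/html/0/328/"/>
<table><tr>
<td class="L"><a href="1001.html">Chapter 1</a></td>
<td class="L"><a href="1002.html">Chapter 2</a></td>
</tr></table>
'''

link_pattern = re.compile('<td class="L"><a href="(.*?)">(.*?)</a></td>', re.S)
page_pattern = re.compile('<meta property="og:url" content="(.*?)"/>', re.S)

# findall returns one (href, title) tuple per chapter link.
items = link_pattern.findall(sample_html)
base_url = page_pattern.findall(sample_html)[0]

# urljoin resolves each relative href against the index page URL.
chapters = [(urljoin(base_url, href), title) for href, title in items]
for url, title in chapters:
    print(title, url)
```

Concatenation works as long as the base URL ends with a slash and every href is relative; `urljoin` also copes with absolute hrefs and a missing trailing slash.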

Full code:

import os
import re
import sys

import requests
from requests.exceptions import RequestException


def get_url(url):
    """Download a page and return its text, or None on failure."""
    try:
        response = requests.get(url)
        print(response.encoding)
        print(response.apparent_encoding)
        # Use the charset detected from the body rather than the header default.
        response.encoding = response.apparent_encoding
        if response.status_code == 200:
            return response.text
        else:
            print("url Error:", url)
    except RequestException:
        print("URL RequestException Error:", url)
    return None


def parse_url(html):
    """Read the chapter list from the index page and append each chapter to the txt file."""
    count = 0
    essay = ""
    pattern = re.compile('<td class="L"><a href="(.*?)">(.*?)</a></td>', re.S)
    items = re.findall(pattern, html)
    pattern_page = re.compile('<meta property="og:url" content="(.*?)"/>', re.S)
    item_page = re.findall(pattern_page, html)
    print(items)
    print(len(items))
    for item in items:
        count += 1
        # Resume point: skip the chapters saved by an earlier run.
        if count <= 2416:
            continue
        this_url = item_page[0] + item[0]
        this_title = item[1]
        essay = get_book(this_url, this_title).replace("\ufffd", "*")
        try:
            # Append and close per chapter, so a run that resumes mid-batch
            # never writes through a handle that was not opened.
            with open(os.path.join(sys.path[0], "凡人修仙传.txt"), "a", encoding="utf-8") as file:
                file.write(essay)
            if count % 100 == 0 or count == len(items):
                print("Saved the first " + str(count) + " chapters!")
            print("Downloaded up to chapter " + str(count), item, count / len(items) * 100, "%")
        except OSError:
            # print("Error", item)
            print(essay)


def get_book(url, title):
    """Download one chapter page and return its title followed by the cleaned body text."""
    data = "\n" + str(title) + "\n"
    pattern = re.compile('<dd id="contents">(.*?)</dd>', re.S)
    essay = re.findall(pattern, get_url(url))
    essay_str = str(essay[0])
    data = data + essay_str.replace("&nbsp;", " ").replace("<br />", "\n")
    return data


if __name__ == '__main__':
    parse_url(get_url("https://www.x23us.com/html/0/328/"))
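`get_book` assumes that the page downloaded and that the `<dd id="contents">` block was found; if either assumption fails, `re.findall` or `essay[0]` raises an exception and the run stops. A more defensive variant might look like the sketch below. `extract_chapter` is a hypothetical helper name, it takes already-downloaded HTML rather than a URL, and the sample HTML is made up.

```python
import re
from typing import Optional

CONTENT_PATTERN = re.compile('<dd id="contents">(.*?)</dd>', re.S)


def extract_chapter(html: Optional[str], title: str) -> Optional[str]:
    """Return the title plus cleaned chapter text, or None if nothing usable."""
    if html is None:  # the download failed
        return None
    match = CONTENT_PATTERN.search(html)
    if match is None:  # the page layout was not what we expected
        return None
    # Same clean-up as get_book: entities to spaces, <br /> to newlines.
    body = match.group(1).replace("&nbsp;", " ").replace("<br />", "\n")
    return "\n" + title + "\n" + body


sample = '<dd id="contents">&nbsp;&nbsp;First line.<br />Second line.</dd>'
print(extract_chapter(sample, "Chapter 1"))
print(extract_chapter(None, "Chapter 2"))
```

A caller could then skip or retry a chapter whenever the result is `None`, instead of losing the whole run to one bad page.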
