Python Web Scraper (1): Movies

Scraping a movie ranking with regular expressions and the requests library

import json
import re
import time

import requests
from requests.exceptions import RequestException


def get_one_page(url):
    """Fetch one page and return its HTML, or None if the request fails."""
    # Send a browser-like User-Agent so the site does not reject the request
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.162 Safari/537.36'
    }
    try:
        response = requests.get(url, headers=headers)
        if response.status_code == 200:
            return response.text
        return None
    except RequestException:
        return None


def parse_one_page(html):
    """Yield one dict per movie entry found in the page."""
    # Extract the fields with a regular expression; re.S lets "." match newlines
    pattern = re.compile(
        r'<dd>.*?board-index.*?>(\d+)</i>.*?data-src="(.*?)".*?name"><a'
        r'.*?>(.*?)</a>.*?star">(.*?)</p>.*?releasetime">(.*?)</p>'
        r'.*?integer">(.*?)</i>.*?fraction">(.*?)</i>.*?</dd>', re.S)
    items = re.findall(pattern, html)  # every match on the page
    for item in items:
        yield {
            'index': item[0],
            'image': item[1],
            'title': item[2],
            'actor': item[3].strip()[3:],  # skip the 3-character label prefix
            'time': item[4].strip()[5:],   # skip the 5-character label prefix
            'score': item[5] + item[6]     # integer part + fractional part
        }


def write_to_file(content):
    # Append one JSON object per line; ensure_ascii=False keeps Chinese text readable
    with open('abc.txt', 'a', encoding='utf-8') as f:
        f.write(json.dumps(content, ensure_ascii=False) + '\n')


def main(offset):
    url = 'http://maoyan.com/board/4?offset=' + str(offset)
    html = get_one_page(url)
    if html is None:  # request failed; nothing to parse
        return
    for item in parse_one_page(html):
        print(item)
        write_to_file(item)


if __name__ == '__main__':
    # Ten pages of ten entries each; pause between requests
    for i in range(10):
        main(offset=i * 10)
        time.sleep(1)
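
The regular expression can be tried offline before running it against the site. The following is only a minimal sketch: `SAMPLE_HTML` is a made-up fragment shaped the way the pattern expects (it is not copied from the live page), and `parse_sample` is a hypothetical helper that mirrors `parse_one_page`. It shows how the non-greedy `.*?` groups, combined with `re.S`, pick out each field across several lines.

```python
import re

# Same pattern as in parse_one_page
PATTERN = re.compile(
    r'<dd>.*?board-index.*?>(\d+)</i>.*?data-src="(.*?)".*?name"><a'
    r'.*?>(.*?)</a>.*?star">(.*?)</p>.*?releasetime">(.*?)</p>'
    r'.*?integer">(.*?)</i>.*?fraction">(.*?)</i>.*?</dd>', re.S)

# Hypothetical fragment for illustration; the real page markup may differ
SAMPLE_HTML = '''
<dd>
  <i class="board-index board-index-1">1</i>
  <img data-src="http://example.com/poster.jpg" class="board-img" />
  <p class="name"><a href="/films/0" title="Sample Movie">Sample Movie</a></p>
  <p class="star">
    主演：Actor A,Actor B
  </p>
  <p class="releasetime">上映时间：1993-01-01</p>
  <p class="score"><i class="integer">9.</i><i class="fraction">5</i></p>
</dd>
'''


def parse_sample(html):
    """Return a list of dicts, one per <dd> entry matched by PATTERN."""
    results = []
    for item in PATTERN.findall(html):
        results.append({
            'index': item[0],
            'image': item[1],
            'title': item[2],
            'actor': item[3].strip()[3:],  # drops the 3-character "主演：" label
            'time': item[4].strip()[5:],   # drops the 5-character "上映时间：" label
            'score': item[5] + item[6],    # "9." + "5"
        })
    return results


movies = parse_sample(SAMPLE_HTML)
print(movies)
```

Because the slices `[3:]` and `[5:]` depend on the exact length of the label text, a change in the page's wording would silently cut off the wrong characters; checking the pattern against a saved copy of the page is a cheap way to notice that.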

For more background, follow the links below:
A detailed guide to the requests library

A detailed guide to regular expressions
