Scrapy Crawling 2: API Crawling

Previous article: https://blog.csdn.net/weixin_44826986/article/details/124138028

1. Crawling Process

1.1 Importing the API

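We build the crawler on top of demo.py.
The target site is https://spa1.scrape.center/
However, the site loads its data through an API: the movie data is not in the page HTML but in the API responses, for example:
https://spa1.scrape.center/api/movie?limit=10&offset=0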
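Each page of the API is selected with the limit and offset query parameters. As a minimal stdlib sketch, assuming 100 movies in total and 10 movies per page, the page URLs can be built with urllib.parse.urlencode. The names BASE_URL and build_page_urls are hypothetical, chosen only for illustration:

```python
from urllib.parse import urlencode

# Endpoint taken from the example URL above
BASE_URL = 'https://spa1.scrape.center/api/movie'


def build_page_urls(total, limit=10):
    """Return one API URL per page of a limit/offset paginated endpoint.

    `total` is the assumed number of records (100 for this site).
    """
    urls = []
    for offset in range(0, total, limit):
        # urlencode keeps the dict order: limit first, then offset
        query = urlencode({'limit': limit, 'offset': offset})
        urls.append('{}?{}'.format(BASE_URL, query))
    return urls


if __name__ == '__main__':
    for url in build_page_urls(100):
        print(url)
```

With total=100 and limit=10 this produces ten URLs, from offset=0 to offset=90.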

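Create the Scrapy project:
In the target folder, run: scrapy startproject Movie
Enter the newly created Movie project: cd Movie
Generate a spider: scrapy genspider example example.com
The spider name example and the domain example.com are placeholders; you can type anything here and change them later inside the project. In this article the spider file is movie.py.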

Modify movie.py as follows:

import scrapy


class MovieSpider(scrapy.Spider):
    name = 'movie'
    allowed_domains = ['spa1.scrape.center']
    # start_urls = ['http://spa1.scrape.center/']

    def start_requests(self):
        # Build the API URLs found with demo.py using a list comprehension
        # range(0, 91, 10) yields offsets 0, 10, ..., 90: 10 pages of 10 movies each
        web_url = ['https://spa1.scrape.center/api/movie?limit=10&offset={}'.format(page)
                   for page in range(0, 91, 10)]
        for i in web_url:
            # The first argument of scrapy.Request() must be a URL string
            yield scrapy.Request(i, callback=self.get_content)

    def get_content(self, response):
        print(response.url)

Modify the settings.py file:
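The author's exact settings.py changes are not preserved here. As an assumption, the excerpt below shows a commonly used set of edits for this kind of crawl; the values are illustrative and hypothetical, not the author's originals:

```python
# settings.py (excerpt): hypothetical values, not the author's original changes

# Projects generated by "scrapy startproject" set this to True; turning it off
# stops robots.txt rules from filtering the API requests.
ROBOTSTXT_OBEY = False

# Send a browser-like User-Agent instead of Scrapy's default one
# (this particular string is only an example).
USER_AGENT = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
              'AppleWebKit/537.36 (KHTML, like Gecko) '
              'Chrome/100.0.4896.75 Safari/537.36')

# Show only warnings and errors so the print() output stays readable
LOG_LEVEL = 'WARNING'

# Wait roughly one second (randomized by default) between requests to the same site
DOWNLOAD_DELAY = 1
```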

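To run the spider, enter in the terminal: scrapy crawl <spider name>
The spider name is the value of the name variable in movie.py (here, movie).
For convenience, create a run.py file in the project folder and put the launch statement in it; after that, simply running run.py starts the crawl: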

from scrapy.cmdline import execute

# Equivalent to typing "scrapy crawl movie" in the terminal
execute(['scrapy', 'crawl', 'movie'])

1.2 Data Extraction

Inspecting response.text shows that it is JSON data:

{"count":100,"results":[{"id":41,"name":"萤火之森","alias":"蛍火の杜へ","cover":"https://p1.meituan.net/movie/4c55f3bf5fa9660db3cb7014651a0950267034.jpg@464w_644h_1e_1c","categories":["剧情","爱情","动画","奇幻"],"published_at":"2011-09-17","minute":45,"score":8.8,"regions":["日本"]},{"id":42,"name":"素媛","alias":"소원","cover":"https://p0.meituan.net/movie/19653e8af59cf473cd40f9ccc0658d93692304.jpg@464w_644h_1e_1c","categories":["剧情"],"published_at":"2013-10-02","minute":123,"score":8.8,"regions":["韩国"]},{"id":43,"name":"小鞋子","alias":"بچههای آسمان","cover":"https://p1.meituan.net/movie/135c612860fae899df2220149664d97a173555.jpg@464w_644h_1e_1c","categories":["剧情","家庭"],"published_at":null,"minute":89,"score":8.8,"regions":["伊朗"]},{"id":44,"name":"熔炉","alias":"도가니","cover":"https://p1.meituan.net/movie/2a0783b4fd95566568f24adfad2181bb5392280.jpg@464w_644h_1e_1c","categories":["剧情"],"published_at":"2011-09-22","minute":125,"score":8.8,"regions":["韩国"]},{"id":45,"name":"大话西游之大圣娶亲","alias":"A Chinese Odyssey Part Two - Cinderella","cover":"https://p1.meituan.net/moviemachine/508056769092059fe43a611b949f27d14863831.jpg@464w_644h_1e_1c","categories":["喜剧","爱情","奇幻"],"published_at":"2014-10-24","minute":110,"score":8.9,"regions":["中国香港","中国大陆"]},{"id":46,"name":"新龙门客栈","alias":"New Dragon Gate Inn","cover":"https://p1.meituan.net/movie/7833126c8c21a11571bb52fbdece0acb811449.jpg@464w_644h_1e_1c","categories":["动作","爱情","武侠","古装"],"published_at":"2012-02-24","minute":88,"score":8.9,"regions":["中国香港","中国大陆"]},{"id":47,"name":"触不可及","alias":"Intouchables","cover":"https://p1.meituan.net/movie/1e700e53e4fe29dd5942381bb353c8532239179.jpg@464w_644h_1e_1c","categories":["剧情","喜剧"],"published_at":"2011-11-02","minute":112,"score":8.9,"regions":["法国"]},{"id":48,"name":"钢琴家","alias":"The Pianist","cover":"https://p0.meituan.net/movie/bcbe59fc51580317adf94537a61a1a26142090.jpg@464w_644h_1e_1c","categories":["剧情","音乐","传记","历史","战争"],"published_at":"2002-05-24","minute":150,"score":8.9,"regions":["法国","德国","英国","波兰"]},{"id":49,"name":"本杰明·巴顿奇事","alias":"The Curious Case of Benjamin Button","cover":"https://p0.meituan.net/movie/2526f77c650bf7cf3d5ee2dccdeac332244951.jpg@464w_644h_1e_1c","categories":["剧情","爱情","奇幻"],"published_at":"2008-12-25","minute":166,"score":8.9,"regions":["美国"]},{"id":50,"name":"倩女幽魂","alias":"A Chinese Ghost Story","cover":"https://p1.meituan.net/movie/96d98200d2afb4b87ff189f9c15b6545568339.jpg@464w_644h_1e_1c","categories":["爱情","奇幻","武侠","古装"],"published_at":"2011-04-30","minute":98,"score":8.9,"regions":["中国香港"]}]}
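The spider above hard-codes range(0, 91, 10), which matches the "count":100 in this response. As an alternative sketch (an assumption, not the author's approach), the offsets can be derived from the count field of the first page instead. remaining_offsets is a hypothetical helper; in a spider it would be called from the callback of the offset=0 request, yielding one scrapy.Request per returned offset:

```python
import json

LIMIT = 10  # page size used in the article's URLs


def remaining_offsets(first_page_text, limit=LIMIT):
    """Given the JSON text of the offset=0 page, return the offsets of the
    pages that still need to be requested (hypothetical helper)."""
    data = json.loads(first_page_text)
    total = data['count']
    # The offset=0 page is already in hand, so start at `limit`
    return list(range(limit, total, limit))


# A trimmed version of the response shown above: only "count" is needed here
sample = '{"count": 100, "results": []}'
print(remaining_offsets(sample))
```

If the site later adds or removes movies, this version follows the new count without editing the range() bounds.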

Import the json module: import json
Then run the following statements:

# Convert the JSON data returned by the API into a Python dict
source = json.loads(response.text)
print(source)
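json.loads maps JSON types to Python types: objects become dict, arrays become list, and null becomes None. This matters for this API because some records, such as 小鞋子 in the response above, have "published_at":null. A minimal stdlib sketch using that record (the cover and alias fields omitted for brevity); in Scrapy 2.2 and later, response.json() performs the same decoding as json.loads(response.text):

```python
import json

# One record from the API response above ("cover" and "alias" omitted)
raw = ('{"id":43,"name":"小鞋子",'
       '"categories":["剧情","家庭"],"published_at":null,'
       '"minute":89,"score":8.8,"regions":["伊朗"]}')

movie = json.loads(raw)

print(type(movie))            # <class 'dict'>
print(movie['categories'])    # ['剧情', '家庭']
print(movie['published_at'])  # None (JSON null)

# Substitute a placeholder when the date is missing
published = movie['published_at'] or 'unknown'
print(published)              # unknown
```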

Output:

{'count': 100, 'results': [{'id': 71, 'name': '当幸福来敲门', 'alias': 'The Pursuit of Happyness', 'cover': 'https://p1.meituan.net/movie/7d1d85610651dbe1c8687781a87d1008184950.jpg@464w_644h_1e_1c', 'categories': ['剧情', '家庭', '传记'], 'published_at': '2008-01-17', 'minute': 117, 'score': 8.9, 'regions': ['美国']}, {'id': 72, 'name': '幽灵公主', 'alias': 'もののけ姫', 'cover': 'https://p0.meituan.net/movie/a08f65e6cb50fab32df5da69ff116f593095363.jpg@464w_644h_1e_1c', 'categories': ['动画', '奇幻', '冒险'], 'published_at': '1998-05-01', 'minute': 134, 'score': 8.9, 'regions': ['日本']}, {'id': 73, 'name': '十二怒汉', 'alias': '12 Angry Men', 'cover': 'https://p0.meituan.net/movie/df15efd261060d3094a73ef679888d4f238149.jpg@464w_644h_1e_1c', 'categories': ['剧情'], 'published_at': '1957-04-13', 'minute': 96, 'score': 8.9, 'regions': ['美国']}, {'id': 74, 'name': '搏击俱乐部', 'alias': 'Fight Club', 'cover': 'https://p0.meituan.net/movie/b3defc07dfaa1b6f5b74852ce38a3f8f242792.jpg@464w_644h_1e_1c', 'categories': ['剧情', '动作', '悬疑', '惊悚'], 'published_at': '1999-09-10', 'minute': 139, 'score': 8.9, 'regions': ['美国', '德国']}, {'id': 75, 'name': '疯狂原始人', 'alias': 'The Croods', 'cover': 'https://p1.meituan.net/movie/bc022b86345c643ca21d759166f77a553679589.jpg@464w_644h_1e_1c', 'categories': ['喜剧', '动画', '冒险'], 'published_at': '2013-04-20', 'minute': 98, 'score': 8.9, 'regions': ['美国']}, {'id': 76, 'name': '阿凡达', 'alias': 'Avatar', 'cover': 'https://p1.meituan.net/movie/e540384dc6c9f63bdb27cc554588a77f44305.jpg@464w_644h_1e_1c', 'categories': ['动作', '科幻', '冒险'], 'published_at': '2010-01-04', 'minute': 162, 'score': 8.9, 'regions': ['美国', '英国']}, {'id': 77, 'name': '哈尔的移动城堡', 'alias': 'ハウルの動く城', 'cover': 'https://p0.meituan.net/movie/0127b451d5b8f0679c6f81c8ed414bb2432442.jpg@464w_644h_1e_1c', 'categories': ['动画', '奇幻', '冒险'], 'published_at': '2004-09-05', 'minute': 119, 'score': 8.9, 'regions': ['日本']}, {'id': 78, 'name': '盗梦空间', 'alias': 'Inception', 'cover': 'https://p1.meituan.net/movie/d40efe1183f29d5900f5c60be3c8a89d339225.jpg@464w_644h_1e_1c', 'categories': ['剧情', '科幻', '悬疑', '冒险'], 'published_at': '2010-09-01', 'minute': 148, 'score': 8.9, 'regions': ['美国', '英国']}, {'id': 79, 'name': '忠犬八公的故事', 'alias': "Hachi: A Dog's Tale", 'cover': 'https://p0.meituan.net/movie/5f0a709378d6b567807aa9685610f818282136.jpg@464w_644h_1e_1c', 'categories': ['剧情'], 'published_at': '2009-06-13', 'minute': 93, 'score': 8.9, 'regions': ['美国', '英国']}, {'id': 80, 'name': '拯救大兵瑞恩', 'alias': 'Saving Private Ryan', 'cover': 'https://p1.meituan.net/movie/a2a287c77415dc1f85b04d288f7d63ab1089754.jpg@464w_644h_1e_1c', 'categories': ['剧情', '历史', '战争'], 'published_at': '1998-11-13', 'minute': 169, 'score': 8.9, 'regions': ['美国']}]}

Complete code:

import scrapy
import json


class MovieSpider(scrapy.Spider):
    name = 'movie'
    allowed_domains = ['spa1.scrape.center']
    # start_urls = ['http://spa1.scrape.center/']

    def start_requests(self):
        # Build the API URLs found with demo.py using a list comprehension
        web_url = ['https://spa1.scrape.center/api/movie?limit=10&offset={}'.format(page)
                   for page in range(0, 91, 10)]
        for i in web_url:
            yield scrapy.Request(i, callback=self.get_content)

    def get_content(self, response):
        # Decode the JSON body and print the fields of interest for each movie
        source = json.loads(response.text)
        for i in source['results']:
            name = i['name']
            categories = i['categories']
            score = i['score']
            print(name, categories, score)

This retrieves the name, categories, and score of every movie returned by the API.
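The parsing step in get_content does not depend on Scrapy, so it can be pulled into a plain function and checked offline. A minimal sketch, where extract_movies is a hypothetical helper that returns dicts instead of printing, fed with two records trimmed from the response shown earlier:

```python
import json


def extract_movies(text):
    """Parse one page of the movie API and return the name, categories and
    score of each movie: the same fields that get_content() prints."""
    source = json.loads(text)
    movies = []
    for item in source['results']:
        movies.append({
            'name': item['name'],
            'categories': item['categories'],
            'score': item['score'],
        })
    return movies


# Two records trimmed from the API response shown earlier
sample = json.dumps({
    'count': 100,
    'results': [
        {'id': 76, 'name': '阿凡达', 'categories': ['动作', '科幻', '冒险'], 'score': 8.9},
        {'id': 78, 'name': '盗梦空间', 'categories': ['剧情', '科幻', '悬疑', '冒险'], 'score': 8.9},
    ],
}, ensure_ascii=False)

for movie in extract_movies(sample):
    print(movie['name'], movie['categories'], movie['score'])
```

Inside the spider, get_content could then do `for movie in extract_movies(response.text): yield movie`. Scrapy treats dicts yielded from a callback as items, so running scrapy crawl movie -o movies.json would save them to a file instead of only printing them.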
